annotate roundup/backends/indexer_dbm.py @ 5973:fe334430ca07
issue2550919 - Anti-bot signup using 4 second delay
Took the code by Erik Forsberg and massaged it into the core.
So this is no longer needed in the tracker.
Updated devel and responsive trackers to remove timestamp.py and
update input field name.
Docs, changes and tests complete. Hopefully these tracker changes
won't cause an issue for other tests.
| author | John Rouillard <rouilj@ieee.org> |
|---|---|
| date | Sat, 09 Nov 2019 00:30:37 -0500 |
| parents | 8e4c5db44fde |
| children | 3175bb92ca28 |
#
# This module is derived from the module described at:
# http://gnosis.cx/publish/programming/charming_python_15.txt
#
# Author: David Mertz (mertz@gnosis.cx)
# Thanks to: Pat Knight (p.knight@ktgroup.co.uk)
# Gregory Popovitch (greg@gpy.com)
#
# The original module was released under this license, and remains under
# it:
#
# This file is released to the public domain. I (dqm) would
# appreciate it if you choose to keep derived works under terms
# that promote freedom, but obviously am giving up any rights
# to compel such.
#
'''This module provides an indexer class, RoundupIndexer, that stores text
indices in a roundup instance. This class makes searching the content of
messages, string properties and text files possible.
'''
__docformat__ = 'restructuredtext'

import os, shutil, re, mimetypes, marshal, zlib, errno
from roundup.hyperdb import Link, Multilink
from roundup.backends.indexer_common import Indexer as IndexerBase

class Indexer(IndexerBase):
    '''Indexes information from roundup's hyperdb to allow efficient
    searching.

    Three structures are created by the indexer::

          files   {identifier: (fileid, wordcount)}
          words   {word: {fileid: count}}
          fileids {fileid: identifier}

    where identifier is (classname, nodeid, propertyname)
    '''
    def __init__(self, db):
        IndexerBase.__init__(self, db)
        self.indexdb_path = os.path.join(db.config.DATABASE, 'indexes')
        self.indexdb = os.path.join(self.indexdb_path, 'index.db')
        self.reindex = 0
        self.quiet = 9
        self.changed = 0

        # see if we need to reindex because of a change in code
        version = os.path.join(self.indexdb_path, 'version')
        if (not os.path.exists(self.indexdb_path) or
                not os.path.exists(version)):
            # for now the file itself is a flag
            self.force_reindex()
        elif os.path.exists(version):
            version = open(version).read()
            # check the value and reindex if it's not the latest
            if version.strip() != '1':
                self.force_reindex()

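To make the three structures named in the class docstring concrete, here is a small standalone sketch of what they could hold after indexing a single property. It is not part of `indexer_dbm.py`; the sample identifier, the words, and the helper `identifiers_for` are made up for illustration, and the `_TOP` bookkeeping entry follows the convention used by `add_text` in this listing.

```python
# Hypothetical sample data: one issue title whose text is "ROUNDUP INDEXER ROUNDUP".
identifier = ('issue', '1', 'title')   # (classname, nodeid, propertyname)

files = {
    '_TOP': (-1, None),                # bookkeeping entry: negated highest fileid
    identifier: (1, 3),                # identifier -> (fileid, wordcount)
}
words = {
    'ROUNDUP': {1: 2},                 # word -> {fileid: count}
    'INDEXER': {1: 1},
}
fileids = {1: identifier}              # fileid -> identifier


def identifiers_for(word):
    """Return the identifiers whose indexed text contains the given word."""
    return [fileids[fid] for fid in words.get(word.upper(), {})]
```

Looking a word up is then two dictionary hops: `words` gives the fileids, and `fileids` maps each one back to the `(classname, nodeid, propertyname)` tuple.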
    def force_reindex(self):
        '''Force a reindex condition
        '''
        if os.path.exists(self.indexdb_path):
            shutil.rmtree(self.indexdb_path)
        os.makedirs(self.indexdb_path)
        os.chmod(self.indexdb_path, 0o775)
        open(os.path.join(self.indexdb_path, 'version'), 'w').write('1\n')
        self.reindex = 1
        self.changed = 1

    def should_reindex(self):
        '''Should we reindex?
        '''
        return self.reindex

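The version-marker logic in `__init__` and `force_reindex` can be tried outside Roundup with a sketch like the following. It is a simplified stand-in, not the class's own code: the function names `needs_reindex` and `reset_index` and the constant `INDEX_VERSION` are hypothetical, and it runs against a throwaway temporary directory rather than a tracker's database directory.

```python
import os
import shutil
import tempfile

INDEX_VERSION = '1'   # the only marker value the listing treats as current


def needs_reindex(index_dir):
    """Return True when the version marker is missing or holds a stale value."""
    version_file = os.path.join(index_dir, 'version')
    if not os.path.exists(version_file):
        # also covers the case where index_dir itself does not exist
        return True
    with open(version_file) as f:
        return f.read().strip() != INDEX_VERSION


def reset_index(index_dir):
    """Wipe the index directory and write a fresh version marker."""
    if os.path.exists(index_dir):
        shutil.rmtree(index_dir)
    os.makedirs(index_dir)
    with open(os.path.join(index_dir, 'version'), 'w') as f:
        f.write(INDEX_VERSION + '\n')


base = tempfile.mkdtemp()
index_dir = os.path.join(base, 'indexes')
first = needs_reindex(index_dir)    # directory does not exist yet
reset_index(index_dir)
second = needs_reindex(index_dir)   # marker now holds the current version
shutil.rmtree(base)                 # clean up the temporary directory
```

Unlike the listing, the sketch closes its files with `with` blocks; that is a style choice here, not a claim about how the module should be changed.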
    def add_text(self, identifier, text, mime_type='text/plain'):
        '''Add some text associated with the (classname, nodeid, property)
        identifier.
        '''
        # make sure the index is loaded
        self.load_index()

        # remove old entries for this identifier
        if identifier in self.files:
            self.purge_entry(identifier)

        # split into words
        words = self.splitter(text, mime_type)

        # Find new file index, and assign it to identifier
        # (_TOP uses trick of negative to avoid conflict with file index)
        self.files['_TOP'] = (self.files['_TOP'][0]-1, None)
        file_index = abs(self.files['_TOP'][0])
        self.files[identifier] = (file_index, len(words))
        self.fileids[file_index] = identifier

        # find the unique words
        filedict = {}
        for word in words:
            if self.is_stopword(word):
                continue
            if word in filedict:
                filedict[word] = filedict[word]+1
            else:
                filedict[word] = 1

        # now add to the totals
        for word in filedict:
            # each word has a dict of {identifier: count}
            if word in self.words:
                entry = self.words[word]
            else:
                # new word
                entry = {}
                self.words[word] = entry

            # make a reference to the file for this word
            entry[file_index] = filedict[word]

        # save needed
        self.changed = 1

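The bookkeeping in `add_text` (allocate a fileid through the negated `_TOP` counter, count words, merge the counts into the inverted index) can be sketched on plain dictionaries. This is an illustration under assumptions: the stopword set and the function name `add_words` are hypothetical, the input is taken to be already split and upper-cased, and `collections.Counter` replaces the listing's manual counting loop.

```python
from collections import Counter

files = {'_TOP': (0, None)}    # same initial value the listing's load_index uses
words = {}
fileids = {}
STOPWORDS = {'THE', 'A'}       # hypothetical stopword list


def add_words(identifier, word_list):
    """Register identifier and merge its word counts into the index."""
    # _TOP counts downwards, so its first element is never a valid fileid
    files['_TOP'] = (files['_TOP'][0] - 1, None)
    file_index = abs(files['_TOP'][0])
    # as in the listing, wordcount includes stopwords
    files[identifier] = (file_index, len(word_list))
    fileids[file_index] = identifier

    counts = Counter(w for w in word_list if w not in STOPWORDS)
    for word, count in counts.items():
        words.setdefault(word, {})[file_index] = count
    return file_index


first = add_words(('issue', '1', 'title'), ['THE', 'INDEX', 'INDEX', 'WORKS'])
second = add_words(('msg', '7', 'content'), ['INDEX', 'FAILS'])
```

After both calls, `words['INDEX']` maps fileid 1 to a count of 2 and fileid 2 to a count of 1, which is the `{word: {fileid: count}}` shape from the class docstring.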
    def splitter(self, text, ftype):
        '''Split the contents of a text string into a list of 'words'
        '''
        if ftype == 'text/plain':
            words = self.text_splitter(text)
        else:
            return []
        return words

    def text_splitter(self, text):
        """Split text/plain string into a list of words
        """
        if not text:
            return []

        # case insensitive
        text = text.upper()

        # Split the raw text
        return re.findall(r'\b\w{%d,%d}\b' % (self.minlength, self.maxlength),
                          text, re.UNICODE)

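The effect of the bounded `\w{min,max}` pattern in `text_splitter` is easiest to see on a sample string. In the sketch below the bounds 2 and 25 are assumptions standing in for `self.minlength` and `self.maxlength` (they are set outside this listing), and `split_words` is a hypothetical free function rather than the method itself.

```python
import re

MINLENGTH, MAXLENGTH = 2, 25   # assumed values for self.minlength / self.maxlength


def split_words(text):
    """Upper-case the text and return the words whose length is within bounds."""
    if not text:
        return []
    pattern = r'\b\w{%d,%d}\b' % (MINLENGTH, MAXLENGTH)
    return re.findall(pattern, text.upper(), re.UNICODE)


sample = split_words("A bug in the indexer, v2!")
too_long = split_words("x" * 30)
```

Because the pattern is anchored with `\b` on both sides, a word that is too long is dropped as a whole instead of being cut into a 25-character prefix, and the single-letter "A" is skipped for being too short.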
    # we override this to ignore too short and too long words
    # and also to fix a bug - the (fail) case.
    def find(self, wordlist):
        '''Locate files that match ALL the words in wordlist
        '''
        if not hasattr(self, 'words'):
            self.load_index()
        self.load_index(wordlist=wordlist)
        entries = {}
        hits = None
        for word in wordlist:
            if not self.minlength <= len(word) <= self.maxlength:
                # word outside the bounds of what we index - ignore
                continue
            word = word.upper()
            if self.is_stopword(word):
                continue
            entry = self.words.get(word)    # For each word, get index
            entries[word] = entry           # of matching files
            if not entry:                   # Nothing for this one word (fail)
                return {}
            if hits is None:
                hits = {}
                for k in entry:
                    if k not in self.fileids:
                        raise ValueError('Index is corrupted: re-generate it')
                    hits[k] = self.fileids[k]
            else:
                # Eliminate hits for every non-match
                for fileid in list(hits):
                    if fileid not in entry:
                        del hits[fileid]
        if hits is None:
            return {}
        return list(hits.values())

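The "match ALL words" rule in `find` amounts to intersecting the fileid sets of the query words. The sketch below shows that idea with Python sets instead of the listing's delete-from-dict loop; the sample index and the name `find_all` are hypothetical, length and stopword filtering are left out, and it returns a list in every case (the listing returns `{}` on its early exits).

```python
# Hypothetical index contents in the shapes described by the class docstring.
words = {
    'CRASH': {1: 2, 2: 1},
    'LOGIN': {2: 3, 3: 1},
}
fileids = {
    1: ('issue', '1', 'title'),
    2: ('msg', '5', 'content'),
    3: ('msg', '9', 'content'),
}


def find_all(wordlist):
    """Return identifiers of the files that contain every word in wordlist."""
    hits = None
    for word in wordlist:
        entry = words.get(word.upper())
        if not entry:
            return []              # one unknown word means no file can match
        if hits is None:
            hits = set(entry)      # first word seeds the candidate set
        else:
            hits &= set(entry)     # later words can only narrow it
    if hits is None:
        return []                  # empty query
    return [fileids[fid] for fid in sorted(hits)]
```

Only fileid 2 appears under both `CRASH` and `LOGIN`, so a query for both words yields the single `('msg', '5', 'content')` identifier.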
|
93f03c6714d8
A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff
changeset
|
180 segments = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ#_-!" |
|
93f03c6714d8
A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff
changeset
|
181 def load_index(self, reload=0, wordlist=None): |
|
93f03c6714d8
A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff
changeset
|
182 # Unless reload is indicated, do not load twice |
|
93f03c6714d8
A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff
changeset
|
183 if self.index_loaded() and not reload: |
|
93f03c6714d8
A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff
changeset
|
184 return 0 |
|
93f03c6714d8
A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff
changeset
|
185 |
|
93f03c6714d8
A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff
changeset
|
186 # Ok, now let's actually load it |
|
93f03c6714d8
A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff
changeset
|
187 db = {'WORDS': {}, 'FILES': {'_TOP':(0,None)}, 'FILEIDS': {}} |
|
93f03c6714d8
A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff
changeset
|
188 |
|
93f03c6714d8
A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff
changeset
|
189 # Identify the relevant word-dictionary segments |
|
93f03c6714d8
A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff
changeset
|
190 if not wordlist: |
|
93f03c6714d8
A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff
changeset
|
191 segments = self.segments |
|
93f03c6714d8
A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff
changeset
|
192 else: |
|
93f03c6714d8
A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff
changeset
|
193 segments = ['-','#'] |
|
93f03c6714d8
A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff
changeset
|
194 for word in wordlist: |
|
5470
e2baa4e6ed6d
handle words starting with unicode characters
Christof Meerwald <cmeerw@cmeerw.org>
parents:
5395
diff
changeset
|
195 initchar = word[0].upper() |
|
e2baa4e6ed6d
handle words starting with unicode characters
Christof Meerwald <cmeerw@cmeerw.org>
parents:
5395
diff
changeset
|
196 if initchar not in self.segments: |
|
e2baa4e6ed6d
handle words starting with unicode characters
Christof Meerwald <cmeerw@cmeerw.org>
parents:
5395
diff
changeset
|
197 initchar = '_' |
|
e2baa4e6ed6d
handle words starting with unicode characters
Christof Meerwald <cmeerw@cmeerw.org>
parents:
5395
diff
changeset
|
198 segments.append(initchar) |
|
2089
93f03c6714d8
A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff
changeset
|
199 |
|
|
200 # Load the segments |
|
|
201 for segment in segments: |
|
|
202 try: |
|
|
203 f = open(self.indexdb + segment, 'rb') |
|
5248
198b6e810c67
Use Python-3-compatible 'as' syntax for except statements
Eric S. Raymond <esr@thyrsus.com>
parents:
4570
diff
changeset
|
204 except IOError as error: |
|
2089
93f03c6714d8
A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff
changeset
|
205 # probably just nonexistent segment index file |
|
|
206 if error.errno != errno.ENOENT: raise |
|
|
207 else: |
|
|
208 pickle_str = zlib.decompress(f.read()) |
|
|
209 f.close() |
|
|
210 dbslice = marshal.loads(pickle_str) |
|
|
211 if dbslice.get('WORDS'): |
|
|
212 # if it has some words, add them |
|
5395
23b8e6067f7c
Python 3 preparation: update calls to dict methods.
Joseph Myers <jsm@polyomino.org.uk>
parents:
5380
diff
changeset
|
213 for word, entry in dbslice['WORDS'].items(): |
|
2089
93f03c6714d8
A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff
changeset
|
214 db['WORDS'][word] = entry |
|
|
215 if dbslice.get('FILES'): |
|
|
216 # if it has some files, add them |
|
|
217 db['FILES'] = dbslice['FILES'] |
|
|
218 if dbslice.get('FILEIDS'): |
|
|
219 # if it has fileids, add them |
|
|
220 db['FILEIDS'] = dbslice['FILEIDS'] |
|
|
221 |
|
|
222 self.words = db['WORDS'] |
|
|
223 self.files = db['FILES'] |
|
|
224 self.fileids = db['FILEIDS'] |
|
|
225 self.changed = 0 |
|
|
226 |
|
|
227 def save_index(self): |
|
|
228 # only save if the index is loaded and changed |
|
|
229 if not self.index_loaded() or not self.changed: |
|
|
230 return |
|
|
231 |
|
|
232 # brutal space saver... delete all the small segments |
|
|
233 for segment in self.segments: |
|
|
234 try: |
|
|
235 os.remove(self.indexdb + segment) |
|
5248
198b6e810c67
Use Python-3-compatible 'as' syntax for except statements
Eric S. Raymond <esr@thyrsus.com>
parents:
4570
diff
changeset
|
236 except OSError as error: |
|
2089
93f03c6714d8
A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff
changeset
|
237 # probably just nonexistent segment index file |
|
|
238 if error.errno != errno.ENOENT: raise |
|
|
239 |
|
|
240 # First write the much simpler filename/fileid dictionaries |
|
|
241 dbfil = {'WORDS':None, 'FILES':self.files, 'FILEIDS':self.fileids} |
|
|
242 open(self.indexdb+'-','wb').write(zlib.compress(marshal.dumps(dbfil))) |
|
|
243 |
|
|
244 # The hard part is splitting the word dictionary up, of course |
|
|
245 letters = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ#_" |
|
|
246 segdicts = {} # Need batch of empty dicts |
|
|
247 for segment in letters: |
|
|
248 segdicts[segment] = {} |
|
5395
23b8e6067f7c
Python 3 preparation: update calls to dict methods.
Joseph Myers <jsm@polyomino.org.uk>
parents:
5380
diff
changeset
|
249 for word, entry in self.words.items(): # Split into segment dicts |
|
2089
93f03c6714d8
A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff
changeset
|
250 initchar = word[0].upper() |
|
5470
e2baa4e6ed6d
handle words starting with unicode characters
Christof Meerwald <cmeerw@cmeerw.org>
parents:
5395
diff
changeset
|
251 if initchar not in letters: |
|
|
252 # if it's a unicode character, add it to the '_' segment |
|
|
253 initchar = '_' |
|
2089
93f03c6714d8
A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff
changeset
|
254 segdicts[initchar][word] = entry |
|
|
255 |
|
|
256 # save |
|
|
257 for initchar in letters: |
|
|
258 db = {'WORDS':segdicts[initchar], 'FILES':None, 'FILEIDS':None} |
|
|
259 pickle_str = marshal.dumps(db) |
|
|
260 filename = self.indexdb + initchar |
|
|
261 pickle_fh = open(filename, 'wb') |
|
|
262 pickle_fh.write(zlib.compress(pickle_str)) |
|
5380
64c4e43fbb84
Python 3 preparation: numeric literal syntax.
Joseph Myers <jsm@polyomino.org.uk>
parents:
5248
diff
changeset
|
263 os.chmod(filename, 0o664) |
|
2089
93f03c6714d8
A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff
changeset
|
264 |
|
|
265 # save done |
|
|
266 self.changed = 0 |
|
|
267 |
|
|
268 def purge_entry(self, identifier): |
|
|
269 '''Remove a file from file index and word index |
|
|
270 ''' |
|
|
271 self.load_index() |
|
|
272 |
|
4357
13b3155869e0
Beginnings of a big code cleanup / modernisation to make 2to3 happy
Richard Jones <richard@users.sourceforge.net>
parents:
4252
diff
changeset
|
273 if identifier not in self.files: |
|
2089
93f03c6714d8
A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff
changeset
|
274 return |
|
|
275 |
|
|
276 file_index = self.files[identifier][0] |
|
|
277 del self.files[identifier] |
|
|
278 del self.fileids[file_index] |
|
|
279 |
|
|
280 # The much harder part, cleanup the word index |
|
5395
23b8e6067f7c
Python 3 preparation: update calls to dict methods.
Joseph Myers <jsm@polyomino.org.uk>
parents:
5380
diff
changeset
|
281 for key, occurs in self.words.items(): |
|
4357
13b3155869e0
Beginnings of a big code cleanup / modernisation to make 2to3 happy
Richard Jones <richard@users.sourceforge.net>
parents:
4252
diff
changeset
|
282 if file_index in occurs: |
|
2089
93f03c6714d8
A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff
changeset
|
283 del occurs[file_index] |
|
|
284 |
|
|
285 # save needed |
|
|
286 self.changed = 1 |
|
|
287 |
|
|
288 def index_loaded(self): |
|
|
289 return (hasattr(self,'fileids') and hasattr(self,'files') and |
|
|
290 hasattr(self,'words')) |
|
|
291 |
|
|
292 def rollback(self): |
|
|
293 ''' load last saved index info. ''' |
|
|
294 self.load_index(reload=1) |
|
|
295 |
|
3613
5f4db2650da3
implement close() on all indexers [SF#1242477]
Richard Jones <richard@users.sourceforge.net>
parents:
3555
diff
changeset
|
296 def close(self): |
|
|
297 pass |
|
|
298 |
|
|
299 |
|
2089
93f03c6714d8
A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff
changeset
|
300 # vim: set filetype=python ts=4 sw=4 et si |
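
The load and save paths in the listing above split the word dictionary into one file per leading character, each stored as a zlib-compressed marshal dump. The following is a minimal standalone sketch of that round trip, not the module's own code: the helper names `segment_for`, `save_segments`, `load_segments` and `demo` are hypothetical, the file prefix is a temporary path chosen for illustration, and the FILES/FILEIDS segment ('-') is left out for brevity.

```python
import marshal
import os
import tempfile
import zlib

# Segment letters, copied from save_index() in the listing above.
LETTERS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ#_"


def segment_for(word):
    """Return the segment character a (non-empty) word is stored under."""
    initchar = word[0].upper()
    # Anything outside the known letters (e.g. non-ASCII) goes to '_'.
    return initchar if initchar in LETTERS else '_'


def save_segments(prefix, words):
    """Write one compressed marshal file per segment (hypothetical helper)."""
    segdicts = {letter: {} for letter in LETTERS}
    for word, entry in words.items():
        segdicts[segment_for(word)][word] = entry
    for letter, segdict in segdicts.items():
        payload = zlib.compress(marshal.dumps({'WORDS': segdict}))
        with open(prefix + letter, 'wb') as fh:
            fh.write(payload)


def load_segments(prefix, wordlist):
    """Load only the segments needed to look up the given words."""
    words = {}
    for letter in {segment_for(word) for word in wordlist}:
        try:
            with open(prefix + letter, 'rb') as fh:
                dbslice = marshal.loads(zlib.decompress(fh.read()))
        except FileNotFoundError:
            continue  # a missing segment file just means no words there
        words.update(dbslice.get('WORDS') or {})
    return words


def demo():
    """Save a tiny index, then reload only the 'A' and '_' segments."""
    index = {
        'alpha': {1: 2},
        'apple': {2: 1},
        'beta': {1: 1},
        '\u00e9lan': {3: 1},  # non-ASCII initial, lands in '_'
    }
    with tempfile.TemporaryDirectory() as tmpdir:
        prefix = os.path.join(tmpdir, 'index.db')
        save_segments(prefix, index)
        return load_segments(prefix, ['apple', '\u00e9lan'])
```

Calling `demo()` loads the whole 'A' segment (so 'alpha' comes back along with 'apple') plus the '_' segment, while 'beta' stays on disk. That is the point of the per-letter split: a lookup only pays for the segments its words fall into.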
