annotate roundup/backends/indexer_dbm.py @ 7800:2d4684e4702d
fix: enhancement to history command output and % template fix.
Rather than using the key field, use the label field for descriptions.
Call cls.labelprop(default_to_id=True) so it returns id rather than
the first sorted property name.
If labelprop() returns 'id' or 'title', we return nothing. 'id' means
there is no label set and no properties named 'name' or 'title'. So
have the caller do whatever it wants (prepend classname for example)
when there is no human readable name. This prevents %(name)s%(key)s
from producing: 23(23).
Also don't accept the 'title' property. Titles can be too
long. Arguably we could use '%(name).20s' to limit the title
length. However, without an ellipsis or similar marker, a truncated
title might be confusing. So again pretend there is no human readable name.
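
The behaviour described above can be sketched in plain Python. This is only an illustration: `history_label`, the sample node, and the template string are hypothetical stand-ins rather than Roundup's actual code. The one thing taken from the commit message is the rule that a label property of 'id' or 'title' counts as "no human readable name".

```python
def history_label(labelprop, node):
    """Return a human-readable label, or '' when there is none.

    Hypothetical helper: a label property of 'id' or 'title' is
    treated as "no usable name", as the commit message describes.
    """
    if labelprop in ('id', 'title'):
        return ''
    return str(node.get(labelprop, ''))


# Hypothetical template in the spirit of %(name)s%(key)s.
template = '%(name)s(%(key)s)'

node = {'id': '23', 'title': 'A very long issue title that keeps going'}

# Falling back to the id as the name just repeats the key: 23(23)
print(template % {'name': node['id'], 'key': node['id']})

# With the rule applied the name is empty, so the caller decides what
# to show, for example by substituting the class name.
name = history_label('id', node) or 'issue'
print(template % {'name': name, 'key': node['id']})

# A precision (.20) truncates; a bare width (20) would only pad.
print('%(name).20s' % {'name': node['title']})
```

Returning an empty string leaves the formatting decision with the caller, which is the design choice the commit message argues for.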
| author | John Rouillard <rouilj@ieee.org> |
|---|---|
| date | Tue, 12 Mar 2024 11:52:17 -0400 |
| parents | d17e57220a62 |
| children |
| rev | line source |
|---|---|
|
2089
93f03c6714d8
A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff
changeset
|
1 # |
| | 2 # This module is derived from the module described at: |
| | 3 # http://gnosis.cx/publish/programming/charming_python_15.txt |
|
6982
e605ddb45701
flake8 - one var rename, import, whitespace
John Rouillard <rouilj@ieee.org>
parents:
6491
diff
changeset
|
4 # |
| 2089 | 5 # Author: David Mertz (mertz@gnosis.cx) |
| | 6 # Thanks to: Pat Knight (p.knight@ktgroup.co.uk) |
| | 7 # Gregory Popovitch (greg@gpy.com) |
| 6982 | 8 # |
| 2089 | 9 # The original module was released under this license, and remains under |
| | 10 # it: |
| | 11 # |
| | 12 # This file is released to the public domain. I (dqm) would |
| | 13 # appreciate it if you choose to keep derived works under terms |
| | 14 # that promote freedom, but obviously am giving up any rights |
| | 15 # to compel such. |
| 6982 | 16 # |
| 2089 | 17 '''This module provides an indexer class, RoundupIndexer, that stores text |
| | 18 indices in a roundup instance. This class makes searching the content of |
| | 19 messages, string properties and text files possible. |
| | 20 ''' |
| | 21 __docformat__ = 'restructuredtext' |
| | 22 |
| 6982 | 23 import errno |
| | 24 import marshal |
| | 25 import os |
| | 26 import re |
| | 27 import shutil |
| | 28 import zlib |
| | 29 |
|
3544
5cd1c83dea50
Features and fixes.
Richard Jones <richard@users.sourceforge.net>
parents:
3295
diff
changeset
|
30 from roundup.backends.indexer_common import Indexer as IndexerBase |
|
2872
d530b68e4b42
don't index common words [SF#1046612]
Richard Jones <richard@users.sourceforge.net>
parents:
2089
diff
changeset
|
31 |
| 6982 | 32 |
| 3544 | 33 class Indexer(IndexerBase): |
| 2089 | 34     '''Indexes information from roundup's hyperdb to allow efficient |
| | 35     searching. |
| | 36 |
| | 37     Three structures are created by the indexer:: |
| | 38 |
| | 39         files {identifier: (fileid, wordcount)} |
| | 40         words {word: {fileid: count}} |
| | 41         fileids {fileid: identifier} |
| | 42 |
| | 43     where identifier is (classname, nodeid, propertyname) |
| | 44     ''' |
|
3295
a615cc230160
added Xapian indexer; replaces standard indexers if Xapian is available
Richard Jones <richard@users.sourceforge.net>
parents:
3092
diff
changeset
|
45 def __init__(self, db): |
| 3544 | 46         IndexerBase.__init__(self, db) |
| 3295 | 47         self.indexdb_path = os.path.join(db.config.DATABASE, 'indexes') |
| 2089 | 48         self.indexdb = os.path.join(self.indexdb_path, 'index.db') |
| | 49         self.reindex = 0 |
| | 50         self.quiet = 9 |
| | 51         self.changed = 0 |
| | 52 |
| | 53         # see if we need to reindex because of a change in code |
| | 54         version = os.path.join(self.indexdb_path, 'version') |
| | 55         if (not os.path.exists(self.indexdb_path) or |
| | 56                 not os.path.exists(version)): |
| | 57             # for now the file itself is a flag |
| | 58             self.force_reindex() |
| | 59         elif os.path.exists(version): |
|
6491
087cae2fbcea
Handle more ResourceWarning issues.
John Rouillard <rouilj@ieee.org>
parents:
6002
diff
changeset
|
60 fd = open(version) |
| | 61             version = fd.read() |
| | 62             fd.close() |
| 2089 | 63             # check the value and reindex if it's not the latest |
| | 64             if version.strip() != '1': |
| | 65                 self.force_reindex() |
| | 66 |
| | 67     def force_reindex(self): |
| | 68         '''Force a reindex condition |
| | 69         ''' |
| | 70         if os.path.exists(self.indexdb_path): |
| | 71             shutil.rmtree(self.indexdb_path) |
| | 72         os.makedirs(self.indexdb_path) |
| 6002 | 73 os.chmod(self.indexdb_path, 0o775) # nosec - allow group write |
| 6491 | 74         fd = open(os.path.join(self.indexdb_path, 'version'), 'w') |
| | 75         fd.write('1\n') |
| | 76         fd.close() |
| 2089 | 77         self.reindex = 1 |
| | 78         self.changed = 1 |
| | 79 |
| | 80     def should_reindex(self): |
| | 81         '''Should we reindex? |
| | 82         ''' |
| | 83         return self.reindex |
| | 84 |
| | 85     def add_text(self, identifier, text, mime_type='text/plain'): |
| | 86         '''Add some text associated with the (classname, nodeid, property) |
| | 87         identifier. |
| | 88         ''' |
| | 89         # make sure the index is loaded |
| | 90         self.load_index() |
| | 91 |
| | 92         # remove old entries for this identifier |
|
4357
13b3155869e0
Beginnings of a big code cleanup / modernisation to make 2to3 happy
Richard Jones <richard@users.sourceforge.net>
parents:
4252
diff
changeset
|
93 if identifier in self.files: |
| 2089 | 94             self.purge_entry(identifier) |
| | 95 |
| | 96         # split into words |
| | 97         words = self.splitter(text, mime_type) |
| | 98 |
| | 99         # Find new file index, and assign it to identifier |
| | 100         # (_TOP uses trick of negative to avoid conflict with file index) |
| | 101         self.files['_TOP'] = (self.files['_TOP'][0]-1, None) |
| | 102         file_index = abs(self.files['_TOP'][0]) |
| | 103         self.files[identifier] = (file_index, len(words)) |
| | 104         self.fileids[file_index] = identifier |
| | 105 |
| | 106         # find the unique words |
| | 107         filedict = {} |
| | 108         for word in words: |
| 3544 | 109             if self.is_stopword(word): |
| 2872 | 110                 continue |
| 4357 | 111             if word in filedict: |
| 2089 | 112                 filedict[word] = filedict[word]+1 |
| | 113             else: |
| | 114                 filedict[word] = 1 |
| | 115 |
| | 116         # now add to the totals |
| 4357 | 117         for word in filedict: |
| 2089 | 118             # each word has a dict of {identifier: count} |
| 4357 | 119             if word in self.words: |
| 2089 | 120                 entry = self.words[word] |
| | 121             else: |
| | 122                 # new word |
| | 123                 entry = {} |
| | 124                 self.words[word] = entry |
| | 125 |
| | 126             # make a reference to the file for this word |
| | 127             entry[file_index] = filedict[word] |
| | 128 |
| | 129         # save needed |
| | 130         self.changed = 1 |
| | 131 |
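
The bookkeeping in `add_text` above can be tried outside the class. The following is a minimal sketch, assuming a plain dict in place of the indexer's `files`, `words` and `fileids` attributes; the standalone function, the `index` dict and the `stopwords` parameter are hypothetical, and the purge of old entries is left out.

```python
def add_text(index, identifier, words, stopwords=frozenset()):
    """Record words for identifier in a dict-based index (sketch only)."""
    files, fileids, all_words = index['files'], index['fileids'], index['words']

    # '_TOP' holds a negative counter; its absolute value is the next file id.
    files['_TOP'] = (files['_TOP'][0] - 1, None)
    file_index = abs(files['_TOP'][0])
    files[identifier] = (file_index, len(words))
    fileids[file_index] = identifier

    # Count occurrences of each word in this text, skipping stopwords.
    counts = {}
    for word in words:
        if word in stopwords:
            continue
        counts[word] = counts.get(word, 0) + 1

    # Merge the per-text counts into the global word table.
    for word, count in counts.items():
        all_words.setdefault(word, {})[file_index] = count


index = {'files': {'_TOP': (0, None)}, 'fileids': {}, 'words': {}}
ident = ('issue', '1', 'title')
add_text(index, ident, ['CRASH', 'ON', 'CRASH'], stopwords={'ON'})

print(index['files'][ident])   # (1, 3)
print(index['words'])          # {'CRASH': {1: 2}}
print(index['fileids'])        # {1: ('issue', '1', 'title')}
```

As in the listing, the stored word count is the length of the full word list, so stopwords still count towards it even though they are not indexed.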
| | 132     def splitter(self, text, ftype): |
| | 133         '''Split the contents of a text string into a list of 'words' |
| | 134         ''' |
| | 135         if ftype == 'text/plain': |
| | 136             words = self.text_splitter(text) |
| | 137         else: |
| | 138             return [] |
| | 139         return words |
| | 140 |
| | 141     def text_splitter(self, text): |
| | 142         """Split text/plain string into a list of words |
| | 143         """ |
|
5966
8e4c5db44fde
Handle memory db indexer test
John Rouillard <rouilj@ieee.org>
parents:
5963
diff
changeset
|
144 if not text: |
| | 145             return [] |
| 6982 | 146 |
| 2089 | 147         # case insensitive |
|
5963
4c7662c86a36
fixed the dbm indexer test for unicode under python2.
John Rouillard <rouilj@ieee.org>
parents:
5470
diff
changeset
|
148 text = text.upper() |
| 2089 | 149 |
|
4252
2ff6f39aa391
Indexers behaviour made more consistent regarding length of indexed words...
Bernhard Reiter <Bernhard.Reiter@intevation.de>
parents:
3613
diff
changeset
|
150 # Split the raw text |
| | 151         return re.findall(r'\b\w{%d,%d}\b' % (self.minlength, self.maxlength), |
| 5963 | 152                           text, re.UNICODE) |
| 2089 | 153 |
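
The splitting rule in `text_splitter` above can be checked in isolation. This sketch assumes length bounds of 2 and 25; those numbers are placeholders, since `minlength` and `maxlength` are defined outside this listing.

```python
import re


def text_splitter(text, minlength=2, maxlength=25):
    """Upper-case text and return the words within the length bounds."""
    if not text:
        return []
    text = text.upper()
    # \b on both sides means a word that is too long is skipped
    # entirely rather than being cut down to maxlength characters.
    pattern = r'\b\w{%d,%d}\b' % (minlength, maxlength)
    return re.findall(pattern, text, re.UNICODE)


print(text_splitter('A bug in the indexer'))
# ['BUG', 'IN', 'THE', 'INDEXER']
```

The one-letter word is dropped because it is shorter than the assumed minimum length.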
| 4252 | 154     # we override this to ignore too short and too long words |
| | 155     # and also to fix a bug - the (fail) case. |
| 2089 | 156     def find(self, wordlist): |
| | 157         '''Locate files that match ALL the words in wordlist |
| | 158         ''' |
| | 159         if not hasattr(self, 'words'): |
| | 160             self.load_index() |
| | 161         self.load_index(wordlist=wordlist) |
| | 162         entries = {} |
| | 163         hits = None |
| | 164         for word in wordlist: |
| 4252 | 165             if not self.minlength <= len(word) <= self.maxlength: |
| 2089 | 166                 # word outside the bounds of what we index - ignore |
| | 167                 continue |
| | 168             word = word.upper() |
| 4252 | 169             if self.is_stopword(word): |
| | 170                 continue |
| 2089 | 171             entry = self.words.get(word)  # For each word, get index |
| 6982 | 172             entries[word] = entry  # of matching files |
| 2089 | 173             if not entry:  # Nothing for this one word (fail) |
| | 174                 return {} |
| | 175             if hits is None: |
| | 176                 hits = {} |
| 4357 | 177                 for k in entry: |
| | 178                     if k not in self.fileids: |
| | 179                         raise ValueError('Index is corrupted: re-generate it') |
| 2089 | 180                     hits[k] = self.fileids[k] |
| | 181             else: |
| | 182                 # Eliminate hits for every non-match |
|
4362
74476eaac38a
more modernisation
Richard Jones <richard@users.sourceforge.net>
parents:
4357
diff
changeset
|
183 for fileid in list(hits): |
| 4357 | 184                     if fileid not in entry: |
| 2089 | 185                         del hits[fileid] |
| | 186         if hits is None: |
| | 187             return {} |
| 4357 | 188         return list(hits.values()) |
| 2089 | 189 |
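
The "match ALL the words" logic in `find` above amounts to intersecting the per-word file tables. Here is a minimal sketch, assuming plain dicts shaped like the `words` and `fileids` structures from the class docstring; the standalone function and sample data are hypothetical. It leaves out the length and stopword checks, and unlike the listing it returns a list in every case.

```python
def find(words, fileids, wordlist):
    """Return identifiers of the files that contain every word."""
    hits = None
    for word in wordlist:
        entry = words.get(word.upper())
        if not entry:
            # One missing word means no file can match all of them.
            return []
        if hits is None:
            # First word: every file containing it is a candidate.
            hits = {fileid: fileids[fileid] for fileid in entry}
        else:
            # Later words: keep only candidates that contain this one too.
            hits = {fileid: ident for fileid, ident in hits.items()
                    if fileid in entry}
    return list(hits.values()) if hits else []


words = {'CRASH': {1: 2, 2: 1}, 'LOGIN': {2: 1}}
fileids = {1: ('issue', '1', 'title'), 2: ('msg', '7', 'content')}

print(find(words, fileids, ['crash', 'login']))
# [('msg', '7', 'content')]
```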
| | 190     segments = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ#_-!" |
| 6982 | 191 |
| 2089 | 192     def load_index(self, reload=0, wordlist=None): |
| | 193         # Unless reload is indicated, do not load twice |
| | 194         if self.index_loaded() and not reload: |
| | 195             return 0 |
| | 196 |
| | 197         # Ok, now let's actually load it |
| 6982 | 198         db = {'WORDS': {}, 'FILES': {'_TOP': (0, None)}, 'FILEIDS': {}} |
| 2089 | 199 |
| | 200         # Identify the relevant word-dictionary segments |
| | 201         if not wordlist: |
| | 202             segments = self.segments |
| | 203         else: |
| 6982 | 204             segments = ['-', '#'] |
| 2089 | 205             for word in wordlist: |
|
5470
e2baa4e6ed6d
handle words starting with unicode characters
Christof Meerwald <cmeerw@cmeerw.org>
parents:
5395
diff
changeset
|
206 initchar = word[0].upper() |
|
e2baa4e6ed6d
handle words starting with unicode characters
Christof Meerwald <cmeerw@cmeerw.org>
parents:
5395
diff
changeset
|
207 if initchar not in self.segments: |
|
e2baa4e6ed6d
handle words starting with unicode characters
Christof Meerwald <cmeerw@cmeerw.org>
parents:
5395
diff
changeset
|
208 initchar = '_' |
|
e2baa4e6ed6d
handle words starting with unicode characters
Christof Meerwald <cmeerw@cmeerw.org>
parents:
5395
diff
changeset
|
209 segments.append(initchar) |
|
2089
93f03c6714d8
A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff
changeset
|
210 |
|
93f03c6714d8
A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff
changeset
|
211 # Load the segments |
|
93f03c6714d8
A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff
changeset
|
212 for segment in segments: |
|
93f03c6714d8
A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff
changeset
|
213 try: |
|
93f03c6714d8
A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff
changeset
|
214 f = open(self.indexdb + segment, 'rb') |
|
5248
198b6e810c67
Use Python-3-compatible 'as' syntax for except statements
Eric S. Raymond <esr@thyrsus.com>
parents:
4570
diff
changeset
|
215 except IOError as error: |
|
2089
93f03c6714d8
A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff
changeset
|
216 # probably just nonexistent segment index file |
|
6982
e605ddb45701
flake8 - one var rename, import, whitespace
John Rouillard <rouilj@ieee.org>
parents:
6491
diff
changeset
|
217 if error.errno != errno.ENOENT: raise # noqa: E701 |
|
2089
93f03c6714d8
A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff
changeset
|
218 else: |
|
93f03c6714d8
A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff
changeset
|
219 pickle_str = zlib.decompress(f.read()) |
|
93f03c6714d8
A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff
changeset
|
220 f.close() |
|
93f03c6714d8
A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff
changeset
|
221 dbslice = marshal.loads(pickle_str) |
|
93f03c6714d8
A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff
changeset
|
222 if dbslice.get('WORDS'): |
|
93f03c6714d8
A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff
changeset
|
223 # if it has some words, add them |
|
5395
23b8e6067f7c
Python 3 preparation: update calls to dict methods.
Joseph Myers <jsm@polyomino.org.uk>
parents:
5380
diff
changeset
|
224 for word, entry in dbslice['WORDS'].items(): |
|
2089
93f03c6714d8
A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff
changeset
|
225 db['WORDS'][word] = entry |
|
93f03c6714d8
A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff
changeset
|
226 if dbslice.get('FILES'): |
|
93f03c6714d8
A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff
changeset
|
227 # if it has some files, add them |
|
93f03c6714d8
A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff
changeset
|
228 db['FILES'] = dbslice['FILES'] |
|
93f03c6714d8
A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff
changeset
|
229 if dbslice.get('FILEIDS'): |
|
93f03c6714d8
A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff
changeset
|
230 # if it has fileids, add them |
|
93f03c6714d8
A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff
changeset
|
231 db['FILEIDS'] = dbslice['FILEIDS'] |
|
93f03c6714d8
A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff
changeset
|
232 |
|
93f03c6714d8
A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff
changeset
|
233 self.words = db['WORDS'] |
|
93f03c6714d8
A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff
changeset
|
234 self.files = db['FILES'] |
|
93f03c6714d8
A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff
changeset
|
235 self.fileids = db['FILEIDS'] |
|
93f03c6714d8
A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff
changeset
|
236 self.changed = 0 |
|
    def save_index(self):
        # only save if the index is loaded and changed
        if not self.index_loaded() or not self.changed:
            return

        # brutal space saver... delete all the small segments
        for segment in self.segments:
            try:
                os.remove(self.indexdb + segment)
            except OSError as error:
                # probably just nonexistent segment index file
                if error.errno != errno.ENOENT: raise  # noqa: E701

        # First write the much simpler filename/fileid dictionaries
        dbfil = {'WORDS': None, 'FILES': self.files, 'FILEIDS': self.fileids}
        marshal_fh = open(self.indexdb+'-', 'wb')
        marshal_fh.write(zlib.compress(marshal.dumps(dbfil)))
        marshal_fh.close()

        # The hard part is splitting the word dictionary up, of course
        letters = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ#_"
        segdicts = {}  # Need batch of empty dicts
        for segment in letters:
            segdicts[segment] = {}
        for word, entry in self.words.items():  # Split into segment dicts
            initchar = word[0].upper()
            if initchar not in letters:
                # if it's a unicode character, add it to the '_' segment
                initchar = '_'
            segdicts[initchar][word] = entry

        # save
        for initchar in letters:
            db = {'WORDS': segdicts[initchar], 'FILES': None, 'FILEIDS': None}
            pickle_str = marshal.dumps(db)
            filename = self.indexdb + initchar
            pickle_fh = open(filename, 'wb')
            pickle_fh.write(zlib.compress(pickle_str))
            pickle_fh.close()
            os.chmod(filename, 0o664)

        # save done
        self.changed = 0
|
    def purge_entry(self, identifier):
        '''Remove a file from file index and word index
        '''
        self.load_index()

        if identifier not in self.files:
            return

        file_index = self.files[identifier][0]
        del self.files[identifier]
        del self.fileids[file_index]

        # The much harder part, cleanup the word index
        for _key, occurs in self.words.items():
            if file_index in occurs:
                del occurs[file_index]

        # save needed
        self.changed = 1
|
    def index_loaded(self):
        return (hasattr(self, 'fileids') and hasattr(self, 'files') and
                hasattr(self, 'words'))

    def rollback(self):
        ''' load last saved index info. '''
        self.load_index(reload=1)

    def close(self):
        pass
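
As a rough illustration of the on-disk format that load_index() and save_index() rely on, the sketch below round-trips one segment dictionary through marshal and zlib and treats a missing segment file as "nothing stored". The helper names write_segment and read_segment, the temporary directory, and the sample data are hypothetical and not part of roundup; only the marshal-then-zlib layering and the ENOENT check mirror the listing.

```python
import errno
import marshal
import os
import tempfile
import zlib


def write_segment(path, payload):
    """Serialize payload with marshal, compress it with zlib, write it out."""
    with open(path, 'wb') as fh:
        fh.write(zlib.compress(marshal.dumps(payload)))


def read_segment(path):
    """Return the stored dict, or None when the segment file is absent."""
    try:
        fh = open(path, 'rb')
    except IOError as error:
        # Only a nonexistent file is tolerated; anything else propagates.
        if error.errno != errno.ENOENT:
            raise
        return None
    with fh:
        return marshal.loads(zlib.decompress(fh.read()))


# Hypothetical stand-in for self.indexdb: a path prefix that gets one
# character appended per segment.
prefix = os.path.join(tempfile.mkdtemp(), 'index.db')

sample = {'WORDS': {'HELLO': {1: 2}}, 'FILES': None, 'FILEIDS': None}
write_segment(prefix + 'H', sample)

loaded = read_segment(prefix + 'H')    # the dict written above
missing = read_segment(prefix + 'Z')   # no such file, so None
```

Because marshal only handles core Python types and its format may differ between Python versions, a layout like this is best suited to data that the same interpreter writes and reads back.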
|
# vim: set filetype=python ts=4 sw=4 et si
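
The way save_index() distributes words across segment files can be sketched on its own, outside the indexer class. The example below is a minimal, standalone approximation: segment_for and split_words are hypothetical helper names, and the sample word dictionary is invented for illustration. The rule it follows is the one in the listing: the upper-cased first character selects the segment, and any character outside the segment alphabet falls back to '_'.

```python
# Same alphabet as the `letters` string in save_index().
LETTERS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ#_"


def segment_for(word):
    """Return the segment key for a word (hypothetical helper)."""
    initchar = word[0].upper()
    # Characters outside the alphabet (e.g. accented letters) share '_'.
    return initchar if initchar in LETTERS else '_'


def split_words(words):
    """Split a word -> entry mapping into one dict per segment key."""
    segdicts = {segment: {} for segment in LETTERS}
    for word, entry in words.items():
        segdicts[segment_for(word)][word] = entry
    return segdicts


# Invented sample data: word -> {file index: occurrence count}.
words = {'HELLO': {1: 3}, 'HELP': {2: 1}, '42ND': {1: 1}, 'ÉCOLE': {3: 2}}
segdicts = split_words(words)
```

With this split, a lookup for a given word only needs the segment for that word's first character, which is presumably why load_index() accepts a wordlist and loads just the matching segments plus the '-' and '#' files.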
