annotate roundup/backends/indexer_dbm.py @ 5548:fea11d05110e

Avoid errors from selecting "no selection" on multilink (issue2550722).

As discussed in issue 2550722 there are various cases where selecting "no selection" on a multilink can result in inappropriate errors from Roundup:

* If selecting "no selection" produces a null edit (a value was set in the multilink in an edit with an error, then removed again, along with all other changes, in the next form submission), so the page is rendered from the form contents including the "-<id>" value for "no selection" for the multilink.

* If creating an item with a nonempty value for a multilink has an error, and the resubmission changes that multilink to "no selection" (and this in turn has subcases, according to whether the creation then succeeds or fails on the resubmission, which need fixes in different places in the Roundup code).

All of these cases have in common that it is expected and OK to have a "-<id>" value for a submission for a multilink when <id> is not set in that multilink in the database (because the original attempt to set <id> in that multilink had an error), so the hyperdb.py logic to give an error in that case is thus removed.

In the subcase of the second case where the resubmission with "no selection" has an error, the templating code tries to produce a menu entry for the "-<id>" multilink value, which also results in an error, hence the templating.py change to ignore such values in the list for a multilink.
author Joseph Myers <jsm@polyomino.org.uk>
date Thu, 27 Sep 2018 11:33:01 +0000
parents e2baa4e6ed6d
children 4c7662c86a36
rev   line source
2089
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
1 #
2 # This module is derived from the module described at:
3 # http://gnosis.cx/publish/programming/charming_python_15.txt
4 #
5 # Author: David Mertz (mertz@gnosis.cx)
6 # Thanks to: Pat Knight (p.knight@ktgroup.co.uk)
7 # Gregory Popovitch (greg@gpy.com)
8 #
9 # The original module was released under this license, and remains under
10 # it:
11 #
12 # This file is released to the public domain. I (dqm) would
13 # appreciate it if you choose to keep derived works under terms
14 # that promote freedom, but obviously am giving up any rights
15 # to compel such.
16 #
17 '''This module provides an indexer class, RoundupIndexer, that stores text
18 indices in a roundup instance. This class makes searching the content of
19 messages, string properties and text files possible.
20 '''
21 __docformat__ = 'restructuredtext'
22
23 import os, shutil, re, mimetypes, marshal, zlib, errno
24 from roundup.hyperdb import Link, Multilink
3544
5cd1c83dea50 Features and fixes.
Richard Jones <richard@users.sourceforge.net>
parents: 3295
25 from roundup.backends.indexer_common import Indexer as IndexerBase
2872
d530b68e4b42 don't index common words [SF#1046612]
Richard Jones <richard@users.sourceforge.net>
parents: 2089
26
3544
27 class Indexer(IndexerBase):
2089
28     '''Indexes information from roundup's hyperdb to allow efficient
29     searching.
30
31     Three structures are created by the indexer::
32
33         files {identifier: (fileid, wordcount)}
34         words {word: {fileid: count}}
35         fileids {fileid: identifier}
36
37     where identifier is (classname, nodeid, propertyname)
38     '''
3295
a615cc230160 added Xapian indexer; replaces standard indexers if Xapian is available
Richard Jones <richard@users.sourceforge.net>
parents: 3092
39     def __init__(self, db):
3544
40         IndexerBase.__init__(self, db)
3295
41         self.indexdb_path = os.path.join(db.config.DATABASE, 'indexes')
2089
42         self.indexdb = os.path.join(self.indexdb_path, 'index.db')
43         self.reindex = 0
44         self.quiet = 9
45         self.changed = 0
46
47         # see if we need to reindex because of a change in code
48         version = os.path.join(self.indexdb_path, 'version')
49         if (not os.path.exists(self.indexdb_path) or
50                 not os.path.exists(version)):
51             # for now the file itself is a flag
52             self.force_reindex()
53         elif os.path.exists(version):
54             version = open(version).read()
55             # check the value and reindex if it's not the latest
56             if version.strip() != '1':
57                 self.force_reindex()
58
59     def force_reindex(self):
60         '''Force a reindex condition
61         '''
62         if os.path.exists(self.indexdb_path):
63             shutil.rmtree(self.indexdb_path)
64         os.makedirs(self.indexdb_path)
5380
64c4e43fbb84 Python 3 preparation: numeric literal syntax.
Joseph Myers <jsm@polyomino.org.uk>
parents: 5248
65         os.chmod(self.indexdb_path, 0o775)
2089
66         open(os.path.join(self.indexdb_path, 'version'), 'w').write('1\n')
67         self.reindex = 1
68         self.changed = 1
69
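The version-flag check in `__init__` and the reset in `force_reindex` can be tried in isolation. What follows is a minimal sketch under stated assumptions, not Roundup's code: `needs_reindex`, `reset_index_dir` and `INDEX_VERSION` are hypothetical names, and a temporary directory stands in for the tracker's `indexes` directory.

```python
import os
import shutil
import tempfile

INDEX_VERSION = '1'  # hypothetical constant; the listing hard-codes '1'


def needs_reindex(index_dir):
    """Return True if the directory or its version flag is missing or stale."""
    version_file = os.path.join(index_dir, 'version')
    if not os.path.exists(index_dir) or not os.path.exists(version_file):
        return True
    with open(version_file) as f:
        return f.read().strip() != INDEX_VERSION


def reset_index_dir(index_dir):
    """Wipe the index directory and write a fresh version flag."""
    if os.path.exists(index_dir):
        shutil.rmtree(index_dir)
    os.makedirs(index_dir)
    with open(os.path.join(index_dir, 'version'), 'w') as f:
        f.write(INDEX_VERSION + '\n')


base = tempfile.mkdtemp()
index_dir = os.path.join(base, 'indexes')
before = needs_reindex(index_dir)   # directory does not exist yet
reset_index_dir(index_dir)
after = needs_reindex(index_dir)    # version flag now matches
shutil.rmtree(base)
```

Unlike the listing, the sketch closes its files with `with` blocks and skips the `chmod` call; neither difference changes the flag logic.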
70     def should_reindex(self):
71         '''Should we reindex?
72         '''
73         return self.reindex
74
75     def add_text(self, identifier, text, mime_type='text/plain'):
76         '''Add some text associated with the (classname, nodeid, property)
77         identifier.
78         '''
79         # make sure the index is loaded
80         self.load_index()
81
82         # remove old entries for this identifier
4357
13b3155869e0 Beginnings of a big code cleanup / modernisation to make 2to3 happy
Richard Jones <richard@users.sourceforge.net>
parents: 4252
83         if identifier in self.files:
2089
84             self.purge_entry(identifier)
85
86         # split into words
87         words = self.splitter(text, mime_type)
88
89         # Find new file index, and assign it to identifier
90         # (_TOP uses trick of negative to avoid conflict with file index)
91         self.files['_TOP'] = (self.files['_TOP'][0]-1, None)
92         file_index = abs(self.files['_TOP'][0])
93         self.files[identifier] = (file_index, len(words))
94         self.fileids[file_index] = identifier
95
96         # find the unique words
97         filedict = {}
98         for word in words:
3544
99             if self.is_stopword(word):
2872
100                 continue
4357
101             if word in filedict:
2089
102                 filedict[word] = filedict[word]+1
103             else:
104                 filedict[word] = 1
105
106         # now add to the totals
4357
107         for word in filedict:
2089
108             # each word has a dict of {identifier: count}
4357
109             if word in self.words:
2089
110                 entry = self.words[word]
111             else:
112                 # new word
113                 entry = {}
114                 self.words[word] = entry
115
116             # make a reference to the file for this word
117             entry[file_index] = filedict[word]
118
119         # save needed
120         self.changed = 1
121
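To see how `add_text` keeps the three structures from the class docstring in step, here is a minimal standalone sketch. It is an illustration under stated assumptions rather than the module's code: it works on plain dictionaries passed in as arguments, takes an already-split token list, and leaves out the stop-word check and the purge of old entries.

```python
def add_text(files, words, fileids, identifier, tokens):
    """Record tokens for identifier in the three index dictionaries."""
    # '_TOP' counts downwards, so its value never collides with a real file id
    files['_TOP'] = (files['_TOP'][0] - 1, None)
    file_index = abs(files['_TOP'][0])
    files[identifier] = (file_index, len(tokens))
    fileids[file_index] = identifier

    # count occurrences of each word in this text
    counts = {}
    for word in tokens:
        counts[word] = counts.get(word, 0) + 1

    # merge the per-text counts into the global word table
    for word, count in counts.items():
        words.setdefault(word, {})[file_index] = count


files = {'_TOP': (0, None)}
words = {}
fileids = {}
add_text(files, words, fileids, ('issue', '1', 'title'), ['CRASH', 'ON', 'CRASH'])
add_text(files, words, fileids, ('msg', '7', 'content'), ['CRASH'])
```

After the two calls, `words['CRASH']` maps file id 1 to a count of 2 and file id 2 to a count of 1, and `files['_TOP']` has moved to `(-2, None)`.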
122     def splitter(self, text, ftype):
123         '''Split the contents of a text string into a list of 'words'
124         '''
125         if ftype == 'text/plain':
126             words = self.text_splitter(text)
127         else:
128             return []
129         return words
130
131     def text_splitter(self, text):
132         """Split text/plain string into a list of words
133         """
134         # case insensitive
135         text = str(text).upper()
136
4252
2ff6f39aa391 Indexers behaviour made more consistent regarding length of indexed words...
Bernhard Reiter <Bernhard.Reiter@intevation.de>
parents: 3613
137         # Split the raw text
138         return re.findall(r'\b\w{%d,%d}\b' % (self.minlength, self.maxlength),
139                           text)
2089
140
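The bounded regular expression in `text_splitter` drops words that are too short or too long as whole units; it does not truncate them. A small sketch of that behaviour follows. The default bounds of 2 and 25 are assumptions chosen for illustration, since `minlength` and `maxlength` are defined outside this listing.

```python
import re


def text_splitter(text, minlength=2, maxlength=25):
    """Upper-case the text and return words within the length bounds."""
    text = str(text).upper()
    # \b on both sides means an over-long word cannot match in part
    return re.findall(r'\b\w{%d,%d}\b' % (minlength, maxlength), text)


short_words = text_splitter('A bug in indexer_dbm')
long_words = text_splitter('supercalifragilistic ok', maxlength=5)
```

In the first call the single-letter `A` is skipped and `INDEXER_DBM` survives as one word because `\w` matches the underscore. In the second call only `OK` is returned.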
4252
141     # we override this to ignore too short and too long words
142     # and also to fix a bug - the (fail) case.
2089
143     def find(self, wordlist):
144         '''Locate files that match ALL the words in wordlist
145         '''
146         if not hasattr(self, 'words'):
147             self.load_index()
148         self.load_index(wordlist=wordlist)
149         entries = {}
150         hits = None
151         for word in wordlist:
4252
152             if not self.minlength <= len(word) <= self.maxlength:
2089
153                 # word outside the bounds of what we index - ignore
154                 continue
155             word = word.upper()
4252
156             if self.is_stopword(word):
157                 continue
2089
158             entry = self.words.get(word) # For each word, get index
159             entries[word] = entry # of matching files
160             if not entry: # Nothing for this one word (fail)
161                 return {}
162             if hits is None:
163                 hits = {}
4357
164                 for k in entry:
165                     if k not in self.fileids:
166                         raise ValueError('Index is corrupted: re-generate it')
2089
167                     hits[k] = self.fileids[k]
168             else:
169                 # Eliminate hits for every non-match
4362
74476eaac38a more modernisation
Richard Jones <richard@users.sourceforge.net>
parents: 4357
170                 for fileid in list(hits):
4357
171                     if fileid not in entry:
2089
172                         del hits[fileid]
173         if hits is None:
174             return {}
175         return list(hits.values())
176
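`find` is an AND query: the first usable word seeds the hit set, and every later word removes hits that do not contain it. The following is a minimal sketch of that intersection on plain dictionaries. `find_all` is a hypothetical name; the length and stop-word filters and the corruption check are omitted, and, unlike the listing, the sketch always returns a list, including for the no-match cases.

```python
def find_all(words, fileids, wordlist):
    """Return identifiers of the texts that contain every word in wordlist."""
    hits = None
    for word in wordlist:
        entry = words.get(word.upper())
        if not entry:
            # one word with no postings means nothing can match all words
            return []
        if hits is None:
            # first word: every text containing it is a candidate
            hits = {fileid: fileids[fileid] for fileid in entry}
        else:
            # later words: drop candidates that lack this word
            for fileid in list(hits):
                if fileid not in entry:
                    del hits[fileid]
    return list(hits.values()) if hits else []


words = {'CRASH': {1: 2, 2: 1}, 'LOGIN': {2: 3}}
fileids = {1: ('issue', '1', 'title'), 2: ('msg', '7', 'content')}

both = find_all(words, fileids, ['crash', 'login'])
none = find_all(words, fileids, ['crash', 'missing'])
```

Iterating over `list(hits)` while deleting, as the listing also does, avoids mutating the dictionary during iteration.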
177     segments = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ#_-!"
178     def load_index(self, reload=0, wordlist=None):
179         # Unless reload is indicated, do not load twice
180         if self.index_loaded() and not reload:
181             return 0
182
183         # Ok, now let's actually load it
184         db = {'WORDS': {}, 'FILES': {'_TOP':(0,None)}, 'FILEIDS': {}}
185
186         # Identify the relevant word-dictionary segments
187         if not wordlist:
188             segments = self.segments
189         else:
190             segments = ['-','#']
191             for word in wordlist:
5470
e2baa4e6ed6d handle words starting with unicode characters
Christof Meerwald <cmeerw@cmeerw.org>
parents: 5395
192                 initchar = word[0].upper()
193                 if initchar not in self.segments:
194                     initchar = '_'
195                 segments.append(initchar)
2089
196
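The segment selection above decides which index files a query has to open: one per distinct first character, plus the `-` and `#` segments that the listing always includes. A minimal standalone sketch of that mapping is given here; `segments_for` is a hypothetical helper name, and the module-level `SEGMENTS` copies the class attribute.

```python
SEGMENTS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ#_-!"


def segments_for(wordlist):
    """Return the segment suffixes needed to look up the given words."""
    if not wordlist:
        # no word list: every segment is needed
        return list(SEGMENTS)
    segments = ['-', '#']
    for word in wordlist:
        initchar = word[0].upper()
        if initchar not in SEGMENTS:
            # anything outside the known set shares the '_' segment
            initchar = '_'
        segments.append(initchar)
    return segments


needed = segments_for(['crash', 'über'])
```

Here `crash` maps to segment `C`, while `über` starts with a character outside the segment string and therefore falls back to `_`.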
197         # Load the segments
198         for segment in segments:
199             try:
200                 f = open(self.indexdb + segment, 'rb')
5248
198b6e810c67 Use Python-3-compatible 'as' syntax for except statements
Eric S. Raymond <esr@thyrsus.com>
parents: 4570
201             except IOError as error:
2089
202                 # probably just nonexistent segment index file
203                 if error.errno != errno.ENOENT: raise
204             else:
205                 pickle_str = zlib.decompress(f.read())
206                 f.close()
207                 dbslice = marshal.loads(pickle_str)
208                 if dbslice.get('WORDS'):
209                     # if it has some words, add them
5395
23b8e6067f7c Python 3 preparation: update calls to dict methods.
Joseph Myers <jsm@polyomino.org.uk>
parents: 5380
210                     for word, entry in dbslice['WORDS'].items():
2089
211                         db['WORDS'][word] = entry
diff changeset
212 if dbslice.get('FILES'):
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
213 # if it has some files, add them
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
214 db['FILES'] = dbslice['FILES']
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
215 if dbslice.get('FILEIDS'):
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
216 # if it has fileids, add them
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
217 db['FILEIDS'] = dbslice['FILEIDS']
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
218
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
219 self.words = db['WORDS']
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
220 self.files = db['FILES']
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
221 self.fileids = db['FILEIDS']
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
222 self.changed = 0
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
223
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
224 def save_index(self):
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
225 # only save if the index is loaded and changed
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
226 if not self.index_loaded() or not self.changed:
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
227 return
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
228
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
229 # brutal space saver... delete all the small segments
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
230 for segment in self.segments:
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
231 try:
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
232 os.remove(self.indexdb + segment)
5248
198b6e810c67 Use Python-3-compatible 'as' syntax for except statements
Eric S. Raymond <esr@thyrsus.com>
parents: 4570
diff changeset
233 except OSError as error:
2089
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
234 # probably just nonexistent segment index file
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
235 if error.errno != errno.ENOENT: raise
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
236
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
237 # First write the much simpler filename/fileid dictionaries
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
238 dbfil = {'WORDS':None, 'FILES':self.files, 'FILEIDS':self.fileids}
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
239 open(self.indexdb+'-','wb').write(zlib.compress(marshal.dumps(dbfil)))
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
240
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
241 # The hard part is splitting the word dictionary up, of course
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
242 letters = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ#_"
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
243 segdicts = {} # Need batch of empty dicts
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
244 for segment in letters:
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
245 segdicts[segment] = {}
5395
23b8e6067f7c Python 3 preparation: update calls to dict methods.
Joseph Myers <jsm@polyomino.org.uk>
parents: 5380
diff changeset
246 for word, entry in self.words.items(): # Split into segment dicts
2089
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
247 initchar = word[0].upper()
5470
e2baa4e6ed6d handle words starting with unicode characters
Christof Meerwald <cmeerw@cmeerw.org>
parents: 5395
diff changeset
248 if initchar not in letters:
e2baa4e6ed6d handle words starting with unicode characters
Christof Meerwald <cmeerw@cmeerw.org>
parents: 5395
diff changeset
249 # if it's a unicode character, add it to the '_' segment
e2baa4e6ed6d handle words starting with unicode characters
Christof Meerwald <cmeerw@cmeerw.org>
parents: 5395
diff changeset
250 initchar = '_'
2089
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
251 segdicts[initchar][word] = entry
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
252
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
253 # save
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
254 for initchar in letters:
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
255 db = {'WORDS':segdicts[initchar], 'FILES':None, 'FILEIDS':None}
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
256 pickle_str = marshal.dumps(db)
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
257 filename = self.indexdb + initchar
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
258 pickle_fh = open(filename, 'wb')
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
259 pickle_fh.write(zlib.compress(pickle_str))
5380
64c4e43fbb84 Python 3 preparation: numeric literal syntax.
Joseph Myers <jsm@polyomino.org.uk>
parents: 5248
diff changeset
260 os.chmod(filename, 0o664)
2089
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
261
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
262 # save done
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
263 self.changed = 0
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
264
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
265 def purge_entry(self, identifier):
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
266 '''Remove a file from file index and word index
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
267 '''
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
268 self.load_index()
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
269
4357
13b3155869e0 Beginnings of a big code cleanup / modernisation to make 2to3 happy
Richard Jones <richard@users.sourceforge.net>
parents: 4252
diff changeset
270 if identifier not in self.files:
2089
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
271 return
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
272
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
273 file_index = self.files[identifier][0]
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
274 del self.files[identifier]
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
275 del self.fileids[file_index]
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
276
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
277 # The much harder part: clean up the word index
5395
23b8e6067f7c Python 3 preparation: update calls to dict methods.
Joseph Myers <jsm@polyomino.org.uk>
parents: 5380
diff changeset
278 for key, occurs in self.words.items():
4357
13b3155869e0 Beginnings of a big code cleanup / modernisation to make 2to3 happy
Richard Jones <richard@users.sourceforge.net>
parents: 4252
diff changeset
279 if file_index in occurs:
2089
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
280 del occurs[file_index]
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
281
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
282 # save needed
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
283 self.changed = 1
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
284
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
285 def index_loaded(self):
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
286 return (hasattr(self,'fileids') and hasattr(self,'files') and
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
287 hasattr(self,'words'))
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
288
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
289 def rollback(self):
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
290 ''' load last saved index info. '''
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
291 self.load_index(reload=1)
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
292
3613
5f4db2650da3 implement close() on all indexers [SF#1242477]
Richard Jones <richard@users.sourceforge.net>
parents: 3555
diff changeset
293 def close(self):
5f4db2650da3 implement close() on all indexers [SF#1242477]
Richard Jones <richard@users.sourceforge.net>
parents: 3555
diff changeset
294 pass
5f4db2650da3 implement close() on all indexers [SF#1242477]
Richard Jones <richard@users.sourceforge.net>
parents: 3555
diff changeset
295
5f4db2650da3 implement close() on all indexers [SF#1242477]
Richard Jones <richard@users.sourceforge.net>
parents: 3555
diff changeset
296
2089
93f03c6714d8 A few big changes in this commit:
Richard Jones <richard@users.sourceforge.net>
parents:
diff changeset
297 # vim: set filetype=python ts=4 sw=4 et si
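
The segment handling in load_index and save_index above can be sketched in isolation as follows. This is a minimal illustration under stated assumptions, not the Roundup implementation: the helper names segment_for, save_segments and load_segments are hypothetical, the FILES and FILEIDS dictionaries are left out, and missing segment files are simply skipped. It shows words being bucketed by upper-cased initial character (anything outside the alphabet goes to '_'), each bucket written as a zlib-compressed marshal dump, and only the buckets needed for a given word list being read back.

```python
import marshal
import os
import tempfile
import zlib

# Same alphabet as in save_index above; '_' collects every other initial.
LETTERS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ#_"


def segment_for(word):
    """Return the segment key for a word (hypothetical helper)."""
    initchar = word[0].upper()
    return initchar if initchar in LETTERS else '_'


def save_segments(prefix, words):
    """Write one zlib-compressed marshal file per segment (hypothetical)."""
    segdicts = {segment: {} for segment in LETTERS}
    for word, entry in words.items():
        segdicts[segment_for(word)][word] = entry
    for segment, segdict in segdicts.items():
        # 'with' closes each file handle as soon as the write is done
        with open(prefix + segment, 'wb') as fh:
            fh.write(zlib.compress(marshal.dumps({'WORDS': segdict})))


def load_segments(prefix, wordlist):
    """Load only the segments needed to look up the words in wordlist."""
    words = {}
    for segment in sorted({segment_for(word) for word in wordlist}):
        try:
            with open(prefix + segment, 'rb') as fh:
                dbslice = marshal.loads(zlib.decompress(fh.read()))
        except FileNotFoundError:
            continue  # a missing segment file just means no words there
        words.update(dbslice['WORDS'])
    return words


with tempfile.TemporaryDirectory() as tmpdir:
    prefix = os.path.join(tmpdir, 'index.')
    index = {'apple': {1: 2}, 'avocado': {2: 1},
             'banana': {1: 1}, 'éclair': {3: 1}}
    save_segments(prefix, index)
    # Only the 'A' and '_' segment files are opened for this lookup.
    loaded = load_segments(prefix, ['apple', 'éclair'])
```

Because a lookup only touches the segment files for the initials it needs, 'banana' is never read here, while 'avocado' comes along because it shares the 'A' segment with 'apple'.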

Roundup Issue Tracker: http://roundup-tracker.org/