annotate roundup/backends/indexer_common.py @ 4651:beb8d43f4d9d

issue2550765: Don't show links in calendar that will fail. Found and fixed by Cédric Krier.
author Bernhard Reiter <bernhard@intevation.de>
date Wed, 01 Aug 2012 08:49:41 +0200
parents 6e3e4f24c753
children 4960a2c21590
Changesets that last modified lines in this file:

3058  1c063814d567  Johannes Gijsbers <jlgijsbers@users.sourceforge.net>  parents: (none listed)
      Move search method duplicated in indexer_dbm and indexer_tsearch2...
3092  a8c2371f45b6  Johannes Gijsbers <jlgijsbers@users.sourceforge.net>  parents: 3088
      Some cleanup:
3544  5cd1c83dea50  Richard Jones <richard@users.sourceforge.net>  parents: 3092
      Features and fixes.
3613  5f4db2650da3  Richard Jones <richard@users.sourceforge.net>  parents: 3544
      implement close() on all indexers [SF#1242477]
3718  0d561b24ceff  Richard Jones <richard@users.sourceforge.net>  parents: 3613
      support sqlite3
3751  44603dd791b7  Richard Jones <richard@users.sourceforge.net>  parents: 3718
      full-text search wasn't coping with multiple multilinks to the same class
3997  edbb89730dc2  Richard Jones <richard@users.sourceforge.net>  parents: 3751
      Fix indexer handling of indexed Link properties
3998  20c9a1cefb39  Richard Jones <richard@users.sourceforge.net>  parents: 3997
      make sure item ids are str()
4017  605f4a7910b4  Ralf Schlatterbeck <schlatterbeck@users.sourceforge.net>  parents: 3998
      Small performance-improvement and bug-fix for indexer:
4089  eddb82d0964c  Richard Jones <richard@users.sourceforge.net>  parents: 4017
      Add compatibility package to allow us to deal with Python versions 2.3..2.6.
4252  2ff6f39aa391  Bernhard Reiter <Bernhard.Reiter@intevation.de>  parents: 4089
      Indexers behaviour made more consistent regarding length of indexed words...
4357  13b3155869e0  Richard Jones <richard@users.sourceforge.net>  parents: 4281
      Beginnings of a big code cleanup / modernisation to make 2to3 happy
4359  b9abbdd15259  Richard Jones <richard@users.sourceforge.net>  parents: 4357
      another module modernised

rev   line  source
4089     1  import re
4089     2  # Python 2.3 ... 2.6 compatibility:
4089     3  from roundup.anypy.sets_ import set
3058     4
3058     5  from roundup import hyperdb
3058     6
3544     7  STOPWORDS = [
3544     8      "A", "AND", "ARE", "AS", "AT", "BE", "BUT", "BY",
3544     9      "FOR", "IF", "IN", "INTO", "IS", "IT",
3544    10      "NO", "NOT", "OF", "ON", "OR", "SUCH",
3544    11      "THAT", "THE", "THEIR", "THEN", "THERE", "THESE",
3997    12      "THEY", "THIS", "TO", "WAS", "WILL", "WITH"
3092    13  ]
3092    14
3058    15  def _isLink(propclass):
3058    16      return (isinstance(propclass, hyperdb.Link) or
3058    17              isinstance(propclass, hyperdb.Multilink))
3058    18
3613    19  class Indexer:
3544    20      def __init__(self, db):
4089    21          self.stopwords = set(STOPWORDS)
3544    22          for word in db.config[('main', 'indexer_stopwords')]:
3544    23              self.stopwords.add(word)
4252    24          # Do not index anything longer than 25 characters since that'll be
4252    25          # gibberish (encoded text or somesuch) or shorter than 2 characters
4252    26          self.minlength = 2
4252    27          self.maxlength = 25
3544    28
3544    29      def is_stopword(self, word):
3544    30          return word in self.stopwords
3544    31
3058    32      def getHits(self, search_terms, klass):
3058    33          return self.find(search_terms)
3997    34
3058    35      def search(self, search_terms, klass, ignore={}):
4089    36          """Display search results looking for [search, terms] associated
3058    37          with the hyperdb Class "klass". Ignore hits on {class: property}.
4089    38          """
3058    39          # do the index lookup
3058    40          hits = self.getHits(search_terms, klass)
3058    41          if not hits:
3058    42              return {}
3058    43
3058    44          designator_propname = {}
4357    45          for nm, propclass in klass.getprops().iteritems():
3058    46              if _isLink(propclass):
3751    47                  designator_propname.setdefault(propclass.classname,
3751    48                      []).append(nm)
3058    49
3058    50          # build a dictionary of nodes and their associated messages
3058    51          # and files
3058    52          nodeids = {} # this is the answer
3058    53          propspec = {} # used to do the klass.find
4357    54          for l in designator_propname.itervalues():
3751    55              for propname in l:
3751    56                  propspec[propname] = {} # used as a set (value doesn't matter)
3718    57
3718    58          # don't unpack hits entries as sqlite3's Row can't be unpacked :(
3718    59          for entry in hits:
3058    60              # skip this result if we don't care about this class/property
3718    61              classname = entry[0]
3718    62              property = entry[2]
4357    63              if (classname, property) in ignore:
3058    64                  continue
3058    65
3058    66              # if it's a property on klass, it's easy
3998    67              # (make sure the nodeid is str() not unicode() as returned by some
3998    68              # backends as that can cause problems down the track)
3998    69              nodeid = str(entry[1])
3058    70              if classname == klass.classname:
4357    71                  if nodeid not in nodeids:
3058    72                      nodeids[nodeid] = {}
3058    73                  continue
3058    74
3058    75              # make sure the class is a linked one, otherwise ignore
4357    76              if classname not in designator_propname:
3058    77                  continue
3058    78
3058    79              # it's a linked class - set up to do the klass.find
3751    80              for linkprop in designator_propname[classname]:
3751    81                  propspec[linkprop][nodeid] = 1
3058    82
3058    83          # retain only the meaningful entries
4359    84          for propname, idset in list(propspec.items()):
3058    85              if not idset:
3058    86                  del propspec[propname]
3751    87
3058    88          # klass.find tells me the klass nodeids the linked nodes relate to
3997    89          propdefs = klass.getprops()
3058    90          for resid in klass.find(**propspec):
3058    91              resid = str(resid)
4017    92              if resid in nodeids:
4017    93                  continue # we ignore duplicate resids
4017    94              nodeids[resid] = {}
3058    95              node_dict = nodeids[resid]
3058    96              # now figure out where it came from
4357    97              for linkprop in propspec:
3997    98                  v = klass.get(resid, linkprop)
3997    99                  # the link might be a Link so deal with a single result or None
3997   100                  if isinstance(propdefs[linkprop], hyperdb.Link):
3997   101                      if v is None: continue
3997   102                      v = [v]
3997   103                  for nodeid in v:
4357   104                      if nodeid in propspec[linkprop]:
3058   105                          # OK, this node[propname] has a winner
4357   106                          if linkprop not in node_dict:
3058   107                              node_dict[linkprop] = [nodeid]
3058   108                          else:
3058   109                              node_dict[linkprop].append(nodeid)
3058   110          return nodeids
3613   111
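`Indexer.__init__` merges the built-in STOPWORDS with the tracker's `('main', 'indexer_stopwords')` setting and records length limits of 2 and 25 characters, but this file does not show where those limits are applied; the concrete indexer backends presumably do that. The following is a minimal Python 3 sketch, not Roundup code: `WordFilter`, `indexable_words`, the `\w+` tokenizer and the upper-casing step are assumptions (upper-casing is suggested only by STOPWORDS being upper-case). The length bounds are inclusive, matching the comment that words longer than 25 or shorter than 2 characters are not indexed.

```python
import re

# Copied from STOPWORDS in indexer_common.py.
STOPWORDS = [
    "A", "AND", "ARE", "AS", "AT", "BE", "BUT", "BY",
    "FOR", "IF", "IN", "INTO", "IS", "IT",
    "NO", "NOT", "OF", "ON", "OR", "SUCH",
    "THAT", "THE", "THEIR", "THEN", "THERE", "THESE",
    "THEY", "THIS", "TO", "WAS", "WILL", "WITH",
]


class WordFilter:
    """Hypothetical helper mirroring Indexer.__init__ and is_stopword."""

    def __init__(self, extra_stopwords=(), minlength=2, maxlength=25):
        # Built-in stopwords plus site-configured ones (in Roundup these
        # come from the ('main', 'indexer_stopwords') config option).
        self.stopwords = set(STOPWORDS)
        for word in extra_stopwords:
            self.stopwords.add(word)
        self.minlength = minlength
        self.maxlength = maxlength

    def is_stopword(self, word):
        return word in self.stopwords

    def indexable_words(self, text):
        # Assumed tokenizer: runs of word characters, upper-cased so they
        # compare equal to the upper-case STOPWORDS entries.
        words = re.findall(r"\w+", text.upper())
        return [w for w in words
                if self.minlength <= len(w) <= self.maxlength
                and not self.is_stopword(w)]


print(WordFilter().indexable_words("The fix is in the indexer"))
```

With the defaults, "The fix is in the indexer" yields `['FIX', 'INDEXER']`: THE, IS and IN are stopwords. Keeping the stopwords in a `set` makes each `is_stopword` check a hash lookup rather than a scan of the list.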

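`search()` turns raw index hits into a mapping from ids of `klass` to the link properties that led there. Each hit is indexed as (classname, nodeid, property); a hit on `klass` itself counts directly, while a hit on a linked class (for example a message or file) is translated back with `klass.find()` on every Link or Multilink property pointing at that class. The sketch below restates that post-lookup logic in Python 3 against an in-memory stand-in. `FakeClass`, `resolve_hits` and the local `Link`/`Multilink` classes are hypothetical, and `FakeClass.find()` assumes that several keyword arguments are OR-ed, an assumption consistent with how `search()` then checks every property of each result but worth confirming against the hyperdb documentation.

```python
# Hypothetical stand-ins for hyperdb.Link and hyperdb.Multilink.
class Link:
    def __init__(self, classname):
        self.classname = classname


class Multilink:
    def __init__(self, classname):
        self.classname = classname


class FakeClass:
    """In-memory stand-in for a hyperdb Class (assumption, not Roundup)."""

    def __init__(self, classname, props, nodes):
        self.classname = classname
        self._props = props    # propname -> Link or Multilink
        self._nodes = nodes    # nodeid -> {propname: value}

    def getprops(self):
        return self._props

    def get(self, nodeid, propname):
        value = self._nodes[nodeid].get(propname)
        if value is None and isinstance(self._props[propname], Multilink):
            return []          # an unset Multilink reads as an empty list
        return value

    def find(self, **propspec):
        # Ids of nodes whose property links to any listed target id;
        # several properties are OR-ed (assumed semantics).
        result = []
        for nodeid in sorted(self._nodes):
            for propname, targets in propspec.items():
                value = self.get(nodeid, propname)
                if value is None:
                    continue
                if not isinstance(value, list):
                    value = [value]
                if any(target in targets for target in value):
                    result.append(nodeid)
                    break
        return result


def resolve_hits(hits, klass, ignore=frozenset()):
    """Python 3 restatement of what search() does after getHits()."""
    if not hits:
        return {}
    # linked classname -> names of klass properties that link to it
    designator_propname = {}
    for nm, propclass in klass.getprops().items():
        if isinstance(propclass, (Link, Multilink)):
            designator_propname.setdefault(propclass.classname, []).append(nm)

    nodeids = {}
    propspec = {nm: {} for names in designator_propname.values() for nm in names}
    for entry in hits:
        classname, prop = entry[0], entry[2]
        if (classname, prop) in ignore:
            continue
        nodeid = str(entry[1])
        if classname == klass.classname:
            nodeids.setdefault(nodeid, {})      # direct hit on klass
            continue
        for linkprop in designator_propname.get(classname, []):
            propspec[linkprop][nodeid] = 1      # linked hit, resolved below

    propspec = {k: v for k, v in propspec.items() if v}
    propdefs = klass.getprops()
    for resid in klass.find(**propspec):
        resid = str(resid)
        if resid in nodeids:
            continue
        node_dict = nodeids[resid] = {}
        for linkprop in propspec:
            value = klass.get(resid, linkprop)
            if isinstance(propdefs[linkprop], Link):
                if value is None:
                    continue
                value = [value]
            for nodeid in value:
                if nodeid in propspec[linkprop]:
                    node_dict.setdefault(linkprop, []).append(nodeid)
    return nodeids


issue = FakeClass(
    "issue",
    {"messages": Multilink("msg"), "files": Multilink("file"),
     "lastmsg": Link("msg")},
    {"1": {"messages": ["10", "11"], "lastmsg": "11"},
     "2": {"messages": ["12"], "files": ["20"]},
     "3": {}},
)
hits = [("issue", "3", "title"), ("msg", "11", "content"),
        ("file", "20", "content"), ("user", "5", "realname")]
print(resolve_hits(hits, issue))
```

With these sample data, issue 3 matches directly, issue 1 is reached through message 11 via both `messages` and `lastmsg`, and issue 2 through file 20; the `user` hit is dropped because no property of `issue` links to `user`. Recording every link property per linked class (rather than one) is what lets two properties pointing at the same class both be reported.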
Roundup Issue Tracker: http://roundup-tracker.org/