annotate roundup/backends/indexer_common.py @ 4740:fe9568a6cbd6

Untangle template selection logic from template loading functionality.
author anatoly techtonik <techtonik@gmail.com>
date Tue, 15 Jan 2013 00:10:01 +0300
parents 4960a2c21590
children e74c3611b138
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
4089
eddb82d0964c Add compatibility package to allow us to deal with Python versions 2.3..2.6.
Richard Jones <richard@users.sourceforge.net>
parents: 4017
diff changeset
1 import re
3058
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
2
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
3 from roundup import hyperdb
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
4
3544
5cd1c83dea50 Features and fixes.
Richard Jones <richard@users.sourceforge.net>
parents: 3092
diff changeset
5 STOPWORDS = [
5cd1c83dea50 Features and fixes.
Richard Jones <richard@users.sourceforge.net>
parents: 3092
diff changeset
6 "A", "AND", "ARE", "AS", "AT", "BE", "BUT", "BY",
5cd1c83dea50 Features and fixes.
Richard Jones <richard@users.sourceforge.net>
parents: 3092
diff changeset
7 "FOR", "IF", "IN", "INTO", "IS", "IT",
5cd1c83dea50 Features and fixes.
Richard Jones <richard@users.sourceforge.net>
parents: 3092
diff changeset
8 "NO", "NOT", "OF", "ON", "OR", "SUCH",
5cd1c83dea50 Features and fixes.
Richard Jones <richard@users.sourceforge.net>
parents: 3092
diff changeset
9 "THAT", "THE", "THEIR", "THEN", "THERE", "THESE",
3997
edbb89730dc2 Fix indexer handling of indexed Link properties
Richard Jones <richard@users.sourceforge.net>
parents: 3751
diff changeset
10 "THEY", "THIS", "TO", "WAS", "WILL", "WITH"
3092
a8c2371f45b6 Some cleanup:
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents: 3088
diff changeset
11 ]
a8c2371f45b6 Some cleanup:
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents: 3088
diff changeset
12
3058
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
13 def _isLink(propclass):
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
14 return (isinstance(propclass, hyperdb.Link) or
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
15 isinstance(propclass, hyperdb.Multilink))
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
16
3613
5f4db2650da3 implement close() on all indexers [SF#1242477]
Richard Jones <richard@users.sourceforge.net>
parents: 3544
diff changeset
17 class Indexer:
3544
5cd1c83dea50 Features and fixes.
Richard Jones <richard@users.sourceforge.net>
parents: 3092
diff changeset
18 def __init__(self, db):
4089
eddb82d0964c Add compatibility package to allow us to deal with Python versions 2.3..2.6.
Richard Jones <richard@users.sourceforge.net>
parents: 4017
diff changeset
19 self.stopwords = set(STOPWORDS)
3544
5cd1c83dea50 Features and fixes.
Richard Jones <richard@users.sourceforge.net>
parents: 3092
diff changeset
20 for word in db.config[('main', 'indexer_stopwords')]:
5cd1c83dea50 Features and fixes.
Richard Jones <richard@users.sourceforge.net>
parents: 3092
diff changeset
21 self.stopwords.add(word)
4252
2ff6f39aa391 Indexers behaviour made more consistent regarding length of indexed words...
Bernhard Reiter <Bernhard.Reiter@intevation.de>
parents: 4089
diff changeset
22 # Do not index anything longer than 25 characters since that'll be
2ff6f39aa391 Indexers behaviour made more consistent regarding length of indexed words...
Bernhard Reiter <Bernhard.Reiter@intevation.de>
parents: 4089
diff changeset
23 # gibberish (encoded text or somesuch) or shorter than 2 characters
2ff6f39aa391 Indexers behaviour made more consistent regarding length of indexed words...
Bernhard Reiter <Bernhard.Reiter@intevation.de>
parents: 4089
diff changeset
24 self.minlength = 2
2ff6f39aa391 Indexers behaviour made more consistent regarding length of indexed words...
Bernhard Reiter <Bernhard.Reiter@intevation.de>
parents: 4089
diff changeset
25 self.maxlength = 25
3544
5cd1c83dea50 Features and fixes.
Richard Jones <richard@users.sourceforge.net>
parents: 3092
diff changeset
26
5cd1c83dea50 Features and fixes.
Richard Jones <richard@users.sourceforge.net>
parents: 3092
diff changeset
27 def is_stopword(self, word):
5cd1c83dea50 Features and fixes.
Richard Jones <richard@users.sourceforge.net>
parents: 3092
diff changeset
28 return word in self.stopwords
5cd1c83dea50 Features and fixes.
Richard Jones <richard@users.sourceforge.net>
parents: 3092
diff changeset
29
3058
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
30 def getHits(self, search_terms, klass):
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
31 return self.find(search_terms)
3997
edbb89730dc2 Fix indexer handling of indexed Link properties
Richard Jones <richard@users.sourceforge.net>
parents: 3751
diff changeset
32
3058
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
33 def search(self, search_terms, klass, ignore={}):
4089
eddb82d0964c Add compatibility package to allow us to deal with Python versions 2.3..2.6.
Richard Jones <richard@users.sourceforge.net>
parents: 4017
diff changeset
34 """Display search results looking for [search, terms] associated
3058
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
35 with the hyperdb Class "klass". Ignore hits on {class: property}.
4089
eddb82d0964c Add compatibility package to allow us to deal with Python versions 2.3..2.6.
Richard Jones <richard@users.sourceforge.net>
parents: 4017
diff changeset
36 """
3058
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
37 # do the index lookup
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
38 hits = self.getHits(search_terms, klass)
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
39 if not hits:
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
40 return {}
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
41
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
42 designator_propname = {}
4357
13b3155869e0 Beginnings of a big code cleanup / modernisation to make 2to3 happy
Richard Jones <richard@users.sourceforge.net>
parents: 4281
diff changeset
43 for nm, propclass in klass.getprops().iteritems():
3058
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
44 if _isLink(propclass):
3751
44603dd791b7 full-text search wasn't coping with multiple multilinks to the same class
Richard Jones <richard@users.sourceforge.net>
parents: 3718
diff changeset
45 designator_propname.setdefault(propclass.classname,
44603dd791b7 full-text search wasn't coping with multiple multilinks to the same class
Richard Jones <richard@users.sourceforge.net>
parents: 3718
diff changeset
46 []).append(nm)
3058
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
47
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
48 # build a dictionary of nodes and their associated messages
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
49 # and files
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
50 nodeids = {} # this is the answer
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
51 propspec = {} # used to do the klass.find
4357
13b3155869e0 Beginnings of a big code cleanup / modernisation to make 2to3 happy
Richard Jones <richard@users.sourceforge.net>
parents: 4281
diff changeset
52 for l in designator_propname.itervalues():
3751
44603dd791b7 full-text search wasn't coping with multiple multilinks to the same class
Richard Jones <richard@users.sourceforge.net>
parents: 3718
diff changeset
53 for propname in l:
44603dd791b7 full-text search wasn't coping with multiple multilinks to the same class
Richard Jones <richard@users.sourceforge.net>
parents: 3718
diff changeset
54 propspec[propname] = {} # used as a set (value doesn't matter)
3718
0d561b24ceff support sqlite3
Richard Jones <richard@users.sourceforge.net>
parents: 3613
diff changeset
55
0d561b24ceff support sqlite3
Richard Jones <richard@users.sourceforge.net>
parents: 3613
diff changeset
56 # don't unpack hits entries as sqlite3's Row can't be unpacked :(
0d561b24ceff support sqlite3
Richard Jones <richard@users.sourceforge.net>
parents: 3613
diff changeset
57 for entry in hits:
3058
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
58 # skip this result if we don't care about this class/property
3718
0d561b24ceff support sqlite3
Richard Jones <richard@users.sourceforge.net>
parents: 3613
diff changeset
59 classname = entry[0]
0d561b24ceff support sqlite3
Richard Jones <richard@users.sourceforge.net>
parents: 3613
diff changeset
60 property = entry[2]
4357
13b3155869e0 Beginnings of a big code cleanup / modernisation to make 2to3 happy
Richard Jones <richard@users.sourceforge.net>
parents: 4281
diff changeset
61 if (classname, property) in ignore:
3058
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
62 continue
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
63
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
64 # if it's a property on klass, it's easy
3998
20c9a1cefb39 make sure item ids are str()
Richard Jones <richard@users.sourceforge.net>
parents: 3997
diff changeset
65 # (make sure the nodeid is str() not unicode() as returned by some
20c9a1cefb39 make sure item ids are str()
Richard Jones <richard@users.sourceforge.net>
parents: 3997
diff changeset
66 # backends as that can cause problems down the track)
20c9a1cefb39 make sure item ids are str()
Richard Jones <richard@users.sourceforge.net>
parents: 3997
diff changeset
67 nodeid = str(entry[1])
3058
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
68 if classname == klass.classname:
4357
13b3155869e0 Beginnings of a big code cleanup / modernisation to make 2to3 happy
Richard Jones <richard@users.sourceforge.net>
parents: 4281
diff changeset
69 if nodeid not in nodeids:
3058
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
70 nodeids[nodeid] = {}
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
71 continue
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
72
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
73 # make sure the class is a linked one, otherwise ignore
4357
13b3155869e0 Beginnings of a big code cleanup / modernisation to make 2to3 happy
Richard Jones <richard@users.sourceforge.net>
parents: 4281
diff changeset
74 if classname not in designator_propname:
3058
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
75 continue
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
76
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
77 # it's a linked class - set up to do the klass.find
3751
44603dd791b7 full-text search wasn't coping with multiple multilinks to the same class
Richard Jones <richard@users.sourceforge.net>
parents: 3718
diff changeset
78 for linkprop in designator_propname[classname]:
44603dd791b7 full-text search wasn't coping with multiple multilinks to the same class
Richard Jones <richard@users.sourceforge.net>
parents: 3718
diff changeset
79 propspec[linkprop][nodeid] = 1
3058
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
80
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
81 # retain only the meaningful entries
4359
b9abbdd15259 another module modernised
Richard Jones <richard@users.sourceforge.net>
parents: 4357
diff changeset
82 for propname, idset in list(propspec.items()):
3058
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
83 if not idset:
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
84 del propspec[propname]
3751
44603dd791b7 full-text search wasn't coping with multiple multilinks to the same class
Richard Jones <richard@users.sourceforge.net>
parents: 3718
diff changeset
85
3058
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
86 # klass.find tells me the klass nodeids the linked nodes relate to
3997
edbb89730dc2 Fix indexer handling of indexed Link properties
Richard Jones <richard@users.sourceforge.net>
parents: 3751
diff changeset
87 propdefs = klass.getprops()
3058
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
88 for resid in klass.find(**propspec):
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
89 resid = str(resid)
4017
605f4a7910b4 Small performance-improvement and bug-fix for indexer:
Ralf Schlatterbeck <schlatterbeck@users.sourceforge.net>
parents: 3998
diff changeset
90 if resid in nodeids:
605f4a7910b4 Small performance-improvement and bug-fix for indexer:
Ralf Schlatterbeck <schlatterbeck@users.sourceforge.net>
parents: 3998
diff changeset
91 continue # we ignore duplicate resids
605f4a7910b4 Small performance-improvement and bug-fix for indexer:
Ralf Schlatterbeck <schlatterbeck@users.sourceforge.net>
parents: 3998
diff changeset
92 nodeids[resid] = {}
3058
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
93 node_dict = nodeids[resid]
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
94 # now figure out where it came from
4357
13b3155869e0 Beginnings of a big code cleanup / modernisation to make 2to3 happy
Richard Jones <richard@users.sourceforge.net>
parents: 4281
diff changeset
95 for linkprop in propspec:
3997
edbb89730dc2 Fix indexer handling of indexed Link properties
Richard Jones <richard@users.sourceforge.net>
parents: 3751
diff changeset
96 v = klass.get(resid, linkprop)
edbb89730dc2 Fix indexer handling of indexed Link properties
Richard Jones <richard@users.sourceforge.net>
parents: 3751
diff changeset
97 # the link might be a Link so deal with a single result or None
edbb89730dc2 Fix indexer handling of indexed Link properties
Richard Jones <richard@users.sourceforge.net>
parents: 3751
diff changeset
98 if isinstance(propdefs[linkprop], hyperdb.Link):
edbb89730dc2 Fix indexer handling of indexed Link properties
Richard Jones <richard@users.sourceforge.net>
parents: 3751
diff changeset
99 if v is None: continue
edbb89730dc2 Fix indexer handling of indexed Link properties
Richard Jones <richard@users.sourceforge.net>
parents: 3751
diff changeset
100 v = [v]
edbb89730dc2 Fix indexer handling of indexed Link properties
Richard Jones <richard@users.sourceforge.net>
parents: 3751
diff changeset
101 for nodeid in v:
4357
13b3155869e0 Beginnings of a big code cleanup / modernisation to make 2to3 happy
Richard Jones <richard@users.sourceforge.net>
parents: 4281
diff changeset
102 if nodeid in propspec[linkprop]:
3058
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
103 # OK, this node[propname] has a winner
4357
13b3155869e0 Beginnings of a big code cleanup / modernisation to make 2to3 happy
Richard Jones <richard@users.sourceforge.net>
parents: 4281
diff changeset
104 if linkprop not in node_dict:
3058
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
105 node_dict[linkprop] = [nodeid]
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
106 else:
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
107 node_dict[linkprop].append(nodeid)
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
108 return nodeids
3613
5f4db2650da3 implement close() on all indexers [SF#1242477]
Richard Jones <richard@users.sourceforge.net>
parents: 3544
diff changeset
109

Roundup Issue Tracker: http://roundup-tracker.org/