annotate roundup/backends/indexer_common.py @ 3997:edbb89730dc2

Fix indexer handling of indexed Link properties
author Richard Jones <richard@users.sourceforge.net>
date Mon, 18 Aug 2008 06:57:49 +0000
parents 44603dd791b7
children 20c9a1cefb39
rev   line source
3997
edbb89730dc2 Fix indexer handling of indexed Link properties
Richard Jones <richard@users.sourceforge.net>
parents: 3751
diff changeset
1 #$Id: indexer_common.py,v 1.9 2008-08-18 06:57:49 richard Exp $
3544
5cd1c83dea50 Features and fixes.
Richard Jones <richard@users.sourceforge.net>
parents: 3092
diff changeset
2 import re, sets
3058
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
3
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
4 from roundup import hyperdb
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
5
3544
5cd1c83dea50 Features and fixes.
Richard Jones <richard@users.sourceforge.net>
parents: 3092
diff changeset
6 STOPWORDS = [
5cd1c83dea50 Features and fixes.
Richard Jones <richard@users.sourceforge.net>
parents: 3092
diff changeset
7 "A", "AND", "ARE", "AS", "AT", "BE", "BUT", "BY",
5cd1c83dea50 Features and fixes.
Richard Jones <richard@users.sourceforge.net>
parents: 3092
diff changeset
8 "FOR", "IF", "IN", "INTO", "IS", "IT",
5cd1c83dea50 Features and fixes.
Richard Jones <richard@users.sourceforge.net>
parents: 3092
diff changeset
9 "NO", "NOT", "OF", "ON", "OR", "SUCH",
5cd1c83dea50 Features and fixes.
Richard Jones <richard@users.sourceforge.net>
parents: 3092
diff changeset
10 "THAT", "THE", "THEIR", "THEN", "THERE", "THESE",
3997
edbb89730dc2 Fix indexer handling of indexed Link properties
Richard Jones <richard@users.sourceforge.net>
parents: 3751
diff changeset
11 "THEY", "THIS", "TO", "WAS", "WILL", "WITH"
3092
a8c2371f45b6 Some cleanup:
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents: 3088
diff changeset
12 ]
a8c2371f45b6 Some cleanup:
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents: 3088
diff changeset
13
3058
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
14 def _isLink(propclass):
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
15 return (isinstance(propclass, hyperdb.Link) or
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
16 isinstance(propclass, hyperdb.Multilink))
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
17
3613
5f4db2650da3 implement close() on all indexers [SF#1242477]
Richard Jones <richard@users.sourceforge.net>
parents: 3544
diff changeset
18 class Indexer:
3544
5cd1c83dea50 Features and fixes.
Richard Jones <richard@users.sourceforge.net>
parents: 3092
diff changeset
19 def __init__(self, db):
5cd1c83dea50 Features and fixes.
Richard Jones <richard@users.sourceforge.net>
parents: 3092
diff changeset
20 self.stopwords = sets.Set(STOPWORDS)
5cd1c83dea50 Features and fixes.
Richard Jones <richard@users.sourceforge.net>
parents: 3092
diff changeset
21 for word in db.config[('main', 'indexer_stopwords')]:
5cd1c83dea50 Features and fixes.
Richard Jones <richard@users.sourceforge.net>
parents: 3092
diff changeset
22 self.stopwords.add(word)
5cd1c83dea50 Features and fixes.
Richard Jones <richard@users.sourceforge.net>
parents: 3092
diff changeset
23
5cd1c83dea50 Features and fixes.
Richard Jones <richard@users.sourceforge.net>
parents: 3092
diff changeset
24 def is_stopword(self, word):
5cd1c83dea50 Features and fixes.
Richard Jones <richard@users.sourceforge.net>
parents: 3092
diff changeset
25 return word in self.stopwords
5cd1c83dea50 Features and fixes.
Richard Jones <richard@users.sourceforge.net>
parents: 3092
diff changeset
26
3058
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
27 def getHits(self, search_terms, klass):
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
28 return self.find(search_terms)
3997
edbb89730dc2 Fix indexer handling of indexed Link properties
Richard Jones <richard@users.sourceforge.net>
parents: 3751
diff changeset
29
3058
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
30 def search(self, search_terms, klass, ignore={}):
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
31 '''Display search results looking for [search, terms] associated
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
32 with the hyperdb Class "klass". Ignore hits on {class: property}.
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
33
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
34 "dre" is a helper, not an argument.
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
35 '''
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
36 # do the index lookup
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
37 hits = self.getHits(search_terms, klass)
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
38 if not hits:
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
39 return {}
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
40
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
41 designator_propname = {}
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
42 for nm, propclass in klass.getprops().items():
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
43 if _isLink(propclass):
3751
44603dd791b7 full-text search wasn't coping with multiple multilinks to the same class
Richard Jones <richard@users.sourceforge.net>
parents: 3718
diff changeset
44 designator_propname.setdefault(propclass.classname,
44603dd791b7 full-text search wasn't coping with multiple multilinks to the same class
Richard Jones <richard@users.sourceforge.net>
parents: 3718
diff changeset
45 []).append(nm)
3058
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
46
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
47 # build a dictionary of nodes and their associated messages
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
48 # and files
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
49 nodeids = {} # this is the answer
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
50 propspec = {} # used to do the klass.find
3751
44603dd791b7 full-text search wasn't coping with multiple multilinks to the same class
Richard Jones <richard@users.sourceforge.net>
parents: 3718
diff changeset
51 for l in designator_propname.values():
44603dd791b7 full-text search wasn't coping with multiple multilinks to the same class
Richard Jones <richard@users.sourceforge.net>
parents: 3718
diff changeset
52 for propname in l:
44603dd791b7 full-text search wasn't coping with multiple multilinks to the same class
Richard Jones <richard@users.sourceforge.net>
parents: 3718
diff changeset
53 propspec[propname] = {} # used as a set (value doesn't matter)
3718
0d561b24ceff support sqlite3
Richard Jones <richard@users.sourceforge.net>
parents: 3613
diff changeset
54
0d561b24ceff support sqlite3
Richard Jones <richard@users.sourceforge.net>
parents: 3613
diff changeset
55 # don't unpack hits entries as sqlite3's Row can't be unpacked :(
0d561b24ceff support sqlite3
Richard Jones <richard@users.sourceforge.net>
parents: 3613
diff changeset
56 for entry in hits:
3058
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
57 # skip this result if we don't care about this class/property
3718
0d561b24ceff support sqlite3
Richard Jones <richard@users.sourceforge.net>
parents: 3613
diff changeset
58 classname = entry[0]
0d561b24ceff support sqlite3
Richard Jones <richard@users.sourceforge.net>
parents: 3613
diff changeset
59 property = entry[2]
3058
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
60 if ignore.has_key((classname, property)):
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
61 continue
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
62
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
63 # if it's a property on klass, it's easy
3718
0d561b24ceff support sqlite3
Richard Jones <richard@users.sourceforge.net>
parents: 3613
diff changeset
64 nodeid = entry[1]
3058
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
65 if classname == klass.classname:
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
66 if not nodeids.has_key(nodeid):
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
67 nodeids[nodeid] = {}
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
68 continue
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
69
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
70 # make sure the class is a linked one, otherwise ignore
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
71 if not designator_propname.has_key(classname):
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
72 continue
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
73
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
74 # it's a linked class - set up to do the klass.find
3751
44603dd791b7 full-text search wasn't coping with multiple multilinks to the same class
Richard Jones <richard@users.sourceforge.net>
parents: 3718
diff changeset
75 for linkprop in designator_propname[classname]:
44603dd791b7 full-text search wasn't coping with multiple multilinks to the same class
Richard Jones <richard@users.sourceforge.net>
parents: 3718
diff changeset
76 propspec[linkprop][nodeid] = 1
3058
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
77
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
78 # retain only the meaningful entries
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
79 for propname, idset in propspec.items():
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
80 if not idset:
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
81 del propspec[propname]
3751
44603dd791b7 full-text search wasn't coping with multiple multilinks to the same class
Richard Jones <richard@users.sourceforge.net>
parents: 3718
diff changeset
82
3058
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
83 # klass.find tells me the klass nodeids the linked nodes relate to
3997
edbb89730dc2 Fix indexer handling of indexed Link properties
Richard Jones <richard@users.sourceforge.net>
parents: 3751
diff changeset
84 propdefs = klass.getprops()
3058
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
85 for resid in klass.find(**propspec):
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
86 resid = str(resid)
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
87 if not nodeids.has_key(resid):
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
88 nodeids[resid] = {}
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
89 node_dict = nodeids[resid]
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
90 # now figure out where it came from
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
91 for linkprop in propspec.keys():
3997
edbb89730dc2 Fix indexer handling of indexed Link properties
Richard Jones <richard@users.sourceforge.net>
parents: 3751
diff changeset
92 v = klass.get(resid, linkprop)
edbb89730dc2 Fix indexer handling of indexed Link properties
Richard Jones <richard@users.sourceforge.net>
parents: 3751
diff changeset
93 # the link might be a Link so deal with a single result or None
edbb89730dc2 Fix indexer handling of indexed Link properties
Richard Jones <richard@users.sourceforge.net>
parents: 3751
diff changeset
94 if isinstance(propdefs[linkprop], hyperdb.Link):
edbb89730dc2 Fix indexer handling of indexed Link properties
Richard Jones <richard@users.sourceforge.net>
parents: 3751
diff changeset
95 if v is None: continue
edbb89730dc2 Fix indexer handling of indexed Link properties
Richard Jones <richard@users.sourceforge.net>
parents: 3751
diff changeset
96 v = [v]
edbb89730dc2 Fix indexer handling of indexed Link properties
Richard Jones <richard@users.sourceforge.net>
parents: 3751
diff changeset
97 for nodeid in v:
3058
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
98 if propspec[linkprop].has_key(nodeid):
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
99 # OK, this node[propname] has a winner
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
100 if not node_dict.has_key(linkprop):
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
101 node_dict[linkprop] = [nodeid]
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
102 else:
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
103 node_dict[linkprop].append(nodeid)
1c063814d567 Move search method duplicated in indexer_dbm and indexer_tsearch2...
Johannes Gijsbers <jlgijsbers@users.sourceforge.net>
parents:
diff changeset
104 return nodeids
3613
5f4db2650da3 implement close() on all indexers [SF#1242477]
Richard Jones <richard@users.sourceforge.net>
parents: 3544
diff changeset
105
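The two link-handling steps in `search` above (grouping link properties by target class, then treating a Link value like a one-element Multilink value) can be sketched in isolation. This is a minimal illustration in Python 3, not Roundup's code: the `Link` and `Multilink` classes here are hypothetical stand-ins for `hyperdb.Link` and `hyperdb.Multilink`, and `group_link_props`, `as_id_list`, and the sample property names are invented for the example.

```python
# Hypothetical stand-ins for hyperdb.Link / hyperdb.Multilink (assumption:
# only the classname attribute matters for this sketch).
class Link:
    def __init__(self, classname):
        self.classname = classname


class Multilink:
    def __init__(self, classname):
        self.classname = classname


def group_link_props(props):
    """Map each linked class name to every property name that points at it."""
    designator_propname = {}
    for name, propclass in props.items():
        if isinstance(propclass, (Link, Multilink)):
            # setdefault keeps all properties that target the same class
            designator_propname.setdefault(propclass.classname, []).append(name)
    return designator_propname


def as_id_list(propclass, value):
    """Normalise a property value to a list of node ids.

    A Link holds a single id or None; a Multilink holds a list of ids.
    """
    if isinstance(propclass, Link):
        return [] if value is None else [value]
    return list(value)


# Sample property definitions (invented for illustration).
props = {
    "title": object(),              # not a link, so it is skipped
    "messages": Multilink("msg"),
    "last_msg": Link("msg"),
    "files": Multilink("file"),
}
groups = group_link_props(props)
```

With these sample definitions, both `messages` and `last_msg` end up under `msg`, which is the situation revision 3751 addresses, and an unset Link yields an empty list rather than a failed iteration over `None`, which is the situation revision 3997 addresses.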

Roundup Issue Tracker: http://roundup-tracker.org/