annotate roundup/backends/indexer_xapian.py @ 7021:4e25815961a7
flake8: remove trailing whitespace; blank lines for definitions
E305 expected 2 blank lines after class or function definition, found 1
E306 expected 1 blank line before a nested definition, found 0
W291 trailing whitespace
| field | value |
|---|---|
| author | John Rouillard <rouilj@ieee.org> |
| date | Sun, 09 Oct 2022 17:30:47 -0400 |
| parents | 0b6c54893ec5 |
| children | 1505f6ab86ec |
Changesets referenced by the annotation:

| rev | changeset | author | parents | summary |
|---|---|---|---|---|
| 3295 | a615cc230160 | Richard Jones <richard@users.sourceforge.net> | | added Xapian indexer; replaces standard indexers if Xapian is available |
| 3544 | 5cd1c83dea50 | Richard Jones <richard@users.sourceforge.net> | 3295 | Features and fixes. |
| 3547 | 7728ee93efd2 | Richard Jones <richard@users.sourceforge.net> | 3544 | fix reindexing in Xapian |
| 3555 | 91c495476db3 | Richard Jones <richard@users.sourceforge.net> | 3547 | pre-release stuff and test fix |
| 3887 | c7363442cdbb | Justus Pendleton <jpend@users.sourceforge.net> | 3555 | change xapian stemmer to use "new" API |
| 4252 | 2ff6f39aa391 | Bernhard Reiter <Bernhard.Reiter@intevation.de> | 3932 | Indexers behaviour made more consistent regarding length of indexed words... |
| 4378 | 477f2a47cbca | Bernhard Reiter <Bernhard.Reiter@intevation.de> | 4252 | - Indexer Xapian, made Xapian 1.2 compatible. |
| 4511 | 931370d96c34 | Bernhard Reiter <Bernhard.Reiter@intevation.de> | 4470 | Xapian indexing improved: |
| 4841 | 3ff1a288fb9c | Thomas Arendsen Hein <thomas@intevation.de> | 4570 | issue2550583, issue2550635 Do not limit results with Xapian indexer |
| 5142 | 93832cec4c31 | John Rouillard <rouilj@ieee.org> | 5108 | issue2550839: Xapian, DatabaseLockError: Unable to get write lock on |
| 5491 | e72573996caf | Christof Meerwald <cmeerw@cmeerw.org> | 5142 | fixed encoding issues for Xapian indexer |
| 5964 | 5bf7b5debb09 | John Rouillard <rouilj@ieee.org> | 5491 | Fix xapian indexer for unicode |
| 6353 | 9d209d2b34ae | John Rouillard <rouilj@ieee.org> | 5964 | Add indexer_language to change stemmer for xapian FTS indexer |
| 6979 | 0b6c54893ec5 | John Rouillard <rouilj@ieee.org> | 6356 | flake8 - also added translation to an error string. |

| rev | line | source |
|---|---|---|
| 3295 | 1 | `''' This implements the full-text indexer using the Xapian indexer.` |
| 3295 | 2 | `'''` |
| 6979 | 3 | `import os` |
| 6979 | 4 | `import re` |
| 6979 | 5 | `import time` |
| 3295 | 6 | `import xapian` |
| 3295 | 7 | |
| 6979 | 8 | `from roundup.anypy.strings import b2s, s2b` |
| 3544 | 9 | `from roundup.backends.indexer_common import Indexer as IndexerBase` |
| 6353 | 10 | `from roundup.i18n import _` |
| 3295 | 11 | |
| 3295 | 12 | `# TODO: we need to delete documents when a property is *reindexed*` |
| 3295 | 13 | |
| 5491 | 14 | `# Note that Xapian always uses UTF-8 encoded string, see` |
| 5491 | 15 | `# https://xapian.org/docs/bindings/python3/introduction.html#strings:` |
| 5491 | 16 | `# "Where std::string is returned, it's always mapped to bytes in` |
| 5491 | 17 | `# Python..."` |
| 5491 | 18 | |
| 6979 | 19 | |
| 3544 | 20 | `class Indexer(IndexerBase):` |
| 3295 | 21 | `    def __init__(self, db):` |
| 3544 | 22 | `        IndexerBase.__init__(self, db)` |
| 3295 | 23 | `        self.db_path = db.config.DATABASE` |
| 3295 | 24 | `        self.reindex = 0` |
| 3295 | 25 | `        self.transaction_active = False` |
| 3295 | 26 | |
| 3295 | 27 | `    def _get_database(self):` |
| 3295 | 28 | `        index = os.path.join(self.db_path, 'text-index')` |
| 5142 | 29 | `        for n in range(10):` |
| 5142 | 30 | `            try:` |
| 5142 | 31 | `                # if successful return` |
| 5142 | 32 | `                return xapian.WritableDatabase(index, xapian.DB_CREATE_OR_OPEN)` |
| 5142 | 33 | `            except xapian.DatabaseLockError:` |
| 5142 | 34 | `                # adaptive sleep. Get longer as count increases.` |
| 5142 | 35 | `                time_to_sleep = 0.01 * (2 << min(5, n))` |
| 5142 | 36 | `                time.sleep(time_to_sleep)` |
| 5142 | 37 | `                # we are back to the for loop` |
| 5142 | 38 | |
| 5142 | 39 | `        # Get here only if we dropped out of the for loop.` |
| 6979 | 40 | `        raise xapian.DatabaseLockError(_(` |
| 6979 | 41 | `            "Unable to get lock after 10 retries on %s.") % index)` |
| 3295 | 42 | |
| 3295 | 43 | `    def save_index(self):` |
| 3295 | 44 | `        '''Save the changes to the index.'''` |
| 3295 | 45 | `        if not self.transaction_active:` |
| 3295 | 46 | `            return` |
| 3295 | 47 | `        database = self._get_database()` |
| 3295 | 48 | `        database.commit_transaction()` |
| 3295 | 49 | `        self.transaction_active = False` |
| 3295 | 50 | |
| 3295 | 51 | `    def close(self):` |
| 3295 | 52 | `        '''close the indexing database'''` |
| 3295 | 53 | `        pass` |
| 3887 | 54 | |
| 3555 | 55 | `    def rollback(self):` |
| 3555 | 56 | `        if not self.transaction_active:` |
| 3555 | 57 | `            return` |
| 3555 | 58 | `        database = self._get_database()` |
| 3555 | 59 | `        database.cancel_transaction()` |
| 3555 | 60 | `        self.transaction_active = False` |
| 3555 | 61 | |
| 3295 | 62 | `    def force_reindex(self):` |
| 3295 | 63 | `        '''Force a reindexing of the database. This essentially` |
| 3295 | 64 | `        empties the tables ids and index and sets a flag so` |
| 3295 | 65 | `        that the databases are reindexed'''` |
| 3295 | 66 | `        self.reindex = 1` |
| 3295 | 67 | |
| 3295 | 68 | `    def should_reindex(self):` |
| 3295 | 69 | `        '''returns True if the indexes need to be rebuilt'''` |
| 3295 | 70 | `        return self.reindex` |
| 3295 | 71 | |
| 3295 | 72 | `    def add_text(self, identifier, text, mime_type='text/plain'):` |
| 3295 | 73 | `        ''' "identifier" is (classname, itemid, property) '''` |
| 3295 | 74 | `        if mime_type != 'text/plain':` |
| 3295 | 75 | `            return` |
| 6979 | 76 | `        if not text:` |
| 6979 | 77 | `            text = ''` |
| 3295 | 78 | |
| 3295 | 79 | `        # open the database and start a transaction if needed` |
| 3295 | 80 | `        database = self._get_database()` |
| 4378 | 81 | |
| 6979 | 82 | `        # XXX: Xapian now supports transactions,` |
| 4378 | 83 | `        # but there is a call to save_index() missing.` |
| 6979 | 84 | `        # if not self.transaction_active:` |
| 6979 | 85 | `        #     database.begin_transaction()` |
| 6979 | 86 | `        #     self.transaction_active = True` |
| 3295 | 87 | |
| 6353 | 88 | `        stemmer = xapian.Stem(self.language)` |
| 3547 | 89 | |
| 3547 | 90 | `        # We use the identifier twice: once in the actual "text" being` |
| 3547 | 91 | `        # indexed so we can search on it, and again as the "data" being` |
| 3547 | 92 | `        # indexed so we know what we're matching when we get results` |
| 6979 | 93 | `        identifier = s2b('%s:%s:%s' % identifier)` |
| 3295 | 94 | |
| 3547 | 95 | `        # create the new document` |
| 3547 | 96 | `        doc = xapian.Document()` |
| 3547 | 97 | `        doc.set_data(identifier)` |
| 4511 | 98 | `        doc.add_term(identifier, 0)` |
| 3547 | 99 | |
| 4252 | 100 | `        for match in re.finditer(r'\b\w{%d,%d}\b'` |
| 4252 | 101 | `                                 % (self.minlength, self.maxlength),` |
| 5964 | 102 | `                                 text.upper(), re.UNICODE):` |
| 3295 | 103 | `            word = match.group(0)` |
| 3544 | 104 | `            if self.is_stopword(word):` |
| 3295 | 105 | `                continue` |
| 5491 | 106 | `            term = stemmer(s2b(word.lower()))` |
| 3295 | 107 | `            doc.add_posting(term, match.start(0))` |
| 4511 | 108 | |
| 4511 | 109 | `        database.replace_document(identifier, doc)` |
| 3295 | 110 | |
| 3295 | 111 | `    def find(self, wordlist):` |
| 3295 | 112 | `        '''look up all the words in the wordlist.` |
| 3295 | 113 | `        If none are found return an empty dictionary` |
| 3295 | 114 | `        * more rules here` |
| 3887 | 115 | `        '''` |
| 3295 | 116 | `        if not wordlist:` |
| 3295 | 117 | `            return {}` |
| 3295 | 118 | |
| 3295 | 119 | `        database = self._get_database()` |
| 3295 | 120 | |
| 3295 | 121 | `        enquire = xapian.Enquire(database)` |
| 6353 | 122 | `        stemmer = xapian.Stem(self.language)` |
| 3295 | 123 | `        terms = []` |
| 4252 | 124 | `        for term in [word.upper() for word in wordlist` |
| 6979 | 125 | `                     if self.minlength <= len(word) <= self.maxlength]:` |
| 4252 | 126 | `            if not self.is_stopword(term):` |
| 5491 | 127 | `                terms.append(stemmer(s2b(term.lower())))` |
| 3295 | 128 | `        query = xapian.Query(xapian.Query.OP_AND, terms)` |
| 3295 | 129 | |
| 3295 | 130 | `        enquire.set_query(query)` |
| 4841 | 131 | `        matches = enquire.get_mset(0, database.get_doccount())` |
| 3295 | 132 | |
| 5491 | 133 | `        return [tuple(b2s(m.document.get_data()).split(':'))` |
| 6979 | 134 | `                for m in matches]` |
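
The retry loop in `_get_database` (source lines 29 to 41) sleeps for `0.01 * (2 << min(5, n))` seconds after each failed attempt to take the write lock. As a rough sketch of what that schedule looks like, the delays can be tabulated without touching Xapian at all. The helper name `backoff_delays` and its parameters are made up for illustration and are not part of Roundup; only the expression itself comes from the listing.

```python
def backoff_delays(retries=10, base=0.01, cap_shift=5):
    """Return the sleep time in seconds applied after each failed attempt.

    Hypothetical helper: it only reproduces the expression
    ``0.01 * (2 << min(5, n))`` for n in range(10) from _get_database.
    """
    # 2 << k equals 2 ** (k + 1), so the delay doubles on every attempt
    # until the shift is capped at cap_shift, after which it stays constant.
    return [base * (2 << min(cap_shift, n)) for n in range(retries)]


delays = backoff_delays()
total_wait = sum(delays)

for attempt, delay in enumerate(delays):
    print("attempt %d: sleep %.2fs" % (attempt, delay))
print("worst-case total wait: %.2fs" % total_wait)
```

Under these assumptions the delay starts at 0.02 s, doubles up to 0.64 s by the sixth attempt, and then stays there, so a caller waits roughly 3.8 s in total before `DatabaseLockError` is raised.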
