annotate roundup/backends/indexer_sqlite_fts.py @ 6588:91ab3e0ffcd0

Summary: Add test cases for sqlite fts Add support for using the FTS5 full text query engine for sqlite. Also stubbed out some sections for adding postgresql FTS support as well. Added nee indexer type native-fts. It is not selected by default. The indexer=native is used if no indexer is set. This prevents an upgrade from seeming to wipe out the native index if upgraded and indexer=native is not explicitly set. Docs updated. Also changed section headers to sentence case for the current release notes. Indexing backend can control if the full text search phrase is broken into a list of words or passed intact. For backends with query languages (sqlite and can be enabled for whoosh and xapian) we do not want the phrase "tokenized" on whitespace. This also updates the rdbms database version to version 7 to add FTS table. I will be using the same version when I add postgresql. If somebody runs this version on postgresql, they will have to manually add the fts tables for postgresql if they want to use it. Added a new renderError method to client. This allows errors to be reported still using page.html rather than raw html. It also supports templates for any error code. If no template for the error code (e.g. 400) is found, the error in raw html with no page frame is shown. New IndexerQueryError exception to pass back message about query syntax errors.
author John Rouillard <rouilj@ieee.org>
date Sun, 23 Jan 2022 18:57:45 -0500
parents
children 9ff091537f43
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
6588
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
1 """ This implements the full-text indexer using fts5 in sqlite.
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
2 The table consists of (Class, propname, itemid) instances as columns
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
3 along with a textblob column. The textblob column is searched using
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
4 MATCH and the instances returned.
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
5
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
6 sqlite test commands to manage schema version change required by
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
7 this update.
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
8
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
9 -- check length before and after
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
10 select length(schema) from schema;
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
11
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
12 -- reset from version 7 (with fts index) to version 6
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
13 update schema set schema = (select replace(schema,
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
14 '''version'': 7','''version'': 6') as new_schema from schema);
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
15
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
16 -- check version. Good thing it's at the front of the schema
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
17 select substr(schema,0,15) from schema;
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
18 {'version': 6,
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
19 """
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
20
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
21 from roundup.backends.indexer_common import Indexer as IndexerBase
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
22 from roundup.i18n import _
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
23 from roundup.cgi.exceptions import IndexerQueryError
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
24
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
25 try:
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
26 import sqlite3 as sqlite
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
27 if sqlite.sqlite_version_info < (3, 9, 0):
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
28 raise ValueError('sqlite minimum version for FTS5 is 3.9.0+ '
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
29 '- %s found' % sqlite.sqlite_version)
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
30 except ImportError:
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
31 raise ValueError('Unable to import sqlite3 to support FTS.')
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
32
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
33
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
34 class Indexer(IndexerBase):
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
35 def __init__(self, db):
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
36 IndexerBase.__init__(self, db)
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
37 self.db = db
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
38 self.reindex = 0
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
39 self.query_language = True
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
40
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
41 def close(self):
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
42 """close the indexing database"""
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
43 # just nuke the circular reference
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
44 self.db = None
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
45
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
46 def save_index(self):
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
47 """Save the changes to the index."""
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
48 # not necessary - the RDBMS connection will handle this for us
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
49 pass
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
50
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
51 def force_reindex(self):
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
52 """Force a reindexing of the database. This essentially
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
53 empties the __fts table and sets a flag so
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
54 that the databases are reindexed"""
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
55 self.reindex = 1
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
56
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
57 def should_reindex(self):
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
58 """returns True if the indexes need to be rebuilt"""
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
59 return self.reindex
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
60
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
61 def add_text(self, identifier, text, mime_type='text/plain'):
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
62 """ "identifier" is (classname, itemid, property) """
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
63 if mime_type != 'text/plain':
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
64 return
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
65
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
66 # Ensure all elements of the identifier are strings 'cos the itemid
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
67 # column is varchar even if item ids may be numbers elsewhere in the
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
68 # code. ugh.
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
69 identifier = tuple(map(str, identifier))
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
70
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
71 # removed pre-processing of text that incudes only words with:
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
72 # self.minlength <= len(word) <= self.maxlength
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
73 # Not sure if that is correct.
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
74
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
75 # first, find the rowid of the (classname, itemid, property)
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
76 a = self.db.arg # arg is the token for positional parameters
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
77 sql = 'select rowid from __fts where _class=%s and '\
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
78 '_itemid=%s and _prop=%s' % (a, a, a)
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
79 self.db.cursor.execute(sql, identifier)
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
80 r = self.db.cursor.fetchone()
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
81 if not r:
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
82 # not previously indexed
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
83 sql = 'insert into __fts (_class, _itemid, _prop, _textblob)'\
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
84 ' values (%s, %s, %s, %s)' % (a, a, a, a)
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
85 self.db.cursor.execute(sql, identifier + (text,))
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
86 else:
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
87 id = int(r[0])
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
88 sql = 'update __fts set _textblob=%s where rowid=%s' % \
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
89 (a, a)
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
90 self.db.cursor.execute(sql, (text, id))
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
91
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
92 def find(self, wordlist):
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
93 """look up all the words in the wordlist.
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
94 For testing wordlist is actually a list.
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
95 In production, wordlist is a list of a single string
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
96 that is a sqlite MATCH query.
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
97
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
98 https://www.sqlite.org/fts5.html#full_text_query_syntax
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
99 """
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
100 if not wordlist:
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
101 return []
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
102
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
103 a = self.db.arg # arg is the token for positional parameters
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
104
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
105 # removed filtering of word in wordlist to include only
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
106 # words with: self.minlength <= len(word) <= self.maxlength
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
107
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
108 sql = 'select _class, _itemid, _prop from __fts '\
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
109 'where _textblob MATCH %s' % a
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
110
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
111 try:
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
112 # tests supply a multi element word list. Join them.
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
113 self.db.cursor.execute(sql, (" ".join(wordlist),))
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
114 except sqlite.OperationalError as e:
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
115 if 'no such column' in e.args[0]:
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
116 raise IndexerQueryError(
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
117 _("Search failed. Try quoting any terms that "
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
118 "include a '-' and retry the search."))
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
119 else:
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
120 raise IndexerQueryError(e.args[0].replace("fts5:",
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
121 "Query error:"))
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
122
91ab3e0ffcd0 Summary: Add test cases for sqlite fts
John Rouillard <rouilj@ieee.org>
parents:
diff changeset
123 return self.db.cursor.fetchall()

Roundup Issue Tracker: http://roundup-tracker.org/