comparison test/test_indexer.py @ 6915:9ff091537f43

postgresql native-fts; more indexer tests

1) Make postgresql native-fts actually work.
2) Add simple stopword filtering to sqlite native-fts indexer.
3) Add more tests for indexer_common get_indexer

Details:

1) roundup/backends/indexer_postgresql_fts.py:

   Ignore the ValueError raised if we try to index a string with a null character in it. This could happen due to an incorrect text/ mime type on a file that has nulls in it.

   Replace the ValueError raised by postgresql with a customized IndexerQueryError if a search string has a null in it.

   roundup/backends/rdbms_common.py:

   Make postgresql native-fts work. When specified, it was using whatever was returned from get_indexer(). However, loading the native-fts indexer backend failed because there was no connection to the postgresql database when this call was made.

   Simple solution: move the call after the open_connection call in Database::__init__().

   However, the open_connection call creates the schema for the database if it is not there. The schema builds tables for indexer=native type indexing. As part of the build it looks at the indexer to see the min/max size of the indexed tokens. With no indexer defined, we get a crash.

   So it's a chicken/egg issue. I solved it by setting the indexer to the Indexer from indexer_common, which has the min/max token size info. I also added a no-op save_index() to this Indexer class. I claim save_index() isn't needed as a commit() on the db does all the saving required. Then, after open_connection is called, I call get_indexer to retrieve the correct indexer, and indexer_postgresql_fts works since the conn connection property is defined.

   roundup/backends/indexer_common.py:

   Add save_index() method for indexer. It does nothing but is needed in rdbms backends during schema initialization.

2) roundup/backends/indexer_sqlite_fts.py:

   When this indexer is used, the indexer test in DBTest on the word "the" fails. This is due to missing stopword filtering. Implement basic stopword filtering for bare stopwords (like 'the') to make the test pass.

   Note: this indexer is not currently run automatically by the CI suite; the failure was found during manual testing. However, there is a FIXME to extract the indexer tests from DBTest and run them using this backend.

   roundup/configuration.py, roundup/doc/admin_guide.txt:

   Update doc on stopword use for sqlite native-fts.

   test/db_test_base.py:

   DBTest::testStringBinary creates a file with nulls in it. It was breaking postgresql with the native-fts indexer. Changed the test to assign mime type application/octet-stream, which prevents the file from being processed by any text search indexer.

   Add test to exclude indexer searching in specific props. This code path was untested before.

   test/test_indexer.py:

   Add test to call find with no words. Untested code path.

   Add test to index and find a string with a null \x00 byte. It was tested inadvertently by testStringBinary, but this makes it explicit and moves it to indexer testing (one version each for: generic, postgresql and mysql).

   Renamed Get_IndexerAutoSelectTest to Get_IndexerTest and renamed the autoselect tests to include autoselect. Added tests for an invalid indexer and for using native-fts with anydbm (unsupported combo) to make sure the code does something useful if the validation in configuration.py is broken.

   test/test_liveserver.py:

   Add test to load an issue.
   Add test using text search (fts) to find the issue.
   Add tests to find the issue using postgresql native-fts.

   test/test_postgresql.py, test/test_sqlite.py:

   Added explanation on how to set up the integration test using native-fts.
   Added code to clean up the test environment if the native-fts test is run.
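
The chicken/egg ordering described for rdbms_common.py can be illustrated with a minimal, self-contained sketch. This is not the Roundup code: BootstrapIndexer, FtsIndexer, the token sizes and the schema_token_limits attribute are all hypothetical stand-ins. It only shows the two-step pattern of installing a placeholder indexer before the connection is opened and swapping in the real one afterwards.

```python
class BootstrapIndexer:
    """Hypothetical stand-in for the generic Indexer: it only knows token sizes."""
    minlength = 2   # assumed value, for illustration only
    maxlength = 25  # assumed value, for illustration only

    def save_index(self):
        # No-op: a commit() on the database is assumed to do all the saving.
        pass


class FtsIndexer(BootstrapIndexer):
    """Hypothetical native-fts indexer that cannot load without a connection."""

    def __init__(self, db):
        if db.conn is None:
            raise RuntimeError("no database connection yet")
        self.db = db


class Database:
    def __init__(self):
        self.conn = None
        # Step 1: install a placeholder so schema creation can read the
        # min/max token sizes before any connection exists.
        self.indexer = BootstrapIndexer()
        self.open_connection()
        # Step 2: the connection now exists, so the real indexer can load.
        self.indexer = FtsIndexer(self)

    def open_connection(self):
        self.conn = object()  # stand-in for a real database connection
        # Schema creation consults the indexer for its token limits.
        self.schema_token_limits = (self.indexer.minlength,
                                    self.indexer.maxlength)
```

Creating FtsIndexer before open_connection() in this sketch raises RuntimeError, which mirrors the load failure the commit message describes.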
author John Rouillard <rouilj@ieee.org>
date Mon, 05 Sep 2022 16:25:20 -0400
parents a23eaa3013e6
children c6b2534a58a9
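
The bare stopword filtering mentioned in item 2 of the commit message could look roughly like the sketch below. The stopword list, the function name and the rule that a query made up only of stopwords matches nothing are assumptions for illustration; the actual change lives in roundup/backends/indexer_sqlite_fts.py and may differ.

```python
# Hypothetical stopword list; the real list comes from the tracker configuration.
STOPWORDS = frozenset({"a", "and", "of", "the"})


def filter_bare_stopwords(wordlist, stopwords=STOPWORDS):
    """Drop search terms that are exactly a stopword (case-insensitive).

    Terms that merely contain a stopword, such as a quoted phrase, are
    passed through untouched.
    """
    return [word for word in wordlist if word.lower() not in stopwords]


def searchable_terms(wordlist):
    """Return the terms worth sending to the full-text engine.

    An empty result means the caller can return no matches without
    querying the database at all (an assumption of this sketch).
    """
    return filter_bare_stopwords(wordlist)
```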
6914:6010c20dc104 6915:9ff091537f43
95 self.dex.add_text(('test', '2', 'foo'), 'blah blah the world') 95 self.dex.add_text(('test', '2', 'foo'), 'blah blah the world')
96 self.assertSeqEqual(self.dex.find(['world']), [('test', '1', 'foo'), 96 self.assertSeqEqual(self.dex.find(['world']), [('test', '1', 'foo'),
97 ('test', '2', 'foo')]) 97 ('test', '2', 'foo')])
98 self.assertSeqEqual(self.dex.find(['blah']), [('test', '2', 'foo')]) 98 self.assertSeqEqual(self.dex.find(['blah']), [('test', '2', 'foo')])
99 self.assertSeqEqual(self.dex.find(['blah', 'hello']), []) 99 self.assertSeqEqual(self.dex.find(['blah', 'hello']), [])
100 self.assertSeqEqual(self.dex.find([]), [])
100 101
101 def test_change(self): 102 def test_change(self):
102 self.dex.add_text(('test', '1', 'foo'), 'a the hello world') 103 self.dex.add_text(('test', '1', 'foo'), 'a the hello world')
103 self.dex.add_text(('test', '2', 'foo'), 'blah blah the world') 104 self.dex.add_text(('test', '2', 'foo'), 'blah blah the world')
104 self.assertSeqEqual(self.dex.find(['world']), [('test', '1', 'foo'), 105 self.assertSeqEqual(self.dex.find(['world']), [('test', '1', 'foo'),
205 206
206 self.assertSeqEqual(self.dex.find([ u'Spr\xfcnge']), 207 self.assertSeqEqual(self.dex.find([ u'Spr\xfcnge']),
207 [('test', '1', 'a'), ('test', '2', 'a')]) 208 [('test', '1', 'a'), ('test', '2', 'a')])
208 self.assertSeqEqual(self.dex.find([u'\u0440\u0443\u0441\u0441\u043a\u0438\u0439']), 209 self.assertSeqEqual(self.dex.find([u'\u0440\u0443\u0441\u0441\u043a\u0438\u0439']),
209 [('test', '2', 'a')]) 210 [('test', '2', 'a')])
211
212 def testNullChar(self):
213 """Test with null char in string. Postgres FTS will not index
214 it; it will just ignore the string for now.
215 """
216 string="\x00\x01fred\x255"
217 self.dex.add_text(('test', '1', 'a'), string)
218 self.assertSeqEqual(self.dex.find(string), [])
210 219
211 def tearDown(self): 220 def tearDown(self):
212 shutil.rmtree('test-index') 221 shutil.rmtree('test-index')
213 if hasattr(self, 'db'): 222 if hasattr(self, 'db'):
214 self.db.close() 223 self.db.close()
245 from roundup.backends.indexer_xapian import Indexer 254 from roundup.backends.indexer_xapian import Indexer
246 self.dex = Indexer(db) 255 self.dex = Indexer(db)
247 def tearDown(self): 256 def tearDown(self):
248 IndexerTest.tearDown(self) 257 IndexerTest.tearDown(self)
249 258
250 class Get_IndexerAutoSelectTest(anydbmOpener, unittest.TestCase): 259 class Get_IndexerTest(anydbmOpener, unittest.TestCase):
251 260
252 def setUp(self): 261 def setUp(self):
253 # remove previous test, ignore errors 262 # remove previous test, ignore errors
254 if os.path.exists(config.DATABASE): 263 if os.path.exists(config.DATABASE):
255 shutil.rmtree(config.DATABASE) 264 shutil.rmtree(config.DATABASE)
263 self.db.close() 272 self.db.close()
264 if os.path.exists(config.DATABASE): 273 if os.path.exists(config.DATABASE):
265 shutil.rmtree(config.DATABASE) 274 shutil.rmtree(config.DATABASE)
266 275
267 @skip_xapian 276 @skip_xapian
268 def test_xapian_select(self): 277 def test_xapian_autoselect(self):
269 indexer = get_indexer(self.db.config, self.db) 278 indexer = get_indexer(self.db.config, self.db)
270 self.assertIn('roundup.backends.indexer_xapian.Indexer', str(indexer)) 279 self.assertIn('roundup.backends.indexer_xapian.Indexer', str(indexer))
271 280
272 @skip_whoosh 281 @skip_whoosh
273 def test_whoosh_select(self): 282 def test_whoosh_autoselect(self):
274 import mock, sys 283 import mock, sys
275 with mock.patch.dict('sys.modules', 284 with mock.patch.dict('sys.modules',
276 {'roundup.backends.indexer_xapian': None}): 285 {'roundup.backends.indexer_xapian': None}):
277 indexer = get_indexer(self.db.config, self.db) 286 indexer = get_indexer(self.db.config, self.db)
278 self.assertIn('roundup.backends.indexer_whoosh.Indexer', str(indexer)) 287 self.assertIn('roundup.backends.indexer_whoosh.Indexer', str(indexer))
279 288
280 def test_native_select(self): 289 def test_native_autoselect(self):
281 import mock, sys 290 import mock, sys
282 with mock.patch.dict('sys.modules', 291 with mock.patch.dict('sys.modules',
283 {'roundup.backends.indexer_xapian': None, 292 {'roundup.backends.indexer_xapian': None,
284 'roundup.backends.indexer_whoosh': None}): 293 'roundup.backends.indexer_whoosh': None}):
285 indexer = get_indexer(self.db.config, self.db) 294 indexer = get_indexer(self.db.config, self.db)
286 self.assertIn('roundup.backends.indexer_dbm.Indexer', str(indexer)) 295 self.assertIn('roundup.backends.indexer_dbm.Indexer', str(indexer))
296
297 def test_invalid_indexer(self):
298 """There is code at the end of indexer_common::get_indexer() to
299 raise an AssertionError if the indexer name is invalid.
300 This should never be triggered. If it is, it means that
301 the code in configuration.py that validates indexer names
302 allows a name through that get_indexer can't handle.
303
304 Simulate that failure and make sure that the
305 AssertionError is raised.
306
307 """
308
309 with self.assertRaises(ValueError) as cm:
310 self.db.config['INDEXER'] = 'no_such_indexer'
311
312 # mangle things so we can test AssertionError at end
313 # get_indexer()
314 from roundup.configuration import IndexerOption
315 IndexerOption.allowed.append("unrecognized_indexer")
316 self.db.config['INDEXER'] = "unrecognized_indexer"
317
318 with self.assertRaises(AssertionError) as cm:
319 indexer = get_indexer(self.db.config, self.db)
320
321 # unmangle state
322 IndexerOption.allowed.pop()
323 self.assertNotIn("unrecognized_indexer", IndexerOption.allowed)
324 self.db.config['INDEXER'] = ""
325
326 def test_unsupported_by_db(self):
327 """This requires that the db associated with the test
328 is not sqlite or postgres. anydbm works fine to trigger
329 the error.
330 """
331 self.db.config['INDEXER'] = 'native-fts'
332 with self.assertRaises(AssertionError) as cm:
333 get_indexer(self.db.config, self.db)
334
335 self.assertIn("native-fts", cm.exception.args[0])
336 self.db.config['INDEXER'] = ''
287 337
288 class RDBMSIndexerTest(object): 338 class RDBMSIndexerTest(object):
289 def setUp(self): 339 def setUp(self):
290 # remove previous test, ignore errors 340 # remove previous test, ignore errors
291 if os.path.exists(config.DATABASE): 341 if os.path.exists(config.DATABASE):
518 self.assertIn('search configuration "foo" does', ctx.exception.args[0]) 568 self.assertIn('search configuration "foo" does', ctx.exception.args[0])
519 self.db.rollback() 569 self.db.rollback()
520 570
521 self.db.config["INDEXER_LANGUAGE"] = "english" 571 self.db.config["INDEXER_LANGUAGE"] = "english"
522 572
573 def testNullChar(self):
574 """Test with null char in string. Postgres FTS throws a ValueError
575 on indexing which we ignore. This could happen when
576 indexing a binary file with a bad mime type. On find, it
577 throws a ProgrammingError that we remap to
578 IndexerQueryError and pass up. If a null gets to that
579 level on search somebody entered it (not sure how you
580 could actually do that) but we want a crash in that case
581 as the person is probably up to "no good" (R) (TM).
582
583 """
584 import psycopg2
585
586 string="\x00\x01fred\x255"
587 self.dex.add_text(('test', '1', 'a'), string)
588 with self.assertRaises(IndexerQueryError) as ctx:
589 self.assertSeqEqual(self.dex.find(string), [])
590
591 self.assertIn("null", ctx.exception.args[0])
592
523 @skip_mysql 593 @skip_mysql
524 class mysqlIndexerTest(mysqlOpener, RDBMSIndexerTest, IndexerTest): 594 class mysqlIndexerTest(mysqlOpener, RDBMSIndexerTest, IndexerTest):
525 def setUp(self): 595 def setUp(self):
526 mysqlOpener.setUp(self) 596 mysqlOpener.setUp(self)
527 RDBMSIndexerTest.setUp(self) 597 RDBMSIndexerTest.setUp(self)
659 self.dex.find(['hello world + ^the']) 729 self.dex.find(['hello world + ^the'])
660 730
661 error = 'Query error: syntax error near "^"' 731 error = 'Query error: syntax error near "^"'
662 self.assertEqual(str(ctx.exception), error) 732 self.assertEqual(str(ctx.exception), error)
663 733
734 def testNullChar(self):
735 """Test with null char in string. FTS will throw
736 an error on null.
737 """
738 import psycopg2
739
740 string="\x00\x01fred\x255"
741 self.dex.add_text(('test', '1', 'a'), string)
742 with self.assertRaises(IndexerQueryError) as cm:
743 self.assertSeqEqual(self.dex.find(string), [])
744
664 # vim: set filetype=python ts=4 sw=4 et si 745 # vim: set filetype=python ts=4 sw=4 et si
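
The behaviour that the postgresql testNullChar test checks (a null character is silently skipped when indexing but rejected when searching) can be sketched without a database. Everything below is a hypothetical stand-in: SketchIndexer, its _store method and this local IndexerQueryError class are not the Roundup implementations, and the error text is invented for the example.

```python
class IndexerQueryError(Exception):
    """Local stand-in for the error the tests expect from find()."""


class SketchIndexer:
    def __init__(self):
        self.docs = {}

    def _store(self, identifier, text):
        # Stand-in for the database layer, which refuses NUL characters.
        if "\x00" in text:
            raise ValueError("string contains a NUL (0x00) character")
        self.docs[identifier] = text.lower().split()

    def add_text(self, identifier, text):
        try:
            self._store(identifier, text)
        except ValueError:
            # Probably binary content with a wrong text/* mime type:
            # skip it instead of failing the whole indexing run.
            pass

    def find(self, wordlist):
        for word in wordlist:
            if "\x00" in word:
                # Surface a query error rather than a low-level exception.
                raise IndexerQueryError("null characters are not allowed "
                                        "in a search")
        words = [word.lower() for word in wordlist]
        if not words:
            return []  # matches the find([]) == [] test
        return sorted(ident for ident, tokens in self.docs.items()
                      if all(word in tokens for word in words))
```

Unlike the tests above, which pass a bare string to find(), this sketch expects a list of words.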

Roundup Issue Tracker: http://roundup-tracker.org/