comparison test/test_indexer.py @ 6915:9ff091537f43
postgresql native-fts; more indexer tests
1) Make postgresql native-fts actually work.
2) Add simple stopword filtering to sqlite native-fts indexer.
3) Add more tests for indexer_common get_indexer
Details:
1) roundup/backends/indexer_postgresql_fts.py:
ignore ValueError raised if we try to index a string with a null
character in it. This could happen when a file that has nulls in it
is given an incorrect text/* mime type.
Replace ValueError raised by postgresql with customized
IndexerQueryError if a search string has a null in it.
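
The two null-byte rules above amount to "swallow on write, remap on read". A minimal sketch, assuming a hypothetical `FakeCursor` that rejects NUL bytes with `ValueError` as the commit describes the postgresql driver doing; `SketchFtsIndexer` and the local `IndexerQueryError` are illustrative stand-ins, not roundup's classes, and the SQL strings are placeholders:

```python
class IndexerQueryError(Exception):
    """Local stand-in for roundup's IndexerQueryError."""


class FakeCursor:
    """Hypothetical cursor that rejects NUL bytes by raising ValueError."""

    def execute(self, sql, params=()):
        for value in params:
            if isinstance(value, str) and "\x00" in value:
                raise ValueError("string literal cannot contain NUL (0x00) characters")


class SketchFtsIndexer:
    """Not indexer_postgresql_fts; only the error handling is modeled."""

    def __init__(self, cursor):
        self.cursor = cursor
        self.indexed = []

    def add_text(self, identifier, text):
        try:
            # Placeholder SQL; a real backend would update its FTS column.
            self.cursor.execute("INSERT ...", (text,))
        except ValueError:
            # A NUL usually means binary data with a text mime type: skip it.
            return
        self.indexed.append(identifier)

    def find(self, words):
        try:
            self.cursor.execute("SELECT ...", tuple(words))
        except ValueError as err:
            # Surface an indexer-level error instead of a driver error.
            raise IndexerQueryError(
                "Query error: search contains a null byte (%s)" % err)
        return []  # result lookup is out of scope for this sketch
```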
roundup/backends/rdbms_common.py:
Make postgresql native-fts work. When specified, it was using
whatever get_indexer() returned. However, loading the
native-fts indexer backend failed because there was no connection to
the postgresql database when this call was made.
Simple solution, move the call after the open_connection call in
Database::__init__().
However, the open_connection call creates the database schema if it
does not exist. The schema builds tables for indexer=native
indexing, and as part of the build it asks the indexer for the
min/max size of the indexed tokens. With no indexer defined, we get
a crash.
So it's a chicken/egg issue. I solved it by setting the indexer
to the Indexer from indexer_common, which has the min/max token size
info. I also added a no-op save_index() to this Indexer class. I
claim save_index() isn't needed, as a commit() on the db does all
the saving required. Then, after open_connection is called, I call
get_indexer to retrieve the correct indexer, and
indexer_postgresql_fts works since the conn connection property is
defined.
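
The bootstrap order described above can be sketched as a small model. This is not roundup's rdbms_common code: `CommonIndexer`, `FtsIndexer`, `Database`, `create_schema`, the `minlength`/`maxlength` attribute names and their values are all hypothetical stand-ins chosen for illustration.

```python
class CommonIndexer:
    """Stand-in for the indexer_common Indexer used as a placeholder."""

    # Illustrative token-size limits; attribute names are hypothetical.
    minlength = 2
    maxlength = 25

    def __init__(self, db):
        self.db = db

    def save_index(self):
        # No-op: for rdbms backends, db.commit() persists everything.
        pass


class FtsIndexer(CommonIndexer):
    """Stand-in for a native-fts indexer that needs a live connection."""

    def __init__(self, db):
        if getattr(db, "conn", None) is None:
            raise RuntimeError("native-fts indexer needs db.conn")
        super().__init__(db)


def get_indexer(db):
    # Hypothetical: always picks native-fts to keep the sketch short.
    return FtsIndexer(db)


class Database:
    def __init__(self):
        self.conn = None
        # 1. Placeholder indexer so schema creation can read token limits.
        self.indexer = CommonIndexer(self)
        # 2. Connect; this may build the schema, which consults self.indexer.
        self.open_connection()
        # 3. conn now exists, so the real indexer can be constructed.
        self.indexer = get_indexer(self)

    def open_connection(self):
        self.conn = object()  # stand-in for a DB-API connection
        self.create_schema()

    def create_schema(self):
        # Record the limits a real backend would bake into its index tables.
        self.token_limits = (self.indexer.minlength, self.indexer.maxlength)
        self.indexer.save_index()
```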
roundup/backends/indexer_common.py:
add save_index() method for indexer. It does nothing but is needed
in rdbms backends during schema initialization.
2) roundup/backends/indexer_sqlite_fts.py:
when this indexer is used, the indexer test in DBTest on the word
"the" fails. This is due to missing stopword filtering. Implement
basic stopword filtering for bare stopwords (like 'the') to make the
test pass. Note: this indexer is not currently run automatically by
the CI suite; the failure was found during manual testing. However,
there is a FIXME to extract the indexer tests from DBTest and run
them using this backend.
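
SQLite's default FTS tokenizers keep every word, so a query for "the" can match documents that indexers which drop stopwords at index time never stored. A minimal sketch of filtering only bare stopwords out of a query, assuming a hypothetical `drop_bare_stopwords` helper and an illustrative stopword subset rather than roundup's configured list:

```python
# Illustrative subset only; not roundup's configured stopword list.
STOPWORDS = frozenset({"a", "an", "and", "of", "the", "to"})


def drop_bare_stopwords(terms, stopwords=STOPWORDS):
    """Remove query terms that consist solely of a stopword.

    Terms using FTS syntax (a quoted phrase, a prefix like the*, etc.)
    do not equal a stopword, so they pass through untouched. This keeps
    the filter limited to the "bare stopword" case.
    """
    return [t for t in terms if t.lower() not in stopwords]
```

With this scope, `drop_bare_stopwords(["the"])` yields an empty query (and therefore no hits), while `"the world"` or `the*` are left for the FTS engine to interpret.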
roundup/configuration.py, roundup/doc/admin_guide.txt:
update doc on stopword use for sqlite native-fts.
test/db_test_base.py:
DBTest::testStringBinary creates a file with nulls in it. It was
breaking postgresql with the native-fts indexer. Changed the test to
assign mime type application/octet-stream, which prevents the file
from being processed by any text search indexer.
add test to exclude indexer searching in specific props. This code
path was untested before.
test/test_indexer.py:
add test to call find with no words. Untested code path.
add test to index and find a string with a null \x00 byte. It was
tested inadvertently by testStringBinary, but this makes it explicit
and moves it to indexer testing (one version each for: generic,
postgresql and mysql).
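
One reason the "find with no words" path deserves a test: a naive AND search intersects one posting set per word, and Python's `set.intersection` needs at least one set. A toy in-memory sketch (the `ToyIndexer` class is hypothetical, not roundup's indexer_dbm) that guards the empty case and reproduces the expectations from the test diff:

```python
from collections import defaultdict


class ToyIndexer:
    """Hypothetical word -> set-of-identifiers index, for illustration."""

    def __init__(self):
        self.words = defaultdict(set)

    def add_text(self, identifier, text):
        for word in text.upper().split():
            self.words[word].add(identifier)

    def find(self, wordlist):
        if not wordlist:
            # set.intersection(*[]) raises TypeError, so answer "no hits".
            return []
        # .get() avoids inserting empty entries into the defaultdict.
        hits = [self.words.get(w.upper(), set()) for w in wordlist]
        return sorted(set.intersection(*hits))
```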
Renamed Get_IndexerAutoSelectTest to Get_IndexerTest and renamed the
autoselect tests to include "autoselect" in their names. Added tests for an invalid
indexer and using native-fts with anydbm (unsupported combo) to make
sure the code does something useful if the validation in
configuration.py is broken.
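
The invalid-indexer test relies on two layers: the configuration rejects unknown names on assignment, and get_indexer() raises AssertionError for anything that still slips through. A minimal sketch of that layering; `ALLOWED`, `set_indexer`, `get_indexer` and the returned strings here are hypothetical simplifications, not roundup's `IndexerOption` or `indexer_common.get_indexer`:

```python
# Hypothetical allowed list; roundup keeps its own on IndexerOption.allowed.
ALLOWED = ["", "native", "native-fts", "whoosh", "xapian"]
FTS_BACKENDS = ("sqlite", "postgresql")


def set_indexer(config, value):
    """First line of defence: reject unknown names on assignment."""
    if value not in ALLOWED:
        raise ValueError("invalid indexer: %r" % value)
    config["INDEXER"] = value


def get_indexer(config, backend):
    """Second line of defence: fail loudly on anything validation missed."""
    name = config["INDEXER"]
    if name == "native-fts":
        if backend not in FTS_BACKENDS:
            raise AssertionError("native-fts is not supported by %s" % backend)
        return "%s-fts" % backend
    if name in ("", "native"):
        return "dbm"
    if name in ("whoosh", "xapian"):
        return name
    # Explicit raise, not an assert statement, so python -O keeps the check.
    raise AssertionError("unhandled indexer %r" % name)
```

As in the test diff, the sketch's test temporarily appends a name to the allowed list to reach the final `raise`, then pops it to restore state.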
test/test_liveserver.py:
add test to load an issue
add test using text search (fts) to find the issue
add tests to find issue using postgresql native-fts
test/test_postgresql.py, test/test_sqlite.py:
added explanation on how to set up integration tests using native-fts.
added code to clean up test environment if native-fts test is run.
| author | John Rouillard <rouilj@ieee.org> |
|---|---|
| date | Mon, 05 Sep 2022 16:25:20 -0400 |
| parents | a23eaa3013e6 |
| children | c6b2534a58a9 |
| 6914:6010c20dc104 | 6915:9ff091537f43 |
|---|---|
| 95 self.dex.add_text(('test', '2', 'foo'), 'blah blah the world') | 95 self.dex.add_text(('test', '2', 'foo'), 'blah blah the world') |
| 96 self.assertSeqEqual(self.dex.find(['world']), [('test', '1', 'foo'), | 96 self.assertSeqEqual(self.dex.find(['world']), [('test', '1', 'foo'), |
| 97 ('test', '2', 'foo')]) | 97 ('test', '2', 'foo')]) |
| 98 self.assertSeqEqual(self.dex.find(['blah']), [('test', '2', 'foo')]) | 98 self.assertSeqEqual(self.dex.find(['blah']), [('test', '2', 'foo')]) |
| 99 self.assertSeqEqual(self.dex.find(['blah', 'hello']), []) | 99 self.assertSeqEqual(self.dex.find(['blah', 'hello']), []) |
| | 100 self.assertSeqEqual(self.dex.find([]), []) |
| 100 | 101 |
| 101 def test_change(self): | 102 def test_change(self): |
| 102 self.dex.add_text(('test', '1', 'foo'), 'a the hello world') | 103 self.dex.add_text(('test', '1', 'foo'), 'a the hello world') |
| 103 self.dex.add_text(('test', '2', 'foo'), 'blah blah the world') | 104 self.dex.add_text(('test', '2', 'foo'), 'blah blah the world') |
| 104 self.assertSeqEqual(self.dex.find(['world']), [('test', '1', 'foo'), | 105 self.assertSeqEqual(self.dex.find(['world']), [('test', '1', 'foo'), |
| 205 | 206 |
| 206 self.assertSeqEqual(self.dex.find([ u'Spr\xfcnge']), | 207 self.assertSeqEqual(self.dex.find([ u'Spr\xfcnge']), |
| 207 [('test', '1', 'a'), ('test', '2', 'a')]) | 208 [('test', '1', 'a'), ('test', '2', 'a')]) |
| 208 self.assertSeqEqual(self.dex.find([u'\u0440\u0443\u0441\u0441\u043a\u0438\u0439']), | 209 self.assertSeqEqual(self.dex.find([u'\u0440\u0443\u0441\u0441\u043a\u0438\u0439']), |
| 209 [('test', '2', 'a')]) | 210 [('test', '2', 'a')]) |
| | 211 |
| | 212 def testNullChar(self): |
| | 213 """Test with null char in string. Postgres FTS will not index |
| | 214 it will just ignore string for now. |
| | 215 """ |
| | 216 string="\x00\x01fred\x255" |
| | 217 self.dex.add_text(('test', '1', 'a'), string) |
| | 218 self.assertSeqEqual(self.dex.find(string), []) |
| 210 | 219 |
| 211 def tearDown(self): | 220 def tearDown(self): |
| 212 shutil.rmtree('test-index') | 221 shutil.rmtree('test-index') |
| 213 if hasattr(self, 'db'): | 222 if hasattr(self, 'db'): |
| 214 self.db.close() | 223 self.db.close() |
| 245 from roundup.backends.indexer_xapian import Indexer | 254 from roundup.backends.indexer_xapian import Indexer |
| 246 self.dex = Indexer(db) | 255 self.dex = Indexer(db) |
| 247 def tearDown(self): | 256 def tearDown(self): |
| 248 IndexerTest.tearDown(self) | 257 IndexerTest.tearDown(self) |
| 249 | 258 |
| 250 class Get_IndexerAutoSelectTest(anydbmOpener, unittest.TestCase): | 259 class Get_IndexerTest(anydbmOpener, unittest.TestCase): |
| 251 | 260 |
| 252 def setUp(self): | 261 def setUp(self): |
| 253 # remove previous test, ignore errors | 262 # remove previous test, ignore errors |
| 254 if os.path.exists(config.DATABASE): | 263 if os.path.exists(config.DATABASE): |
| 255 shutil.rmtree(config.DATABASE) | 264 shutil.rmtree(config.DATABASE) |
| 263 self.db.close() | 272 self.db.close() |
| 264 if os.path.exists(config.DATABASE): | 273 if os.path.exists(config.DATABASE): |
| 265 shutil.rmtree(config.DATABASE) | 274 shutil.rmtree(config.DATABASE) |
| 266 | 275 |
| 267 @skip_xapian | 276 @skip_xapian |
| 268 def test_xapian_select(self): | 277 def test_xapian_autoselect(self): |
| 269 indexer = get_indexer(self.db.config, self.db) | 278 indexer = get_indexer(self.db.config, self.db) |
| 270 self.assertIn('roundup.backends.indexer_xapian.Indexer', str(indexer)) | 279 self.assertIn('roundup.backends.indexer_xapian.Indexer', str(indexer)) |
| 271 | 280 |
| 272 @skip_whoosh | 281 @skip_whoosh |
| 273 def test_whoosh_select(self): | 282 def test_whoosh_autoselect(self): |
| 274 import mock, sys | 283 import mock, sys |
| 275 with mock.patch.dict('sys.modules', | 284 with mock.patch.dict('sys.modules', |
| 276 {'roundup.backends.indexer_xapian': None}): | 285 {'roundup.backends.indexer_xapian': None}): |
| 277 indexer = get_indexer(self.db.config, self.db) | 286 indexer = get_indexer(self.db.config, self.db) |
| 278 self.assertIn('roundup.backends.indexer_whoosh.Indexer', str(indexer)) | 287 self.assertIn('roundup.backends.indexer_whoosh.Indexer', str(indexer)) |
| 279 | 288 |
| 280 def test_native_select(self): | 289 def test_native_autoselect(self): |
| 281 import mock, sys | 290 import mock, sys |
| 282 with mock.patch.dict('sys.modules', | 291 with mock.patch.dict('sys.modules', |
| 283 {'roundup.backends.indexer_xapian': None, | 292 {'roundup.backends.indexer_xapian': None, |
| 284 'roundup.backends.indexer_whoosh': None}): | 293 'roundup.backends.indexer_whoosh': None}): |
| 285 indexer = get_indexer(self.db.config, self.db) | 294 indexer = get_indexer(self.db.config, self.db) |
| 286 self.assertIn('roundup.backends.indexer_dbm.Indexer', str(indexer)) | 295 self.assertIn('roundup.backends.indexer_dbm.Indexer', str(indexer)) |
| | 296 |
| | 297 def test_invalid_indexer(self): |
| | 298 """There is code at the end of indexer_common::get_indexer() to |
| | 299 raise an AssertionError if the indexer name is invalid. |
| | 300 This should never be triggered. If it is, it means that |
| | 301 the code in configure.py that validates indexer names |
| | 302 allows a name through that get_indexer can't handle. |
| | 303 |
| | 304 Simulate that failure and make sure that the |
| | 305 AssertionError is raised. |
| | 306 |
| | 307 """ |
| | 308 |
| | 309 with self.assertRaises(ValueError) as cm: |
| | 310 self.db.config['INDEXER'] = 'no_such_indexer' |
| | 311 |
| | 312 # mangle things so we can test AssertionError at end |
| | 313 # get_indexer() |
| | 314 from roundup.configuration import IndexerOption |
| | 315 IndexerOption.allowed.append("unrecognized_indexer") |
| | 316 self.db.config['INDEXER'] = "unrecognized_indexer" |
| | 317 |
| | 318 with self.assertRaises(AssertionError) as cm: |
| | 319 indexer = get_indexer(self.db.config, self.db) |
| | 320 |
| | 321 # unmangle state |
| | 322 IndexerOption.allowed.pop() |
| | 323 self.assertNotIn("unrecognized_indexer", IndexerOption.allowed) |
| | 324 self.db.config['INDEXER'] = "" |
| | 325 |
| | 326 def test_unsupported_by_db(self): |
| | 327 """This requires that the db associated with the test |
| | 328 is not sqlite or postgres. anydbm works fine to trigger |
| | 329 the error. |
| | 330 """ |
| | 331 self.db.config['INDEXER'] = 'native-fts' |
| | 332 with self.assertRaises(AssertionError) as cm: |
| | 333 get_indexer(self.db.config, self.db) |
| | 334 |
| | 335 self.assertIn("native-fts", cm.exception.args[0]) |
| | 336 self.db.config['INDEXER'] = '' |
| 287 | 337 |
| 288 class RDBMSIndexerTest(object): | 338 class RDBMSIndexerTest(object): |
| 289 def setUp(self): | 339 def setUp(self): |
| 290 # remove previous test, ignore errors | 340 # remove previous test, ignore errors |
| 291 if os.path.exists(config.DATABASE): | 341 if os.path.exists(config.DATABASE): |
| 518 self.assertIn('search configuration "foo" does', ctx.exception.args[0]) | 568 self.assertIn('search configuration "foo" does', ctx.exception.args[0]) |
| 519 self.db.rollback() | 569 self.db.rollback() |
| 520 | 570 |
| 521 self.db.config["INDEXER_LANGUAGE"] = "english" | 571 self.db.config["INDEXER_LANGUAGE"] = "english" |
| 522 | 572 |
| | 573 def testNullChar(self): |
| | 574 """Test with null char in string. Postgres FTS throws a ValueError |
| | 575 on indexing which we ignore. This could happen when |
| | 576 indexing a binary file with a bad mime type. On find, it |
| | 577 throws a ProgrammingError that we remap to |
| | 578 IndexerQueryError and pass up. If a null gets to that |
| | 579 level on search somebody entered it (not sure how you |
| | 580 could actually do that) but we want a crash in that case |
| | 581 as the person is probably up to "no good" (R) (TM). |
| | 582 |
| | 583 """ |
| | 584 import psycopg2 |
| | 585 |
| | 586 string="\x00\x01fred\x255" |
| | 587 self.dex.add_text(('test', '1', 'a'), string) |
| | 588 with self.assertRaises(IndexerQueryError) as ctx: |
| | 589 self.assertSeqEqual(self.dex.find(string), []) |
| | 590 |
| | 591 self.assertIn("null", ctx.exception.args[0]) |
| | 592 |
| 523 @skip_mysql | 593 @skip_mysql |
| 524 class mysqlIndexerTest(mysqlOpener, RDBMSIndexerTest, IndexerTest): | 594 class mysqlIndexerTest(mysqlOpener, RDBMSIndexerTest, IndexerTest): |
| 525 def setUp(self): | 595 def setUp(self): |
| 526 mysqlOpener.setUp(self) | 596 mysqlOpener.setUp(self) |
| 527 RDBMSIndexerTest.setUp(self) | 597 RDBMSIndexerTest.setUp(self) |
| 659 self.dex.find(['hello world + ^the']) | 729 self.dex.find(['hello world + ^the']) |
| 660 | 730 |
| 661 error = 'Query error: syntax error near "^"' | 731 error = 'Query error: syntax error near "^"' |
| 662 self.assertEqual(str(ctx.exception), error) | 732 self.assertEqual(str(ctx.exception), error) |
| 663 | 733 |
| | 734 def testNullChar(self): |
| | 735 """Test with null char in string. FTS will throw |
| | 736 an error on null. |
| | 737 """ |
| | 738 import psycopg2 |
| | 739 |
| | 740 string="\x00\x01fred\x255" |
| | 741 self.dex.add_text(('test', '1', 'a'), string) |
| | 742 with self.assertRaises(IndexerQueryError) as cm: |
| | 743 self.assertSeqEqual(self.dex.find(string), []) |
| | 744 |
| 664 # vim: set filetype=python ts=4 sw=4 et si | 745 # vim: set filetype=python ts=4 sw=4 et si |
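
The `*_autoselect` tests simulate a missing optional package by patching `sys.modules` so the module's entry is `None`, which makes any import of it fail with `ImportError`. A self-contained sketch of the same trick, using the stdlib `unittest.mock` in place of the external `mock` package the tests import, and the stdlib `json`/`csv` modules as stand-ins for the xapian and whoosh indexer modules; `pick_first_importable` is illustrative, not roundup's `get_indexer`:

```python
import importlib
from unittest import mock


def pick_first_importable(candidates):
    """Return the first module name in candidates that imports cleanly."""
    for name in candidates:
        try:
            importlib.import_module(name)
        except ImportError:
            # A None entry in sys.modules raises ModuleNotFoundError,
            # which is a subclass of ImportError.
            continue
        return name
    return None


# Normal run: the first candidate wins.
print(pick_first_importable(["json", "csv"]))  # json

# Hide json; patch.dict restores sys.modules when the block exits.
with mock.patch.dict("sys.modules", {"json": None}):
    print(pick_first_importable(["json", "csv"]))  # csv
```

Because `patch.dict` restores the original mapping on exit, the "missing" module becomes importable again for later tests, which is why the test diff can hide xapian in one test and still exercise it in another.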
