Mercurial > p > roundup > code
comparison doc/admin_guide.txt @ 6604:0d99ae7c8de6
Allow Roundup to use PostgreSQL database native full text search
back_postgreql.py - schema version changes for schema version 7.
configuration.py - added indexer_language checks for postgresql. Hardcoded
list for now.
Docs admin_guide and upgrading
Tests.
This also restructures the version upgrade tests for the rdbms
backends. They can run all of them now as the proper cascade is
developed to roll back changes to version 6.
| author | John Rouillard <rouilj@ieee.org> |
|---|---|
| date | Thu, 27 Jan 2022 19:48:48 -0500 |
| parents | 24e2eeb2ed9a |
| children | 328ad56e438a |
comparison
equal
deleted
inserted
replaced
| 6603:57dc15ad648d | 6604:0d99ae7c8de6 |
|---|---|
| 274 brotli. If there is a precompressed gzip file present the client will | 274 brotli. If there is a precompressed gzip file present the client will |
| 275 get the gzip version and not a brotli compressed version. This | 275 get the gzip version and not a brotli compressed version. This |
| 276 mechanism allows the admin to allow use of brotli and zstd for | 276 mechanism allows the admin to allow use of brotli and zstd for |
| 277 dynamic content, but not for static content. | 277 dynamic content, but not for static content. |
| 278 | 278 |
| 279 Configuring native-fts Full Text Search | |
| 280 ======================================= | |
| 281 | |
| 282 Roundup release 2.2.0 can use database native full text search. | |
| 283 SQLite (minimum version 3.9.0) with FTS5 and PostgreSQL (minimum | |
| 284 version 11.0) with websearch_to_tsvector are supported. | |
| 285 | |
| 286 To enable this method, change the ``indexer`` setting in the tracker's | |
| 287 config.ini to ``native-fts``. Then reindex using ``roundup-admin -i | |
| 288 tracker_home reindex``. The amount of time it takes to reindex | |
| 289 depends on the amount of data in your tracker, the speed of your | |
| 290 disks, etc. It can take hours. | |
| 291 | |
| 292 SQLite details | |
| 293 -------------- | |
| 294 | |
| 295 The SQLite native-fts changes the full text search query a little bit. | |
| 296 For the other search methods, the search terms are split on white | |
| 297 space and each item in the index: a field (e.g. title), message | |
| 298 content and file content is searched for all the terms. If any term is | |
| 299 missing that item is ignored. Once the items are found they are mapped | |
| 300 to an issue and the normal issue index is displayed. | |
| 301 | |
| 302 When using FTS5, the search terms can use the full text search query | |
| 303 language described at: | |
| 304 https://www.sqlite.org/fts5.html#full_text_query_syntax. This | |
| 305 supports: | |
| 306 | |
| 307 * plain word search (joined with and similar to other search methods) | |
| 308 * phrase search with terms enclosed in ``"`` quotes | |
| 309 * proximity search with varying distances using ``NEAR()`` | |
| 310 * boolean operations by grouping with parentheses and using ``AND`` | |
| 311 and ``OR`` | |
| 312 * exclusion using ``NOT`` | |
| 313 * prefix searching by prefixing the term with``^`` | |
| 314 | |
| 315 All of the data that is indexed is in a single column, so when column | |
| 316 specifiers are used they usually result in an error which is detected | |
| 317 and an enhanced error message is produced. | |
| 318 | |
| 319 Unlike the native, xapian and whoosh indexers, there are no stopwords, | |
| 320 and there is no limit to the length of terms that are indexed. Keeping | |
| 321 these would break proximity and phrase searching. This may be helpful | |
| 322 or problematic for your particular tracker. | |
| 323 | |
| 324 To support the most languages available, the unicode61 tokenizer is | |
| 325 used without porter stemming. Using the ``indexer_language`` setting | |
| 326 to enable stemming for ``english`` is not available in this | |
| 327 implementation. Also ranking information is not used in this | |
| 328 implementation. These are areas for improvement. | |
| 329 | |
| 330 PostgreSQL info | |
| 331 --------------- | |
| 332 | |
| 333 The PostgreSQL native-fts changes the full text search query a little | |
| 334 bit. When using PostgreSQL full text search, two different query | |
| 335 languages are supported. | |
| 336 | |
| 337 1. websearch - described at the end of | |
| 338 `Parsing Queries`_ under websearch_to_tsquery. This is the default. | |
| 339 | |
| 340 2. tsquery - described at the beginning of `Parsing Queries`_ with | |
| 341 to_tsquery. It is enabled by starting the search phrase with ``ts:``. | |
| 342 | |
| 343 .. _Parsing Queries: \ | |
| 344 https://www.postgresql.org/docs/14/textsearch-controls.html#TEXTSEARCH-PARSING-QUERIES | |
| 345 | |
| 346 Websearch provides a more natural style of search and supports: | |
| 347 | |
| 348 * plain word search (stemmed in most cases) | |
| 349 * phrase search with terms enclosed in quotes | |
| 350 * exclusion by prefixing a term/phrase with ``-`` | |
| 351 * alternative/or searching with ``or`` between terms | |
| 352 * ignores non-word characters including punctuation | |
| 353 | |
| 354 Tsquery supports: | |
| 355 | |
| 356 * a strict query syntax | |
| 357 * plain word search | |
| 358 * phrase search with the ``<->`` operator or enclosing the phrase in | |
| 359 ``'`` single quotes (note this will use a stemmer on the terms | |
| 360 in the phrase). | |
| 361 * proximity search with varying distances using ``<N>`` | |
| 362 * boolean operations by grouping with parentheses and using ``&`` | |
| 363 and ``|`` | |
| 364 * exclusion using ``!`` | |
| 365 * prefix searching using ``:*`` at the end of the prefix | |
| 366 | |
| 367 All of the data that is indexed is in a single column and input | |
| 368 weighing is not used. | |
| 369 | |
| 370 Depending on the FTS configuration (determined by the | |
| 371 ``indexer_language`` setting), stopwords are supported. PostgreSQL | |
| 372 takes the stopwords into account when calculating the data needed for | |
| 373 proximity/near searches. Like SQLite FTS, there is no limit to the | |
| 374 length of terms that are indexed. Again this may be helpful or | |
| 375 problematic for your particular tracker. | |
| 376 | |
| 377 The config.ini ``indexer_language`` setting is used to define the | |
| 378 configuration used for indexing. For example with the default | |
| 379 ``english`` setting a snowball stemmer (english_stem) is used. So | |
| 380 words like 'drive' and 'driving' and 'drive-in' will all match a | |
| 381 search for 'drive' but will not match 'driver'. | |
| 382 | |
| 383 The indexer_language is used as the configuration name for every call | |
| 384 to the text search functions (to_tsvector, to_tsquery). Changing this | |
| 385 requires reindexing. | |
| 386 | |
| 387 The `configuration list can be obtained using using psql's`_ | |
| 388 ``\dF`` command. | |
| 389 | |
| 390 .. _configuration list can be obtained using using psql's: \ | |
| 391 https://www.postgresql.org/docs/current/textsearch-psql.html | |
| 392 | |
| 393 Roundup includes a hardcoded list for all languages supported by | |
| 394 PostgreSQL 14.1. The list includes 5 custom "languages" | |
| 395 ``custom1`` ... ``custom5`` to allow you to set up your `own textsearch | |
| 396 configuration`_ using one of the custom names. Depending on your | |
| 397 PostgreSQL version, we may allow an invalid language to be configured. | |
| 398 You will see an error about ``text search configuration ... does not | |
| 399 exist``. | |
| 400 | |
| 401 .. _own textsearch configuration: \ | |
| 402 https://www.postgresql.org/docs/14/textsearch-configuration.html | |
| 403 | |
| 404 It may be possible to append to this list using the tracker's | |
| 405 interfaces.py. For details, see ``test/test_indexer.py`` in the | |
| 406 roundup distribution and search for ``valid_langs``. If you succeed | |
| 407 please email roundup-users AT lists.sourceforge.net with a description | |
| 408 of your success. | |
| 409 | |
| 410 After changing the configuration language, you must reindex the | |
| 411 tracker since the index must match the configuration language used for | |
| 412 querying. | |
| 413 | |
| 414 Also there are various `dictionaries`_ that allow you to: | |
| 415 | |
| 416 * add stopwords | |
| 417 * override stemming for a term | |
| 418 * add synonyms (e.g. a search for "pg" can also match 'psql' | |
| 419 "postgresql") | |
| 420 * add terms that expand/contract the search space (Thesaurus | |
| 421 dictionary) | |
| 422 * additional transforms | |
| 423 | |
| 424 .. _dictionaries: https://www.postgresql.org/docs/14/textsearch-dictionaries.html | |
| 425 | |
| 426 Use of these is beyond this documentation. Please visit the | |
| 427 appropriate PostgreSQL documents. | |
| 428 | |
| 429 Ranking information is not used in this implementation. Also stop | |
| 430 words set in config.ini are ignored. These are areas for improvement. | |
| 431 | |
| 432 Cleaning up old native indexes | |
| 433 ------------------------------ | |
| 434 | |
| 435 If you are happy with the database fts indexing, you can save some space by | |
| 436 removing the data from the native text indexing tables. This requires | |
| 437 using the ``sqlite3`` or ``psql`` commands to execute SQL to delete the | |
| 438 rows in the ``__textids`` and ``__words`` tables. You can do this with | |
| 439 the following SQL commands:: | |
| 440 | |
| 441 delete from __words; | |
| 442 delete from __textids; | |
| 443 | |
| 444 Note this deletes data from the tables and does *not* delete the | |
| 445 table. | |
| 279 | 446 |
| 280 Users and Security | 447 Users and Security |
| 281 ================== | 448 ================== |
| 282 | 449 |
| 283 Roundup holds its own user database which primarily contains a username, | 450 Roundup holds its own user database which primarily contains a username, |
