comparison doc/admin_guide.txt @ 6604:0d99ae7c8de6

Allow Roundup to use PostgreSQL database native full text search back_postgreql.py - schema version changes for schema version 7. configuration.py - added indexer_language checks for postgresql. Hardcoded list for now. Docs admin_guide and upgrading Tests. This also restructures the version upgrade tests for the rdbms backends. They can run all of them now as the proper cascade is developed to roll back changes to version 6.
author John Rouillard <rouilj@ieee.org>
date Thu, 27 Jan 2022 19:48:48 -0500
parents 24e2eeb2ed9a
children 328ad56e438a
comparison
equal deleted inserted replaced
6603:57dc15ad648d 6604:0d99ae7c8de6
274 brotli. If there is a precompressed gzip file present the client will 274 brotli. If there is a precompressed gzip file present the client will
275 get the gzip version and not a brotli compressed version. This 275 get the gzip version and not a brotli compressed version. This
276 mechanism allows the admin to allow use of brotli and zstd for 276 mechanism allows the admin to allow use of brotli and zstd for
277 dynamic content, but not for static content. 277 dynamic content, but not for static content.
278 278
279 Configuring native-fts Full Text Search
280 =======================================
281
282 Roundup release 2.2.0 can use database native full text search.
283 SQLite (minimum version 3.9.0) with FTS5 and PostgreSQL (minimum
284 version 11.0) with websearch_to_tsvector are supported.
285
286 To enable this method, change the ``indexer`` setting in the tracker's
287 config.ini to ``native-fts``. Then reindex using ``roundup-admin -i
288 tracker_home reindex``. The amount of time it takes to reindex
289 depends on the amount of data in your tracker, the speed of your
290 disks, etc. It can take hours.
291
292 SQLite details
293 --------------
294
295 The SQLite native-fts changes the full text search query a little bit.
296 For the other search methods, the search terms are split on white
297 space and each item in the index: a field (e.g. title), message
298 content and file content is searched for all the terms. If any term is
299 missing that item is ignored. Once the items are found they are mapped
300 to an issue and the normal issue index is displayed.
301
302 When using FTS5, the search terms can use the full text search query
303 language described at:
304 https://www.sqlite.org/fts5.html#full_text_query_syntax. This
305 supports:
306
307 * plain word search (joined with and similar to other search methods)
308 * phrase search with terms enclosed in ``"`` quotes
309 * proximity search with varying distances using ``NEAR()``
310 * boolean operations by grouping with parentheses and using ``AND``
311 and ``OR``
312 * exclusion using ``NOT``
313 * prefix searching by prefixing the term with``^``
314
315 All of the data that is indexed is in a single column, so when column
316 specifiers are used they usually result in an error which is detected
317 and an enhanced error message is produced.
318
319 Unlike the native, xapian and whoosh indexers, there are no stopwords,
320 and there is no limit to the length of terms that are indexed. Keeping
321 these would break proximity and phrase searching. This may be helpful
322 or problematic for your particular tracker.
323
324 To support the most languages available, the unicode61 tokenizer is
325 used without porter stemming. Using the ``indexer_language`` setting
326 to enable stemming for ``english`` is not available in this
327 implementation. Also ranking information is not used in this
328 implementation. These are areas for improvement.
329
330 PostgreSQL info
331 ---------------
332
333 The PostgreSQL native-fts changes the full text search query a little
334 bit. When using PostgreSQL full text search, two different query
335 languages are supported.
336
337 1. websearch - described at the end of
338 `Parsing Queries`_ under websearch_to_tsquery. This is the default.
339
340 2. tsquery - described at the beginning of `Parsing Queries`_ with
341 to_tsquery. It is enabled by starting the search phrase with ``ts:``.
342
343 .. _Parsing Queries: \
344 https://www.postgresql.org/docs/14/textsearch-controls.html#TEXTSEARCH-PARSING-QUERIES
345
346 Websearch provides a more natural style of search and supports:
347
348 * plain word search (stemmed in most cases)
349 * phrase search with terms enclosed in quotes
350 * exclusion by prefixing a term/phrase with ``-``
351 * alternative/or searching with ``or`` between terms
352 * ignores non-word characters including punctuation
353
354 Tsquery supports:
355
356 * a strict query syntax
357 * plain word search
358 * phrase search with the ``<->`` operator or enclosing the phrase in
359 ``'`` single quotes (note this will use a stemmer on the terms
360 in the phrase).
361 * proximity search with varying distances using ``<N>``
362 * boolean operations by grouping with parentheses and using ``&``
363 and ``|``
364 * exclusion using ``!``
365 * prefix searching using ``:*`` at the end of the prefix
366
367 All of the data that is indexed is in a single column and input
368 weighing is not used.
369
370 Depending on the FTS configuration (determined by the
371 ``indexer_language`` setting), stopwords are supported. PostgreSQL
372 takes the stopwords into account when calculating the data needed for
373 proximity/near searches. Like SQLite FTS, there is no limit to the
374 length of terms that are indexed. Again this may be helpful or
375 problematic for your particular tracker.
376
377 The config.ini ``indexer_language`` setting is used to define the
378 configuration used for indexing. For example with the default
379 ``english`` setting a snowball stemmer (english_stem) is used. So
380 words like 'drive' and 'driving' and 'drive-in' will all match a
381 search for 'drive' but will not match 'driver'.
382
383 The indexer_language is used as the configuration name for every call
384 to the text search functions (to_tsvector, to_tsquery). Changing this
385 requires reindexing.
386
387 The `configuration list can be obtained using using psql's`_
388 ``\dF`` command.
389
390 .. _configuration list can be obtained using using psql's: \
391 https://www.postgresql.org/docs/current/textsearch-psql.html
392
393 Roundup includes a hardcoded list for all languages supported by
394 PostgreSQL 14.1. The list includes 5 custom "languages"
395 ``custom1`` ... ``custom5`` to allow you to set up your `own textsearch
396 configuration`_ using one of the custom names. Depending on your
397 PostgreSQL version, we may allow an invalid language to be configured.
398 You will see an error about ``text search configuration ... does not
399 exist``.
400
401 .. _own textsearch configuration: \
402 https://www.postgresql.org/docs/14/textsearch-configuration.html
403
404 It may be possible to append to this list using the tracker's
405 interfaces.py. For details, see ``test/test_indexer.py`` in the
406 roundup distribution and search for ``valid_langs``. If you succeed
407 please email roundup-users AT lists.sourceforge.net with a description
408 of your success.
409
410 After changing the configuration language, you must reindex the
411 tracker since the index must match the configuration language used for
412 querying.
413
414 Also there are various `dictionaries`_ that allow you to:
415
416 * add stopwords
417 * override stemming for a term
418 * add synonyms (e.g. a search for "pg" can also match 'psql'
419 "postgresql")
420 * add terms that expand/contract the search space (Thesaurus
421 dictionary)
422 * additional transforms
423
424 .. _dictionaries: https://www.postgresql.org/docs/14/textsearch-dictionaries.html
425
426 Use of these is beyond this documentation. Please visit the
427 appropriate PostgreSQL documents.
428
429 Ranking information is not used in this implementation. Also stop
430 words set in config.ini are ignored. These are areas for improvement.
431
432 Cleaning up old native indexes
433 ------------------------------
434
435 If you are happy with the database fts indexing, you can save some space by
436 removing the data from the native text indexing tables. This requires
437 using the ``sqlite3`` or ``psql`` commands to execute SQL to delete the
438 rows in the ``__textids`` and ``__words`` tables. You can do this with
439 the following SQL commands::
440
441 delete from __words;
442 delete from __textids;
443
444 Note this deletes data from the tables and does *not* delete the
445 table.
279 446
280 Users and Security 447 Users and Security
281 ================== 448 ==================
282 449
283 Roundup holds its own user database which primarily contains a username, 450 Roundup holds its own user database which primarily contains a username,

Roundup Issue Tracker: http://roundup-tracker.org/