annotate roundup/cgi/accept_language.py @ 6593:e70e2789bc2c

issue2551189 - increase text search maxlength

This removes I think all the magic references to 25 and 30 (varchar size) and replaces them with references to maxlength or maxlength+5. I am not sure why the db column is 5 characters larger than the size of what should be the max size of a word, but I'll keep the buffer of 5 as making it 1/5 the size of maxlength makes less sense.

Also added tests for fts search in templating which were missing. Added postgres, mysql and sqlite native indexing backends in which to test fts. Added fts test to native-fts as well to make sure it's working.

I want to commit this now for CI.

Todo:

  add test cases for the use of FTS in the csv output in actions.py. There is no test coverage of the match case there.

  change maxlength to a higher value (50) as requested in the ticket.

  Modify existing extremewords test cases to allow words > 25 and < 51

  write code to migrate column sizes for mysql and postgresql to match maxlength
  I will roll this into the version 7 schema update that supports use of database fts support.
author John Rouillard <rouilj@ieee.org>
date Tue, 25 Jan 2022 13:22:00 -0500
parents 3b945aee0919
children 63c9680eed20
rev   line source
3426
52f89836d05b Parse the Accept-Language header as defined in RFC2616.
Alexander Smishlajev <a1s@users.sourceforge.net>
parents:
diff changeset
1 """Parse the Accept-Language header as defined in RFC2616.
2
3 See http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.4
4 for details. This module should follow the spec.
5 Author: Hernan M. Foffani (hfoffani@gmail.com)
6 Some use samples:
7
8 >>> parse("da, en-gb;q=0.8, en;q=0.7")
9 ['da', 'en_gb', 'en']
10 >>> parse("en;q=0.2, fr;q=1")
11 ['fr', 'en']
12 >>> parse("zn; q = 0.2 ,pt-br;q =1")
13 ['pt_br', 'zn']
14 >>> parse("es-AR")
15 ['es_AR']
16 >>> parse("es-es-cat")
17 ['es_es_cat']
18 >>> parse("")
19 []
20 >>> parse(None)
21 []
22 >>> parse(" ")
23 []
24 >>> parse("en,")
25 ['en']
26 """
27
28 import re
29 import heapq
30
31 # regexp for language-range search
32 nqlre = "([A-Za-z]+[-[A-Za-z]+]*)$"
33 # regexp for language-range search with quality value
6030
ed8a9974c1bd flake8 cleanups. whie space changes.
John Rouillard <rouilj@ieee.org>
parents: 5809
diff changeset
34 qlre = r"([A-Za-z]+[-[A-Za-z]+]*);q=([\d\.]+)"
3426
35 # both
6030
36 lre = re.compile(nqlre + "|" + qlre)
3426
37
38 whitespace = ' \t\n\r\v\f'
5439
b00cd44fea16 Python 3 preparation: update string translate method call in cgi/accept_language.py.
Joseph Myers <jsm@polyomino.org.uk>
parents: 4362
diff changeset
39 try:
40     # Python 3.
41     remove_ws = (str.maketrans('', '', whitespace),)
42 except AttributeError:
43     # Python 2.
44     remove_ws = (None, whitespace)
3426
45
6030
46
3426
47 def parse(language_header):
48     """parse(string_with_accept_header_content) -> languages list"""
49
50     if language_header is None: return []
51
52     # strip whitespaces.
5439
53     lh = language_header.translate(*remove_ws)
3426
54
55     # if nothing, return
56     if lh == "": return []
57
58     # split by commas and parse the quality values.
59     pls = [lre.findall(x) for x in lh.split(',')]
60
61     # drop nonconformant entries
62     qls = [x[0] for x in pls if len(x) > 0]
63
64     # use a heap queue to sort by quality values.
65     # the value of each item is 1.0 complement.
66     pq = []
6347
3b945aee0919 accept_language parse; fix priority order; preserve insertion order
John Rouillard <rouilj@ieee.org>
parents: 6030
diff changeset
67     order=0
3426
68     for l in qls:
6347
69         order +=1
3426
70         if l[0] != '':
6347
71             heapq.heappush(pq, (0.0, order, l[0]))
3426
72         else:
6347
73             heapq.heappush(pq, (1.0-float(l[2]), order, l[1]))
3426
74
75     # get the languages ordered by quality
76     # and replace - by _
6347
77     return [ heapq.heappop(pq)[2].replace('-','_')
78              for x in range(len(pq)) ]
3426
79
80 if __name__ == "__main__":
81     import doctest
82     doctest.testmod()
83
84 # vim: set et sts=4 sw=4 :
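
The listing above orders languages by pushing (1.0 - q, order, tag) tuples onto a heap, so that higher quality values come out first and ties keep their position in the header. The following is a minimal standalone sketch of the same idea, not the module's own code: the names parse_sketch and _RANGE are hypothetical, the pattern is a stricter one assumed for illustration, and a plain list sort stands in for heapq.

```python
import re

# Assumption: a stricter pattern than the module's regexps. It accepts
# "tag" or "tag;q=value" once all whitespace has been removed.
_RANGE = re.compile(r"^([A-Za-z]+(?:-[A-Za-z]+)*)(?:;q=([01](?:\.[0-9]{1,3})?))?$")


def parse_sketch(language_header):
    """Return language tags ordered by descending quality value."""
    if language_header is None:
        return []
    # Remove every whitespace character, much as the module does with
    # str.translate().
    header = "".join(language_header.split())
    ranked = []
    for order, item in enumerate(header.split(",")):
        match = _RANGE.match(item)
        if match is None:
            continue  # drop items that do not look like a language range
        tag, quality = match.groups()
        q = float(quality) if quality is not None else 1.0
        # Sort key: complement of q (higher q sorts first), then the
        # position in the header so that ties keep insertion order.
        ranked.append((1.0 - q, order, tag.replace("-", "_")))
    ranked.sort()
    return [tag for _, _, tag in ranked]


if __name__ == "__main__":
    print(parse_sketch("da, en-gb;q=0.8, en;q=0.7"))
    print(parse_sketch("fr, de"))
```

Including the position as the second tuple element is what keeps two languages with equal quality in header order; without it, ties would be broken by comparing the tag strings.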

Roundup Issue Tracker: http://roundup-tracker.org/