Commit afcef31
regex: Fix expansion of multibyte characters
The expansion routine was mistakenly forcing every \u expansion to
be preceded by a \, to tell the regex engine to treat the
resulting character as a literal. For multi-byte characters
written as a surrogate pair, this injected a \ between the two
halves of the pair. Instead, check whether we have a high surrogate
and, if so, look for a following low surrogate and expand the two
as a pair.
Bug: T403212
Change-Id: I612c24a4b1c7035341e318110c1442c5ea154a45
1 parent 9600080, commit afcef31
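The fix described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the class name `UnicodeEscapeSketch` and the methods `parseEscape` and `expand` are hypothetical, and the choice to emit a single `\` in front of the whole pair is an assumption based on the message's wording.

```java
// Hypothetical sketch of surrogate-aware escape expansion; not the real rewriter code.
final class UnicodeEscapeSketch {

    /** Returns the UTF-16 code unit of a backslash-u escape starting at pos, or -1 if none. */
    static int parseEscape(String s, int pos) {
        if (pos + 6 > s.length() || s.charAt(pos) != '\\' || s.charAt(pos + 1) != 'u') {
            return -1;
        }
        int value = 0;
        for (int i = pos + 2; i < pos + 6; i++) {
            int digit = Character.digit(s.charAt(i), 16);
            if (digit < 0) {
                return -1; // not four hex digits, so not an escape we expand
            }
            value = (value << 4) | digit;
        }
        return value;
    }

    /** Expands escapes into literal characters, keeping surrogate pairs together. */
    static String expand(String regex) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < regex.length()) {
            int unit = parseEscape(regex, i);
            if (unit < 0) {
                char c = regex.charAt(i++);
                out.append(c);
                // Copy any other escaped character verbatim so that an escaped
                // backslash followed by 'u' is not mistaken for an escape.
                if (c == '\\' && i < regex.length()) {
                    out.append(regex.charAt(i++));
                }
                continue;
            }
            char first = (char) unit;
            i += 6;
            // Assumption: one backslash marks the expanded character as a literal.
            out.append('\\').append(first);
            if (Character.isHighSurrogate(first)) {
                int next = parseEscape(regex, i);
                if (next >= 0 && Character.isLowSurrogate((char) next)) {
                    // Append the low half directly: no backslash between the halves.
                    out.append((char) next);
                    i += 6;
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String expanded = expand("\\uD83D\\uDE00");
        // A backslash plus one supplementary code point: prints 2.
        System.out.println(expanded.codePointCount(0, expanded.length()));
    }
}
```

With the pre-fix behaviour described above, the same input would have produced a `\` after the high surrogate as well, leaving two unpaired halves instead of one character.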
File tree
3 files changed, +31 -10 lines changed
- lucene-regex-rewriter/src
  - main/java/org/wikimedia/utils/regex
  - test/java/org/wikimedia/utils/regex
Lines changed: 21 additions & 4 deletions
Hunks (line numbers only, diff text not captured):
- original 89-96, new 89-96: original line 92 removed; new line 93 added
- original 105-116, new 105-133: original lines 108, 110 and 112 removed; new lines 108, 110-111, 113 and 115-130 added
Lines changed: 5 additions & 1 deletion
Hunks (line numbers only, diff text not captured):
- original 126-132, new 126-136: original line 129 removed; new lines 129-133 added
Lines changed: 5 additions & 5 deletions
Hunks (line numbers only, diff text not captured):
- original 345-363, new 345-363: new line 348 added; original lines 349-352 removed; new lines 350-352 added; original line 360 removed; new line 360 added