MD013: Incorrect count on lines with multi-byte unicode characters #1458

Hi, I copied a paragraph from a PDF that contained hardcoded Unicode italic characters, each of which takes 4 bytes in UTF-8 and two 16-bit code units (also 4 bytes) in UTF-16. After pasting it into a Markdown file saved with UTF-8 encoding, I started receiving the `Line length [Expected: 80, Actual: 85]` warning, even though the line displays only 74 Unicode characters (stored as 107 bytes).

```md
- $\forall 𝑣_1, 𝑣_2, \ldots, 𝑣_𝑛 \in 𝑇_𝑛: 𝑣_1, 𝑣_2, \ldots, 𝑣_𝑛 \in 𝐾 \iff
```

(I assume the intention of the rule is to consider the "visual count of characters" as rendered in the editor, which is 74 in this case.)

I may be missing some context or detail of the implementation, but I think the issue is a combination of two things. First, JS strings are sequences of UTF-16 code units, so `.length` counts code units rather than characters (hence the seemingly incorrect 85 reported for the line). Second, the regular expressions are "unicode-unaware", so `.` again matches a single UTF-16 code unit rather than a whole character.
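To make the diagnosis concrete, here is a small sketch that measures the sample line three ways and shows how `.` behaves with and without the `u` flag. It uses only standard JavaScript (`String.raw`, spread, `TextEncoder`). The variable names are mine and are not taken from the markdownlint source.

```javascript
// The sample line from this report, kept verbatim via String.raw
// so the backslashes are not treated as escape sequences.
const line = String.raw`- $\forall 𝑣_1, 𝑣_2, \ldots, 𝑣_𝑛 \in 𝑇_𝑛: 𝑣_1, 𝑣_2, \ldots, 𝑣_𝑛 \in 𝐾 \iff`;

// UTF-16 code units: what String.prototype.length reports.
const utf16Units = line.length; // 85

// Unicode code points: string iteration yields whole code points.
const codePoints = [...line].length; // 74

// UTF-8 bytes: the size of the line as stored on disk.
const utf8Bytes = new TextEncoder().encode(line).length; // 107

// Without the `u` flag, `.` matches one UTF-16 code unit, so a single
// italic letter (a surrogate pair) is not matched by /^.$/.
const dotWithoutU = /^.$/.test("𝑣"); // false
const dotWithU = /^.$/u.test("𝑣"); // true

console.log({ utf16Units, codePoints, utf8Bytes, dotWithoutU, dotWithU });
```

The difference of 11 between 85 and 74 corresponds to the 11 italic letters on the line, each stored as a surrogate pair.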

So I think the correct way to handle these would be to use `[...line].length` to get the total length of the line in characters, and to add the `u` flag to the regular expressions to switch them to Unicode mode.
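A minimal sketch of that suggestion follows. I have not checked how the rule is actually implemented, so `codePointLength`, `exceedsLineLength`, and `makeLongLineRegExp` are hypothetical names for illustration only, and the regular expression is just an example pattern, not the one markdownlint uses.

```javascript
// Hypothetical helper: count Unicode code points instead of UTF-16 code units.
function codePointLength(text) {
  return [...text].length;
}

// Hypothetical check: is the line longer than the configured limit?
function exceedsLineLength(text, limit = 80) {
  return codePointLength(text) > limit;
}

// Hypothetical pattern builder: matches lines with more than `limit`
// characters. With the `u` flag, `.` consumes a whole code point.
function makeLongLineRegExp(limit, flags = "u") {
  return new RegExp("^.{" + limit + "}.+$", flags);
}

// The sample line from this report (74 code points, 85 UTF-16 code units).
const line = String.raw`- $\forall 𝑣_1, 𝑣_2, \ldots, 𝑣_𝑛 \in 𝑇_𝑛: 𝑣_1, 𝑣_2, \ldots, 𝑣_𝑛 \in 𝐾 \iff`;

const tooLong = exceedsLineLength(line, 80); // false
const unicodeMatch = makeLongLineRegExp(80).test(line); // false
const legacyMatch = makeLongLineRegExp(80, "").test(line); // true (counts code units)

console.log({ tooLong, unicodeMatch, legacyMatch });
```

With these changes the sample line would pass the default limit of 80, while the same pattern without the `u` flag still flags it.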
