
@3w36zj6 commented on Nov 2, 2025

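CommonMark has a problem with CJK text: the `**` marks in the following examples are not recognized as emphasis delimiters and are displayed literally. In each case the closing `**` follows punctuation (`。` or `)`) and is immediately followed by a CJK character rather than whitespace or punctuation, so under the spec's flanking rules it cannot close emphasis.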

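**このアスタリスクは強調記号として認識されず、そのまま表示されます。**この文のせいで。

(Japanese: "These asterisks are not recognized as emphasis marks and are displayed as-is. Because of this sentence.")

**该星号不会被识别,而是直接显示。**这是因为它没有被识别为强调符号。

(Chinese: "These asterisks are not recognized and are displayed directly. This is because they are not recognized as emphasis marks.")

**이 별표는 강조 표시로 인식되지 않고 그대로 표시됩니다(이 괄호 때문에)**이 문장 때문에.

(Korean: "These asterisks are not recognized as emphasis marks and are displayed as-is (because of this parenthesis). Because of this sentence.")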
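A minimal sketch of the right-flanking check from the CommonMark spec makes the failure concrete. The helper names are hypothetical, and `is_punctuation` is a deliberate simplification: the spec uses its full Unicode punctuation class, while this sketch only knows ASCII punctuation and a handful of CJK marks.

```rust
/// Simplified stand-in for CommonMark's "Unicode punctuation character":
/// ASCII punctuation plus a few CJK marks (assumption, not the full Unicode class).
fn is_punctuation(c: char) -> bool {
    c.is_ascii_punctuation() || matches!(c, '。' | '、' | '，' | '（' | '）' | '「' | '」')
}

/// Approximates "Unicode whitespace" with `char::is_whitespace`; `None` is the
/// start or end of the line, which the spec treats as whitespace.
fn is_space_or_edge(c: Option<char>) -> bool {
    c.map_or(true, char::is_whitespace)
}

/// CommonMark right-flanking test for a delimiter run with neighbours
/// `before` and `after`; a `*` run can close emphasis only if this holds.
fn is_right_flanking(before: Option<char>, after: Option<char>) -> bool {
    // (1) Not preceded by whitespace.
    if is_space_or_edge(before) {
        return false;
    }
    let before_is_punct = before.map_or(false, is_punctuation);
    let after_is_space_or_punct = is_space_or_edge(after) || after.map_or(false, is_punctuation);
    // (2a) not preceded by punctuation, or
    // (2b) preceded by punctuation and followed by whitespace or punctuation.
    !before_is_punct || after_is_space_or_punct
}

fn main() {
    // Japanese example: '。' before the closing run, 'こ' after it.
    assert!(!is_right_flanking(Some('。'), Some('こ')));
    // Korean example: ')' before the closing run, '이' after it.
    assert!(!is_right_flanking(Some(')'), Some('이')));
    // The same closer followed by a space is accepted.
    assert!(is_right_flanking(Some('。'), Some(' ')));
    println!("CJK closers rejected by the plain CommonMark rule");
}
```

Only the closing run fails: the opening `**` at the start of each line is followed by a letter, so it is left-flanking as usual.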

This pull request introduces support for CJK-friendly emphasis handling in the Markdown parser, aligning with the CommonMark CJK-friendly amendments specification.

It adds a new option to enable CJK-friendly emphasis parsing, updates the delimiter run logic to properly handle CJK characters and punctuation, and includes comprehensive tests to verify the new behavior. By default, the feature is disabled to maintain backward compatibility.
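How an opt-in switch can leave the default behavior untouched is sketched below. This is neither the amendment's actual rule set nor the PR's API: the `EmphasisOptions` struct, the `cjk_friendly` flag, the `is_cjk` block list, and the single relaxation (a CJK character after a punctuation-ended run is accepted the way whitespace would be) are all illustrative assumptions.

```rust
/// Hypothetical parser option; off by default to keep plain CommonMark behavior.
#[derive(Default, Clone, Copy)]
struct EmphasisOptions {
    cjk_friendly: bool,
}

/// Hypothetical CJK test over a few well-known Unicode blocks only
/// (the amendment defines its own, more precise ranges).
fn is_cjk(c: char) -> bool {
    matches!(c,
        '\u{3040}'..='\u{309F}'   // Hiragana
        | '\u{30A0}'..='\u{30FF}' // Katakana
        | '\u{4E00}'..='\u{9FFF}' // CJK Unified Ideographs
        | '\u{AC00}'..='\u{D7AF}' // Hangul Syllables
    )
}

/// Simplified punctuation test (assumption: ASCII plus a few CJK marks).
fn is_punctuation(c: char) -> bool {
    c.is_ascii_punctuation() || matches!(c, '。' | '、' | '，' | '（' | '）' | '「' | '」')
}

/// `None` (start or end of line) counts as whitespace, as in the spec.
fn is_space_or_edge(c: Option<char>) -> bool {
    c.map_or(true, char::is_whitespace)
}

/// Right-flanking test with one illustrative CJK relaxation when enabled.
fn is_right_flanking(before: Option<char>, after: Option<char>, opts: EmphasisOptions) -> bool {
    if is_space_or_edge(before) {
        return false; // (1) preceded by whitespace
    }
    if !before.map_or(false, is_punctuation) {
        return true; // (2a) not preceded by punctuation
    }
    // (2b) preceded by punctuation: `after` must be whitespace or punctuation...
    let standard = is_space_or_edge(after) || after.map_or(false, is_punctuation);
    // ...or, in this simplified sketch only, a CJK character when the option is on.
    standard || (opts.cjk_friendly && after.map_or(false, is_cjk))
}

fn main() {
    let default_opts = EmphasisOptions::default();
    let cjk_opts = EmphasisOptions { cjk_friendly: true };
    // Chinese example: '。' before the closing run, '这' after it.
    assert!(!is_right_flanking(Some('。'), Some('这'), default_opts));
    assert!(is_right_flanking(Some('。'), Some('这'), cjk_opts));
    // Non-CJK text is unaffected by the flag.
    assert!(!is_right_flanking(Some('.'), Some('a'), cjk_opts));
    println!("the option only changes the CJK case");
}
```

Because `EmphasisOptions::default()` leaves the flag off, callers that never set it get exactly the standard check.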

In addition to the specification, I referred to the "Tips for Implementers" and "Concrete ranges of each terms" documents in tats-u/markdown-cjk-friendly while implementing this.

Comment on lines 2255 to 2265
```rust
#[inline]
fn previous_two_chars(s: &str, ix: usize) -> (Option<char>, Option<char>) {
    let mut iter = s[..ix].chars();
    let mut prev_prev = None;
    let mut prev = None;
    while let Some(ch) = iter.next() {
        prev_prev = prev;
        prev = Some(ch);
    }
    (prev, prev_prev)
}
```
Collaborator


This walks the entire prefix `s[..ix]` on every call, so each lookup costs time proportional to its position in the input and emphasis parsing becomes O(n^2) overall, as caught by CI.
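The quadratic growth shows up when you count how many characters each strategy visits if the lookup runs at every character boundary of an input. The cost functions below are hypothetical instrumentation, not code from the PR: a forward scan touches the whole prefix each time, while a reverse scan touches at most two characters.

```rust
/// Characters visited by a forward scan of `s[..ix]`, as the loop under review does.
fn forward_cost(s: &str, ix: usize) -> usize {
    s[..ix].chars().count()
}

/// Characters visited when reading at most two chars from the end instead.
fn reverse_cost(s: &str, ix: usize) -> usize {
    s[..ix].chars().rev().take(2).count()
}

/// Sums a lookup's cost over every char boundary of `s`, including the end.
fn total_cost(s: &str, cost: fn(&str, usize) -> usize) -> usize {
    s.char_indices()
        .map(|(i, _)| i)
        .chain(std::iter::once(s.len()))
        .map(|ix| cost(s, ix))
        .sum()
}

fn main() {
    for n in [1_000, 2_000, 4_000] {
        let s = "*".repeat(n);
        println!(
            "n = {n}: forward {}, reverse {}",
            total_cost(&s, forward_cost),
            total_cost(&s, reverse_cost)
        );
    }
}
```

Doubling `n` roughly quadruples the forward total (it is n(n+1)/2 for ASCII input) while the reverse total only doubles.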

The previous implementation used `.chars().last()`, which takes advantage of `DoubleEndedIterator` and reads only from the end of the slice. Also, I wouldn't put `#[inline]` on an internal function unless a benchmark shows that it helps.
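For reference, a single-character lookup in that style might look like the sketch below; the names `previous_char` and `previous_char_explicit` are hypothetical. `Chars` implements `DoubleEndedIterator`, and the standard library implements `Chars::last()` by calling `next_back()`, so only the final UTF-8 character is decoded rather than the whole prefix.

```rust
/// Hypothetical one-character lookup: the char immediately before byte offset `ix`.
/// `Chars::last` delegates to `next_back`, so only the final char is decoded.
fn previous_char(s: &str, ix: usize) -> Option<char> {
    s[..ix].chars().last()
}

/// Equivalent spelling that makes the reverse decoding explicit.
fn previous_char_explicit(s: &str, ix: usize) -> Option<char> {
    s[..ix].chars().next_back()
}

fn main() {
    let s = "強調記号。**この";
    let ix = s.find("**").expect("delimiter present");
    assert_eq!(previous_char(s, ix), Some('。'));
    assert_eq!(previous_char_explicit(s, ix), Some('。'));
    assert_eq!(previous_char(s, 0), None);
    println!("ok");
}
```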

Suggested change (replaces the whole function above, including the `#[inline]` attribute):

```rust
fn previous_two_chars(s: &str, ix: usize) -> (Option<char>, Option<char>) {
    let mut iter = s[..ix].chars().rev();
    let prev = iter.next();
    let prev_prev = iter.next();
    (prev, prev_prev)
}
```
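A quick usage check of the suggested version on one of the CJK examples: the slice is decoded from the end, so multibyte characters come back whole and at most two characters are read however long the prefix is. The test strings and the `main` harness are illustrative additions.

```rust
fn previous_two_chars(s: &str, ix: usize) -> (Option<char>, Option<char>) {
    // Walk backwards from `ix`: the first item is the nearest character.
    let mut iter = s[..ix].chars().rev();
    let prev = iter.next();
    let prev_prev = iter.next();
    (prev, prev_prev)
}

fn main() {
    let s = "そのまま表示されます。**この文のせいで。";
    // Byte offset of the closing `**`.
    let ix = s.find("**").expect("closing delimiter present");
    assert_eq!(previous_two_chars(s, ix), (Some('。'), Some('す')));
    // At the start of the string there is nothing to look back at.
    assert_eq!(previous_two_chars(s, 0), (None, None));
    // With one character available, only `prev` is filled in.
    assert_eq!(previous_two_chars("a**", 1), (Some('a'), None));
    println!("ok");
}
```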

Collaborator


Relatedly, maybe this should not take `ix` at all, so that slicing the string is the caller's responsibility.
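The signature floated here could look like the sketch below: the function receives only the text before the delimiter run, and the caller does the slicing. The call site and strings are hypothetical; only the idea of dropping `ix` comes from the comment.

```rust
/// Hypothetical variant without `ix`: `before` is already the text preceding the run.
fn previous_two_chars(before: &str) -> (Option<char>, Option<char>) {
    let mut iter = before.chars().rev();
    let prev = iter.next();
    let prev_prev = iter.next();
    (prev, prev_prev)
}

fn main() {
    let text = "直接显示。**这是";
    let ix = text.find("**").expect("delimiter present");
    // The caller now slices at a char boundary before calling.
    let (prev, prev_prev) = previous_two_chars(&text[..ix]);
    assert_eq!((prev, prev_prev), (Some('。'), Some('示')));
    println!("{prev:?} {prev_prev:?}");
}
```

One arguable benefit is that the slicing, and any panic from an index that is not on a char boundary, happens visibly at the call site rather than inside the helper.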

@3w36zj6 requested a review from ollpu on November 4, 2025 00:15