Implement differential fuzzer for pandoc #673
base: master
3b43baf to
f28b622
Compare
Thanks for your contribution. The goal of the project is to support CommonMark and GitHub Flavored Markdown; targeting Pandoc is far outside that scope. Can this fuzzer help catch errors in CommonMark+GFM? The only case I can find is when both pulldown-cmark and commonmark.js are wrong and Pandoc gets it right. On the other hand, this code is independent of the final binary and is only a dev tool. What do you think, @raphlinus?
That's what I'm thinking, yeah. Pandoc lets you select your extensions, such as |
fuzz/src/lib.rs
Outdated
        Ok(events)
    }

    pub fn normalize_pandoc(events: Vec<Event<'_>>) -> Vec<Event<'_>> {
I think this is cool! I would rename `normalize` below to `normalize_commonmarkjs` or similar.
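To illustrate why one normalizer per reference implementation is useful, here is a minimal sketch of a differential check. It is not the code in `fuzz/src/lib.rs`: `differential_check` and its closure parameters are hypothetical names, and the event type is left generic so the sketch does not depend on pulldown-cmark.

```rust
use std::fmt::Debug;

/// Hypothetical helper: runs one input through two parsers, applies a
/// separate normalizer to each result, and reports the first index at
/// which the normalized outputs disagree.
fn differential_check<E, A, B, NA, NB>(
    input: &str,
    parse_a: A,
    parse_b: B,
    normalize_a: NA,
    normalize_b: NB,
) -> Result<(), String>
where
    E: PartialEq + Debug,
    A: Fn(&str) -> Vec<E>,
    B: Fn(&str) -> Vec<E>,
    NA: Fn(Vec<E>) -> Vec<E>,
    NB: Fn(Vec<E>) -> Vec<E>,
{
    let a = normalize_a(parse_a(input));
    let b = normalize_b(parse_b(input));
    if a == b {
        return Ok(());
    }
    // Find the first differing element; if one output is a strict prefix
    // of the other, the divergence is at the end of the shorter one.
    let idx = a
        .iter()
        .zip(b.iter())
        .position(|(x, y)| x != y)
        .unwrap_or(a.len().min(b.len()));
    Err(format!(
        "outputs diverge at index {idx}: {:?} vs {:?}",
        a.get(idx),
        b.get(idx)
    ))
}
```

Keeping the normalizers as separate arguments means the quirks of each reference implementation stay in their own function, which is the motivation for distinct names such as `normalize_pandoc` and `normalize_commonmarkjs`.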
26d9ebb to
c459690
Compare
64de992 to
567866b
Compare
69b81e6 to
d03e618
Compare
0e44799 to
56b45e0
Compare
    use pulldown_cmark::{Event, Tag, TagEnd};
    match event {
        Event::Start(Tag::FootnoteDefinition(id)) => {
            if id.starts_with("\n") || id.ends_with("\n") || id.starts_with("\r") || id.ends_with("\r") || id.starts_with(" ") || id.starts_with("\t") || id.contains(" ") || id.contains("\t ") || id.contains(" \t") || id.contains("\t\t") || id.ends_with(" ") || id.ends_with("\t") { return };
Perhaps it would be simpler to use the slice variants of the starts_with and ends_with patterns:
    - if id.starts_with("\n") || id.ends_with("\n") || id.starts_with("\r") || id.ends_with("\r") || id.starts_with(" ") || id.starts_with("\t") || id.contains(" ") || id.contains("\t ") || id.contains(" \t") || id.contains("\t\t") || id.ends_with(" ") || id.ends_with("\t") { return };
    + let whitespace = &['\n', '\r', ' ', '\t'];
    + if id.starts_with(whitespace)
    +     || id.ends_with(whitespace)
    +     || id.contains("\t ")
    +     || id.contains(" \t")
    +     || id.contains("\t\t")
    + {
    +     return;
    + };
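As a standalone illustration of the suggestion, the sketch below wraps the same check in a plain function over `&str`, so it can be tried without pulldown-cmark. The name `should_skip_label` is hypothetical and not part of the PR. It relies only on the standard library accepting a `&[char]` as a pattern for `starts_with` and `ends_with`, which matches when the string begins or ends with any one of the listed characters.

```rust
/// Hypothetical helper: returns true when a footnote label has leading or
/// trailing whitespace, or one of the interior whitespace runs that the
/// suggested code checks for.
fn should_skip_label(id: &str) -> bool {
    // A char slice used as a pattern matches any single listed character.
    let whitespace: &[char] = &['\n', '\r', ' ', '\t'];
    id.starts_with(whitespace)
        || id.ends_with(whitespace)
        || id.contains("\t ")
        || id.contains(" \t")
        || id.contains("\t\t")
}
```

With this helper, a label such as `" a"` or `"a\t\tb"` is skipped, while `"a b"` is not, because a single interior space matches none of the listed patterns.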
8d3923e to
2098f86
Compare
7cc429e to
e6caf75
Compare
c8a7098 to
9b5cd31
Compare
9b5cd31 to
30bbeb4
Compare
99ca650 to
7c97616
Compare
7c97616 to
a17b615
Compare
Based on pulldown-cmark#622 and copied from https://github.com/ollpu/pulldown-cmark/tree/alt-math. Co-authored-by: rhysd <lin90162@yahoo.co.jp>
This feature is loosely based on what 63a29a1 described, but copies [commonmark-hs] more closely (the balanced braces feature is added).

[commonmark-hs]: https://github.com/jgm/commonmark-hs

It largely ignores GitHub, because its math parsing [is very buggy].

[is very buggy]: https://github.com/nschloe/github-math-bugs
This approach, based on @ollpu's suggestion, tracks single `$`s in the inline tree, and merges them later. It avoids having to merge and unmerge them in some corner cases.
The essential problem is: every time you write `$$x$}`, you get another
entry added to a hash table. Even if it's not [theoretically] *quadratic*,
it's still slow. Hard limiting it to 255 entries makes this not a problem.
Interestingly enough, when I tried to write an analogous torture test
for code spans, I couldn't find a way to do it because code spans are
keyed by their *length* instead of their *position*. In order to get
N entries in the hash table, I basically had to write N `` ` `` in a
row, forcing me to write quadratic amounts of input text.
Comparison:
```
michaelhowell@Michael-Howells-Macbook-Pro pulldown-cmark % python3 -c 'print("$$x$}"*5000)' | time target/release/pulldown-cmark.old -M > /dev/null
target/release/pulldown-cmark.old -M > /dev/null 2.63s user 0.02s system 99% cpu 2.673 total
michaelhowell@Michael-Howells-Macbook-Pro pulldown-cmark % python3 -c 'print("$$x$}"*5000)' | time target/release/pulldown-cmark.new -M > /dev/null
target/release/pulldown-cmark.new -M > /dev/null 0.01s user 0.00s system 6% cpu 0.109 total
```
[theoretically]: http://www.ilikebigbits.com/2014_04_21_myth_of_ram_1.html
Co-authored-by: Linda_pp <rhysd@users.noreply.github.com>
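The hard limit described above can be sketched as a table that refuses new keys once it is full. This is only an illustration of the capping idea under stated assumptions: `BoundedTable`, its `usize` keys and values, and `MAX_ENTRIES` are hypothetical, and the actual data structure in pulldown-cmark is not shown in this conversation.

```rust
use std::collections::HashMap;

/// Assumed cap, taken from the "255 entries" figure in the commit message.
const MAX_ENTRIES: usize = 255;

/// Hypothetical table that never grows beyond `MAX_ENTRIES` distinct keys.
struct BoundedTable {
    inner: HashMap<usize, usize>,
}

impl BoundedTable {
    fn new() -> Self {
        BoundedTable { inner: HashMap::new() }
    }

    /// Inserts or updates an entry. Returns false, leaving the table
    /// unchanged, when the key is new and the table is already full.
    fn insert(&mut self, key: usize, value: usize) -> bool {
        if self.inner.len() >= MAX_ENTRIES && !self.inner.contains_key(&key) {
            return false;
        }
        self.inner.insert(key, value);
        true
    }

    fn len(&self) -> usize {
        self.inner.len()
    }
}
```

With a cap like this, adversarial input such as many repetitions of `$$x$}` can add at most a fixed number of entries, so the size of the table no longer scales with the length of the input.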
This changes things so that `$$ $ $$` is not parsed as display math. Parsing it that way doesn't actually make sense, since it would cause a parse error at the end anyway.

https://pandoc.org/try/?params=%7B%22text%22%3A%22%24%24+%24+%24%24%22%2C%22to%22%3A%22html5%22%2C%22from%22%3A%22commonmark_x%22%2C%22standalone%22%3Afalse%2C%22embed-resources%22%3Afalse%2C%22table-of-contents%22%3Afalse%2C%22number-sections%22%3Afalse%2C%22citeproc%22%3Afalse%2C%22html-math-method%22%3A%22plain%22%2C%22wrap%22%3A%22auto%22%2C%22highlight-style%22%3Anull%2C%22files%22%3A%7B%7D%2C%22template%22%3Anull%7D
- Disallow `$$` matching a closing `$` and then marching delimiters in `make_math_span`. Instead, retry scanning at the second position.
- Remove the `seen_first` optimization from `MathDelims`. It doesn't work with the retry strategy.
Co-authored-by: Michael Howell <michael@notriddle.com>
a17b615 to
4e02fac
Compare