Description
Running the code below (in Scala!):
val textCollectingVisitor = new TextCollectingVisitor()
textCollectingVisitor.collectAndGetSegments(document)
against this Markdown:
Use `git status`
Git commands
```
git status
git add
git commit
```
This site was built using [GitHub Pages](https://pages.github.com/).
I get the strings below extracted (which, by the way, was a really big head start on exactly what I was trying to get to!):
"Use",
"Git commands",
"This site was built using",
"GitHub Pages"
I would also like to extract the following strings:
"git status",
"git status
git add
git commit",
"https://pages.github.com/"
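To make the goal concrete, here is a rough sketch of the kind of collector I have in mind. It is only an assumption about how this could be done, not something taken from the library docs: `ExtraTextCollector`, `handler`, and `add` are names I made up, and the imports assume a recent flexmark-java release where `Node`, `NodeVisitor`, `VisitHandler`, and `Visitor` live in `com.vladsch.flexmark.util.ast` (older releases have them under `com.vladsch.flexmark.ast`).

```scala
import com.vladsch.flexmark.ast.{Code, FencedCodeBlock, Link, Text}
import com.vladsch.flexmark.parser.Parser
import com.vladsch.flexmark.util.ast.{Node, NodeVisitor, VisitHandler, Visitor}
import scala.collection.mutable.ListBuffer

// Hypothetical helper, not part of flexmark-java.
final class ExtraTextCollector {
  private val out = ListBuffer.empty[String]

  // Record a trimmed, non-empty string.
  private def add(s: String): Unit = {
    val trimmed = s.trim
    if (trimmed.nonEmpty) out += trimmed
  }

  // Wrap a Scala function in an explicit Visitor[N] for the given node class.
  private def handler[N <: Node](cls: Class[N])(f: N => Unit): VisitHandler[N] =
    new VisitHandler[N](cls, new Visitor[N] {
      override def visit(node: N): Unit = f(node)
    })

  // A node with a handler is not descended into unless the handler asks for it,
  // so Code and FencedCodeBlock contribute their content exactly once.
  private val visitor: NodeVisitor = new NodeVisitor(
    handler(classOf[Text])(n => add(n.getChars.toString)),
    handler(classOf[Code])(n => add(n.getText.toString)),
    handler(classOf[FencedCodeBlock])(n => add(n.getContentChars.toString)),
    handler(classOf[Link]) { n =>
      visitor.visitChildren(n) // link text first
      add(n.getUrl.toString)   // then the target URL
    }
  )

  def collect(markdown: String): List[String] = {
    out.clear()
    visitor.visit(Parser.builder().build().parse(markdown))
    out.toList
  }
}
```

Calling `new ExtraTextCollector().collect(markdown)` on the sample above should then return the plain text together with the inline code, the fenced block content, and the link URL, although I have not checked how it behaves for node types other than these four.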
1.) Would the right way to approach this be to add additional VisitHandlers to the NodeVisitor in an updated version of TextCollectingVisitor?
2.) Any suggestions on how I can figure out which classes the strings that did not get extracted are parsed into? (I'm not sure that question is understandable.) For example, does git status get parsed as a com.vladsch.flexmark.ast.Code element in the AST? Could I override some method and add a line of logging so that it tells me which AST node is being processed?
3.) I'm looking to extract all "human readable text" in an md document, stripping out only the content that relates to formatting. I'm not sure my question is clear enough to answer, but here goes: how many of the classes under https://github.com/vsch/flexmark-java/tree/master/flexmark/src/main/java/com/vladsch/flexmark/ast should I provide handlers for in order to achieve that?
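For question 2, the kind of logging I have in mind is roughly the following. This is a minimal sketch that walks the tree by hand and records each node's class name next to the source text it covers. `AstDump` and `describe` are made-up names, and the `Node` import assumes a recent flexmark-java release (older releases have `Node` under `com.vladsch.flexmark.ast`).

```scala
import com.vladsch.flexmark.parser.Parser
import com.vladsch.flexmark.util.ast.Node

// Hypothetical helper, not part of flexmark-java.
object AstDump {
  // One line per node: indentation by depth, class name, and the source
  // characters the node spans (newlines escaped so each node stays on one line).
  def describe(node: Node, depth: Int = 0): List[String] = {
    val preview = node.getChars.toString.replace("\n", "\\n")
    val self = ("  " * depth) + node.getClass.getName + ": \"" + preview + "\""
    // Follow the sibling chain: first child, then getNext until it returns null.
    val children = Iterator
      .iterate(node.getFirstChild)(_.getNext)
      .takeWhile(_ != null)
      .flatMap(child => describe(child, depth + 1))
      .toList
    self :: children
  }

  def main(args: Array[String]): Unit = {
    val document = Parser.builder().build().parse("Use `git status`")
    describe(document).foreach(println)
  }
}
```

If this works the way I expect, the class names it prints would also be a starting list of the node types that need a handler for question 3.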
Sorry again if some of my questions are unclear!
Thanks,
Brent