Adding TextCollectingVisitor class handlers #93

@brentgracey

Running the following Scala code:

val textCollectingVisitor = new TextCollectingVisitor()
textCollectingVisitor.collectAndGetSegments(document)

against this Markdown:

Use `git status`

Git commands
```
git status
git add
git commit
```

This site was built using [GitHub Pages](https://pages.github.com/).

I get the following strings extracted (which, by the way, was a really huge start on exactly what I was trying to get to!):

"Use",
"Git commands",
"This site was built using",
"GitHub Pages"

I would also like to extract the following strings:

"git status",
"git status
git add
git commit",
"https://pages.github.com/"

1.) Would the right way to approach this be to add additional VisitHandlers to the NodeVisitor in an updated version of TextCollectingVisitor?

2.) Any suggestions on how I can figure out which classes the strings that were not extracted are parsed into? (Not sure if that question is understandable.) For example, does "git status" get parsed as a com.vladsch.flexmark.ast.Code element in the AST? Could I override some method and add a line of logging so that it tells me which AST node is being processed?

3.) I'm looking to extract all "human readable text" in an md document, stripping out only the content that relates to formatting. I'm not sure if my question is clear enough to be answerable, but here goes: how many of the classes under https://github.com/vsch/flexmark-java/tree/master/flexmark/src/main/java/com/vladsch/flexmark/ast should I provide handlers for in order to achieve that?
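
To make questions 1 to 3 concrete, here is a rough sketch of the kind of thing being asked about, not a confirmed flexmark-java recipe. It walks the tree by hand with getFirstChild/getNext instead of extending TextCollectingVisitor, prints each node's class name, and collects strings from a few node types. The object name AstTextDump and its helpers are made up for illustration. The package of Node, and the accessors Code.getText, FencedCodeBlock.getContentChars and Link.getUrl, are assumptions that should be checked against the flexmark-java version in use.

```scala
import com.vladsch.flexmark.ast.{Code, FencedCodeBlock, Link, Text}
import com.vladsch.flexmark.parser.Parser
// Assumption: recent releases keep Node in util.ast; older ones may use com.vladsch.flexmark.ast.Node.
import com.vladsch.flexmark.util.ast.Node

import scala.collection.mutable.ListBuffer

// Hypothetical helper, not part of flexmark-java.
object AstTextDump {
  // Append a trimmed, non-empty string to the result buffer.
  private def add(out: ListBuffer[String], s: CharSequence): Unit = {
    val trimmed = s.toString.trim
    if (trimmed.nonEmpty) out += trimmed
  }

  // Visit every direct child of `node`, one level deeper.
  private def walkChildren(node: Node, depth: Int, out: ListBuffer[String]): Unit = {
    var child = node.getFirstChild
    while (child != null) {
      walk(child, depth + 1, out)
      child = child.getNext
    }
  }

  // Depth-first walk: log the node class, then collect text from selected node types.
  def walk(node: Node, depth: Int, out: ListBuffer[String]): Unit = {
    println(("  " * depth) + node.getClass.getSimpleName)
    node match {
      case t: Text            => add(out, t.getChars)
      // Code nodes are treated as leaves so their content is not collected twice.
      case c: Code            => add(out, c.getText)
      case f: FencedCodeBlock => add(out, f.getContentChars)
      case l: Link =>
        walkChildren(l, depth, out) // link text first
        add(out, l.getUrl)          // then the target URL
      case _                  => walkChildren(node, depth, out)
    }
  }

  def collect(markdown: String): List[String] = {
    val document = Parser.builder().build().parse(markdown)
    val out = ListBuffer.empty[String]
    walk(document, 0, out)
    out.toList
  }

  def main(args: Array[String]): Unit = {
    val fence = "`" * 3 // built at runtime to avoid a literal fence in this listing
    val markdown = Seq(
      "Use `git status`",
      "",
      "Git commands",
      fence,
      "git status",
      "git add",
      "git commit",
      fence,
      "",
      "This site was built using [GitHub Pages](https://pages.github.com/)."
    ).mkString("\n")
    collect(markdown).foreach(s => println("\"" + s + "\""))
  }
}
```

The printed class names would show which AST node each piece of the input ends up in, and the match block is where further node types could be added if more text needs to be picked up.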

Sorry again if some of my questions are unclear!

Thanks,
Brent
