monk

monk is a command-line tool for parsing HTML and applying CSS selectors. It reads HTML from standard input, a local file, or a URL, applies one or more CSS selectors, and prints the matching nodes. Output can be formatted as an indented tree (default), plain text, JSON, or a single attribute value.

Installation

go install github.com/janbodnar/monk@latest

Or build from source:

git clone https://github.com/janbodnar/monk
cd monk
go build -o monk .

Usage

monk [flags] [selectors]

HTML is read from standard input unless -f or -u is given.

xh https://example.com | monk h1
monk -f page.html "ul li"
monk -u https://example.com "a"

Flags

| Flag | Description |
| --- | --- |
| `-c` | Print result with color |
| `-f <file>` | Read HTML from a file |
| `-u <url>` | Fetch HTML from a URL |
| `-i <n\|char>` | Number of spaces (or character) to use for indentation |
| `-n` | Print the number of matched elements |
| `-l <n>` | Restrict output to `n` levels deep |
| `-p` | Don't escape HTML entities in output |
| `-r` | Raw output (no newlines between tags) |
| `--pre` | Preserve whitespace inside `<pre>` elements |
| `--charset <cs>` | Specify the character set of the input |
| `--json` | Output matched nodes as JSON |
| `--text` | Output the text content of matched nodes |
| `--attr <name>` | Output the value of the named attribute |
| `-v`, `--version` | Display version |
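
As a sketch of how the flags combine in a script, the following shell function wraps `-f` and `--attr` to list the distinct link targets in a file. The function name `unique_links` is hypothetical, and the sketch assumes monk prints one attribute value per line, which the flag descriptions do not state explicitly.

```shell
# unique_links FILE: print each distinct href found in FILE, sorted.
# Assumption: monk writes one attribute value per line.
unique_links() {
  monk -f "$1" --attr href a | sort -u
}

# Example call (requires monk on PATH and an existing page.html):
# unique_links page.html
```

Keeping the selector and flags inside a function makes it easy to swap `href` for another attribute such as `src` if that is what a script needs.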

Selectors

monk supports standard CSS selectors via goquery, plus several extensions:

| Selector | Description |
| --- | --- |
| `tag` | Match elements by tag name |
| `#id` | Match element by id attribute |
| `.class` | Match elements by class name |
| `[attr]` | Match elements that have the attribute |
| `[attr=val]` | Match elements where attribute equals value |
| `a b` | Descendant: match `b` anywhere inside `a` |
| `a > b` | Child: match `b` that is a direct child of `a` |
| `a + b` | Adjacent sibling: match `b` immediately after `a` |
| `sel1, sel2` | Union: match elements from either selector |
| `:nth-child(n)` | Match element that is the nth child |
| `:first-child` | Match the first child element |
| `:last-child` | Match the last child element |
| `:contains("text")` | Match elements with a direct text child containing `text` |
| `:matches("regex")` | Match elements with a direct text child matching the regex |
| `:parent-of(sel)` | Match elements that have a direct child matching `sel` |
| `head(n)` | Keep only the first `n` matched elements |
| `tail(n)` | Keep only the last `n` matched elements |

Usage Examples

| Command | Description |
| --- | --- |
| `monk -f page.html h1` | Select all `<h1>` elements from a file |
| `monk -u https://example.com title` | Fetch a URL and select the `<title>` element |
| `curl -s https://example.com \| monk p` | Pipe HTML and select all `<p>` elements |
| `monk -f page.html "ul li"` | Select all `<li>` elements inside `<ul>` |
| `monk -f page.html "div > p"` | Select `<p>` elements that are direct children of `<div>` |
| `monk -f page.html "h2 + p"` | Select `<p>` immediately following an `<h2>` |
| `monk -f page.html --text p` | Print the text content of all `<p>` elements |
| `monk -f page.html --attr href a` | Print the `href` attribute of all `<a>` elements |
| `monk -f page.html --json ul` | Output matched `<ul>` nodes as JSON |
| `monk -f page.html -n li` | Print the count of matched `<li>` elements |
| `monk -f page.html -l 2 body` | Print the `<body>` tree up to 2 levels deep |
| `monk -f page.html "li:contains(\"blue\")"` | Select `<li>` elements containing the text `blue` |
| `monk -f page.html "li:matches(\"^g\")"` | Select `<li>` elements whose text starts with `g` |
| `monk -f page.html "ul li head(3)"` | Select the first 3 `<li>` elements |
| `monk -f page.html "ul li tail(2)"` | Select the last 2 `<li>` elements |
| `monk -f page.html "h1, h2"` | Select all `<h1>` and `<h2>` elements |
| `monk -f page.html "#one p"` | Select `<p>` inside the element with `id="one"` |
| `monk -f page.html ".level-1 p"` | Select `<p>` inside elements with `class="level-1"` |
| `monk -f page.html -c ul li` | Print matched nodes with syntax highlighting |
| `monk -f page.html -r p` | Raw output without extra newlines |
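
The commands above all read a `page.html` that is not shown. To try them locally, a small file along the following lines should exercise most of the selectors. Its contents are made up for illustration and are not taken from the repository. The monk calls at the end are copied from the table and run only if monk is installed.

```shell
# Write a small sample document; every tag, id, and class here is hypothetical.
cat > page.html <<'EOF'
<!DOCTYPE html>
<html>
<head><title>Sample page</title></head>
<body>
  <h1>Colors</h1>
  <div id="one" class="level-1">
    <h2>Primary</h2>
    <p>Pick a color from the list.</p>
    <ul>
      <li>red</li>
      <li>green</li>
      <li>blue</li>
      <li>gray</li>
    </ul>
    <a href="https://example.com">Example</a>
  </div>
</body>
</html>
EOF

# Run a few of the documented commands, but only if monk is on PATH.
if command -v monk >/dev/null 2>&1; then
  monk -f page.html "h2 + p"
  monk -f page.html "li:contains(\"blue\")"
  monk -f page.html "#one p"
  monk -f page.html -n li
fi
```

The quoted heredoc delimiter (`'EOF'`) keeps the shell from expanding anything inside the HTML, so the file is written exactly as typed.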
