What is the proposed feature?
Problem
html-to-markdown renders GFM tables with column padding so cells are visually aligned in the source Markdown:
| name      | a   | b    |
| --------- | --- | ---- |
| short     | 1   | 2    |
This is great for human readability but wasteful when the Markdown is consumed by an LLM (RAG pipelines, embedding-based retrieval, prompt construction). The padding is pure whitespace that:
- inflates token count significantly: in our test case (pricing plan-comparison pages), padding made up ~57% of the rendered article body;
- pollutes embeddings (whitespace runs become part of the chunk);
- raises inference cost with no semantic benefit, since the rendered HTML is identical for both forms.
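To make the overhead concrete, here is a minimal sketch in Python that renders the same rows padded and compact and compares their lengths. `render_table` is a hypothetical helper, not html-to-markdown's actual renderer, and character count is only a rough proxy for tokens; the real ratio depends on the tokenizer and on how uneven the column widths are.

```python
def render_table(rows: list[list[str]], padded: bool = True) -> str:
    """Render rows (first row = header) as a GFM table.

    padded=True pads every cell to its column's widest cell (floor of 3, so the
    delimiter row is at least '---'); padded=False emits the minimal form.
    Illustrative sketch only, not html-to-markdown's real implementation.
    """
    ncols = len(rows[0])
    if padded:
        widths = [max(3, *(len(row[i]) for row in rows)) for i in range(ncols)]
    else:
        widths = [3] * ncols  # only the delimiter row uses these; cells stay unpadded

    def line(cells: list[str]) -> str:
        if padded:
            cells = [c.ljust(w) for c, w in zip(cells, widths)]
        return "| " + " | ".join(cells) + " |"

    header, *body = rows
    delimiter = "| " + " | ".join("-" * w for w in widths) + " |"
    return "\n".join([line(header), delimiter, *(line(r) for r in body)])


rows = [["name", "a", "b"], ["short", "1", "2"]]
padded = render_table(rows)
compact = render_table(rows, padded=False)
saved = 1 - len(compact) / len(padded)
print(f"{len(padded)} -> {len(compact)} chars ({saved:.0%} saved)")
```

Even on this two-row table the compact form is roughly 17% shorter; wide tables where one long cell sets a column's width pay that cost on every row.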
Proposed feature
A new option, something like compact_tables=True (or a more general table_style="compact" | "padded"), that emits the same table without column padding:
| name | a | b |
| --- | --- | --- |
| short | 1 | 2 |
Requirements
- valid GFM (renders identically in any GFM-compatible viewer);
- alignment markers preserved: :-- (left), :-: (center), --: (right), --- (default, no explicit alignment);
- cell content untouched (only inter-cell whitespace collapsed);
- separator line uses the minimum three-character marker per column (---, :--, :-:, --:).
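A dependency-free post-processor covering these requirements could look like the sketch below. It is a heuristic, line-based pass rather than a full GFM parser: it assumes a table starts at a line containing a pipe that is immediately followed by a delimiter row (also containing a pipe) with the same number of cells, that body rows continue until the first line without a pipe, and it leaves fenced code blocks alone. `compact_tables` and its helpers are hypothetical names, not part of html-to-markdown.

```python
import re

# A GFM delimiter cell: optional colon, one or more hyphens, optional colon.
_DELIM_CELL = re.compile(r"^\s*(:?)-+(:?)\s*$")
# Split only on pipes that are not backslash-escaped, so "\|" stays inside a cell.
_UNESCAPED_PIPE = re.compile(r"(?<!\\)\|")


def _split_cells(line: str) -> list[str]:
    """Split a table row into stripped cells, dropping the outer pipes."""
    inner = line.strip()
    if inner.startswith("|"):
        inner = inner[1:]
    if inner.endswith("|") and not inner.endswith("\\|"):
        inner = inner[:-1]
    return [cell.strip() for cell in _UNESCAPED_PIPE.split(inner)]


def _is_delimiter_row(line: str) -> bool:
    return "|" in line and all(_DELIM_CELL.match(c) for c in _split_cells(line))


def _compact_delimiter(cell: str) -> str:
    """Map a delimiter cell to its shortest three-character form, keeping alignment."""
    left, right = _DELIM_CELL.match(cell).groups()
    if left and right:
        return ":-:"
    if left:
        return ":--"
    if right:
        return "--:"
    return "---"


def _row(cells: list[str]) -> str:
    return "| " + " | ".join(cells) + " |"


def compact_tables(markdown: str) -> str:
    """Collapse GFM table padding; cell text and alignment markers are preserved."""
    lines = markdown.split("\n")
    out: list[str] = []
    in_fence = False
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.lstrip().startswith(("```", "~~~")):
            in_fence = not in_fence  # never touch tables shown inside code fences
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        header = _split_cells(line)
        if (not in_fence and "|" in line and _is_delimiter_row(nxt)
                and len(_split_cells(nxt)) == len(header)):
            out.append(_row(header))
            out.append(_row([_compact_delimiter(c) for c in _split_cells(nxt)]))
            i += 2
            # Body rows: everything up to the first line without a pipe.
            while i < len(lines) and "|" in lines[i]:
                out.append(_row(_split_cells(lines[i])))
                i += 1
            continue
        out.append(line)
        i += 1
    return "\n".join(out)


example = "| name      | a   | b    |\n| --------- | :-: | ---: |\n| short     | 1   | 2    |"
print(compact_tables(example))
```

Because only unescaped pipes are split on, an escaped `\|` inside a cell survives unchanged, and the pass is idempotent, so running it over already-compact output is harmless. Indented code blocks and GFM's pipe-less continuation rows are not recognized by this heuristic.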
Why would this be a good addition?
Current workaround
We post-process the output by re-parsing it through mistletoe with a custom MarkdownRenderer that overrides calculate_table_column_widths, table_separator_line_to_text, and table_row_to_line to skip padding. It works, but it adds a second Markdown round-trip and an extra dependency just to strip whitespace.
Use case
Document-ingestion pipelines for RAG / LLM applications where token cost and chunking quality matter more than source-side readability.