Add BeautifulSoup integration documentation and tests#1

Draft
Copilot wants to merge 6 commits into master from copilot/implement-html-parser-with-bs4

Conversation

Copilot AI commented Dec 29, 2025

html5lib works as a parser backend for BeautifulSoup via BeautifulSoup(markup, 'html5lib'), but this integration lacked documentation and test coverage.

Changes

Tests (html5lib/tests/test_beautifulsoup.py)

  • Parser initialization and basic operation
  • html5lib vs html.parser behavior differences (document structure, error handling)
  • Malformed HTML handling, encoding detection, attribute parsing
  • Empty document and fragment parsing
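The kind of check listed above can be sketched roughly as follows. This is an illustration only, not the contents of html5lib/tests/test_beautifulsoup.py: the helper names (has_document_wrapper, paragraph_texts, run_checks) are hypothetical, and the import guard assumes the tests should be skipped when the optional bs4 dependency is not installed.

```python
# Sketch only: helper names are hypothetical and not taken from the PR.
try:
    from bs4 import BeautifulSoup
    import html5lib  # noqa: F401 (imported only to confirm the backend is installed)
    HAVE_DEPS = True
except ImportError:
    # BeautifulSoup is an optional dependency, so degrade to a skip.
    HAVE_DEPS = False

MARKUP = '<p>Unclosed<p>Another'


def has_document_wrapper(markup, parser):
    """Return True if the parsed tree contains an <html> element."""
    soup = BeautifulSoup(markup, parser)
    return soup.html is not None


def paragraph_texts(markup, parser):
    """Return the text of every <p> element found by the given parser."""
    soup = BeautifulSoup(markup, parser)
    return [p.get_text() for p in soup.find_all('p')]


def run_checks():
    """Run the comparisons; return None (skipped) if bs4/html5lib are absent."""
    if not HAVE_DEPS:
        return None
    # html5lib wraps the fragment in a full document; html.parser does not.
    assert has_document_wrapper(MARKUP, 'html5lib')
    assert not has_document_wrapper(MARKUP, 'html.parser')
    # html5lib closes the first <p> before opening the second.
    assert paragraph_texts(MARKUP, 'html5lib') == ['Unclosed', 'Another']
    # Even an empty document gets the html/head/body skeleton.
    assert has_document_wrapper('', 'html5lib')
    return True


if __name__ == '__main__':
    print(run_checks())
```

The sketch deliberately asserts nothing about how html.parser nests the two paragraphs, since that detail is not stated in this PR.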

Documentation (doc/beautifulsoup.rst)

  • Usage patterns and API
  • Key differences: html5lib builds a full document structure (html, head, body) around the input and follows the HTML5 spec's error-handling rules
  • When to use html5lib (compliance over performance, broken HTML, encoding challenges)
  • Limitations: no SoupStrainer support, slower than lxml
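The SoupStrainer limitation can be illustrated with a small sketch. It assumes that bs4's html5lib tree builder ignores parse_only (emitting a warning and parsing the whole document), while html.parser honours it; the function name strainer_is_honoured and the sample markup are hypothetical.

```python
# Sketch only: the function name and sample markup are hypothetical.
import warnings

try:
    from bs4 import BeautifulSoup, SoupStrainer
    import html5lib  # noqa: F401 (imported only to confirm the backend is installed)
    HAVE_DEPS = True
except ImportError:
    HAVE_DEPS = False


def strainer_is_honoured(parser):
    """Return True if parse_only filtered out the <p> element for this parser."""
    markup = '<p>intro</p><a href="/x">link</a>'
    only_links = SoupStrainer('a')
    with warnings.catch_warnings():
        # Silence the warning bs4 is assumed to emit for unsupported parse_only.
        warnings.simplefilter('ignore')
        soup = BeautifulSoup(markup, parser, parse_only=only_links)
    return soup.find('p') is None


if __name__ == '__main__' and HAVE_DEPS:
    for name in ('html.parser', 'html5lib'):
        print(name, strainer_is_honoured(name))
```

If the assumption holds, code that relies on parse_only to limit memory use or parsing time would need a different backend, or would have to filter the tree after parsing.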

Example (examples/beautifulsoup_example.py)

  • Side-by-side comparison with html.parser
  • Malformed HTML demonstration

README update

  • Added BeautifulSoup to Optional Dependencies

from bs4 import BeautifulSoup

# html5lib provides HTML5-compliant parsing
soup = BeautifulSoup('<p>Unclosed<p>Another', 'html5lib')
# Properly closes tags, adds <html><head></head><body>...</body></html>

# vs html.parser which keeps fragments minimal
soup = BeautifulSoup('<p>Unclosed<p>Another', 'html.parser')
# No wrapping, simpler structure

All code is Python 2.7+ compatible.

Original prompt

BeautifulSoup(markup, "html.parser")


Copilot AI and others added 5 commits December 29, 2025 17:54
Co-authored-by: Sawsqr68 <173052944+Sawsqr68@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add BeautifulSoup HTML parser implementation" to "Add BeautifulSoup integration documentation and tests" Dec 29, 2025
Copilot AI requested a review from Sawsqr68 December 29, 2025 18:04