Sawsqr68 · Copilot · Dec 29, 2025
diff --git a/README.rst b/README.rst
@@ -110,10 +110,14 @@ functionality:
   walking) under CPython (but *not* PyPy where it is known to cause
   segfaults);
 
-- ``genshi`` has a treewalker (but not builder); and
+- ``genshi`` has a treewalker (but not builder);
 
 - ``chardet`` can be used as a fallback when character encoding cannot
-  be determined.
+  be determined; and
+
+- ``beautifulsoup4`` can use html5lib as a parser backend for
+  HTML5-compliant parsing: pass ``'html5lib'`` as the parser name
+  when creating a ``BeautifulSoup`` object.
 
 
 Bugs

diff --git a/doc/beautifulsoup.rst b/doc/beautifulsoup.rst
@@ -0,0 +1,173 @@
+BeautifulSoup Integration
+=========================
+
+html5lib can be used as a parser backend for `BeautifulSoup 4 <https://www.crummy.com/software/BeautifulSoup/>`_,
+providing HTML5-compliant parsing for your BeautifulSoup projects.
+
+Using html5lib with BeautifulSoup
+----------------------------------
+
+To use html5lib as your BeautifulSoup parser, pass ``'html5lib'`` as the parser name (the second argument, ``features``):
+
+.. code-block:: python
+
+    from bs4 import BeautifulSoup
+
+    markup = '<p>Hello <span>World</span>!</p>'
+    soup = BeautifulSoup(markup, 'html5lib')
+
+    print(soup.prettify())
+
+This will output:
+
+.. code-block:: html
+
+    <html>
+     <head>
+     </head>
+     <body>
+      <p>
+       Hello
+       <span>
+        World
+       </span>
+       !
+      </p>
+     </body>
+    </html>
+
+Key Differences from Other Parsers
+-----------------------------------
+
+When using html5lib with BeautifulSoup, there are some important differences compared to other parsers:
+
+Document Structure
+~~~~~~~~~~~~~~~~~~
+
+html5lib always builds a complete HTML5 document, adding any missing ``<html>``, ``<head>``, and ``<body>`` elements, even when the input is only a fragment:
+
+.. code-block:: python
+
+    from bs4 import BeautifulSoup
+
+    markup = '<p>Fragment</p>'
+
+    # With html5lib - adds full document structure
+    soup_html5lib = BeautifulSoup(markup, 'html5lib')
+    print(soup_html5lib.html is not None)  # True
+    print(soup_html5lib.body is not None)  # True
+
+    # With html.parser - keeps it as a fragment
+    soup_htmlparser = BeautifulSoup(markup, 'html.parser')
+    print(soup_htmlparser.html is None)  # True
+    print(soup_htmlparser.body is None)  # True
+
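+If you need the fragment itself rather than the full document, one minimal approach is to serialize the children of ``<body>`` with BeautifulSoup's ``Tag.decode_contents()``. This assumes the fragment is body content; markup that html5lib places in ``<head>``, such as a ``<title>``, would not appear there:
+
+.. code-block:: python
+
+    from bs4 import BeautifulSoup
+
+    markup = '<p>Fragment</p>'
+    soup = BeautifulSoup(markup, 'html5lib')
+
+    # html5lib wrapped the fragment in <html><head></head><body>...</body></html>
+    print(str(soup))  # <html><head></head><body><p>Fragment</p></body></html>
+
+    # The original fragment is the content of <body>
+    fragment = soup.body.decode_contents()
+    print(fragment)  # <p>Fragment</p>
+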
+Error Handling
+~~~~~~~~~~~~~~
+
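+html5lib follows the HTML5 specification's error-handling rules, which define how to recover from any invalid markup rather than rejecting it. In practice it will:
+
+- Close elements that are implicitly ended, such as an open ``<p>`` when a new ``<p>`` starts
+- Repair misnested tags such as ``<b><i></b></i>``
+- Recover from other invalid markup the same way a conforming browser does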
+
+.. code-block:: python
+
+    from bs4 import BeautifulSoup
+
+    # Malformed HTML with missing closing tags
+    markup = '<p>Paragraph 1<p>Paragraph 2'
+    soup = BeautifulSoup(markup, 'html5lib')
+
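+    # html5lib closes the first <p> when the second one starts, so the
+    # paragraphs are siblings inside <body> rather than nested
+    paragraphs = soup.find_all('p')
+    print(len(paragraphs))  # 2
+    print(paragraphs[1].parent.name)  # body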
+
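+Misnested tags are repaired by the specification's "adoption agency" algorithm. A short sketch using the markup from the specification's own section on misnested tags, ``<p>1<b>2<i>3</b>4</i>5</p>``:
+
+.. code-block:: python
+
+    from bs4 import BeautifulSoup
+
+    # </b> arrives while <i> is still open inside <b>
+    markup = '<p>1<b>2<i>3</b>4</i>5</p>'
+    soup = BeautifulSoup(markup, 'html5lib')
+
+    # <i> is closed inside <b>, then reopened after it for the text "4"
+    print(soup.p)  # <p>1<b>2<i>3</i></b><i>4</i>5</p>
+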
+Encoding Detection
+~~~~~~~~~~~~~~~~~~
+
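+When you pass bytes rather than a ``str``, html5lib detects the character encoding itself using the HTML5 encoding-sniffing rules (a byte order mark, then a ``<meta charset>`` declaration near the start of the document, then a fallback guess), and BeautifulSoup records the result in ``soup.original_encoding``. A ``str`` is already decoded, so no detection takes place:
+
+.. code-block:: python
+
+    from bs4 import BeautifulSoup
+
+    # Bytes encoded as windows-1252, with a matching <meta charset> declaration
+    markup = '<meta charset="windows-1252"><p>Héllo Wörld</p>'.encode('windows-1252')
+    soup = BeautifulSoup(markup, 'html5lib')
+
+    print(soup.original_encoding)  # windows-1252
+    print('Héllo' in soup.get_text())  # True
+    print('Wörld' in soup.get_text())  # True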
+
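+When the HTML comes from a file, open it in binary mode so that html5lib receives the raw bytes and can sniff the encoding itself; text mode would decode the file with whatever encoding ``open()`` defaults to on your platform. A minimal sketch, in which the helper ``parse_html_file`` and the temporary demo file are hypothetical:
+
+.. code-block:: python
+
+    import os
+    import tempfile
+
+    from bs4 import BeautifulSoup
+
+    def parse_html_file(path):
+        """Hypothetical helper: parse a file, letting html5lib detect its encoding."""
+        with open(path, 'rb') as f:  # binary mode: BeautifulSoup reads bytes, not str
+            return BeautifulSoup(f, 'html5lib')
+
+    # Create a small windows-1252 file for the demonstration
+    data = '<meta charset="windows-1252"><title>Café</title>'.encode('windows-1252')
+    fd, path = tempfile.mkstemp(suffix='.html')
+    with os.fdopen(fd, 'wb') as f:
+        f.write(data)
+
+    try:
+        soup = parse_html_file(path)
+        print(soup.title.string)  # Café
+        print(soup.original_encoding)  # windows-1252
+    finally:
+        os.remove(path)
+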
+When to Use html5lib
+--------------------
+
+Consider using html5lib with BeautifulSoup when you need:
+
+- **HTML5 compliance**: You want parsing that matches how modern web browsers handle HTML
+- **Robust error handling**: You're dealing with malformed or broken HTML
+- **Consistent behavior**: You want parsing that follows the HTML5 specification's parsing algorithm rather than parser-specific heuristics
+- **Encoding detection**: You're working with documents in various character encodings
+
+Performance Considerations
+--------------------------
+
+html5lib prioritizes correctness and compliance over speed. If you're parsing large amounts of HTML and performance is critical, consider ``'lxml'`` (which requires the lxml package) or Python's built-in ``'html.parser'``. However, if correctness and compliance with HTML5 standards matter more than raw speed, html5lib is an excellent choice.
+
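+Relative speed depends on your documents, so it is worth measuring on a representative sample. A rough sketch using the standard ``timeit`` module; the helper ``time_parsers`` and the synthetic markup are illustrative assumptions, and you can add ``'lxml'`` to the tuple if that package is installed:
+
+.. code-block:: python
+
+    import timeit
+    from functools import partial
+
+    from bs4 import BeautifulSoup
+
+    def time_parsers(markup, parsers=('html.parser', 'html5lib'), number=10):
+        """Hypothetical helper: average seconds per parse for each parser name."""
+        results = {}
+        for name in parsers:
+            # partial() binds the markup and parser name into a zero-argument callable
+            total = timeit.timeit(partial(BeautifulSoup, markup, name), number=number)
+            results[name] = total / number
+        return results
+
+    # Synthetic input; substitute a sample of the HTML you actually parse
+    items = ''.join('<li><a href="/item{0}">Item {0}</a></li>'.format(i) for i in range(500))
+    markup = '<ul>{}</ul>'.format(items)
+
+    for name, seconds in time_parsers(markup).items():
+        print('{}: {:.4f} s per parse'.format(name, seconds))
+
+Absolute numbers vary by machine; only the ratio between parsers on your own input is meaningful.
+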
+Installation
+------------
+
+To use html5lib with BeautifulSoup, you need to install both packages:
+
+.. code-block:: bash
+
+    pip install beautifulsoup4 html5lib
+
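+If html5lib is not installed, ``BeautifulSoup(markup, 'html5lib')`` raises ``bs4.FeatureNotFound``. A minimal sketch of code that prefers html5lib but degrades to the built-in parser; the helper name ``make_soup`` is hypothetical:
+
+.. code-block:: python
+
+    from bs4 import BeautifulSoup, FeatureNotFound
+
+    def make_soup(markup):
+        """Hypothetical helper: use html5lib if installed, else html.parser."""
+        try:
+            return BeautifulSoup(markup, 'html5lib')
+        except FeatureNotFound:
+            # No installed tree builder matches the name 'html5lib'
+            return BeautifulSoup(markup, 'html.parser')
+
+    soup = make_soup('<p>Hello</p>')
+    print(soup.p.get_text())  # Hello
+
+Keep in mind that the two parsers can build different trees from the same invalid markup, so a silent fallback may change your results.
+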
+Limitations
+-----------
+
+When using html5lib with BeautifulSoup, note these limitations:
+
+- Passing a ``SoupStrainer`` as ``parse_only`` has no effect: BeautifulSoup issues a warning and parses the entire document
+- Some BeautifulSoup features that depend on custom element types may not work
+- html5lib is generally slower than both ``lxml`` and ``html.parser``
+
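+A quick way to see the ``parse_only`` limitation for yourself is to record warnings while parsing. This sketch uses the standard ``warnings`` module and checks only for the substring ``parse_only``, since the exact message wording may differ between BeautifulSoup versions:
+
+.. code-block:: python
+
+    import warnings
+
+    from bs4 import BeautifulSoup, SoupStrainer
+
+    markup = '<p>Intro</p><a href="/page1">link</a>'
+
+    with warnings.catch_warnings(record=True) as caught:
+        warnings.simplefilter('always')
+        # Ask for <a> tags only; the html5lib tree builder ignores this request
+        soup = BeautifulSoup(markup, 'html5lib', parse_only=SoupStrainer('a'))
+
+    print(any('parse_only' in str(w.message) for w in caught))  # True
+    # The <p> is still present because the whole document was parsed
+    print(soup.find('p') is not None)  # True
+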
+Example: Complete Workflow
+---------------------------
+
+Here's a complete example showing how to use html5lib with BeautifulSoup:
+
+.. code-block:: python
+
+    from bs4 import BeautifulSoup
+
+    # Read HTML from a file or string
+    html_content = '''
+    <html>
+        <head><title>Example Page</title></head>
+        <body>
+            <h1>Welcome</h1>
+            <p>This is a <a href="/page1">link</a></p>
+            <p>Another <a href="/page2">link</a></p>
+        </body>
+    </html>
+    '''
+
+    # Parse with html5lib for HTML5-compliant parsing
+    soup = BeautifulSoup(html_content, 'html5lib')
+
+    # Navigate the parse tree
+    title = soup.find('title')
+    print('Page title: {}'.format(title.get_text()))
+
+    # Find all links
+    links = soup.find_all('a')
+    for link in links:
+        href = link.get('href')
+        text = link.get_text()
+        print('{}: {}'.format(text, href))
+
+See Also
+--------
+
+- `BeautifulSoup Documentation <https://www.crummy.com/software/BeautifulSoup/bs4/doc/>`_
+- `HTML5 Specification <https://html.spec.whatwg.org/>`_
diff --git a/doc/index.rst b/doc/index.rst
@@ -8,6 +8,7 @@ Overview
    :maxdepth: 2
 
    movingparts
+   beautifulsoup
    modules
    changes
    License <license>

diff --git a/examples/README.md b/examples/README.md
@@ -0,0 +1,22 @@
+# html5lib Examples
+
+This directory contains example scripts demonstrating various uses of html5lib.
+
+## BeautifulSoup Integration
+
+**File:** `beautifulsoup_example.py`
+
+This example demonstrates how to use html5lib as a parser backend for BeautifulSoup. It compares the behavior of html5lib with Python's built-in html.parser and shows the advantages of using html5lib for HTML5-compliant parsing.
+
+To run:
+```bash
+python beautifulsoup_example.py
+```
+
+Requirements:
+- beautifulsoup4
+- html5lib
+
+## About html5lib
+
+html5lib is a pure-Python library for parsing HTML. It is designed to conform to the WHATWG HTML specification, as implemented by all major web browsers. This makes it particularly useful when you need parsing behavior that matches what browsers do, rather than just parsing valid HTML.
diff --git a/examples/beautifulsoup_example.py b/examples/beautifulsoup_example.py
@@ -0,0 +1,83 @@
+#!/usr/bin/env python
+"""
+Example demonstrating html5lib integration with BeautifulSoup.
+
+This example shows how to use html5lib as a parser backend for BeautifulSoup,
+providing HTML5-compliant parsing with robust error handling for malformed HTML.
+"""
+
+from __future__ import print_function
+
+import sys
+
+try:
+    from bs4 import BeautifulSoup
+    import html5lib
+except ImportError:
+    print("Error: beautifulsoup4 and html5lib are required to run this example.")
+    print("Install them with: pip install beautifulsoup4 html5lib")
+    sys.exit(1)
+
+
+def main():
+    print("=" * 60)
+    print("html5lib with BeautifulSoup - Example")
+    print("=" * 60)
+
+    # Test markup with potential parsing challenges
+    markup = '''
+    <html>
+        <body>
+            <p>Hello <span>World</span>!</p>
+            <p>Unclosed paragraph
+            <div>Nested content</div>
+        </body>
+    </html>
+    '''
+
+    print("\n1. Parsing with html5lib via BeautifulSoup:")
+    print("-" * 60)
+    soup_html5lib = BeautifulSoup(markup, 'html5lib')
+    print("Parser used: html5lib")
+    print("Found <p> tags:", len(soup_html5lib.find_all('p')))
+    print("Found <span> tags:", len(soup_html5lib.find_all('span')))
+    print("Found <div> tags:", len(soup_html5lib.find_all('div')))
+    # The tag counts can match between parsers; the parent of <div> shows
+    # whether the unclosed <p> was closed before it (HTML5 rules) or left open
+    print("Parent of <div>:", soup_html5lib.div.parent.name)
+
+    print("\n2. Parsing with html.parser via BeautifulSoup:")
+    print("-" * 60)
+    soup_htmlparser = BeautifulSoup(markup, 'html.parser')
+    print("Parser used: html.parser")
+    print("Found <p> tags:", len(soup_htmlparser.find_all('p')))
+    print("Found <span> tags:", len(soup_htmlparser.find_all('span')))
+    print("Found <div> tags:", len(soup_htmlparser.find_all('div')))
+    print("Parent of <div>:", soup_htmlparser.div.parent.name)
+
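+    print("\n3. Direct html5lib parsing:")
+    print("-" * 60)
+    # Without BeautifulSoup, html5lib.parse() uses its default "etree" tree
+    # builder and returns the root xml.etree.ElementTree element
+    doc = html5lib.parse(markup)
+    print("Parser: html5lib (direct)")
+    print("Document type:", type(doc))
+    print("Root tag:", doc.tag)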
+
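+    print("\n4. Comparing results:")
+    print("-" * 60)
+
+    # Test with malformed HTML: none of the <p> tags is ever closed
+    malformed = '<p>First<p>Second<p>Third'
+
+    soup_html5lib = BeautifulSoup(malformed, 'html5lib')
+    soup_htmlparser = BeautifulSoup(malformed, 'html.parser')
+
+    # The paragraph counts can agree; the parents show the structure.
+    # Under HTML5 rules each new <p> closes the open one, so html5lib
+    # makes every paragraph a direct child of <body>.
+    print("Malformed HTML: {}".format(malformed))
+    for name, soup in (('html5lib', soup_html5lib), ('html.parser', soup_htmlparser)):
+        paragraphs = soup.find_all('p')
+        print("{} found {} paragraphs, parents: {}".format(
+            name, len(paragraphs), [p.parent.name for p in paragraphs]))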
+
+    print("\n" + "=" * 60)
+    print("CONCLUSION:")
+    print("=" * 60)
+    print("html5lib works correctly as a BeautifulSoup parser backend.")
+    print("It provides HTML5-compliant parsing with robust error handling.")
+    print("The choice between 'html5lib' and 'html.parser' depends on your")
+    print("specific needs for compliance vs. performance.")
+    print("=" * 60)
+
+
+if __name__ == '__main__':
+    main()