view .github/workflows/build-xapian.yml @ 8491:520075b29474

feat: support justhtml parsing library to convert email to plain text justhtml is an pure python, fast, HTML5 compliant parser. It is now an option for converting html only emails to plain text. Its output format differs slightly from dehtml or beautifulsoup. Mostly by removing extra blank lines. dehtml.py: Using the stream parser of justhtml. Unable to get the full document parser to successfully strip script and style blocks. If I can fix this and use the standard parser, I can in theory generate markdown from the DOM tree generated by justhtml. Updated test case to include inline elements that should not cause a line break when they are encountered. Running dehtml as: `python roundup/dehtml.py foo.html` will load foo.html and parse it using all available parsers. configuration.py: justhtml is available as an option. docs: updated CHANGES.txt, doc/tracker_config.txt added beautifulsoup and justhtml to the optional software section of doc/installtion.txt. test_mailgw.py, .github/workflows/ci-test Updated tests and install justhtml as part of CI.
author John Rouillard <rouilj@ieee.org>
date Sun, 14 Dec 2025 22:40:46 -0500
parents 4e0944649af7
children 951db0950174
line wrap: on
line source


name: build-xapian

on: 
    push:
         # skip if github.ref is 'refs/heads/maint-1.6'
         #   aka github.ref_name of 'maint-1.6'
         # see https://github.com/orgs/community/discussions/26253
         # for mechanism to control matrix based on branch
         branches: [ "*", '!maint-1.6' ]
    workflow_dispatch:
      inputs:
        debug_enabled:
          type: boolean
          description: 'Run the build with tmate debugging enabled (https://github.com/marketplace/actions/debugging-with-tmate)'
          required: false
          default: false
# GITHUB_TOKEN only has read repo context.
permissions:
  contents: read

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  test:
    name: build xapian
    runs-on: ubuntu-24.04

    env:
     # get colorized pytest output even without a controlling tty
      PYTEST_ADDOPTS: "--color=yes"
      #  OS: ${{ matrix.os }}
      PYTHON_VERSION: 3.13

    steps:
      # Checkout the latest code from the repo
      - name: Checkout source
        # example directives:
          # disable step
        # if: {{ false }}
          # continue running if step fails
        # continue-on-error: true
        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1

        # Setup version of Python to use
      - name: Set Up Python 3.13
        uses: actions/setup-python@83679a892e2d95755f2dac6acb0bfd1e9ac5d548 # v6.1.0
        with:
          python-version: 3.13
          allow-prereleases: true
          cache: 'pip'

      - name: Install build tools - setuptools
        run: pip install setuptools

      # Display the Python version being used
      - name: Display Python and key module versions
        run: |
          python -c "import sys; print('python version: ', sys.version)"
          python -c "import sqlite3; print('sqlite version: ', sqlite3.sqlite_version)"
          python -c "import setuptools; print('setuptools version: ', setuptools.__version__);"

      - name: Update pip
        run: python -m pip install --upgrade pip
            
      # https://github.com/mxschmitt/action-tmate
      # allow remote ssh into the CI container. I need this to debug
      # some xfail cases 
      - name: Setup tmate session
        uses: mxschmitt/action-tmate@v3
        if: ${{ github.event_name == 'workflow_dispatch' && inputs.debug_enabled }}
        timeout-minutes: 60
        with:
          limit-access-to-actor: true

      - name: Install xapian
        run: |
          set -xv
          sudo apt-get install libxapian-dev
          # Sphinx required to build the xapian python bindings. Use 1.8.5 on
          # older python and newest on newer.
          pip install sphinx
          XAPIAN_VER="1.4.22"; echo $XAPIAN_VER;
          cd /tmp
          curl -s -O https://oligarchy.co.uk/xapian/$XAPIAN_VER/xapian-bindings-$XAPIAN_VER.tar.xz
          tar -Jxvf xapian-bindings-$XAPIAN_VER.tar.xz
          cd xapian-bindings-$XAPIAN_VER/
          # edit the configure script.
          # distutils.sysconfig.get_config_vars('SO')  doesn't work for
          # 3.11 or newer.
          # Change distutils.sysconfig... to just sysconfig and SO
          # to EXT_SUFFIX to get valid value.
          # DISABLED use their script
          if [[ $PYTHON_VERSION == "X."* ]]; then
            cp configure configure.FCS;
            sed -i \
              -e '/PYTHON3_SO=/s/distutils\.//g' \
              -e '/PYTHON3_SO=/s/"SO"/"EXT_SUFFIX"/g' \
              -e '/PYTHON3_CACHE_TAG=/s/imp;print(imp.get_tag())/sys;print(sys.implementation.cache_tag)/' \
              -e '/PYTHON3_CACHE_OPT1_EXT=/s/imp\.get_tag()/sys.implementation.cache_tag/g' \
              -e '/PYTHON3_CACHE_OPT1_EXT=/s/imp\b/importlib/g' \
            configure;
            diff -u configure.FCS configure || true;
          fi
          ./configure --prefix=$VIRTUAL_ENV --with-python3 --disable-documentation
          make && sudo make install
          python -c 'import xapian'

Roundup Issue Tracker: http://roundup-tracker.org/