- Overview
- Workflow Architecture
- Workflows
- Usage Guide
- Secrets Configuration
- Troubleshooting
- Advanced Topics
This repository uses a split release pipeline architecture to optimize release times and provide flexibility. The release process is divided into two independent workflows:
- Release Pipeline (`release.yml`) - Fast PyPI and GitHub release publication
- Docker Release (`docker-release.yml`) - Multi-architecture Docker image builds with caching
Problem: Docker multi-architecture builds take 10-15 minutes, blocking quick package releases.
Solution: Separate Docker builds into an independent workflow that runs in parallel.
Benefits:
- ✅ PyPI package available in ~2-3 minutes
- ✅ GitHub release published immediately
- ✅ Docker images build in parallel (non-blocking)
- ✅ Can rebuild Docker images independently
- ✅ Faster subsequent builds with layer caching
Tag Push (v1.2.3)
│
├─► Release Pipeline (release.yml)
│ ├─ Version validation
│ ├─ Build Python package
│ ├─ Upload to PyPI ✓
│ └─ Create GitHub Release ✓
│ │
│ └─► Triggers Docker Release (docker-release.yml)
│ ├─ Build multi-arch images
│ ├─ Use GitHub Actions cache
│ └─ Push to Docker Hub ✓
│
└─► Total Time:
- PyPI/GitHub: 2-3 minutes
- Docker: 1-15 minutes (parallel)
graph TD
A[Push tag v1.2.3] --> B[release.yml triggered]
B --> C{Version Check}
C -->|Match| D[Build Package]
C -->|Mismatch| E[❌ Fail - Update __version__.py]
D --> F[Upload to PyPI]
F --> G[Create GitHub Release]
G --> H[docker-release.yml triggered]
H --> I[Build Docker Images]
I --> J[Push to Docker Hub]
K[Push tag docker-rebuild-v1.2.3] --> H
File: .github/workflows/release.yml
on:
  push:
    tags:
      - 'v*'        # Matches: v1.2.3, v2.0.0, etc.
      - '!test-v*'  # Excludes: test-v1.2.3
# Extracts version from tag
v1.2.3 → 1.2.3
Validates that the git tag matches crawl4ai/__version__.py:
# crawl4ai/__version__.py must contain:
__version__ = "1.2.3" # Must match tag v1.2.3
Failure Example:
Tag version: 1.2.3
Package version: 1.2.2
❌ Version mismatch! Please update crawl4ai/__version__.py
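The version check described above can be sketched in Python. This is a minimal illustration, not the workflow's actual step: the function names `version_from_tag`, `version_from_source`, and `check_release` are hypothetical, and the sketch assumes `__version__.py` contains a plain string assignment.

```python
import re


def version_from_tag(tag: str) -> str:
    """Strip the leading 'v' from a release tag such as 'v1.2.3'."""
    if not tag.startswith("v"):
        raise ValueError(f"not a release tag: {tag!r}")
    return tag[1:]


def version_from_source(source: str) -> str:
    """Read the __version__ string from the text of __version__.py without importing it."""
    match = re.search(r'^__version__\s*=\s*["\']([^"\']+)["\']', source, re.MULTILINE)
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)


def check_release(tag: str, source: str) -> None:
    """Exit with an error message when the tag and package versions differ."""
    tag_version = version_from_tag(tag)
    package_version = version_from_source(source)
    if tag_version != package_version:
        raise SystemExit(
            f"Version mismatch! Tag: {tag_version}, Package: {package_version}"
        )


# Matching versions pass silently.
check_release("v1.2.3", '__version__ = "1.2.3"\n')
```

Parsing the file as text avoids importing the package, so the check runs before any dependencies are installed.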
- Installs build dependencies (`build`, `twine`)
- Builds source distribution and wheel: `python -m build`
- Validates package: `twine check dist/*`
twine upload dist/*
# Uploads to: https://pypi.org/project/crawl4ai/
Environment Variables:
- `TWINE_USERNAME`: `__token__` (PyPI API token authentication)
- `TWINE_PASSWORD`: `${{ secrets.PYPI_TOKEN }}`
Creates a release with:
- Tag: `v1.2.3`
- Title: `Release v1.2.3`
- Body: Installation instructions + changelog link
- Status: Published (not draft)
Note: The release body includes a link to the Docker workflow status, informing users that Docker images are building.
Generates a GitHub Actions summary with:
- PyPI package URL and version
- GitHub release URL
- Link to Docker workflow status
| Artifact | Location | Time |
|---|---|---|
| PyPI Package | https://pypi.org/project/crawl4ai/ | ~2-3 min |
| GitHub Release | Repository releases page | ~2-3 min |
File: .github/workflows/docker-release.yml
This workflow has two independent triggers:
on:
  release:
    types: [published]
Triggers when release.yml publishes a GitHub release.
on:
  push:
    tags:
      - 'docker-rebuild-v*'
Allows rebuilding Docker images without creating a new release.
Use case: Fix Dockerfile, rebuild images for existing version.
Intelligently detects version from either trigger:
# From release event:
github.event.release.tag_name → v1.2.3 → 1.2.3
# From docker-rebuild tag:
docker-rebuild-v1.2.3 → 1.2.3
VERSION=1.2.3
MAJOR=1 # First component
MINOR=1.2 # First two components
Used for Docker tag variations.
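The version detection and the `MAJOR`/`MINOR` split can be sketched as follows. The function name `parse_trigger_ref` and the returned dictionary layout are assumptions made for illustration; the real workflow does this in shell.

```python
def parse_trigger_ref(ref: str) -> dict:
    """Return version components for 'v1.2.3' or 'docker-rebuild-v1.2.3'."""
    # Check the longer prefix first so 'docker-rebuild-v' is not mistaken for 'v'.
    for prefix in ("docker-rebuild-v", "v"):
        if ref.startswith(prefix):
            version = ref[len(prefix):]
            break
    else:
        raise ValueError(f"unsupported ref: {ref!r}")

    parts = version.split(".")
    if len(parts) < 2:
        raise ValueError(f"expected at least MAJOR.MINOR, got {version!r}")

    return {
        "version": version,            # e.g. 1.2.3
        "major": parts[0],             # first component
        "minor": ".".join(parts[:2]),  # first two components
    }


print(parse_trigger_ref("docker-rebuild-v1.2.3"))
```

Both trigger types resolve to the same components, which is why a rebuild overwrites the same Docker tags as the original release.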
Configures multi-architecture build support:
- Platform: linux/amd64, linux/arm64
- Builder: Buildx with QEMU emulation
username: ${{ secrets.DOCKER_USERNAME }}
password: ${{ secrets.DOCKER_TOKEN }}
Docker Tags Created:
unclecode/crawl4ai:1.2.3 # Exact version
unclecode/crawl4ai:1.2 # Minor version
unclecode/crawl4ai:1 # Major version
unclecode/crawl4ai:latest # Latest stable
Platforms:
- `linux/amd64` (x86_64 - Intel/AMD processors)
- `linux/arm64` (ARM processors - Apple Silicon, AWS Graviton)
Caching Configuration:
cache-from: type=gha # Read from GitHub Actions cache
cache-to: type=gha,mode=max # Write all layers to cache
Generates a summary with:
- Published image tags
- Supported platforms
- Pull command example
How It Works:
Docker builds images in layers:
# Layer 1 (base image)
FROM python:3.12
# Layer 2 (system packages)
RUN apt-get update
# Layer 3 (dependency file)
COPY requirements.txt .
# Layer 4 (Python packages)
RUN pip install -r ...
# Layer 5 (application code)
COPY . .
Cache Behavior:
| Change Type | Cached Layers | Rebuild Time |
|---|---|---|
| No changes | 1-5 | ~30-60 sec |
| Code only | 1-4 | ~1-2 min |
| Dependencies | 1-2 | ~3-5 min |
| Dockerfile | None | ~10-15 min |
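The rule behind this table is that a changed layer invalidates itself and every layer after it. A minimal model of that rule, assuming the five-layer Dockerfile shown earlier (the names `LAYERS` and `split_by_cache` are illustrative only):

```python
# Layer names follow the five-layer example Dockerfile, in build order.
LAYERS = [
    "base image",
    "system packages",
    "dependency file",
    "python packages",
    "application code",
]


def split_by_cache(layers, first_changed=None):
    """Return (reused, rebuilt): layers before the first change are reused."""
    if first_changed is None:
        return list(layers), []
    index = layers.index(first_changed)
    return layers[:index], layers[index:]


reused, rebuilt = split_by_cache(LAYERS, "application code")
print("reused:", reused)
print("rebuilt:", rebuilt)
```

This is why ordering instructions from stable to volatile matters: the earlier a change lands, the more layers are rebuilt.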
Cache Storage:
- Location: GitHub Actions cache
- Limit: 10GB per repository
- Retention: 7 days for unused cache
- Cleanup: Automatic (LRU eviction)
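To make the eviction behavior concrete, here is a small simulation of a size-limited cache with least-recently-used eviction. It is a sketch under the assumptions listed above, not GitHub's implementation: the class `LruCache` is hypothetical and the 7-day expiry is not modeled.

```python
from collections import OrderedDict


class LruCache:
    """Size-limited store that evicts the least recently used entries first."""

    def __init__(self, limit_bytes: int):
        self.limit_bytes = limit_bytes
        self.entries = OrderedDict()  # key -> size, least recently used first

    def touch(self, key: str) -> None:
        """Mark an existing entry as recently used (a cache hit)."""
        self.entries.move_to_end(key)

    def save(self, key: str, size: int) -> list:
        """Store an entry and return the keys evicted to stay under the limit."""
        self.entries[key] = size
        self.entries.move_to_end(key)
        evicted = []
        while sum(self.entries.values()) > self.limit_bytes and len(self.entries) > 1:
            old_key, _ = self.entries.popitem(last=False)
            evicted.append(old_key)
        return evicted


cache = LruCache(limit_bytes=10)
cache.save("v1.0.0", 4)
cache.save("v1.0.1", 4)
cache.touch("v1.0.0")            # v1.0.1 is now the least recently used entry
print(cache.save("v1.0.2", 4))   # exceeding the limit evicts it
```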
Cache Efficiency Example:
# First build (v1.0.0)
Build time: 12m 34s
Cache: 0% (cold start)
# Second build (v1.0.1 - code change only)
Build time: 1m 47s
Cache: 85% hit rate
Cached: Base image, system packages, Python dependencies
# Third build (v1.0.2 - dependency update)
Build time: 4m 12s
Cache: 60% hit rate
Cached: Base image, system packages

| Artifact | Location | Tags | Time |
|---|---|---|---|
| Docker Images | Docker Hub | 4 tags | 1-15 min |
Docker Hub URL: https://hub.docker.com/r/unclecode/crawl4ai
Edit crawl4ai/__version__.py:
__version__ = "1.2.3"
git add crawl4ai/__version__.py
git commit -m "chore: bump version to 1.2.3"
git tag v1.2.3
git push origin main
git push origin v1.2.3
Release Pipeline (~2-3 minutes):
✓ Version check passed
✓ Package built
✓ Uploaded to PyPI
✓ GitHub release created
Docker Release (~1-15 minutes, runs in parallel):
✓ Images built for amd64, arm64
✓ Pushed 4 tags to Docker Hub
✓ Cache updated
# Check PyPI
pip install crawl4ai==1.2.3
# Check Docker
docker pull unclecode/crawl4ai:1.2.3
docker run unclecode/crawl4ai:1.2.3 --version
When to Use:
- Dockerfile fixed after release
- Security patch in base image
- Rebuild needed without new version
Process:
# Rebuild Docker images for existing version 1.2.3
git tag docker-rebuild-v1.2.3
git push origin docker-rebuild-v1.2.3
This triggers only docker-release.yml, not release.yml.
Result:
- Docker images rebuilt with same version tag
- PyPI package unchanged
- GitHub release unchanged
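The routing between the two workflows can be approximated with glob matching. This sketch uses Python's `fnmatchcase` as a stand-in for GitHub's tag filter patterns, which is an approximation that holds for these simple patterns; the function `workflows_for_tag` is hypothetical.

```python
from fnmatch import fnmatchcase


def workflows_for_tag(tag: str) -> list:
    """Return the workflows expected to run after pushing the given tag."""
    if fnmatchcase(tag, "docker-rebuild-v*"):
        # Rebuild tags start the Docker workflow directly.
        return ["docker-release.yml"]
    if fnmatchcase(tag, "v*") and not fnmatchcase(tag, "test-v*"):
        # Release tags start release.yml; the Docker workflow follows
        # once the GitHub release is published.
        return ["release.yml", "docker-release.yml"]
    return []


for tag in ("v1.2.3", "docker-rebuild-v1.2.3", "test-v1.2.3"):
    print(tag, "->", workflows_for_tag(tag))
```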
PyPI does not allow re-uploading the same version. Instead:
# Publish a patch version
git tag v1.2.4
git push origin v1.2.4
Then update documentation to recommend the new version.
# Option 1: Rebuild with fixed code
git tag docker-rebuild-v1.2.3
git push origin docker-rebuild-v1.2.3
# Option 2: Manually retag in Docker Hub (advanced)
# Not recommended - use git tags for traceability
Configure these in: Repository Settings → Secrets and variables → Actions
Purpose: Authenticate with PyPI for package uploads
How to Create:
- Go to https://pypi.org/manage/account/token/
- Create token with scope: "Entire account" or "Project: crawl4ai"
- Copy token (starts with `pypi-`)
- Add to GitHub secrets as `PYPI_TOKEN`
Format:
pypi-AgEIcHlwaS5vcmcCJGQ4M2Y5YTM5LWRjMzUtNGY3MS04ZmMwLWVhNzA5MjkzMjk5YQACKl...
Purpose: Docker Hub username for authentication
Value: Your Docker Hub username (e.g., unclecode)
Purpose: Docker Hub access token for authentication
How to Create:
- Go to https://hub.docker.com/settings/security
- Click "New Access Token"
- Name: `github-actions-crawl4ai`
- Permissions: Read & Write (choose Read, Write, Delete only if the workflow must delete tags)
- Copy token
- Add to GitHub secrets as `DOCKER_TOKEN`
Format:
dckr_pat_1a2b3c4d5e6f7g8h9i0j
Purpose: Create GitHub releases
Note: Automatically provided by GitHub Actions. No configuration needed.
Permissions: Configured in workflow file:
permissions:
  contents: write # Required for creating releases
Error:
❌ Version mismatch! Tag: 1.2.3, Package: 1.2.2
Please update crawl4ai/__version__.py to match the tag version
Cause: Git tag doesn't match __version__ in crawl4ai/__version__.py
Fix:
# Option 1: Update __version__.py and re-tag
vim crawl4ai/__version__.py # Change to 1.2.3
git add crawl4ai/__version__.py
git commit -m "fix: update version to 1.2.3"
git tag -d v1.2.3 # Delete local tag
git push --delete origin v1.2.3 # Delete remote tag
git tag v1.2.3 # Create new tag
git push origin main
git push origin v1.2.3
# Option 2: Use correct tag
git tag v1.2.2 # Match existing __version__
git push origin v1.2.2
Error:
HTTPError: 403 Forbidden
Causes & Fixes:
- Invalid Token:
  - Verify `PYPI_TOKEN` in GitHub secrets
  - Ensure token hasn't expired
  - Regenerate token on PyPI
- Version Already Exists: `HTTPError: 400 File already exists`
  - PyPI doesn't allow re-uploading same version
  - Increment version number and retry
- Package Name Conflict:
  - Ensure you own the `crawl4ai` package on PyPI
  - Check token scope includes this project
Error:
failed to solve: process "/bin/sh -c ..." did not complete successfully
Debug Steps:
- Check Build Logs:
  - Go to Actions tab → Docker Release workflow
  - Expand "Build and push Docker images" step
  - Look for specific error
- Test Locally: `docker build -t crawl4ai:test .`
- Common Issues:
  - Dependency installation fails: check that requirements.txt is valid and that all packages are available
  - Architecture-specific issues: test both platforms locally (for example on a Mac with Apple Silicon) with `docker buildx build --platform linux/amd64,linux/arm64 -t test .`
- Cache Issues:
  - Delete the stale entries from the repository's Actions → Caches page
  - Or wait 7 days for automatic cache eviction
Error:
Error: Cannot perform an interactive login from a non TTY device
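Cause: Docker Hub credentials are missing or invalid. This message typically appears when the password passed to `docker login` is empty (for example, the `DOCKER_TOKEN` secret is not set), so Docker falls back to an interactive prompt.
Fix:
- Verify `DOCKER_USERNAME` is correct
- Regenerate `DOCKER_TOKEN` on Docker Hub
- Update secret in GitHub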
Issue: Pushed tag v1.2.3, but docker-release.yml didn't run
Causes:
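- Release Not Published:
  - Check if `release.yml` completed successfully
  - Verify GitHub release is published (not draft)
- Release Created With the Default Token:
  - GitHub does not start new workflow runs from events raised with `GITHUB_TOKEN` (except `workflow_dispatch` and `repository_dispatch`)
  - If `release.yml` creates the release with that token, the `release` trigger may not fire; confirm which token the release step uses
- Workflow File Syntax Error: validate the YAML with `yamllint .github/workflows/docker-release.yml`
- Workflow Not on Default Branch:
  - Workflow files must be on `main` branch
  - Check if `.github/workflows/docker-release.yml` exists on `main`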
Debug:
# Check workflow files
git ls-tree main .github/workflows/
# Check GitHub Actions tab for workflow runs
Issue: Every build takes 10-15 minutes despite using cache
Causes:
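- Cache Scope:
  - Cache entries are scoped to the Git ref (branch or tag) that created them; a run can also read entries created on the default branch
  - The first build on a new ref starts cold unless a matching default-branch entry exists
- Dockerfile Changes:
  - Any change invalidates that layer and all subsequent layers
  - Optimize Dockerfile layer order (stable → volatile)
- Base Image Updates:
  - `FROM python:3.12` is a floating tag that is republished regularly (roughly monthly), which invalidates every layer built on it
  - Pin to specific digest for stable cache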
Optimization:
# Good: Stable layers first
FROM python:3.12
RUN apt-get update && apt-get install -y ...
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
# Bad: Volatile layers first (breaks cache often)
FROM python:3.12
COPY . .
RUN pip install -r requirements.txt

| Platform | Architecture | Use Cases |
|---|---|---|
| linux/amd64 | x86_64 | AWS EC2, GCP, Azure, Traditional servers |
| linux/arm64 | aarch64 | Apple Silicon, AWS Graviton, Raspberry Pi |
# Buildx uses QEMU to emulate different architectures
docker buildx create --use # Create builder
docker buildx build --platform linux/amd64,linux/arm64 ...
Under the Hood:
- For each platform:
  - Run the build under QEMU emulation when the platform differs from the runner's architecture
  - Execute Dockerfile instructions
  - Generate platform-specific image
- Create manifest list (multi-arch index)
- Push all variants + manifest to registry
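The manifest list produced in the last two steps is what lets a single tag serve several architectures. The sketch below shows the selection a client performs against such an index. The dictionary mirrors the general shape of an OCI image index, but the digests are placeholders and `select_manifest` is a hypothetical helper, not a Docker API.

```python
# Simplified image index; digests are placeholders, not real values.
MANIFEST_LIST = {
    "manifests": [
        {
            "digest": "sha256:amd64-placeholder",
            "platform": {"os": "linux", "architecture": "amd64"},
        },
        {
            "digest": "sha256:arm64-placeholder",
            "platform": {"os": "linux", "architecture": "arm64"},
        },
    ]
}


def select_manifest(manifest_list: dict, os_name: str, architecture: str) -> str:
    """Return the digest of the image variant matching the requested platform."""
    for entry in manifest_list["manifests"]:
        platform = entry["platform"]
        if platform["os"] == os_name and platform["architecture"] == architecture:
            return entry["digest"]
    raise LookupError(f"no image for {os_name}/{architecture}")


print(select_manifest(MANIFEST_LIST, "linux", "arm64"))
```

Requesting a platform that was never built raises an error, which corresponds to a pull failing on an unsupported architecture.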
Pull Behavior:
# Docker automatically selects correct platform
docker pull unclecode/crawl4ai:latest
# On M1 Mac: Pulls arm64 variant
# On Intel Linux: Pulls amd64 variant
# Force specific platform
docker pull --platform linux/amd64 unclecode/crawl4ai:latest
v1.2.3
 │ │ │
 │ │ └─ Patch: Bug fixes, no API changes
 │ └─── Minor: New features, backward compatible
 └───── Major: Breaking changes
| Git Tag | Docker Tags Created | Use Case |
|---|---|---|
| v1.2.3 | 1.2.3, 1.2, 1, latest | Full version chain |
| v2.0.0 | 2.0.0, 2.0, 2, latest | Major version bump |
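The mapping in this table, and the way the shorter tags move over time, can be simulated with a dictionary standing in for the registry. The `publish` function is a hypothetical illustration; it assumes three-part versions released in increasing order.

```python
def publish(registry: dict, version: str) -> dict:
    """Point the exact, minor, major, and latest tags at the given version."""
    major, minor, _patch = version.split(".")
    for tag in (version, f"{major}.{minor}", major, "latest"):
        registry[tag] = version  # moving tags are simply overwritten
    return registry


registry = {}
for release in ("1.0.0", "1.1.0", "2.0.0"):
    publish(registry, release)

for tag in sorted(registry):
    print(f"{tag:>6} -> {registry[tag]}")
```

Exact tags such as `1.0.0` never move, while `1`, `2`, and `latest` follow the newest release that matches them.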
Example Evolution:
# Release v1.0.0
Tags: 1.0.0, 1.0, 1, latest
# Release v1.1.0
Tags: 1.1.0, 1.1, 1, latest
# Note: 1.0 still exists, but 1 and latest now point to 1.1.0
# Release v1.2.0
Tags: 1.2.0, 1.2, 1, latest
# Note: 1.0 and 1.1 still exist, but 1 and latest now point to 1.2.0
# Release v2.0.0
Tags: 2.0.0, 2.0, 2, latest
# Note: All v1.x tags still exist, but latest now points to 2.0.0
User Pinning Strategies:
# Maximum stability (never updates)
docker pull unclecode/crawl4ai:1.2.3
# Get patch updates only
docker pull unclecode/crawl4ai:1.2
# Get minor updates (features, bug fixes)
docker pull unclecode/crawl4ai:1
# Always get latest (potentially breaking)
docker pull unclecode/crawl4ai:latest
# BEFORE (cache breaks often)
FROM python:3.12
# Changes every commit
COPY . /app
RUN pip install -r /app/requirements.txt
RUN apt-get install -y ffmpeg
# AFTER (cache-optimized)
FROM python:3.12
# Rarely changes
RUN apt-get update && apt-get install -y ffmpeg
# Changes occasionally
COPY requirements.txt /app/requirements.txt
RUN pip install -r /app/requirements.txt
# Changes every commit
COPY . /app
# Build stage (cached separately)
FROM python:3.12 AS builder
COPY requirements.txt .
RUN pip install --user -r requirements.txt
# Runtime stage
FROM python:3.12-slim
COPY --from=builder /root/.local /root/.local
COPY . /app
ENV PATH=/root/.local/bin:$PATH
Benefits:
- Builder stage cached independently
- Runtime image smaller
- Faster rebuilds
# Cache pip packages
RUN --mount=type=cache,target=/root/.cache/pip \
pip install -r requirements.txt
# Cache apt packages
RUN --mount=type=cache,target=/var/cache/apt \
apt-get update && apt-get install -y ...
Note: Requires BuildKit (enabled by default in GitHub Actions)
# VOLATILE (updates monthly, breaks cache)
FROM python:3.12
# STABLE (fixed digest, cache preserved; the digest below is illustrative, substitute the one reported for your image)
FROM python:3.12@sha256:8c5e5c77e7b9e44a6f0e3b9e8f5e5c77e7b9e44a6f0e3b9e8f5e5c77e7b9e44a
Find digest:
docker pull python:3.12
docker inspect python:3.12 | grep -A 2 RepoDigests
Never:
# DON'T: Hardcode secrets
run: echo "my-secret-token" | docker login
# DON'T: Log secrets
run: echo "Token is ${{ secrets.PYPI_TOKEN }}"
Always:
# DO: Use environment variables (twine reads TWINE_USERNAME and TWINE_PASSWORD)
env:
  TWINE_USERNAME: __token__
  TWINE_PASSWORD: ${{ secrets.PYPI_TOKEN }}
run: twine upload dist/*
# DO: Use action inputs (masked automatically)
uses: docker/login-action@v3
with:
  password: ${{ secrets.DOCKER_TOKEN }}
# Specific permissions only
permissions:
  contents: write # Only what's needed
# NOT: permissions: write-all
# DON'T: Use floating versions
uses: actions/checkout@v4
# DO: Pin to SHA (immutable)
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
PyPI Token:
- Scope: Project-specific (`crawl4ai` only)
- Not: Account-wide access
Docker Token:
- Permissions: Read, Write (not Delete unless needed)
- Expiration: Set to 1 year, rotate regularly
Available in Actions tab:
- Workflow run duration
- Success/failure rates
- Cache hit rates
- Artifact sizes
Add to workflow summary:
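- name: Build Metrics
  run: |
    # $SECONDS counts from the start of this step's shell, not the whole job
    echo "## Build Metrics" >> "$GITHUB_STEP_SUMMARY"
    echo "- Duration: $(date -u -d @$SECONDS +%T)" >> "$GITHUB_STEP_SUMMARY"
    # The hit rate below is a hardcoded placeholder value
    echo "- Cache hit rate: 85%" >> "$GITHUB_STEP_SUMMARY"
Webhooks: Configure in Settings → Webhooks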
{
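  "events": ["workflow_run"],
  "url": "https://your-monitoring-service.com/webhook"
}
Status Badges:
[![Release Pipeline](https://github.com/user/repo/actions/workflows/release.yml/badge.svg)](https://github.com/user/repo/actions/workflows/release.yml)
[![Docker Release](https://github.com/user/repo/actions/workflows/docker-release.yml/badge.svg)](https://github.com/user/repo/actions/workflows/docker-release.yml)
Current Backup: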
.github/workflows/release.yml.backup
Recommended:
# Automatic backup before modifications
cp .github/workflows/release.yml .github/workflows/release.yml.backup-$(date +%Y%m%d)
git add .github/workflows/*.backup*
git commit -m "backup: workflow before modification"
Scenario: v1.2.3 release failed mid-way
Steps:
- Identify what succeeded:
  - Check PyPI: `pip index versions crawl4ai` (`pip search` no longer works against PyPI), or view https://pypi.org/project/crawl4ai/#history
  - Check Docker Hub: https://hub.docker.com/r/unclecode/crawl4ai/tags
  - Check GitHub Releases
- Clean up partial release:
  # Delete tag
  git tag -d v1.2.3
  git push --delete origin v1.2.3
  # Delete GitHub release (if created)
  gh release delete v1.2.3
- Fix issue and retry:
  # Re-tag and push
  git tag v1.2.3
  git push origin v1.2.3
Note: Cannot delete PyPI uploads. If PyPI succeeded, increment to v1.2.4.
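The retry rule in this note reduces to one decision: reuse the version unless PyPI already accepted it. A minimal sketch, where `retry_tag` is a hypothetical helper and three-part integer versions are assumed:

```python
def retry_tag(version: str, pypi_uploaded: bool) -> str:
    """Return the tag to push when retrying a failed release."""
    if not pypi_uploaded:
        # Nothing immutable was published, so the same version can be reused.
        return f"v{version}"
    # PyPI rejects re-uploads of an existing version, so bump the patch number.
    major, minor, patch = (int(part) for part in version.split("."))
    return f"v{major}.{minor}.{patch + 1}"


print(retry_tag("1.2.3", pypi_uploaded=False))
print(retry_tag("1.2.3", pypi_uploaded=True))
```

When the patch number is bumped, `crawl4ai/__version__.py` must be updated to match before tagging, or the version check will fail again.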
Add pre-commit hook:
#!/bin/bash
# .git/hooks/pre-commit (the shebang must be the first line of the file)
VERSION_FILE="crawl4ai/__version__.py"
VERSION=$(python -c "exec(open('$VERSION_FILE').read()); print(__version__)")
echo "Current version: $VERSION"
Use conventional commits:
git commit -m "feat: add new scraping mode"
git commit -m "fix: handle timeout errors"
git commit -m "docs: update API reference"
Generate changelog:
# Use git-cliff or similar
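git cliff --tag v1.2.3 > CHANGELOG.md
Add test workflow:
# .github/workflows/test.yml
on:
  push:
    tags:
      - 'test-v*'
jobs:
  test-release:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - name: Install build tools
        run: pip install build twine
      - name: Build package
        run: python -m build
      - name: Upload to TestPyPI
        # TEST_PYPI_TOKEN is an assumed secret name; create a separate token on TestPyPI
        env:
          TWINE_USERNAME: __token__
          TWINE_PASSWORD: ${{ secrets.TEST_PYPI_TOKEN }}
        run: twine upload --repository testpypi dist/*
Create issue template: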
## Release Checklist
- [ ] Update version in `crawl4ai/__version__.py`
- [ ] Update CHANGELOG.md
- [ ] Run tests locally: `pytest`
- [ ] Build package locally: `python -m build`
- [ ] Create and push tag: `git tag v1.2.3 && git push origin v1.2.3`
- [ ] Monitor Release Pipeline workflow
- [ ] Monitor Docker Release workflow
- [ ] Verify PyPI: `pip install crawl4ai==1.2.3`
- [ ] Verify Docker: `docker pull unclecode/crawl4ai:1.2.3`
- [ ] Announce release

- `release.yml` - Main release workflow
- `docker-release.yml` - Docker build workflow
- `release.yml.backup` - Original combined workflow
| Date | Version | Changes |
|---|---|---|
| 2025-01-XX | 2.0 | Split workflows, added Docker caching |
| 2024-XX-XX | 1.0 | Initial combined workflow |
For issues or questions:
- Check Troubleshooting section
- Review GitHub Actions logs
- Create issue in repository
Last Updated: 2025-01-21
Maintainer: Crawl4AI Team