Building a simple Bash backup script with Docker, MySQL and rsync

Nemanja Mitic, https://nemanjamitic.com/blog/2026-04-07-bash-backup-script/ (Tue, 07 Apr 2026)

Introduction

This article walks through the thought process behind designing a minimalistic backup script in Bash. While there are robust and sophisticated backup solutions (such as zerobyte or plakar), the goal here is to build something simple and minimal. It also serves as a practical exercise for improving both design thinking and Bash scripting skills.

We won’t start entirely from scratch. Instead, we’ll use the existing todiadiyatmo/bash-backup-rotation-script as a starting point and iterate on it to fit our use case.

As a sample application, we’ll use the latest MyBB forum running on PHP and MySQL, based on nemanjam/mybb-docker, deployed with Docker.

Requirements

Let’s start by clearly defining the requirements the script should fulfill so we can address them properly:

  • It should back up both the database and multiple, arbitrary file assets.
  • It should dump a MySQL database running inside a Docker container.
  • It should assume and enforce a predefined folder structure for both the application and the backups.
  • It should retain multiple, configurable daily, weekly, and monthly copies (as in the original script).
  • It should support both local and remote backups.

These core requirements are enough to get us started.

Structure

At the beginning, we need to make some core decisions about how to structure the code that creates and manages backups, as well as how to organize the folder structure for the application code, backup scripts, and backup files.

Note: The terms “local” and “remote” are used relative to the server where the application being backed up is running. “Local” refers to the server’s filesystem, while “remote” refers to machines where backup copies are stored permanently. These are often devices (such as a laptop, Raspberry Pi, or home server) within a local network, so don’t be confused by the terminology.

Local and remote scripts

We prefer having local backups that can be quickly restored without needing to fetch and transfer data from a remote machine. At the same time, we also want true remote backups - synced copies stored on one or more external machines, since we treat our server instance as disposable.

There are two main approaches to this, depending on whether the primary backup script is stored and scheduled locally or on a remote machine:

  1. A local backup script that creates backups and a separate script syncs them to remote machines.
  2. Remote machines independently connect over SSH, send and execute the backup script on the server, and download the resulting backup.

The first approach is simpler and easier to understand, so we’ll go with that. It also provides a solid foundation if we decide to extend the system later and implement the second approach as well.

# Main, local backup script
backup-files-and-mysql.sh

# Remote sync script
backup-rsync-local.sh

However, both approaches rely on a clear, predefined folder structure, which should be explicitly validated when the script runs.

Folder structure

The script relies entirely on relative paths, which means the script code and the surrounding folder structure are tightly coupled. A simple and understandable structure is a prerequisite for clean and reliable code.

Local folder structure

Below is the expected folder structure for both the server and local backups. In this context, “local” means local to the running application itself, on the same machine and file system. This backup repository serves as the source of truth for all other synchronized, remote backup copies.

Here, mybb/ is the application root directory, containing both the application files and the backups. Accordingly, mybb/backup/ is the backup directory, which includes the backup Bash scripts (mybb/backup/scripts/) and the backup data (mybb/backup/data/). The generated .zip archive contains a mysql_database/ folder with the MySQL dump and a source_code/ folder that groups the file assets.

This folder structure is mandatory and fixed, as all file paths in the backup-files-and-mysql.sh script are defined relative to the script location (mybb/backup/scripts/).

Another useful detail is that the application files in mybb/ are versioned, including the backup-files-and-mysql.sh script. Since we don’t use a .env file for configuration, the production server contains an unversioned copy named backup-files-and-mysql-run.sh, which includes both the actual variables and executable code. This approach simplifies git pull operations and repository updates. Naturally, all backup data in mybb/backup/data/ is excluded from version control via .gitignore.

#  Local (server) backup folder structure:
#
# mybb/
# ├─ backup/
# │   ├─ scripts/                             - backup scripts
# │   │  ├─ backup-files-and-mysql.sh         - versioned
# │   │  └─ backup-files-and-mysql-run.sh     - current script
# │   └─ data/                                - backups data
# │      ├─ mybb_files_and_mysql-daily-2026-01-20.zip
# │      │  ├─ source_code/
# │      │  │  ├─ config.php
# │      │  │  └─ custom/
# │      │  └─ mysql_database/
# │      │     └─ mybb.sql
# │      ├─ mybb_files_and_mysql-daily-2026-01-19.zip
# │      ├─ mybb_files_and_mysql-weekly-2026-01-14.zip
# │      └─ mybb_files_and_mysql-monthly-2026-01-01.zip
# ├── data                                    - Docker volumes
# │   ├── mybb-data                           - PHP forum files
# │   └── mysql-data                          - database data
# ├── docker-compose.yml                      - containers definitions
# |
# ...
# |
# ├── .gitignore
# └── README.md

Remote (synced) folder structure

This is the synchronized backup repository, which is considered remote from the server’s perspective. In most cases, it resides on one of our local machines where backups are stored.

As a synchronized mirror, its folder structure is identical, with one important distinction: it only contains the mybb/backup/ directory and does not include any application files, only the backups themselves. Additionally, this repository is not versioned.

#  Remote (synced) backup folder structure:
#
# mybb/
# └─ backup/
#    ├─ scripts/
#    │  └─ backup-rsync-local.sh              - current script
#    └─ data/
#       ├─ .gitkeep
#       ├─ mybb_files_and_mysql-daily-2026-01-20.zip
#       │  ├─ source_code/
#       │  │  ├─ config.php
#       │  │  └─ custom/
#       │  └─ mysql_database/
#       │     └─ mybb.sql
#       ├─ mybb_files_and_mysql-daily-2026-01-19.zip
#       ├─ mybb_files_and_mysql-weekly-2026-01-14.zip
#       └─ mybb_files_and_mysql-monthly-2026-01-01.zip

Local backup script

This is the main script that creates a backup on the server’s local filesystem. It dumps the MySQL database running in Docker into a plain UTF-8 .sql file and also backs up predefined application files and folders. A temporary staging folder is used to create a well-structured .zip archive.

Let’s walk through the main script, responsible for creating backups of the MySQL database and arbitrary application assets, section by section.

Entire script: https://github.com/nemanjam/bash-backup/blob/main/backup-files-and-mysql.sh

Configurable variables

These are the real variables that can be freely adjusted to fit a specific application and use case. Normally, they would be defined in a .env file, but for simplicity, they are hardcoded directly in the Bash script. Modifying these values does not require changing the script’s code.

The variables are fairly self-explanatory:

  • DB_* - variables used for connecting to the MySQL database instance we want to back up.
  • LOCAL_BACKUP_DIR - the directory where backup files are stored.
  • SRC_CODE_DIRS - an associative array listing the files and directories to include in the backup.
  • BACKUP_RETENTION_* - the number of daily, weekly, and monthly backups to retain.
  • MAX_RETENTION - the upper limit for any BACKUP_RETENTION_* value.
# ---------- Configuration ----------

# MySQL credentials
DB_CONTAINER_NAME="mybb-database"
DB_NAME="mybb"
DB_USER="mybbuser"
DB_PASS="password"

# Note: all commands run from script dir, NEVER call cd, for relative paths to work

# Dirs paths
# Local folder is root, all other paths are relative to it
# script located at ~/traefik-proxy/apps/mybb/backup/scripts
LOCAL_BACKUP_DIR="../data"

# File or directory
# Relative to script dir, ../../ returns to: apps/mybb/
declare -A SRC_CODE_DIRS=(
    ["inc"]="../../data/mybb-data/inc/config.php"
    ["images/custom"]="../../data/mybb-data/images/custom"
)

# Retention
MAX_RETENTION=6 # 6 months for monthly backups
BACKUP_RETENTION_DAILY=3
BACKUP_RETENTION_WEEKLY=2
BACKUP_RETENTION_MONTHLY=6

Logging variables

These are additional configurable variables used specifically to control logging behavior. Since the backup script runs periodically via cron, proper logging is essential for monitoring, validation, and debugging. The last thing we want is an incorrect or misconfigured script silently producing invalid and unusable backups for months.

The variables are as follows:

  • LOG_TO_FILE - a boolean that determines whether logs are written to a file or output to the terminal.
  • LOG_FILE - the path to the log file.
  • LOG_MAX_SIZE_MB and LOG_KEEP_SIZE_MB - used to prevent unlimited log file growth. LOG_MAX_SIZE_MB is a float defining the maximum file size (in MB), after which the log is truncated, while LOG_KEEP_SIZE_MB defines the size to retain after truncation.
  • LOG_TIMEZONE - the time zone used for log timestamps.
# ---------- Logging vars ----------

# Enable only when running from cron
# Cron has no TTY, interactive shell does
LOG_TO_FILE=false
[ -z "$PS1" ] && LOG_TO_FILE=true

# Log file
LOG_FILE="./log-backup-files-and-mysql.txt"

# Log size limits (MB, float allowed)
LOG_MAX_SIZE_MB=1.0   # truncate when log exceeds this
LOG_KEEP_SIZE_MB=0.5  # keep last N MB after truncation

# Timezone for log timestamps
LOG_TIMEZONE="Europe/Belgrade"

Constants

These are constant global variables, defined in a single place and reused throughout the script. Unlike configuration variables, they are not meant to be modified, as they are tightly coupled with the script’s logic. Changing them requires corresponding updates to the code.

  • MYSQL_ZIP_DIR_NAME and FILES_ZIP_DIR_NAME - directory names used inside the archive.
  • ZIP_PREFIX - prefix for backup archive filenames.
  • FREQ_PLACEHOLDER - placeholder string to be replaced with the actual retention frequency in archive names.
  • DATE - date string included in the archive filename.
  • DAY_OF_* - numeric values (e.g., day of week/month) included in archive names.
  • BACKUP_* - boolean flags derived from BACKUP_RETENTION_* variables.
  • SCRIPT_DIR - absolute path of the script’s directory; currently used only in log messages.
# ---------- Constants ----------

# Zip vars
# Both inside zip
MYSQL_ZIP_DIR_NAME="mysql_database" 
FILES_ZIP_DIR_NAME="source_code"

# Must match backup-rsync-local.sh
ZIP_PREFIX="mybb_files_and_mysql"
FREQ_PLACEHOLDER='frequency'

DATE=$(date +"%Y-%m-%d")
ZIP_PATH="$LOCAL_BACKUP_DIR/$ZIP_PREFIX-$FREQ_PLACEHOLDER-$DATE.zip"

# Current day and weekday
DAY_OF_MONTH=$((10#$(date +%d))) # Force decimal, avoid bash octal bug on 08/09
DAY_OF_WEEK=$((10#$(date +%u))) # 1=Monday … 7=Sunday

# Must do it like this for booleans
BACKUP_DAILY=$([[ $BACKUP_RETENTION_DAILY -gt 0 ]] && echo true || echo false)
BACKUP_WEEKLY=$([[ $BACKUP_RETENTION_WEEKLY -gt 0 ]] && echo true || echo false)
BACKUP_MONTHLY=$([[ $BACKUP_RETENTION_MONTHLY -gt 0 ]] && echo true || echo false)

# Script dir absolute path, used only in log messages
# mybb/backup/scripts
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
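The 10# prefix in the DAY_OF_* constants guards against Bash treating zero-padded date strings as octal numbers; a quick illustration:

```shell
# "08" and "09" are invalid octal literals, so plain $((day)) would error out
# with "value too great for base"; 10# forces base-10 interpretation
day="08"
echo $((10#$day))   # prints 8
```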

Setup logging

At the top of the script, right below the variable and constant definitions, we conditionally call the setup_logging() function based on the LOG_TO_FILE variable. This function redirects both standard output and error output (e.g., from echo commands) to the LOG_FILE.

Additionally, it avoids a common pitfall of infinitely growing log files by truncating the file to LOG_KEEP_SIZE_MB whenever it reaches LOG_MAX_SIZE_MB. This ensures that only the most recent log entries are preserved.

# ---------- Enable logging ----------

setup_logging() {
    local max_size keep_size

    # Convert MB -> bytes (rounded down)
    max_size=$(echo "$LOG_MAX_SIZE_MB * 1024 * 1024 / 1" | bc)
    keep_size=$(echo "$LOG_KEEP_SIZE_MB * 1024 * 1024 / 1" | bc)

    # Ensure log file exists (do not truncate)
    if [ ! -f "$LOG_FILE" ]; then
        touch "$LOG_FILE"
    fi

    # Truncate log if too big
    local size size_mb
    size=$(stat -c%s "$LOG_FILE")
    size_mb=$(awk "BEGIN {printf \"%.2f\", $size/1024/1024}")  # convert bytes -> MB

    if (( size > max_size )); then
        tail -c "$keep_size" "$LOG_FILE" > "$LOG_FILE.tmp" && mv "$LOG_FILE.tmp" "$LOG_FILE"

        # Log truncation message happens before the exec redirection, so it needs its own timestamp
        echo "$(TZ="$LOG_TIMEZONE" date '+%Y-%m-%d %H:%M:%S') [INFO] Log truncated: original_size=${size_mb}MB, max_size=${LOG_MAX_SIZE_MB}MB, keep_size=${LOG_KEEP_SIZE_MB}MB" >> "$LOG_FILE"
    fi

    # Redirect stdout + stderr to log file with timestamps
    exec > >(while IFS= read -r line; do
        echo "$(TZ="$LOG_TIMEZONE" date '+%Y-%m-%d %H:%M:%S') $line"
    done >> "$LOG_FILE") 2>&1

    # Per-run separator — just echo, timestamps added automatically
    echo
    echo "========================================"
    echo "[INFO] Logging started"
    echo "[INFO] Log file: $LOG_FILE"
    echo "[INFO] Max size: ${LOG_MAX_SIZE_MB}MB, keep: ${LOG_KEEP_SIZE_MB}MB"
    echo "========================================"
    echo
}

if [ "$LOG_TO_FILE" = true ]; then
    setup_logging
fi

Validate configuration

Before executing any other logic, we validate the existence and correctness of the configuration variables and ensure that the environment meets the basic requirements needed to produce a usable and meaningful backup. The following checks are performed:

  • The MySQL container is running.
  • A connection to the MySQL database inside the container can be established.
  • The local backup directory is defined, exists, and is not the root directory (to avoid catastrophic deletion).
  • All defined asset paths exist.
  • At least one of daily, weekly, or monthly backups is enabled.
  • Any temporary backup archive from the previous run is deleted. This also allows the backup to be recreated and overwritten on the same day.
# ---------- Validate config ------------

is_valid_config() {
    local non_zero_found=0

    echo "[INFO] Validating configuration..."

    # Check that MySQL container is running
    if ! docker inspect -f '{{.State.Running}}' "$DB_CONTAINER_NAME" 2>/dev/null | grep -q true; then
        echo "[ERROR] MySQL container not running or not found: DB_CONTAINER_NAME=$DB_CONTAINER_NAME" >&2
        return 1
    fi

    # Check MySQL connectivity inside container
    if ! docker exec "$DB_CONTAINER_NAME" \
        mysql -u"$DB_USER" -p"$DB_PASS" "$DB_NAME" -e "SELECT 1;" >/dev/null 2>&1; then
        echo "[ERROR] MySQL connection failed: container=$DB_CONTAINER_NAME user=$DB_USER db=$DB_NAME" >&2
        return 1
    fi

    # Check local backup directory variable is set, dir exists, and is not root
    if [ -z "$LOCAL_BACKUP_DIR" ] || [ ! -d "$LOCAL_BACKUP_DIR" ] || [ "$LOCAL_BACKUP_DIR" = "/" ]; then
        echo "[ERROR] Local backup directory invalid: path=$LOCAL_BACKUP_DIR" >&2
        return 1
    fi

    # Check source code paths exist (file or directory)
    for path in "${SRC_CODE_DIRS[@]}"; do
        if [ ! -e "$path" ]; then
            echo "[ERROR] Source path missing: path=$SCRIPT_DIR/$path" >&2
            return 1
        fi
    done

    # Validate retention values
    for var in BACKUP_RETENTION_DAILY BACKUP_RETENTION_WEEKLY BACKUP_RETENTION_MONTHLY; do
        value="${!var}"

        if [[ ! "$value" =~ ^[0-9]+$ ]]; then
            echo "[ERROR] Retention value is not a number: $var=$value" >&2
            return 1
        fi

        if (( value > MAX_RETENTION )); then
            echo "[ERROR] Retention value too large: $var=$value max=$MAX_RETENTION" >&2
            return 1
        fi

        (( value > 0 )) && non_zero_found=1
    done

    if (( non_zero_found == 0 )); then
        echo "[ERROR] All retention values are zero: daily=$BACKUP_RETENTION_DAILY weekly=$BACKUP_RETENTION_WEEKLY monthly=$BACKUP_RETENTION_MONTHLY" >&2
        return 1
    fi

    # Delete existing temp backup file for this day (idempotent, can run on same day)
    if [[ -f "$ZIP_PATH" ]]; then
        rm -f "$ZIP_PATH"
        echo "[WARN] Existing temporary backup file deleted: $ZIP_PATH"
    fi

    echo "[INFO] Configuration is valid. Creating backup..."

    return 0
}
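The retention loop above builds variable names dynamically and reads them through Bash indirect expansion (value="${!var}"); a minimal illustration of that mechanism:

```shell
# ${!var} expands the variable whose *name* is stored in var
BACKUP_RETENTION_DAILY=3
var="BACKUP_RETENTION_DAILY"
echo "${!var}"   # prints 3
```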

Backup logic

The create_backup() function is the core of the entire script. It assembles a MySQL database dump and file assets into a structured archive with well-organized relative paths, while also cleaning up any temporary files created during the process.

To build the archive with proper relative paths, we use a temporary STAGING_DIR directory. Inside this directory, the database dump is stored under a folder named by the MYSQL_ZIP_DIR_NAME constant, while file assets are grouped under the FILES_ZIP_DIR_NAME directory.

Using a combination of docker exec and mysqldump, we export the database contents as a plain UTF-8 .sql file and place it into the staging directory. The file is named after the database itself, using the DB_NAME configuration variable.

We then copy all asset files and directories defined in SRC_CODE_DIRS into the staging directory under the FILES_DIR parent folder. If SRC_CODE_DIRS is empty, the entire FILES_DIR directory is removed from the staging area to avoid unnecessary clutter.

Next, we create a .zip archive from STAGING_DIR using a subshell to safely change directories without affecting the main script, which would otherwise break relative path handling. Note that the script is intentionally designed to rely exclusively on relative paths. The resulting archive is saved to the ZIP_PATH location.

Finally, we remove the STAGING_DIR directory regardless of whether archive creation succeeds or fails. This ensures idempotency and prevents leftover temporary files from accumulating.

create_backup() {
    # Note: use staging dir with relative paths to have nice overview in GUI archive utility

    # Local scope
    # staging dir: mybb/backup/data/staging_dir
    # temp db dir: mybb/backup/data/staging_dir/mysql_database
    # working dir: mybb/backup/scripts
    local STAGING_DIR="$LOCAL_BACKUP_DIR/staging_dir"
    local TEMP_DB_DIR="$STAGING_DIR/$MYSQL_ZIP_DIR_NAME"
    local FILES_DIR="$STAGING_DIR/$FILES_ZIP_DIR_NAME"

    # Reset staging dir from previous broken state 
    rm -rf "$STAGING_DIR"
    mkdir -p "$TEMP_DB_DIR"  # Will recreate staging dir
    mkdir -p "$FILES_DIR"    # Folder to group all source code

    echo "[INFO] Created staging directory: $STAGING_DIR"
    echo "[INFO] Created temporary DB directory: $TEMP_DB_DIR"
    echo "[INFO] Created files directory: $FILES_DIR"

    # Dump MySQL as plain UTF-8 .sql
    # Variables expand on the host shell before docker exec runs
    docker exec "$DB_CONTAINER_NAME" \
        mysqldump --no-tablespaces -u"$DB_USER" -p"$DB_PASS" "$DB_NAME" \
        > "$TEMP_DB_DIR/$DB_NAME.sql"

    echo "[INFO] MySQL database dumped: db_name=$DB_NAME -> path=$TEMP_DB_DIR/$DB_NAME.sql"

    # Copy source code folders grouped into FILES_DIR dir
    for SRC_CODE_DIR in "${!SRC_CODE_DIRS[@]}"; do
        SRC_CODE_DIR_PATH="${SRC_CODE_DIRS[$SRC_CODE_DIR]}"
        cp -a "$SRC_CODE_DIR_PATH" "$FILES_DIR/"
        echo "[INFO] Added to staging: $SRC_CODE_DIR_PATH -> $FILES_ZIP_DIR_NAME/"
    done

    # Remove FILES_DIR if empty
    if [ -d "$FILES_DIR" ] && [ -z "$(ls -A "$FILES_DIR")" ]; then
        rm -rf "$FILES_DIR"
        echo "[INFO] Removed empty files directory: $FILES_DIR"
    fi

    # Create zip with clean relative paths
    # ( ... ) - subshell, cd won't affect working dir of the main script
    (
        cd "$STAGING_DIR" || {
            echo "[ERROR] Failed to cd into staging directory: $STAGING_DIR" >&2
            exit 1
        }

        # There was cd in subshell
        # Adjust zip path relative to staging_dir
        zip -r "../$ZIP_PATH" .
    ) || {
        echo "[ERROR] Zip creation failed: $ZIP_PATH" >&2
        rm -rf "$STAGING_DIR"
        exit 1
    }

    echo "[INFO] Created zip archive: $ZIP_PATH"

    # Cleanup
    rm -rf "$STAGING_DIR"
    echo "[INFO] Removed staging directory: $STAGING_DIR"
    echo "[INFO] Backup file created successfully: $ZIP_PATH"
}
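The subshell technique used for zipping can be demonstrated in isolation: a cd inside ( ... ) does not leak into the parent shell's working directory.

```shell
# cd in a subshell leaves the parent shell's working directory untouched
before=$(pwd)
( cd /tmp && pwd > /dev/null )
after=$(pwd)
[ "$before" = "$after" ] && echo "working dir unchanged"
```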

The create_retention_copies() function manages retention by creating time-based copies (daily, weekly, monthly) of a freshly generated backup file. The original backup file path (ZIP_PATH) includes a placeholder string (frequency), defined by the FREQ_PLACEHOLDER constant.

For each retention option, the function evaluates the current date (e.g., Sunday for weekly, the first day of the month for monthly) and checks whether the corresponding BACKUP_* variable is enabled. If the conditions are satisfied, the original backup is copied and renamed by replacing the placeholder with the appropriate frequency.

To ensure idempotency, the function checks whether a retention copy for the current day already exists. If it does, it is removed before creating a new one, allowing the script to be safely re-run on the same day.

Finally, the temporary backup file with the placeholder name is deleted, as it is no longer needed after the retention copies are created.

create_retention_copies() {
    local IS_WEEKLY=$(( DAY_OF_WEEK == 7 )) # Sunday
    local IS_MONTHLY=$(( DAY_OF_MONTH == 1 )) # First day of month

    if [[ ! -f "$ZIP_PATH" ]]; then
        echo "[ERROR] Backup file does not exist: $ZIP_PATH"
        return 1
    fi

    for FREQ in daily weekly monthly; do
        case "$FREQ" in
            daily)
                [[ "$BACKUP_DAILY" == true ]] || continue
                ;;
            weekly)
                [[ "$IS_WEEKLY" -eq 1 && "$BACKUP_WEEKLY" == true ]] || continue
                ;;
            monthly)
                [[ "$IS_MONTHLY" -eq 1 && "$BACKUP_MONTHLY" == true ]] || continue
                ;;
        esac

        # Placeholder 'frequency' string replacement
        TARGET_FILE="${ZIP_PATH/$FREQ_PLACEHOLDER/$FREQ}"

        # Delete existing backup for this frequency (idempotent, can run on same day)
        if [[ -f "$TARGET_FILE" ]]; then
            rm -f "$TARGET_FILE"
            echo "[WARN] Existing $FREQ backup removed: $TARGET_FILE"
        fi

        cp "$ZIP_PATH" "$TARGET_FILE"
        echo "[INFO] $FREQ backup copied successfully: $TARGET_FILE"
    done

    rm -f "$ZIP_PATH"
    echo "[INFO] Removed temporary backup file: $ZIP_PATH"
}
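The placeholder renaming above relies on Bash's ${var/pattern/replacement} substitution, which replaces the first occurrence of the pattern; in isolation it behaves like this:

```shell
ZIP_PATH="../data/mybb_files_and_mysql-frequency-2026-04-07.zip"
FREQ_PLACEHOLDER='frequency'
FREQ='daily'
# Replace the first occurrence of the placeholder in the filename
echo "${ZIP_PATH/$FREQ_PLACEHOLDER/$FREQ}"
# prints ../data/mybb_files_and_mysql-daily-2026-04-07.zip
```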

We don’t want to accumulate an unlimited number of backup copies; instead, we delete outdated ones according to the retention limits defined by the BACKUP_RETENTION_* variables.

The prune_old_backups() function enforces these retention limits by removing older backups for each frequency (daily, weekly, monthly).

Based on the BACKUP_RETENTION_* variables, we dynamically calculate the RETENTION integer value for each frequency. If it is zero or unset, we exit early from the loop.

We then list the contents of the LOCAL_BACKUP_DIR and delete outdated copies by processing the output of the ls command through a pipeline:

  • We filter out files that do not contain values from the ZIP_PREFIX and FREQ variables.
  • We skip the first RETENTION lines, which effectively keeps the RETENTION most recent backups.
  • Using xargs and rm -R, we delete all remaining filenames line by line, ignoring any error messages to keep logs clean.
prune_old_backups() {
    for FREQ in daily weekly monthly; do
        # Determine retention variable dynamically
        RETENTION_VAR="BACKUP_RETENTION_${FREQ^^}"  # uppercase: daily -> DAILY
        RETENTION="${!RETENTION_VAR}"

        # Skip if retention is zero or unset
        [[ -z "$RETENTION" || "$RETENTION" -le 0 ]] && continue

        # Find old backups and delete them
        ls -t "$LOCAL_BACKUP_DIR" \
            | grep "$ZIP_PREFIX" \
            | grep "$FREQ" \
            | sed -e 1,"$RETENTION"d \
            | xargs -d '\n' -I{} rm -R "$LOCAL_BACKUP_DIR/{}" > /dev/null 2>&1

        echo "[INFO] Pruned $FREQ backups, keeping last $RETENTION"
    done
}
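The sed -e 1,Nd step is what implements "keep the N newest": since ls -t sorts newest first, deleting the first RETENTION lines leaves only the outdated filenames for xargs to remove. A minimal demo of that step:

```shell
# ls -t output is newest-first; with RETENTION=2, only the oldest name survives
RETENTION=2
printf 'newest.zip\nmiddle.zip\noldest.zip\n' | sed -e 1,"$RETENTION"d
# prints oldest.zip
```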

Main invocation

Finally, we invoke the functions defined above.

First, we ensure that all required configuration is correct. If validation fails, the script prints an error message to stderr and immediately exits with a non-zero status, preventing any further execution.

If the configuration is valid, the script continues by creating a backup archive, then generating additional retention copies, and finally removing old backups.

# ---------- Main script ----------

if ! is_valid_config; then
    echo "[ERROR] Configuration validation failed. Aborting backup." >&2
    exit 1
fi

create_backup

create_retention_copies

prune_old_backups

echo "[INFO] Backup completed successfully."
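For completeness, a possible crontab entry for scheduling the script. The path follows the structure shown earlier and the 00:30 schedule matches the sample log timestamps, but both are assumptions, not part of the original script:

```shell
# Hypothetical crontab entry (assumed path and schedule):
# m  h  dom mon dow  command
30 0 * * * cd /home/ubuntu/traefik-proxy/apps/mybb/backup/scripts && ./backup-files-and-mysql-run.sh
```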

Below is example log output from a successful backup run:

2026-04-07 00:30:01 ========================================
2026-04-07 00:30:01 [INFO] Logging started
2026-04-07 00:30:01 [INFO] Log file: ./log-backup-files-and-mysql.txt
2026-04-07 00:30:01 [INFO] Max size: 1.0MB, keep: 0.5MB
2026-04-07 00:30:01 ========================================
2026-04-07 00:30:01 
2026-04-07 00:30:01 [INFO] Validating configuration...
2026-04-07 00:30:01 [INFO] Configuration is valid. Creating backup...
2026-04-07 00:30:01 [INFO] Created staging directory: ../data/staging_dir
2026-04-07 00:30:01 [INFO] Created temporary DB directory: ../data/staging_dir/mysql_database
2026-04-07 00:30:01 [INFO] Created files directory: ../data/staging_dir/source_code
2026-04-07 00:30:01 mysqldump: [Warning] Using a password on the command line interface can be insecure.
2026-04-07 00:30:02 [INFO] MySQL database dumped: db_name=mybb -> path=../data/staging_dir/mysql_database/mybb.sql
2026-04-07 00:30:02 [INFO] Added to staging: ../../data/mybb-data/images/custom -> source_code/
2026-04-07 00:30:02 [INFO] Added to staging: ../../data/mybb-data/inc/config.php -> source_code/
2026-04-07 00:30:02   adding: mysql_database/ (stored 0%)
2026-04-07 00:30:02   adding: mysql_database/mybb.sql (deflated 83%)
2026-04-07 00:30:02   adding: source_code/ (stored 0%)
2026-04-07 00:30:02   adding: source_code/config.php (deflated 62%)
2026-04-07 00:30:02   adding: source_code/custom/ (stored 0%)
2026-04-07 00:30:02   adding: source_code/custom/logo-blue-153x75.png (stored 0%)
2026-04-07 00:30:02   adding: source_code/custom/logo-blue-588x288.png (deflated 0%)
2026-04-07 00:30:02 [INFO] Created zip archive: ../data/mybb_files_and_mysql-frequency-2026-04-07.zip
2026-04-07 00:30:02 [INFO] Removed staging directory: ../data/staging_dir
2026-04-07 00:30:02 [INFO] Backup file created successfully: ../data/mybb_files_and_mysql-frequency-2026-04-07.zip
2026-04-07 00:30:02 [INFO] daily backup copied successfully: ../data/mybb_files_and_mysql-daily-2026-04-07.zip
2026-04-07 00:30:02 [INFO] Removed temporary backup file: ../data/mybb_files_and_mysql-frequency-2026-04-07.zip
2026-04-07 00:30:02 [INFO] Pruned daily backups, keeping last 3
2026-04-07 00:30:02 [INFO] Pruned weekly backups, keeping last 2
2026-04-07 00:30:02 [INFO] Pruned monthly backups, keeping last 6
2026-04-07 00:30:02 [INFO] Backup completed successfully.

Remote syncing script

Besides the main backup script running directly on the server, we use an additional script that replicates the original backup on remote machines.

This script synchronizes the backups created on the server (which serves as the source of truth) to remote machines used for storing backup copies. It connects to the server via SSH, validates both the backup and local folder structure, and finally synchronizes the data using the rsync command.

Entire script: https://github.com/nemanjam/bash-backup/blob/main/backup-rsync-local.sh

Configurable variables

Similar to the local backup script, these are configurable variables that would typically be defined in a .env file. They are used to configure the synchronization script.

  • REMOTE_HOST - the server host used for the SSH and rsync connections.
  • REMOTE_BACKUP_DIR - the absolute path to the backup directory on the server (the source of truth); avoid ~/ expansion, since the path is used both locally and over SSH.
  • LOCAL_BACKUP_DIR - the relative path to the local directory where backups are synchronized.
  • MIN_BACKUP_SIZE_MB - a human-readable float value (in MB) representing the minimum valid backup size.
  • MIN_BACKUP_SIZE_BYTES - the equivalent integer value in bytes, used for actual validation and computations.
# ---------- Configuration ----------

REMOTE_HOST="arm2"
# Full, absolute path - can't use ~/, used both locally and remote with ssh/rsync
REMOTE_BACKUP_DIR="/home/ubuntu/traefik-proxy/apps/mybb/backup/data"

# Note: all commands run from script dir, NEVER call cd, for relative LOCAL paths to work
LOCAL_BACKUP_DIR="../data"

# Minimum valid backup size, ZIP size, compressed
# Only db for blank forum, zip=158.2 KiB
# Float
MIN_BACKUP_SIZE_MB=0.1
# Integer
MIN_BACKUP_SIZE_BYTES=$(echo "$MIN_BACKUP_SIZE_MB * 1024 * 1024 / 1" | bc) # truncated to integer

Constants

The only constant that matters here is ZIP_PREFIX; it must match the one defined in the backup creation script, backup-files-and-mysql.sh.

# ---------- Constants ----------

# Must match backup-files-and-mysql.sh
ZIP_PREFIX="mybb_files_and_mysql"

# Script dir absolute path, used only in log messages
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

Logging variables

These are the same as in the Local backup script. The logging format is identical in both the local and remote scripts.

# ---------- Logging vars ----------

# Enable only when running from cron
# Cron has no TTY, interactive shell does
LOG_TO_FILE=false
[ -z "$PS1" ] && LOG_TO_FILE=true

# Log file
LOG_FILE="./log-backup-rsync-local.txt"

# Log size limits (MB, float allowed)
LOG_MAX_SIZE_MB=1.0   # truncate when log exceeds this
LOG_KEEP_SIZE_MB=0.5  # keep last N MB after truncation

# Timezone for log timestamps
LOG_TIMEZONE="Europe/Belgrade"

Setup logging

Identical to the setup in the local script.

Validate configuration

Same as in the local script, we validate the existence and correctness of the configuration variables before performing any other logic. In this case, we ensure that:

  • The SSH connection to the remote host can be established.
  • The remote backup directory exists.
  • The local backup directory exists.
# ---------- Validate config ------------

is_valid_config() {
    echo "----------------------------------------"
    echo "[INFO] Validating configuration"

    # Check SSH connectivity to remote host
    if ! ssh -o BatchMode=yes -o ConnectTimeout=5 "$REMOTE_HOST" "true" >/dev/null 2>&1; then
        echo "[ERROR] Cannot connect to remote host via SSH: REMOTE_HOST=$REMOTE_HOST" >&2
        return 1
    fi
    echo "[INFO] SSH connection established: REMOTE_HOST=$REMOTE_HOST"

    # Check remote backup directory exists
    if ! ssh "$REMOTE_HOST" "[ -d \"$REMOTE_BACKUP_DIR\" ]" >/dev/null 2>&1; then
        echo "[ERROR] Remote backup directory does not exist: REMOTE_HOST=$REMOTE_HOST REMOTE_BACKUP_DIR=$REMOTE_BACKUP_DIR" >&2
        return 1
    fi
    echo "[INFO] Remote backup directory exists: $REMOTE_BACKUP_DIR"

    # Check local backup directory exists
    if [ ! -d "$LOCAL_BACKUP_DIR" ]; then
        echo "[ERROR] Local backup directory does not exist: path=$SCRIPT_DIR/$LOCAL_BACKUP_DIR" >&2
        return 1
    fi
    echo "[INFO] Local backup directory exists: $SCRIPT_DIR/$LOCAL_BACKUP_DIR"

    echo "[INFO] Configuration validation successful"
    echo "----------------------------------------"

    return 0
}

Utility functions

Besides validating configuration variables, we also need to validate the remote backup on the server before syncing locally. This is a more complex task, so we break it down into a few smaller, reusable utility functions for clarity and readability.

  • get_latest_date() expects a list of filenames via stdin, extracts the date part in YYYY-MM-DD format, sorts it, and returns the last item, which is the latest date in the list.
  • split_backup_types() accepts a list of filenames as a single string (e.g. output from ls) as the first argument. The second argument is a mutable associative array passed by reference, used to store the result. The function parses the raw input string and groups backup filenames into categories (daily, weekly, monthly), storing each group in the corresponding key of the associative array.
  • check_count() compares remote and local backup counts (for a given type). If the remote count is lower than the local count, it prints an error and returns failure.
  • check_date() does the same for dates. If the latest remote backup date is older than the latest local backup date, it prints an error and returns failure.
  • bytes_to_human() converts a size in bytes into a human-readable format (KB, MB, GB). It is used to improve log readability.
  • check_file_size() validates that all remote backup files meet a minimum size requirement. It fetches file names and sizes from the remote server via SSH and iterates through each file. For each one, it logs the size and checks whether it is smaller than the minimum allowed size, MIN_BACKUP_SIZE_BYTES. If any file is too small, it logs the offending filename and returns an error.
# ------------ Utils ------------

# Extract latest YYYY-MM-DD date from backup filenames
get_latest_date() {
    sed -E 's/.*-([0-9]{4}-[0-9]{2}-[0-9]{2})\.zip/\1/' \
        | sort | tail -n 1
}

# Split a list of filenames into daily/weekly/monthly assoc array
split_backup_types() {
    local files="$1"
    declare -n arr=$2  # pass assoc array by name

    while IFS= read -r file; do
        case "$file" in
            *-daily-*.zip)   arr[daily]+="$file"$'\n' ;;
            *-weekly-*.zip)  arr[weekly]+="$file"$'\n' ;;
            *-monthly-*.zip) arr[monthly]+="$file"$'\n' ;;
        esac
    done <<< "$files"
}

# Ensure remote has at least as many backups as local
check_count() {
    local remote_count="$1"
    local local_count="$2"
    local backup_type="$3"

    if (( remote_count < local_count )); then
        echo "ERROR: remote has fewer type=$backup_type backups than local, remote_count=$remote_count, local_count=$local_count"
        return 1
    fi
}

# Ensure remote backups are not older than local
check_date() {
    local remote_latest="$1"
    local local_latest="$2"
    local backup_type="$3"

    if [[ -n "$local_latest" && "$remote_latest" < "$local_latest" ]]; then
        echo "ERROR: remote type=$backup_type backup is older than local, remote_latest=$remote_latest, local_latest=$local_latest"
        return 1
    fi
}

# Convert bytes to human-readable format
bytes_to_human() {
    local size=$1
    if (( size < 1024 )); then
        echo "${size}B"
    elif (( size < 1024*1024 )); then
        echo "$((size/1024))KB"
    elif (( size < 1024*1024*1024 )); then
        echo "$((size/1024/1024))MB"
    else
        echo "$((size/1024/1024/1024))GB"
    fi
}

# Ensure all remote backups are larger than minimum size
check_file_size() {
    local bad_file bad_file_size
    local remote_file size
    local remote_files_info

    # Store SSH output in a variable
    remote_files_info=$(ssh "$REMOTE_HOST" "
        for f in $REMOTE_BACKUP_DIR/${ZIP_PREFIX}-*.zip; do
            [ -f \"\$f\" ] || continue
            stat -c '%n %s' \"\$f\"
        done
    ")

    # Iterate over each line in the variable
    while read -r remote_file size; do
        echo "[INFO] Remote file: $remote_file, size=$(bytes_to_human $size)"

        if (( size < MIN_BACKUP_SIZE_BYTES )); then
            bad_file="$remote_file"
            bad_file_size="$size"
            break
        fi
    done <<< "$remote_files_info"

    if [[ -n "$bad_file" ]]; then
        echo "ERROR: remote backup file too small: $bad_file, size=$(bytes_to_human $bad_file_size), min=$(bytes_to_human $MIN_BACKUP_SIZE_BYTES)"
        return 1
    fi

    echo "[INFO] All remote backup files meet minimum size, min=$(bytes_to_human $MIN_BACKUP_SIZE_BYTES)"
    return 0
}

Validate source of truth

The is_valid_backup() function is the final check before synchronization. It validates both remote (source of truth) and local backups, compares them for consistency, and ensures that we do not accidentally overwrite the local backup with a corrupted or inconsistent remote backup from the server. It composes the utility functions defined above and adds its own logic:

  • It verifies that all remote backup files meet a minimum size requirement.
  • For each backup type (daily, weekly, monthly), it ensures that:
    • The remote contains at least as many backups as the local.
    • The latest backup date on the remote is not older than the latest local backup date.

If any validation fails, the function prints an error and exits early with a non-zero status. If all checks pass, it confirms successful validation and returns success.

# ---------- Validation ----------

is_valid_backup() {
    echo "----------------------------------------"
    echo "[INFO] Validating backups"

    # Local variables
    local -A remote_lists local_lists
    local remote_all_files local_all_files

    # Loop variables
    local backup_type
    local remote_list local_list
    local remote_count local_count
    local remote_latest local_latest

    # Global size validation (run once)
    if ! check_file_size; then
        echo "ERROR: remote backup contains file(s) smaller than minimum size, min=$(bytes_to_human $MIN_BACKUP_SIZE_BYTES)"
        return 1
    fi
    echo "[INFO] Remote backup file sizes validated, min=$(bytes_to_human $MIN_BACKUP_SIZE_BYTES)"

    # Store remote backup filenames in a variable and split, ignores .gitkeep
    remote_all_files=$(ssh "$REMOTE_HOST" "ls -1 $REMOTE_BACKUP_DIR/${ZIP_PREFIX}-*.zip 2>/dev/null")
    split_backup_types "$remote_all_files" remote_lists
    echo "[INFO] Remote backup file list loaded for type(s):"
    echo "$remote_all_files"

    # Store local backup filenames in a variable and split
    # Note: quote only the directory so the glob still expands
    local_all_files=$(ls -1 "$LOCAL_BACKUP_DIR"/${ZIP_PREFIX}-*.zip 2>/dev/null)
    split_backup_types "$local_all_files" local_lists
    echo "[INFO] Local backup file list loaded:"
    echo "$local_all_files"

    for backup_type in daily weekly monthly; do
        echo "[INFO] Checking backup type: $backup_type"

        # Set filename lists
        remote_list="${remote_lists[$backup_type]}"
        local_list="${local_lists[$backup_type]}"

        # Check counts
        remote_count=$(echo "$remote_list" | grep -c . || true)
        local_count=$(echo "$local_list" | grep -c . || true)
        if ! check_count "$remote_count" "$local_count" "$backup_type"; then
            echo "ERROR: backup count mismatch for type=$backup_type: remote=$remote_count is less than local=$local_count"
            return 1
        fi
        echo "[INFO] Backup count valid: type=$backup_type remote=$remote_count local=$local_count"

        # Check latest dates
        remote_latest=$(echo "$remote_list" | get_latest_date)
        local_latest=$(echo "$local_list" | get_latest_date)
        if ! check_date "$remote_latest" "$local_latest" "$backup_type"; then
            echo "ERROR: latest backup date mismatch for type=$backup_type: remote=$remote_latest is older than local=$local_latest"
            return 1
        fi
        echo "[INFO] Latest backup date valid: type=$backup_type date=$remote_latest"
    done

    echo "[INFO] Backup validation successful"
    echo "----------------------------------------"

    return 0
}

Synchronizing local copy

Finally, we can invoke the validation functions from above and synchronize the remote backup to the local machine.

It first verifies that the configuration is valid, and then that the remote backups are valid. If either validation fails, the script aborts and logs an error message.

If both checks pass, it proceeds to mirror the remote backup directory locally using rsync, preserving structure and deleting any local files that no longer exist on the remote.

# ---------- Sync ----------

if ! is_valid_config; then
    echo "[ERROR] Configuration validation failed. Aborting script." >&2
    exit 1
fi

# Exit early if remote backup is not valid
if ! is_valid_backup; then
    echo "ERROR: Backup validation failed - aborting"
    exit 1
fi

# Note: no fallback logic for now

echo "[INFO] Remote backup valid - syncing data"

# Mirror remote data directory locally
rsync -ah --progress --delete "$REMOTE_HOST:$REMOTE_BACKUP_DIR/" "$LOCAL_BACKUP_DIR/"

echo "[INFO] Backup sync completed successfully."

Below is an example log entry from a successful run when synchronizing a backup:

2026-04-07 00:45:02 ========================================
2026-04-07 00:45:02 [INFO] Logging started
2026-04-07 00:45:02 [INFO] Log file: ./log-backup-rsync-local.txt
2026-04-07 00:45:02 [INFO] Max size: 1.0MB, keep: 0.5MB
2026-04-07 00:45:02 ========================================
2026-04-07 00:45:02 
2026-04-07 00:45:02 ----------------------------------------
2026-04-07 00:45:02 [INFO] Validating configuration
2026-04-07 00:45:03 [INFO] SSH connection established: REMOTE_HOST=arm2
2026-04-07 00:45:04 [INFO] Remote backup directory exists: / ... /traefik-proxy/apps/mybb/backup/data
2026-04-07 00:45:04 [INFO] Local backup directory exists: / ... /mybb-backup/scripts/../data
2026-04-07 00:45:04 [INFO] Configuration validation successful
2026-04-07 00:45:04 ----------------------------------------
2026-04-07 00:45:04 ----------------------------------------
2026-04-07 00:45:04 [INFO] Validating backups
2026-04-07 00:45:05 [INFO] Remote file: / ... /traefik-proxy/apps/mybb/backup/data/mybb_files_and_mysql-daily-2026-04-05.zip, size=279KB
2026-04-07 00:45:05 [INFO] Remote file: / ... /traefik-proxy/apps/mybb/backup/data/mybb_files_and_mysql-daily-2026-04-06.zip, size=293KB
2026-04-07 00:45:05 [INFO] Remote file: / ... /traefik-proxy/apps/mybb/backup/data/mybb_files_and_mysql-daily-2026-04-07.zip, size=285KB
2026-04-07 00:45:05 [INFO] Remote file: / ... /traefik-proxy/apps/mybb/backup/data/mybb_files_and_mysql-monthly-2026-02-01.zip, size=292KB
2026-04-07 00:45:05 [INFO] Remote file: / ... /traefik-proxy/apps/mybb/backup/data/mybb_files_and_mysql-monthly-2026-03-01.zip, size=334KB
2026-04-07 00:45:05 [INFO] Remote file: / ... /traefik-proxy/apps/mybb/backup/data/mybb_files_and_mysql-monthly-2026-04-01.zip, size=332KB
2026-04-07 00:45:05 [INFO] Remote file: / ... /traefik-proxy/apps/mybb/backup/data/mybb_files_and_mysql-weekly-2026-03-22.zip, size=326KB
2026-04-07 00:45:05 [INFO] Remote file: / ... /traefik-proxy/apps/mybb/backup/data/mybb_files_and_mysql-weekly-2026-04-05.zip, size=279KB
2026-04-07 00:45:05 [INFO] All remote backup files meet minimum size, min=102KB
2026-04-07 00:45:05 [INFO] Remote backup file sizes validated, min=102KB
2026-04-07 00:45:06 [INFO] Remote backup file list loaded for type(s):
2026-04-07 00:45:06 / ... /traefik-proxy/apps/mybb/backup/data/mybb_files_and_mysql-daily-2026-04-05.zip
2026-04-07 00:45:06 / ... /traefik-proxy/apps/mybb/backup/data/mybb_files_and_mysql-daily-2026-04-06.zip
2026-04-07 00:45:06 / ... /traefik-proxy/apps/mybb/backup/data/mybb_files_and_mysql-daily-2026-04-07.zip
2026-04-07 00:45:06 / ... /traefik-proxy/apps/mybb/backup/data/mybb_files_and_mysql-monthly-2026-02-01.zip
2026-04-07 00:45:06 / ... /traefik-proxy/apps/mybb/backup/data/mybb_files_and_mysql-monthly-2026-03-01.zip
2026-04-07 00:45:06 / ... /traefik-proxy/apps/mybb/backup/data/mybb_files_and_mysql-monthly-2026-04-01.zip
2026-04-07 00:45:06 / ... /traefik-proxy/apps/mybb/backup/data/mybb_files_and_mysql-weekly-2026-03-22.zip
2026-04-07 00:45:06 / ... /traefik-proxy/apps/mybb/backup/data/mybb_files_and_mysql-weekly-2026-04-05.zip
2026-04-07 00:45:06 [INFO] Local backup file list loaded:
2026-04-07 00:45:06 
2026-04-07 00:45:06 [INFO] Checking backup type: daily
2026-04-07 00:45:06 [INFO] Backup count valid: type=daily remote=3 local=0
2026-04-07 00:45:06 [INFO] Latest backup date valid: type=daily date=2026-04-07
2026-04-07 00:45:06 [INFO] Checking backup type: weekly
2026-04-07 00:45:06 [INFO] Backup count valid: type=weekly remote=2 local=0
2026-04-07 00:45:06 [INFO] Latest backup date valid: type=weekly date=2026-04-05
2026-04-07 00:45:06 [INFO] Checking backup type: monthly
2026-04-07 00:45:06 [INFO] Backup count valid: type=monthly remote=3 local=0
2026-04-07 00:45:06 [INFO] Latest backup date valid: type=monthly date=2026-04-01
2026-04-07 00:45:06 [INFO] Backup validation successful
2026-04-07 00:45:06 ----------------------------------------
2026-04-07 00:45:06 [INFO] Remote backup valid - syncing data
2026-04-07 00:45:07 receiving incremental file list
2026-04-07 00:45:07 deleting mybb_files_and_mysql-daily-2026-04-04.zip
2026-04-07 00:45:07 ./
2026-04-07 00:45:07 mybb_files_and_mysql-daily-2026-04-07.zip
2026-04-07 00:45:07 
              0   0%    0.00kB/s    0:00:00  
        292.31K 100%    1.91MB/s    0:00:00 (xfr#1, to-chk=5/10)
2026-04-07 00:45:07 [INFO] Backup sync completed successfully.

Cron jobs

Now that we have implemented the scripts, we just need to schedule them to run daily by defining cron jobs on the server and on each machine that stores synced copies.

As a reminder, we can list and edit cron jobs using the following commands:

# List crons
crontab -l

# Edit crons
crontab -e

We need to carefully choose times that capture application data near the end of the day in our target time zone. The sync script must run after the backup creation script, so we need to estimate the execution time of the backup process and schedule the sync within a safe margin. With this in mind, we can schedule the backup at 23:30 and the sync at 23:45.

Another important detail is that both scripts rely on relative paths, so cron must execute them from the correct working directory. This can be achieved with cd /.../backup/scripts && bash ./my-script.sh. Additionally, we should use absolute paths in the cron configuration (avoiding shortcuts like ~ for the home directory), as such expansions may fail in a cron environment.

On server:

# Create backup every day at 23:30 Belgrade (UTC+2) time (21:30 UTC)
30 21 * * * cd /home/username/traefik-proxy/apps/mybb/backup/scripts && /usr/bin/bash ./run-backup-files-and-mysql.sh

On syncing machines:

# Sync backup every day at 23:45 Belgrade (UTC+2) time (21:45 UTC)
45 21 * * * cd /home/username/mybb-backup/scripts && /usr/bin/bash ./run-backup-rsync-local.sh

Interestingly, I couldn’t find a reliable way to set a custom time zone for cron jobs. I tried setting the TZ and CRON_TZ variables, but they were ignored, and cron always fell back to UTC.

# None of these actually sets the time zone successfully
# Always falls back to UTC

# Global
TZ=Europe/Belgrade
CRON_TZ=Europe/Belgrade

# Per job
30 21 * * * TZ=Europe/Belgrade cd /home/username/traefik-proxy/apps/mybb/backup/scripts && /usr/bin/bash ./run-backup-files-and-mysql.sh
30 21 * * * CRON_TZ=Europe/Belgrade cd /home/username/traefik-proxy/apps/mybb/backup/scripts && /usr/bin/bash ./run-backup-files-and-mysql.sh
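In the meantime, one workaround is to convert the target wall-clock time to UTC manually when writing the cron entry. With GNU date, the conversion can be checked from the shell. This is a sketch: it reflects DST at the moment you run it, so long-lived cron entries still need a manual review when DST changes.

```bash
# Print the UTC equivalent of 23:30 Belgrade time (GNU date syntax)
TZ=UTC date -d 'TZ="Europe/Belgrade" 23:30' '+%H:%M'
```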

Room for improvements

We described an example implementation along with the thought process behind designing a usable backup script. It is certainly not perfect or final, and it can be improved in a number of ways. Here are some possible improvements:

  • Extract all configuration variables from the script and load them from an .env file.
  • Add an ENABLE_ASSETS boolean flag to enable or disable including application file assets in the backup. Currently, this requires commenting out all keys in the SRC_CODE_DIRS associative array.
  • Create a decentralized solution by avoiding a single “source of truth” backup on the server. Instead, allow local backup repositories to connect to the server via SSH and execute code that creates a temporary backup, which can then be downloaded locally and deleted afterward. In practice, backup-rsync-local.sh would send and execute the backup-files-and-mysql.sh script remotely and clean up the temporary backup after downloading.
  • Set up a test environment with sample data to conveniently test and validate the scripts without waiting for scheduled time intervals (daily, weekly, monthly copies).
  • Find a reliable and actually working solution for configuring the time zone for cron jobs on Ubuntu.
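As an illustration of the first item, loading configuration from an .env file could be sketched as follows. The load_env function and the set of required variables are assumptions for illustration, not part of the current scripts.

```bash
# Hypothetical .env loader for the backup scripts
load_env() {
    local env_file="$1"
    if [ ! -f "$env_file" ]; then
        echo "[ERROR] Missing env file: $env_file" >&2
        return 1
    fi
    # Export every variable assigned while sourcing the file
    set -a
    # shellcheck disable=SC1090
    . "$env_file"
    set +a
    # Fail fast if required variables are missing
    : "${REMOTE_HOST:?REMOTE_HOST must be set}"
    : "${REMOTE_BACKUP_DIR:?REMOTE_BACKUP_DIR must be set}"
}
```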

Completed code

Conclusion

The assumption was that we need to track the state of an application, including the database, configuration (source files), and assets (images), and that we need a simple, minimalistic, yet functional solution. Although this is a quite common use case, after searching I couldn’t find a convincing, up-to-date Bash script for this purpose. So, I decided to build upon and adapt the closest existing script I could find. That process is described in this article.

This is a pragmatic, custom script focused on simplicity, with no ambition to become a comprehensive backup solution covering many use cases and features. Such a solution would require a much larger scope of work, and many robust backup tools already exist.

How do you approach creating and managing backups for your applications? Let me know in the comments.

References

]]>
nemanja.mitic.elfak@hotmail.com (Nemanja Mitic)
<![CDATA[Why you should use rsync instead of scp in deployments]]> https://nemanjamitic.com/blog/2026-03-13-rsync-scp/ https://nemanjamitic.com/blog/2026-03-13-rsync-scp/ Fri, 13 Mar 2026 00:00:00 GMT Introduction

Many of you can probably guess the point of this article just by reading the title, but it’s still useful to have a clear reminder backed by some real-world measurements. This will be a practical, straight-to-the-point article.

The problem with scp in deployments

When copying the dist folder to a deployment server, the first instinct is usually to clear the existing folder and upload the new one using scp. While this works, you can achieve significant long-term improvements by replacing a few lines and using rsync instead.

The important facts to keep in mind are:

  1. This is a repeated operation.
  2. The server will (almost) always already contain a previous copy of the dist folder.
  3. Not all files in the build artifacts change on every build, many remain exactly the same and can be reused.

Clearing the dist folder and copying the entire content with scp each time simply ignores the facts above. scp is not optimized for repeated copying where most files remain unchanged.

Bash deployment scripts and Github Actions deployment workflows run frequently, so any unnecessary time or performance overhead accumulates and wastes energy and resources. It’s important to optimize as much as possible, especially when it requires very little effort.

Why rsync is faster

rsync is designed specifically for efficient file synchronization. Instead of copying everything every time, it compares the source and destination and transfers only the files that have changed.

This dramatically reduces the amount of data that needs to be sent during deployments. In most cases, only a small subset of files changes between builds, which means rsync can complete the transfer much faster than scp.

Another advantage is that rsync can resume partially transferred files and optionally compress data during transfer. These features make it especially well suited for automated deployment workflows where speed and reliability are important.

rsync flags for deployments

A typical rsync command used in deployments looks like this:

rsync -az --delete ./dist/ user@server:/var/www/site

Some commonly used flags include:

  • -a (archive) preserves permissions, timestamps, and recursively copies files.
  • -z enables compression during transfer, which reduces network usage.
  • --delete removes files on the destination that no longer exist in the source, keeping the deployment directory in sync.
  • --partial allows interrupted transfers to resume instead of restarting from scratch.

Together, these options make rsync a powerful and efficient tool for copying build artifacts during automated deployments.

A complete list of options is available in the command’s manual: https://download.samba.org/pub/rsync/rsync.1#OPTION_SUMMARY.
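The incremental behavior is easy to observe without a server: rsync between two local directories uses the same change detection. A quick experiment (the temporary directories are purely illustrative):

```bash
# Create a source with two files and mirror it
src=$(mktemp -d); dst=$(mktemp -d)
echo one > "$src/a.txt"
echo two > "$src/b.txt"
rsync -a "$src/" "$dst/"      # first run copies both files

# Change one file and sync again with -i (itemize changes)
echo changed > "$src/a.txt"
rsync -ai "$src/" "$dst/"     # only a.txt is listed and transferred
```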

Example: deployment with scp

For both scp and rsync, we will consider two examples: a Bash script used to deploy from a local development environment, and a Github Actions workflow. These represent two common approaches to deployments.

For the sake of context and completeness, the full scripts are included so you can reuse them or run your own tests and performance comparisons.

Bash script

Naturally, the only truly important part is the scp line. However, let’s briefly review the rest of the script, since it demonstrates what we would typically use in a real-world scenario.

The first assumption is that we have a local dist folder containing the compiled application built with a local .env file, and an Nginx web server with a webroot directory on a remote server. Our Bash script accepts three input arguments: LOCAL_PATH, REMOTE_PATH, and REMOTE_HOST, which we validate before performing the copy.

Next, we establish an initial ssh connection to delete the existing application artifacts from the previous deployment. During this step, we also log some information by printing the file list and the total number of files in the Nginx webroot before and after removing the old files.

Note 1: When removing old artifacts, we delete the contents of the Nginx webroot directory, not the webroot directory itself. Removing the directory could disrupt the current Nginx session and would require restarting the Nginx process or container.

Note 2: Below the scp line, I also include a tar ... | ssh command example that compresses the artifacts before piping them through the SSH connection. In theory, this should provide performance similar to rsync in scenarios where we always completely clear the previous deployment. I will include it in the measurements so we can see how it performs.

# Navigate to ~/traefik-proxy/apps/nmc-nginx-with-volume/website
cd $REMOTE_PATH

# Clear the contents, not the `/website` path segment
rm -rf * 

https://github.com/nemanjam/nemanjam.github.io/blob/main/scripts/deploy-nginx.sh

#!/bin/bash

LOCAL_PATH="./dist"
# REMOTE_PATH="~/traefik-proxy/apps/nmc-nginx-with-volume/website"
# REMOTE_HOST="arm1"

REMOTE_PATH=$1
REMOTE_HOST=$2

# Check if all arguments are provided
if [[ -z "$REMOTE_PATH" || -z "$REMOTE_HOST" ]]; then
  echo "Incorrect args, usage: $0 <remote_path> <remote_host>"
  exit 1
fi

# Navigate to the website folder on the remote server and clear contents of the website folder
ssh $REMOTE_HOST "cd $REMOTE_PATH && \
                  echo 'List before clearing:' && \
                  ls && \
                  echo 'Count before clearing:' && \
                  ls -l | grep -v ^l | wc -l && \

                  # Only possible to skip with rsync --delete

                  echo 'Clearing contents of the folder...' && \
                  rm -rf * && \
                  echo 'List after clearing:' && \
                  ls && \
                  echo 'Count after clearing:' && \
                  find . -type f | wc -l && \

                  echo 'Copying new contents...'"

# Copy new contents, 320 MB
# Using scp -rq, slowest, not resumable
scp -rq $LOCAL_PATH/* $REMOTE_HOST:$REMOTE_PATH

# Using tar, fast for cleaned dir
# tar cf - -C "$LOCAL_PATH" . | ssh "$REMOTE_HOST" "tar xvf - -C $REMOTE_PATH" >/dev/null 2>&1

Then we can call the Bash script like this by passing REMOTE_PATH and REMOTE_HOST arguments:

{
  "scripts": {
    // ...

    "deploy:nginx:rpi": "bash scripts/deploy-nginx.sh '~/traefik-proxy/apps/nmc-nginx-with-volume/website' rpi",
    
    // ...
  }
}

Github Actions

The Github Actions workflow provides even more context. It includes environment variables required by the application, sets up Node.js and pnpm, and builds the app. The rest is identical to the Bash script above: we use the appleboy/ssh-action action to establish an SSH connection and clear the previous deployment, and the appleboy/scp-action action to copy the built dist/ folder to the remote server using scp.

In the scp step, most arguments are self-explanatory, but one worth emphasizing is strip_components: 1. This prevents creating an additional dist/ path segment inside the Nginx webroot. In other words, we want the files copied to nmc-nginx-with-volume/website/*, not to nmc-nginx-with-volume/website/dist/*.

https://github.com/nemanjam/nemanjam.github.io/blob/main/.github/workflows/default__deploy-nginx-scp.yml

name: Deploy Nginx scp

on:
  push:
    branches:
      - 'main'
    tags:
      - 'v[0-9]+.[0-9]+.[0-9]+'
  pull_request:
    branches:
      - 'disabled-main'
  workflow_dispatch:

env:
  SITE_URL: 'https://nemanjamitic.com'
  PLAUSIBLE_SCRIPT_URL: 'https://plausible.arm1.nemanjamitic.com/js/script.js'
  PLAUSIBLE_DOMAIN: 'nemanjamitic.com'

jobs:
  deploy:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          fetch-depth: 1

      - name: Print commit id, message and tag
        run: |
          git show -s --format='%h %s'
          echo "github.ref -> ${{ github.ref }}"

      - name: Set up Node.js and pnpm
        uses: actions/setup-node@v4
        with:
          node-version: 24.13.0
          registry-url: 'https://registry.npmjs.org'

      - name: Install pnpm
        uses: pnpm/action-setup@v4
        with:
          version: 10.30.1

      - name: Install dependencies
        run: pnpm install --frozen-lockfile

      - name: Build nemanjamiticcom
        run: pnpm build

      - name: Clean up website dir
        uses: appleboy/ssh-action@master
        with:
          host: ${{ secrets.REMOTE_HOST }}
          username: ${{ secrets.REMOTE_USERNAME }}
          key: ${{ secrets.REMOTE_KEY_ED25519 }}
          port: ${{ secrets.REMOTE_PORT }}
          script_stop: true
          script: |
            cd /home/ubuntu/traefik-proxy/apps/nmc-nginx-with-volume/website
            echo "Content before deletion: $(pwd)"
            ls -la
            rm -rf ./*
            echo "Content after deletion: $(pwd)"
            ls -la

      - name: Copy dist folder to remote host
        uses: appleboy/scp-action@v0.1.7
        with:
          host: ${{ secrets.REMOTE_HOST }}
          username: ${{ secrets.REMOTE_USERNAME }}
          key: ${{ secrets.REMOTE_KEY_ED25519 }}
          port: ${{ secrets.REMOTE_PORT }}
          source: 'dist/'
          target: '/home/ubuntu/traefik-proxy/apps/nmc-nginx-with-volume/website'
          # remove /dist path segment
          strip_components: 1

Example: deployment with rsync

Now we modify the existing Bash script and Github Actions workflow by replacing scp with rsync, while keeping the rest of the code identical.

Bash script

Most of the script remains the same. However, since we use rsync --delete, we can omit the step that deletes the previous deployment. In fact, the initial SSH call is no longer necessary, but we will keep it for debugging and transparency.

Another option worth mentioning is --info=progress2, which is very convenient in a live terminal session because it displays the current transfer progress in a concise way. This provides reassurance that the network connection is active and the transfer is progressing.

https://github.com/nemanjam/nemanjam.github.io/blob/main/scripts/deploy-nginx.sh

#!/bin/bash

LOCAL_PATH="./dist"
# REMOTE_PATH="~/traefik-proxy/apps/nmc-nginx-with-volume/website"
# REMOTE_HOST="arm1"

REMOTE_PATH=$1
REMOTE_HOST=$2

# Check if all arguments are provided
if [[ -z "$REMOTE_PATH" || -z "$REMOTE_HOST" ]]; then
  echo "Incorrect args, usage: $0 <remote_path> <remote_host>"
  exit 1
fi

# Navigate to the website folder on the remote server and clear contents of the website folder
ssh $REMOTE_HOST "cd $REMOTE_PATH && \
                  echo 'List before clearing:' && \
                  ls && \
                  echo 'Count before clearing:' && \
                  ls -l | grep -v ^l | wc -l && \

                  # Only possible to skip with rsync --delete

                  # echo 'Clearing contents of the folder...' && \
                  # rm -rf * && \
                  # echo 'List after clearing:' && \
                  # ls && \
                  # echo 'Count after clearing:' && \
                  # find . -type f | wc -l && \

                  echo 'Copying new contents...'"

# Using rsync, fastest, resumable, deletes without clearing, lot faster with reusing unchanged files (--delete)
rsync -az --delete --info=stats2,progress2 $LOCAL_PATH/ $REMOTE_HOST:$REMOTE_PATH

# List all files after copying
ssh $REMOTE_HOST "cd $REMOTE_PATH && \
                  echo 'List after copying:' && \
                  ls && \
                  echo 'Count after copying:' && \
                  find . -type f | wc -l"

Github Actions

The workflow implements the same logic using the Burnett01/rsync-deployments action. Since this is not a live terminal session, --info=stats2 is sufficient for logging.

Unless you are actively debugging, avoid using the rsync -v flag, as overly verbose logs reduce readability.

https://github.com/nemanjam/nemanjam.github.io/blob/main/.github/workflows/default__deploy-nginx-rsync.yml

name: Deploy Nginx rsync

on:
  push:
    branches:
      - 'main'
    tags:
      - 'v[0-9]+.[0-9]+.[0-9]+'
  pull_request:
    branches:
      - 'disabled-main'
  workflow_dispatch:

env:
  SITE_URL: 'https://nemanjamitic.com'
  PLAUSIBLE_SCRIPT_URL: 'https://plausible.arm1.nemanjamitic.com/js/script.js'
  PLAUSIBLE_DOMAIN: 'nemanjamitic.com'

jobs:
  deploy:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          fetch-depth: 1

      - name: Print commit id, message and tag
        run: |
          git show -s --format='%h %s'
          echo "github.ref -> ${{ github.ref }}"

      - name: Set up Node.js and pnpm
        uses: actions/setup-node@v4
        with:
          node-version: 24.13.0
          registry-url: 'https://registry.npmjs.org'

      - name: Install pnpm
        uses: pnpm/action-setup@v4
        with:
          version: 10.30.1

      - name: Install dependencies
        run: pnpm install --frozen-lockfile

      - name: Build nemanjamiticcom
        run: pnpm build

      - name: Deploy dist via rsync
        uses: burnett01/rsync-deployments@v8
        with:
          switches: -az --delete --info=stats2
          path: dist/
          remote_path: /home/ubuntu/traefik-proxy/apps/nmc-nginx-with-volume/website/
          remote_host: ${{ secrets.REMOTE_HOST }}
          remote_user: ${{ secrets.REMOTE_USERNAME }}
          remote_port: ${{ secrets.REMOTE_PORT }}
          remote_key: ${{ secrets.REMOTE_KEY_ED25519 }}

Performance comparison

For Bash script measurements, I used my local network to deploy to a Raspberry Pi server, over both 1 Gbps Ethernet and 5 GHz WiFi 5 (433 Mbps). For Github Actions workflows, I used the standard Github runners available on the free plan. For each case, I took a few measurements to eliminate random anomalies. I didn’t aim for statistical accuracy.

For the deployment payload, I used this very website you are currently reading, a statically built Astro site. Its build artifacts consist of 1320 files totaling 347 MB (it contains a number of images).

| Method | Network | Transfer strategy | Files sent | Data transferred | Time (s) |
| --- | --- | --- | --- | --- | --- |
| Bash + scp | LAN (Ethernet) | Copy all files every deployment | 1320 | 347 MB | 9.6 |
| Bash + tar over SSH | LAN (Ethernet) | Archive then copy single file | 1 | 347 MB | 4.1 |
| Bash + rsync (cleared) | LAN (Ethernet) | Full file synchronization | 1320 | 347 MB | 4.6 |
| Bash + rsync | LAN (Ethernet) | Incremental file synchronization | x | 24 MB | 8.8 |
| Bash + scp | LAN (WiFi 5) | Copy all files every deployment | 1320 | 347 MB | 188 |
| Bash + tar over SSH | LAN (WiFi 5) | Archive then copy single file | 1 | 347 MB | 16.6 |
| Bash + rsync (cleared) | LAN (WiFi 5) | Full file synchronization | 1320 | 347 MB | 15.1 |
| Bash + rsync | LAN (WiFi 5) | Incremental file synchronization | x | 24 MB | 14.6 |
| GA + scp | Internet | Copy all files every deployment | 1320 | 347 MB | 43 |
| GA + tar over SSH | Internet | Archive then copy single file | 1 | 347 MB | 32 |
| GA + rsync (cleared) | Internet | Full file synchronization | 1320 | 347 MB | 29 |
| GA + rsync | Internet | Incremental file synchronization | x | 24 MB | 10 |

Results discussion

Let’s comment on the results, starting from the worst option:

  • scp has the worst performance in every case (Ethernet 9.6 s, WiFi 188 s, Github Actions 43 s). The WiFi result is especially bad. I don’t have an exact explanation, but the per-file overhead of copying a large number of files probably interacts poorly with the higher-latency WiFi link.
  • tar + SSH has decent performance (Ethernet 4.1 s, WiFi 16.6 s, Github Actions 32 s), considering that it clears the destination and transfers all files every time. Interestingly, on Ethernet it even performs 2× better (4.1 s) than rsync with synchronization enabled (8.8 s). I explain this by the fact that hashing and comparing files in rsync can cost more than the file transfer itself on a stable, wired Ethernet connection.
  • rsync (cleared) (delete the destination and transfer everything each time) is on par with tar + SSH (Ethernet 4.6 s, WiFi 15.1 s, Github Actions 29 s). This makes sense, because the two methods are doing essentially the same thing.
  • rsync (synchronization enabled) has the best overall performance (Ethernet 8.8 s, WiFi 14.6 s, Github Actions 10 s), with the exception of Ethernet, which I already explained (hashing and file comparison can cost more than the network transfer). The Github Actions result (10 s) is especially important, since CI is the most common way to deploy apps in practice. It also creates around 14× less network traffic (24 MB compared to 347 MB).

Meaning, they rank in the following order (from best to worst):

  1. rsync
  2. rsync (cleared) and tar + SSH (equally fast)
  3. scp (worst in every scenario)

Key takeaway: In Github Actions, rsync saves 43 - 10 = 33 seconds on each run compared to scp, which is a significant improvement.

Deployment process and Amdahl’s law

Transferring files is just one of the steps within the deployment process. It is not even the most dominant one. If we look at the times for each step in the Github Actions default__deploy-nginx-rsync.yml workflow, we can see the following:

Set up job                                    2s
Build burnett01/rsync-deployments@v8          9s
Checkout code                                18s
Print commit id, message and tag              0s
Set up Node.js and pnpm                       5s
Install pnpm                                  1s
Install dependencies                          6s
Build nemanjamiticcom                     2m 25s
Deploy dist via rsync                        10s
Post Install pnpm                             0s
Post Set up Node.js and pnpm                  0s
Post Checkout code                            0s
Complete job                                  0s

Deploy dist via rsync is third on the list with 10 seconds. Checkout code is second with 18 seconds. That step already has the fetch-depth: 1 optimization; the repository simply has a large file size. The app build step Build nemanjamiticcom obviously takes the most time and has the greatest potential for optimizing performance and saving time. Although obvious, this fact is also formally articulated by Amdahl’s law, which states:

The overall performance improvement gained by optimizing a single part of a system is limited by the fraction of time that the improved part is actually used.

However, the app’s build step is also the most complex to optimize. It spans the app code implementation, build configuration, and caching on both Vite and Github Actions levels. Naturally, it is largely app-dependent and more challenging to generalize.
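To make the law concrete with the numbers above, here is a small Python sketch. It assumes the 43 s scp and 10 s rsync transfer times from the measurement table and the remaining step durations from the workflow log; it is an illustration, not part of the actual deployment scripts.

```python
def amdahl(p: float, s: float) -> float:
    """Overall speedup when a part taking fraction p of total time is sped up s times."""
    return 1.0 / ((1.0 - p) + p / s)

# All workflow steps except the file transfer (seconds, from the log above):
# setup 2 + action build 9 + checkout 18 + print 0 + node 5 + pnpm 1 + install 6 + build 145
other_steps = 2 + 9 + 18 + 0 + 5 + 1 + 6 + 145  # 186 s

scp_transfer, rsync_transfer = 43, 10            # measured transfer times

total_scp = other_steps + scp_transfer           # 229 s with the scp baseline
p = scp_transfer / total_scp                     # fraction of time spent transferring
s = scp_transfer / rsync_transfer                # the transfer itself gets 4.3x faster

print(round(amdahl(p, s), 2))                    # → 1.17
```

Even though the transfer step alone becomes 4.3× faster, the whole workflow only improves by about 17%, because the build step dominates the total time. That is exactly what Amdahl’s law predicts.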

If I look at the Astro build log, I can see this:

/_astro/snow1.DTiId6LS_Z2cIRXL.webp (reused cache entry) (+2ms) (1044/1101)

This image, snow1.DTiId6LS_Z2cIRXL.webp, has the exact same name in each build and is cached and reused, which drastically improves performance.

On the other hand, in the build log I can also see:

λ src/pages/api/open-graph/[...route].png.ts
  ├─ /api/open-graph/blog/2026-01-03-nextjs-server-actions-fastapi-openapi.png (+1.42s) 

This is an Open Graph image generated using a Satori HTML template, and it is regenerated from scratch on each build. I can see two problems with this:

  1. Generating gradient colors in src/utils/gradients.ts uses Math.random(), which makes the generation non-deterministic. Instead, the gradient should use a pseudo-random, deterministic approach, for example by hashing the page title string.
  2. The image snow1.DTiId6LS_Z2cIRXL.webp is originally placed inside the src directory, which registers it as an Astro asset. As a result, Astro handles compression, naming, and caching during the build process. This is not the case with the src/pages/api/open-graph/[...route].png.ts static route and the Satori template; additional configuration would be required to enable caching.
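A minimal Python sketch of the deterministic idea from point 1. The palette values and function name are hypothetical, not taken from the actual src/utils/gradients.ts; the point is only that hashing the title replaces Math.random() with a stable mapping.

```python
import hashlib

# Hypothetical palette list; the real project defines its own gradients.
PALETTES = [
    ("#ff7e5f", "#feb47b"),
    ("#6a11cb", "#2575fc"),
    ("#43cea2", "#185a9d"),
]

def pick_gradient(title: str) -> tuple[str, str]:
    """Map a page title to a palette deterministically via a stable hash.

    Unlike Math.random(), the same title always yields the same gradient,
    so the generated Open Graph image is identical across builds and
    becomes cacheable.
    """
    digest = hashlib.sha256(title.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(PALETTES)
    return PALETTES[index]

# Same input, same output on every build:
assert pick_gradient("My post title") == pick_gradient("My post title")
```

The same approach translates directly to TypeScript with any stable string hash.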

Anyway, that is a separate topic for a completely different article. In this one, we focus on the file transfer step, which can be optimized with minimal effort - simply by replacing a single command.

Completed code

The relevant files:

git clone git@github.com:nemanjam/nemanjam.github.io.git

# Bash
scripts/deploy-nginx.sh

# Github Actions
.github/workflows/default__deploy-nginx-scp.yml
.github/workflows/default__deploy-nginx-rsync.yml

Conclusion

You might think, “This is a pretty long and verbose article for something that could be explained in two sentences,” and you would probably be right. However, besides the main rsync vs scp point, I wanted to provide a drop-in script and workflow that you can reuse with minimal changes: just adjust the environment variables, build command, and deployment paths.

Additionally, real-world measurements help provide a realistic sense of how significant the performance improvements can be.

What methods do you use to optimize the deployment process in your projects? Let me know in the comments.

References

]]>
nemanja.mitic.elfak@hotmail.com (Nemanja Mitic)
<![CDATA[Automating the deployment of a static website to Vercel with Github Actions]]> https://nemanjamitic.com/blog/2026-02-26-vercel-static-github-actions/ https://nemanjamitic.com/blog/2026-02-26-vercel-static-github-actions/ Thu, 26 Feb 2026 00:00:00 GMT

Introduction

This article focuses specifically on deploying static websites to Vercel. In a previous article https://nemanjamitic.com/blog/2026-02-22-vercel-deploy-fastapi-nextjs, we covered in detail how to deploy a full-stack application using the Vercel CLI from a local development environment. This time, we will use the same CLI inside a Github Actions runner to automate redeploying a static website on every push, for example, after adding a new blog article in markdown.

As an example, we will deploy the same blog website you are currently reading. The site itself is a statically built Astro application.

Vercel Github integration vs Github Actions

Vercel supports deployments through a Github integration (documented here: https://vercel.com/docs/git/vercel-for-github). You provide Vercel with your Github repository URL and read access, and Vercel automatically redeploys your application on every push. If you prefer not to grant Vercel access to your source code or Github repository, or if you want more control over the deployment process, you can instead use Github Actions, the approach described in this article.

Vercel configuration files

As with any Vercel deployment, you need to provide Vercel with additional information about the project’s build process, such as the framework, build command, and output directory, as well as which files should be included or ignored during deployment.

Before adding any configuration files, go to your Vercel dashboard, create a new project, give it a name, and set all required environment variables.

vercel.json

The contents of the vercel.json file are mostly self-explanatory. We specify the astro framework, and the build command and output directory match those used in the local development environment. With this configuration, Vercel knows exactly how to build the application.

https://github.com/nemanjam/nemanjam.github.io/blob/main/vercel.json

{
  "framework": "astro",
  "buildCommand": "pnpm build",
  "outputDirectory": "dist",
  "cleanUrls": true,
  "trailingSlash": false
}

.vercelignore

For performance reasons, it is important to avoid uploading files that are not used during the build and deployment process, such as dependencies, .env* files, documentation, or Docker-related configuration. The .vercelignore file is used to exclude these unnecessary files. Additionally, on the free tier, your deployment must stay below the 250 MB size limit.

https://github.com/nemanjam/nemanjam.github.io/blob/main/.vercelignore

# Node / package managers
node_modules
.pnpm-store
.npm
.yarn

# ! Needed for commit info
# Vercel omits it by default, no way to upload it
# .git
# .gitignore

# Local env files
.env
.env.*
!.env.*example

# Logs
npm-debug.log*
yarn-debug.log*
yarn-error.log*

# Docker & tooling
docker/
scripts/

# Documentation & notes
docs/

# OS / editor junk
.DS_Store
.idea
.vscode

# Astro build cache
.astro/*
# Keep types if build needs them
!.astro/types.d.ts

# Github
.github/

The exact contents of this file depend on your specific project. To ensure you have excluded all unnecessary paths, go to your Vercel dashboard and navigate to My Project -> My Deployment -> Source, where you can clearly see exactly which files are uploaded.

Github Actions workflow

Once again, go to your Vercel dashboard and create an access token in your account settings. Add this token as the VERCEL_TOKEN Github repository secret. Then, in your Vercel project settings, copy your user (organization) ID and project ID and add them as the VERCEL_ORG_ID and VERCEL_PROJECT_ID Github repository secrets.

With this setup, Github is aware of your Vercel project, and NOT the other way around. Vercel only receives the compiled application artifacts and has no access to your Github repository or source code.

https://github.com/nemanjam/nemanjam.github.io/blob/main/.github/workflows/vercel__deploy-manual.yml

name: Deploy to Vercel manually

# Docs example: https://vercel.com/kb/guide/how-can-i-use-github-actions-with-vercel

on:
  push:
    branches:
      - 'main'
    tags:
      - 'v[0-9]+.[0-9]+.[0-9]+'

  pull_request:
    branches:
      - 'disabled-main'

  workflow_dispatch:

permissions:
  contents: read

env:
  # Project vars
  # Redundant, vercel pull will define them
  # SITE_URL: 'https://nemanjam.vercel.app'
  # PLAUSIBLE_DOMAIN: 'nemanjamitic.com'
  # PLAUSIBLE_SCRIPT_URL: 'https://plausible.arm1.nemanjamitic.com/js/script.js'

  # Vercel vars
  VERCEL_ORG_ID: ${{ secrets.VERCEL_ORG_ID }} # user id
  VERCEL_PROJECT_ID: ${{ secrets.VERCEL_PROJECT_ID }}

jobs:
  deploy-vercel:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          fetch-depth: 1

      - name: Print commit id and message
        run: |
          git show -s --format='%h %s'
          echo "github.ref -> ${{ github.ref }}"

      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: 24.13.0
          registry-url: 'https://registry.npmjs.org'

      - name: Install pnpm
        uses: pnpm/action-setup@v4
        with:
          version: 10.30.1

      - name: Install Vercel CLI
        run: pnpm add -g vercel

      - name: Pull Vercel production environment variables
        run: vercel pull --yes --environment=production --token=${{ secrets.VERCEL_TOKEN }}

      - name: Build project using Vercel
        run: vercel build --prod --token=${{ secrets.VERCEL_TOKEN }}

      - name: Deploy prebuilt project to Vercel
        run: vercel deploy --prebuilt --prod --token=${{ secrets.VERCEL_TOKEN }}

Repository secrets

Vercel’s official tutorial already does a good job of explaining the basics and provides a solid starting workflow file: https://vercel.com/kb/guide/how-can-i-use-github-actions-with-vercel. In this article, we will focus on a specific use case: deploying a static website.

Let’s start by explaining the Github repository secrets used in this workflow:

  • VERCEL_TOKEN - an access token that the Github Actions runner uses to authenticate with Vercel and create a deployment
  • VERCEL_ORG_ID - your Vercel user (personal account) or team ID that identifies who owns the deployment
  • VERCEL_PROJECT_ID - identifies the Vercel project being deployed

The VERCEL_ORG_ID and VERCEL_PROJECT_ID values are passed as environment variables and are defined at the workflow level, making them available to all jobs. The VERCEL_TOKEN is passed to individual commands as a command-line argument.

Set up Node.js and Vercel CLI

The first part of the workflow is standard and straightforward. We simply check out the repository (fetch-depth: 1 to fetch only the latest commit for speed), then install Node.js, pnpm, and the Vercel CLI. These steps set up the prerequisites needed to build and deploy the project in the following steps.

Environment variables

Here we are referring to your project’s environment variables. Since we are deploying a fully static website, all environment variables are strictly build-time variables, as explained here: https://nemanjamitic.com/blog/2025-12-21-static-website-runtime-environment-variables. The Vercel target environment does not need to define any variables because they are inlined during the build, immutable, and ignored afterward. This also means the build artifacts are specific to the environment they were built for.

Although variables in the target environment are ignored at runtime, it is still a good practice to define them in the Vercel dashboard and use Vercel as the single source of truth for your deployment. This allows you to easily pull them into the Github Actions runner using: vercel pull --yes --environment=production --token=${{ secrets.VERCEL_TOKEN }}

The --environment=production flag selects the production environment. To deploy to preview environments, you can create a separate workflow .yml file triggered by feature branches (any branch other than main) and use vercel pull with the --environment=preview option to fetch the corresponding variables.

on:
  push:
    branches-ignore:
      - main

Note: You can define your project’s environment variables using the env: key at the workflow or job level, but this is generally not recommended. Doing so will lead to conflicts with variables pulled via vercel pull and issues with overriding priority, unless you are confident in managing them. Relying exclusively on variables from vercel pull ensures clarity and simplicity.

Building and deploying

At this point, we are ready to build the project using: vercel build --prod --token=${{ secrets.VERCEL_TOKEN }}. This command generates the application artifacts in the output folder specified in vercel.json, all within the Github Actions runner. After that, the Vercel CLI copies the framework’s output folder into the .vercel/output folder, creating a deployment-ready package that can be uploaded directly to Vercel.

The final step is to upload the deployment-ready package inside the .vercel/output folder to Vercel using: vercel deploy --prebuilt --prod --token=${{ secrets.VERCEL_TOKEN }}. The --prebuilt option tells Vercel to skip the build step since the application has already been built in the Github Actions runner.

That’s it. Add the shown vercel.json, .vercelignore, and .github/workflows/vercel__deploy-manual.yml files to your repository, then run git push to trigger the workflow. Once it completes, you can view your website at <your-project-name>.vercel.app.

Completed code

The relevant files:

git clone git@github.com:nemanjam/nemanjam.github.io.git

# Files
.github/workflows/vercel__deploy-manual.yml
vercel.json
.vercelignore

.dockerignore
.gitignore

# Vercel configuration and workflow in a clear diff
https://github.com/nemanjam/nemanjam.github.io/commit/c0d6c6739b3215a6841a463115ec5242ea76e492

Conclusion

CI/CD workflows are the standard way to handle deployments, and deploying to Vercel is no exception. By combining Github Actions with the Vercel CLI, you can implement a fully automated deployment pipeline with just a few lines of configuration.

This approach gives you complete control over the build and deployment process while keeping your source code private and your security model explicit. Once in place, deployments become predictable, repeatable, and hands-off.

How do you automate deployments to Vercel in your projects? Let me know in the comments.

References

]]>
nemanja.mitic.elfak@hotmail.com (Nemanja Mitic)
<![CDATA[Deploying a FastAPI and Next.js website to Vercel]]> https://nemanjamitic.com/blog/2026-02-22-vercel-deploy-fastapi-nextjs/ https://nemanjamitic.com/blog/2026-02-22-vercel-deploy-fastapi-nextjs/ Sun, 22 Feb 2026 00:00:00 GMT

Introduction

This article is a practical guide that presents a single, proven approach for deploying a FastAPI and Next.js application on Vercel. It is not intended to be a comprehensive, in-depth overview of all Vercel features or deployment options. The main idea is to show how to host demo apps on Vercel for free.

Python is one of the runtimes with first-class support on Vercel, as stated in the documentation: https://vercel.com/docs/functions/runtimes. Additionally, FastAPI is an officially supported backend framework and is documented here: https://vercel.com/docs/frameworks/backend/fastapi.

Naturally, Next.js is developed by Vercel and has full, native support on the platform. Given all of this, deploying a full-stack FastAPI and Next.js application on Vercel is entirely viable, and we can confidently proceed with the configuration and deployment.

Project structure prerequisites

[Image: Deployment diagram]

There are certain constraints on how a FastAPI and Next.js project must be structured and configured in order to be deployable on Vercel. When deploying with Docker, we typically use separate containers for the backend and frontend, since a container is designed to run a single process. Similarly, on Vercel we will use two separate deployments, one for the backend and one for the frontend, and connect them using the SITE_URL and API_URL environment variables.

In general, a full-stack application should be designed without making assumptions about the values of SITE_URL and API_URL, allowing the backend and frontend to be hosted independently on any arbitrary domains.

This is especially important for authentication and cookies, since the frontend and backend will use different *.vercel.app domains. Because these domains are included in the public suffix list (https://publicsuffix.org/list), cookie behavior cannot be fully controlled across them.

One effective solution is to enforce a clean separation of concerns: use FastAPI for authentication logic, and rely on Next.js API routes or server actions for setting and unsetting cookies. I covered this approach in detail in a previous article: https://nemanjamitic.com/blog/2026-02-07-github-login-fastapi-nextjs#architecture-overview.

Note: If your project originally hosts the backend and frontend on the same domain and relies on a reverse proxy such as Nginx or Traefik for routing, you will need to reconfigure this setup, as Vercel does not provide that level of routing control.

Configuration

Vercel offers two basic ways to deploy a project: 1. using the CLI to deploy from a local machine, and 2. deploying from a repository URL, such as Github or Gitlab. There are also several variations of these approaches, including the “Vercel button” which launches a setup wizard from a repository URL, and Github Actions, where the CLI is used inside a workflow runner.

In this tutorial, we will use the CLI approach and deploy both the backend and frontend directly from our local development machine.

We will begin by installing the Vercel CLI globally on our local machine and then logging in with our Vercel account.

# Install Vercel CLI
pnpm install -g vercel

# Log in to Vercel
vercel login

Configuring the FastAPI backend

vercel.json

First, we need to expose a FastAPI entry point in a way that can be consumed by a Vercel serverless function.

https://github.com/nemanjam/full-stack-fastapi-template-nextjs/blob/vercel-deploy/backend/app/api/index.py

# Required by vercel.json to locate the FastAPI `app` instance.
# `# noqa: F401` prevents linters and autofix tools from flagging and removing this (apparently) unused import.

from app.main import app  # noqa: F401

Then we can use it in vercel.json to define the build path for the serverless function and the root request handler for HTTP requests.

https://github.com/nemanjam/full-stack-fastapi-template-nextjs/blob/vercel-deploy/backend/vercel.json

{
  "builds": [
    {
      "src": "app/api/index.py",
      "use": "@vercel/python"
    }
  ],
  "routes": [
    {
      "src": "/(.*)",
      "dest": "app/api/index.py"
    }
  ]
}

.vercelignore

Next, we need to define the backend’s root project directory and specify which files should be included or excluded from the serverless function. This step is important to avoid unnecessary bloat, as it affects both performance and the 250 MB size limit for Vercel functions on the free plan. By default, the root project directory is the folder where you run the vercel CLI command, though you can override it during the deployment prompt.

The .vercelignore file is used to exclude files from being uploaded during deployment. It should be placed inside the root project directory. If you already have a properly configured .dockerignore, these two files are basically the same. Below is my .vercelignore configuration:

https://github.com/nemanjam/full-stack-fastapi-template-nextjs/blob/vercel-deploy/backend/.vercelignore

# Python
.venv/
__pycache__/
.ruff_cache/
.mypy_cache/
.pytest_cache/
app.egg-info
*.pyc

# Tests
.coverage
htmlcov

Dockerfile*

# ignore database
!data/database
data/database/*
!data/database/.gitkeep

You can also verify and debug any redundant files in the Vercel dashboard under My Project -> My Deployment -> Source. I highly recommend performing this check to ensure that no unnecessary bloat is included in the serverless function.

Additionally, add the backend/.vercel/ directory to your .gitignore file. This directory contains local Vercel configuration and should not be committed to Git.

# Local Vercel configuration
.vercel/

Environment variables

Your application already relies on certain environment variables that are configured in a specific way. However, their names and formats may not fully align with the predefined variables available in the Vercel environment, so some adaptation is usually required. In my case, this involved the following changes:

  1. I had an ENVIRONMENT variable that needed to be derived from the predefined VERCEL_ENV variable and then reassigned using a Pydantic model validator.
  2. For the database configuration, I originally used five separate variables: POSTGRES_SERVER, POSTGRES_PORT, POSTGRES_USER, POSTGRES_PASSWORD, and POSTGRES_DB. These were replaced with a single DATABASE_URL variable, which is exposed by Neon by default. This value can then be piped into a SQLALCHEMY_DATABASE_URI computed property.

https://github.com/nemanjam/full-stack-fastapi-template-nextjs/blob/vercel-deploy/backend/app/core/config.py


# Vercel default vars
VERCEL_ENV: str | None = None

# ...

# for Vercel and Neon
DATABASE_URL: PostgresDsn | None = None

# Local / Docker fallback
POSTGRES_SERVER: str | None = None
POSTGRES_PORT: int = 5432
POSTGRES_USER: str | None = None
POSTGRES_PASSWORD: str | None = None
POSTGRES_DB: str | None = None

# ...

@computed_field  # type: ignore[prop-decorator]
@property
def SQLALCHEMY_DATABASE_URI(self) -> PostgresDsn:
    # Vercel + Neon
    if self.DATABASE_URL:
        database_url = str(self.DATABASE_URL)
        # Force SQLAlchemy to use psycopg v3 on Vercel (Neon provides postgresql:// by default)
        if database_url.startswith("postgresql://"):
            database_url = database_url.replace(
                "postgresql://", "postgresql+psycopg://"
            )
        return database_url

    # Local / Docker
    if not all(
        [
            self.POSTGRES_SERVER,
            self.POSTGRES_USER,
            self.POSTGRES_PASSWORD,
            self.POSTGRES_DB,
        ]
    ):
        raise ValueError(
            "Either DATABASE_URL (Vercel/Neon) or POSTGRES_* variables must be set"
        )

    return MultiHostUrl.build(
        scheme="postgresql+psycopg",
        username=self.POSTGRES_USER,
        password=self.POSTGRES_PASSWORD,
        host=self.POSTGRES_SERVER,
        port=self.POSTGRES_PORT,
        path=self.POSTGRES_DB,
    )

# ...

@model_validator(mode="after")
def resolve_environment(self) -> Self:
    if self.VERCEL_ENV == "production":
        self.ENVIRONMENT = "production"
    elif self.VERCEL_ENV == "preview":
        self.ENVIRONMENT = "staging"
    # else: keep whatever ENVIRONMENT was set from OS/.env
    return self

As already noted, SITE_URL is a particularly important variable, as it is the primary way to inform the backend about the domain on which the frontend is hosted. It is used throughout the backend for CORS whitelisting, redirects, constructing absolute URLs, etc.

You may not have this value available during the initial backend deployment. In that case, you can temporarily set a placeholder value to satisfy Pydantic validation. Once the actual frontend URL is known, update the variable via the Vercel dashboard or CLI and redeploy the application for the change to take effect.

Below is the full list of required and optional environment variables used by this backend application:

https://github.com/nemanjam/full-stack-fastapi-template-nextjs/blob/vercel-deploy/.env.vercel.example

# ------------ Required vars --------------

# Frontend url
# Used by the backend to generate links in emails to the frontend
SITE_URL=my-frontend-url.vercel.app

# Auth
JWT_SECRET_KEY=my-secret
SESSION_SECRET_KEY=my-secret

# Superuser email and password
FIRST_SUPERUSER=admin@example.com
FIRST_SUPERUSER_PASSWORD=password

# Postgres database, e.g. Neon
# Format: postgresql://<username>:<password>@<host>/<database>?<query>
# Neon example: postgresql://neondb_owner:npg_someHash@ep-solitary-moon-some-hash-pooler.c-3.us-east-1.aws.neon.tech/neondb?sslmode=require
DATABASE_URL=

# ------------ Optional vars --------------

# Used in email templates and OpenAPI docs
PROJECT_NAME="Full stack FastAPI template Next.js"

# Whitelisted frontend urls
# SITE_URL is included by default
BACKEND_CORS_ORIGINS="http://localhost,https://localhost,http://localhost:3000,https://localhost:3000,http://localhost:3001,https://localhost:3001,https://my-frontend-url.vercel.app"

# Environment: local, staging, production
# If omitted defaults to VERCEL_ENV
ENVIRONMENT=production

# If omitted defaults to 7 days = 24 * 7 = 168 hours
ACCESS_TOKEN_EXPIRE_HOURS=168

# Github OAuth id and secret
# Only a single deployment (callback url) per Github app is possible
GITHUB_CLIENT_ID=
GITHUB_CLIENT_SECRET=

# Postgres database, e.g. Neon
# Supply either all POSTGRES_* variables or a single DATABASE_URL
# If both are defined DATABASE_URL has precedence
POSTGRES_SERVER=localhost
POSTGRES_PORT=5432
POSTGRES_DB=app
POSTGRES_USER=postgres
POSTGRES_PASSWORD=password

Database migrations and seed

At this point, the only remaining step is to obtain a valid DATABASE_URL that points to a migrated, seeded, and running database instance. You can use any cloud or self-hosted PostgreSQL instance, as long as it is accessible over the internet. In this case, we will use Neon, since it is an officially supported PostgreSQL integration on Vercel.

Create an account at https://neon.com, then create a new project and a new database. Next, connect the database to your Vercel project using the Neon integration and use the default production database branch. When the integration is added, Neon will automatically expose certain environment variables to your Vercel deployment, which you can customize in the Neon integration settings. The DATABASE_URL variable is exposed by default, and since we configured the backend to use it in the previous step, no further action is required at this stage.

At this point, the backend can connect to the database, but the database is still empty. We now need to run migrations to create the tables and seed the initial data.

Note: You might be tempted to automate database migrations and seeding (especially for demo apps) by running them on application startup. However, since Vercel is a serverless environment, there is no single, well-defined application start event (unlike a VPS or Docker-based setup). Instead, multiple instances can start independently, which would cause migrations and seeds to run multiple times in an unpredictable manner. This can lead to serious issues. For this reason, database migrations and initial seeding should be performed as a single, manual step.

We will use our local development environment, where the application is already fully installed and configured, to run the migrations and seed the data. In the local .env file, simply replace the local development database connection with the remote production database connection that is already used by the Vercel deployment.

Also, make sure to check out the vercel-deploy branch locally, as only that branch is configured to handle the DATABASE_URL variable correctly.

# Checkout vercel-deploy branch that has Vercel configuration (backend/vercel.json, backend/.vercelignore, backend/app/api/index.py, modified backend/app/core/config.py)
git checkout vercel-deploy

# Comment out local Postgres database
# POSTGRES_SERVER=localhost
# POSTGRES_PORT=5433
# POSTGRES_DB=app
# POSTGRES_USER=postgres
# POSTGRES_PASSWORD=password

# Neon database url example used on Vercel:
DATABASE_URL=postgresql://neondb_owner:npg_some-slug@ep-rough-cherry-some-slug-pooler.c-3.us-east-1.aws.neon.tech/neondb?sslmode=require

Now we can run the migrations and seed the Neon remote database. Make sure that all backend dependencies are installed and that your Python virtual environment is activated. Be patient and allow the command to complete, as this process can take a few minutes.

# From /backend
cd ./backend

# Create virtual environment
uv venv

# Activate the environment
source .venv/bin/activate

# Install dependencies
uv sync

# Migration and seed scripts need activated venv and Python dependencies

# Await db, run migrations and seed (must have .env)
# With a remote Neon database, this command can take a few minutes to complete
# Be patient and do not interrupt it
bash scripts/prestart.sh

Open the Neon dashboard and verify that the user and item tables have been created and populated with the seed data. At this point, the backend is fully configured and ready to be deployed using the Vercel CLI.

Deploying backend from terminal

First, install the vercel CLI and log in with your Vercel account. Then navigate to the backend directory and start the deployment wizard by running the vercel --prod command. The wizard will prompt you to:

  • Link an existing Vercel project or create and name a new one (this determines the app’s public URL)
  • Select the project root directory - choose the current directory (./)
  • Set environment variables - this step can be skipped for now
  • Modify the build configuration - select “No”

# Install Vercel CLI
pnpm install -g vercel

# Log in to Vercel
vercel login

# Navigate to the backend folder
cd backend

# Deploy for the first time (production)
# Fill prompts for name, root directory `./` (vercel.json dir)
vercel --prod

# Add required environment variables (production) (after the wizard completes)
echo "Full stack FastAPI template Next.js" | vercel env add PROJECT_NAME production
echo "https://my-frontend-url.vercel.app" | vercel env add SITE_URL production
echo "my-secret" | vercel env add JWT_SECRET_KEY production
echo "my-secret" | vercel env add SESSION_SECRET_KEY production
echo "admin@example.com" | vercel env add FIRST_SUPERUSER production
echo "password" | vercel env add FIRST_SUPERUSER_PASSWORD production
echo "postgresql://user:pass@host/db" | vercel env add DATABASE_URL production
# Set more optional variables...

# List existing environment variables
vercel env ls

# Redeploy after changes
vercel --prod  # production

# After deploy, the CLI outputs the URL
# Example: https://api-full-stack-fastapi-template-nextjs-my-slug.vercel.app

# Debug deployment
vercel inspect https://api-full-stack-fastapi-template-nextjs-my-slug.vercel.app --json

Treat the initial deployment as disposable. At this stage, the frontend is likely not deployed yet, which means the SITE_URL value is not available. You can temporarily set a placeholder URL that satisfies Pydantic validation. Once the frontend is deployed, update SITE_URL with the actual value using the CLI (as shown earlier) or via the Vercel dashboard (Project -> Settings -> Environment Variables).

The same approach applies to other environment variables, such as GITHUB_CLIENT_ID and GITHUB_CLIENT_SECRET. After updating any environment variable, you must redeploy the project for the changes to take effect.

# Update (remove and add) an existing env var SITE_URL
vercel env rm SITE_URL production --yes
echo "https://my-new-frontend-url.vercel.app" | vercel env add SITE_URL production

Once the deployment wizard completes successfully, you can verify the result by accessing the FastAPI backend at the URL printed in the terminal, for example: https://api-full-stack-fastapi-template-nextjs-my-slug.vercel.app/docs.

This will display the OpenAPI UI. The exact URL depends on how you named the project. You can view all associated URLs in the Vercel dashboard under the deployment settings.

<Image {...IMAGE_SIZES.FIXED.MDX_MD} src={BackendScreenshotImage} alt="OpenAPI screenshot" />

Configuring the Next.js frontend

Deploying the Next.js application is much simpler, as it requires fewer environment variables and no database. However, there are still a few details to keep in mind, which we will cover here.

vercel.json

We use a vercel.json file to specify the framework, install and build commands, and the output directory. These settings mirror the local development environment and are pretty self-explanatory.

https://github.com/nemanjam/full-stack-fastapi-template-nextjs/blob/vercel-deploy/frontend/apps/web/vercel.json

{
  "framework": "nextjs",
  "installCommand": "pnpm install",
  "buildCommand": "pnpm turbo run build --filter=web",
  "outputDirectory": ".next"
}

Note that our Next.js application uses a monorepo setup with Turbo, which has a few implications:

  • The vercel.json file is NOT placed at the monorepo root (frontend/). Instead, it lives in the Next.js application directory at frontend/apps/web/vercel.json. Despite this, the Vercel project root directory IS the monorepo root (frontend/), since all packages and source files must be uploaded in order to build the application successfully. This is the correct approach for deploying monorepo projects to Vercel.
  • The .vercelignore file should also be placed in the monorepo root directory (frontend/).

.vercelignore

As with the backend, we need to ignore all unused local files during deployment to prevent unnecessary uploads, performance degradation, and excess bloat, and to stay below the 250 MB limit for a serverless function. Once again, it is a good idea to verify that nothing was missed by checking My Project -> My Deployment -> Source in the Vercel dashboard.

https://github.com/nemanjam/full-stack-fastapi-template-nextjs/blob/vercel-deploy/frontend/.vercelignore

# Node.js
**/node_modules/
**/.next/
**/.turbo/
**/dist/

# Ignore all env files, env vars are passed into container explicitly
**/.env*

# Tests
**/test-results/
**/playwright-report/
**/blob-report/
**/playwright/.cache/
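
If you want a rough local sanity check of the upload size against the 250 MB limit before deploying, a small sketch along these lines can help. This is a hypothetical helper: it approximates, rather than reproduces, Vercel's exact ignore semantics, using the patterns from the `.vercelignore` above.

```python
import fnmatch
import os

# Patterns mirroring the .vercelignore entries above (directory names and globs)
IGNORE_PATTERNS = ["node_modules", ".next", ".turbo", "dist", ".env*"]


def upload_size_bytes(root: str) -> int:
    """Approximate total size of files that would be uploaded from root."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune ignored directories in place so os.walk skips them entirely
        dirnames[:] = [
            d for d in dirnames
            if not any(fnmatch.fnmatch(d, p) for p in IGNORE_PATTERNS)
        ]
        for name in filenames:
            if any(fnmatch.fnmatch(name, p) for p in IGNORE_PATTERNS):
                continue
            total += os.path.getsize(os.path.join(dirpath, name))
    return total
```

Running `upload_size_bytes("frontend") / 1024 / 1024` gives an approximate size in MB to compare against the limit; the dashboard's Source view remains the authoritative check.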

Similarly, add the frontend/.vercel/ directory to your frontend .gitignore file. This directory contains local Vercel configuration and should not be committed to Git.

https://github.com/nemanjam/full-stack-fastapi-template-nextjs/blob/vercel-deploy/frontend/.gitignore

# Local Vercel configuration
.vercel/

Environment variables

The frontend uses far fewer environment variables, only two in fact: API_URL, which points to the backend URL, and SITE_URL, which represents the frontend’s own URL. The value of SITE_URL can be derived from the predefined VERCEL_PROJECT_PRODUCTION_URL variable exposed by Vercel.

https://github.com/nemanjam/full-stack-fastapi-template-nextjs/blob/vercel-deploy/frontend/apps/web/.env.vercel.example

# ------------ Required vars --------------

# Backend url
API_URL=https://my-backend-url.vercel.app

# ------------ Optional vars --------------

# Frontend url
# If omitted defaults to https:// + VERCEL_PROJECT_PRODUCTION_URL
SITE_URL=https://my-frontend-url.vercel.app

With this in mind, we need to include VERCEL_PROJECT_PRODUCTION_URL in the Next.js environment variable handling logic.

Set it as a fallback value in the next-public-env schema.

https://github.com/nemanjam/full-stack-fastapi-template-nextjs/blob/vercel-deploy/frontend/apps/web/src/config/process-env.ts

export const { getPublicEnv, PublicEnv } = createPublicEnv(
  {
    NODE_ENV: process.env.NODE_ENV,
    SITE_URL: process.env.SITE_URL || `https://${process.env.VERCEL_PROJECT_PRODUCTION_URL}`,
    API_URL: process.env.API_URL,
  },
  { schema: (z) => getProcessEnvSchemaProps(z) }
);

Include it in the process.env type.

https://github.com/nemanjam/full-stack-fastapi-template-nextjs/blob/vercel-deploy/frontend/apps/web/src/env.d.ts

declare namespace NodeJS {
  interface ProcessEnv {
    readonly NODE_ENV: 'development' | 'production' | 'test';
    readonly SITE_URL: string;
    readonly API_URL: string;
    readonly VERCEL_PROJECT_PRODUCTION_URL: string;
  }
}

Finally, declare the variable in Turborepo’s globalEnv.

https://github.com/nemanjam/full-stack-fastapi-template-nextjs/blob/vercel-deploy/frontend/turbo.json

{
  "$schema": "https://turbo.build/schema.json",
  "ui": "tui",
  "globalEnv": ["NODE_ENV", "SITE_URL", "API_URL", "NEXT_RUNTIME", "VERCEL_PROJECT_PRODUCTION_URL"],
  
  // ...

}

That’s it. Our Next.js frontend is ready for deployment.

Deploying frontend from terminal

Deploying the frontend is very similar to deploying the backend, with the main difference being the selection of the project root directory, since we are deploying a monorepo. Navigate to the frontend directory and start the deployment wizard by running the vercel --prod command. The wizard will prompt you to:

  • Link an existing Vercel project or create and name a new one (this determines the app’s public URL)
  • Select the project root directory - choose the Next.js app directory (./apps/web/)
  • Set environment variables - this step can be skipped for now
  • Modify the build configuration - select “No”

# Install Vercel CLI
pnpm install -g vercel

# Log in to Vercel
vercel login

# Navigate to the frontend folder
cd frontend

# Deploy for the first time (production)
# Fill prompts for name, root directory `./apps/web/` (vercel.json dir)
vercel --prod

# Set required environment variables (production)
echo "https://api-full-stack-fastapi-template-nextjs-my-slug.vercel.app" | vercel env add API_URL production
# Set more optional variables...

# List existing environment variables
vercel env ls

# Redeploy after changes
vercel --prod  # production

# After deploy, the CLI outputs the URL
# Example: https://full-stack-fastapi-template-nextjs-my-slug.vercel.app

# Debug deployment
vercel inspect https://full-stack-fastapi-template-nextjs-my-slug.vercel.app --json

Make sure to set the environment variables API_URL and SITE_URL (optional), either through the Vercel dashboard (Project -> Settings -> Environment Variables) or via the CLI. If SITE_URL is not set, it will fall back to the predefined VERCEL_PROJECT_PRODUCTION_URL variable, as explained earlier.

Remember to redeploy the project whenever you update environment variables to apply the changes.

# Update (remove and add) existing env vars API_URL and SITE_URL

vercel env rm API_URL production --yes
echo "https://api-full-stack-fastapi-template-nextjs-my-slug.vercel.app" | vercel env add API_URL production

vercel env rm SITE_URL production --yes
echo "https://full-stack-fastapi-template-nextjs-my-slug.vercel.app" | vercel env add ITE_UR production

If everything is configured correctly, your full-stack FastAPI and Next.js application should now be fully deployed and functional at the URL displayed in the terminal: https://full-stack-fastapi-template-nextjs-my-slug.vercel.app

The exact URL will depend on the name you chose for the project. Congratulations!

<Image {...IMAGE_SIZES.FIXED.MDX_MD} src={FrontendScreenshotImage} alt="Frontend screenshot" />

Vercel button

In addition to the CLI, Vercel also allows deploying a project by simply specifying the repository URL. Naturally, the repository must be properly prepared and configured beforehand. One variation of this method is the “Vercel Deploy” button, a URL-encoded href link that points to Vercel and includes query parameters for the repository URL, environment variables, integrations, and other necessary configuration details. When clicked, it launches a setup wizard to complete any remaining deployment configuration.

Vercel also provides a utility form to generate these buttons conveniently: https://vercel.com/docs/deploy-button. The “Vercel Deploy” button can be embedded in markdown or HTML pages, allowing visitors to deploy a live demo of your project quickly and effortlessly.

With this in mind, we can define a single “Vercel Deploy” button to deploy both the backend and frontend of our FastAPI and Next.js project. It will clone a single Github repository and create two separate Vercel projects. We can then include the button’s Markdown in the project’s README.md file.

The button below specifies URLs (repository, demo, images, etc.) for this particular example; you can adjust them to match your own URLs.

Single monorepo Vercel button

This is fairly self-explanatory, but let’s clarify the query parameters for completeness:

  • repository-url - points to the Github repository and the vercel-deploy branch.
  • repository-name - the name of the cloned repository.
  • root-directories - specifies the backend (backend) and frontend root directories (frontend/apps/web).
  • monorepo - indicates that a single Git repository contains multiple Vercel projects.
  • totalProjects - number of Vercel projects to create.

Additional parameters include:

  • project-names - the names of the backend and frontend Vercel projects.
  • demo-* - information and URLs displayed in the wizard for demo purposes.

The products parameter is particularly important, as it activates the Neon integration in the wizard to provision a new PostgreSQL database.

# URL for the "Vercel Deploy" button's href attribute

https://vercel.com/new/clone
?repository-url=https://github.com/nemanjam/full-stack-fastapi-template-nextjs/tree/vercel-deploy
&root-directories=frontend/apps/web,backend
&repository-name=full-stack-fastapi-template-with-next-js
&monorepo=1
&totalProjects=2
&project-names=full-stack-fastapi-frontend,full-stack-fastapi-backend
&demo-description=Build full-stack apps with Next.js and FastAPI.
&demo-image=https://github.com/nemanjam/full-stack-fastapi-template-nextjs/raw/main/docs/screenshots/frontend-screenshot-1200x630.png
&demo-title=Full stack FastAPI template with Next.js
&demo-url=https://full-stack-fastapi-template-nextjs.vercel.app
&skippable-integrations=1
&products=[
  {
    "type": "integration",
    "integrationSlug": "neon",
    "productSlug": "neon",
    "protocol": "storage"
  }
]

Once the URL is constructed, it should be URL-encoded and then embedded in the markdown (or HTML) as shown below.

<!-- Markdown for the button -->

[![Deploy backend to Vercel](https://vercel.com/button)](urlencoded-url-from-above)
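
As a sketch of the encoding step, here is how the query string from the example above could be assembled with Python's standard library. Only a subset of the parameters is included, with the values shown earlier; note that the products value is a JSON array that must be serialized before URL-encoding.

```python
import json
from urllib.parse import urlencode

# A subset of the "Vercel Deploy" button parameters shown above
products = [
    {
        "type": "integration",
        "integrationSlug": "neon",
        "productSlug": "neon",
        "protocol": "storage",
    }
]
params = {
    "repository-url": "https://github.com/nemanjam/full-stack-fastapi-template-nextjs/tree/vercel-deploy",
    "root-directories": "frontend/apps/web,backend",
    "monorepo": "1",
    "totalProjects": "2",
    # The products value is serialized to a JSON string before encoding
    "products": json.dumps(products),
}

# urlencode percent-encodes each value, e.g. "https://" becomes "https%3A%2F%2F"
button_url = f"https://vercel.com/new/clone?{urlencode(params)}"
```

The resulting string is what goes into the markdown link's href.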

The button and wizard will clone a single GitHub repository and create and deploy two separate backend and frontend Vercel projects. The deployments are not functional at this point. The wizard will not set any required environment variables for either the backend or frontend projects. Users will need to set the appropriate environment variables themselves in the Vercel dashboard.

The wizard will also create an integration that provisions an unassigned, blank Neon database. Users will then need to assign the integration to the backend project and run migrations and seed the database manually, as described in the previous section: Database migrations and seed.

After clicking the “Vercel Deploy” button, the user will be taken to a form wizard, as shown below.

<Image {…IMAGE_SIZES.FIXED.MDX_MD} src={VercelButtonWizardScreenshotImage} alt=“Vercel button wizard” />

Note: You can also use two separate “Vercel Deploy” buttons to clone the backend and frontend as two Github repositories (and deploy them as two Vercel projects). This approach allows you to include the list of environment variables and their default values (env, envDefaults, and envDescription parameters) in the button’s URL.

You can see examples of such buttons at the following links: vercel-button-backend.md, vercel-button-frontend.md

Completed code and demo

The relevant branch is vercel-deploy, with the following files:

git clone git@github.com:nemanjam/full-stack-fastapi-template-nextjs.git

# Checkout the vercel-deploy branch
git checkout vercel-deploy

# Backend
backend/vercel.json
backend/.vercelignore

backend/app/api/index.py
backend/app/core/config.py

.env.vercel.example
docs/notes/vercel-deployment-backend.md

# Frontend
frontend/apps/web/vercel.json
frontend/.vercelignore

frontend/apps/web/src/config/process-env.ts
frontend/apps/web/src/env.d.ts
frontend/turbo.json

frontend/apps/web/.env.vercel.example
docs/notes/vercel-deployment-frontend.md

# Compare branches in a clear diff
https://github.com/nemanjam/full-stack-fastapi-template-nextjs/compare/vercel-deploy?expand=1

# Compare specific commits in a clear diff
https://github.com/nemanjam/full-stack-fastapi-template-nextjs/compare/45c840d48cba2aeab07e0a66f8245110b852571e...e5fa4b4af3c19c8c2c584fe437b7298f9e342083

Conclusion

Vercel is quite a viable option for hosting full-stack FastAPI and Next.js demo projects for free. It works well when the project is designed around serverless constraints. Keeping the backend and frontend as separate deployments, using environment variables consistently, and handling database migrations manually results in a setup that is simple, predictable, and reliable on the free tier.

This guide focused on one practical approach that is easy to reproduce and debug. With the provided repository and Vercel Deploy buttons, you can use this setup as a solid baseline for demos, prototypes, or small production projects.

Have you done something similar yourself and used a different approach? Leave a comment below; I’m happy to hear your opinions.


]]>
nemanja.mitic.elfak@hotmail.com (Nemanja Mitic)
<![CDATA[Github login with FastAPI and Next.js]]> https://nemanjamitic.com/blog/2026-02-07-github-login-fastapi-nextjs/ https://nemanjamitic.com/blog/2026-02-07-github-login-fastapi-nextjs/ Sat, 07 Feb 2026 00:00:00 GMT

import { Image } from 'astro:assets';

import { IMAGE_SIZES } from '../../../../constants/image';
import GithubLoginArchitectureImage from '../../../../content/post/2026/02-07-github-login-fastapi-nextjs/_images/github-login-architecture-16-9.png';
import OAuth2FlowDiagramImage from '../../../../content/post/2026/02-07-github-login-fastapi-nextjs/_images/oauth2-diagram.png';

Introduction

In this article, we will show how to implement Github login in a FastAPI and Next.js application. We use Github in this particular case, but the same approach applies to any OAuth provider; you only need to adjust the FastAPI redirect and callback endpoints. Since this is a Next.js app using server components, we will store the session in an HttpOnly cookie. We will dig into implementation details such as domains, cookies, redirects, and overall structuring to achieve a clean, maintainable, and robust solution.

OAuth flow reminder

OAuth2 flow sequence diagram: (Source gist)

<Image {...IMAGE_SIZES.FIXED.MDX_MD} src={OAuth2FlowDiagramImage} alt="OAuth2 flow sequence diagram" />

Let’s begin with a quick reminder of how OAuth works in very simplified terms. OAuth is built around the principle of a trusted middleman: both we (our app) and the user know who Github is (the authorization server) and trust it. This means we can use Github to identify the user and obtain their information. For the user, this means Github vouches for our app’s identity and legitimacy, clearly showing what information the app will access and at what level, so the user can give informed consent. Of course, there are many more implementation details, but this is enough for a high-level overview.

For our app, this practically means we need to register it with Github, obtain the app’s client ID and client secret, and then, in the backend, use an OAuth client library to implement two endpoints:

  1. An endpoint that redirects the user to Github, where they can give consent.
  2. A callback endpoint where Github redirects the user back to us, passing an authorization code that the auth library can exchange for an access token, which is then used to call Github APIs and obtain additional information about the user.

Additionally, within the callback endpoint we store the user’s information (email, OAuth ID, name, avatar, etc.) in the database and use the autogenerated database user ID to generate a JWT access token, in the same way we do for a regular email/password authenticated user.

In this way, we achieve a unified interface for authenticating users, regardless of whether they log in with Github or via email/password.
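
The template's own security.create_access_token handles token creation. Purely as an illustration of what such a JWT-style token looks like on the wire, here is a minimal HS256 sketch using only the standard library. This is not the project's actual implementation, and in practice you should use a proper JWT library; the subject and secret values below are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time


def b64url(data: bytes) -> str:
    # JWT uses base64url encoding without padding
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def create_access_token(subject: str, secret: str, expires_in: int = 3600) -> str:
    # A JWT is header.payload.signature, each part base64url-encoded
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(
        json.dumps({"sub": subject, "exp": int(time.time()) + expires_in}).encode()
    )
    signing_input = f"{header}.{payload}"
    signature = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(signature)}"


# The database user ID becomes the "sub" claim, same as for email/password users
token = create_access_token("42", "my-secret")
```

Because both login flows end with the same kind of token, everything downstream (cookie handling, session validation) is identical regardless of how the user authenticated.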

Architecture overview

Architecture diagram:

<Image {...IMAGE_SIZES.FIXED.MDX_MD} src={GithubLoginArchitectureImage} alt="Github login architecture diagram" />

The one obvious and constant assumption is that we will use a FastAPI backend and a Next.js frontend. When it comes to authorization, this leaves additional room for deciding how we structure the logic and separate concerns. There is more than one way to do this, and some approaches can be fragile and hard to maintain, which is exactly what we want to avoid.

Let’s go straight to the point and explain the optimal approach that we will use, and then briefly touch on some suboptimal alternatives and the kinds of problems they introduce.

  • Since we use Next.js and server components, we will store the session in an HttpOnly cookie so we can have private, server-side rendered pages. This means localStorage is not used for storing authentication state.

  • The frontend and backend are separate applications, which means they run as separate Node.js and Python processes and are deployed as separate containers (reminder: Docker containers are meant to run a single process per container). This also applies to the domains on which the frontend and backend run. Ideally, we want complete freedom to use any unrelated domains for both, without making assumptions about subdomains, prefixes, or shared domain structure.

  • Cookies are tightly coupled to domains: a browser will accept and store a cookie only if it is set by a server running on the correct domain. The fact that Next.js provides API functionality is very fortunate, because those endpoints run on the same domain as the rest of the Next.js app (pages) and can set cookies for that exact domain. This removes the need to deal with cross-domain cookies, which are often complex and fragile.

  • Cookies are tied to a domain and are impractical for passing arguments via HTTP responses. Cookies are meant for storing data, not for transmitting it. Consequently, for passing the access_token and expires values, we will use the response body (for server actions) and URL query parameters (for the OAuth redirect). Response bodies and URL query parameters are domain-independent and are designed for passing data between HTTP requests.

  • Separation of concerns on the backend: FastAPI will contain all backend logic, including authorization. This means it will implement both OAuth endpoints (the redirect and callback endpoints). Next.js will handle cookie setting and unsetting logic: via server actions for email/password login, and via Next.js API routes for Github login. As a reminder, a server action is essentially a POST endpoint under the hood and can set or unset cookies.

  • The OAuth callback endpoint in FastAPI needs to initiate an uninterrupted redirect chain composed of two steps (FastAPI and Next.js API): Github -> FastAPI callback redirect -> Next.js API redirect -> Next.js home page. During this process, the access_token and expires values need to be passed as query parameters appended to the URL. Redirects are mandatory because the entire flow is driven by the browser, and we do not want the user’s browser to just land on a raw API response, but rather on the website’s home page as a successfully logged-in user.

Suboptimal approaches and their problems

There is some ambiguity caused by the redundancy of options, which can lead to suboptimal solutions if we do not think clearly enough. Let’s discuss some of them:

  • Since we have two backend-capable frameworks, we might be tempted to move a significant part of the authorization logic into Next.js APIs, for example, implementing the OAuth redirect endpoint, the callback endpoint, or even the email/password authentication endpoints there. This would introduce several serious problems, including unnecessary coupling of two backends to the same database and schema, backend deployment being split across two containers that must stay in strict sync, fragmented configuration and secrets management, violation of the “single source of truth” principle, potential read/write race conditions, and increased debugging and logging complexity.

    To prevent all of this, we enforce a clear separation of concerns: FastAPI acts as a complete, standalone backend, while Next.js APIs (and server actions) are responsible only for setting and unsetting cookies. Since Next.js runs on the frontend domain, this approach drastically simplifies and hardens cookie handling.

  • Another pitfall is relying on cross-domain cookies by making assumptions about the domains used by the frontend and backend. For example, we might assume SITE_URL=https://my-website.com for the frontend and API_URL=https://api.my-website.com for the backend. In that case, the backend could tweak cookie properties such as SameSite=None and Domain=.my-website.com to get the browser to accept and store the cookie.

    This introduces additional complexity and fragility into the authentication flow and deployment reliability, along with a number of problems and limitations. Some of them include a major mismatch between email/password login (where the cookie is set directly via a server action) and OAuth login, the inability to host the frontend and backend on completely different, unrelated domains (which is a legitimate requirement), and the inability to host the backend on a PaaS that uses domains included in the public suffix list (https://publicsuffix.org/list/), such as vercel.app.

    Once again, this is solved by letting the Next.js API (and server actions) handle setting and unsetting cookies.

Implementation

That was a lot of text but still no code. On the other hand, once we have a clear mental model and a worked-out plan, the implementation is straightforward.

Create OAuth app on Github

As with any OAuth provider, we need to register our app on Github and obtain a client ID and client secret. One Github-specific limitation is that you can set only one redirect URL per app, so if you want multiple deployments, you will need to create a separate app for each of them.

The process is straightforward. Go to your Github profile and open the following menus: Github (top-right avatar) -> Settings -> Developer settings (bottom of the left sidebar) -> OAuth Apps -> New OAuth App. Fill in your app info, including the redirect URL, which should be set to the URL of your FastAPI callback endpoint, e.g. https://api.my-website.com/api/v1/auth/github/callback.

Then copy the Client ID and Client secret and set them inside the backend .env file.


# ...

GITHUB_CLIENT_ID=Ov23liasdxhfaOJasdf12
GITHUB_CLIENT_SECRET=c9ad7bc12977515fed61409492abe169212345

# ...

Instantiate OAuth client

We need to install an OAuth client library; we will use authlib/authlib.

# Activate venv
source .venv/bin/activate

# Install authlib
poetry add authlib

Then we can instantiate OAuth client:

GITHUB_OAUTH_CONFIG = {
    "name": "github",
    "client_id": settings.GITHUB_CLIENT_ID,
    "client_secret": settings.GITHUB_CLIENT_SECRET,
    "access_token_url": "https://github.com/login/oauth/access_token",
    "authorize_url": "https://github.com/login/oauth/authorize",
    "api_base_url": "https://api.github.com/",
    "client_kwargs": {"scope": "user:email"},
}


def create_oauth() -> OAuth:
    oauth = OAuth()
    oauth.register(**GITHUB_OAUTH_CONFIG)
    return oauth


oauth = create_oauth()

Define OAuth endpoints in FastAPI

We can then use the instantiated OAuth client to implement the OAuth redirect and callback endpoints.

The redirect endpoint is quite simple, almost trivial. When the user hits this endpoint, they are redirected to the Github login page, where they can give consent. The redirect_uri variable contains the absolute URL of our callback endpoint, which we define next.

@router.get("/login/github")
async def login_github(request: Request):
    """
    Redirect to Github login page
    Must initiate OAuth flow from backend
    """
    redirect_uri = request.url_for("auth_github_callback")  # matches function name

    # rewrite to https in production
    if is_prod:
        redirect_uri = redirect_uri.replace(scheme="https")

    return await security.oauth.github.authorize_redirect(request, redirect_uri)

Now we can define the callback endpoint, which is where Github sends the user after they have logged in on Github. This part is a bit more complex.

Github includes an authorization code as a URL parameter, which we use to obtain an OAuth access token. We then use this token to call two separate Github APIs: one to retrieve the user’s profile information (full name, username, and OAuth ID), and another to retrieve the user’s primary email address. Next, we find or create the user in our database. Finally, we use the user’s database ID to create a JWT token, in exactly the same way as we do for a regular email/password user.

Next, we calculate the expires value for the session cookie so that it matches the JWT access_token expiration. We then attach the access_token and expires values as query parameters to the redirect URL. The redirect URL is constructed as f"{settings.SITE_URL}/api/auth/set-cookie", pointing to a Next.js API endpoint (which we define next) that is responsible for actually setting the cookie. Finally, we redirect the user.

Once again, it is important to emphasize that the redirect is essential so the browser can follow the entire chain. We do not want the user to land on a raw API response, the home page is the final destination after a successful login.

@router.get("/auth/github/callback")
async def auth_github_callback(
    request: Request, session: SessionDep
) -> RedirectResponse:
    """
    Github OAuth callback, Github will call this endpoint
    """
    # Exchange code for access token
    token = await security.oauth.github.authorize_access_token(request)

    # Get user profile Github API
    user_info = await security.oauth.github.get("user", token=token)
    profile = user_info.json()

    # Get primary email Github API
    emails = await security.oauth.github.get("user/emails", token=token)
    primary_email = next((e["email"] for e in emails.json() if e["primary"]), None)

    logger.info(f"Primary Github email: {primary_email}")

    # Authenticate or create user
    user = crud.authenticate_github(
        session=session,
        primary_email=primary_email,
        profile=profile,
    )

    expires_delta = timedelta(hours=settings.ACCESS_TOKEN_EXPIRE_HOURS)

    access_token = security.create_access_token(user.id, expires_delta)

    # Absolute expiration timestamp (UTC)
    expires_at = datetime.now(timezone.utc) + expires_delta
    expires_timestamp = int(expires_at.timestamp())

    # Build redirect URL to Next.js cookie-setter
    base_url = f"{settings.SITE_URL}/api/auth/set-cookie"
    query = urlencode(
        {
            "access_token": access_token,
            "expires": expires_timestamp,
        }
    )
    redirect_url = f"{base_url}?{query}"

    response = RedirectResponse(url=redirect_url, status_code=302)

    return response

Note the fixed Token type used for passing cookie properties. It is important that this type is identical and shared between both the OAuth and email/password flows, ensuring that they conform to the same interface.

class Token(SQLModel):
    access_token: str
    # Absolute Date, timestamp, sufficient
    expires: int

Now that we have identified the user on Github and created the JWT access_token, the only remaining step is to set the cookie. As mentioned earlier, in the OAuth flow this is done in a Next.js API endpoint.

Below is the complete endpoint implementation. As you can see, it is not too complicated. We simply parse the access_token and expires values from the URL query parameters, use them to construct the cookie, and attach the cookie to a redirect response that sends the user to the home page. This final step sets the cookie, and that’s it.

It’s also worth mentioning that if the query parameters are invalid, we redirect the user back to the login page.

Note that we construct the cookie as host-only (domain: undefined), meaning it is valid only for the frontend domain. This is perfectly fine and exactly what we want, since in a Next.js app both pages and APIs run on the same domain.

export const GET = async (request: Request): Promise<Response> => {
  const { SITE_URL, NODE_ENV } = getPublicEnv();
  const isProd = NODE_ENV === 'production';

  const url = new URL(request.url);

  const accessToken = url.searchParams.get('access_token');
  const expiresParam = url.searchParams.get('expires');

  const hasAllData = accessToken && expiresParam;
  if (!hasAllData) {
    const loginUrl = new URL(`${LOGIN}?error=missing_auth_token`, SITE_URL);
    return NextResponse.redirect(loginUrl, { status: 302 });
  }

  // Convert Unix timestamp (seconds) to a JS Date object
  const expiresDate = new Date(Number(expiresParam) * 1000);

  const redirectUrl = new URL(DASHBOARD, SITE_URL);
  const response = NextResponse.redirect(redirectUrl, { status: 302 });

  response.cookies.set({
    name: AUTH_COOKIE,

    // passed from backend
    value: accessToken,
    expires: expiresDate,

    // frontend-specific
    httpOnly: true,
    secure: isProd,

    // host-only
    path: '/',
    sameSite: 'lax',
    domain: undefined,
  });

  return response;
};

Although a server action is used to set the session cookie only for the email/password login, it is important to explain the complete authentication picture, show how both login flows conform to the same interface, and highlight some differences.

In contrast to the OAuth flow, which relies on redirects, the email/password login can simply call the FastAPI endpoint LoginService.loginAccessToken({ body }) and obtain the access_token and expires values from the response body to construct the cookie.

Once again, the cookie is host-only (domain: undefined) and included in the server action response. Under the hood, a server action is just a POST request, which effectively sets the cookie.

export const loginAction = async (
  _prevState: ApiResult,
  formData: FormData
): Promise<ApiResult> => {
  const { NODE_ENV } = getPublicEnv();

  const body = Object.fromEntries(formData) as BodyLoginLoginAccessToken;
  const apiResponse = await LoginService.loginAccessToken({ body });

  const { response: _, ...result } = apiResponse;

  const isSuccess = isSuccessApiResult(result);
  // UI will display backend error
  if (!isSuccess) return result;

  const { access_token, expires } = result.data;
  const isProd = NODE_ENV === 'production';

  // Convert Unix timestamp (seconds) to a JS Date object
  const expiresDate = new Date(Number(expires) * 1000);

  const cookieStore = await cookies();

  cookieStore.set({
    name: AUTH_COOKIE,
    // args
    value: access_token,
    expires: expiresDate,
    // local
    httpOnly: true,
    secure: isProd,
    // host-only for exact frontend domain
    path: '/',
    sameSite: 'lax',
    domain: undefined,
  });

  // success result is ignored, just for type
  return result;
};

Another server action is used for logout. It simply unsets the cookie and applies to both email/password and OAuth logins, since both rely on the same session cookie.

export const logoutAction = async (): Promise<void> => {
  const cookiesList = await cookies();
  cookiesList.delete(AUTH_COOKIE);

  redirect(LOGIN);
};

Differences between the email/password and OAuth flows

So what is different between the email/password and OAuth flows?

The OAuth flow is based on two consecutive redirects: FastAPI callback endpoint -> Next.js API set-cookie endpoint -> Home page. This is unavoidable because the callback endpoint is the only entry point Github gives us, so the access_token and expires values must be passed as query parameters attached to the redirect responses.

In contrast, the email/password flow is based on an HTML form and a server action, which follows a standard request/response pattern. This allows the access_token and expires values to be sent directly in the response body.
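The shared Token interface makes the difference purely a matter of transport. A small stdlib sketch contrasting the two, using a hypothetical token value and example.com URL:

```python
import json
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical payload conforming to the shared Token interface
token = {"access_token": "fake-jwt-token", "expires": 1767398400}

# OAuth flow: the values travel as query parameters on a redirect
redirect_url = f"https://example.com/api/auth/set-cookie?{urlencode(token)}"

# Email/password flow: the same values travel in the response body
body = json.dumps(token)

# Both transports carry an identical payload
query = {k: v[0] for k, v in parse_qs(urlparse(redirect_url).query).items()}
assert query["access_token"] == token["access_token"]
assert int(query["expires"]) == token["expires"]
assert json.loads(body) == token
```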

Completed code

The relevant files:

git clone git@github.com:nemanjam/full-stack-fastapi-template-nextjs.git
git checkout 45c840d48cba2aeab07e0a66f8245110b852571e

# Backend
backend/app/api/routes/login.py
backend/app/core/security.py
backend/app/crud.py
backend/app/models.py

# Frontend container (Next.js)
# API route
frontend/apps/web/src/app/api/auth/set-cookie/route.ts
# Server action
frontend/apps/web/src/actions/auth.ts

# Pull request with most of the code shown in a clear diff
https://github.com/nemanjam/full-stack-fastapi-template-nextjs/pull/4

Conclusion

Let’s conclude this article by summarizing the upsides and downsides of choosing to move the cookie-setting logic to Next.js server actions and APIs.

Upsides:

  • The frontend and backend are fully independent. We can use any domains for both by simply setting the SITE_URL and API_URL environment variables.
  • We can deploy to platforms like vercel.app without needing any additional modifications.
  • We maintain a single, unified interface for both email/password and OAuth logins.
  • The approach is applicable to any OAuth provider, not just Github. You only need to define the appropriate redirect and callback endpoints in FastAPI.

Downsides:

  • Slightly increased complexity caused by moving the cookie-setting logic into Next.js server actions and APIs.
  • The frontend container must include a Node.js runtime to support Next.js server actions and APIs. In practice, this is not much of a downside, since using SSR was already part of the plan.

Have you implemented something similar yourself? What approach did you choose? Let me know in the comments.

]]>
nemanja.mitic.elfak@hotmail.com (Nemanja Mitic)
<![CDATA[Next.js server actions with FastAPI backend and OpenAPI client]]> https://nemanjamitic.com/blog/2026-01-03-nextjs-server-actions-fastapi-openapi/ https://nemanjamitic.com/blog/2026-01-03-nextjs-server-actions-fastapi-openapi/ Sat, 03 Jan 2026 00:00:00 GMT
import { Image } from 'astro:assets';

import { IMAGE_SIZES } from '../../../../constants/image';
import DeploymentDiagramImage from '../../../../content/post/2026/01-03-nextjs-server-actions-fastapi-openapi/_images/deployment-diagram-16-8.png';
import SequentialDiagramImage from '../../../../content/post/2026/01-03-nextjs-server-actions-fastapi-openapi/_images/sequential-diagram.png';

Introduction

In 2026, Next.js and server components are the industry standard for building server-side rendered React websites. Next.js comes with API routes by default for building backend endpoints, but for various reasons you may want to use a completely different programming language and a non-TypeScript backend. For example, FastAPI is known for its excellent integration with ML and AI libraries, which are typically implemented in Python.

Today we will show how to implement one critical part of every full-stack app: data fetching and data mutations using Next.js and FastAPI. There is more than one way to do this, but we will aim to choose and implement the best one.

We will not start completely from scratch, but instead reuse https://github.com/fastapi/full-stack-fastapi-template as a starting point. It already provides a solid foundation, especially on the backend, where we will only change session storage from localStorage to an HttpOnly cookie. In contrast, we will replace most of the frontend code by switching from TanStack Router, React Query, and Chakra to Next.js 16, ShadcnUI, and Tailwind CSS v4.

The problem statement and requirements

What is the real challenge here? Modern React 19 and Next.js 16 provide new, advanced features and a standardized workflow not only for fetching data into components but also for managing state. These span both the server and the client, and we should aim to fully leverage them.

So the real goal is this: we want to use a non-TypeScript backend while at the same time preserving the well-established server components model for data fetching and the server actions model for data mutations, with the same level of type safety and state management we would have when using Next.js API endpoints.

Architecture overview

Below is a visual representation of the architecture we will build in this tutorial.

Deployment diagram:

<Image {...IMAGE_SIZES.FIXED.MDX_MD} src={DeploymentDiagramImage} alt="Deployment diagram, client JavaScript, Next.js server and FastAPI server" />

Sequential diagram:

<Image {...IMAGE_SIZES.FIXED.MDX_MD} src={SequentialDiagramImage} alt="Sequential diagram, client JavaScript, Next.js server and FastAPI server" />

Server-side rendering and server components are central features of modern Next.js and React. The original starter stores the session in the browser’s localStorage, which prevents us from having server-side rendered private pages and components. We will modify this to store the session in an HttpOnly cookie instead, which we can access and read in server components.

In the original repository, there is already a pull request, Replace plaintext auth tokens with HttpOnly cookies #1606, that implements this. We will reuse it and adapt it to our needs. Let’s highlight the most important parts of this code and discuss them.

First, let’s create utilities to set and unset the auth cookie in API responses. Note the signature of the set_auth_cookie method: set_auth_cookie(subject, expires_delta, response). We pass in the subject (typically a user.id) and the token’s expiration, as well as a response argument of the base class type Response so that this utility can be applied to any specific response subclass. The create_access_token() method itself remains unchanged; the token is created the same way for both localStorage and cookies.

backend/app/core/security.py

def set_auth_cookie(
    subject: str | Any, expires_delta: timedelta, response: Response
) -> Response:
    # Cookie expiration and JWT expiration match
    # Note: cookie expiration must be in seconds
    expires_in_seconds = int(expires_delta.total_seconds())
    access_token = create_access_token(subject, expires_delta)

    # Dev defaults
    samesite = "lax"
    domain = None

    # Prod overrides
    if is_prod:
        samesite = "none"
        # Note: important for cross-site cookies in prod to succeed
        # api-site.rpi.example.com and site.rpi.example.com
        parsed = urlparse(settings.SITE_URL)
        domain = parsed.hostname  # full domain

        # if it has subdomains whitelist cookie for "1 level less" subdomain, rpi.example.com
        host_segments = domain.split(".")
        if len(host_segments) > 2:
            domain = ".".join(host_segments[1:])  # remove the first segment (head)

    logger.info(f"domain: {domain}")

    response.set_cookie(
        key=settings.AUTH_COOKIE,
        value=access_token,
        httponly=True,
        max_age=expires_in_seconds,
        expires=expires_in_seconds,
        samesite=samesite,
        secure=is_prod,
        domain=domain,
    )
    return response

The core idea is straightforward: create a token, assign it to the cookie value, and attach the cookie to the response object.
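The create_access_token() helper itself is not shown above; conceptually it just signs a payload containing the subject and an expiration claim. As an illustration only — the project uses a JWT library, whereas this sketch hand-rolls HS256 signing with the standard library:

```python
import base64
import hashlib
import hmac
import json
import time

def b64url(data: bytes) -> str:
    # JWTs use URL-safe base64 without padding
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def create_access_token(subject: str, expires_in_seconds: int, secret: str) -> str:
    # header.payload.signature, HS256-signed
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(
        json.dumps({"sub": subject, "exp": int(time.time()) + expires_in_seconds}).encode()
    )
    signing_input = f"{header}.{payload}".encode()
    signature = b64url(hmac.new(secret.encode(), signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{signature}"

# Hypothetical subject and secret, for illustration only
token = create_access_token("user-123", 3600, "dev-secret")
assert token.count(".") == 2
```

The resulting token is what ends up as the cookie value, and what get_current_user() later decodes and verifies.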

There is some added cross-site cookie complexity specific to my use case, which you may not necessarily need to replicate, but let’s explain it for the sake of clarity.

In practice, the frontend and backend often don’t share the same domain, and you need to account for this because cookies are tied to a domain. For example, if the frontend is on my-website.com and the backend is on api.my-website.com, the backend must set domain = "my-website.com" and samesite = "none" for the browser to accept and store the cookie.

On my server, I use an additional Traefik TCP router that treats a dot . as a special delimiter character, which prevents it from correctly routing infinite-depth subdomains. As a result, I had to use a dash - instead for my backend domain.

# frontend url
https://full-stack-fastapi-template-nextjs.arm1.nemanjamitic.com

# backend url
https://api-full-stack-fastapi-template-nextjs.arm1.nemanjamitic.com

The additional cookie domain logic essentially does the following: if the frontend is on a subdomain, it sets the cookie for the parent domain (one level up). For example, in this specific case, arm1.nemanjamitic.com.

That’s enough for this digression; I just wanted to emphasize that you must carefully adjust the domain property of a cross-site cookie depending on the URLs where you host your frontend and backend. Otherwise, the browser will reject the cookie, and authentication will fail.

Similarly, we use this code to unset the auth cookie. We simply return a JSONResponse containing an expired cookie with the same key. It can be improved, but it will suffice for now.

backend/app/core/security.py

def delete_auth_cookie() -> JSONResponse:
    response = JSONResponse(content={"message": "Logout successful"})
    response.delete_cookie(
        key=settings.AUTH_COOKIE,
        path="/",
        domain=None,
        httponly=True,
        samesite="lax",
        secure=is_prod,
    )
    return response

Login / logout endpoints

Now it’s time to make use of these utilities to log in and log out a user.

For login, we first verify that the user has provided a valid email and password. If the credentials are correct, we use the user.id to generate an access token, set it as the cookie value, and include the cookie in the response using the previously mentioned security.set_auth_cookie() utility.

backend/app/api/routes/login.py

@router.post("/login/access-token")
def login_access_token(
    session: SessionDep, form_data: Annotated[OAuth2PasswordRequestForm, Depends()]
) -> JSONResponse:
    """
    OAuth2-compatible token login: get an access token for future requests (sent in an HTTP-only cookie)
    """
    user = crud.authenticate(
        session=session, email=form_data.username, password=form_data.password
    )
    if not user:
        raise HTTPException(status_code=400, detail="Incorrect email or password")
    elif not user.is_active:
        raise HTTPException(status_code=400, detail="Inactive user")

    access_token_expires = timedelta(hours=settings.ACCESS_TOKEN_EXPIRE_HOURS)
    response = JSONResponse(content={"message": "Login successful"})

    return security.set_auth_cookie(user.id, access_token_expires, response)

For logout, we simply use the security.delete_auth_cookie() utility to unset the cookie from the user’s browser.

Note: we are implementing this endpoint for the sake of completeness in the FastAPI backend. In our particular setup, however, we will use the Next.js server to clear the auth cookie.

backend/app/api/routes/login.py

@router.post("/logout", dependencies=[Depends(get_current_user)])
def logout() -> JSONResponse:
    """
    Delete the HTTP-only cookie during logout
    """
    return security.delete_auth_cookie()

Protect endpoints with auth

After implementing login and logout, we can use them to protect specific API endpoints by identifying the user from the request and checking whether they have sufficient privileges to access a resource or perform an action.

We will use FastAPI’s dependency injection to centralize the logic for obtaining the auth cookie with CookieDep and for identifying the user who sent the cookie with CurrentUser. The get_current_user() method checks for the existence of the cookie, decodes and verifies the validity of the access token from the cookie, and finally uses the user.id from the token’s subject to query the user from the database.

backend/app/api/deps.py

cookie_scheme = APIKeyCookie(name=settings.AUTH_COOKIE)

CookieDep = Annotated[str, Depends(cookie_scheme)]


def get_current_user(session: SessionDep, cookie: CookieDep) -> User:
    if not cookie:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Not authenticated",
        )

    try:
        payload = jwt.decode(
            cookie, settings.JWT_SECRET_KEY, algorithms=[security.ALGORITHM]
        )
        token_data = TokenPayload(**payload)
    except (InvalidTokenError, ValidationError):
        raise HTTPException(
            status_code=status.HTTP_403_FORBIDDEN,
            detail="Could not validate credentials",
        )

    user = session.get(User, token_data.sub)
    if not user:
        raise HTTPException(status_code=404, detail="User not found")
    if not user.is_active:
        raise HTTPException(status_code=400, detail="Inactive user")

    return user

CurrentUser = Annotated[User, Depends(get_current_user)]

Now, any function or route in FastAPI can make use of the CurrentUser dependency to identify the user simply by including it as a typed argument. Below is a simple /me endpoint for illustration.

backend/app/api/routes/users.py

@router.get("/me", response_model=UserPublic)
def read_user_me(current_user: CurrentUser) -> Any:
    """
    Get current user.
    """
    return current_user

With this HttpOnly cookie setup, authentication on the backend is complete. Essentially, we didn’t change much; we only moved the access token from the JSON body of the response to the cookie header.

Generate and configure OpenAPI client

Now that we have moved the session from localStorage to an HttpOnly cookie, we can identify the user in server components on the Next.js server. An important and obvious note worth repeating: HttpOnly cookies are accessible only on the Next.js server, not in the client-side JavaScript running in the browser.

Since our setup involves Next.js client -> Next.js server -> FastAPI server, we need to handle cookie transmission accordingly. We will continue using the @hey-api/openapi-ts package from the original template, but for our Next.js app, we will use the @hey-api/client-next client and configure it to handle the auth cookie properly.

Here is the configuration for generating the OpenAPI client:

frontend/apps/web/openapi-ts.config.ts

const config: HeyApiConfig = defineConfig({
  input: './openapi.json',
  output: {
    format: 'prettier',
    lint: 'eslint',
    path: './src/client',
    importFileExtension: null,
  },
  exportSchemas: true, // backend models types
  plugins: [
    // Note: order matters
    {
      name: '@hey-api/typescript',
      enums: 'javascript', // const objects instead of enums
    },
    '@hey-api/schemas', // default json, req.body, '{"username":"abc","password":"123"}'
    {
      name: '@hey-api/sdk',
      asClass: true, // UsersService.readUserMe(), 'true' doesn't allow tree-shaking
      classNameBuilder: '{{name}}Service', // class Users -> UsersService
      // @ts-expect-error @hey-api/openapi-ts doesn't export types
      methodNameBuilder, // usersReadUserMe -> readUserMe
    },
    {
      name: '@hey-api/client-next',
      // relative from src/client/ folder
      runtimeConfigPath: '../lib/hey-api', // sets API_URL, auth...
    },
  ],
});

The important parts are: we choose the Next.js client @hey-api/client-next, control the SDK method names using the methodNameBuilder callback, and specify the path for the generated client runtime configuration with runtimeConfigPath.

Runtime client configuration

There are several ways to set runtime configuration for the generated client. We will use the runtimeConfigPath field to specify the path to the configuration file, as this is the recommended approach according to the docs: https://heyapi.dev/openapi-ts/clients/next-js#runtime-api.

The auth cookie is originally stored in the browser and is included in client requests by default. An important note is that the cookie in these requests is valid only for the frontend domain (Next.js server) and not for the backend domain. If you try calling the FastAPI backend using this cookie, you will get a 403 Forbidden HTTP error. That’s why requests from client code need to be proxied through a Next.js API endpoint to the FastAPI backend.

In contrast, HTTP requests made from the Next.js server do not include the cookie automatically at all, so we need to forward it explicitly.
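The forwarding itself boils down to serializing the cookie store’s name/value pairs into a single Cookie header, the same "; "-joined format the serverFetch override produces. In Python terms:

```python
def to_cookie_header(cookies: dict[str, str]) -> str:
    # "name=value" pairs joined with "; ", e.g. "auth_cookie=abc; theme=dark"
    return "; ".join(f"{name}={value}" for name, value in cookies.items())

assert to_cookie_header({"auth_cookie": "abc", "theme": "dark"}) == "auth_cookie=abc; theme=dark"
```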

We handle both of these requirements in a base runtime configuration at the SDK instance level in a single place, so we don’t have to repeat this logic for every specific call.

Here is the code:

frontend/apps/web/src/lib/hey-api.ts

const { CLIENT_PROXY } = ROUTES.API;

/** Runtime config. Runs and imported both on server and in browser. */
export const createClientConfig: CreateClientConfig = (config) => {
  const { API_URL } = getPublicEnv();

  return {
    ...config,
    baseUrl: API_URL,
    credentials: 'include',
    fetch: isServer() ? serverFetch : clientFetch,
  };
};

const serverFetch: typeof fetch = async (input, init = {}) => {
  // Note: Dynamic import to avoid bundling 'next/headers' on client
  const { cookies } = await import('next/headers');

  const cookieStore = await cookies();
  const cookieHeader = cookieStore
    .getAll()
    .map((c) => `${c.name}=${c.value}`)
    .join('; ');

  // Note: must append auth_cookie like this or content-type header will break in server actions
  const headers = new Headers(init.headers);
  headers.append('Cookie', cookieHeader);

  // test skeletons styling
  // await waitMs(3000);

  const response = fetch(input, { ...init, headers });

  return response;
};

/** Client-side fetch: forwards requests to api/client-proxy/[...path]/route.ts */
const clientFetch: typeof fetch = async (input, init = {}) => {
  const { API_URL } = getPublicEnv() as { API_URL: string };

  // hey-api sends absolute URL
  const url: string = typeof input === 'string' ? input : input.toString();

  // Normalize to relative URL
  // API_URL.length + 1 - removes leading slash, API_URL guaranteed not to have trailing slash
  const relativeUrl = url.startsWith(API_URL) ? url.slice(API_URL.length + 1) : url;

  // Build the proxy URL relative to Next.js API
  const proxyUrl = `${CLIENT_PROXY}${relativeUrl}`;

  const headers = new Headers(init.headers);

  return fetch(proxyUrl, { ...init, headers, credentials: 'include' });
};

Server fetch override

The important part here is the serverFetch override for the fetch function used by the generated client on the Next.js server. The serverFetch code typically runs either on a server-side rendered page or in a server action. Both of these have access to the request object (including headers and cookies) that originally comes from the browser.

Based on this, we read the cookies (including the auth cookie) and forward them to the fetch client, which makes requests from the Next.js server to the FastAPI endpoints. This creates the following chain: browser -> Next.js server -> FastAPI server.

There are a few additional tricks required to make this work correctly.

  • We use a dynamic import, const { cookies } = await import('next/headers'), to prevent bundling server-only code into the client bundle, which would break the build.
  • We must create a new Headers() instance and use the append() method to pass cookies to fetch.
  • We obtain the API_URL environment variable using const { API_URL } = getPublicEnv() from the next-public-env package. For a reusable build, API_URL must be a runtime variable. Since this code runs both on the server and in the client, it must get the correct value in both environments. I wrote about reusable Next.js builds and runtime environment variables in detail in this article: Runtime environment variables in Next.js - build reusable Docker images.

Example client calls from Next.js server code

With this OpenAPI client, we can make HTTP authenticated calls from Next.js pages, server components, and server actions to protected FastAPI endpoints.

Example 1. Query in a page:

frontend/apps/web/src/app/dashboard/items/[[...page]]/page.tsx


const ItemsPage: FC<Props> = async ({ params }) => {
  const { page } = await params;
  const { currentPage, isValidPage } = parsePage(page);

  if (!isValidPage) notFound();

  const result = await ItemsService.readItems({
    query: {
      skip: (currentPage - 1) * PAGE_SIZE_TABLE,
      limit: PAGE_SIZE_TABLE,
    },
  });

  const items = result.data;

  // ...

}

Example 2. Query in a server component:

frontend/apps/web/src/components/dashboard/home/list-recent-items.tsx

const ListRecentItems: FC = async () => {
  const result = await ItemsService.readItems();

  throwIfApiError(result);

  const items = result.data?.data ?? [];

  // ...

}

Example 3. Mutation in a server action:

frontend/apps/web/src/actions/item.ts

export const itemCreateAction = async (
  _prevState: ApiResult,
  formData: FormData
): Promise<ApiResult> => {
  const body = Object.fromEntries(formData) as ItemCreate;

  const apiResponse = await ItemsService.createItem({ body });

  const { response: _, ...result } = apiResponse;

  revalidatePath(ITEMS);

  return result;
};

Client fetch override

The clientFetch implementation rewrites the original URL pointing to FastAPI to a Next.js API endpoint that serves as a proxy. With this, every call from the client code is routed to the Next.js API proxy endpoint.

The Next.js API endpoint then parses the authorization cookie and attaches it to the actual request to the corresponding FastAPI endpoint. It does the same for the response by forwarding the cookie from the FastAPI response back to the Next.js client code. In practice, it acts as a proxy middleman that attaches the cookie in both directions.

As a result, HTTP requests from the client code can be authenticated correctly with FastAPI instead of being rejected with a 403 Forbidden error.

Note: this call is not shown in the architecture diagram; the diagram still needs to be updated.

frontend/apps/web/src/app/api/client-proxy/[...path]/route.ts

const proxyHandler = async (request: NextRequest, { params }: ClientProxyRouteParam) => {
  const { path } = await params;

  const { API_URL } = getPublicEnv();

  // Next.js splits the catch-all param into an array; join it back into a path
  // path = ['api', 'v1', 'users', 'me'] -> http://api.localhost:8000/api/v1/users/me
  const backendUrl = `${API_URL}/${path.join('/')}`;

  // clone headers from client request
  const headers: Record<string, string> = {};
  request.headers.forEach((value, key) => (headers[key] = value));

  const { method } = request;
  const body = ['GET', 'HEAD'].includes(method) ? undefined : await request.arrayBuffer();

  const apiResponse = await fetch(backendUrl, { method, headers, body });

  const data = await apiResponse.arrayBuffer();
  const response = new NextResponse(Buffer.from(data), { status: apiResponse.status });

  // copy headers from backend response, excluding ones automatically controlled by Node/Next.js internally
  const excludeHeaders = ['content-encoding', 'transfer-encoding', 'connection'];
  apiResponse.headers.forEach(
    (value, key) => !excludeHeaders.includes(key) && response.headers.set(key, value)
  );

  // optional: return error if response not ok
  if (!apiResponse.ok) {
    console.warn('Client proxy returned error:', apiResponse.status, backendUrl);
  }

  return response;
};

// add PUT, DELETE, PATCH if needed

export const GET = proxyHandler;

export const POST = proxyHandler;

Example client calls from Next.js client code

Example 1. Query in a page’s client component:

frontend/apps/web/src/app/page.tsx

'use client';

// ...

const { LOGIN, DASHBOARD } = ROUTES;
const { HOME_PAGE_REDIRECT } = DELAY;

// Must be client component to show loader before redirect

const HomePage: FC = () => {
  const router = useRouter();

  useEffect(() => {
    const redirect = async () => {
      // client side call, cookie valid only for Next.js API domain, and not for FastAPI domain
      const result = await UsersService.readUserMe();
      const currentUser = result.data;

      await waitMs(HOME_PAGE_REDIRECT);

      const redirectUrl = currentUser ? DASHBOARD : LOGIN;
      router.push(redirectUrl);
    };

    redirect();
  }, [router]);

  return (
    <div className="flex items-center justify-center min-h-screen">
      <div className="text-center">
        <div className="animate-spin rounded-full h-8 w-8 border-b-2 border-gray-900 mx-auto"></div>
        <p className="mt-2 text-gray-600">Redirecting...</p>
      </div>
    </div>
  );
};

Mutations

The original idea of this article is to preserve the default React workflow, even when using a non-TypeScript backend. To achieve this, the client code must not be aware of the FastAPI endpoints and should never call them directly, but only through the Next.js server, which will forward the requests.

We have already implemented passing the cookie from the browser to FastAPI via the Next.js server, but we have not yet implemented the opposite direction. The FastAPI login endpoint sets the cookie in the response; now we need to forward it through the Next.js server to the user’s browser. Here is the code that accomplishes this:

frontend/apps/web/src/utils/actions.ts

export const forwardCookiesFromResponse = async (response: Response): Promise<void> => {
  const rawCookies = response.headers.get('set-cookie');
  if (!rawCookies) return;

  const parsed = setCookieParser.parse(rawCookies);
  const cookieStore = await cookies();

  for (const c of parsed) {
    cookieStore.set({
      name: c.name,
      value: c.value,
      httpOnly: c.httpOnly,
      secure: c.secure,
      path: c.path ?? '/',
      sameSite: c.sameSite as any,
      expires: c.expires,
    });
  }
};

The utility method above retrieves the cookies from the FastAPI response object, parses them using the set-cookie-parser package, and finally attaches the parsed cookies to the Next.js response that will be returned as part of the server action.
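Python’s standard library has a rough analogue of set-cookie-parser in http.cookies.SimpleCookie; a sketch with a hypothetical Set-Cookie header:

```python
from http.cookies import SimpleCookie

# Hypothetical header, similar in shape to what the FastAPI login endpoint emits
raw = "auth_cookie=fake-jwt-token; HttpOnly; Max-Age=86400; Path=/; SameSite=lax"

cookie = SimpleCookie()
cookie.load(raw)

# Each parsed cookie is a Morsel exposing its value and attributes
morsel = cookie["auth_cookie"]
assert morsel.value == "fake-jwt-token"
assert morsel["path"] == "/"
assert morsel["max-age"] == "86400"
```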

We then call this utility method within the login server action, which sets the forwarded cookies in the user’s browser. Under the hood, a server action is just a POST request, and it can set cookies just like any other HTTP call.

frontend/apps/web/src/actions/auth.ts

export const loginAction = async (
  _prevState: ApiResult,
  formData: FormData
): Promise<ApiResult> => {
  const body = Object.fromEntries(formData) as BodyLoginLoginAccessToken;
  const apiResponse = await LoginService.loginAccessToken({ body });

  const { response, ...result } = apiResponse;
  await forwardCookiesFromResponse(response);

  return result;
};

React Hook Form and useActionState

Now that we have configured the backend and HTTP client, it is time to handle form submission within our client code. For some time, react-hook-form has been the dominant forms package in the React ecosystem. There are a few tricks to integrate it properly with the useActionState React API and server actions.

Here is example code for creating an item:

frontend/apps/web/src/components/dashboard/items/form-item-create.tsx


const defaultValues: ItemCreateFormValues = {
  title: '',
  description: '',
} as const;

const resolver = zodResolver(itemCreateSchema);

const FormItemCreate: FC<Props> = ({ onSuccess, onCancel }) => {
  const initialState = { data: undefined };
  const [state, formAction, isPending] = useActionState(itemCreateAction, initialState);

  const form = useForm<ItemCreateFormValues>({ resolver, defaultValues });

  const isSuccess = isSuccessApiResult(state);

  useEffect(() => {
    if (isSuccess) onSuccess?.();
  }, [isSuccess, onSuccess]);

  const isError = isErrorApiResult(state);

  const validateAndSubmit = (event: FormEvent<HTMLFormElement>) => {
    event.preventDefault();

    form.handleSubmit(() => {
      const formElement = event.target as HTMLFormElement;
      const formData = new FormData(formElement);

      startTransition(() => {
        formAction(formData);
        form.reset();
      });
    })(event);
  };

  return (
    <Form {...form}>
      <form action={formAction} onSubmit={validateAndSubmit} className="space-y-6">
        <FormField
          control={form.control}
          name="title"
          render={({ field }) => (
            <FormItem>
              <FormLabel>Title *</FormLabel>
              <FormControl>
                <Input {...field} placeholder="Enter item title..." disabled={isPending} />
              </FormControl>
              <FormMessage />
            </FormItem>
          )}
        />

        <FormField
          control={form.control}
          name="description"
          render={({ field }) => (
            <FormItem>
              <FormLabel>Description</FormLabel>
              <FormControl>
                <Input {...field} placeholder="Enter item description..." disabled={isPending} />
              </FormControl>
              <FormMessage />
            </FormItem>
          )}
        />

        {isError && (
          <Alert variant="destructive">
            <AlertDescription>{getApiErrorMessage(state.error)}</AlertDescription>
          </Alert>
        )}

        <div className="flex justify-end space-x-2">
          <Button type="button" variant="outline" onClick={onCancel} disabled={isPending}>
            Cancel
          </Button>

          <Button type="submit" disabled={isPending}>
            {isPending ? (
              <>
                <Loader2 className="mr-2 h-4 w-4 animate-spin" />
                Creating...
              </>
            ) : (
              'Create Item'
            )}
          </Button>
        </div>
      </form>
    </Form>
  );
};

Form submission and validation

The code above addresses two important requirements:

  1. Server actions are designed to support form submission even with JavaScript disabled. In our code, we support this by assigning the action attribute on the <form /> tag to the formAction result returned by the useActionState hook: <form action={formAction} ... />.

  2. We use client-side JavaScript to validate the fields with Zod and display user-friendly error messages. For this, as well as for form submission when JavaScript is enabled, we use the validateAndSubmit event handler attached to the onSubmit event: <form onSubmit={validateAndSubmit} ... />.

I learned this trick from the tutorial linked in this GitHub discussion comment: react-hook-form/discussions/11832#discussioncomment-11832211.

useActionState

React provides the useActionState hook not only to handle form submission and call server actions, but also to manage state, all at the same time. It accepts the server action itemCreateAction as an argument and returns the formAction function, which can be bound to the form’s action attribute, invoked manually from an event handler, or both at the same time, as we do in this case.

const [state, formAction, isPending] = useActionState(itemCreateAction, initialState);

State management is modeled in a way that is somewhat reminiscent of reducers in Redux. It uses the concept of previous and next state, where the next state is the result of applying a transformation to the previous state; in this case, the transformation occurs on the server. Naturally, it all starts with an initial state.

In our example, useActionState accepts initialState as the second argument, and the server action response is contained within the state item in the returned tuple.
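The prev/next state flow can be modeled without React at all. Below is a minimal TypeScript sketch (the ApiResult shape and the action body are illustrative, not the repository’s exact types) that simulates what useActionState does internally: hold the current state and replace it with whatever the action returns.

```typescript
// Toy model of the useActionState flow (ApiResult shape and action are
// illustrative). React holds the current state and, on each submission,
// replaces it with whatever the action returns.
type ApiResult = { data?: string; error?: string };

type ServerAction = (prevState: ApiResult, formData: FormData) => Promise<ApiResult>;

// Hypothetical action: echoes the submitted title back as data, or returns
// an error within the result instead of throwing.
const itemCreateAction: ServerAction = async (_prevState, formData) => {
  const title = formData.get('title');
  if (typeof title !== 'string' || title.length === 0) {
    return { error: 'Title is required' };
  }
  return { data: title };
};

// What the hook does internally, reduced to a single prev -> next transition.
const run = async (): Promise<ApiResult> => {
  let state: ApiResult = { data: undefined }; // initialState

  const submission = new FormData();
  submission.set('title', 'First item');

  state = await itemCreateAction(state, submission); // next state from prev state
  return state;
};
```

The isPending flag returned by the real hook simply covers the window between invoking the action and receiving the next state.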

Server actions handle success and error results

There are certain rules that apply to React server actions in general:

  1. In server actions, you shouldn’t throw exceptions but instead return errors within the result. This affects which particular generic we will use to type the OpenAPI client response, because in our case a server action is just a proxy to the respective FastAPI endpoint.

    Here is the ItemsService.createItem() return type as an example. This type is reused for both FastAPI and the server action. This is intentional because the server action is just a proxy, and we want to avoid any unnecessary transformation of the results.

    As mentioned earlier, errors should be included as return values. This type clearly shows that: the result is a union of success and error branches and additionally contains the raw HTTP Response object. You may recall that we already used it to extract the cookie in the login endpoint.

    Promise<({
        data: ItemPublic;
        error: undefined;
    } | {
        data: undefined;
        error: HttpValidationError;
    }) & {
        response: Response;
    }>
  2. Server actions have limited serialization capabilities. You cannot return class instances, error instances, database model instances, etc. You should mostly rely on object literals for return values from server actions.

    Let’s see the code example:

    frontend/apps/web/src/actions/item.ts

    export const itemCreateAction = async (
      _prevState: ApiResult,
      formData: FormData
    ): Promise<ApiResult> => {
      const body = Object.fromEntries(formData) as ItemCreate;
    
      const apiResponse = await ItemsService.createItem({ body });
    
      const { response: _, ...result } = apiResponse;
    
      revalidatePath(ITEMS);
    
      return result;
    };

    Note that the response field is a class instance and is not serializable, which is why we omit it from the server action return value.

    You may recall that we bound the action attribute to formAction in the form: <form action={formAction} ... >. Because of this, form values are received as FormData in the server action, and we use Object.fromEntries() to convert it to a plain object that can be forwarded as the HTTP request body to the FastAPI endpoint.
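To make the result union concrete, here is a hedged sketch of one possible shape of ApiResult after the response field is stripped, along with guards like the isSuccessApiResult and isErrorApiResult helpers used in the form component, plus the FormData conversion; the repository’s actual implementations may differ.

```typescript
// Hedged sketch: one possible shape of the ApiResult union after the
// `response` field is stripped, plus guards like isSuccessApiResult /
// isErrorApiResult used in the form component. Actual implementations
// in the repository may differ.
type ApiResult<TData = unknown, TError = unknown> =
  | { data: TData; error: undefined } // success branch
  | { data: undefined; error: TError } // error branch
  | { data: undefined; error?: undefined }; // initial state, no submission yet

const isSuccessApiResult = (result: ApiResult): result is { data: unknown; error: undefined } =>
  result.data !== undefined;

const isErrorApiResult = (result: ApiResult): result is { data: undefined; error: unknown } =>
  result.error !== undefined;

// FormData -> plain serializable object, as done in the server action before
// forwarding the body to the FastAPI endpoint.
const formData = new FormData();
formData.set('title', 'My item');
formData.set('description', 'A description');

const body = Object.fromEntries(formData);
// body: { title: 'My item', description: 'A description' }
```

Because both branches keep the same keys (data, error), a single object-literal result stays serializable and easy to narrow on the client.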

useActionState vs useTransition to call actions

Just a quick reminder: useActionState is not the only way to call a server action. It is typically used with a form element, while also providing state and an isPending flag. We have already seen a code example for this.

When we have a simple void server action, we can skip the form and simply invoke the action within an event handler. Such calls are typically marked as lower priority by wrapping them with startTransition. Below is a code example:

frontend/apps/web/src/components/dashboard/items/dropdown-item.tsx

const DropdownItem: FC<Props> = ({ item }) => {
  const [isPending, startTransition] = useTransition();

  const handleDeleteItem = (itemId: string) => {
    startTransition(() => {
      itemDeleteAction(itemId);
    });
  };

  // ...

};

Completed code

The relevant files:

git clone git@github.com:nemanjam/full-stack-fastapi-template-nextjs.git
git checkout be2b94b72b343563d21aeac29743099af8512f62

# Backend
backend/app/core/security.py
backend/app/api/deps.py
backend/app/api/routes/login.py
backend/app/api/routes/users.py

# Frontend

# OpenAPI configuration
frontend/apps/web/openapi-ts.config.ts
frontend/apps/web/src/lib/hey-api.ts

# Queries - Next.js server
frontend/apps/web/src/app/dashboard/items/[[...page]]/page.tsx
frontend/apps/web/src/components/dashboard/home/list-recent-items.tsx

# Queries - Next.js client
frontend/apps/web/src/app/page.tsx

# Next.js API proxy endpoint
frontend/apps/web/src/app/api/client-proxy/[...path]/route.ts

# Mutations

# Server actions
frontend/apps/web/src/utils/actions.ts
frontend/apps/web/src/actions/item.ts
frontend/apps/web/src/actions/auth.ts

# Forms
frontend/apps/web/src/components/dashboard/items/form-item-create.tsx
frontend/apps/web/src/components/dashboard/items/dropdown-item.tsx

Conclusion

Congratulations on reading this far. It was long to read, but also long to write and figure out.

Next.js is a comprehensive full-stack meta-framework, but there are situations where you might want to use it with a backend written in a completely different programming language. In this tutorial, we have demonstrated that it is entirely possible to bridge the modern Next.js ecosystem with a powerful Python-based FastAPI backend while preserving the full benefits of React Server Components and Server Actions. This approach shows that you don’t have to compromise on developer experience or type safety when using a non-TypeScript backend.

The key idea is that Next.js remains the only thing the browser ever talks to. FastAPI is treated as an internal service behind the Next.js server. This keeps a clean React mental model: components call server actions, server actions call strongly typed OpenAPI clients, and only the Next.js server handles cookies, authentication, and cross-service communication. From the client’s perspective, nothing changes.

The result is a stack where FastAPI does what it does best (Python, validation, data, ML, background jobs) and Next.js does what it does best (React, rendering, routing, and UX), without either one leaking into the other.

Feel free to explore the complete implementation in the accompanying repository. It serves as a solid starting point for anyone building AI-powered or data-intensive applications that require the rich AI/ML ecosystem and library availability of Python on the backend, along with the developer experience of modern React on the front.

Have you faced similar challenges and used a different approach? Feel free to share in the comments. I would love to hear about your experience.

References

]]>
nemanja.mitic.elfak@hotmail.com (Nemanja Mitic)
<![CDATA[Why runtime environment variables for a pure static website are a bad idea]]> https://nemanjamitic.com/blog/2025-12-21-static-website-runtime-environment-variables/ https://nemanjamitic.com/blog/2025-12-21-static-website-runtime-environment-variables/ Sun, 21 Dec 2025 00:00:00 GMT


Introduction

This article is a live, “try and see” practical experiment. I will use this exact blog project, a static Astro website, and try to package it as a reusable Nginx Docker image that requires just a single .env file to run in any environment.

I will use this tutorial as a starting point: https://phase.dev/blog/nextjs-public-runtime-variables/. It describes the idea and the process, uses Next.js, and includes a shell replacement script that we can work with.

Goal

Let’s define our goal and requirements at the beginning. We will use a pure static website that consists only of assets, without any server-side runtime code. This is important because hosting a static website is simple, free, and widely available. We want reusable builds where no environment-specific data is bundled into the application code, but instead read from a single .env file.

Now it’s time to identify which data is environment-specific. In this particular website, these are the four environment variables:


# some example values

SITE_URL=http://localhost:8080
PLAUSIBLE_SCRIPT_URL=https://plausible.arm1.nemanjamitic.com/js/script.js
PLAUSIBLE_DOMAIN=nemanjamitic.com
PREVIEW_MODE=true

Notice the format of these variables: two URLs, one domain, and one boolean value.

Now let’s clearly understand what makes this challenging. Once compiled, a static website becomes just a collection of static assets (.html, .js, .css, .jpg, etc.) deployed to an Nginx web folder. This practically means there is no server runtime and we cannot run any code, which greatly limits our power and control. The only runtime we have is the browser runtime, which is only useful to a limited extent when it comes to loading environment-specific data at runtime.

Options

Before we start following the original tutorial and move on to the practical implementation, let’s reconsider the possible alternatives at our disposal. I already touched on this in my previous article about runtime environment variables: https://nemanjamitic.com/blog/2025-12-13-nextjs-runtime-environment-variables#alternative-approaches, but let’s go through them once again, since this article is entirely dedicated to runtime variables in purely static websites. Here are the alternatives:

  1. The first option is the original idea from the tutorial: using a shell script with the sed command to string-replace all placeholder values that were inlined at build time directly in the bundle. We will include this script in the Docker entrypoint so it can run when the container starts and insert environment-specific values. This way, we achieve an effect similar to start-time variables.

  2. Nginx has certain features for injecting environment variables into responses. For example, sub_filter can perform string replacement in each text response. The upside of this approach is that the variables are truly runtime and will reflect any change immediately, but the major disadvantage is the performance overhead, especially under heavy traffic.

  3. Another method is to rely on the JavaScript runtime in the browser and dynamically host and load <script src="./env.js"></script> in the HTML of your root layout. It can have a few variations. For example, you can create an env.template file that holds placeholders for the replacement script called in the entrypoint, or you could even inline env.js directly in the nginx.conf itself:

location = /env.js {
  default_type application/javascript;
  return 200 "window.__CONFIG__ = { API_URL: '$API_URL' }";
}
The common aspect of all these approaches is that you need to run client-side JavaScript on a given page to access runtime variables through the window object, for example window.__RUNTIME_ENV__.MY_VAR. This is a disadvantage on its own and comes with a performance cost. For example, Astro enforces a zero client-side JavaScript by default strategy precisely for performance reasons.

Another deal-breaker is that files which do not run client-side JavaScript cannot access these variables. Examples include sitemap.xml and robots.txt, which are very important SEO-related files, especially for static websites.
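As an illustration of the client half of option 3, here is a small TypeScript sketch of a guarded accessor for a window.__RUNTIME_ENV__-style object; the names and fallback behavior are assumptions, not part of the original tutorial.

```typescript
// Sketch of the client half of option 3. After /env.js runs in the browser it
// would define something like window.__RUNTIME_ENV__; application code then
// reads values through a guarded accessor. Names and fallbacks are assumed.
type RuntimeEnv = Record<string, string | undefined>;

const readRuntimeEnv = (env: RuntimeEnv | undefined, key: string, fallback: string): string =>
  env?.[key] ?? fallback;

// Simulate the object that /env.js would have attached to window.
const runtimeEnv: RuntimeEnv = { SITE_URL: 'http://localhost:8080' };

const siteUrl = readRuntimeEnv(runtimeEnv, 'SITE_URL', 'https://example.com');
// siteUrl: 'http://localhost:8080'

const missing = readRuntimeEnv(undefined, 'SITE_URL', 'https://example.com');
// missing: 'https://example.com' (env.js failed to load or key absent)
```

The guarded access matters because the page must still render something sensible if env.js is blocked, cached incorrectly, or missing a key.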

Implementation

And finally, the most important and interesting part: the practical implementation of the most promising alternative. We have spent quite a lot of time on the introduction and the overview of alternatives.

You can review the exact implementation in the pull request at this link: https://github.com/nemanjam/nemanjam.github.io/pull/28

Replacement script

Let’s start with the shell replacement script. We will use the replace-variables.sh script from the tutorial as a starting point. After some trial and error and a few iterations, this is what the final script looks like:

scripts/replace-variables.sh

#!/bin/sh

# Note: sh shell syntax, NO bash in Alpine Nginx

# Summary:
# 1. Required variables are checked to be defined.
# 2. Optional variables are initialized to empty string if undefined.
# 3. All files in DIST_PATH with specified extensions are processed.
# 4. Placeholders of the form PREFIX_VAR are replaced with actual environment variable values.

# Define required and optional environment variables (space-separated strings for /bin/sh)
REQUIRED_VARS="SITE_URL PLAUSIBLE_SCRIPT_URL PLAUSIBLE_DOMAIN"
OPTIONAL_VARS="PREVIEW_MODE"

# Variables that are baked as URL-shaped placeholders (https://BAKED_VAR)
BAKED_URL_VARS="SITE_URL PLAUSIBLE_SCRIPT_URL"

PREFIX="BAKED_"
# Baked has always https://BAKED_VAR
# Will be replaced with whatever VAR value http:// or https:// or any string
URL_PREFIX="https://${PREFIX}"
FILE_EXTENSIONS="html js xml json"

# Read DIST_PATH from environment variable
# Do not provide a default; it must be set
if [ -z "${DIST_PATH}" ]; then
    echo "ERROR: DIST_PATH environment variable is not set."
    exit 1
fi

# Check if the directory exists
if [ ! -d "${DIST_PATH}" ]; then
    echo "ERROR: DIST_PATH directory '${DIST_PATH}' does not exist."
    exit 1
fi

# Check required environment variables are defined
for VAR in $REQUIRED_VARS; do
    # POSIX sh-compatible indirect expansion
    eval "VAL=\$$VAR"
    if [ -z "$VAL" ]; then
        echo "$VAR required environment variable is not set. Please set it and rerun the script."
        exit 1
    fi
done

# Default optional variables to empty string
for VAR in $OPTIONAL_VARS; do
    eval "VAL=\$$VAR"
    if [ -z "$VAL" ]; then
        eval "$VAR=''"
    fi
done

# Combine required and optional variables into a single string
ALL_VARS="$REQUIRED_VARS $OPTIONAL_VARS"

# Find and replace placeholders in files
for ext in $FILE_EXTENSIONS; do

    # Use 'find' to recursively search for all files with the current extension
    # -type f ensures only regular files are returned
    # -name "*.$ext" matches files ending with the current extension
    find "$DIST_PATH" -type f -name "*.$ext" |

    # Pipe the list of found files into a while loop for processing
    while read -r file; do
        # Read file once into a variable for faster checks
        FILE_CONTENT=$(cat "$file")
        FILE_REPLACED=0

        # Loop over each variable that needs to be replaced
        for VAR in $ALL_VARS; do

            PLACEHOLDER="${PREFIX}${VAR}"
            URL_PLACEHOLDER="${URL_PREFIX}${VAR}"

            # Get variable value (POSIX sh compatible)
            # Optional variables are guaranteed to have a value (possibly empty)
            eval "VALUE=\$$VAR"

            # Escape VALUE for sed replacement:
            # - & → \&  (ampersand is special in replacement, expands to the whole match)
            # - | → \|  (pipe is used as sed delimiter)
            ESCAPED_VALUE=$(printf '%s' "$VALUE" | sed 's/[&|]/\\&/g')

            # Handle baked URL variables (e.g. https://BAKED_SITE_URL)
            # These must be replaced as full URLs to avoid invalid or double protocols
            for URL_VAR in $BAKED_URL_VARS; do
                # Check if current variable is a baked URL var
                if [ "$VAR" = "$URL_VAR" ]; then
                    # Skip if URL placeholder is not present in this file, 2 - parent loop, i - case insensitive
                    echo "$FILE_CONTENT" | grep -qi "$URL_PLACEHOLDER" || continue 2

                    # Print file name once on first replacement
                    if [ "$FILE_REPLACED" -eq 0 ]; then
                        echo "Processing file: $file"
                        FILE_REPLACED=1
                    fi

                    # Log replacement
                    # Log $VALUE, because $ESCAPED_VALUE is just for sed
                    echo "replaced: $URL_PLACEHOLDER -> $VALUE"

                    # Replace full URL placeholder in-place, I - case insensitive
                    sed -i "s|$URL_PLACEHOLDER|$ESCAPED_VALUE|gI" "$file"

                    # Continue with next variable, 2 - parent loop
                    continue 2
                fi
            done

            # Note: exits loop early if placeholder is not present in the file, i - case insensitive
            echo "$FILE_CONTENT" | grep -qi "$PLACEHOLDER" || continue

            # Print file name only on first replacement
            if [ "$FILE_REPLACED" -eq 0 ]; then
                echo "Processing file: $file"
                FILE_REPLACED=1
            fi

            # Log what is replaced
            if [ -z "$VALUE" ]; then
                echo "replaced: $PLACEHOLDER -> (empty)"
            else
                echo "replaced: $PLACEHOLDER -> $VALUE"
            fi

            # Perform in-place replacement using sed
            # "s|pattern|replacement|g" replaces all occurrences in the file
            # The | delimiter is used instead of / to avoid conflicts with paths
            # I - case insensitive
            # Example: BAKED_SITE_URL → https://example.com
            sed -i "s|$PLACEHOLDER|$ESCAPED_VALUE|gI" "$file"

        done
    done
done

I left the verbose comments in the script for clarity, but here are the most important points to keep in mind:

  • It uses #!/bin/sh shell syntax because bash is not available by default in the Nginx Alpine image, and we want to keep the image size minimal.
  • It uses the DIST_PATH environment variable as an argument to pass the path to the bundle into the script. At the top of the script, we validate and initialize all hardcoded and passed arguments, and exit early if invalid data is provided.
  • We support and handle required and optional variables separately via REQUIRED_VARS and OPTIONAL_VARS.
  • This is the important part: we treat ordinary string variables and “URL-shaped” string variables separately, assign them distinct identifying prefixes (PREFIX="BAKED_" and URL_PREFIX="https://${PREFIX}"), and use the corresponding placeholders (PLACEHOLDER="${PREFIX}${VAR}" and URL_PLACEHOLDER="${URL_PREFIX}${VAR}"). This is necessary because a baked build for Astro will fail if we pass an invalid URL to the site option in astro.config.ts.
  • We continue to use sed instead of envsubst because it gives us more control over string replacement. Additionally, we escape special characters such as & and | in the sed input.
  • We log processed files and replaced variables for debugging and monitoring purposes.
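For clarity, the core replacement rules of the script can be modeled in TypeScript; the real implementation remains the sed-based shell script above, and this is only an illustration of how URL-shaped and plain placeholders are treated.

```typescript
// Illustration only: the placeholder rules of the shell script modeled in
// TypeScript. URL-shaped placeholders (https://BAKED_SITE_URL) are replaced
// before plain ones (BAKED_PREVIEW_MODE), case-insensitively, mirroring the
// script's gI sed flags.
const PREFIX = 'BAKED_';
const URL_PREFIX = `https://${PREFIX}`;

const replacePlaceholders = (content: string, vars: Record<string, string>): string => {
  let result = content;
  for (const [name, value] of Object.entries(vars)) {
    // Variable names are uppercase ASCII and underscores, so they are safe to
    // embed in a RegExp without escaping. (A `$` in values would need escaping
    // for String.replace, just as `&` and `|` do for sed.)
    const urlPlaceholder = new RegExp(`${URL_PREFIX}${name}`, 'gi');
    const placeholder = new RegExp(`${PREFIX}${name}`, 'gi');
    result = result.replace(urlPlaceholder, value).replace(placeholder, value);
  }
  return result;
};

const html = '<a href="https://BAKED_SITE_URL/feed">RSS</a> <span>BAKED_PREVIEW_MODE</span>';
const output = replacePlaceholders(html, {
  SITE_URL: 'http://localhost:8080',
  PREVIEW_MODE: 'true',
});
// output: '<a href="http://localhost:8080/feed">RSS</a> <span>true</span>'
```

Note how the URL-shaped replacement must run first per variable: once https://BAKED_SITE_URL is gone, the plain pattern can no longer produce a double protocol.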

Nginx image entrypoint

Once we have a working and tested replacement script, it’s time to include it in the Docker entrypoint so it can run when the container starts and replace baked placeholders in the bundle with the actual environment variables from the current environment.

Fortunately, the Nginx Alpine image already provides a dedicated pre-start folder, /docker-entrypoint.d, intended for entrypoint scripts. We define the ENV DIST_PATH=/usr/share/nginx/html environment variable because replace-variables.sh expects it as an input argument. Additionally, we include a 10- prefix in the script file name to define the execution order of scripts in the entrypoint. We want our script to run before any others.

Below is the complete runner stage of the docker/Dockerfile:

# -------------- runner --------------

FROM nginx:1.29.1-alpine3.22-slim AS runtime
COPY ./docker/nginx.conf /etc/nginx/nginx.conf

# set dist folder path for both web folder and script arg
ENV DIST_PATH=/usr/share/nginx/html

# sufficient for SSG
COPY --from=build /app/dist ${DIST_PATH}

# copy to pre-start scripts folder
# 10-xxx controls the order
COPY ./scripts/replace-variables.sh /docker-entrypoint.d/10-replace-variables.sh
RUN chmod +x /docker-entrypoint.d/10-replace-variables.sh

EXPOSE 8080

Setting the variables for test

Finally, we define the actual values for the environment variables in the docker-compose.yml for testing:

services:
  nmc-docker:
    container_name: nmc-docker
    build:
      context: .
      dockerfile: ./docker/Dockerfile

    platform: linux/amd64
    restart: unless-stopped
    environment:
      SITE_URL: 'http://localhost:8080'
      PLAUSIBLE_SCRIPT_URL: 'https://plausible.arm1.nemanjamitic.com/js/script.js'
      PLAUSIBLE_DOMAIN: 'nemanjamitic.com'
      PREVIEW_MODE: 'true'
    ports:
      - '8080:8080'
    networks:
      - default

With this, we are all set to run the container and apply the start-time environment variables. The original tutorial mentions certain trade-offs of this method, such as slower container startup and the risk of unintentional string replacements, but these are tolerable for our use case. Let’s see whether there is more to it, in detail, in the next section.

Issues

This is by far the most important section of the article. If you wanted a “TLDR” of the article, this would be it. Let’s review the issues one by one:

You can have only string variables

You can have only string variables (or unions of string literals - enums). Let’s illustrate this with a code example:

My website uses one boolean variable, PREVIEW_MODE, that enables preview of draft articles. Initially, it’s typed and validated as a boolean in both Zod and Astro schemas:

// Zod schema

export const booleanValues = ['true', 'false', ''] as const;

export const processEnvSchema = z.object({
  // ...
  PREVIEW_MODE: z
    .enum(booleanValues)
    .transform((value) => value === 'true')
    .default('false'),
  // ...
// Astro schema

export const envSchema = {
  schema: {
    // ...
    PREVIEW_MODE: envField.boolean({
      context: 'server',
      access: 'public',
      default: false,
    }),
    // ...

But obviously, since our replacement method requires unique, baked placeholder values, we cannot use true and false directly. We must convert it to a union of string literals so that the baked placeholder value is valid at build time and the build can succeed. The type becomes: PREVIEW_MODE: 'true' | 'false' | '' | 'BAKED_PREVIEW_MODE'. Here is the updated code:

// Zod schema

// src/utils/baked.ts
export const baked = <T extends string>(name: T): `BAKED_${T}` => `BAKED_${name}` as `BAKED_${T}`;

export const booleanValues = ['true', 'false', ''] as const;

export const processEnvSchema = z.object({
  // ...
  // Note: string union, not boolean, for baked
  PREVIEW_MODE: z
    .enum(booleanValues)
    .or(z.literal(baked('PREVIEW_MODE')))
    .default('false'),
  // ...
// Astro schema

// src/utils/baked.ts
export const baked = <T extends string>(name: T): `BAKED_${T}` => `BAKED_${name}` as `BAKED_${T}`;

export const envSchema = {
  schema: {
    // ...
    PREVIEW_MODE: envField.enum({
      context: 'server',
      access: 'public',
      values: [...booleanValues, baked('PREVIEW_MODE')],
      default: 'false',
    }),
    // ...

And here is an example of how to use the new quasi-boolean variable:

export const isPreviewMode = (): boolean => CONFIG_SERVER.PREVIEW_MODE === 'true';

Issue no. 1 conclusion: It’s a bit of a workaround, but acceptable.

You must handle URL-shaped variables separately

You must bake and replace URL-shaped variables separately for the build to pass. The most typical and obvious variable is SITE_URL, which is assigned to the site: option inside astro.config.ts. This option is deeply integrated into the framework, used for routing, and passed within default Astro.props. If left undefined, Astro defaults to http://localhost:port. On the other hand, if you set it to a non-URL baked placeholder, e.g., BAKED_SITE_URL, the build will fail, as Astro internally passes it into the native new URL() constructor.

We solve this by treating URL-shaped variables separately, giving them a different prefix and replacement rule. This way, a baked placeholder can be a valid URL, e.g., https://BAKED_SITE_URL, allowing the build to succeed.

PREFIX="BAKED_"
URL_PREFIX="https://${PREFIX}"

# ...

PLACEHOLDER="${PREFIX}${VAR}"
URL_PLACEHOLDER="${URL_PREFIX}${VAR}"

# ...

sed -i "s|$PLACEHOLDER|$ESCAPED_VALUE|gI" "$file"
sed -i "s|$URL_PLACEHOLDER|$ESCAPED_VALUE|gI" "$file"

Fortunately, Astro doesn’t transform the site: option internally, so the placeholder maintains its integrity in the bundle and this works as expected.
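A quick way to see why the scheme prefix is necessary: the WHATWG URL constructor, which Astro feeds the site: value into, rejects a bare placeholder but accepts a scheme-prefixed one (underscores are allowed in hostnames). The sketch below is illustrative.

```typescript
// The site: option ends up in the WHATWG URL constructor, so the baked value
// itself must parse as a URL at build time.
const isValidUrl = (value: string): boolean => {
  try {
    new URL(value);
    return true;
  } catch {
    return false;
  }
};

const plainPlaceholder = isValidUrl('BAKED_SITE_URL'); // false -> build would fail
const urlPlaceholder = isValidUrl('https://BAKED_SITE_URL'); // true -> build succeeds
```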

Issue no. 2 conclusion: It was a close call, but it works and is acceptable.

Open Graph images with runtime data are impossible

This issue was somewhat obvious, but I still failed to predict it. Open Graph images are typically very important for SEO and the reach of static websites, especially for blogs or content-focused sites whose success largely depends on sharing on social networks.

One obvious piece of information that an Open Graph image should include is the website URL. Since we made SITE_URL a start-time variable, only its placeholder is available at build time.

On my website, I use an Astro static endpoint in src/pages/api/open-graph/[...route].png.ts to dynamically render .png images from a Satori HTML template. This endpoint is called at build time, and the rendered .png images are included in the bundle. There is nothing we can do at runtime.

[Image: Open Graph image with incorrect SITE_URL placeholder]

The obvious consequence is that we can either:

  1. Omit runtime data (SITE_URL) from the Open Graph images, or use a unique and consistent SITE_URL_CANONICAL for all environments.
  2. Externalize the Open Graph endpoint and implement it as a dynamic API endpoint with a full Node.js runtime that reads the request object and renders images dynamically. This would require a separate backend app and hosting, and the complexity of this setup outweighs the complexity of rebuilding the images for each environment.

Obviously, both options 1 and 2 are bad trade-offs and beyond what’s acceptable. At this point, it’s better to keep the existing setup and rebuild the website for each environment.

Issue no. 3 conclusion: It is unacceptable.

You must transform variables in client-side JavaScript

Often, you need to transform a URL variable, for example to extract the domain from the URL. Obviously, you can’t do this in Astro TypeScript code that runs at build time, because it would use values from the baked placeholders and inline the incorrect placeholder domain into the bundle. You must keep the baked variable intact during the build.

The solution is to move that transformation to client-side JavaScript by including a <script /> with the transformation code that runs on page load. As mentioned earlier, this degrades page performance and SEO, because the client needs to parse and run the JavaScript to get the final content of the page.

<!-- title attribute needs just the domain from the SITE_URL -->

<link
  id="rss-link"
  rel="alternate"
  type="application/rss+xml"
  data-SITE_URL={SITE_URL}
  title="RSS feed"
  href="/api/feed/rss"
/>

<script>
  // or read it from the DOM via the data-SITE_URL attribute
  const siteUrl = window.__RUNTIME_ENV__.SITE_URL;
  const hostname = new URL(siteUrl).hostname;

  const link = document.getElementById('rss-link');

  if (link && hostname) {
    link.title = 'RSS feed for ' + hostname;
  }
</script>

As mentioned, this makes the code messy, overly verbose, and error-prone. It degrades page performance and SEO, and defeats Astro’s zero client-side JavaScript by default strategy. All of this is, once again, beyond acceptable.

Issue no. 4 conclusion: It is unacceptable.

Completed code

The complete implementation is available in the pull request linked above: https://github.com/nemanjam/nemanjam.github.io/pull/28

Conclusion

The experiment partially worked, but the results clearly show why a reusable build for a pure static website is a bad idea in practice.

It is possible to inject start-time environment variables into a static bundle using shell scripts, Nginx entrypoints, and carefully crafted placeholders. With enough discipline, you can even make builds pass by separating string variables from URL-shaped variables, bending schemas, and moving certain logic into client-side JavaScript. However, every step in that direction erodes the very benefits that make static websites attractive in the first place.

A pure static website has no server runtime, no request context, and no dynamic execution environment. As soon as you try to retrofit runtime configuration into that model, you run into hard limitations. Non-string values must be faked, URLs must be handled as special cases, Open Graph images become impossible to render correctly, and any transformation of environment data leaks into client-side JavaScript. At that point, you are no longer building a clean static site, but a fragile system of workarounds that hurts performance, SEO, and maintainability.

If you already have a dynamic, server-side rendered website that uses a server runtime and only need a few static pages, then a few workarounds to benefit from runtime environment variables and reusable builds can represent a reasonable trade-off. I already described that use case in the previous article: https://nemanjamitic.com/blog/2025-12-13-nextjs-runtime-environment-variables.

On the other hand, if you want the simplicity, performance, and reliability of a pure static website, then accept rebuilds as part of the workflow.

References

]]>
nemanja.mitic.elfak@hotmail.com (Nemanja Mitic)
<![CDATA[Runtime environment variables in Next.js - build reusable Docker images]]> https://nemanjamitic.com/blog/2025-12-13-nextjs-runtime-environment-variables/ https://nemanjamitic.com/blog/2025-12-13-nextjs-runtime-environment-variables/ Sat, 13 Dec 2025 00:00:00 GMT Classification of environment variables by dimension

At first glance, you might think of environment variables as just a few values needed when the app starts, but as you dig deeper, you realize it’s far more complex than that. If you don’t clearly understand the nature of the value you’re dealing with, you’ll have a hard time running the app and managing its configuration across multiple environments.

Let’s identify a few dimensions that any environment variable can have:

  1. When: build-time, start-time, run-time
  2. Where: server (static, SSR (request), ISR), client
  3. Visibility: public, private
  4. Requirement: optional, required
  5. Scope: common for all environments (constant, config), unique
  6. Mutability: constant, mutable
  7. Git tracking: versioned, ignored

There are probably more, but this is enough to understand why managing them can be challenging. We could go very wide, write a long article, and elaborate on each of these and their combinations, but since the goal of this article is very specific and practical - handling Next.js environment variables in Docker - we’ll focus just on the top three items from the list. Still, it was worth mentioning the others for context.
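
To make these dimensions more concrete, here is a small, hypothetical inventory annotated along the top three dimensions. The variable names and their classifications are illustrative, not taken from a specific project:

```typescript
type When = 'build-time' | 'start-time' | 'run-time';
type Where = 'server' | 'client' | 'both';
type Visibility = 'public' | 'private';

interface EnvVarInfo {
  when: When;
  where: Where;
  visibility: Visibility;
}

// Hypothetical variables classified along the top three dimensions.
const envInventory: Record<string, EnvVarInfo> = {
  NEXT_PUBLIC_SITE_URL: { when: 'build-time', where: 'both', visibility: 'public' },
  SITE_URL: { when: 'run-time', where: 'server', visibility: 'public' },
  DATABASE_URL: { when: 'run-time', where: 'server', visibility: 'private' },
};

// Sanity check: a private variable must never leave the server.
const leaked = Object.entries(envInventory).filter(
  ([, info]) => info.visibility === 'private' && info.where !== 'server'
);
```

Writing the inventory down like this, even just as a comment in your config module, makes conflicts (such as a private variable marked for the client) easy to spot.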

Next.js environment variables

If you search the Next.js docs, you will find a guide on environment variables, such as .env* filenames that are loaded by default, their load order and priority, variable expansion, and exposing and inlining variables with the NEXT_PUBLIC_ prefix into the client. In the self-hosting guide, you will also find a paragraph about opting into dynamic rendering so that variable values are read on each server component render, not just once at build time, and how this is useful for reusable Docker images.

The problem with build-time environment variables

A common scenario after reading the docs is to be aware of NEXT_PUBLIC_ and server variables and then scatter them around the codebase. If you use Docker and GitHub Actions, you will typically end up with something like this:

11c8a512…/frontend/Dockerfile

# Next.js app installer stage
FROM base AS installer
RUN apk update
RUN apk add --no-cache libc6-compat

# Enable pnpm
ENV PNPM_HOME="/pnpm"
ENV PATH="$PNPM_HOME:$PATH"
RUN corepack enable
RUN corepack prepare pnpm@10.12.4 --activate

WORKDIR /app

# Copy monorepo package.json and lock files
COPY --from=builder /app/out/json/ .
# Install the dependencies
RUN pnpm install --frozen-lockfile

# Copy pruned source
COPY --from=builder /app/out/full/ .

# THIS: set build time env vars
ARG ARG_NEXT_PUBLIC_SITE_URL
ENV NEXT_PUBLIC_SITE_URL=$ARG_NEXT_PUBLIC_SITE_URL
RUN echo "NEXT_PUBLIC_SITE_URL=$NEXT_PUBLIC_SITE_URL"

ARG ARG_NEXT_PUBLIC_API_URL
ENV NEXT_PUBLIC_API_URL=$ARG_NEXT_PUBLIC_API_URL
RUN echo "NEXT_PUBLIC_API_URL=$NEXT_PUBLIC_API_URL"

# Build the project
RUN pnpm turbo build

# ...

11c8a512…/.github/workflows/build-push-frontend.yml

name: Build and push Docker frontend

on:
  push:
    branches:
      - 'main'
  workflow_dispatch:

env:
  IMAGE_NAME: ${{ github.event.repository.name }}-frontend
  # THIS: set build time env vars
  NEXT_PUBLIC_SITE_URL: 'https://full-stack-fastapi-template-nextjs.arm1.nemanjamitic.com'
  NEXT_PUBLIC_API_URL: 'https://api.full-stack-fastapi-template-nextjs.arm1.nemanjamitic.com'

jobs:
  build:
    name: Build and push docker image
    runs-on: ubuntu-latest
    steps:

    # ...

      - name: Build and push Docker image
        uses: docker/build-push-action@v6
        with:
          context: ./frontend
          file: ./frontend/Dockerfile
          platforms: linux/amd64,linux/arm64
          progress: plain
          # THIS: set build time args
          build-args: |
            "ARG_NEXT_PUBLIC_SITE_URL=${{ env.NEXT_PUBLIC_SITE_URL }}"
            "ARG_NEXT_PUBLIC_API_URL=${{ env.NEXT_PUBLIC_API_URL }}"
          push: true
          tags: ${{ secrets.DOCKER_USERNAME }}/${{ env.IMAGE_NAME }}:latest
      

11c8a512…/frontend/package.json

{
  "name": "full-stack-fastapi-template-nextjs",
  "version": "0.0.1",
  "private": true,
  "scripts": {
    "build": "turbo build",
    "dev": "turbo dev",
    "standalone": "turbo run standalone --filter web",
    // THIS: set build time args
    "docker:build:x86": "docker buildx build -f ./Dockerfile -t nemanjamitic/full-stack-fastapi-template-nextjs-frontend --build-arg ARG_NEXT_PUBLIC_SITE_URL='full-stack-fastapi-template-nextjs.local.nemanjamitic.com' --build-arg ARG_NEXT_PUBLIC_API_URL='api.full-stack-fastapi-template-nextjs.local.nemanjamitic.com' --platform linux/amd64 ."

  // ...

  },

  // ...

}

In the code above, we can see that our Next.js app requires the NEXT_PUBLIC_SITE_URL and NEXT_PUBLIC_API_URL environment variables at build time. These values will be inlined into the bundle during the build and cannot be changed later. This means the Dockerfile must pass them as the corresponding ARG_NEXT_PUBLIC_SITE_URL and ARG_NEXT_PUBLIC_API_URL build arguments when building the image.

Leaving them undefined would break the build because they are validated with Zod inside the Next.js app, and validation runs at both build time and run time. Stripping the NEXT_PUBLIC_ prefix would also break the build, even without Zod, if they are used in client code.

Consequently, we need to pass these build arguments whenever we build the Docker image, for example in GitHub Actions and in the local build script defined in package.json.

Using this method, we would get a functional Docker image, but with one major drawback: it can be used only in a single environment because the NEXT_PUBLIC_SITE_URL and NEXT_PUBLIC_API_URL values are baked into the image at build time and are immutable.

To make this crystal clear, whatever we set for the NEXT_PUBLIC_SITE_URL and NEXT_PUBLIC_API_URL environment variables at runtime will be ignored because they no longer exist in the Next.js app. After the build they are replaced with string literals in the JavaScript bundle.
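
The inlining behavior can be simulated in plain Node/TypeScript, with no Next.js involved (the URLs are made up): the "built" copy stays frozen, while a lazy run-time read keeps following the environment.

```typescript
process.env.SITE_URL = 'https://prod.example.com';

// Build time: the bundler replaces process.env.SITE_URL with a string literal,
// so this behaves like a frozen copy (like NEXT_PUBLIC_* variables).
const inlinedSiteUrl = process.env.SITE_URL;

// Run time: a lazy read keeps following the current environment.
const runtimeSiteUrl = () => process.env.SITE_URL;

// "Redeploy" to another environment by changing the variable...
process.env.SITE_URL = 'https://staging.example.com';

console.log(inlinedSiteUrl); // still the old value
console.log(runtimeSiteUrl()); // the new value
```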

If, besides production, you also have staging, preview, testing environments, or other production mirrors, you would need to maintain a separate image with its own configuration code, build process, and registry storage for each of them. This means a lot of overhead.

Many people find this impractical, which you can see from the popularity of such issues in the Next.js repository:

Better support for runtime environment variables #44628

Docker image with NEXT_PUBLIC_ env variables #17641

Not possible to use different configurations in staging + production #22243

The solution: run-time environment variables

The solution is obvious: we should prevent any use of build-time (stale, immutable) variables and read everything from the target environment at runtime. This also means avoiding any NEXT_PUBLIC_* client variables.

To implement this, we must be well aware of where and when a given component runs:

  1. Server component - runs on the server, generated at build time or at request time
  2. Static page - runs on the server, generated once at build time
  3. Client component - runs in the browser, generated at build time or at request time

Server component

These components (or entire pages) are dynamically rendered on each request. They have access to any server data, including both public and private environment variables. No additional action is needed. In Next.js, we identify such components by their use of request resources such as cookies, headers, and connection:

import { cookies, headers } from 'next/headers';
import { connection } from 'next/server';

export default async function Page() {
  const headersList = await headers();
  const cookiesList = await cookies();

  await connection(); // void
}

Static page

Such a page is pre-rendered once at build time in the build environment. It has access to server data, but it is converted to a static asset at build time and is immutable at runtime. We have two options:

  1. Convert it to a dynamic page that is rendered on the server on each request.
import { connection } from 'next/server';

export default async function Page() {

  // opt into dynamic rendering
  await connection();

  // ...
}
  2. Set placeholder values for variables at build time and perform string replacement directly on the generated static HTML using sed or envsubst and a shell script included in ENTRYPOINT ["scripts/entrypoint.sh"] in the Dockerfile.

    Note that these will be start-time variables, not true run-time variables, but most of the time that is sufficient because they are unique to each environment. However, they cannot change during the app’s run time once initialized.

    We won’t go into much detail about this method; it could be a good topic for a future article since it is quite useful for static, presentational websites. If you want to read more, here is an interesting and practical tutorial: https://phase.dev/blog/nextjs-public-runtime-variables/.
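
The entrypoint script essentially performs a search-and-replace over the emitted assets. Here is a minimal sketch of that idea in TypeScript (the `__RUNTIME_<NAME>__` placeholder convention and the variable names are my own, not from the tutorial):

```typescript
// Replace __RUNTIME_<NAME>__ placeholders in a built asset with values
// from the environment, like sed/envsubst in a container entrypoint.
function injectEnv(asset: string, env: Record<string, string | undefined>): string {
  return asset.replace(/__RUNTIME_([A-Z0-9_]+)__/g, (match, name) => env[name] ?? match);
}

const builtHtml = '<script>window.API_URL = "__RUNTIME_API_URL__";</script>';
const injected = injectEnv(builtHtml, { API_URL: 'https://api.example.com' });
```

Unknown placeholders are left untouched, so a missing variable is easy to spot in the served HTML instead of silently becoming an empty string.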

Client component

Next.js prevents exposing any variables to the client without the NEXT_PUBLIC_ prefix, but since those are inlined at build time, we simply won’t use them. For exposing environment variables to client components, we have a few options:

  1. Pass variables as props from the parent server component like any other value. This is simple and convenient.

  2. Inside the dynamically generated root layout, render a <script /> tag that injects a window.__RUNTIME_ENV__ property into the global window object using the dangerouslySetInnerHTML attribute. We will actually use this method. Then, on the client, we can access the variables on the window object, for example window.__RUNTIME_ENV__.API_URL.

    This is also a good moment to validate the runtime variables with Zod.

    Here is the illustration code:

import type { FC, ReactNode } from 'react';
import { connection } from 'next/server';
import { z } from 'zod';

export const runtimeEnvSchema = z.object({
  SITE_URL: z.url().regex(/[^/]$/, 'SITE_URL should not end with a slash "/"'),
  API_URL: z.url().regex(/[^/]$/, 'API_URL should not end with a slash "/"'),
});

interface Props {
  children: ReactNode;
}

const RootLayout: FC<Props> = async ({ children }) => {
  await connection();
  
  const runtimeEnvData = {
    SITE_URL: process.env.SITE_URL,
    API_URL: process.env.API_URL,
  };

  // validate vars with Zod before injecting
  const parsedRuntimeEnv = runtimeEnvSchema.safeParse(runtimeEnvData);

  // if invalid vars abort
  if (!parsedRuntimeEnv.success) throw new Error('Invalid runtime environment variable found...');

  const runtimeEnv = parsedRuntimeEnv.data;

  return (
    <html lang="en">
      <body>
        {/* Inline JSON injection */}
        <script
          dangerouslySetInnerHTML={{
            __html: `window.__RUNTIME_ENV__ = ${JSON.stringify(runtimeEnv)};`,
          }}
        />

        {children}
      </body>
    </html>
  );
};

export default RootLayout;
  3. Same as for static pages: set placeholder values and use sed to replace them with a shell script inside the JavaScript bundle when the container starts.
  4. Expose variables through a dynamic API endpoint and perform an HTTP fetch in client components. This is a legitimate method, but note that it will make the variables asynchronous.

We can see from this that the first two methods are the simplest and most convenient, so we will use them.

Note: Whenever an environment variable is available on the client, it is public by default. Make sure not to expose any secrets to the client.
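
One way to honor this rule is to build the injected object from an explicit whitelist, so a secret can never leak by accident. A minimal sketch (the key names are illustrative):

```typescript
const PUBLIC_ENV_KEYS = ['SITE_URL', 'API_URL'] as const;

// Pick only whitelisted keys from the full environment before injecting
// them into window.__RUNTIME_ENV__ on the client.
function pickPublicEnv(env: Record<string, string | undefined>): Record<string, string | undefined> {
  return Object.fromEntries(PUBLIC_ENV_KEYS.map((key) => [key, env[key]] as const));
}

const publicEnv = pickPublicEnv({
  SITE_URL: 'https://example.com',
  API_URL: 'https://api.example.com',
  DATABASE_PASSWORD: 'super-secret', // must never reach the client
});
```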

alizeait/next-public-env package

We could do this manually as shown in the snippet above, but there is already the alizeait/next-public-env package that handles all of this and also provides some more advanced handling.

Check these 2 files for example:

Usage is obvious and straightforward: just define a Zod schema, mount <PublicEnv /> in the root layout, and use getPublicEnv() to access the variables wherever you need them.

You can see below how I did it:

# install package

pnpm add next-public-env

frontend/apps/web/src/config/process-env.ts

import { createPublicEnv } from 'next-public-env';

import { getProcessEnvSchemaProps } from '@/schemas/config';

/** Exports RUNTIME env. Must NOT call getPublicEnv() in global scope. */
export const { getPublicEnv, PublicEnv } = createPublicEnv(
  {
    NODE_ENV: process.env.NODE_ENV,
    SITE_URL: process.env.SITE_URL,
    API_URL: process.env.API_URL,
  },
  { schema: (z) => getProcessEnvSchemaProps(z) }
);

frontend/apps/web/src/schemas/config.ts

import { z } from 'zod';

export const nodeEnvValues = ['development', 'test', 'production'] as const;

type ZodType = typeof z;

/** For runtime env. */
export const getProcessEnvSchemaProps = (z: ZodType) => ({
  NODE_ENV: z.enum(nodeEnvValues),
  SITE_URL: z.url().regex(/[^/]$/, 'SITE_URL should not end with a slash "/"'),
  API_URL: z.url().regex(/[^/]$/, 'API_URL should not end with a slash "/"'),
});

/** For schema type. */
export const processEnvSchema = z.object(getProcessEnvSchemaProps(z));

frontend/apps/web/src/app/layout.tsx

import { PublicEnv } from '@/config/process-env';

interface Props {
  children: ReactNode;
}

const RootLayout: FC<Props> = ({ children }) => (
  <html lang="en" suppressHydrationWarning>
    <body className={fontInter.className}>
      <PublicEnv />
      <ThemeProvider attribute="class" defaultTheme="light" enableSystem disableTransitionOnChange>
        {/* Slot with server components */}
        {children}
        <Toaster />
      </ThemeProvider>
    </body>
  </html>
);

export default RootLayout;

An example usage, for instance in instrumentation.ts, to log the runtime values of all environment variables for debugging purposes:

frontend/apps/web/src/instrumentation.ts

/** Runs only once on server start. */

/** Log loaded env vars. */
export const register = async () => {
  if (process.env.NEXT_RUNTIME === 'nodejs') {
    const { prettyPrintObject } = await import('@/utils/log');
    const { getPublicEnv } = await import('@/config/process-env');

    prettyPrintObject(getPublicEnv(), 'Runtime process.env');
  }
};

Usage for baseUrl for OpenAPI client

This is another typical and very important spot for using the API_URL environment variable. What makes it tricky is that this code is bundled and runs both on the server and in the browser, yet it is defined in a single place.

However, alizeait/next-public-env resolves this complexity very well on its own, and you can simply use getPublicEnv() to get the API_URL value while letting the package handle the rest.

frontend/apps/web/src/lib/hey-api.ts

import { getPublicEnv } from '@/config/process-env';

/** Runtime config. Runs and imported both on server and in browser. */
export const createClientConfig: CreateClientConfig = (config) => {
  const { API_URL } = getPublicEnv();

  return {
    ...config,
    baseUrl: API_URL,
    credentials: 'include',
    ...(isServer() ? { fetch: serverFetch } : {}),
  };
};

Legitimate build-time environment variables

Variables that are the same for every environment can be left as NEXT_PUBLIC_ and inlined into the bundle. They should also be versioned in Git (their .env.* files). Since this is the case, the best approach is to store them as TypeScript constants directly in the source, because that is what they truly are - shared constants.
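
For example, instead of NEXT_PUBLIC_ variables, such shared values can simply live in a constants module. A hypothetical example (the names and values are made up):

```typescript
// src/constants/app.ts (hypothetical module)
// Values identical in every environment belong in source, not in env files.
export const APP_NAME = 'My App';
export const PAGINATION_PAGE_SIZE = 20;
export const SUPPORTED_LOCALES = ['en', 'de'] as const;

export type SupportedLocale = (typeof SUPPORTED_LOCALES)[number];
```

This also gives you type safety and autocompletion for free, which env files never provide.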

Build and deploy reusable Docker image

Build once - deploy everywhere. Use a single image and .env file with no redundancy.

Building

Now that we have eliminated all build-time variables by converting them to run-time environment variables, we can simply remove all build arguments and environment variables from the Dockerfile, GitHub Actions build workflow, package.json build scripts, etc.

Note: During the build phase of a Next.js app, the global scope is also executed. Therefore, if you read any environment variables, such as process.env.MY_VAR_XXX, your code must be able to handle a default undefined value without throwing exceptions, as this would break the build.

Tip: To access environment variables, always use getPublicEnv() inside components and functions. Never call getPublicEnv() or read process.env in the global scope; this way, you won’t need to handle undefined environment variables explicitly for the build to pass.
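
To illustrate the tip, here is a standalone sketch (the variable name is made up) contrasting a module-scope read, which is also evaluated during the build, with a lazy accessor evaluated only at call time:

```typescript
// Anti-pattern: evaluated once when the module is imported, which also
// happens during `next build`, where the variable may be undefined.
const siteUrlAtImport = process.env.SITE_URL_EXAMPLE;

// Safe: evaluated only when called, i.e. inside a component or function
// at run time, after the real environment has been loaded.
const getSiteUrl = (): string => {
  const value = process.env.SITE_URL_EXAMPLE;
  if (!value) throw new Error('SITE_URL_EXAMPLE is not defined');
  return value;
};

// Simulate the target environment being loaded after the "build".
process.env.SITE_URL_EXAMPLE = 'https://example.com';
```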

Simply remove all build arguments and build-time environment variables from the Dockerfile:

# Not needed anymore, remove all build args
ARG ARG_NEXT_PUBLIC_SITE_URL
ENV NEXT_PUBLIC_SITE_URL=$ARG_NEXT_PUBLIC_SITE_URL
RUN echo "NEXT_PUBLIC_SITE_URL=$NEXT_PUBLIC_SITE_URL"

ARG ARG_NEXT_PUBLIC_API_URL
ENV NEXT_PUBLIC_API_URL=$ARG_NEXT_PUBLIC_API_URL
RUN echo "NEXT_PUBLIC_API_URL=$NEXT_PUBLIC_API_URL"

This is the cleaned-up Dockerfile that I am using to build the Next.js app inside the monorepo: frontend/Dockerfile.

Also, don’t forget to clean up unused build arguments from the GitHub Actions workflow and package.json scripts for building the Docker image. You can see mine here: .github/workflows/build-push-frontend.yml, frontend/package.json.

"scripts": {
  "docker:build:x86": "docker buildx build -f ./Dockerfile -t nemanjamitic/full-stack-fastapi-template-nextjs-frontend --platform linux/amd64 ."
},

Deployment

Once built, you can use that image to deploy to any environment. Naturally, you need to define and pass all runtime environment variables into the Docker container. In your docker-compose.yml, use the env_file: or environment: keys.

services:
  frontend:
    image: nemanjamitic/full-stack-fastapi-template-nextjs-frontend:latest
    container_name: full-stack-fastapi-template-nextjs-frontend
    restart: unless-stopped
    env_file:
      - .env
    environment:
      - PORT=3000
    
    # ...

# .env
SITE_URL=https://full-stack-fastapi-template-nextjs.arm1.nemanjamitic.com
API_URL=https://api-full-stack-fastapi-template-nextjs.arm1.nemanjamitic.com
NODE_ENV=production

# ...

You can see docker-compose.yml and .env I am using here: apps/full-stack-fastapi-template-nextjs/docker-compose.yml, apps/full-stack-fastapi-template-nextjs/.env.example

Alternative approaches

In the Static page section, I already made a few notes about runtime variables and static websites. Indeed, you have two options for runtime variables:

  1. Convert the website from static to dynamically rendered SSR (rendered at request time). Note that this is a significant change: from this point, your website will require a Node.js runtime, which will greatly impact your deployment options, as you can no longer use static hosting.

    This is overkill just for the purpose of having runtime environment variables. Use it only if your website has additional reasons to use SSR.

  2. Perform string replacement directly on bundle assets using sed, envsubst, etc. This is the right approach. There are other options, such as the Nginx subs_filter config option, but be careful with it, as it runs on each request and can waste CPU.

Another option to consider is using an ./env.js file instead of the usual .env. You can then host it with Nginx and load it into the app using <script src="./env.js" />. After that, you can reference the variables with window.__RUNTIME_ENV__.MY_VAR.

Note that this won’t work well in pure HTML pages. For example, Astro omits any client-side JavaScript by default, so you would need an additional inline <script /> tag to update the HTML, e.g., document.getElementById("my-id").textContent = window.__RUNTIME_ENV__.MY_VAR, which is less optimal than the string replacement method.

Here is some quick, approximate code for illustration:

// define variables

window.__RUNTIME_ENV__ = {
  SITE_URL: "https://my-static-website.com",
  PLAUSIBLE_DOMAIN: 'my-static-website.com',
  PLAUSIBLE_SCRIPT_URL: 'https://plausible.my-server.com/js/script.js',
};
# mount and host env.js file

my-static-website:
  image: nginx:1.29.1-alpine3.22-slim
  container_name: my-static-website
  restart: unless-stopped
  volumes:
    - ./website:/usr/share/nginx/html
    - ./env.js:/usr/share/nginx/html/env.js # this
    - ./nginx/nginx.conf:/etc/nginx/nginx.conf

  # ...
<!-- Load env.js file -->

<head>
  <meta charset="UTF-8" />
  <title>My static website</title>

  <script src="./env.js"></script>

  <!-- ... -->
</head>
<!-- example usage -->

<!-- example 1: assign var to text content -->
<span id="my-element"></span>

<script>
  const mySpan = document.getElementById('my-element');
  mySpan.textContent = window.__RUNTIME_ENV__.MY_VAR;
</script>

<!-- example 2: assign var to script attribute -->
<script>
  const script = document.createElement("script");
  script.defer = true;
  script.type = "text/partytown";

  // dynamically set attributes from runtime env
  script.dataset.domain = window.__RUNTIME_ENV__.PLAUSIBLE_DOMAIN;
  script.src = window.__RUNTIME_ENV__.PLAUSIBLE_SCRIPT_URL;

  document.head.appendChild(script);
</script>

So, to conclude, the best approach is to use a shell script with sed or envsubst and add it to the Nginx Dockerfile ENTRYPOINT or the docker-compose.yml command:. Here is the link to the already mentioned practical tutorial again: https://phase.dev/blog/nextjs-public-runtime-variables/.

Completed code

The relevant files:

# 1. Next.js app repo

# https://github.com/nemanjam/full-stack-fastapi-template-nextjs/tree/e990a3e29b7af60831851ff6f909c34df6a7f800

git checkout e990a3e29b7af60831851ff6f909c34df6a7f800

# run-time vars configuration
frontend/apps/web/src/config/process-env.ts
frontend/apps/web/src/schemas/config.ts
frontend/apps/web/src/app/layout.tsx

# usages
frontend/apps/web/src/instrumentation.ts
frontend/apps/web/src/lib/hey-api.ts

# 2. Deployment repo

# https://github.com/nemanjam/traefik-prox/tree/f3c087184e851db20e65409a6dd145767dd9bc2b

git checkout f3c087184e851db20e65409a6dd145767dd9bc2b

apps/full-stack-fastapi-template-nextjs/docker-compose.yml
apps/full-stack-fastapi-template-nextjs/.env.example

Conclusion

If you go by inertia and scatter run-time and build-time variables across the source code, build, and deployment configuration, you will end up with development and production environments that are difficult to manage. Bugs become hard to debug and reproduce, the deployment process becomes unreliable, missing or invalid environment variables constantly require troubleshooting, and you accumulate redundant Docker images, among other issues.

So, take a proactive approach: understand properly and identify the variables you are dealing with. One way to do this is to leverage the power and convenience of run-time environment variables.

What approach do you use to manage environment variables in Next.js apps? Feel free to share your experiences and opinions in the comments.

References

]]>
nemanja.mitic.elfak@hotmail.com (Nemanja Mitic)
<![CDATA[Comparing BFS, DFS, Dijkstra, and A* algorithms on a practical maze solver example]]> https://nemanjamitic.com/blog/2025-07-31-maze-solver/ https://nemanjamitic.com/blog/2025-07-31-maze-solver/ Thu, 31 Jul 2025 00:00:00 GMT import { Image } from 'astro:assets';

import { IMAGE_SIZES } from '../../../../constants/image';

import MazeSolverClassDiagramImage from '../../../../content/post/2025/07-31-maze-solver/_images/maze-solver-class-diagram.png';
import YarnDevImage from '../../../../content/post/2025/07-31-maze-solver/_images/yarn-dev.png';
import YarnTestVerboseImage from '../../../../content/post/2025/07-31-maze-solver/_images/yarn-test-verbose.png';
import YarnCoverageImage from '../../../../content/post/2025/07-31-maze-solver/_images/yarn-coverage.png';

Introduction

Pathfinding is a fundamental topic in computer science, with applications in fields like navigation, AI/ML, network routing, and many others. In this article, we compare four core pathfinding algorithms, breadth-first search (BFS), depth-first search (DFS), Dijkstra’s algorithm, and A* (A star), through a practical maze-solving example. We don’t just explain them in theory: we built a demo app where you can tweak maze inputs or edit the algorithm code and instantly see how it affects the output and efficiency.

One of the key takeaways is how a tiny change, just a single line of code, can drastically alter an algorithm’s behavior. This highlights how critical implementation details are, even when the overall structure looks the same.

Problem overview

Paths in a maze form a tree structure or a graph if the maze contains cycles. That’s why tree and graph traversal algorithms can be used for finding paths and the shortest path in a maze.

All four algorithms differ in just a few lines of code, yet their behavior varies dramatically.

App architecture

We create a pragmatic, simplified OOP model of the maze and its behavior as a tradeoff, favoring clarity and concise instantiation of maze objects in tests.

Maze representation

A maze is represented as a binary matrix where 0 stands for free space and 1 for a boundary, while * marks a cell on the resulting path. It also has start and end points. Viewed as a weighted graph, a 0 cell has zero weight and a 1 cell has infinite weight.

export class Maze implements IMaze {
  private board: number[][];
  private start: Coordinate;
  private end: Coordinate;

  // ...
}

export interface Coordinate {
  readonly x: number;
  readonly y: number;
}

// example maze
const testMaze: number[][] = [
  [0, 1, 0, 0, 0],
  [0, 1, 0, 1, 0],
  [0, 0, 0, 1, 0],
  [0, 1, 1, 1, 0],
  [0, 0, 0, 0, 0],
];

const start: Coordinate = { x: 0, y: 0 };
const end: Coordinate = { x: 4, y: 4 };

Class structure

We use polymorphism and a simplified Factory pattern. MazeSolver is an abstract class that declares the findPath() method, which is implemented in each derived concrete solver class.

/**
 * Abstract base class for maze solving algorithms.
 * Implements common functionality for maze solvers.
 */

export abstract class MazeSolver implements IMazeSolver {
  protected maze: IMaze;

  protected abstract findPath(): Coordinate[] | null;

  // ...
}

We use encapsulation and coding to interfaces, separating interfaces from implementations by exposing only the public class methods through the interfaces.

Maze interface:

export interface IMaze {
  getBoard(): number[][];

  getStart: () => Coordinate;

  formatPath: (path: ReadonlyArray<Coordinate>) => string;

  // ...
}

Maze implementation:

export class Maze implements IMaze {
  public getBoard(): number[][] { ... }

  public getStart(): Coordinate { ... }

  public formatPath(path: ReadonlyArray<Coordinate>): string { ... }

  // ...
}

Maze usage:

const _maze2: IMaze = Maze.create(testMaze, start, end);

// ...

Class diagram:

<Image {...IMAGE_SIZES.FIXED.MDX_MD} src={MazeSolverClassDiagramImage} alt="Maze solver class diagram" />

Running the app

We install dependencies, run the app, and run the tests as usual, like any other TypeScript app.

# install dependencies
yarn install

# enable or disable logging in src/config.ts

# run the app in dev mode
yarn dev

# logging is disabled for tests by default

# run tests
yarn test

# run tests in verbose mode
yarn test-verbose

# generate coverage report
yarn coverage

We can see that different algorithms require a different number of steps for the same input maze. Example output for a given maze input:

<Image {...IMAGE_SIZES.FIXED.MDX_XS} src={YarnDevImage} alt="App dev mode terminal output" />

We can run tests that ensure each algorithm:

  1. finds an existing path
  2. doesn’t report a false, non-existent path
  3. finds the shortest path.

<Image {...IMAGE_SIZES.FIXED.MDX_XS} src={YarnTestVerboseImage} alt="Run tests in verbose mode" />

And calculate the code coverage:

<Image {...IMAGE_SIZES.FIXED.MDX_SM} src={YarnCoverageImage} alt="Tests coverage table" />

Algorithms analysis and discussion

Now for the most important and interesting part: let’s analyze the algorithms’ code and explain how it affects their behavior and efficiency.

Unweighted graphs

BFS and DFS are basic traversal algorithms that ignore the weights of the edges, so they are applicable only to unweighted graphs.

The actual code for BFS and DFS differs by only a single line, but they exhibit completely opposite behavior. BFS uses a queue (FIFO), while DFS uses a stack (LIFO), and this has a fundamental impact on how the next node candidate for the path is selected.

// BFS
const { coord, path } = queue.shift()!;

// DFS
const { coord, path } = stack.pop()!;

This is the array of coordinates that represents possible directions for movement. This array is iterated over in the algorithm’s inner loop.

export const directions: Direction[] = [
  { x: 0, y: 1 }, // up
  { x: 1, y: 0 }, // right
  { x: 0, y: -1 }, // down
  { x: -1, y: 0 }, // left
];

Here is the complete BFS implementation (since all 4 algorithms share most of the same base) so we can have a better idea of what we are working with:

export class MazeSolverBFS extends MazeSolver {
  /**
   * Implements the Breadth-First Search (BFS) algorithm to find a path from the start to the end of the maze.
   */
  protected findPath(): Coordinate[] | null {
    const start = this.maze.getStart();

    // Initialize the BFS queue with the start position.
    const queue: BFSQueueElement[] = [{ coord: start, path: [start] }];

    // Keep track of visited coordinates (as strings).
    const visited = new Set<string>();
    visited.add(`${start.x},${start.y}`);

    while (queue.length > 0) {
      // Count iterations.
      this.incrementStep();

      // The most important line. 
      // FIFO - Takes the oldest element in the queue.
      const { coord, path } = queue.shift()!;

      // Check if end and exit.
      if (this.maze.isEnd(coord)) {
        return path;
      }

      // Print the current state of the maze.
      this.printBoard(coord, visited, path);

      // Always loops 4 times.
      for (const direction of directions) {
        // Calculate the next coordinate by applying the direction.
        const nextCoord: Coordinate = {
          x: coord.x + direction.x,
          y: coord.y + direction.y,
        };
        // Create a key for nextCoord (to check for uniqueness in the visited set).
        const coordKey = `${nextCoord.x},${nextCoord.y}`;

        // If nextCoord is not visited, is within bounds, and is walkable, add it to the potential path.
        if (
          !visited.has(coordKey) &&
          this.maze.isWithinBounds(nextCoord) &&
          this.maze.isWalkable(nextCoord)
        ) {
          visited.add(coordKey);
          queue.push({ coord: nextCoord, path: [...path, nextCoord] });
        }
      }
    }

    // Return null if no path to the end is found.
    return null;
  }
}

BFS

Since BFS uses a queue, it respects this structure and attempts to change direction in every iteration of the outer loop. Without obstacles and boundaries, this causes the algorithm to thoroughly inspect nodes closer to the starting node before moving further away. That’s why BFS can be inefficient for large trees and graphs where the end node is very distant from the starting node.

DFS

In contrast, DFS also respects the initial order in the directions array but prioritizes the earlier elements. So, in the example above, it will always attempt to apply the up direction first before exploring other directions. Without obstacles and boundaries, this causes the algorithm to inspect distant nodes in a straight line. DFS can be efficient for finding a distant end node but can also be very inefficient for finding a nearby node if it happens to be in a different direction.
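
To experiment with that single-line difference outside the repo, here is a compact, self-contained sketch (assuming board[y][x] indexing, not the repo's classes) that runs both traversals on the example maze from earlier; flipping useQueue switches between BFS and DFS:

```typescript
type Coord = { x: number; y: number };

const directions: Coord[] = [
  { x: 0, y: 1 },
  { x: 1, y: 0 },
  { x: 0, y: -1 },
  { x: -1, y: 0 },
];

// One solver for both: `useQueue = true` -> BFS (shift), `false` -> DFS (pop).
function solve(board: number[][], start: Coord, end: Coord, useQueue: boolean): Coord[] | null {
  const frontier = [{ coord: start, path: [start] }];
  const visited = new Set([`${start.x},${start.y}`]);

  while (frontier.length > 0) {
    // The only line that differs between the two algorithms.
    const { coord, path } = useQueue ? frontier.shift()! : frontier.pop()!;
    if (coord.x === end.x && coord.y === end.y) return path;

    for (const d of directions) {
      const next = { x: coord.x + d.x, y: coord.y + d.y };
      const key = `${next.x},${next.y}`;
      const inBounds =
        next.y >= 0 && next.y < board.length && next.x >= 0 && next.x < board[0].length;
      if (!visited.has(key) && inBounds && board[next.y][next.x] === 0) {
        visited.add(key);
        frontier.push({ coord: next, path: [...path, next] });
      }
    }
  }
  return null;
}

const board = [
  [0, 1, 0, 0, 0],
  [0, 1, 0, 1, 0],
  [0, 0, 0, 1, 0],
  [0, 1, 1, 1, 0],
  [0, 0, 0, 0, 0],
];

const bfsPath = solve(board, { x: 0, y: 0 }, { x: 4, y: 4 }, true);
const dfsPath = solve(board, { x: 0, y: 0 }, { x: 4, y: 4 }, false);
```

On this maze both searches find a path, but only BFS is guaranteed to return the shortest one.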

Weighted graphs

Not all graphs have edges with uniform weights. In such cases, we must use algorithms that are aware of weights (the cost between two nodes), such as Dijkstra and A*.

Dijkstra

Dijkstra’s algorithm is aware of the cost between two nodes (edge weight) and takes it into account when selecting the next node. It uses a priority queue to keep track of the cost history.

// Take the first element from the priority queue.
// Choose the node that adds the minimal cost.
queue.sort((a, b) => a.cost - b.cost);
const { coord, path, cost } = queue.shift()!;

// ...

// Test how much cost every new node adds to the path before adding it to the queue.
const nextCost = cost + this.maze.getCost(nextCoord);
// ...
if ( /* ... */ && (!costMap.has(coordKey) || nextCost < costMap.get(coordKey)!))

Dijkstra keeps a history of the cost of the current path and, when selecting the next node, chooses the one that adds the minimal cost. If there are cycles, it may reach the same node via multiple paths and will keep the one with the minimal weight (the shortest path). In graphs with constant edge weights, it reduces to BFS. This can be observed in the screenshot above, where BFS and Dijkstra take an equal number of steps because the maze has uniform weights of 1 and Infinity.
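
To see the cost history at work outside the maze, here is a tiny self-contained sketch (the graph and node names are made up, and the priority queue is simulated by sorting, as in the article's snippets) where Dijkstra prefers a cheap detour over a direct expensive edge:

```typescript
type Edge = { to: string; weight: number };
type Graph = Record<string, Edge[]>;

// A direct expensive edge A->C (10) vs a cheap detour A->B->C (1 + 2).
const graph: Graph = {
  A: [{ to: 'C', weight: 10 }, { to: 'B', weight: 1 }],
  B: [{ to: 'C', weight: 2 }],
  C: [],
};

function dijkstra(graph: Graph, start: string): Record<string, number> {
  const dist: Record<string, number> = { [start]: 0 };
  // Poor man's priority queue: sort before every extraction.
  const queue: { node: string; cost: number }[] = [{ node: start, cost: 0 }];

  while (queue.length > 0) {
    queue.sort((a, b) => a.cost - b.cost);
    const { node, cost } = queue.shift()!;
    if (cost > (dist[node] ?? Infinity)) continue; // stale queue entry

    for (const { to, weight } of graph[node]) {
      const next = cost + weight;
      if (next < (dist[to] ?? Infinity)) {
        dist[to] = next;
        queue.push({ node: to, cost: next });
      }
    }
  }
  return dist;
}

const dist = dijkstra(graph, 'A');
```

With uniform edge weights, the sort never changes the extraction order and the queue degenerates into plain FIFO, which is exactly why Dijkstra matched BFS's step count on our maze.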

A*

A* works the same way as Dijkstra but, besides keeping the cost history, it uses a heuristic function to predict the future - the direction in which the end node could be.

protected heuristic(a: Coordinate, b: Coordinate): number {
    // Manhattan distance as the heuristic.
    // Can only move horizontally or vertically, not diagonally. In "rectangles".
    return Math.abs(a.x - b.x) + Math.abs(a.y - b.y);
}

// ...

openSet.sort(
  (a, b) =>
    // The most important line. The only difference from Dijkstra.
    // Cost = history + Manhattan distance from the end node.
    // prettier-ignore
    (a.cost + this.heuristic(a.coord, end)) -
    (b.cost + this.heuristic(b.coord, end))
);

If the heuristic function is well chosen, it will make A* more efficient than the previously mentioned algorithms. Conversely, a poorly chosen heuristic will degrade the algorithm's efficiency.

Completed code

Conclusion

In this example, we can see how algorithm analysis and design is a very sensitive and subtle discipline that leaves no room for low focus or a lack of understanding of the domain. Although BFS, DFS, Dijkstra, and A* share most of their implementation, even a subtle change in the code can lead to a dramatic change in behavior.

In the demo app, you can tweak the predefined mazes in the tests/fixtures/*.txt files and make your own observations. You can also check the resources and interactive playground listed in the References section.

Have you experimented with maze-solving and pathfinding algorithms before? Let me know in the comments.

References

]]>
nemanja.mitic.elfak@hotmail.com (Nemanja Mitic)
<![CDATA[Load balancing multiple Rathole tunnels with Traefik HTTP and TCP routers]]> https://nemanjamitic.com/blog/2025-05-29-traefik-load-balancer/ https://nemanjamitic.com/blog/2025-05-29-traefik-load-balancer/ Thu, 29 May 2025 00:00:00 GMT import { Image } from 'astro:assets';

import { IMAGE_SIZES } from '../../../../constants/image';

import TraefikLoadBalancerArchitectureImage from '../../../../content/post/2025/05-29-traefik-load-balancer/_images/traefik-load-balancer-architecture.png';

Introduction

This article is a continuation of the Expose home server with Rathole tunnel and Traefik article, which explains how to permanently host websites from home by bypassing CGNAT. That setup works well for exposing a single home server (like a Raspberry Pi, server PC, or virtual machine), but it has a limitation: it requires one VPS (or at least one public network interface) per home server. This is because the Rathole server exclusively uses ports 80 and 443.

But it doesn’t have to be like this. We can reuse a single Rathole server for many tunnels and home servers; we just need a tool to load balance their traffic, as long as our VPS’s network interface provides enough bandwidth for our websites and services.

This article explains how to achieve that using Traefik HTTP and TCP routers.

Prerequisites

  • A working Rathole tunnel setup from the previous article (including a VPS and a domain name)
  • More than one home server (Raspberry Pi, server PC, virtual machine, or LXC container)

Architecture overview

The problem

The main problem here is that we can’t bind more than one container port to host ports 80 and 443, respectively. Only one service can listen on a given port at a time. So something like this doesn’t exist:

services:

  rathole:
    image: rapiz1/rathole:v0.5.0
    container_name: rathole
    command: --server /config/rathole.server.toml
    restart: unless-stopped
    ports:
      # host:container
      - 2333:2333
      - 80:5080,5081 # non-existent syntax, can't bind two container ports to a single host port
      - 443:5443,5444 # same
    volumes:
      - ./rathole.server.toml:/config/rathole.server.toml:ro

Neither the operating system nor Docker provides load balancing functionality out of the box, so we need to handle it ourselves.

The solution

We need to introduce a tool for load balancing traffic between tunnels. We will use Traefik, since we already use it with the Rathole client.

For each home server, we need 2 tunnels: one for HTTP and another for HTTPS traffic:

  1. The tunnel for HTTP traffic will use the Traefik HTTP router as usual.
  2. The tunnel for HTTPS traffic is a bit more interesting and challenging. For it, we will use the Traefik TCP router running in passthrough mode, since we don’t want to terminate HTTPS traffic on the VPS. Instead, we want to delegate certificate resolution to the existing Traefik instance running on the client side to preserve the current setup and architecture.

<Image {...IMAGE_SIZES.FIXED.MDX_MD} src={TraefikLoadBalancerArchitectureImage} alt="Traefik load balancer architecture diagram" />

Reminder:

I already wrote about the advantage of resolving SSL certificates locally on the home server in the Architecture overview section of the previous article, but here is a quick recap:

  1. The home server contains its entire configuration
  2. The home server is tunnel-agnostic and reusable
  3. No coupling between the tunnel server and client, no need to maintain state or version
  4. Decoupled debugging
  5. Improved security, an additional encryption layer further down the tunnel

Traefik load balancer and Rathole server

Since we pass the encrypted HTTPS traffic through, Traefik can’t read the subdomain from the HTTP request as usual. Instead, we will run the Traefik router in TCP mode, using the HostSNIRegexp matcher. This runs the router on layer 4 (TCP) instead of the usual layer 7 (HTTP).

For more in-depth info on how this works, you can read here: Server Name Indication (SNI).
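As a quick illustration of SNI, you can send a TLS handshake with an explicit server name and inspect the certificate the server returns. The IP and domain below are placeholders; swap the -servername value, and the TCP router defined below will steer the connection to a different tunnel:

```shell
# Send a TLS ClientHello with an explicit SNI value (placeholder IP/domain)
# and print the subject of the certificate the server presents.
openssl s_client -connect 123.123.123.123:443 -servername app.pi.example.com </dev/null 2>/dev/null \
  | openssl x509 -noout -subject
```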

Now that we understand the principle, we can get to the practical implementation.

Traefik HTTP and TCP routers

Below is the complete docker-compose.yml that defines the Traefik TCP router and the Rathole server with 2 HTTP/HTTPS tunnel pairs for 2 home servers: pi (OrangePi) and local (MiniPC), in my case.

version: '3.8'

services:
  traefik:
    image: traefik:v2.9.8
    container_name: traefik
    restart: unless-stopped
    command:
      - --providers.docker=true
      - --entrypoints.web.address=:80
      - --entrypoints.websecure.address=:443
      - --entrypoints.traefik.address=:8080
      - --api.dashboard=true
      - --api.insecure=false
      - --log.level=DEBUG
      - --accesslog=true
    ports:
      - 80:80
      - 443:443
      - 8080:8080
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    networks:
      - proxy
    labels:
      # Enable the dashboard at http://traefik.amd2.nemanjamitic.com
      # http for simplicity, no acme.json file
      - traefik.enable=true
      - 'traefik.http.routers.traefik.rule=Host(`traefik.amd2.${SITE_HOSTNAME}`)'
      - traefik.http.routers.traefik.entrypoints=web
      - traefik.http.routers.traefik.service=api@internal
      - traefik.http.routers.traefik.middlewares=auth
      - 'traefik.http.middlewares.auth.basicauth.users=${TRAEFIK_AUTH}'

  rathole:
    image: rapiz1/rathole:v0.5.0
    container_name: rathole
    command: --server /config/rathole.server.toml
    restart: unless-stopped
    ports:
      - 2333:2333
    volumes:
      - ./rathole.server.toml:/config/rathole.server.toml:ro
    networks:
      - proxy

    labels:
      ### HTTP port 80 - HTTP routers ###

      # pi.nemanjamitic.com, www.pi.nemanjamitic.com, *.pi.nemanjamitic.com, www.*.pi.nemanjamitic.com
      
      # Route *.pi.nemanjamitic.com -> 5080
      - 'traefik.http.routers.rathole-pi.rule=HostRegexp(`pi.${SITE_HOSTNAME}`, `www.pi.${SITE_HOSTNAME}`, `{subdomain:[a-z0-9-]+}.pi.${SITE_HOSTNAME}`, `www.{subdomain:[a-z0-9-]+}.pi.${SITE_HOSTNAME}`)'
      - traefik.http.routers.rathole-pi.entrypoints=web
      - traefik.http.routers.rathole-pi.service=rathole-pi
      - traefik.http.services.rathole-pi.loadbalancer.server.port=5080

      # Route *.local.nemanjamitic.com -> 5081
      - 'traefik.http.routers.rathole-local.rule=HostRegexp(`local.${SITE_HOSTNAME}`, `www.local.${SITE_HOSTNAME}`, `{subdomain:[a-z0-9-]+}.local.${SITE_HOSTNAME}`, `www.{subdomain:[a-z0-9-]+}.local.${SITE_HOSTNAME}`)'
      - traefik.http.routers.rathole-local.entrypoints=web
      - traefik.http.routers.rathole-local.service=rathole-local
      - traefik.http.services.rathole-local.loadbalancer.server.port=5081

      ### HTTPS port 443 with TLS passthrough - TCP routers ###

      # Route *.pi.nemanjamitic.com -> 5443
      - 'traefik.tcp.routers.rathole-pi-secure.rule=HostSNIRegexp(`pi.${SITE_HOSTNAME}`, `www.pi.${SITE_HOSTNAME}`, `{subdomain:[a-z0-9-]+}.pi.${SITE_HOSTNAME}`, `www.{subdomain:[a-z0-9-]+}.pi.${SITE_HOSTNAME}`)'
      - traefik.tcp.routers.rathole-pi-secure.entrypoints=websecure
      - traefik.tcp.routers.rathole-pi-secure.tls.passthrough=true
      - traefik.tcp.routers.rathole-pi-secure.service=rathole-pi-secure
      - traefik.tcp.services.rathole-pi-secure.loadbalancer.server.port=5443

      # Route *.local.nemanjamitic.com -> 5444
      - 'traefik.tcp.routers.rathole-local-secure.rule=HostSNIRegexp(`local.${SITE_HOSTNAME}`, `www.local.${SITE_HOSTNAME}`, `{subdomain:[a-z0-9-]+}.local.${SITE_HOSTNAME}`, `www.{subdomain:[a-z0-9-]+}.local.${SITE_HOSTNAME}`)'
      - traefik.tcp.routers.rathole-local-secure.entrypoints=websecure
      - traefik.tcp.routers.rathole-local-secure.tls.passthrough=true
      - traefik.tcp.routers.rathole-local-secure.service=rathole-local-secure
      - traefik.tcp.services.rathole-local-secure.loadbalancer.server.port=5444

networks:
  proxy:
    external: true

Let’s start with the most important part: the labels on the rathole container that define load balancing on the two tunnels.

First, we define two HTTP routers using the HostRegexp() matcher. It takes HTTP traffic from the entrypoint on port 80 and load balances it between two tunnels on ports 5080 and 5081.

The second pair of labels defines a TCP router that takes traffic from the HTTPS entrypoint on port 443, passes it through without decrypting, and load balances it between tunnels on ports 5443 and 5444. Note that with the HostSNIRegexp() matcher you can’t include escaped dots (\.) in the regex, so you must repeat the entire domain sequence to handle the www variant of the domain.

Also note that we use separate regex variants to match the root subdomain explicitly, e.g. pi.nemanjamitic.com and www.pi.nemanjamitic.com for both HTTP and TCP routers.

That’s it, this is the main load balancing logic definition.

Note: Because we use HostRegexp() and HostSNIRegexp() on the server, you will need to use Host() and HostSNI() matchers for the Traefik running on the client side of the tunnel, or you will get 404 errors without additional configuration. Regex matchers on both the server and client sides seem to be too loose.
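To make this concrete, a hypothetical whoami container on the pi home server would use an exact-match Host() rule in its labels (the domain, service name, and certresolver name below are illustrative):

```yaml
# Client-side (home server) service using exact Host() matching,
# not a regex matcher - illustrative example
whoami:
  image: traefik/whoami
  container_name: whoami
  restart: unless-stopped
  networks:
    - proxy
  labels:
    - traefik.enable=true
    - 'traefik.http.routers.whoami.rule=Host(`whoami.pi.example.com`)'
    - traefik.http.routers.whoami.entrypoints=websecure
    - traefik.http.routers.whoami.tls.certresolver=letsencrypt
```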

Rathole server config

Now all that’s left is to write the config for the Rathole server that defines the 2×2 tunnels. Just make sure to use a unique port for each tunnel and a different token for each home server.

[server]
bind_addr = "0.0.0.0:2333"

[server.transport]
type = "noise"

[server.transport.noise]
local_private_key = "private_key"

# services are separated based on token; they also can NOT use the same ports

# pi
[server.services.pi-traefik-http]
token = "secret_token_1"
bind_addr = "0.0.0.0:5080"

[server.services.pi-traefik-https]
token = "secret_token_1"
bind_addr = "0.0.0.0:5443"  

# local
[server.services.local-traefik-http]
token = "secret_token_2"
bind_addr = "0.0.0.0:5081"

[server.services.local-traefik-https]
token = "secret_token_2"
bind_addr = "0.0.0.0:5444"

Reminder: You only need to open port 2333 in the VPS firewall, for the Rathole control channel, and not ports 5080, 5081, 5443, or 5444, because they are used by Rathole internally.

Traefik dashboard

Additionally, for the sake of debugging, we expose the Traefik dashboard using labels on the traefik container. To simplify the configuration and avoid handling the acme.json file, we expose it using HTTP.

Warning: When setting the dashboard hashed password via the TRAEFIK_AUTH environment variable, make sure to escape the $ characters properly or authentication will break. To do that, you need to use both double quotes "..." and the backslash escape \, as shown in the example below:

# install apache2-utils
sudo apt install apache2-utils

# hash the password
htpasswd -nb admin yourpassword
# use BOTH "..." and \$ to escape $ properly

# this will work correctly
TRAEFIK_AUTH="admin:\$asd1\$E3lsdAo\$3Mertp57JJ4LVU.HRR0"

# this will break
TRAEFIK_AUTH="admin:$asd1$E3lsdAo$3Mertp57JJ4LVU.HRR0"

# this will also break
TRAEFIK_AUTH=admin:\$asd1\$E3lsdAo\$3Mertp57JJ4LVU.HRR0
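If you prefer to automate the escaping step, here is a sketch of an alternative one-liner, assuming OpenSSL is installed: it generates an htpasswd-compatible MD5 (apr1) hash and escapes every $ as \$ in a single pipeline:

```shell
# Generate an apr1 (htpasswd-compatible) hash and escape each $ as \$
# so the result can be pasted into TRAEFIK_AUTH directly.
openssl passwd -apr1 yourpassword | sed 's/\$/\\$/g'
```

Prepend the admin: username prefix and wrap the result in double quotes when setting TRAEFIK_AUTH.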

Rathole client

The client part of the tunnel is almost the same as for a single home server. The only thing to keep in mind is to bind the specific client only to the tunnels that are meant for it, and not to all tunnels. Kind of obvious and self-explanatory, but just in case, let’s be very clear and explicit.

Here, we define the rathole.client.toml Rathole client config to bind the pi home server to its HTTP pi-traefik-http and HTTPS pi-traefik-https tunnels.

[client]
remote_addr = "123.123.123.123:2333"

[client.transport]
type = "noise"

[client.transport.noise]
remote_public_key = "public_key"

# a single client per tunnel pair

# pi
[client.services.pi-traefik-http]
token = "secret_token_1"
local_addr = "traefik:80"

[client.services.pi-traefik-https]
token = "secret_token_1"
local_addr = "traefik:443"  

Similarly, here we define the rathole.client.toml config to bind the local home server to its HTTP local-traefik-http and HTTPS local-traefik-https tunnels.

[client]
remote_addr = "123.123.123.123:2333"

[client.transport]
type = "noise"

[client.transport.noise]
remote_public_key = "public_key"

# a single client per tunnel pair

# local
[client.services.local-traefik-http]
token = "secret_token_2"
local_addr = "traefik:80"

[client.services.local-traefik-https]
token = "secret_token_2"
local_addr = "traefik:443"

The docker-compose.yml for the Rathole client and Traefik is exactly the same as for a single home server. I am repeating it here just for the sake of completeness.

version: '3.8'

services:

  rathole:
    image: rapiz1/rathole:v0.5.0
    container_name: rathole
    command: --client /config/rathole.client.toml
    restart: unless-stopped
    volumes:
      - ./rathole.client.toml:/config/rathole.client.toml:ro
    networks:
      - proxy

  traefik:
    image: 'traefik:v2.9.8'
    container_name: traefik
    restart: unless-stopped
    depends_on:
      - rathole
    command:
      # moved from static conf to pass email as env var
      - '--certificatesresolvers.letsencrypt.acme.email=${TRAEFIK_LETSENCRYPT_EMAIL}'
    security_opt:
      - no-new-privileges:true
    networks:
      - proxy
    # rathole will pass traffic through proxy network directly on 80 and 443
    # defined in rathole.client.toml
    environment:
      - TRAEFIK_AUTH=${TRAEFIK_AUTH}
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - ./traefik-data/traefik.yml:/traefik.yml:ro
      - ./traefik-data/acme.json:/acme.json
      - ./traefik-data/configurations:/configurations
    labels:
      - 'traefik.enable=true'
      - 'traefik.docker.network=proxy'
      - 'traefik.http.routers.traefik-secure.entrypoints=websecure'
      - 'traefik.http.routers.traefik-secure.rule=Host(`traefik.${SITE_HOSTNAME}`)'
      - 'traefik.http.routers.traefik-secure.middlewares=user-auth@file'
      - 'traefik.http.routers.traefik-secure.service=api@internal'

networks:
  proxy:
    external: true

Completed code

Conclusion

You can use this setup to expose as many home servers as you want, in a cost-effective and practical way, as long as your VPS has enough network bandwidth to support their traffic. It can bring your homelab to another level.

What tool and method did you use to expose your home servers to the internet? Do you like this approach, are you willing to give it a try? Let me know in the comments.

Happy self-hosting.

References

]]>
nemanja.mitic.elfak@hotmail.com (Nemanja Mitic)
<![CDATA[Expose home server with Rathole tunnel and Traefik]]> https://nemanjamitic.com/blog/2025-04-29-rathole-traefik-home-server/ https://nemanjamitic.com/blog/2025-04-29-rathole-traefik-home-server/ Tue, 29 Apr 2025 00:00:00 GMT import { Image } from 'astro:assets';

import { IMAGE_SIZES } from '../../../../constants/image';

import RatholeTraefikArchitectureImage from '../../../../content/post/2025/04-29-rathole-traefik-home-server/_images/rathole-traefik-architecture-16-9.png';
import FirewallImage from '../../../../content/post/2025/04-29-rathole-traefik-home-server/_images/firewall.png';
import HomeServerContainersImage from '../../../../content/post/2025/04-29-rathole-traefik-home-server/_images/home-server-containers.png';
import OrangePiGifImage from '../../../../content/post/2025/04-29-rathole-traefik-home-server/_images/orange-pi.gif';

Introduction

In the previous article, I wrote about a temporary SSH tunneling technique to bypass CGNAT. That method is not suitable for exposing permanent services, at least not without a manager such as autossh. Proper tools for this are rapiz1/rathole or fatedier/frp. I chose Rathole since it’s written in Rust and performs better in benchmarks.

Prerequisites

  • A VPS server with a public IP and Docker, ideally a small one, since you can’t use ports 80 and 443 for any other services aside from Rathole
  • A home server
  • A domain name

Architecture overview

We will use Rathole for an encrypted tunnel between the VPS and the local network. We will also use Traefik since we want to host multiple websites on our home server, just like you would on any server.

The main question is where to run Traefik:

  1. On the VPS
  2. On the home server

I highly prefer option 2 because, that way, the entire configuration is stored on our home server. The home server is almost tunnel-agnostic, and you can reuse it on any tunneled or non-tunneled server. Otherwise, we would need to maintain state between the VPS and the home server, debug both together, etc.

Another point is that, with option 2, we avoid the gap of unencrypted traffic on the VPS between Traefik (TLS) and Rathole (Noise Protocol). You can read more about the comparison of these two options in this article: https://blog.mni.li/posts/caddy-rathole-zero-knowledge/.

The downside is that Rathole will exclusively occupy ports 80 and 443 on the VPS, preventing any other process from using them. We won’t be able to run other web servers on that VPS, so it’s best to use a small one dedicated to this purpose.

Unless, that is, we use a load balancer, as described in Load balancing multiple Rathole tunnels with Traefik HTTP and TCP routers.

<Image {...IMAGE_SIZES.FIXED.MDX_MD} src={RatholeTraefikArchitectureImage} alt="Rathole Traefik architecture diagram" />

Rathole server

We will run the Rathole server inside a Docker container on our VPS. Rathole uses the same binary for both the server and client, you just pass the right option (--server or --client) and the .toml configuration file.

Here is the Rathole server configuration rathole.server.toml:

[server]
bind_addr = "0.0.0.0:2333"

[server.transport]
type = "noise"

[server.transport.noise]
local_private_key = "private_key"

[server.services.traefik-http]
token = "secret_token_1"
bind_addr = "0.0.0.0:5080"

[server.services.traefik-https]
token = "secret_token_1"
bind_addr = "0.0.0.0:5443"

Let’s explain it: we choose port 2333 for the control channel and bind it to all interfaces inside the Docker container with the 0.0.0.0 IP. We choose the noise encryption protocol and specify a private key. The public key will be used on the Rathole client. The public and private key pair is generated with:

docker run -it --rm rapiz1/rathole --genkey

Then we define two tunnels: one for HTTP and another for HTTPS. For the HTTP tunnel, we define the name server.services.traefik-http, set the value for token, and choose port 5080, and again we bind it to all container interfaces with 0.0.0.0. Similarly, for HTTPS, we set the name to server.services.traefik-https, provide a token value, and choose port 5443.

Every tunnel has to have a unique name, token value, and port. With that fulfilled, a single Rathole server instance can have as many Rathole clients as needed, which is pretty convenient. For example, besides the existing home server on ports 5080 and 5443, we can expose another one using ports 5081 and 5444.
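For instance, a hypothetical second home server could be added to rathole.server.toml like this (the service names, token, and ports are illustrative):

```toml
# second pair of tunnels for another home server - illustrative values
[server.services.traefik2-http]
token = "secret_token_2"
bind_addr = "0.0.0.0:5081"

[server.services.traefik2-https]
token = "secret_token_2"
bind_addr = "0.0.0.0:5444"
```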

The token is just a random base64 string; we generate it by running:

openssl rand -base64 32

After the configuration file, we define the Rathole server container with docker-compose.yml:

services:

  rathole:
    image: rapiz1/rathole:v0.5.0
    container_name: rathole
    command: --server /config/rathole.server.toml
    restart: unless-stopped
    ports:
      # host:container
      - 2333:2333
      - 80:5080
      - 443:5443
    volumes:
      - ./rathole.server.toml:/config/rathole.server.toml:ro

In the command, we set the --server option, pass the .toml configuration file, and mount it as a read-only bind-mount volume.

The important part is the port mappings. Here, you can see that the Rathole server container will occupy ports 2333, 80, and 443 exclusively on the host VPS. This practically means we won’t be able to run any other web servers on ports 80 and 443. We will also need to open ports 80, 443, and 2333 in the VPS firewall. You don’t need to open ports 5080 and 5443, those are used only by Rathole internally.

Rathole client and connecting with Traefik

We run the Rathole client and Traefik inside Docker containers on the home server. Configuring the Rathole client and connecting it to Traefik is a bit more complex and tricky.

Here is the Rathole client configuration core/rathole.client.toml.example:

[client]
remote_addr = "123.123.123.123:2333"

[client.transport]
type = "noise"

[client.transport.noise]
remote_public_key = "public_key"

# this is the important part
# Rathole knows the traffic comes from 5080 and 5443; the control channel told it
# DON'T do ANY port mapping in docker-compose.yml
# just pass traffic from Rathole to the ports which Traefik expects (80 and 443)

[client.services.traefik-http]
token = "secret_token_1"
local_addr = "traefik:80"

[client.services.traefik-https]
token = "secret_token_1"
local_addr = "traefik:443"

Let’s go through it. First, we define the VPS server IP remote_addr, the control channel port 2333, set the noise encryption protocol, and this time specify a public key remote_public_key.

Now comes the important and tricky part: defining tunnels and services. We repeat the service name and token that we used in the Rathole server config.

And now the most important part: local_addr. For this, we target the Traefik hostname (the service name from core/docker-compose.local.yml) and the Traefik listening ports 80 and 443. That’s it. It might look simple and obvious, but this is the correct setup. I must emphasize: don’t fall into the temptation of setting any additional port mappings in core/docker-compose.local.yml, or the functionality will break; everything should be done in core/rathole.client.toml.

Another note: You might wonder why ports 5080 and 5443 aren’t repeated anywhere in the client config core/rathole.client.toml. The answer is that there is no need for them: we already specified port 2333 for the control channel, which communicates all additional required information between the Rathole server and client.

Now that we have configured the Rathole client, we need to define Rathole client and Traefik containers.

Here is the Rathole client container and the important part of the Traefik container core/docker-compose.local.yml:

services:

  rathole:
    # 1. default official x86 image
    image: rapiz1/rathole:v0.5.0

    # 2. custom built ARM image (for Raspberry pi)
    # image: nemanjamitic/my-rathole-arm64:v1.0

    # 3. build for arm - AVOID, use prebuilt ARM image above
    # build: https://github.com/rapiz1/rathole.git#main
    # platform: linux/arm64

    container_name: rathole
    command: --client /config/rathole.client.toml
    restart: unless-stopped
    volumes:
      - ./rathole.client.toml:/config/rathole.client.toml:ro
    networks:
      - proxy

  traefik:
    image: 'traefik:v2.9.8'
    container_name: traefik
    restart: unless-stopped

    # for this to work both services must be defined in the same docker-compose.yml file
    depends_on: 
      - rathole

    # other config...

    networks:
      - proxy

    # leave this commented out, just for explanation
    # Rathole will pass Traffic through proxy network directly on 80 and 443
    # defined in rathole.client.toml
    # ports:
    #   - '80:80'
    #   - '443:443'

    # other config...

Let’s start with the Rathole service. Similarly to the server command, we run the Rathole binary, this time in client mode with --client, and pass the client config file /config/rathole.client.toml, which we also bind-mount as a read-only volume. An important part is that we put both the Rathole and Traefik containers on the same external network, proxy, so they can communicate with each other and with the host.

Additional notes about the Rathole image:

  • Always make sure to use the same Rathole image version for both the server and client for compatibility.
  • x86 - By default, Rathole provides only the x86 image. If your home server uses that architecture, you are good to go.
  • ARM - If you have an ARM home server (e.g., Raspberry Pi), you will have to build the image yourself or use a prebuilt, unofficial one. Avoid building images on low-power ARM single-board computers, as it will take a long time and require a lot of RAM and CPU power. Instead, either pre-build one yourself and push it to Docker Hub, or you can reuse my nemanjamitic/my-rathole-arm64:v1.0 image (which uses Rathole v0.5.0).

Now, the Traefik container. It must be on the same proxy external network as Rathole. Another important part: it must wait for the Rathole container to boot up (depends_on: rathole), because the traffic will come from the Rathole tunnel. Do not expose ports 80 and 443; Rathole has already bound those Traefik container ports, as we defined in the Rathole client config core/rathole.client.toml.

The rest of the Traefik container definition is left out here because it’s the usual configuration, unrelated to the Rathole tunnel. Below is a quick reminder about the general Traefik configuration.

Traefik reminder

  1. Provide the .env file with variables needed for Traefik:
cp .env.example .env
SITE_HOSTNAME=homeserver.my-domain.com

# important: must put value in quotes "..." and escape $ with \$
TRAEFIK_AUTH=

# will receive expiration notifications
TRAEFIK_LETSENCRYPT_EMAIL=myname@example.com
  2. On your home server host OS you must create an external Docker network:
docker network create proxy
  3. Create the acme.json file with permission 600:
touch ~/homelab/traefik-proxy/core/traefik-data/acme.json

sudo chmod 600 ~/homelab/traefik-proxy/core/traefik-data/acme.json
  4. Always start with the staging ACME server for testing and swap to production once satisfied:
# core/traefik-data/traefik.yml

certificatesResolvers:
  letsencrypt:
    acme:
      # always start with staging certificate
      caServer: "https://acme-staging-v02.api.letsencrypt.org/directory"
      # caServer: 'https://acme-v02.api.letsencrypt.org/directory'
  5. To clear the temporary staging certificates, clear the contents of acme.json:
truncate -s 0 acme.json

That’s it. Once done, we can run Rathole client and Traefik containers on our home server with:

docker compose -f docker-compose.local.yml up -d

<Image {...IMAGE_SIZES.FIXED.MDX_XL} src={HomeServerContainersImage} alt="Running containers on the home server" />

Exposing multiple servers

Fortunately, Rathole makes it trivial to run multiple tunnels using a single Rathole server. We don’t need to open any additional ports in the firewall or run multiple container instances. What we do need are different tunnel names, token values, and ports, which must be unique for each tunnel/service. Also, you will need a load balancer to bind ports 80 and 443 to more than one destination port each.

I wrote a detailed tutorial on how to expose multiple home servers using a single Rathole server. You can read it here: Load balancing multiple Rathole tunnels with Traefik HTTP and TCP routers.

Open the firewall on the VPS

As for any web server, you will need to open ports 80 and 443 on the VPS to listen for HTTP/HTTPS traffic. Additionally, you will need to open port 2333 for the Rathole control channel (the tunnel itself).
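If your VPS uses ufw, for example, the rules could look like this (an illustrative sketch; adjust for your firewall of choice):

```shell
# Open HTTP, HTTPS, and the Rathole control channel
sudo ufw allow 80/tcp    # HTTP
sudo ufw allow 443/tcp   # HTTPS
sudo ufw allow 2333/tcp  # Rathole control channel

# Verify the active rules
sudo ufw status
```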

<Image {...IMAGE_SIZES.FIXED.MDX_XL} src={FirewallImage} alt="Opened port for Rathole tunnel in the firewall" />

Completed code

Conclusion

Most consumer-grade internet connections are behind a CGNAT. This setup allows you to bypass CGNAT and host an unlimited number of websites on your home server almost for free. You can use it for web servers in virtual machines, LXC containers, SBC computers, etc. - anywhere you can run Docker.

It is simple, cheap, and you can set it up in 30 minutes. Like anything, it also has some downsides; one of them is the added latency caused by an additional network hop between the VPS and your home network, but it’s a reasonable tradeoff.

Did you make something similar yourself? Can you see room for improvement? Did you use a different method? Have you tried to run the code and need help with troubleshooting? Let me know in the comments.

Orange Pi hero image

References

]]>
nemanja.mitic.elfak@hotmail.com (Nemanja Mitic)
<![CDATA[Expose local dev server with SSH tunnel and Docker]]> https://nemanjamitic.com/blog/2025-04-20-ssh-tunnel-docker/ https://nemanjamitic.com/blog/2025-04-20-ssh-tunnel-docker/ Sun, 20 Apr 2025 00:00:00 GMT import { Image } from 'astro:assets';

import { IMAGE_SIZES } from '../../../../constants/image';

import SshTunnelArchitectureImage from '../../../../content/post/2025/04-20-ssh-tunnel-docker/_images/ssh-tunnel-architecture-16-9.png';
import FirewallImage from '../../../../content/post/2025/04-20-ssh-tunnel-docker/_images/firewall.png';

import TunnelDemoVideo from '../../../../content/post/2025/04-20-ssh-tunnel-docker/_images/tunnel-demo.webm';

Introduction

Most consumer-grade internet connections are hidden behind CGNAT and are not reachable from the internet. This is done to save IP addresses, as IPv4 has a limited range. If you have a static public IPv4 or any IPv6 address, you won’t need the setup from this tutorial.

There are already services like localtunnel or ngrok for this purpose, but when you actually start using them, you will often find out that they have limitations on their free plans. So, we will configure our own custom setup once and have it always available for convenient and practical usage which will save a lot of time and nerves in the long run.

Why is this useful

This is useful whenever you need to share your local project with others or provide a publicly accessible URL for your service so that external systems can reach it. This is often the case if you work remotely.

Yes, you can use test deployments, but having a tunnel setup configured and being able to run it with a single terminal command saves a lot of time and energy.

Possible use cases:

  • Sharing work in progress with clients or teammates
  • Remote debugging or pair programming
  • Demos for presentations or team meetings
  • Testing the frontend on different devices (mobile, different resolution, OS, browser)
  • Testing webhooks from external services (Stripe, GitHub, OAuth, Slack, Contentful, Twilio, etc.)

Prerequisites

  • A VPS server with a public IP and Docker
  • A local machine with a working SSH and dev server that you want to expose
  • A domain name (optional)

Demo video

<video {...IMAGE_SIZES.FIXED.MDX_LG} controls>

Architecture overview

Without going too deep into computer networking theory, let’s explain port forwarding in simplified terms. Port forwarding is a mapping (binding) between two points (services) on a private network (or even on the same machine) that would otherwise be unreachable. You can think of it as a VPN for a single service (port).

So it’s exactly what we need: we want to bind (redirect traffic from) a public port (1081 in our case) on the VPS, which acts as a gateway, to port 3000 on our local dev server that is not directly reachable from the internet. That’s it for the tunneling part; this setup is sufficient for serving HTTP traffic.

Additionally, to support HTTPS and provide a user-friendly URL, we will add Traefik, which will handle HTTPS certificates and route traffic from port 443 to port 1081 of the tunnel.

<Image {...IMAGE_SIZES.FIXED.MDX_MD} src={SshTunnelArchitectureImage} alt="SSH tunnel architecture diagram" />

Running the SSH server in Docker

We already use SSH to access our VPS, but we prefer to keep that configuration untouched. So, we will run a separate SSH server inside a Docker container specifically for tunneling.

For this, we will use the linuxserver/openssh-server image. LinuxServer is an organization that maintains very stable Docker images for all kinds of purposes.

By default, the SSH server doesn’t allow tunneling, so we need to modify the config in /etc/ssh/sshd_config and enable it.

sudo nano /etc/ssh/sshd_config

# enable these two options in the config
AllowTcpForwarding yes
GatewayPorts yes

# restart the SSH daemon to apply the changes
sudo systemctl restart sshd

But since we are using a Docker container, we will do it differently.

We will use the openssh-server-ssh-tunnel mod, which enables tunneling in the linuxserver/openssh-server image. You can think of mods as presets (additional layers and configurations) for these images.

Here is docker-compose.yml for the SSH tunnel container:

version: '3.8'

services:
  openssh-server:
    image: linuxserver/openssh-server
    container_name: openssh-server
    restart: unless-stopped
    hostname: openssh-server #optional
    expose:
      - 1081 # tunneled service port, for Traefik
    ports:
      - 1080:2222 # 1080 is the main SSH connection port
    environment:
      # https://github.com/linuxserver/docker-mods/tree/openssh-server-ssh-tunnel
      - DOCKER_MODS=linuxserver/mods:openssh-server-ssh-tunnel
      - SHELL_NOLOGIN=false
      # set correct for current host user
      - PUID=1001
      - PGID=1001
      - TZ=Etc/UTC
      # important
      - PUBLIC_KEY 
      # optional env vars below
      - SUDO_ACCESS=true 
      - USER_NAME=username 
      - PASSWORD_ACCESS=false 
    volumes:
      - ./config:/config
    # Traefik configuration below
    labels:
      - 'traefik.enable=true'
      - 'traefik.docker.network=proxy'
      - 'traefik.http.routers.ssh-tunnel.rule=Host(`preview.${SITE_HOSTNAME}`)'
      - 'traefik.http.routers.ssh-tunnel.entrypoints=websecure'
      - 'traefik.http.routers.ssh-tunnel.service=ssh-tunnel'
      - 'traefik.http.services.ssh-tunnel.loadbalancer.server.port=1081' # matches exposed port
    networks:
      - proxy

networks:
  proxy:
    external: true

Let's explain the code above:

services:
  openssh-server:
    image: linuxserver/openssh-server
    # ...
    ports:
      - 1080:2222 # 1080 is the main SSH connection port

By default, linuxserver/docker-openssh-server runs the SSH service on port 2222 to avoid conflicts with port 22, which is usually taken by the host’s SSH service; this is hardcoded in the Dockerfile. We will choose port 1080 for the main SSH connection, so we map it to port 2222 inside the container. Port 1080 is used for the actual connection over the internet, so it must be allowed in the VPS firewall.

So, let’s establish clear and precise naming from the beginning:

  • Port 1080 - the main SSH connection port
  • Ports 1081, 1082, 1083, ... - tunneled services remote ports

Additionally, you need to configure the SSH client on your dev machine to use port 1080 when connecting to this container. In this example, I have named the VPS host amd1 and the SSH container host amd1c; you can use your own naming logic.


# client ~/.ssh/config entry for the SSH container on the amd1 VPS
Host amd1c 123.123.123.123
    # VPS IP
    HostName 123.123.123.123
    # private key file
    IdentityFile ~/.ssh/my-keys/amd1_ssh_container__id_ed25519
    User username
    Port 1080

In the client SSH config above, you will notice the private key file amd1_ssh_container__id_ed25519. The public key is passed to the SSH container as an environment variable:

services:
  openssh-server:
    image: linuxserver/openssh-server
    # ...
    environment:
      - PUBLIC_KEY # important

You generate SSH key pairs as usual, e.g.:

ssh-keygen -t ed25519 -C "myemail@gmail.com" -f ~/.ssh/my-keys/amd1_ssh_container__id_ed25519
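The container expects the public key content, not a file path. One way to wire the generated key into the .env file read by docker-compose (an assumption about your workflow; the path matches the example above):

```shell
# append the public key content to the .env file consumed by docker-compose
echo "PUBLIC_KEY=$(cat ~/.ssh/my-keys/amd1_ssh_container__id_ed25519.pub)" >> .env
```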

Now, we choose which remote port we will use to expose our local dev server. If you’re using Traefik and don’t access this port directly via the browser, you don’t need to allow it in the VPS’s firewall.

services:
  openssh-server:
    image: linuxserver/openssh-server
    # ...
    expose:
      - 1081 # tunneled service remote port

The other environment variables worth mentioning are the following:

services:
  openssh-server:
    image: linuxserver/openssh-server
    # ...
    environment:
      # https://github.com/linuxserver/docker-mods/tree/openssh-server-ssh-tunnel
      - DOCKER_MODS=linuxserver/mods:openssh-server-ssh-tunnel
      - PUID=1001
      - PGID=1001

We use the DOCKER_MODS variable to specify the openssh-server-ssh-tunnel mod. PUID and PGID are the user and group IDs used to handle permissions between the host and the container. You get their values by running id -u && id -g on the VPS host. It is also a good idea to export them as global environment variables in the ~/.bashrc file to make them available to all containers:

export MY_UID=$(id -u)
export MY_GID=$(id -g)

Then you can pass them like this:

  # ...
  environment:
    - PUID=$MY_UID
    - PGID=$MY_GID

The SSH tunnel is now configured. You can access your local dev server over HTTP via your VPS IP, e.g. http://123.123.123.123:1081, or via a domain, e.g. http://my-domain.com:1081.

Configuring HTTPS with Traefik

Some browsers disallow insecure HTTP traffic by default, and you need to tweak the browser settings to allow it explicitly. This can be inconvenient when sending a demo link to a non-technical person. Additionally, some OAuth providers require HTTPS even for testing (e.g. Facebook). So let’s make the extra effort to do things properly and configure HTTPS with a subdomain using Traefik.

If you are running a VPS, chances are you already use a reverse proxy for handling certificates and subdomain routing. This example shows how to do it with Traefik.


services:
  openssh-server:
    image: linuxserver/openssh-server
    container_name: openssh-server
    # ...
    expose:
      - 1081 # tunneled service remote port
    ports:
      - 1080:2222 # 1080 is the main SSH connection port

    # Traefik configuration below
    labels:
      - 'traefik.enable=true'
      - 'traefik.docker.network=proxy'
      - 'traefik.http.routers.ssh-tunnel.rule=Host(`preview.${SITE_HOSTNAME}`)'
      - 'traefik.http.routers.ssh-tunnel.entrypoints=websecure'
      - 'traefik.http.routers.ssh-tunnel.service=ssh-tunnel'
      - 'traefik.http.services.ssh-tunnel.loadbalancer.server.port=1081' # matches exposed port
    networks:
      - proxy

networks:
  proxy:
    external: true

The truth is, there is not much work to do here. All you need to do is map the remote port of the tunnel (1081) to Traefik and define the URL on which you want to expose your local dev server via an environment variable, e.g. SITE_HOSTNAME=my-domain.com, which combined with the preview. prefix from the Traefik label yields preview.my-domain.com.

Everything else is just generic Traefik configuration. Also, don't forget to add the wildcard A record for your subdomains (e.g., you might add a *.tunnels "namespace") in your DNS provider's dashboard and point it to your VPS IP. Additionally, create an external Docker network, e.g. named proxy, as shown in the example above.

services:
  openssh-server:
    image: linuxserver/openssh-server
    container_name: openssh-server
    # ...
    expose:
      - 1081 # tunneled service remote port, passed to Traefik

    # ...
    # Traefik configuration below
    labels:
      # ...
      - 'traefik.http.routers.ssh-tunnel.rule=Host(`preview.${SITE_HOSTNAME}`)' # in .env file: SITE_HOSTNAME=my-domain.com
      - 'traefik.http.services.ssh-tunnel.loadbalancer.server.port=1081' # matches the exposed port

In the end, you just need to define 2 environment variables for your docker-compose.yml inside the .env file:

# root domain, without 'https://'; the 'preview.' subdomain is added in the Traefik label
SITE_HOSTNAME=my-domain.com

# public ssh key
PUBLIC_KEY=my-public-ssh-key

Only the Traefik configuration relevant to the SSH tunnel container is shown above. A complete Traefik reverse proxy setup requires additional static and dynamic configuration for the Traefik container itself, but that is outside the scope of this tutorial. You can search for examples of Traefik configurations or reuse mine, which is available in this repository: nemanjam/traefik-proxy.

Tunneling multiple services

Sometimes your app runs more than a single service, e.g. a frontend and a backend. If you expose just the frontend from port 3000, note that references to localhost (e.g. localhost:5000 for the backend) won’t resolve for remote visitors. Therefore, you need to tunnel all services and set the tunneled URLs in your .env files.
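For example, if the frontend reads the backend URL from an environment variable (the variable name below is hypothetical), point it at the tunneled URL on the dev machine:

```shell
# .env on the local dev machine (hypothetical variable name)
# instead of PUBLIC_API_URL=http://localhost:5000
PUBLIC_API_URL=https://preview2.my-domain.com
```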

How to have more than one tunnel? Your first thought might be to run multiple SSH server containers, but fortunately, that is not necessary. You can tunnel as many services as you want through a single SSH connection. You just need to expose multiple ports on the SSH container and map them to multiple Traefik hosts with labels, as shown below:

version: '3.8'

services:
  openssh-server:
    image: linuxserver/openssh-server
    container_name: openssh-server
    restart: unless-stopped
    hostname: openssh-server #optional
    # tunneled services, remote ports
    expose:
      - 1081 # tunnel1
      - 1082 # tunnel2
      - 1083 # tunnel3
    ports:
      - 1080:2222 # 1080 is the main SSH connection port
    environment:
      # https://github.com/linuxserver/docker-mods/tree/openssh-server-ssh-tunnel
      - DOCKER_MODS=linuxserver/mods:openssh-server-ssh-tunnel
      - SHELL_NOLOGIN=false
      # set correct for current host user
      - PUID=1001
      - PGID=1001
      - TZ=Etc/UTC
      # important
      - PUBLIC_KEY 
      # optional env vars below
      - SUDO_ACCESS=true 
      - USER_NAME=username 
      - PASSWORD_ACCESS=false 
    volumes:
      - ./config:/config
    # Traefik configuration below
    labels:
      # common config
      - 'traefik.enable=true'
      - 'traefik.docker.network=proxy'

      # tunnel1 (port 3000 -> 1081)
      - 'traefik.http.routers.ssh-tunnel1.rule=Host(`preview1.${SITE_HOSTNAME}`)'
      - 'traefik.http.routers.ssh-tunnel1.entrypoints=websecure'
      - 'traefik.http.routers.ssh-tunnel1.service=ssh-tunnel1'
      - 'traefik.http.services.ssh-tunnel1.loadbalancer.server.port=1081'

      # tunnel2 (port 5000 -> 1082)
      - 'traefik.http.routers.ssh-tunnel2.rule=Host(`preview2.${SITE_HOSTNAME}`)'
      - 'traefik.http.routers.ssh-tunnel2.entrypoints=websecure'
      - 'traefik.http.routers.ssh-tunnel2.service=ssh-tunnel2'
      - 'traefik.http.services.ssh-tunnel2.loadbalancer.server.port=1082'

      # tunnel3 (port 5001 -> 1083)
      - 'traefik.http.routers.ssh-tunnel3.rule=Host(`preview3.${SITE_HOSTNAME}`)'
      - 'traefik.http.routers.ssh-tunnel3.entrypoints=websecure'
      - 'traefik.http.routers.ssh-tunnel3.service=ssh-tunnel3'
      - 'traefik.http.services.ssh-tunnel3.loadbalancer.server.port=1083'

    networks:
      - proxy

networks:
  proxy:
    external: true

If you have a large number of services to tunnel, you might want to use a VPN to access all ports by default, but that’s rarely the case.

Another point to make is that the SSH tunnel technique is most suitable for temporarily exposing services for demo purposes. For permanent tunnels, you would need to add autossh to keep the connection alive, but there are better tools for permanent tunnels, such as rapiz1/rathole or fatedier/frp.

Open the firewall on the VPS

For the main SSH connection, you will need to open a port in your VPS firewall, port 1080 in this example. Additionally, if you want to access tunnels directly via a port in the browser without Traefik, you will need to open those ports as well. Be mindful not to open too many unnecessary ports, as every newly opened port increases the attack surface.

<Image {...IMAGE_SIZES.FIXED.MDX_MD} src={FirewallImage} alt="Example opened ports in the firewall" />

Running the tunnel

You start the tunnel with a single command like below. The -R option means remote port forwarding, followed by two IP:port pairs. The first pair is remote, and the second is local. At the end, you have the VPS host.

# command format
ssh -R [remote_addr:]remote_port:local_addr:local_port [user@]gateway_addr

# example:
# amd1c host is defined in ~/.ssh/config
ssh -R *:1081:localhost:3000 amd1c

# access the url, e.g.
https://preview1.my-domain.com

# terminate tunnel, like any ssh connection
exit

You can open multiple tunnels with a single command. Just specify the tunnels one after another before the host. Note that you must have these tunnels defined in your docker-compose.yml for the SSH server (exposed ports and Traefik host labels).

# tunnel frontend at port 3000 and backend at port 5000
ssh \
  -R *:1081:localhost:3000 \
  -R *:1082:localhost:5000 \
  amd1c

# access the urls, e.g.
https://preview1.my-domain.com
https://preview2.my-domain.com/api

Completed code

Conclusion

Port forwarding is a basic networking technique that is very familiar to network engineers, but perhaps not often utilized by developers. It can be very useful and practical, especially in a remote work setting. As described in this tutorial, you just need to run a single container, configure the client and firewall, and once you have it set up, it can save you a lot of time and energy in the long run.

SSH remote port forwarding is just one of many useful and cool SSH networking tricks. There are many others, like dynamic port forwarding, SSH agent forwarding, X11 forwarding, SSH file system, etc. Do you use some of them? Please share in the comments below.

References

]]>
nemanja.mitic.elfak@hotmail.com (Nemanja Mitic)
<![CDATA[Build a random image component with Astro and React]]> https://nemanjamitic.com/blog/2025-04-06-random-image-component/ https://nemanjamitic.com/blog/2025-04-06-random-image-component/ Sun, 06 Apr 2025 00:00:00 GMT import { Image } from 'astro:assets';

import { IMAGE_SIZES } from '../../../../constants/image';

import LayoutShiftImage from '../../../../content/post/2025/04-06-random-image-component/_images/layout-shift.png';

import OverviewVideo from '../../../../content/post/2025/04-06-random-image-component/_images/overview.webm';
import ResponsiveImagesVideo from '../../../../content/post/2025/04-06-random-image-component/_images/responsive-images.webm';

Introduction

For the sake of practice and fun, let’s build a component that displays a random image on mouse click. It looks more fun and interactive than a static hero image. You can see it in action on the home page of my website.

This functionality shares some common parts with the image gallery described in the previous article, such as the component hierarchy and including the image URLs in the client bundle. However, it also introduces some new elements, like a proper blur preloader.

What we will be building

<video {...IMAGE_SIZES.FIXED.MDX_LG} controls>

Component hierarchy

Again, we will use a similar structure: MDX (index.mdx) -> Astro component (ImageRandom.astro) -> React components (ImageRandomReact.tsx and ImageBlurPreloader.tsx), and again, the client React components contain most of the complexity.

Code (paraphrased):

// src/pages/index.mdx

<ImageRandom />

// src/components/ImageRandom.astro

<ImageRandomReact {galleryImages} client:load />

// src/components/react/ImageRandomReact.tsx

<ImageBlurPreloader {...props} />

Responsive image

This is the functionality shared with the image gallery. This time, we’ll use a fixed low-resolution image for the blur effect and a responsive, high-resolution hero image as the main one. The blur and main images will have different resolutions but share the same 16:9 aspect ratio.

The code is as follows:

// src/constants/image.ts

export const IMAGE_SIZES = {
  FIXED: {
    // blur image
    BLUR_16_9: {
      width: 64,
      height: 36,
    },
  // ...  
  },
  RESPONSIVE: {
    // main image
    POST_HERO: {
      widths: [TW_SCREENS.XS, TW_SCREENS.SM, TW_SCREENS.MD, TW_SCREENS.LG],
      sizes: `(max-width: ${TW_SCREENS.XS}px) ${TW_SCREENS.XS}px, (max-width: ${TW_SCREENS.SM}px) ${TW_SCREENS.SM}px, (max-width: ${TW_SCREENS.MD}px) ${TW_SCREENS.MD}px, ${TW_SCREENS.LG}px`,
    },
    // ...  
  },
};

// actual <img /> tag attributes that are generated with the POST_HERO

<img
  sizes="(max-width: 475px) 475px, (max-width: 640px) 640px, (max-width: 768px) 768px, 1024px"
  width="3264"
  height="1836"
  srcset="
    /_astro/amfi1.Cv2xkJ5B_1Lofkq.webp 475w,
    /_astro/amfi1.Cv2xkJ5B_Oxmi8.webp 640w,
    /_astro/amfi1.Cv2xkJ5B_X0wXS.webp 768w,
    /_astro/amfi1.Cv2xkJ5B_Z1u01H4.webp 1024w
  "
  src="/_astro/amfi1.Cv2xkJ5B_26HGs8.webp"
/>

// src/libs/gallery/transform.ts

export const heroImageOptions = {
  ...IMAGE_SIZES.RESPONSIVE.POST_HERO,
};

// src/libs/gallery/images.ts

export const getHeroImages = async (): Promise<HeroImage[]> => {
  const blur = await getCustomImages(blurImageOptions);
  const hero = await getCustomImages(heroImageOptions);

  const heroImages = mergeArrays(blur, hero).map(([blur, hero]) => ({
    blur: imageResultToImageAttributes(blur),
    hero: imageResultToImageAttributes(hero),
  }));

  return heroImages;
};

// src/components/ImageRandom.astro

const galleryImages = await getHeroImages();
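
The mergeArrays helper above pairs the blur and hero results index by index. A zip-style sketch of what it might look like (an assumed implementation; the real helper lives in the repo's utils):

```typescript
// pairs two arrays of equal length index by index
const mergeArrays = <A, B>(a: A[], b: B[]): Array<[A, B]> =>
  a.map((item, i) => [item, b[i]]);

console.log(JSON.stringify(mergeArrays([1, 2], ['a', 'b']))); // [[1,"a"],[2,"b"]]
```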

Responsive main image in action:

<video {...IMAGE_SIZES.FIXED.MDX_LG} controls>

Random image in a static website

Again, we have the same situation as in the image gallery. The key point is to include all image URLs in the client and execute getRandomElementFromArray() in the client React component to display a random image at runtime. If we called the random function on the server, in the Astro component, we would end up with a single image randomly picked at build time, which is not what we want.

This is the code:

---
// src/components/ImageRandom.astro

const galleryImages = await getHeroImages();
---

{/* include all the images in the client and let the client pick the random image */}

<div  {...props}>
  <ImageRandomReact {galleryImages} client:load />
</div>

// src/components/react/ImageRandom.tsx

const ImageRandomReact: FC<Props> = ({ galleryImages, className, divClassName, ...props }) => {
  // cache randomized images
  const randomImage = useMemo(() => getRandomElementFromArray(galleryImages), [galleryImages]);

  const [image, setImage] = useState(initialImage);

  // pick initial random image on mount
  useEffect(() => {
    setImage(randomImage);
  }, [setImage, randomImage]);

  // pick random image onClick
  const handleClick = async () => {
    const randomImage = getRandomElementFromArray(galleryImages);
    setImage(randomImage);
  };

  return (
    <ImageBlurPreloader
      {...props}
      blurAttributes={{ ...image.blur, alt: 'Blur image' }}
      mainAttributes={{ ...image.hero, onClick: handleClick, alt: 'Hero image' }}
      className={cn('cursor-pointer my-0', className)}
      divClassName={divClassName}
    />
  );
};
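
For completeness, getRandomElementFromArray can be as simple as the sketch below (an assumed implementation; the real helper lives in the repo's utils):

```typescript
// picks a uniformly random element from a non-empty array
const getRandomElementFromArray = <T>(array: T[]): T =>
  array[Math.floor(Math.random() * array.length)];

const sampleImages = ['a.webp', 'b.webp', 'c.webp'];
console.log(sampleImages.includes(getRandomElementFromArray(sampleImages))); // true
```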

Blur preloader

This is the most interesting part of the feature. The first instinct when swapping the blur and main images might be to use a ternary operator to mount or unmount the appropriate image. But we actually can’t do that here. Why? Because both images need to remain mounted in the DOM to ensure the onLoad event works correctly for both the blur and main images. So instead of unmounting, we will use absolute positioning to place the main image above the blur image and toggle its opacity to show or hide it.

But there is more. Note that, together with the onLoad event, we have three possible states for the image’s src attribute (although the main image actually uses the srcset and sizes attributes). These are:

  1. An empty string '' when both blur and main images are still loading. In this case we will show an empty <div /> of the same size as the main image.
  2. The src attribute of the blur image, when the blur image is loaded but the main image is still loading.
  3. The srcset and sizes attributes of the main image, once the main image has fully loaded.

This is the code src/components/react/ImageBlurPreloader.tsx:

const initialAttributes: ImgTagAttributes = { src: '' } as const;

const ImageBlurPreloader: FC<Props> = ({
  blurAttributes = initialAttributes,
  mainAttributes = initialAttributes,
  onMainLoaded,
  className,
  divClassName,
}) => {
  const [isLoadingMain, setIsLoadingMain] = useState(true);
  const [isLoadingBlur, setIsLoadingBlur] = useState(true);

  const prevMainAttributes = usePrevious(mainAttributes);

  const isNewImage = !(
    prevMainAttributes?.src === mainAttributes.src &&
    prevMainAttributes.srcSet === mainAttributes.srcSet
  );

  // reset isLoading on main image change
  useEffect(() => {
    if (isNewImage) {
      setIsLoadingBlur(true);
      setIsLoadingMain(true);
    }
  }, [isNewImage, setIsLoadingMain, setIsLoadingBlur]);

  // important: main image must be in DOM for onLoad to work
  // unmount and display: none; will fail
  const handleLoadMain = () => {
    setIsLoadingMain(false);
    onMainLoaded?.();
  };

  const commonAttributes = {
    // blur image must use size from main image
    width: mainAttributes.width,
    height: mainAttributes.height,
  };

  const blurAlt = !isLoadingBlur ? blurAttributes.alt : '';
  const mainAlt = !isLoadingMain ? mainAttributes.alt : '';

  const hasImage = Boolean(
    isLoadingMain
      ? mainAttributes.src || mainAttributes.srcSet
      : blurAttributes.src || blurAttributes.srcSet
  );

  return (
    <div className={cn('relative size-full', divClassName)}>
      {hasImage && (
        <>
          {/* blur image */}
          <img
            {...blurAttributes}
            {...commonAttributes}
            alt={blurAlt}
            onLoad={() => setIsLoadingBlur(false)}
            className={cn('object-cover absolute top-0 left-0 size-full', className)}
          />

          {/* main image */}
          <img
            {...mainAttributes}
            {...commonAttributes}
            alt={mainAlt}
            onLoad={handleLoadMain}
            className={cn(
              'object-cover absolute top-0 left-0 size-full',
              // important: don't hide main image until next blur image is loaded
              isLoadingMain && !isLoadingBlur ? 'opacity-0' : 'opacity-100',
              className
            )}
          />
        </>
      )}
    </div>
  );
};

That is a lot of code, so let’s break it down. First, note the use of relative and absolute classes to position the images on top of each other.

We set the initial src to an empty string in the initialAttributes variable. This makes the hasImage flag false, unmounts the images, and displays an empty <div> that fills the parent container thanks to the size-full class (which is needed to prevent layout shift).

Next, note that we track the separate states isLoadingMain and isLoadingBlur for the main and blur images. Both are necessary so we can correctly show/hide the main image by changing its opacity from opacity-0 to opacity-100. The general idea is this: “Always keep the blur image below, just show or hide the main image above.”

Additionally, we track the previous main image, prevMainAttributes, to detect when a new image is selected via the onClick event passed from the parent component.

Finally, while an image is loading, we set its alt attribute (using the blurAlt and mainAlt variables) to an empty string to avoid rendering text in place of an empty image, as it doesn’t look nice.

Bonus tip: You can also experiment with the <img style={{imageRendering: 'pixelated'}} /> scaling style on the blur image if you find it more aesthetically pleasing.

Cumulative layout shift

This is also an interesting part. In general, the server always sends images with their sizes (at least it should), which makes handling layout shifts easier, so we should be able to solve it properly.

The key point is this: Set the component’s actual size in the server component ImageRandom.astro and use w-full h-full (size-full) in the client ImageRandom.tsx React component to stretch it to fill the parent. This way, the size is resolved on the server, and there is no shift when hydrating the client component.

Let's see it in practice, src/components/ImageRandom.astro#L21:

---
// add 'px' suffix or styles will fail
const { width, height } = Object.fromEntries(
  Object.entries(IMAGE_SIZES.FIXED.MDX_XL_16_9).map(([key, value]) => [key, `${value}px`])
);
---

{/* height and width MUST be defined ON SERVER component to prevent layout shift */}
{/* set height and width to image size but set real size with max-height and max-width */}

<div
  class={cn('max-w-full max-h-64 md:max-h-96 my-8', className)}
  style={{ width, height }}
  {...props}
>
  <ImageRandomReact {galleryImages} client:load />
</div>
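
To make the px-suffix mapping above concrete, here is what it produces for hypothetical size values (the real values come from IMAGE_SIZES.FIXED.MDX_XL_16_9):

```typescript
// hypothetical size constant; the real one is IMAGE_SIZES.FIXED.MDX_XL_16_9
const MDX_XL_16_9 = { width: 768, height: 432 };

// same transform as in the Astro frontmatter: append 'px' to every value
const styleProps = Object.fromEntries(
  Object.entries(MDX_XL_16_9).map(([key, value]) => [key, `${value}px`])
);

console.log(styleProps); // { width: '768px', height: '432px' }
```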

We use the max-w-... and max-h-... classes to set the actual (responsive) size for the server component, which the client component will fill.

The my-8 margin is there to override the vertical margin styles for the image component in the markdown (prose class). Remember, we have two actual, absolutely positioned <img /> tags in the DOM, so prose will add double margins, and we need to correct that.

Client component src/components/react/ImageBlurPreloader.tsx:

const ImageBlurPreloader: FC<Props> = ({
  // ...
  className,
  divClassName,
}) => {

  // ...

  return (
    <div className={cn('relative size-full', divClassName)}>
      {hasImage && (
        <>
          {/* blur image */}
          <img
            className={cn('object-cover absolute top-0 left-0 size-full', className)}
          />

          {/* main image */}
          <img
            className={cn(
              'object-cover absolute top-0 left-0 size-full',
            )}
          />
        </>
      )}
    </div>
  );
};

In the client component, we simply stretch all elements with size-full to fill the parent server component.

With this in place, we achieve the following score for the cumulative layout shift:

<Image {...IMAGE_SIZES.FIXED.MDX_MD} src={LayoutShiftImage} alt="Lighthouse score layout shift" />

Completed code and demo

The relevant files:

# https://github.com/nemanjam/nemanjam.github.io/tree/c1e105847d8e7b4ab4aaffad3078726c37f67528
git checkout c1e105847d8e7b4ab4aaffad3078726c37f67528

# random image code
src/pages/index.mdx
src/components/ImageRandom.astro
src/components/react/ImageRandom.tsx
src/components/react/ImageBlurPreloader.tsx

# common code with gallery
src/libs/gallery/images.ts
src/libs/gallery/transform.ts
src/constants/image.ts

Outro

Once again, we played around with images, Astro, and React. Have you implemented any similar components yourself, maybe a carousel? What was your approach? Do you have suggestions for improvements or have you spotted anything incorrect? Don’t hesitate to leave a comment below.

References

]]>
nemanja.mitic.elfak@hotmail.com (Nemanja Mitic)
<![CDATA[Build an image gallery with Astro and React]]> https://nemanjamitic.com/blog/2025-04-02-astro-react-gallery/ https://nemanjamitic.com/blog/2025-04-02-astro-react-gallery/ Wed, 02 Apr 2025 00:00:00 GMT import { Image } from 'astro:assets';

import { IMAGE_SIZES } from '../../../../constants/image';

import LayoutShiftBeforeImage from '../../../../content/post/2025/04-02-astro-react-gallery/_images/layout-shift-before.png';
import LayoutShiftAfterImage from '../../../../content/post/2025/04-02-astro-react-gallery/_images/layout-shift-after.png';

import OverviewVideo from '../../../../content/post/2025/04-02-astro-react-gallery/_images/overview.webm';
import ResponsiveImagesVideo from '../../../../content/post/2025/04-02-astro-react-gallery/_images/responsive-images-1.5x.webm';
import InfiniteScrollLoaderVideo from '../../../../content/post/2025/04-02-astro-react-gallery/_images/infinite-scroll-loader.webm';

Introduction

I wanted to have a simple, Instagram-like, scroll-paginated gallery page on the website where I could share my everyday photos. Initially, I implemented it using the benhowell/react-grid-gallery package for the gallery and frontend-collective/react-image-lightbox for the lightbox component. It worked OK, but since those are somewhat legacy packages, I was unable to upgrade to React 19; it also loaded all images at once without scroll pagination, and the Lighthouse score wasn't great.

You can see that implementation if you navigate back in Git history to commit e0165b:

# in git history navigate back to the old gallery commit
git checkout e0165b295db2ccc72bbbb7be4bdd7eb48f7dedae

# preview
yarn clean && yarn install && yarn dev

I decided to reimplement it. After some quick research, I chose to build my own gallery component and use the dimsemenov/photoswipe package for the lightbox. That’s how this article came about: while implementing the gallery, I took notes on the most important and interesting parts of the process. Don’t take this as necessarily the absolute best way to build an image gallery with Astro and React, but as one approach that is proven in practice and works well.

What we will be building

{/* overview.webm - https://github.com/user-attachments/assets/59744bb9-3c87-4e2c-9b10-1c830f9af554 */}
<video {...IMAGE_SIZES.FIXED.MDX_LG} controls>

Image - server component, client component, slot, props

This is the first dilemma, and the initial decision affects all the code we write afterwards. Since this is a static website, we are naturally inclined to pre-render everything we can at build time, but can this work for images too?

Astro provides the <Image /> component, which is a server component like any other Astro component. Clearly, we will need onLoad and onClick events on an image, and events aren’t possible on a server component. But maybe we can use a client component wrapper and pass the Astro <Image /> component as a slot, so we get the best of both worlds: the Astro component for image optimization and a <div /> for events. Could this work?

Not really. For any preload effect, the onLoad event needs to be on the <img /> tag itself. More importantly, we can’t pass any client props to the slotted <Image /> component; only a single instance can be generated at build time. For every combination of prop values we would need to pregenerate separate image HTML, which is highly impractical here.

Conclusion: We will use a React client component that supports interactivity and Astro getImage() function to optimize the images.

API route vs import.meta.glob()

We want to stick to a static website for performance reasons and convenient deployments. How should we pass the image URLs to the client? We could make a static API endpoint that serves a JSON array. We could even make a parametrized API endpoint that serves optimized images.

Right away: why have an extra HTTP call for JSON on the client when we can pregenerate the image URLs at build time? It’s not what we want.

For a static API endpoint, since it’s static, we would need to pre-render all params at build time. So we could have http://localhost/api/gallery/xl/image1.webp, but not http://localhost/api/gallery/300x200/image1.webp and http://localhost/api/gallery/301x200/image1.webp; for that, we would need to enable Astro server-side rendering and have a Node.js runtime in production.

If we log the src attribute of an imported image in dev and prod modes, we will see something like this:

// in dev
http://localhost:3000/_image?href=/@fs/home/username/Desktop/nemanjam.github.io/src/assets/images/all-images/morning1.jpg?origWidth=4608&origHeight=2592&origFormat=jpg&w=1280&h=720&f=webp

// in prod
http://localhost:3000/_astro/morning1.CEdGhKb3_nVk9T.webp

So Astro is already serving images for us. With a dedicated API endpoint we would only accomplish human-friendly url rewriting, which would be useful only if some external service fetched those images - and we don't have one here.
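To illustrate the limitation, here is a hypothetical sketch (file path and presets are assumptions, not the project's code) of what a prerendered endpoint would require: every size/image combination must be enumerated at build time via getStaticPaths(), so arbitrary dimensions can never be requested.

```typescript
// Hypothetical src/pages/api/gallery/[size]/[image].ts for a prerendered
// endpoint: only these enumerated combinations would exist as routes.
const sizes = ['xs', 'xl'];
const images = ['image1.webp', 'image2.webp'];

// Astro calls getStaticPaths() once at build time to learn every route
export const getStaticPaths = () =>
  sizes.flatMap((size) => images.map((image) => ({ params: { size, image } })));
```

Anything outside this enumerated set would be a 404 on a static host, which is exactly why the parametrized endpoint would force us into server-side rendering.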

Conclusion: We will use import.meta.glob('/src/assets/images/all-images/*.jpg') from Vite to import the images as modules at build time and pass them as props into the Gallery component.

The code is as follows src/libs/gallery/images.ts#L16:


export const getGalleryImagesMetadata = (): ImageMetadata[] => {
  const imageModules = import.meta.glob<{ default: ImageMetadata }>(
    // can't be a variable
    '/src/assets/images/all-images/*.jpg',
    { eager: true }
  );

  // convert map to array
  const imagesMetadata = Object.keys(imageModules)
    // filter excluded filenames
    .filter((path) => !EXCLUDE_IMAGES.some((excludedFileName) => path.endsWith(excludedFileName)))
    // return metadata array
    .map((path) => imageModules[path].default);

  return imagesMetadata;
};

Code structure

We will structure the code like this: MDX (gallery.mdx) -> Astro component (Gallery.astro) -> React component (Gallery.tsx). The call stack is top-down: MDX is a declarative presentation layer, the Astro component resolves the data (images), and the React component handles events and defines the logic - it's the most complex layer.

Code (paraphrased):

// src/pages/gallery.mdx

<Gallery class="not-prose grow" />

// src/components/Gallery.astro

<ReactGallery client:only="react" images={randomizedGalleryImages} />

// src/components/react/Gallery.tsx

<div className="grid grid-cols-1 gap-1 sm:grid-cols-2 lg:grid-cols-3">
  {loadedImages.map((image) => (
    <img {...imageProps} />
  ))}
</div>

Static generation, include image urls and map() on the client

Again, an interesting and important point that is easy to forget: images.map() needs to live in the React component in order to have infinite scroll pagination. For that, all image urls (and other props) need to be bundled and available on the client, which is why they are passed as props from Astro to the React component.

If we placed images.map() in the Astro component, we would have a single static image list without any interactivity (pagination on scroll).

Reminder: the static “backend” runs only once - at build time. We have a Node.js runtime only in development, not in production - there we have just a webserver serving assets from a static folder. Kind of obvious, but it can sometimes be overlooked when deciding whether to put certain code in a server or client component.

Responsive, optimized images - getImage() and <img srcset sizes />

Astro provides the getImage() function, which we will use to optimize images and generate <img /> tag attributes for the client. It accepts the same arguments as the <Image /> component. Note that the <img /> tag supports the srcset and sizes attributes for responsive images, which is sufficient for our use case. This time we don't need <picture /> support for different images (art direction) or different formats.

We will prepare different image presets (sizes) in src/libs/gallery/transform.ts#L7:

Note that only the thumbnail uses a responsive image; the lightbox uses a fixed-size image, since the Photoswipe lightbox doesn't support responsive images (at least not without a custom component).


// common props
const defaultAstroImageOptions = {
  format: 'webp',
};

// thumbnail preset
export const thumbnailImageOptions = {
  ...IMAGE_SIZES.RESPONSIVE.GALLERY_THUMBNAIL,
};

// lightbox preset
export const lightboxImageOptions = {
  ...IMAGE_SIZES.FIXED.MDX_2XL_16_9,
};

// getImage() wrapper
export const getCustomImage = async (options: UnresolvedImageTransform): Promise<GetImageResult> =>
  getImage({
    ...defaultAstroImageOptions,
    ...options,
  });

After that, we use getCustomImage() to optimize the gallery images we previously loaded with import.meta.glob(), in src/libs/gallery/images.ts#L50:


export const getGalleryImages = async (): Promise<GalleryImage[]> => {
  const thumbnails = await getCustomImages(thumbnailImageOptions);
  const lightBoxes = await getCustomImages(lightboxImageOptions);

  const galleryImages = mergeArrays(thumbnails, lightBoxes).map(([thumbnail, lightbox]) => ({
    thumbnail: imageResultToImageAttributes(thumbnail),
    lightbox: imageResultToImageAttributes(lightbox),
  }));

  return galleryImages;
};

// select only needed attributes for the <img /> tag
export const imageResultToImageAttributes = (imageResult: GetImageResult): ImgTagAttributes => ({
  src: imageResult.src,
  srcSet: imageResult.srcSet?.attribute,
  ...imageResult.attributes,
});

Now we have the <img /> attributes (props) ready to pass into the React gallery client component.

An interesting part is configuring the <img /> sizes attribute (the sizes and widths args in getImage()) for responsive images in src/constants/image.ts#L86:


GALLERY_THUMBNAIL: {
  widths: [TW_SCREENS.XS, TW_SCREENS.SM],
  sizes: `(max-width: ${TW_SCREENS.SM}px) ${TW_SCREENS.SM}px, ${TW_SCREENS.XS}px`,
},

// actual <img /> tag attributes that are generated with the GALLERY_THUMBNAIL
<img 
  sizes="(max-width: 640px) 640px, 475px" 
  srcset="
    /_astro/river16.CcFOUvED_Z2d5kbP.webp 475w, 
    /_astro/river16.CcFOUvED_Z16pb6L.webp 640w
  " 
  src="/_astro/river16.CcFOUvED_Z1Dswo2.webp"
  width="4000" 
  height="2252" 
/>

// src/components/react/Gallery.tsx

<div
  id={GALLERY_ID}
  className="pswp-gallery grid grid-cols-1 gap-1 sm:grid-cols-2 lg:grid-cols-3"
>
...
</div>

If you are not familiar with defining responsive images, it's not as complicated as it seems. The code above basically says: below the SM screen breakpoint (640px), use the SM-width (640px) image, and if the screen is wider than SM, use the smaller XS (475px) image. It may be unexpected to use a smaller image for a larger screen, but it makes sense when you look at the responsive grid used for the gallery layout.

You can see in the grid classes that below the sm: breakpoint an image uses the full width of the layout, above sm: there are 2 images per row, and above lg: 3 images per row - so it makes sense to use the larger image on smaller screens.
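The breakpoint-to-sizes mapping can be expressed as a tiny helper. This is an illustrative sketch assuming the Tailwind pixel values used above (TW_SCREENS.XS = 475, TW_SCREENS.SM = 640):

```typescript
// assumed breakpoint pixel values, mirroring the article's TW_SCREENS constants
const TW_SCREENS = { XS: 475, SM: 640 };

// builds the sizes attribute for the GALLERY_THUMBNAIL preset:
// full-width (SM) image below the SM breakpoint, smaller XS image above it
export const thumbnailSizes = (): string =>
  `(max-width: ${TW_SCREENS.SM}px) ${TW_SCREENS.SM}px, ${TW_SCREENS.XS}px`;
```

This produces exactly the sizes="(max-width: 640px) 640px, 475px" attribute shown in the generated <img /> tag above.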

While configuring responsive images, it's advisable to preview what is generated in the browser and ensure the result meets expectations: sharp images at all resolutions, without overly large image files.

{/* responsive-images-1.5x.webm - https://github.com/user-attachments/assets/48d00b22-feac-40d4-a98b-2e840824fc7f */} <video {...IMAGE_SIZES.FIXED.MDX_LG} controls>

Blur preloader, CSS transition

Photoswipe will handle the large lightbox image on its own; we won't interfere with it for now. But we can add a nice effect to the thumbnail images on infinite scroll. They are already small enough to load fast, so there is no need for a lower-resolution blur preloader image - we can achieve the same effect with a simple CSS transition.

The following code does that src/components/react/Gallery.tsx#L132


const [loadedImages, setLoadedImages] = useState<GalleryImage[]>([]);

const isLoadingPageImages = useMemo(
  () => !Object.values(loadedStates).every(Boolean),
  [loadedStates, loadedImages.length]
);

useEffect(() => {
  const callback: IntersectionObserverCallback = (entries) => {
    // must wait here for images to load
    if (!isEnd && !isLoadingPageImages && entries[0].isIntersecting) {
      setPage((prevPage) => prevPage + 1);
    }
  };
  
  // ...
  
  // page dependency is important for initial load to work for all resolutions
}, [observerTarget, page, isEnd, isLoadingPageImages]);


const handleLoad = (src: string) => {
  setLoadedStates((prev) => ({ ...prev, [src]: true }));
};

{loadedImages.map((image) => (
// ...
    <img
      {...image.thumbnail}
      onLoad={() => handleLoad(image.thumbnail.src)}
      alt={loadedStates[image.thumbnail.src] ? 'Gallery image' : ''}
      className={cn(
        'w-full transition-all duration-[2s] ease-in-out',
        loadedStates[image.thumbnail.src]
          ? 'opacity-100 blur-0 grayscale-0'
          : 'opacity-75 blur-sm grayscale'
      )}
    />
))}

Note that we have a map() call here and we are storing loading states for an array of images. This is because we want a smooth transition for the entire new page of images, not for each image separately - they would load in random order, which is less aesthetic. The important part is the isLoadingPageImages variable: it blocks loading a new page until all images from the previous page have loaded. This happens in the observer callback condition if (!isEnd && !isLoadingPageImages && entries[0].isIntersecting).

The other part is the CSS transition: duration-[...] should be picked to last longer than the actual thumbnail loading time. For the transition effect, you can play around with opacity and Tailwind's filter classes and see what looks nicest to you.
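The page-level loading flag can be understood as a pure function over the loadedStates map: the page is still loading until every tracked src has fired onLoad. A minimal sketch of that logic (the component derives it with useMemo, but the computation itself is just this):

```typescript
// true while at least one image on the current page hasn't fired onLoad yet;
// an empty map means nothing is pending, so the page counts as loaded
export const isPageStillLoading = (loadedStates: Record<string, boolean>): boolean =>
  !Object.values(loadedStates).every(Boolean);
```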

Infinite scroll

We want to implement pagination through infinite scroll, like e.g. Instagram. Obviously, for this, Gallery needs to be a client component, and we will use IntersectionObserver to detect the bottom of the gallery and trigger loading a new page of images. For the observer we could use ready-made hooks from utility libraries like uidotdev/usehooks or streamich/react-use, but let's go with our own custom implementation this time.

The code for this is in src/components/react/Gallery.tsx#L76:


// sets only page
useEffect(() => {
  const callback: IntersectionObserverCallback = (entries) => {
    // must wait here for images to load
    if (!isEnd && !isLoadingPageImages && entries[0].isIntersecting) {
      setPage((prevPage) => prevPage + 1);
    }
  };
  const debouncedCallback = debounce(callback, OBSERVER_DEBOUNCE);
  const options: IntersectionObserverInit = { threshold: 1 };

  const observer = new IntersectionObserver(debouncedCallback, options);

  const observerRef = observerTarget.current;
  if (observerRef) observer.observe(observerRef);

  return () => {
    if (observerRef) observer.unobserve(observerRef);
  };
  // page dependency is important for initial load to work for all resolutions
}, [observerTarget, page, isEnd, isLoadingPageImages]);

There are 3 important parts in this code:

  1. We need to include the page state variable in the useEffect dependencies array because we want the effect to re-run every time a new page of images loads and the height of the gallery increases. Also note that the effect itself never reads page directly - the setter uses the callback form setPage((prevPage) => prevPage + 1) - which is exactly why page must be listed explicitly in the dependencies array to force the re-run.
  2. We need to be precise about when we load a new page of images. Note the condition if (!isEnd && !isLoadingPageImages && entries[0].isIntersecting); it practically means “load a new page of images whenever 1. we haven't loaded all images yet, AND 2. the previous page of images is fully loaded (for aesthetics), AND 3. the gallery is scrolled to the bottom (the main prerequisite)”.
  3. The observer callback() triggers quite often, so we need to limit the frequency by debouncing. Note that the OBSERVER_DEBOUNCE constant needs to be fine-tuned and validated through practical trial and error.
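The debounce utility itself isn't shown above (its import is elided). For reference, a minimal trailing-edge debounce looks like this - a sketch, not the project's actual helper:

```typescript
// delays fn until `ms` of silence; each new call resets the timer,
// so a rapid burst of observer callbacks collapses into a single invocation
export const debounce = <A extends unknown[]>(fn: (...args: A) => void, ms: number) => {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A): void => {
    clearTimeout(timer);
    timer = setTimeout(() => fn(...args), ms);
  };
};
```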

Another important and interesting part is detecting the bottom of the page and displaying the loader UI:


{/* control threshold with margin-top */}
{/* must be on top so loader doesn't affect it */}
<div ref={observerTarget} className="mt-0" />

<div
  className={cn(
    // duration-500 is related to OBSERVER_DEBOUNCE: 300
    'flex items-center justify-center transition-all duration-500 ease-in-out',
    shouldShowLoader ? 'min-h-48' : 'min-h-0'
  )}
>
  {shouldShowLoader && <PiSpinnerGapBold className="size-10 sm:size-12 animate-spin mt-4" />}
</div>

This can be tricky because they are circularly dependent: detection triggers showing the loader, and displaying the loader affects the position of the detection <div ref={observerTarget}/>. Another thing to note is that the detection div has zero height and can be placed either above or below the loader. It is important that it sits above the loader, because we are interested in the bottom of the images, not the loader, which will disappear from the UI in a few milliseconds anyway.

Another important part is controlling and fine-tuning the threshold of the observed element <div ref={observerTarget}/>. We do this by adjusting the positioning with className="mt-0", controlling the observer's callback execution frequency with OBSERVER_DEBOUNCE, setting the transition timing for the loader element with duration-500, specifying how many images we load (the number of rows in the gallery) with the pageSize constant, and how many pages of images we load initially on the first screen with the initialPage constant.

All of these parameters are connected, and you need to fine-tune them together for a smooth infinite scroll experience. Also note that the pageSize and initialPage constants are responsive and need to be defined for each breakpoint independently for full and ergonomic control.

You can see that in the constants file in src/constants/gallery.ts#L7


export const GALLERY = {
  GALLERY_ID: 'my-gallery',
  // Todo: make it responsive
  /** step. */
  PAGE_SIZE: {
    XS: 1,
    SM: 2,
    LG: 3,
  },
  /** page dependency in useEffect is more important. To load first screen quickly, set to 3 pages */
  INITIAL_PAGE: {
    XS: 3,
    SM: 3,
    LG: 3,
  },
  /** fine tuned for scroll */
  OBSERVER_DEBOUNCE: 300,
} as const;

And the mappings that translate the constants into usable pageSize and initialPage values are defined in utility functions in src/utils/gallery.ts#L8:


const { PAGE_SIZE, INITIAL_PAGE } = GALLERY;

// related to gallery grid css
const breakpointToPageKey = {
  XXS: 'XS',
  XS: 'XS',
  SM: 'SM',
  MD: 'SM',
  LG: 'LG',
  XL: 'LG',
  _2XL: 'LG',
} as const;

const defaultPageKey = 'LG' as const;

export const getPageSize = (breakpoint: Breakpoint): number => {
  const key = breakpointToPageKey[breakpoint] ?? defaultPageKey;
  const pageSize = PAGE_SIZE[key];

  return pageSize;
};

export const getInitialPage = (breakpoint: Breakpoint): number => {
  const key = breakpointToPageKey[breakpoint] ?? defaultPageKey;
  const initialPage = INITIAL_PAGE[key];

  return initialPage;
};

With this, we have a smooth scrolling experience on all screen sizes:

{/* infinite-scroll-loader-1.5x.webm - https://github.com/user-attachments/assets/c4deb895-5b48-43d6-ae4e-aa34518fc317 */} <video {...IMAGE_SIZES.FIXED.MDX_LG} controls>

Also pay attention to how we “fetch” a new page of images:


const fetchImagesUpToPage = (
  images: GalleryImage[],
  pageSize: number,
  nextPage: number
): GalleryImage[] => {
  const endIndex = nextPage * pageSize;
  const isLastPage = endIndex >= images.length;

  // for fetchPageImages pagination startIndex must use loadedImages and not all images and page
  const selectedImages = images.slice(0, endIndex);

  // load all images for last page
  return !isLastPage ? sliceToModN(selectedImages, pageSize) : selectedImages;
};

// converts page to loaded images
useEffect(() => {
  const upToPageImages = fetchImagesUpToPage(images, pageSize, page);
  setLoadedImages(upToPageImages);
}, [page, images, pageSize]);

There are 2 important moments here:

  1. Since we have a static website, all image urls are already included and available on the client, so we don't need to calculate the starting index and can simply slice from zero: images.slice(0, endIndex). Usually pagination implies network and database calls that require both startIndex and endIndex; if we went that path, we would need to calculate startIndex by finding the last element of the loadedImages state array within the images array and pass both as arguments.
  2. Since the pageSize constant is responsive, it can change when e.g. the user resizes the browser window, so we call sliceToModN(selectedImages, pageSize) to keep rows evenly filled. Note that we don't call this for the last page because, eventually, we want to load all images, and the correct loadedImages array length is important for calculating the isEnd variable.
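sliceToModN() isn't shown above; its job is to trim the loaded array to a multiple of pageSize so every rendered row is complete. A plausible implementation (assumed, not the repository's actual code):

```typescript
// keep only the largest prefix whose length is a multiple of n,
// so a 3-column grid never ends with a partially filled row
export const sliceToModN = <T>(arr: T[], n: number): T[] =>
  arr.slice(0, Math.floor(arr.length / n) * n);
```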

Cumulative layout shift

Layout shift is an important Web Vitals metric, and it's more challenging to optimize here since we are dealing with dynamic client components. In the Gallery component we handle this by setting the initialPage constant to load enough images to fill the initial gallery screen.


export const GALLERY = {
  // ...
  PAGE_SIZE: {
    XS: 1,
    SM: 2,
    LG: 3,
  },
  INITIAL_PAGE: {
    XS: 3,
    SM: 3,
    LG: 3,
  },
  // ...
} as const;
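With these constants, the number of thumbnails rendered before any scrolling is simply initialPage × pageSize per breakpoint - e.g. 3 × 3 = 9 images on LG screens, enough to fill the first screen of the three-column grid. A small sketch mirroring the constants above:

```typescript
// assumed: mirrors the PAGE_SIZE / INITIAL_PAGE constants above
const PAGE_SIZE = { XS: 1, SM: 2, LG: 3 } as const;
const INITIAL_PAGE = { XS: 3, SM: 3, LG: 3 } as const;

// thumbnails rendered on first paint for a given breakpoint key
export const initialImageCount = (bp: keyof typeof PAGE_SIZE): number =>
  PAGE_SIZE[bp] * INITIAL_PAGE[bp];
```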

Another optimization is to stretch the empty gallery container element with flex grow. For that we need to modify the Page layout and pass the required Tailwind classes via the MDX frontmatter and the articleClass prop.

You can see that in src/layouts/Page.astro#L38:

---

import Centered from '@/layouts/Centered.astro';
import { getOpenGraphImagePath } from '@/libs/api/open-graph/image-path';
import { cn } from '@/utils/styles';

export interface Content {
  // ...
  class?: string;
  /** for flex flex-grow min-height to prevent layout shift for client components */
  articleClass?: string;
}

// ...

const { title, description, class: className, articleClass } = content;

// ...

---

<Centered {metadata} class={cn(className)}>
  {/* in general must not have flex, it will disable margin collapsing in MDX */}
  <article class={cn('my-prose', articleClass)}>
    <slot />
  </article>
</Centered>

The flex class is passed from the MDX frontmatter in src/pages/gallery.mdx#L7:


---
layout: '../layouts/Page.astro'
...
class: 'max-w-5xl'
articleClass: 'grow flex flex-col'
---

import Gallery from '../components/Gallery.astro';

# Gallery

<Gallery class="not-prose grow" />

This will reduce the shifting of DOM element sizes. It won't be as perfect as a fully static page, but for our use case it's good enough.

Another point: a flex container disables margin collapsing, which is important for proper vertical spacing in MDX-generated HTML. So if you do this, you will need to add an additional <div> wrapper element without flex to re-enable margin collapsing.

Lighthouse score, old gallery:

<Image {...IMAGE_SIZES.FIXED.MDX_MD} src={LayoutShiftBeforeImage} alt="Old gallery Lighthouse score" />

Lighthouse score, new gallery:

<Image {...IMAGE_SIZES.FIXED.MDX_MD} src={LayoutShiftAfterImage} alt="New gallery Lighthouse score" />

Please ignore the “Accessibility” score above, since accessibility attributes haven't yet been tackled across the entire website.

Photoswipe lightbox

For previewing images in a full-screen lightbox we will use the ready-made library Photoswipe, which looks solid, reliable, and flexible. We will use a basic React example from its documentation.

This is the code src/components/react/Gallery.tsx#L98


// lightbox
useEffect(() => {
  let lightbox: PhotoSwipeLightbox | null = new PhotoSwipeLightbox({
    gallery: '#' + GALLERY_ID,
    children: 'a',
    pswpModule: () => import('photoswipe'),
  });
  lightbox.init();

  return () => {
    lightbox?.destroy();
    lightbox = null;
  };
}, []);

return (
  <>
    <div
      id={GALLERY_ID}
      className="pswp-gallery grid grid-cols-1 gap-1 sm:grid-cols-2 lg:grid-cols-3"
    >
      {loadedImages.map((image) => (
        <a
          key={`${GALLERY_ID}--${image.lightbox.src}`}
          // lightbox doesn't support responsive image
          href={image.lightbox.src}
          data-pswp-width={image.lightbox.width}
          data-pswp-height={image.lightbox.height}
          target="_blank"
          rel="noreferrer"
        >
          <img
              {...image.thumbnail}
            // ...
          />
        </a>
      ))}
    </div>
  {/* ... */}
  </>
);

Note that for simplicity's sake we are using a simple fixed image, and Photoswipe implements the scale transition on its own. By default it uses a simple link <a href={image.lightbox.src}> to load the <img src /> in the full-page lightbox.

This is a tradeoff for simplicity. Loading a responsive image with srcset would require integrating a custom component, which could be a topic for another article. Another possible improvement would be to enable closing the lightbox on backdrop click on mobile, which is not the case with the default config.

The lightbox image size is defined in src/libs/gallery/transform.ts#L24:


export const lightboxImageOptions = {
  ...IMAGE_SIZES.FIXED.MDX_2XL_16_9,
};

// src/constants/image.ts

export const IMAGE_SIZES = {
  FIXED: {
    // ...
    MDX_2XL_16_9: { width: TW_SCREENS._2XL, height: TW_SCREENS.HEIGHTS._2XL },
  },
  // ...
};

Completed code and demo

The relevant files:

# new gallery https://github.com/nemanjam/nemanjam.github.io/tree/c1e105847d8e7b4ab4aaffad3078726c37f67528
git checkout c1e105847d8e7b4ab4aaffad3078726c37f67528

src/pages/gallery.mdx
src/components/Gallery.astro
src/components/react/Gallery.tsx
src/libs/gallery/images.ts
src/libs/gallery/transform.ts
src/utils/gallery.ts
src/constants/gallery.ts
src/constants/image.ts
src/components/react/hooks/useScrollDown.tsx
src/components/react/hooks/useWidth.tsx

# old gallery https://github.com/nemanjam/nemanjam.github.io/tree/e0165b295db2ccc72bbbb7be4bdd7eb48f7dedae
git checkout e0165b295db2ccc72bbbb7be4bdd7eb48f7dedae

Outro

That was a pretty long read - thank you for your attention and dedication. Have you implemented an Astro image gallery yourself and used a different approach? Do you have suggestions for improvements, or did you spot anything incorrect? Don't hesitate to leave a comment below.
