MemoGraph is an AI-driven photo management pipeline designed to automatically organize, analyze, and enrich your photo collections, turning them into structured and searchable memories.
It processes a folder of images, extracts metadata, generates descriptive captions, detects faces and species, resolves locations, and creates insightful reports like a daily blog summary and an interactive map.
- EXIF Extraction: Scans images and extracts metadata like date, time, and GPS coordinates.
- Automated Tagging: Uses AI to generate tags for objects, scenes, and even specific species.
- AI Captioning: Generates human-like captions for your photos.
- Face Detection: Identifies photos that contain people.
- Location Resolution: Converts GPS data into human-readable addresses.
- Trip Organization: Automatically groups photos by day.
- Report Generation: Creates a Markdown blog and a JSON summary of your trip.
- Interactive Map: Generates an HTML map plotting your geotagged photos.
Follow these steps to set up your local environment.
- Python 3.12.3 (exact version used during development)
- CMake: Required for one of the Python dependencies (`dlib`).

  Linux:

      sudo apt-get update
      sudo apt-get install cmake

  Windows:

  You can install CMake on Windows using a package manager like Chocolatey or by downloading the installer from the official website.

  - Using Chocolatey:

        choco install cmake

  - Manual Installation:
    - Download the latest installer from the CMake website.
    - Run the installer and make sure to select the option "Add CMake to the system PATH for all users" or "Add CMake to the system PATH for the current user".

It's highly recommended to use a virtual environment to manage dependencies.
Linux/macOS:

    python3 -m venv .venv
    source .venv/bin/activate

Windows:

    python -m venv .venv
    .\.venv\Scripts\activate

Install all the required packages from `requirements.txt`.

    pip install -r requirements.txt

Note: The exact versions of the packages used during development are listed in `requirements.txt` to ensure compatibility.
### dlib GPU Support (Postponed)

Attempts to enable dlib GPU support encountered runtime issues. For now, dlib will run on CPU. GPU support will be addressed in future development.

The main pipeline is executed through the run_all.py script.
- Place your photos in a directory (e.g., `data/trips/my_awesome_trip`).
- Run the pipeline (sequential mode by default):

      python run_all.py data/trips/my_awesome_trip

- Optional: enable internal parallelism for heavy steps by setting an environment variable:

      # Linux/macOS
      export MEMOGRAPH_PARALLEL_EXECUTION=true
      python run_all.py data/trips/my_awesome_trip

      # Windows (PowerShell)
      $env:MEMOGRAPH_PARALLEL_EXECUTION = "true"
      python run_all.py data/trips/my_awesome_trip

  At the top level, steps like Faces → Labels → Captions → AI Captions → Species still run one after another to avoid race conditions on `labels.csv`, but each step is free to use threads / processes internally.
- Reset and rerun a trip (clean MemoGraph):

      # Remove <trip>/MemoGraph and then run the full pipeline
      python run_all.py data/trips/my_awesome_trip --reset

  To only clean the existing MemoGraph without starting the pipeline, use:

      python run_all.py data/trips/my_awesome_trip --reset-only

- Check the output: All generated files (CSV, logs, blog, map) will be placed in a `MemoGraph` folder inside your trip directory.
- Web app & context (auto): `run_all.py` now also writes `blog_context.json` and a static gallery at `MemoGraph/webapp/index.html`, generating JPEG thumbnails in `MemoGraph/thumbnails` so the UI loads quickly even on large trips.

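As a rough illustration of how a step might honour the `MEMOGRAPH_PARALLEL_EXECUTION` switch described above, here is a minimal sketch. The helper names `parallel_enabled` and `run_step` are hypothetical, and accepting values other than `true` is an assumption; the real scripts may read the flag differently.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def parallel_enabled(env=None):
    """Return True when MEMOGRAPH_PARALLEL_EXECUTION holds a truthy string."""
    env = os.environ if env is None else env
    value = env.get("MEMOGRAPH_PARALLEL_EXECUTION", "")
    # Assumption: "1", "yes" and "on" are treated like "true".
    return value.strip().lower() in {"1", "true", "yes", "on"}


def run_step(items, work, env=None, max_workers=4):
    """Apply `work` to every item, using threads only when the flag is on."""
    if not parallel_enabled(env):
        return [work(item) for item in items]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with `items`.
        return list(pool.map(work, items))
```

Keeping the sequential path as the default mirrors the behaviour documented above: parallelism is opt-in and confined to the inside of a single step.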
MemoGraph's run_all.py runs the following steps sequentially (each step calls
the script named in parentheses):
- Scan + EXIF ingest (`image_scanner.py`)
- Day assignment (`trip_day_assigner.py`)
- GPS resolution + early map preview (`location_resolver.py`)
- Faces (`face_detector.py`, optional face recognition via `face_recognizer.py`)
- Image labels (CLIP) (`image_labeler.py`)
- BLIP captions (`caption_filler.py`)
- BLIP AI captions (`generate_ai_captions.py`)
- Species tags + bounding boxes (OWLv2 + BioCLIP 2, with CLIP fallback) (`species_detector.py`)
- Image type classification (CLIP prompts) (`image_type_detector.py`)
- Image quality scoring (histogram/exposure/sharpness/noise heuristics) (`image_quality.py`)
- Color palette extraction (`image_colors.py`)
- Blog + summary (`blog_generator.py`)
- Final map + overview page (`map_visualizer.py`)
- `blog_context.json` builder (`build_blog_context.py`)
- Static Leaflet gallery/map web app + thumbnails (`build_webapp.py`)

Every run writes a complete MemoGraph folder containing labels.csv, blog.md,
trip_summary.json, trip_map.html, trip_overview.html, blog_context.json,
webapp/index.html, per-step logs, and JPEG thumbnails under MemoGraph/thumbnails.
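The sequential ordering described above can be pictured as a simple loop over named steps. This is only a sketch: `run_pipeline` is a hypothetical helper, the steps are plain callables rather than the real scripts, and stopping at the first failure is an assumption about how `run_all.py` behaves.

```python
import time


def run_pipeline(steps, trip_dir):
    """Run (name, callable) steps in order and record status and duration."""
    results = []
    for name, step in steps:
        started = time.perf_counter()
        try:
            step(trip_dir)
            status = "ok"
        except Exception as exc:  # record the failure, then stop
            status = f"failed: {exc}"
        elapsed = time.perf_counter() - started
        results.append((name, status, round(elapsed, 3)))
        if status != "ok":
            # Assumption: later steps depend on earlier ones, so we stop here.
            break
    return results
```

Because every step reads and writes the same `labels.csv`, running them one at a time avoids concurrent writes without any locking.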
| Step | Script | GPU | CPU | Network | Parallelizable |
|---|---|---|---|---|---|
| 1 | `image_scanner.py` | ❌ | ✅ | ❌ | No |
| 2 | `trip_day_assigner.py` | ❌ | ✅ | ❌ | No |
| 3 | `location_resolver.py` | ❌ | ✅ | ✅ | No (rate-limited) |
| 4 | `map_visualizer.py` (preview) | ❌ | ✅ | ❌ | No |
| 5 | `face_detector.py` | ✅ preferred | ✅ fallback | ❌ | Yes (ProcessPool) |
| 6 | `image_labeler.py` (CLIP) | ✅ required | ❌ | ❌ | No |
| 7 | `caption_filler.py` (BLIP) | ✅ required | ❌ | ❌ | Yes (ThreadPool) |
| 8 | `generate_ai_captions.py` (BLIP) | ✅ required | ❌ | ❌ | No |
| 9 | `species_detector.py` (OWLv2+BioCLIP) | ✅ required | ❌ | ❌ | No |
| 10 | `image_type_detector.py` (CLIP) | ✅ required | ❌ | ❌ | No |
| 11 | `image_quality.py` | ❌ | ✅ | ❌ | No |
| 12 | `image_colors.py` | ❌ | ✅ | ❌ | No |
| 13 | `blog_generator.py` | ❌ | ✅ | ❌ | No |
| 14 | `map_visualizer.py` (final) | ❌ | ✅ | ❌ | No |
| 15 | `build_blog_context.py` | ❌* | ✅ | ❌ | No |
| 16 | `build_webapp.py` | ❌ | ✅ | ❌ | No |

*Optional GPU if YOLO/Places365 extras enabled via --include-extras
| Script | Config Variable | Default | Purpose |
|---|---|---|---|
| `face_detector.py` | `MAX_IMAGE_SIZE` | 512px | Face detection (CNN/HOG) |
| `image_labeler.py` | `MAX_IMAGE_SIZE` | 512px | CLIP object/scene detection |
| `caption_filler.py` | `MAX_IMAGE_SIZE` | 512px | BLIP captioning |
| `generate_ai_captions.py` | `MAX_IMAGE_SIZE` | 512px | BLIP AI captions |
| `species_detector.py` | `MAX_IMAGE_SIZE` | 512px | CLIP species detection |
| `image_type_detector.py` | `MAX_IMAGE_SIZE` | 512px | CLIP image classification |
| `image_quality.py` | `QUALITY_MAX_SIZE` | 512px | Histogram/quality analysis |
| `image_colors.py` | Hardcoded | 150px | Color quantization |
| `build_webapp.py` | `THUMBNAIL_MAX_SIZE` | 320px | Web gallery thumbnails |

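To make the size limits in the table concrete, the sketch below computes the dimensions an image would have after being shrunk to fit a limit such as `MAX_IMAGE_SIZE`. It assumes the limit applies to the longest side and that smaller images are never upscaled; `fit_within` is a hypothetical helper, not a function from the repository.

```python
def fit_within(width, height, max_size=512):
    """Scale (width, height) so the longest side is at most max_size."""
    longest = max(width, height)
    if longest <= max_size:
        return width, height  # already small enough; never upscale
    scale = max_size / longest
    # Keep the aspect ratio and guard against a zero-pixel side.
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(4000, 3000))       # (512, 384)
print(fit_within(300, 200))         # (300, 200)
print(fit_within(1024, 1024, 320))  # (320, 320)
```

A 12-megapixel photo shrinks to roughly 0.2 megapixels at 512px, which is why lowering the limit speeds up the model steps so sharply.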
image_scanner → trip_day_assigner → location_resolver → map_preview
↓
┌─────────────────────────────────────────┐
│ ANALYSIS SUITE (sequential) │
│ │
│ GPU Tasks: CPU Tasks: │
│ ├─ face_detector ├─ image_quality │
│ ├─ image_labeler └─ image_colors │
│ ├─ caption_filler │
│ ├─ generate_ai_captions │
│ ├─ species_detector │
│ └─ image_type_detector │
└─────────────────────────────────────────┘
↓
blog_generator → map_final → build_blog_context → build_webapp
↓
build_trip_index
| Column | Written By | Read By |
|---|---|---|
| `image_name`, `local_path`, `md5sum` | image_scanner | All scripts |
| `datetime_original`, `device_model` | image_scanner | day_assigner, blog, webapp |
| `gps_lat`, `gps_lon` | image_scanner, location_resolver | map_visualizer, webapp |
| `location_inferred` | location_resolver | blog_generator, webapp |
| `day_number` | trip_day_assigner | blog_generator, webapp |
| `faces_detected`, `faces_count` | face_detector | blog_context, webapp |
| `face_locations` | face_detector | blog_context, webapp (bounding box overlays) |
| `detected_objects` | image_labeler | species_detector, blog, webapp |
| `caption`, `caption_samples` | caption_filler | blog_generator, webapp |
| `caption_ai` | generate_ai_captions | blog_generator, webapp |
| `species_tags` | species_detector (OWLv2+BioCLIP or CLIP fallback) | blog, webapp |
| `species_boxes` | species_detector (OWLv2 bounding boxes) | blog_context, webapp (bounding box overlays) |
| `image_type` | image_type_detector | species_detector, webapp |
| `quality_score`, `exposure_score`, etc. | image_quality | webapp |
| `color_palette` | image_colors | webapp |

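Since every step communicates through columns of `labels.csv`, downstream tools can consume it with the standard `csv` module. The sketch below filters rows that have a non-empty `species_tags` value; the sample data is made up and `rows_with_species` is a hypothetical helper.

```python
import csv
import io


def rows_with_species(csv_file):
    """Yield (image_name, species_tags) for rows whose species_tags is set."""
    for row in csv.DictReader(csv_file):
        tags = (row.get("species_tags") or "").strip()
        if tags:
            yield row["image_name"], tags


# Tiny in-memory stand-in for MemoGraph/labels.csv (values are invented).
sample = io.StringIO(
    "image_name,day_number,species_tags\n"
    "IMG_0001.jpg,1,\n"
    "IMG_0002.jpg,1,Red-vented Bulbul\n"
)
print(list(rows_with_species(sample)))
```

For a real trip, pass an open file handle for `<trip>/MemoGraph/labels.csv` instead of the in-memory sample.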
MemoGraph supports internal parallelism for computationally intensive steps (face detection, image labeling, caption generation, species detection).
- When `MEMOGRAPH_PARALLEL_EXECUTION=true`, scripts like `face_detector.py` and `caption_filler.py` use multiple cores/threads where appropriate, but high-level steps are sequenced to keep CSV writes safe.
- The pipeline monitors CPU, RAM, and GPU usage in the main process and its children, logging per-step snapshots to `data/trips/<trip_folder>/MemoGraph/logs/resource_usage.csv`.
- Why CPU% can exceed 100%: the reported CPU value is the sum of CPU usage across all cores for the process tree. For example, ~600% means roughly 6 cores were busy at that sampling point.
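
The arithmetic behind the summed CPU figure can be sketched as follows. The per-process readings are invented for illustration and both helper names are hypothetical; the actual monitor may sample and aggregate differently.

```python
def tree_cpu_percent(per_process_percent):
    """Sum per-process CPU% values (main process plus its children)."""
    return sum(per_process_percent)


def busy_cores(total_percent):
    """Convert a summed CPU% into an approximate number of busy cores."""
    return total_percent / 100.0


# Hypothetical sample: six processes, each keeping about one core busy.
sample = [95.0, 101.5, 99.0, 100.5, 102.0, 98.0]
total = tree_cpu_percent(sample)
print(f"{total:.0f}% is roughly {busy_cores(total):.1f} cores")
```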
You can customize the behavior of the scripts by editing memograph_config.py. This file contains settings for:
- File paths and extensions
- CSV headers
- Logging and backup options
- Image size and parallelism knobs (e.g., `MAX_IMAGE_SIZE`, `FACE_DETECTION_BATCH_SIZE`, `FACE_DETECTION_PARALLEL_WORKERS`, `CAPTION_PARALLEL_WORKERS`).
- Optional face recognition settings (`ENABLE_FACE_RECOGNITION`, `FACE_GALLERY_PATH`, `FACE_RECOGNITION_THRESHOLD`) that allow you to recognise known faces in images after you build a face gallery from reference photos.
- Image quality + context extras:
  - `ENABLE_IMAGE_QUALITY` + related knobs to control thumbnail-sized histogram analysis.
  - `BLOG_CONTEXT_INCLUDE_EXTRAS` to decide whether `build_blog_context` should run expensive YOLO/OCR/Places passes (default `False` to keep runs fast; override with CLI flags when needed).

The CSV schema includes an image_type column used for high-level content
classification (e.g., natural photo, document scan, meme/graphic, screenshot,
chart/plot). It is currently populated by scripts/image_type_detector.py
using CLIP zero-shot prompts and produces values such as natural_photo,
document_scan, meme_or_graphic, screenshot, and chart_or_plot.
The repository includes helper scripts for comparing runs and configurations:
- `scripts/compare_resolutions.py`: Compare `labels.csv` fields (faces, species, objects, captions) across multiple `MemoGraph_*` folders for a trip at different `MAX_IMAGE_SIZE` values.
- `scripts/compare_stats.py`: Compare per-step CPU/RAM/GPU metrics across multiple `MemoGraph_*` folders using their `resource_usage.csv` files.
- `scripts/compare_labels_variants.py`: Print a CSV-style table comparing chosen fields (e.g., `detected_objects`, `species_tags`, `caption`, `caption_ai`) across any set of `MemoGraph_*` variants.

These are documented in more detail in working.txt and task.txt, and are useful when deciding which resolution (e.g. 256, 512, 1024) gives acceptable accuracy for your models.
MemoGraph also produces:
- A first-pass human-readable trip blog (`blog.md`)
- A structured day summary (`trip_summary.json`)
- A rich context file (`blog_context.json`, generated automatically by `run_all.py`) that aggregates per-day themes/activities plus per-image captions, CLIP/YOLO/Places tags, species, faces, etc.
  - Heavy extras (YOLO detections, OCR text, Places365 scenes) are disabled by default to keep runs fast; set `BLOG_CONTEXT_INCLUDE_EXTRAS = True` in `memograph_config.py` or call `python -m scripts.build_blog_context <trip> --include-extras` when you specifically need them.

These files can be:
- Used as-is for quick trip overviews.
- Fed into an external LLM (see `blog_generation_prompt.md`) if you want to generate a longer, more narrative travel blog using MemoGraph's captions, locations, and species as input.
- Regenerated manually when needed via:

      python -m scripts.build_blog_context data/trips/my_awesome_trip

This aggregates per-day times, locations, themes (mountains/roads/temples/markets/food/stays/astro/wildlife), CLIP labels, BLIP captions, YOLO objects, Places365 scene tags, and species into a single JSON file that is ideal for feeding into external text-generation models.

- GPS propagation: `location_resolver.py` can infer GPS coordinates for photos that lack EXIF GPS by copying the last known coordinates from nearby-in-time images in the same trip. The time window is controlled by a config knob in `memograph_config.py` (e.g., `GPS_PROPAGATION_MAX_MINUTES`, default around 15 minutes). This helps fill in locations for images taken shortly before/after a geotagged photo on the same hike/drive.
- Early map preview: After location resolution (including propagation), `run_all.py` calls `map_visualizer.create_map` once to generate an initial `trip_map.html` so you can open a basic map while heavier AI steps (faces, captions, species, etc.) continue in the background.
- Final map and overview: At the end of the pipeline, the map is regenerated with full captions/species/image_type data. In addition, `map_visualizer.create_overview_page` builds a `trip_overview.html` file that:
  - Embeds the map on the left.
  - Shows non-geotagged photos in a right-hand sidebar as cards (lazy-loaded thumbnails).
  - Derives simple tags per image (e.g., people, birds, plants_flowers, insects, animals, landscapes, astro and the image_type categories).
  - Provides a chip-style filter bar so you can interactively filter sidebar photos by these tags.
- Static Leaflet gallery + map (`MemoGraph/webapp/index.html`): `build_webapp.py` generates a modern, 3-column web application:
  - Left Sidebar: Interactive filters grouped by category (Nature, Structures, People, Tech, Food, etc.) and a "Clear Filters" button.
  - Main Gallery: Scrollable grid/list/timeline views with smart thumbnails, showing color palettes, key tags, and quality scores.
  - Right Map Pane: A sticky map that updates markers in real-time as you filter the gallery.
  - Lightbox: A detailed full-screen view with a filmstrip, metadata panel (location, faces, camera info), a mini-map, color swatches, and detection bounding box overlays (faces in blue, species in green, person fallback in cyan).
  - View Modes: Grid (default card layout), List (compact horizontal rows with metadata columns), and Timeline (photos grouped by day with sticky date headers).
  - All assets are local; no external backend is required.

This static webapp replaces the earlier baked overview-only experience and makes it easy to explore each trip offline.
- Master trips hub (`data/trips/index.html`): Every time `run_all.py` completes, `build_trip_index.py` refreshes a landing page that lists every trip under `data/trips`. Each card shows stacked thumbnails (sourced from each trip's MemoGraph thumbnails), date ranges, photo/day counts, and top themes/species so you get a quick visual vibe before diving in. Cards link straight into `<trip>/MemoGraph/webapp/index.html`, effectively giving you a native-feeling photo library for multiple trips with a constant back button to return to the hub.

This ensures that every photo in a trip is visible somewhere (on the map if it has GPS, or in the sidebar if it does not), and that you can still explore large trips while processing is ongoing.
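
The GPS propagation idea described above (copying the last known coordinates to photos taken shortly afterwards) could look roughly like this. It is a sketch under stated assumptions: the record layout and the `propagate_gps` name are hypothetical, only genuine fixes are used as sources, and the search looks backward in time only, whereas `location_resolver.py` may also look forward.

```python
from datetime import datetime, timedelta


def propagate_gps(photos, max_minutes=15):
    """Fill missing (lat, lon) from the last geotagged photo within the window.

    `photos` is a list of dicts with "taken_at" (datetime) and "gps"
    ((lat, lon) tuple or None). Returns a new, time-sorted list.
    """
    window = timedelta(minutes=max_minutes)
    last_fix = None  # (taken_at, gps) of the most recent photo with real GPS
    result = []
    for photo in sorted(photos, key=lambda p: p["taken_at"]):
        photo = dict(photo)  # copy so the caller's data is untouched
        if photo["gps"] is not None:
            last_fix = (photo["taken_at"], photo["gps"])
        elif last_fix and photo["taken_at"] - last_fix[0] <= window:
            photo["gps"] = last_fix[1]
            photo["gps_inferred"] = True  # mark the value as propagated
        result.append(photo)
    return result
```

Flagging propagated coordinates separately keeps them distinguishable from real EXIF readings, which matters if the time window is later tightened.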
The current static viewer covers the basics (search, filters, lightbox, map, multi-trip hub), and the next batch of improvements we are tracking includes:
- Richer Material polish: chip ripples, card ripple effects, and smoother transitions when filters/map clusters update.
- Expanded EXIF/metadata: surface shutter/ISO/f-stop/device sensor info (requires parsing EXIF and extending `blog_context.json`).
- Share/export affordances: quick buttons to download filtered metadata CSVs, copy shareable file paths, or open the original folder.
- Smarter map clustering: switch from simple lat/lon rounding to a Leaflet clustering plugin and keep marker groups in sync with filter chips.
- Cross-trip search: load a compact manifest on the hub so you can search for themes/species/people (e.g., "Bulbul", "snow", "Mom") and jump directly into the relevant trip/photo.
- Semantic search ideas: optionally store CLIP embeddings to support fuzzy queries like "snowy yak on a mountain pass" without pre-defined tags.
- Quality-aware browsing: leverage the new per-image quality metrics (exposure/color/contrast/sharpness/noise) to highlight the most balanced photos in the UI.
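
For reference, the "simple lat/lon rounding" mentioned in the clustering item can be sketched as grouping points by rounded coordinates. The `cluster_by_rounding` helper and the choice of three decimal places (on the order of 100 m) are assumptions for illustration, not the web app's actual values.

```python
from collections import defaultdict


def cluster_by_rounding(points, decimals=3):
    """Group (lat, lon) points that share the same rounded coordinates."""
    clusters = defaultdict(list)
    for lat, lon in points:
        key = (round(lat, decimals), round(lon, decimals))
        clusters[key].append((lat, lon))
    return dict(clusters)
```

Rounding is cheap but splits nearby photos that straddle a grid boundary, which is one reason a dedicated clustering plugin is on the list.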
MemoGraph now includes a unified GPU model manager that loads all AI models (CLIP, BLIP, LLaVA) once and processes images through all models in a single pass. This is significantly more efficient than the traditional approach of loading/unloading models for each step.
    # Activate venv
    source .venv/bin/activate

    # Process all trips with all models
    python -m scripts.batch_gpu_processor --all-trips

    # Process a single trip
    python -m scripts.batch_gpu_processor data/trips/my_awesome_trip

    # Force reprocess all images (even if already processed)
    python -m scripts.batch_gpu_processor --all-trips --force

    # Use only specific models
    python -m scripts.batch_gpu_processor --all-trips --models clip blip

Or use the convenience script:

    ./scan_all_trips.sh                    # Process all trips
    ./scan_all_trips.sh --reset            # Reset and reprocess all trips
    ./scan_all_trips.sh data/trips/MyTrip  # Process single trip

With Batch Processing (New!)

The GPU model manager now supports batch processing for CLIP and BLIP, processing multiple images simultaneously for significantly higher throughput:
| Processing Mode | Throughput | Time/Image | GPU Memory | Speedup |
|---|---|---|---|---|
| Sequential | 4.89 img/s | 0.20s | 1907MB (16%) | 1.0x |
| Batch (n=4) | 11.39 img/s | 0.09s | 1971MB (16%) | 2.25x |
| Batch (n=8) | 21.74 img/s | 0.05s | 2093MB (17%) | 4.45x |
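
How `gpu_model_manager` forms its batches is not shown here; as a minimal sketch, grouping a list of image paths into batches of `n` can be done like this (`batched` is a hypothetical helper):

```python
def batched(items, batch_size):
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


paths = [f"IMG_{i:04d}.jpg" for i in range(10)]
for batch in batched(paths, 4):
    print(len(batch), batch[0])
```

The last batch may be smaller than `batch_size`, so any model call that consumes these batches should not assume a fixed batch dimension.
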
    # Test batch processing performance
    python -m scripts.gpu_model_manager --batch-size 8 --num-images 20

    # Compare sequential vs batch
    python -m scripts.gpu_model_manager --batch-size 8 --compare

Full Pipeline (CLIP + BLIP + LLaVA)

| Metric | Value |
|---|---|
| GPU VRAM Used | ~4.4GB (36% of 12GB) |
| System RAM | ~7GB (45%) |
| CPU Usage | ~15% |
| Throughput | 0.29 images/second (~3.5s per image) |
| Trip | Images | Time | Speed |
|---|---|---|---|
| 2025_Annapurna_Nepal | 97 | 317s | 0.31 img/s |
| Home | 25 | 92s | 0.27 img/s |
| Vengurla | 16 | 52s | 0.31 img/s |
| Total | 138 | 8.0 min | 0.29 img/s |
| Model | VRAM | Purpose |
|---|---|---|
| CLIP ViT-B/32 | ~0.5GB | Object/scene detection |
| BLIP | ~0.5GB | Image captioning |
| LLaVA 0.5B | ~2GB | Detailed AI descriptions |
| OWLv2 | ~0.6GB | Species bounding box detection |
| BioCLIP 2 | ~1.8GB | Species classification (952K taxa) |
| Bird Classifier | ~0.1GB | Bird species refinement |
| Processing buffers | ~1.5GB | Tensor operations |
- `scripts/gpu_model_manager.py` - Unified GPU model manager with resource monitoring
- `scripts/batch_gpu_processor.py` - Batch processor for all trips
- `scan_all_trips.sh` - Convenience script with venv activation

You can use a small multimodal model (LLaVA OneVision 0.5B) to get rich, detailed descriptions for your photos.
To generate detailed vision_caption fields for all images in a trip and save them to labels.csv:

    python -m scripts.batch_vision_llm data/trips/my_awesome_trip

This runs systematically over the trip, skipping images that already have a vision caption.
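
The skip-if-already-captioned behaviour can be pictured as a filter over the rows of `labels.csv`. This is a sketch only: `pending_rows` is a hypothetical helper, and treating whitespace-only values as missing is an assumption about how `batch_vision_llm` decides what to skip.

```python
def pending_rows(rows, column="vision_caption"):
    """Return rows whose caption column is missing or blank."""
    return [row for row in rows if not (row.get(column) or "").strip()]


rows = [
    {"image_name": "a.jpg", "vision_caption": "A tree with orange fruit."},
    {"image_name": "b.jpg", "vision_caption": ""},
    {"image_name": "c.jpg"},
]
print([row["image_name"] for row in pending_rows(rows)])  # ['b.jpg', 'c.jpg']
```
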
To experiment with custom prompts on a single image:

    python -m scripts.vision_llm_demo data/trips/2025_Annapurna_Nepal `
      --model-id models/llava_onevision_qwen2_0.5b `
      --question "Describe this photo with any interesting objects or activities."

- Activate the venv:

      .venv\Scripts\Activate.ps1

- Log in to Hugging Face (one-time):

      hf auth login --token <YOUR_HF_TOKEN>

- Download the model locally (downloads ≈1.6 GB but expands to ~12.8 GB on disk, ~8–9 minutes on our link):

      # This heredoc form needs a POSIX shell (bash, Git Bash, WSL); PowerShell does not support it.
      python - <<'PY'
      from huggingface_hub import snapshot_download
      snapshot_download(
          'llava-hf/llava-onevision-qwen2-0.5b-ov-hf',
          local_dir='models/llava_onevision_qwen2_0.5b'
      )
      PY

Recommended starter model for GTX 1650-class GPUs: llava-hf/llava-onevision-qwen2-0.5b-ov-hf (~0.5B parameters, runs in ~3 GB VRAM). Heavier 7B+ vision models may exceed the 4 GB limit unless you use aggressive quantization or CPU inference.
- Inference on a GTX 1650 takes ~40 seconds per image (after the initial model load).
Sample output (Annapurna trip, first photo):
Model: models/llava_onevision_qwen2_0.5b
Image: data\trips\2025_Annapurna_Nepal\IMG20240816111741.jpg
Question: Describe this photo in detail. Mention setting, subjects, lighting, and any interesting objects.
Response:
…The image captures a serene outdoor scene, dominated by a lush green tree with vibrant orange fruits…white wall and green roof…overall scene exudes a sense of tranquility and natural beauty.
Another example (Home trip, IMG20251019232730):
…a white and black telescope mounted on a tripod… positioned in front of a window with a black-and-white checkered curtain… a wooden cabinet on the right with a pink blanket and striped pillow, making the room feel cozy, while the telescope adds intrigue.
You can change prompts or max tokens (--max-new-tokens) to explore different descriptions. Later we can wire this into the web app or blog generation flow if desired.
- A 10-prompt comparison run on `data/trips/2025_Annapurna_Nepal/IMG20240816111741.jpg` lives at `data/trips/2025_Annapurna_Nepal/MemoGraph/llm_vision_prompt_study.txt`. Because the helper script keeps the model in memory, the entire sweep took ~616 s on CPU (~35–90 s per prompt) and highlighted where prompts shine (structured diary/narrator perspectives) versus fail (casual Instagram requests led to repeated hashtags). Reviewing that file helps us pick a house prompt before wiring the model into production.
- Near-term roadmap (see `task.txt` / `working.txt` for details):
  - Wrap the ad-hoc prompt sweep into a tiny CLI that accepts a prompt list and emits JSON so we can compare outputs systematically.
  - Teach `run_all.py` (behind a flag like `--vision-llm` or via config) to load the model once, iterate over every photo in the trip, and stash the preferred VLM caption under new `vision_caption_*` columns plus `blog_context.json`.
  - Surface those richer captions in the static web app (toggle between BLIP + VLM text) and optionally in the Markdown blog.
- Until that wiring exists, you can reproduce the study by calling `scripts/vision_llm_demo.py` repeatedly with different `--question` prompts or by adapting the inline example used for the saved study.

MemoGraph uses a two-stage pipeline for accurate species detection with bounding boxes:
Google's OWLv2 (google/owlv2-base-patch16-ensemble) provides zero-shot object detection, identifying and localizing birds, butterflies, insects, and other wildlife in photos with bounding boxes. 155M parameters, ~591MB VRAM.
BioCLIP 2 (imageomics/bioclip-2) classifies cropped detections to species level using a biology-focused CLIP model trained on 952K+ taxa. Each detected region is cropped and classified against category-specific species lists (85+ Indian birds, 55+ butterflies, 45+ insects, etc.).

    # Activate venv
    source .venv/bin/activate  # Linux/macOS

    # OWLv2 auto-downloads from HF on first use, or save locally:
    python -c "
    from transformers import Owlv2Processor, Owlv2ForObjectDetection
    proc = Owlv2Processor.from_pretrained('google/owlv2-base-patch16-ensemble')
    model = Owlv2ForObjectDetection.from_pretrained('google/owlv2-base-patch16-ensemble')
    proc.save_pretrained('models/owlv2')
    model.save_pretrained('models/owlv2')
    "

    # BioCLIP 2 downloads via open_clip on first use
    # Bird classifier (optional, for additional refinement):
    python -m scripts.download_bird_model

In `memograph_config.py`:

    ENABLE_SPECIES_DETECTION = True     # Enable OWLv2 detection
    ENABLE_BIOCLIP = True               # Enable BioCLIP 2 classification
    ENABLE_BIRD_MODEL = True            # Enable bird classifier refinement
    SPECIES_DETECTION_THRESHOLD = 0.15  # OWLv2 confidence threshold

- `species_detector.py` checks for biological hints (bird/animal/plant/insect keywords in CLIP tags and captions)
- When advanced models are available: OWLv2 detects wildlife → BioCLIP 2 classifies each crop → results stored in `species_tags` + `species_boxes`
- When advanced models are unavailable: falls back to CLIP prompts + bird classifier
- Bounding boxes are stored as normalized percentages in the `species_boxes` column: `"label:species@left%,top%,right%,bottom%"`
- The webapp renders species boxes as green overlays with species name labels

The specialist bird classifier (dennisjooo/Birds-Classifier-EfficientNetB2) is still available as a fallback and for additional refinement via the bird species refiner step.
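
To illustrate the `species_boxes` format `"label:species@left%,top%,right%,bottom%"`, here is a small parser sketch for a single entry. The helper names are hypothetical, the example value is invented, and how multiple boxes are separated within one cell is not covered because the format above does not specify it.

```python
def parse_species_box(entry):
    """Parse 'label:species@left,top,right,bottom' into a dict.

    Percent signs on the numbers are optional and ignored.
    """
    head, _, coords = entry.rpartition("@")
    label, _, species = head.partition(":")
    left, top, right, bottom = (
        float(part.strip().rstrip("%")) for part in coords.split(",")
    )
    return {
        "label": label.strip(),
        "species": species.strip(),
        "box_percent": (left, top, right, bottom),
    }


def to_pixels(box_percent, width, height):
    """Convert percentage coordinates into pixel coordinates."""
    left, top, right, bottom = box_percent
    return (
        round(left / 100 * width),
        round(top / 100 * height),
        round(right / 100 * width),
        round(bottom / 100 * height),
    )


box = parse_species_box("bird:Red-vented Bulbul@12.5%,20%,48%,66.5%")
print(box["species"], to_pixels(box["box_percent"], 800, 600))
```

Storing percentages rather than pixels lets the same box be drawn on thumbnails and full-size images without rescaling the stored values.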
This project is licensed under the GNU General Public License v3.0. See the LICENSE file for details.