Video Action Localization App is a Streamlit-powered web interface that processes video input to detect and localize actions across frames using spatio-temporal deep learning models. It leverages MMAction2's SlowFast-ACRN model for action recognition and Faster R-CNN for spatial localization.
A complete, Colab-ready solution for spatio-temporal action detection in videos, featuring:
- SlowFast-ACRN for action recognition
- Faster R-CNN for human/object detection
- Streamlit interface for interactivity
- Google Colab backend for GPU-powered inference
Prerequisites
- A Google Colab account
- GPU runtime enabled in Colab (Runtime > Change runtime type > GPU)
- A free Ngrok account
- Your personal Ngrok Auth Token
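Before running the heavier cells, it can help to confirm that the GPU runtime is actually attached. The following is a small sketch using only the standard library. The helper name `gpu_available` is illustrative, and it assumes the NVIDIA driver tool `nvidia-smi` is on the path whenever a GPU runtime is enabled, which is typically the case in Colab.

```python
import shutil
import subprocess


def gpu_available() -> bool:
    """Return True if nvidia-smi is installed and lists at least one GPU."""
    if shutil.which("nvidia-smi") is None:
        return False  # no NVIDIA driver tools found, so assume a CPU-only runtime
    # `nvidia-smi -L` prints one line per GPU, e.g. "GPU 0: ..."
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    return result.returncode == 0 and "GPU" in result.stdout


if __name__ == "__main__":
    print("GPU runtime detected" if gpu_available() else "No GPU found; check the runtime type")
```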
Features
- Upload .mp4, .avi, or .mov video files
- Perform frame-wise action localization with SlowFast-ACRN
- Download annotated output video with bounding boxes and action labels
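The upload step can be guarded with a simple extension check. This is a minimal sketch rather than the app's actual code: `ALLOWED_EXTENSIONS` and `is_supported_video` are hypothetical names, and only the three formats listed above are assumed to be accepted.

```python
from pathlib import Path

# Formats taken from the list above; the names here are illustrative only.
ALLOWED_EXTENSIONS = {".mp4", ".avi", ".mov"}


def is_supported_video(filename: str) -> bool:
    """Return True if the file name ends in one of the supported video extensions."""
    return Path(filename).suffix.lower() in ALLOWED_EXTENSIONS


if __name__ == "__main__":
    for name in ["clip.MP4", "demo.avi", "notes.txt"]:
        print(name, is_supported_video(name))
```

Inside a Streamlit app, a similar filter can be applied in the UI itself with `st.file_uploader("Upload a video", type=["mp4", "avi", "mov"])`.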
| Component | Description |
| --- | --- |
| Backbone | SlowFast Network with ACRN Head |
| Detector | Faster R-CNN (ResNet-50-FPN) |
| Trained On | Kinetics-400 Dataset |
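The two models split the work: the detector proposes boxes, and the action head scores actions for each box. A rough sketch of how such outputs might be merged into per-frame annotations is shown here. The data layout, the function name `label_detections`, and both threshold values are assumptions made for illustration; this is not MMAction2's API.

```python
from typing import Dict, List, Tuple

# Assumed layout: x1, y1, x2, y2, detection score
Box = Tuple[float, float, float, float, float]


def label_detections(
    boxes: List[Box],
    action_scores: List[Dict[str, float]],
    det_threshold: float = 0.9,
    action_threshold: float = 0.5,
) -> List[dict]:
    """Pair each confident box with the actions that clear the action threshold."""
    if len(boxes) != len(action_scores):
        raise ValueError("expected one action-score dict per box")

    results = []
    for box, scores in zip(boxes, action_scores):
        if box[4] < det_threshold:
            continue  # drop low-confidence detections
        # Keep actions above the threshold, highest score first.
        actions = sorted(
            (name for name, score in scores.items() if score >= action_threshold),
            key=lambda name: -scores[name],
        )
        results.append({"bbox": box[:4], "actions": actions})
    return results


if __name__ == "__main__":
    demo_boxes = [(0, 0, 10, 10, 0.95), (5, 5, 20, 20, 0.40)]
    demo_scores = [{"stand": 0.8, "talk": 0.6, "run": 0.1}, {"sit": 0.9}]
    print(label_detections(demo_boxes, demo_scores))
```

Each returned entry carries the information needed to draw one bounding box and its labels on the annotated output video.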
This app is designed to run entirely in Google Colab, using GPU resources for efficient inference, while Streamlit provides the user interface.
- Run the notebook sequentially from top to bottom
- You may be prompted to restart the runtime after the dependencies are installed
- Once the Ngrok tunnel is active, you'll get a public URL for accessing your Streamlit app
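The last steps amount to starting Streamlit in the background and opening a tunnel to its port. Here is a minimal sketch, assuming the `pyngrok` package (version 5 or later) is installed, that the auth token is stored in an environment variable named `NGROK_AUTH_TOKEN` (a hypothetical name), and that Streamlit serves on its default port 8501. The notebook's actual cells may differ.

```python
import os
import subprocess
from typing import List

STREAMLIT_PORT = 8501  # Streamlit's default port


def streamlit_command(script: str = "app.py", port: int = STREAMLIT_PORT) -> List[str]:
    """Build the command that serves the app without opening a local browser."""
    return [
        "streamlit", "run", script,
        "--server.port", str(port),
        "--server.headless", "true",
    ]


def launch(script: str = "app.py", port: int = STREAMLIT_PORT) -> str:
    """Start Streamlit in the background and return the public tunnel URL."""
    from pyngrok import ngrok  # imported lazily; requires `pip install pyngrok`

    # NGROK_AUTH_TOKEN is an assumed variable name; a KeyError is raised if it is unset.
    ngrok.set_auth_token(os.environ["NGROK_AUTH_TOKEN"])
    subprocess.Popen(streamlit_command(script, port))
    tunnel = ngrok.connect(port)
    return tunnel.public_url


if __name__ == "__main__":
    print("Streamlit command:", " ".join(streamlit_command()))
```

Calling `launch()` from a notebook cell would return the public URL to open in a browser. Reading the token from an environment variable keeps it out of the notebook source.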
├── Action_Localization.ipynb   # Main Colab notebook for setup & execution
├── app.py                      # Streamlit app logic
├── checkpoints/                # Model weight files
│   ├── slowfast-acrn.pth       # - Action recognition weights
│   └── faster_rcnn.pth         # - Object detection weights
├── mmaction2/                  # MMAction2 framework (cloned repo)
│   ├── configs/                # - Model configuration files
│   └── demo/                   # - Sample media for testing
└── requirements.txt            # Python dependencies
Author
Ibrahim Mustapha
Mechatronics Engineering Student | Computer Vision & AI Enthusiast
GitHub | LinkedIn | Email