15,859 questions
Advice
0
votes
0
replies
31
views
Uncalibrated Camera Setup - Projection from 3DMM found in one camera to second camera with false intrinsics and 3D coordinates
I tracked a 3DMM object in a video stream. The tracking model performs bundle adjustment and estimates camera intrinsics and object scale. However, neither the intrinsics nor the object size are ...
-10
votes
0
answers
105
views
Using LLM to correct misaligned PDF field coordinates — is this approach reliable? [closed]
I am working on a system that renders input fields on top of PDF forms using coordinates extracted from a document analysis tool.
Current Setup
For each form, we receive coordinates (X, Y, width, ...
Advice
0
votes
0
replies
21
views
Implementing a KIE pipeline for hybrid document types (1 dynamic schema, 4 static templates)
I’m architecting a document processing pipeline for a real-time workflow. I have 5 document types, but they require two completely different extraction strategies.
The Document for Dynamic Form: This ...
1
vote
0
answers
111
views
Corner Detection in Rotated Squares Using 3×3 Convolution Filters
I am working on a task where I need to detect the four corners of each square in an image and highlight each corner as a single pixel.
Each filter is intended to respond to a specific corner ...
Advice
0
votes
1
replies
73
views
How to extract rooms and dimensions from MEP/floor plan drawings using AI or computer vision?
I’m working on a project where I want to use AI / computer vision to read MEP (Mechanical, Electrical, Plumbing) drawings or floor plans.
My goal is to:
Detect rooms and extract their labels (e.g., “...
Tooling
0
votes
0
replies
37
views
Fast keypoint annotation tool
I’m currently working on annotating a human pose dataset (specifically of people swimming) and I’m struggling to find a tool that fits my workflow.
I’m looking for a click‑based labeling workflow, ...
Advice
0
votes
1
replies
74
views
Tracking small targets on thermal video
I'm tracking a target from a UAV using a thermal camera. Detection is YOLOv8n running every N frames on an NPU, and I need something to hold the track between detections.
What I've tried:
Template ...
Advice
0
votes
1
replies
55
views
SLEAP model training: question and discussion
I am currently working on a project focused on automated computer vision-based behavior recognition for captive dolphins. I would like to ask about your experience using SLEAP for model training—...
Advice
2
votes
3
replies
118
views
How to analyze classroom behavior using computer vision and pose estimation?
I am trying to build a computer vision system to analyze classroom behavior from surveillance cameras.
The goal is to automatically detect several behavioral indicators such as:
- student attention
- ...
1
vote
1
answer
155
views
ValueError: shapes mismatch when combining Re-ID Cosine distance and IoU matrices for custom MOT tracking
I am building a custom Multi-Object Tracking (MOT) system using Python, OpenCV, and TensorFlow. My goal is to track people and perform real-time clothing recognition. To prevent ID switches when a ...
Advice
0
votes
1
replies
143
views
How to improve FPS when using MediaPipe hand tracking with OpenCV in Python?
I am building a simple AI hand tracking application using MediaPipe and OpenCV in Python.
The program reads frames from a webcam, processes them with MediaPipe Hands, and draws the hand landmarks on ...
Advice
0
votes
1
replies
58
views
Can V-JEPA be used to detect audience engagement during a seminar from live video?
I am experimenting with the V-JEPA model developed by Meta for video understanding.
My goal is to analyze a live video stream of people attending a seminar and determine their engagement level (for ...
Advice
0
votes
3
replies
118
views
How to improve the accuracy of text retrieval from an image
import cv2
import pytesseract
import numpy as np
image_path = "elecBill.jpg"
img = cv2.imread(image_path)
# Resize (VERY IMPORTANT)
img = cv2.resize(img, None, fx=2, fy=2, interpolation=...
Tooling
0
votes
0
replies
61
views
Search & Retrieval Robot Object Detection
I am currently developing a hobbyist search and retrieval robot.
I'm running an Astra Pro RGB-D camera on a Jetson Nano, with ROS Melodic and Ubuntu 18.04.
My task is to create a robot that can detect ...
Advice
2
votes
0
replies
141
views
Is clothing-invariant person recognition possible using still images only?
I am working on a person recognition system for learning purposes.
My goal is:
Maintain a small gallery of known people (multiple images per person)
Given a new query image, return the most similar ...
Tooling
1
vote
2
replies
85
views
Segmentation of Connected Component Based on Known Primitive Template
(This is my first time posting, so feedback is welcome!)
I am working on a depth-based vision system where I need to detect packages of a single type in such a way that I retrieve:
Their center in ...
Best practices
0
votes
0
replies
99
views
ODIR-5K: Should I train a Dual-Input CNN (Left/Right) or split images for Patient-Level Multi-Label Classification?
I am working with the ODIR-5K (Ocular Disease Intelligent Recognition) dataset. The goal is multi-label classification of 8 ocular diseases (Normal, Diabetes, Glaucoma, Cataract, etc.).
The Data ...
Advice
0
votes
3
replies
75
views
Stereo Calibration: disparity for far points is bigger than for close points
I am trying to use a custom stereo camera setup for depth estimation. The first step is to perform stereo calibration, for which I use a Charuco board and standard OpenCV functions (calibrateCamera, ...
Tooling
1
vote
2
replies
31
views
Self-Contained Alt Text Generation For Use In Other Software
I am currently working on a tool that takes in every webpage of a site, adds alt text to images missing it, and outputs the webpage as a PDF for archival purposes.
For the generation of alt text, is ...
Best practices
2
votes
2
replies
47
views
How to map people detected in a fixed camera view to a 2D seat layout (seat occupancy, not person ID)?
I am working on a classroom attendance / seat occupancy visualization system and I am struggling with the system design rather than the detection model itself.
Scenario
A fixed-position surveillance ...
1
vote
2
answers
124
views
How to get the coordinates of a group of matched feature points
I use cv2.FlannBasedMatcher to detect some objects. I got good accuracy and would like to get the (x, y) coordinates of the group of points.
What I have:
What I'd like to get:
Here is my function:
def detect(self):
...
Advice
0
votes
4
replies
73
views
Detecting sink mark defects on chair seats using computer vision
I have images of a chair seat with and without a surface defect known as sink marks (see example image below).
My dataset is very small: 15 images of good chairs and only 3 images with sink mark ...
0
votes
1
answer
60
views
ChArUco markers color inversion
Can ChArUco markers still be detected after their colors are inverted? My client wants a color-inverted ChArUco board.
import os
import numpy as np
import cv2
# ----------------------------...
Advice
0
votes
1
replies
48
views
Stepwise pattern prediction of prices
I have a sample dataset where prices stay stable for a few days/weeks/months, then fall or rise and stay at the new price for another few days/weeks/months.
Basically, it looks like steps down and up when you ...
Advice
3
votes
0
replies
189
views
What is the most reliable face liveness detection model and dataset for real-world mobile apps?
I’m working on face liveness (anti-spoofing) detection intended for real-world mobile apps (Flutter), and I’m struggling to achieve reliable performance outside controlled datasets.
What I’m trying to ...
Advice
0
votes
1
replies
75
views
Is this data suitable for an ML model, and if so, which one would be best for this type of data?
I have this data:
Name,X1,Y1,X2,Y2,X3,Y3,X4,Y4,,centroid_x,centroid_y,,area
R1-A,79,55,70,87,154,78,159,48,,115.5,67,,2486
R1-B,1108,23,1126,51,1197,44,1174,14,,1151.25,33,,2150.5
R1-C,2134,53,2183,...
Tooling
0
votes
4
replies
164
views
I am confused about when to use TensorFlow and when to use PyTorch
I am always confused about when to use TensorFlow and when to use PyTorch. Both are used for the same tasks, so which one suits which situation? Some people have suggested ...
1
vote
0
answers
65
views
Instance segmentation on a custom COCO dataset using PyTorch Mask R-CNN + FPN for 83 categories (+background)
I am training an instance segmentation model on a custom COCO dataset using PyTorch Mask R-CNN + FPN for 83 categories (+background).
What is the problem with the following setup and why RPN head not ...
Advice
0
votes
2
replies
64
views
Matching card tracking with card detection/classification
I'm writing a program to track my card games. It uses a bird's-eye view camera to record the playing surface, and a YOLO model to classify the cards. I'm running into an issue figuring out where the ...
Best practices
0
votes
2
replies
154
views
How can I correctly extract table structure (rows, columns, merged cells) from a complex scanned image using OpenCV?
I’m trying to extract tabular data from a scanned engineering document.
The table contains:
merged header cells
irregular row heights
irregular column widths
faint and broken borders
text inside ...
0
votes
2
answers
188
views
How to close a gap in the contour in OpenCV
I have a problem that I'm struggling with and can't seem to find a solution for.
I want to get the area of the contours I see in this image/video frame:
The problem is the contour is cut off at the right as ...
1
vote
1
answer
121
views
Watershed fails to properly segment objects
I'm currently working on object detection to count how many objects are present in the frame. I have already successfully separated some of them. There are still some objects that are very close together ...
Tooling
0
votes
2
replies
87
views
Free Software for 3D Reconstruction
I am trying to create a 3D model from overlapping aerial images and am looking for free software to use. My dataset includes 1,500 RGB images, a ground-truth segmentation mask for each image, the ...
0
votes
1
answer
126
views
How to close the open border of a round object so it can be filled later?
I want to close the border of an object. Some of the objects could be rectangles too. I already tried using dilation and closing with 2 iterations, but it seems the border isn't completely closed. Here is ...
1
vote
0
answers
141
views
How to fix this Python code to count duplicate samples in the images?
I want to count samples in the image and measure the length of each sample as I show below. However, I am facing a big problem: when samples overlap, the count is inaccurate, for example ...
1
vote
2
answers
126
views
How to detect a B/W icon inside a colored dashboard photo when scale/rotation/color differ (OpenCV, Python)
Problem:
I need to check whether a small black-and-white icon (template) appears inside a large, colored dashboard photo.
The icon in the photo may differ from the template in color, scale, small ...
0
votes
1
answer
141
views
Preventing GPU memory leak due to a custom neural network layer
I am using the MixStyle methodology for domain adaptation, and it involves using a custom layer that is inserted after every encoder stage. However, it is causing VRAM to grow linearly, which causes ...
0
votes
0
answers
110
views
Batch processing with Ultralytics YOLO does not seem to work for CoreML, but works fine for .pt
I am trying to do batch inference with YOLO11. I am working on a MacBook and am running into this issue:
from ultralytics import YOLO
import numpy as np
# Load YOLO model
model = YOLO("yolo11s....
0
votes
2
answers
253
views
How can I automatically crop and rotate a large number of images to be upright?
I have a large number of scanned discs like this:
Actual image is 600 DPI, 7400x7400, 48 bit TIFF.
I want to convert them to a JPEG like this:
Same DPI, but cropped, and rotated so that it is ...
1
vote
1
answer
133
views
How can I find the contour of a box with a diagonal inside using OpenCV
I have to find the contours of boxes.
Some boxes have a diagonal inside them. I tried to remove the diagonal, but I don't think that is the answer.
Here are the images I preprocessed and the contour result. Only ...
0
votes
0
answers
280
views
How to correctly generate Nerfstudio transforms.json from drone GPS + yaw/pitch/roll so the point cloud is in geographic space?
I am training a NeRF with Nerfstudio using drone imagery from a MicaSense Rededge-P camera.
For each capture I have metadata:
lat, lon, alt (WGS84 position)
yaw, pitch, roll (from MicaSense DLS, ...
2
votes
1
answer
242
views
Tesseract OCR cannot read dotted LED digits on MAUI/Xamarin
I am trying to extract numbers from dotted LED-style digits (0–9) using Tesseract OCR in a MAUI/Xamarin app on Android and iOS, fully offline. My boss wants a local solution that works on mobile ...
3
votes
2
answers
200
views
How does Local Binary Pattern return an image?
I'm trying to understand how scikit-image's local_binary_pattern() function works. Let's take the simplest setup: input is a grayscale image, radius = 1, n_points = 4, method = "uniform". ...
1
vote
1
answer
64
views
Replacing WideResNet50 with EfficientNetV2-M in GLASS defect detection model causes Module layer2 not found in the model [closed]
I’m using the GLASS defect detection model and want to replace its default wideresnet50 backbone with efficientnetv2_m in shell/run-custom.sh. However, when I run
bash run-custom.sh
I get the ...
0
votes
1
answer
174
views
RuntimeError in torch.cat during VACE-Wan2.1 inference: mask and video tensor shape mismatch
I'm using the Wan2.1-VACE video generation model, and during inference I encountered a RuntimeError related to mismatched tensor shapes in a torch.cat operation inside the vace_latent() function.
From ...
3
votes
1
answer
123
views
Segmentation Error when Creating Charuco Board with Custom Ids
My intention is to create a charuco board object, which supports custom ids. Here is the code snippet being used.
def __init__(self, squaresX=11, squaresY=8, squareLength=0.015, markerLength=0.011,
...
0
votes
1
answer
105
views
How to obtain the cutting point using OpenCV methods?
Background: Currently, I need to crop the head or tail of the grayscale image of the steel, but this image has problems such as uneven grayscale distribution. How can I find this cutting point?
My ...
-1
votes
1
answer
147
views
How to identify contours using OpenCV or traditional methods? [closed]
Background: I currently have many grayscale images of steel. Some of them have high brightness, while others have uneven brightness. As shown in the figure below, how can I better extract their ...
2
votes
1
answer
96
views
findChessboardCorners fails for thermal image
I am trying to get OpenCV-python to recognize a checkerboard pattern from my thermal camera. I couldn't get that working. This is the thermal image and I realize the image is low resolution, but I can'...
-1
votes
1
answer
79
views
The function cv2.solvePnP throws Assertion failed in solvepnp.cpp:824 [closed]
I have a function which is supposed to return the rotation vector and translation vector (rvec and tvec) given some 3D points, some 2D points, and an intrinsics matrix:
def solvePnP(points_3d: list[...