Newest 'computer-vision' Questions

Advice

0 votes

0 replies

31 views

Uncalibrated Camera Setup - Projection from 3dmm found in one camera to second camera with false intrinsics and 3D coordinates

I tracked a 3DMM object in a video stream. The tracking model performs bundle adjustment and estimates camera intrinsics and object scale. However, neither the intrinsics nor the object size are ...

sophie

88

asked Apr 9 at 6:31

-10 votes

0 answers

105 views

Using LLM to correct misaligned PDF field coordinates — is this approach reliable? [closed]

I am working on a system that renders input fields on top of PDF forms using coordinates extracted from a document analysis tool. Current Setup For each form, we receive coordinates (X, Y, width, ...

Ranit Mondal

1

asked Apr 8 at 21:03

Advice

0 votes

0 replies

21 views

Implementing a KIE pipeline for hybrid document types (1 dynamic schema, 4 static templates)

I’m architecting a document processing pipeline for a real-time workflow. I have 5 document types, but they require two completely different extraction strategies. The Document for Dynamic Form: This ...

JS3

1,951

asked Apr 7 at 13:16

1 vote

0 answers

111 views

Corner Detection in Rotated Squares Using 3×3 Convolution Filters

I am working on a task where I need to detect the four corners of each square in an image and highlight each corner as a single pixel. Each filter is intended to respond to a specific corner ...

Marcel Majhenic

65

asked Apr 6 at 16:17

Advice

0 votes

1 reply

73 views

How to extract rooms and dimensions from MEP/floor plan drawings using AI or computer vision?

I’m working on a project where I want to use AI / computer vision to read MEP (Mechanical, Electrical, Plumbing) drawings or floor plans. My goal is to: Detect rooms and extract their labels (e.g., “...

Narmeen Zafar

1

asked Apr 2 at 11:16

Tooling

0 votes

0 replies

37 views

Fast keypoint annotation tool

I’m currently working on annotating a human pose dataset (specifically of people swimming) and I’m struggling to find a tool that fits my workflow. I’m looking for a click‑based labeling workflow, ...

Jan Lattenkamp

1

asked Mar 26 at 17:38

Advice

0 votes

1 reply

74 views

Tracking small targets on thermal video

I'm tracking a target from a UAV using a thermal camera. Detection is YOLOv8n running every N frames on an NPU, and I need something to hold the track between detections. What I've tried: Template ...

Klim

1

asked Mar 24 at 7:48

Advice

0 votes

1 reply

55 views

SLEAP model training question and communicate

I am currently working on a project focused on automated computer vision-based behavior recognition for captive dolphins. I would like to ask about your experience using SLEAP for model training—...

蔡秀蘭

1

asked Mar 23 at 16:53

Advice

2 votes

3 replies

118 views

How to analyze classroom behavior using computer vision and pose estimation?

I am trying to build a computer vision system to analyze classroom behavior from surveillance cameras. The goal is to automatically detect several behavioral indicators such as: - student attention - ...

AI助教_曾子昕

1

asked Mar 16 at 7:19

1 vote

1 answer

155 views

ValueError: shapes mismatch when combining Re-ID Cosine distance and IoU matrices for custom MOT tracking

I am building a custom Multi-Object Tracking (MOT) system using Python, OpenCV, and TensorFlow. My goal is to track people and perform real-time clothing recognition. To prevent ID switches when a ...

BestlabChill

11

asked Mar 15 at 21:39

Advice

0 votes

1 reply

143 views

How to improve FPS when using MediaPipe hand tracking with OpenCV in Python?

I am building a simple AI hand tracking application using MediaPipe and OpenCV in Python. The program reads frames from a webcam, processes them with MediaPipe Hands, and draws the hand landmarks on ...

WoW Sky

1

asked Mar 11 at 14:57

Advice

0 votes

1 reply

58 views

Can V-JEPA be used to detect audience engagement during a seminar from live video?

I am experimenting with the V-JEPA model developed by Meta for video understanding. My goal is to analyze a live video stream of people attending a seminar and determine their engagement level (for ...

Harshitha Gangu

1

asked Mar 6 at 7:28

Advice

0 votes

3 replies

118 views

How to improve the text retrieval accuracy from the image

import cv2 import pytesseract import numpy as np image_path = "elecBill.jpg" img = cv2.imread(image_path) # Resize (VERY IMPORTANT) img = cv2.resize(img, None, fx=2, fy=2, interpolation=...

Sidharth Kumar

1

asked Mar 4 at 11:42

Tooling

0 votes

0 replies

61 views

Search & Retrieval Robot Object Detection

I am currently developing a hobbyist search and retrieval robot. I'm running an Astra Pro RGB-D camera on a Jetson Nano, with ROS Melodic and Ubuntu 18.04. My task is to create a robot that can detect ...

Jyhrie

1

asked Mar 1 at 16:50

Advice

2 votes

0 replies

141 views

Is clothing-invariant person recognition possible using still images only?

I am working on a person recognition system for learning purposes. My goal is: Maintain a small gallery of known people (multiple images per person) Given a new query image, return the most similar ...

Shanthini M

340

asked Feb 26 at 11:10

Tooling

1 vote

2 replies

85 views

Segmentation of Connected Component Based on Known Primitive Template

(This is my first time posting, so feedback is welcome!) I am working on a depth-based vision system where I need to detect packages of a single type in such a way that I retrieve: Their center in ...

Vincenzo

1

asked Feb 19 at 16:15

Best practices

0 votes

0 replies

99 views

ODIR-5K: Should I train a Dual-Input CNN (Left/Right) or split images for Patient-Level Multi-Label Classification?

I am working with the ODIR-5K (Ocular Disease Intelligent Recognition) dataset. The goal is multi-label classification of 8 ocular diseases (Normal, Diabetes, Glaucoma, Cataract, etc.). The Data ...

Nitish Kumar

1

asked Feb 9 at 10:28

Advice

0 votes

3 replies

75 views

Stereo Calibration: disparity for far points is bigger than for close points

I am trying to use a custom stereo camera setup for depth estimation. The first step is to perform stereo calibration, for which I use a Charuco board and standard OpenCV functions (calibrateCamera, ...

Andrew

525

asked Feb 5 at 15:31

Tooling

1 vote

2 replies

31 views

Self-Contained Alt Text Generation For Use In Other Software

I am currently working on a tool that takes in every webpage of a site, adds alt text to images missing it, and outputs the webpage as a PDF for archival purposes. For the generation of alt text, is ...

Joel Singh

1

asked Jan 21 at 2:25

Best practices

2 votes

2 replies

47 views

How to map people detected in a fixed camera view to a 2D seat layout (seat occupancy, not person ID)?

I am working on a classroom attendance / seat occupancy visualization system and I am struggling with the system design rather than the detection model itself. Scenario A fixed-position surveillance ...

猪猪猪猪

1

asked Jan 12 at 1:42

1 vote

2 answers

124 views

How to get coordinate of group of feature matching points

I use cv2.FlannBasedMatcher to detect some objects. I got good accuracy and would like to get the (x, y) of a group of points. What I have: What I'd like to get: Here is my function: def detect(self): ...

Ennjin

75

asked Jan 9 at 20:29

Advice

0 votes

4 replies

73 views

Detecting sink mark defects on chair seats using computer vision

I have images of a chair seat with and without a surface defect known as sink marks (see example image below). My dataset is very small: 15 images of good chairs and only 3 images with sink mark ...

Optical_flow_lover

149

asked Jan 9 at 12:13

0 votes

1 answer

60 views

ChArUco markers color inversion

Can we detect the ChArUco markers after inverting the color of ChArUco markers? My client wants the color inverted ChArUco board. import os import numpy as np import cv2 # ----------------------------...

Rishmika Wijewardhana

9

asked Jan 8 at 5:19

Advice

0 votes

1 reply

48 views

Step wise pattern prediction of prices

I have a sample dataset where prices are stable for a few days/weeks/months, then fall or rise and stay at that price again for a few days/weeks/months. Basically, it looks like steps down and up when you ...

Shridhar Kulkarni

21

asked Jan 6 at 12:17

Advice

3 votes

0 replies

189 views

What is the most reliable face liveness detection model and dataset for real-world mobile apps?

I’m working on face liveness (anti-spoofing) detection intended for real-world mobile apps (Flutter), and I’m struggling to achieve reliable performance outside controlled datasets. What I’m trying to ...

Mr x

11

asked Jan 6 at 7:18

Advice

0 votes

1 reply

75 views

Is this data suitable for ML model if so which one would be best for this type of data?

I have this data Name,X1,Y1,X2,Y2,X3,Y3,X4,Y4,,centroid_x,centroid_y,,area R1-A,79,55,70,87,154,78,159,48,,115.5,67,,2486 R1-B,1108,23,1126,51,1197,44,1174,14,,1151.25,33,,2150.5 R1-C,2134,53,2183,...

Munsif Ali

8,244

asked Dec 29, 2025 at 10:12

Tooling

0 votes

4 replies

164 views

I am confused about when to use TensorFlow and PyTorch

I am always confused about when to use TensorFlow and when to use PyTorch. Both are used for the same tasks, so which one suits which situation? Some people have suggested ...

Shrinivas Nadager

1

asked Dec 26, 2025 at 16:52

1 vote

0 answers

65 views

instance segmentation on custom coco dataset using pytorch maskrcnn + fpn for 83 categories (+background)

I am training instance segmentation on a custom COCO dataset using PyTorch Mask R-CNN + FPN for 83 categories (+background). What is the problem with the following setup, and why is the RPN head not ...

SavEng

11

asked Dec 8, 2025 at 7:30

Advice

0 votes

2 replies

64 views

Matching card tracking with card detection/classification

I'm writing a program to track my card games. It uses a bird's-eye view camera to record the playing surface, and a YOLO model to classify the cards. I'm running into an issue figuring out where the ...

terrenana

33

asked Nov 26, 2025 at 19:00

Best practices

0 votes

2 replies

154 views

How can I correctly extract table structure (rows, columns, merged cells) from a complex scanned image using OpenCV?

I’m trying to extract tabular data from a scanned engineering document. The table contains: merged header cells irregular row heights irregular column widths faint and broken borders text inside ...

pragyan lamba

1

asked Nov 17, 2025 at 8:23

0 votes

2 answers

188 views

How to close a gap in the contour in OpenCV

I have a problem that I struggle with and can't seem to find a solution for: I want to get the area of the contours I see in this image/video frame: The problem is the contour is cut off at the right as ...

Jakob Leboerg

9

asked Nov 13, 2025 at 12:38

1 vote

1 answer

121 views

Watershed fails to properly segment objects

I'm currently working on object detection to count how many objects are present in the frame. I have already successfully separated some of them. There are still some objects that are very close together ...

Exto Logia

23

asked Nov 12, 2025 at 6:42

Tooling

0 votes

2 replies

87 views

Free Software for 3D Reconstruction

I am trying to create a 3D model from overlapping aerial images and am looking for free software to use. My dataset includes 1,500 RGB images, a ground truth segmentation mask for each image, the ...

Fred1313

31

asked Nov 4, 2025 at 11:01

0 votes

1 answer

126 views

How to close a round object opening border to be filled later?

I want to close the border of an object. Some of the objects could be rectangles too. I already tried using dilation and closing with 2 iterations, but it seems the border isn't completely closed. Here is ...

Exto Logia

23

asked Oct 28, 2025 at 2:08

1 vote

0 answers

141 views

How to fix this python code to count duplicate sample in the images?

I want to count the samples in the image and measure the length of each sample, as I show below. But I am facing a big problem: when samples overlap, the code cannot make an accurate count, for example ...

user31726648

1

asked Oct 21, 2025 at 8:30

1 vote

2 answers

126 views

How to detect a B/W icon inside a colored dashboard photo when scale/rotation/color differ (OpenCV, Python)

Problem: I need to check whether a small black-and-white icon (template) appears inside a large, colored dashboard photo. The icon in the photo may differ from the template in color, scale, small ...

Dũng Hoàng

11

asked Oct 16, 2025 at 15:02

0 votes

1 answer

141 views

Preventing GPU memory leak due to a custom neural network layer

I am using the MixStyle methodology for domain adaptation, and it involves using a custom layer that is inserted after every encoder stage. However, it is causing VRAM to grow linearly, which causes ...

Vedant Dalimkar

3

asked Sep 28, 2025 at 15:00

0 votes

0 answers

110 views

Batch processing with Ultralytics YOLO does not seem to work for coreml, but is working fine for .pt

I am trying to do batch inference with YOLO11. I am working on a MacBook and I am running into this issue: from ultralytics import YOLO import numpy as np # Load YOLO model model = YOLO("yolo11s....

Ananda

3,310

asked Sep 24, 2025 at 9:56

0 votes

2 answers

253 views

How can I automatically crop and rotate a large number of images to be upright?

I have a large number of scanned discs like this: Actual image is 600 DPI, 7400x7400, 48 bit TIFF. I want to convert them to a JPEG like this: Same DPI, but cropped, and rotated so that it is ...

shogged

270

asked Sep 23, 2025 at 14:47

1 vote

1 answer

133 views

How can I find the contour of a box with a diagonal inside using OpenCV

I have to find the contours of boxes. Some boxes have a diagonal inside them. I tried to remove the diagonal, but I don't think that is the answer. Here are the preprocessed images and the contour result. Only ...

Lee Minhyeung

13

asked Sep 23, 2025 at 5:03

0 votes

0 answers

280 views

How to correctly generate Nerfstudio transforms.json from drone GPS + yaw/pitch/roll so the point cloud is in geographic space?

I am training a NeRF with Nerfstudio using drone imagery from a MicaSense Rededge-P camera. For each capture I have metadata: lat, lon, alt (WGS84 position) yaw, pitch, roll (from MicaSense DLS, ...

Evan Hammam

13

asked Sep 6, 2025 at 16:42

2 votes

1 answer

242 views

Tesseract OCR cannot read dotted LED digits on MAUI/Xamarin

I am trying to extract numbers from dotted LED-style digits (0–9) using Tesseract OCR in a MAUI/Xamarin app on Android and iOS, fully offline. My boss wants a local solution that works on mobile ...

boss

1,648

asked Aug 25, 2025 at 11:57

3 votes

2 answers

200 views

How does Local Binary Pattern return an image?

I'm trying to understand how scikit-image's local_binary_pattern() function works. Let's take the simplest setup: input is a grayscale image, radius = 1, n_points = 4, method = "uniform". ...

J.D.

309

asked Aug 19, 2025 at 9:16

1 vote

1 answer

64 views

Replacing WideResNet50 with EfficientNetV2-M in GLASS defect detection model causes Module layer2 not found in the model [closed]

I’m using the GLASS defect detection model and want to replace its default wideresnet50 backbone with efficientnetv2_m in shell/run-custom.sh. However, when I run bash run-custom.sh I get the ...

aniaf

19

asked Aug 15, 2025 at 13:09

0 votes

1 answer

174 views

RuntimeError in torch.cat during VACE-Wan2.1 inference: mask and video tensor shape mismatch

I'm using the Wan2.1-VACE video generation model, and during inference I encountered a RuntimeError related to mismatched tensor shapes in a torch.cat operation inside the vace_latent() function. From ...

范姜伯軒

59

asked Aug 4, 2025 at 14:36

3 votes

1 answer

123 views

Segmentation Error when Creating Charuco Board with Custom Ids

My intention is to create a charuco board object, which supports custom ids. Here is the code snippet being used. def __init__(self, squaresX=11, squaresY=8, squareLength=0.015, markerLength=0.011, ...

Tommy Llewellyn

35

asked Aug 4, 2025 at 8:45

0 votes

1 answer

105 views

How to obtain the cutting point using OpenCV methods?

Background: Currently, I need to crop the head or tail of the grayscale image of the steel, but this image has problems such as uneven grayscale distribution. How can I find this cutting point? My ...

Mumu

49

asked Jul 17, 2025 at 7:31

-1 votes

1 answer

147 views

How to identify contours using OpenCV or traditional methods? [closed]

Background: I currently have many grayscale images of steel. Some of them have high brightness, while others have uneven brightness. As shown in the figure below, how can I better extract their ...

Mumu

49

asked Jul 16, 2025 at 9:15

2 votes

1 answer

96 views

findChessboardCorners fails for thermal image

I am trying to get OpenCV-python to recognize a checkerboard pattern from my thermal camera. I couldn't get that working. This is the thermal image and I realize the image is low resolution, but I can'...

준서이

21

asked Jul 14, 2025 at 1:30

-1 votes

1 answer

79 views

The function cv2.solvePnP throws Assertion failed in solvepnp.cpp:824 [closed]

I have a function which is supposed to return the rotation vector and translation vector (rvec and tvec) given some 3d points, some 2d points, and an intrinsics matrix def solvePnP(points_3d: list[...

Tommy Llewellyn

35

asked Jul 11, 2025 at 15:54
