Newest 'python-tesseract' Questions

3 votes

2 answers

151 views

How do I Download Poppler and Tesseract Programmatically with PowerShell

In Python, there are two libraries which are often used in tandem, Poppler and Tesseract. They both need external downloads to function: Poppler, Tesseract. The general recommendation for Windows is ...

user30589464

136

asked Nov 18 at 18:13

-4 votes

1 answer

158 views

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 270: invalid start byte - Why? [closed]

I'm writing an ultra-simple web page scraper using Python/BeautifulSoup. Because a key piece of information is displayed as a PNG image, I've had to reach for PIL/Pytesseract. The code is extremely simple and working ...

tishma

1,873

asked Nov 13 at 13:07
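On the 0x89 decode error: byte 0x89 is the first byte of the eight-byte PNG file signature (`\x89PNG\r\n\x1a\n`) and can never start a UTF-8 character, so one likely cause is that PNG data is being decoded as text somewhere. A minimal stdlib-only sketch, assuming the image arrives as raw bytes (for example, an HTTP response body): check the signature and keep the data as bytes instead of decoding it. The helper names are hypothetical.

```python
import io

# The fixed 8-byte signature that starts every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def looks_like_png(data: bytes) -> bool:
    """Return True if the raw bytes start with the PNG signature."""
    return data.startswith(PNG_SIGNATURE)


def try_decode_as_text(data: bytes):
    """Attempt a UTF-8 decode; return None instead of raising on binary data."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # For PNG data this reports "can't decode byte 0x89 ...: invalid start byte".
        print(f"not text: {exc}")
        return None


def image_stream(data: bytes) -> io.BytesIO:
    """Wrap raw PNG bytes in a file-like object, which PIL's Image.open accepts."""
    if not looks_like_png(data):
        raise ValueError("expected PNG bytes")
    return io.BytesIO(data)


if __name__ == "__main__":
    # Hypothetical payload: a PNG signature followed by dummy bytes.
    payload = PNG_SIGNATURE + b"\x00" * 16
    print(looks_like_png(payload))  # True
    print(try_decode_as_text(payload))  # prints the error, then None
    print(image_stream(payload).read(4))  # b'\x89PNG'
```

If the image is fetched with requests, this corresponds to using `response.content` (bytes) rather than `response.text`, then passing the stream to `Image.open`.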

0 votes

0 answers

42 views

Error: Deserialize header failed: 1.lstmf when training new data

I need to train the default eng data so that it can also recognize some new characters. I created box files and .lstmf files, and when running the command lstmtraining \ --model_output output/eng_latin \ --...

coure2011

42.8k

asked Oct 29 at 5:21

0 votes

2 answers

76 views

Pytesseract cannot always understand very simple and clear text (font Consolas)

Pytesseract cannot understand very simple and clear text. I've tried nearest neighbor, bilinear, gaussian blur, and everything else and cannot get tesseract to read the text consistently, the best I ...

RvBVakama

117

asked Sep 24 at 7:37

1 vote

0 answers

197 views

How to set Tesseract PSM in Docling (Python)

I’m using Docling to OCR scanned PDFs. I want to control Tesseract’s page-segmentation mode (PSM), e.g. --psm 6. Docling exposes both TesseractOcrOptions and TesseractCliOcrOptions, but neither ...

Pamudu Ranasinghe

101

asked Aug 15 at 5:54

2 votes

1 answer

75 views

Tesseract unable to recognise the letter O in plain image

I'm attempting to perform OCR on a set of single letters inside an image using Python. I'm new to this so apologies if I get the terminology wrong, but I've filtered and have obtained (I think) quite ...

user201341

119

asked Jul 26 at 14:57
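On the letter-O question: by default Tesseract segments the input as a full page (`--psm 3`), which tends to suit isolated glyphs poorly. A minimal sketch, assuming options are passed through pytesseract's `config` parameter: build a string selecting `--psm 10` (treat the image as a single character) and optionally narrowing the output with the `tessedit_char_whitelist` variable. The helper name is hypothetical, and how much whitelisting helps depends on the Tesseract version and OCR engine mode.

```python
from typing import Optional

# Page segmentation modes (see `tesseract --help-psm`) that suit small crops.
PSM_SINGLE_LINE = 7
PSM_SINGLE_WORD = 8
PSM_SINGLE_CHAR = 10


def build_tesseract_config(psm: int, whitelist: Optional[str] = None,
                           oem: Optional[int] = None) -> str:
    """Build a config string such as '--oem 1 --psm 10 -c tessedit_char_whitelist=AO'."""
    if not 0 <= psm <= 13:
        raise ValueError(f"psm must be between 0 and 13, got {psm}")
    parts = []
    if oem is not None:
        parts.append(f"--oem {oem}")
    parts.append(f"--psm {psm}")
    if whitelist:
        # Restrict the characters Tesseract is allowed to output.
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)


if __name__ == "__main__":
    letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    config = build_tesseract_config(PSM_SINGLE_CHAR, whitelist=letters)
    print(config)
    # Hypothetical usage with an already-loaded PIL image `img`:
    # text = pytesseract.image_to_string(img, config=config)
```

For digit-only crops, the same helper could be called with `whitelist="0123456789"`.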

1 vote

2 answers

251 views

Why do I get nothing in output with pytesseract?

I have installed language support for chi_sim; running ls /usr/share/tesseract-ocr/5/tessdata shows: chi_sim.traineddata, eng.traineddata, pdf.ttf, configs, osd.traineddata, tessconfigs. You can try it by ...

showkey

381

asked Jul 24 at 13:33

1 vote

1 answer

88 views

When I Try To Train a Tesseract Model I get a Compute CTC targets failed error

I am currently using tesseract 5.0 and am training a model. I have generated the png, box and the ground truth files for a thousand images. However, when I run the command: make training MODEL_NAME=...

Akshay NN

21

asked Jun 17 at 9:54

0 votes

1 answer

172 views

How to get good OCR results using pytesseract

I'm trying to get the data out of this image, and no matter what I try I can't get a good result. I have tried ImageEnhance and cv2; I got the most promising result using cv2 and adaptive thresholding: ...

Cyclo

3

asked May 22 at 12:11

1 vote

1 answer

83 views

Tesseract doesn't find page numbers

I have a PDF document that I want to scan with pytesseract, but the page numbers are not recognized on any of the pages. The PDF was written with LaTeX. I tried ...

mike3467

97

asked May 21 at 13:38

0 votes

1 answer

66 views

Prevent tesseract guessing characters based on surrounding context instead of just the character outline

I'm using pytesseract to read tabular data out of an image but I'm having trouble with the software making "educated guesses" about characters and word splitting based on context. I have a ...

SpliFF

39.1k

asked May 3 at 7:27

0 votes

0 answers

62 views

lstm-unicharset file is unable to be created during tesseract training

I am trying to fine-tune an Optical Character Recognition (OCR) model for Japanese using Tesseract's tesstrain repository. I tried encoding the bash commands into Python in VSCode as I wanted ...

Jiansen Chan

11

asked Apr 28 at 8:03

0 votes

0 answers

158 views

Tesseract OCR Command in ocrmypdf Fails with 'SubprocessOutputError' on Windows

ExitCodeException _common.py:271 Traceback (most recent call last): File "C:\<USER>\apps\python\...

Username

1

asked Mar 26 at 17:17

1 vote

1 answer

166 views

Tesseract Training: Error 'Integer (fast) model' When Using Apex.lstm

I’ve been following this tutorial from YouTube: Guide to Tesseract Training https://www.youtube.com/watch?v=KE4xEzFGSU8&t=13s and its corresponding GitHub repository: astutejoe/tesseract_tutorial. ...

Impetus

1

asked Mar 21 at 17:50

-1 votes

1 answer

68 views

I'm having trouble trying to convert image to text in python

I'm trying to convert the attached image using the pytesseract and opencv libraries in python, but the conversion is not satisfactory, since many characters are converted incorrectly. Does anyone have ...

Cristi Garcia

1

asked Mar 19 at 15:18

-1 votes

1 answer

61 views

Pytesseract does not recognize text from an image in Python

I am working on a Django application in which I need to solve a captcha. I am already saving the captcha to a temporary file, but when I try to read it using pytesseract, it returns nothing ...

Mohit Prajapat

68

asked Mar 13 at 6:05

2 votes

1 answer

533 views

Image Preprocessing to extract 2D number list

I've been trying to make a puzzle-solving program. The game is 'fruit box' and you can play it through the link below. https://en.gamesaien.com/game/fruit_box/ To do that, I have to extract numbers ...

eunsang

23

asked Feb 12 at 5:48

3 votes

0 answers

110 views

Memory Usage Keeps Increasing in Python Script Using OpenCV, PyAutoGUI, and Tesseract OCR [closed]

I'm working on a Python script that continuously monitors a screen region, extracts text using Tesseract OCR, and sends serial commands to an Arduino based on the detected text. However, I notice that ...

André

31

asked Feb 6 at 20:23

0 votes

1 answer

41 views

Pytesseract numbers image to text

I am trying to use pytesseract to extract numbers from images. It works for some of them (1, 2, 3, 5, 6, 20...) but I would like to make it work for all of them. Here is a sample of the data that I'm ...

User_123917425

1

asked Dec 26, 2024 at 2:30

0 votes

0 answers

73 views

PyTesseract and 7-segment numbers: how to get the confidence of recognition?

I need to recognize digits on 7-segment clocks (see picture below), so I use the following Python code: def detect_date(image: cv2.UMat, bbox:list) -> datetime: gry1 = cv2.cvtColor(image, ...

Sharov

460

asked Dec 22, 2024 at 0:20
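On the confidence question: `pytesseract.image_to_data` with `output_type=Output.DICT` returns parallel lists, including `text` and `conf`, where non-word rows (page, block, paragraph, line) carry a confidence of -1. A minimal sketch that turns such a dictionary into per-word confidences and an average; the sample dictionary is made up, and because the type of `conf` values has varied between pytesseract versions (int, float, or string), the code converts them with `float`.

```python
from typing import Dict, List, Tuple


def word_confidences(data: Dict[str, list]) -> List[Tuple[str, float]]:
    """Pair each recognized word with its confidence, skipping non-word rows."""
    pairs = []
    for text, conf in zip(data["text"], data["conf"]):
        conf_value = float(conf)  # conf may be int, float, or str depending on version
        if conf_value < 0 or not str(text).strip():
            continue  # -1 marks structural rows; blank text carries no word
        pairs.append((str(text), conf_value))
    return pairs


def mean_confidence(data: Dict[str, list]) -> float:
    """Average confidence over recognized words; 0.0 if nothing was recognized."""
    pairs = word_confidences(data)
    if not pairs:
        return 0.0
    return sum(conf for _, conf in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical output shaped like image_to_data(img, output_type=Output.DICT).
    sample = {
        "text": ["", "", "12:45", "07"],
        "conf": [-1, -1, 91.5, 62.0],
    }
    print(word_confidences(sample))  # [('12:45', 91.5), ('07', 62.0)]
    print(mean_confidence(sample))  # 76.75
```

A low average could then be used to reject a frame or retry with different preprocessing.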

0 votes

0 answers

59 views

Extracting data from a table with known labels with tesseract

I am trying to use Tesseract to create a small Windows application that allows the user to: Take a screenshot of the monitor and cut a smaller portion containing a table (the table always has the ...

Riccardo

11

asked Dec 21, 2024 at 10:17

0 votes

0 answers

40 views

How can I extract the content from an image using Python with pytesseract?

I tried to extract the content from an image with the Python py-tesseract OCR, but I was unable to obtain the numbers: extracted_text comes back empty. Code: def ImageReader(image_path): ...

Yug

43

asked Nov 6, 2024 at 17:28

-1 votes

1 answer

127 views

Pytesseract wrong text recognition when words are close to each other

When I use PyTesseract to recognize the text in this image, it returns 'FORREST C. BLopGetTrT' instead of 'FORREST C. BLODGETT'. I attached the result I get from my code and the image I use, which contains many names. I ...

Mengyang Cao

1

asked Nov 1, 2024 at 10:06

1 vote

1 answer

248 views

Pytesseract TesseractError: Unable to Load Language Files

I am trying to use pytesseract in my system. But I am getting the following error message pytesseract.pytesseract.TesseractError: (1, 'Error opening data file /opt/homebrew/share/eng.traineddata ...

Sashaank

972

asked Oct 30, 2024 at 8:47
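On the `Error opening data file` question: the reported path `/opt/homebrew/share/eng.traineddata` suggests Tesseract is looking directly in `/opt/homebrew/share`, whereas a Homebrew install typically keeps language files one level deeper, in `/opt/homebrew/share/tessdata`. In Tesseract 4 and later, `TESSDATA_PREFIX` is generally expected to name the `tessdata` directory itself rather than its parent. A minimal stdlib sketch that checks candidate directories for `<lang>.traineddata`; the candidate list is an assumption about typical installs, and the helper names are hypothetical.

```python
import os
from pathlib import Path
from typing import List, Optional


def find_traineddata(lang: str, candidates: List[str]) -> Optional[Path]:
    """Return the path of <lang>.traineddata in the first candidate that has it."""
    for directory in candidates:
        path = Path(directory) / f"{lang}.traineddata"
        if path.is_file():
            return path
    return None


def tessdata_candidates() -> List[str]:
    """TESSDATA_PREFIX (if set) followed by assumed common install locations."""
    dirs = []
    prefix = os.environ.get("TESSDATA_PREFIX")
    if prefix:
        dirs.append(prefix)
    # Assumed typical locations: Homebrew on Apple Silicon, Debian/Ubuntu Tesseract 5.
    dirs += ["/opt/homebrew/share/tessdata", "/usr/share/tesseract-ocr/5/tessdata"]
    return dirs


if __name__ == "__main__":
    found = find_traineddata("eng", tessdata_candidates())
    if found:
        # TESSDATA_PREFIX should name the directory that holds the file.
        print(f"export TESSDATA_PREFIX={found.parent}")
    else:
        print("eng.traineddata not found; is the language pack installed?")
```

Alternatively, the directory can be passed per call with tesseract's `--tessdata-dir` option through pytesseract's `config` argument, e.g. `config='--tessdata-dir /opt/homebrew/share/tessdata'`.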

2 votes

0 answers

112 views

tesseract not able to find .lstm-unicharset file while performing model training

I am using Tesseract to perform custom model training. I have created my own text dataset and saved it, with images and corresponding .gt files, in the tesstrain->data->codec folder. At the same level as ...

Prachi Kedar

21

asked Oct 14, 2024 at 10:04

2 votes

1 answer

198 views

OCR character recognition fails

I am experimenting with AI and specifically character recognition. I saw that one of the best algorithms is OCR and Google's implementation in Tesseract seems like the best open source solution right ...

Alejandro

31

asked Oct 8, 2024 at 12:53

0 votes

2 answers

183 views

How do I get around this permission error with tesseract-ocr?

I am doing a Python project in which I use Tesseract-OCR. When I set it up from Git, it gave me this error: C:\Users\jpmv1\AppData\Local\Programs\Python\Python312\python.exe C:\Users\jpmv1\Projects\...

Doutor JP

9

asked Oct 2, 2024 at 22:02

1 vote

1 answer

102 views

TesseractNotFoundError is displayed after deploying app on Render even after trying several methods [closed]

I am trying to deploy an app on Render, but when it runs I get a TesseractNotFoundError saying Tesseract is not installed, even though I have added package.txt, requirements.txt as well as build....

Prarthana Kolhe

11

asked Sep 30, 2024 at 16:58
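On the Render deployment question: pytesseract only wraps the `tesseract` command-line program, so `TesseractNotFoundError` means that executable was not found at runtime; `requirements.txt` installs Python packages, not system binaries. A minimal startup check, sketched under the assumption that an environment variable may hold an explicit path (the name `TESSERACT_CMD` is hypothetical); the returned path is what would be assigned to `pytesseract.pytesseract.tesseract_cmd`.

```python
import os
import shutil
from typing import Optional


def locate_tesseract(env_var: str = "TESSERACT_CMD") -> Optional[str]:
    """Return a tesseract executable path from an override variable or PATH, else None."""
    override = os.environ.get(env_var)  # hypothetical override set in the service's environment
    if override and os.path.isfile(override):
        return override
    return shutil.which("tesseract")  # None when the binary is not on PATH


if __name__ == "__main__":
    path = locate_tesseract()
    if path is None:
        print("tesseract binary not found; it must be installed as a system package")
    else:
        print(f"tesseract found at {path}")
        # With pytesseract installed:
        # import pytesseract
        # pytesseract.pytesseract.tesseract_cmd = path
```

Running a check like this at startup turns a failure deep inside an OCR call into an explicit message in the deploy logs.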

0 votes

0 answers

96 views

How to extract data from pdfs which are not in tables or containers into a column based table format in python?

I am trying to convert my PDF data into a structured table format. I have tried a bunch of options, but none of them has been able to separate the fields into table columns. I am able to do ...

ViSa

2,377

asked Sep 19, 2024 at 12:22

0 votes

1 answer

138 views

Trying to convert a PDF to a JPEG but I keep facing an error

I'm trying to convert a PDF into a JPEG using Python so that I can perform OCR on it, but I keep running into the error: cannot identify image file <_io.BytesIO object at ...

Alvin Joseph

1

asked Aug 26, 2024 at 6:53
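On the `cannot identify image file` error: this is what Pillow raises (as `PIL.UnidentifiedImageError`) when the bytes are not in a format it can read, and Pillow can write PDFs but not read them. A minimal sketch, assuming the input is a raw byte buffer: sniff the leading magic bytes before calling `Image.open`, and route PDFs to a rasterizer such as pdf2image's `convert_from_bytes` (which requires Poppler). The helper names are hypothetical, and only three signatures are recognized.

```python
# Leading "magic" bytes for the formats this sketch distinguishes.
SIGNATURES = {
    b"%PDF-": "pdf",
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
}


def sniff_format(data: bytes) -> str:
    """Guess the file format from its first bytes; 'unknown' if nothing matches."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return "unknown"


def describe_route(data: bytes) -> str:
    """Say which loader should receive these bytes."""
    kind = sniff_format(data)
    if kind == "pdf":
        # Pillow cannot read PDFs; rasterize first, e.g. pdf2image.convert_from_bytes(data)
        return "rasterize with pdf2image"
    if kind in ("png", "jpeg"):
        # Safe to pass io.BytesIO(data) to PIL.Image.open
        return "open with PIL"
    return "unsupported"


if __name__ == "__main__":
    print(sniff_format(b"%PDF-1.7\n..."))  # pdf
    print(describe_route(b"\xff\xd8\xff\xe0"))  # open with PIL
```

Another frequent cause of the same message is a `BytesIO` that was written to but not rewound with `seek(0)` before `Image.open`.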

2 votes

1 answer

314 views

What is the best way to recognize embossed text with Tesseract OCR?

I am trying to read the text from a U.S. penny to orient the coin. The original is from https://www.usmint.gov/wordpress/wp-content/uploads/2024/05/2024-lincoln-penny-uncirculated-obverse-philadelphia....

skeeter

39

asked Aug 19, 2024 at 11:51

-1 votes

1 answer

105 views

Use pytesseract OCR to read text from a captcha

I need to use Pytesseract to extract text from this picture: I'm using this code: import pytesseract import cv2 pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract....

Doğucan ÇALIŞKAN

1

asked Aug 18, 2024 at 6:51

0 votes

0 answers

121 views

Unable to Extract Text from Image Using Tesseract OCR - How to Preprocess Instagram Reels Frames

I am working on a project where I need to extract text from frames of an Instagram Reels video. I used the yt-dlp to download the video, extracted frames using ffmpeg, and attempted to read the text ...

Rasik

2,529

asked Aug 10, 2024 at 13:13

0 votes

1 answer

279 views

Problem to extract correct data from PDF with tesseract

I'm trying to extract specific data from multiple PDFs. I begin by isolating the example image (Picture 1) using horizontal and vertical lines to create cells. After creating the cells, I crop them ...

David in sweden

1

asked Aug 7, 2024 at 14:35

0 votes

0 answers

48 views

PyTesseract not extracting text?

Pytesseract does not extract the text from the image. The terminal stays black with a space as if it was actually trying to extract the text. Here is my code and the image: from PIL import Image ...

Spidercoder

11

asked Jul 26, 2024 at 1:39

0 votes

0 answers

265 views

How can I extract tables from an image into excel using optical character recognition?

As an example, I have this image and would like to convert it to an editable Excel table. I have tried using the 'pytesseract' library, but it doesn't accurately extract the text from the image ...

UsangR01

1

asked Jul 20, 2024 at 14:24
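On the table-to-Excel question, one possible starting point, sketched under assumptions: `image_to_data` reports each word's `block_num`, `par_num`, `line_num`, `left`, and `width`, so with `--psm 6` (a single uniform block) each Tesseract line can be treated as one table row, and cells can be split wherever the horizontal gap between words is large. The result is written as CSV, which Excel opens directly. The 40-pixel gap and the sample data are hypothetical; tables with merged cells or ruling lines usually need more work.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List


def rows_from_data(data: Dict[str, list], gap: int = 40) -> List[List[str]]:
    """Group words into rows (one per Tesseract line) and cells (split on wide gaps)."""
    lines = defaultdict(list)
    for i, text in enumerate(data["text"]):
        word = str(text).strip()
        if not word:
            continue  # structural rows and blanks carry no text
        key = (data["block_num"][i], data["par_num"][i], data["line_num"][i])
        lines[key].append((data["left"][i], data["width"][i], word))

    table = []
    for key in sorted(lines):
        words = sorted(lines[key])  # left to right
        cells = [words[0][2]]
        prev_right = words[0][0] + words[0][1]
        for left, width, word in words[1:]:
            if left - prev_right > gap:
                cells.append(word)  # wide gap: start a new cell
            else:
                cells[-1] += " " + word  # narrow gap: same cell
            prev_right = left + width
        table.append(cells)
    return table


def to_csv(table: List[List[str]]) -> str:
    """Serialize rows as CSV text; save it to a .csv file to open in Excel."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(table)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical image_to_data(..., config="--psm 6", output_type=Output.DICT) result.
    sample = {
        "text": ["Name", "Unit", "Price", "Red", "apple", "kg", "2.50"],
        "block_num": [1] * 7,
        "par_num": [1] * 7,
        "line_num": [1, 1, 1, 2, 2, 2, 2],
        "left": [10, 200, 300, 10, 45, 200, 300],
        "width": [50, 40, 50, 30, 50, 20, 40],
    }
    print(to_csv(rows_from_data(sample)))
```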

0 votes

1 answer

161 views

How to recognize single characters from an image using Tesseract?

This is the original image: This is the processed image: I'm trying to automate a mini-game in which characters appear on the screen. I did some light research and managed to process the image to ...

Flako

1

asked Jul 18, 2024 at 20:51

0 votes

0 answers

63 views

OpenCV contours sorting x-axis and y-axis

I am working on a python program to solve a wordsearch. I am using pytesseract and opencv to process an image of the wordsearch and the solution will be displayed as a text. The script processes the ...

HND

1

asked Jul 17, 2024 at 5:04
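On the contour-sorting question: sorting by `y` alone breaks when letters in the same row differ by a pixel or two, because rows then interleave. A common fix, sketched here with plain `(x, y, w, h)` tuples like those returned by `cv2.boundingRect`: group boxes into the same row whenever their top edges are within a tolerance, then sort each row by `x`. The 10-pixel tolerance is an assumption that would need tuning to the grid's cell size.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h), as returned by cv2.boundingRect


def sort_into_grid(boxes: List[Box], row_tol: int = 10) -> List[List[Box]]:
    """Group boxes into rows (top to bottom), each row ordered left to right."""
    rows: List[List[Box]] = []
    for box in sorted(boxes, key=lambda b: b[1]):  # scan by top edge
        if rows and abs(box[1] - rows[-1][0][1]) <= row_tol:
            rows[-1].append(box)  # close enough to the current row's first box
        else:
            rows.append([box])  # start a new row
    return [sorted(row, key=lambda b: b[0]) for row in rows]


if __name__ == "__main__":
    # Hypothetical letter boxes from a 2x3 word search with slight vertical jitter.
    boxes = [(60, 12, 20, 20), (10, 10, 20, 20), (35, 8, 20, 20),
             (10, 50, 20, 20), (62, 49, 20, 20), (34, 52, 20, 20)]
    for row in sort_into_grid(boxes):
        print([x for x, _, _, _ in row])
```

Flattening the result with `[b for row in grid for b in row]` gives reading order, which maps each recognized letter to its grid position.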

0 votes

1 answer

97 views

Getting numbers from matrix image using pytesseract

I am trying to retrieve the text from an image of a 4x4 matrix. The text consists of numbers. Although I was expecting the numbers, all I got was: BE, 8, EEE, BE. The image is attached here. Anyone ...

Sandro Pinho

1

asked Jul 11, 2024 at 7:34

1 vote

1 answer

151 views

Pytesseract OCR recognizes "o" as "0"

I'm trying to read text on this image using pytesseract library. original-screenshot.png Here is my code: path = 'original-screenshot.png' image = cv2.imread(path) image = cv2.cvtColor(image, cv2....

ThunderFound

11

asked Jul 8, 2024 at 21:47

1 vote

0 answers

50 views

I don't want the boxes to be read as special characters or letters

This is the sample image that I will convert into text. And here is the output: ***"| | .** indicators (Bids: S.1.4.1. valid Certificate of Registration and **LJ Poy |** ...

Nami_Raven

15

asked Jul 4, 2024 at 8:43

3 votes

1 answer

118 views

Incorrect digit detection using Tesseract OCR on video frames in Python

I'm trying to calculate the actual recording time of my videos. I have a lot of videos, some of which were lost during transmission. All of them are in mp4 format. To get the duration, I recognize the time ...

Ernán

33

asked Jun 22, 2024 at 21:44

-1 votes

1 answer

126 views

Unable to solve the captcha correctly using pytesseract

I have written Python code that reads the captcha using OCR and then fills in the rest of the form. I have used the pytesseract library to recognize the characters in the captcha. I am unable to retrieve the ...

Onkar Mehra

75

asked Jun 21, 2024 at 22:35

1 vote

0 answers

131 views

Improving OCR accuracy with pytesseract for processing manga images

def get_string(img_path): img = cv2.imread(img_path) img = cv2.resize(img, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC) gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) ...

Myat Thet

19

asked Jun 18, 2024 at 5:38

0 votes

1 answer

99 views

OCR and pytesseract detecting numbers in an image

currentbid.png: I am trying to detect the number in this image, but it gives me letters or the wrong number. I've tried tons of stuff with greyscale ...

philM

11

asked Jun 12, 2024 at 18:14

0 votes

0 answers

237 views

How to read small numbers on given image using PyTesseract

I am trying to use OpenCV and Pytesseract to loop over the white numbers at the bottom of this image (or similar images) and record each number. While I have the logic correct for determining the ROI,...

Axel

1

asked Jun 9, 2024 at 17:52

1 vote

0 answers

27 views

I want a more detailed square using pytesseract

I want to write code that extracts the x-axis numbers and x-axis labels from the chart, with the numbers and labels kept separate. Is there a way to do this? Recognize the x-axis and y-axis and classify it ...

김보미

11

asked Jun 9, 2024 at 17:38

0 votes

2 answers

261 views

Appium: identifying iOS elements using pytesseract instead of locators

Below is a snapshot of our application under test, an iOS app built with React Native. The element hierarchy is too deep. We are already using snapshotmaxdepth - 60 as one of the capabilities. Other capabilities include ...

Libra

43

asked Jun 7, 2024 at 14:52

0 votes

0 answers

129 views

Installing Tesseract in AWS Beanstalk

I was deploying a Flask application that depends on the Tesseract library, and I am getting the error below. No match for argument: Error: Unable to find a match: tesseract FileNotFoundError: [...

dinesh

15

asked Jun 7, 2024 at 9:26

0 votes

0 answers

107 views

How to improve OCR accuracy using Pytesseract and describe preprocessing steps?

In this code, I've implemented a preprocessing pipeline that runs before OCR. It converts images to grayscale, applies a sharpening filter, and binarizes them using Otsu's thresholding method. ...

user18025483

21

asked Jun 6, 2024 at 4:29
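On the Otsu preprocessing question: Otsu's method picks the threshold that maximizes the between-class variance of the grayscale histogram. A pure-Python sketch of that computation, useful for checking what a call like `cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)` is doing on a given image; it takes a flat list of 0-255 integers, and the sample data is made up. OpenCV's implementation may break ties differently, so its reported value can differ slightly on the same image.

```python
from typing import List


def otsu_threshold(pixels: List[int]) -> int:
    """Return t maximizing between-class variance for classes (<= t) and (> t)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for t in range(256):
        weight_bg += hist[t]  # pixels with value <= t
        sum_bg += t * hist[t]
        weight_fg = total - weight_bg  # pixels with value > t
        if weight_bg == 0 or weight_fg == 0:
            continue
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Unnormalized between-class variance; the argmax is unchanged by scaling.
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def binarize(pixels: List[int], t: int) -> List[int]:
    """Map values above t to 255 and the rest to 0, like THRESH_BINARY."""
    return [255 if p > t else 0 for p in pixels]


if __name__ == "__main__":
    # Hypothetical bimodal sample: dark text around 30, light paper around 220.
    sample = [28, 30, 32, 31, 29, 218, 220, 222, 221, 219, 220]
    t = otsu_threshold(sample)
    print(t, binarize(sample, t))
```

Otsu assumes a roughly bimodal histogram; with uneven lighting, an adaptive threshold is often the more robust choice.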
