Getting numbers from matrix image using pytesseract

Question

I am trying to retrieve the text from an image of a 4x4 matrix. The text consists of numbers. I was expecting to get the numbers, but all I got was: BE, 8, EEE, BE. The image is attached here: image

Does anyone have an idea how to solve this?

import cv2
import numpy as np
from PIL import Image
import pytesseract

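def preprocess_image(image_path):
    # cv2.imread returns None instead of raising when the file cannot be read
    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Invert while thresholding so dark content becomes white foreground for findContours
    _, binary = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY_INV)

    return binary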

def find_cells(binary_image):
    contours, _ = cv2.findContours(binary_image, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    bounding_boxes = [cv2.boundingRect(contour) for contour in contours]
    bounding_boxes = sorted(bounding_boxes, key=lambda x: (x[1], x[0]))

    return bounding_boxes

def extract_text_from_cells(image_path, bounding_boxes):
    image = Image.open(image_path)

    cell_texts = []
    for box in bounding_boxes:
        left, top, width, height = box
        right, bottom = left + width, top + height

        cropped_image = image.crop((left, top, right, bottom))

        text = pytesseract.image_to_string(cropped_image, config='--psm 6')
        cell_texts.append(text.strip())

    return cell_texts


image_path = r'C:\Users\Sandro\Desktop\BINGO\screenshot.png'

binary_image = preprocess_image(image_path)

cells = find_cells(binary_image)

texts = extract_text_from_cells(image_path, cells)

for i, text in enumerate(texts):
    print(f"Cell {i+1} text: {text}")

Hello, did you install pytesseract with the command pip install pytesseract? Do you get results with other images? You could also try changing --psm 6 to --psm 8 in the pytesseract.image_to_string call.
– Jeanneli, Commented Jul 11, 2024 at 9:37

Sparkling Marcel · Accepted Answer · 2024-07-11 10:31:54Z

From what I understand of your problem, you only want to read numbers.

You can pass a config to pytesseract so that it only recognizes digits. (More precisely, it tries to match whatever it sees to a digit, so it could read a letter as a number, but since your image contains only numbers that is not a problem for you.)

Example of a config you can use:

config = r'--oem 3 --psm 6 -c tessedit_char_whitelist=0123456789'

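--oem 3 selects Tesseract's default OCR engine mode, which picks the engine based on what is available in your installation

--psm 6 tells Tesseract to treat the image as a single uniform block of text

tessedit_char_whitelist=0123456789 is fairly self-explanatory: only digits are recognized

You can use it like this in your code: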

def extract_text_from_cells(image_path, bounding_boxes):
    image = Image.open(image_path)

    cell_texts = []
    for box in bounding_boxes:
        left, top, width, height = box
        right, bottom = left + width, top + height

        cropped_image = image.crop((left, top, right, bottom))

        config = r'--oem 3 --psm 6 -c tessedit_char_whitelist=0123456789'
        text = pytesseract.image_to_string(cropped_image, config=config)
        cell_texts.append(text.strip())

    return cell_texts
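
If a single --psm value does not work for every cell, one option is to try a few page segmentation modes with the same digit whitelist and keep the first result that is purely numeric. This is only a sketch: build_digit_config, first_numeric and ocr_cell are hypothetical helper names, not part of pytesseract, and the list of modes (6, 7, 8, 10) is an assumption you would need to tune for your image.

```python
import re


def build_digit_config(psm, oem=3):
    """Build a Tesseract config string that restricts output to digits."""
    return f"--oem {oem} --psm {psm} -c tessedit_char_whitelist=0123456789"


def first_numeric(candidates):
    """Return the first candidate made up only of digits, or '' if there is none."""
    for text in candidates:
        cleaned = text.strip()
        if re.fullmatch(r"\d+", cleaned):
            return cleaned
    return ""


def ocr_cell(cropped_image, psm_modes=(6, 7, 8, 10)):
    """OCR one cropped cell, trying each page segmentation mode in turn."""
    # Imported here so the helpers above can be used without Tesseract installed.
    import pytesseract

    # Generator: Tesseract is only called until a numeric result is found.
    results = (
        pytesseract.image_to_string(cropped_image, config=build_digit_config(psm))
        for psm in psm_modes
    )
    return first_numeric(results)
```

In extract_text_from_cells you would then call ocr_cell(cropped_image) instead of pytesseract.image_to_string. Whether this helps depends on the crops themselves, so it is worth saving a few of them to disk and looking at what Tesseract actually receives.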


4 Comments

Sandro Pinho Over a year ago

I will try that, although I think I already did. I will reply as soon as I have tested this approach. However, the image looks very straightforward, with no noise, so I thought getting the numbers would be easier. Thanks.

Jeanneli Over a year ago

Hello, can you describe in more detail what result you need? In many cases pytesseract doesn't recognize the text, and we need to give it some hints.

Sandro Pinho Over a year ago

I need the numbers from that image. The numbers will change over time in value and position. I have already tried several pytesseract configs, but they all gave bad results. However, I read on a forum that Tesseract reads text well when the text height is between 20 and 30 px. I have not tried that yet. If I have no luck, I will try EasyOCR.

Jeanneli Over a year ago

You can try scaling the image by a higher multiplier to see if you get the desired effect.
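
A rough sketch of that scaling idea, applied to each cropped cell before OCR. The 30 px target comes from the 20 to 30 px figure mentioned in the comment above, which is unverified, so treat it as an assumption. scale_for_text_height and upscale are hypothetical helper names; the resize itself uses Pillow's Image.resize with Lanczos resampling.

```python
def scale_for_text_height(text_height_px, target_px=30.0):
    """Factor that brings text of the given height up to target_px; never shrinks."""
    if text_height_px <= 0:
        raise ValueError("text_height_px must be positive")
    return max(1.0, target_px / text_height_px)


def upscale(image, factor):
    """Resize a PIL image by `factor` using Lanczos resampling."""
    # Imported here so scale_for_text_height can be used without Pillow installed.
    from PIL import Image

    new_size = (round(image.width * factor), round(image.height * factor))
    return image.resize(new_size, Image.LANCZOS)
```

For example, if the digits in a crop are about 10 px tall, scale_for_text_height(10) gives 3.0, and upscale(cropped_image, 3.0) can be passed to pytesseract.image_to_string in place of the original crop.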
