
I am trying to preprocess this image.

I then try to read the numbers on the right with Tesseract, like this:

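import { createWorker, OEM, PSM } from "tesseract.js";
import cv from "opencv4nodejs"; // assumed: the opencv4nodejs bindings (imdecodeAsync, imencodeAsync, ...)
import fs from "node:fs/promises";

interface Coordinate {
  x: number;
  y: number;
  w: number;
  h: number;
}

// An object keyed by region name (an array literal cannot have named keys)
const COORDINATES = {
  MORE_INFO_LABELS: {
    x: 740,
    y: 165,
    w: 112,
    h: 326,
  },
};

const worker = await createWorker("eng", OEM.TESSERACT_LSTM_COMBINED);

await worker.setParameters({
  tessedit_char_whitelist: "0123456789",
  tessedit_pageseg_mode: PSM.SINGLE_LINE,
});

// Load the screenshot as a single-channel grayscale image
const moreInfoScreenshot = await cv.imdecodeAsync(
  await fs.readFile("test.png"),
  cv.IMREAD_GRAYSCALE
);

// Adaptive Gaussian threshold (blockSize 11, C = 2), inverted output
const binaryImage = moreInfoScreenshot.adaptiveThreshold(
  255,
  cv.ADAPTIVE_THRESH_GAUSSIAN_C,
  cv.THRESH_BINARY_INV,
  11,
  2
);
const moreInfoScreenshotPNG = await cv.imencodeAsync(".png", binaryImage);

// Save the preprocessed image for inspection
await cv.imwriteAsync("test-fmt.png", binaryImage);

function coordinatesToRectangle(coordinates: Coordinate) {
  return {
    top: coordinates.y,
    left: coordinates.x,
    width: coordinates.w,
    height: coordinates.h,
  };
}

const {
  data: { text: moreInfoText },
} = await worker.recognize(moreInfoScreenshotPNG, {
  rectangle: coordinatesToRectangle(COORDINATES.MORE_INFO_LABELS),
});

await worker.terminate();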

The output image looks like this. The problem is that Tesseract does not read the smaller numbers correctly (moreInfoText: '100408218\n18870369\n26783840937\n3330133360\n215735\n'). How can I make sure those are read properly?

  • Tesseract, out of the box, works best with black text on a white background. Try a different thresholding method that doesn't produce borders the way adaptive thresholding does. Commented Dec 9, 2023 at 22:08
  • @cbr Thanks for the suggestion. After looking around, I also saw that this is supposed to work better. However, I am not familiar with OpenCV or image manipulation. Do you have an idea how I could achieve this? Commented Dec 9, 2023 at 22:44

1 Answer


You're lucky in that your image's background is mostly blue. When reading the image in color (warning: OpenCV orders the channels as BGR by default, not RGB), you can extract each color channel with cv.split():

As you can see, since the background is blue, the red channel is already wonderfully clean. You can now either run Tesseract on that directly, or additionally threshold and invert the image. I'm using OpenCV-Python to demonstrate, but the OpenCV functions are the same:

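import cv2 as cv

# Read in color; OpenCV returns the channels in B, G, R order
img = cv.imread("image.png")

# Keep only the red channel: the blue background has little red, so it comes out dark
_, _, r = cv.split(img)
# Pixels brighter than 85 become 255, everything else becomes 0
_, thresh = cv.threshold(r, 85, 255, cv.THRESH_BINARY)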

Now that the image is thresholded, you can just use cv.bitwise_not() to invert the colors:

cv.bitwise_not(thresh, thresh)

Looking more closely at the cropped segment:

we can see that you'll want to add the comma (,) to the character whitelist; otherwise Tesseract will probably guess a digit instead. You can then strip the commas in JS with .replace(/,/g, '').
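The comma-stripping step looks much the same in Python as the JS .replace(/,/g, '') above. A minimal sketch, assuming the OCR output has one number per line and that the comma was whitelisted; `parse_numbers` and the sample string are hypothetical:

```python
def parse_numbers(ocr_text: str) -> list[int]:
    """Turn OCR output such as "1,004,082\n18,870,369\n" into integers.

    Assumes one number per line, with ',' used only as a thousands separator.
    """
    numbers = []
    for line in ocr_text.splitlines():
        # Same idea as .replace(/,/g, '') in JS, plus trimming stray spaces
        digits = line.replace(",", "").strip()
        if digits:  # skip empty lines, e.g. after the trailing "\n"
            numbers.append(int(digits))
    return numbers


sample = "1,004,082\n18,870,369\n\n215,735\n"  # hypothetical OCR output
print(parse_numbers(sample))
```

`int()` raises a `ValueError` on anything that is not a plain integer after stripping, which is a convenient way to notice lines that Tesseract misread.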

Additionally, this is more like a block of text than a single line, so try one of the page segmentation modes (PSMs) meant for blocks of text, such as PSM.SINGLE_BLOCK, instead of PSM.SINGLE_LINE. Alternatively, slice out each line separately and recognize each one as a single line of text.
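The "slice each line separately" option can be sketched with a horizontal projection profile: rows that contain no dark pixels are treated as gaps between lines. This is one common way to do it, not necessarily what the answer had in mind; `find_text_lines` is a hypothetical helper, and it assumes black text on a white background with at least one fully blank row between lines:

```python
import numpy as np


def find_text_lines(binary: np.ndarray) -> list[tuple[int, int]]:
    """Return (top, bottom) row ranges (bottom exclusive) containing dark pixels.

    `binary` is assumed to be a 2-D uint8 image with black (0) text on a
    white (255) background, e.g. the output of threshold + bitwise_not.
    """
    has_ink = (binary < 128).any(axis=1)  # True for rows with any dark pixel
    lines = []
    start = None
    for y, ink in enumerate(has_ink):
        if ink and start is None:
            start = y  # a new text line begins
        elif not ink and start is not None:
            lines.append((start, y))  # the current text line ends
            start = None
    if start is not None:
        lines.append((start, len(has_ink)))  # text runs to the bottom edge
    return lines
```

Each (top, bottom) pair can then be cropped with `binary[top:bottom, :]` and recognized as a single line (PSM.SINGLE_LINE in tesseract.js, `--psm 7` on the Tesseract command line). The Tesseract quality page also mentions that text touching the image edge can cause problems, so padding each crop with a few white pixels is worth trying.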

Going further, the Tesseract docs have a page on improving quality: https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html

Erosion and dilation are each one or two lines of code with OpenCV and are covered in its documentation: https://docs.opencv.org/4.8.0/d4/d76/tutorial_js_morphological_ops.html
