I am trying to preprocess this image.
I then use Tesseract to read the numbers on the right, like this:
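// Assumed imports (not shown in the original post); `cv` is assumed to be an opencv4nodejs binding.
import { promises as fs } from "fs";
import { createWorker, OEM, PSM } from "tesseract.js";

// Region of the screenshot that contains the numbers (an object keyed by name, not an array).
const COORDINATES = {
  MORE_INFO_LABELS: {
    x: 740,
    y: 165,
    w: 112,
    h: 326,
  },
};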
const worker = await createWorker("eng", OEM.TESSERACT_LSTM_COMBINED);
await worker.setParameters({
  tessedit_char_whitelist: "0123456789",
  tessedit_pageseg_mode: PSM.SINGLE_LINE,
});
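// Load the screenshot as a single-channel grayscale image.
const moreInfoScreenshot = await cv.imdecodeAsync(
  await fs.readFile("test.png"),
  cv.IMREAD_GRAYSCALE
);
// Binarize with an inverted adaptive Gaussian threshold (block size 11, constant 2).
const binaryImage = moreInfoScreenshot.adaptiveThreshold(
  255,
  cv.ADAPTIVE_THRESH_GAUSSIAN_C,
  cv.THRESH_BINARY_INV,
  11,
  2
);
// Encode to PNG for Tesseract and also write a copy to disk for inspection.
const moreInfoScreenshotPNG = await cv.imencodeAsync(".png", binaryImage);
await cv.imwriteAsync("test-fmt.png", binaryImage);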
function coordinatesToRectangle(coordinates: Required<Coordinate>) {
  return {
    top: coordinates.y,
    left: coordinates.x,
    width: coordinates.w,
    height: coordinates.h,
  };
}
const {
  data: { text: moreInfoText },
} = await worker.recognize(moreInfoScreenshotPNG, {
  rectangle: coordinatesToRectangle(COORDINATES.MORE_INFO_LABELS),
});
The output image looks like this. The problem is that Tesseract does not read the smaller numbers. The recognized text is moreInfoText: '100408218\n18870369\n26783840937\n3330133360\n215735\n'. How can I make sure those are read properly?
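To make it easier to see which rows actually came back, the raw text can be split into one entry per recognized line. This is only a minimal sketch: `splitOcrLines` and `sampleText` are hypothetical names that are not part of the code above, and the sample string is simply the output quoted in the question.

```typescript
// Hypothetical helper (not part of the original code): turn raw OCR text
// into one trimmed, non-empty string per recognized line.
function splitOcrLines(text: string): string[] {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// Sample input: the recognized text quoted in the question.
const sampleText = "100408218\n18870369\n26783840937\n3330133360\n215735\n";

const recognizedLines = splitOcrLines(sampleText);
console.log(recognizedLines.length, recognizedLines);
```

Comparing the number of entries with the number of rows visible in the image shows how many values were dropped or merged.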



