How to build tesseract 4 beta on macOS
```shell
brew info tesseract
```
With that version, recognition of Simplified Chinese was pretty poor.
I noticed that version 4.0.0 and later add a new neural-network OCR engine based on LSTM.
However, on macOS it has to be built from source.
Fortunately, the project's README.md has detailed instructions.
Install dependencies
```shell
brew install automake autoconf autoconf-archive libtool
```
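The command above installs the autotools chain. Tesseract also links against the Leptonica image library, and as far as I know its configure script locates Leptonica through pkg-config, so a minimal sketch of the full dependency install might look like this. The extra packages (`pkg-config`, `leptonica`) are my assumption rather than part of the original steps, and `install_tesseract_deps` is a hypothetical helper; check the current README before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: install Homebrew packages for building tesseract from source.
# Runs in a subshell so `set -euo pipefail` does not leak into the caller.
install_tesseract_deps() (
  set -euo pipefail
  # Autotools chain listed in the post
  brew install automake autoconf autoconf-archive libtool
  # Assumption: pkg-config and Leptonica are also required for ./configure and linking
  brew install pkg-config leptonica
)
```

Nothing is installed until you call `install_tesseract_deps`.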
Compile
```shell
git clone https://github.com/tesseract-ocr/tesseract/
```
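After cloning, the build follows the usual autotools flow: generate `configure`, run it, build, and install. The function below is a sketch of that sequence under a few assumptions: the dependencies above are already installed, the default `/usr/local` prefix is acceptable, and `sysctl -n hw.ncpu` is used to pick a parallel job count on macOS. `build_tesseract` is a hypothetical name. Cloning the default branch today gives a newer release than the 4.0 beta used here, so check out a release tag first if you need a specific version.

```shell
#!/usr/bin/env bash
# Hypothetical helper: clone and build tesseract with the standard autotools steps.
# Assumption: automake, autoconf, libtool, pkg-config and leptonica are installed.
build_tesseract() (
  set -euo pipefail
  git clone https://github.com/tesseract-ocr/tesseract/
  cd tesseract
  ./autogen.sh                    # generate the ./configure script
  ./configure                     # default install prefix is /usr/local
  make -j "$(sysctl -n hw.ncpu)"  # parallel build using all CPU cores on macOS
  sudo make install               # install the tesseract binary and libraries
)
```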
Download the best trained model for the language you need, here chi_sim.traineddata (Simplified Chinese),
and put it under tesseract/4.0.0.1/tessdata/.
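One way to script this download is sketched below. It assumes the "best" models are the ones published in the `tesseract-ocr/tessdata_best` repository on GitHub and that its default branch is `main`; `model_url` and `fetch_model` are hypothetical helpers, and the target directory is passed in so you can point it at your tessdata path.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for fetching a "best" traineddata file.
# Assumption: models live in github.com/tesseract-ocr/tessdata_best on branch main.
model_url() {
  # Print the raw download URL for a language code such as chi_sim
  printf 'https://github.com/tesseract-ocr/tessdata_best/raw/main/%s.traineddata\n' "$1"
}

fetch_model() (
  set -euo pipefail
  lang="$1"      # language code, e.g. chi_sim
  dest_dir="$2"  # your tessdata directory
  mkdir -p "$dest_dir"
  # -f: fail on HTTP errors; -L: follow GitHub's redirect to the raw file
  curl -fL -o "$dest_dir/$lang.traineddata" "$(model_url "$lang")"
)
```

For example, `fetch_model chi_sim /path/to/tessdata` downloads `chi_sim.traineddata` into that directory.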
Usage
```shell
tesseract image.png image -l chi_sim
```
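In that command, `image.png` is the input and `image` is the output base name, so the recognized text is written to `image.txt`. The sketch below wraps the same call and adds `--oem 1`, which tells tesseract 4 to use only the LSTM engine; passing `stdout` as the output base prints the text instead of writing a file. `ocr_chinese` is a hypothetical helper, not part of tesseract.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: OCR one image with the LSTM engine and the Simplified Chinese model.
ocr_chinese() (
  set -euo pipefail
  image="$1"
  # --oem 1: LSTM-only engine; "stdout": print the text rather than writing <base>.txt
  tesseract "$image" stdout -l chi_sim --oem 1
)
```

Usage: `ocr_chinese image.png > image.txt`.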
Well, it's still terrible on text set in the Song typeface. I would need to train the new model myself.
In the end, I gave up on tesseract. I found that dragging the image into OneNote, then Ctrl-clicking it and choosing Copy Text from Picture, gives higher accuracy.
Translated by gpt-3.5-turbo