How to build tesseract 4 beta on macOS

Posted on 2018-05-06, updated on 2023-09-29

brew info tesseract

tesseract: stable 3.05.01 (bottled), HEAD
OCR (Optical Character Recognition) engine

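The recognition results for Simplified Chinese are pretty awful.
I noticed that versions from 4.0.0 onward add a new LSTM-based neural network engine,
but on macOS it has to be built from source.
Thankfully, the project's README.md documents the build steps in detail.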

Install dependencies

brew install automake autoconf autoconf-archive libtool
brew install pkgconfig
brew install icu4c
brew install leptonica
brew install gcc

Compile

git clone https://github.com/tesseract-ocr/tesseract/
cd tesseract
./autogen.sh
./configure CC=gcc CXX=g++ CPPFLAGS=-I/usr/local/opt/icu4c/include LDFLAGS=-L/usr/local/opt/icu4c/lib
make -j
make install
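
The configure line above hardcodes /usr/local/opt/icu4c, which is where Homebrew keeps icu4c on Intel Macs; on Apple Silicon Homebrew lives under /opt/homebrew instead. A minimal sketch that asks Homebrew for the prefix and falls back to the path used in this post might look like the following. The function name icu_prefix and the variables are my own, not something from the tesseract README, and the script only prints the command so it can be checked before running.

```shell
#!/bin/sh
# Hypothetical helper: resolve the icu4c prefix.
# Uses `brew --prefix icu4c` when Homebrew is on PATH,
# otherwise falls back to the Intel-Mac path from this post.
icu_prefix() {
  if command -v brew >/dev/null 2>&1; then
    brew --prefix icu4c
  else
    echo "/usr/local/opt/icu4c"
  fi
}

ICU_PREFIX="$(icu_prefix)"

# Same flags as the configure line above, with the prefix substituted.
CONFIGURE_ARGS="CPPFLAGS=-I${ICU_PREFIX}/include LDFLAGS=-L${ICU_PREFIX}/lib"

# Print the command instead of running it, so it can be reviewed first.
echo "./configure CC=gcc CXX=g++ ${CONFIGURE_ARGS}"
```

Copy the printed line into the tesseract checkout once it looks right.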

From their best trained models, download the Simplified Chinese language file chi_sim.traineddata and put it under tesseract/4.0.0.1/tessdata/
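
To check that the language file landed where tesseract looks for it, `tesseract --list-langs` prints the languages it can find, one per line. A rough sketch of that check follows; the helper name has_lang is my own invention, and the script simply reports when tesseract is not on PATH.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the language code in $1 appears as an
# exact line in the `tesseract --list-langs` output read from stdin.
has_lang() {
  grep -qx "$1"
}

if command -v tesseract >/dev/null 2>&1; then
  # 2>&1 because some versions print the list on stderr.
  if tesseract --list-langs 2>&1 | has_lang chi_sim; then
    echo "chi_sim is installed"
  else
    echo "chi_sim is missing: check the tessdata directory"
  fi
else
  echo "tesseract not found on PATH"
fi
```

The exact-line match keeps a similarly named language such as chi_sim_vert from being mistaken for chi_sim.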

Usage

tesseract image.png image -l chi_sim
cat image.txt

Okay, it is still terrible with the Songti (宋体) typeface. I would need to train a new model myself.

In the end I gave up on tesseract. I found that dragging the image into OneNote and then using Ctrl + click -> Copy Text from Picture gives much higher accuracy. 😓