tesseract 4.0: installing from source on Ubuntu 16.04 x64 with leptonica 1.74.4 (including the ViewerDebugging tool), a record
2017-09-27 15:58
!!! The official tesseract site provides related videos !!!
https://www.youtube.com/watch?v=vOdnt2h1U8U
https://www.youtube.com/watch?v=WZLJucXZy-g
!!! Official build guide !!! I recommend reading it all the way through before doing anything.
https://github.com/tesseract-ocr/tesseract/wiki/Compiling#linux
1) Required steps
If they are not already installed, you need the following libraries (Ubuntu 16.04/14.04):
sudo apt-get install g++ # or clang++ (presumably)
sudo apt-get install autoconf automake libtool
sudo apt-get install autoconf-archive
sudo apt-get install pkg-config
sudo apt-get install libpng12-dev
sudo apt-get install libjpeg8-dev
sudo apt-get install libtiff5-dev
sudo apt-get install zlib1g-dev
If you plan to install the training tools, you also need the following libraries:
sudo apt-get install libicu-dev
sudo apt-get install libpango1.0-dev
sudo apt-get install libcairo2-dev
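Before moving on, it can be handy to see which of the packages above are still missing. The following is only a small sketch: the helper name missing_packages is made up for illustration, and it assumes a Debian/Ubuntu system where dpkg is available.

```shell
# Print every package from the argument list that dpkg does not report
# as fully installed. (missing_packages is a hypothetical helper name.)
missing_packages() {
    for pkg in "$@"; do
        # "install ok installed" is the status line dpkg prints for an installed package
        dpkg -s "$pkg" 2>/dev/null | grep -q '^Status: install ok installed' \
            || printf '%s\n' "$pkg"
    done
}

# Package names are the ones listed in this note.
missing_packages g++ autoconf automake libtool autoconf-archive pkg-config \
    libpng12-dev libjpeg8-dev libtiff5-dev zlib1g-dev
```

Anything it prints can then be installed with sudo apt-get install as shown above; no output means nothing from the list is missing.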
2) Build leptonica (there are two ways: the GitHub source or a release tarball; I used the GitHub source)
sudo apt install git
git clone https://github.com/DanBloomberg/leptonica
cd leptonica
autoreconf -vi
./autobuild
./configure
make
sudo make install
3) Install tesseract
cd
git clone --depth 1 https://github.com/tesseract-ocr/tesseract.git
cd tesseract
./autogen.sh
./configure --enable-debug
LDFLAGS="-L/usr/local/lib" CFLAGS="-I/usr/local/include" make
sudo make install
sudo ldconfig
tesseract -v
Test: the output should look like the figure below.
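Since the figure is not reproduced here, one way to check the result from the command line is to read the leptonica version out of the tesseract -v output. This is a rough sketch, assuming the output contains a token of the form leptonica-X.Y.Z and that GNU sort (with -V) is available; lept_version and version_ge are hypothetical helper names, and 1.74.4 is simply the version used in this note.

```shell
# Extract the leptonica version from `tesseract -v`-style text on stdin.
lept_version() {
    grep -o 'leptonica-[0-9.]*' | head -n 1 | cut -d- -f2
}

# Succeed when version $1 is greater than or equal to version $2 (uses GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Only run the check when tesseract is actually on the PATH.
if command -v tesseract >/dev/null 2>&1; then
    found=$(tesseract -v 2>&1 | lept_version)
    if [ -z "$found" ]; then
        echo "could not find a leptonica version in the tesseract -v output"
    elif version_ge "$found" 1.74.4; then
        echo "leptonica $found: at least the 1.74.4 used in this note"
    else
        echo "leptonica $found: older than 1.74.4, possibly another copy is being picked up"
    fi
fi
```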
4) Install the training tools
make
make training
sudo make training-install
5) Install the debugging tool
Download the two official jar files and copy piccolo2d-core-3.0.jar and piccolo2d-extras-3.0.jar into the tesseract/java folder. Download link: https://github.com/tesseract-ocr/tesseract/wiki/ViewerDebugging
sudo apt-get install default-jre
sudo apt-get install default-jdk
cd tesseract
cd java
make ScrollView.jar
export SCROLLVIEW_PATH=$PWD
In /tesseract/api/tesseractmain.cpp, find the following code
and insert the following code:
api.SetVariable("tessedit_dump_pageseg_images", "true");
api.SetVariable("textord_show_blobs", "true");
api.SetVariable("textord_show_boxes", "true");
api.SetVariable("textord_tabfind_show_blocks", "true");
api.SetVariable("textord_tabfind_show_reject_blobs", "true");
api.SetVariable("textord_tabfind_show_initial_partitions", "true");
api.SetVariable("textord_tabfind_show_partitions", "1");
api.SetVariable("textord_tabfind_show_initialtabs", "true");
api.SetVariable("textord_tabfind_show_finaltabs", "true");
api.SetVariable("textord_tabfind_show_images", "true");
so that it becomes:
Save.
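An aside: tesseract also accepts config files on the command line, made up of lines in the form "name value", so the same variables could probably be supplied that way instead of being patched into tesseractmain.cpp. I have not verified this against this exact build, so treat it as a sketch. The function name write_debug_config and the file name viewer_debug.cfg are made up for illustration.

```shell
# Write the debug variables from this note into a config file,
# one "name value" pair per line. (write_debug_config is a hypothetical helper.)
write_debug_config() {
    cat > "$1" <<'EOF'
tessedit_dump_pageseg_images true
textord_show_blobs true
textord_show_boxes true
textord_tabfind_show_blocks true
textord_tabfind_show_reject_blobs true
textord_tabfind_show_initial_partitions true
textord_tabfind_show_partitions 1
textord_tabfind_show_initialtabs true
textord_tabfind_show_finaltabs true
textord_tabfind_show_images true
EOF
}

# Hypothetical location for the generated file.
cfg="${TMPDIR:-/tmp}/viewer_debug.cfg"
write_debug_config "$cfg"
echo "wrote $cfg"

# Assumed invocation, with the config file passed after the output base:
#   tesseract phototest.tif out "$cfg"
```

The upside of this route would be that the debug windows can be switched off again without rebuilding; the source patch described above is what was actually tested in this note.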
cd tesseract
make
sudo make install
sudo ldconfig
6) Download the language data (I chose tessdata)
git clone https://github.com/tesseract-ocr/tessdata
export TESSDATA_PREFIX=/home/XX/tessdata
7) Test
tesseract rorate.png out -l chi_sim+eng
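If this command complains that it cannot load a language, a quick check is whether the .traineddata file for each requested language is really where tesseract will look. The sketch below assumes the files sit directly in the directory passed in. Depending on the tesseract version, TESSDATA_PREFIX may instead need to name the parent of the tessdata directory, so adjust the path accordingly. missing_langs is a hypothetical helper name.

```shell
# Print each language from a "chi_sim+eng"-style list whose .traineddata
# file is absent from directory $1. (missing_langs is a hypothetical helper.)
missing_langs() {
    dir=$1
    # Split the language list on '+'
    for lang in $(printf '%s' "$2" | tr '+' ' '); do
        [ -f "$dir/$lang.traineddata" ] || printf '%s\n' "$lang"
    done
}

# Check the languages used in the test command above.
missing_langs "${TESSDATA_PREFIX:-.}" "chi_sim+eng"
```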
8) Test 2
tesseract /home/joy/tesseract/testing/phototest.tif
Close one window and the next one pops up!
!!!! Finally, the picture from the official site appears!
Text from the official wiki:
The words found in the image are represented as blue rectangles. There are 3 menus:
MODES sets the mode for what a left-click or selection does.
DISPLAY changes the requested displayed content of the window. (Not immediately)
OTHER provides a bunch of miscellaneous global actions.
If you right-click in the Editor Image window, you can change the values of any of the "new" config variables on the fly. Depending on what you want to change though, a lot of the useful variables are in the old style and cannot be changed this way. Some day, someone will update all the old style variables to new ones.
NOTE that the menus seem rather strange. This is because the tool was originally designed to provide the capability to create ground truthed files, in excruciating detail with labels on the characters, information on the connected components making up each character etc. Most of this functionality is redundant and hasn't been used in over 10 years. Some of the functionality advertised can easily crash the program, but the functionality documented here should work...
To show the characters, deselect DISPLAY/Bounding Boxes, select DISPLAY/Polygonal Approx and then select OTHER/Uniform display.
To zoom in, position the cursor over a word, and roll the mouse scroll wheel away from you 2 or 3 clicks. Each click doubles the size. To zoom out roll the mouse wheel towards you. If you haven't got a mouse wheel ... you may be out of luck. The Java code needs some work in this area.
Now select MODES/Recog words and click in a word. If you choose the word 'code' (the 2nd word on the 2nd line) then you should get something like this:
References
a) Building leptonica
!!! Do not install libleptonica-dev with apt-get, since you install leptonica manually later.
Using the leptonica GitHub source:
https://www.panhaoo.cn/posts/1750844891/
https://github.com/tesseract-ocr/tesseract/issues/1043
https://github.com/DanBloomberg/leptonica/issues/197
https://github.com/tesseract-ocr/tesseract/issues/1000
Using the leptonica tarball:
http://www.cnblogs.com/jkmiao/p/6417167.html
http://blog.csdn.net/u012384044/article/details/77979803
http://www.cnblogs.com/gavanwanggw/p/7219503.html
http://jybaek.tistory.com/620
b)ViewerDebugging
https://lengerrong.blogspot.jp/2017/03/viewerdebugging-tesseract-ocr-on-ubuntu.html (this one works; the error-prone part is step 2: make sure the two jar files are copied correctly, I probably got this wrong on my first attempt)
https://github.com/tesseract-ocr/tesseract/wiki/ViewerDebugging
http://blog.csdn.net/yazi1297/article/details/54706390
http://blog.csdn.net/tfygg/article/details/63262396
3) Other references (older tesseract versions)
http://blog.csdn.net/yimingsilence/article/details/51276138
http://blog.csdn.net/yimingsilence/article/details/51353772
http://blog.csdn.net/tuling_research/article/details/53543673
http://www.letout.cc/archives/macosx-compling-and-install-tesseract.html
http://blog.csdn.net/u012476249/article/details/53423193
https://segmentfault.com/a/1190000007267921
Building with Visual Studio:
https://www.polarxiong.com/archives/Tesseract-3-05%E5%8F%8A%E4%B9%8B%E5%90%8E%E7%89%88%E6%9C%AC%E7%BC%96%E8%AF%91%E7%94%9F%E6%88%90%E5%8A%A8%E6%80%81%E9%93%BE%E6%8E%A5%E5%BA%93DLL.html
https://github.com/DanBloomberg/leptonica/issues/237
https://groups.google.com/forum/#!topic/tesseract-ocr/r6bL_KLlcyE
http://jhoci.tistory.com/1
http://blog.csdn.net/zzb4702/article/details/51760678
http://blog.csdn.net/naidoudou/article/details/70225849
Wrapping tesseract in applications:
https://www.polarxiong.com/archives/python-pytesser-tesseract.html
https://www.polarxiong.com/archives/python-tesseract-verification-code.html
http://dmlcoding.com/2017/TesseractBasic/
http://www.codepalace.org/2017/08/05/Tesseract-OCR-with-Python/
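Please point out any mistakes, thanks! You are welcome to join the Tesseract OCR discussion group 389402579.
I had never managed to get ViewerDebugging working before, and this time it finally succeeded, so I am writing it down!!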