Getting to know tesseract-ocr
2016-07-25 20:28
Installer download URL
http://tesseract-ocr.googlecode.com/files/tesseract-ocr-setup-3.01-1.exe
README
For the latest online version of the README.md see:
https://github.com/tesseract-ocr/tesseract/blob/master/README.md
About
This package contains an OCR engine, libtesseract, and a command line program, tesseract.
The lead developer is Ray Smith. The maintainer is Zdenko Podobny. For a list of contributors see AUTHORS and GitHub's log of contributors.
Tesseract has unicode (UTF-8) support, and can recognize more than 100 languages "out of the box". It can be trained to recognize other languages. See Tesseract Training for more information.
Tesseract supports various output formats: plain-text, hocr(html), pdf.
This project does not include a GUI application. If you need one, please see the 3rdParty wiki page.
Note that in many cases, to get better OCR results, you will need to improve the quality of the image you give Tesseract.
The latest stable version is 3.04.01, released in February 2016.
Brief history
Tesseract was originally developed at Hewlett-Packard Laboratories Bristol and at Hewlett-Packard Co, Greeley Colorado between 1985 and 1994, with some more changes made in 1996 to port to Windows, and some C++izing in 1998. In 2005 Tesseract was open sourced by HP. Since 2006 it has been developed by Google.
Release Notes
For developers
Developers can use the libtesseract C or C++ API to build their own applications. If you need bindings to libtesseract for other programming languages, please see the wrapper section on the AddOns wiki page.
Documentation of Tesseract generated from source code by doxygen can be found on tesseract-ocr.github.io.
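As a rough illustration of reaching the libtesseract C API from another language, the sketch below uses Python's standard ctypes module to ask the library for its version string. It is not one of the wrappers listed on the AddOns page. It assumes the shared library can be located under the name "tesseract" and that it exports the C API function TessVersion; if either assumption fails on your system, the function simply returns None.

```python
import ctypes
import ctypes.util


def libtesseract_version():
    """Return the libtesseract version string, or None if the library is unavailable."""
    # Assumption: the shared library is discoverable under the name "tesseract".
    path = ctypes.util.find_library("tesseract")
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
        tess_version = lib.TessVersion  # C API entry point returning const char*
    except (OSError, AttributeError):
        return None
    tess_version.restype = ctypes.c_char_p
    tess_version.argtypes = []
    return tess_version().decode("utf-8")


if __name__ == "__main__":
    print(libtesseract_version())
```

Anything beyond a version check (initializing the engine, passing an image, reading back text) needs more of the C API and is better served by the doxygen documentation or an existing wrapper.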
License
The code in this repository is licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
NOTE: This software depends on other packages that may be licensed under different open source licenses.
Installing Tesseract
You can either install Tesseract via a pre-built binary package or build it from source.
Running Tesseract
Basic command line usage:
tesseract imagename outputbase [-l lang] [-psm pagesegmode] [configfiles...]
For more information about the various command line options use tesseract --help or man tesseract.
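The usage line above can be assembled programmatically. The following is a minimal sketch, not part of Tesseract itself: the helper names build_tesseract_command and run_tesseract and the file names in the usage note are hypothetical. It follows the -psm spelling given in this README, so check tesseract --help for the spelling your installed version expects.

```python
import shutil
import subprocess


def build_tesseract_command(imagename, outputbase, lang=None, psm=None, configfiles=()):
    """Build the argument list for: tesseract imagename outputbase [-l lang] [-psm N] [configfiles...]."""
    cmd = ["tesseract", imagename, outputbase]
    if lang is not None:
        cmd += ["-l", lang]
    if psm is not None:
        # Flag spelled as in the README quoted above; other versions may differ.
        cmd += ["-psm", str(psm)]
    cmd += list(configfiles)
    return cmd


def run_tesseract(imagename, outputbase, **options):
    """Run tesseract if it is on PATH; return its exit code, or None if it is not installed."""
    if shutil.which("tesseract") is None:
        return None
    cmd = build_tesseract_command(imagename, outputbase, **options)
    return subprocess.run(cmd).returncode
```

For example, build_tesseract_command("page.png", "page", lang="eng", configfiles=["hocr"]) yields the arguments for tesseract page.png page -l eng hocr, where hocr is assumed to be the config file that selects hOCR output in your installation.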
Support
Mailing-lists:
* tesseract-ocr - For tesseract users.
* tesseract-dev - For tesseract developers.
Please read the FAQ before asking any question in the mailing-list or reporting an issue.