A failure case when training a tesseract-ocr language library
2015-05-07 14:12
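I have read several other blog posts introducing tesseract-ocr, and the methods they describe for training a language library are all similar. However, small mistakes along the way kept me from getting the expected result. One example is the font properties file, which is saved with a .txt extension; for the details of setting it up, see the article at http://blog.csdn.net/firehood_/article/details/8433077. I followed its steps, and only the last one, "7. Generate the language file", produced an error. Its batch file contains the following:
rem Before running this batch file, create the font_properties file in this directory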
echo Run Tesseract for Training..
tesseract.exe num.font.exp0.tif num.font.exp0 nobatch box.train
echo Compute the Character Set..
unicharset_extractor.exe num.font.exp0.box
mftraining -F font_properties -U unicharset -O num.unicharset num.font.exp0.tr
echo Clustering..
cntraining.exe num.font.exp0.tr
echo Rename Files..
rename normproto num.normproto
rename inttemp num.inttemp
rename pffmtable num.pffmtable
rename shapetable num.shapetable
echo Create Tessdata..
combine_tessdata.exe num.
But this is what I got:
E:\tesseract\tessdata>test.bat
E:\tesseract\tessdata>rem Before running this batch file, create the font_properties file in this directory
E:\tesseract\tessdata>echo Run Tesseract for Training..
Run Tesseract for Training..
E:\tesseract\tessdata>tesseract.exe num.font.exp0.tif num.font.exp0 nobatch box.train
Tesseract Open Source OCR Engine v3.02 with Leptonica
Page 1 of 5
APPLY_BOXES:
Boxes read from boxfile: 10
Found 10 good blobs.
TRAINING ... Font name = font
Generated training data for 1 words
Page 2 of 5
APPLY_BOXES:
Boxes read from boxfile: 10
Found 10 good blobs.
Generated training data for 4 words
Page 3 of 5
APPLY_BOXES:
Boxes read from boxfile: 10
Found 10 good blobs.
Generated training data for 1 words
Page 4 of 5
APPLY_BOXES:
Boxes read from boxfile: 10
Found 10 good blobs.
Generated training data for 1 words
Page 5 of 5
APPLY_BOXES:
Boxes read from boxfile: 10
Found 10 good blobs.
Generated training data for 1 words
E:\tesseract\tessdata>echo Compute the Character Set..
Compute the Character Set..
E:\tesseract\tessdata>unicharset_extractor.exe num.font.exp0.box
Extracting unicharset from num.font.exp0.box
Wrote unicharset file ./unicharset.
E:\tesseract\tessdata>mftraining -F font_properties -U unicharset -O num.unicharset num.font.exp0.tr
Warning: No shape table file present: shapetable
Failed to load font_properties from font_properties
E:\tesseract\tessdata>echo Clustering..
Clustering..
E:\tesseract\tessdata>cntraining.exe num.font.exp0.tr
Reading num.font.exp0.tr ...
Clustering ...
Writing normproto ...
E:\tesseract\tessdata>echo Rename Files..
Rename Files..
E:\tesseract\tessdata>rename normproto num.normproto
E:\tesseract\tessdata>rename inttemp num.inttemp
The system cannot find the file specified.
E:\tesseract\tessdata>rename pffmtable num.pffmtable
The system cannot find the file specified.
E:\tesseract\tessdata>rename shapetable num.shapetable
The system cannot find the file specified.
E:\tesseract\tessdata>echo Create Tessdata..
Create Tessdata..
E:\tesseract\tessdata>combine_tessdata.exe num.
Combining tessdata files
Error opening unicharset file
Error combining tessdata files into num.traineddata
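This was very frustrating. After repeated attempts I finally found the problem. The font properties file on disk is named font_properties.txt, but the batch file passes the name without the extension, so mftraining cannot load it ("Failed to load font_properties from font_properties"). Because mftraining stops there, inttemp, pffmtable, shapetable and num.unicharset are never written, which is why the rename commands and combine_tessdata fail afterwards. Changing "mftraining -F font_properties -U unicharset -O num.unicharset num.font.exp0.tr" in the batch file to "mftraining -F font_properties.txt -U unicharset -O num.unicharset num.font.exp0.tr" solved the problem, and the training produced the result I wanted.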
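For reference, the batch file with that one-argument fix applied might look like the sketch below. The two "if not exist" guards and the FONT_PROPS variable are my own additions for illustration and are not part of the referenced article; they only make the script stop at the first missing file instead of running on to the rename and combine steps. The sketch assumes the font properties file really is saved as font_properties.txt in the working directory.

```batch
@echo off
rem Sketch of the training batch file with the corrected -F argument.
rem FONT_PROPS is a hypothetical variable name; set it to the exact file name on disk.
set "FONT_PROPS=font_properties.txt"

rem Stop early if the font properties file is missing under this exact name.
if not exist "%FONT_PROPS%" (
    echo %FONT_PROPS% not found in the current directory, stopping.
    exit /b 1
)

echo Run Tesseract for Training..
tesseract.exe num.font.exp0.tif num.font.exp0 nobatch box.train

echo Compute the Character Set..
unicharset_extractor.exe num.font.exp0.box
mftraining -F "%FONT_PROPS%" -U unicharset -O num.unicharset num.font.exp0.tr

rem In the failing run above, inttemp was never written, so its absence is used
rem here as the sign that mftraining did not complete.
if not exist inttemp (
    echo mftraining did not produce inttemp, stopping.
    exit /b 1
)

echo Clustering..
cntraining.exe num.font.exp0.tr

echo Rename Files..
rename normproto num.normproto
rename inttemp num.inttemp
rename pffmtable num.pffmtable
rename shapetable num.shapetable

echo Create Tessdata..
combine_tessdata.exe num.
```

An equivalent fix is to leave the batch file alone and rename font_properties.txt to font_properties, so that the name passed to -F matches the file on disk. In either case, each line of the file names a font followed by five 0/1 flags (italic, bold, fixed, serif, fraktur); for the font name "font" reported in the log above, a line such as "font 0 0 0 0 0" would be typical, though the right flags depend on the typeface.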