Mahout vector creation problem: There are too many documents that do not have a term vector
2012-12-25 14:31
bin/mahout lucene.vector --dir /home/hadoop/index --output /user/hadoop/out/part-out.vec --field title --idField id --dictOut /user/hadoop/out/dict.out
--maxPercentErrorDocs 0.1
Exception in thread "main" java.lang.IllegalStateException: There are too many documents that do not have a term vector for ***
at org.apache.mahout.utils.vectors.lucene.LuceneIterator.computeNext(LuceneIterator.java:118)
at org.apache.mahout.utils.vectors.lucene.LuceneIterator.computeNext(LuceneIterator.java:41)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:141)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:136)
at org.apache.mahout.utils.vectors.io.SequenceFileVectorWriter.write(SequenceFileVectorWriter.java:44)
at org.apache.mahout.utils.vectors.lucene.Driver.dumpVectors(Driver.java:109)
at org.apache.mahout.utils.vectors.lucene.Driver.main(Driver.java:250)
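Cause 1
*** is a field that does not exist in the index; change --field to the correct field name.
Cause 2
*** is a field whose termVectors setting is false.
Fix: when building the index, set termVectors to true for that field.
Cause 3: the number of error documents exceeds the allowed percentage.
You can add the parameter --maxPercentErrorDocs 0.1,
which allows up to 10% of the documents to be error documents.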
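To make the percentage in cause 3 concrete, here is a minimal Java sketch of how such an error budget could behave. It is a simplified model and not Mahout's actual source: the class name, the method names, and the exact comparison (whether the limit itself is still tolerated) are assumptions made for illustration. Only the exception message and the meaning of the 0.1 value come from the post above.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Simplified model of an error budget for documents without a term vector.
 * NOT Mahout's implementation: all names and the exact threshold comparison
 * are assumptions for illustration.
 */
final class TermVectorErrorBudget {

    /** Number of documents allowed to lack a term vector (fraction of all documents, rounded down). */
    static int maxErrorDocs(int numDocs, double maxPercentErrorDocs) {
        return (int) (maxPercentErrorDocs * numDocs);
    }

    /**
     * Walks over per-document flags (true = the document has a term vector for the field)
     * and returns how many documents were skipped. Throws IllegalStateException as soon
     * as the number of skipped documents exceeds the budget.
     */
    static int countSkipped(List<Boolean> hasTermVector, double maxPercentErrorDocs, String field) {
        int budget = maxErrorDocs(hasTermVector.size(), maxPercentErrorDocs);
        int skipped = 0;
        for (boolean present : hasTermVector) {
            if (present) {
                continue; // this document can be turned into a vector
            }
            skipped++;
            if (skipped > budget) {
                throw new IllegalStateException(
                        "There are too many documents that do not have a term vector for " + field);
            }
        }
        return skipped;
    }

    public static void main(String[] args) {
        // 10 documents, one of them without a term vector: within a 10% budget.
        List<Boolean> docs = Arrays.asList(
                true, true, true, false, true, true, true, true, true, true);
        System.out.println("skipped: " + countSkipped(docs, 0.1, "title"));
    }
}
```

In this model a budget of 0.1 over 10 documents tolerates one document without a term vector, and a second one triggers the exception. If the field has no term vectors at all (cause 2), or does not exist (cause 1), every document counts as an error, so raising the percentage only hides the real problem.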