
Problems encountered when building an index with Terrier, and their solutions

2011-08-03 08:51
Issue 1: java.lang.OutOfMemoryError: Java heap space

The full error is as follows:

java.lang.OutOfMemoryError: Java heap space

java.lang.OutOfMemoryError: Java heap space

at gnu.trove.TObjectIntHashMap.rehash(TObjectIntHashMap.java:170)

at gnu.trove.THash.postInsertHook(THash.java:359)

at gnu.trove.TObjectIntHashMap.put(TObjectIntHashMap.java:155)

at org.terrier.utility.TermCodes.getCode(TermCodes.java:100)

at org.terrier.structures.indexing.DocumentPostingList.getTermId(DocumentPostingList.java:133)

at org.terrier.structures.indexing.DocumentPostingList$2.execute(DocumentPostingList.java:168)

at org.terrier.structures.indexing.DocumentPostingList$2.execute(DocumentPostingList.java:166)

at gnu.trove.TObjectIntHashMap.forEachEntry(TObjectIntHashMap.java:426)

at org.terrier.structures.indexing.DocumentPostingList.getPostings2(DocumentPostingList.java:165)

at org.terrier.indexing.BasicIndexer.indexDocument(BasicIndexer.java:368)

at org.terrier.indexing.BasicIndexer.createDirectIndex(BasicIndexer.java:261)

at org.terrier.indexing.Indexer.index(Indexer.java:344)

at org.terrier.applications.TRECIndexing.index(TRECIndexing.java:123)

at org.terrier.applications.TrecTerrier.run(TrecTerrier.java:390)

at org.terrier.applications.TrecTerrier.applyOptions(TrecTerrier.java:573)

at org.terrier.applications.TrecTerrier.main(TrecTerrier.java:237)

21877.18user 916.34system 6:01:37elapsed 105%CPU (0avgtext+0avgdata 0maxresident)k

45946520inputs+21416016outputs (1major+1978833minor)pagefaults 0swaps

Solution:

I increased the maximum Java heap space to 2 GB by setting TERRIER_HEAP_MEM to 2048M in bin/terrier-env.sh.

With that change, indexing seems to run smoothly.
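
To confirm that a larger heap limit actually reaches the JVM, a small check like the following can help. This is only a sketch: the class name HeapCheck is made up for illustration, and it assumes that TERRIER_HEAP_MEM ends up as the JVM's maximum heap size (the -Xmx option), which the post does not state explicitly.

```java
public class HeapCheck {
    /** Converts a byte count to whole mebibytes (rounding down). */
    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // maxMemory() reports the most heap this JVM will try to use;
        // it can be slightly lower than the value passed to -Xmx.
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap available to this JVM: "
                + toMegabytes(maxBytes) + " MB");
    }
}
```

Running it as `java -Xmx2048M HeapCheck` should report a value close to 2048 MB; a much smaller number would suggest the setting is not being picked up.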

Issue 2: (can be summarized as "Key is not unique")

ERROR - This index (Index(/users/ishanic/terrier-3.0/var/index,data_1)) doesnt have an index structure called lexicon-keyfactory: property index.lexicon-keyfactory.class not found

ERROR - Valid structures are: [document-inputstream, meta-inputstream, document-factory, meta, document]

The full error is as follows:

INFO - Collection #0 took 183780 seconds to build the runs for 20000000 documents

ERROR - Problem finishing index

java.io.IOException: Key is not unique: 38131,3514

at org.terrier.structures.collections.FSOrderedMapFile$MultiFSOMapWriter.mergeTwo(FSOrderedMapFile.java:908)

at org.terrier.structures.collections.FSOrderedMapFile$MultiFSOMapWriter.close(FSOrderedMapFile.java:861)

at org.terrier.structures.indexing.CompressingMetaIndexBuilder.close(CompressingMetaIndexBuilder.java:259)

at org.terrier.indexing.BasicSinglePassIndexer.createInvertedIndex(BasicSinglePassIndexer.java:274)

at org.terrier.indexing.BasicSinglePassIndexer.createDirectIndex(BasicSinglePassIndexer.java:147)

at org.terrier.indexing.Indexer.index(Indexer.java:344)

at org.terrier.applications.TRECIndexing.createSinglePass(TRECIndexing.java:221)

at org.terrier.applications.TrecTerrier.run(TrecTerrier.java:384)

at org.terrier.applications.TrecTerrier.applyOptions(TrecTerrier.java:573)

at org.terrier.applications.TrecTerrier.main(TrecTerrier.java:237)

INFO - Optimising structure lexicon

ERROR - This index (Index(/users/ishanic/terrier-3.0/var/index,data_1)) doesnt have an index structure called lexicon-keyfactory: property index.lexicon-keyfactory.class not found

ERROR - Valid structures are: [document-inputstream, meta-inputstream, document-factory, meta, document]

ERROR - This index (Index(/users/ishanic/terrier-3.0/var/index,data_1)) doesnt have an index structure called lexicon-valuefactory: property index.lexicon-valuefactory.class not found

ERROR - Valid structures are: [document-inputstream, meta-inputstream, document-factory, meta, document]

A problem occurred: java.lang.NullPointerException

java.lang.NullPointerException

at org.terrier.structures.collections.FSOrderedMapFile.numberOfEntries(FSOrderedMapFile.java:490)

at org.terrier.structures.FSOMapFileLexicon.optimise(FSOMapFileLexicon.java:389)

at org.terrier.structures.indexing.LexiconBuilder.optimise(LexiconBuilder.java:790)

at org.terrier.indexing.BasicIndexer.finishedInvertedIndexBuild(BasicIndexer.java:438)

at org.terrier.indexing.BasicSinglePassIndexer.createInvertedIndex(BasicSinglePassIndexer.java:292)

at org.terrier.indexing.BasicSinglePassIndexer.createDirectIndex(BasicSinglePassIndexer.java:147)

at org.terrier.indexing.Indexer.index(Indexer.java:344)

at org.terrier.applications.TRECIndexing.createSinglePass(TRECIndexing.java:221)

at org.terrier.applications.TrecTerrier.run(TrecTerrier.java:384)

at org.terrier.applications.TrecTerrier.applyOptions(TrecTerrier.java:573)

at org.terrier.applications.TrecTerrier.main(TrecTerrier.java:237)

Solution:

These errors are caused by building the meta index. The suggested solutions were as follows:

I gave this issue some thought.

* My initial idea was that your indexer.meta.forward.keylens was too small, but this is not the case.

* The error is occurring when building the reverse lookup table (docno -> docid). Will you need this functionality? If not, then you can disable it using indexer.meta.reverse.keys= during indexing.

* Otherwise, can you alter the exception being raised in FSOrderedMapFile to print the value of the key that is causing the collision?

I went with the second option, and the problem was solved.
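
If the reverse lookup (docno -> docid) is needed after all, one thing worth checking is whether the collection contains repeated docnos, since that is one possible cause of a "Key is not unique" error. The sketch below is not part of Terrier: the class name DuplicateDocnoCheck and the input format (a plain text file with one docno per line, extracted from the collection beforehand) are assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class DuplicateDocnoCheck {
    /** Returns every docno that occurs more than once, in sorted order. */
    static Set<String> findDuplicates(Iterable<String> docnos) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new TreeSet<>();
        for (String docno : docnos) {
            // add() returns false when the value was already present
            if (!seen.add(docno)) {
                duplicates.add(docno);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: hypothetical file containing one docno per line
        List<String> docnos = Files.readAllLines(Paths.get(args[0]));
        for (String duplicate : findDuplicates(docnos)) {
            System.out.println("Duplicate docno: " + duplicate);
        }
    }
}
```

All docnos are held in memory, so for a collection of 20 million documents this check itself may need a generous heap setting.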