走进lucene - 创建索引、检索
2014-05-23 10:39
363 查看
看过王老师的信息检索导论,一直想学lucene,看看从实现的角度,搜索引擎是如何工作的。
正好工作中用到一点lucene,尝试使用,为后续分析做好准备。
截止到现在,lucene已经更新到4.8了,适配Java8,并做了很多封装。
自己会用3.6来进行学习。因为阅读lucene源码还是4版本一下的更为合适,而且一些简单查询,3.6也足够用了。
1 构建索引
2 检索
最开始的时候使用QueryParser生成Query,发现不需要分析的字段,检索不出结果来。改为有Term生成Query就OK了。
正好工作中用到一点lucene,尝试使用,为后续分析做好准备。
截止到现在,lucene已经更新到4.8了,适配Java8,并做了很多封装。
自己会用3.6来进行学习。因为阅读lucene源码还是4版本一下的更为合适,而且一些简单查询,3.6也足够用了。
1 构建索引
public boolean indexBuild(String indexPath, String inputFile) { boolean suc = true; try { Directory dir = FSDirectory.open(new File(indexPath)); Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_36); IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_36, analyzer); boolean create = true; if (create) { // Create a new index in the directory, // removing any previously indexed documents: iwc.setOpenMode(OpenMode.CREATE); } else { // Add new documents to an existing index: iwc.setOpenMode(OpenMode.CREATE_OR_APPEND); } // Optional: for better indexing performance, if you // are indexing many documents, increase the RAM // buffer. But if you do this, increase the max heap // size to the JVM (eg add -Xmx512m or -Xmx1g): // iwc.setRAMBufferSizeMB(256.0); IndexWriter writer = new IndexWriter(dir, iwc); FileInputStream fis = new FileInputStream(inputFile); // make a new, empty document Document doc = new Document(); // Add the path of the file as a field named "path". Use a // field that is indexed (i.e. searchable), but don't tokenize // the field into separate words and don't index term frequency // or positional information: BufferedReader reader = new BufferedReader(new InputStreamReader( fis, "UTF-8")); String line = null; while ((line = reader.readLine()) != null) { int beginPos = line.indexOf("\t"); String hid = line.substring(0, beginPos); String tags = line.substring(beginPos+1); Field hidField = new Field("hid", hid, Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS); hidField.setIndexOptions(IndexOptions.DOCS_ONLY); doc.add(hidField); Field tagsField = new Field("tags", tags, Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS); tagsField.setIndexOptions(IndexOptions.DOCS_ONLY); doc.add(tagsField); if (writer.getConfig().getOpenMode() == OpenMode.CREATE) { // New index, so we just add the document (no old document // can be there): System.out.println("adding " + inputFile); writer.addDocument(doc); } else { // Existing index (an old copy of this document may have // been indexed) so // we use updateDocument instead to replace the old one // matching the exact // path, if present: writer.updateDocument(new Term("path", inputFile), doc); } } writer.close(); } catch (IOException e) { e.printStackTrace(); } return suc; }
2 检索
最开始的时候使用QueryParser生成Query,发现不需要分析的字段,检索不出结果来。改为有Term生成Query就OK了。
private IndexReader reader = null; private Analyzer analyzer = null; private IndexSearcher searcher = null; public boolean init(String indexPath) { boolean suc = true; try { reader = IndexReader.open(FSDirectory.open(new File(indexPath))); searcher = new IndexSearcher(reader); analyzer = new StandardAnalyzer(Version.LUCENE_36); } catch (CorruptIndexException e) { e.printStackTrace(); } catch (IOException e) { e.printStackTrace(); } return suc; } public String queryTags(String hid) { String result = null; // QueryParser parser = new QueryParser(Version.LUCENE_36, "hid", analyzer); try { Term term = new Term("hid", hid); Query query = new TermQuery(term); int hitsPerPage = 1; // Collect enough docs to show 5 pages TopDocs results = searcher.search(query, 5 * hitsPerPage); ScoreDoc[] hits = results.scoreDocs; int numTotalHits = results.totalHits; System.out.println(numTotalHits + " total matching documents"); String str; int start = 0; // end pos for return docs int end = Math.min(numTotalHits, hitsPerPage); boolean raw = false; for (int i = start; i < end; i++) { if (raw) { // output raw format String log = "doc=" + hits[i].doc + " score=" + hits[i].score; System.out.println(log); continue; } Document doc = searcher.doc(hits[i].doc); String tags = doc.get("tags"); if (tags != null) { //just for work result = tags; break; //System.out.println((i + 1) + ". " + tags); } else { String log = (i + 1) + " No tags for " + hid; System.out.println(log); } } } catch (CorruptIndexException e) { e.printStackTrace(); } catch (IOException e) { e.printStackTrace(); } return result; } public void close() { try { searcher.close(); reader.close(); } catch (IOException e) { e.printStackTrace(); } }
相关文章推荐
- Lucene使用(一)简单索引的创建和检索
- Lucene全文检索技术之如何创建索引
- Lucene.Net 创建索引和检索
- Lucene.Net 创建索引和检索
- Lucene 全文检索的基本原理-如何创建索引
- 全文检索工具lucene 之索引创建方法
- 全文检索Lucene入门之创建索引及简单搜索
- Lucene全文检索 之创建索引、增删改查(纯代码)
- Lucene创建索引/检索信息
- lucene5.5创建索引和检索
- Lucene创建索引和索引的基本检索
- lucene学习笔记之索引创建、内容检索、分类器实现
- Lucene.Net 创建索引和检索
- Apache Lucene(全文检索引擎)—创建索引
- Lucene的配置及创建索引全文检索
- lucene简介 + 创建索引和搜索初步 ——转自robert
- 使用org.apache.lucene创建和查询索引核心代码详解
- Lucene搜索已经创建好的索引
- lucene_43版本创建索引demo
- Lucene 4.10 + Mysql 5.5 创建数据库表索引(Lucene 学习序列1)