Notes on upgrading from Lucene 3.0 to Lucene 4.x

2015-04-02 17:44
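  Recently I had to upgrade the Lucene version used by our project. The project was built on Lucene 3.0, a very old release. In the old version we mainly relied on a few Lucene features:

  1. Searching across multiple index directories.

  2. Building a RAMDirectory to hold the index in memory and speed up searches.

  3. A custom Lucene analyzer.

  4. A modified version of Lucene's default scoring algorithm.

  Below, the code is compared before and after the migration: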

1. Searching multiple index directories

    3.0: building a searcher over multiple index directories:

// Initialize the nationwide index
private boolean InitGlobal(String strRootPath) {
    try {
        IndexSearcher[] searchers = new IndexSearcher[2];
        MultiSearcher globalSearcher = null;

        // Despite its name, this flag selects RAMDirectory (index copied into memory)
        if (Configution.IsMMap.equalsIgnoreCase("true")) {
            searchers[0] = new IndexSearcher(new RAMDirectory(FSDirectory
                    .open(new File(strRootPath + "/" + GLABOL_INDEX))));
            searchers[1] = new IndexSearcher(new RAMDirectory(FSDirectory
                    .open(new File(strRootPath + "/" + BUS_INDEX))));
            // searchers[2] = new IndexSearcher(new RAMDirectory(FSDirectory
            //         .open(new File(strRootPath + "/" + LU_INDEX))));
            globalSearcher = new MultiSearcher(searchers);
        } else {
            searchers[0] = new IndexSearcher(FSDirectory.open(new File(
                    strRootPath + "/" + GLABOL_INDEX)));
            searchers[1] = new IndexSearcher(FSDirectory.open(new File(
                    strRootPath + "/" + BUS_INDEX)));
            // searchers[2] = new IndexSearcher(FSDirectory.open(new File(
            //         strRootPath + "/" + LU_INDEX)));
            globalSearcher = new MultiSearcher(searchers);
        }
        System.out.println("finish Global");

        m_mapIndexName2Searcher.put("0", globalSearcher);
        m_mapAdmin2IndexName.put("0", "0");

        return true;
    } catch (Exception e) {
        e.printStackTrace();
        SearchLog.SearchLog.error("Failed to initialize the nationwide index");
        return false;
    }
}


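  OK, MultiSearcher was the way to search several indexes in older Lucene versions. In newer versions the MultiSearcher class itself has been removed, which cost me a lot of time. Lucene is known for churning through versions, and this suggests its API design is not great: the codebase uses few interfaces and relies mostly on concrete and abstract classes.

    4.x: building a searcher over multiple index directories: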

// Initialize the nationwide index
private boolean InitGlobal(String strRootPath) {
    try {
        IndexSearcher globalSearcher = null;

        // Despite its name, this flag selects RAMDirectory (index copied into memory)
        if (Configution.IsMMap.equalsIgnoreCase("true")) {
            IndexReader irGlobal = DirectoryReader.open(new RAMDirectory(FSDirectory
                    .open(new File(strRootPath + "/" + GLABOL_INDEX)), new IOContext()));

            IndexReader irBus = DirectoryReader.open(new RAMDirectory(FSDirectory
                    .open(new File(strRootPath + "/" + BUS_INDEX)), new IOContext()));

            MultiReader mr = new MultiReader(irGlobal, irBus);

            globalSearcher = new IndexSearcher(mr); // was: new MultiSearcher(searchers)
        } else {
            IndexReader irGlobal = DirectoryReader.open(FSDirectory
                    .open(new File(strRootPath + "/" + GLABOL_INDEX)));

            IndexReader irBus = DirectoryReader.open(FSDirectory
                    .open(new File(strRootPath + "/" + BUS_INDEX)));

            MultiReader mr = new MultiReader(irGlobal, irBus);
            globalSearcher = new IndexSearcher(mr); // was: new MultiSearcher(searchers)
        }
        System.out.println("finish Global");

        m_mapIndexName2Searcher.put("0", globalSearcher);
        m_mapAdmin2IndexName.put("0", "0");

        return true;
    } catch (Exception e) {
        e.printStackTrace();
        SearchLog.SearchLog.error("Failed to initialize the nationwide index");
        return false;
    }
}


  OK, after the change a plain IndexSearcher replaces MultiSearcher, and a MultiReader passed to it does the work of searching several index directories.
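
  To make the MultiReader approach a little more concrete: as far as I understand it, a composite reader exposes its sub-indexes as one document ID space by concatenating them in order, so each sub-index starts at an offset equal to the total size of the ones before it. The sketch below models only that arithmetic in plain Java. It is not Lucene code; the class name, method names and index sizes are made up for illustration.

```java
import java.util.Arrays;

/** Stand-in model of a concatenated doc ID space; hypothetical, not Lucene code. */
public class DocBaseSketch {

    /** starts[i] is the first global doc ID of sub-index i; starts[n] is the total size. */
    public static int[] docBases(int[] maxDocs) {
        int[] starts = new int[maxDocs.length + 1];
        for (int i = 0; i < maxDocs.length; i++) {
            starts[i + 1] = starts[i] + maxDocs[i];
        }
        return starts;
    }

    /** Returns {subIndex, localDocId} for a global doc ID. */
    public static int[] resolve(int[] starts, int globalDoc) {
        if (globalDoc < 0 || globalDoc >= starts[starts.length - 1]) {
            throw new IllegalArgumentException("doc ID out of range: " + globalDoc);
        }
        int sub = 0;
        // Advance until the next sub-index would start past the requested ID
        while (globalDoc >= starts[sub + 1]) {
            sub++;
        }
        return new int[] {sub, globalDoc - starts[sub]};
    }

    public static void main(String[] args) {
        // Hypothetical sizes for the two indexes (e.g. the global and bus indexes)
        int[] starts = docBases(new int[] {1000, 250});
        System.out.println(Arrays.toString(resolve(starts, 999)));  // [0, 999]
        System.out.println(Arrays.toString(resolve(starts, 1000))); // [1, 0]
    }
}
```

  The practical consequence is that a document ID returned by the combined searcher is only meaningful relative to that combined reader, not to either index on its own.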

2. Building a RAMDirectory to hold the index in memory

    3.0: building the in-memory index directory:

searchers[0] = new IndexSearcher(new RAMDirectory(FSDirectory
        .open(new File(strRootPath + "/" + GLABOL_INDEX))));
searchers[1] = new IndexSearcher(new RAMDirectory(FSDirectory
        .open(new File(strRootPath + "/" + BUS_INDEX))));


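    The Directory is passed straight into the RAMDirectory constructor. Beware of a pitfall here: the whole index is copied into memory, so with a large index you will be waiting a long time!

    4.x: building the in-memory index directory: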

IndexReader irGlobal = DirectoryReader.open(new RAMDirectory(FSDirectory
        .open(new File(strRootPath + "/" + GLABOL_INDEX)), new IOContext()));

IndexReader irBus = DirectoryReader.open(new RAMDirectory(FSDirectory
        .open(new File(strRootPath + "/" + BUS_INDEX)), new IOContext()));

MultiReader mr = new MultiReader(irGlobal, irBus);


    In 4.x the 3.0-style constructor no longer works; you also have to pass in an IOContext object. Sigh.

3. Custom analyzer

    3.0: custom analyzer:

public class SingleAnalyzer extends Analyzer {

    // Pick a tokenizer by field name; returns null for any other field
    public TokenStream tokenStream(String fieldName, Reader reader) {
        TokenStream result = null;
        if (fieldName.equals("name")) {
            result = new SingleTokenizer(reader);
        }
        if (fieldName.equals("totalcity")) {
            result = new IKTokenizer(reader, false);
        }

        // result = new StandardFilter(result);
        // result = new LowerCaseFilter(result);
        // result = new StopFilter(result, stopSet);
        return result;
    }

    public static void main(String[] args) {
        // TODO Auto-generated method stub
    }

}


  Just override the tokenStream method. Very simple.

    4.x: custom analyzer:

public class SingleAnalyzer extends Analyzer {

    // Old 3.0 implementation, kept for reference:
    // public TokenStream tokenStream(String fieldName, Reader reader) {
    //     TokenStream result = null;
    //     if (fieldName.equals("name")) {
    //         result = new SingleTokenizer(reader);
    //     }
    //     if (fieldName.equals("totalcity")) {
    //         result = new IKTokenizer(reader, false);
    //     }
    //     // result = new StandardFilter(result);
    //     // result = new LowerCaseFilter(result);
    //     // result = new StopFilter(result, stopSet);
    //     return result;
    // }

    @Override
    protected TokenStreamComponents createComponents(String fieldName,
            Reader reader) {
        // final Tokenizer source = new ChineseTokenizer(reader);
        // return new TokenStreamComponents(source, new ChineseFilter(source));
        Tokenizer source = null;
        if (fieldName.equals("name")) {
            source = new SingleTokenizer(reader);
        } else if (fieldName.equals("totalcity")) {
            source = new IKTokenizer(reader, false);
        }
        // Note: source is still null here for any other field name
        return new TokenStreamComponents(source, source);
    }

}


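  OK, in 4.x you override the createComponents method instead.

4. Scoring algorithm

    The scoring algorithm changed little between 3.x and 4.x, but the package moved. Sigh.

    3.x package: import org.apache.lucene.search.DefaultSimilarity; the class lives in org.apache.lucene.search.

    4.x package: import org.apache.lucene.search.similarities.*; the classes live in org.apache.lucene.search.similarities.

5. Query expressions

    The main change is in TermRangeQuery: in 3.x the lower and upper bounds are of type String, while in 4.x they are BytesRef, a wrapper around the string's bytes. There are many other small changes as well.

    3.x TermRangeQuery: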

String left = Long
        .toString((long) (rcBound.m_dLeft * COORDINATE_SCALE_FACTOR));
String right = Long
        .toString((long) (rcBound.m_dRight * COORDINATE_SCALE_FACTOR));
String top = Long
        .toString((long) (rcBound.m_dTop * COORDINATE_SCALE_FACTOR));
String bottom = Long
        .toString((long) (rcBound.m_dBottom * COORDINATE_SCALE_FACTOR));

TermRangeQuery query1 = new TermRangeQuery("lon", left, right,
        true, true);
TermRangeQuery query2 = new TermRangeQuery("lat", bottom, top,
        true, true);
searchQuery.add(query1, BooleanClause.Occur.MUST);
searchQuery.add(query2, BooleanClause.Occur.MUST);


    4.x TermRangeQuery:

String left = Long
        .toString((long) (rcBound.m_dLeft * COORDINATE_SCALE_FACTOR));
String right = Long
        .toString((long) (rcBound.m_dRight * COORDINATE_SCALE_FACTOR));
String top = Long
        .toString((long) (rcBound.m_dTop * COORDINATE_SCALE_FACTOR));
String bottom = Long
        .toString((long) (rcBound.m_dBottom * COORDINATE_SCALE_FACTOR));

// 4.x: the bounds are wrapped in BytesRef
BytesRef brLeft = new BytesRef(left);
BytesRef brRight = new BytesRef(right);
BytesRef brBottom = new BytesRef(bottom);
BytesRef brTop = new BytesRef(top);

TermRangeQuery query1 = new TermRangeQuery("lon",
        brLeft, brRight, true, true);
TermRangeQuery query2 = new TermRangeQuery("lat",
        brBottom, brTop, true, true);
searchQuery.add(query1, BooleanClause.Occur.MUST);
searchQuery.add(query2, BooleanClause.Occur.MUST);
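
    One thing worth checking during the migration, in either version: a term range compares terms as character or byte sequences, not as numbers, so bounds produced with Long.toString only behave like a numeric range when all values have the same number of digits. The sketch below shows the effect with plain string comparison and one possible workaround (fixed-width zero padding). It is an illustration only: the class and method names are hypothetical, it assumes non-negative values, and the index would have to store terms in the same padded form.

```java
import java.util.Locale;

/** Illustrates string-ordered ranges over decimal numbers; hypothetical helper, not Lucene code. */
public class RangeTermSketch {

    /** Wide enough for any non-negative long (Long.MAX_VALUE has 19 digits). */
    static final int WIDTH = 19;

    /** Zero-pads a non-negative value so that string order matches numeric order. */
    public static String pad(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values need a different encoding");
        }
        return String.format(Locale.ROOT, "%0" + WIDTH + "d", value);
    }

    /** Inclusive range test by string comparison (same order as bytes for ASCII digits). */
    public static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        // Unpadded: "9" sorts after "10", so 9 is not found in [5, 10]
        System.out.println(inRange("9", "5", "10")); // false

        // Padded to a fixed width, the same check behaves numerically
        System.out.println(inRange(pad(9), pad(5), pad(10))); // true
    }
}
```

    If reindexing is an option, Lucene's NumericRangeQuery on numeric fields may be a better fit for coordinate ranges than padded strings.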


6. Closing the IndexSearcher

    3.x: to close a searcher, just call its close method:

public void UnInit() {
    if (!m_bIsInit)
        return;

    Iterator iter = m_mapIndexName2Searcher.keySet().iterator();

    while (iter.hasNext()) {
        String key = (String) iter.next();

        MultiSearcher val = (MultiSearcher) m_mapIndexName2Searcher
                .get(key);

        try {
            val.close(); // close the searcher
        } catch (IOException e) {
            e.printStackTrace();
            SearchLog.SearchLog.error("Failed to close the tiered index");
        }
    }

    m_mapIndexName2Searcher.clear();
    m_mapAdmin2IndexName.clear();
    m_mapIndexName2Searcher = null;
    m_mapAdmin2IndexName = null;
    m_bIsInit = false;
}


    4.x: IndexSearcher has no close method; call getIndexReader() and close the returned IndexReader instead:

public void UnInit() {
    if (!m_bIsInit)
        return;

    Iterator iter = m_mapIndexName2Searcher.keySet().iterator();

    while (iter.hasNext()) {
        String key = (String) iter.next();

        IndexSearcher val = (IndexSearcher) m_mapIndexName2Searcher
                .get(key);

        try {
            // IndexSearcher has no close(); close its underlying reader
            val.getIndexReader().close();
        } catch (IOException e) {
            e.printStackTrace();
            SearchLog.SearchLog.error("Failed to close the tiered index");
        }
    }

    m_mapIndexName2Searcher.clear();
    m_mapAdmin2IndexName.clear();
    m_mapIndexName2Searcher = null;
    m_mapAdmin2IndexName = null;
    m_bIsInit = false;
}
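
    The shutdown loop can also be written with a typed map instead of a raw Iterator and casts. The sketch below is plain Java and uses java.io.Closeable as a stand-in for the readers (my understanding is that IndexReader implements Closeable in 4.x, so a map of readers obtained via getIndexReader() would fit, but treat that as an assumption). The class name and the failure-count return value are made up for illustration.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Closes a map of resources without stopping at the first failure; hypothetical helper. */
public class CloseAllSketch {

    /** Closes every resource, keeps going after an IOException, returns the failure count. */
    public static int closeAll(Map<String, ? extends Closeable> resources) {
        int failures = 0;
        for (Map.Entry<String, ? extends Closeable> entry : resources.entrySet()) {
            try {
                entry.getValue().close();
            } catch (IOException e) {
                failures++;
                System.err.println("failed to close " + entry.getKey() + ": " + e.getMessage());
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Stand-ins for index readers keyed by index name
        Map<String, Closeable> readers = new LinkedHashMap<>();
        readers.put("0", () -> { throw new IOException("simulated failure"); });
        readers.put("1", () -> System.out.println("closed 1"));

        System.out.println("failures: " + closeAll(readers));
    }
}
```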


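  In short, Lucene changes a lot between versions, and many methods change when you upgrade; you need to read carefully and experiment to get through it. After the upgrade, run a full functional test, because some behavior may have changed or even broken. Upgrading Lucene is not a pleasant job.

Please credit the source when reposting: http://www.cnblogs.com/likehua/p/4387700.html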

    

  