Building an Index with Lucene Myself
2006-11-21 14:50
This builds on the web pages collected by my spider, and it still connects to a MySQL database.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    class LinkToDb {
        protected Connection con;
        protected PreparedStatement preCount;
        protected PreparedStatement preSelect;

        // Load the JDBC driver, open the connection, and prepare both queries.
        LinkToDb(String driver, String sqlurl) {
            try {
                Class.forName(driver);
                con = DriverManager.getConnection(sqlurl);
                preCount = con.prepareStatement("SELECT count(*) AS qty FROM visited_tab");
                preSelect = con.prepareStatement("SELECT * FROM visited_tab");
            } catch (Exception e) {
                e.printStackTrace(); // report the failure instead of swallowing it
            }
        }

        // Number of rows in visited_tab, or 0 if the query fails.
        public int GetTableNum() {
            int count = 0;
            try {
                ResultSet rs = preCount.executeQuery();
                if (rs.next()) {
                    count = rs.getInt("qty");
                }
                rs.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
            return count;
        }

        // All rows of visited_tab; the caller iterates with rs.next().
        // Returns null if the query fails.
        public ResultSet GetResult() {
            ResultSet rs = null;
            try {
                rs = preSelect.executeQuery();
            } catch (Exception e) {
                e.printStackTrace();
            }
            return rs;
        }
    }
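The GetResult() method fetches every row in the table. (One thing I'm not sure about: is rs a reference or the object itself? If it is the object itself, then with a very large database...)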
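On that question: in Java, rs is a reference to a ResultSet object, so returning it from GetResult() does not copy any rows. Whether the rows themselves are all held in memory depends on the JDBC driver. MySQL Connector/J, by default, reads the complete result into memory; its documentation describes a forward-only, read-only statement with a fetch size of Integer.MIN_VALUE as the signal to stream rows one at a time instead. A minimal sketch, assuming Connector/J is the driver in use (the names StreamingSelect and prepareStreaming are made up for illustration and are not part of the original code):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper, not part of the original LinkToDb class.
class StreamingSelect {
    // Prepares a query that asks MySQL Connector/J to stream rows
    // instead of buffering the whole result set in memory.
    static PreparedStatement prepareStreaming(Connection con, String sql) throws SQLException {
        PreparedStatement ps = con.prepareStatement(
                sql, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        // Connector/J-specific convention (assumption: that driver is in use);
        // other drivers may reject a negative fetch size.
        ps.setFetchSize(Integer.MIN_VALUE);
        return ps;
    }
}
```

If preSelect were prepared this way, the count query would have to run before the streaming result set is opened, since Connector/J does not allow other queries on the same connection while a streaming result set is still open.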
Create the class object: creatIndex ci=new creatIndex();
Then the writer: IndexWriter writer=new IndexWriter(dir,new CJKAnalyzer(),true); I used CJKAnalyzer, heh. After that, Lucene builds the index:
    ci.createConnection();
    count = ci.getTableNum();
    if (count < 1) {
        System.out.println("no record in database");
    } else {
        rs = ci.getResult();
        while (rs.next()) {
            Document doc = new Document();
            // Keyword: stored and indexed as a single untokenized term
            doc.add(Field.Keyword("url", rs.getString("url")));
            // Text: stored, tokenized, and indexed
            doc.add(Field.Text("title", rs.getString("title")));
            // UnStored: tokenized and indexed, but not stored
            doc.add(Field.UnStored("text", rs.getString("text")));
            // UnIndexed: stored only, not searchable
            doc.add(Field.UnIndexed("encode", rs.getString("encode")));
            doc.add(Field.UnIndexed("last_modify_time", rs.getString("last_modify_time")));

            writer.addDocument(doc);
            System.out.println(rs.getString("url") + " has been indexed");
        }
        writer.optimize();
        writer.close();
        System.out.println("complete");
    }
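The CJKAnalyzer passed to the IndexWriter does not do dictionary-based word segmentation; it indexes runs of CJK characters as overlapping two-character tokens (bigrams). The sketch below illustrates that idea in plain Java for a string made up only of Chinese characters. It is a simplified illustration, not Lucene's implementation: the class name BigramSketch is invented here, and handling of Latin text, digits, punctuation, and characters outside the Basic Multilingual Plane is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of bigram tokenization; not Lucene code.
class BigramSketch {
    // Splits a run of CJK characters into overlapping two-character tokens.
    // A lone character is kept as a single token in this sketch.
    static List<String> bigrams(String text) {
        List<String> tokens = new ArrayList<>();
        if (text.length() == 1) {
            tokens.add(text);
            return tokens;
        }
        for (int i = 0; i + 1 < text.length(); i++) {
            tokens.add(text.substring(i, i + 2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(bigrams("建立索引")); // prints [建立, 立索, 索引]
    }
}
```

Because a query is split the same way, a search for 索引 matches any document in which those two characters appear next to each other, at the cost of also indexing pairs such as 立索 that are not words.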
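The search code is actually finished too. But because the spider doesn't use any page-analysis algorithm, searches return a lot of unnecessary content. I want to look into the PageRank algorithm and use it to improve the spider.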
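For reference, PageRank scores a page by the scores of the pages linking to it, and is usually computed by repeated iteration over the link graph. A minimal sketch, under assumptions not taken from this post: pages are numbered 0 to n-1, links[i] lists the pages that page i links to, the damping factor is the commonly quoted 0.85, and a page without outgoing links spreads its score evenly over all pages. The class name PageRankSketch is made up for illustration, and nothing here reflects how the spider actually stores its links.

```java
import java.util.Arrays;

// Hypothetical illustration of PageRank by power iteration.
class PageRankSketch {
    static double[] pageRank(int[][] links, double damping, int iterations) {
        int n = links.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n); // start from a uniform distribution
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            double dangling = 0.0; // total score held by pages with no out-links
            for (int i = 0; i < n; i++) {
                if (links[i].length == 0) {
                    dangling += rank[i];
                    continue;
                }
                // each page divides its score equally among its out-links
                double share = rank[i] / links[i].length;
                for (int target : links[i]) {
                    next[target] += share;
                }
            }
            for (int i = 0; i < n; i++) {
                next[i] = (1 - damping) / n + damping * (next[i] + dangling / n);
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Pages 0 and 1 both link to page 2; page 2 links back to page 0.
        int[][] links = { {2}, {2}, {0} };
        System.out.println(Arrays.toString(pageRank(links, 0.85, 50)));
    }
}
```

In this small graph page 2 ends up with the highest score and page 1, which nothing links to, with the lowest. A spider could use such scores to decide which pages are worth indexing or to order results.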