
Building an Index with Lucene Myself

2006-11-21 14:50
This is built on top of the web pages collected by my spider, and it still connects to a MySQL database.




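import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

class LinkToDb {
    protected Connection con;
    protected PreparedStatement preCount;
    protected PreparedStatement preSelect;

    // Load the JDBC driver, open the connection, and prepare both queries up front.
    LinkToDb(String driver, String sqlurl) {
        try {
            Class.forName(driver);
            con = DriverManager.getConnection(sqlurl);
            preCount = con.prepareStatement("SELECT count(*) AS qty FROM visited_tab");
            preSelect = con.prepareStatement("SELECT * FROM visited_tab");
        } catch (Exception e) {
            e.printStackTrace(); // report the failure instead of swallowing it
        }
    }

    // Return the number of rows in visited_tab, or 0 if the query fails.
    public int GetTableNum() {
        int count = 0;
        try {
            ResultSet rs = preCount.executeQuery();
            if (rs.next()) {
                count = rs.getInt("qty");
            }
            rs.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
        return count;
    }

    // Return every row of visited_tab, or null if the query fails.
    // The caller advances the cursor with rs.next().
    public ResultSet GetResult() {
        ResultSet rs = null;
        try {
            rs = preSelect.executeQuery();
        } catch (Exception e) {
            e.printStackTrace();
        }
        return rs;
    }
}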



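The GetResult() method fetches every row in the table. (One thing I am not sure about: is rs just a reference, or the object holding all the data? If it is the latter, what happens when the database is very large...)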
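On the memory question: rs is only a reference to a ResultSet object supplied by the JDBC driver, so how much data sits in memory depends on the driver. MySQL Connector/J, by default, reads the complete result into client memory before executeQuery() returns, which can be a problem for a very large visited_tab. The following is a minimal sketch of asking Connector/J to stream rows one at a time instead. It assumes Connector/J is the driver in use (the post does not say), and the class and method names StreamingQuery.open are hypothetical, not part of the original code.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class StreamingQuery {
    // Hypothetical helper (not from the original post): prepares a statement
    // whose results are streamed row by row rather than buffered in full.
    static PreparedStatement open(Connection con, String sql) throws SQLException {
        // Streaming requires a forward-only, read-only result set.
        PreparedStatement ps = con.prepareStatement(
                sql, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        // Connector/J-specific convention: a fetch size of Integer.MIN_VALUE
        // tells the driver to stream. Other drivers may treat this differently.
        ps.setFetchSize(Integer.MIN_VALUE);
        return ps;
    }
}
```

In LinkToDb this would replace the plain con.prepareStatement("SELECT * FROM visited_tab") call. One caveat with a streaming result: all rows must be read, or the ResultSet closed, before another query is issued on the same connection.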

Create the class object: creatIndex ci=new creatIndex();

Then create the writer: IndexWriter writer=new IndexWriter(dir,new CJKAnalyzer(),true); (I used CJKAnalyzer here, heh.) After that, Lucene builds the index:


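// Fragment: ci and writer are the creatIndex and IndexWriter objects created above.
ci.createConnection();
int count = ci.getTableNum();

if (count < 1) {
    System.out.println("no record in database");
} else {
    ResultSet rs = ci.getResult();
    while (rs.next()) {
        Document doc = new Document();
        // Keyword: stored and indexed as one untokenized term
        doc.add(Field.Keyword("url", rs.getString("url")));
        // Text: stored, tokenized, and indexed
        doc.add(Field.Text("title", rs.getString("title")));
        // UnStored: tokenized and indexed, but not stored in the index
        doc.add(Field.UnStored("text", rs.getString("text")));
        // UnIndexed: stored only, not searchable
        doc.add(Field.UnIndexed("encode", rs.getString("encode")));
        doc.add(Field.UnIndexed("last_modify_time", rs.getString("last_modify_time")));

        writer.addDocument(doc);
        System.out.println(rs.getString("url") + " has been indexed");
    }
    rs.close();
    writer.optimize(); // merge index segments for faster searching
    writer.close();
    System.out.println("complete");
}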
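A note on why CJKAnalyzer suits Chinese text: it does not need a dictionary, because it indexes a run of CJK characters as overlapping two-character tokens (bigrams). The sketch below illustrates only that idea in plain Java. It is not Lucene's implementation: the class name CjkBigramSketch is made up for illustration, and the input is assumed to be a single run of CJK characters with no Latin text, digits, or punctuation, which the real analyzer handles separately.

```java
import java.util.ArrayList;
import java.util.List;

class CjkBigramSketch {
    // Split one run of CJK characters into overlapping two-character tokens.
    // In this sketch, a single-character input is returned as a one-character token.
    static List<String> bigrams(String text) {
        List<String> tokens = new ArrayList<>();
        if (text.length() == 1) {
            tokens.add(text);
            return tokens;
        }
        for (int i = 0; i + 1 < text.length(); i++) {
            tokens.add(text.substring(i, i + 2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Four characters yield three overlapping bigrams.
        System.out.println(bigrams("搜索引擎"));
    }
}
```

Since the query has to be cut into the same bigrams for the terms to match, the search side would presumably analyze queries with CJKAnalyzer as well.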





The search code is actually finished as well. Because the spider does not use any page-analysis algorithm, searches return a lot of irrelevant content, so I want to look into the PageRank algorithm and improve the spider.
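For reference, the core of PageRank fits in a few lines. What follows is a minimal power-iteration sketch, not code from this project: the class name PageRankSketch, the adjacency-array input, the damping factor 0.85, and the choice to spread the rank of pages without outgoing links evenly over all pages are assumptions made for illustration.

```java
import java.util.Arrays;

class PageRankSketch {
    // links[i] lists the indices of the pages that page i links to.
    static double[] pageRank(int[][] links, double damping, int iterations) {
        int n = links.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n); // start from a uniform distribution

        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            // Every page receives the "random jump" share.
            Arrays.fill(next, (1.0 - damping) / n);
            for (int i = 0; i < n; i++) {
                if (links[i].length == 0) {
                    // Page without outgoing links: spread its rank over all pages.
                    for (int j = 0; j < n; j++) {
                        next[j] += damping * rank[i] / n;
                    }
                } else {
                    // Split this page's rank equally among its link targets.
                    for (int target : links[i]) {
                        next[target] += damping * rank[i] / links[i].length;
                    }
                }
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Pages 0 and 1 both link to page 2; page 2 links back to page 0.
        int[][] links = { {2}, {2}, {0} };
        System.out.println(Arrays.toString(pageRank(links, 0.85, 50)));
    }
}
```

In the spider, links would have to be derived from the hyperlinks extracted between the URLs stored in visited_tab, and the resulting score could then be stored with each page and used when ordering search results.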