
Adding Site Search to Your Website: Compass (Built on Lucene) with Spring + Hibernate

2012-09-07 14:32

Adding Site Search to Your Website: A Compass Getting-Started Tutorial

syxChina(syxchina.cnblogs.com)
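Compass (Built on Lucene) Getting-Started Tutorial
1 Preface
2 Introduction to Compass
3 Using Compass on its own
4 Integrating Compass with Spring + Hibernate
4-1 JAR files
4-2 Configuration files
4-3 Source code
4-4 Notes
4-5 Testing
5 Summary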

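1 Preface

I have been learning some new things lately, hoping to add some substance to my graduation project. After a long stretch of SSH (Struts + Spring + Hibernate) projects I also wanted to try something new and round out what I already knew, and search is without question an important feature. As a Java developer you have probably heard of Lucene, a top-level Apache open-source project. I have just finished learning it and know the basics well enough to use it, although I have to say the official documentation and examples are not very helpful; fortunately there is plenty of material online. While reading about Lucene I came across two projects built on it: Compass and Nutch. Lucene can index and search whatever content you hand it, but to search a local database or web pages you first have to extract the data and index it yourself. So you start to wonder whether you could search the database directly, with the database content as the index source and the index updated along with every database CRUD operation. That is what Compass does. It is very convenient for site search, and it ships with official Spring and Hibernate support, which makes it more convenient still. Nutch uses Lucene to search web pages. If there is interest I will share my experience getting started with Lucene and Nutch; the rest can be left to the documentation and Google.
One thing to mention: Compass appears to have stopped being updated in 2009, and according to what I read online it only supports Lucene versions below 3.0. I do not know why such a good project was abandoned. I tried it, and analyzers written for Lucene 3.0 and later do not work; for Chinese I use JE-Analyzer.jar. My environment:
Spring 3.1.0 + Hibernate 3.6.6 + Compass 2.2.0.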
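The idea of an index that follows every database CRUD operation can be sketched in plain Java. This is only a toy illustration under stated assumptions: the class `MirroredIndexSketch` and its methods are hypothetical names, the "database" is an in-memory map, and the tokenizer is a simple split on non-word characters. It is not how Compass or Lucene are implemented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical illustration: a toy store whose inverted index follows every save and delete. */
final class MirroredIndexSketch {
    // Stand-in for the database: id -> text.
    private final Map<String, String> rows = new LinkedHashMap<>();
    // Inverted index: lower-cased term -> ids of the rows that contain it.
    private final Map<String, Set<String>> index = new HashMap<>();

    /** Create or update a row and mirror the change into the index. */
    void save(String id, String text) {
        delete(id); // drop stale index entries if the row already existed
        rows.put(id, text);
        for (String term : terms(text)) {
            index.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
    }

    /** Delete a row and remove its entries from the index. */
    void delete(String id) {
        String old = rows.remove(id);
        if (old == null) {
            return;
        }
        for (String term : terms(old)) {
            Set<String> ids = index.get(term);
            if (ids != null) {
                ids.remove(id);
                if (ids.isEmpty()) {
                    index.remove(term);
                }
            }
        }
    }

    /** Look up the ids of all rows containing the single keyword. */
    List<String> search(String keyword) {
        Set<String> ids = index.get(keyword.toLowerCase(Locale.ROOT));
        return ids == null ? new ArrayList<>() : new ArrayList<>(ids);
    }

    // Naive tokenizer: lower-case, then split on runs of non-word characters.
    private static List<String> terms(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        MirroredIndexSketch store = new MirroredIndexSketch();
        store.save("1", "Thinking in Java");
        store.save("2", "Effective Java");
        System.out.println(store.search("java"));
        store.delete("1");
        System.out.println(store.search("java"));
    }
}
```

The point of the sketch is that callers never touch the index directly: every write goes through `save` or `delete`, so the index cannot drift from the data. Compass achieves a comparable effect by hooking into Hibernate events, as configured later in this article.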

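2 Introduction to Compass

Compass is a powerful, transactional, high-performance object/search engine mapping (OSEM) framework and Java persistence framework. Compass includes:

* a search engine abstraction layer (using the Lucene search engine),
* OSEM (Object/Search Engine Mapping) support,
* transaction management,
* a simple, Google-like keyword query language,
* an extensible, modular framework,
* a simple API.
Official website: Google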

3 Using Compass on its own

Compass does not have to be integrated with Hibernate or Spring. The following example is excerpted from the web; here is the code:







import org.compass.annotations.Index;
import org.compass.annotations.Searchable;
import org.compass.annotations.SearchableId;
import org.compass.annotations.SearchableProperty;
import org.compass.annotations.Store;

@Searchable
public class Book {
    private String id;     // identifier
    private String title;  // title
    private String author; // author
    private float price;   // price

    public Book() {
    }

    public Book(String id, String title, String author, float price) {
        this.id = id;
        this.title = title;
        this.author = author;
        this.price = price;
    }

    @SearchableId
    public String getId() {
        return id;
    }

    // The title is tokenized, stored, and boosted so that title matches rank higher.
    @SearchableProperty(boost = 2.0F, index = Index.TOKENIZED, store = Store.YES)
    public String getTitle() {
        return title;
    }

    @SearchableProperty(index = Index.TOKENIZED, store = Store.YES)
    public String getAuthor() {
        return author;
    }

    // The price is stored with the hit but is not searchable.
    @SearchableProperty(index = Index.NO, store = Store.YES)
    public float getPrice() {
        return price;
    }

    public void setId(String id) {
        this.id = id;
    }

    public void setTitle(String title) {
        this.title = title;
    }

    public void setAuthor(String author) {
        this.author = author;
    }

    public void setPrice(float price) {
        this.price = price;
    }

    @Override
    public String toString() {
        return "[" + id + "] " + title + " - " + author + " $ " + price;
    }
}
import java.util.ArrayList;
import java.util.List;

import org.compass.annotations.config.CompassAnnotationsConfiguration;
import org.compass.core.Compass;
import org.compass.core.CompassHits;
import org.compass.core.CompassSession;
import org.compass.core.CompassTransaction;

public class Searcher {
    protected Compass compass;

    public Searcher() {
    }

    public Searcher(String path) {
        compass = new CompassAnnotationsConfiguration()
                .setConnection(path)
                .addClass(Book.class)
                .setSetting("compass.engine.highlighter.default.formatter.simple.pre", "<font color='red'>")
                .setSetting("compass.engine.highlighter.default.formatter.simple.post", "</font>")
                .buildCompass();
        // Close Compass when the JVM exits.
        Runtime.getRuntime().addShutdownHook(new Thread() {
            public void run() {
                compass.close();
            }
        });
    }

    /**
     * Add a book to the index.
     * @param book the book to index
     */
    public void index(Book book) {
        CompassSession session = null;
        CompassTransaction tx = null;
        try {
            session = compass.openSession();
            tx = session.beginTransaction();
            session.create(book);
            tx.commit();
        } catch (RuntimeException e) {
            if (tx != null) {
                tx.rollback();
            }
            throw e;
        } finally {
            if (session != null) {
                session.close();
            }
        }
    }

    /**
     * Remove a book from the index.
     * @param book the book to remove
     */
    public void unIndex(Book book) {
        CompassSession session = null;
        CompassTransaction tx = null;
        try {
            session = compass.openSession();
            tx = session.beginTransaction();
            session.delete(book);
            tx.commit();
        } catch (RuntimeException e) {
            // tx is still null if openSession() or beginTransaction() failed.
            if (tx != null) {
                tx.rollback();
            }
            throw e;
        } finally {
            if (session != null) {
                session.close();
            }
        }
    }

    /**
     * Re-index a book: remove it, then add it again.
     * @param book the book to re-index
     */
    public void reIndex(Book book) {
        unIndex(book);
        index(book);
    }

    /**
     * Search the index.
     * @param queryString the query
     * @return the matching books; empty if there are none
     */
    public List<Book> search(String queryString) {
        CompassSession session = null;
        CompassTransaction tx = null;
        try {
            session = compass.openSession();
            tx = session.beginTransaction();
            CompassHits hits = session.find(queryString);
            int n = hits.length();
            List<Book> books = new ArrayList<Book>(n);
            for (int i = 0; i < n; i++) {
                books.add((Book) hits.data(i));
            }
            // Close the hits and commit even when nothing matched.
            hits.close();
            tx.commit();
            return books;
        } catch (RuntimeException e) {
            if (tx != null) {
                tx.rollback();
            }
            throw e;
        } finally {
            if (session != null) {
                session.close();
            }
        }
    }
}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.UUID;

public class Main {
    static List<Book> db = new ArrayList<Book>();
    static Searcher searcher = new Searcher("index");
    // One shared reader: a new BufferedReader per call can swallow input buffered by the previous one.
    static final BufferedReader IN = new BufferedReader(new InputStreamReader(System.in));

    public static void main(String[] args) {
        add(new Book(UUID.randomUUID().toString(), "Thinking in Java", "Bruce", 109.0f));
        add(new Book(UUID.randomUUID().toString(), "Effective Java", "Joshua", 12.4f));
        add(new Book(UUID.randomUUID().toString(), "Java Thread Programing", "Paul", 25.8f));
        long begin = System.currentTimeMillis();
        int count = 30;
        for (int i = 1; i < count; i++) {
            if (i % 10 == 0) {
                long end = System.currentTimeMillis();
                System.err.println(String.format(
                        "Done [%d], remaining [%d], elapsed [%ds], estimated remaining [%ds].",
                        i, count - i, (end - begin) / 1000,
                        (int) ((count - i) * ((end - begin) / (i * 1000.0)))));
            }
            // Filler data: the current date string, split into a title half and an author half.
            // The id is a UUID because the date string repeats within the same second.
            String text = new Date().toString();
            add(new Book(UUID.randomUUID().toString(), text.substring(0, text.length() / 2),
                    text.substring(text.length() / 2), (float) Math.random() * 100));
        }
        int n;
        do {
            n = displaySelection();
            switch (n) {
            case 1:
                listBooks();
                break;
            case 2:
                addBook();
                break;
            case 3:
                deleteBook();
                break;
            case 4:
                searchBook();
                break;
            case 5:
                return;
            }
        } while (n != 0); // an out-of-range selection returns 0 and ends the loop
    }

    static int displaySelection() {
        System.out.println("\n==select==");
        System.out.println("1. List all books");
        System.out.println("2. Add book");
        System.out.println("3. Delete book");
        System.out.println("4. Search book");
        System.out.println("5. Exit");
        int n = readKey();
        if (n >= 1 && n <= 5) {
            return n;
        }
        return 0;
    }

    /**
     * Add a book to both the database and the index.
     *
     * @param book the book to add
     */
    private static void add(Book book) {
        db.add(book);
        searcher.index(book);
    }

    /**
     * Print every book in the database.
     */
    public static void listBooks() {
        System.out.println("==Database==");
        int n = 1;
        for (Book book : db) {
            System.out.println(n + ")" + book);
            n++;
        }
    }

    /**
     * Read a book from user input and add it to the database and the index.
     */
    public static void addBook() {
        String title = readLine(" Title: ");
        String author = readLine(" Author: ");
        String price = readLine(" Price: ");
        Book book = new Book(UUID.randomUUID().toString(), title, author, Float.valueOf(price));
        add(book);
    }

    /**
     * Delete a book from both the database and the index.
     */
    public static void deleteBook() {
        listBooks();
        System.out.println("Book index: ");
        int n = readKey();
        if (n < 1 || n > db.size()) {
            System.out.println("No such book: " + n);
            return;
        }
        Book book = db.remove(n - 1);
        searcher.unIndex(book);
    }

    /**
     * Search for books by the keyword the user enters.
     */
    public static void searchBook() {
        String queryString = readLine(" Enter keyword: ");
        List<Book> books = searcher.search(queryString);
        System.out.println(" ====search results:" + books.size() + "====");
        for (Book book : books) {
            System.out.println(book);
        }
    }

    /** Read one line from standard input and parse it as an integer. */
    public static int readKey() {
        try {
            return Integer.parseInt(IN.readLine().trim());
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    /** Print a prompt and read one line from standard input. */
    public static String readLine(String prompt) {
        System.out.println(prompt);
        try {
            return IN.readLine();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}



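Inserting data and indexing it one record at a time like this is slow; the approach below is faster. Note that no analyzer is configured above, so the default one is used, which splits Chinese text into individual characters.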
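To make the difference between per-character splitting and word-level analysis concrete, here is a small plain-Java sketch. It is an illustration under assumptions, not the code of any real analyzer: `ChineseTokenizeSketch`, the tiny word list, and the choice of forward maximum matching are all hypothetical, and the algorithm JE-Analyzer actually uses is not shown in this article.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration of two ways to tokenize Chinese text. */
final class ChineseTokenizeSketch {

    /** One token per character, similar in effect to indexing Chinese without a Chinese analyzer. */
    static List<String> perCharacter(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            tokens.add(String.valueOf(text.charAt(i)));
        }
        return tokens;
    }

    /**
     * Forward maximum matching: at each position take the longest dictionary word
     * (up to maxWordLength characters); fall back to a single character if none matches.
     */
    static List<String> forwardMaxMatch(String text, Set<String> dictionary, int maxWordLength) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int end = Math.min(text.length(), pos + maxWordLength);
            // Shrink the candidate until it is a dictionary word or only one character is left.
            while (end > pos + 1 && !dictionary.contains(text.substring(pos, end))) {
                end--;
            }
            tokens.add(text.substring(pos, end));
            pos = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Assumed word list, chosen only for this example.
        Set<String> dict = new HashSet<>(Arrays.asList("全文", "搜索", "引擎", "搜索引擎"));
        String text = "全文搜索引擎";
        System.out.println(perCharacter(text));             // six single-character tokens
        System.out.println(forwardMaxMatch(text, dict, 4)); // two word tokens
    }
}
```

With per-character tokens a query has to match character by character, which tends to return more loosely related hits; word-level tokens cost a dictionary lookup at indexing and query time but give more meaningful matches.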

4 Integrating Compass with Spring + Hibernate

4-1 JAR files














4-2 Configuration files




beans.xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:context="http://www.springframework.org/schema/context"
    xmlns:aop="http://www.springframework.org/schema/aop"
    xmlns:tx="http://www.springframework.org/schema/tx"
    xsi:schemaLocation="http://www.springframework.org/schema/beans
        http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
        http://www.springframework.org/schema/context
        http://www.springframework.org/schema/context/spring-context-3.0.xsd
        http://www.springframework.org/schema/tx
        http://www.springframework.org/schema/tx/spring-tx-3.0.xsd
        http://www.springframework.org/schema/aop
        http://www.springframework.org/schema/aop/spring-aop-3.0.xsd">
    <context:annotation-config/>
    <context:component-scan base-package="com.syx.compass"></context:component-scan>
    <aop:aspectj-autoproxy></aop:aspectj-autoproxy>
    <import resource="hibernate-beans.xml"/>
    <import resource="compass-beans.xml"/>
</beans>
compass-beans.xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="...">
    <!-- Main Compass configuration -->
    <bean id="compass" class="org.compass.spring.LocalCompassBean">
        <property name="compassSettings">
            <props>
                <!-- Where the index data is stored -->
                <prop key="compass.engine.connection">file://compass</prop>
                <prop key="compass.transaction.factory">org.compass.spring.transaction.SpringSyncTransactionFactory</prop>
                <!-- Analyzer to use -->
                <prop key="compass.engine.analyzer.default.type">jeasy.analysis.MMAnalyzer</prop>
                <prop key="compass.engine.highlighter.default.formatter.simple.pre"><![CDATA[<font color="red"><b>]]></prop>
                <prop key="compass.engine.highlighter.default.formatter.simple.post"><![CDATA[</b></font>]]></prop>
            </props>
        </property>
        <property name="transactionManager">
            <ref bean="txManager"/>
        </property>
        <property name="compassConfiguration" ref="annotationConfiguration"/>
        <property name="classMappings">
            <list>
                <value>com.syx.compass.test1.Article</value>
            </list>
        </property>
    </bean>
    <bean id="annotationConfiguration"
        class="org.compass.annotations.config.CompassAnnotationsConfiguration">
    </bean>
    <bean id="compassTemplate" class="org.compass.core.CompassTemplate">
        <property name="compass" ref="compass"/>
    </bean>
    <!-- Keep the index in sync: changes to the database are mirrored into the index -->
    <bean id="hibernateGps" class="org.compass.gps.impl.SingleCompassGps"
        init-method="start" destroy-method="stop">
        <property name="compass">
            <ref bean="compass"/>
        </property>
        <property name="gpsDevices">
            <list>
                <ref bean="hibernateGpsDevice"/>
            </list>
        </property>
    </bean>
    <!-- Hibernate device: connects Compass to Hibernate -->
    <bean id="hibernateGpsDevice"
        class="org.compass.spring.device.hibernate.dep.SpringHibernate3GpsDevice">
        <property name="name">
            <value>hibernateDevice</value>
        </property>
        <property name="sessionFactory">
            <ref bean="sessionFactory"/>
        </property>
        <property name="mirrorDataChanges">
            <value>true</value>
        </property>
    </bean>
    <!-- Rebuild the index on a schedule (with Quartz) or when the Spring ApplicationContext starts -->
    <bean id="compassIndexBuilder"
        class="com.syx.compass.test1.CompassIndexBuilder"
        lazy-init="false">
        <property name="compassGps" ref="hibernateGps"/>
        <property name="buildIndex" value="false"/>
        <property name="lazyTime" value="1"/>
    </bean>
    <!-- Search service class -->
    <bean id="searchService" class="com.syx.compass.test1.SearchServiceBean">
        <property name="compassTemplate">
            <ref bean="compassTemplate"/>
        </property>
    </bean>
</beans>
hibernate-beans.xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="...">
    <!-- DataSource -->
    <bean id="dataSource" class="com.mchange.v2.c3p0.ComboPooledDataSource">
        <property name="driverClass" value="${jdbc.driverClassName}"/>
        <property name="jdbcUrl" value="${jdbc.url}"/>
        <property name="user" value="${jdbc.username}"/>
        <property name="password" value="${jdbc.password}"/>
        <property name="autoCommitOnClose" value="true"/>
        <property name="checkoutTimeout" value="${cpool.checkoutTimeout}"/>
        <property name="initialPoolSize" value="${cpool.minPoolSize}"/>
        <property name="minPoolSize" value="${cpool.minPoolSize}"/>
        <property name="maxPoolSize" value="${cpool.maxPoolSize}"/>
        <property name="maxIdleTime" value="${cpool.maxIdleTime}"/>
        <property name="acquireIncrement" value="${cpool.acquireIncrement}"/>
        <!-- <property name="maxIdleTimeExcessConnections" value="${cpool.maxIdleTimeExcessConnections}"/> -->
    </bean>
    <bean
        class="org.springframework.beans.factory.config.PropertyPlaceholderConfigurer">
        <property name="locations">
            <value>classpath:jdbc.properties</value>
        </property>
    </bean>
    <!-- SessionFactory -->
    <bean id="sessionFactory"
        class="org.springframework.orm.hibernate3.annotation.AnnotationSessionFactoryBean">
        <property name="dataSource" ref="dataSource"/>
        <property name="annotatedClasses">
            <list>
                <value>com.syx.compass.model.Article</value>
                <value>com.syx.compass.model.Author</value>
                <value>com.syx.compass.test1.Article</value>
            </list>
        </property>
        <property name="hibernateProperties">
            <props>
                <prop key="hibernate.dialect">org.hibernate.dialect.MySQLDialect</prop>
                <prop key="hibernate.current_session_context_class">thread</prop>
                <prop key="javax.persistence.validation.mode">none</prop>
                <prop key="hibernate.show_sql">true</prop>
                <prop key="hibernate.format_sql">false</prop>
                <prop key="hibernate.hbm2ddl.auto">update</prop>
            </props>
        </property>
    </bean>
    <bean id="hibernateTemplate" class="org.springframework.orm.hibernate3.HibernateTemplate">
        <property name="sessionFactory" ref="sessionFactory"></property>
    </bean>
    <bean id="txManager"
        class="org.springframework.orm.hibernate3.HibernateTransactionManager">
        <property name="sessionFactory" ref="sessionFactory"/>
    </bean>
</beans>
jdbc.properties
jdbc.driverClassName=com.mysql.jdbc.Driver
jdbc.hostname=localhost
jdbc.url=jdbc:mysql://localhost:3306/compass
jdbc.username=root
jdbc.password=root
cpool.checkoutTimeout=5000
cpool.minPoolSize=1
cpool.maxPoolSize=4
cpool.maxIdleTime=25200
cpool.maxIdleTimeExcessConnections=1800
cpool.acquireIncrement=5
log4j.properties
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Target=System.out
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.rootLogger=error, stdout

4-3 Source code




import java.util.Date;

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;

import org.compass.annotations.Index;
import org.compass.annotations.Searchable;
import org.compass.annotations.SearchableId;
import org.compass.annotations.SearchableProperty;
import org.compass.annotations.Store;

@Searchable(alias = "article")
@Entity(name = "_article")
public class Article {
    private Long ID;         // identifier
    private String content;  // body text
    private String title;    // article title
    private Date createTime; // creation time

    public Article() {
    }

    public Article(Long iD, String content, String title, Date createTime) {
        ID = iD;
        this.content = content;
        this.title = title;
        this.createTime = createTime;
    }

    public String toString() {
        // String.valueOf avoids a NullPointerException when createTime has not been set.
        return String.format("%d,%s,%s,%s", ID, title, content, String.valueOf(createTime));
    }

    @SearchableId
    @Id
    @GeneratedValue
    public Long getID() {
        return ID;
    }

    public void setID(Long id) {
        ID = id;
    }

    @SearchableProperty(index = Index.TOKENIZED, store = Store.YES)
    public String getContent() {
        return content;
    }

    public void setContent(String content) {
        this.content = content;
    }

    @SearchableProperty(index = Index.TOKENIZED, store = Store.YES)
    public String getTitle() {
        return title;
    }

    public void setTitle(String title) {
        this.title = title;
    }

    @SearchableProperty(index = Index.TOKENIZED, store = Store.YES)
    public Date getCreateTime() {
        return createTime;
    }

    public void setCreateTime(Date createTime) {
        this.createTime = createTime;
    }
}
import org.compass.gps.CompassGps;
import org.springframework.beans.factory.InitializingBean;

public class CompassIndexBuilder implements InitializingBean {

    // Whether to build the index; set to false to disable this builder.
    private boolean buildIndex = false;

    // Delay before the indexing thread starts working, in seconds.
    private int lazyTime = 10;

    // Compass GPS wrapper.
    private CompassGps compassGps;

    // Indexing thread.
    private Thread indexThread = new Thread() {

        @Override
        public void run() {
            try {
                Thread.sleep(lazyTime * 1000);
                System.out.println("begin compass index...");
                long beginTime = System.currentTimeMillis();
                // Rebuild the index.
                // If the index files for the Compass entities already exist, a temporary
                // index is built first and replaces the old one when indexing completes.
                compassGps.index();
                long costTime = System.currentTimeMillis() - beginTime;
                System.out.println("compass index finished.");
                System.out.println("took " + costTime + " milliseconds");
            } catch (InterruptedException e) {
                e.printStackTrace();
                // Restore the interrupt flag for whoever inspects this thread.
                Thread.currentThread().interrupt();
            }
        }
    };

    /**
     * Implements <code>InitializingBean</code>: starts the indexing thread once injection is complete.
     */
    public void afterPropertiesSet() throws Exception {
        if (buildIndex) {
            indexThread.setDaemon(true);
            indexThread.setName("Compass Indexer");
            indexThread.start();
        }
    }

    public void setBuildIndex(boolean buildIndex) {
        this.buildIndex = buildIndex;
    }

    public void setLazyTime(int lazyTime) {
        this.lazyTime = lazyTime;
    }

    public void setCompassGps(CompassGps compassGps) {
        this.compassGps = compassGps;
    }
}
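import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.compass.core.CompassCallback;
import org.compass.core.CompassException;
import org.compass.core.CompassHighlighter;
import org.compass.core.CompassHits;
import org.compass.core.CompassQuery;
import org.compass.core.CompassSession;
import org.compass.core.CompassTemplate;

public class SearchServiceBean {
    private CompassTemplate compassTemplate;

    /**
     * Index query. The returned map holds the total hit count under "size"
     * and the hits in the range [start, end) under "result".
     */
    public Map<String, Object> find(final String keywords, final String type,
            final int start, final int end) {
        return compassTemplate.execute(new CompassCallback<Map<String, Object>>() {
            public Map<String, Object> doInCompass(CompassSession session) throws CompassException {
                List<Article> result = new ArrayList<Article>();
                Map<String, Object> container = new HashMap<String, Object>();
                CompassQuery query = session.queryBuilder().queryString(keywords).toQuery();
                CompassHits hits = query.setAliases(type).hits();
                int totalSize = hits.length();
                container.put("size", totalSize);
                // Never read past the last hit.
                int max = Math.min(end, totalSize);
                if ("article".equals(type)) {
                    for (int i = start; i < max; i++) {
                        Article article = (Article) hits.data(i);
                        // Replace title and content with their highlighted fragments when available.
                        String title = hits.highlighter(i).fragment("title");
                        if (title != null) {
                            article.setTitle(title);
                        }
                        String content = hits.highlighter(i)
                                .setTextTokenizer(CompassHighlighter.TextTokenizer.AUTO)
                                .fragment("content");
                        if (content != null) {
                            article.setContent(content);
                        }
                        result.add(article);
                    }
                }
                container.put("result", result);
                return container;
            }
        });
    }

    public CompassTemplate getCompassTemplate() {
        return compassTemplate;
    }

    public void setCompassTemplate(CompassTemplate compassTemplate) {
        this.compassTemplate = compassTemplate;
    }
}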
import java.util.List;
import java.util.Map;

import org.junit.BeforeClass;
import org.junit.Test;
import org.springframework.context.support.ClassPathXmlApplicationContext;
import org.springframework.orm.hibernate3.HibernateTemplate;

public class MainTest {
    public static ClassPathXmlApplicationContext applicationContext;
    private static HibernateTemplate hibernateTemplate;

    @BeforeClass
    public static void init() {
        System.out.println("spring init...");
        applicationContext = new ClassPathXmlApplicationContext("beans.xml");
        hibernateTemplate = applicationContext.getBean(HibernateTemplate.class);
        System.out.println("spring ok");
    }

    @Test
    public void addData() {
        System.out.println("addData");
        // In compass-beans.xml, set buildIndex=true and lazyTime=1 on the
        // bean with id="compassIndexBuilder"; the index is then rebuilt
        // automatically from the data in the database.
        // Sleep so the JVM stays alive while the daemon indexing thread runs.
        try {
            Thread.sleep(10000000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }

    @Test
    @SuppressWarnings("unchecked")
    public void search() {
        String keyword = "全文搜索引擎";
        SearchServiceBean ssb = applicationContext.getBean(SearchServiceBean.class);
        Map map = ssb.find(keyword, "article", 0, 100); // the first search loads the analyzer dictionary
        long begin = System.currentTimeMillis();
        map = ssb.find(keyword, "article", 0, 100); // the second search gives the real search time
        long end = System.currentTimeMillis();
        System.out.println(String.format(
                "search: [%s], time (ms): %d, hits: %d", keyword, end - begin, map.get("size")));
        List<Article> list = (List<Article>) map.get("result");
        for (Article article : list) {
            System.out.println(article);
        }
    }
}

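4-4 Notes

compass-beans.xml is where the index directory and the analyzer are configured. For testing, the data is inserted directly into the database and the index is built at application startup, so that indexing speed can be measured.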
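compass-beans.xml also sets the highlighter's `simple.pre` and `simple.post` markers, which are wrapped around matched terms in the fragments returned by `hits.highlighter(i).fragment(...)`. The effect can be approximated with plain string handling. The sketch below is an assumption-laden illustration: `HighlightSketch.highlight` is a hypothetical helper that matches the raw keyword, whereas the Compass highlighter works on analyzed tokens and also selects fragments.

```java
/** Hypothetical illustration of what pre/post highlight markers do to a hit. */
final class HighlightSketch {

    /** Wrap every occurrence of keyword in text with the pre and post markers. */
    static String highlight(String text, String keyword, String pre, String post) {
        if (keyword.isEmpty()) {
            return text; // nothing to match; also avoids an endless loop below
        }
        StringBuilder out = new StringBuilder();
        int from = 0;
        int hit;
        while ((hit = text.indexOf(keyword, from)) >= 0) {
            // Copy the text before the match, then the marked-up match itself.
            out.append(text, from, hit).append(pre).append(keyword).append(post);
            from = hit + keyword.length();
        }
        return out.append(text.substring(from)).toString();
    }

    public static void main(String[] args) {
        String pre = "<font color=\"red\"><b>";
        String post = "</b></font>";
        System.out.println(highlight("Thinking in Java, Effective Java", "Java", pre, post));
    }
}
```

Because the markers are raw HTML, the surrounding text should be HTML-escaped before highlighting if it can contain user-supplied markup.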

4-5 Testing

Using MySQL, I wrote a function to insert test data:
DELIMITER $$
CREATE
FUNCTION `compass`.`addDateSyx`(num int(8))
RETURNS varchar(32)
BEGIN
declare i int(8);
set i = 0;
while ( i < num) DO
insert into _article (title,content, createTime) values (i, num-i, now());
set i = i + 1;
end while;
return "OK";
END$$
DELIMITER ;
4-5-1 Test with 10,000 rows of repeated Chinese data
For this test, change the insert statement in the database function to:
insert into _article (title,content, createTime) values ('用compass实现站内全文搜索引擎(一)', 'Compass是一个强大的,事务的,高性能的对象/搜索引擎映射(OSEM:object/search engine mapping)与一个Java持久层框架.Compass包括:

* 搜索引擎抽象层(使用Lucene搜索引荐),
* OSEM (Object/Search Engine Mapping) 支持,

* 事务管理,
* 类似于Google的简单关键字查询语言,
* 可扩展与模块化的框架,
* 简单的API.
如果你需要做站内搜索引擎,而且项目里用到了hibernate,那用compass是你的最佳选择。 ', now());
Insert the data:
select addDateSyx1(10000); -- with hibernate.hbm2ddl.auto=update set in the Hibernate configuration





Build the index:









10,000 rows in 8045 ms, which is a decent speed.
Index size:



Search:



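The text was indeed tokenized into words. With the default analyzer each Chinese character becomes its own token, which is faster; with JE-Analyzer, 116 ms is still acceptable.
4-5-2 Test with 100,000 rows of repeated Chinese data
Insert the data: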



MySQL takes roughly 12 s to insert 100,000 rows.
Build the index:





The index size is about what I expected, but indexing took far longer than I expected, and I did not want to try it again.
Search:



With 100,000 rows, 243 ms is still quite good. It seems that once the index is built, searching is easy.
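The figures reported above can be turned into rates with a little arithmetic. The snippet below only restates numbers from this article (10,000 rows indexed in 8045 ms; searches of 116 ms and 243 ms); `BenchmarkMath` and its method names are hypothetical, and the comparison assumes both searches used the same analyzer and machine, which the article does not state explicitly. These are single runs, so treat the results as rough.

```java
import java.util.Locale;

/** Hypothetical helper that restates the article's measurements as rates. */
final class BenchmarkMath {

    /** Throughput in rows per second for a batch that took the given time. */
    static double rowsPerSecond(long rows, long millis) {
        return rows * 1000.0 / millis;
    }

    /** How many times larger the later measurement is than the earlier one. */
    static double ratio(double later, double earlier) {
        return later / earlier;
    }

    public static void main(String[] args) {
        System.out.println(String.format(Locale.ROOT,
                "indexing: %.0f rows/s", rowsPerSecond(10000, 8045)));
        System.out.println(String.format(Locale.ROOT,
                "search time grew %.2fx for 10x the rows", ratio(243, 116)));
    }
}
```

That works out to roughly 1,243 rows per second for indexing, and a search time that only about doubles when the data grows tenfold, which fits the observation that searching stays cheap once the index exists.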

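5 Summary

Compass is pleasant to use and should cover basic needs. I do not know why such a good project stopped being updated; had it continued, Hibernate Search might never have appeared.
Because Compass is no longer updated, features introduced after Lucene 3.0 cannot be used, which is a pity. Compass can build the index automatically (manual CRUD on the index is also possible), but wrapping Lucene directly to do what Compass does should give a better implementation. I look forward to someone taking that on.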
References:
用compass实现站内全文搜索引擎(一)
再谈compass:集成站内搜索
用compass快速给你的网站添加搜索功能
There was also a good article on ITEYE, but I accidentally closed the page...