Full-Text Indexing with Lucene, Solr, Nutch, and Hadoop: Solr

2013-10-10 20:40
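     The previous section gave a rough overview of Lucene, but Lucene itself is rarely used directly in projects. What most projects use is Solr. Solr wraps Lucene as a standalone service dedicated to indexing, tokenization, and search. A typical project is laid out the same way: it is split into many modules, search runs as a dedicated Solr service, and any module that needs search calls Solr through SolrJ and reads back the results.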
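To make the "standalone service" idea concrete, here is a minimal sketch of the kind of HTTP request a client module ends up sending to Solr's standard /select handler. The class name `SolrSelectUrl` and the base URL are assumptions for illustration only; in the real project SolrJ builds and sends the request for you.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of SolrJ: it only illustrates that a search
// is an HTTP GET against the Solr service with q/start/rows parameters.
class SolrSelectUrl {

    // q is the query, start is a 0-based offset, rows is the page size.
    static String build(String baseUrl, String q, int start, int rows) {
        try {
            return baseUrl + "/select?q=" + URLEncoder.encode(q, "UTF-8")
                    + "&start=" + start + "&rows=" + rows;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The base URL is an assumed local Tomcat deployment.
        System.out.println(build("http://localhost:8080/solr", "pkey:phone", 0, 10));
    }
}
```

Any module that can issue such a request can use the search service, which is why keeping Solr separate from the business modules works well.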

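    Solr also comes with near-real-time search and highlighting built in, which makes it very easy to pick up in a project. Below is a rough description of the search service I built with Solr at my previous company.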

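  Relationships: in the online mall, users can open shops, and each user can add products on their own shop page.

  Requirements: 1. Search for shops by keyword (matching against the comma-separated search keywords set by the shop owner and against the shop description).

             2. Search for products by keyword (matching against the comma-separated search keywords set by the shop owner and against the product description).

             3. Search products on several attributes at once, for example color red, operating system ANDROID, bar form factor, and price between 2000 and 3000.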
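Requirement 3 boils down to AND-ing several clauses into one Solr query. The sketch below assumes the encoding used later in this post: every attribute is indexed as a `name@value` token in the `queryStr` field and the price lives in `pprice`. The class `AttributeFilterQuery` and the ASCII attribute names are made up for the example (the real data uses Chinese names such as 颜色@红).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: joins attribute clauses and an optional price range with AND.
class AttributeFilterQuery {

    static String build(Map<String, String> attrs, Integer minPrice, Integer maxPrice) {
        List<String> clauses = new ArrayList<String>();
        for (Map.Entry<String, String> e : attrs.entrySet()) {
            // One clause per attribute, e.g. queryStr:color@red
            clauses.add("queryStr:" + e.getKey() + "@" + e.getValue());
        }
        if (minPrice != null && maxPrice != null) {
            // Inclusive range query on the price field
            clauses.add("pprice:[" + minPrice + " TO " + maxPrice + "]");
        }
        StringBuilder sb = new StringBuilder();
        for (String clause : clauses) {
            if (sb.length() > 0) {
                sb.append(" AND ");
            }
            sb.append(clause);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new LinkedHashMap<String, String>();
        attrs.put("color", "red");
        attrs.put("os", "ANDROID");
        attrs.put("shape", "bar");
        System.out.println(build(attrs, 2000, 3000));
    }
}
```

The resulting string has the same shape as the `queryStr` argument that `queryCategoryProduct` expects further down.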

 

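  Implementation:

   Solr provides the search and tokenization service, the SolrJ client library calls that service, and the mmseg4j Chinese dictionary takes care of segmenting Chinese text.

    Solr setup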



Figure 1: Solr installation

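   The deployment consists of solr_home, solr_web, and Tomcat.

      solr_home is the core of Solr and provides the core functions such as tokenization and search.

      solr_web exposes an admin and query page, that is, the external interface.

      Tomcat serves it to the outside.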

 

   solr_home setup



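Figure 2: solr_home layout

In this layout:

lib contains mmseg4j-all-1.8.5.jar, which provides Chinese word segmentation.

dic contains the Chinese dictionary.

data holds the index data that Solr produces after tokenizing.

conf holds Solr's configuration files; the most important one is schema.xml.

Additions to schema.xml:

Add inside types: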

<fieldType name="textComplex" class="solr.TextField" positionIncrementGap="100" >
<analyzer>
<tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="complex" dicPath="./dic"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>

<fieldType name="textMaxWord" class="solr.TextField" positionIncrementGap="100" >
<analyzer>
<tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="max-word" dicPath="./dic"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>

<fieldType name="textSimple" class="solr.TextField" positionIncrementGap="100" >
<analyzer>
<tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="simple" dicPath="./dic"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>

Add inside fields:

<field name="pkey" type="text_ws" indexed="true" stored="false" />
<field name="pshopid" type="int" indexed="true" stored="true" />
<field name="pname" type="textSimple" indexed="false" stored="true" />
<field name="purl" type="textComplex" indexed="false" stored="true"/>
<field name="pprice" type="float" indexed="true" stored="true"/>
<field name="queryStr" type="text_ws" indexed="true" stored="true"/>
<field name="cid" type="int" indexed="true" stored="true"/>

<field name="shopName" type="textSimple" indexed="true" stored="true"/>
<field name="shopDetail" type="text_ws" indexed="true" stored="true"/>
<field name="shopImage" type="textSimple" indexed="false" stored="true"/>
<field name="shopKey" type="text_ws" indexed="true" stored="false"/>

<field name="simplemmseg" type="textSimple" indexed="true" stored="true"/>
<field name="complexmmseg" type="textComplex" indexed="true" stored="true"/>
<field name="maxwordmmseg" type="textMaxWord" indexed="true" stored="true"/>


Below the fields, add:

<copyField source="shopName" dest="shopKey"/>
<copyField source="shopDetail" dest="shopKey"/>

<copyField source="cat" dest="text"/>
<copyField source="name" dest="text"/>
<copyField source="manu" dest="text"/>
<copyField source="features" dest="text"/>
<copyField source="includes" dest="text"/>
<copyField source="manu" dest="manu_exact"/>

<!-- Copy the price into a currency enabled field (default USD) -->
<copyField source="price" dest="price_c"/>


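With that, solr_home is configured and can already provide indexing and search, but solr_web is still needed to expose an interface before we can work with it.

solr_web

 solr_web is the web application that ships with Solr; deploying it under Tomcat is enough. The changes are as follows.

Modify Tomcat's server.xml as follows: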



Figure 3: Encoding



Figure 4: Path mapping

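solr_web is now configured. Start Tomcat at this point and you can open the Solr admin page.

 Calling module

 SolrJ is used to call Solr and fetch the results. The code is as follows: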

package net.b2c.a.solr;

import java.io.Serializable;

import org.apache.solr.client.solrj.beans.Field;

public class SolrProduct implements Serializable{
/*
* <field name="pkey" type="text_ws" indexed="true" stored="false"/>
<field name="pshopid" type="int" indexed="true" stored="true"/>
<field name="pname" type="textSimple" indexed="false" stored="true"/>
<field name="purl" type="textComplex" indexed="false" stored="true"/>
<field name="pprice" type="float" indexed="true" stored="true"/>
<field name="queryStr" type="text_ws" indexed="true" stored="true"/>
<field name="cid" type="int" indexed="true" stored="true"/>
*/

@Field
private String id;
@Field
private int pshopid;
@Field
private String pname;
@Field
private String purl;
@Field("pprice")
private float price;
@Field
private String pkey;
// category id
@Field
private int cid;
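// attribute values the product can be filtered on, stored as space-separated name@value tokens
@Field
private String queryStr; // e.g. 颜色@红  操作系统@ANDROID 外形@直板 (color@red OS@ANDROID shape@bar)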

public String getPkey() {
return pkey;
}
public void setPkey(String pkey) {
this.pkey = pkey;
}
public String getId() {
return id;
}
public void setId(String id) {
this.id = id;
}
public String getPname() {
return pname;
}
public void setPname(String pname) {
this.pname = pname;
}
public String getPurl() {
return purl;
}
public void setPurl(String purl) {
this.purl = purl;
}

public float getPrice() {
return price;
}
public void setPrice(float price) {
this.price = price;
}
public int getPshopid() {
return pshopid;
}
public void setPshopid(int pshopid) {
this.pshopid = pshopid;
}

public int getCid() {
return cid;
}
public void setCid(int cid) {
this.cid = cid;
}
public String getQueryStr() {
return queryStr;
}
public void setQueryStr(String queryStr) {
this.queryStr = queryStr;
}
@Override
public String toString() {
return "SolrProduct [id=" + id + ", pshopid=" + pshopid + ", pname="
+ pname + ", purl=" + purl + ", price=" + price + ", pkey="
+ pkey + ", cid=" + cid + ", queryStr=" + queryStr + "]";
}

}
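The `queryStr` field above is a plain string, so something has to turn a product's attributes into the `name@value name@value` form before indexing. A minimal sketch of that step follows; `QueryStrEncoder` is a hypothetical helper that is not in the original project, and it assumes the field is tokenized on whitespace (`text_ws`), which is why names and values must not contain spaces or `@`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: encodes attributes as space-separated name@value tokens.
class QueryStrEncoder {

    static String encode(Map<String, String> attrs) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : attrs.entrySet()) {
            String name = e.getKey().trim();
            String value = e.getValue().trim();
            if (!isToken(name) || !isToken(value)) {
                throw new IllegalArgumentException("bad attribute: " + e);
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(name).append('@').append(value);
        }
        return sb.toString();
    }

    // A usable token is non-empty and contains neither whitespace nor '@',
    // otherwise the whitespace tokenizer would split or blur the pair.
    private static boolean isToken(String s) {
        if (s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c) || c == '@') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new LinkedHashMap<String, String>();
        attrs.put("CPU", "0.8G");
        attrs.put("color", "red");
        System.out.println(encode(attrs)); // value to pass to SolrProduct.setQueryStr
    }
}
```

Because each pair becomes exactly one token, a query such as `queryStr:CPU@0.8G` matches only products with that exact attribute value.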


 

package net.b2c.a.solr;

import java.io.Serializable;

import org.apache.solr.client.solrj.beans.Field;

public class SolrShop implements Serializable{
/*
* <field name="shopName" type="textSimple" indexed="true" stored="true"/>
<field name="shopDetail" type="text_ws" indexed="true" stored="true"/>
<field name="shopImage" type="textSimple" indexed="false" stored="true"/>
<field name="shopKey" type="text_ws" indexed="true" stored="false"/>
*/

@Field
private String id;
@Field
private String shopName;
@Field
private String shopDetail;
@Field
private String shopImage;
@Field
private String shopKey;

public String getId() {
return id;
}
public void setId(String id) {
this.id = id;
}
public String getShopName() {
return shopName;
}
public void setShopName(String shopName) {
this.shopName = shopName;
}
public String getShopDetail() {
return shopDetail;
}
public void setShopDetail(String shopDetail) {
this.shopDetail = shopDetail;
}
public String getShopImage() {
return shopImage;
}
public void setShopImage(String shopImage) {
this.shopImage = shopImage;
}
public String getShopKey() {
return shopKey;
}
public void setShopKey(String shopKey) {
this.shopKey = shopKey;
}
@Override
public String toString() {
return "SolrShop [id=" + id + ", shopName=" + shopName
+ ", shopDetail=" + shopDetail + ", shopImage=" + shopImage
+ ", shopKey=" + shopKey + "]";
}

}


 

package net.b2c.a.solr;

import java.util.List;

public class SolrResult {
private List<SolrProduct> queryProducts;
private long totalNum;
private List<SolrShop> queryShops;
public List<SolrProduct> getQueryProducts() {
return queryProducts;
}
public void setQueryProducts(List<SolrProduct> queryProducts) {
this.queryProducts = queryProducts;
}
public long getTotalNum() {
return totalNum;
}
public void setTotalNum(long totalNum) {
this.totalNum = totalNum;
}
public List<SolrShop> getQueryShops() {
return queryShops;
}
public void setQueryShops(List<SolrShop> queryShops) {
this.queryShops = queryShops;
}
@Override
public String toString() {
return "SolrResult [queryProducts=" + queryProducts + ", totalNum="
+ totalNum + ", queryShops=" + queryShops + "]";
}

}
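`SolrResult` carries `totalNum`, and the query methods below take a 1-based start row plus a page size. If the calling page thinks in page numbers instead, the conversion might look like the following sketch. `Paging` is a hypothetical helper that is not in the original code.

```java
// Hypothetical helper: converts between page numbers and the 1-based
// start row used by SolrUtil, and derives the page count from totalNum.
class Paging {

    // First row of the given page, counting rows from 1.
    static int startRow(int page, int pageSize) {
        if (page < 1 || pageSize < 1) {
            throw new IllegalArgumentException("page and pageSize must be >= 1");
        }
        return (page - 1) * pageSize + 1;
    }

    // Number of pages needed to show totalNum hits (ceiling division).
    static long totalPages(long totalNum, int pageSize) {
        if (totalNum < 0 || pageSize < 1) {
            throw new IllegalArgumentException("totalNum must be >= 0 and pageSize >= 1");
        }
        return (totalNum + pageSize - 1) / pageSize;
    }

    public static void main(String[] args) {
        System.out.println(startRow(3, 10));     // start row of page 3
        System.out.println(totalPages(101, 10)); // pages for 101 hits
    }
}
```

A caller would then invoke something like `SolrUtil.queryProduct(keyword, Paging.startRow(page, 10), 10)` and use `result.getTotalNum()` to render the pager.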


 

package net.b2c.a.solr;

import java.io.FileInputStream;
import java.io.IOException;
import java.util.LinkedList;
import java.util.List;
import java.util.Properties;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

public class SolrUtil {
private  static String URL = "";
private final static String PKEY = "pkey";
private final static String PNAME = "pname";
private final static String SHOPKEY = "shopKey";
private final static String SHOPIDPRE = "shop";

private static SolrServer server = null;
static {
try {
String path = SolrUtil.class.getClassLoader().getResource("").toURI().getPath();
Properties property = new Properties();
property.load(new FileInputStream(path+"../resource.properties"));
URL = property.getProperty("solrUrl");
System.out.println("**************URL************************"+URL);
server = new CommonsHttpSolrServer(URL);
} catch (Exception e) {
e.printStackTrace();
}
}

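/**
* Add or update a product in the index.
* @param product the product to index
* @return true if the product was added and committed, false otherwise
*/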
public static boolean addOrUpdateProduct(SolrProduct product) {
if(product == null)return false;
try {
server.addBean(product);
server.commit();
return true;
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (SolrServerException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return false;
}

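/**
* Add or update a shop in the index.
* @param shop the shop to index; its id is prefixed with "shop" before it is sent to Solr
* @return true if the shop was added and committed, false otherwise
*/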
public static boolean addOrUpdateShop(SolrShop shop) {
if(shop == null)return false;
try {
shop.setId(SHOPIDPRE+shop.getId());
server.addBean(shop);
server.commit();
return true;
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (SolrServerException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return false;
}

/**
* Delete a product by product id.
* @param id product id
*/
public static void deleteProductById(long id)
{

try {
server.deleteById(id+"");
server.commit();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (SolrServerException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
/**
* Delete a shop by id.
* @param id shop id (without the "shop" prefix)
*/
public static void deleteShopById(long id)
{

try {
server.deleteById(SHOPIDPRE+id);
server.commit();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (SolrServerException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}

/**
* Site-wide product search.
* @param queryName query string
* @param start start row, counting from 1
* @param limit page size
* @return SolrResult: totalNum is the total number of hits, queryProducts holds the results for this page
*/
public static SolrResult queryProduct(String queryName,int start,int limit) {

SolrResult result = new SolrResult();
List<SolrProduct> products = new LinkedList<SolrProduct>();
result.setQueryProducts(products);
if(queryName==null ||queryName.trim().equals(""))return result;

try {
SolrQuery query = new SolrQuery(PKEY + ":" + queryName);
query.setHighlight(true)
.setHighlightSimplePre("<span class='solrhighligter'>")
.setHighlightSimplePost("</span>");
query.setParam("hl.fl", PKEY+","+PNAME);
// start is 1-based in this API; Solr's start parameter is a 0-based offset
query.setStart(start-1);
query.setRows(limit);
QueryResponse resp = server.query(query);
SolrDocumentList sdl = resp.getResults();

result.setTotalNum(sdl.getNumFound());
for (SolrDocument sd : sdl) {

String id = (String) sd.getFieldValue("id");

SolrProduct p = new SolrProduct();
p.setId(id);
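// the highlighting map holds a list of snippets per field; use the first snippet
List<String> tname=resp.getHighlighting().get(id).get(PNAME);
if(tname == null || tname.isEmpty())
{
p.setPname(sd.getFieldValue(PNAME).toString());
}else {
p.setPname(tname.get(0));
}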

p.setPrice(Float.valueOf(sd.getFieldValue("pprice").toString()));
p.setPurl(sd.getFieldValue("purl").toString());
products.add(p);
}
} catch (SolrServerException e) {
e.printStackTrace();
}

System.out.println(result);

return result;
}

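/**
* Attribute-value search within a product category.
* @param queryStr query string, e.g. "queryStr:CPU@0.8G AND queryStr:颜色@红" or "queryStr:CPU@0.8G"; no attribute filter if empty
* @param start start row, counting from 1
* @param limit page size
* @param cid product category id (required)
* @param priceRange price range, e.g. [100 TO 200]; no price filter if empty
* @return SolrResult: totalNum is the total number of hits, queryProducts holds the results for this page
*/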
public static SolrResult queryCategoryProduct(String queryStr,int start,int limit,Integer cid,String priceRange) {

SolrResult result = new SolrResult();
List<SolrProduct> products = new LinkedList<SolrProduct>();
result.setQueryProducts(products);

StringBuffer sb= new StringBuffer();
if(queryStr != null && !queryStr.trim().equals(""))
{
sb.append(queryStr+" ");
}

if(priceRange != null && !priceRange.trim().equals(""))
{
if(!sb.toString().trim().equals(""))
{
sb.append("AND ");
}
sb.append("pprice:"+priceRange+" ");
}

if(null != cid)
{
if(!sb.toString().trim().equals(""))
{
sb.append("AND ");
}
sb.append("cid:"+cid);
}

String q=sb.toString().trim();

System.out.println(q);
if("".equals(q.trim()))return result;

try {
SolrQuery query = new SolrQuery(q);//PKEY + ":" + queryStr
query.setHighlight(true)
.setHighlightSimplePre("<span class='solrhighligter'>")
.setHighlightSimplePost("</span>");
query.setParam("hl.fl", PKEY+","+PNAME);
// start is 1-based in this API; Solr's start parameter is a 0-based offset
query.setStart(start-1);
query.setRows(limit);
QueryResponse resp = server.query(query);
SolrDocumentList sdl = resp.getResults();

result.setTotalNum(sdl.getNumFound());
for (SolrDocument sd : sdl) {

String id = (String) sd.getFieldValue("id");

SolrProduct p = new SolrProduct();
p.setId(id);
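// the highlighting map holds a list of snippets per field; use the first snippet
List<String> tname=resp.getHighlighting().get(id).get(PNAME);
if(tname == null || tname.isEmpty())
{
p.setPname(sd.getFieldValue(PNAME).toString());
}else {
p.setPname(tname.get(0));
}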
p.setPrice(Float.valueOf(sd.getFieldValue("pprice").toString()));
p.setPurl(sd.getFieldValue("purl").toString());
products.add(p);
}
} catch (SolrServerException e) {
e.printStackTrace();
}

System.out.println(result);

return result;
}

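/**
* Product search within one shop.
* @param queryName query string
* @param start start row, counting from 1
* @param limit page size
* @param shopid shop id
* @return SolrResult: totalNum is the total number of hits, queryProducts holds the results for this page
*/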
public static SolrResult queryProduct(String queryName,int start,int limit,int shopid) {

SolrResult result = new SolrResult();
List<SolrProduct> products = new LinkedList<SolrProduct>();
result.setQueryProducts(products);
if(queryName==null ||queryName.trim().equals(""))return result;

try {
SolrQuery query = new SolrQuery(PKEY + ":" + queryName);
query.setHighlight(true)
.setHighlightSimplePre("<span class='solrhighligter'>")
.setHighlightSimplePost("</span>");
query.setParam("hl.fl", PKEY+","+PNAME);
// start is 1-based in this API; Solr's start parameter is a 0-based offset
query.setStart(start-1);
query.setRows(limit);
query.addFilterQuery("pshopid:"+shopid);
QueryResponse resp = server.query(query);
SolrDocumentList sdl = resp.getResults();

result.setTotalNum(sdl.getNumFound());
for (SolrDocument sd : sdl) {

String id = (String) sd.getFieldValue("id");

SolrProduct p = new SolrProduct();
p.setId(id);
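// the highlighting map holds a list of snippets per field; use the first snippet
List<String> tname=resp.getHighlighting().get(id).get(PNAME);
if(tname == null || tname.isEmpty())
{
p.setPname(sd.getFieldValue(PNAME).toString());
}else {
p.setPname(tname.get(0));
}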

p.setPrice(Float.valueOf(sd.getFieldValue("pprice").toString()));
p.setPurl(sd.getFieldValue("purl").toString());
p.setPshopid(shopid);
products.add(p);
}
} catch (SolrServerException e) {
e.printStackTrace();
}

System.out.println(result);

return result;
}

/**
* Site-wide shop search.
* @param queryName query string
* @param start start row, counting from 1
* @param limit page size
* @return SolrResult: totalNum is the total number of hits, queryShops holds the results for this page
*/
public static SolrResult queryShop(String queryName,int start,int limit) {

SolrResult result = new SolrResult();
List<SolrShop> shops = new LinkedList<SolrShop>();
result.setQueryShops(shops);
if(queryName==null ||queryName.trim().equals(""))return result;

try {
SolrQuery query = new SolrQuery(SHOPKEY + ":" + queryName);
query.setHighlight(true)
.setHighlightSimplePre("<span class='solrhighligter'>")
.setHighlightSimplePost("</span>");
query.setParam("hl.fl", "shopName,shopDetail");
// start is 1-based in this API; Solr's start parameter is a 0-based offset
query.setStart(start-1);
query.setRows(limit);
QueryResponse resp = server.query(query);
SolrDocumentList sdl = resp.getResults();

result.setTotalNum(sdl.getNumFound());
for (SolrDocument sd : sdl) {

String id = (String) sd.getFieldValue("id");

SolrShop s= new SolrShop();
// strip the "shop" prefix added by addOrUpdateShop; keep the id as-is if the prefix is missing
s.setId(id.startsWith(SHOPIDPRE) ? id.substring(SHOPIDPRE.length()) : id);
// the highlighting map holds a list of snippets per field; use the first snippet
List<String> tname=resp.getHighlighting().get(id).get("shopName");
if(tname == null || tname.isEmpty())
{
s.setShopName(sd.getFieldValue("shopName").toString());
}else {
s.setShopName(tname.get(0));
}

tname=resp.getHighlighting().get(id).get("shopDetail");
if(tname == null || tname.isEmpty())
{
s.setShopDetail(sd.getFieldValue("shopDetail").toString());
}else {
s.setShopDetail(tname.get(0));
}

s.setShopImage(sd.getFieldValue("shopImage").toString());

shops.add(s);
}
} catch (SolrServerException e) {
e.printStackTrace();
}

System.out.println(result);

return result;
}

public static void testAdd() {

SolrProduct product = new SolrProduct();
String[] qq={"CPU@0.8G 颜色@红  外形@直板","CPU@0.8G 颜色@红","颜色@红"};

for(int i=1;i<=3;i++)
{
product.setId(i+"");
product.setPname("myname 垃圾 你就是个KK");
product.setPrice(i);
product.setPurl("../im/kk.jpg");
String key="aa bb 垃圾 AA Ab myname     wei";
product.setPkey(key);
product.setPshopid(i%3);
product.setCid(i%2);
product.setQueryStr(qq[i-1]);
System.out.println(product);
addOrUpdateProduct(product);
}

}

public static void testAddShop() {

for(int i=3;i<=3;i++)
{
SolrShop shop= new SolrShop();
shop.setId(""+"1");
shop.setShopDetail("欢迎来到星空的专栏, zwls  aa");
shop.setShopImage("../image.gif");
shop.setShopName("星空专栏");
shop.setShopKey("aa 星空  zwls");
addOrUpdateShop(shop);
}

}

public static void main(String[] args) {
String string="CPU@0.8G 颜色@红";
String dd="queryStr:CPU@0.8G AND queryStr:颜色@红";

queryCategoryProduct(dd,1,100,1,"[0 TO 100]");//[10 TO 100]
//					 testAdd();

//	testAddShop();
//	deleteProductById(7);
//		testAdd();
//	queryShop("zwls",1,2);
//	String d=SHOPIDPRE+"100";
//		System.out.println(d.substring(4));

}

}
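One thing the listing above does not do is escape user input: `PKEY + ":" + queryName` sends whatever the user typed straight into the query parser, so a keyword containing `:` or `(` can cause a syntax error, and with several words only the first one is applied to `pkey`. The sketch below shows one way to guard against that. `QueryEscaper` is a hypothetical stand-alone helper; its character list is modelled on SolrJ's `ClientUtils.escapeQueryChars`, which is what you would normally call, so check the list against your SolrJ version.

```java
// Hypothetical helper: escapes query-parser metacharacters and groups
// multi-word input under one field, e.g. pkey:(red phone).
class QueryEscaper {

    // Characters with a special meaning in the Lucene/Solr query syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&;/";

    static String escapeTerm(String term) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Returns "" for blank input so the caller can skip the search.
    static String fieldQuery(String field, String userInput) {
        StringBuilder sb = new StringBuilder();
        for (String term : userInput.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue;
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(escapeTerm(term));
        }
        return sb.length() == 0 ? "" : field + ":(" + sb + ")";
    }

    public static void main(String[] args) {
        System.out.println(fieldQuery("pkey", "red phone:x"));
    }
}
```

With this in place, `new SolrQuery(QueryEscaper.fieldQuery(PKEY, queryName))` would replace the plain concatenation. Note that an upper-case `AND` or `OR` typed by the user would still be read as an operator; lower-casing the input is one possible way around that.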

SolrUtil is a thin wrapper around the SolrJ calls; the code speaks for itself.
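One detail in SolrUtil is worth spelling out: products and shops live in the same index, so shop documents get the `shop` prefix (`SHOPIDPRE`) on their id to avoid colliding with product ids. `addOrUpdateShop` applies the prefix by changing the id on the object it receives, so indexing the same `SolrShop` instance twice would produce `shopshop1`. A small sketch of keeping that mapping in one place, assuming numeric shop ids; `ShopDocId` is a hypothetical helper, not part of the original code.

```java
// Hypothetical helper: maps between numeric shop ids and index document ids.
class ShopDocId {

    static final String PREFIX = "shop";

    // Document id under which a shop is stored in the shared index.
    static String toDocId(long shopId) {
        return PREFIX + shopId;
    }

    // True if the document id belongs to a shop rather than a product.
    static boolean isShopDoc(String docId) {
        return docId.startsWith(PREFIX);
    }

    // Recovers the numeric shop id from a shop document id.
    static long toShopId(String docId) {
        if (!isShopDoc(docId)) {
            throw new IllegalArgumentException("not a shop document id: " + docId);
        }
        return Long.parseLong(docId.substring(PREFIX.length()));
    }

    public static void main(String[] args) {
        String docId = toDocId(100);
        System.out.println(docId + " -> " + toShopId(docId));
    }
}
```

Building the document id from the numeric id each time, instead of rewriting the bean's own id, keeps repeated updates of the same shop idempotent.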

With this in place, the three requirements above are all covered.