Continue My Code, Share My Joy - WEBUS2.0 Resource Roundup
2013-09-07 18:18
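WEBUS2.0 can add only one data type to an index: Document (the Webus.Index.Document class). All other kinds of data (txt, html, word, pdf, and so on) must first be converted into a Document before they can be indexed: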
![](http://images.cnblogs.com/cnblogs_com/iamzyf/Document_0.JPG)
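As a result, supporting a new data type only requires developing a new Parser for it, and this is how WEBUS achieves a high degree of generality.
A Document is a collection of Fields, and each Field mainly carries two properties, Name and Value: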
![](http://images.cnblogs.com/cnblogs_com/iamzyf/Field_0.JPG)
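The Parser idea and the Document/Field structure described above can be sketched roughly as follows. This is only an illustration, not the WEBUS API: `DemoField`, `DemoDocument`, `IDemoParser`, and `PlainTextParser` are hypothetical stand-ins invented for this example, and the rule "first line is the title" is an assumption made to keep the parser short.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a field: a Name/Value pair. NOT Webus.Index.Field.
public sealed class DemoField
{
    public string Name { get; }
    public string Value { get; }

    public DemoField(string name, string value)
    {
        Name = name;
        Value = value;
    }
}

// Hypothetical stand-in for a document: a collection of fields.
public sealed class DemoDocument
{
    public List<DemoField> Fields { get; } = new List<DemoField>();
}

// Hypothetical parser contract: one implementation per source data type.
public interface IDemoParser
{
    DemoDocument Parse(string rawContent);
}

// Assumed rule: the first line of the text is the title, the rest is the body.
public sealed class PlainTextParser : IDemoParser
{
    public DemoDocument Parse(string rawContent)
    {
        string text = rawContent ?? string.Empty;
        int newline = text.IndexOf('\n');
        string title = newline < 0 ? text : text.Substring(0, newline);
        string body = newline < 0 ? string.Empty : text.Substring(newline + 1);

        var doc = new DemoDocument();
        doc.Fields.Add(new DemoField("Title", title.Trim()));
        doc.Fields.Add(new DemoField("Body", body.Trim()));
        return doc;
    }
}

public static class ParserDemo
{
    public static void Main()
    {
        IDemoParser parser = new PlainTextParser();
        DemoDocument doc = parser.Parse("Tao Te Ching\ntaoism");

        // Print every field of the converted document.
        foreach (DemoField field in doc.Fields)
        {
            Console.WriteLine(field.Name + ": " + field.Value);
        }
    }
}
```

Under this design, the indexing side only ever sees documents; adding support for html or pdf would mean writing another class that implements the same parser contract.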
Suppose we want to add the data in the following table to the index:
![](http://images.cnblogs.com/cnblogs_com/iamzyf/Document_1.JPG)
The code is as follows:
1. Prepare the data

    string[] Titles = new string[]
    {
        "A Modern Art of Education - Rudolf Steiner",
        "Imperial Secrets of Health and Longevity - Bob Flaws",
        "Tao Te Ching 道德经 - Stephen Mitchell",
        "Godel, Escher, Bach: an Eternal Golden Braid - Douglas Hofstadter"
    };
    string[] Categories = new string[]
    {
        "/education/pedagogy",
        "/health/alternative/Chinese",
        "/philosophy/eastern",
        "/technology/computers/ai"
    };
    string[] Subjects = new string[]
    {
        "education philosophy psychology practice Waldorf",
        "diet chinese medicine qi gong health herbs",
        "taoism",
        "artificial intelligence number theory mathematics music"
    };
2. Add the documents to the index

    IIndexWriter writer = new IndexManager(new SimpleWordAnalyzer()); // construct an index writer that uses SimpleWordAnalyzer
    writer.New(@"F:Index"); // create a new index in the F:Index directory
    for (int i = 0; i < Titles.Length; i++)
    {
        Document doc = new Document();
        doc.Fields.Add(new Field("Title", Titles[i], FieldAttributes.Index | FieldAttributes.Analyse));
        doc.Fields.Add(new Field("Category", Categories[i], FieldAttributes.Index | FieldAttributes.Sort));
        doc.Fields.Add(new Field("Subject", Subjects[i], FieldAttributes.Analyse | FieldAttributes.Index));
        writer.Add(doc); // add the Document to the index
    }
    writer.Close(); // save and close the index
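The loop above reads three parallel arrays with a single index, so it relies on all of them having the same length. A small guard such as the one below could be run before the loop; it is a sketch that uses only the .NET base library, and `ParallelArrayCheck` and `RequireSameLength` are hypothetical names that are not part of WEBUS.

```csharp
using System;

public static class ParallelArrayCheck
{
    // Returns the shared length of all columns, or throws if they differ.
    public static int RequireSameLength(params string[][] columns)
    {
        if (columns == null || columns.Length == 0)
        {
            throw new ArgumentException("At least one column is required.");
        }

        foreach (string[] column in columns)
        {
            if (column == null)
            {
                throw new ArgumentException("Columns must not be null.");
            }
        }

        int length = columns[0].Length;
        for (int i = 1; i < columns.Length; i++)
        {
            if (columns[i].Length != length)
            {
                throw new ArgumentException(
                    "Column " + i + " has " + columns[i].Length +
                    " items, expected " + length + ".");
            }
        }

        return length;
    }

    public static void Main()
    {
        string[] titles = { "Tao Te Ching", "A Modern Art of Education" };
        string[] categories = { "/philosophy/eastern", "/education/pedagogy" };
        string[] subjects = { "taoism", "education philosophy" };

        // All three columns have two items, so this returns 2.
        int count = RequireSameLength(titles, categories, subjects);
        Console.WriteLine(count);
    }
}
```

The returned count can then serve as the loop bound, which makes the dependency between the arrays explicit.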
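Supplement: About FieldAttributes
A Field has one more property, Attribute (of type FieldAttributes). It is independent of the data itself, but it directly affects how the field is indexed:
- FieldAttributes.Index: the field is indexed.
- FieldAttributes.Analyse: the field value is run through the analyzer.
- FieldAttributes.UnStore: the field value (Field.Value) is not stored in the index.
- FieldAttributes.Sort: the field is sorted when the index is built.
- FieldAttributes.Compress: the field value is compressed with the GZip algorithm.
These five attributes can be combined. FieldAttributes.Default, for example, is a combined attribute equal to FieldAttributes.Index | FieldAttributes.Sort.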
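The way such attributes combine with the `|` operator can be illustrated with an ordinary C# flags enum. The enum below is a hypothetical stand-in: its member names mirror the list above, but the numeric values are assumptions, and nothing here claims to match how WEBUS actually defines FieldAttributes.

```csharp
using System;

// Hypothetical stand-in enum; the numeric values are assumed for illustration.
[Flags]
public enum DemoFieldAttributes
{
    None = 0,
    Index = 1,
    Analyse = 2,
    UnStore = 4,
    Sort = 8,
    Compress = 16,
    Default = Index | Sort
}

public static class FlagsDemo
{
    // True when every bit of 'flag' is set in 'value'.
    public static bool Has(DemoFieldAttributes value, DemoFieldAttributes flag)
    {
        return (value & flag) == flag;
    }

    public static void Main()
    {
        DemoFieldAttributes title = DemoFieldAttributes.Index | DemoFieldAttributes.Analyse;
        DemoFieldAttributes category = DemoFieldAttributes.Default;

        Console.WriteLine(Has(title, DemoFieldAttributes.Analyse));    // True
        Console.WriteLine(Has(category, DemoFieldAttributes.Sort));    // True
        Console.WriteLine(Has(category, DemoFieldAttributes.Analyse)); // False
        Console.WriteLine(category == (DemoFieldAttributes.Index | DemoFieldAttributes.Sort)); // True
    }
}
```

Because each basic member occupies its own bit, any subset of the five can be expressed as a single value and tested independently.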
Next: WEBUS2.0 In Action - Start Searching
Related information and WEBUS2.0 SDK download: Continue My Code, Share My Joy - WEBUS2.0