
How Lucene.Net retrieves a Document when the index is stored with FSDirectory

2012-06-05 17:51
The best way to keep from forgetting something is to write it down.

Here is the simplest possible search code:

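using System.IO;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;

public void Search()
{
    var dir = FSDirectory.Open(new DirectoryInfo("xxx"));  // "xxx" = the index folder
    var searcher = new IndexSearcher(dir, true);           // true = open read-only
    var query = new TermQuery(new Term("Title", "jinzhao"));
    TopDocs tops = searcher.Search(query, 100);            // at most 100 hits
    // TopDocs itself is not enumerable; iterate its hits instead.
    // (Lucene.Net 2.9.x public fields; 3.0.3 renamed them to the ScoreDocs / Doc properties.)
    foreach (ScoreDoc hit in tops.scoreDocs)
    {
        var doc = searcher.Doc(hit.doc);                   // load the stored document by number
        Output(doc);                                       // the author's own display helper
    }
}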


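The single line that calls `searcher.Doc(...)` returns a complete Document. That document actually comes from the IndexReader (Lucene.Net.Index.IndexReader) inside the searcher, through this method: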

public abstract Document Document(int n, FieldSelector fieldSelector);
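The `fieldSelector` parameter is how the caller decides, field by field, what gets loaded. The self-contained sketch below models that contract without depending on Lucene.Net; `FieldLoadDecision`, `IFieldSelector` and `MapFieldSelectorSketch` are hypothetical stand-ins for Lucene.Net's `FieldSelectorResult` and `FieldSelector`, and they cover only a subset of the real result values.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical mirror of a subset of Lucene.Net's FieldSelectorResult values.
public enum FieldLoadDecision
{
    Load,         // read the field value now
    LazyLoad,     // remember where the value is, read it on first access
    NoLoad,       // skip the field entirely
    LoadAndBreak  // read this field, then stop reading the document
}

// Hypothetical mirror of the FieldSelector contract: one decision per stored field name.
public interface IFieldSelector
{
    FieldLoadDecision Accept(string fieldName);
}

// Loads the "eager" fields now, the "lazy" fields on demand, and skips everything else.
public sealed class MapFieldSelectorSketch : IFieldSelector
{
    private readonly Dictionary<string, FieldLoadDecision> _decisions =
        new Dictionary<string, FieldLoadDecision>();

    public MapFieldSelectorSketch(IEnumerable<string> eager, IEnumerable<string> lazy)
    {
        foreach (var name in eager) _decisions[name] = FieldLoadDecision.Load;
        foreach (var name in lazy) _decisions[name] = FieldLoadDecision.LazyLoad;
    }

    public FieldLoadDecision Accept(string fieldName) =>
        _decisions.TryGetValue(fieldName, out var decision) ? decision : FieldLoadDecision.NoLoad;
}
```

Passing no selector at all means every stored field is loaded; the `fieldSelector == null ? FieldSelectorResult.LOAD : ...` check in `FieldsReader.Doc` further down implements that default.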


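IndexReader is abstract and has several implementations, among them MultiReader, ParallelReader, DirectoryReader and SegmentReader.

They relate to one another as follows: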

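MultiReader and ParallelReader each hold a collection of IndexReaders (in practice readers of the kinds described below, such as DirectoryReader, rather than SegmentReader) and encapsulate access to several readers at once. MultiReader relies on the offset technique found all over Lucene: each sub-reader is assigned the first global document number it owns, and a global number is translated into a sub-reader plus a local number by subtracting that start. ParallelReader works differently: its sub-readers share the same document numbering but typically hold different fields, so document n is assembled from document n of every sub-reader.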
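ParallelReader needs no offsets at all, because every sub-index numbers its documents identically. The self-contained sketch below models only that idea; `FieldIndex` and `ParallelReaderSketch` are hypothetical names, not Lucene.Net types, and the real class does considerably more (field selectors, deletions, norms). Keeping the first value when two sub-indexes store the same field name is this sketch's own arbitrary choice.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sub-index: stored fields per document number.
public sealed class FieldIndex
{
    private readonly List<Dictionary<string, string>> _docs =
        new List<Dictionary<string, string>>();

    public int MaxDoc => _docs.Count;
    public void Add(Dictionary<string, string> fields) => _docs.Add(fields);
    public IReadOnlyDictionary<string, string> Document(int n) => _docs[n];
}

// Parallel composition: all sub-indexes use the SAME document numbers,
// so document n is the union of document n's fields across the sub-indexes.
public sealed class ParallelReaderSketch
{
    private readonly FieldIndex[] _indexes;

    public ParallelReaderSketch(params FieldIndex[] indexes)
    {
        if (indexes.Length == 0)
            throw new ArgumentException("at least one index is required");
        foreach (var ix in indexes)
            if (ix.MaxDoc != indexes[0].MaxDoc)
                throw new ArgumentException("parallel indexes must have the same document count");
        _indexes = indexes;
    }

    public Dictionary<string, string> Document(int n)
    {
        var merged = new Dictionary<string, string>();
        foreach (var ix in _indexes)
            foreach (var field in ix.Document(n))   // same n in every sub-index, no offset
                if (!merged.ContainsKey(field.Key)) // sketch's choice: first index wins
                    merged[field.Key] = field.Value;
        return merged;
    }
}
```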

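DirectoryReader (and the similar readers other than SegmentReader) models a whole directory, i.e. an index folder on disk: it maintains a set of SegmentReaders, one per segment, and maps document numbers onto them with the same offset technique.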
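Both MultiReader and DirectoryReader keep a `starts` array holding the first global document number of each sub-reader and binary-search it to route a request. The self-contained sketch below illustrates that offset technique; it is not Lucene.Net source, and `ISubReader`, `ArrayReader` and `OffsetCompositeReader` are hypothetical names.

```csharp
using System;

// Hypothetical minimal reader: a document count and lookup by local number.
public interface ISubReader
{
    int MaxDoc { get; }
    string Document(int localDoc);
}

public sealed class ArrayReader : ISubReader
{
    private readonly string[] _docs;
    public ArrayReader(params string[] docs) { _docs = docs; }
    public int MaxDoc => _docs.Length;
    public string Document(int localDoc) => _docs[localDoc];
}

// Sub-reader i owns global document numbers [starts[i], starts[i + 1]).
public sealed class OffsetCompositeReader
{
    private readonly ISubReader[] _subs;
    private readonly int[] _starts; // cumulative counts, length = subs.Length + 1

    public OffsetCompositeReader(params ISubReader[] subs)
    {
        _subs = subs;
        _starts = new int[subs.Length + 1];
        for (int i = 0; i < subs.Length; i++)
            _starts[i + 1] = _starts[i] + subs[i].MaxDoc;
    }

    public int MaxDoc => _starts[_subs.Length];

    // Binary search for the LAST sub-reader whose start is <= globalDoc,
    // which also steps over empty sub-readers that share the same start.
    public int SubIndex(int globalDoc)
    {
        if (globalDoc < 0 || globalDoc >= MaxDoc)
            throw new ArgumentOutOfRangeException(nameof(globalDoc));
        int lo = 0, hi = _subs.Length - 1;
        while (lo < hi)
        {
            int mid = (lo + hi + 1) >> 1;
            if (_starts[mid] <= globalDoc) lo = mid; else hi = mid - 1;
        }
        return lo;
    }

    public string Document(int globalDoc)
    {
        int i = SubIndex(globalDoc);
        return _subs[i].Document(globalDoc - _starts[i]); // subtract the offset
    }
}
```

With sub-readers of 3, 4 and 2 documents, `starts` is `{0, 3, 7, 9}`, so global document 5 lands in the second sub-reader as local document 2.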

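SegmentReader is the smallest unit that reads documents and does not maintain any child IndexReaders. When it receives a document number, it reads that document's fields through `FieldsReader` (declared as `public sealed class FieldsReader`); the document is Lucene's core unit, and a document consists of a number of fields. Fields can be loaded in several ways, including eager loading of all fields, eager loading of selected fields only, and lazy loading. The method is: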

public /*internal*/ Document Doc(int n, FieldSelector fieldSelector)
{
    SeekIndex(n);                              // position indexStream at document n's entry
    long position = indexStream.ReadLong();    // pointer into the fields stream
    fieldsStream.Seek(position);

    Document doc = new Document();
    int numFields = fieldsStream.ReadVInt();
    for (int i = 0; i < numFields; i++)
    {
        int fieldNumber = fieldsStream.ReadVInt();
        FieldInfo fi = fieldInfos.FieldInfo(fieldNumber);
        // no selector means "load every field"
        FieldSelectorResult acceptField = fieldSelector == null ? FieldSelectorResult.LOAD : fieldSelector.Accept(fi.name);

        // per-field flag bits: compressed / tokenized / binary
        byte bits = fieldsStream.ReadByte();
        System.Diagnostics.Debug.Assert(bits <= FieldsWriter.FIELD_IS_COMPRESSED + FieldsWriter.FIELD_IS_TOKENIZED + FieldsWriter.FIELD_IS_BINARY);

        bool compressed = (bits & FieldsWriter.FIELD_IS_COMPRESSED) != 0;
        bool tokenize = (bits & FieldsWriter.FIELD_IS_TOKENIZED) != 0;
        bool binary = (bits & FieldsWriter.FIELD_IS_BINARY) != 0;
        //TODO: Find an alternative approach here if this list continues to grow beyond the
        //list of 5 or 6 currently here.  See Lucene 762 for discussion
        if (acceptField.Equals(FieldSelectorResult.LOAD))
        {
            AddField(doc, fi, binary, compressed, tokenize);
        }
        else if (acceptField.Equals(FieldSelectorResult.LOAD_FOR_MERGE))
        {
            AddFieldForMerge(doc, fi, binary, compressed, tokenize);
        }
        else if (acceptField.Equals(FieldSelectorResult.LOAD_AND_BREAK))
        {
            AddField(doc, fi, binary, compressed, tokenize);
            break; //Get out of this loop
        }
        else if (acceptField.Equals(FieldSelectorResult.LAZY_LOAD))
        {
            AddFieldLazy(doc, fi, binary, compressed, tokenize);
        }
        else if (acceptField.Equals(FieldSelectorResult.SIZE))
        {
            SkipField(binary, compressed, AddFieldSize(doc, fi, binary, compressed));
        }
        else if (acceptField.Equals(FieldSelectorResult.SIZE_AND_BREAK))
        {
            AddFieldSize(doc, fi, binary, compressed);
            break;
        }
        else
        {
            // NO_LOAD and anything else: jump over the value
            SkipField(binary, compressed);
        }
    }

    return doc;
}
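The first three statements of `Doc` are the heart of the lookup: the index stream (the `.fdx` file) holds one fixed-width 8-byte pointer per document, so document `n`'s pointer is found by arithmetic, and that pointer is the position of the document's record in the fields stream (the `.fdt` file). The sketch below reproduces that two-step lookup over in-memory streams. Its byte layout is a simplified assumption, not the exact Lucene file format (the real `SeekIndex` also accounts for a header at the start of `.fdx`), and `StoredFieldsSketch` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Simplified, ASSUMED layout (not the exact Lucene file format):
//   index stream  (.fdx-like): one 8-byte big-endian pointer per document
//   fields stream (.fdt-like): VInt numFields, then per field:
//       VInt fieldNumber, 1 flags byte, VInt byteLength, UTF-8 bytes
public static class StoredFieldsSketch
{
    // Assumed to match FieldsWriter.FIELD_IS_TOKENIZED in Lucene 2.x.
    public const byte FieldIsTokenized = 0x1;

    public static void Write(Stream index, Stream fields, params (int Field, string Value)[][] docs)
    {
        foreach (var doc in docs)
        {
            WriteLong(index, fields.Position);      // pointer to this document's record
            WriteVInt(fields, (uint)doc.Length);
            foreach (var (field, value) in doc)
            {
                byte[] bytes = Encoding.UTF8.GetBytes(value);
                WriteVInt(fields, (uint)field);
                fields.WriteByte(FieldIsTokenized); // flags byte, fixed for brevity
                WriteVInt(fields, (uint)bytes.Length);
                fields.Write(bytes, 0, bytes.Length);
            }
        }
    }

    // Same shape as FieldsReader.Doc: seek the index, read the pointer, seek the
    // fields stream, then load or skip each field. accept == null loads everything.
    public static Dictionary<string, string> Doc(int n, Stream index, Stream fields,
                                                 string[] fieldNames, Func<string, bool> accept)
    {
        index.Seek(n * 8L, SeekOrigin.Begin);       // fixed-width entries: doc n at n * 8
        fields.Seek(ReadLong(index), SeekOrigin.Begin);

        var doc = new Dictionary<string, string>();
        int numFields = ReadVInt(fields);
        for (int i = 0; i < numFields; i++)
        {
            string name = fieldNames[ReadVInt(fields)];
            ReadByteStrict(fields);                 // flags byte; not needed here
            int length = ReadVInt(fields);
            if (accept == null || accept(name))
            {
                var bytes = new byte[length];
                for (int read = 0; read < length; )
                {
                    int r = fields.Read(bytes, read, length - read);
                    if (r <= 0) throw new EndOfStreamException();
                    read += r;
                }
                doc[name] = Encoding.UTF8.GetString(bytes);
            }
            else
            {
                fields.Seek(length, SeekOrigin.Current); // like SkipField: jump over the value
            }
        }
        return doc;
    }

    static void WriteLong(Stream s, long v)
    {
        for (int shift = 56; shift >= 0; shift -= 8) s.WriteByte((byte)(v >> shift));
    }

    static long ReadLong(Stream s)
    {
        long v = 0;
        for (int i = 0; i < 8; i++) v = (v << 8) | ReadByteStrict(s);
        return v;
    }

    // VInt: 7 bits per byte, low-order group first, high bit set = more bytes follow.
    static void WriteVInt(Stream s, uint v)
    {
        while (v >= 0x80) { s.WriteByte((byte)(v | 0x80)); v >>= 7; }
        s.WriteByte((byte)v);
    }

    static int ReadVInt(Stream s)
    {
        byte b = ReadByteStrict(s);
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7)
        {
            b = ReadByteStrict(s);
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    static byte ReadByteStrict(Stream s)
    {
        int b = s.ReadByte();
        if (b < 0) throw new EndOfStreamException();
        return (byte)b;
    }
}
```

Skipping an unwanted field only needs its stored length, which is why the `else` branch of the real `Doc` can simply call `SkipField` instead of decoding the value.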


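The `fieldsStream` used above (like `indexStream`) is an IndexInput, and IndexInput is what performs the actual reads. Its implementations are usually public nested classes inside the corresponding storage (Directory) class; in this example it is `SimpleFSIndexInput`, nested in `SimpleFSDirectory`: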

public /*protected internal*/class SimpleFSIndexInput : BufferedIndexInput, System.ICloneable
{
    protected internal class Descriptor : System.IO.BinaryReader
    {
        // remember if the file is open, so that we don't try to close it
        // more than once
        protected internal volatile bool isOpen;
        internal long position;
        internal long length;

        public Descriptor(/*FSIndexInput enclosingInstance,*/ System.IO.FileInfo file, System.IO.FileAccess mode)
            : base(new System.IO.FileStream(file.FullName, System.IO.FileMode.Open, mode, System.IO.FileShare.ReadWrite))
        {
            isOpen = true;
            length = file.Length;
        }

        public override void Close()
        {
            if (isOpen)
            {
                isOpen = false;
                base.Close();
            }
        }

        // finalizer: release the underlying FileStream if Close() was never called
        ~Descriptor()
        {
            try
            {
                Close();
            }
            finally
            {
            }
        }
    }

    // ... remaining members of SimpleFSIndexInput (ReadInternal, SeekInternal, ...) omitted
}


As you can see, the field data is ultimately read from the file by System.IO.BinaryReader, which wraps a FileStream.
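BinaryReader and its FileStream are where the bytes come from, but the typed reads that `FieldsReader` uses (`ReadVInt`, `ReadLong`, `ReadByte`) live in the layers above: IndexInput builds the multi-byte reads on top of `ReadByte`, and BufferedIndexInput serves `ReadByte` from an in-memory buffer that it refills through an abstract `ReadInternal`, which SimpleFSIndexInput implements against the file. The self-contained sketch below models that layering; `BufferedInputSketch` and `StreamInputSketch` are hypothetical names, and passing the position directly into `ReadInternal` is a simplification of the real seek-then-read protocol.

```csharp
using System;
using System.IO;

// Typed reads are built on ReadByte; ReadByte is served from a buffer;
// only ReadInternal touches the underlying storage.
public abstract class BufferedInputSketch
{
    private readonly byte[] _buffer;
    private long _bufferStart;  // file position of _buffer[0]
    private int _bufferLength;  // number of valid bytes in _buffer
    private int _bufferPos;     // index of the next byte to return

    protected BufferedInputSketch(int bufferSize) { _buffer = new byte[bufferSize]; }

    public abstract long Length { get; }
    // Fill dest[offset .. offset + count) from file position 'position'.
    protected abstract void ReadInternal(long position, byte[] dest, int offset, int count);

    public long FilePointer => _bufferStart + _bufferPos;

    public void Seek(long pos)
    {
        if (pos >= _bufferStart && pos < _bufferStart + _bufferLength)
            _bufferPos = (int)(pos - _bufferStart);                        // still inside the buffer
        else { _bufferStart = pos; _bufferPos = 0; _bufferLength = 0; }   // force a refill
    }

    public byte ReadByte()
    {
        if (_bufferPos >= _bufferLength) Refill();
        return _buffer[_bufferPos++];
    }

    private void Refill()
    {
        long start = _bufferStart + _bufferPos;
        int count = (int)Math.Min(_buffer.Length, Length - start);
        if (count <= 0) throw new EndOfStreamException("read past EOF");
        ReadInternal(start, _buffer, 0, count);
        _bufferStart = start; _bufferPos = 0; _bufferLength = count;
    }

    // Big-endian, as Lucene's IndexInput does (C# evaluates operands left to right).
    public int ReadInt() =>
        (ReadByte() << 24) | (ReadByte() << 16) | (ReadByte() << 8) | ReadByte();

    public long ReadLong() => ((long)ReadInt() << 32) | (ReadInt() & 0xFFFFFFFFL);

    // VInt: 7 bits per byte, low-order group first, high bit set = more bytes follow.
    public int ReadVInt()
    {
        byte b = ReadByte();
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7)
        {
            b = ReadByte();
            value |= (b & 0x7F) << shift;
        }
        return value;
    }
}

// Stands in for SimpleFSIndexInput, whose Descriptor wraps a FileStream.
public sealed class StreamInputSketch : BufferedInputSketch
{
    private readonly Stream _stream;
    public StreamInputSketch(Stream stream, int bufferSize = 1024) : base(bufferSize) { _stream = stream; }
    public override long Length => _stream.Length;

    protected override void ReadInternal(long position, byte[] dest, int offset, int count)
    {
        _stream.Seek(position, SeekOrigin.Begin);
        while (count > 0)
        {
            int n = _stream.Read(dest, offset, count);
            if (n <= 0) throw new EndOfStreamException();
            offset += n; count -= n;
        }
    }
}
```

Because only `ReadInternal` and `Length` touch the stream, another storage backend would only have to supply those two members; buffering and big-endian/VInt decoding stay shared.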

The end.