

2011-09-15 18:15
Implementing NRT (Near Real Time Search) in Solr using RankingAlgorithm

By

Nagendra Nagarajayya
http://solr-ra.tgels.com
Summary

This paper describes how NRT (Near Real Time Search) can be implemented in Solr using the RankingAlgorithm. The technical details of the NRT implementation are discussed below.

Step 1:

Changes to DirectUpdateHandler2.java

a. code changes

The main code change is in DirectUpdateHandler2.java, where a commit is no longer needed for newly added documents to become searchable.

The following code needs to be added to the method public int addDoc(AddUpdateCommand cmd) throws IOException:

    if (realtime) {
        IndexReader r = core.getNRTReader();         // 1
        core.storeNRTReader(writer.getReader());     // 2
        if (r != null) {
            r.close();                               // 3
        }
    }

Description:

If realtime is enabled in solrconfig.xml by adding <realtime>true</realtime>:

1. core.getNRTReader() retrieves the existing reader.

2. writer.getReader() gets a new reader that includes the newly added docs, and core.storeNRTReader() stores it in the core.

3. The old reader is closed.
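The ordering of steps 1-3 can be tried outside Solr with the minimal, dependency-free sketch below. FakeReader and CoreStub are hypothetical stand-ins for Lucene's IndexReader and SolrCore, not the real classes. The numbered comments follow the steps above.

```java
import java.util.ArrayList;
import java.util.List;

// Dependency-free model of the reader swap added to addDoc() in Step 1.
// FakeReader and CoreStub are hypothetical stand-ins for IndexReader and SolrCore.
class NrtSwapDemo {

    /** Stand-in for an IndexReader: records how many docs it sees and whether it was closed. */
    static final class FakeReader {
        final int numDocs;
        boolean closed = false;

        FakeReader(int numDocs) { this.numDocs = numDocs; }

        void close() { closed = true; }
    }

    /** Stand-in for SolrCore: holds the most recently stored NRT reader. */
    static final class CoreStub {
        private FakeReader nrtReader;                 // null until the first add

        FakeReader getNRTReader() { return nrtReader; }

        void storeNRTReader(FakeReader r) { nrtReader = r; }
    }

    /** Mirrors the addDoc() change: fetch old (1), store new (2), close old (3). */
    static void afterAdd(CoreStub core, FakeReader freshFromWriter, boolean realtime) {
        if (realtime) {
            FakeReader old = core.getNRTReader();     // 1: existing reader, may be null
            core.storeNRTReader(freshFromWriter);     // 2: publish the reader that sees the new doc
            if (old != null) {
                old.close();                          // 3: release the superseded reader
            }
        }
    }

    public static void main(String[] args) {
        CoreStub core = new CoreStub();
        List<FakeReader> issued = new ArrayList<>();
        for (int docs = 1; docs <= 3; docs++) {
            FakeReader r = new FakeReader(docs);      // plays the role of writer.getReader()
            issued.add(r);
            afterAdd(core, r, true);
        }
        // prints "current reader sees 3 docs"
        System.out.println("current reader sees " + core.getNRTReader().numDocs + " docs");
        // prints "older readers closed: true"
        System.out.println("older readers closed: " + (issued.get(0).closed && issued.get(1).closed));
    }
}
```

The new reader is stored before the old one is closed, so a later getNRTReader() call returns the fresh reader and never the closed one. A searcher that obtained the old reader earlier can still lose it when it is closed. This is presumably why Step 3 has the searcher work on its own clone.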

Other modifications to DirectUpdateHandler2.java close and clear the stored NRT reader when the writer is closed or rolled back:

    protected void closeWriter() throws IOException {
        IndexReader r = core.getNRTReader();
        if (r != null) {
            r.close();
            core.storeNRTReader(null);
        }
        // ... existing closeWriter() code ...
    }

    protected void rollbackWriter() throws IOException {
        try {
            numDocsPending.set(0);
            if (writer!=null) writer.rollback();
            IndexReader r = core.getNRTReader();
            if (r != null) {
                r.close();
                core.storeNRTReader(null);
            }
        // ... remainder of the existing try block and method ...
    }

Step 2:

Changes to SolrCore.java

a. code changes

The following instance fields store the NRT reader and the time it was stored, keyed by core name:

    private HashMap<String, IndexReader> reader_hm = new HashMap<String, IndexReader>();
    private HashMap<String, Long> update_hm = new HashMap<String, Long>();

The following methods make the new reader available to other components:

    public IndexReader getNRTReader() {
        return reader_hm.get(name);
    }

    public long getNRTWhenTime() {
        Long l = update_hm.get(name);
        if (l == null) {
            return 0;
        }
        return l.longValue();
    }

    public void storeNRTReader(IndexReader ir) {
        reader_hm.put(name, ir);
        update_hm.put(name, Long.valueOf(System.currentTimeMillis()));
    }
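Indexing threads write the two HashMaps above and search threads read them, but java.util.HashMap is not thread-safe. There is a second problem. Two storeNRTReader() calls in the same millisecond can get the same timestamp, and then the "<" comparison in Step 3 would not notice the second reader.

The sketch below is one hypothetical alternative and is not part of Solr-RA. It keeps the reader and its stamp together in an immutable snapshot inside a ConcurrentHashMap. It also replaces System.currentTimeMillis() with an AtomicLong generation counter. NrtReaderRegistry and Snapshot are illustrative names.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical thread-safe alternative to the Step 2 maps (not Solr-RA code).
// Reader and stamp live in one immutable Snapshot so they are always published together,
// and a monotonically increasing generation replaces System.currentTimeMillis().
class NrtReaderRegistry<R> {

    /** Immutable pairing of a reader with the generation at which it was stored. */
    static final class Snapshot<T> {
        final T reader;
        final long generation;

        Snapshot(T reader, long generation) {
            this.reader = reader;
            this.generation = generation;
        }
    }

    private final ConcurrentMap<String, Snapshot<R>> byCore = new ConcurrentHashMap<>();
    private final AtomicLong generations = new AtomicLong();

    /** Counterpart of storeNRTReader(ir); a null reader clears the entry. */
    void store(String coreName, R reader) {
        long gen = generations.incrementAndGet();     // never repeats, unlike wall-clock millis
        if (reader == null) {
            byCore.remove(coreName);                  // ConcurrentHashMap does not accept null values
        } else {
            byCore.put(coreName, new Snapshot<>(reader, gen));
        }
    }

    /** Counterpart of getNRTReader(): the current reader, or null. */
    R reader(String coreName) {
        Snapshot<R> s = byCore.get(coreName);
        return s == null ? null : s.reader;
    }

    /** Counterpart of getNRTWhenTime(): 0 when nothing is stored. */
    long generation(String coreName) {
        Snapshot<R> s = byCore.get(coreName);
        return s == null ? 0L : s.generation;
    }

    /** Reader and generation taken from a single snapshot, so they always match. */
    Snapshot<R> snapshot(String coreName) {
        return byCore.get(coreName);
    }

    public static void main(String[] args) {
        NrtReaderRegistry<String> registry = new NrtReaderRegistry<>();
        registry.store("mbartists", "reader-1");
        registry.store("mbartists", "reader-2");      // may land in the same millisecond as reader-1
        // prints "reader-2 @ generation 2"
        System.out.println(registry.reader("mbartists") + " @ generation "
                + registry.generation("mbartists"));
    }
}
```

A searcher that needs both values should call snapshot() once rather than reader() and generation() separately. Two separate calls could observe two different stores.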

Step 3:

Changes to SolrIndexSearcher.java

a. code changes

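    private IndexReader ir = null;
    private Long when = Long.valueOf(0);   // time of the NRT reader that ir was cloned from

RankingAlgorithm uses the new IndexReader as follows:

    ir1 = reader;
    if (realtime) {                                              // 1
        ir1 = core.getNRTReader();                               // 2
        Long when = core.getNRTWhenTime();                       // 3
        if (ir1 != null) {                                       // 4
            if (this.when.longValue() < when.longValue()) {      // 5
                if (ir != null) {
                    ir.close();                                  // 6
                }
                ir = (IndexReader)ir1.clone();                   // 7
                this.when = when;   // remember it so later searches reuse ir instead of re-cloning
            }
            ir1 = ir;                                            // 8
        } else {
            ir1 = reader;                                        // 9
        }
    }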

Description:

1. Check that realtime is enabled.

2. Ask the core for the latest NRT IndexReader, if one is available.

3. Get the time at which that reader was stored.

4. If an NRT reader exists:

5. Compare timestamps to see whether it is newer than the reader currently in use.

6. If so, close the previously cloned reader.

7. Clone the new reader.

8. Use the clone as the reader for search.

9. If no NRT reader exists, use the old reader for search.
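The refresh logic of Step 3 can be exercised outside Solr with the following sketch. FakeReader is a hypothetical stand-in for IndexReader. The latest and latestWhen parameters play the roles of core.getNRTReader() and core.getNRTWhenTime(). The sketch records the timestamp of the reader it cloned, so repeated searches reuse the same clone until a newer reader is stored.

```java
// Hypothetical model of the Step 3 refresh in SolrIndexSearcher (not the real class).
class SearcherRefreshDemo {

    /** Stand-in for an IndexReader that can be cloned and closed. */
    static final class FakeReader {
        final String name;
        boolean closed;
        int clones;                                   // how many times this reader was cloned

        FakeReader(String name) { this.name = name; }

        FakeReader cloneReader() { clones++; return new FakeReader(name + "-clone"); }

        void close() { closed = true; }
    }

    private final FakeReader reader;                  // the searcher's original reader
    private FakeReader ir = null;                     // current clone of the NRT reader
    private long when = 0L;                           // stamp of the reader that ir was cloned from

    SearcherRefreshDemo(FakeReader reader) { this.reader = reader; }

    /** Chooses the reader for a search; numbers refer to the Step 3 description. */
    FakeReader readerForSearch(boolean realtime, FakeReader latest, long latestWhen) {
        FakeReader ir1 = reader;
        if (realtime) {                               // 1
            ir1 = latest;                             // 2 (latestWhen plays the role of 3)
            if (ir1 != null) {                        // 4
                if (when < latestWhen) {              // 5: a newer reader was stored
                    if (ir != null) {
                        ir.close();                   // 6
                    }
                    ir = ir1.cloneReader();           // 7
                    when = latestWhen;                // reuse this clone until the next store
                }
                ir1 = ir;                             // 8
            } else {
                ir1 = reader;                         // 9
            }
        }
        return ir1;
    }

    public static void main(String[] args) {
        SearcherRefreshDemo s = new SearcherRefreshDemo(new FakeReader("committed"));
        FakeReader nrt = new FakeReader("nrt");
        FakeReader first = s.readerForSearch(true, nrt, 100L);
        FakeReader second = s.readerForSearch(true, nrt, 100L);
        // prints "nrt-clone, reused: true, clones: 1"
        System.out.println(first.name + ", reused: " + (first == second) + ", clones: " + nrt.clones);
    }
}
```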

    public int maxDoc() throws IOException {
        if (realtime && ir != null) {     // 1
            return ir.maxDoc();           // 2
        }
        return super.maxDoc();            // 3
    }

Description:

1. If realtime is enabled and an NRT clone is available,

2. return maxDoc from the new reader.

3. Otherwise, return maxDoc from the old reader.

A new method, getWrappedReader(), returns the underlying IndexReader (the NRT clone when realtime is enabled) instead of the SolrIndexReader. Faceting, filter queries (fq), etc. use it:

    public IndexReader getWrappedReader() {
        if (ir != null && realtime) {
            return ir;
        }
        return reader.getWrappedReader();
    }

    public Document doc(int n, FieldSelector fieldSelector) throws IOException {
        if (ir != null && realtime) {
            // NRT path: loads the full document (fieldSelector is not applied)
            return ir.document(n);
        }
        return getIndexReader().document(n, fieldSelector);
    }

    public Document doc(int i, Set<String> fields) throws IOException {
        Document d = null;
        if (documentCache != null) {
            d = (Document)documentCache.get(i);
            if (d != null) return d;
        }
        if (!enableLazyFieldLoading || fields == null) {
            // d = getIndexReader().document(i);
            if (ir == null && realtime) {
                // first use: clone the core's NRT reader, if there is one
                IndexReader ir1 = core.getNRTReader();
                when = core.getNRTWhenTime();
                if (ir1 != null) {
                    ir = (IndexReader)ir1.clone();
                }
            }
            if (ir != null) {
                d = ir.document(i);
            } else {
                d = getIndexReader().document(i);
            }
        } else {
            // d = getIndexReader().document(i,
            //         new SetNonLazyFieldSelector(fields));
            try {
                if (ir == null && realtime) {
                    IndexReader ir1 = core.getNRTReader();
                    when = core.getNRTWhenTime();
                    if (ir1 != null) {
                        ir = (IndexReader)ir1.clone();
                    }
                }
                if (ir != null) {
                    d = ir.document(i, new SetNonLazyFieldSelector(fields));
                } else {
                    d = getIndexReader().document(i, new SetNonLazyFieldSelector(fields));
                }
            } catch (Throwable t) {
                throw new IOException(t);
            }
        }
        if (documentCache != null) {
            documentCache.put(i, d);
        }
        return d;
    }

Step 4:

Changes to UnInvertedField.java:

a. code changes

    public static UnInvertedField getUnInvertedField(String field, SolrIndexSearcher searcher) throws IOException {
        SolrCache cache = searcher.getFieldValueCache();
        if (cache == null) {
            return new UnInvertedField(field, searcher);
        }
        UnInvertedField uif = (UnInvertedField)cache.get(field);
        if (uif == null) {
            synchronized (cache) {
                uif = (UnInvertedField)cache.get(field);
                if (uif == null) {
                    uif = new UnInvertedField(field, searcher);
                    cache.put(field, uif);
                }
            }
        }
        /* NRT */
        if (searcher.maxDoc() > uif.index.length) {                       // 1
            uif = new UnInvertedField(field, searcher);  /* need to make this dynamic */   // 2
            cache.put(field, uif);                                        // 3
        }
        return uif;
    }

Description:

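1. Check whether any docs were added (the searcher's maxDoc() now exceeds the size of the cached UIF's index).

2. If so, build a new UnInvertedField (UIF) for the field.

3. Store it in the cache and return the new UIF.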

b. Change all getReader() method calls to getWrappedReader() in the file.
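The staleness check in Step 4 follows a common pattern. A derived per-field structure is cached and then rebuilt once the searcher's maxDoc() outgrows the document count it was built for. The sketch below shows that pattern in plain Java, and FieldIndex is a hypothetical stand-in for UnInvertedField.

The original checks staleness without synchronization, after the synchronized block. The sketch instead uses ConcurrentHashMap.compute(), so the check and the rebuild happen atomically for each field. That choice is a substitution made for illustration, not what Solr-RA does.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the Step 4 pattern: rebuild a cached per-field structure
// when the searcher's maxDoc() exceeds the number of docs it was built for.
class UifCacheDemo {

    /** Stand-in for UnInvertedField: its index array has one slot per document. */
    static final class FieldIndex {
        final String field;
        final int[] index;

        FieldIndex(String field, int maxDoc) {
            this.field = field;
            this.index = new int[maxDoc];
        }
    }

    private final Map<String, FieldIndex> cache = new ConcurrentHashMap<>();
    int builds = 0;                                   // number of (re)builds, for illustration

    /** Returns the cached FieldIndex, rebuilding it when docs were added (1-3 above). */
    FieldIndex get(String field, int maxDoc) {
        // ConcurrentHashMap.compute runs the check and the rebuild atomically per key
        return cache.compute(field, (f, cached) -> {
            if (cached != null && maxDoc <= cached.index.length) {
                return cached;                        // still covers every document
            }
            builds++;
            return new FieldIndex(f, maxDoc);         // full rebuild, as in the original
        });
    }

    public static void main(String[] args) {
        UifCacheDemo demo = new UifCacheDemo();
        demo.get("a_name", 3256);
        demo.get("a_name", 3256);                     // cache hit
        demo.get("a_name", 3257);                     // one doc added: rebuild
        System.out.println("builds: " + demo.builds); // prints "builds: 2"
    }
}
```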

Step 5:

Changes to SimpleFacet.java:

a. Change all getReader() method calls to getWrappedReader()

Step 6:

Changes to SolrConfig.java:

a. code changes

    public boolean realtime = false;

    public boolean getRealtime() {
        return realtime;
    }

    // where the other settings are read from solrconfig.xml: read <realtime>, defaulting to false
    realtime = getBool("realtime", false);
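For reference, here is a stand-alone sketch of what realtime = getBool("realtime", false) amounts to, using only the JDK's DOM and XPath APIs. It assumes that the element sits directly under <config>, like other top-level solrconfig.xml settings. RealtimeFlagDemo and readRealtime are hypothetical names, not Solr API.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical stand-alone equivalent of getBool("realtime", false):
// look up /config/realtime and fall back to the default when the element is absent.
class RealtimeFlagDemo {

    static boolean readRealtime(String solrconfigXml, boolean def) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(solrconfigXml.getBytes(StandardCharsets.UTF_8)));
            Node n = (Node) XPathFactory.newInstance().newXPath()
                    .evaluate("/config/realtime", doc, XPathConstants.NODE);
            if (n == null) {
                return def;                           // element missing: use the default
            }
            return Boolean.parseBoolean(n.getTextContent().trim());
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot read solrconfig.xml", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readRealtime("<config><realtime>true</realtime></config>", false)); // true
        System.out.println(readRealtime("<config/>", false));                                 // false
    }
}
```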

Conclusion

Near real time search in Solr-RA works well. Searches run concurrently with indexing, without closing the IndexSearchers or clearing the caches, so newly added documents become searchable in near real time. The NRT implementation supports faceting, filter queries, etc.

The screenshots below (Fig 1 and Fig 2) show the facet counts changing as documents are added. Fig 1 shows a facet query for “john” on the mbartists index (from the book Solr 1.4 Enterprise Search Server). Fig 2 shows the same query after adding a new artist to the index as follows:

curl "http://localhost:8990/solr/mbartists/update/csv?stream.file=/tmp/x.csv&encapsulator=%1f"

<?xml version="1.0" encoding="UTF-8"?>

<response>

<lst name="responseHeader"><int name="status">0</int><int name="QTime">163</int></lst>

</response>

cat /tmp/x.csv:

id,type,a_name,a_name_sort,a_alias,a_type,a_begin_date,a_end_date,a_member_name,a_member_id,a_release_date_latest,a_spell,a_spellPhrase,r_name,r_name_sort,r_name_facetLetter,r_a_name,r_a_id,r_attributes,r_type,r_official,r_lang,r_tracks,r_event_country,r_event_date,r_event_date_earliest,l_name,l_name_sort,l_type,l_begin_date,l_end_date,t_name,t_duration,t_a_id,t_a_name,t_num,t_r_id,t_r_name,t_r_attributes,t_r_tracks,t_trm_lookups,word,includes

Artist:3991866,Artist,John Ab Davis,John Ab Davis,,person,1942-12-29T00:00:00Z,1999-12-10T00:00:00Z,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

Fig 1 shows numFound as 3256 and the facet count for “john” as 3256. Fig 2, taken after adding a doc with curl, shows numFound as 3257 and the facet count for “john” as 3257. The Solr query is:
http://192.168.1.126:8990/solr/mbartists/select/?q=john&facet=on&facet.field=a_name&facet.field=a_type&fl=score
Note: make sure to clear the browser cache before re-running the query.

The indexing performance observed on a 2-core Intel system running Fedora Linux 12 is about 262 tps (new document adds per second). This could improve substantially if IndexWriter.getReader() became faster: indexing about 3900 documents would drop from about 14 seconds to about 2 seconds. At the moment, it takes about 70-90 ms to get an IndexReader.
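A quick check of the figures above: about 3900 documents in 14 s is roughly 279 documents per second, close to the reported ~262 tps. At 2 s the rate would be about 1950 documents per second, a seven-fold improvement. The snippet only divides the numbers quoted in the text.

```java
// Back-of-the-envelope check of the throughput figures quoted above (inputs copied from the text).
class ThroughputCheck {

    /** Documents per second for a batch of docs indexed in the given number of seconds. */
    static double docsPerSecond(int docs, double seconds) {
        return docs / seconds;
    }

    public static void main(String[] args) {
        int docs = 3900;                                                  // "about 3900 documents"
        System.out.printf("now:    %.0f docs/s%n", docsPerSecond(docs, 14.0)); // now:    279 docs/s
        System.out.printf("target: %.0f docs/s%n", docsPerSecond(docs, 2.0));  // target: 1950 docs/s
        System.out.printf("at 262 tps: %.1f s%n", docs / 262.0);              // at 262 tps: 14.9 s
    }
}
```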

The modified source can be downloaded, along with Solr with RankingAlgorithm, from:
http://solr-ra.tgels.com
Fig 1

Fig 2