
Introduction to The Solr Enterprise Search Server

2007-04-02 16:31

Solr in a Nutshell

Solr is a standalone enterprise search server with a web-services-like API. You put documents into it (called "indexing") via XML over HTTP. You query it via HTTP GET and receive XML results.
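
As a minimal sketch of that round trip, the Python below builds an `<add>` message and a query URL without contacting any server. The base URL `http://localhost:8983/solr`, the `/select` path, and the field names `id` and `title` are assumptions made for illustration, not details from this article; adjust them to the actual deployment.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed base URL of a local Solr instance; not taken from the article.
SOLR_URL = "http://localhost:8983/solr"


def build_add_xml(doc):
    """Serialize one document (a dict of field name -> value) as an <add> message."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        field = ET.SubElement(doc_el, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def build_select_url(query, rows=10):
    """Build the HTTP GET URL for a simple query."""
    return SOLR_URL + "/select?" + urlencode({"q": query, "rows": rows})


# Hypothetical document and query, used only to show the shape of the messages.
add_xml = build_add_xml({"id": "doc1", "title": "Hello Solr"})
select_url = build_select_url("title:hello")
print(add_xml)
print(select_url)
```

In a typical setup the XML would be sent with an HTTP POST to the update handler and followed by a commit message before the document becomes searchable; the exact handler path depends on the installation.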

Advanced Full-Text Search Capabilities

Optimized for High Volume Web Traffic

Standards Based Open Interfaces - XML and HTTP

Comprehensive HTML Administration Interfaces

Scalability - Efficient Replication to other Solr Search Servers

Flexible and Adaptable with XML configuration

Extensible Plugin Architecture

Solr Uses the Lucene Search Library and Extends it!

A Real Data Schema, with Dynamic Fields, Unique Keys

Powerful Extensions to the Lucene Query Language

Support for Dynamic Result Grouping and Filtering

Advanced, Configurable Text Analysis

Highly Configurable and User Extensible Caching

Performance Optimizations

External Configuration via XML

An Administration Interface

Monitorable Logging

Fast Incremental Updates and Snapshot Distribution

Detailed Features

Schema

Defines the field types and fields of documents

Can drive more intelligent processing

Declarative Lucene Analyzer specification

Dynamic Fields enable on-the-fly addition of fields

CopyField functionality allows indexing a single field multiple ways, or combining multiple fields into a single searchable field

Explicit types eliminate the need to guess the types of fields

External file-based configuration of stopword lists, synonym lists, and protected word lists
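
To make Dynamic Fields and CopyField concrete, here is a toy Python model. It is not Solr's implementation: the patterns `*_i` and `*_t`, the field names, and both helper functions are hypothetical. It assumes that a dynamic field is matched by a wildcard pattern on the field name, and that a copy rule duplicates a source field's values into a destination field before indexing.

```python
from fnmatch import fnmatchcase

# Hypothetical schema: explicit fields, wildcard dynamic fields, and copy rules.
FIELDS = {"id": "string", "title": "text"}
DYNAMIC_FIELDS = [("*_i", "integer"), ("*_t", "text")]
COPY_FIELDS = [("title", "all_text_t"), ("body_t", "all_text_t")]


def field_type(name):
    """Return the declared type for a field name, or None if nothing matches."""
    if name in FIELDS:
        return FIELDS[name]
    for pattern, type_name in DYNAMIC_FIELDS:
        if fnmatchcase(name, pattern):
            return type_name
    return None


def apply_copy_fields(doc):
    """Return a copy of doc with each source field's values appended to its destination."""
    out = {name: list(values) for name, values in doc.items()}
    for source, dest in COPY_FIELDS:
        if source in doc:
            out.setdefault(dest, []).extend(doc[source])
    return out


doc = {"id": ["1"], "title": ["Solr intro"], "body_t": ["Search server"], "views_i": ["42"]}
expanded = apply_copy_fields(doc)
print(field_type("views_i"))    # matched by the "*_i" dynamic field
print(expanded["all_text_t"])   # title and body_t combined into one searchable field
```

The field `views_i` is never declared explicitly, yet it still resolves to a type, which is the point of dynamic fields; `all_text_t` shows several fields being combined into a single searchable one.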

Query

HTTP interface with configurable response formats (XML/XSLT, JSON, Python, Ruby)

Highlighted context snippets

Faceted Searching based on field values and explicit queries

Sort specifications added to query language

Constant scoring range and prefix queries - no idf, coord, or lengthNorm factors, and no restriction on the number of terms the query matches.

Function Query - influence the score by a function of a field's numeric value or ordinal

Performance Optimizations
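
The query features listed above are driven by request parameters. The sketch below assembles one such request URL. It assumes the parameter names `wt`, `hl`, `hl.fl`, `facet`, `facet.field`, `facet.query`, and `sort`, whose availability can differ between Solr versions, and it uses hypothetical field names (`title`, `category`, `price`) and a hypothetical local base URL.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

BASE = "http://localhost:8983/solr/select"  # assumed location, not from the article


def build_query_url(q, facet_fields=(), facet_queries=(),
                    highlight_fields=(), sort=None, wt="json"):
    """Assemble a query URL with optional faceting, highlighting, and sorting."""
    params = [("q", q), ("wt", wt)]
    if facet_fields or facet_queries:
        params.append(("facet", "true"))
        params += [("facet.field", f) for f in facet_fields]
        params += [("facet.query", fq) for fq in facet_queries]
    if highlight_fields:
        params.append(("hl", "true"))
        params.append(("hl.fl", ",".join(highlight_fields)))
    if sort:
        params.append(("sort", sort))
    return BASE + "?" + urlencode(params)


url = build_query_url(
    "title:solr",
    facet_fields=["category"],           # facet on field values
    facet_queries=["price:[0 TO 100]"],  # facet on an explicit query
    highlight_fields=["title"],
    sort="price desc",
)
parsed = parse_qs(urlsplit(url).query)   # decode again to inspect the parameters
print(url)
```

Keeping the parameters as a list of pairs rather than a dict allows repeated keys such as several `facet.field` entries.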

Core

Pluggable query handlers and extensible XML data format

Document uniqueness enforcement based on unique key field

Batches updates and deletes for high performance

User configurable commands triggered on index changes

Searcher concurrency control

Correct handling of numeric types for both sorting and range queries

Ability to control where docs with the sort field missing will be placed

Support for dynamic grouping of search results
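
Two of the points above, numeric handling and the placement of documents that lack the sort field, are easiest to see on a small example. The Python below only illustrates the problem and is not how Solr stores or sorts values: numbers compared as plain strings are ordered lexicographically, and a sort needs an explicit rule for documents with no value. The documents and the `missing_last` flag are hypothetical.

```python
docs = [
    {"id": "a", "price": 9},
    {"id": "b", "price": 100},
    {"id": "c"},               # no price field at all
    {"id": "d", "price": 25},
]


def sort_by_price(docs, missing_last=True):
    """Sort ascending by numeric price, placing docs without a price first or last."""
    def key(doc):
        missing = "price" not in doc
        # The first tuple element decides where missing docs go;
        # the second orders the remaining docs numerically.
        return (missing if missing_last else not missing, doc.get("price", 0))
    return [d["id"] for d in sorted(docs, key=key)]


# Comparing the values as strings gives the wrong order: "100" < "25" < "9".
as_strings = sorted(str(d["price"]) for d in docs if "price" in d)

numeric_last = sort_by_price(docs, missing_last=True)
numeric_first = sort_by_price(docs, missing_last=False)
print(as_strings, numeric_last, numeric_first)
```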

Caching

Configurable Query Result, Filter, and Document cache instances

Pluggable Cache implementations

Cache warming in background

When a new searcher is opened, configurable searches are run against it in order to warm it up to avoid slow first hits. During warming, the current searcher handles live requests.

Autowarming in background

The most recently accessed items in the caches of the current searcher are re-populated in the new searcher, enabling high cache hit rates across index/searcher changes.

Fast/small filter implementation

User level caching with autowarming support
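
Autowarming can be pictured with a small LRU cache. The sketch below is a toy model rather than Solr's cache code: the class `LRUCache`, its `autowarm` method, and the `regenerate` callback are hypothetical names. It assumes that warming takes the most recently used keys from the old cache and recomputes their values against the new searcher.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # ordered from least to most recently used

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used entry

    def autowarm(self, old_cache, count, regenerate):
        """Re-populate this cache from the `count` most recently used keys of old_cache."""
        recent_keys = list(old_cache.items)[-count:] if count > 0 else []
        for key in recent_keys:
            # Values are recomputed, not copied: the old results refer to the old index.
            self.put(key, regenerate(key))


old = LRUCache(capacity=3)
for q in ["q1", "q2", "q3"]:
    old.put(q, "results for %s on old index" % q)
old.get("q1")  # q1 becomes the most recently used key

new = LRUCache(capacity=3)
new.autowarm(old, count=2, regenerate=lambda q: "results for %s on new index" % q)
warmed_keys = list(new.items)
print(warmed_keys)
```

In this model the old cache keeps serving lookups while the new one is filled, which mirrors the description of the current searcher handling live requests during warming.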

Replication

Efficient distribution of index parts that have changed via rsync transport

Pull strategy allows for easy addition of searchers

Configurable distribution interval allows tradeoff between timeliness and cache utilization

Admin Interface

Comprehensive statistics on cache utilization, updates, and queries

Text analysis debugger, showing result of every stage in an analyzer

Web Query Interface with debugging output

Parsed query output

Lucene explain() document score detailing

Explain score for documents outside of the requested range, to debug why a given document wasn't ranked higher