
Understanding the Search Service Architecture

2015-11-04 12:16
Search is one of the most important services in SharePoint 2013, as it has been throughout SharePoint product development. This is because it enables users to quickly and easily find the content
held in SharePoint, both from SharePoint content databases and from external data sources through Business Connectivity Services. In SharePoint 2013, the underlying architecture of Search has been redeveloped to provide a richer enterprise search experience for
users and streamlined administration.
 
The Search Architecture
 
The Search service has been rearchitected in SharePoint 2013. The architecture changes have
been made to ensure that Search offers a higher level of redundancy for single and multiple farm
environments. One of the most obvious changes in the new architecture is the inclusion of the FAST
technologies in the Search service. This enhances search functionality and makes it easier for
solution architects to design and deploy a full-featured search solution in their organizations.
Search combines a range of elements on application servers, which include:
 
Crawl Component

Content Processing Component

Analytics Processing Component

Indexing Component

Query Processing Component

Search Administration Component
 
These components interact to ingest search data into the search index and to surface query results.
To ingest data, the Crawl Component interrogates the content sources that you have
configured, either on the SharePoint farm or on external sources, such as Microsoft Exchange or Lotus
Notes. The crawled items are processed by the Content Processing Component to format them
appropriately to be stored in the index. Information from the Analytics Processing Component is used in
this process to identify useful associated item information, such as previous user interaction. The data, or
artifacts, are written to the index, which is a series of files and folders that are stored on disk and referred
to, collectively, as the index file. The Query Processing Component receives queries from the Web Front
End (WFE) server, processes them, and sends them to the Indexing Component, which returns result sets.
The Query Processing Component performs additional processing to aggregate and clean the results and
then sends the result sets back to the WFE to be rendered for the user. Temporary and
permanent storage databases are used throughout the process.
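
The flow described above can be sketched as a toy pipeline. This is only an illustration under loud assumptions: the function and class names, the in-memory dictionary used as a content source, and the inverted index are all hypothetical stand-ins, not SharePoint APIs, and the real components run as separate services backed by databases.

```python
from collections import defaultdict


def crawl(content_source):
    """Crawl Component stand-in: yield each item with its metadata."""
    for url, (title, body) in content_source.items():
        yield {"url": url, "title": title, "body": body}


def process_content(item):
    """Content Processing Component stand-in: turn a crawled item into an artifact."""
    tokens = item["body"].lower().split()
    return {"url": item["url"], "title": item["title"], "tokens": tokens}


class IndexComponent:
    """Index stand-in: an in-memory inverted index (term -> set of URLs)."""

    def __init__(self):
        self.postings = defaultdict(set)

    def write(self, artifact):
        for token in artifact["tokens"]:
            self.postings[token].add(artifact["url"])

    def lookup(self, term):
        # .get() avoids creating empty entries for unknown terms.
        return self.postings.get(term, set())


def process_query(index, query):
    """Query Processing Component stand-in: query the index, then aggregate."""
    terms = query.lower().split()
    if not terms:
        return []
    result_sets = [index.lookup(term) for term in terms]
    # Aggregate: keep only items that match every term; sort for stable output.
    return sorted(set.intersection(*result_sets))


# Hypothetical content source: URL -> (title, body).
source = {
    "http://intranet/a": ("Budget", "annual budget report"),
    "http://intranet/b": ("Plan", "annual project plan"),
}

index = IndexComponent()
for item in crawl(source):
    index.write(process_content(item))

results = process_query(index, "annual report")
print(results)
```

The split into four small pieces mirrors the separation of roles in the text: the crawler only fetches, parsing happens in content processing, and the query side never touches the content source directly.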
 
The Crawl Component
 
The Crawl Component is a part of the crawl and
content processing architecture, which comprises
the Crawl Component, the crawl database, and the
Content Processing Component. The crawl role is
responsible for crawling content sources to build a
search index. This means that the component
reviews each of the documents or pages in a
source location.
 
Crawl Connectors:
 
HTTP

File Shares

SharePoint Sites

User profiles

Exchange

Lotus Notes

Custom
 
The Crawl Component delivers content from files and pages, together with their associated metadata, to
the Content Processing Component. The crawled items that are passed to the Content Processing
Component have associated properties, such as title and author. The crawl uses connectors, such as the Lotus
Notes Connector, to access content sources and retrieve data. These were known as protocol handlers in
previous versions of SharePoint Products and Technologies. The properties are grouped based on the
connector or IFilter used to crawl the content source. (An IFilter, now referred to as a format handler, is a
piece of code that enables specific file formats to be indexed and thus be searchable.) So, properties from Microsoft Office
documents (for example, Word and Excel documents) would be grouped under Office, whereas properties
from websites would be grouped under Web. You can include the contents and metadata of crawled
properties in the search index file. To do this, you must map the crawled properties to managed
properties, because only these are included in the search index.
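
As a minimal sketch of the mapping rule just described, the snippet below keeps only the values whose crawled property is mapped to a managed property. The property names and the mapping table are hypothetical examples chosen for illustration; in a real farm the mappings are defined in the search schema, not in code like this.

```python
# Hypothetical mapping from crawled property names to managed property names.
CRAWLED_TO_MANAGED = {
    "Office:Title": "Title",
    "Office:Author": "Author",
    "Web:title": "Title",
}


def to_managed_properties(crawled_properties, mapping=CRAWLED_TO_MANAGED):
    """Return only the values whose crawled property is mapped."""
    managed = {}
    for crawled_name, value in crawled_properties.items():
        managed_name = mapping.get(crawled_name)
        if managed_name is None:
            continue  # unmapped crawled properties never reach the index
        # Several crawled properties may feed one managed property.
        managed.setdefault(managed_name, []).append(value)
    return managed


crawled = {
    "Office:Title": "Q3 Forecast",
    "Office:Author": "A. Writer",
    "Office:LastPrinted": "2015-10-01",  # not mapped, so it is dropped
}
indexed = to_managed_properties(crawled)
print(indexed)
```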
The crawler itself does not complete document parsing; that function is provisioned by the Content
Processing Component.
The Crawl Component uses one or more crawl databases to temporarily store information about crawled
items and to maintain a crawl history. The database holds information such as the last crawl time, the last
crawl ID, and the last crawl update type. Information about content sources, such as their schedules and
locations, is synchronized to the registry on crawl role servers from the search administration database.
 
Content Processing Component
When the Crawl Component forwards content and
metadata, the Content Processing Component
performs tasks on the content to prepare it for the
search index file. The search index works with
processed content, called artifacts. The Content
Processing Component tasks include parsing
documents, property mapping, and linguistics processing. The latter detects language and
extracts language-based entities. The Content
Processing Component also writes information
about the URLs into the link database, which holds
information about links rather than content.

Including and excluding file types
 
The Content Processing Component can only process file data if the file type (extension) is included in the
list of available file types on the Manage File Types page and the crawl server has the appropriate format
handler installed. By default, some file types, such as email messages with a .eml extension, are supported.
The default format handlers focus on content, so while the Microsoft PowerPoint .pptx format appears on
the Manage File Types list by default, the Microsoft PowerPoint .pps presentation file does not.
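
The two conditions above (the extension is on the file types list, and a format handler is installed) can be expressed as a simple check. The sets below are illustrative values only, not the actual defaults of any farm, and `can_process` is a hypothetical helper name.

```python
import os

# Illustrative values only; the real lists come from the farm's configuration.
MANAGED_FILE_TYPES = {".docx", ".pptx", ".eml"}
INSTALLED_FORMAT_HANDLERS = {".docx", ".pptx", ".eml", ".pps"}


def can_process(filename,
                file_types=MANAGED_FILE_TYPES,
                handlers=INSTALLED_FORMAT_HANDLERS):
    """Return True only if the extension is listed AND has a format handler."""
    extension = os.path.splitext(filename)[1].lower()
    return extension in file_types and extension in handlers


print(can_process("deck.pptx"))  # listed and handled
print(can_process("show.pps"))   # handler assumed present, but not listed
print(can_process("notes.txt"))  # neither listed nor handled
```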
 
Analytics Processing Component
 
One of the major changes to the search service is
the inclusion of web analytics functionality. This
was previously a separate service application in
SharePoint Server 2010. The new Search Analytics
function analyzes both the crawled items and how
users interact with search results; these two kinds of analysis are called
Search Analytics and Usage Analytics.

 
Search Analytics
 
The search analytics information is used to
improve search relevance and to create search
reports, recommendations, and data links. This
information is then returned to the Content
Processing Component for storage in the search index.
Information about search activity, such as the number of search clicks from a search results page, helps to
improve the relevance of the search results by analyzing previous user activities. This information is stored
in the link database. It is then further analyzed by a series of sub-analyses. The following
table shows the sub-analyses that act on Search Analysis results.
 
Usage Analytics
 
Usage analytics analyzes usage events, such as views, from the event store. When a user completes an
action, such as viewing a page, the event is collected and stored in usage files on each WFE server. This
information is pushed to an event store, where it is stored until it is processed by the Analytics Processing
Component. The results are then returned to the Content Processing Component to be included in the
search index. The usage events that are analyzed include:
 
Views

Recommendations displayed

Recommendations clicked
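
To make the usage analysis step more concrete, here is a small sketch that tallies the three event types per item. The event record shape, the event type strings, and the `analyze_usage` function are assumptions made for this example; the actual event store format is not described in the text.

```python
from collections import Counter

# Hypothetical identifiers for the three usage event types listed above.
EVENT_TYPES = {"view", "recommendation_displayed", "recommendation_clicked"}


def analyze_usage(event_store):
    """Count usage events per (url, event type), ignoring unknown types."""
    counts = Counter()
    for event in event_store:
        if event["type"] in EVENT_TYPES:
            counts[(event["url"], event["type"])] += 1
    return counts


# Hypothetical events, as they might look once pushed from the WFE usage files.
events = [
    {"url": "/docs/a", "type": "view"},
    {"url": "/docs/a", "type": "view"},
    {"url": "/docs/b", "type": "recommendation_clicked"},
    {"url": "/docs/b", "type": "unknown"},  # skipped by the analysis
]
usage = analyze_usage(events)
print(usage[("/docs/a", "view")])
```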
 
Index Component
 
The search index is a set of files that are stored in
separate folders on a server. The Content
Processing Component processes items
provisioned by the Crawl Components, maps
crawled properties to managed properties, and
formats these as artifacts that can be stored in the
search index. The indexes can include:
 
Full-text indexes

Indexes of the Managed properties

An index for attribute vectors

Numeric indexes

Index Component

Index Partition

Index Replica
 
Query Processing Component
 
The Query Processing Component, which sits
between the Index Component and the search
front-end client, handles processing when a user
executes a search query and processes the results
to be returned. When the Indexing Component,
or another search provider, returns a result set,
the Query Processing Component performs any
additional processing that is required.
 
The Query Processing Component performs some
linguistic processing to maximize query efficiency
and effectiveness, such as:

Word stemming. Stemming returns words
closely related to or stemming from another word. For example, the stemmer relates words such as
“jumping,” “jumped,” and “jumper” to the verb “to jump.”

Word breaking. This refers to the breaking of words that are linked by some form of hyphenation; for
example, the term “server-based” is linked by a hyphen. In this case, word breaking returns both
“server” and “based,” with higher relevance given to an item with both present.
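
The two techniques can be illustrated with a deliberately naive sketch. It strips a few English suffixes and splits on hyphens; this is not how the SharePoint linguistic components are implemented, and the suffix list, function names, and scoring rule are assumptions made for the example.

```python
import re

# Naive suffix list; a real stemmer uses language-specific rules.
SUFFIXES = ("ing", "ed", "er")


def stem(word):
    """Strip one known suffix, leaving short words untouched."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def break_words(term):
    """Split a hyphenated term into its parts."""
    return [part for part in re.split(r"-+", term) if part]


def score(term, item_text):
    """Count how many parts of the term appear in the item text."""
    parts = break_words(term.lower())
    words = set(item_text.lower().split())
    return sum(1 for part in parts if part in words)


print([stem(w) for w in ("jumping", "jumped", "jumper")])
print(break_words("server-based"))
print(score("server-based", "a server based solution"))
```

An item containing both "server" and "based" scores higher than one containing only "server", which mirrors the relevance rule described for word breaking.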

 
The query processing workflow is as follows:
The Query Processing Component receives a query from the search front-end client and processes it to
maximize precision, recall, and relevancy. These actions include:

Applying Web Part transformations.
 
 