[ Getting Started with Solr ] - Adding Chinese word segmentation (IKAnalyzer) to schema.xml
2012-02-08 16:37
/article/4851185.html
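The article linked above described how to deploy Solr into Eclipse. This post builds on that setup and adds IKAnalyzer.
1. Download the IKAnalyzer source code and copy it into the solr3.5 project, as shown in the figure below:
[Figure: IKAnalyzer source copied into the solr3.5 project]
2. Configure IKAnalyzer in schema.xml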
<!-- IKAnalyzer 3.2.8 Chinese word segmentation -->
<fieldType name="text" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="org.wltea.analyzer.solr.IKTokenizerFactory" isMaxWordLength="false"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="org.wltea.analyzer.solr.IKTokenizerFactory" isMaxWordLength="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
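A field type only takes effect once a field references it by name. The fragment below is a minimal sketch of such a declaration for the fields section of schema.xml. The field name content is a hypothetical example and does not come from the original post; only the type name text matches the fieldType defined above.

```xml
<!-- Hypothetical field for illustration; "content" is an assumed name.
     type="text" points at the IKAnalyzer-backed fieldType defined above. -->
<field name="content" type="text" indexed="true" stored="true"/>
```

With a declaration like this, values indexed into the field would pass through the index analyzer chain, and query terms against it would pass through the query analyzer chain.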
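3. Start Solr and verify
On the admin analysis page, select type in the Field drop-down and enter text (the name of the field type configured in step 2), type a passage of Chinese into Field value, and click Analyze to see the segmentation result.
Checking the verbose output option shows the detailed segmentation information.
For details on configuring schema.xml, see the Solr wiki:
http://wiki.apache.org/solr/SchemaXml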
Data Types

The <types> section allows you to define a list of <fieldtype> declarations you wish to use in your schema, along with the underlying Solr class that should be used for that type, as well as the default options you want for fields that use that type.

Any subclass of FieldType may be used as a field type class, using either its full package name, or the "solr" alias if it is in the default Solr package. For common numeric types (integer, float, etc...) there are multiple implementations provided depending on your needs. Please see SolrPlugins for information on how to ensure that your own custom Field Types can be loaded into Solr.

Common options that field types can have are...
- sortMissingLast=true|false
- sortMissingFirst=true|false
- indexed=true|false
- stored=true|false
- multiValued=true|false
- omitNorms=true|false
- omitTermFreqAndPositions=true|false (Solr1.4)
- omitPositions=true|false (Solr3.4)
- positionIncrementGap=N

TextFields can also support Analyzers with highly configurable Tokenizers and Token Filters.

Field types that store text (TextField, StrField) support compression of stored contents:
- compressed=true|false
- compressThreshold=<integer>

compressThreshold is the minimum length required for text compression to be invoked. This applies only if compressed=true; a common pattern is to set compressThreshold on the field type definition, and turn compression on and off in the individual field definitions.
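To make the option list above more concrete, here is a small sketch of how a few of these options might appear on field type declarations. The type names string and text_general follow the naming used in Solr's example schema, but they are illustrative assumptions here and are not part of this post's configuration.

```xml
<types>
  <!-- Untokenized string type: documents without a value sort last,
       and norms are omitted -->
  <fieldType name="string" class="solr.StrField"
             sortMissingLast="true" omitNorms="true"/>

  <!-- Analyzed text type: positionIncrementGap puts a position gap between
       the values of a multiValued field so phrase queries do not match
       across value boundaries -->
  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
</types>
```

Options set on a field type act as defaults, so an individual field definition can still override them, for example by setting stored="false" on one field.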