SCWS Chinese Word Segmentation [Installation and Demo]
2013-11-18 21:18
SCWS installation guide (taken from the official site, with a few bugs fixed)
Operating system: Linux (Ubuntu 12.04)
1. Get the scws-1.2.2 source code
wget http://www.xunsearch.com/scws/down/scws-1.2.2.tar.bz2
2. Extract the archive
[hightman@d1 ~]$ tar xvjf scws-1.2.2.tar.bz2
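3. Enter the directory, run the configure script, then build and install (with &&, each step runs only if the previous one succeeded)
[hightman@d1 ~]$ cd scws-1.2.2
[hightman@d1 ~/scws-1.2.2]$ ./configure --prefix=/usr/local/scws && make && make install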
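Note: this works the same way as installing any standard GNU software; run ./configure --help to see all available options.
The most commonly used option is --prefix=<SCWS install directory>.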
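The demo program looks for its dictionary and rules under /usr/local/scws, so choosing a different --prefix means changing those paths as well. A minimal sketch, assuming you pass the prefix at compile time (for example gcc -DSCWS_PREFIX='"/opt/scws"' ..., where /opt/scws is only a hypothetical prefix), builds every path from one macro using C's adjacent string-literal concatenation. The names SCWS_DICT_PATH and SCWS_RULE_PATH are hypothetical and not part of SCWS.

```c
/* Default to the --prefix used in this guide; can be overridden on the
   gcc command line with -DSCWS_PREFIX='"..."' (hypothetical usage). */
#ifndef SCWS_PREFIX
#define SCWS_PREFIX "/usr/local/scws"
#endif

/* Adjacent string literals are joined by the compiler, so these expand to
   full paths under the chosen prefix. Macro names are hypothetical. */
#define SCWS_DICT_PATH SCWS_PREFIX "/etc/dict.utf8.xdb"
#define SCWS_RULE_PATH SCWS_PREFIX "/etc/rules.utf8.ini"
```

The paths can then be passed straight to scws_set_dict() and scws_set_rule(), so the prefix only has to be changed in one place.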
4. If all went well, SCWS is now built and installed in /usr/local/scws. Run the following command to check that the library file exists:
[hightman@d1 ~/scws-1.2.2]$ ls -al /usr/local/scws/lib/libscws.la
5. Try running the scws-cli program:
[hightman@d1 ~/scws-1.2.2]$ /usr/local/scws/bin/scws -h
scws (scws-cli/1.2.2)
Simple Chinese Word Segmentation - Command line usage.
Copyright (C)2007 by hightman.
...
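6. Download and extract the dictionaries with wget (or download them from the homepage, extract them yourself, and put the *.xdb files into /usr/local/scws/etc). The demo below only needs the UTF-8 dictionary: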
[hightman@d1 ~/scws-1.2.2]$ cd /usr/local/scws/etc
[hightman@d1 /usr/local/scws/etc]$ wget http://www.xunsearch.com/scws/down/scws-dict-chs-gbk.tar.bz2
[hightman@d1 /usr/local/scws/etc]$ wget http://www.xunsearch.com/scws/down/scws-dict-chs-utf8.tar.bz2
[hightman@d1 /usr/local/scws/etc]$ tar xvjf scws-dict-chs-gbk.tar.bz2
[hightman@d1 /usr/local/scws/etc]$ tar xvjf scws-dict-chs-utf8.tar.bz2
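If SCWS later segments text badly, a quick way to rule out path or permission problems is to confirm that the current user can actually read the dictionary and rules files. A minimal sketch using POSIX access(); the helper names file_readable and check_scws_files are hypothetical, and the file list you pass in is an assumption based on the default /usr/local/scws prefix.

```c
#include <stdio.h>
#include <unistd.h>

/* Return 1 if the current user can read `path`, 0 otherwise.
   access() checks against the real uid/gid of the process. */
int file_readable(const char *path)
{
    return access(path, R_OK) == 0;
}

/* Print every file in `paths` that cannot be read and return how many
   were missing or unreadable (0 means all are usable). */
int check_scws_files(const char *const *paths, int n)
{
    int missing = 0;

    for (int i = 0; i < n; i++) {
        if (!file_readable(paths[i])) {
            fprintf(stderr, "cannot read %s\n", paths[i]);
            missing++;
        }
    }
    return missing;
}
```

For example, calling check_scws_files() with /usr/local/scws/etc/dict.utf8.xdb and /usr/local/scws/etc/rules.utf8.ini should return 0 on a working installation, run as the same user that will run the segmentation program.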
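7. Write a small program to test the installation (after running cat > test.c, type or paste the code, then press Ctrl-D to save it):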
[hightman@d1 ~]$ cat > test.c
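#include <scws/scws.h>
#include <stdio.h>

int main(void)
{
    scws_t s;

    /* scws_new() returns NULL if it cannot create a segmenter handle */
    s = scws_new();
    if (s == NULL)
    {
        printf("scws_new failed\n");
        return 1;
    }
    scws_free(s);
    printf("test ok!\n");
    return 0;
}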
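8. Compile and run the test program (-I and -L point gcc at the SCWS headers and library; -Wl,--rpath records /usr/local/scws/lib in the executable so libscws is found at run time without setting LD_LIBRARY_PATH)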
gcc -o test -I/usr/local/scws/include -L/usr/local/scws/lib test.c -lscws -Wl,--rpath -Wl,/usr/local/scws/lib
./test
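Below is a small demo I wrote myself. It wraps segmentation in a function char* SCWS(char* text): pass in the sentence to segment, and it returns the segmented result with words separated by spaces.
File name: test.c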
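#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <scws/scws.h>

#define SCWS_PREFIX "/usr/local/scws"

/* Segment a UTF-8 sentence and return a newly allocated string with the
   words separated by spaces. The caller must free() the result. */
char* SCWS(char* text)
{
    scws_t s;
    scws_res_t res, cur;
    size_t len = strlen(text);
    size_t pos = 0;

    if (!(s = scws_new()))
    {
        fprintf(stderr, "ERROR: can't init scws!\n");
        exit(EXIT_FAILURE);
    }
    scws_set_charset(s, "utf8");
    /* scws_set_dict() returns a negative value if the dictionary can't be loaded */
    if (scws_set_dict(s, SCWS_PREFIX "/etc/dict.utf8.xdb", SCWS_XDICT_XDB) < 0)
    {
        fprintf(stderr, "ERROR: can't load the dictionary!\n");
        scws_free(s);
        exit(EXIT_FAILURE);
    }
    scws_set_rule(s, SCWS_PREFIX "/etc/rules.utf8.ini");
    scws_send_text(s, text, (int)len);

    /* Words don't overlap in the default mode, so copying each word plus
       one space needs at most 2*len+1 bytes (a fixed 1024 could overflow). */
    char* text_seg = malloc(2 * len + 1);
    if (text_seg == NULL)
    {
        scws_free(s);
        return NULL;
    }

    /* scws_get_result() hands back results in batches until it returns
       NULL; each batch must be released with scws_free_result(). */
    while ((res = cur = scws_get_result(s)) != NULL)
    {
        while (cur != NULL)
        {
            memcpy(text_seg + pos, text + cur->off, cur->len);
            pos += cur->len;
            text_seg[pos++] = ' ';
            cur = cur->next;
        }
        scws_free_result(res);
    }
    text_seg[pos] = '\0';
    scws_free(s);
    return text_seg;
}

int main(void)
{
    char* res = SCWS("这是一个句子");
    if (res == NULL)
        return 1;
    printf("%s\n", res);
    free(res);
    return 0;
}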
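The core of the demo is copying each word out of the input by its offset and length and appending a space. Because the words are non-empty and do not overlap, the output never needs more than 2 * strlen(text) + 1 bytes. A minimal sketch of just that joining step, independent of libscws, using a hypothetical struct span that stands in for the off and len fields of an SCWS result node:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the off/len fields of an SCWS result node. */
struct span {
    int off;   /* byte offset of the word in the input text */
    int len;   /* byte length of the word (assumed > 0) */
};

/* Copy each span of `text` into a new buffer, followed by a space.
   Assumes the spans are non-empty, non-overlapping, and inside `text`,
   so 2 * strlen(text) + 1 bytes always suffice. Caller frees. */
char *join_spans(const char *text, const struct span *spans, int n)
{
    size_t cap = 2 * strlen(text) + 1;
    char *out = malloc(cap);
    size_t pos = 0;

    if (out == NULL)
        return NULL;
    for (int i = 0; i < n; i++) {
        memcpy(out + pos, text + spans[i].off, (size_t)spans[i].len);
        pos += (size_t)spans[i].len;
        out[pos++] = ' ';
    }
    out[pos] = '\0';
    return out;
}
```

Like the demo, this leaves a trailing space after the last word; with spans {0, 5} and {6, 5}, "hello world" comes back as "hello world ".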
PS:
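1. Remember to do the build and install steps as root; otherwise segmentation misbehaves. Don't forget this! (The likely cause is that without root, make install and the dictionary download cannot write into /usr/local/scws, so the dictionary ends up missing or unreadable.)
2. Build the executable with: gcc -o test -I/usr/local/scws/include -L/usr/local/scws/lib test.c -lscws -Wl,--rpath -Wl,/usr/local/scws/lib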