SCWS Chinese Word Segmentation: Installation and Demo

2013-11-18 21:18
SCWS installation guide (excerpted from the official site, with a few bugs fixed)

Operating system: Linux (Ubuntu 12.04)

1. Get the scws-1.2.2 source code

wget http://www.xunsearch.com/scws/down/scws-1.2.2.tar.bz2

2. Unpack the archive

[hightman@d1 ~]$ tar xvjf scws-1.2.2.tar.bz2

3. Enter the directory, run the configure script, and build

[hightman@d1 ~]$ cd scws-1.2.2
[hightman@d1 ~/scws-1.2.2]$ ./configure --prefix=/usr/local/scws ; make ; make install

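Note: this is the usual GNU software installation procedure; run ./configure --help to see the available options.

The most commonly used option is --prefix=<SCWS installation directory>

4. If all went well, SCWS is now built and installed under /usr/local/scws. Run the following command to check that the library file exists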

[hightman@d1 ~/scws-1.2.2]$ ls -al /usr/local/scws/lib/libscws.la

5. Try running the scws-cli program

[hightman@d1 ~/scws-1.2.2]$ /usr/local/scws/bin/scws -h

scws (scws-cli/1.2.2)

Simple Chinese Word Segmentation - Command line usage.

Copyright (C)2007 by hightman.

...

6. Download and unpack the dictionaries with wget, or download them from the homepage, unpack them yourself, and put the *.xdb files into /usr/local/scws/etc

[hightman@d1 ~/scws-1.2.2]$ cd /usr/local/scws/etc

[hightman@d1 /usr/local/scws/etc]$ wget http://www.xunsearch.com/scws/down/scws-dict-chs-gbk.tar.bz2
[hightman@d1 /usr/local/scws/etc]$ wget http://www.xunsearch.com/scws/down/scws-dict-chs-utf8.tar.bz2
[hightman@d1 /usr/local/scws/etc]$ tar xvjf scws-dict-chs-gbk.tar.bz2

[hightman@d1 /usr/local/scws/etc]$ tar xvjf scws-dict-chs-utf8.tar.bz2
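
Step 6 only helps if the files end up where the later demo looks for them, and if the user running the demo can open them. The following is a minimal sketch of such a check, not part of SCWS itself: file_readable and check_scws_files are hypothetical helper names, and the two paths are the ones the demo further down passes to scws_set_dict and scws_set_rule, assuming the default --prefix=/usr/local/scws.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if path can be opened for reading, 0 otherwise. */
int file_readable(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return 0;
    fclose(fp);
    return 1;
}

/* Paths used by the demo below; adjust them if a different --prefix was chosen. */
static const char *const scws_required_files[] = {
    "/usr/local/scws/etc/dict.utf8.xdb",
    "/usr/local/scws/etc/rules.utf8.ini",
};

/* Prints one status line per file and returns how many could not be opened. */
int check_scws_files(void)
{
    int missing = 0;
    size_t i;
    size_t n = sizeof scws_required_files / sizeof scws_required_files[0];

    for (i = 0; i < n; i++) {
        int ok = file_readable(scws_required_files[i]);
        printf("%-40s %s\n", scws_required_files[i], ok ? "ok" : "NOT READABLE");
        if (!ok)
            missing++;
    }
    return missing;
}
```

Calling check_scws_files() from a small main() as the same user that will run the demo shows whether both files can be opened; a nonzero return value means at least one of them is missing or unreadable.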

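7. Write a small program to test the library

[hightman@d1 ~]$ cat > test.c
#include <scws/scws.h>
#include <stdio.h>

int main(void)
{
    scws_t s;

    /* create and immediately release a segmenter handle */
    s = scws_new();
    scws_free(s);
    printf("test ok!\n");
    return 0;
}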

8. Compile the test program

gcc -o test -I/usr/local/scws/include -L/usr/local/scws/lib test.c -lscws -Wl,--rpath -Wl,/usr/local/scws/lib

./test

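Here is a small demo I rewrote myself. It wraps SCWS in a function char* SCWS(char* text) that takes the sentence to segment and returns the segmented result, with the words separated by spaces

File name: test.c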

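#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <scws/scws.h>

#define SCWS_PREFIX "/usr/local/scws"
#define SEG_BUF_SIZE 1024

/* Segments UTF-8 text and returns a malloc'd string of space-separated
   words; the caller must free() it. Returns NULL if malloc fails. */
char* SCWS(char* text)
{
    scws_t s;
    scws_res_t res, cur;

    if (!(s = scws_new()))
    {
        printf("ERROR: can't init the scws!\n");
        exit(-1);
    }
    scws_set_charset(s, "utf8");
    scws_set_dict(s, SCWS_PREFIX "/etc/dict.utf8.xdb", SCWS_XDICT_XDB);
    scws_set_rule(s, SCWS_PREFIX "/etc/rules.utf8.ini");
    scws_send_text(s, text, strlen(text));

    char* text_seg = (char*)malloc(SEG_BUF_SIZE);
    if (text_seg == NULL)
    {
        scws_free(s);
        return NULL;
    }
    text_seg[0] = '\0';

    while ((res = cur = scws_get_result(s)))
    {
        while (cur != NULL)
        {
            /* off and len are byte offsets into text; skip any word
               that would overflow the fixed-size buffer */
            if (strlen(text_seg) + cur->len + 2 <= SEG_BUF_SIZE)
            {
                strncat(text_seg, text + cur->off, cur->len);
                strcat(text_seg, " ");
            }
            cur = cur->next;
        }
        scws_free_result(res);  /* release each batch of results */
    }
    scws_free(s);
    return text_seg;
}

int main(void)
{
    char* res = SCWS("这是一个句子");

    if (res != NULL)
    {
        printf("%s\n", res);
        free(res);
    }
    return 0;
}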
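
The demo collects the words into a buffer of a fixed 1024 bytes, so long input either has to be cut off or needs a buffer that grows. The following is a minimal sketch of the second option. The names strbuf_t and strbuf_append_word are hypothetical and not part of the SCWS API; the sketch only assumes, as the demo does, that each word is given as a pointer into the original text plus a length in bytes.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string buffer; not part of the SCWS API. */
typedef struct {
    char  *data;  /* NUL-terminated contents, or NULL before the first append */
    size_t len;   /* bytes used, excluding the terminator */
    size_t cap;   /* bytes allocated */
} strbuf_t;

/* Appends n bytes from src followed by one separator byte.
   Returns 0 on success, -1 if memory could not be allocated
   (the buffer is left unchanged in that case). */
int strbuf_append_word(strbuf_t *b, const char *src, size_t n, char sep)
{
    size_t need = b->len + n + 2;   /* word + separator + terminator */

    if (need > b->cap) {
        size_t cap = b->cap ? b->cap : 64;
        char *p;

        while (cap < need)
            cap *= 2;               /* double until the word fits */
        p = realloc(b->data, cap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = cap;
    }
    memcpy(b->data + b->len, src, n);
    b->len += n;
    b->data[b->len++] = sep;
    b->data[b->len] = '\0';
    return 0;
}
```

In the demo's inner loop, a call such as strbuf_append_word(&buf, text + cur->off, cur->len, ' ') on a buffer initialized with strbuf_t buf = {NULL, 0, 0}; could stand in for the strncat/strcat pair, and buf.data would then be returned to the caller and released with free().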

PS:

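1. Remember to switch to root before compiling; otherwise the segmentation will misbehave. Don't forget this!

2. To build the executable, compile with gcc -o test -I/usr/local/scws/include -L/usr/local/scws/lib test.c -lscws -Wl,--rpath -Wl,/usr/local/scws/lib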