
ZH奶酪: How to Call the LTP Language Cloud Natural Language Processing Service

2015-06-18 10:28

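Preface

The LTP Language Cloud platform:

Does not support offline use;

Supports word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, and semantic role labeling;

Does not support custom dictionaries, but you can first segment the text with another tool that does (for example NLPIR from the Chinese Academy of Sciences) and then have LTP annotate the segmented text;

Can be called from C#, Go, Java, JavaScript, Nodejs, PHP, Python, R, Ruby, and other languages;

Its documentation also covers error responses, rate limits, and important notes (I have not needed any of these so far).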
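Because the service applies rate limits, a long-running job may occasionally have a request refused. The following is a minimal sketch of retrying with exponential backoff, written for Python 3. It assumes a refused request surfaces as an exception from your request function; `call_with_retry` and its parameters are hypothetical names, and the specific error codes LTP returns are not modeled here.

```python
import time


def call_with_retry(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func() until it succeeds or `attempts` tries have been used.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between tries.
    Only OSError is treated as retryable; in Python 3 that covers
    urllib.error.URLError and urllib.error.HTTPError.
    """
    for attempt in range(attempts):
        try:
            return func()
        except OSError:
            if attempt == attempts - 1:
                raise  # out of tries: let the caller see the error
            sleep(base_delay * (2 ** attempt))
```

The `sleep` argument exists only so the waiting can be replaced in tests; in normal use, leave it at its default.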

Main Text

Official site: http://www.ltp-cloud.com/

Documentation: http://www.ltp-cloud.com/document/

Online demo: http://www.ltp-cloud.com/demo/

Calling examples for each language can be downloaded from Github: https://github.com/HIT-SCIR/ltp-cloud-api-tutorial

For example, the Python version: https://github.com/HIT-SCIR/ltp-cloud-api-tutorial/tree/master/Python

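Step 1: Register

Apply for an API key on the LTP Language Cloud site; you will need it later.

Step 2: A simple example (Python version)

(1) Copy the code: copy a sample from Github (which one depends on your language and the features you need).

(2) Modify the code:

　　<1> In api_key = "YourApiKey", replace "YourApiKey" with the API key you obtained in Step 1;

　　<2> Replace text = "我爱北京天安门" with the text you want to process;

　　<3> Set the parameters according to your needs (in practice the four parameters api_key, text, pattern, and format are enough; pay particular attention to pattern, which selects the analysis to run):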

#!/usr/bin/env python
# -*- coding: utf-8 -*-

# This example shows how to use Python 2 to access the LTP API to perform
# full-stack Chinese text analysis, including word segmentation, POS tagging,
# dependency parsing, named entity recognition and semantic role labeling,
# and get the result in the specified format.

import urllib2, urllib
import sys

if __name__ == '__main__':
    # The output format is taken from the first command-line argument.
    if len(sys.argv) < 2 or sys.argv[1] not in ["xml", "json", "conll"]:
        print >> sys.stderr, "usage: %s [xml/json/conll]" % sys.argv[0]
        sys.exit(1)

    uri_base = "http://ltpapi.voicecloud.cn/analysis/?"
    api_key  = "YourApiKey"
    text     = "我爱北京天安门"
    # Note that if your text contains special characters such as linefeed or
    # '&', you need to URL-encode it
    text     = urllib.quote(text)
    format   = sys.argv[1]
    pattern  = "all"

    url      = (uri_base
                + "api_key=" + api_key + "&"
                + "text="    + text    + "&"
                + "format="  + format  + "&"
                + "pattern=" + pattern)

    try:
        response = urllib2.urlopen(url)
        content  = response.read().strip()
        print content
    except urllib2.HTTPError, e:
        print >> sys.stderr, e.reason
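The listing above targets Python 2 and builds the query string by hand. As a rough Python 3 sketch of the same step, the four parameters can be handed to `urllib.parse.urlencode`, which percent-encodes every value. `build_url` and `KNOWN_FORMATS` are hypothetical names introduced for illustration; the endpoint is copied from the listing, and the format list contains only the values that appear in this article, so the service may accept others.

```python
from urllib.parse import quote, urlencode

# Endpoint copied from the listing above.
URI_BASE = "http://ltpapi.voicecloud.cn/analysis/?"

# Formats that appear in this article; an assumption, not the full list.
KNOWN_FORMATS = ("xml", "json", "conll", "plain")


def build_url(api_key, text, pattern="all", fmt="json"):
    """Return the request URL for one piece of text (hypothetical helper)."""
    if fmt not in KNOWN_FORMATS:
        raise ValueError("unexpected format: %r" % fmt)
    params = [
        ("api_key", api_key),
        ("text", text),
        ("format", fmt),
        ("pattern", pattern),
    ]
    # quote_via=quote encodes a space as %20, like urllib.quote in Python 2;
    # '&', line feeds and Chinese characters are percent-encoded as UTF-8.
    return URI_BASE + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    print(build_url("YourApiKey", "我爱北京天安门", pattern="pos", fmt="plain"))
```

The resulting string can be passed to `urllib.request.urlopen` in place of the hand-built `url`.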

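Step 3: Run

Run the script, passing the output format (xml, json, or conll) as its only argument.

To process txt or xml files in bulk, you need to write the batch code yourself. Below is the script I used in an earlier project to process a txt file line by line (it only adds a loop and an output file):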

#!/usr/bin/env python
# -*- coding: utf-8 -*-

# Batch version of the example above (Python 2): send every line of a text
# file to the LTP API for POS tagging and write each line, followed by its
# plain-format result, to an output file.

import urllib2, urllib
import sys

if __name__ == '__main__':
    uri_base = "http://ltpapi.voicecloud.cn/analysis/?"
    # Key is masked here; use your own API key.
    api_key  = "7132G4z1HE3S********DSxtNcmA1jScSE5XumAI"
    format   = "plain"
    pattern  = "pos"

    f = open("E:\\PyProj\\Others\\rite_sentence.txt")
    fw = open("E:\\PyProj\\Others\\rite_pos.txt", 'w')

    line = f.readline()
    while line:
        # Note that if your text contains special characters such as linefeed
        # or '&', you need to URL-encode it
        text = urllib.quote(line)

        url = (uri_base
               + "api_key=" + api_key + "&"
               + "text="    + text    + "&"
               + "format="  + format  + "&"
               + "pattern=" + pattern)

        try:
            response = urllib2.urlopen(url)
            content  = response.read().strip()
            print content
            fw.write(line + content + '\n')
        except urllib2.HTTPError, e:
            print >> sys.stderr, e.reason
        # Advance to the next line whether or not the request succeeded.
        line = f.readline()

    fw.close()
    f.close()
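The same loop can be sketched in Python 3 with the network call separated from the file handling, so the loop can be tried out without contacting the service. `process_file` and its `analyze` argument are hypothetical names: `analyze` stands for whatever function you write to send one string to LTP and return the reply as text. Unlike the script above, this sketch skips blank lines and records a failure in the output file instead of dropping the line.

```python
def process_file(in_path, out_path, analyze):
    """Run `analyze` on every non-empty line of in_path; write to out_path.

    `analyze` is any callable that takes a text string and returns the
    service's reply as a string (hypothetical; supply your own HTTP call).
    Returns the number of lines processed.
    """
    count = 0
    with open(in_path, encoding="utf-8") as src, \
         open(out_path, "w", encoding="utf-8") as dst:
        for raw in src:
            line = raw.strip()
            if not line:
                continue  # do not spend a request on a blank line
            try:
                result = analyze(line)
            except OSError as err:
                # urllib.error.URLError and HTTPError are OSError subclasses
                result = "ERROR: %s" % err
            # Write the input line, then its result on the following line.
            dst.write(line + "\n" + result + "\n")
            count += 1
    return count
```

Opening both files in a `with` statement ensures they are closed even if an unexpected exception interrupts the loop.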
