
Natural Language Processing With Python (2)

2014-08-26 11:34
Chapter 3:

This chapter covers the skills needed to process raw text.

Some important points:

1. Accessing text from the web and from disk: APIs such as urlopen(), open(), read(), and write(), plus some basic string operations. There are also tools for extracting text from HTML.
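As a rough illustration of point 1, here is a minimal sketch that uses only the Python 3 standard library. The names fetch_text, TextExtractor and strip_html, as well as the sample file and sample HTML, are made up for this example and are not from the book. fetch_text is defined but never called, because it needs a network connection.

```python
import os
import tempfile
from html.parser import HTMLParser
from urllib.request import urlopen


def fetch_text(url, encoding="utf-8"):
    """Download a URL and decode the bytes to str (hypothetical helper, needs network)."""
    with urlopen(url) as response:
        return response.read().decode(encoding)


class TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML document, dropping the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_html(html):
    """Return the visible text of an HTML string with whitespace normalized."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


# Write a small file to a temporary directory, then read it back.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("Raw text  from disk.\n")
with open(path, encoding="utf-8") as f:
    raw = f.read()

# Basic string operations: trim, lowercase, split on whitespace.
tokens = raw.strip().lower().split()

plain = strip_html("<html><body><h1>Title</h1><p>Some <b>bold</b> text.</p></body></html>")
print(tokens)
print(plain)
```

Joining text nodes with a space is a simplification: it would split a word that is only partly wrapped in a tag, so a real HTML cleaner needs more care.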

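2. Text processing with Unicode: text is decoded when it is read from a file or terminal (which uses a specific encoding), handled as Unicode inside the Python program, and encoded again when it is written back to a file or terminal (again in a specific encoding).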
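The decode, process, encode pipeline in point 2 can be sketched like this in Python 3. The Polish sample string and the choice of ISO-8859-2 and UTF-8 are assumptions made just for this example.

```python
original = "Zażółć gęślą jaźń"

# Pretend these bytes came from a file saved in ISO-8859-2 (Latin-2).
latin2_bytes = original.encode("iso-8859-2")

# Step 1: decode the external bytes into an in-memory Unicode string.
text = latin2_bytes.decode("iso-8859-2")

# Step 2: process the text as Unicode.
upper = text.upper()

# Step 3: encode again for output, here in a different encoding.
utf8_bytes = upper.encode("utf-8")

print(text == original)
print(len(latin2_bytes), len(utf8_bytes))
```

The same characters take a different number of bytes in each encoding, which is why the program should work on decoded strings and only deal with bytes at the boundaries.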

3. Regular expressions: searching, finding all matches, replacing, and splitting with re.search(), re.findall(), re.sub(), re.split(), and so on (remember to prefix the pattern with the r character so that it is treated as a raw string).

Another API in NLTK is nltk.regexp_tokenize(), which is similar to re.findall() but is meant for tokenization.

Regular expressions are also useful for finding word stems and searching tokenized text.
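The regular expression calls from point 3 can be tried out with a short sketch that uses only the standard re module. The sample sentence, the suffix list and the stem helper are assumptions for illustration; the suffix-stripping pattern is a deliberately naive stemmer, not what NLTK's own stemmers do.

```python
import re

text = "The cats were running and jumped over 3 fences."

# re.search: is there a digit anywhere in the text?
has_digit = re.search(r"\d", text) is not None

# re.findall: all runs of lowercase letters (a crude tokenizer).
words = re.findall(r"[a-z]+", text.lower())

# re.sub: collapse any run of whitespace into a single space.
squeezed = re.sub(r"\s+", " ", "a   b\n c")

# re.split: split on runs of non-word characters.
parts = re.split(r"\W+", "one, two;three")


def stem(word):
    """Strip one common suffix with a regular expression (naive, hypothetical helper)."""
    match = re.match(r"^(.*?)(ing|ly|ed|es|s|ment)?$", word)
    return match.group(1)


print(has_digit, words, squeezed, parts)
print(stem("processing"), stem("walked"))
```

The non-greedy (.*?) is what lets the optional suffix group capture as much as it can; with a greedy (.*) the suffix would never be stripped.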

4. Normalizing Text and Segmentation: Stemmers, Lemmatization, Sentence Segmentation, Word Segmentation.
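To make the two kinds of segmentation in point 4 more concrete, here is a small pure-Python sketch. Word boundaries are represented as a string of 0s and 1s, where a 1 at position i means a word ends after character i. The sentence splitter is a naive regular expression that I assume is good enough for simple text; it breaks on abbreviations such as "U.S.A.". The sample strings and function names are made up for this example. In practice NLTK ships ready-made tools for these tasks, for example the Porter stemmer, the WordNet lemmatizer and the Punkt sentence tokenizer.

```python
import re


def segment(text, segs):
    """Split text into words using a boundary string of '0' and '1' characters."""
    words = []
    last = 0
    for i, flag in enumerate(segs):
        if flag == "1":
            # A '1' at position i means a word ends after character i.
            words.append(text[last:i + 1])
            last = i + 1
    words.append(text[last:])
    return words


def split_sentences(text):
    """Naive sentence segmentation: split on whitespace that follows . ! or ?"""
    return re.split(r"(?<=[.!?])\s+", text.strip())


words = segment("doyouseethekitty", "010010010010000")
sentences = split_sentences("Hello world. How are you? Fine!")
print(words)
print(sentences)
```

The boundary string is one character shorter than the text, because the end of the last word is implied.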