Natural Language Processing With Python (2)
2014-08-26 11:34
Chapter 3:
This chapter covers the skills needed to process raw text.
Some important points:
1. Accessing text from the web and from disk: APIs such as urlopen(), open(), read() and write(), plus basic string operations. There are also tools for processing HTML text.
2. Text processing with Unicode: file/terminal (specific encoding) -> in-memory program, including Python processing (Unicode) -> file/terminal (specific encoding). Text is decoded when it is read in and encoded again when it is written out.
3. Regular expressions: re.search(), re.findall(), re.sub() for replacement, re.split() and so on (remember to add the r prefix so the pattern is treated as a raw string).
Another API in NLTK is nltk.regexp_tokenize(), which is similar to re.findall().
Regular expressions are useful for finding word stems and searching tokenized text.
4. Normalizing text and segmentation: stemmers, lemmatization, sentence segmentation, word segmentation.
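The decode/process/encode pipeline from point 2 can be sketched with the standard library alone. This is only an illustration in Python 3: the sample sentence, the temporary file name and the choice of ISO-8859-2 are assumptions made for the example, not something taken from the book.

```python
import os
import tempfile

# In memory, Python 3 strings are Unicode.
text = "Zażółć gęślą jaźń"  # assumed Polish sample text

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name, used only for this sketch.
    path = os.path.join(tmp, "sample_latin2.txt")

    # Encode on the way out: Unicode str -> ISO-8859-2 bytes on disk.
    with open(path, "w", encoding="iso-8859-2") as f:
        f.write(text)

    # The file itself holds encoded bytes, not Unicode.
    with open(path, "rb") as f:
        raw = f.read()

    # Decode on the way in: ISO-8859-2 bytes -> Unicode str.
    with open(path, encoding="iso-8859-2") as f:
        decoded = f.read()

# Process as Unicode, then encode again for a different target encoding.
upper = decoded.upper()
utf8_bytes = upper.encode("utf-8")
```

Reading the file with the wrong encoding would either raise an error or silently produce the wrong characters, which is why the encoding is stated explicitly at both boundaries.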
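The regular expression calls from point 3 can be tried out with Python's re module. The sketch below uses only the standard library; the sample sentence and the stem() helper are made up for illustration, and the suffix list is a deliberately crude stand-in for a real stemmer.

```python
import re

raw = "The cats were running quickly; 3 dogs barked loudly."

# re.search: is there a word ending in "ing"? Note the r prefix on the pattern.
has_ing = re.search(r"ing\b", raw) is not None

# re.findall: a very simple tokenizer that keeps runs of word characters.
tokens = re.findall(r"\w+", raw)

# re.split: split on non-word characters; the final "." leaves an empty string.
pieces = re.split(r"\W+", raw)

# re.sub: replace every run of digits with a placeholder.
no_digits = re.sub(r"\d+", "NUM", raw)


def stem(word):
    """Strip one common suffix with a regular expression (hypothetical helper)."""
    # The non-greedy (.*?) lets the optional suffix group claim the ending.
    stem_part, _suffix = re.findall(r"^(.*?)(ing|ly|ed|ies|es|s)?$", word)[0]
    return stem_part


stems = [stem(t) for t in tokens]
```

A regex stemmer like this makes obvious mistakes ("running" becomes "runn"), which is one reason to prefer a proper stemmer or lemmatizer for the normalization in point 4.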