NLTK
How to use the NLTK library
- Installation
pip install nltk
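- Download the required data
- On first use, you need to download NLTK's corpora and other data resources. Run the following code in a Python script or an interactive session: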
import nltk
nltk.download()
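- This opens a downloader window where you can choose the data to download, such as punkt (the tokenizer models used for sentence and word splitting) and averaged_perceptron_tagger (the part-of-speech tagger).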
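In a script or on a machine without a display, the downloader window is not practical. A minimal sketch of a non-interactive alternative follows, assuming the two resource names mentioned above. The helper `download_resources` and its `downloader` parameter are hypothetical names introduced here for illustration; `nltk.download` itself accepts a resource identifier and a `quiet` flag and returns a success value.

```python
import nltk

# Resource names taken from the text above. Newer NLTK releases may expect
# differently named packages (for example "punkt_tab"), so adjust as needed.
RESOURCES = ["punkt", "averaged_perceptron_tagger"]


def download_resources(names, downloader=nltk.download):
    """Download each named resource and return {name: success_flag}.

    `downloader` is a hypothetical hook for this sketch; it defaults to
    nltk.download and can be swapped out, e.g. for offline testing.
    """
    results = {}
    for name in names:
        # quiet=True suppresses the progress output of nltk.download
        results[name] = bool(downloader(name, quiet=True))
    return results
```

Calling `download_resources(RESOURCES)` once before the examples below should fetch the data over the network; a `False` value in the result indicates that a resource could not be downloaded.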
III. Code examples
1. Sentence and word tokenization
import nltk
text = "Natural Language Processing is an interesting field. It has many applications."
# Split the text into sentences
sentences = nltk.sent_tokenize(text)
print("Sentences:")
for sentence in sentences:
    print(sentence)
# Split each sentence into word tokens
words = []
for sentence in sentences:
    word_tokens = nltk.word_tokenize(sentence)
    words.extend(word_tokens)
print("\nWords:")
for word in words:
    print(word)
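The word tokenizer above keeps punctuation marks such as "." as separate tokens. When only the words are wanted, one option is NLTK's `RegexpTokenizer`, which needs no downloaded data. This is a sketch of an alternative, not the method used in the example above, and the pattern `\w+` is an assumption about what counts as a word.

```python
from nltk.tokenize import RegexpTokenizer

text = "Natural Language Processing is an interesting field. It has many applications."

# Keep only runs of word characters, so punctuation is dropped entirely
word_only_tokenizer = RegexpTokenizer(r"\w+")
tokens = word_only_tokenizer.tokenize(text)

print(tokens)
```

The trade-off is that a plain regular expression also splits contractions and hyphenated words at the punctuation, which `nltk.word_tokenize` handles more carefully.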
2. Part-of-speech tagging
import nltk
text = "I love apples. They are delicious."
# Tokenize first; pos_tag expects a list of tokens
words = nltk.word_tokenize(text)
# Each result is a (word, tag) pair
tagged_words = nltk.pos_tag(words)
print("Tagged words:")
for word, tag in tagged_words:
    print(word, "-", tag)
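Because the tagger returns a list of `(word, tag)` pairs, the result can be summarized with ordinary Python tools. The sketch below counts how often each tag occurs. The list `sample_tagged` is hand-written in the shape that `nltk.pos_tag` returns so that the snippet runs without the tagger data; the tags a real run produces may differ. The helper name `count_tags` is hypothetical.

```python
from collections import Counter


def count_tags(tagged_words):
    """Count occurrences of each part-of-speech tag in (word, tag) pairs."""
    return Counter(tag for _word, tag in tagged_words)


# Hand-written sample for illustration only, not actual tagger output
sample_tagged = [
    ("I", "PRP"), ("love", "VBP"), ("apples", "NNS"), (".", "."),
    ("They", "PRP"), ("are", "VBP"), ("delicious", "JJ"), (".", "."),
]

tag_counts = count_tags(sample_tagged)
for tag, count in tag_counts.most_common():
    print(tag, count)
```

In practice, `tagged_words` from the example above can be passed to `count_tags` directly.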
3. Named entity recognition
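# ne_chunk needs the NE chunker model and the words corpus in addition to
# the tokenizer and tagger data (e.g. "maxent_ne_chunker" and "words")
import nltk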
text = "Apple Inc. is headquartered in Cupertino, California."
words = nltk.word_tokenize(text)
tagged_words = nltk.pos_tag(words)
named_entities = nltk.ne_chunk(tagged_words)
print("Named entities:")
print(named_entities)
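`nltk.ne_chunk` returns an `nltk.Tree` in which each recognized entity is a nested subtree and every other token stays a plain `(word, tag)` tuple. A minimal sketch for pulling the entities out of such a tree follows. The helper `extract_entities` is a hypothetical name, and `sample_tree` is built by hand so that the snippet runs without the chunker model; the grouping and labels of a real `ne_chunk` result may differ.

```python
from nltk import Tree


def extract_entities(tree):
    """Return (entity_text, label) pairs from an ne_chunk-style tree."""
    entities = []
    for node in tree:
        # Entities are nested Tree nodes; ordinary tokens are (word, tag) tuples
        if isinstance(node, Tree):
            entity_text = " ".join(word for word, _tag in node.leaves())
            entities.append((entity_text, node.label()))
    return entities


# Hand-built tree for illustration only, not actual ne_chunk output
sample_tree = Tree("S", [
    Tree("ORGANIZATION", [("Apple", "NNP"), ("Inc.", "NNP")]),
    ("is", "VBZ"),
    ("headquartered", "VBN"),
    ("in", "IN"),
    Tree("GPE", [("Cupertino", "NNP")]),
    (",", ","),
    Tree("GPE", [("California", "NNP")]),
    (".", "."),
])

print(extract_entities(sample_tree))
```

With the real pipeline, the `named_entities` tree from the example above can be passed to `extract_entities` in place of `sample_tree`.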
4. Stemming
from nltk.stem import PorterStemmer
# The Porter stemmer is rule-based and needs no downloaded data
ps = PorterStemmer()
words = ["running", "runs", "ran", "easily", "fairly"]
for word in words:
    stem = ps.stem(word)
    print(word, "->", stem)
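A common use of stemming is to treat different inflections of a word as the same term. The sketch below groups words by their Porter stem; the helper name `group_by_stem` is hypothetical. Because the stemmer only strips suffixes by rule, an irregular form such as "ran" is not mapped to "run" and ends up in its own group.

```python
from collections import defaultdict

from nltk.stem import PorterStemmer


def group_by_stem(words):
    """Map each Porter stem to the list of input words that share it."""
    stemmer = PorterStemmer()
    groups = defaultdict(list)
    for word in words:
        groups[stemmer.stem(word)].append(word)
    return dict(groups)


groups = group_by_stem(["running", "runs", "ran"])
print(groups)
```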
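Copyright notice: this article, titled "NLTK", was contributed voluntarily by 林淑君副主任 and reflects only the author's own views. To repost it, contact the author and cite the source: http://www.xiehuijuan.com/baike/1754330379a1701338.html. This site only provides information storage, holds no ownership, and assumes no related legal liability. Content found to involve plagiarism, infringement, or violations of law will be removed once verified.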