How to stem or lemmatize words using NLTK?

To stem or lemmatize words using NLTK in Python, you can follow these steps:

Install the NLTK library if it is not already installed on your system.

pip install nltk

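Import NLTK and download the data that the later steps rely on: the WordNet corpus used by the lemmatizer and the Punkt tokenizer models used by word_tokenize.

import nltk
nltk.download('wordnet')  # WordNet data for WordNetLemmatizer
nltk.download('punkt')    # tokenizer models for word_tokenize (recent NLTK releases use 'punkt_tab')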

Initialize the stemmer or lemmatizer object. NLTK provides several stemmers, such as the Porter, Snowball, and Lancaster stemmers, and a lemmatizer based on WordNet (WordNetLemmatizer).

from nltk.stem import PorterStemmer, WordNetLemmatizer
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
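The stemmers differ in how aggressively they cut words down. As a rough comparison, the sketch below runs the same words through the three rule-based stemmers in nltk.stem; none of them needs a data download. The sample words are arbitrary choices for illustration.

```python
from nltk.stem import LancasterStemmer, PorterStemmer, SnowballStemmer

# Three rule-based stemmers that ship with NLTK.
stemmers = {
    "porter": PorterStemmer(),
    "snowball": SnowballStemmer("english"),  # the English Snowball ("Porter2") algorithm
    "lancaster": LancasterStemmer(),         # generally the most aggressive of the three
}

for word in ["generously", "running", "studies"]:
    # Print each word followed by the stem produced by every stemmer.
    print(word, {name: s.stem(word) for name, s in stemmers.items()})
```

For example, the Porter stemmer reduces "generously" to "gener", while the Snowball stemmer keeps "generous".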

Tokenize your text into words and apply the stemmer or lemmatizer to each word using a list comprehension.

from nltk.tokenize import word_tokenize
text = "Stemming and lemmatization are important techniques in natural language processing"
words = word_tokenize(text)
stemmed_words = [stemmer.stem(word) for word in words]
lemmatized_words = [lemmatizer.lemmatize(word) for word in words]

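Here, stemmed_words contains the stem of each token and lemmatized_words contains its lemma. Note that lemmatize() treats every word as a noun unless you pass a pos argument, so verb forms such as "are" come back unchanged.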

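Alternatively, you can use standalone third-party packages instead of NLTK. The stemming package's porter2 module stems one word at a time, so split the text into words first. The lemmatization import below assumes a third-party package that exposes a lemmatize() function; it is not part of NLTK.

from stemming.porter2 import stem  # third-party 'stemming' package (pip install stemming)
text = 'stemming and lemmatization are important techniques in natural language processing'
stemmed_text = ' '.join(stem(word) for word in text.split())  # porter2.stem() takes a single word
print(stemmed_text)

from lemmatization.lemmatize import lemmatize  # assumed third-party package, not part of NLTK
lemmatized_text = lemmatize(text)
print(lemmatized_text)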

Either way, stemmed_words and lemmatized_words (or stemmed_text and lemmatized_text) hold the original words with stemming or lemmatization applied.
