
How to stem or lemmatize words using NLTK?

Updated: August 22, 2023

To stem or lemmatize words using NLTK in Python, you can follow these steps:

  1. Install the NLTK library if it isn't already installed on your system.
pip install nltk
  2. Import NLTK and download the data the later steps need: the Punkt tokenizer models used by word_tokenize and the WordNet corpus used by WordNetLemmatizer.
import nltk
nltk.download('punkt')    # newer NLTK releases look for 'punkt_tab' instead
nltk.download('wordnet')
  3. Create the stemmer and lemmatizer objects. NLTK offers several stemmers, including PorterStemmer, LancasterStemmer and SnowballStemmer, as well as the WordNet-based WordNetLemmatizer; this example uses PorterStemmer and WordNetLemmatizer.
from nltk.stem import PorterStemmer, WordNetLemmatizer
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
  4. Tokenize the text into words, then apply the stemmer or lemmatizer to each word with a list comprehension.
from nltk.tokenize import word_tokenize
text = "Stemming and lemmatization are important techniques in natural language processing"
words = word_tokenize(text)
stemmed_words = [stemmer.stem(word) for word in words]
lemmatized_words = [lemmatizer.lemmatize(word) for word in words]

After this code runs, stemmed_words holds the stem of each token and lemmatized_words holds each token's lemma (for example, "techniques" becomes "technique").

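Alternatively, you can use standalone stemming or lemmatization packages instead of NLTK. For example, the third-party stemming package provides a Porter2 implementation whose stem() function takes a single word, so split the text into words before stemming it.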

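from stemming.porter2 import stem
text = 'stemming and lemmatization are important techniques in natural language processing'
# porter2.stem() handles one word at a time, so stem each word and rejoin them
print(' '.join(stem(word) for word in text.split()))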
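The Porter2 algorithm is also built into NLTK as SnowballStemmer("english"), so an extra dependency is not strictly required for it. As a rough side-by-side comparison, the sketch below runs NLTK's PorterStemmer, LancasterStemmer and SnowballStemmer over the same words; the word list and the compare_stemmers name are illustrative only. Stemmers are rule-based, so no corpus download is needed.

```python
from nltk.stem import LancasterStemmer, PorterStemmer, SnowballStemmer

# Arbitrary sample words, chosen only to show how the stemmers differ
WORDS = ["running", "generously", "studies", "lemmatization", "processing"]


def compare_stemmers(words):
    """Return {word: (porter, lancaster, snowball)} for each input word."""
    stemmers = (
        PorterStemmer(),
        LancasterStemmer(),          # generally regarded as the most aggressive
        SnowballStemmer("english"),  # the English Snowball stemmer, i.e. Porter2
    )
    return {w: tuple(s.stem(w) for s in stemmers) for w in words}


if __name__ == "__main__":
    for word, (porter, lancaster, snowball) in compare_stemmers(WORDS).items():
        print(f"{word:15} porter={porter:12} lancaster={lancaster:12} snowball={snowball}")
```

Printing the three results next to each other is a quick way to judge which stemmer's trade-off between over- and under-stemming suits your data.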

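# Assumes a third-party module named 'lemmatization' that exposes lemmatize();
# check that package's documentation for its exact API before relying on this
from lemmatization.lemmatize import lemmatize
print(lemmatize('stemming and lemmatization are important techniques in natural language processing'))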

Either way, the result is the input text with each word reduced to its stem or lemma: lists of tokens (stemmed_words and lemmatized_words) in the NLTK version, or printed strings in the alternatives.
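If you want the NLTK route to return a single string rather than a list of tokens, one option is to tokenize, transform each token, and join the results. A minimal sketch: stem_text and lemmatize_text are hypothetical names, and it swaps word_tokenize for NLTK's RegexpTokenizer with a simple \w+ pattern (which drops punctuation) so that no tokenizer models need downloading. The lemmatizer still needs the WordNet data.

```python
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.tokenize import RegexpTokenizer

# Word characters only: punctuation is discarded, which keeps the joined output simple
_tokenizer = RegexpTokenizer(r"\w+")
_stemmer = PorterStemmer()
_lemmatizer = WordNetLemmatizer()


def stem_text(text: str) -> str:
    """Stem every word in text and join the stems with single spaces."""
    return " ".join(_stemmer.stem(tok) for tok in _tokenizer.tokenize(text))


def lemmatize_text(text: str, pos: str = "n") -> str:
    """Lowercase and lemmatize every word as the same part of speech, then join.

    Requires the WordNet corpus: nltk.download('wordnet').
    """
    return " ".join(
        _lemmatizer.lemmatize(tok.lower(), pos=pos) for tok in _tokenizer.tokenize(text)
    )


if __name__ == "__main__":
    text = "Stemming and lemmatization are important techniques in natural language processing"
    stemmed_text = stem_text(text)
    print(stemmed_text)
```

Because lemmatize_text applies one pos value to every word, it is only a rough tool; for mixed text you would pass a per-word pos instead.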

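Blog author

Passionate about technology, enjoys sharing knowledge, and keeps learning. Focuses on web development, system architecture design, and artificial intelligence.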