To convert categorical data to numerical data in scikit-learn, you can use several techniques, including Label Encoding and One-Hot Encoding.

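1. Label Encoding: Label Encoding is a technique that assigns a unique integer to each category. Scikit-learn provides the `LabelEncoder` class for this purpose. Note that the scikit-learn documentation describes `LabelEncoder` as intended for target labels (`y`); for input features, `OrdinalEncoder` offers the equivalent integer encoding. Here's an example: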


    from sklearn.preprocessing import LabelEncoder
    import numpy as np

    # Create sample data
    data = np.array(['apple', 'banana', 'pear', 'pear', 'banana'])

    # Label encode the data
    encoder = LabelEncoder()
    encoded_data = encoder.fit_transform(data)

    print(encoded_data)

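In this example, `LabelEncoder` converts the categories to integers. The `fit_transform` method learns the unique categories from the data and replaces each one with its integer code. Codes follow the sorted order of the categories, so `apple`, `banana`, and `pear` become 0, 1, and 2, and the printed result is `[0 1 2 2 1]`.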
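To make the category-to-integer mapping explicit, the following minimal sketch inspects the fitted encoder. It reuses the sample data from the example above; the variable names `mapping` and `decoded_data` are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Same sample data as in the example above
data = np.array(["apple", "banana", "pear", "pear", "banana"])

encoder = LabelEncoder()
encoded_data = encoder.fit_transform(data)

# classes_ holds the sorted unique categories; each category's integer
# code is its index in this array.
mapping = {str(label): int(code) for code, label in enumerate(encoder.classes_)}
print(mapping)  # {'apple': 0, 'banana': 1, 'pear': 2}

# inverse_transform maps the integer codes back to the original labels.
decoded_data = encoder.inverse_transform(encoded_data)
print(decoded_data)
```

Keeping the fitted encoder around is what allows predictions expressed as integers to be translated back into readable category names later.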

2. One-Hot Encoding: One-Hot Encoding is a technique that creates a binary column for each category in the feature. Scikit-learn provides the `OneHotEncoder` class for this purpose. Here's an example:

    from sklearn.preprocessing import OneHotEncoder
    import numpy as np

    # Create sample data
    data = np.array([['red', 'S'], ['blue', 'M'], ['green', 'L'], ['blue', 'XL']])

    # One-hot encode the data
    encoder = OneHotEncoder(handle_unknown='ignore')
    encoded_data = encoder.fit_transform(data)

    print(encoded_data.toarray())



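In this example, `OneHotEncoder` creates one binary column per category in each input column, giving seven output columns here (three colors plus four sizes). With `handle_unknown='ignore'`, a category that was not seen during fitting is encoded as all zeros for that feature instead of raising an error. `fit_transform` returns a SciPy sparse matrix by default, and the `toarray()` method converts it to a dense NumPy array for printing.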
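The sketch below shows which categories the encoder learned and how `handle_unknown='ignore'` treats an unseen value. It reuses the sample data from the example above; the extra row containing `purple` is a made-up input chosen only to illustrate the behavior.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Same sample data as in the example above
data = np.array([["red", "S"], ["blue", "M"], ["green", "L"], ["blue", "XL"]])

encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(data)

# categories_ lists the sorted categories learned for each input column;
# the output columns follow this order.
for column_categories in encoder.categories_:
    print(list(column_categories))

# Hypothetical new row: 'purple' was never seen during fitting, so its
# three color columns are all zero; 'M' is encoded as usual.
new_row = np.array([["purple", "M"]])
encoded_row = encoder.transform(new_row).toarray()
print(encoded_row)  # [[0. 0. 0. 0. 1. 0. 0.]]
```

With the default `handle_unknown='error'`, the same `transform` call would raise an error instead, which can be preferable when unseen categories indicate a data problem.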

By using these techniques, you can convert categorical data to numerical data in scikit-learn and prepare it for use in machine learning models.
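As a rough illustration of that last step, the sketch below chains a `OneHotEncoder` with a classifier in a pipeline so that encoding and model fitting happen together. The labels in `y`, the choice of `LogisticRegression`, and the new rows passed to `predict` are assumptions made for the example and are not part of the original data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Categorical features from the one-hot example
X = np.array([["red", "S"], ["blue", "M"], ["green", "L"], ["blue", "XL"]])
# Hypothetical binary labels, invented purely for illustration
y = np.array([0, 1, 0, 1])

# The pipeline one-hot encodes the inputs, then fits the classifier on
# the encoded columns.
model = make_pipeline(
    OneHotEncoder(handle_unknown="ignore"),
    LogisticRegression(),
)
model.fit(X, y)

# New rows go through the same encoding before prediction; the unseen
# color 'purple' is ignored rather than causing an error.
predictions = model.predict(np.array([["blue", "M"], ["purple", "S"]]))
print(predictions)
```

Putting the encoder inside the pipeline means the categories are learned only from the data passed to `fit`, and the same encoding is applied consistently at prediction time.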
