How to split data into training and testing in Python

Updated: August 22, 2023

Reading time: 11.74 minutes

There are several ways to split data into training and testing sets in Python. A popular one is the train_test_split() function from the scikit-learn library. Here's an example that uses train_test_split() to split a dataset loaded from a CSV file:

from sklearn.model_selection import train_test_split
import pandas as pd

# Load the dataset into a pandas dataframe
df = pd.read_csv('data.csv')

# Split the dataset into features (X) and labels (y)
X = df.drop('label', axis=1)
y = df['label']

# Split the dataset into training and testing sets, with 80% of the data for training and 20% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In this example, train_test_split() splits the dataset so that 80% of the rows go to training and 20% to testing (test_size=0.2). The feature data (X) and label data (y) are passed as arguments, and the function returns four objects in order: X_train, X_test, y_train, and y_test. By default the rows are shuffled before splitting, so fixing random_state (here 42) seeds that shuffle and makes the split reproducible from run to run.
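To see the split sizes and the effect of random_state without needing a CSV file, here is a minimal sketch. The small DataFrame and its column names (feature_a, feature_b) are made up for illustration and stand in for data.csv; only the 'label' column name, test_size=0.2, and random_state=42 come from the example above.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for data.csv: 10 rows, two features, one label column
df = pd.DataFrame({
    "feature_a": range(10),
    "feature_b": [v * 0.5 for v in range(10)],
    "label": [0, 1] * 5,
})

# Separate features (X) from labels (y), as in the example above
X = df.drop("label", axis=1)
y = df["label"]

# First split: 80% training, 20% testing, with a fixed seed
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Second split with the same seed, to check reproducibility
X_train_2, X_test_2, y_train_2, y_test_2 = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)          # (8, 2) (2, 2)
print(X_test.index.equals(X_test_2.index))  # True: same rows both times
```

With 10 rows, test_size=0.2 puts 2 rows in the test set and 8 in the training set. Because both calls use the same random_state, they select the same rows. Omitting random_state would generally give a different split on each run.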
