How to split data into training and testing in Python

Updated: August 22, 2023

There are several ways to split data into training and testing sets in Python. A popular one is the train_test_split() function from scikit-learn's model_selection module. Here's an example that uses train_test_split() to split a dataset into training and testing sets:

from sklearn.model_selection import train_test_split
import pandas as pd

# Load the dataset into a pandas dataframe
df = pd.read_csv('data.csv')

# Split the dataset into features (X) and labels (y)
X = df.drop('label', axis=1)
y = df['label']

# Split the dataset into training and testing sets, with 80% of the data for training and 20% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In this example, train_test_split() splits the dataset so that 80% of the rows go to training and 20% to testing (test_size=0.2). The features (X) and labels (y) are passed together, so each row of X stays aligned with its label in y, and the function returns four objects: X_train, X_test, y_train, and y_test. By default the rows are shuffled before splitting; setting random_state to a fixed integer makes that shuffle, and therefore the split, reproducible.
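
To check the behaviour described above without needing data.csv, here is a minimal sketch that runs on a small synthetic dataframe. The column names feature_a and feature_b, the 100-row size, and the 80/20 class balance are assumptions made up for illustration; they are not from the original dataset. The sketch also shows the stratify argument, an optional extra that the example above does not use, which keeps the class proportions the same in both sets.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for data.csv: 100 rows, two features,
# and an imbalanced binary label (80 zeros, 20 ones).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "feature_a": rng.normal(size=100),
    "feature_b": rng.normal(size=100),
    "label": [0] * 80 + [1] * 20,
})

X = df.drop("label", axis=1)
y = df["label"]

# Same call as in the article: 80% training, 20% testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)  # (80, 2) (20, 2)

# Repeating the call with the same random_state selects the same rows.
X_train2, X_test2, y_train2, y_test2 = train_test_split(
    X, y, test_size=0.2, random_state=42
)
same_split = X_test.index.equals(X_test2.index)
print(same_split)  # True

# Optional: stratify=y preserves the 80/20 class ratio in both sets.
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(ys_train.value_counts().to_dict())
print(ys_test.value_counts().to_dict())
```

Stratifying is mainly worth considering when the labels are imbalanced, since a purely random split can leave the test set with too few examples of the rarer class.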
