python · March 26, 2023

How to split data into training and testing in Python

Python offers several ways to split data into training and testing sets. A popular one is the train_test_split() function from the scikit-learn library. Here’s an example that uses it to split a dataset loaded from a CSV file:

from sklearn.model_selection import train_test_split
import pandas as pd

# Load the dataset into a pandas dataframe
df = pd.read_csv('data.csv')

# Split the dataset into features (X) and labels (y)
X = df.drop('label', axis=1)
y = df['label']

# Split the dataset into training and testing sets, with 80% of the data for training and 20% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In this example, train_test_split() splits the dataset so that 80% of the rows go to training and 20% to testing (test_size=0.2). The feature data (X) and label data (y) are passed as arguments, and the function returns four objects, stored in X_train, X_test, y_train, and y_test. By default the rows are shuffled before splitting, so setting random_state to a fixed integer (42 here) makes the split reproducible: the same rows land in each set on every run.
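To make the idea behind a shuffled, seeded split concrete, here is a minimal pure-Python sketch. It is a simplified illustration, not scikit-learn's actual implementation: the helper name simple_train_test_split, its seed parameter, and the toy data are all assumptions made up for this example.

```python
import math
import random


def simple_train_test_split(features, labels, test_size=0.2, seed=42):
    """Hypothetical helper: shuffle row indices with a fixed seed, then slice."""
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")

    n_rows = len(features)
    # Number of rows reserved for testing, rounded up
    n_test = math.ceil(n_rows * test_size)

    # Shuffle the indices rather than the data, so that each feature row
    # stays paired with its label
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    X_train = [features[i] for i in train_idx]
    X_test = [features[i] for i in test_idx]
    y_train = [labels[i] for i in train_idx]
    y_test = [labels[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Toy data (made up for illustration): 10 rows, label = parity of first feature
X = [[i, i * 2] for i in range(10)]
y = [i % 2 for i in range(10)]

X_train, X_test, y_train, y_test = simple_train_test_split(X, y, test_size=0.2, seed=42)
print(len(X_train), len(X_test))  # 8 2
```

Calling the helper again with the same seed returns the same split, which mirrors the role random_state plays in the example above. In practice, train_test_split() remains the better choice, since it also handles pandas dataframes, NumPy arrays, and options such as stratified splitting.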
