
How to remove duplicates in pandas?

Updated: August 22, 2023

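To remove duplicate rows from a pandas DataFrame, use the drop_duplicates() method. It returns a new DataFrame with the duplicate rows removed and leaves the original unchanged. By default it compares all columns and keeps the first occurrence of each duplicated row; pass a list of column names to compare only those columns. Here is an example: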

import pandas as pd

# Create a DataFrame with duplicate rows
df = pd.DataFrame({'col1': ['A', 'B', 'A'], 'col2': [1, 2, 1]})

# Remove duplicates based on col1 and col2 columns
df = df.drop_duplicates(subset=['col1', 'col2'])

# Print the new DataFrame
print(df)

In this code, we create a DataFrame df whose third row repeats the first, and then use drop_duplicates() to remove duplicates based on the col1 and col2 columns. The first occurrence of each row is kept, so the result contains only the two unique rows, ('A', 1) and ('B', 2).
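The choice of which duplicate survives can be controlled with the keep parameter. The following is a minimal sketch, not part of the original example: the data is hypothetical (col2 is changed so that the rows differ outside col1), and duplicates are judged on col1 only.

```python
import pandas as pd

# Hypothetical data: col1 repeats 'A', but col2 differs between those rows
df = pd.DataFrame({'col1': ['A', 'B', 'A'], 'col2': [1, 2, 3]})

# keep='first' (the default): keep the first row for each col1 value
first_kept = df.drop_duplicates(subset=['col1'])

# keep='last': keep the last row for each col1 value
last_kept = df.drop_duplicates(subset=['col1'], keep='last')

# keep=False: drop every row whose col1 value appears more than once
none_kept = df.drop_duplicates(subset=['col1'], keep=False)

print(first_kept)
print(last_kept)
print(none_kept)
```

Note that the surviving rows keep their original index labels, so the index may have gaps after deduplication.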

If you want to remove duplicates based on all columns, you can call drop_duplicates() without any arguments:

# Remove duplicates based on all columns
df = df.drop_duplicates()

# Print the new DataFrame
print(df)

Because col1 and col2 are the only columns in df, calling drop_duplicates() with no arguments compares the same columns as the first example and returns the same two rows.
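Before dropping anything, it can help to check which rows pandas considers duplicates. The sketch below uses hypothetical data and two things the examples above do not cover: the duplicated() method, which returns a boolean mask, and the ignore_index option of drop_duplicates(), which assumes pandas 1.0 or later.

```python
import pandas as pd

# Hypothetical data: the second row repeats the first
df = pd.DataFrame({'col1': ['A', 'A', 'B'], 'col2': [1, 1, 2]})

# True for every row that repeats an earlier row (all columns compared)
dup_mask = df.duplicated()
n_dups = int(dup_mask.sum())
print(f"{n_dups} duplicate row(s) found")

# Drop the duplicates and renumber the index from 0
# (ignore_index assumes pandas >= 1.0)
deduped = df.drop_duplicates(ignore_index=True)
print(deduped)
```

Without ignore_index=True, the result would keep the original index labels 0 and 2.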
