深度阅读

How to merge and join data using pandas

作者
作者
2023年08月22日
更新时间
18.75 分钟
阅读时间
0
阅读量

To merge and join data using pandas, you can use the merge() and join() functions. Here’s how:

  1. merge(): This function is used to merge two DataFrames based on a common column or index. It takes two DataFrames as input and returns a merged DataFrame. Here’s an example:
import pandas as pd

# Create two sample DataFrames
df1 = pd.DataFrame({'employee': ['John', 'Lisa', 'Mike', 'Mary'],
                    'department': ['Accounting', 'Engineering', 'Engineering', 'HR']})

df2 = pd.DataFrame({'employee': ['John', 'Lisa', 'Mike', 'Mark'],
                    'salary': [50000, 60000, 55000, 70000]})

# Merge the two DataFrames on the 'employee' column
merged_df = pd.merge(df1, df2, on='employee')

print(merged_df)

This will output a merged DataFrame with columns ‘employee’, ‘department’, and ‘salary’ based on the ‘employee’ column.

  1. join(): This function is used to join two DataFrames based on their index. It is similar to merge(), but instead of merging on a column, it merges on the index. Here’s an example:
import pandas as pd

# Create two sample DataFrames
df1 = pd.DataFrame({'department': ['Accounting', 'Engineering', 'HR']},
                    index=['John', 'Lisa', 'Mike'])

df2 = pd.DataFrame({'salary': [50000, 60000, 70000]},
                    index=['John', 'Lisa', 'Mark'])

# Join the two DataFrames on their index
joined_df = df1.join(df2)

print(joined_df)

This will output a joined DataFrame with columns ‘department’ and ‘salary’ based on the index. Note that there is no value for ‘salary’ for the ‘Mike’ row, since there is no corresponding index in df2.

Both merge() and join() functions have many options to customize the behavior of the operation, such as how to handle missing values or how to merge on multiple columns. You can refer to the pandas documentation for more information.

相关标签

博客作者

热爱技术,乐于分享,持续学习。专注于Web开发、系统架构设计和人工智能领域。