Merge Two Python Dataframes And Avoid Adding Same Match Twice Before Moving To The Next Row
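By default, pandas.merge(df1, df2, on='col_4') performs an inner join: it keeps every pairing of rows from the two frames whose values in the shared column match exactly. When a value repeats on both sides, each left row is therefore matched with every right row that has the same value.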
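As a rough illustration of why the plain merge repeats matches, here is a minimal sketch. The frames left and right are made-up sample data, not the frames from the question.

```python
import pandas as pd

# Hypothetical sample data: 'apple' appears twice on each side.
left = pd.DataFrame({"ID": [1, 2], "col_4": ["apple", "apple"]})
right = pd.DataFrame({"ID": [7, 8], "col_4": ["apple", "apple"]})

# Default how='inner': every left 'apple' row pairs with every right 'apple' row,
# so 2 left rows x 2 right rows give 4 result rows.
merged = pd.merge(left, right, on="col_4")
print(merged)
```

Each left ID ends up paired with both right IDs, which is the duplication the serial-number approach avoids.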
Solution 1:
You can add a serial number column, serial, that numbers the rows within each group of identical col_4 values, in both df1 and df2. Then merge on col_4 and this serial number, as follows:
We generate the serial number with .groupby() + cumcount():
df1['serial'] = df1.groupby('col_4').cumcount()
df2['serial'] = df2.groupby('col_4').cumcount()
df1.merge(df2, on=['col_4', 'serial'])
Result:
   ID_x  col_1_x  col_2_x  col_3_x  col_4  serial  ID_y  col_1_y  col_2_y  col_3_y
0     1        1        6       11  apple       0     1        8       12       12
1     2        2        7       12  apple       1     1        9       13       13
2     3        3        8       13  apple       2     3       10       15       14
3     5        4        9       14  apple       3     5       11       17       15
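To see what cumcount() contributes, here is a small self-contained sketch. The frames left and right and their values are hypothetical stand-ins, since the question's original data is not shown here.

```python
import pandas as pd

# Hypothetical sample data with repeated keys in a different order on each side.
left = pd.DataFrame({"ID": [1, 2, 3], "col_4": ["apple", "apple", "pear"]})
right = pd.DataFrame({"ID": [7, 8, 9], "col_4": ["apple", "pear", "apple"]})

# cumcount() numbers the rows 0, 1, 2, ... within each col_4 group, in row order.
left["serial"] = left.groupby("col_4").cumcount()
right["serial"] = right.groupby("col_4").cumcount()

# Merging on both keys pairs the n-th 'apple' on the left with the
# n-th 'apple' on the right, so no right row is matched twice.
paired = left.merge(right, on=["col_4", "serial"])
print(paired)
```

Because the pair (col_4, serial) is unique within each frame, the merge becomes one-to-one. With the default inner join, any surplus rows on either side are simply left out.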
Optionally, you can drop the serial column afterwards, as follows:
df1.merge(df2, on=['col_4', 'serial']).drop('serial', axis=1)
Result:
   ID_x  col_1_x  col_2_x  col_3_x  col_4  ID_y  col_1_y  col_2_y  col_3_y
0     1        1        6       11  apple     1        8       12       12
1     2        2        7       12  apple     1        9       13       13
2     3        3        8       13  apple     3       10       15       14
3     5        4        9       14  apple     5       11       17       15
Edit
You can also simplify the code by generating the serial numbers inside the .merge() call itself, as follows (thanks to @HenryEcker for the suggestion):
df1.merge(df2,
          left_on=['col_4', df1.groupby('col_4').cumcount()],
          right_on=['col_4', df2.groupby('col_4').cumcount()]
          ).drop('key_1', axis=1)
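The sketch below compares the one-step form with the two-step form on made-up data. The frames left and right are hypothetical. The column name key_1 is the one used in the answer above: pandas gives an unnamed array key a name based on its position in the key list.

```python
import pandas as pd

# Hypothetical sample data, not the frames from the question.
left = pd.DataFrame({"ID": [1, 2, 3], "col_4": ["apple", "apple", "pear"]})
right = pd.DataFrame({"ID": [7, 8, 9], "col_4": ["apple", "pear", "apple"]})

# Two-step form: add an explicit serial column, merge, then drop it.
two_step = (
    left.assign(serial=left.groupby("col_4").cumcount())
    .merge(right.assign(serial=right.groupby("col_4").cumcount()),
           on=["col_4", "serial"])
    .drop(columns="serial")
)

# One-step form: pass the cumcount() Series directly as the second join key.
# The unnamed key shows up in the result as 'key_1' and is dropped afterwards.
one_step = left.merge(
    right,
    left_on=["col_4", left.groupby("col_4").cumcount()],
    right_on=["col_4", right.groupby("col_4").cumcount()],
).drop(columns="key_1")

print(one_step)
```

The one-step form leaves the input frames unmodified, which can be convenient when df1 and df2 are reused later.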