
Merge Two Python Dataframes And Avoid Adding Same Match Twice Before Moving To The Next Row

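pandas.merge(df1, df2, on='col_4') performs an inner join by default, so it returns every pair of rows whose values in the shared column match exactly. When the same value occurs several times in both frames, each occurrence in df1 is paired with each occurrence in df2, which is why the same match is added more than once.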
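To make the repeated matches concrete, here is a minimal sketch. The frames `left` and `right` and their values are made up for illustration and are not the asker's data. With a key that appears twice on each side, the default inner join returns all four pairings:

```python
import pandas as pd

# Hypothetical toy frames: 'apple' appears twice on each side.
left = pd.DataFrame({"ID": [1, 2], "col_4": ["apple", "apple"]})
right = pd.DataFrame({"ID": [10, 20], "col_4": ["apple", "apple"]})

# Default how='inner': every left 'apple' row pairs with every right 'apple' row.
crossed = pd.merge(left, right, on="col_4")

print(len(crossed))  # 2 left rows x 2 right rows = 4 rows
```

The overlapping non-key column ID is suffixed to ID_x and ID_y, as in the results shown further down.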

Solution 1:

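Add a serial number column, serial, that counts the rows within each group of equal col_4 values, separately in df1 and in df2. Then merge on both col_4 and serial, so that the n-th occurrence of a value in df1 is matched only with the n-th occurrence of that value in df2.

The serial number is generated with .groupby() followed by .cumcount():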

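# Number the rows within each col_4 group: 0, 1, 2, ...
df1['serial'] = df1.groupby('col_4').cumcount()
df2['serial'] = df2.groupby('col_4').cumcount()

# Match the n-th occurrence in df1 with the n-th occurrence in df2
df1.merge(df2, on=['col_4', 'serial'])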

Result:

   ID_x  col_1_x  col_2_x  col_3_x  col_4  serial  ID_y  col_1_y  col_2_y  col_3_y
0     1        1        6       11  apple       0     1        8       12       12
1     2        2        7       12  apple       1     1        9       13       13
2     3        3        8       13  apple       2     3       10       15       14
3     5        4        9       14  apple       3     5       11       17       15
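One consequence worth checking is what happens when the two frames hold different numbers of rows for a key. The sketch below uses made-up frames (`left`, `right`, and their values are hypothetical): because the merge is still an inner join, an occurrence with no counterpart at the same serial number is dropped.

```python
import pandas as pd

# Hypothetical data: two 'apple' rows on the left but only one on the right,
# one 'pear' row on the left but two on the right.
left = pd.DataFrame({"ID": [1, 2, 3], "col_4": ["apple", "apple", "pear"]})
right = pd.DataFrame({"ID": [10, 20, 30], "col_4": ["apple", "pear", "pear"]})

# cumcount() numbers the rows inside each col_4 group, starting at 0.
left["serial"] = left.groupby("col_4").cumcount()
right["serial"] = right.groupby("col_4").cumcount()

# Only (col_4, serial) pairs present on both sides survive the inner join.
paired = left.merge(right, on=["col_4", "serial"])

print(paired)
```

Here only the first apple and the first pear are paired. If the unmatched rows of df1 should be kept, passing how='left' to .merge() is one option.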

Optionally, drop the serial column from the merged result:

df1.merge(df2, on=['col_4', 'serial']).drop('serial', axis=1)

Result:

   ID_x  col_1_x  col_2_x  col_3_x  col_4  ID_y  col_1_y  col_2_y  col_3_y
0     1        1        6       11  apple     1        8       12       12
1     2        2        7       12  apple     1        9       13       13
2     3        3        8       13  apple     3       10       15       14
3     5        4        9       14  apple     5       11       17       15

Edit

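You can also simplify the code by generating the serial numbers inside the .merge() call itself (thanks to @HenryEcker for the suggestion). Because the second key is passed as an unnamed Series rather than a column label, pandas names it key_1 in the result, and that column is then dropped: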

df1.merge(df2, 
          left_on=['col_4', df1.groupby('col_4').cumcount()],
          right_on=['col_4', df2.groupby('col_4').cumcount()]
          ).drop('key_1', axis=1)
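As a quick check of the one-step form, the following sketch runs it on made-up frames (`left` and `right` are hypothetical names and data). It also shows a side benefit of this variant: no serial column is ever written to the input frames.

```python
import pandas as pd

# Hypothetical toy frames with a repeated key on both sides.
left = pd.DataFrame({"ID": [1, 2], "col_4": ["apple", "apple"]})
right = pd.DataFrame({"ID": [10, 20], "col_4": ["apple", "apple"]})

# The within-group counters are passed directly as the second join key.
one_step = left.merge(
    right,
    left_on=["col_4", left.groupby("col_4").cumcount()],
    right_on=["col_4", right.groupby("col_4").cumcount()],
).drop(columns="key_1")  # 'key_1' is the auto-generated name of the unnamed key

print(one_step)
print(list(left.columns))  # the inputs are left unchanged
```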
