
Finding Duplicates In Two Dataframes And Removing The Duplicates From One Dataframe

Working in Python / pandas / dataframes I have these two dataframes: Dataframe one: 1 2 3 1 Stockholm 100 250 2 Stockholm 150 376 3 St

Solution 1:

Use:

import pandas as pd

df_merge = pd.merge(df1, df2, on=[1, 2, 3], how='inner')  # rows present in both dataframes
df1 = pd.concat([df1, df_merge], ignore_index=True)  # DataFrame.append was removed in pandas 2.0

df1['Duplicated'] = df1.duplicated(keep=False)  # keep=False marks every copy of a duplicated row as True
df_final = df1[~df1['Duplicated']].drop(columns='Duplicated')  # keep non-duplicated rows, drop the indicator column

The idea is as follows:

  1. do an inner join of df1 and df2 on all the columns
  2. append the output of the inner join to df1, so every shared row now appears at least twice
  3. identify the duplicated rows in df1
  4. select the rows in df1 that are not duplicated

Steps 1 to 4 correspond, in order, to the statements in the code above; the indicator column is dropped at the end.
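
The steps above can be sketched as a small self-contained example. The sample rows below are made up for illustration (the table in the question is truncated), and the column labels 1, 2, 3 are assumed to be integers as in the answer's `on=[1,2,3]`.

```python
import pandas as pd

# Hypothetical sample data; only the column labels 1, 2, 3 come from the question.
df1 = pd.DataFrame({1: ["Stockholm", "Stockholm", "Oslo"],
                    2: [100, 150, 200],
                    3: [250, 376, 300]})
df2 = pd.DataFrame({1: ["Stockholm", "Bergen"],
                    2: [150, 120],
                    3: [376, 410]})

# Step 1: inner join on all columns -> rows present in both dataframes
df_merge = pd.merge(df1, df2, on=[1, 2, 3], how="inner")

# Step 2: append the shared rows to df1, so each shared row occurs at least twice
combined = pd.concat([df1, df_merge], ignore_index=True)

# Steps 3 and 4: flag every copy of a duplicated row, then keep the unflagged rows
df_final = combined[~combined.duplicated(keep=False)].reset_index(drop=True)

print(df_final)
```

One caveat of this approach: a row that already appears more than once inside df1 is also dropped, even if it never occurs in df2. If that matters for your data, a left merge with `indicator=True`, keeping only the rows marked `left_only`, may be a better fit.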

