Finding Duplicates In Two Dataframes And Removing The Duplicates From One Dataframe
Working in Python / pandas / dataframes, I have these two dataframes:
Dataframe one:
   1          2    3
1  Stockholm  100  250
2  Stockholm  150  376
3  St
Solution 1:
Use:
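import pandas as pd
# inner join on all columns: the rows of df1 that also occur in df2
df_merge = pd.merge(df1, df2, on=[1, 2, 3], how='inner')
# DataFrame.append was removed in pandas 2.0, so stack the frames with pd.concat
df1 = pd.concat([df1, df_merge])
df1['Duplicated'] = df1.duplicated(keep=False)  # keep=False marks every copy of a duplicated row as True
df_final = df1[~df1['Duplicated']]  # select only the rows that are not duplicated
df_final = df_final.drop(columns='Duplicated')  # drop the indicator column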
The idea is as follows:
- do an inner join on all the columns
- append the output of the inner join to df1
- identify the duplicated rows in df1
- select the rows in df1 that are not duplicated
- delete the helper column
The steps follow the code above in order, one statement per step, starting with the pd.merge call.
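As an alternative to appending and de-duplicating, the same result can usually be reached with a left merge and the `indicator=True` option of `DataFrame.merge`. The sketch below is not the answer's method, just one possible variant. The sample data and the helper name `remove_rows_found_in` are hypothetical, and it assumes both dataframes share the same column labels.

```python
import pandas as pd

# Hypothetical sample data; the column labels 1, 2, 3 follow the question.
df1 = pd.DataFrame({1: ["Stockholm", "Stockholm", "Oslo"],
                    2: [100, 150, 200],
                    3: [250, 376, 410]})
df2 = pd.DataFrame({1: ["Stockholm", "Bergen"],
                    2: [150, 300],
                    3: [376, 500]})


def remove_rows_found_in(left, right):
    """Return the rows of `left` that do not occur in `right` (hypothetical helper)."""
    # drop_duplicates keeps repeated rows in `right` from multiplying the matches
    marked = left.merge(right.drop_duplicates(), on=list(left.columns),
                        how="left", indicator=True)
    # indicator=True adds a "_merge" column; "left_only" means no match in `right`
    only_left = marked[marked["_merge"] == "left_only"]
    return only_left.drop(columns="_merge")


df_final = remove_rows_found_in(df1, df2)
print(df_final)
```

One difference from the approach above: `duplicated(keep=False)` also drops rows that are repeated within df1 itself, while this variant keeps them unless they also appear in df2.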