Finding Duplicates In Two Dataframes And Removing The Duplicates From One Dataframe

November 29, 2022

Working in Python / pandas / dataframes I have these two dataframes:

Dataframe one:

   1          2    3
1  Stockholm  100  250
2  Stockholm  150  376
3  St

Solution 1:

Use:

import pandas as pd

df_merge = pd.merge(df1, df2, on=[1, 2, 3], how='inner')  # rows present in both dataframes
df1 = pd.concat([df1, df_merge])  # DataFrame.append was removed in pandas 2.0; pd.concat replaces it

df1['Duplicated'] = df1.duplicated(keep=False)  # keep=False marks every copy of a duplicated row as True
df_final = df1[~df1['Duplicated']].copy()  # select only the rows that are not duplicated
del df_final['Duplicated']  # delete the indicator column

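The idea is as follows:

1. Do an inner join on all the columns.
2. Append the output of the inner join to df1.
3. Identify the duplicated rows in df1.
4. Select the rows in df1 that are not duplicated.

Each numbered step corresponds to one line of the code, in order: the merge, the append step, the duplicated flag, and the selection. The final line only removes the helper column.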
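To make the steps concrete, here is a minimal sketch that runs them on small sample frames. The data is hypothetical (the original question's tables are not fully shown here); only the column names 1, 2, 3 are taken from the merge call in the answer. It also includes an alternative that is not part of the answer: a left merge with indicator=True, which flags the rows found only in df1.

```python
import pandas as pd

# Hypothetical sample data; only the column labels 1, 2, 3 come from the answer.
df1 = pd.DataFrame(
    [["Stockholm", 100, 250],
     ["Stockholm", 150, 376],
     ["Gothenburg", 200, 400]],
    columns=[1, 2, 3],
)
df2 = pd.DataFrame(
    [["Stockholm", 100, 250],
     ["Malmo", 300, 500]],
    columns=[1, 2, 3],
)

# Steps 1-4 from the answer, using pd.concat in place of the removed append.
common = pd.merge(df1, df2, on=[1, 2, 3], how="inner")  # rows in both frames
stacked = pd.concat([df1, common])                       # df1 plus the shared rows
result = stacked[~stacked.duplicated(keep=False)]        # drop every duplicated row

# Alternative (not from the answer): an anti-join via the merge indicator.
# df2 is de-duplicated first so the left merge cannot multiply rows of df1.
flagged = df1.merge(df2.drop_duplicates(), on=[1, 2, 3], how="left", indicator=True)
anti = flagged[flagged["_merge"] == "left_only"].drop(columns="_merge")

print(result)
print(anti)
```

On this sample both approaches keep the same two rows of df1. They can differ when df1 contains repeated rows of its own: the concat-and-duplicated approach removes those too, even if they never appear in df2, while the indicator variant keeps them.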
