Combining Columns Within The Same Df Python/Pandas

February 25, 2023

I'm new to the programming world and can't figure out how to concatenate columns in pandas. I'm not looking to join these columns, but rather stack them on top of each other. This

Solution 1:

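It looks like you are looking for pd.concat, which stacks DataFrames on top of each other. (Older answers use DataFrame.append, but that method was deprecated in pandas 1.4 and removed in pandas 2.0.)

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(1,10, (3,2)),columns=list('AB'))
df2 = pd.DataFrame(np.random.randint(1,10, (3,2)),columns=list('AB'))
# Stack df2 below df; each frame keeps its own index labels
df3 = pd.concat([df, df2])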

In [2]: df3
Out[2]: 
   A  B
0  7  6
1  8  3
2  2  1
0  2  2
1  1  3
2  5  5
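
Note that the stacked result above keeps each frame's original index labels (0, 1, 2, 0, 1, 2). If a fresh 0..n-1 index is wanted instead, a minimal sketch, assuming small hand-written frames in place of the random ones, is to pass ignore_index=True to pd.concat:

```python
import pandas as pd

# Hand-written stand-ins for the two random frames (values are illustrative)
df = pd.DataFrame({"A": [7, 8, 2], "B": [6, 3, 1]})
df2 = pd.DataFrame({"A": [2, 1, 5], "B": [2, 3, 5]})

# ignore_index=True discards the original labels and renumbers the rows 0..n-1
stacked = pd.concat([df, df2], ignore_index=True)
print(stacked)
```

Without ignore_index, label-based lookups such as .loc[0] would return two rows, which is easy to miss.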

Solution 2:

If you are certain the column ordering is consistent and tiled [A,B,C,A,B,C...], you can create a new DataFrame by reshaping the values of the old one. Otherwise, pd.wide_to_long is a safer alternative because it uses the actual column names.

Sample Data

import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame(np.random.randint(1, 10, (3, 15)),
                  columns=list('BACDE')*3)
#   B  A  C  D  E  B  A  C  D  E  B  A  C  D  E
#0  3  3  7  2  4  7  2  1  2  1  1  4  5  1  1
#1  5  2  8  4  3  5  8  3  5  9  1  8  4  5  7
#2  2  6  7  3  2  9  4  6  1  3  7  3  5  5  7

Reshape

cols = pd.unique(df.columns)  # Preserves Order
pd.DataFrame(df.values.reshape(-1, len(cols)), columns=cols)
#   B  A  C  D  E
#0  3  3  7  2  4
#1  7  2  1  2  1
#2  1  4  5  1  1
#3  5  2  8  4  3
#4  5  8  3  5  9
#5  1  8  4  5  7
#6  2  6  7  3  2
#7  9  4  6  1  3
#8  7  3  5  5  7
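
Because the reshape silently produces wrong rows when the tiling assumption fails, it may be worth checking the assumption first. The following is a rough sketch, not part of the original answer; the helper name stack_tiled and the sample frame are hypothetical:

```python
import numpy as np
import pandas as pd

def stack_tiled(df):
    """Stack column blocks of a frame whose labels repeat in one fixed order.

    Hypothetical helper: raises instead of reshaping when the tiling is broken.
    """
    cols = list(df.columns.unique())  # unique labels, first-seen order
    n_blocks, remainder = divmod(df.shape[1], len(cols))
    if remainder or list(df.columns) != cols * n_blocks:
        raise ValueError("columns are not tiled in a consistent order")
    # Row-major reshape: each original row contributes n_blocks consecutive rows
    return pd.DataFrame(df.to_numpy().reshape(-1, len(cols)), columns=cols)

wide = pd.DataFrame(np.arange(12).reshape(2, 6), columns=list("AB") * 3)
tall = stack_tiled(wide)
print(tall)
```

As in the output above, rows coming from the same original row end up next to each other, which differs from the block-by-block ordering of the other solutions.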

`pd.wide_to_long`

This is useful when your columns are not in the same tiling order, or when some columns occur more often than others. It requires renaming the columns first by appending _N to each name, where N is the occurrence number of that column.

cols = pd.unique(df.columns)
s = pd.Series(df.columns).groupby(df.columns).cumcount()
df.columns = [f'{col}_{N}' for col,N in zip(df.columns, s)]

pd.wide_to_long(df.reset_index(), stubnames=cols, i='index', j='num', sep='_').reset_index(drop=True)
#   B  A  C  D  E
#0  3  3  7  2  4
#1  5  2  8  4  3
#2  2  6  7  3  2
#3  7  2  1  2  1
#4  5  8  3  5  9
#5  9  4  6  1  3
#6  1  4  5  1  1
#7  1  8  4  5  7
#8  7  3  5  5  7
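
To illustrate the "more of some than others" case, here is a small sketch on a made-up frame in which A appears twice and B only once. The frame and the variable names are assumptions for illustration; the renaming follows the same cumcount idea as above:

```python
import pandas as pd

# Made-up frame: column A occurs twice, column B once
wide = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["A", "B", "A"])

# Number each occurrence of a label: A -> 0, B -> 0, A -> 1
names = pd.Series(wide.columns)
occurrence = names.groupby(names).cumcount()
wide.columns = [f"{col}_{n}" for col, n in zip(names, occurrence)]

tall = pd.wide_to_long(
    wide.reset_index(),
    stubnames=list(names.unique()),
    i="index",
    j="num",
    sep="_",
).reset_index(drop=True)
print(tall)
```

Since there is no second B column, the rows built from the second occurrence of A have NaN in B, so B is returned as a float column.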

Solution 3:

The following example is relevant when you know exactly where your columns are. Building on ALollz's code:

import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame(np.random.randint(1, 10, (3, 15)),
                  columns=list('BACDE')*3)
#   B  A  C  D  E  B  A  C  D  E  B  A  C  D  E
#0  3  3  7  2  4  7  2  1  2  1  1  4  5  1  1
#1  5  2  8  4  3  5  8  3  5  9  1  8  4  5  7
#2  2  6  7  3  2  9  4  6  1  3  7  3  5  5  7

# Using iloc: slice each block of five columns by position
df1 = df.iloc[:, :5]
df2 = df.iloc[:, 5:10]
df3 = df.iloc[:, 10:]

# Stack the three blocks and renumber the index
df_final = pd.concat([df1, df2, df3]).reset_index(drop=True)

Result df_final:

    B   A   C   D   E
0   3   3   7   2   4
1   5   2   8   4   3
2   2   6   7   3   2
3   7   2   1   2   1
4   5   8   3   5   9
5   9   4   6   1   3
6   1   4   5   1   1
7   1   8   4   5   7
8   7   3   5   5   7
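
When there are more than a few blocks, writing each slice by hand gets tedious. A possible generalization, assuming every block has the same known width (the name width and the arange-based sample data are illustrative, not from the original answer), is to build the slices in a loop:

```python
import numpy as np
import pandas as pd

# Deterministic sample data with the same column layout as above
df = pd.DataFrame(np.arange(45).reshape(3, 15), columns=list("BACDE") * 3)

width = 5  # assumed number of columns per block

# One positional slice per block: columns 0-4, 5-9, 10-14
blocks = [df.iloc[:, start:start + width]
          for start in range(0, df.shape[1], width)]

# Stack the blocks and renumber the index in one step
df_final = pd.concat(blocks, ignore_index=True)
print(df_final)
```

This still relies on column positions only, so it carries the same assumption as the hand-written slices: every block must list its columns in the same order.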
