
How To Put Lists Of Numpy Ndarrays Into Columns Of Pandas Dataframe?

I have input which is formatted like so [notice there will be more than just a and b]: inp = { 'a': np.zeros((1000, 3, 4)), 'b': np.zeros((1000, 5, 2)), 'c': np.zeros((

Solution 1:

You can flatten every axis after the first with np.ndarray.reshape, then join the results side by side with np.column_stack.

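import numpy as np
import pandas as pd

inp = {
    "a": np.zeros((1000, 3, 4)),
    "b": np.zeros((1000, 5, 2)),
    "c": np.zeros((1000, 7, 8, 2,)),
    "d": np.zeros((1000, 6,)),
}

# keep the first axis (rows) and flatten all trailing axes into one
arrs = [arr.reshape(len(arr), -1) for arr in inp.values()]
out = np.column_stack(arrs)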

out
array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])
out.shape
(1000, 140) # 3*4 + 5*2 + 7*8*2 + 6 = 140
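
The reshape above relies on NumPy's default row-major (C) order, so flat column j of a block corresponds to one fixed trailing index. Here is a minimal sketch that checks this with np.unravel_index; the small array sizes and the choice of j are just for illustration and are not from the question.

```python
import numpy as np

# Small stand-in array (sizes are illustrative, not from the question).
arr = np.arange(2 * 3 * 4).reshape(2, 3, 4)

# Flatten every axis except the first, as in the solution above.
flat = arr.reshape(len(arr), -1)

# Map flat column j back to its trailing index under C order.
j = 7
idx = tuple(int(i) for i in np.unravel_index(j, arr.shape[1:]))

# The flat column should hold exactly the values at arr[:, idx[0], idx[1]].
same = bool((flat[:, j] == arr[(slice(None), *idx)]).all())
print(flat.shape, idx, same)
```

This ordering is what lets the column names be generated independently of the data, as long as they are produced in the same C order.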

For the column names, you can build every index combination with itertools.product and flatten the per-array results into one list with itertools.chain.from_iterable.

from itertools import product, chain
shapes = [(name, arr.shape[1:]) for name, arr in inp.items()]

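def col_names(val):
    # val is (name, trailing_shape), e.g. ("a", (3, 4))
    *prefix, shape = val
    # one iterator of index strings per trailing axis
    names = [map(str, range(i)) for i in shape]
    # join every combination as name_i_j_...
    return map('_'.join, product(prefix, *names))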
cols = [*chain.from_iterable(col_names(val) for val in shapes)]
len(cols) # 140
cols
['a_0_0',
 'a_0_1',
 'a_0_2',
 ...
 'a_2_2',
 'a_2_3',
 'b_0_0',
 'b_0_1',
 ...
 'b_4_1',
 'c_0_0_0',
 ...
 'c_6_6_1',
 'c_6_7_0',
 'c_6_7_1',
 'd_0',
 ...
 'd_5']
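
If you prefer to stay within NumPy, np.ndindex should give the same index tuples in the same C order. This is only a sketch of an alternative, not part of the original answer; col_names_ndindex is a hypothetical helper name and the array sizes are shrunk for illustration.

```python
import numpy as np

def col_names_ndindex(name, shape):
    """Hypothetical helper: one label per trailing index, in C order."""
    return ["_".join([name, *map(str, idx)]) for idx in np.ndindex(*shape)]

# Reduced stand-in for the question's input (fewer rows, two keys only).
inp = {"a": np.zeros((4, 3, 4)), "d": np.zeros((4, 6))}

cols = [
    label
    for name, arr in inp.items()
    for label in col_names_ndindex(name, arr.shape[1:])
]
print(len(cols), cols[:3], cols[-1])
```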

Now use cols as the column labels of your DataFrame.

pd.DataFrame(out, columns=cols)
     a_0_0  a_0_1  a_0_2  a_0_3  a_1_0  a_1_1  ...  d_0  d_1  d_2  d_3  d_4  d_5
0      0.0    0.0    0.0    0.0    0.0    0.0  ...  0.0  0.0  0.0  0.0  0.0  0.0
1      0.0    0.0    0.0    0.0    0.0    0.0  ...  0.0  0.0  0.0  0.0  0.0  0.0
2      0.0    0.0    0.0    0.0    0.0    0.0  ...  0.0  0.0  0.0  0.0  0.0  0.0
3      0.0    0.0    0.0    0.0    0.0    0.0  ...  0.0  0.0  0.0  0.0  0.0  0.0
4      0.0    0.0    0.0    0.0    0.0    0.0  ...  0.0  0.0  0.0  0.0  0.0  0.0
..     ...    ...    ...    ...    ...    ...  ...  ...  ...  ...  ...  ...  ...
995    0.0    0.0    0.0    0.0    0.0    0.0  ...  0.0  0.0  0.0  0.0  0.0  0.0
996    0.0    0.0    0.0    0.0    0.0    0.0  ...  0.0  0.0  0.0  0.0  0.0  0.0
997    0.0    0.0    0.0    0.0    0.0    0.0  ...  0.0  0.0  0.0  0.0  0.0  0.0
998    0.0    0.0    0.0    0.0    0.0    0.0  ...  0.0  0.0  0.0  0.0  0.0  0.0
999    0.0    0.0    0.0    0.0    0.0    0.0  ...  0.0  0.0  0.0  0.0  0.0  0.0

[1000 rows x 140 columns]
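
Putting the two steps together, a small wrapper might look like the sketch below. arrays_to_frame is a hypothetical name, the test data is made up so the values are distinguishable, and the round trip at the end assumes no other key starts with "a_".

```python
from itertools import product

import numpy as np
import pandas as pd

def arrays_to_frame(arrays):
    """Hypothetical helper: flatten trailing axes into labelled columns."""
    blocks, cols = [], []
    for name, arr in arrays.items():
        # Keep the row axis, flatten the rest (C order).
        blocks.append(arr.reshape(len(arr), -1))
        # Generate labels in the same C order as the flattened columns.
        cols.extend(
            "_".join((name, *map(str, idx)))
            for idx in product(*map(range, arr.shape[1:]))
        )
    return pd.DataFrame(np.column_stack(blocks), columns=cols)

# Illustrative input with distinct values instead of zeros.
inp = {
    "a": np.arange(24.0).reshape(2, 3, 4),
    "d": np.arange(12.0).reshape(2, 6),
}
df = arrays_to_frame(inp)

# Round trip: select the "a_" columns and restore the original shape.
a_back = df.filter(regex=r"^a_").to_numpy().reshape(inp["a"].shape)
print(df.shape, np.array_equal(a_back, inp["a"]))
```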

Solution 2:

You can create two dataframes and then pd.concat them:

import numpy as np
import pandas as pd

inp = {
    "a": np.zeros((1000, 3, 4)),
    "b": np.zeros((1000, 5, 2))
}

# one dict per row, keyed by "<name>_<i1>_<i2>"
df1 = pd.DataFrame([{'a_{}_{}'.format(i1, i2): v2 for i1, v1 in enumerate(row) for i2, v2 in enumerate(v1)} for row in inp['a']])
df2 = pd.DataFrame([{'b_{}_{}'.format(i1, i2): v2 for i1, v1 in enumerate(row) for i2, v2 in enumerate(v1)} for row in inp['b']])

print(pd.concat([df1, df2], axis=1))

Prints:

     a_0_0  a_0_1  a_0_2  a_0_3  a_1_0  ...  b_2_1  b_3_0  b_3_1  b_4_0  b_4_1
0      0.0    0.0    0.0    0.0    0.0  ...    0.0    0.0    0.0    0.0    0.0
1      0.0    0.0    0.0    0.0    0.0  ...    0.0    0.0    0.0    0.0    0.0
2      0.0    0.0    0.0    0.0    0.0  ...    0.0    0.0    0.0    0.0    0.0
3      0.0    0.0    0.0    0.0    0.0  ...    0.0    0.0    0.0    0.0    0.0
4      0.0    0.0    0.0    0.0    0.0  ...    0.0    0.0    0.0    0.0    0.0
..     ...    ...    ...    ...    ...  ...    ...    ...    ...    ...    ...
995    0.0    0.0    0.0    0.0    0.0  ...    0.0    0.0    0.0    0.0    0.0
996    0.0    0.0    0.0    0.0    0.0  ...    0.0    0.0    0.0    0.0    0.0
997    0.0    0.0    0.0    0.0    0.0  ...    0.0    0.0    0.0    0.0    0.0
998    0.0    0.0    0.0    0.0    0.0  ...    0.0    0.0    0.0    0.0    0.0
999    0.0    0.0    0.0    0.0    0.0  ...    0.0    0.0    0.0    0.0    0.0

[1000 rows x 22 columns]

EDIT: To handle an arbitrary number of keys:

inp = {
    "a": np.zeros((1000, 3, 4)),
    "b": np.zeros((1000, 5, 2))
}

dfs = []
for k, v in inp.items():
    # build one DataFrame per key, then concatenate them column-wise
    dfs.append(pd.DataFrame([{'{}_{}_{}'.format(k, i1, i2): v2 for i1, v1 in enumerate(row) for i2, v2 in enumerate(v1)} for row in v]))

print(pd.concat(dfs, axis=1))
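
Note that the two nested enumerate calls assume every array has exactly two trailing axes, so a 2-D array such as the question's "d" would fail because its row elements are scalars and cannot be iterated. One possible way around this, sketched here as an assumption rather than as part of the original answer, is np.ndenumerate, which yields (index tuple, value) pairs for any number of axes. The sizes are shrunk for illustration, and this per-element loop is likely much slower than the reshape approach on large inputs.

```python
import numpy as np
import pandas as pd

# Reduced stand-in input mixing a 3-D and a 2-D array.
inp = {"a": np.zeros((5, 3, 4)), "d": np.ones((5, 6))}

dfs = []
for k, v in inp.items():
    # One dict per row; keys are "<name>_<i>_<j>_..." for any rank.
    rows = [
        {"_".join([k, *map(str, idx)]): val for idx, val in np.ndenumerate(row)}
        for row in v
    ]
    dfs.append(pd.DataFrame(rows))

df = pd.concat(dfs, axis=1)
print(df.shape)
```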
