How To Put Lists Of Numpy Ndarrays Into Columns Of Pandas Dataframe?
I have input which is formatted like so [notice there will be more than just a and b]: inp = { 'a': np.zeros((1000, 3, 4)), 'b': np.zeros((1000, 5, 2)), 'c': np.zeros((
Solution 1:
You can use np.ndarray.reshape to flatten every trailing axis of each array into one, then np.column_stack to join the resulting 2-D arrays side by side.
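import numpy as np

inp = {
    "a": np.zeros((1000, 3, 4)),
    "b": np.zeros((1000, 5, 2)),
    "c": np.zeros((1000, 7, 8, 2,)),
    "d": np.zeros((1000, 6,)),
}
# Flatten the trailing axes of each array: (1000, 3, 4) -> (1000, 12), etc.
arrs = [arr.reshape(1000, -1) for arr in inp.values()]
# Join the 2-D blocks column-wise
out = np.column_stack(arrs)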
array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])
out.shape
(1000, 140) # 3*4 + 5*2 + 7*8*2 + 6 = 140
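To see which flattened column each original element lands in, here is a minimal sketch on a tiny stand-in array (the names arr, flat, k and same are illustrative, not from the answer). It assumes NumPy's default C order, where the last axis varies fastest.

```python
import numpy as np

# Small stand-in for one entry of `inp`: 2 rows, trailing shape (3, 4).
arr = np.arange(2 * 3 * 4).reshape(2, 3, 4)

# Flatten every trailing axis into one; -1 lets NumPy infer 3 * 4 = 12.
flat = arr.reshape(arr.shape[0], -1)

# In C order, element (i, j) of each row ends up in flat column i * 4 + j.
i, j = 2, 1
k = np.ravel_multi_index((i, j), arr.shape[1:])

# The flattened column holds exactly the values found at [:, i, j].
same = np.array_equal(flat[:, k], arr[:, i, j])
print(flat.shape, k, same)
```

This ordering is what lets the column names be generated independently of the data, as long as they are produced in the same order.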
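For the column names, you can use itertools.product to enumerate every index combination of each array's trailing shape, and itertools.chain.from_iterable to concatenate the names of all arrays into one list.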
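from itertools import product, chain

shapes = [(name, arr.shape[1:]) for name, arr in inp.items()]

def col_names(val):
    # val is (name, trailing_shape), e.g. ("a", (3, 4))
    *prefix, shape = val
    # One iterable of index strings per trailing axis
    names = [map(str, range(i)) for i in shape]
    # product yields index tuples in the same order as reshape: "a_0_0", "a_0_1", ...
    return map('_'.join, product(prefix, *names))

cols = [*chain.from_iterable(col_names(val) for val in shapes)]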
len(cols) # 140
cols
['a_0_0',
'a_0_1',
'a_0_2',
...
'a_2_2',
'a_2_3',
'b_0_0',
'b_0_1',
...
'b_4_1',
'c_0_0_0',
...
'c_6_6_1',
'c_6_7_0',
'c_6_7_1',
'd_0',
...
'd_5']
Now use cols as the columns of your DataFrame.
import pandas as pd

pd.DataFrame(out, columns=cols)
     a_0_0  a_0_1  a_0_2  a_0_3  a_1_0  a_1_1  ...  d_0  d_1  d_2  d_3  d_4  d_5
0      0.0    0.0    0.0    0.0    0.0    0.0  ...  0.0  0.0  0.0  0.0  0.0  0.0
1      0.0    0.0    0.0    0.0    0.0    0.0  ...  0.0  0.0  0.0  0.0  0.0  0.0
2      0.0    0.0    0.0    0.0    0.0    0.0  ...  0.0  0.0  0.0  0.0  0.0  0.0
3      0.0    0.0    0.0    0.0    0.0    0.0  ...  0.0  0.0  0.0  0.0  0.0  0.0
4      0.0    0.0    0.0    0.0    0.0    0.0  ...  0.0  0.0  0.0  0.0  0.0  0.0
..     ...    ...    ...    ...    ...    ...  ...  ...  ...  ...  ...  ...  ...
995    0.0    0.0    0.0    0.0    0.0    0.0  ...  0.0  0.0  0.0  0.0  0.0  0.0
996    0.0    0.0    0.0    0.0    0.0    0.0  ...  0.0  0.0  0.0  0.0  0.0  0.0
997    0.0    0.0    0.0    0.0    0.0    0.0  ...  0.0  0.0  0.0  0.0  0.0  0.0
998    0.0    0.0    0.0    0.0    0.0    0.0  ...  0.0  0.0  0.0  0.0  0.0  0.0
999    0.0    0.0    0.0    0.0    0.0    0.0  ...  0.0  0.0  0.0  0.0  0.0  0.0

[1000 rows x 140 columns]
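The steps of Solution 1 can be wrapped into one function. The following is only a sketch: arrays_to_frame is a hypothetical helper name, and it assumes every array in the dict shares the same first-axis length. The sample data is non-zero so the mapping between column names and values can be checked.

```python
from itertools import product

import numpy as np
import pandas as pd


def arrays_to_frame(arrays):
    """Hypothetical helper: one DataFrame column per trailing-index position."""
    # Assumption: all arrays have the same number of rows (first axis).
    n_rows = len(next(iter(arrays.values())))
    blocks, cols = [], []
    for name, arr in arrays.items():
        # Flatten trailing axes, e.g. (n_rows, 3, 2) -> (n_rows, 6).
        blocks.append(arr.reshape(n_rows, -1))
        # Index strings per trailing axis; product walks them in C order,
        # which matches the column order produced by reshape.
        axes = (map(str, range(n)) for n in arr.shape[1:])
        cols.extend("_".join(parts) for parts in product([name], *axes))
    return pd.DataFrame(np.column_stack(blocks), columns=cols)


small = {
    "a": np.arange(12).reshape(2, 3, 2),
    "d": np.arange(8).reshape(2, 4),
}
df = arrays_to_frame(small)
print(df.columns.tolist())
print(df["a_2_1"].tolist())  # values of small["a"][:, 2, 1]
```

Because the data is reshaped rather than iterated element by element, this approach stays vectorized regardless of how many trailing axes each array has.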
Solution 2:
You can create two dataframes and then pd.concat them:
import numpy as np
import pandas as pd

inp = {
    "a": np.zeros((1000, 3, 4)),
    "b": np.zeros((1000, 5, 2)),
}
# One dict per row: the key is "<name>_<i1>_<i2>", the value is the element at [i1, i2]
df1 = pd.DataFrame([{'a_{}_{}'.format(i1, i2): v2 for i1, v1 in enumerate(row) for i2, v2 in enumerate(v1)} for row in inp['a']])
df2 = pd.DataFrame([{'b_{}_{}'.format(i1, i2): v2 for i1, v1 in enumerate(row) for i2, v2 in enumerate(v1)} for row in inp['b']])
print(pd.concat([df1, df2], axis=1))
Prints:
     a_0_0  a_0_1  a_0_2  a_0_3  a_1_0  ...  b_2_1  b_3_0  b_3_1  b_4_0  b_4_1
0      0.0    0.0    0.0    0.0    0.0  ...    0.0    0.0    0.0    0.0    0.0
1      0.0    0.0    0.0    0.0    0.0  ...    0.0    0.0    0.0    0.0    0.0
2      0.0    0.0    0.0    0.0    0.0  ...    0.0    0.0    0.0    0.0    0.0
3      0.0    0.0    0.0    0.0    0.0  ...    0.0    0.0    0.0    0.0    0.0
4      0.0    0.0    0.0    0.0    0.0  ...    0.0    0.0    0.0    0.0    0.0
..     ...    ...    ...    ...    ...  ...    ...    ...    ...    ...    ...
995    0.0    0.0    0.0    0.0    0.0  ...    0.0    0.0    0.0    0.0    0.0
996    0.0    0.0    0.0    0.0    0.0  ...    0.0    0.0    0.0    0.0    0.0
997    0.0    0.0    0.0    0.0    0.0  ...    0.0    0.0    0.0    0.0    0.0
998    0.0    0.0    0.0    0.0    0.0  ...    0.0    0.0    0.0    0.0    0.0
999    0.0    0.0    0.0    0.0    0.0  ...    0.0    0.0    0.0    0.0    0.0

[1000 rows x 22 columns]
EDIT: To handle an arbitrary number of keys:
import numpy as np
import pandas as pd

inp = {
    "a": np.zeros((1000, 3, 4)),
    "b": np.zeros((1000, 5, 2)),
}
dfs = []
# Note: the two nested enumerate loops assume every array is 3-D (two trailing axes)
for k, v in inp.items():
    dfs.append(pd.DataFrame([
        {'{}_{}_{}'.format(k, i1, i2): v2 for i1, v1 in enumerate(row) for i2, v2 in enumerate(v1)}
        for row in v
    ]))
print(pd.concat(dfs, axis=1))
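The loop above relies on two nested enumerate calls, so it only fits arrays with exactly two trailing axes. If the arrays have differing numbers of dimensions (as "c" and "d" in the question do), one possible variation, offered here as a sketch rather than as part of the original answer, is to walk the trailing shape with np.ndindex. The smaller row count is only to keep the example quick.

```python
import numpy as np
import pandas as pd

inp = {
    "a": np.zeros((4, 3, 4)),
    "c": np.zeros((4, 2, 2, 2)),
    "d": np.zeros((4, 6)),
}

dfs = []
for k, v in inp.items():
    # Every index tuple of the trailing shape, in C order: (0, 0), (0, 1), ...
    idxs = list(np.ndindex(*v.shape[1:]))
    # Matching labels, e.g. (1, 0, 1) -> "c_1_0_1".
    labels = ["_".join([k, *map(str, idx)]) for idx in idxs]
    # One dict per row, as in the loop above, but independent of ndim.
    rows = [{label: row[idx] for label, idx in zip(labels, idxs)} for row in v]
    dfs.append(pd.DataFrame(rows))

result = pd.concat(dfs, axis=1)
print(result.shape)
```

Building a dict per row is considerably slower than reshaping for large inputs, so this variation is mainly useful when the per-row dict form is wanted for other reasons.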