
Strange Error During Conversion From Pandas DataFrame To NumPy Array

I have a pandas dataframe with two columns: 'review' (text) and 'sentiment' (1/0)

X_train = df.loc[0:25000, 'review'].values
y_train = df.loc[0:25000, 'sentiment'].values
X_test = df

Solution 1:

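df.loc is label-based, so a slice includes its upper bound: with a default integer index, df.loc[0:25000] selects labels 0 through 25000, which is 25001 rows rather than 25000. Use iloc: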

df.iloc[:25000, 1].values  # here 1 is the column position of 'review', for example

if you want NumPy-like slicing.

With iloc, both rows and columns are selected by integer position (integers or integer slices), not by label.
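If you would rather not hard-code the column position, one option is to look it up by name with `df.columns.get_loc`. The following is a minimal sketch, not the asker's actual code: the four-row dataframe and the names `review_pos`, `sentiment_pos`, and `n_train` are made up for illustration, with `n_train` standing in for 25000.

```python
import pandas as pd

# Hypothetical stand-in for the question's dataframe (made-up rows).
df = pd.DataFrame({
    'review': ['good', 'bad', 'fine', 'awful'],
    'sentiment': [1, 0, 1, 0],
})

# Translate column names into integer positions for use with iloc.
review_pos = df.columns.get_loc('review')
sentiment_pos = df.columns.get_loc('sentiment')

n_train = 3  # assumption: stands in for 25000 in the question

# Position-based slicing: the upper bound is exclusive, so exactly n_train rows.
X_train = df.iloc[:n_train, review_pos].values
y_train = df.iloc[:n_train, sentiment_pos].values

print(len(X_train), len(y_train))
```

Both arrays come out with exactly `n_train` elements, which is what NumPy-style slicing would lead you to expect.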

Example

>>> df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
>>> df
   a  b
0  1  4
1  2  5
2  3  6

This is label based, i.e. upper bound inclusive:

>>> df.loc[:1, 'a']
0    1
1    2
Name: a, dtype: int64

This works like slicing in NumPy, i.e. upper bound exclusive:

>>> df.iloc[:2, 0]
0    1
1    2
Name: a, dtype: int64
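To see why this matters when splitting data, here is a small sketch that reuses the example dataframe above. It compares the same slice bounds under loc and iloc, and shows that two consecutive loc slices share their boundary row. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# Same bounds, different semantics.
by_label = df.loc[0:1, 'a'].values     # labels 0 and 1 -> 2 elements
by_position = df.iloc[0:1, 0].values   # position 0 only -> 1 element

# Consecutive label-based slices overlap at the shared boundary label.
first_part = df.loc[0:1]
second_part = df.loc[1:2]
shared = first_part.index.intersection(second_part.index)

print(len(by_label), len(by_position), list(shared))
```

With loc, a train/test split written as `0:n` and `n:` would place row `n` in both parts. With iloc the two parts are disjoint.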
