
Can Pandas SparseSeries Store Values in the float16 Dtype?

The reason why I want to use a smaller data type in the sparse pandas containers is to reduce memory usage. This is relevant when working with data that originally uses bool (e.g.

Solution 1:

The SparseArray constructor can be used to convert its underlying ndarray's dtype. To convert all sparse series in a dataframe, one can iterate over the df's series, convert their arrays, and replace the series with converted versions.

import pandas as pd
import numpy as np

# Note: pd.SparseSeries and pd.SparseDataFrame were removed in pandas 1.0,
# so this code requires an older pandas release.


def convert_sparse_series_dtype(sparse_series, dtype):
    dtype = np.dtype(dtype)
    if 'float' not in str(dtype):
        raise TypeError('Sparse containers only support float dtypes')

    # Rebuild the underlying SparseArray with the requested dtype.
    sparse_array = sparse_series.values
    converted_sp_array = pd.SparseArray(sparse_array, dtype=dtype)

    converted_sp_series = pd.SparseSeries(converted_sp_array)
    return converted_sp_series


def convert_sparse_columns_dtype(sparse_dataframe, dtype):
    # Convert every sparse column in place; dense columns are left untouched.
    for col_name in sparse_dataframe:
        if isinstance(sparse_dataframe[col_name], pd.SparseSeries):
            sparse_dataframe.loc[:, col_name] = convert_sparse_series_dtype(
                sparse_dataframe[col_name], dtype
            )

This achieves the stated purpose of reducing the sparse dataframe's memory footprint:

In []: sparse_df.info()
<class 'pandas.sparse.frame.SparseDataFrame'>
RangeIndex: 19849 entries, 0 to 19848
Columns: 145 entries, topic.party_nl.p.pvda to topic.sub_cat_Reizen
dtypes: float64(145)
memory usage: 1.1 MB

In []: convert_sparse_columns_dtype(sparse_df, 'float16')

In []: sparse_df.info()
<class 'pandas.sparse.frame.SparseDataFrame'>
RangeIndex: 19849 entries, 0 to 19848
Columns: 145 entries, topic.party_nl.p.pvda to topic.sub_cat_Reizen
dtypes: float16(145)
memory usage: 279.2 KB

In []: bool_df.equals(sparse_df.to_dense().astype('bool'))
Out[]: True
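
The equality check passes because float16 represents 0.0 and 1.0 exactly, so data that started out as bool survives the down-cast unchanged. Here is a minimal NumPy sketch of that round trip, using a made-up array rather than the dataframe from the question. It also shows that the trick is not safe for arbitrary values, since float16 stores integers exactly only up to 2048:

```python
import numpy as np

# Hypothetical boolean data standing in for one column of bool_df.
original = np.array([True, False, False, True, False])

# Down-cast to float16 and back; 0.0 and 1.0 are exactly representable.
as_float16 = original.astype(np.float16)
round_tripped = as_float16.astype(bool)
lossless_for_bool = bool(np.array_equal(original, round_tripped))

# float16 has an 11-bit significand, so integers above 2048 may be rounded.
rounded = float(np.float16(2049))

print(lossless_for_bool)  # True
print(rounded)            # 2048.0
```

So float16 is a reasonable storage type for bool-like indicators, but probably not for counts or other values that can grow large.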

This is, however, a somewhat lousy solution, because the converted dataframe behaves unpredictably when it interacts with other dataframes. For instance, when converted sparse dataframes are concatenated with other dataframes, all contained series become dense series. Unconverted sparse dataframes do not have this problem: their series remain sparse in the resulting dataframe.
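
SparseSeries and SparseDataFrame were removed in pandas 1.0, so the functions in this answer only run on older releases. On newer versions, sparse data lives in ordinary Series and DataFrame objects backed by pd.arrays.SparseArray, and a similar conversion can be sketched with astype and pd.SparseDtype. The helper name, the example frame, and the fill value of 0.0 below are assumptions for illustration, and the concatenation behaviour has not been re-checked against this newer API:

```python
import numpy as np
import pandas as pd


def convert_sparse_columns_to_float16(df, fill_value=0.0):
    """Return a copy of df whose sparse columns use a float16 subtype.

    Hypothetical helper; dense columns are returned unchanged.
    """
    target = pd.SparseDtype(np.float16, fill_value)
    converted = {}
    for col_name in df.columns:
        column = df[col_name]
        if isinstance(column.dtype, pd.SparseDtype):
            column = column.astype(target)
        converted[col_name] = column
    return pd.DataFrame(converted, index=df.index)


# Made-up, mostly-zero data standing in for the real dataframe.
dense = np.zeros((1000, 3))
dense[::50, :] = 1.0
sparse_df = pd.DataFrame(dense, columns=["a", "b", "c"]).astype(
    pd.SparseDtype(np.float64, 0.0)
)

small_df = convert_sparse_columns_to_float16(sparse_df)

# Compare the bytes held by the sparse arrays before and after conversion.
bytes_before = sum(sparse_df[c].array.nbytes for c in sparse_df.columns)
bytes_after = sum(small_df[c].array.nbytes for c in small_df.columns)
print(small_df.dtypes)
print(bytes_before, bytes_after)
```

Only the stored (non-fill) values shrink here; the sparse index that records their positions keeps its size, so the saving depends on how dense the data is.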
