
How To Read Only 5 Records From S3 Bucket And Return It Without Getting All Data Of Csv File

Hello guys, I know I'll find lots of similar questions here, but I have code that executes properly and returns five records. My question is: how should I only read the

Solution 1:

You can use the pandas capability of reading a file in chunks, just loading as much data as you need.

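import pandas as pd

# obj is the response of an S3 get_object call; obj['Body'] is a file-like stream
data_iter = pd.read_csv(obj['Body'], chunksize=5)  # iterator over 5-row chunks
data = data_iter.get_chunk()  # parse only the first 5 rows
print(data)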
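For completeness, the snippet above can be wired to S3 roughly as follows. This is a minimal sketch, not a definitive implementation: the helper names `first_rows` and `first_rows_from_s3` and the bucket/key arguments are hypothetical, it assumes the CSV has a header row, and it assumes a reasonably recent botocore in which the response `Body` behaves as a file-like object that pandas can read.

```python
import io

import pandas as pd


def first_rows(body, n=5):
    """Return the first n records from a binary, file-like CSV stream."""
    # chunksize makes read_csv return an iterator instead of parsing everything
    reader = pd.read_csv(body, chunksize=n)
    return reader.get_chunk()  # parses only the first n rows


def first_rows_from_s3(bucket, key, n=5):
    """Hypothetical wrapper: stream an S3 object into first_rows."""
    import boto3  # imported here so first_rows can be tried without AWS access

    obj = boto3.client("s3").get_object(Bucket=bucket, Key=key)
    return first_rows(obj["Body"], n)  # Body is a streaming, file-like object


if __name__ == "__main__":
    # Local stand-in for obj['Body'], so the sketch runs without S3
    rows = b"".join(b"%d,row%d\n" % (i, i) for i in range(100))
    print(first_rows(io.BytesIO(b"id,name\n" + rows)))
```

pandas reads the stream in buffered blocks, so somewhat more than five rows' worth of bytes may be pulled from S3, but the remainder of the object is never parsed.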

Solution 2:

You can use an HTTP Range header (see RFC 2616), which takes a byte-range argument. The S3 APIs support it, so you do NOT have to read or download the whole S3 file.

Sample code:

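import boto3

# 'bucket101' and 'my.csv' are example bucket and key names
obj = boto3.resource('s3').Object('bucket101', 'my.csv')
# The range is inclusive, so this requests the first 1001 bytes
record_stream = obj.get(Range='bytes=0-1000')['Body']
print(record_stream.read())  # raw bytes, not yet parsed as CSV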

This returns only the bytes in the range given in the header.

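You will still need to convert the returned bytes into a DataFrame, for example by decoding them and splitting on the \n row separators and the field delimiter (\t in a tab-separated file). Keep in mind that a byte range usually ends partway through a row.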
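One way to do that conversion is sketched below. Because a byte range usually cuts the last row short, the sketch drops everything after the final newline before handing the data to pandas. The helper names `complete_rows` and `range_to_dataframe` are hypothetical, and the sketch assumes newline-terminated rows, a header row at the start of the range, and no newlines inside quoted fields.

```python
import io


def complete_rows(raw: bytes) -> bytes:
    """Drop the trailing partial row that a byte-range cut usually leaves."""
    # rfind returns -1 when there is no newline, which yields an empty result
    return raw[: raw.rfind(b"\n") + 1]


def range_to_dataframe(raw: bytes, n=5):
    """Parse the complete rows of a byte range and keep the first n records."""
    import pandas as pd  # imported lazily so complete_rows has no dependencies

    # read_csv raises EmptyDataError if the range held no complete row
    return pd.read_csv(io.BytesIO(complete_rows(raw)), nrows=n)


if __name__ == "__main__":
    # Stand-in for record_stream.read(); the last row is cut off mid-field
    raw = b"id,name\n1,alice\n2,bob\n3,car"
    print(complete_rows(raw))
```

If the range turns out to hold fewer than five complete rows, request a larger range and parse again.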

