
Excel Worksheet To Numpy Array

I'm trying to do an unbelievably simple thing: load parts of an Excel worksheet into a Numpy array. I've found a kludge that works, but it is embarrassingly unpythonic: say my work

Solution 1:

You could do

A = np.array([[i.value for i in j] for j in ws['C1':'E38']])

EDIT: further explanation. (First, thanks for introducing me to openpyxl; I suspect I will use it quite a bit.)

  1. The method of getting multiple cells from the worksheet object produces a generator (newer openpyxl releases may return a tuple of tuples instead). A generator is probably much more efficient if you want to work your way through a large sheet, as you can start straight away without waiting for it all to load into a list.
  2. To force a generator into a list you can use either list(ws['C1':'E38']) or a list comprehension as above.
  3. Each row is a tuple (even if only one column wide) of Cell objects. These carry a lot more than just a number, but if you only want the number for your array you can use the .value attribute. This is really the crux of your question: csv files don't contain the structured information of an Excel spreadsheet.
  4. There isn't (as far as I can tell) a built-in method for extracting values from a range of cells, so you will have to do something much like what you have sketched out.
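The points above can be sketched with a small in-memory workbook, so no file is needed. The cell values and the range 'C1':'E3' are made up for illustration; only the pattern (range, rows of Cell objects, .value) comes from the answer.

```python
import numpy as np
from openpyxl import Workbook

# Build a tiny in-memory sheet; the values are invented for this example.
wb = Workbook()
ws = wb.active
for r in range(1, 4):
    for c in range(1, 6):
        ws.cell(row=r, column=c, value=r * 10 + c)

# Slicing the worksheet yields rows of Cell objects. Wrapping it in list()
# works whether the library hands back a generator or a tuple.
rows = list(ws['C1':'E3'])

# Each element is a Cell, not a number; .value holds the stored content.
first_cell = rows[0][0]

# Extract the values row by row and let NumPy infer the shape.
A = np.array([[cell.value for cell in row] for row in rows])
```

Here A ends up with one row per worksheet row and one column per selected column, with no need to preallocate or translate indices by hand.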

The advantages of doing it my way are: no need to work out the dimensions of the array and create an empty one to start with, no need to work out the corrected index into the np array, and list comprehensions are faster. The disadvantage is that the "corners" must be given in "A1" format. If the range isn't known, you would have to use iter_rows, rows or columns:

A = np.array([[i.value for i in j[2:5]] for j in ws.rows])

If you don't know how many columns there are, you will have to loop and check values, more like your original idea.
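When the column count is unknown, one alternative to a hand-written loop is to let the worksheet report its own extent. This is a sketch rather than part of the original answer: it assumes an openpyxl release recent enough to support the values_only argument of iter_rows, and the sheet contents are made up.

```python
import numpy as np
from openpyxl import Workbook

# Invented data: two rows, four columns, in an in-memory workbook.
wb = Workbook()
ws = wb.active
ws.append([1, 2, 3, 4])
ws.append([5, 6, 7, 8])

# Start at column C (min_col=3) and run to the last used column.
# values_only=True yields plain values instead of Cell objects.
B = np.array(list(ws.iter_rows(min_col=3,
                               max_col=ws.max_column,
                               values_only=True)))
```

Note that max_column reflects the widest used row, so ragged sheets will produce None entries in the shorter rows.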

Solution 2:

If you don't need to load data from multiple files in an automated manner, the package tableconvert, which I recently wrote, may help. Just copy and paste the relevant cells from the Excel file into a multiline string and use the convert() function.

import numpy as np
from tableconvert.converter import convert

array = convert("""
123    456    3.14159
SOMETEXT    2,71828    0
""")

print(type(array))
print(array)

Output:

<class 'numpy.ndarray'>
[[ 123.       456.         3.14159]
 [       nan    2.71828    0.     ]]
