
Using Python Requests To Mask As A Browser And Download A File

I'm trying to use the python requests library to download a file from this link: http://www.nasdaq.com/screening/companies-by-industry.aspx?exchange=NASDAQ&render=download Clic

Solution 1:

You don't need to supply any headers:

import requests

url = "http://www.nasdaq.com/screening/companies-by-industry.aspx?exchange=NASDAQ&render=download"

response = requests.get(url, stream=True)
print(response.status_code)

# Write server response to file
with open("nasdaq.csv", 'wb') as f:
    for chunk in response.iter_content(chunk_size=1024):
        if chunk: # filter out keep-alive new chunks
            f.write(chunk)

You can also just write the content:

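import requests

url = "http://www.nasdaq.com/screening/companies-by-industry.aspx?exchange=NASDAQ&render=download"

# Write server response to file
with open("nasdaq.csv", 'wb') as f:
    f.write(requests.get(url).content)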

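Or use urllib (urllib.request.urlretrieve on Python 3, urllib.urlretrieve on Python 2):

import urllib.request

urllib.request.urlretrieve("http://www.nasdaq.com/screening/companies-by-industry.aspx?exchange=NASDAQ&render=download", "nasdaq.csv")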

All three methods give you the same 3137-line CSV file:

"Symbol","Name","LastSale","MarketCap","ADR TSO","IPOyear","Sector","Industry","Summary Quote",
"TFSC","1347 Capital Corp.","9.79","58230920","n/a","2014","Finance","Business Services","http://www.nasdaq.com/symbol/tfsc",
"TFSCR","1347 Capital Corp.","0.15","0","n/a","2014","Finance","Business Services","http://www.nasdaq.com/symbol/tfscr",
"TFSCU","1347 Capital Corp.","10","41800000","n/a","2014","Finance","Business Services","http://www.nasdaq.com/symbol/tfscu",
"TFSCW","1347 Capital Corp.","0.178","0","n/a","2014","Finance","Business Services","http://www.nasdaq.com/symbol/tfscw",
"PIH","1347 Property Insurance Holdings, Inc.","7.51","46441171.61","n/a","2014","Finance","Property-Casualty Insurers","http://www.nasdaq.com/symbol/pih",
"FLWS","1-800 FLOWERS.COM, Inc.","7.87","510463090.04","n/a","1999","Consumer Services","Other Specialty Stores","http://www.nasdaq.com/symbol/flws",
"FCTY","1st Century Bancshares, Inc","7.81","80612492.62","n/a","n/a","Finance","Major Banks","http://www.nasdaq.com/symbol/fcty",
"FCCY","1st Constitution Bancorp (NJ)","12.39","93508122.96","n/a","n/a","Finance","Savings Institutions","http://www.nasdaq.com/symbol/fccy",
"SRCE","1st Source Corporation","30.54","796548769.38","n/a","n/a","Finance","Major Banks","http://www.nasdaq.com/symbol/srce",
"VNET","21Vianet Group, Inc.","20.26","1035270865.78","51099253","2011","Technology","Computer Software: Programming, Data Processing","http://www.nasdaq.com/symbol/vnet",
   ...................................

If for some reason it does not work for you then you might need to upgrade your version of requests.
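
The streaming snippet above prints the status code but writes the body to disk either way, so a failed request can leave an error page saved as nasdaq.csv. A minimal sketch of a stricter version follows. The helper name save_response is hypothetical, and it assumes it is handed a response obtained with requests.get(url, stream=True):

```python
def save_response(response, path, chunk_size=1024):
    """Write a streamed requests response to `path` and return the bytes written.

    `save_response` is a hypothetical helper, not part of requests.
    """
    # Raise an HTTPError on 4xx/5xx instead of saving an error page.
    response.raise_for_status()
    written = 0
    with open(path, "wb") as f:
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:  # skip empty keep-alive chunks
                f.write(chunk)
                written += len(chunk)
    return written
```

It would be called as save_response(requests.get(url, stream=True, timeout=30), "nasdaq.csv"); the timeout value is only an example.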

Solution 2:

You actually don't need those headers. You don't even need to save to a file.

import requests
import csv

url = "http://www.nasdaq.com/screening/companies-by-industry.aspx?exchange=NASDAQ&render=download"
response = requests.get(url)
# response.text is a str, which the csv module requires on Python 3
data = csv.DictReader(response.text.splitlines())
for row in data:
    print(row)

Sample output:

{'Sector': 'Technology', 'LastSale': '2.46', 'Name': 'Zynga Inc.', '': '', 'Summary Quote': 'http://www.nasdaq.com/symbol/znga', 'Symbol': 'ZNGA', 'Industry': 'EDP Services', 'MarketCap': '2295110123.7', 'IPOyear': '2011', 'ADR TSO': 'n/a'}

You can use csv.reader instead of DictReader if you like.
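
Note the '': '' entry in the sample output: every line of the file ends with a trailing comma, so DictReader sees an extra, unnamed column. A small sketch of one way to drop it (parse_company_list is a hypothetical name; pass it the text of the response):

```python
import csv

def parse_company_list(text):
    """Parse the CSV text into a list of dicts, dropping unnamed columns.

    `parse_company_list` is a hypothetical helper for illustration.
    """
    reader = csv.DictReader(text.splitlines())
    # The trailing comma on each line yields an empty-string key; filter it out.
    return [{key: value for key, value in row.items() if key} for row in reader]
```

With the download in hand this would be rows = parse_company_list(response.text).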

Solution 3:

An alternative, and shorter, solution for this problem would be:

import urllib

downloadFile = urllib.URLopener()
downloadFile.retrieve("http://www.nasdaq.com/screening/companies-by-industry.aspx?exchange=NASDAQ&render=download", "companylist.csv")

This code uses urllib to create a URLopener object (downloadFile), then retrieves the data from the NASDAQ link and saves it as companylist.csv.

According to the Python documentation, if you want to send a custom User-Agent (such as the Firefox User-Agent), you can subclass URLopener and set the version attribute to the user-agent you would like to use.

Note: According to the Python documentation, URLopener (urllib.request.URLopener in Python 3) has been deprecated since Python 3.3, so it may eventually be removed from the standard library. However, as of Python 3.6 (in development at the time of writing), it is still supported as a legacy interface.
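
Since URLopener is deprecated, here is a rough Python 3 sketch of the same idea that swaps in urllib.request.Request with an explicit User-Agent header rather than subclassing URLopener. The function names and the User-Agent string are illustrative assumptions, not anything from the answer above:

```python
import shutil
import urllib.request

# Illustrative value only; substitute the User-Agent of the browser you want to mimic.
EXAMPLE_USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"
)

def build_request(url, user_agent=EXAMPLE_USER_AGENT):
    """Return a Request for `url` that carries a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

def download(url, path, user_agent=EXAMPLE_USER_AGENT, timeout=30):
    """Fetch `url` with the custom User-Agent and save the body to `path`."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response, \
            open(path, "wb") as f:
        shutil.copyfileobj(response, f)
```

Calling download(url, "companylist.csv") would then save the file much as the URLopener version does.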
