Crawl Website From List Of Values Using Scrapy
I have a list of NPIs for which I want to scrape the provider names from npidb.org. The NPI values are stored in a CSV file. I am able to do it manually by pasting the URLs
Solution 1:
Assuming you have a list of NPIs from a CSV file, you can simply use str.format() to build the URL for each one, as follows. (I also included the part that reads the list from the CSV file; if you already have it, you can omit that part.)
def start_requests(self):
    # get npis from csv file
    npis = []
    with open('test.csv', 'r') as f:
        for line in f.readlines():
            l = line.strip()
            npis.append(l)
    # generate the list of addresses depending on npi
    start_urls = []
    for npi in npis:
        start_urls.append('https://npidb.org/npi-lookup/?npi={}'.format(npi))
    for url in start_urls:
        yield scrapy.Request(url=url, callback=self.parse)
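The URL-building step above can be pulled out into small helpers so it can be checked without running a crawl. This is only a sketch: the names npi_url and clean_npis are made up for illustration, and the filter assumes that a valid NPI is exactly 10 digits, so blank lines or a header row in the file are skipped rather than turned into requests.

```python
from urllib.parse import urlencode

# Endpoint taken from the answer above.
BASE_URL = "https://npidb.org/npi-lookup/"


def npi_url(npi):
    """Build the lookup URL for a single NPI value."""
    return "{}?{}".format(BASE_URL, urlencode({"npi": npi}))


def clean_npis(lines):
    """Strip whitespace and keep only values that look like NPIs.

    Assumption: a valid NPI is exactly 10 digits; anything else
    (blank lines, a header row) is skipped.
    """
    npis = []
    for line in lines:
        value = line.strip()
        if len(value) == 10 and value.isdigit():
            npis.append(value)
    return npis


if __name__ == "__main__":
    # Made-up sample lines standing in for the contents of test.csv.
    sample = ["npi\n", "1234567890\n", "\n", " 1098765432 \n"]
    for npi in clean_npis(sample):
        print(npi_url(npi))
```

Inside start_requests, the same helpers could be applied to the lines of the opened file, yielding one scrapy.Request per URL returned by npi_url.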
Solution 2:
It depends on the structure of your CSV file, but if it contains one NPI per line, you could do something like:
def start_requests(self):
    with open('npis.csv') as f:
        for line in f:
            yield scrapy.Request(
                url='https://npidb.org/npi-lookup/?npi={}'.format(line.strip()),
                callback=self.parse
            )
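If the CSV file has a header row and several columns instead of one bare value per line, the standard csv module can pick out the right column. The following is a rough sketch under that assumption; the function name urls_from_csv and the column name "npi" are hypothetical and should be changed to match the actual file.

```python
import csv
import io


def urls_from_csv(f, column="npi"):
    """Yield one lookup URL per row of a CSV file that has a header row.

    `column` is a hypothetical header name; change it to match your file.
    """
    for row in csv.DictReader(f):
        npi = (row.get(column) or "").strip()
        if npi:  # skip rows where the column is empty or missing
            yield "https://npidb.org/npi-lookup/?npi={}".format(npi)


if __name__ == "__main__":
    # Made-up sample data standing in for the real file.
    sample = io.StringIO("npi,name\n1234567890,a\n,b\n1098765432,c\n")
    for url in urls_from_csv(sample):
        print(url)
```

In a spider, start_requests could open the file with open('npis.csv', newline='') and yield a scrapy.Request for each URL the generator produces, so the file is read lazily rather than loaded into a list first.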