Scraping Data From Href
Solution 1:
After playing with this for a little while, I don't think selenium is the best way to do this. It would require calling driver.back(), waiting for elements to reappear, and a whole mess of other stuff. I was able to get what you want using just requests, re and bs4. re is included in the Python standard library, and if you haven't installed requests, you can do it with pip as follows:
    pip install requests
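from bs4 import BeautifulSoup
import re
import requests

base_url = 'http://www.localstore.co.uk'
url = 'http://www.localstore.co.uk/stores/75061/dfs/'

res = requests.get(url)
soup = BeautifulSoup(res.text, 'html.parser')
shops = []

# Every link to an individual store page has '/store/' in its href.
links = soup.find_all('a', href=re.compile(r'/store/'))
for l in links:
    full_link = base_url + l['href']
    # The town is the second comma-separated field of the link title.
    town = l['title'].split(',')[1].strip()
    # Fetch the store page and read the postcode from its itemprop span.
    store_res = requests.get(full_link)
    store_soup = BeautifulSoup(store_res.text, 'html.parser')
    info = store_soup.find('span', attrs={"itemprop": "postalCode"})
    # find() returns None when the span is missing, so guard before .text.
    postalcode = info.text if info else None
    shops.append(dict(town_name=town, postal_code=postalcode))

print(shops)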
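The two string-handling steps inside the loop above, building the absolute URL and pulling the town out of the link title, can be checked offline. This is a minimal sketch, assuming a title of the form "DFS, Croydon, ..."; the sample href and title are made up for illustration and are not taken from the site. It uses urljoin from the standard library in place of plain concatenation, which also copes with hrefs that are already absolute.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2

base_url = 'http://www.localstore.co.uk'

# Hypothetical values standing in for l['href'] and l['title'].
sample_href = '/store/12345/dfs-croydon/'
sample_title = 'DFS, Croydon, Surrey'


def town_from_title(title):
    """Return the second comma-separated field of the title, or None if absent."""
    parts = [p.strip() for p in title.split(',')]
    return parts[1] if len(parts) > 1 else None


full_link = urljoin(base_url, sample_href)
town = town_from_title(sample_title)
print(full_link)
print(town)
```

Returning None for a title without a comma avoids the IndexError that split(',')[1] would raise on such a link.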
Solution 2:
Your code has some problems. You are using an infinite loop with no breaking condition. Also, shops = {} is a dict, but you are calling the append method on it, which dicts do not have.
Instead of selenium, you can use python-requests or urllib2.
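The append problem mentioned above is easy to reproduce on its own. A small sketch (the variable names and the postcode are placeholders for illustration): a dict has no append method, so the call raises AttributeError, while a list accepts it.

```python
shops_dict = {}
try:
    shops_dict.append('SW1A 1AA')  # dicts have no append method
    error_name = None
except AttributeError as exc:
    error_name = type(exc).__name__

shops = []                # a list is what the loop needs
shops.append('SW1A 1AA')  # placeholder postcode, for illustration only

print(error_name)
print(shops)
```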
Within your existing code, you can do something like this:
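from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get('http://www.localstore.co.uk/stores/75061/dfs/')
html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')
listings = soup.select('td.searchResults')
shops = []  # create the list once, before the loop
for l in listings:
    # Open a store page from the results list.
    # Note: this selector matches the first DFS link on the page each time.
    driver.find_element(By.CSS_SELECTOR, "a[title*='DFS']").click()
    html = driver.page_source
    soup = BeautifulSoup(html, 'html.parser')
    info = soup.find('span', attrs={"itemprop": "postalCode"})
    if info:
        info_text = info.get_text()
        shops.append(info_text)
    driver.back()  # return to the results list for the next iteration
print(shops)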
In BeautifulSoup you can find a tag by its attribute like this:
    soup.find('span', attrs={"itemprop": "postalCode"})
If it doesn't find anything, it returns None, and calling .get_text() on None raises AttributeError. So check the result first before applying .get_text().
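The check described above can be wrapped in a small helper. This is only a sketch: text_or_none and StubTag are hypothetical names, the postcode is a placeholder, and StubTag merely stands in for a bs4 tag so the snippet runs without a parsed page. In real code you would pass the result of soup.find(...) instead.

```python
def text_or_none(tag):
    """Return the tag's stripped text, or None when the lookup found nothing."""
    if tag is None:
        return None
    return tag.get_text().strip()


class StubTag(object):
    """Hypothetical stand-in for a bs4 tag; only get_text() is provided."""

    def __init__(self, text):
        self._text = text

    def get_text(self):
        return self._text


found = text_or_none(StubTag(' CR0 4XJ '))  # placeholder postcode
missing = text_or_none(None)                # what a failed find() returns
print(found)
print(missing)
```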