
Scraping Data From Href

I was trying to get the postcodes for DFS stores. To do that, I tried getting the href for each shop and clicking on it; the next page shows the shop's location, from which I can get the postal code.

Solution 1:

After experimenting with this for a while, I don't think Selenium is the best tool for the job. It would require calling driver.back() after every store page and waiting for the listing elements to reappear, among other complications. I was able to get what you want using just requests, re and bs4. re is part of the Python standard library; if you haven't installed requests or bs4, you can install them with pip as follows: pip install requests beautifulsoup4

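from bs4 import BeautifulSoup
import re
import requests

base_url = 'http://www.localstore.co.uk'
url = 'http://www.localstore.co.uk/stores/75061/dfs/'
res = requests.get(url)
# Name the parser explicitly to avoid bs4's "no parser specified" warning
soup = BeautifulSoup(res.text, 'html.parser')

shops = []

# Store pages are the anchors whose href contains /store/
links = soup.find_all('a', href=re.compile(r'/store/'))

for link in links:
    full_link = base_url + link['href']
    # The town is taken to be the second comma-separated part of the title
    town = link['title'].split(',')[1].strip()
    res = requests.get(full_link)
    store_soup = BeautifulSoup(res.text, 'html.parser')
    info = store_soup.find('span', attrs={"itemprop": "postalCode"})
    if info is None:
        continue  # no postal code on this page; skip it
    shops.append(dict(town_name=town, postal_code=info.get_text(strip=True)))

print(shops)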
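
The link-filtering and URL-building steps above can be tried without any network access. The following is a minimal sketch, assuming hrefs shaped like the ones on the listing page; the sample paths and the helper name store_links are made up for illustration, and urljoin is used in place of plain string concatenation so that both relative and absolute hrefs work.

```python
import re
from urllib.parse import urljoin

BASE_URL = 'http://www.localstore.co.uk'

# Same idea as the pattern passed to find_all(): keep hrefs containing /store/
STORE_LINK = re.compile(r'/store/')


def store_links(hrefs, base_url=BASE_URL):
    """Return absolute URLs for the hrefs that look like store pages."""
    return [urljoin(base_url, h) for h in hrefs if STORE_LINK.search(h)]


# Hypothetical hrefs, invented for this example
sample_hrefs = [
    '/store/12345/dfs-example/',
    '/stores/75061/dfs/',
    '/about/',
]
links = store_links(sample_hrefs)
print(links)
```

Note that the listing page's own path (/stores/...) does not match, because the pattern requires a slash directly after "store".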

Solution 2:

Your code has a few problems. The loop is infinite because it has no exit condition. Also, shops = {} creates a dict, but you then call append on it, which only lists have. And instead of Selenium you could use python-requests or urllib2 (urllib.request in Python 3).
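
To illustrate the dict problem: a dict has no append method, so the call fails as soon as the first result arrives. A minimal sketch (the postcode 'AB1 2CD' is a placeholder, not real data):

```python
shops = {}
try:
    shops.append('AB1 2CD')  # placeholder value
except AttributeError as exc:
    error = str(exc)
    print(error)

# A list is the right container for collecting results in order
shops = []
shops.append('AB1 2CD')
print(shops)
```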

That said, you can adapt your own code like this:

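from urllib.parse import urljoin

from bs4 import BeautifulSoup
from selenium import webdriver

start_url = 'http://www.localstore.co.uk/stores/75061/dfs/'
driver = webdriver.Firefox()
driver.get(start_url)
soup = BeautifulSoup(driver.page_source, 'html.parser')

# Collect the store links first; clicking inside the loop would navigate away
# from the listing page and keep hitting the same first link.
store_urls = [urljoin(start_url, a['href'])
              for a in soup.select("a[title*='DFS']") if a.get('href')]

shops = []  # created once, outside the loop, so results accumulate
for store_url in store_urls:
    driver.get(store_url)
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    info = soup.find('span', attrs={"itemprop": "postalCode"})
    if info:
        shops.append(info.get_text(strip=True))

driver.quit()
print(shops)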

In BeautifulSoup you can find a tag by its attribute like this:

soup.find('span', attrs={"itemprop": "postalCode"})

If nothing matches, find() returns None, and calling .get_text() on None raises AttributeError. So check the result before calling .get_text().
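
A small helper makes that check explicit. This is a sketch under assumptions: the function name get_postcode and the sample HTML are invented for illustration, and only the itemprop="postalCode" span comes from the code above.

```python
from bs4 import BeautifulSoup


def get_postcode(html):
    """Return the postal code text from a store page, or None if absent."""
    soup = BeautifulSoup(html, 'html.parser')
    info = soup.find('span', attrs={'itemprop': 'postalCode'})
    # find() returns None when nothing matches, so guard before get_text()
    return info.get_text(strip=True) if info is not None else None


# Made-up snippets standing in for real store pages
with_code = '<div><span itemprop="postalCode"> AB1 2CD </span></div>'
without_code = '<div><span itemprop="streetAddress">1 High St</span></div>'

print(get_postcode(with_code))
print(get_postcode(without_code))
```

Returning None for a missing span lets the caller decide whether to skip the shop or record it without a postcode.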
