Counting HTML Images With Python
I need some feedback on how to count HTML images with Python 3.01 after extracting them; maybe my regular expressions are not used properly. Here is my code: import re, os import ur
Solution 1:
Use beautifulsoup4 (an HTML parser) rather than a regex:
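import urllib.request
import bs4  # beautifulsoup4

html = urllib.request.urlopen('http://www.imgur.com/').read()
# Name the parser explicitly; bs4 warns when it has to guess one.
soup = bs4.BeautifulSoup(html, 'html.parser')
# find_all is the current method name; findAll is the legacy alias.
images = soup.find_all('img')
print(len(images))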
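If installing a third-party package is not an option, the standard library's html.parser module can do a similar count. This is only a minimal sketch, not part of the original answer: the ImgCounter class, the count_img_tags helper, and the sample string are all made up for illustration.

```python
from html.parser import HTMLParser


class ImgCounter(HTMLParser):
    """Hypothetical helper: counts <img> start tags fed to the parser."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names; self-closing tags such as
        # <img /> also reach this method via handle_startendtag.
        if tag == 'img':
            self.count += 1


def count_img_tags(html_text):
    """Return the number of <img> tags in a string of HTML."""
    parser = ImgCounter()
    parser.feed(html_text)
    parser.close()
    return parser.count


# Invented sample: two real tags plus one inside a comment.
sample = '<div><img src="a.png"><IMG SRC="b.png" /><!-- <img src="c.png"> --></div>'
print(count_img_tags(sample))
```

Because the parser lowercases tag names and hands comments to a separate handler, the upper-case tag is counted and the commented-out one is not.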
Solution 2:
A couple of points about your code:
- It's much easier to use a dedicated HTML parsing library to parse your pages (that's the Python way). I personally prefer Beautiful Soup.
- You're overwriting your line variable in the loop.
- total will always be 0 with your current logic.
- There's no need to compile your RE, as the re module caches recently compiled patterns.
- You're discarding your exception, so you get no clues about what's going on in the code!
- There could be other attributes on the <img> tags, so your regex is a little basic. Also, use the re.findall() method to catch multiple instances on the same line.
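To illustrate the last point, here is a small sketch (the sample line is invented) comparing a single search with re.findall() on one line that holds two tags with extra attributes:

```python
import re

# Hypothetical line of HTML with two <img> tags carrying different attributes.
line = '<p><img src="a.png" alt="A"><img class="thumb" src="b.png"></p>'

# re.search stops at the first hit, so it can only ever report one tag per line.
first_only = re.search(r'<img.*?>', line)

# re.findall returns every non-overlapping match on the line.
all_tags = re.findall(r'<img.*?>', line)

print(first_only.group(0))  # <img src="a.png" alt="A">
print(len(all_tags))        # 2
```

The lazy .*? stops at the first closing >, which is what keeps the two tags from being swallowed as one match.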
Changing your code around a little, I get:
import re
from urllib.request import urlopen

def get_image(url):
    total = 0
    page = urlopen(url).readlines()
    for line in page:
        # Lines arrive as bytes; str() gives a text form the pattern can search.
        hit = re.findall('<img.*?>', str(line))
        total += len(hit)
    print('{0} Images total: {1}'.format(url, total))

get_image("http://google.com")
get_image("http://flickr.com")
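The function above still lets a failed request end the script with a raw traceback. Here is a hedged sketch of reporting the error instead of discarding it. The name count_images_safe is hypothetical, decoding as UTF-8 is an assumption about the page, and only the most common failure types are caught.

```python
import re
from urllib.error import URLError
from urllib.request import urlopen


def count_images_safe(url):
    """Return the number of <img> tags at url, or None if the fetch fails."""
    try:
        with urlopen(url) as response:
            # Assumption: the page is close enough to UTF-8 to decode this way.
            text = response.read().decode('utf-8', errors='replace')
    except (URLError, ValueError) as exc:
        # URLError covers network and HTTP errors; ValueError covers
        # malformed URLs. Report the failure rather than swallowing it.
        print('Could not fetch {0}: {1}'.format(url, exc))
        return None
    return len(re.findall(r'<img.*?>', text))


# A malformed URL fails before any network access and is reported, not hidden.
print(count_images_safe('not-a-url'))
```

Returning None rather than 0 on failure keeps "the page has no images" distinct from "the page could not be read".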