Counting Html Images With Python

June 27, 2023

I need some feedback on how to count HTML images with Python 3.01 after extracting them; maybe my regular expressions are not used properly. Here is my code: import re, os import ur

Solution 1:

Using beautifulsoup4 (an HTML parser) rather than a regex:

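import urllib.request

import bs4  # beautifulsoup4

# Download the raw HTML of the page
with urllib.request.urlopen('http://www.imgur.com/') as response:
    html = response.read()

# Name the parser explicitly (avoids bs4's "no parser specified" warning),
# then collect every <img> element in the document
soup = bs4.BeautifulSoup(html, 'html.parser')
images = soup.find_all('img')
print(len(images))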
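If installing beautifulsoup4 is not an option, the same parser-based idea can be sketched with the standard library's html.parser module. This is a swapped-in technique rather than part of the answer; the ImageCounter class, the count_images function and the sample HTML are hypothetical, and the snippet parses an in-memory string so it runs without network access. HTMLParser passes tag names in lower case, and its default handle_startendtag() calls handle_starttag(), so <IMG> and self-closing <img ... /> are both counted.

```python
from html.parser import HTMLParser


class ImageCounter(HTMLParser):
    """Hypothetical helper: counts <img> start tags seen while parsing."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # tag arrives lower-cased; self-closing <img/> is routed here too
        if tag == 'img':
            self.count += 1


def count_images(html):
    """Return the number of <img> tags in an already-decoded HTML string."""
    parser = ImageCounter()
    parser.feed(html)
    parser.close()  # flush any buffered, incomplete markup
    return parser.count


# Made-up sample: mixed case, a self-closing tag, and a tag split across lines
sample = '''
<html><body>
  <IMG src="logo.png">
  <p>Text <img src="a.png" alt="a"/> and <img
       src="b.png"></p>
</body></html>
'''
print(count_images(sample))  # 3
```

To count a live page this way, decode the bytes returned by urllib.request.urlopen(url).read() and pass the resulting string to count_images().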

Solution 2:

A couple of points about your code:

It's much easier to use a dedicated HTML parsing library to parse your pages (that's the Python way). I personally prefer Beautiful Soup.
You're overwriting your line variable in the loop.
total will always be 0 with your current logic.
There's no need to compile your RE, as the re module caches recently compiled patterns.
You're discarding your exception, so you get no clues about what's going on in the code!
There could be other attributes on the <img> tags, so your regex is a little basic. Also, use the re.findall() method to catch multiple instances on the same line.
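The last point can be illustrated with a slightly sturdier pattern. This is a sketch, not the answer's code; the IMG_TAG pattern, the count_img_tags function and the sample line are hypothetical. The lookahead requires "img" to be the whole tag name, [^>]* skips over any attributes, re.IGNORECASE also matches <IMG>, and findall collects every tag on the line. The pattern is left as a plain string because the re module caches compiled patterns.

```python
import re

# "<img" must be followed by whitespace, "/" or ">" so that tags such as
# <imgur-logo> are not counted; [^>]* then skips any attributes.
IMG_TAG = r'<img(?=[\s/>])[^>]*>'


def count_img_tags(text):
    # findall returns every non-overlapping match, so several tags
    # on the same line are all counted
    return len(re.findall(IMG_TAG, text, re.IGNORECASE))


line = '<IMG src="a.png"><img alt="b" src="b.png" /> <imgur-logo> <img>'
print(count_img_tags(line))  # 3
```

Any regex still counts tags that sit inside HTML comments or script strings, which a real parser would skip, so the parser-based approach remains the safer choice.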

Changing your code around a little, I get:

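import re
from urllib.request import urlopen

def get_image(url):
    """Print the number of <img> tags found on the page at url."""
    total = 0
    with urlopen(url) as response:
        page = response.readlines()

    for line in page:
        # readlines() yields bytes, so decode (assuming UTF-8) before matching;
        # findall() counts every <img> tag on the line, not just the first
        text = line.decode('utf-8', errors='replace')
        hit = re.findall('<img.*?>', text, re.IGNORECASE)
        total += len(hit)

    print('{0} Images total: {1}'.format(url, total))

get_image("http://google.com")
get_image("http://flickr.com")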
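One caveat with scanning a page line by line: an <img> tag that is split across several lines never appears whole on any single line, so it is missed. The sketch below uses a made-up sample page and a simple '<img.*?>' pattern to compare a per-line count with one pass over the whole text; the function names count_per_line and count_whole_text are hypothetical. re.DOTALL lets . match newlines in the whole-text version.

```python
import re

# Made-up page: two tags on one line, plus one tag split over three lines
sample_page = ('<p><img src="a.png"><img src="b.png"></p>\n'
               '<img\n'
               '  src="c.png"\n'
               '  alt="split across lines">\n')


def count_per_line(text):
    # Mirrors a line-by-line loop: each line is searched on its own
    return sum(len(re.findall('<img.*?>', line)) for line in text.splitlines())


def count_whole_text(text):
    # One pass over the whole document; DOTALL lets .*? cross newlines
    return len(re.findall('<img.*?>', text, re.DOTALL))


print(count_per_line(sample_page))    # 2
print(count_whole_text(sample_page))  # 3
```

Reading the response once with read() and decoding it, instead of calling readlines(), is enough to switch to the whole-text approach.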
