Issues With Web Scraping Using Beautiful Soup On Dynamic HTML Websites
I'm trying to scrape a range of HTML files using Beautiful Soup, but I'm getting some really weird results. I think this is because the query is dynamic and I'm not very experi
Solution 1:
The page loads its content dynamically through Ajax, so the HTML that Beautiful Soup receives does not contain the job data. The browser's network inspector shows that the page loads all of its data from one very large JSON file located at https://www.acc.co.nz/for-providers/treatment-recovery/work-type-detail-sheets/getSheets. To load all the job data, you can use this script:
import json
import requests

url = "https://www.acc.co.nz/for-providers/treatment-recovery/work-type-detail-sheets/getSheets"
# Identify the request as an Ajax (XMLHttpRequest) call
headers = {'X-Requested-With': 'XMLHttpRequest'}
r = requests.get(url, headers=headers)
data = json.loads(r.text)
# To pretty-print all the data, uncomment this line:
# print(json.dumps(data, indent=4, sort_keys=True))
# Each item in data is a dict describing one job
for d in data:
    print(f'ID:\t{d["ID"]}')
    print(f'Job Title:\t{d["JobTitle"]}')
    print(f'Created:\t{d["Created"]}')
    print('*' * 80)
# Available keys in this JSON:
# ClassName
# LastEdited
# Created
# ANZSCO
# JobTitle
# Description
# WorkTasks
# WorkEnvironment
# PhysicalMentalDemands
# Comments
# EntryRequirements
# Group
# ID
# RecordClassName
This prints:
ID: 2327
Job Title: Watch and Clock Maker and Repairer
Created: 2017-07-11 11:33:52
********************************************************************************
ID: 2328
Job Title: Web Administrator
Created: 2017-07-11 11:33:52
********************************************************************************
ID: 2329
Job Title: Welder
Created: 2017-07-11 11:33:52
...and so on
The comments at the end of the script list the keys you can use to access the data for a specific job.
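As a rough sketch of how those keys might be used, the snippet below filters the job records by title. The `find_jobs` helper and `sample_records` are hypothetical names introduced here for illustration; `sample_records` stands in for the `data` list from the script, with values copied from the output shown above, and it assumes each record is a dict whose `ID` is an integer (the real JSON may return it as a string).

```python
def find_jobs(records, keyword):
    """Return the records whose "JobTitle" contains keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [r for r in records if keyword in str(r.get("JobTitle", "")).lower()]


# Hypothetical stand-in for the parsed JSON list (only three keys shown).
sample_records = [
    {"ID": 2327, "JobTitle": "Watch and Clock Maker and Repairer",
     "Created": "2017-07-11 11:33:52"},
    {"ID": 2328, "JobTitle": "Web Administrator",
     "Created": "2017-07-11 11:33:52"},
    {"ID": 2329, "JobTitle": "Welder",
     "Created": "2017-07-11 11:33:52"},
]

for job in find_jobs(sample_records, "web"):
    print(job["ID"], job["JobTitle"])
```

With the real data you would pass `data` instead of `sample_records`, and read any of the other listed keys (for example `Description` or `WorkTasks`) from each matching record.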