Scrapy Interpreting Html Entities On Extract
While crawling, I usually extract links like this: response.xpath('//a[contains(@class, "something")]/@href').extract() But for some reason this was not working on one specific page.
Solution 1:
After some time, I noticed that the same page also rendered oddly in Firefox. The problem was that the page being crawled was served with the content type "text/xml" instead of HTML, so Scrapy parsed it as XML.
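To fix my code, I built a separate selector that parses the response as HTML:
# response.text is the decoded body; recent Scrapy versions expect a str here, not bytes
sel = scrapy.Selector(text=response.text, type="html")
sel.xpath('//a[contains(@class, "something")]/@href').extract()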
And now I have the correct result!
['details?lm=&printerView=true&accessType=1&id=A43', (...)]