
Scrapy CSV Output "Randomly" Missing Fields

My Scrapy crawler reads all fields correctly, as the debug output shows:
2017-01-29 02:45:15 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.willhaben.at/iad/immobilie

Solution 1:

If you are yielding scraped results as dicts, the CSV columns are populated from the keys of the first yielded dict, as this method of Scrapy's CsvItemExporter shows:

# Method of CsvItemExporter (scrapy/exporters.py)
def _write_headers_and_set_fields_to_export(self, item):
    if self.include_headers_line:
        if not self.fields_to_export:
            if isinstance(item, dict):
                # for dicts try using fields of the first item
                self.fields_to_export = list(item.keys())
            else:
                # use fields declared in Item
                self.fields_to_export = list(item.fields.keys())
        row = list(self._build_row(self.fields_to_export))
        self.csv_writer.writerow(row)
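To see the effect without running a crawl, here is a minimal standard-library sketch that mimics (but is not) the dict branch above. The header is frozen from the first item's keys, and any key that first shows up in a later item is ignored. The helper name export_like_first_dict and the sample fields title, price and size are hypothetical.

```python
import csv
import io


def export_like_first_dict(items):
    """Write dict items to CSV text, taking the header from the first item only.

    Hypothetical helper that mimics (it is not) the dict branch of the
    exporter method above: keys that first appear in later items are dropped.
    """
    if not items:
        return ""
    buf = io.StringIO()
    fields = list(items[0].keys())  # header frozen from the first item's keys
    writer = csv.DictWriter(
        buf,
        fieldnames=fields,
        restval="",             # a key missing from an item becomes an empty cell
        extrasaction="ignore",  # keys not in the header are silently skipped
        lineterminator="\n",
    )
    writer.writeheader()
    for item in items:
        writer.writerow(item)
    return buf.getvalue()


# Hypothetical sample items: the first one happens to lack "size".
items = [
    {"title": "Flat A", "price": "900"},
    {"title": "Flat B", "price": "750", "size": "60"},
]
print(export_like_first_dict(items))
```

The second item's size never reaches the output because the header was already fixed to title,price when the first item was written.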

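Any key missing from that first dict is therefore dropped from every row of the file. Because Scrapy processes requests concurrently, which item is yielded first can change from run to run, which is why the missing fields look random. So you should either define a scrapy.Item subclass that declares all the fields explicitly and populate it, or write a custom CsvItemExporter.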
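Both fixes come down to deciding the column list before any item is written. A minimal standard-library sketch of that idea, assuming a hypothetical fixed tuple ALL_FIELDS that plays the role of the Fields declared on a scrapy.Item: every declared column appears in the header, and a value an item lacks becomes an empty cell instead of a missing column. This is an illustration of the principle, not Scrapy's own code.

```python
import csv
import io

# Hypothetical column list, declared once up front (like the Fields on a
# scrapy.Item subclass) instead of being inferred from the first item.
ALL_FIELDS = ("title", "price", "size")


def export_with_fixed_fields(items, fields=ALL_FIELDS):
    """Write dict items to CSV text using a predeclared header."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=fields,
        restval="",             # missing values become empty cells
        extrasaction="ignore",  # undeclared keys are skipped
        lineterminator="\n",
    )
    writer.writeheader()
    for item in items:
        writer.writerow(item)
    return buf.getvalue()


# Same hypothetical items as before: the first one lacks "size".
items = [
    {"title": "Flat A", "price": "900"},
    {"title": "Flat B", "price": "750", "size": "60"},
]
print(export_with_fixed_fields(items))
```

Here the size column survives no matter which item arrives first. Within Scrapy itself, newer versions also offer the FEED_EXPORT_FIELDS setting, which lists the columns for feed exports and may make a custom exporter unnecessary; check that your Scrapy version supports it.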
