
Url Text File Not Found When Deployed To Scraping Hub And Spider Run

Problem: My spider relies on a .txt file that contains the URLs the spider visits. I have placed that file in the same directory as the spider code, and in every other directory I could think of, but when the project is deployed to Scrapinghub and the spider is run, the file is not found.

Solution 1:

You need to declare the files in the package_data section of your setup.py file.

For example, if your Scrapy project has the following structure:

myproject/
  __init__.py
  settings.py
  resources/
    cities.txt
scrapy.cfg
setup.py

You would use the following in your setup.py to include the cities.txt file:

from setuptools import setup, find_packages

setup(
    name='myproject',
    version='1.0',
    packages=find_packages(),
    package_data={
        'myproject': ['resources/*.txt']
    },
    entry_points={
        'scrapy': ['settings = myproject.settings']
    },
    zip_safe=False,
)

Note that the zip_safe flag is set to False; this may be needed in some cases so that the packaged data files are available as regular files at runtime.

Now you can access the contents of cities.txt from settings.py like this:

import pkgutil

data = pkgutil.get_data("myproject", "resources/cities.txt")
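
For example, a minimal sketch of a spider that reads its start URLs from the packaged file might look like the following (the spider name, class, and parse logic are placeholder assumptions, not part of the original answer):

import pkgutil

import scrapy


class CitySpider(scrapy.Spider):
    # Hypothetical spider shown only to illustrate reading the packaged
    # URL list with pkgutil; adapt the name and parsing to your project.
    name = "cities"

    def start_requests(self):
        # pkgutil.get_data returns bytes, so decode before splitting into lines.
        data = pkgutil.get_data("myproject", "resources/cities.txt")
        for url in data.decode("utf-8").splitlines():
            url = url.strip()
            if url:
                yield scrapy.Request(url, callback=self.parse)

    def parse(self, response):
        self.logger.info("Visited %s", response.url)

Because pkgutil looks the file up relative to the installed myproject package rather than the current working directory, this approach works both locally and after deploying to Scrapinghub.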
