URL Text File Not Found When Deployed to Scrapinghub and Spider Run
Problem: My spider relies on a .txt file that contains the URLs the spider visits. I have placed that file in the same directory as the spider code, and in every directory…
Solution 1:
You need to declare the files in the package_data section of your setup.py file.
For example, if your Scrapy project has the following structure:
myproject/
    __init__.py
    settings.py
    resources/
        cities.txt
scrapy.cfg
setup.py
You would use the following in your setup.py to include the cities.txt file:
from setuptools import setup, find_packages

setup(
    name='myproject',
    version='1.0',
    packages=find_packages(),
    package_data={
        'myproject': ['resources/*.txt'],
    },
    entry_points={
        'scrapy': ['settings = myproject.settings'],
    },
    zip_safe=False,
)
Note that the zip_safe flag is set to False; installing the package unzipped may be needed in some cases for the data files to be found.
Now you can access the cities.txt file content from settings.py like this:
import pkgutil

data = pkgutil.get_data("myproject", "resources/cities.txt")
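As a minimal sketch (not part of the original answer), here is one way a spider could consume the packaged file; the spider name, URL handling, and parse logic are assumptions, and pkgutil.get_data returns bytes, so the content must be decoded before use:

import pkgutil

import scrapy


class CitiesSpider(scrapy.Spider):
    # Hypothetical spider name for illustration
    name = "cities"

    def start_requests(self):
        # Read the packaged file rather than relying on a path on disk
        data = pkgutil.get_data("myproject", "resources/cities.txt")
        for line in data.decode("utf-8").splitlines():
            url = line.strip()
            if url:
                yield scrapy.Request(url, callback=self.parse)

    def parse(self, response):
        # Placeholder extraction; real parsing logic would go here
        yield {"url": response.url, "title": response.css("title::text").get()}

Reading the file through pkgutil.get_data keeps the spider independent of where the package is installed, which is what makes it work after deployment rather than only on your local machine.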