Spark-submit With Specific Python Libraries
I have PySpark code that depends on third-party libraries. I want to execute this code on my cluster, which runs under Mesos. I do have a zipped version of my Python environment that I would like to use when the job runs.
Solution 1:
To submit your zip archive to PySpark, pass it with the --py-files option:
spark-submit --py-files your_zip your_code.py
Alternatively, you can register the archive from inside your code and then import a module it contains:

sc.addPyFile("your_zip")
import your_module  # a module packaged inside the zip
Hope this helps!
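Under the hood, both --py-files and sc.addPyFile() simply make the zip visible on the Python import path of each worker. Here is a minimal stdlib-only sketch of that mechanism (the module name mylib and its contents are made up for illustration):

```python
import os
import sys
import tempfile
import zipfile

# Build a zip containing a tiny module, mimicking a --py-files archive.
tmpdir = tempfile.mkdtemp()
zip_path = os.path.join(tmpdir, "deps.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("mylib.py", "def greet():\n    return 'hello from mylib'\n")

# This is roughly what sc.addPyFile() arranges on each executor:
sys.path.insert(0, zip_path)

import mylib  # now importable straight from the zip
print(mylib.greet())
```

Note that this works for pure-Python dependencies; packages with compiled extensions generally cannot be imported from a zip and are better handled with the virtual-environment approach in Solution 2.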
Solution 2:
This may be helpful to some people whose jobs have dependencies.
I found a way to properly ship a virtual environment to the master and all the slave workers:
virtualenv venv --relocatable
cd venv
zip -qr ../venv.zip *
PYSPARK_PYTHON=./SP/bin/python spark-submit \
    --master yarn --deploy-mode cluster \
    --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./SP/bin/python \
    --driver-memory 4G \
    --archives venv.zip#SP \
    filename.py
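The #SP suffix tells Spark to unpack venv.zip into a directory named SP in each container's working directory, which is why PYSPARK_PYTHON can point at ./SP/bin/python. To confirm that the job actually runs under the shipped environment, a quick hedged check from inside filename.py (plain standard-library calls, nothing Spark-specific) is:

```python
import sys

# Print which interpreter is executing this driver/executor code.
# On a cluster launched with --archives venv.zip#SP, this should
# resolve to a path ending in SP/bin/python rather than the system Python.
print(sys.executable)
print(sys.version)
```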