Custom Apache Beam Python Version In Dataflow
Solution 1:
I will answer my own question, as I found the answer in an Apache Beam JIRA issue I have been helping with.
If you want to use a custom Apache Beam Python version in Google Cloud Dataflow (that is, run your pipeline with --runner DataflowRunner), you must pass the option --sdk_location <apache_beam_v1.2.3.tar.gz> when you run your pipeline, where <apache_beam_v1.2.3.tar.gz> is the location of the packaged version you want to use.
For example, as of this writing, if you have checked out the HEAD version of the Apache Beam git repository, you first have to package the SDK by navigating to the Python SDK directory with cd beam/sdks/python and then running python setup.py sdist (a compressed tar file will be created in the dist subdirectory).
Thereafter you can run your pipeline like this:
python your_pipeline.py [...your_options...] --sdk_location beam/sdks/python/dist/apache-beam-2.2.0.dev0.tar.gz
Google Cloud Dataflow will use the supplied SDK.
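If you would rather set this programmatically than on the command line, the same option can be supplied through the pipeline options object. The following is a minimal sketch, assuming the PipelineOptions views exposed by the Beam Python SDK (StandardOptions, GoogleCloudOptions, SetupOptions); the project, bucket, and tarball path are placeholder values to replace with your own.
import apache_beam as beam
from apache_beam.options.pipeline_options import (
    GoogleCloudOptions,
    PipelineOptions,
    SetupOptions,
    StandardOptions,
)

# Placeholder values -- replace with your own project, bucket, and SDK tarball path.
options = PipelineOptions()
options.view_as(StandardOptions).runner = 'DataflowRunner'
options.view_as(GoogleCloudOptions).project = 'my-gcp-project'
options.view_as(GoogleCloudOptions).temp_location = 'gs://my-bucket/temp'
options.view_as(GoogleCloudOptions).staging_location = 'gs://my-bucket/staging'
# Point Dataflow at the locally built SDK tarball instead of the released version.
options.view_as(SetupOptions).sdk_location = 'beam/sdks/python/dist/apache-beam-2.2.0.dev0.tar.gz'

with beam.Pipeline(options=options) as p:
    (p
     | 'Create' >> beam.Create(['hello', 'world'])
     | 'Upper' >> beam.Map(lambda word: word.upper()))
This is equivalent to passing --sdk_location on the command line; in both cases the tarball is staged and installed on the Dataflow workers.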