
Google Cloud Dataflow Python SDK Updates

When using the Google Cloud Dataflow Python SDK, starting to read a large amount of data from Cloud Storage takes a while and causes the error AssertionError: Job did not r

Solution 1:

According to the official Google Cloud Platform documentation:

The Cloud Dataflow SDK 2.5.0 is the last Cloud Dataflow SDK release that is separate from the Apache Beam SDK releases. The Cloud Dataflow service fully supports official Apache Beam SDK releases.

So yes, google-cloud-dataflow 2.5.0 is the last release, and from that version on you should use the official apache-beam releases. Bear in mind that you will need to install the library with the [gcp] extra:

pip install apache-beam[gcp]

Finally, the fix in 6535 should already be applied: I installed the library with "pip install apache-beam[gcp]==2.8.0", opened the file "apache_beam/runners/dataflow/dataflow_runner.py", and the fix is present there.
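If you want to repeat that check on your own machine, something along these lines may help. It is only a rough sketch using the Python standard library (Python 3.8+): the helper names parse_version, installed_version, has_at_least and module_file are made up for this example and are not part of apache-beam, and the version comparison is deliberately simplistic. It reports which distribution is installed and where dataflow_runner.py lives so you can open it; it does not verify the fix itself.

```python
from importlib import metadata
from importlib import util


def parse_version(version):
    """Turn '2.8.0' into (2, 8, 0); a suffix such as 'rc1' is ignored."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def has_at_least(dist_name, minimum):
    """True if dist_name is installed and its version is >= minimum."""
    found = installed_version(dist_name)
    if found is None:
        return False
    return parse_version(found) >= parse_version(minimum)


def module_file(module_name):
    """Return the path of a module's source file, or None if not importable.

    Note: looking up a dotted name imports its parent packages.
    """
    try:
        spec = util.find_spec(module_name)
    except ImportError:
        return None
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # The old, separate SDK; expected to be None after migrating.
    print("google-cloud-dataflow:", installed_version("google-cloud-dataflow"))
    print("apache-beam:", installed_version("apache-beam"))
    print("apache-beam >= 2.8.0:", has_at_least("apache-beam", "2.8.0"))
    # The file mentioned in the answer above.
    print(module_file("apache_beam.runners.dataflow.dataflow_runner"))
```

If google-cloud-dataflow still shows a version, the old SDK is still installed in that environment and may be the one actually being imported.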

