Can't Connect To Cassandra From Pyspark
I'm trying to connect to Cassandra from PySpark and run some queries. Here are all the steps I have taken. First I installed Spark: wget http://www.apache.org/dyn/closer.lua/spark/s
Solution 1:
- Run pyspark with the Cassandra connector package:
./bin/pyspark --packages com.datastax.spark:spark-cassandra-connector_2.11:2.0.2
- In the code, create a dict with the connection config (the hosts go in one comma-separated string):
hosts = {"spark.cassandra.connection.host": 'host_dns_or_ip_1,host_dns_or_ip_2,host_dns_or_ip_3'}
- In the code, create a DataFrame using the connection config:
data_frame = sqlContext.read.format("org.apache.spark.sql.cassandra").options(**hosts).load(keyspace="your_keyspace", table="your_table")
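The steps above can be wrapped in two small helpers so the host list and the read call live in one place. This is only a sketch: `cassandra_options` and `read_table` are hypothetical names I made up, and the port setting assumes Cassandra is listening on its default native port 9042. The `spark` argument can be the `spark` session or the `sqlContext` that the pyspark shell already provides.

```python
CASSANDRA_FORMAT = "org.apache.spark.sql.cassandra"


def cassandra_options(hosts, port=9042):
    """Build connector options from a list of contact points (hypothetical helper)."""
    if not hosts:
        raise ValueError("at least one Cassandra host is required")
    return {
        # The connector takes all contact points as one comma-separated string.
        "spark.cassandra.connection.host": ",".join(hosts),
        # Assumption: the cluster uses the default native transport port.
        "spark.cassandra.connection.port": str(port),
    }


def read_table(spark, keyspace, table, hosts):
    """Load one Cassandra table as a DataFrame via an existing session or SQLContext."""
    return (
        spark.read.format(CASSANDRA_FORMAT)
        .options(keyspace=keyspace, table=table, **cassandra_options(hosts))
        .load()
    )
```

Inside the pyspark shell started as shown above, something like read_table(sqlContext, "your_keyspace", "your_table", ["host_dns_or_ip_1", "host_dns_or_ip_2"]) should return the same DataFrame as the one-liner.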
Solution 2:
The following works for me:
./bin/pyspark --master local[*] \
  --packages com.datastax.spark:spark-cassandra-connector_2.11:2.3.2 \
  --conf spark.cassandra.connection.host=host.name \
  --conf spark.cassandra.auth.username=cassandraname \
  --conf spark.cassandra.auth.password=cassandrapwd
>>> df = spark.read.format("org.apache.spark.sql.cassandra")\
.options(table="tablename", keyspace="keyspacename").load()
>>> df.show()
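If you would rather run a standalone script than the pyspark shell, the same settings can probably be passed through the session builder. A rough sketch, with caveats: `cassandra_conf` and `build_session` are hypothetical helper names, and the app name is a placeholder. `spark.jars.packages` is the config counterpart of `--packages`, but it only takes effect when it is set before the JVM starts, so this will not pull in the connector from inside a shell that is already running. The `_2.11` suffix in the package name has to match the Scala version your Spark was built with.

```python
DEFAULT_PACKAGE = "com.datastax.spark:spark-cassandra-connector_2.11:2.3.2"


def cassandra_conf(host, username, password, package=DEFAULT_PACKAGE):
    """Collect the settings from the command line above into one dict (hypothetical helper)."""
    return {
        # Config equivalent of the --packages flag.
        "spark.jars.packages": package,
        "spark.cassandra.connection.host": host,
        "spark.cassandra.auth.username": username,
        "spark.cassandra.auth.password": password,
    }


def build_session(conf, master="local[*]", app_name="cassandra-example"):
    """Create a SparkSession carrying the given settings."""
    # Imported here so the helper above stays usable without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.master(master).appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

With a session from build_session(cassandra_conf("host.name", "cassandraname", "cassandrapwd")), the spark.read call shown above should work unchanged.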