
Can't Connect To Cassandra From Pyspark

I'm trying to connect to Cassandra from PySpark and run some queries. Here are all the steps I have done: First I installed Spark: wget http://www.apache.org/dyn/closer.lua/spark/s

Solution 1:

  1. Run pyspark with the connector package:
    ./bin/pyspark --packages com.datastax.spark:spark-cassandra-connector_2.11:2.0.2
  2. In the code, create a dict with the connection config:
    hosts = {"spark.cassandra.connection.host": "host_dns_or_ip_1,host_dns_or_ip_2,host_dns_or_ip_3"}
  3. In the code, create a DataFrame using the connection config:
    data_frame = sqlContext.read.format("org.apache.spark.sql.cassandra").options(**hosts).load(keyspace="your_keyspace", table="your_table")
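
The steps above can be wrapped in a small helper. This is only a sketch: `cassandra_options` and `read_cassandra_table` are hypothetical names, not part of the connector, and `spark` is assumed to be a SparkSession (or SQLContext) from a pyspark process started with the connector package as in step 1.

```python
def cassandra_options(hosts, keyspace, table):
    """Build the option dict passed to the Cassandra data source."""
    return {
        # Contact points are passed as one comma-separated string.
        "spark.cassandra.connection.host": ",".join(hosts),
        "keyspace": keyspace,
        "table": table,
    }


def read_cassandra_table(spark, hosts, keyspace, table):
    """Return a DataFrame for keyspace.table.

    Assumes `spark` was started with the connector on its classpath;
    without it, load() fails because the data source cannot be found.
    """
    options = cassandra_options(hosts, keyspace, table)
    return (
        spark.read.format("org.apache.spark.sql.cassandra")
        .options(**options)
        .load()
    )
```

Usage would look like `read_cassandra_table(spark, ["host_dns_or_ip_1", "host_dns_or_ip_2"], "your_keyspace", "your_table")`. Keeping the option dict in its own function makes it possible to check the settings without a running cluster.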

Solution 2:

The following works for me:

./bin/pyspark --master "local[*]" \
  --packages com.datastax.spark:spark-cassandra-connector_2.11:2.3.2 \
  --conf spark.cassandra.connection.host=host.name \
  --conf spark.cassandra.auth.username=cassandraname \
  --conf spark.cassandra.auth.password=cassandrapwd

>>> df = spark.read.format("org.apache.spark.sql.cassandra")\
...     .options(table="tablename", keyspace="keyspacename").load()

>>> df.show()
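
The same settings can probably also be applied from a standalone script rather than on the command line. A minimal sketch, assuming PySpark is installed and the machine can download the connector package. `connector_conf` and `build_session` are hypothetical helper names, and the host and credentials are placeholders like the ones above.

```python
# Connector coordinates copied from the command line above.
CONNECTOR = "com.datastax.spark:spark-cassandra-connector_2.11:2.3.2"


def connector_conf(host, username, password):
    """Return the Spark properties equivalent to the --packages/--conf flags."""
    return {
        "spark.jars.packages": CONNECTOR,
        "spark.cassandra.connection.host": host,
        "spark.cassandra.auth.username": username,
        "spark.cassandra.auth.password": password,
    }


def build_session(host, username, password, master="local[*]"):
    """Create a SparkSession configured for Cassandra access."""
    # Imported here so the config helper works without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.master(master)
    for key, value in connector_conf(host, username, password).items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Note that `spark.jars.packages` has to be set before the JVM starts, so this approach is meant for a fresh Python process (for example `python script.py`), not for an already running pyspark shell.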
