
Tweets are streamed using Kafka into a Scala project where they are analyzed for sentiment using the Stanford NLP library. Results of the sentiment analysis are classified as POSITIVE, NEGATIVE, or NEUTRAL. A message containing the sentiment is sent back to Kafka through topicA, and the results are visualized using Kibana and Elasticsearch. The keywords…
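Stanford CoreNLP scores sentence sentiment on a small integer scale, which a project like this would then bucket into the three labels above. The sketch below is a hedged illustration of that mapping only; the `SentimentLabel` object, its thresholds, and the 0–4 scale cut-offs are assumptions, not taken from this repository's source.

```scala
// Hypothetical mapping from a CoreNLP-style sentiment class (0 = very
// negative … 4 = very positive) to the three labels used in this project.
// The exact thresholds are an assumption for illustration.
object SentimentLabel {
  def classify(sentimentClass: Int): String = sentimentClass match {
    case 0 | 1 => "NEGATIVE"
    case 2     => "NEUTRAL"
    case 3 | 4 => "POSITIVE"
    case _     => "NEUTRAL" // fall back for unexpected scores
  }
}
```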


codebankss/structured-streaming


Spark Streaming with Twitter and Kafka

Requirements

  1. Java (v1.8.0_261)
  2. Scala (v2.11.8)
  3. Spark (2.4.0)
  4. sbt
  5. Kafka (v2.11-2.4.0)
  6. Elasticsearch (v7.9.3)
  7. Kibana (v7.9.3-darwin-x86_64)
  8. Logstash (v7.9.3)

Execution

How to Implement

The following instructions are for Linux/macOS.

Assumptions: all the required tools (sbt, Elasticsearch, Kafka, Kibana, and Logstash) have been downloaded, and a Twitter developer account has been created.

To run the project, download the folder and follow these steps in a terminal:

STEP 1: Change directory (cd) to the Kafka folder and start the ZooKeeper server by running: `sh bin/zookeeper-server-start.sh config/zookeeper.properties`

STEP 2: In another terminal, cd to the Kafka folder and start the Kafka server: `sh bin/kafka-server-start.sh config/server.properties`

STEP 3: In another terminal, cd to the Kafka folder and create a topic: `sh bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic topicA`

STEP 4: Now start a Kafka console consumer and check the connection logs: `sh bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic topicA --from-beginning`

STEP 5: Now cd to elasticsearch/bin and start the server with `./elasticsearch`. Elasticsearch is now running at localhost:9200.

STEP 6: Now run the Kibana server by cd'ing to kibana/bin and running `./kibana`. Kibana is now available at localhost:5601. In Kibana, go to Dev Tools and run `GET /_cat/health?v&pretty`. Ideally, the cluster health should be at 100%.

STEP 7: The Logstash configuration is created as ‘logstash-simple.conf’ and located in the project (kafka) folder. To load the configuration, copy the path of the project, cd to logstash/bin, and run `./logstash -f pathToProject/logstash-simple.conf`. The logstash-simple.conf configuration is as follows:

```
input {
  kafka {
    bootstrap_servers => "localhost:9092"
    topics => ["topicA"]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "index3"
  }
}
```

STEP 8: Now create the fat JAR with sbt by cd'ing to the project (kafka) folder and running `sbt assembly`. The JAR is created at pathToProject/target/scala-2.11/kafka-assembly-0.1.jar. The project is now compiled.
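Building a fat JAR with `sbt assembly` implies the sbt-assembly plugin is configured in the project. As a minimal sketch of what that build definition might look like — the plugin version and the dependency list here are assumptions, not taken from this repository — the setup is roughly:

```scala
// project/plugins.sbt — assumed sbt-assembly setup (version is an assumption)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")

// build.sbt — illustrative only; actual names and versions may differ
name := "kafka"
version := "0.1"
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  // Spark itself is supplied by spark-submit at run time, hence "provided"
  "org.apache.spark" %% "spark-sql" % "2.4.0" % "provided",
  "org.apache.spark" %% "spark-sql-kafka-0-10" % "2.4.0",
  "edu.stanford.nlp" % "stanford-corenlp" % "3.9.2"
)
```

With this in place, `sbt assembly` produces the kafka-assembly-0.1.jar path used in the next step.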

STEP 9: Now run the compiled project as follows: `spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.6 --class TwitterSentimentAnalysis pathToProject/target/scala-2.11/kafka-assembly-0.1.jar topicA accessKey secretKey tokenAccessKey tokenSecretKey`, where pathToProject is the path where the project is downloaded, and accessKey, secretKey, tokenAccessKey, and tokenSecretKey are the keys from your Twitter developer account.
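The five trailing arguments to spark-submit arrive in the application's `args` array in the order shown above. The sketch below illustrates that mapping only; the `TwitterArgs` and `TwitterAuth` names are hypothetical, not taken from the repository's source.

```scala
// Hypothetical sketch of how the five command-line arguments map to a
// Kafka topic plus Twitter credentials. Argument order mirrors the
// spark-submit invocation: topic accessKey secretKey tokenAccessKey tokenSecretKey
object TwitterArgs {
  final case class TwitterAuth(accessKey: String, secretKey: String,
                               tokenAccessKey: String, tokenSecretKey: String)

  def parse(args: Array[String]): (String, TwitterAuth) = {
    require(args.length == 5,
      "usage: <topic> <accessKey> <secretKey> <tokenAccessKey> <tokenSecretKey>")
    (args(0), TwitterAuth(args(1), args(2), args(3), args(4)))
  }
}
```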

STEP 10: Once the project is running and data is streaming, go to localhost:5601 and create an index pattern ‘index3’. Once the index is created, you can see the incoming data in the ‘Discover’ section and create various visualisations. Note: if you are not able to create an index because of a server error, run the following two commands, in this order, in a new terminal window while the project is still running.

curl -XPUT -H "Content-Type: application/json" http://localhost:9200/_cluster/settings -d '{ "transient": { "cluster.routing.allocation.disk.threshold_enabled": false } }'

curl -XPUT -H "Content-Type: application/json" http://localhost:9200/_all/_settings -d '{"index.blocks.read_only_allow_delete": null}'

Now, try creating index again.

Visualization

The analyzed data is read by the Kafka consumer and can be verified in the consumer logs.

Data from the consumer log, viewed in the Kibana ‘Discover’ section, shows tweets coming in at about 650 per minute on average for our keywords.

The following pie chart shows the tweet counts based on the classification.

The following Timelion graph shows the classification of tweets by count per minute.

Summary

By looking at the visualizations and analysis, we can say that approximately 78% of the tweets are classified as NEGATIVE, about 14% as NEUTRAL, and 8% as POSITIVE. At any given point in time, negative tweets about ‘elections’ considerably outnumber positive or neutral ones, and the counts vary at a roughly constant rate over time.
