Big Data Pipeline
Lambda Architecture - Batch Layer
with
AngularJS
Java RESTful Web Services
Apache Hadoop
Apache Spark
Apache Cassandra
on Amazon Web Services Cloud Platform
Big Data Pipeline: Ingest, Store, Process, Visualize
Big Data Batch Layer Pipeline
[Diagram. Stages: Ingest, Store, Process, Store, Visualize. Components: AngularJS web app, REST web services, Apache web logs, log/data file, S3, HDFS, Spark engine and Spark SQL on a Spark cluster, Apache Cassandra, and an AngularJS web app for interactive queries.]
Big Data Real-Time Layer Pipeline
[Diagram. Stages: Ingest, Stream, Process, Store, Visualize. Components: AngularJS web app, clickstream data, Apache web logs, log/data file, TCP sockets, Apache Kafka, Spark Streaming and Spark SQL on a Spark cluster, S3, HDFS, Apache Cassandra, and an AngularJS web app for interactive queries.]
Install Web Server
EC2 instance for Web Server
cat /etc/*-release
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
java -version
mkdir webserver
cd webserver
wget http://www-eu.apache.org/dist/tomcat/tomcat-8/v8.0.36/bin/apache-tomcat-8.0.36.tar.gz
tar xvzf apache-tomcat-8.0.36.tar.gz
cd apache-tomcat-8.0.36/bin
./startup.sh
Commands to set up Apache Tomcat 8.0
Apache Tomcat 8.0 running on EC2 Instance
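Once startup.sh has run, a quick reachability check can confirm that the server is listening. The sketch below is only an illustration: it assumes Tomcat is using its default HTTP connector port 8080 and that the EC2 security group allows inbound traffic on that port; the function name port_is_open is hypothetical.

```python
import socket


def port_is_open(host, port=8080, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds, else False.

    8080 is Tomcat's default HTTP connector port (an assumption here;
    the slides do not state which port is used).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # "localhost" is a placeholder; use the EC2 instance's public address.
    print(port_is_open("localhost"))
```

A True result only shows that something accepts connections on the port; opening the address in a browser confirms that it is Tomcat.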
Install Apache Cassandra - 3 Node Cluster on AWS
3 EC2 instance for Cassandra Cluster
cat /etc/*-release
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
java -version
mkdir db
cd db
wget http://www-eu.apache.org/dist/cassandra/3.0.7/apache-cassandra-3.0.7-bin.tar.gz
tar xvzf apache-cassandra-3.0.7-bin.tar.gz
cd apache-cassandra-3.0.7
bin/cassandra -f   # runs in the foreground; use a second terminal for cqlsh
bin/cqlsh
cassandra1 -> 52.87.183.121
cassandra2 -> 52.207.239.229
cassandra3 -> 54.174.185.29
Commands to set up Apache Cassandra 3.0.7
Repeat for all 3 EC2 instances
Change the following in conf/cassandra.yaml (broadcast_address is the node's own public IP; the value shown is for cassandra3)
cluster_name: 'Test Cluster'
listen_address:
broadcast_address: 54.174.185.29
seeds: "52.87.183.121,52.207.239.229"
rpc_address:
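Because the same edit is repeated on all three instances with only broadcast_address changing, the per-node values can be tabulated. This is a minimal sketch, not part of the original deck: it assumes cassandra1 and cassandra2 are the seed nodes (their addresses are the ones in the seeds line) and that the blank listen_address and rpc_address are intentional. The names NODES, SEED_NODES, and node_settings are hypothetical.

```python
# Node addresses as listed on the slide.
NODES = {
    "cassandra1": "52.87.183.121",
    "cassandra2": "52.207.239.229",
    "cassandra3": "54.174.185.29",
}

# Assumption: the seeds line names cassandra1 and cassandra2.
SEED_NODES = ["cassandra1", "cassandra2"]


def node_settings(node_name, nodes=NODES, seed_nodes=SEED_NODES):
    """Return the cassandra.yaml values to set on one node."""
    return {
        "cluster_name": "Test Cluster",
        "listen_address": "",  # left blank, as on the slide
        "broadcast_address": nodes[node_name],  # this node's own address
        "seeds": ",".join(nodes[n] for n in seed_nodes),
        "rpc_address": "",  # left blank, as on the slide
    }


if __name__ == "__main__":
    for name in NODES:
        settings = node_settings(name)
        print(name, settings["broadcast_address"], settings["seeds"])
```

Every node shares the same cluster_name and seeds value; only broadcast_address differs from one instance to the next.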
3 Node Cassandra Server running on AWS EC2 Instances
3 Node Cassandra Server running
CREATE KEYSPACE users
WITH replication = {'class':'SimpleStrategy', 'replication_facto
USE users;
CREATE TABLE user(
id int PRIMARY KEY,
name text
);
select * from user;
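The same schema can be created from application code. The sketch below is an assumption-laden illustration: the slide's replication setting is cut off, so the replication factor is a required argument rather than a guessed value, and the helper names (keyspace_cql, create_schema, list_users) are hypothetical. The session argument is any object with an execute(query) method, such as the session returned by Cluster([...]).connect() in the DataStax cassandra-driver package.

```python
def keyspace_cql(name, replication_factor):
    """Build a CREATE KEYSPACE statement using SimpleStrategy."""
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {name} "
        "WITH replication = "
        f"{{'class': 'SimpleStrategy', "
        f"'replication_factor': {int(replication_factor)}}}"
    )


def create_schema(session, replication_factor):
    """Create the users keyspace and the user table from the slide."""
    session.execute(keyspace_cql("users", replication_factor))
    session.execute(
        "CREATE TABLE IF NOT EXISTS users.user "
        "(id int PRIMARY KEY, name text)"
    )


def list_users(session):
    """Return all rows of users.user as a list."""
    return list(session.execute("SELECT id, name FROM users.user"))


if __name__ == "__main__":
    # 3 is only an example value; the factor used in the deck is not visible.
    print(keyspace_cql("users", 3))
```

Qualifying the table as users.user avoids depending on a prior USE statement in the session.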
AngularJS - Java RESTful Web Services Deployed on AWS Cloud
Tomcat web server access log that we will process with Apache Hadoop/Spark
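Before any batch processing, each log line has to be split into fields. The deck does not show the log format, so the sketch below assumes Tomcat's "common" access log pattern (%h %l %u %t "%r" %s %b); the sample line and the names LINE_RE and parse_line are made up for illustration.

```python
import re

# Assumed layout: host ident user [time] "request" status bytes
LINE_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Parse one access log line into a dict, or return None if malformed."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # The byte count is written as "-" when no body was sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    parts = record["request"].split()
    if len(parts) >= 2:
        record["method"], record["path"] = parts[0], parts[1]
    else:
        record["method"], record["path"] = None, None
    return record


if __name__ == "__main__":
    # Hypothetical sample line, not taken from the deck.
    sample = ('127.0.0.1 - - [10/Jul/2016:12:00:00 +0000] '
              '"GET /index.html HTTP/1.1" 200 1234')
    print(parse_line(sample))
```

Returning None for malformed lines lets a later processing step filter them out instead of failing the whole job.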
Web log and Python application uploaded to an AWS S3 bucket
Spark job executed on AWS EMR - Spark Cluster
Results stored in Cassandra Database
Results stored in AWS S3 Bucket
Python application BatchLogAnalyzer.py executed on the AWS Spark cluster
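The contents of BatchLogAnalyzer.py are not shown in the deck. As a stand-in, the plain-Python sketch below illustrates the kind of per-key count such a batch job might produce (requests per HTTP status code), analogous to a map followed by reduceByKey on a Spark RDD. It is not the author's script: the log layout, the sample lines, and the names status_of and count_by_status are all assumptions.

```python
from collections import Counter


def status_of(line):
    """Extract the HTTP status from a line, or None if it cannot be read.

    Assumes the status code is the second-to-last whitespace-separated
    field, as in the common access log layout.
    """
    fields = line.split()
    if len(fields) < 2 or not fields[-2].isdigit():
        return None
    return int(fields[-2])


def count_by_status(lines):
    """Count log lines per status code, skipping unreadable lines."""
    statuses = (status_of(line) for line in lines)
    return Counter(s for s in statuses if s is not None)


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the deck.
    sample = [
        '127.0.0.1 - - [10/Jul/2016:12:00:00 +0000] "GET / HTTP/1.1" 200 512',
        '127.0.0.1 - - [10/Jul/2016:12:00:01 +0000] "GET /a HTTP/1.1" 404 0',
        '127.0.0.1 - - [10/Jul/2016:12:00:02 +0000] "GET /b HTTP/1.1" 200 64',
    ]
    for status, count in sorted(count_by_status(sample).items()):
        print(status, count)
```

Small aggregates like these are the sort of result that fits a Cassandra table keyed by status code.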
Results compared in console and Cassandra Database
Thank You
hkbhadraa@gmail.com

Big data Lambda Architecture - Batch Layer Hands On
