- Still under development.
- The document may not be entirely correct.
- Some "moving" parts are not fully described, or not even listed yet.
Here we simulate a system using Kafka and Spark to examine its capability to handle continuous data distribution, while at the same time testing how PouchDB and CouchDB can auto-sync between frontend and backend.
There are 11 components:
- Kafka Brokers
- Zookeeper
- CMAK as Kafka WebUI Manager
- CouchDB
- User Frontend Application
- Admin Frontend Application
- Kafka Consumer
- Backend API as Kafka Producer
- Spark Master
- Spark Worker
- Apache Zeppelin
Prerequisites:
- Node, Npm
- Make
- Pyenv, Pipenv
- Docker
Basically, run make setup to install all dependencies and create a Docker network:
$ make setup
Given a CouchDB instance exposed to localhost and using the default authentication config (refer to ./env/couchdb), add CORS to CouchDB so the AdminApp can sync with it:
$ make add_cors_couch
Alternatively, when CouchDB runs as a Docker service, CORS can be enabled from the GUI at http://localhost:5984/_utils/#_config/nonode@nohost. Note that the CouchDB cluster config might need some manual adjustment.
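For reference, enabling CORS comes down to a handful of PUT requests against CouchDB's configuration API. The Python sketch below shows roughly what the make target does; the admin credentials, node name, and allowed origin are assumptions based on a default single-node setup, so adjust them to match ./env/couchdb.

```python
import requests

COUCH = "http://localhost:5984"
AUTH = ("admin", "admin")   # assumption: default credentials from ./env/couchdb
NODE = "nonode@nohost"      # assumption: default single-node name

# Each CouchDB config entry is set with a PUT of a JSON-encoded string value.
cors_settings = {
    "chttpd/enable_cors": "true",
    "cors/origins": "http://localhost:3002",  # assumption: the AdminApp origin
    "cors/credentials": "true",
    "cors/methods": "GET, PUT, POST, HEAD, DELETE",
    "cors/headers": "accept, authorization, content-type, origin, referer",
}

for key, value in cors_settings.items():
    resp = requests.put(f"{COUCH}/_node/{NODE}/_config/{key}", json=value, auth=AUTH)
    resp.raise_for_status()
    print(key, "->", resp.text.strip())
```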
Use CMAK as the Kafka web UI manager at http://localhost:9000 to add the cluster and set up a proper partition assignment.
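If you would rather create the topic and its partitions from code instead of clicking through CMAK, a kafka-python admin sketch like the one below can do it. The broker address, topic name, partition count, and replication factor are assumptions for illustration; pick values that match how many brokers you run.

```python
from kafka.admin import KafkaAdminClient, NewTopic

# assumption: one broker is reachable from the host on localhost:9094
admin = KafkaAdminClient(bootstrap_servers="localhost:9094", client_id="topic-setup")

# assumption: topic name and sizing; replication_factor must not exceed the broker count
topic = NewTopic(name="numbers", num_partitions=3, replication_factor=2)
admin.create_topics(new_topics=[topic])
admin.close()
```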
- Run the whole system
Use the following command to fire the system up, optionally scaling the services - e.g. 2 (k)afka brokers and 3 (s)park workers.
NOTE: Each Spark executor requires one CPU core, so depending on your machine spec you can run more or fewer of them.
$ make up-scale k=2 s=3
- Submit the Spark application
If Zookeeper and the brokers are ready, deploy the Spark app using:
$ make submit_job
Once the job has been successfully deployed, go to http://localhost:8080 and find the newly running application.
If you want to make any change to the job, just modify the Scala code in ./spark_job and re-run the above command.
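The real job lives in ./spark_job and is written in Scala; purely for illustration, here is a roughly equivalent PySpark Structured Streaming sketch of a job that reads a stream of numbers from Kafka and aggregates them. The topic name, broker address, master URL, and the windowed-average logic are assumptions, not a description of the actual job.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import avg, col, window

# assumption: master URL and broker address as seen from inside the docker network;
# the Kafka source also needs the spark-sql-kafka package available at submit time
spark = (
    SparkSession.builder
    .appName("numbers-stream")
    .master("spark://spark-master:7077")
    .getOrCreate()
)

# Read the raw stream from Kafka (topic name "numbers" is an assumption)
raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "kafka:9092")
    .option("subscribe", "numbers")
    .load()
)

# Kafka values arrive as bytes: cast to int and average over 10-second windows
numbers = (
    raw.selectExpr("CAST(value AS STRING) AS value", "timestamp")
    .withColumn("value", col("value").cast("int"))
)
windowed_avg = numbers.groupBy(window(col("timestamp"), "10 seconds")).agg(avg("value").alias("avg"))

# The real job presumably pushes results towards CouchDB; print to the console here
query = windowed_avg.writeStream.outputMode("complete").format("console").start()
query.awaitTermination()
```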
- Start the frontend applications
Start the frontend apps (UserApp and AdminApp) in two separate terminals, using two commands:
$ make fe_user
$ make fe_admin
- Sending data to Kafka
Use the UserApp at http://localhost:3001 to send data continuously (a stream of numbers) to the Backend-Producer.
Alternatively, you can go to http://localhost:8000/docs and use Swagger to make API requests.
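To get a feel for what the Backend-Producer does with those requests, here is a minimal sketch of a FastAPI endpoint that forwards each number to Kafka via kafka-python. The /numbers route, topic name, and broker address are assumptions for illustration; the real routes are listed at http://localhost:8000/docs.

```python
from fastapi import FastAPI
from kafka import KafkaProducer

app = FastAPI(title="Backend Producer (sketch)")

# assumption: the broker is reachable from the API process on localhost:9094
producer = KafkaProducer(bootstrap_servers="localhost:9094")

@app.post("/numbers")  # hypothetical route; check /docs for the real one
def publish_number(value: int):
    # Forward the number to the "numbers" topic (topic name is an assumption)
    producer.send("numbers", str(value).encode("utf-8"))
    producer.flush()
    return {"published": value}
```

A request such as POST /numbers?value=42 would then land on the Kafka topic and flow on to the Spark job.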
- Watching changes from the frontend AdminApp
If the admin app is already running, go to http://localhost:3002 and watch the changes as messages are streamed into CouchDB.
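To verify the pipeline without the AdminApp, you can also tail CouchDB's continuous _changes feed directly. In the Python sketch below, the database name and credentials are assumptions; adjust them to match ./env/couchdb and whatever database the results are written to.

```python
import json
import requests

COUCH = "http://localhost:5984"
AUTH = ("admin", "admin")   # assumption: default credentials from ./env/couchdb
DB = "numbers"              # assumption: the database the results are synced into

# Stream the continuous _changes feed and print every change as it arrives
with requests.get(
    f"{COUCH}/{DB}/_changes",
    params={"feed": "continuous", "include_docs": "true", "heartbeat": "10000"},
    auth=AUTH,
    stream=True,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if line:  # heartbeats show up as empty lines
            print(json.dumps(json.loads(line), indent=2))
```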
- Enjoy hacking on your own :)
Considering what to add to complete the architecture:
- Add CouchDB
- Add API Client App
- Add Admin Client App with PouchDB for db real-time tracking
- Developing Producer Backend API
- Add Spark to consumer, connect to Kafka for streaming
- Scale Consumer
- Scale Kafka Broker and Spark Worker
- Use gRPC for backend-producer
- Store calculated data from Spark to CouchDB
- Apache Beam?
- Apache Avro
- KSQL?
- Deploy everything with Kubernetes
- Stress-testing




