This repository has been archived by the owner on Jul 22, 2024. It is now read-only.
Home
Rich Hagarty edited this page Jan 11, 2018 · 9 revisions
Clickstream analysis using Apache Spark and Apache Kafka
This journey uses Apache Spark and Apache Kafka to show how to detect trending topics on the Wikipedia web site in real time. Apache Kafka acts as the message queue, and the Apache Spark Structured Streaming engine performs the analytics.
Cognitive and Data Analytics
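At its core, the trending-topic analysis is a grouped sum: for each destination article, add up the click counts, then rank the articles. The sketch below shows that logic in plain Python, without Spark or Kafka. It assumes a four-column, tab-separated record layout (`prev`, `curr`, `type`, `n`) like the one used in the public Wikipedia clickstream dumps; the function names and sample rows are hypothetical and only for illustration. In Spark, the same step could be written as a `groupBy` on the destination column followed by a sum and a descending sort.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def parse_line(line: str) -> Tuple[str, str, str, int]:
    """Split one tab-separated clickstream record into (prev, curr, type, n).

    Assumes the layout prev<TAB>curr<TAB>type<TAB>n (an assumption, see above).
    """
    prev, curr, link_type, n = line.rstrip("\n").split("\t")
    return prev, curr, link_type, int(n)


def top_articles(lines: Iterable[str], k: int = 3) -> List[Tuple[str, int]]:
    """Sum click counts per destination article and return the k largest totals."""
    totals = Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        _prev, curr, _type, n = parse_line(line)
        totals[curr] += n
    return totals.most_common(k)


# Hypothetical sample records, not real clickstream data.
sample = [
    "other-search\tApache_Spark\texternal\t120",
    "Apache_Kafka\tApache_Spark\tlink\t30",
    "other-search\tApache_Kafka\texternal\t90",
    "other-empty\tJupyter\texternal\t40",
]

print(top_articles(sample, k=2))  # [('Apache_Spark', 150), ('Apache_Kafka', 90)]
```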
When the reader has completed this journey, they will understand how to:
- Use Jupyter Notebooks to load, visualize, and analyze data
- Run Notebooks in IBM Data Science Experience
- Perform clickstream analysis using Apache Spark Structured Streaming
- Build a low-latency processing stream using Apache Kafka
The journey follows these steps:
- The user connects to the Apache Kafka service and sets up a running clickstream instance.
- The user runs a Jupyter Notebook in IBM Data Science Experience that interacts with the underlying Apache Spark service. Alternatively, the same steps can be run locally in the Spark Shell.
- The Spark service reads and processes data from the Kafka service.
- The processed Kafka data is relayed back to the user through the Jupyter Notebook (or through the console sink when running locally).
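The last two steps above amount to a streaming aggregation: Spark repeatedly pulls new records from the Kafka topic, updates running totals, and emits the current ranking to a sink. The plain-Python sketch below imitates that loop, with an in-memory list standing in for a Kafka topic and `print` standing in for the console sink. It mirrors the idea of Structured Streaming's "complete" output mode, in which the whole updated result table is emitted after each trigger, but it uses none of the Kafka or Spark APIs; the names `MicroBatchAggregator`, `run`, and `batch_size` are hypothetical, and the record layout is the same assumed `prev`, `curr`, `type`, `n` format.

```python
from collections import Counter
from typing import List, Sequence, Tuple


class MicroBatchAggregator:
    """Hypothetical helper that keeps running click totals across micro-batches."""

    def __init__(self) -> None:
        self.totals = Counter()
        self.offset = 0  # index of the next unread record, similar to a consumer offset

    def poll(self, topic: Sequence[str], batch_size: int) -> List[str]:
        """Return up to batch_size unread records and advance the offset."""
        batch = list(topic[self.offset:self.offset + batch_size])
        self.offset += len(batch)
        return batch

    def process(self, batch: List[str]) -> None:
        """Add each record's click count to the total for its destination article."""
        for line in batch:
            _prev, curr, _type, n = line.split("\t")
            self.totals[curr] += int(n)

    def result(self, k: int) -> List[Tuple[str, int]]:
        """Return the full ranked table, as a complete-mode sink would emit it."""
        return self.totals.most_common(k)


def run(topic: Sequence[str], batch_size: int = 2, k: int = 3) -> List[List[Tuple[str, int]]]:
    agg = MicroBatchAggregator()
    outputs = []
    while True:
        batch = agg.poll(topic, batch_size)
        if not batch:
            break  # no new records; a real stream would wait for the next trigger instead
        agg.process(batch)
        snapshot = agg.result(k)
        print(f"batch ending at offset {agg.offset}: {snapshot}")
        outputs.append(snapshot)
    return outputs


# Hypothetical records; note how the top article changes between micro-batches.
topic = [
    "other-search\tApache_Spark\texternal\t120",
    "other-search\tApache_Kafka\texternal\t90",
    "Apache_Kafka\tApache_Spark\tlink\t30",
    "other-empty\tApache_Kafka\texternal\t100",
]
run(topic)
```

In an actual Structured Streaming job, the engine tracks Kafka offsets itself (in its checkpoint), and the query keeps running on each trigger rather than stopping when no new data arrives.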
The journey uses the following components:
- IBM Data Science Experience: Analyze data using RStudio, Jupyter, and Python in a configured, collaborative environment that includes IBM value-adds, such as managed Spark.
- Apache Spark: An open-source distributed computing framework that allows you to perform large-scale data processing.
- Apache Kafka: Used for building real-time data pipelines and streaming apps. It is designed to be horizontally scalable, fault-tolerant, and fast.
- Jupyter Notebook: An open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and explanatory text.
- Message Hub: A scalable, high-throughput message bus. Wire microservices together using open protocols.