Introduction to Apache Spark

IBM Proof of Technology

Spark is a fast and general cluster computing system for Big Data. It provides high-level APIs in Scala, Java, and Python, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming for stream processing. http://spark.apache.org/

Hands-on Labs

IBM created basic labs to help people learn Apache Spark.

To run these labs, use IBM's free Data Science platform at http://datascience.ibm.com/

Labs:
Lab 1 - Intro - Use Spark Context, Create basic RDDs
Lab 2 - SQL - Use SQL Context, Write SQL to perform basic transformations
Lab 3 - Machine Learning - Use Spark ML, Create a Machine Learning Model
