Introduction to Apache Spark

IBM Proof of Technology

Spark is a fast and general cluster computing system for Big Data. It provides high-level APIs in Scala, Java, and Python, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming for stream processing. http://spark.apache.org/

Hands-on Labs

IBM created these basic labs to help people learn Apache Spark.

To perform these labs, sign up for IBM's free Data Science platform at http://datascience.ibm.com/

Labs:
Lab 1 - Intro - Use SparkContext to create basic RDDs
Lab 2 - SQL - Use SQLContext and write SQL to perform basic transformations
Lab 3 - Machine Learning - Use Spark ML to create a machine learning model