Project Prerequisites
Brigitta Sipőcz edited this page Feb 26, 2020 · 3 revisions
This project uses two pieces of software: Jupyter and Apache Spark. The steps below will help you get familiar with both and give you a taste of what the development process will look like in this project.
- Download and install Jupyter on your system.
- Download and install Apache Spark on your system and ensure you can interface with it from within Jupyter. You can build from the source code, download a pre-built binary to your system, or install `pyspark` using the Python package managers `conda` or `pip`.
- Within a Jupyter notebook, import `pyspark` and compute the value of Pi. This is a canonical example of parallel computation in Spark, and you should be able to find many examples online.
- Follow the JupyterLab documentation (or other examples / documentation) to learn how to build your own JupyterLab Extension.
- Alter your extension to show a preview of the Spark UI. (Hint: if you are running locally, the UI can usually be accessed at `localhost:4040`.)
- (Bonus) Multiple Spark clusters can be created simultaneously on one machine. If the default port of `4040` is already in use, the Spark driver will attempt to bind to sequential ports following the default, e.g. `4041`, `4042`, etc. Make it possible to interact with your widget to accommodate this behavior. For example, you can create an input text field that changes which Spark UI is being shown.
- (Bonus) Building on the previous step, create a widget that detects all Spark clusters running on your system and exposes a selector (e.g. a dropdown menu) to choose between these. The selection doesn't need to do anything; you simply need to give the user a way to see which Spark clusters are available to them. Feel free to come up with any solution you desire, including working server-side with Python. Feel free to also submit only a description of how you might solve this without actually implementing your solution.
- Create a pull request from your fork to merge your work into this repository on a new branch named `<your-name>-gsoc-prereqs` so that we can review your submission.
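As a starting point for the Pi step above, here is a minimal sketch of one common approach: a Monte Carlo estimate that samples random points in the unit square and counts how many land inside the quarter circle. It assumes `pyspark` and a Java runtime are installed and that Spark runs in local mode. The names `inside_unit_circle` and `estimate_pi`, the application name, and the sample count are illustrative choices, not part of the project.

```python
import random
from operator import add


def inside_unit_circle(_):
    """Draw one random point in the unit square; return 1 if it falls in the quarter circle."""
    x, y = random.random(), random.random()
    return 1 if x * x + y * y <= 1.0 else 0


def estimate_pi(num_samples=1_000_000, partitions=4):
    """Estimate Pi by distributing the sampling across a local Spark session."""
    # Imported here so the sampling helper can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[*]")        # assumption: run locally on all cores
        .appName("pi-estimate")    # hypothetical application name
        .getOrCreate()
    )
    try:
        hits = (
            spark.sparkContext
            .parallelize(range(num_samples), partitions)
            .map(inside_unit_circle)
            .reduce(add)
        )
    finally:
        spark.stop()
    # The quarter circle covers pi/4 of the unit square.
    return 4.0 * hits / num_samples
```

In a notebook cell, calling `estimate_pi()` should return a value close to 3.14; the result varies from run to run because the samples are random.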
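For the bonus steps on multiple Spark UIs, one possible server-side approach is to probe the sequential ports starting at `4040` and ask each one for its application list through Spark's monitoring REST endpoint, `/api/v1/applications`. The sketch below uses only the Python standard library. The function names, the number of ports scanned, and the timeout are assumptions made for illustration, and other detection strategies may suit your extension better.

```python
import http.client
import json
import urllib.request


def app_names_from_json(text):
    """Extract application names from the JSON body of /api/v1/applications."""
    apps = json.loads(text)
    if not isinstance(apps, list):
        return []
    return [app.get("name", "unknown") for app in apps if isinstance(app, dict)]


def find_spark_uis(host="localhost", start_port=4040, max_ports=10, timeout=0.5):
    """Return (port, app_name) pairs for ports that answer like a Spark UI."""
    # Ignore system proxy settings, since only local ports are probed.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    found = []
    for port in range(start_port, start_port + max_ports):
        url = f"http://{host}:{port}/api/v1/applications"
        try:
            with opener.open(url, timeout=timeout) as response:
                names = app_names_from_json(response.read().decode("utf-8"))
        except (OSError, ValueError, http.client.HTTPException):
            # Nothing listening, the request timed out, or the reply was not valid JSON.
            continue
        found.extend((port, name) for name in names)
    return found
```

The returned list could feed a dropdown menu directly, for example by labelling each entry with its port and application name. Scanning a fixed window of ports is a simplification; a Spark driver configured with a non-default `spark.ui.port` would be missed.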