https://github.com/davew-msft/synapse
(this repo has branches. master is currently set to fb)
Synapse Performance Notes
Dockerized Spark Containers
- Synapse workspaces
- Synapse Spark
- Data Engineers/DBAs/Data Professionals
- Data Scientists
- App Developers
This is a tentative schedule; we may have to adjust given timelines, desires, etc.
- Introductions/Objectives/Level-Setting
- Synapse Navigation
  - What is Synapse?
  - Source control integration (do we need to set this up?)
- Overview and Basic Setup
  - what is a notebook? navigation, etc
  - Let's make sure you can connect to my sample data lake
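As a quick connectivity check, here is a minimal sketch assuming the sample lake is ADLS Gen2 and your AAD identity (or the workspace managed identity) has read access; the account, container, and file names below are placeholders, not the real sample lake:

# placeholders: swap in the sample lake's storage account, container, and path
account = "sampledatalakeaccount"
container = "datalake"
sample_path = f"abfss://{container}@{account}.dfs.core.windows.net/sample/data.parquet"

df = spark.read.parquet(sample_path)   # works when your identity has Storage Blob Data Reader on the account
df.printSchema()
df.show(5)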
- Data Lake organization
  - How is your data lake structured? Connecting to it from a notebook.
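One easy way to explore your lake's folder layout from a notebook is mssparkutils; a sketch, assuming the root path below points at your own lake (the account/container names are placeholders):

from notebookutils import mssparkutils   # available on Synapse Spark pools

# list the top-level zones/folders of your lake (hypothetical path)
lake_root = "abfss://datalake@yourstorageaccount.dfs.core.windows.net/"
for item in mssparkutils.fs.ls(lake_root):
    print(item.name, item.isDir, item.size)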
- Data Sandboxing/Data Engineering
  - Querying data with SQL and pySpark, data pipelining principles
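To give a flavor of both query styles, a small sketch (the parquet path and column names are made up):

# read some lake data into a dataframe (hypothetical path/columns)
trips = spark.read.parquet("abfss://datalake@yourstorageaccount.dfs.core.windows.net/raw/trips/")

# DataFrame API: filter + aggregate
(trips
 .filter(trips.passenger_count > 0)
 .groupBy("payment_type")
 .count()
 .show())

# register a temp view so the same data can be queried with SQL (or the %%sql cell magic)
trips.createOrReplaceTempView("trips")
spark.sql("SELECT payment_type, COUNT(*) AS cnt FROM trips GROUP BY payment_type").show()

# display(trips) renders an interactive table/chart in Synapse notebooks, handy for quick visualizations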
- Basic ETL
  - Extract, Transform, Load; Delta-formatted tables (see the Delta sketch after this list)
  - Will they have serverless SQL pools?
  - Can they hit Azure Open Datasets? And my data lake?
  - Should we use their data?
  - Let's make sure you can connect to my sample data lake
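A minimal Delta round trip, assuming the Delta Lake libraries that ship with Synapse Spark pools; the source and target paths are placeholders:

# batch source for the example (placeholder path)
df = spark.read.parquet("abfss://datalake@yourstorageaccount.dfs.core.windows.net/raw/trips/")

# write the dataframe to the lake in Delta format
delta_path = "abfss://datalake@yourstorageaccount.dfs.core.windows.net/bronze/trips_delta"
df.write.format("delta").mode("overwrite").save(delta_path)

# read it back; Delta adds ACID semantics and time travel on top of parquet files
trips_delta = spark.read.format("delta").load(delta_path)
trips_delta.show(5)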
- Data Sandboxing/Data Engineering
- exploration-sparkStarter.ipynb
- Querying data with SQL and pySpark, data pipelining principles
- cell magics
- Basic Aggregations and Visualizations
- we'll do this as an independent lab
- now let's save it to YOUR data lake
- mounts ... fb/mounts.ipynb in workspace. This is conversational. (a mount sketch follows the code below)
- exploration-sparkStarter.ipynb
whoami = 'davew'
# placeholder paths: point these at your own folder in the data lake
parquet_path = something + '/something.parquet'
csv_path = something + '/something.csv'
# write the dataframe out as parquet and as csv
df.write.parquet(parquet_path, mode='overwrite')
df.write.csv(csv_path, mode='overwrite', header='true')
# read the parquet back to verify the write
df_parquet = spark.read.parquet(parquet_path)
df_parquet.show(10)
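Related to fb/mounts.ipynb, a hedged sketch of mounting a lake folder with mssparkutils; the linked service name, storage account, and mount point are assumptions:

from notebookutils import mssparkutils

# mount a container from your lake via an existing linked service (names are placeholders)
mssparkutils.fs.mount(
    "abfss://datalake@yourstorageaccount.dfs.core.windows.net",
    "/mydata",
    {"linkedService": "YourADLSLinkedService"}
)

# mounted paths are addressed through the synfs scheme, scoped to the current job
job_id = mssparkutils.env.getJobId()
mssparkutils.fs.ls(f"synfs:/{job_id}/mydata")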
- Tables
  - what is a table and how is it materialized in the lake?
  - what is a managed vs. an unmanaged table? (see the sketch below)
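A sketch of the managed vs. unmanaged distinction, assuming a database named demo in the workspace metastore and placeholder lake paths:

# source data for the example (placeholder path)
df = spark.read.parquet("abfss://datalake@yourstorageaccount.dfs.core.windows.net/raw/trips/")

spark.sql("CREATE DATABASE IF NOT EXISTS demo")

# managed table: Spark owns both metadata and data files; dropping the table deletes the files
df.write.mode("overwrite").saveAsTable("demo.trips_managed")

# unmanaged (external) table: metadata points at a lake path you control;
# dropping the table leaves the files in place
external_path = "abfss://datalake@yourstorageaccount.dfs.core.windows.net/curated/trips"
df.write.mode("overwrite").option("path", external_path).saveAsTable("demo.trips_external")

spark.sql("DESCRIBE EXTENDED demo.trips_external").show(truncate=False)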
- Using requirements.txt and shared libraries
  - %run wasn't working for them???
  - Excel (see the sketch below)
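A hedged sketch of reading an Excel file via pandas and converting it to a Spark dataframe; it assumes openpyxl has been added to the pool (e.g. through requirements.txt or workspace packages) and that the file path is reachable from the driver:

import pandas as pd

# openpyxl would be a line in the pool's requirements.txt, e.g.: openpyxl==3.1.2
excel_path = "/tmp/sample.xlsx"   # hypothetical local path; copy the file from the lake first if needed
pdf = pd.read_excel(excel_path, sheet_name=0, engine="openpyxl")

# promote to a Spark dataframe so it can join the rest of the pipeline
df_excel = spark.createDataFrame(pdf)
df_excel.show(5)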
- variables and notebook pipelining
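One way to pass variables between notebooks is mssparkutils notebook orchestration; a sketch with made-up notebook names and parameters:

from notebookutils import mssparkutils

# call a child notebook, passing parameters and waiting up to 600 seconds (names are placeholders)
result = mssparkutils.notebook.run("child_load_notebook", 600, {"run_date": "2021-06-01", "zone": "bronze"})
print(result)

# inside the child notebook, a parameters cell would declare run_date/zone,
# and the notebook can hand a value back to the caller with:
# mssparkutils.notebook.exit("rows_loaded=12345")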
- Streaming Data
  - Real-time streaming data pipelines with Spark Structured Streaming. Take a batch process and make it stream. We start with the data already in Bronze.
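A sketch of turning the batch Bronze-to-Silver step into a stream, assuming the Bronze data is Delta-formatted; the paths and the filter are placeholders:

# read the Bronze folder as a stream instead of a batch dataframe
bronze_path = "abfss://datalake@yourstorageaccount.dfs.core.windows.net/bronze/trips_delta"
silver_path = "abfss://datalake@yourstorageaccount.dfs.core.windows.net/silver/trips_delta"
checkpoint  = "abfss://datalake@yourstorageaccount.dfs.core.windows.net/checkpoints/trips_silver"

bronze_stream = spark.readStream.format("delta").load(bronze_path)

# the same transformation as the batch job, applied incrementally
silver_stream = bronze_stream.filter("passenger_count > 0")

query = (silver_stream.writeStream
         .format("delta")
         .outputMode("append")
         .option("checkpointLocation", checkpoint)
         .start(silver_path))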
- Streaming data from Kafka/Event Hubs
  - Streaming data using Event Hubs and Kafka.
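A hedged sketch of reading from Event Hubs through its Kafka-compatible endpoint with Spark's Kafka source; the namespace, hub name, and connection string are placeholders, and the Kafka connector must be available on the pool:

# Event Hubs exposes a Kafka endpoint at <namespace>.servicebus.windows.net:9093 (placeholders below)
bootstrap = "yournamespace.servicebus.windows.net:9093"
topic = "yourhub"
conn_str = "Endpoint=sb://yournamespace.servicebus.windows.net/;SharedAccessKeyName=...;SharedAccessKey=..."

jaas = (
    'org.apache.kafka.common.security.plain.PlainLoginModule required '
    f'username="$ConnectionString" password="{conn_str}";'
)

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", bootstrap)
       .option("subscribe", topic)
       .option("kafka.security.protocol", "SASL_SSL")
       .option("kafka.sasl.mechanism", "PLAIN")
       .option("kafka.sasl.jaas.config", jaas)
       .load())

# Kafka messages arrive as binary; cast the payload to a string for downstream parsing
events = raw.selectExpr("CAST(value AS STRING) AS body", "timestamp")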
- Continuation of topics from Day 1 & 2
  - SHIR (self-hosted integration runtime) and Synapse pipelines to reach back on-prem
  - Streaming data from Kafka/Event Hubs
    - Streaming data using Event Hubs and Kafka.
- Orchestrating and Administering Jobs
  - Streaming and Batch orchestration with Jupyter Notebooks and ADF
  - what is the overhead of doing that? (pipelining)
- Data Science with Spark (YES) *
- Performance Tuning and administration (they are their own DBAs)
  - Hyperspace (see the sketch after this list)
  - What are the common performance issues we see and what are the patterns to fix them?
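For Hyperspace, a hedged sketch based on the Python API shipped with Synapse Spark pools; it assumes the hyperspace package is present on the pool, and the data path and column names are made up:

from hyperspace import Hyperspace, IndexConfig

hs = Hyperspace(spark)

# build a covering index on the join/filter column, including the columns the query selects
df = spark.read.parquet("abfss://datalake@yourstorageaccount.dfs.core.windows.net/curated/trips")  # placeholder
hs.createIndex(df, IndexConfig("trips_by_vendor", ["vendor_id"], ["fare_amount"]))

# list the indexes Hyperspace knows about
hs.indexes().show()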
- Business Scenarios/Problem Solving
  - Tweet Analysis?
  - Cognitive Mistakes?
  - Social Media Analytics
  - Data Quality (automation and pipelining): do we have a set of notebooks/unit testing? (see the sketch after this list)
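As a starting point for data-quality unit tests in a notebook, a small sketch with invented expectations; failing checks raise so a pipeline run surfaces the problem:

from pyspark.sql import functions as F

def check_not_null(df, column):
    """Fail the notebook/pipeline run if the column contains nulls."""
    null_count = df.filter(F.col(column).isNull()).count()
    assert null_count == 0, f"data quality check failed: {null_count} null values in {column}"

def check_row_count(df, minimum):
    """Guard against empty or truncated loads."""
    rows = df.count()
    assert rows >= minimum, f"data quality check failed: only {rows} rows, expected at least {minimum}"

# hypothetical usage against a curated table (placeholder path/columns)
trips = spark.read.format("delta").load("abfss://datalake@yourstorageaccount.dfs.core.windows.net/silver/trips_delta")
check_not_null(trips, "trip_id")
check_row_count(trips, minimum=1000)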
- Lab 400: Consuming a Model in Synapse TODO
- Lab 410: Using Cognitive Search with Synapse TODO
- Lab 420: Basic ML lifecycle using Spark and Synapse Dedicated Pools
- Lab 421: Train an AutoML model against an existing Spark dataset
  - this requires you to complete Lab 420.
- Lab 422: Use an existing model to run batch inference against Synapse Dedicated Pool data
  - this requires you to complete Lab 420 and Lab 421
You should probably delete the resource group we created today to control costs.
If you'd rather keep the resources for future reference you can simply PAUSE the dedicated SQL Pool and the charges should be minimal.
- templates folder has a bunch of my patterns that you may be able to leverage