IBM DATA ENGINEERING CAPSTONE PROJECT

The final project for the IBM Data Engineering Professional Certificate consisted of the following tasks:

1. Data Platform Architecture & OLTP Database

Design a data platform that uses MySQL as an OLTP database (a short example sketch follows the objectives below).

Objectives

  • Explain the architecture of a data platform.
  • Design the schema for an OLTP database.
  • Load data into the OLTP database.
  • Query the data in the OLTP database using SQL.
  • Automate database administration tasks.
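
For illustration, here is a minimal sketch of the OLTP workflow using the mysql-connector-python driver. The `sales_data` table, its columns, the sample rows, and the connection credentials are assumptions for this sketch, not the course's exact schema.

```python
# Minimal OLTP sketch: create a table, load rows, and run a query.
# Assumes a local MySQL server and the mysql-connector-python package;
# the database, table, and column names below are illustrative only.
import mysql.connector

conn = mysql.connector.connect(
    host="localhost", user="root", password="password", database="sales"
)
cur = conn.cursor()

# Schema design: a simple transactional table for individual sales records.
cur.execute("""
    CREATE TABLE IF NOT EXISTS sales_data (
        product_id INT,
        customer_id INT,
        price DECIMAL(10, 2),
        quantity INT,
        timestamp DATETIME,
        PRIMARY KEY (product_id, customer_id, timestamp)
    )
""")

# Load: insert a batch of rows (in practice these would come from a CSV file).
rows = [
    (101, 1, 19.99, 2, "2021-06-01 10:15:00"),
    (102, 2, 5.49, 1, "2021-06-01 10:17:00"),
]
cur.executemany(
    "INSERT INTO sales_data (product_id, customer_id, price, quantity, timestamp) "
    "VALUES (%s, %s, %s, %s, %s)",
    rows,
)
conn.commit()

# Query: total quantity sold per product.
cur.execute("SELECT product_id, SUM(quantity) FROM sales_data GROUP BY product_id")
for product_id, total_qty in cur.fetchall():
    print(product_id, total_qty)

cur.close()
conn.close()
```

The last objective, automating administration tasks, typically comes down to scheduling a backup command such as mysqldump with cron.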

2. Querying Data In NoSQL Database

Design a data platform that uses MongoDB as a NoSQL database (an example pymongo sketch follows the objectives below).

Objectives

  • Import data into a MongoDB database.
  • Query data in a MongoDB database.
  • Export data from MongoDB.
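
A minimal pymongo sketch of the same import / query / export cycle; the `catalog` database, `electronics` collection, and document fields are assumptions, and the CLI equivalents for bulk import and export would be mongoimport and mongoexport.

```python
# Minimal NoSQL sketch using pymongo: import documents, query them,
# and export a subset to JSON. Database, collection, and field names
# are assumptions for illustration.
import json
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
collection = client["catalog"]["electronics"]

# Import: insert a few documents (mongoimport would be used for a large file).
collection.insert_many([
    {"type": "laptop", "model": "A", "price": 950},
    {"type": "laptop", "model": "B", "price": 1200},
    {"type": "charger", "model": "C", "price": 30},
])

# Query: count laptops and find the cheapest one.
print(collection.count_documents({"type": "laptop"}))
for doc in collection.find({"type": "laptop"}).sort("price", 1).limit(1):
    print(doc["model"], doc["price"])

# Export: write selected fields to a JSON file (mongoexport is the CLI equivalent).
docs = list(collection.find({}, {"_id": 0, "type": 1, "model": 1, "price": 1}))
with open("electronics.json", "w") as f:
    json.dump(docs, f, indent=2)
```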

3. Build A Data Warehouse

Design and implement a data warehouse, then generate reports from the data in the data warehouse (a sample query sketch follows the objectives below).

Objectives

  • Design a Data Warehouse using the pgAdmin ERD design tool.
  • Load data into a Data Warehouse.
  • Write cube and rollup aggregation queries.
  • Create Materialized Views (MQTs).
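
A minimal sketch of the warehouse queries using psycopg2 against PostgreSQL. The `factsales` table and its columns are assumed, and a denormalized fact table is used here for brevity rather than joining dimension tables.

```python
# Minimal warehouse sketch using psycopg2: rollup/cube aggregations and a
# materialized view over an assumed fact table. Names are illustrative only.
import psycopg2

conn = psycopg2.connect(
    host="localhost", dbname="warehouse", user="postgres", password="password"
)
cur = conn.cursor()

# Rollup: hierarchical subtotals by country, then by country and year.
cur.execute("""
    SELECT country, year, SUM(amount) AS total_sales
    FROM factsales
    GROUP BY ROLLUP (country, year)
    ORDER BY country, year
""")
print(cur.fetchall())

# Cube: subtotals for every combination of country and year.
cur.execute("""
    SELECT country, year, AVG(amount) AS avg_sales
    FROM factsales
    GROUP BY CUBE (country, year)
""")
print(cur.fetchall())

# Materialized view (the MQT equivalent in PostgreSQL): precomputed totals.
cur.execute("""
    CREATE MATERIALIZED VIEW IF NOT EXISTS total_sales_per_country AS
    SELECT country, SUM(amount) AS total_sales
    FROM factsales
    GROUP BY country
""")
cur.execute("REFRESH MATERIALIZED VIEW total_sales_per_country")
conn.commit()

cur.close()
conn.close()
```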

4. Data Analytics

Assume the role of a data engineer at an e-commerce company. Your company has finished setting up a data warehouse, and you are now responsible for designing a reporting dashboard that reflects the key metrics of the business (a brief data-preparation sketch follows the objectives below).

Objectives

  • Create a project in IBM Cloud Pak for Data / Watson Studio.
  • Add a Cognos Dashboard Embedded (CDE) service to a project in IBM Cloud Pak for Data.
  • Navigate around the CDE user interface.
  • Upload external data files to a project in IBM Cloud Pak for Data.
  • Create a Cognos data source that connects to tables in a data warehouse.
  • Create a business intelligence dashboard using Cognos Analytics or CDE.
  • Add several data visualizations and charts to the dashboard.
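
The dashboard itself is assembled in the Cognos / Cloud Pak for Data user interface, so there is little to script; the sketch below only covers the data-preparation side, exporting assumed warehouse tables to CSV files that can be uploaded to the project as external data files. Table and column names are illustrative.

```python
# Data-preparation sketch: pull assumed warehouse tables into CSV files
# that can be uploaded to the project as external data files.
import pandas as pd
import psycopg2

conn = psycopg2.connect(
    host="localhost", dbname="warehouse", user="postgres", password="password"
)

# One CSV per chart on the dashboard (assumed metrics).
monthly = pd.read_sql(
    "SELECT year, month, SUM(amount) AS total_sales "
    "FROM factsales GROUP BY year, month ORDER BY year, month",
    conn,
)
monthly.to_csv("monthly_sales.csv", index=False)

by_country = pd.read_sql(
    "SELECT country, SUM(amount) AS total_sales FROM factsales GROUP BY country",
    conn,
)
by_country.to_csv("sales_by_country.csv", index=False)

conn.close()
```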

5. ETL & Data Pipeline

Use a Python script to perform ETL operations that move data from the RDBMS to the NoSQL database, from the NoSQL database to the RDBMS, and from both the RDBMS and the NoSQL database into the data warehouse. You will also write a pipeline that analyzes a web server log file, extracts the required lines and fields, and transforms and loads the data; a minimal Airflow sketch of this pipeline follows the objectives below.

Objectives

  • Implement ETL operations to insert new data into the data warehouse.
  • Extract data from the OLTP database.
  • Extract data from the NoSQL database.
  • Extract data from a web server log file.
  • Transform data from various data sources.
  • Load transformed data into a data warehouse.
  • Create, run, and manage data pipelines using Apache Airflow.
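
A minimal Apache Airflow sketch of the log pipeline, with extract and transform steps as Python tasks. The log path, field positions, filter keyword, and file names are assumptions; a downstream task would load the transformed file into the warehouse.

```python
# Minimal Airflow sketch for the web log pipeline: extract the needed fields
# from an assumed access log, filter the lines, and write a file for loading.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

LOG_FILE = "/tmp/accesslog.txt"
EXTRACTED = "/tmp/extracted.txt"
TRANSFORMED = "/tmp/transformed.txt"


def extract():
    # Keep only the client IP (field 1) and the requested URL (field 7).
    with open(LOG_FILE) as src, open(EXTRACTED, "w") as dst:
        for line in src:
            parts = line.split()
            if len(parts) >= 7:
                dst.write(f"{parts[0]} {parts[6]}\n")


def transform():
    # Drop lines that do not reference the assumed target resource.
    with open(EXTRACTED) as src, open(TRANSFORMED, "w") as dst:
        for line in src:
            if "html" in line:
                dst.write(line)


default_args = {
    "owner": "data-engineer",
    "start_date": datetime(2021, 1, 1),
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="process_web_log",
    default_args=default_args,
    schedule_interval=timedelta(days=1),
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    extract_task >> transform_task
```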

6. Big Data Analytics With Apache Spark

Use data from a web server to analyze search terms. You will then load a pretrained sales forecasting model and predict the sales for a future year; a PySpark sketch follows the objectives below.

Objectives

  • Analyze search terms data from an e-commerce web server.
  • Deploy a sales forecast Machine Learning (ML) model.
  • Predict the sales for a future year by executing the ML model on an Apache Spark cluster.
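
A minimal PySpark sketch covering both steps: counting search terms and running an assumed pretrained linear-regression forecast model. The CSV path, column names, model type, and model path are all assumptions for this sketch.

```python
# Minimal PySpark sketch: analyze search-term data and run an assumed
# pretrained sales-forecast model for a future year.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegressionModel

spark = SparkSession.builder.appName("search_term_analysis").getOrCreate()

# Analyze: most frequent search terms from the web server export.
terms = spark.read.csv("searchterms.csv", header=True, inferSchema=True)
terms.groupBy("searchterm").count().orderBy("count", ascending=False).show(10)

# Forecast: load the assumed pretrained model and predict sales for a future year.
model = LinearRegressionModel.load("sales_prediction_model")
future = spark.createDataFrame([(2023,)], ["year"])
features = VectorAssembler(inputCols=["year"], outputCol="features").transform(future)
model.transform(features).select("year", "prediction").show()

spark.stop()
```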
