The final project for the IBM Data Engineering Professional Certificate consisted of the following tasks:
Design a data platform that uses MySQL as an OLTP database
- Explain the architecture of a data platform.
- Design the schema for an OLTP database.
- Load data into the OLTP database.
- Query the data in the OLTP database using SQL.
- Automate database administration tasks.
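The OLTP tasks above (designing a schema, loading data, querying it, and automating a routine administration job) can be sketched in Python. This is a minimal sketch under stated assumptions. It uses Python's built-in `sqlite3` module as a stand-in for MySQL so it runs without a server. The `sales_data` table, its columns, and the `backup` helper are hypothetical illustrations, not the project's actual schema or scripts.

```python
import sqlite3

# Hypothetical OLTP table; sqlite3 stands in for MySQL, and the
# column names are illustrative, not the project's real schema.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sales_data (
    product_id  INTEGER NOT NULL,
    customer_id INTEGER NOT NULL,
    price       REAL    NOT NULL,
    quantity    INTEGER NOT NULL,
    sold_at     TEXT    NOT NULL
)
"""


def load_rows(conn, rows):
    """Bulk-insert transaction rows in a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO sales_data VALUES (?, ?, ?, ?, ?)", rows)


def total_quantity_by_product(conn):
    """Example SQL query over the OLTP table."""
    cur = conn.execute(
        "SELECT product_id, SUM(quantity) FROM sales_data "
        "GROUP BY product_id ORDER BY product_id"
    )
    return cur.fetchall()


def backup(conn, path):
    """Copy the live database to `path` (hypothetical admin job to schedule)."""
    dest = sqlite3.connect(path)
    try:
        conn.backup(dest)  # sqlite3.Connection.backup, Python 3.7+
    finally:
        dest.close()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    load_rows(conn, [
        (1, 10, 9.99, 2, "2020-01-01 10:00:00"),
        (2, 11, 4.50, 1, "2020-01-01 10:05:00"),
        (1, 12, 9.99, 3, "2020-01-02 09:30:00"),
    ])
    print(total_quantity_by_product(conn))  # [(1, 5), (2, 1)]
```

Against MySQL itself, the equivalent backup job would more typically be a scheduled `mysqldump` run. Here `Connection.backup` plays that role.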
Design a data platform that uses MongoDB as a NoSQL database
- Import data into a MongoDB database.
- Query data in a MongoDB database.
- Export data from MongoDB.
Design and implement a data warehouse, then generate reports from the data in the data warehouse.
- Design a data warehouse using the pgAdmin ERD design tool.
- Load data into the data warehouse.
- Write CUBE and ROLLUP aggregation queries.
- Create materialized views (MQTs, materialized query tables).
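`CUBE` and `ROLLUP` extend `GROUP BY` with subtotal rows. `ROLLUP (a, b)` returns totals for `(a, b)`, for `(a)`, and a grand total. `CUBE (a, b)` returns totals for every subset of the listed columns. In PostgreSQL these are written as `GROUP BY ROLLUP (year, country)` and `GROUP BY CUBE (year, country)`. The plain-Python sketch below only illustrates which groups each operator produces. The `year`, `country`, and `amount` fields are hypothetical, and `None` stands in for the SQL `NULL` that marks a rolled-up column.

```python
from collections import defaultdict
from itertools import combinations


def rollup(rows, keys, measure):
    """Emulate GROUP BY ROLLUP(keys...) with SUM(measure).

    For keys (k1, k2) this produces groups (k1, k2), (k1, None) and
    (None, None): each prefix of the key list, ending with the grand total.
    """
    totals = defaultdict(float)
    for row in rows:
        for n in range(len(keys), -1, -1):
            # Keep the first n grouping columns; roll up the rest to None.
            group = tuple(row[k] for k in keys[:n]) + (None,) * (len(keys) - n)
            totals[group] += row[measure]
    return dict(totals)


def cube(rows, keys, measure):
    """Emulate GROUP BY CUBE(keys...) with SUM(measure): every key subset."""
    totals = defaultdict(float)
    for row in rows:
        for n in range(len(keys) + 1):
            for subset in combinations(keys, n):
                # Columns outside the subset are rolled up to None.
                group = tuple(row[k] if k in subset else None for k in keys)
                totals[group] += row[measure]
    return dict(totals)
```

As in SQL, a genuine `None` in a grouping column would be indistinguishable from a subtotal marker. PostgreSQL provides the `GROUPING()` function to tell the two apart.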
Assume the role of a data engineer at an e-commerce company that has finished setting up a data warehouse. You are responsible for designing a reporting dashboard that reflects the key metrics of the business.
- Create a project in IBM Cloud Pak for Data / Watson Studio.
- Add a Cognos Dashboard Embedded (CDE) service to a project in IBM Cloud Pak for Data.
- Navigate around the CDE user interface.
- Upload external data files to a project in IBM Cloud Pak for Data.
- Create a Cognos data source that connects to tables in a data warehouse.
- Create a business intelligence dashboard using Cognos Analytics or CDE.
- Add several data visualizations and charts to the dashboard.
Use a Python script to perform ETL operations that move data from the RDBMS to the NoSQL database, from the NoSQL database to the RDBMS, and from both into the data warehouse. You will also write a pipeline that analyzes the web server log file, extracts the required lines and fields, transforms them, and loads the data.
- Implement ETL operations to insert new data into the data warehouse.
- Extract data from the OLTP database.
- Extract data from the NoSQL database.
- Extract data from a web server log file.
- Transform data from various data sources.
- Load transformed data into a data warehouse.
- Create, run, and manage data pipelines using Apache Airflow.
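The extract, transform, and load steps for the web server log can be sketched as three small functions. This is a minimal sketch, assuming the log uses the Apache Common Log Format. The filter rule (keep only 2xx responses), the `web_requests` table, and the use of `sqlite3` in place of the data warehouse are all hypothetical choices made for illustration.

```python
import re
import sqlite3
from datetime import datetime

# Assumed Apache Common Log Format, e.g.
# 198.51.100.7 - - [10/Oct/2020:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)$'
)


def extract(lines):
    """Keep only lines matching the assumed format and pull out their fields."""
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match:
            yield match.groupdict()


def transform(records):
    """Keep 2xx responses (hypothetical rule) and convert field types."""
    for r in records:
        status = int(r["status"])
        if 200 <= status < 300:
            ts = datetime.strptime(r["ts"], "%d/%b/%Y:%H:%M:%S %z")
            size = 0 if r["size"] == "-" else int(r["size"])
            yield (r["ip"], ts.isoformat(), r["method"], r["path"], status, size)


def load(conn, rows):
    """Insert transformed rows; sqlite3 stands in for the data warehouse."""
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS web_requests "
            "(ip TEXT, ts TEXT, method TEXT, path TEXT, status INTEGER, bytes INTEGER)"
        )
        conn.executemany("INSERT INTO web_requests VALUES (?, ?, ?, ?, ?, ?)", rows)
```

Keeping each stage as a separate function means each one could map onto its own task in an Apache Airflow pipeline.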
Use data from a web server to analyze search terms, then load a pretrained sales forecasting model and predict sales for a future year.
- Analyze search terms data from an e-commerce web server.
- Deploy a sales forecast Machine Learning (ML) model.
- Predict the sales for a future year by executing the ML model on an Apache Spark cluster.
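The project performs the search-term analysis on Apache Spark. The same aggregations (how often one term appears, and which terms are most frequent) can be sketched in plain Python to show the logic. The CSV layout and the `search_term` column name are assumptions, not the project's dataset.

```python
import csv
import io
from collections import Counter


def load_terms(csv_text, column="search_term"):
    """Read the (assumed) search-term column, normalized to lowercase."""
    reader = csv.DictReader(io.StringIO(csv_text))
    terms = []
    for row in reader:
        term = (row.get(column) or "").strip().lower()
        if term:  # skip blank entries
            terms.append(term)
    return terms


def term_count(terms, term):
    """Number of times a single search term appears (case-insensitive)."""
    return sum(1 for t in terms if t == term.lower())


def top_terms(terms, n=5):
    """The n most frequent terms; ties keep first-seen order."""
    return Counter(terms).most_common(n)
```

On a Spark cluster the same idea would be expressed as a group-and-count over a DataFrame column. This sketch only shows the shape of the computation.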