This project draws inspiration from the YouTube video by Mr. K Talks Tech, titled "Build Your End-to-End Project in Just 3 Hours: Master Azure Data Engineering | Bing Data Analytics". Full credit goes to them for the original content and inspiration.
This project provides an end-to-end solution for gathering and analyzing daily news content using Microsoft Fabric's SaaS-based data engineering platform. It uses the Bing Web Search API to collect recent news articles, performs sentiment analysis on them, and presents the insights via an interactive Power BI dashboard.
The project is built on Microsoft Fabric. The primary components of the pipeline are:
- Data Ingestion: Data Factory
- Storage: Lakehouse (both raw files and tables)
- Processing and orchestration: Jupyter Notebooks (PySpark)
- Visualization: Power BI
- Alerts and notifications: Data Activator

Each component plays a vital role in the seamless flow of data collection, processing, and visualization. Automated alerts ensure timely access to new data, allowing for up-to-date reviews of news sentiment and trends.
The pipeline includes the following primary tasks:
- Copy Latest News: Use the Bing Web Search API to retrieve "latest news" and store the results as Bing_latest_news.JSON in the lakehouse.
- Data Transformation: Convert Bing_latest_news.JSON into a structured Delta table format.
- Sentiment Analysis: Apply a pre-trained machine learning model to determine whether each article's description has a positive, negative, or neutral sentiment.
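The actual sentiment analysis step uses a pre-trained model inside a Fabric notebook. Purely as an illustration of the input/output shape of that step (a description string in, a positive/negative/neutral label out), here is a naive rule-based stand-in; the function name and word lists are hypothetical and far too small for real use:

```python
# Illustrative stand-in for the sentiment step. The real notebook applies a
# pre-trained ML model; this toy version only shows the shape of the logic.
POSITIVE_WORDS = {"win", "growth", "success", "record", "breakthrough"}
NEGATIVE_WORDS = {"crash", "loss", "crisis", "decline", "fraud"}

def label_sentiment(description: str) -> str:
    """Naive keyword count: more positive hits -> positive, etc."""
    words = description.lower().split()
    pos = sum(w in POSITIVE_WORDS for w in words)
    neg = sum(w in NEGATIVE_WORDS for w in words)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"

print(label_sentiment("Record growth reported this quarter"))  # positive
print(label_sentiment("Markets crash amid banking crisis"))    # negative
```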
- Create a Bing resource (the standard tier is free, and it'll be enough for this project)
- Set up Microsoft Fabric (60-day free trial)
- Create a new workspace `BingNewsDashboard`
- Go to Data Engineering
- Create a Lakehouse `bing_lake_db`
- Go to Data Science
- Create a new notebook (`Process raw JSON file`):
  - Add the Lakehouse (`bing_lake_db`) as a source
  - Write the notebook, changing all items marked with `#TODO` (code available in the `./notebooks` subdir in this repo)
- Create a new notebook (`Process Sentiment Analysis`):
  - Add the Lakehouse (`bing_lake_db`) as a source
  - Write the notebook, changing all items marked with `#TODO` (code available in the `./notebooks` subdir in this repo)
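The `Process raw JSON file` notebook flattens the raw Bing response into table rows before writing a Delta table. As a hedged sketch of that transformation (the real notebook uses PySpark and, e.g., `explode()`; the sample fields below follow the Bing News Search v7 response shape), the core logic in plain Python looks like this:

```python
import json

# Minimal sample in the shape of a Bing News Search v7 response; the real
# file (Bing_latest_news.JSON) is written to the lakehouse by the pipeline.
raw = json.dumps({
    "value": [
        {
            "name": "Example headline",
            "description": "Example description of the article.",
            "url": "https://example.com/article",
            "provider": [{"name": "Example News"}],
            "datePublished": "2024-01-15T06:00:00Z",
        }
    ]
})

def flatten_news(raw_json: str) -> list[dict]:
    """Flatten the nested 'value' array into flat rows, one per article."""
    articles = json.loads(raw_json).get("value", [])
    return [
        {
            "title": a.get("name"),
            "description": a.get("description"),
            "url": a.get("url"),
            "provider": (a.get("provider") or [{}])[0].get("name"),
            "datePublished": a.get("datePublished"),
        }
        for a in articles
    ]

rows = flatten_news(raw)
print(rows[0]["title"])  # Example headline
```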
Go to Data Engineering and create a new Data Pipeline named `bing news data ingestion`.
Before creating tasks, click on an empty area of the canvas and create a parameter named `search_term` with a default value of `latest news`.
Then, create the following tasks in the pipeline:
- Copy Data (`Copy Latest News`)
  - Source:
    - Connection: New >> REST
    - Relative URL: `?q=@{pipeline().parameters.search_term}&count=100&mkt=en-US`, where:
      - `q=@{pipeline().parameters.search_term}`: the search query; it uses the `search_term` parameter ("latest news") created in the previous step
      - `count=100`: increases the number of returned articles from the default of 10 to 100
      - `mkt=en-US`: limits news to the US market
      - (you can play with different parameters using the News Search APIs v7 query parameters)
    - Additional headers >> New:
      - Name: `Ocp-Apim-Subscription-Key` (taken from the News Search API v7 headers documentation)
      - Value: the endpoint key from your Bing Search resource in Azure
  - Destination: as per the screenshot below
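Data Factory substitutes `@{pipeline().parameters.search_term}` into the relative URL at run time. As a sanity check of what the final request looks like, here is a small Python sketch; the base endpoint shown assumes the standard Bing News Search v7 endpoint, and the key value is a placeholder:

```python
from urllib.parse import urlencode

# Assumed Bing News Search v7 endpoint; the pipeline's REST connection
# supplies the base URL and the relative URL adds the query string.
base_url = "https://api.bing.microsoft.com/v7.0/news/search"

def build_request(search_term: str, subscription_key: str) -> tuple[str, dict]:
    """Reproduce the URL and headers the Copy Data task effectively sends."""
    params = {"q": search_term, "count": 100, "mkt": "en-US"}
    headers = {"Ocp-Apim-Subscription-Key": subscription_key}
    return f"{base_url}?{urlencode(params)}", headers

url, headers = build_request("latest news", "<your-key-here>")
print(url)
# https://api.bing.microsoft.com/v7.0/news/search?q=latest+news&count=100&mkt=en-US
```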
- Notebook (`Data Transformation`)
  - Settings:
    - Workspace: `Bing Search`
    - Notebook: `Process raw JSON file` (created in an earlier step)
- Notebook (`Sentiment Analysis`)
  - Settings:
    - Workspace: `Bing Search`
    - Notebook: `Process Sentiment Analysis` (created in an earlier step)
Move to this step after you have run the above pipeline at least once.

Create the model:
- Go to your `bing_lake_db`
- Select the table `sentiment_analysis`
- Create a new semantic model:
  - Name it `bing-news-dashboard-model`
  - Select the table `sentiment_analysis`
Once you have created the semantic model, you can modify it:
- Go to the model
- Click `Open Data Model`
- Change the data category of the `url` column:
  - Click `url`
  - Go to Properties
  - Change the data category to `Web URL` (this turns the `url` column into a hyperlink in the report you'll create in the next step)
- Add new measures (`Negative Sentiment %`, `Positive Sentiment %`, `Neutral Sentiment %`):
  - Click `New Measure`
  - Add three new measures using the code in the `./semantic_model` subdir
  - (You can use those measures when creating a Power BI report in the next step)
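Each of the three measures computes the share of articles carrying a given sentiment label. The actual measure definitions live in `./semantic_model` (as DAX); purely as an illustration of the logic they implement, here is the equivalent computation in Python:

```python
from collections import Counter

def sentiment_percentages(sentiments: list[str]) -> dict[str, float]:
    """Share of rows per sentiment label, mirroring the three DAX measures
    (Positive/Negative/Neutral Sentiment %). Illustration only."""
    total = len(sentiments)
    counts = Counter(sentiments)
    return {
        f"{label.capitalize()} Sentiment %": round(100 * counts[label] / total, 2)
        for label in ("positive", "negative", "neutral")
    }

sample = ["positive", "neutral", "negative", "positive"]
print(sentiment_percentages(sample))
# {'Positive Sentiment %': 50.0, 'Negative Sentiment %': 25.0, 'Neutral Sentiment %': 25.0}
```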
- Go to your semantic model `bing-news-dashboard-model`
- Click `Explore this data` >> `Auto-generate report`
- AI will auto-generate a report for you
- You can add a new page to create a custom report (using the custom measures and modified columns you created earlier)
If you want to set an alert:
- Right-click on a measure in your Power BI report
- Click `Set alert`
- Change the condition
- Select an action (email or Teams)
- Run ETL Pipeline: trigger the pipeline to start the data ingestion process.
- Schedule ETL Pipeline: you can schedule your pipeline to run daily at 6 AM, giving you fresh news every morning.
- Alert: you'll be notified every time anything changes in your report (so you know new news is in!).
- Monitor: you can use Fabric's Monitor functionality to see pipeline runs.
For any questions or inquiries, please contact me at [email protected]
Thanks for reading!