GitHub - baileythill/BigQueryML: Trends Marketplace

Machine Learning Using BigQueryML

Table of Contents

About The Project
Getting Started
- Prerequisites
- Installation
Usage
Roadmap
Contributing
License
Resources
Contributers

About The Project

The data science world as we know it is obsessed with building the most innovative and effective ML models. Machine learning on large datasets requires extensive programming and knowledge of ML frameworks. These requirements restrict solution development to a very small set of people.Many companies couldn’t afford data scientists with such a level of expertise but have a need to build models to drive their business forward.

BigQuery ML democratizes machine learning by letting SQL practitioners build models using existing SQL tools and skills.Companies can use their data analysts who understand the data but have limited knowledge and programming expertise.

Our project will walk you through the project on how to use this amazing tool using a sample NCAA Basketball dataset. Using this dataset we will attempt to produce analyses on how some of the most fundamental tracked stats impact a teams chances of winning a basketball game.

(back to top)

Getting Started

This is an example of how you may give instructions on setting up your project locally. To get a local copy up and running follow these simple example steps.

Prerequisites

A working knowledge of database querying skills.

Installation

To get started with BigQueryML

Create a free trial account with Google Cloud:- Create Account
In the google console page:- Create a Project
Make sure billing is enabled.
Ensure that the API is enabled.

(back to top)

Usage

Navigate to the BigQuery Console. Check the access of your account to the console here

Step one: Create a Project

Start with creating a project in the GCP Sandbox.

In the navigation panel, there is an option to create a project right next to "Google Cloud Platform".
Start with a new project. All the steps going forward- the data, models and results will be stored at one place in this project.

Step two: Create your dataset

The second step is to create a BigQuery dataset to store your model. To create your dataset, follow these steps:

In the Cloud Console, go to the BigQuery page. Go to the BigQuery page
In the navigation panel, in the Resources section, click your project name.
On the right side, in the details panel, click
On the Create dataset page, for Dataset ID, enter census.
Leave all of the other default settings in place and click Create dataset.

Step Three:

Creating Tables. Add the data that you want to work on to the database created in step2.

CREATE TABLE IF NOT EXISTS  `big-data-bas.bigdata.main_data`
AS 
SELECT
Game_id, field_goals_made, 
turnovers, offensive_rebounds, 
defensive_rebounds, free_throws_made, 
assists, blocks, steals ,win,
CASE
 WHEN row_num <= 20862  THEN 'training'
 WHEN row_num <= 26822  THEN 'evaluation'
 ELSE 'prediction'
END AS dataframe

FROM
(
SELECT 
 Game_id, field_goals_made, 
 turnovers, offensive_rebounds, 
 defensive_rebounds, free_throws_made, 
 assists, blocks, steals ,win, ROW_NUMBER() OVER (partition by win) as row_num

FROM `bigquery-public-data.ncaa_basketball.mbb_teams_games_sr`
WHERE win is NOT NULL AND
 turnovers  IS NOT NULL AND
 offensive_rebounds IS NOT NULL AND 
 defensive_rebounds  IS NOT NULL AND
 free_throws_made IS NOT NULL AND
 assists  IS NOT NULL AND
 blocks  IS NOT NULL AND
 steals  IS NOT NULL AND
 field_goals_made IS NOT NULL 
);

Step Four:

Create a Model.

# Model
CREATE MODEL IF NOT EXISTS 'big-data-bas.bigdata.win_model1`
OPTIONS 
(MODEL_TYPE = 'BOOSTED_TREE_CLASSIFIER', 
learn_rate = 0.1,
early_stop = TRUE,
input_label_cols=['win']) AS
SELECT 
field_goals_made, 
turnovers, offensive_rebounds, 
defensive_rebounds, free_throws_made, 
assists, blocks, steals ,win
FROM `big-data-bas.bigdata.train_V1`
WHERE win is NOT NULL;

Model Fine Tuning You can check the performance of your classifier at different score thresholds and choose an appropriate thereshold as per your business problem.

Step Five:

Evaluation of the Model's predictions:

# Evaluate the model
SELECT
  *
FROM
  ML.EVALUATE (MODEL `big-data-bas.bigdata.win_model1`,
    (
    SELECT
      *
    FROM
      `big-data-bas.bigdata.main_data`
    WHERE
      dataframe = 'evaluation'
    )
  )
  ;

Evaluation Results

(back to top)

Here's out of sample preformance of our model with probability threshold of 0.5.

License

This project repository is created in partial fulfillment of the requirements for the Big Data Analytics course offered by the Master of Science in Business Analytics program at the Carlson School of Management, University of Minnesota.

(back to top)

Resources used

These are some useful resources to get started with Bigquery ML.

(back to top)

Contributers

Bailey Hill - @LinkedIn - [email protected]
Kiran - @LinkedIn - [email protected]
Priyanka Bhosale - @LinkedIn - [email protected]
Qujiaheng (Johnny) Zhang - @LinkedIn - [email protected]
Rahul Agarwal - @LinkedIn - [email protected]

Project Link: https://github.com/bvfst/BigQueryML

(back to top)

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
assets		assets
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About The Project

Getting Started

Prerequisites

Installation

Usage

Navigate to the BigQuery Console. Check the access of your account to the console here

Step one: Create a Project

Step two: Create your dataset

Step Three:

Step Four:

Step Five:

License

Resources used

Contributers

About

Releases

Packages

baileythill/BigQueryML

Folders and files

Latest commit

History

Repository files navigation

About The Project

Getting Started

Prerequisites

Installation

Usage

Navigate to the BigQuery Console. Check the access of your account to the console here

Step one: Create a Project

Step two: Create your dataset

Step Three:

Step Four:

Step Five:

License

Resources used

Contributers

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Packages