Changelog

This changelog records what content was changed (e.g. an existing paragraph rewritten with a better explanation, or a notebook re-run with an updated version of a package) or added (e.g. a completely new Jupyter notebook).

[2019-04]

[2019-03]

Added

[2019-02]

Added

[2019-01]

Added

  • Quantile Regression and its application in A/B testing.
    • Quick Introduction to Quantile Regression. [nbviewer][html]
    • Quantile Regression's application in A/B testing. [nbviewer][html]

[2018-12]

Added

  • First Foray Into Discrete/Fast Fourier Transformation. [nbviewer][html]

[2018-11]

Added

[2018-10]

Added

  • Kullback-Leibler (KL) Divergence. [nbviewer][html]
  • Calibrated Recommendation. [nbviewer][html]
  • Influence Maximization from scratch. Includes a discussion of Independent Cascade (IC) and Submodular Optimization algorithms, including Greedy and Lazy Greedy, a.k.a. Cost Efficient Lazy Forward (CELF). [nbviewer][html]

[2018-09]

Added

  • Introduction to Residual Networks (ResNets) and Class Activation Maps (CAM). [nbviewer][html]

Changed

  • Hosted HTML versions of all Jupyter notebooks on GitHub Pages.

[2018-08]

Added

  • (Text) Content-Based Recommenders. Introducing Approximate Nearest Neighbor (ANN) - Locality Sensitive Hashing (LSH) for cosine distance from scratch. [nbviewer]
  • Benchmarking ANN implementations (nmslib). [nbviewer]

[2018-07]

Added

  • Getting started with time series analysis with Exponential Smoothing (Holt-Winters). [nbviewer]
  • Framing time series problem as supervised-learning. [nbviewer]
  • Tuning Spark Partitions. [nbviewer]

[2018-06]

Added

  • Evaluation metrics for imbalanced dataset. [nbviewer]

Changed

  • H2O API walkthrough (using GBM as an example). [nbviewer]
    • Moved H2O notebook to its own sub-folder.
    • Added model interpretation using partial dependence plot.

[2018-05]

Added

  • RNN, LSTM - PyTorch hello world. [nbviewer]
  • Recurrent Neural Network (RNN) - language modeling basics. [nbviewer]

[2018-04]

Added

  • Long Short Term Memory (LSTM) - TensorFlow. [nbviewer]
  • Vanilla RNN - TensorFlow. [nbviewer]
  • WARP (Weighted Approximate-Rank Pairwise) Loss using lightfm. [nbviewer]

[2018-03]

Added

[2018-02]

Added

  • H2O API walkthrough (using GBM as an example). [nbviewer]
  • Factorization Machine from scratch. [nbviewer]

Changed

  • The spark folder has been renamed to big_data to incorporate other big data tools.

[2018-01]

Added

  • Partial Dependence Plot (PDP), model-agnostic approach for directional feature influence. [nbviewer]
  • Parallel programming with Python (threading, multiprocessing, concurrent.futures, joblib). [nbviewer]

[2017-12]

Added

  • LightGBM API walkthrough and a discussion about categorical features in tree-based models. [nbviewer]
  • Curated tips and tricks for technical and soft skills. [nbviewer]
  • Detecting collinearity amongst features (Variance Inflation Factor for numeric features and Cramer's V statistics for categorical features), also introduces Linear Regression from a Maximum Likelihood perspective and the R-squared evaluation metric. [nbviewer]

Changed

  • Random Forest from scratch and Extra Trees. [nbviewer]
    • Refactored code for visualizing the tree's feature importance.
  • Building intuition on Ridge and Lasso regularization using scikit-learn. [nbviewer]
    • Included a section covering the case where the dataset contains collinear features.
  • mlutils: Machine learning utility function package [folder]
    • Refer to its changelog for details.
  • data_science_is_software. [nbviewer]
    • Mention notebook extension, a project that contains various functionalities that make Jupyter notebooks even more pleasant to work with.

[2017-11]

Added

  • Introduction to Singular Value Decomposition (SVD), also known as Latent Semantic Analysis/Indexing (LSA/LSI). [nbviewer]

[2017-10]

Added

  • mlutils: Machine learning utility function package [folder]

Changed

  • Bernoulli and Multinomial Naive Bayes from scratch. [nbviewer]
    • Fixed various typos and added a more efficient implementation of Multinomial Naive Bayes.
  • TF-IDF (term frequency - inverse document frequency) from scratch. [nbviewer]
    • Moved to its own tfidf folder.
    • Included the full tfidf implementation from scratch.

[2017-09]

Added

Changed

  • Using built-in data structure and algorithm. [nbviewer]
    • Merged the content of two notebooks, "namedtuple and defaultdict" and "sorting with itemgetter and attrgetter", into this one and improved the section on priority queues.

[2017-08]

Added

  • Understanding iterables, iterators and generators. [nbviewer]
  • Word2vec (skipgram + negative sampling) using Gensim (includes text preprocessing with spaCy). [nbviewer]
  • Frequentist A/B testing (includes a quick review of concepts such as p-value, confidence interval). [nbviewer]
  • AUC (Area under the ROC, precision/recall curve) from scratch (includes building a custom scikit-learn transformer). [nbviewer]

Changed

  • Optimizing Pandas (e.g. reduce memory usage using category type). [nbviewer]
    • This is a revamp of the old content, "Pandas's category type".

[2017-07]

Added

  • cohort: Cohort analysis. Visualizes user retention by cohort with seaborn's heatmap and illustrates pandas's unstack. [nbviewer]

Changed

  • Bayesian Personalized Ranking (BPR) from scratch & AUC evaluation. [nbviewer]
    • A more efficient matrix operation using Hadamard product.
  • Cython and Numba quickstart for high performance python. [nbviewer]
    • Added Numba parallel prange.
  • ALS-WR for implicit feedback data from scratch & mean average precision at k (mapk) and normalized discounted cumulative gain (ndcg) evaluation. [nbviewer]
    • Included normalized discounted cumulative gain (ndcg) evaluation.
  • Gradient Boosting Machine (GBM) from scratch. [nbviewer]
    • Added a made-up numeric example of how GBM works.
  • data_science_is_software. [nbviewer]
    • Mention nbdime, a tool that makes checking changes in jupyter notebook on github a lot easier.
    • Mention semantic versioning (what each number in the package version usually represents).
    • Mention configparser, a handy library for storing and loading configuration files.
  • K-fold cross validation, grid/random search from scratch. [nbviewer]
    • Minor change in Kfolds educational implementation (original was passing redundant arguments to a method).
    • Minor change in random search educational implementation (did not realize that scipy's .rvs method for generating random numbers returns a single-element array instead of a number when you pass in size = 1).
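
The scipy .rvs behaviour mentioned in the last item can be reproduced with a short sketch. This is only an illustration, not the notebook's actual random search code: the distribution (`stats.randint(1, 10)`) and all variable names are assumptions chosen for the example.

```python
import numpy as np
from scipy import stats

# Hypothetical search distribution for illustration: integers in [1, 10).
dist = stats.randint(1, 10)

# Passing size=1 gives back a one-element array rather than a plain number.
as_array = dist.rvs(size=1, random_state=0)

# Leaving size out gives back a scalar instead.
as_scalar = dist.rvs(random_state=0)

# Two ways to recover a plain number from the size=1 result.
first_item = as_array[0]
python_number = as_array.item()

print(np.shape(as_array), np.shape(as_scalar))
```

When a sampled hyperparameter is passed straight to a model, indexing the result or calling `.item()` avoids handing the model an array where it expects a number.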

[2017-06]

This is the first version of the changelog, so every existing notebook falls under the Added category. Going forward, the log will be grouped by month (one or two at a time). Note that this repo is geared towards Python 3. Although the repo contains some R-related content, it is not well maintained and will most likely be translated to Python 3. As always, any feedback is welcome.

Added

  • Others (Genetic Algorithm)
  • Regression (Linear, Ridge/Lasso)
  • Market Basket Analysis (Apriori)
  • Clustering (K-means++, Gaussian Mixture Model)
  • Deep Learning (Feedforward, Convolutional Neural Nets)
  • Model Selection (Cross Validation, Grid/Random Search)
  • Dimensionality Reduction (Principal Component Analysis)
  • Classification (Logistic, Bernoulli and Multinomial Naive Bayes)
  • Text Analysis (TF-IDF, Chi-square feature selection, Latent Dirichlet Allocation)
  • Tree Models (Decision Tree, Random Forest, Extra Trees, Gradient Boosting Machine)
  • Recommendation System (Alternating Least Squares with Weighted Regularization, Bayesian Personalized Ranking)
  • Python Programming (e.g. logging, unittest, decorators, pandas category type)