Machine- and Deep Learning resources

License: MIT PR's Welcome

Machine learning, deep learning, and data analysis resources. Please contribute and get in touch! See MDmisc notes for other programming and genomics-related notes.

Table of contents

Cheatsheets

Awesome Deep Learning

Keras, Tensorflow

PyTorch

JAX

JAX combines automatic differentiation with XLA (Accelerated Linear Algebra), a compiler developed by Google that targets CPUs, GPUs, and TPUs. JAX exposes a NumPy-like API as its high-level interface, and the same code runs unchanged on CPU, GPU, and TPU, typically much faster on accelerators.

  • awesome-jax - JAX - A curated list of resources

  • JAX - Jupyter (Colab) notebooks introducing basic (jit, vmap, pmap, grad, and others) and advanced JAX concepts, by @yvrjsharma
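As a quick illustration of the transformations mentioned above, here is a minimal sketch of grad, jit, and vmap applied to a toy linear model. The function names (loss, predict_one) and the data are hypothetical and chosen only for this example; pmap is left out because it needs multiple devices.

```python
import jax
import jax.numpy as jnp


def loss(w, x, y):
    """Mean squared error of a linear model (hypothetical toy example)."""
    pred = jnp.dot(x, w)
    return jnp.mean((pred - y) ** 2)


def predict_one(w, x_row):
    """Prediction for a single example (one row of x)."""
    return jnp.dot(x_row, w)


# grad: build a function returning d(loss)/d(w), the first argument
grad_loss = jax.grad(loss)

# jit: compile the gradient function with XLA
fast_grad_loss = jax.jit(grad_loss)

# vmap: vectorize predict_one over the rows of x, sharing the same w
batched_predict = jax.vmap(predict_one, in_axes=(None, 0))

# Toy data, made up for illustration
w = jnp.array([1.0, -2.0])
x = jnp.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = jnp.zeros(3)

g = fast_grad_loss(w, x, y)    # gradient with the same shape as w
preds = batched_predict(w, x)  # one prediction per row of x
```

The same script should run on CPU, GPU, or TPU without changes; JAX picks the available backend at runtime.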

Graph Neural Networks

Transformers

DL Books

DL Courses & Tutorials

DL Videos

DL Papers

DL Papers Genomics

DL Tools

  • Interactive_Tools - Interactive Tools for Machine Learning, Deep Learning and Math. Play with deep neural network in browser

  • ivy - The Unified Machine Learning Framework supporting JAX, TensorFlow, PyTorch, MXNet, and NumPy. Python module. Documentation

  • keras - Deep Learning for humans http://keras.io/

  • MXNet-Gluon-Style-Transfer - neural artistic style transfer using MXNet. PyTorch and Torch implementations available

  • openai.com - GPT-3 Access Without the Wait (API access to GPT-3)

  • OpenCV - Open Source Computer Vision library. GitHub, opencv-python - CPU-only OpenCV packages for Python. Documentation. Video - 3h OpenCV crash course

  • pathology_learning - Using traditional machine learning and deep learning methods to predict stuff from TCGA pathology slides

  • ruta - Unsupervised Deep Architectures in R, autoencoders. Requires Keras and TensorFlow. Book

  • tensor2tensor - Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research

  • Janggu - deep learning interface to genomic data (FASTA, BAM, BigWig, BED, GFF). NumPy-like Bioseq and Cover objects accessible by Keras. Includes model evaluation and interpretation features. PyPI, Docs, Janggu - Deep learning for genomics

  • maui - Multi-omics Autoencoder Integration. Latent factors from different data types (stacked variational autoencoders), and their clustering, testing for association with survival. Tested vs. latent factors extracted using Multifactor Analysis (MFA) and iCluster+, on TCGA colorectal cancer RNA-seq, SNPs, CNVs. Evaluation of Colorectal Cancer Subtypes and Cell Lines Using Deep Learning

  • Ludwig - a toolbox built on top of TensorFlow for training and testing deep learning models without writing code. GitHub

  • Mask_RCNN - Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow

  • PennAI - AI-Driven Data Science, entry-level machine learning interface for non-experts. A System for Accessible Artificial Intelligence

Auto ML

DL models

DL projects

ChatGPT, LLMs

  • awesome-chatgpt - Curated list of awesome tools, demos, docs for ChatGPT and GPT-3

  • chatgpt-clone - Build Yo'own ChatGPT with OpenAI API & Gradio. A Python app providing a web browser interface to ChatGPT.

  • h2ogpt - open-source GPT with document and image Q&A, 100% private chat, no data leaks, Apache 2.0 https://arxiv.org/pdf/2306.08161.pdf Live Demo: https://gpt.h2o.ai/

  • llm-course - Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.

  • LLMsPracticalGuide - A curated list of practical guide resources of LLMs (LLMs Tree, Examples, Papers)

  • mlc-llm - Enable everyone to develop, optimize and deploy AI models natively on everyone's devices. Documentation

  • nanoGPT - The simplest, fastest repository for training/finetuning medium-sized GPTs.

  • ollama - Get up and running with Llama 2 and other large language models locally.

  • openai-cookbook - Examples and guides for using the OpenAI API. Rendered version

  • privateGPT - Interact privately with your documents using the power of GPT, 100% privately, no data leaks.

DL Misc

  • app.wombo.art - deep generative model dreaming awesome images from text, Android and iOS apps available. Tweet describing the VQGAN+CLIP technology behind it

  • CSrankings - A web app for ranking computer science departments according to their research output in selective venues, and for finding active faculty across a wide range of areas. Website

  • ColossalAI - A Unified Deep Learning System for Big Model Era. Scaling deep learning models using data, pipeline, tensor, and sequence parallelism. 1D, 2D, 2.5D, 3D distributed operators. Examples of each. Written in PyTorch, needs a configuration file defining parallelism. Benchmarked against DeepSpeed, Megatron-LM.

    Paper Li, Shenggui, Jiarui Fang, Zhengda Bian, Hongxin Liu, Yuliang Liu, Haichen Huang, Boxiang Wang, and Yang You. “Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training,” n.d.

Awesome Machine learning

ML Books

ML Courses & Tutorials

ML Videos

ML Papers

  • Whalen, Sean, Jacob Schreiber, William S. Noble, and Katherine S. Pollard. “Navigating the Pitfalls of Applying Machine Learning in Genomics.” Nature Reviews Genetics 23, no. 3 (March 2022): 169–81. https://doi.org/10.1038/s41576-021-00434-9. - Five machine learning problems in genomics: distributional differences, dependency structure, confounding variables, information leakage, unbalanced data. Description, examples, solutions.

  • Domingos, Pedro. “A Few Useful Things to Know about Machine Learning.” Communications of the ACM 55, no. 10 (October 1, 2012): 78. https://doi.org/10.1145/2347736.2347755. Twelve lessons for machine learning. Overview of machine learning problems and algorithms, problem of overfitting, causes and solutions, curse of dimensionality, issues with high-dimensional data, feature engineering, bagging, boosting, stacking, model sparsity. Video lectures

ML Tools

  • mlr3 - Machine learning in R. An R package providing a unified interface to classification, regression, survival analysis, and other machine learning tasks. GitHub repo, mlr3gallery - Examples of problems and code solutions, mlr3 Manual - mlr3 bookdown. More on the mlr3 package site, including videos

ML Misc

Material in Russian

  • Scientific_graphics_in_python - matplotlib for scientific graphics. 3 parts, 13 chapters. By Pavel Shabanov

  • ml-course-hse - machine learning course at the Computer Sciences Department, Higher School of Economics. Multiple years, videos

  • mlcourse_open - OpenDataScience Machine Learning course (Both in English and Russian). Python-based ML course, with video lectures. Video

  • DL_CSHSE_spring2018 - Deep learning, Anton Osokin, Higher School of Economics, Computer Sciences Department (Russian), course material, and video lectures

  • Ordinary Differential Equations (Обыкновенные дифференциальные уравнения) - interactive textbook by Ilya Schurov (HSE University)

  • Calculus (Математический анализ) - lecture notes by Ilya Schurov (HSE University). Tweet

  • mathprofi.ru - Higher mathematics, simple and accessible (Высшая математика – просто и доступно). Mirror