This chapter presents two unsupervised learning techniques that leverage deep learning: autoencoders, which have been around for decades, and Generative Adversarial Networks (GANs), which were introduced by Ian Goodfellow in 2014 and which Yann LeCun has called the most exciting idea in AI in the last ten years.
- An autoencoder is a neural network trained to reproduce its input while learning a new representation of the data, encoded by the activations of a hidden layer. Autoencoders have long been used for nonlinear dimensionality reduction and manifold learning. More recently, autoencoders have been designed as generative models that learn probability distributions over observed and latent variables. A variety of designs leverage the feedforward network, Convolutional Neural Network (CNN), and Recurrent Neural Network (RNN) architectures we covered in the last three chapters.
- GANs are a recent innovation that trains two neural nets, a generator and a discriminator, in a competitive setting. The generator aims to produce samples that the discriminator is unable to distinguish from a given class of training data. The result is a generative model capable of producing new (fake) samples that are representative of a certain target distribution. GANs have produced a wave of research and have been applied successfully in many domains. An example from the medical domain that could be highly relevant for trading is the generation of time-series data that simulate alternative trajectories and can be used to train supervised or reinforcement learning algorithms.
More specifically, this chapter covers:
- Which types of autoencoders are of practical use and how they work
- How to build and train autoencoders using Python
- How GANs work, why they're useful, and how they could be applied to trading
- How to build GANs using Python

- Unsupervised Learning, Yann LeCun, 2016
An autoencoder, in contrast, is a neural network designed exclusively to learn a new representation, that is, an encoding of the input. To this end, the training forces the network to faithfully reproduce the input. Since autoencoders typically use the same data as input and output, they are also considered an instance of self-supervised learning. In the process, the activations of a hidden layer become the code that represents the input.
- Autoencoders, Ian Goodfellow, Yoshua Bengio and Aaron Courville, Deep Learning Book, Chapter 14, MIT Press 2016
A traditional use case includes dimensionality reduction, achieved by limiting the size of the hidden layer so that it performs lossy compression. Such an autoencoder is called undercomplete and the purpose is to force it to learn the most salient properties of the data by minimizing a loss function. In addition to feedforward architectures, autoencoders can also use convolutional layers to learn hierarchical feature representations.
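To make the idea of an undercomplete autoencoder concrete, the following is a minimal, framework-free sketch using NumPy. It is not taken from the chapter's notebooks (those use Keras); the synthetic data, the purely linear encoder and decoder, the code size of 2, and the learning rate are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data (assumption): 500 samples with 8 features that lie
# close to a 2-dimensional subspace, plus a little noise.
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 8))
X = latent @ mixing + 0.05 * rng.normal(size=(500, 8))

n_features, code_size = X.shape[1], 2  # code_size < n_features: undercomplete
W_enc = 0.1 * rng.normal(size=(n_features, code_size))  # encoder weights
W_dec = 0.1 * rng.normal(size=(code_size, n_features))  # decoder weights


def reconstruction_loss(X, W_enc, W_dec):
    """Mean squared error between the input and its reconstruction."""
    return float(np.mean((X @ W_enc @ W_dec - X) ** 2))


initial_loss = reconstruction_loss(X, W_enc, W_dec)

learning_rate = 0.01
for _ in range(2000):
    code = X @ W_enc            # lossy compression: 8 features -> 2
    error = code @ W_dec - X    # reconstruction error
    # Gradients of the halved squared error, averaged over samples
    grad_dec = code.T @ error / len(X)
    grad_enc = X.T @ (error @ W_dec.T) / len(X)
    W_dec -= learning_rate * grad_dec
    W_enc -= learning_rate * grad_enc

final_loss = reconstruction_loss(X, W_enc, W_dec)
code = X @ W_enc                # the learned low-dimensional representation
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.4f}")
```

Because the hidden layer has only two units, the network cannot simply copy its eight inputs and has to capture the dominant structure of the data instead. A practical autoencoder would typically add nonlinear activations and more layers.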
The powerful capabilities of neural networks to represent complex functions require tight limits on the capacity of the encoder and decoder to force the extraction of a useful signal rather than noise. In other words, when it is too easy for the network to recreate the input, it fails to isolate the most interesting aspects of the data. This challenge is similar to the overfitting phenomenon that frequently occurs when using models with a high capacity for supervised learning. Just as in these settings, regularization can help by adding constraints to the autoencoder that facilitate the learning of a useful representation.
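One common way to add such a constraint is a sparsity penalty on the hidden-layer activations. The sketch below shows how this changes the training objective; the function name `sparse_autoencoder_loss` and the default penalty weight are hypothetical, and an L1 penalty is only one of several possible regularizers.

```python
import numpy as np


def sparse_autoencoder_loss(x, x_hat, code, l1_weight=1e-3):
    """Reconstruction error plus an L1 penalty on the hidden activations.

    x:      original inputs
    x_hat:  reconstructions produced by the decoder
    code:   hidden-layer activations (the encoding)
    l1_weight: strength of the sparsity constraint (hypothetical default)
    """
    reconstruction = np.mean((x - x_hat) ** 2)
    sparsity_penalty = l1_weight * np.mean(np.abs(code))
    return float(reconstruction + sparsity_penalty)


# Toy values for illustration only
x = np.array([[1.0, 2.0]])
x_hat = np.array([[1.0, 0.0]])
code = np.array([[1.0, -1.0, 0.0, 0.0]])
loss = sparse_autoencoder_loss(x, x_hat, code, l1_weight=0.1)
```

The penalty rewards encodings in which most units are close to zero, so the hidden layer can be as large as the input, or larger, without the network learning a trivial copy.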
Sequence-to-sequence autoencoders are based on RNN components, such as Long Short-Term Memory (LSTM) or Gated Recurrent Units (GRUs). They learn a compressed representation of sequential data and have been applied to video, text, audio, and time-series data.
- A ten-minute introduction to sequence-to-sequence learning in Keras, Francois Chollet, September 2017
- Unsupervised Learning of Video Representations using LSTMs, Nitish Srivastava, Elman Mansimov, Ruslan Salakhutdinov, 2016
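Recurrent layers usually expect input shaped as (samples, time steps, features). As a rough sketch of the data preparation for a sequence-to-sequence autoencoder on a univariate time series, the hypothetical helper `make_windows` below slices a series into overlapping windows; for an autoencoder, the same array would serve as both input and target.

```python
import numpy as np


def make_windows(series, window):
    """Slice a 1-D series into overlapping windows of equal length.

    Returns an array of shape (samples, window, 1).
    """
    series = np.asarray(series, dtype=float)
    n_windows = len(series) - window + 1
    windows = np.stack([series[i:i + window] for i in range(n_windows)])
    return windows[..., np.newaxis]  # add a trailing feature dimension


prices = np.arange(10)               # placeholder data, not real prices
X_seq = make_windows(prices, window=4)
y_seq = X_seq                        # autoencoder: target equals input
```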
Variational Autoencoders (VAEs) are a more recent development focused on generative modeling. More specifically, VAEs are designed to learn a latent variable model for the input data. Note that we encountered latent variables in Chapter 14, Topic Modeling.
Hence, VAEs do not let the network learn an arbitrary function just because it faithfully reproduces the input. Instead, they aim to learn the parameters of a probability distribution that generates the input data. In other words, VAEs are generative models because, if successful, you can generate new data points by sampling from the distribution learned by the VAE.
- Auto-encoding variational bayes, Diederik P Kingma, Max Welling, 2014
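Two ingredients of the approach described by Kingma and Welling can be sketched in a few lines of NumPy: sampling a latent vector from the Gaussian predicted by the encoder (the reparameterization trick), and the KL divergence term that pulls this Gaussian toward a standard normal prior. The function names are hypothetical, and the sketch assumes a diagonal Gaussian parameterized by its mean and log-variance.

```python
import numpy as np


def sample_latent(z_mean, z_log_var, rng):
    """Reparameterization trick: z = mean + std * epsilon, epsilon ~ N(0, I)."""
    epsilon = rng.standard_normal(z_mean.shape)
    return z_mean + np.exp(0.5 * z_log_var) * epsilon


def kl_to_standard_normal(z_mean, z_log_var):
    """KL divergence between N(mean, exp(log_var)) and N(0, I), per sample."""
    return -0.5 * np.sum(
        1 + z_log_var - z_mean ** 2 - np.exp(z_log_var), axis=-1
    )


rng = np.random.default_rng(0)
z_mean = np.array([[1.0, 0.0]])      # encoder output for one sample (made up)
z_log_var = np.zeros((1, 2))         # log-variance of 0, i.e. unit variance
z = sample_latent(z_mean, z_log_var, rng)
kl = kl_to_standard_normal(z_mean, z_log_var)
```

During training, the KL term is added to the reconstruction loss. After training, new data points are generated by drawing `z` from the standard normal prior and passing it through the decoder.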
The Keras library makes it fairly straightforward to build various types of autoencoders and the following examples are adapted from Keras' tutorials.
The notebook deep_autoencoders illustrates how to implement several of the autoencoder models introduced in the preceding section using Keras. This includes autoencoders using deep feedforward nets and sparsity constraints.
The notebook convolutional_denoising_autoencoders goes on to demonstrate how to implement convolutional and denoising autoencoders to recover corrupted image inputs.
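A denoising autoencoder is trained on pairs of corrupted inputs and clean targets. The sketch below shows one way to create such pairs, assuming images scaled to the range [0, 1]; the helper name `add_gaussian_noise`, the noise factor, and the random placeholder images are assumptions and not necessarily what the notebook uses.

```python
import numpy as np


def add_gaussian_noise(images, noise_factor=0.5, seed=0):
    """Corrupt images with Gaussian noise and clip back to the [0, 1] range."""
    rng = np.random.default_rng(seed)
    noisy = images + noise_factor * rng.normal(size=images.shape)
    return np.clip(noisy, 0.0, 1.0)


# Placeholder batch of 4 grayscale 28x28 images with values in [0, 1]
clean = np.random.default_rng(1).uniform(size=(4, 28, 28, 1))
noisy = add_gaussian_noise(clean)

# Training would then use the noisy images as input and the clean ones as target.
```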
Sequence-to-sequence autoencoders are based on RNN components like long short-term memory (LSTM) or gated recurrent units (GRUs). They learn a compressed representation of sequential data and have been applied to video, text, audio, and time-series data.
- Gradient Trader Part 1: The Surprising Usefulness of Autoencoders
- Deep Learning Financial Market Data
- Motivation: Regulators identify prohibited patterns of trading activity detrimental to orderly markets. Financial Exchanges are responsible for maintaining orderly markets. (e.g. Flash Crash and Hound of Hounslow.)
- Challenge: Identify prohibited trading patterns quickly and efficiently. Goal: Build a trading pattern search function using Deep Learning. Given a sample trading pattern identify similar patterns in historical LOB data.
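A pattern search of this kind could, for example, compare the encodings that a trained autoencoder assigns to different trading patterns. The following sketch ranks historical encodings by cosine similarity to a query encoding; the function name `most_similar` and the tiny hand-made vectors are hypothetical stand-ins for the output of a real encoder.

```python
import numpy as np


def most_similar(query_code, historical_codes, top_k=3):
    """Return indices and cosine similarities of the top_k closest encodings."""
    query = query_code / np.linalg.norm(query_code)
    history = historical_codes / np.linalg.norm(
        historical_codes, axis=1, keepdims=True
    )
    similarity = history @ query
    order = np.argsort(-similarity)[:top_k]  # highest similarity first
    return order, similarity[order]


# Made-up 2-dimensional encodings of four historical patterns
historical_codes = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]])
query_code = np.array([1.0, 0.1])
indices, scores = most_similar(query_code, historical_codes, top_k=2)
```

Searching in the compressed representation rather than the raw order book data keeps the comparison cheap, at the cost of depending on how well the encoder captures the relevant features.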
The notebook variational_autoencoder shows how to build a Variational Autoencoder using Keras.
The supervised learning algorithms that we focused on for most of this book receive input data that's typically complex and predict a numerical or categorical label that we can compare to the ground truth to evaluate their performance. These algorithms are also called discriminative models because they learn to differentiate between different output classes.
The goal of generative models is to produce complex output, such as realistic images, given simple input, which can even be random numbers. They achieve this by modeling a probability distribution over the possible output. This probability distribution can have many dimensions, for example, one for each pixel in an image or one for each character or token in a document. As a result, the model can generate outputs that are very likely to be representative of the class of output.
- NIPS 2016 Tutorial: Generative Adversarial Networks, Ian Goodfellow, 2017
- Why is unsupervised learning important?, Yoshua Bengio on Quora, 2018
- GAN Lab: Understanding Complex Deep Generative Models using Interactive Visual Experimentation, Minsuk Kahng, Nikhil Thorat, Duen Horng (Polo) Chau, Fernanda B. Viégas, and Martin Wattenberg, IEEE Transactions on Visualization and Computer Graphics, 25(1) (VAST 2018), Jan. 2019
- Generative Adversarial Networks, Ian Goodfellow, et al, 2014
- Generative Adversarial Networks: an Overview, Antonia Creswell, et al, 2017
- Generative Models, OpenAI Blog
- Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks (DCGAN), Alec Radford, Luke Metz, and Soumith Chintala, 2016
- Conditional Generative Adversarial Nets, Mehdi Mirza and Simon Osindero, 2014
- Infogan: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets, Xi Chen et al, 2016
- StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks, Han Zhang et al, 2016
- Photo-realistic Single Image Super-resolution Using a Generative Adversarial Network, Christian Ledig et al, 2016
- Unpaired Image-to-image Translation Using Cycle-consistent Adversarial Networks, Jun-Yan Zhu et al, 2018
- Learning What and Where to Draw, Scott Reed, et al 2016
- Fantastic GANs and where to find them
- Real-valued (Medical) Time Series Generation with Recurrent Conditional GANs, Cristóbal Esteban, Stephanie L. Hyland, Gunnar Rätsch, 2017
- MAD-GAN: Multivariate Anomaly Detection for Time Series Data with Generative Adversarial Networks, Dan Li, Dacheng Chen, Jonathan Goh, and See-Kiong Ng, 2019
- GAN — Some cool applications, Jonathan Hui, 2018
- gans-awesome-applications, curated list of awesome GAN applications
The notebook deep_convolutional_generative_adversarial_network illustrates the implementation of a GAN using Python. It uses the Deep Convolutional GAN (DCGAN) example to synthesize images from the Fashion MNIST dataset.
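Independently of the network architecture, the competition between generator and discriminator is expressed through their loss functions. The NumPy sketch below shows one common formulation based on binary cross-entropy, with the widely used non-saturating generator loss; it is an illustration under these assumptions rather than the notebook's code, and all function names are hypothetical.

```python
import numpy as np


def binary_cross_entropy(labels, probs, eps=1e-7):
    """Average binary cross-entropy between 0/1 labels and probabilities."""
    probs = np.clip(probs, eps, 1 - eps)  # avoid log(0)
    return float(
        -np.mean(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))
    )


def discriminator_loss(real_probs, fake_probs):
    """The discriminator should output 1 for real and 0 for generated samples."""
    real_loss = binary_cross_entropy(np.ones_like(real_probs), real_probs)
    fake_loss = binary_cross_entropy(np.zeros_like(fake_probs), fake_probs)
    return real_loss + fake_loss


def generator_loss(fake_probs):
    """The generator wants the discriminator to label its samples as real."""
    return binary_cross_entropy(np.ones_like(fake_probs), fake_probs)


# A discriminator that cannot tell real from fake assigns 0.5 to every sample
undecided = np.full(8, 0.5)
d_loss = discriminator_loss(undecided, undecided)
g_loss = generator_loss(undecided)
```

In a training loop, the two networks are updated in alternation: the discriminator minimizes `discriminator_loss` on a batch of real and generated samples, and the generator then minimizes `generator_loss` on a fresh batch of its own samples.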
- Keras-GAN, numerous Keras GAN implementations
- PyTorch-GAN, numerous PyTorch GAN implementations