Analytics + AI Platform for Apache Spark and BigDL
Analytics Zoo makes it easy to build deep learning applications on Spark and BigDL, by providing an end-to-end Analytics + AI Platform (including high-level pipeline APIs, built-in deep learning models, reference use cases, etc.).
- nnframes: native deep learning support in Spark DataFrames and ML Pipelines
- autograd: build custom layer/loss using auto differentiation operations
- Transfer learning: customize pretrained model for feature extraction or fine-tuning
- Model serving: productionize model serving and inference using POJO APIs
- Object detection API: high-level API and pretrained models (e.g., SSD and Faster-RCNN) for object detection
- Image classification API: high-level API and pretrained models (e.g., VGG, Inception, ResNet, MobileNet, etc.) for image classification
- Text classification API: high-level API and pre-defined models (using CNN, LSTM, etc.) for text classification
- Recommendation API: high-level API and pre-defined models (e.g., Neural Collaborative Filtering, Wide and Deep Learning, etc.) for recommendation
- Reference use cases: a collection of end-to-end reference use cases (e.g., anomaly detection, sentiment analysis, fraud detection, image augmentation, object detection, variational autoencoder, etc.)
To get started, please refer to the Python install guide or the Scala install guide.

For more information, you may refer to the Analytics Zoo documentation website.

For additional questions and discussions, you can join the Google User Group (or subscribe to the Mailing List).
Analytics Zoo provides a set of easy-to-use, high-level pipeline APIs that natively support Spark DataFrames and ML Pipelines, autograd and custom layer/loss, transfer learning, etc.

`nnframes` provides native deep learning support in Spark DataFrames and ML Pipelines, so that you can easily build complex deep learning pipelines in just a few lines, as illustrated below. (See more details here)
- Initialize NNContext and load images into DataFrames using `NNImageReader`:

```python
from zoo.common.nncontext import *
from zoo.pipeline.nnframes import *

sc = init_nncontext()
imageDF = NNImageReader.readImages(image_path, sc)
```
- Process the loaded data using DataFrame transformations:

```python
getName = udf(lambda row: ...)
getLabel = udf(lambda name: ...)
df = imageDF.withColumn("name", getName(col("image"))) \
            .withColumn("label", getLabel(col("name")))
```
- Process the images using built-in feature engineering operations:

```python
from zoo.feature.common import *
from zoo.feature.image import *

transformer = ChainedPreprocessing(
    [RowToImageFeature(), ImageResize(64, 64),
     ImageChannelNormalize(123.0, 117.0, 104.0),
     ImageMatToTensor(), ImageFeatureToTensor()])
```
- Define the model using Keras-style APIs:

```python
from zoo.pipeline.api.keras.layers import *
from zoo.pipeline.api.keras.models import *

model = Sequential() \
    .add(Convolution2D(32, 3, 3, activation='relu', input_shape=(1, 28, 28))) \
    .add(MaxPooling2D(pool_size=(2, 2))) \
    .add(Flatten()) \
    .add(Dense(10, activation='softmax'))
```
- Train the model using Spark ML Pipelines:

```python
classifier = NNClassifier(model, CrossEntropyCriterion(), transformer) \
    .setLearningRate(0.003).setBatchSize(40).setMaxEpoch(1) \
    .setFeaturesCol("image").setCachingSample(False)
nnModel = classifier.fit(df)
```
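The fitted `nnModel` is a standard Spark ML Transformer, so you can score a DataFrame directly; a minimal sketch, assuming the `df` and column names from the steps above:

```python
# nnModel.transform() appends a "prediction" column to the input DataFrame
predictionDF = nnModel.transform(df)
predictionDF.select("name", "label", "prediction").show()
```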
`autograd` provides automatic differentiation for math operations, so that you can easily build your own custom loss and layer (in both Python and Scala), as illustrated below. (See more details here)
- Define the model using Keras-style API and `autograd`:

```python
import zoo.pipeline.api.autograd as A
from zoo.pipeline.api.keras.layers import *
from zoo.pipeline.api.keras.models import *

input = Input(shape=[2, 20])
features = TimeDistributed(layer=Dense(30))(input)
f1 = features.index_select(1, 0)
f2 = features.index_select(1, 1)
diff = A.abs(f1 - f2)
model = Model(input, diff)
```
- Optionally define a custom loss function using `autograd`:

```python
def mean_absolute_error(y_true, y_pred):
    return A.mean(A.abs(y_true - y_pred), axis=1)
```
- Train the model with the custom loss function:

```python
from bigdl.optim.optimizer import SGD

model.compile(optimizer=SGD(), loss=mean_absolute_error)
model.fit(x=..., y=...)
```
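For a quick smoke test, a hypothetical toy run might look like the following; the shapes mirror the `Input(shape=[2, 20])` model above, and the random data and training parameters are illustrative assumptions:

```python
import numpy as np

# hypothetical toy data: 100 samples of shape [2, 20], targets of shape [30]
x = np.random.uniform(0, 1, [100, 2, 20])
y = np.random.uniform(0, 1, [100, 30])
model.fit(x=x, y=y, batch_size=32, nb_epoch=2)
```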
Using the high-level transfer learning APIs, you can easily customize pretrained models for feature extraction or fine-tuning. (See more details here)
- Load an existing model (pretrained in Caffe):

```python
from zoo.pipeline.api.net import *

full_model = Net.load_caffe(def_path, model_path)
```
- Remove the last few layers:

```python
# create a new model by removing layers after pool5/drop_7x7_s1
model = full_model.new_graph(["pool5/drop_7x7_s1"])
```
- Freeze the first few layers:

```python
# freeze layers from input to pool4/3x3_s2 inclusive
model.freeze_up_to(["pool4/3x3_s2"])
```
- Add a few new layers:

```python
from zoo.pipeline.api.keras.layers import *
from zoo.pipeline.api.keras.models import *

inputs = Input(name="input", shape=(3, 224, 224))
inception = model.to_keras()(inputs)
flatten = Flatten()(inception)
logits = Dense(2)(flatten)
newModel = Model(inputs, logits)
```
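The customized model can then be trained like any other, for example with the `NNClassifier` pipeline shown earlier; a minimal sketch, where `train_df`, the transformer, and the hyperparameters are illustrative assumptions:

```python
# fine-tune only the unfrozen layers of the customized model, assuming a
# suitable train_df and image transformer as in the nnframes example
classifier = NNClassifier(newModel, CrossEntropyCriterion(), transformer) \
    .setLearningRate(0.001).setBatchSize(32).setMaxEpoch(2)
tunedModel = classifier.fit(train_df)
```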
Using the POJO model serving API, you can productionize model serving and inference in any Java-based framework (e.g., Spring Framework, Apache Storm, Kafka or Flink, etc.), as illustrated below:
```java
import com.intel.analytics.zoo.pipeline.inference.AbstractInferenceModel;
import com.intel.analytics.zoo.pipeline.inference.JTensor;

// define the serving model by extending AbstractInferenceModel
public class TextClassificationModel extends AbstractInferenceModel {
    public TextClassificationModel() {
        super();
    }
}

// load the model once, then serve predictions on incoming requests
TextClassificationModel model = new TextClassificationModel();
model.load(modelPath, weightPath);
List<JTensor> inputs = preprocess(...);
List<List<JTensor>> result = model.predict(inputs);
...
```
Analytics Zoo provides several built-in deep learning models that you can use for a variety of problem types, such as object detection, image classification, text classification, recommendation, etc.
Using Analytics Zoo Object Detection API (including a set of pretrained detection models such as SSD and Faster-RCNN), you can easily build your object detection applications (e.g., localizing and identifying multiple objects in images and videos), as illustrated below. (See more details here)
- Download object detection models in Analytics Zoo: you can download a collection of detection models (pretrained on the PASCAL VOC and COCO datasets) from the detection model zoo.
- Use Object Detection API for off-the-shelf inference:

```python
from zoo.models.image.objectdetection import *

model = ObjectDetector.load_model(model_path)
image_set = ImageSet.read(img_path, sc)
output = model.predict_image_set(image_set)
```
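To inspect the detections, you can render them back onto the images; a sketch assuming the `Visualizer` helper from the Object Detection API (its exact arguments may differ, so check the docs):

```python
# draw predicted bounding boxes onto the images (Visualizer usage is an
# assumption based on the Object Detection API docs)
visualizer = Visualizer(model.get_config().label_map())
visualized = visualizer(output).get_image(to_chw=False).collect()
```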
Using Analytics Zoo Image Classification API (including a set of pretrained classification models such as VGG, Inception, ResNet, MobileNet, etc.), you can easily build your image classification applications, as illustrated below. (See more details here)
- Download image classification models in Analytics Zoo: you can download a collection of image classification models (pretrained on the ImageNet dataset) from the image classification model zoo.
- Use Image Classification API for off-the-shelf inference:

```python
from zoo.models.image.imageclassification import *

model = ImageClassifier.load_model(model_path)
image_set = ImageSet.read(img_path, sc)
output = model.predict_image_set(image_set)
```
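To retrieve the results on the driver, a minimal sketch (assuming the `get_predict()` accessor described in the Image Classification docs):

```python
# collect (image URI, prediction) pairs to the driver; get_predict() is an
# assumption based on the Image Classification API docs
predicts = output.get_predict().collect()
```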
Analytics Zoo Text Classification API provides a set of pre-defined models (using CNN, LSTM, etc.) for text classification. (See more details here)
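For illustration, constructing one of the pre-defined models might look like this; a minimal sketch, where the `TextClassifier` constructor arguments shown are assumptions (check the Text Classification docs for the exact signature):

```python
from zoo.models.textclassification import TextClassifier

# assumed arguments: 20 target classes, 200-dimensional token embeddings,
# sequences padded/truncated to 500 tokens, CNN encoder
text_classifier = TextClassifier(class_num=20, token_length=200,
                                 sequence_length=500, encoder="cnn")
```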
Analytics Zoo Recommendation API provides a set of pre-defined models (such as Neural Collaborative Filtering, Wide and Deep Learning, etc.) for recommendations. (See more details here)
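Similarly, a Neural Collaborative Filtering model can be instantiated along these lines; a minimal sketch, where the parameter values are illustrative assumptions (see the Recommendation docs):

```python
from zoo.models.recommendation import NeuralCF

# assumed toy sizes: 200 users, 100 items, ratings from 1 to 5
ncf = NeuralCF(user_count=200, item_count=100, class_num=5,
               hidden_layers=[20, 10], include_mf=False)
```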
Analytics Zoo provides a collection of end-to-end reference use cases, including anomaly detection (for time series data), sentiment analysis, fraud detection, image augmentation, object detection, variational autoencoder, etc. (See more details here)