This folder contains pipeline templates and samples for Transformer.
The following templates/samples are currently available:
Name | Description |
---|---|
Clickstream Analysis on Amazon EMR, Amazon Redshift and Elasticsearch | Ingest raw clickstream logs from Amazon S3, perform aggregations, and store the results in Amazon Redshift and Elasticsearch for analysis |
ML - Train NLP Model in PySpark | Train a Spark MLlib Logistic Regression model for Natural Language Processing (NLP) using the PySpark processor |
ML - Train Random Forest Regression Model in Scala | Train a Spark MLlib Random Forest Regression model using the Scala processor |
Slowly Changing Dimension - Type 2 | Slowly Changing Dimension (SCD) - Type 2 |
Spark ETL To Derive Sales Insights on Azure HDInsight And Power BI | Extract raw data and transform it (cleanse and curate) before storing it in multiple destinations for efficient downstream analysis |
Tx Retail Inventory - Join Agg Repartition | Example using Join, Aggregation and Repartition |
Tx Scala UDF | Example using Scala to create, register and use a User-Defined Function |
Tx Slowly Changing Dimensions - Type 1 | Slowly Changing Dimension (SCD) - Type 1 |
For questions or comments about these pipelines, reach out on any of these channels: