Php etl is a ETL that will allow transformation of data in a simple and efficient way.
It comes in 2 components :
The rule engine allows to have configuration based transformations to transform a particular data.
This is integrated inside the ETL with the RuleTransformOperation
.
The rule engine can be used in standalone, see docs
A ETL chain is described by links
, we call those operations.
There are a few operations pre-coded in the library. You can create you own. The entry of a ETL chain is
a standard php \Iterator
.
There are 2 ways of writing a chain, either you code it; or you write describe the chain in a yaml file.
The fallowing 2 exemples does takes the same input and returns the same output. The only thing that changes is the way it's done.
First of all you need to inizialize all the objects. Hopefully I will have a Symfony bundle around to have a easier to use package.
<?php
$ruleApplier = new \Oliverde8\Component\RuleEngine\RuleApplier(
new \Psr\Log\NullLogger(),
[
new \Oliverde8\Component\RuleEngine\Rules\Get(new \Psr\Log\NullLogger()),
new \Oliverde8\Component\RuleEngine\Rules\Implode(new \Psr\Log\NullLogger()),
new \Oliverde8\Component\RuleEngine\Rules\StrToLower(new \Psr\Log\NullLogger()),
new \Oliverde8\Component\RuleEngine\Rules\StrToUpper(new \Psr\Log\NullLogger()),
new \Oliverde8\Component\RuleEngine\Rules\ExpressionLanguage(new \Psr\Log\NullLogger()),
]
);
$builder = new \Oliverde8\Component\PhpEtl\ChainBuilder();
$builder->registerFactory(new RuleTransformFactory('rule-engine-transformer', RuleTransformOperation::class, $ruleApplier));
$builder->registerFactory(new SimpleGroupingFactory('simple-grouping', SimpleGroupingOperation::class));
$builder->registerFactory(new CsvFileWriterFactory('csv-write', FileWriterOperation::class));
You can see find here more information here.
First let's prepare 2 transformations one to prepare for grouping.
transform1.yml
rules:
uid:
- implode:
values:
- 'PREFIX'
- [{get : {field: 'ID_PART1'}}]
- [{get : {field: "ID_PART2"}}]
with: "_"
You can also use the symfony expression language to write a similar rule. (This wont behave the same way if one if the fields are empty)
rules:
uid:
- expression_language:
expression: '"PREFIX_"rowData["ID_PART1"]"_"~rowData["ID_PART2"]'
And now the second one to finalize the data.
**transform2.yml**
```yaml
rules:
uid: # Fetching the uid of the first element!
- get : {field: [0, 'uid']}
label: # Fetching te label from the first if available and if not the second item.
- get : {field: [0, 'label']}
- get : {field: [1, 'label']}
# ...
# ...
<?php
$inputIterator = new Csv(__DIR__ . '/exemples/input.csv');
$operations = [];
// Prepare a `key` so we can properly group the results.
$operations[] = new RuleTransformOperation($ruleApplier, Yaml::parse('transform1.yml'), true);
// Group multiple identical lines.
$operations[] = new SimpleGroupingOperation(['uid']);
// Removing all unessery columns. just keep what wee need and flatten the result after the grouping.
$operations[] = new RuleTransformOperation($ruleApplier, Yaml::parse('transform2.yml'), false);
// Now write the results
$operations[] = new FileWriterOperation(new Writer(__DIR__ . '/exemples/output.csv'));
// We can continue doing other transformations and writing results.
// ....
// Let's process our input file..
$chainProcessor = new ChainProcessor($operations);
$context = [];
$chainProcessor->process($inputIterator, $context);
Let's create a yml file describing the transformation.
transform1:
operation: rule-engine-transformer
options:
add: true
columns:
uid:
- implode:
values:
- 'PREFIX'
- [{get : {field: 'ID_PART1'}}]
- [{get : {field: "ID_PART2"}}]
with: "_"
group:
operation: simple-grouping
options:
grouping-key: [sku]
group-identifier: [locale]
transform2:
operation: rule-engine-transformer
options:
add: false
columns:
uid: # Fetching the uid of the first element!
- get : {field: [0, 'uid']}
label: # Fetching te label from the first if available and if not the second item.
- get : {field: [0, 'label']}
- get : {field: [1, 'label']}
write:
operation: csv-write
options:
file: exemples/output.csv
Now we can run this :
<?php
$builder = new ChainBuilder();
$chainProcessor = $builder->buildChainProcessor(Yaml::parse(file_get_contents(__DIR__ . '/exemples/etl_chain.yml')));
$context = [];
$chainProcessor->process($inputIterator, $context);
As you can see we have simply used the chain builder instead of creating each operation manually.
Before creating your own operations you need to understand how the system works.
A ItemInterface
will travel througth the chain. There are multiple different type of items that will
have different effects.
-
GroupedItem Will be returned by grouping operations, these will be splitted back into individual DataItems automatically.
-
ChainBreakItem can be returned by an operation when it wishes to stop the chain. For exemple an operation could filter unwanted data.
-
DataItem is the most common item, it will trigger the
processData
method of your operation. -
ChainStopItem This item will travel thought the chain, to let now each operation that the chain has stopped. It will trigger the
processStop
method.
If an operation don't have the appropriate method to handle a particular Item, the item will simply skip the Operation.
If you wish for you operation to be usable in with the configuration builder you will also need to create a Factory for it.
-
Why isn't there a database extract ? PHP-Etl is a library, not meant to be used standalone, it is meant to be used inside symfony, or magento projects. Each of these framework/cms's have their own way of handling the database. I do plan to make a Symfony bundle at one point which could have doctrine extractor.
-
Is there a validaiton of the chain configuration ? Yes sir there is. you will get an error like this :
There was an error building the operation 'simple-grouping' :
- "grouping-key" : This field is missing.
- "file" : This field was not expected.
-
ChainBuilder
- Write unit tests.
-
Create a few more generic rules.
- Data formatting
- Numeric formatter
- ...
-
Improve docs.