doc(agg): add an example with custom UDF spec (#586)
shcheklein authored Nov 12, 2024
1 parent b0e3a32 commit 02e8efe
Showing 1 changed file with 21 additions and 1 deletion.
22 changes: 21 additions & 1 deletion src/datachain/lib/dc.py
@@ -895,7 +895,7 @@ def agg(
2. Group-based UDF function input: Instead of individual rows, the function
receives a list of all rows within each group defined by `partition_by`.
-Example:
+Examples:
```py
chain = chain.agg(
    total=lambda category, amount: [sum(amount)],
@@ -904,6 +904,26 @@ def agg(
)
chain.save("new_dataset")
```
+An alternative syntax, when you need to specify a more complex function:
+```py
+# It automatically resolves which columns to pass to the function
+# by looking at the function signature.
+def agg_sum(
+    file: list[File], amount: list[float]
+) -> Iterator[tuple[File, float]]:
+    yield file[0], sum(amount)
+chain = chain.agg(
+    agg_sum,
+    output={"file": File, "total": float},
+    # Alternative syntax is to use `C` (short for Column) to specify
+    # a column name or a nested column, e.g. C("file.path").
+    partition_by=C("category"),
+)
+chain.save("new_dataset")
+```
"""
udf_obj = self._udf_to_obj(Aggregator, func, params, output, signal_map)
return self._evolve(
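
The group-based input described in the docstring can be illustrated without DataChain at all. The following is a minimal plain-Python sketch, not the library's implementation: `agg_by_group` is a hypothetical helper, rows are assumed to be plain dicts, and files are stood in for by strings. It shows the two ideas the docstring mentions: parameters are matched to columns by name via the function signature, and each parameter receives a list with one value per row in the group.

```python
import inspect
from collections import defaultdict
from collections.abc import Callable, Iterable
from typing import Any


def agg_by_group(
    rows: list[dict[str, Any]],
    func: Callable[..., Iterable[Any]],
    partition_by: str,
) -> list[Any]:
    """Hypothetical helper: call `func` once per group with column-wise lists."""
    # Decide which columns to pass by reading the function's parameter names.
    params = list(inspect.signature(func).parameters)

    # Bucket rows by the partition column; dicts keep first-seen group order.
    groups: dict[Any, list[dict[str, Any]]] = defaultdict(list)
    for row in rows:
        groups[row[partition_by]].append(row)

    results: list[Any] = []
    for group_rows in groups.values():
        # Each parameter receives one value per row in the current group.
        kwargs = {name: [r[name] for r in group_rows] for name in params}
        results.extend(func(**kwargs))
    return results


def agg_sum(file: list[str], amount: list[float]):
    # One output row per group: the first file and the group's total.
    yield file[0], sum(amount)


rows = [
    {"file": "a.txt", "category": "x", "amount": 1.0},
    {"file": "b.txt", "category": "y", "amount": 2.0},
    {"file": "c.txt", "category": "x", "amount": 3.0},
]

# Generator-style function, mirroring the second docstring example.
result = agg_by_group(rows, agg_sum, partition_by="category")

# Lambda-style function, mirroring the first docstring example.
totals = agg_by_group(
    rows, lambda category, amount: [sum(amount)], partition_by="category"
)
```

Here `result` holds one `(file, total)` pair per category and `totals` holds one sum per category. The real `agg` additionally takes an `output` schema and returns a new chain, which this sketch does not attempt to model.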