This release consists of 252 commits from 83 contributors. See credits at the end of this changelog for more information.
Breaking changes:
- Encapsulate fields of
EquivalenceProperties
#14040 (alamb) - Encapsulate fields of
EquivalenceGroup
#14039 (alamb) - Chore: refactor DataSink traits to avoid duplication #14121 (mertak-synnada)
- Make
LexOrdering::inner
non pub, add comments, update usages #14155 (alamb) - make AnalysisContext aware of empty sets to represent certainly false bounds #14279 (buraksenn)
- Increase MSRV to 1.81.0 #14330 (alamb)
- minor: remove unused is_sorted method from utils #14370 (buraksenn)
Performance related:
- Implement predicate pruning for
like
expressions (prefix matching) #12978 (adriangb) - Improve speed of
median
by implementing specialGroupsAccumulator
#13681 (Rachelint)
Implemented enhancements:
- feat(substrait): introduce consume_rel and consume_expression #13963 (vbarua)
- feat(substrait): modular substrait producer #13931 (vbarua)
- feat: support
RightAnti
forSortMergeJoin
#13680 (irenjj) - feat(optimizer): Enable filter pushdown on window functions #14026 (nuno-faria)
- feat: add
AsyncCatalogProvider
helpers for asynchronous catalogs #13800 (westonpace) - feat: add support for
LogicalPlan::DML(...)
serde #14079 (milenkovicm) - feat: add
alias()
method for DataFrame #14127 (jonahgao) - feat: Use
SchemaRef
inJoinFilter
#14182 (irenjj)
Fixed bugs:
- fix: Avoid re-wrapping planning errors Err(DataFusionError::Plan) for use in plan_datafusion_err #14000 (avkirilishin)
- fix: Preserve session id when using
ctx.enable_url_table()
#14004 (milenkovicm) - fix: incorrect error message of function_length_check #14056 (niebayes)
- fix: make get_valid_types handle TypeSignature::Numeric correctly #14060 (niebayes)
- fix: incorrect NATURAL/USING JOIN schema #14102 (jonahgao)
- fix: encode should work with non-UTF-8 binaries #14087 (mesejo)
- fix: handle scalar predicates in CASE expressions to prevent internal errors for InfallibleExprOrNull eval method #14156 (avkirilishin)
- fix: fetch is missed in the EnsureSorting #14192 (xudong963)
- fix: add support for Decimal128 and Decimal256 types in interval arithmetic #14126 (waynexia)
- fix: run sqllogictest with complete #14254 (logan-keede)
- fix: LogicalPlan::get_parameter_types fails to return all placeholders #14312 (dhegberg)
- fix: FULL OUTER JOIN and LIMIT produces wrong results #14338 (zhuqi-lucas)
- fix: LimitPushdown rule uncorrect remove some GlobalLimitExec #14245 (zhuqi-lucas)
Documentation updates:
- doc-gen: migrate scalar functions (crypto) documentation #13918 (Chen-Yuan-Lai)
- doc-gen: migrate scalar functions (datetime) documentation 1/2 #13920 (Chen-Yuan-Lai)
- doc-gen: migrate scalar functions (array) documentation 1/3 #13928 (Chen-Yuan-Lai)
- doc-gen: migrate scalar functions (array) documentation 3/3 #13930 (Chen-Yuan-Lai)
- doc-gen: migrate scalar functions (array) documentation 2/3 #13929 (Chen-Yuan-Lai)
- doc-gen: migrate scalar functions (string) documentation 4/4 #13927 (Chen-Yuan-Lai)
- Consolidate example dataframe_subquery.rs into dataframe.rs #13950 (zjregee)
- Support multiple alternative syntaxes in the
user_doc
macro, porttrim
to use new macro #13952 (delamarch3) - doc-gen: migrate scalar functions (datetime) documentation 2/2 #13921 (Chen-Yuan-Lai)
- Add sqlite test files, progress bar, and automatic postgres container management into sqllogictests #13936 (Omega359)
- Supporting writing schema metadata when writing Parquet in parallel #13866 (wiedld)
- docs: Add datafusion python 43.1.0 blog post to events page #13974 (Omega359)
- Include license and notice files in more crates #13985 (ankane)
- Update doc example to remove deprecated DELIMITER option for external tables #14002 (matthewmturner)
- Added references to IDE documentation for dev containers #14014 (Omega359)
- docs(ci): use up-to-date protoc with docs.rs #14048 (wackywendell)
- added "DEFAULT_CLI_FORMAT_OPTIONS" for cli and sqllogic test #14052 (jatin510)
- Add telemetry.sh to list of use cases #14090 (TheBuilderJR)
- minor: Add link to example for
AsyncSchemaProvider
#14062 (alamb) - Minor: Document the rationale for the lack of Cargo.lock #14071 (alamb)
- Chore: Add UDF documentation guide #14171 (xarus01)
- Added job board as a separate header in the documentation #14191 (edmondop)
- Added documentation for the Spaceship Operator (<=>) #14214 (Spaarsh)
- Improve
ScalarUDFImpl
docs #14248 (alamb) - Deprecate max statistics size properly #14188 (logan-keede)
- doc: Add funnel.io to known users list #14316 (jonbjo)
- Support arrays_overlap function (alias of
array_has_any
) #14217 (erenavsarogullari) - chore: Adding commit activity badge #14386 (comphead)
- docs: Clarify join behavior in
DataFrame::join
#14393 (rkrishn7)
Other:
- Support unparsing implicit lateral
UNNEST
plan to SQL text #13824 (goldmedal) - fix case_column_or_null with nullable when conditions #13886 (richox)
- Changed the url for downloading IMDB dataset from benchmark - Fixed Issue #13896 #13903 (Spaarsh)
- Introduce
UserDefinedLogicalNodeUnparser
for User-defined Logical Plan unparsing #13880 (goldmedal) - Preserve constant values across union operations #13805 (gokselk)
- chore(deps): update sqllogictest requirement from 0.23.0 to 0.24.0 #13902 (xudong963)
- fix : use
get_record_batch_memory_size
for calculating RecordBatch memory size in topK #13906 (getChan) - ci improvements, update protoc #13876 (Omega359)
- Introduce LogicalPlan invariants, begin automatically checking them #13651 (wiedld)
- Correct return type for initcap scalar function with utf8view #13909 (timsaucer)
- Consolidate example: simplify_udaf_expression.rs into advanced_udaf.rs #13905 (takaebato)
- Implement maintains_input_order for AggregateExec #13897 (alihan-synnada)
- Make it easier to make optimizers: Move join input swapping and related methods into PhysicalOperators #13910 (alamb)
- doc-gen: migrate scalar functions (string) documentation 3/4 #13926 (Chen-Yuan-Lai)
- Update sqllogictest requirement from 0.24.0 to 0.25.0 #13917 (dependabot[bot])
- Consolidate Examples: memtable.rs and parquet_multiple_files.rs #13913 (alamb)
- Minor : Improve hash join build side recordbatch size accuracy #13916 (getChan)
- doc-gen: migrate scalar functions (math) documentation 1/2 #13922 (Chen-Yuan-Lai)
- doc-gen: migrate scalar functions (math) documentation 2/2 #13923 (Chen-Yuan-Lai)
- Support explain query when running dfbench with clickbench #13942 (zhuqi-lucas)
- Consolidate example to_date.rs into dateframe.rs #13939 (alamb)
- Revert "Update sqllogictest requirement from 0.24.0 to 0.25.0 (#13917)" #13945 (alamb)
- doc-gen: migrate scalar functions (string) documentation 1/4 #13924 (Chen-Yuan-Lai)
- chore: Create devcontainer.json #13520 (rluvaton)
- Minor: consolidate ConfigExtension example into API docs #13954 (alamb)
- Parallelize pruning utf8 fuzz test #13947 (alamb)
- Add swap_inputs to SMJ #13984 (ozankabak)
- fix(datafusion-functions-nested):
arrow-distinct
now work with null rows #13966 (rluvaton) - Update release instructions for 44.0.0 #13959 (alamb)
- Extract postgres container from sqllogictest, update datafusion-testing pin #13971 (Omega359)
- Update rstest requirement from 0.23.0 to 0.24.0 #13977 (dependabot[bot])
- Move hash collision test to run only when merging to main #13973 (Omega359)
- Update itertools requirement from 0.13 to 0.14 #13965 (dependabot[bot])
- Change trigger, rename
hash_collision.yml
toextended.yml
and add comments #13988 (alamb) - doc-gen: migrate scalar functions (string) documentation 2/4 #13925 (Chen-Yuan-Lai)
- Update substrait requirement from 0.50 to 0.51 #13978 (dependabot[bot])
- Update release README for datafusion-cli publishing #13982 (alamb)
- Minor: sort requirement check for
Last
function'smerge_batch
#13980 (jayzhan211) - Improve deserialize_to_struct example #13958 (alamb)
- Optimize CASE expression for "expr or expr" usage. #13953 (aweltsch)
- Consolidate csv_opener.rs and json_opener.rs into a single example (#… #13981 (cj-zhukov)
- FIX : Incorrect NULL handling in BETWEEN expression #14007 (getChan)
- FIX: Out of bounds error when inserting into MemTable with zero partitions #14011 (tobixdev)
- Minor: Rewrite LogicalPlan::max_rows for Join and Union, made it easier to understand #14012 (maruschin)
- Chore: update wasm-supported crates, add tests #14005 (Lordworms)
- Use workspace rust-version for all workspace crates #14009 (Jefffrey)
- [Minor] refactor: make ArraySort public for broader access #14006 (dharanad)
- Update sqllogictest requirement from =0.24.0 to =0.26.0 #14017 (dependabot[bot])
url
dependancy update #14019 (vadimpiven)- Minor: Improve zero partition check when inserting into
MemTable
#14024 (jonahgao) - Minor: make nested functions public and implement Default trait #14030 (dharanad)
- Minor: Remove redundant implementation of
StringArrayType
#14023 (tlm365) - Use partial aggregation schema for spilling to avoid column mismatch in GroupedHashAggregateStream #13995 (kosiew)
- Fix error on
array_distinct
when input is empty #13810 #14034 (cht42) - Update petgraph requirement from 0.6.2 to 0.7.1 #14045 (dependabot[bot])
- Encapsulate fields of
OrderingEquivalenceClass
(make field non pub) #14037 (alamb) - Fix: ensure that compression type is also taken into consideration during ListingTableConfig infer_options #14021 (timvw)
- Unparsing optimized (> 2 inputs) unions #14031 (MohamedAbdeen21)
- Minor: Document output schema of LogicalPlan::Aggregate and LogicalPl… #14047 (alamb)
- Simplify error handling in case.rs (#13990) #14033 (cj-zhukov)
- Custom scalar to sql overrides support for DuckDB Unparser dialect #13915 (sgrebnov)
- Improve perfomance of
reverse
function #14025 (tlm365) - Fix bug in
nth_value
whenignoreNulls
is true and no nulls in values #14042 (cht42) - minor: re-export TypeSignatureClass from the datafusion-expr package #14051 (niebayes)
- Fix clippy for Rust 1.84 #14065 (jonahgao)
- test: Add plan execution during tests for bounded source #14013 (avkirilishin)
- Bump
ctor
to0.2.9
#14069 (mbrobbel) - Refactor into
LexOrdering::collapse
,LexRequirement::collapse
avoid clone #14038 (alamb) - Bump
wasm-bindgen-*
crates #14068 (mbrobbel) - Update ctor dep to latest #14070 (berkaysynnada)
- Minor: Make
group_schema
asPhysicalGroupBy
method #14064 (jayzhan211) - Minor: Move
LimitPushdown
tests to be in the same file as the code #14076 (alamb) - Add comments to physical optimizer tests #14075 (alamb)
- Add H2O.ai Database-like Ops benchmark to dfbench (groupby support) #13996 (zhuqi-lucas)
- chore: deprecate
ValuesExec
in favour ofMemoryExec
#14032 (jonathanc-n) - Improve performance of
find_in_set
function #14020 (tlm365) - Minor: use hashmap for
physical_exprs_contains
and movePhysicalExprRef
tophysical-expr-common
#14081 (jayzhan211) - Simplify the return type of
sql_select_to_rex()
#14088 (jonahgao) - Minor: Add a link to RecordBatchStreamAdapter to
SendableRecordBatchStream
#14084 (alamb) - Update substrait requirement from 0.51 to 0.52 #14107 (mbrobbel)
- fix null date args in range #14093 (cht42)
- Avoid Aliased Window Expr Enter Unreachable Code #14109 (berkaysynnada)
- clarify logic in nth_value window function #14104 (zjregee)
- Move JoinSelection into datafusion-physical-optimizer crate (#14073) #14085 (cj-zhukov)
- Distinguish None and empty projection in unparser #14116 (XiangpengHao)
- Add a hint about normalization in error message (#14089) #14113 (cj-zhukov)
- Add sqlite sqllogictest run to extended.yml #14101 (Omega359)
- Minor: fix duplicated SharedBitmapBuilder definitions #14122 (lewiszlw)
- doc-gen: make user_doc to work with predefined consts #14086 (ding-young)
- Return err if wildcard is not expanded before type coercion #14130 (xudong963)
- Fix combine with session config #14139 (XiangpengHao)
- Minor: move resolve_overlap a method on
OrderingEquivalenceClass
#14138 (alamb) - doc-gen: migrate scalar functions (encoding & regex) documentation #13919 (Chen-Yuan-Lai)
- bugfix: create view with multi union may get wrong schema #14133 (Curricane)
- Deduplicate function
get_final_indices_from_shared_bitmap
#14145 (lewiszlw) - Add tests for PR #14133 (view with multi unions) #14152 (Curricane)
- Update datafusion-testing git hash #14137 (Omega359)
- Reuse
on
expressions values in HashJoinExec #14131 (lewiszlw) - chore: move
SanityChecker
intophysical-optimizer
crate #14083 (mnpw) - Propagate table constraints through physical plans to optimize sort operations #14111 (gokselk)
- NestedLoopJoin Projection Pushdown #14120 (jayzhan-synnada)
- Fix: regularize order bys when consuming from substrait #14125 (gabotechs)
- Remove dependency on physical-optimizer on functions-aggregates #14134 (alamb)
- doc-gen: migrate scalar functions (other, conditional, and struct) documentation #14163 (Chen-Yuan-Lai)
- chore: fix flaky tests #14170 (xudong963)
- Upgrade arrow-rs, parquet to
54.0.0
and pyo3 to0.23.3
#14153 (Owen-CH-Leung) - Minor: Simplify Bloom Filter Check #14165 (alamb)
- Fix doctests in ScalarValue (#14164) #14178 (cj-zhukov)
- Add
ScalarValue::try_as_str
to get str value from logical strings #14167 (alamb) - Minor: Consolidate dataframe tests into core_integration #14169 (alamb)
- Add a hint about expected extension in error message in register_csv,… #14168 (cj-zhukov)
- Test: Validate memory limit for sort queries to extended test #14142 (2010YOUY01)
- refactor: switch BooleanBufferBuilder to NullBufferBuilder in sort function #14183 (Chen-Yuan-Lai)
- refactor: switch BooleanBufferBuilder to NullBufferBuilder in correlation function #14181 (Chen-Yuan-Lai)
- Minor: Rename extended test job name #14199 (alamb)
- Support spaceship operator (
<=>
) support (alias forIS NOT DISTINCT FROM
#14187 (Spaarsh) - Add benchmark for planning sorted unions #14157 (alamb)
- refactor: switch BooleanBufferBuilder to NullBufferBuilder in functions-nested functions #14201 (Chen-Yuan-Lai)
- Improve
case
expr constant handling for when<scalar>
#14159 (alamb) - Minor add ticket references to deprecated code #14174 (alamb)
- Minor: fix hash join doc #14206 (lewiszlw)
- Introduce
return_type_from_args
for ScalarFunction. #14094 (jayzhan211) - Interface for physical plan invariant checking. #13986 (wiedld)
- fix concat ws simplify #14213 (cht42)
- test: interval analysis unit tests #14189 (hiltontj)
- Move EnforceSorting into datafusion-physical-optimizer crate #14219 (buraksenn)
- Merge SortMergeJoin filtered batches into larger batches #14160 (comphead)
- Move
EnforceDistribution
intodatafusion-physical-optimizer
crate #14190 (logan-keede) - Make scalar and array handling for array_has consistent #13683 (Kimahriman)
- Faster reverse() string function for ASCII-only case #14195 (UBarney)
- Update datafusion-testing pin #14242 (alamb)
- Add publishing datafusion-sqllogictest to release README.md #14233 (alamb)
- Minor: use the minimum fetch #14221 (xudong963)
- Expose more components from sqllogictest #14249 (xudong963)
- consolidation of examples: date_time_functions #14240 (logan-keede)
- Made imdb download (data_imdb) function atomic #14225 (Spaarsh)
- chore: deprecate function with typo in name (is_expr_constant_accross_partitions) #14252 (andygrove)
- Implemented
simplify
for thestarts_with
function to convert it into a LIKE expression. #14119 (jatin510) - fix(expr-common): Coerce to Decimal(20, 0) when combining UInt64 with signed integers #14223 (nuno-faria)
- Minor: Add tests for types of arithmetic operator output types #14250 (alamb)
- chore(deps): bump clap from 4.5.26 to 4.5.27 in /datafusion-cli #14264 (dependabot[bot])
- chore(deps): bump rstest from 0.22.0 to 0.24.0 in /datafusion-cli #14262 (dependabot[bot])
- consolidate physical_optimizer tests into core/tests/physical_optimizer #14244 (alamb)
- chore(deps): bump dirs from 5.0.1 to 6.0.0 in /datafusion-cli #14266 (dependabot[bot])
- implement display for ColumnarValue #14220 (zjregee)
- chore(deps): bump indexmap from 2.7.0 to 2.7.1 in /datafusion-cli #14260 (dependabot[bot])
- chore(deps): bump serde_json from 1.0.135 to 1.0.137 in /datafusion-cli #14261 (dependabot[bot])
- Add casting of
count
toInt64
inarray_repeat
function to ensure consistent integer type handling #14236 (jatin510) - Minor: Update documentations for memory pool #14278 (appletreeisyellow)
- Extract useful methods from sqllogictest bin #14267 (xudong963)
- move projection pushdown optimization logic to ExecutionPlan trait #14235 (buraksenn)
- Move All Physical Optimizer Tests to core/tests and Remove functions-aggregate Dependency #14298 (berkaysynnada)
- Add more tests showing coercing behavior with literals #14270 (alamb)
- bench: add array_agg benchmark #14302 (rluvaton)
- Complete moving PhysicalOptimizer into
datafusion-physical-optimizer
#14300 (berkaysynnada) - Minor: Update documentation about crate organization #14304 (alamb)
- Export
datafusion_physical_optimizer
, only usedatafusion
crate in the examples #14305 (alamb) - add tests to check precision loss fix #14284 (himadripal)
- Expose sqllogictest Error &&
convert_schema_to_types
#14313 (xudong963) - refactor: switch
BooleanBufferBuilder
toNullBufferBuilder
in unit tests of multi_group_by #14325 (Chen-Yuan-Lai) - minor: add unit tests for monotonicity.rs #14307 (buraksenn)
- Add relation to alias expr in schema display #14311 (phisn)
- Improve deprecation message for MemoryExec #14322 (alamb)
- Customize window frame support for dialect #14288 (Sevenannn)
- refactor: switch
BooleanBufferBuilder
toNullBufferBuilder
in an unit test of common_scalar #14339 (Chen-Yuan-Lai) - Add
ColumnStatistics::Sum
#14074 (gatesn) - refactor: switch
BooleanBufferBuilder
toNullBufferBuilder
in unit tests for unnest #14321 (Chen-Yuan-Lai) - Fix build "missing field
sum_value
in initializer ofColumnStatistics
" #14345 (alamb) - chore(deps): bump serde_json from 1.0.137 to 1.0.138 in /datafusion-cli #14351 (dependabot[bot])
- chore(deps): bump tempfile from 3.15.0 to 3.16.0 in /datafusion-cli #14350 (dependabot[bot])
- Update version in
datafusion-cli/Dockerfile
to 1.81 #14344 (alamb) - perf(array-agg): add fast path for array agg for
merge_batch
#14299 (rluvaton) - moving memory.rs out of datafusion/core #14332 (logan-keede)
- refactor: switch
BooleanBufferBuilder
toNullBufferBuilder
in binary_map #14341 (Chen-Yuan-Lai) - Restore ability to run single SLT file #14355 (findepi)
- chore(deps): bump home from 0.5.9 to 0.5.11 in /datafusion-cli #14257 (dependabot[bot])
- chore(deps): bump aws-sdk-ssooidc from 1.51.0 to 1.57.1 in /datafusion-cli #14314 (dependabot[bot])
- refactor: switch
BooleanBufferBuilder
toNullBufferBuilder
in single_group_by #14360 (Chen-Yuan-Lai) - chore(deps): bump rustyline from 14.0.0 to 15.0.0 in /datafusion-cli #14265 (dependabot[bot])
- chore(deps): bump aws-sdk-sts from 1.51.0 to 1.57.0 in /datafusion-cli #14263 (dependabot[bot])
- chore(deps): bump aws-sdk-sso from 1.50.0 to 1.56.0 in /datafusion-cli #14259 (dependabot[bot])
- chore(deps): bump korandoru/hawkeye from 5 to 6 #14354 (dependabot[bot])
- chore(deps): bump aws-sdk-ssooidc from 1.57.1 to 1.58.0 in /datafusion-cli #14369 (dependabot[bot])
- Minor: include the number of files run in sqllogictest display #14359 (alamb)
- move information_schema to datafusion-catalog #14364 (logan-keede)
- Unpin aws sdk dependencies in
datafusion-cli
#14361 (alamb) - Core: Fix incorrect searched CASE optimization #14349 (findepi)
- Feature: AggregateMonotonicity #14271 (mertak-synnada)
- Core: Fix UNION field nullability tracking #14356 (findepi)
- Revert
is_sorted
removal, deprecate instead #14388 (alamb) - FFI support for versions and alternate tokio runtimes #13937 (timsaucer)
- Do not rename struct fields when coercing types in
CASE
#14384 (alamb) - Add
TableProvider::insert_into
into FFI Bindings #14391 (davisp)
Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.
46 Andrew Lamb
22 Ian Lai
20 dependabot[bot]
8 Bruce Ritchie
8 xudong.w
6 Sergey Zhukov
6 logan-keede
5 Burak Şen
5 Jay Zhan
5 Jonah Gao
4 Berkay Şahin
4 Qi Zhu
4 Raz Luvaton
4 Spaarsh
4 cht42
4 张林伟
3 Aleksey Kirilishin
3 Jagdish Parihar
3 Matthijs Brobbel
3 Namgung Chan
3 Piotr Findeisen
3 Tai Le Manh
3 niebayes
3 wiedld
3 zjregee
2 Dharan Aditya
2 Goksel Kabadayi
2 Jax Liu
2 Marko Milenković
2 Oleks V
2 Tim Saucer
2 Victor Barua
2 Will Golioto
2 Xiangpeng Hao
2 irenjj
2 mertak-synnada
2 nuno-faria
1 Adam Binford
1 Adrian Garcia Badaracco
1 Alihan Çelikcan
1 Andre Weltsch
1 Andrew Kane
1 Andy Grove
1 Chunchun Ye
1 Daniel Hegberg
1 Daniel Mesejo
1 Edmondo Porcu
1 Eren Avsarogullari
1 Eugene Marushchenko
1 Gabriel
1 Himadri Pal
1 Jack Park
1 Jeffrey Vo
1 Jonas Björk
1 Jonathan Chen
1 Lordworms
1 Matthew Turner
1 Mehmet Ozan Kabak
1 Mohamed Abdeen
1 Mrinal Paliwal
1 Nicholas Gates
1 Owen Leung
1 Paul J. Davis
1 Qianqian
1 Rohan Krishnaswamy
1 Ruihang Xia
1 Sergei Grebnov
1 Takahiro Ebato
1 TheBuilderJR
1 Tim Van Wassenhove
1 Tobias Schwarzinger
1 Trevor Hilton
1 UBarney
1 Vadim Piven
1 Wendell Smith
1 Weston Pace
1 Yongting You
1 Zhang Li
1 delamarch3
1 ding-young
1 kamille
1 kosiew
1 phisn
Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release.