Releases · rapidsai/cudf

09 Aug 17:06

raydouglass

v23.08.00

8150d38

🚨 Breaking Changes

Enforce deprecations and add clarifications around existing deprecations (#13710) @galipremsagar
Separate MurmurHash32 from hash_functions.cuh (#13681) @davidwendt
Avoid storing metadata in pointers in ORC and Parquet writers (#13648) @vuule
Expose streams in all public copying APIs (#13629) @vyasr
Remove deprecated cudf::strings::slice_strings (by delimiter) functions (#13628) @davidwendt
Remove deprecated cudf.set_allocator. (#13591) @bdice
Change build.sh to use pip install instead of setup.py (#13507) @vyasr
Remove unused max_rows_tensor parameter from subword tokenizer (#13463) @davidwendt
Fix decimal scale reductions in _get_decimal_type (#13224) @charlesbluca

🐛 Bug Fixes

Add CUDA version to cudf_kafka and libcudf-example build strings. (#13769) @bdice
Fix typo in wheels-test.yaml. (#13763) @bdice
Don't test strings shorter than the requested ngram size (#13758) @vyasr
Add CUDA version to custreamz build string. (#13754) @bdice
Fix writing of ORC files with empty child string columns (#13745) @vuule
Remove the erroneous "empty level" short-circuit from ORC reader (#13722) @vuule
Fix character counting when writing sliced tables into ORC (#13721) @vuule
Parquet uses row group row count if missing from header (#13712) @hyperbolic2346
Fix reading of RLE encoded boolean data from parquet files with V2 page headers (#13707) @etseidl
Fix a corner case of list lexicographic comparator (#13701) @ttnghia
Fix combined filtering and column projection in dask_cudf.read_parquet (#13697) @rjzamora
Revert fetch-rapids changes (#13696) @vyasr
Data generator - include offsets in the size estimate of list elements (#13688) @vuule
Add cuda-nvcc-impl to cudf for numba CUDA 12 (#13673) @jakirkham
Fix combined filtering and column projection in read_parquet (#13666) @rjzamora
Use thrust::identity as hash functions for byte pair encoding (#13665) @PointKernel
Fix loc-getitem ordering when index contains duplicate labels (#13659) @wence-
[REVIEW] Introduce parity with pandas for MultiIndex.loc ordering & fix a bug in Groupby with as_index (#13657) @galipremsagar
Fix memcheck error found in nvtext tokenize functions (#13649) @davidwendt
Fix has_nonempty_nulls ignoring column offset (#13647) @ttnghia
[Java] Avoid double-free corruption in case of an Exception while creating a ColumnView (#13645) @razajafri
Fix memcheck error in ORC reader call to cudf::io::copy_uncompressed_kernel (#13643) @davidwendt
Fix CUDA 12 conda environment to remove cubinlinker and ptxcompiler. (#13636) @bdice
Fix inf/NaN comparisons for FLOAT orderby in window functions (#13635) @mythrocks
Refactor Index search to simplify code and increase correctness (#13625) @wence-
Fix compile warning for unused variable in split_re.cu (#13621) @davidwendt
Fix tz_localize for dask_cudf Series (#13610) @shwina
Fix issue with no decompressed data in ORC reader (#13609) @vuule
Fix floating point window range extents. (#13606) @mythrocks
Fix localize(None) for timezone-naive columns (#13603) @shwina
Fixed a memory leak caused by Exception thrown while constructing a ColumnView (#13597) @razajafri
Handle nullptr return value from bitmask_or in distinct_count (#13590) @wence-
Bring parity with pandas in Index.join (#13589) @galipremsagar
Fix cudf.melt when there are more than 255 columns (#13588) @hcho3
Fix memory issues in cuIO due to removal of memory padding (#13586) @ttnghia
Fix Parquet multi-file reading (#13584) @etseidl
Fix memcheck error found in LISTS_TEST (#13579) @davidwendt
Fix memcheck error found in STRINGS_TEST (#13578) @davidwendt
Fix memcheck error found in INTEROP_TEST (#13577) @davidwendt
Fix memcheck errors found in REDUCTION_TEST (#13574) @davidwendt
Preemptive fix for hive-partitioning change in dask (#13564) @rjzamora
Fix an issue with dask_cudf.read_csv when lines need to be skipped (#13555) @galipremsagar
Fix out-of-bounds memory write in cudf::dictionary::detail::concatenate (#13554) @davidwendt
Fix the null mask size in json reader (#13537) @karthikeyann
Fix cudf::strings::strip for all-empty input column (#13533) @davidwendt
Make sure to build without isolation or installing dependencies (#13524) @vyasr
Remove preload lib from CMake for now (#13519) @vyasr
Fix missing separator after null values in JSON writer (#13503) @karthikeyann
Ensure single_lane_block_sum_reduce is safe to call in a loop (#13488) @wence-
Update all versions in pyproject.toml files. (#13486) @bdice
Remove applying nvbench that doesn't exist in 23.08 (#13484) @robertmaynard
Fix chunked Parquet reader benchmark (#13482) @vuule
Update JNI JSON reader column compatibility for Spark (#13477) @revans2
Fix unsanitized output of scan with strings (#13455) @davidwendt
Reject functions without bytecode from _can_be_jitted in GroupBy Apply (#13429) @brandon-b-miller
Fix decimal scale reductions in _get_decimal_type (#13224) @charlesbluca

📖 Documentation

Fix doxygen groups for io data sources and sinks (#13718) @davidwendt
Add pandas compatibility note to DataFrame.query docstring (#13693) @beckernick
Add pylibcudf to developer guide (#13639) @vyasr
Fix repeated words in doxygen text (#13598) @karthikeyann
Update docs for top-level API. (#13592) @bdice
Fix the the doxygen text for cudf::concatenate and other places (#13561) @davidwendt
Document stream validation approach used in testing (#13556) @vyasr
Cleanup doc repetitions in libcudf (#13470) @karthikeyann

🚀 New Features

Support min and max aggregations for list type in groupby and reduction (#13676) @ttnghia
Add nvtext::jaccard_index API for strings columns (#13669) @davidwendt
Add read_parquet_metadata libcudf API (#13663) @karthikeyann
Expose streams in all public copying APIs (#13629) @vyasr
Add XXHash_64 hash function to cudf (#13612) @davidwendt
Java support: Floating point order-by columns for RANGE window functions (#13595) @mythrocks
Use cuco::static_map to build string dictionaries in ORC writer (#13580) @vuule
Add pylibcudf subpackage with gather implementation (#13562) @vyasr
Add JNI for lists::concatenate_list_elements (#13547) @ttnghia
Enable nested types for lists::concatenate_list_elements (#13545) @ttnghia
Add unicode encoding for string columns in JSON writer (#13539) @karthikeyann
Remove numba kernels from find_index_of_val (#13517) @brandon-b-miller
Floating point order-by columns for RANGE window functions (#13512) @mythrocks
Parse column chunk metadata statistics in parquet reader (#13472) @karthikeyann
Add abs function to apply (#13408) @brandon-b-miller
[FEA] AST filtering in parquet reader (#13348) @karthikeyann
[FEA] Adds option to recover from invalid JSON lines in JSON tokenizer (#13344) @elstehle
Ensure cccl packages don't clash with upstream version (#13235) @robertmaynard
Update struct_minmax_util to experimental row comparator (#13069) @divyegala
Add stream parameter to hashing APIs (#12090) @vyasr
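
For readers unfamiliar with the metric behind the nvtext::jaccard_index entry above, here is a minimal pure-Python sketch of a Jaccard index over character n-grams. It is not the libcudf implementation; the function names, the `width` default, and the choice to return 0.0 when neither string yields an n-gram are all assumptions made for illustration.

```python
def char_ngrams(text: str, width: int) -> set:
    """Return the set of distinct character n-grams of the given width."""
    # Strings shorter than `width` produce an empty range, hence an empty set.
    return {text[i:i + width] for i in range(len(text) - width + 1)}


def jaccard_index(a: str, b: str, width: int = 5) -> float:
    """Jaccard index |A & B| / |A | B| over the two n-gram sets."""
    grams_a = char_ngrams(a, width)
    grams_b = char_ngrams(b, width)
    union = grams_a | grams_b
    if not union:
        # Assumption for this sketch: no n-grams on either side scores 0.0.
        return 0.0
    return len(grams_a & grams_b) / len(union)


if __name__ == "__main__":
    print(jaccard_index("abcd", "bcde", width=2))
```

Identical strings score 1.0 and strings with no shared n-grams score 0.0; a GPU implementation would compute the same ratio row by row across two strings columns.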

🛠️ Improvements

Pin dask and distributed for 23.08 release (#13802) @galipremsagar
Relax protobuf pinnings. (#13770) @bdice
Switch fully unbounded window functions to use aggregations (#13727) @mythrocks
Switch to new wheel building pipeline (#13723) @vyasr
Revert CUDA 12.0 CI workflows to branch-23.08. (#13719) @bdice
Adding identify minimum version requirement (#13713) @hyperbolic2346
Enforce deprecations and add clarifications around existing deprecations (#13710) @galipremsagar
Optimize ORC reader performance for list data (#13708) @vyasr
Fix limit overflow message in a docstring (#13703) @ahmet-uyar
Alleviates JSON parser's need for multi-file sources to end with a newline (#13702) @elstehle
Update cython-lint and replace flake8 with ruff (#13699) @vyasr
Add __dask_tokenize__ definitions to cudf classes (#13695) @rjzamora
Convert libcudf hashing benchmarks to nvbench (#13694) @davidwendt
Separate MurmurHash32 from hash_functions.cuh (#13681) @davidwendt
Improve performance of cudf::strings::split on whitespace (#13680) @davidwendt
Allow ORC and Parquet writers to write nullable columns without nulls as non-nullable (#13675) @vuule
Raise a NotImplementedError in to_datetime when utc is passed (#13670) @shwina
Add rmm_mode parameter to nvbench base fixture (#13668) @davidwendt
Fix multiindex loc ordering in pandas-compat mode (#13660) @wence-
Add nvtext hash_character_ngrams function (#13654) @davidwendt
Avoid storing metadata in pointers in ORC and Parquet writers (#13648) @vuule
Acquire spill lock in to/from_arrow (#13646) @shwina
Expose stable versions of libcudf sort routines (#13634) @wence-
Separate out hash_test.cpp source for each hash API (#13633) @davidwendt
Remove deprecated cudf::strings::slice_strings (by delimiter) functions (#13628) @davidwendt
Create separate libcudf hash APIs for each supported hash function (#13626) @davidwendt
Add convert_dtypes API (#13623) @shwina
Clean up cupy in dependencies.yaml. (#13617) @bdice
Use cuda-version to constrain cudatoolkit. (#13615) @bdice
Add murmurhash3_x64_128 function to libcudf (#13604) @davidwendt
Performance improvement for cudf::strings::like (#13594) @davidwendt
Remove deprecated cudf.set_allocator. (#13591) @bdice
Clean up cudf device atomic with cuda::atomic_ref (#13583) @PointKernel
Add java bindings for distinct count (#13573) @revans2
Use nvcomp conda package. (#13566) @bdice
Add exception to string_scalar if input string exceeds size_type (#13560) @davidwendt
Add dispatch for cudf.Dataframe to/from pyarrow.Table conversion (#13558) @rjzamora
Get rid of cuco::pair_type aliases (#13553) @PointKernel
Introduce parity with pandas when sort=False in Groupby (#13551) @galipremsagar
Update CMake in docker to 3.26.4 (#13550) @NvTimLi...

Contributors

robertmaynard, madsbk, and 30 other contributors

29 Jun 13:28

raydouglass

v23.06.01

6a548b0

🚨 Breaking Changes

Fix batch processing for parquet writer (#13438) @ttnghia
Use <NA> instead of null to match pandas. (#13415) @bdice
Remove UNKNOWN_NULL_COUNT (#13372) @vyasr
Remove default UNKNOWN_NULL_COUNT from cudf::column member functions (#13341) @davidwendt
Use std::overflow_error when output would exceed column size limit (#13323) @davidwendt
Remove null mask and null count from column_view constructors (#13311) @vyasr
Change default value of the observed= argument in groupby to True to reflect the actual behaviour (#13296) @shwina
Throw error if UNINITIALIZED is passed to cudf::state_null_count (#13292) @davidwendt
Remove default null-count parameter from cudf::make_strings_column factory (#13227) @davidwendt
Remove UNKNOWN_NULL_COUNT where it can be easily computed (#13205) @vyasr
Update minimum Python version to Python 3.9 (#13196) @shwina
Refactor contiguous_split API into contiguous_split.hpp (#13186) @abellina
Cleanup Parquet chunked writer (#13094) @ttnghia
Cleanup ORC chunked writer (#13091) @ttnghia
Raise NotImplementedError when attempting to construct cuDF objects from timezone-aware datetimes (#13086) @shwina
Remove deprecated regex functions from libcudf (#13067) @davidwendt
[REVIEW] Upgrade to arrow-11 (#12757) @galipremsagar
Implement Python drop_duplicates with cudf::stable_distinct. (#11656) @brandon-b-miller

🐛 Bug Fixes

Fix valid count computation in offset_bitmask_binop kernel (#13489) @davidwendt
Fix writing of ORC files with empty rowgroups (#13466) @vuule
Fix cudf::repeat logic when count is zero (#13459) @davidwendt
Fix batch processing for parquet writer (#13438) @ttnghia
Fix invalid use of std::exclusive_scan in Parquet writer (#13434) @etseidl
Patch numba if it is imported first to ensure minor version compatibility works. (#13433) @bdice
Fix cudf::strings::replace_with_backrefs hang on empty match result (#13418) @davidwendt
Use <NA> instead of null to match pandas. (#13415) @bdice
Fix tokenize with non-space delimiter (#13403) @shwina
Fix groupby head/tail for empty dataframe (#13398) @shwina
Default to closed="right" in IntervalIndex constructor (#13394) @shwina
Correctly reorder and reindex scan groupbys with null keys (#13389) @wence-
Fix unused argument errors in nvcc 11.5 (#13387) @abellina
Updates needed to work with jitify that leverages libcudacxx (#13383) @robertmaynard
Fix unused parameter warning/error in parquet/page_data.cu (#13367) @davidwendt
Fix page size estimation in Parquet writer (#13364) @etseidl
Fix subword_tokenize error when input contains no tokens (#13320) @davidwendt
Support gcc 12 as the C++ compiler (#13316) @robertmaynard
Correctly set bitmask size in from_column_view (#13315) @wence-
Fix approach to detecting assignment for gte/lte operators (#13285) @vyasr
Fix parquet schema interpretation issue (#13277) @hyperbolic2346
Fix 64bit shift bug in avro reader (#13276) @karthikeyann
Fix unused variables/parameters in parquet/writer_impl.cu (#13263) @davidwendt
Clean up buffers in case AssertionError (#13262) @razajafri
Allow empty input table in ast compute_column (#13245) @wence-
Fix structs_column_wrapper constructors to copy input column wrappers (#13243) @davidwendt
Fix the row index stream order in ORC reader (#13242) @vuule
Make is_decompression_disabled and is_compression_disabled thread-safe (#13240) @vuule
Add [[maybe_unused]] to nvbench environment. (#13219) @bdice
Fix race in ORC string dictionary creation (#13214) @revans2
Add scalar argtypes to udf cache keys (#13194) @brandon-b-miller
Fix unused parameter warning/error in grouped_rolling.cu (#13192) @davidwendt
Avoid skbuild 0.17.2 which affected the cmake -DPython_LIBRARY string (#13188) @sevagh
Fix hostdevice_vector::subspan (#13187) @ttnghia
Use custom nvbench entry point to ensure cudf::nvbench_base_fixture usage (#13183) @robertmaynard
Fix slice_strings to return empty strings for stop < start indices (#13178) @davidwendt
Allow compilation with any GTest version 1.11+ (#13153) @robertmaynard
Fix a few clang-format style check errors (#13146) @davidwendt
[REVIEW] Fix Series and DataFrame constructors to validate index lengths (#13122) @galipremsagar
Fix hash join when the input tables have nulls on only one side (#13120) @ttnghia
Fix GPU_ARCHS setting in Java CMake build and CMAKE_CUDA_ARCHITECTURES in Python package build. (#13117) @davidwendt
Adds checks to make sure json reader won't overflow (#13115) @elstehle
Fix null_count of columns returned by chunked_parquet_reader (#13111) @vuule
Fixes sliced list and struct column bug in JSON chunked writer (#13108) @karthikeyann
[REVIEW] Fix missing confluent kafka version (#13101) @galipremsagar
Use make_empty_lists_column instead of make_empty_column(type_id::LIST) (#13099) @davidwendt
Raise NotImplementedError when attempting to construct cuDF objects from timezone-aware datetimes (#13086) @shwina
Fix column selection read_parquet benchmarks (#13082) @vuule
Fix bugs in iterative groupby apply algorithm (#13078) @brandon-b-miller
Add algorithm include in data_sink.hpp (#13068) @ahendriksen
Fix tests/identify_stream_usage.cpp (#13066) @ahendriksen
Prevent overflow with skip_rows in ORC and Parquet readers (#13063) @vuule
Add except declaration in Cython interface for regex_program::create (#13054) @davidwendt
[REVIEW] Fix branch version in CI scripts (#13029) @galipremsagar
Fix OOB memory access in CSV reader when reading without NA values (#13011) @vuule
Fix read_avro() skip_rows and num_rows. (#12912) @tpn
Purge nonempty nulls from byte_cast list outputs. (#11971) @bdice
Fix consumption of CPU-backed interchange protocol dataframes (#11392) @shwina

🚀 New Features

Remove numba JIT kernel usage from dataframe copy tests (#13385) @brandon-b-miller
Add JNI for ORC/Parquet writer compression statistics (#13376) @ttnghia
Use _compile_or_get in JIT groupby apply (#13350) @brandon-b-miller
cuDF numba cuda 12 updates (#13337) @brandon-b-miller
Add tz_convert method to convert between timestamps (#13328) @shwina
Optionally return compression statistics from ORC and Parquet writers (#13294) @vuule
Support the case=False argument to str.contains (#13290) @shwina
Add an event handler for ColumnVector.close (#13279) @abellina
JNI api for cudf::chunked_pack (#13278) @abellina
Implement a chunked_pack API (#13260) @abellina
Update cudf recipes to use GTest version >=1.13 (#13207) @robertmaynard
JNI changes for range-extents in window functions. (#13199) @mythrocks
Add support for DatetimeTZDtype and tz_localize (#13163) @shwina
Add IS_NULL operator to AST (#13145) @karthikeyann
STRING order-by column for RANGE window functions (#13143) @mythrocks
Update contains_table to experimental row hasher and equality comparator (#13119) @divyegala
Automatically select GroupBy.apply algorithm based on if the UDF is jittable (#13113) @brandon-b-miller
Refactor Parquet chunked writer (#13076) @ttnghia
Add Python bindings for string literal support in AST (#13073) @karthikeyann
Add Java bindings for string literal support in AST (#13072) @karthikeyann
Add string scalar support in AST (#13061) @karthikeyann
Log cuIO warnings using the libcudf logger (#13043) @vuule
Update mixed_join to use experimental row hasher and comparator (#13028) @divyegala
Support structs of lists in row lexicographic comparator (#13005) @ttnghia
Adding hostdevice_span that is a span creatable from hostdevice_vector (#12981) @hyperbolic2346
Add nvtext::minhash function (#12961) @davidwendt
Support lists of structs in row lexicographic comparator (#12953) @ttnghia
Update join to use experimental row hasher and comparator (#12787) @divyegala
Implement Python drop_duplicates with cudf::stable_distinct. (#11656) @brandon-b-miller
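
The drop_duplicates entry above relies on a "stable" distinct, meaning the surviving rows keep their original input order. A minimal pure-Python sketch of that idea follows, assuming keep-first semantics only; the function name mirrors the libcudf API for readability but this is an illustration, not the library's code.

```python
from typing import Hashable, Iterable, List


def stable_distinct(rows: Iterable[Hashable]) -> List[Hashable]:
    """Drop repeated rows, keeping the first occurrence in input order."""
    seen = set()
    result = []
    for row in rows:
        if row not in seen:
            # First time this row appears: remember it and keep it.
            seen.add(row)
            result.append(row)
    return result


if __name__ == "__main__":
    print(stable_distinct([3, 1, 3, 2, 1]))
```

An unordered distinct may return the same rows in any order, which is why order-sensitive operations such as drop_duplicates need the stable variant.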

🛠️ Improvements

Bump typing_extensions minimum version to 4.0.0 (#13618) @shwina
Drop extraneous dependencies from cudf conda recipe. (#13406) @bdice
Handle some corner-cases in indexing with boolean masks (#13402) @wence-
Add cudf::stable_distinct public API, tests, and benchmarks. (#13392) @bdice
[JNI] Pass this ColumnVector to the onClosed event handler (#13386) @abellina
Fix JNI method with mismatched parameter list (#13384) @ttnghia
Split up experimental_row_operator_tests.cu to improve its compile time (#13382) @davidwendt
Deprecate cudf::strings::slice_strings APIs that accept delimiters (#13373) @davidwendt
Remove UNKNOWN_NULL_COUNT (#13372) @vyasr
Move some nvtext benchmarks to nvbench (#13368) @davidwendt
Run docs nightly too (#13366) @AyodeAwe
Add warning for default dtype parameter in get_dummies (#13365) @galipremsagar
Add log messages about kvikIO compatibility mode (#13363) @vuule
Switch back to using primary shared-action-workflows branch (#13362) @vyasr
Deprecate StringIndex and use Index instead (#13361) @galipremsagar
Ensure columns have valid null counts in CUDF JNI. (#13355) @mythrocks
Expunge most uses of TypeVar(bound="Foo") (#13346) @wence-
Remove all references to UNKNOWN_NULL_COUNT in Python (#13345) @vyasr
Improve distinct_count with cuco::static_set (#13343) @PointKernel
Fix contiguous_split performance (#13342) @ttnghia
Remove default UNKNOWN_NULL_COUNT from cudf::column member functions (#13341) @davidwendt
Update mypy to 1.3 (#13340) @wence-
[Java] Purge non-empty nulls when setting validity (#13335) @razajafri
Add row-wise filtering step to read_parquet (#13334) @rjzamora
Performance improvement for nvtext::minhash (#13333) @davidwendt
Fix some libcudf functions to set the null count on returning columns (#13331) @davidwendt
Change cudf::detail::concatenate_masks to return null-count (#13330) @davidwendt
Move meta calculation in `dask_cu...

Contributors

robertmaynard, gmarkall, and 28 other contributors

07 Jun 15:25

raydouglass

v23.06.00

f881d40

🚨 Breaking Changes

Fix batch processing for parquet writer (#13438) @ttnghia
Use <NA> instead of null to match pandas. (#13415) @bdice
Remove UNKNOWN_NULL_COUNT (#13372) @vyasr
Remove default UNKNOWN_NULL_COUNT from cudf::column member functions (#13341) @davidwendt
Use std::overflow_error when output would exceed column size limit (#13323) @davidwendt
Remove null mask and null count from column_view constructors (#13311) @vyasr
Change default value of the observed= argument in groupby to True to reflect the actual behaviour (#13296) @shwina
Throw error if UNINITIALIZED is passed to cudf::state_null_count (#13292) @davidwendt
Remove default null-count parameter from cudf::make_strings_column factory (#13227) @davidwendt
Remove UNKNOWN_NULL_COUNT where it can be easily computed (#13205) @vyasr
Update minimum Python version to Python 3.9 (#13196) @shwina
Refactor contiguous_split API into contiguous_split.hpp (#13186) @abellina
Cleanup Parquet chunked writer (#13094) @ttnghia
Cleanup ORC chunked writer (#13091) @ttnghia
Raise NotImplementedError when attempting to construct cuDF objects from timezone-aware datetimes (#13086) @shwina
Remove deprecated regex functions from libcudf (#13067) @davidwendt
[REVIEW] Upgrade to arrow-11 (#12757) @galipremsagar
Implement Python drop_duplicates with cudf::stable_distinct. (#11656) @brandon-b-miller

🐛 Bug Fixes

Fix valid count computation in offset_bitmask_binop kernel (#13489) @davidwendt
Fix writing of ORC files with empty rowgroups (#13466) @vuule
Fix cudf::repeat logic when count is zero (#13459) @davidwendt
Fix batch processing for parquet writer (#13438) @ttnghia
Fix invalid use of std::exclusive_scan in Parquet writer (#13434) @etseidl
Patch numba if it is imported first to ensure minor version compatibility works. (#13433) @bdice
Fix cudf::strings::replace_with_backrefs hang on empty match result (#13418) @davidwendt
Use <NA> instead of null to match pandas. (#13415) @bdice
Fix tokenize with non-space delimiter (#13403) @shwina
Fix groupby head/tail for empty dataframe (#13398) @shwina
Default to closed="right" in IntervalIndex constructor (#13394) @shwina
Correctly reorder and reindex scan groupbys with null keys (#13389) @wence-
Fix unused argument errors in nvcc 11.5 (#13387) @abellina
Updates needed to work with jitify that leverages libcudacxx (#13383) @robertmaynard
Fix unused parameter warning/error in parquet/page_data.cu (#13367) @davidwendt
Fix page size estimation in Parquet writer (#13364) @etseidl
Fix subword_tokenize error when input contains no tokens (#13320) @davidwendt
Support gcc 12 as the C++ compiler (#13316) @robertmaynard
Correctly set bitmask size in from_column_view (#13315) @wence-
Fix approach to detecting assignment for gte/lte operators (#13285) @vyasr
Fix parquet schema interpretation issue (#13277) @hyperbolic2346
Fix 64bit shift bug in avro reader (#13276) @karthikeyann
Fix unused variables/parameters in parquet/writer_impl.cu (#13263) @davidwendt
Clean up buffers in case AssertionError (#13262) @razajafri
Allow empty input table in ast compute_column (#13245) @wence-
Fix structs_column_wrapper constructors to copy input column wrappers (#13243) @davidwendt
Fix the row index stream order in ORC reader (#13242) @vuule
Make is_decompression_disabled and is_compression_disabled thread-safe (#13240) @vuule
Add [[maybe_unused]] to nvbench environment. (#13219) @bdice
Fix race in ORC string dictionary creation (#13214) @revans2
Add scalar argtypes to udf cache keys (#13194) @brandon-b-miller
Fix unused parameter warning/error in grouped_rolling.cu (#13192) @davidwendt
Avoid skbuild 0.17.2 which affected the cmake -DPython_LIBRARY string (#13188) @sevagh
Fix hostdevice_vector::subspan (#13187) @ttnghia
Use custom nvbench entry point to ensure cudf::nvbench_base_fixture usage (#13183) @robertmaynard
Fix slice_strings to return empty strings for stop < start indices (#13178) @davidwendt
Allow compilation with any GTest version 1.11+ (#13153) @robertmaynard
Fix a few clang-format style check errors (#13146) @davidwendt
[REVIEW] Fix Series and DataFrame constructors to validate index lengths (#13122) @galipremsagar
Fix hash join when the input tables have nulls on only one side (#13120) @ttnghia
Fix GPU_ARCHS setting in Java CMake build and CMAKE_CUDA_ARCHITECTURES in Python package build. (#13117) @davidwendt
Adds checks to make sure json reader won't overflow (#13115) @elstehle
Fix null_count of columns returned by chunked_parquet_reader (#13111) @vuule
Fixes sliced list and struct column bug in JSON chunked writer (#13108) @karthikeyann
[REVIEW] Fix missing confluent kafka version (#13101) @galipremsagar
Use make_empty_lists_column instead of make_empty_column(type_id::LIST) (#13099) @davidwendt
Raise NotImplementedError when attempting to construct cuDF objects from timezone-aware datetimes (#13086) @shwina
Fix column selection read_parquet benchmarks (#13082) @vuule
Fix bugs in iterative groupby apply algorithm (#13078) @brandon-b-miller
Add algorithm include in data_sink.hpp (#13068) @ahendriksen
Fix tests/identify_stream_usage.cpp (#13066) @ahendriksen
Prevent overflow with skip_rows in ORC and Parquet readers (#13063) @vuule
Add except declaration in Cython interface for regex_program::create (#13054) @davidwendt
[REVIEW] Fix branch version in CI scripts (#13029) @galipremsagar
Fix OOB memory access in CSV reader when reading without NA values (#13011) @vuule
Fix read_avro() skip_rows and num_rows. (#12912) @tpn
Purge nonempty nulls from byte_cast list outputs. (#11971) @bdice
Fix consumption of CPU-backed interchange protocol dataframes (#11392) @shwina

🚀 New Features

Remove numba JIT kernel usage from dataframe copy tests (#13385) @brandon-b-miller
Add JNI for ORC/Parquet writer compression statistics (#13376) @ttnghia
Use _compile_or_get in JIT groupby apply (#13350) @brandon-b-miller
cuDF numba cuda 12 updates (#13337) @brandon-b-miller
Add tz_convert method to convert between timestamps (#13328) @shwina
Optionally return compression statistics from ORC and Parquet writers (#13294) @vuule
Support the case=False argument to str.contains (#13290) @shwina
Add an event handler for ColumnVector.close (#13279) @abellina
JNI api for cudf::chunked_pack (#13278) @abellina
Implement a chunked_pack API (#13260) @abellina
Update cudf recipes to use GTest version >=1.13 (#13207) @robertmaynard
JNI changes for range-extents in window functions. (#13199) @mythrocks
Add support for DatetimeTZDtype and tz_localize (#13163) @shwina
Add IS_NULL operator to AST (#13145) @karthikeyann
STRING order-by column for RANGE window functions (#13143) @mythrocks
Update contains_table to experimental row hasher and equality comparator (#13119) @divyegala
Automatically select GroupBy.apply algorithm based on if the UDF is jittable (#13113) @brandon-b-miller
Refactor Parquet chunked writer (#13076) @ttnghia
Add Python bindings for string literal support in AST (#13073) @karthikeyann
Add Java bindings for string literal support in AST (#13072) @karthikeyann
Add string scalar support in AST (#13061) @karthikeyann
Log cuIO warnings using the libcudf logger (#13043) @vuule
Update mixed_join to use experimental row hasher and comparator (#13028) @divyegala
Support structs of lists in row lexicographic comparator (#13005) @ttnghia
Adding hostdevice_span that is a span creatable from hostdevice_vector (#12981) @hyperbolic2346
Add nvtext::minhash function (#12961) @davidwendt
Support lists of structs in row lexicographic comparator (#12953) @ttnghia
Update join to use experimental row hasher and comparator (#12787) @divyegala
Implement Python drop_duplicates with cudf::stable_distinct. (#11656) @brandon-b-miller

🛠️ Improvements

Drop extraneous dependencies from cudf conda recipe. (#13406) @bdice
Handle some corner-cases in indexing with boolean masks (#13402) @wence-
Add cudf::stable_distinct public API, tests, and benchmarks. (#13392) @bdice
[JNI] Pass this ColumnVector to the onClosed event handler (#13386) @abellina
Fix JNI method with mismatched parameter list (#13384) @ttnghia
Split up experimental_row_operator_tests.cu to improve its compile time (#13382) @davidwendt
Deprecate cudf::strings::slice_strings APIs that accept delimiters (#13373) @davidwendt
Remove UNKNOWN_NULL_COUNT (#13372) @vyasr
Move some nvtext benchmarks to nvbench (#13368) @davidwendt
Run docs nightly too (#13366) @AyodeAwe
Add warning for default dtype parameter in get_dummies (#13365) @galipremsagar
Add log messages about kvikIO compatibility mode (#13363) @vuule
Switch back to using primary shared-action-workflows branch (#13362) @vyasr
Deprecate StringIndex and use Index instead (#13361) @galipremsagar
Ensure columns have valid null counts in CUDF JNI. (#13355) @mythrocks
Expunge most uses of TypeVar(bound="Foo") (#13346) @wence-
Remove all references to UNKNOWN_NULL_COUNT in Python (#13345) @vyasr
Improve distinct_count with cuco::static_set (#13343) @PointKernel
Fix contiguous_split performance (#13342) @ttnghia
Remove default UNKNOWN_NULL_COUNT from cudf::column member functions (#13341) @davidwendt
Update mypy to 1.3 (#13340) @wence-
[Java] Purge non-empty nulls when setting validity (#13335) @razajafri
Add row-wise filtering step to read_parquet (#13334) @rjzamora
Performance improvement for nvtext::minhash (#13333) @davidwendt
Fix some libcudf functions to set the null count on returning columns (#13331) @davidwendt
Change cudf::detail::concatenate_masks to return null-count (#13330) @davidwendt
Move meta calculation in dask_cudf.read_parquet (#13327) @rjzamora
Changes to support Numpy >...

Contributors

robertmaynard, gmarkall, and 28 other contributors

07 Nov 21:43

raydouglass

v23.04.01

7e070fc

🚨 Breaking Changes

Pin dask and distributed for release (#13070) @galipremsagar
Declare a different name for nan_equality.UNEQUAL to prevent Cython warnings. (#12947) @bdice
Update minimum pandas and numpy pinnings (#12887) @galipremsagar
Deprecate names & dtype in Index.copy (#12825) @galipremsagar
Deprecate Index.is_* methods (#12820) @galipremsagar
Deprecate datetime_is_numeric from describe (#12818) @galipremsagar
Deprecate na_sentinel in factorize (#12817) @galipremsagar
Make string methods return a Series with a useful Index (#12814) @shwina
Produce useful guidance on overflow error in to_csv (#12705) @wence-
Move strings_udf code into cuDF (#12669) @brandon-b-miller
Remove cudf::strings::repeat_strings_output_sizes and optional parameter from cudf::strings::repeat_strings (#12609) @davidwendt
Replace message parsing with throwing more specific exceptions (#12426) @vyasr

🐛 Bug Fixes

Pin curand version (#13127) @vyasr
Fix memcheck script to execute only _TEST files found in bin/gtests/libcudf (#13006) @davidwendt
Fix DataFrame constructor to broadcast scalar inputs properly (#12997) @galipremsagar
Drop force_nullable_schema from chunked parquet writer (#12996) @galipremsagar
Fix gtest column utility comparator diff reporting (#12995) @davidwendt
Handle index names while performing groupby (#12992) @galipremsagar
Fix __setitem__ on string columns when the scalar value ends in a null byte (#12991) @wence-
Fix sort_values when column is all empty strings (#12988) @eriknw
Remove unused variable and fix memory issue in ORC writer (#12984) @ttnghia
Pre-emptive fix for upstream dask.dataframe.read_parquet changes (#12983) @rjzamora
Remove MANIFEST.in use auto-generated one for sdists and package_data for wheels (#12960) @vyasr
Update to use rapids-export(COMPONENTS) feature. (#12959) @robertmaynard
cudftestutil supports static gtest dependencies (#12957) @robertmaynard
Include gtest in build environment. (#12956) @vyasr
Correctly handle scalar indices in Index.__getitem__ (#12955) @wence-
Avoid building cython twice (#12945) @galipremsagar
Fix set index error for Series rolling window operations (#12942) @galipremsagar
Fix calculation of null counts for Parquet statistics (#12938) @etseidl
Preserve integer dtype of hive-partitioned column containing nulls (#12930) @rjzamora
Use get_current_device_resource for intermediate allocations in COLLECT_LIST window code (#12927) @karthikeyann
Mark dlpack tensor deleter as noexcept to match PyCapsule_Destructor signature. (#12921) @bdice
Fix conda recipe post-link.sh typo (#12916) @pentschev
min_rows and num_rows are swapped in ComputePageSizes declaration in Parquet reader (#12886) @etseidl
Expect cupy to now support bool arrays for dlpack. (#12883) @vyasr
Use python -m pytest for nightly wheel tests (#12871) @bdice
Parquet writer column_size() should return a size_t (#12870) @etseidl
Fix cudf::hash_partition kernel launch error with decimal128 types (#12863) @davidwendt
Fix an issue with parquet chunked reader undercounting string lengths. (#12859) @nvdbaranec
Remove tokenizers pre-install pinning. (#12854) @vyasr
Fix parquet RangeIndex bug (#12838) @rjzamora
Remove KAFKA_HOST_TEST from compute-sanitizer check (#12831) @davidwendt
Make string methods return a Series with a useful Index (#12814) @shwina
Tell cudf_kafka to use header-only fmt (#12796) @vyasr
Add GroupBy.dtypes (#12783) @galipremsagar
Fix a leak in a test and clarify some test names (#12781) @revans2
Fix bug in all-null list due to join_list_elements special handling (#12767) @karthikeyann
Add try/except for expected null-schema error in read_parquet (#12756) @rjzamora
Throw an exception if an unsupported page encoding is detected in Parquet reader (#12754) @etseidl
Fix a bug with num_keys in _scatter_by_slice (#12749) @thomcom
Bump pinned rapids wheel deps to 23.4 (#12735) @sevagh
Rework logic in cudf::strings::split_record to improve performance (#12729) @davidwendt
Add always_nullable flag to Dremel encoding (#12727) @divyegala
Fix memcheck read error in compound segmented reduce (#12722) @davidwendt
Fix faulty conditional logic in JIT GroupBy.apply (#12706) @brandon-b-miller
Produce useful guidance on overflow error in to_csv (#12705) @wence-
Handle parquet list data corner case (#12698) @nvdbaranec
Fix missing trailing comma in json writer (#12688) @karthikeyann
Remove child from newCudaAsyncMemoryResource (#12681) @abellina
Handle bool types in round API (#12670) @galipremsagar
Ensure all of device bitmask is initialized in from_arrow (#12668) @wence-
Fix from_arrow to load a sliced arrow table (#12665) @galipremsagar
Fix dask-cudf read_parquet bug for multi-file aggregation (#12663) @rjzamora
Fix AllocateLikeTest gtests reading uninitialized null-mask (#12643) @davidwendt
Fix find_common_dtype and values to handle complex dtypes (#12537) @galipremsagar
Fix fetching of MultiIndex values when a label is passed (#12521) @galipremsagar
Fix Series comparison vs scalars (#12519) @brandon-b-miller
Allow casting from UDFString back to StringView to call methods in strings_udf (#12363) @brandon-b-miller

📖 Documentation

Fix GroupBy.apply doc examples rendering (#12994) @brandon-b-miller
add sphinx building and s3 uploading for dask-cudf docs (#12982) @quasiben
Add developer documentation forbidding default parameters in detail APIs (#12978) @vyasr
Add README symlink for dask-cudf. (#12946) @bdice
Remove return type from @return doxygen tags (#12908) @davidwendt
Fix docs build to be pydata-sphinx-theme=0.13.0 compatible (#12874) @galipremsagar
Add skeleton API and prose documentation for dask-cudf (#12725) @wence-
Enable doctests for GroupBy methods (#12658) @brandon-b-miller
Add comment about CUB patch for SegmentedSortInt.Bool gtest (#12611) @davidwendt

🚀 New Features

Add JNI method for strings::replace multi variety (#12979) @NVnavkumar
Add nunique aggregation support for cudf::segmented_reduce (#12972) @davidwendt
Refactor orc chunked writer (#12949) @ttnghia
Make Parquet writer nullable option applicable to single table writes (#12933) @vuule
Refactor io::orc::ProtobufWriter (#12877) @ttnghia
Make timezone table independent from ORC (#12805) @vuule
Cache JIT GroupBy.apply functions (#12802) @brandon-b-miller
Implement initial support for avro logical types (#6482) (#12788) @tpn
Update tests/column_utilities to use experimental::equality row comparator (#12777) @divyegala
Update distinct/unique_count to experimental::row hasher/comparator (#12776) @divyegala
Update hash_partition to use experimental::row::row_hasher (#12761) @divyegala
Update is_sorted to use experimental::row::lexicographic (#12752) @divyegala
Update default data source in cuio reader benchmarks (#12740) @PointKernel
Reenable stream identification library in CI (#12714) @vyasr
Add regex_program strings splitting java APIs and tests (#12713) @cindyyuanjiang
Add regex_program strings replacing java APIs and tests (#12701) @cindyyuanjiang
Add regex_program strings extract java APIs and tests (#12699) @cindyyuanjiang
Variable fragment sizes for Parquet writer (#12685) @etseidl
Add segmented reduction support for fixed-point types (#12680) @davidwendt
Move strings_udf code into cuDF (#12669) @brandon-b-miller
Add regex_program searching APIs and related java classes (#12666) @cindyyuanjiang
Add logging to libcudf (#12637) @vuule
Add compound aggregations to cudf::segmented_reduce (#12573) @davidwendt
Convert rank to use experimental row comparators (#12481) @divyegala
Use rapids-cmake parallel testing feature (#12451) @robertmaynard
Enable detection of undesired stream usage (#12089) @vyasr

🛠️ Improvements

Pin dask and distributed for release (#13070) @galipremsagar
Pin cupy in wheel tests to supported versions (#13041) @vyasr
Pin numba version (#13001) @vyasr
Rework gtests SequenceTest to remove using namespace cudf (#12985) @davidwendt
Stop setting package version attribute in wheels (#12977) @vyasr
Move detail reduction functions to cudf::reduction::detail namespace (#12971) @davidwendt
Remove default detail mrs: part7 (#12970) @vyasr
Remove default detail mrs: part6 (#12969) @vyasr
Remove default detail mrs: part5 (#12968) @vyasr
Remove default detail mrs: part4 (#12967) @vyasr
Remove default detail mrs: part3 (#12966) @vyasr
Remove default detail mrs: part2 (#12965) @vyasr
Remove default detail mrs: part1 (#12964) @vyasr
Add force_nullable_schema parameter to Parquet writer. (#12952) @galipremsagar
Declare a different name for nan_equality.UNEQUAL to prevent Cython warnings. (#12947) @bdice
Remove remaining default stream parameters (#12943) @vyasr
Fix cudf::segmented_reduce gtest for ANY aggregation (#12940) @davidwendt
Implement groupby.head and groupby.tail (#12939) @wence-
Fix libcudf gtests to pass null-count=0 for empty validity masks (#12923) @davidwendt
Migrate parquet encoding to use experimental row operators (#12918) @PointKernel
Fix benchmarks coded in namespace cudf and using namespace cudf (#12915) @karthikeyann
Fix io/text gtests coded in namespace cudf::test (#12914) @karthikeyann
Pass SCCACHE_S3_USE_SSL to conda builds (#12910) @ajschmidt8
Fix FST, JSON gtests & benchmarks coded in namespace cudf::test (#12907) @karthikeyann
Generate pyproject dependencies using dfg (#12906) @vyasr
Update libcudf counting functions to specify cudf::size_type (#12904) @davidwendt
Fix moto env vars & pass AWS_SESSION_TOKEN to conda builds (#12902) @ajschmidt8
Rewrite CSV wri...

Contributors

robertmaynard, thomcom, and 38 other contributors

12 Apr 14:26

raydouglass

v23.04.00

4d31a6f

v23.04.00

🚨 Breaking Changes

Pin dask and distributed for release (#13070) @galipremsagar
Declare a different name for nan_equality.UNEQUAL to prevent Cython warnings. (#12947) @bdice
Update minimum pandas and numpy pinnings (#12887) @galipremsagar
Deprecate names & dtype in Index.copy (#12825) @galipremsagar
Deprecate Index.is_* methods (#12820) @galipremsagar
Deprecate datetime_is_numeric from describe (#12818) @galipremsagar
Deprecate na_sentinel in factorize (#12817) @galipremsagar
Make string methods return a Series with a useful Index (#12814) @shwina
Produce useful guidance on overflow error in to_csv (#12705) @wence-
Move strings_udf code into cuDF (#12669) @brandon-b-miller
Remove cudf::strings::repeat_strings_output_sizes and optional parameter from cudf::strings::repeat_strings (#12609) @davidwendt
Replace message parsing with throwing more specific exceptions (#12426) @vyasr

🐛 Bug Fixes

Fix memcheck script to execute only _TEST files found in bin/gtests/libcudf (#13006) @davidwendt
Fix DataFrame constructor to broadcast scalar inputs properly (#12997) @galipremsagar
Drop force_nullable_schema from chunked parquet writer (#12996) @galipremsagar
Fix gtest column utility comparator diff reporting (#12995) @davidwendt
Handle index names while performing groupby (#12992) @galipremsagar
Fix __setitem__ on string columns when the scalar value ends in a null byte (#12991) @wence-
Fix sort_values when column is all empty strings (#12988) @eriknw
Remove unused variable and fix memory issue in ORC writer (#12984) @ttnghia
Pre-emptive fix for upstream dask.dataframe.read_parquet changes (#12983) @rjzamora
Remove MANIFEST.in; use auto-generated one for sdists and package_data for wheels (#12960) @vyasr
Update to use rapids-export(COMPONENTS) feature. (#12959) @robertmaynard
cudftestutil supports static gtest dependencies (#12957) @robertmaynard
Include gtest in build environment. (#12956) @vyasr
Correctly handle scalar indices in Index.__getitem__ (#12955) @wence-
Avoid building cython twice (#12945) @galipremsagar
Fix set index error for Series rolling window operations (#12942) @galipremsagar
Fix calculation of null counts for Parquet statistics (#12938) @etseidl
Preserve integer dtype of hive-partitioned column containing nulls (#12930) @rjzamora
Use get_current_device_resource for intermediate allocations in COLLECT_LIST window code (#12927) @karthikeyann
Mark dlpack tensor deleter as noexcept to match PyCapsule_Destructor signature. (#12921) @bdice
Fix conda recipe post-link.sh typo (#12916) @pentschev
min_rows and num_rows are swapped in ComputePageSizes declaration in Parquet reader (#12886) @etseidl
Expect cupy to now support bool arrays for dlpack. (#12883) @vyasr
Use python -m pytest for nightly wheel tests (#12871) @bdice
Parquet writer column_size() should return a size_t (#12870) @etseidl
Fix cudf::hash_partition kernel launch error with decimal128 types (#12863) @davidwendt
Fix an issue with parquet chunked reader undercounting string lengths. (#12859) @nvdbaranec
Remove tokenizers pre-install pinning. (#12854) @vyasr
Fix parquet RangeIndex bug (#12838) @rjzamora
Remove KAFKA_HOST_TEST from compute-sanitizer check (#12831) @davidwendt
Make string methods return a Series with a useful Index (#12814) @shwina
Tell cudf_kafka to use header-only fmt (#12796) @vyasr
Add GroupBy.dtypes (#12783) @galipremsagar
Fix a leak in a test and clarify some test names (#12781) @revans2
Fix bug in all-null list due to join_list_elements special handling (#12767) @karthikeyann
Add try/except for expected null-schema error in read_parquet (#12756) @rjzamora
Throw an exception if an unsupported page encoding is detected in Parquet reader (#12754) @etseidl
Fix a bug with num_keys in _scatter_by_slice (#12749) @thomcom
Bump pinned rapids wheel deps to 23.4 (#12735) @sevagh
Rework logic in cudf::strings::split_record to improve performance (#12729) @davidwendt
Add always_nullable flag to Dremel encoding (#12727) @divyegala
Fix memcheck read error in compound segmented reduce (#12722) @davidwendt
Fix faulty conditional logic in JIT GroupBy.apply (#12706) @brandon-b-miller
Produce useful guidance on overflow error in to_csv (#12705) @wence-
Handle parquet list data corner case (#12698) @nvdbaranec
Fix missing trailing comma in json writer (#12688) @karthikeyann
Remove child from newCudaAsyncMemoryResource (#12681) @abellina
Handle bool types in round API (#12670) @galipremsagar
Ensure all of device bitmask is initialized in from_arrow (#12668) @wence-
Fix from_arrow to load a sliced arrow table (#12665) @galipremsagar
Fix dask-cudf read_parquet bug for multi-file aggregation (#12663) @rjzamora
Fix AllocateLikeTest gtests reading uninitialized null-mask (#12643) @davidwendt
Fix find_common_dtype and values to handle complex dtypes (#12537) @galipremsagar
Fix fetching of MultiIndex values when a label is passed (#12521) @galipremsagar
Fix Series comparison vs scalars (#12519) @brandon-b-miller
Allow casting from UDFString back to StringView to call methods in strings_udf (#12363) @brandon-b-miller

📖 Documentation

Fix GroupBy.apply doc examples rendering (#12994) @brandon-b-miller
add sphinx building and s3 uploading for dask-cudf docs (#12982) @quasiben
Add developer documentation forbidding default parameters in detail APIs (#12978) @vyasr
Add README symlink for dask-cudf. (#12946) @bdice
Remove return type from @return doxygen tags (#12908) @davidwendt
Fix docs build to be pydata-sphinx-theme=0.13.0 compatible (#12874) @galipremsagar
Add skeleton API and prose documentation for dask-cudf (#12725) @wence-
Enable doctests for GroupBy methods (#12658) @brandon-b-miller
Add comment about CUB patch for SegmentedSortInt.Bool gtest (#12611) @davidwendt

🚀 New Features

Add JNI method for strings::replace multi variety (#12979) @NVnavkumar
Add nunique aggregation support for cudf::segmented_reduce (#12972) @davidwendt
Refactor orc chunked writer (#12949) @ttnghia
Make Parquet writer nullable option applicable to single table writes (#12933) @vuule
Refactor io::orc::ProtobufWriter (#12877) @ttnghia
Make timezone table independent from ORC (#12805) @vuule
Cache JIT GroupBy.apply functions (#12802) @brandon-b-miller
Implement initial support for avro logical types (#6482) (#12788) @tpn
Update tests/column_utilities to use experimental::equality row comparator (#12777) @divyegala
Update distinct/unique_count to experimental::row hasher/comparator (#12776) @divyegala
Update hash_partition to use experimental::row::row_hasher (#12761) @divyegala
Update is_sorted to use experimental::row::lexicographic (#12752) @divyegala
Update default data source in cuio reader benchmarks (#12740) @PointKernel
Reenable stream identification library in CI (#12714) @vyasr
Add regex_program strings splitting java APIs and tests (#12713) @cindyyuanjiang
Add regex_program strings replacing java APIs and tests (#12701) @cindyyuanjiang
Add regex_program strings extract java APIs and tests (#12699) @cindyyuanjiang
Variable fragment sizes for Parquet writer (#12685) @etseidl
Add segmented reduction support for fixed-point types (#12680) @davidwendt
Move strings_udf code into cuDF (#12669) @brandon-b-miller
Add regex_program searching APIs and related java classes (#12666) @cindyyuanjiang
Add logging to libcudf (#12637) @vuule
Add compound aggregations to cudf::segmented_reduce (#12573) @davidwendt
Convert rank to use experimental row comparators (#12481) @divyegala
Use rapids-cmake parallel testing feature (#12451) @robertmaynard
Enable detection of undesired stream usage (#12089) @vyasr

🛠️ Improvements

Pin dask and distributed for release (#13070) @galipremsagar
Pin cupy in wheel tests to supported versions (#13041) @vyasr
Pin numba version (#13001) @vyasr
Rework gtests SequenceTest to remove using namespace cudf (#12985) @davidwendt
Stop setting package version attribute in wheels (#12977) @vyasr
Move detail reduction functions to cudf::reduction::detail namespace (#12971) @davidwendt
Remove default detail mrs: part7 (#12970) @vyasr
Remove default detail mrs: part6 (#12969) @vyasr
Remove default detail mrs: part5 (#12968) @vyasr
Remove default detail mrs: part4 (#12967) @vyasr
Remove default detail mrs: part3 (#12966) @vyasr
Remove default detail mrs: part2 (#12965) @vyasr
Remove default detail mrs: part1 (#12964) @vyasr
Add force_nullable_schema parameter to Parquet writer. (#12952) @galipremsagar
Declare a different name for nan_equality.UNEQUAL to prevent Cython warnings. (#12947) @bdice
Remove remaining default stream parameters (#12943) @vyasr
Fix cudf::segmented_reduce gtest for ANY aggregation (#12940) @davidwendt
Implement groupby.head and groupby.tail (#12939) @wence-
Fix libcudf gtests to pass null-count=0 for empty validity masks (#12923) @davidwendt
Migrate parquet encoding to use experimental row operators (#12918) @PointKernel
Fix benchmarks coded in namespace cudf and using namespace cudf (#12915) @karthikeyann
Fix io/text gtests coded in namespace cudf::test (#12914) @karthikeyann
Pass SCCACHE_S3_USE_SSL to conda builds (#12910) @ajschmidt8
Fix FST, JSON gtests & benchmarks coded in namespace cudf::test (#12907) @karthikeyann
Generate pyproject dependencies using dfg (#12906) @vyasr
Update libcudf counting functions to specify cudf::size_type (#12904) @davidwendt
Fix moto env vars & pass AWS_SESSION_TOKEN to conda builds (#12902) @ajschmidt8
Rewrite CSV writer benchmark with nvbench (#12901) @PointKernel
Rework some code logic to reduce iterator and comparator inlining to improve compile time (#12900) @davidwendt
Deprecate `line_te...

Contributors

robertmaynard, thomcom, and 38 other contributors

29 Jun 13:28

rapids-bot

v23.06.00a

302054d

[NIGHTLY] v23.06.00 Pre-release

Pre-release

🚨 Breaking Changes

Fix batch processing for parquet writer (#13438) @ttnghia
Use <NA> instead of null to match pandas. (#13415) @bdice
Remove UNKNOWN_NULL_COUNT (#13372) @vyasr
Remove default UNKNOWN_NULL_COUNT from cudf::column member functions (#13341) @davidwendt
Use std::overflow_error when output would exceed column size limit (#13323) @davidwendt
Remove null mask and null count from column_view constructors (#13311) @vyasr
Change default value of the observed= argument in groupby to True to reflect the actual behaviour (#13296) @shwina
Throw error if UNINITIALIZED is passed to cudf::state_null_count (#13292) @davidwendt
Remove default null-count parameter from cudf::make_strings_column factory (#13227) @davidwendt
Remove UNKNOWN_NULL_COUNT where it can be easily computed (#13205) @vyasr
Update minimum Python version to Python 3.9 (#13196) @shwina
Refactor contiguous_split API into contiguous_split.hpp (#13186) @abellina
Cleanup Parquet chunked writer (#13094) @ttnghia
Cleanup ORC chunked writer (#13091) @ttnghia
Raise NotImplementedError when attempting to construct cuDF objects from timezone-aware datetimes (#13086) @shwina
Remove deprecated regex functions from libcudf (#13067) @davidwendt
[REVIEW] Upgrade to arrow-11 (#12757) @galipremsagar
Implement Python drop_duplicates with cudf::stable_distinct. (#11656) @brandon-b-miller

🐛 Bug Fixes

Fix valid count computation in offset_bitmask_binop kernel (#13489) @davidwendt
Fix writing of ORC files with empty rowgroups (#13466) @vuule
Fix cudf::repeat logic when count is zero (#13459) @davidwendt
Fix batch processing for parquet writer (#13438) @ttnghia
Fix invalid use of std::exclusive_scan in Parquet writer (#13434) @etseidl
Patch numba if it is imported first to ensure minor version compatibility works. (#13433) @bdice
Fix cudf::strings::replace_with_backrefs hang on empty match result (#13418) @davidwendt
Use <NA> instead of null to match pandas. (#13415) @bdice
Fix tokenize with non-space delimiter (#13403) @shwina
Fix groupby head/tail for empty dataframe (#13398) @shwina
Default to closed="right" in IntervalIndex constructor (#13394) @shwina
Correctly reorder and reindex scan groupbys with null keys (#13389) @wence-
Fix unused argument errors in nvcc 11.5 (#13387) @abellina
Updates needed to work with jitify that leverages libcudacxx (#13383) @robertmaynard
Fix unused parameter warning/error in parquet/page_data.cu (#13367) @davidwendt
Fix page size estimation in Parquet writer (#13364) @etseidl
Fix subword_tokenize error when input contains no tokens (#13320) @davidwendt
Support gcc 12 as the C++ compiler (#13316) @robertmaynard
Correctly set bitmask size in from_column_view (#13315) @wence-
Fix approach to detecting assignment for gte/lte operators (#13285) @vyasr
Fix parquet schema interpretation issue (#13277) @hyperbolic2346
Fix 64bit shift bug in avro reader (#13276) @karthikeyann
Fix unused variables/parameters in parquet/writer_impl.cu (#13263) @davidwendt
Clean up buffers in case AssertionError (#13262) @razajafri
Allow empty input table in ast compute_column (#13245) @wence-
Fix structs_column_wrapper constructors to copy input column wrappers (#13243) @davidwendt
Fix the row index stream order in ORC reader (#13242) @vuule
Make is_decompression_disabled and is_compression_disabled thread-safe (#13240) @vuule
Add [[maybe_unused]] to nvbench environment. (#13219) @bdice
Fix race in ORC string dictionary creation (#13214) @revans2
Add scalar argtypes to udf cache keys (#13194) @brandon-b-miller
Fix unused parameter warning/error in grouped_rolling.cu (#13192) @davidwendt
Avoid skbuild 0.17.2 which affected the cmake -DPython_LIBRARY string (#13188) @sevagh
Fix hostdevice_vector::subspan (#13187) @ttnghia
Use custom nvbench entry point to ensure cudf::nvbench_base_fixture usage (#13183) @robertmaynard
Fix slice_strings to return empty strings for stop < start indices (#13178) @davidwendt
Allow compilation with any GTest version 1.11+ (#13153) @robertmaynard
Fix a few clang-format style check errors (#13146) @davidwendt
[REVIEW] Fix Series and DataFrame constructors to validate index lengths (#13122) @galipremsagar
Fix hash join when the input tables have nulls on only one side (#13120) @ttnghia
Fix GPU_ARCHS setting in Java CMake build and CMAKE_CUDA_ARCHITECTURES in Python package build. (#13117) @davidwendt
Adds checks to make sure json reader won't overflow (#13115) @elstehle
Fix null_count of columns returned by chunked_parquet_reader (#13111) @vuule
Fixes sliced list and struct column bug in JSON chunked writer (#13108) @karthikeyann
[REVIEW] Fix missing confluent kafka version (#13101) @galipremsagar
Use make_empty_lists_column instead of make_empty_column(type_id::LIST) (#13099) @davidwendt
Raise NotImplementedError when attempting to construct cuDF objects from timezone-aware datetimes (#13086) @shwina
Fix column selection read_parquet benchmarks (#13082) @vuule
Fix bugs in iterative groupby apply algorithm (#13078) @brandon-b-miller
Add algorithm include in data_sink.hpp (#13068) @ahendriksen
Fix tests/identify_stream_usage.cpp (#13066) @ahendriksen
Prevent overflow with skip_rows in ORC and Parquet readers (#13063) @vuule
Add except declaration in Cython interface for regex_program::create (#13054) @davidwendt
[REVIEW] Fix branch version in CI scripts (#13029) @galipremsagar
Fix OOB memory access in CSV reader when reading without NA values (#13011) @vuule
Fix read_avro() skip_rows and num_rows. (#12912) @tpn
Purge nonempty nulls from byte_cast list outputs. (#11971) @bdice
Fix consumption of CPU-backed interchange protocol dataframes (#11392) @shwina

🚀 New Features

Remove numba JIT kernel usage from dataframe copy tests (#13385) @brandon-b-miller
Add JNI for ORC/Parquet writer compression statistics (#13376) @ttnghia
Use _compile_or_get in JIT groupby apply (#13350) @brandon-b-miller
cuDF numba cuda 12 updates (#13337) @brandon-b-miller
Add tz_convert method to convert between timestamps (#13328) @shwina
Optionally return compression statistics from ORC and Parquet writers (#13294) @vuule
Support the case=False argument to str.contains (#13290) @shwina
Add an event handler for ColumnVector.close (#13279) @abellina
JNI api for cudf::chunked_pack (#13278) @abellina
Implement a chunked_pack API (#13260) @abellina
Update cudf recipes to use GTest version >=1.13 (#13207) @robertmaynard
JNI changes for range-extents in window functions. (#13199) @mythrocks
Add support for DatetimeTZDtype and tz_localize (#13163) @shwina
Add IS_NULL operator to AST (#13145) @karthikeyann
STRING order-by column for RANGE window functions (#13143) @mythrocks
Update contains_table to experimental row hasher and equality comparator (#13119) @divyegala
Automatically select GroupBy.apply algorithm based on if the UDF is jittable (#13113) @brandon-b-miller
Refactor Parquet chunked writer (#13076) @ttnghia
Add Python bindings for string literal support in AST (#13073) @karthikeyann
Add Java bindings for string literal support in AST (#13072) @karthikeyann
Add string scalar support in AST (#13061) @karthikeyann
Log cuIO warnings using the libcudf logger (#13043) @vuule
Update mixed_join to use experimental row hasher and comparator (#13028) @divyegala
Support structs of lists in row lexicographic comparator (#13005) @ttnghia
Adding hostdevice_span that is a span creatable from hostdevice_vector (#12981) @hyperbolic2346
Add nvtext::minhash function (#12961) @davidwendt
Support lists of structs in row lexicographic comparator (#12953) @ttnghia
Update join to use experimental row hasher and comparator (#12787) @divyegala
Implement Python drop_duplicates with cudf::stable_distinct. (#11656) @brandon-b-miller

🛠️ Improvements

Bump typing_extensions minimum version to 4.0.0 (#13618) @shwina
Drop extraneous dependencies from cudf conda recipe. (#13406) @bdice
Handle some corner-cases in indexing with boolean masks (#13402) @wence-
Add cudf::stable_distinct public API, tests, and benchmarks. (#13392) @bdice
[JNI] Pass this ColumnVector to the onClosed event handler (#13386) @abellina
Fix JNI method with mismatched parameter list (#13384) @ttnghia
Split up experimental_row_operator_tests.cu to improve its compile time (#13382) @davidwendt
Deprecate cudf::strings::slice_strings APIs that accept delimiters (#13373) @davidwendt
Remove UNKNOWN_NULL_COUNT (#13372) @vyasr
Move some nvtext benchmarks to nvbench (#13368) @davidwendt
run docs nightly too (#13366) @AyodeAwe
Add warning for default dtype parameter in get_dummies (#13365) @galipremsagar
Add log messages about kvikIO compatibility mode (#13363) @vuule
Switch back to using primary shared-action-workflows branch (#13362) @vyasr
Deprecate StringIndex and use Index instead (#13361) @galipremsagar
Ensure columns have valid null counts in CUDF JNI. (#13355) @mythrocks
Expunge most uses of TypeVar(bound="Foo") (#13346) @wence-
Remove all references to UNKNOWN_NULL_COUNT in Python (#13345) @vyasr
Improve distinct_count with cuco::static_set (#13343) @PointKernel
Fix contiguous_split performance (#13342) @ttnghia
Remove default UNKNOWN_NULL_COUNT from cudf::column member functions (#13341) @davidwendt
Update mypy to 1.3 (#13340) @wence-
[Java] Purge non-empty nulls when setting validity (#13335) @razajafri
Add row-wise filtering step to read_parquet (#13334) @rjzamora
Performance improvement for nvtext::minhash (#13333) @davidwendt
Fix some libcudf functions to ...

Contributors

robertmaynard, gmarkall, and 28 other contributors

09 Feb 16:14

raydouglass

v23.02.00

5ad4a85

v23.02.00

🚨 Breaking Changes

Pin dask and distributed for release (#12695) @galipremsagar
Change ways to access ptr in Buffer (#12587) @galipremsagar
Remove column names (#12578) @vuule
Default cudf::io::read_json to nested JSON parser (#12544) @vuule
Switch engine=cudf to the new JSON reader (#12509) @galipremsagar
Add trailing comma support for nested JSON reader (#12448) @karthikeyann
Upgrade to arrow-10.0.1 (#12327) @galipremsagar
Fail loudly to avoid data corruption with unsupported input in read_orc (#12325) @vuule
CSV, JSON reader to infer integer column with nulls as int64 instead of float64 (#12309) @karthikeyann
Remove deprecated code for 23.02 (#12281) @vyasr
Null element for parsing error in numeric types in JSON, CSV reader (#12272) @karthikeyann
Purge non-empty nulls for superimpose_nulls and push_down_nulls (#12239) @ttnghia
Rename cudf::structs::detail::superimpose_parent_nulls APIs (#12230) @ttnghia
Remove JIT type names, refactor id_to_type. (#12158) @bdice
Floor division uses integer division for integral arguments (#12131) @wence-

🐛 Bug Fixes

Fix a mask data corruption in UDF (#12647) @galipremsagar
pre-commit: Update isort version to 5.12.0 (#12645) @wence-
tests: Skip cuInit tests if cuda-gdb is not found or not working (#12644) @wence-
Revert regex program java APIs and tests (#12639) @cindyyuanjiang
Fix leaks in ColumnVectorTest (#12625) @jlowe
Handle when spillable buffers own each other (#12607) @madsbk
Fix incorrect null counts for sliced columns in JCudfSerialization (#12589) @jlowe
lists: Transfer dtypes correctly through list.get (#12586) @wence-
timedelta: Don't go via float intermediates for floordiv (#12585) @wence-
Fixing BUG, get_next_chunk() should use the blocking function device_read() (#12584) @madsbk
Make JNI QuoteStyle accessible outside ai.rapids.cudf (#12572) @mythrocks
partition_by_hash(): support index (#12554) @madsbk
Mixed Join benchmark bug due to wrong conditional column (#12553) @divyegala
Update List Lexicographical Comparator (#12538) @divyegala
Dynamically read PTX version (#12534) @brandon-b-miller
build.sh switch to use RAPIDS magic value (#12525) @robertmaynard
Loosen runtime arrow pinning (#12522) @vyasr
Enable metadata transfer for complex types in transpose (#12491) @galipremsagar
Fix issues with parquet chunked reader (#12488) @nvdbaranec
Fix missing metadata transfer in concat for ListColumn (#12487) @galipremsagar
Rename libcudf substring source files to slice (#12484) @davidwendt
Fix compile issue with arrow 10 (#12465) @ttnghia
Fix List offsets bug in mixed type list column in nested JSON reader (#12447) @karthikeyann
Fix xfail incompatibilities (#12423) @vyasr
Fix bug in Parquet column index encoding (#12404) @etseidl
When building Arrow shared look for a shared OpenSSL (#12396) @robertmaynard
Fix get_json_object to return empty column on empty input (#12384) @davidwendt
Pin arrow 9 in testing dependencies to prevent conda solve issues (#12377) @vyasr
Fix reductions any/all return value for empty input (#12374) @davidwendt
Fix debug compile errors in parquet.hpp (#12372) @davidwendt
Purge non-empty nulls in cudf::make_lists_column (#12370) @ttnghia
Use correct memory resource in io::make_column (#12364) @vyasr
Add code to detect possible malformed page data in parquet files. (#12360) @nvdbaranec
Fail loudly to avoid data corruption with unsupported input in read_orc (#12325) @vuule
Fix NumericPairIteratorTest for float values (#12306) @davidwendt
Fixes memory allocation in nested JSON tokenizer (#12300) @elstehle
Reconstruct dtypes correctly for list aggs of struct columns (#12290) @wence-
Fix regex \A and \Z to strictly match string begin/end (#12282) @davidwendt
Fix compile issue in json_chunked_reader.cpp (#12280) @ttnghia
Change reductions any/all to return valid values for empty input (#12279) @davidwendt
Only exclude join keys that are indices from key columns (#12271) @wence-
Fix spill to device limit (#12252) @madsbk
Correct behaviour of sort in concat for singleton concatenations (#12247) @wence-
Purge non-empty nulls for superimpose_nulls and push_down_nulls (#12239) @ttnghia
Patch CUB DeviceSegmentedSort and remove workaround (#12234) @davidwendt
Fix memory leak in udf_string::assign(&&) function (#12206) @davidwendt
Workaround thrust-copy-if limit in json get_tree_representation (#12190) @davidwendt
Fix page size calculation in Parquet writer (#12182) @etseidl
Add cudf::detail::sizes_to_offsets_iterator to allow checking overflow in offsets (#12180) @davidwendt
Workaround thrust-copy-if limit in wordpiece-tokenizer (#12168) @davidwendt
Floor division uses integer division for integral arguments (#12131) @wence-

📖 Documentation

Fix link to NVTX (#12598) @sameerz
Include missing groupby functions in documentation (#12580) @quasiben
Fix documentation author (#12527) @bdice
Update libcudf reduction docs for casting output types (#12526) @davidwendt
Add JSON reader page in user guide (#12499) @GregoryKimball
Link unsupported iteration API docstrings (#12482) @galipremsagar
strings_udf doc update (#12469) @brandon-b-miller
Update cudf_assert docs with correct NDEBUG behavior (#12464) @robertmaynard
Update pre-commit hooks guide (#12395) @bdice
Update test docs to not use detail comparison utilities (#12332) @PointKernel
Fix doxygen description for regex_program::compute_working_memory_size (#12329) @davidwendt
Add eval to docs. (#12322) @vyasr
Turn on xfail_strict=true (#12244) @wence-
Update 10 minutes to cuDF (#12114) @wence-

🚀 New Features

Use kvikIO as the default IO backend (#12574) @vuule
Use has_nonempty_nulls instead of may_contain_non_empty_nulls in superimpose_nulls and push_down_nulls (#12560) @ttnghia
Add strings methods removeprefix and removesuffix (#12557) @davidwendt
Add regex_program java APIs and unit tests (#12548) @cindyyuanjiang
Default cudf::io::read_json to nested JSON parser (#12544) @vuule
Make string quoting optional on CSV write (#12539) @mythrocks
Use new nvCOMP API to optimize the compression temp memory size (#12533) @vuule
Support "values" orient (array of arrays) in Nested JSON reader (#12498) @karthikeyann
one_hot_encode to use experimental row comparators (#12478) @divyegala
Support %W and %w format specifiers in cudf::strings::to_timestamps (#12475) @davidwendt
Add JSON Writer (#12474) @karthikeyann
Refactor thrust_copy_if into cudf::detail::copy_if_safe (#12455) @ttnghia
Add trailing comma support for nested JSON reader (#12448) @karthikeyann
Extract tokenize_json.hpp detail header from src/io/json/nested_json.hpp (#12432) @ttnghia
JNI bindings to write CSV (#12425) @mythrocks
Nested JSON depth benchmark (#12371) @karthikeyann
Implement lists::reverse (#12336) @ttnghia
Use device_read in experimental read_json (#12314) @vuule
Implement JNI for strings::reverse (#12283) @ttnghia
Null element for parsing error in numeric types in JSON, CSV reader (#12272) @karthikeyann
Add cudf::strings::like function with multiple patterns (#12269) @davidwendt
Add environment variable to control host memory allocation in hostdevice_vector (#12251) @vuule
Add cudf::strings::reverse function (#12227) @davidwendt
Selectively use dictionary encoding in Parquet writer (#12211) @etseidl
Support replace in strings_udf (#12207) @brandon-b-miller
Add support to read binary encoded decimals in parquet (#12205) @PointKernel
Support regex EOL where the string ends with a new-line character (#12181) @davidwendt
Updating stream_compaction/unique to use new row comparators (#12159) @divyegala
Add device buffer datasource (#12024) @PointKernel
Implement groupby apply with JIT (#11452) @bwyogatama

🛠️ Improvements

Update shared workflow branches (#12696) @ajschmidt8
Pin dask and distributed for release (#12695) @galipremsagar
Don't upload libcudf-example to Anaconda.org (#12671) @ajschmidt8
Pin wheel dependencies to same RAPIDS release (#12659) @sevagh
Use CTK 118/cp310 branch of wheel workflows (#12602) @sevagh
Change ways to access ptr in Buffer (#12587) @galipremsagar
Version a parquet writer xfail (#12579) @galipremsagar
Remove column names (#12578) @vuule
Parquet reader optimization to address V100 regression. (#12577) @nvdbaranec
Add support for category dtypes in CSV reader (#12571) @galipremsagar
Remove spill_lock parameter from SpillableBuffer.get_ptr() (#12564) @madsbk
Optimize cudf::make_lists_column (#12547) @ttnghia
Remove cudf::strings::repeat_strings_output_sizes from Java and JNI (#12546) @ttnghia
Test that cuInit is not called when RAPIDS_NO_INITIALIZE is set (#12545) @wence-
Rework repeat_strings to use sizes-to-offsets utility (#12543) @davidwendt
Replace exclusive_scan with sizes_to_offsets in cudf::lists::sequences (#12541) @davidwendt
Rework nvtext::ngrams_tokenize to use sizes-to-offsets utility (#12540) @davidwendt
Fix binary-ops gtests coded in namespace cudf::test (#12536) @davidwendt
More @acquire_spill_lock() and as_buffer(..., exposed=False) (#12535) @madsbk
Guard CUDA runtime APIs with error checking (#12531) @PointKernel
Update TODOs from issue 10432. (#12528) @bdice
Update rapids-cmake definitions version in GitHub Actions style checks. (#12511) @bdice
Switch engine=cudf to the new JSON reader (#12509) @galipremsagar
Fix SUM/MEAN aggregation type support. (#12503) @bdice
Stop using pandas._testing (#12492) @vyasr
Fix ROLLING_TEST gtests coded in namespace cudf::test (#12490) @davidwendt
Fix erroneously skipped ORC ZSTD test (#12486) @vuule
Rework nvtext::generate_character_ngrams to use make_strings_children (#12480) @davidwendt
Raise warnings as errors in the test suite (#12468) @v...

Contributors

benfred, robertmaynard, and 29 other contributors

08 Dec 19:18

GPUtester

v22.12.01

f700408

v22.12.01

🚨 Breaking Changes

Add JNI for substring without 'end' parameter. (#12113) @firestarman
Refactor purge_nonempty_nulls (#12111) @ttnghia
Create an int8 column in read_csv when all elements are missing (#12110) @vuule
Throw an error when libcudf is built without cuFile and LIBCUDF_CUFILE_POLICY is set to "ALWAYS" (#12080) @vuule
Fix type promotion edge cases in numerical binops (#12074) @wence-
Reduce/Remove reliance on **kwargs and *args in IO readers & writers (#12025) @galipremsagar
Rollback of DeviceBufferLike (#12009) @madsbk
Remove unused managed_allocator (#12005) @vyasr
Pass column names to write_csv instead of table_metadata pointer (#11972) @vuule
Accept const refs instead of const unique_ptr refs in reduce and scan APIs. (#11960) @vyasr
Default to equal NaNs in make_merge_sets_aggregation. (#11952) @bdice
Remove validation that requires introspection (#11938) @vyasr
Trim quotes for non-string values in nested json parsing (#11898) @karthikeyann
Add tests ensuring that cudf's default stream is always used (#11875) @vyasr
Support nested types as groupby keys in libcudf (#11792) @PointKernel
Default to equal NaNs in make_collect_set_aggregation. (#11621) @bdice
Removing int8 column option from parquet byte_array writing (#11539) @hyperbolic2346
part1: Simplify BaseIndex to an abstract class (#10389) @skirui-source

🐛 Bug Fixes

strings_udf: use libcudf caching of character tables (#12343) @wence-
Fix include line for IO Cython modules (#12250) @vyasr
Make dask pinning looser (#12231) @vyasr
Workaround for CUB segmented-sort bug with boolean keys (#12217) @davidwendt
Fix from_dict backend dispatch to match upstream dask (#12203) @galipremsagar
Merge branch-22.10 into branch-22.12 (#12198) @davidwendt
Fix compression in ORC writer (#12194) @vuule
Don't use CMake 3.25.0 as it has a show stopping FindCUDAToolkit bug (#12188) @robertmaynard
Fix data corruption when reading ORC files with empty stripes (#12160) @vuule
Fix decimal binary operations (#12142) @galipremsagar
Ensure dlpack include is provided to cudf interop lib (#12139) @robertmaynard
Safely allocate udf_string pointers in strings_udf (#12138) @brandon-b-miller
Fix/disable jitify lto (#12122) @robertmaynard
Fix conditional_full_join benchmark (#12121) @GregoryKimball
Fix regex working-memory-size refactor error (#12119) @davidwendt
Add in negative size checks for columns (#12118) @revans2
Add JNI for substring without 'end' parameter. (#12113) @firestarman
Fix reading of CSV files with blank second row (#12098) @vuule
Fix an error in IO with GzipFile type (#12085) @galipremsagar
Workaround groupby aggregate thrust::copy_if overflow (#12079) @davidwendt
Fix alignment of compressed blocks in ORC writer (#12077) @vuule
Fix singleton-range __setitem__ edge case (#12075) @wence-
Fix type promotion edge cases in numerical binops (#12074) @wence-
Force using old fmt in nvbench. (#12067) @vyasr
Fixes List offset bug in Nested JSON reader (#12060) @karthikeyann
Allow falling back to shim_60.ptx by default in strings_udf (#12056) @brandon-b-miller
Force black exclusions for pre-commit. (#12036) @bdice
Add memory_usage & items implementation for Struct column & dtype (#12033) @galipremsagar
Reduce/Remove reliance on **kwargs and *args in IO readers & writers (#12025) @galipremsagar
Fixes bug in csv_reader_options construction in cython (#12021) @karthikeyann
Fix issues when both usecols and names options are used in read_csv (#12018) @vuule
Port thrust's pinned_allocator to cudf, since Thrust 1.17 removes the type (#12004) @robertmaynard
Revert "Replace most of preprocessor usage in nvcomp adapter with constexpr" (#11999) @vuule
Fix bug where df.loc resulting in single row could give wrong index (#11998) @eriknw
Switch to DISABLE_DEPRECATION_WARNINGS to match other RAPIDS projects (#11989) @robertmaynard
Fix maximum page size estimate in Parquet writer (#11962) @vuule
Fix local offset handling in bgzip reader (#11918) @upsj
Fix an issue reading struct-of-list types in Parquet. (#11910) @nvdbaranec
Fix memcheck error in TypeInference.Timestamp gtest (#11905) @davidwendt
Fix type casting in Series.setitem (#11904) @wence-
Fix memcheck error in get_dremel_data (#11903) @davidwendt
Fixes Unsupported column type error due to empty list columns in Nested JSON reader (#11897) @karthikeyann
Fix segmented-sort to ignore indices outside the offsets (#11888) @davidwendt
Fix cudf::stable_sorted_order for NaN and -NaN in FLOAT64 columns (#11874) @davidwendt
Fix writing of Parquet files with many fragments (#11869) @etseidl
Fix RangeIndex unary operators. (#11868) @vyasr
JNI Avoid NPE for reading host binary data (#11865) @revans2
Fix decimal benchmark input data generation (#11863) @karthikeyann
Fix pre-commit copyright check (#11860) @galipremsagar
Fix Parquet support for seconds and milliseconds duration types (#11854) @vuule
Ensure better compiler cache results between cudf cal-ver branches (#11835) @robertmaynard
Fix make_column_from_scalar for all-null strings column (#11807) @davidwendt
Tell jitify_preprocess where to search for libnvrtc (#11787) @robertmaynard
add V2 page header support to parquet reader (#11778) @etseidl
Parquet reader: bug fix for a num_rows/skip_rows corner case, w/optimization for nested preprocessing (#11752) @nvdbaranec
Determine if Arrow has S3 support at runtime in unit test. (#11560) @bdice

📖 Documentation

Use rapidsai CODE_OF_CONDUCT.md (#12166) @bdice
Add symlinks to notebooks. (#12128) @bdice
Add truncate API to python doc pages (#12109) @galipremsagar
Update Numba docs links. (#12107) @bdice
Remove "Multi-GPU with Dask-cuDF" notebook. (#12095) @bdice
Fix link to c++ developer guide from CONTRIBUTING.md (#12084) @brandon-b-miller
Add pivot_table and crosstab to docs. (#12014) @bdice
Fix doxygen text for cudf::dictionary::encode (#11991) @davidwendt
Replace default_stream_value with get_default_stream in docs. (#11985) @vyasr
Add dtype docs pages and docstrings for cudf specific dtypes (#11974) @galipremsagar
Update Unit Testing in libcudf guidelines to code tests outside the cudf::test namespace (#11959) @davidwendt
Rename libcudf++ to libcudf. (#11953) @bdice
Fix documentation referring to removed as_gpu_matrix method. (#11937) @bdice
Remove "experimental" warning for struct columns in ORC reader and writer (#11880) @vuule
Initial draft of policies and guidelines for libcudf usage. (#11853) @vyasr
Add clear indication of non-GPU accelerated parameters in read_json docstring (#11825) @GregoryKimball
Add developer docs for writing tests (#11199) @vyasr

🚀 New Features

Adds an EventHandler to Java MemoryBuffer to be invoked on close (#12125) @abellina
Support + in strings_udf (#12117) @brandon-b-miller
Support upper and lower in strings_udf (#12099) @brandon-b-miller
Add wheel builds (#12096) @vyasr
Allow setting malloc heap size in string udfs (#12094) @brandon-b-miller
Support strip, lstrip, and rstrip in strings_udf (#12091) @brandon-b-miller
Mark nvcomp zstd compression stable (#12059) @jbrennan333
Add debug-only onAllocated/onDeallocated to RmmEventHandler (#12054) @abellina
Enable building against the libarrow contained in pyarrow (#12034) @vyasr
Add strings like jni and native method (#12032) @cindyyuanjiang
Cleanup common parsing code in JSON, CSV reader (#12022) @karthikeyann
byte_range support for JSON Lines format (#12017) @karthikeyann
Minor cleanup of root CMakeLists.txt for better organization (#11988) @robertmaynard
Add inplace arithmetic operators to MaskedType (#11987) @brandon-b-miller
Implement JNI for chunked Parquet reader (#11961) @ttnghia
Add method argument to DataFrame.quantile (#11957) @rjzamora
Add gpu memory watermark apis to JNI (#11950) @abellina
Adds retryCount to RmmEventHandler.onAllocFailure (#11940) @abellina
Enable returning string data from UDFs used through apply (#11933) @brandon-b-miller
Switch over to rapids-cmake patches for thrust (#11921) @robertmaynard
Add strings udf C++ classes and functions for phase II (#11912) @davidwendt
Trim quotes for non-string values in nested json parsing (#11898) @karthikeyann
Enable CEC for strings_udf (#11884) @brandon-b-miller
ArrowIPCTableWriter writes an empty batch in the case of an empty table. (#11883) @firestarman
Implement chunked Parquet reader (#11867) @ttnghia
Add read_orc_metadata to libcudf (#11815) @vuule
Support nested types as groupby keys in libcudf (#11792) @PointKernel
Adding feature Truncate to DataFrame and Series (#11435) @VamsiTallam95

🛠️ Improvements

Reduce number of tests marked spilling (#12197) @madsbk
Pin dask and distributed for release (#12165) @galipremsagar
Don't rely on GNU find in headers_test.sh (#12164) @wence-
Update cp.clip call (#12148) @quasiben
Enable automatic column projection in groupby().agg (#12124) @rjzamora
Refactor purge_nonempty_nulls (#12111) @ttnghia
Create an int8 column in read_csv when all elements are missing (#12110) @vuule
Spilling to host memory (#12106) @madsbk
First pass of pd.read_orc changes in tests (#12103) @galipremsagar
Expose engine argument in dask_cudf.read_json (#12101) @rjzamora
Remove CUDA 10 compatibility code. (#12088) @bdice
Move and update dask nightly install in CI (#12082) @galipremsagar
Throw an error when libcudf is built without cuFile and LIBCUDF_CUFILE_POLICY is set to "ALWAYS" (#12080) @vuule
Remove macros that inspect the contents of exceptions (#12076) @vyasr
Fix ingest_raw_data performance issue in Nested JSON reader due to RVO (#12070) @karthikeyann
Remove overflow err...

Contributors

trxcllnt, robertmaynard, and 32 other contributors

08 Dec 15:20

GPUtester

v22.12.00

baae3a6

v22.12.00

🚨 Breaking Changes

Add JNI for substring without 'end' parameter. (#12113) @firestarman
Refactor purge_nonempty_nulls (#12111) @ttnghia
Create an int8 column in read_csv when all elements are missing (#12110) @vuule
Throw an error when libcudf is built without cuFile and LIBCUDF_CUFILE_POLICY is set to "ALWAYS" (#12080) @vuule
Fix type promotion edge cases in numerical binops (#12074) @wence-
Reduce/Remove reliance on **kwargs and *args in IO readers & writers (#12025) @galipremsagar
Rollback of DeviceBufferLike (#12009) @madsbk
Remove unused managed_allocator (#12005) @vyasr
Pass column names to write_csv instead of table_metadata pointer (#11972) @vuule
Accept const refs instead of const unique_ptr refs in reduce and scan APIs. (#11960) @vyasr
Default to equal NaNs in make_merge_sets_aggregation. (#11952) @bdice
Remove validation that requires introspection (#11938) @vyasr
Trim quotes for non-string values in nested json parsing (#11898) @karthikeyann
Add tests ensuring that cudf's default stream is always used (#11875) @vyasr
Support nested types as groupby keys in libcudf (#11792) @PointKernel
Default to equal NaNs in make_collect_set_aggregation. (#11621) @bdice
Removing int8 column option from parquet byte_array writing (#11539) @hyperbolic2346
part1: Simplify BaseIndex to an abstract class (#10389) @skirui-source

🐛 Bug Fixes

Fix include line for IO Cython modules (#12250) @vyasr
Make dask pinning looser (#12231) @vyasr
Workaround for CUB segmented-sort bug with boolean keys (#12217) @davidwendt
Fix from_dict backend dispatch to match upstream dask (#12203) @galipremsagar
Merge branch-22.10 into branch-22.12 (#12198) @davidwendt
Fix compression in ORC writer (#12194) @vuule
Don't use CMake 3.25.0 as it has a show stopping FindCUDAToolkit bug (#12188) @robertmaynard
Fix data corruption when reading ORC files with empty stripes (#12160) @vuule
Fix decimal binary operations (#12142) @galipremsagar
Ensure dlpack include is provided to cudf interop lib (#12139) @robertmaynard
Safely allocate udf_string pointers in strings_udf (#12138) @brandon-b-miller
Fix/disable jitify lto (#12122) @robertmaynard
Fix conditional_full_join benchmark (#12121) @GregoryKimball
Fix regex working-memory-size refactor error (#12119) @davidwendt
Add in negative size checks for columns (#12118) @revans2
Add JNI for substring without 'end' parameter. (#12113) @firestarman
Fix reading of CSV files with blank second row (#12098) @vuule
Fix an error in IO with GzipFile type (#12085) @galipremsagar
Workaround groupby aggregate thrust::copy_if overflow (#12079) @davidwendt
Fix alignment of compressed blocks in ORC writer (#12077) @vuule
Fix singleton-range __setitem__ edge case (#12075) @wence-
Fix type promotion edge cases in numerical binops (#12074) @wence-
Force using old fmt in nvbench. (#12067) @vyasr
Fixes List offset bug in Nested JSON reader (#12060) @karthikeyann
Allow falling back to shim_60.ptx by default in strings_udf (#12056) @brandon-b-miller
Force black exclusions for pre-commit. (#12036) @bdice
Add memory_usage & items implementation for Struct column & dtype (#12033) @galipremsagar
Reduce/Remove reliance on **kwargs and *args in IO readers & writers (#12025) @galipremsagar
Fixes bug in csv_reader_options construction in cython (#12021) @karthikeyann
Fix issues when both usecols and names options are used in read_csv (#12018) @vuule
Port thrust's pinned_allocator to cudf, since Thrust 1.17 removes the type (#12004) @robertmaynard
Revert "Replace most of preprocessor usage in nvcomp adapter with constexpr" (#11999) @vuule
Fix bug where df.loc resulting in single row could give wrong index (#11998) @eriknw
Switch to DISABLE_DEPRECATION_WARNINGS to match other RAPIDS projects (#11989) @robertmaynard
Fix maximum page size estimate in Parquet writer (#11962) @vuule
Fix local offset handling in bgzip reader (#11918) @upsj
Fix an issue reading struct-of-list types in Parquet. (#11910) @nvdbaranec
Fix memcheck error in TypeInference.Timestamp gtest (#11905) @davidwendt
Fix type casting in Series.setitem (#11904) @wence-
Fix memcheck error in get_dremel_data (#11903) @davidwendt
Fixes Unsupported column type error due to empty list columns in Nested JSON reader (#11897) @karthikeyann
Fix segmented-sort to ignore indices outside the offsets (#11888) @davidwendt
Fix cudf::stable_sorted_order for NaN and -NaN in FLOAT64 columns (#11874) @davidwendt
Fix writing of Parquet files with many fragments (#11869) @etseidl
Fix RangeIndex unary operators. (#11868) @vyasr
JNI Avoid NPE for reading host binary data (#11865) @revans2
Fix decimal benchmark input data generation (#11863) @karthikeyann
Fix pre-commit copyright check (#11860) @galipremsagar
Fix Parquet support for seconds and milliseconds duration types (#11854) @vuule
Ensure better compiler cache results between cudf cal-ver branches (#11835) @robertmaynard
Fix make_column_from_scalar for all-null strings column (#11807) @davidwendt
Tell jitify_preprocess where to search for libnvrtc (#11787) @robertmaynard
add V2 page header support to parquet reader (#11778) @etseidl
Parquet reader: bug fix for a num_rows/skip_rows corner case, w/optimization for nested preprocessing (#11752) @nvdbaranec
Determine if Arrow has S3 support at runtime in unit test. (#11560) @bdice

📖 Documentation

Use rapidsai CODE_OF_CONDUCT.md (#12166) @bdice
Add symlinks to notebooks. (#12128) @bdice
Add truncate API to python doc pages (#12109) @galipremsagar
Update Numba docs links. (#12107) @bdice
Remove "Multi-GPU with Dask-cuDF" notebook. (#12095) @bdice
Fix link to c++ developer guide from CONTRIBUTING.md (#12084) @brandon-b-miller
Add pivot_table and crosstab to docs. (#12014) @bdice
Fix doxygen text for cudf::dictionary::encode (#11991) @davidwendt
Replace default_stream_value with get_default_stream in docs. (#11985) @vyasr
Add dtype docs pages and docstrings for cudf specific dtypes (#11974) @galipremsagar
Update Unit Testing in libcudf guidelines to code tests outside the cudf::test namespace (#11959) @davidwendt
Rename libcudf++ to libcudf. (#11953) @bdice
Fix documentation referring to removed as_gpu_matrix method. (#11937) @bdice
Remove "experimental" warning for struct columns in ORC reader and writer (#11880) @vuule
Initial draft of policies and guidelines for libcudf usage. (#11853) @vyasr
Add clear indication of non-GPU accelerated parameters in read_json docstring (#11825) @GregoryKimball
Add developer docs for writing tests (#11199) @vyasr

🚀 New Features

Adds an EventHandler to Java MemoryBuffer to be invoked on close (#12125) @abellina
Support + in strings_udf (#12117) @brandon-b-miller
Support upper and lower in strings_udf (#12099) @brandon-b-miller
Add wheel builds (#12096) @vyasr
Allow setting malloc heap size in string udfs (#12094) @brandon-b-miller
Support strip, lstrip, and rstrip in strings_udf (#12091) @brandon-b-miller
Mark nvcomp zstd compression stable (#12059) @jbrennan333
Add debug-only onAllocated/onDeallocated to RmmEventHandler (#12054) @abellina
Enable building against the libarrow contained in pyarrow (#12034) @vyasr
Add strings like jni and native method (#12032) @cindyyuanjiang
Cleanup common parsing code in JSON, CSV reader (#12022) @karthikeyann
byte_range support for JSON Lines format (#12017) @karthikeyann
Minor cleanup of root CMakeLists.txt for better organization (#11988) @robertmaynard
Add inplace arithmetic operators to MaskedType (#11987) @brandon-b-miller
Implement JNI for chunked Parquet reader (#11961) @ttnghia
Add method argument to DataFrame.quantile (#11957) @rjzamora
Add gpu memory watermark apis to JNI (#11950) @abellina
Adds retryCount to RmmEventHandler.onAllocFailure (#11940) @abellina
Enable returning string data from UDFs used through apply (#11933) @brandon-b-miller
Switch over to rapids-cmake patches for thrust (#11921) @robertmaynard
Add strings udf C++ classes and functions for phase II (#11912) @davidwendt
Trim quotes for non-string values in nested json parsing (#11898) @karthikeyann
Enable CEC for strings_udf (#11884) @brandon-b-miller
ArrowIPCTableWriter writes an empty batch in the case of an empty table. (#11883) @firestarman
Implement chunked Parquet reader (#11867) @ttnghia
Add read_orc_metadata to libcudf (#11815) @vuule
Support nested types as groupby keys in libcudf (#11792) @PointKernel
Adding feature Truncate to DataFrame and Series (#11435) @VamsiTallam95

🛠️ Improvements

Reduce number of tests marked spilling (#12197) @madsbk
Pin dask and distributed for release (#12165) @galipremsagar
Don't rely on GNU find in headers_test.sh (#12164) @wence-
Update cp.clip call (#12148) @quasiben
Enable automatic column projection in groupby().agg (#12124) @rjzamora
Refactor purge_nonempty_nulls (#12111) @ttnghia
Create an int8 column in read_csv when all elements are missing (#12110) @vuule
Spilling to host memory (#12106) @madsbk
First pass of pd.read_orc changes in tests (#12103) @galipremsagar
Expose engine argument in dask_cudf.read_json (#12101) @rjzamora
Remove CUDA 10 compatibility code. (#12088) @bdice
Move and update dask nightly install in CI (#12082) @galipremsagar
Throw an error when libcudf is built without cuFile and LIBCUDF_CUFILE_POLICY is set to "ALWAYS" (#12080) @vuule
Remove macros that inspect the contents of exceptions (#12076) @vyasr
Fix ingest_raw_data performance issue in Nested JSON reader due to RVO (#12070) @karthikeyann
Remove overflow error during decimal binops (#12063) @galipremsagar
Change cudf::detail::...

Contributors

trxcllnt, robertmaynard, and 32 other contributors

09 Feb 16:17

rapids-bot

v23.02.00a

480b4cc

[NIGHTLY] v23.02.00 Pre-release

🚨 Breaking Changes

Pin dask and distributed for release (#12695) @galipremsagar
Change ways to access ptr in Buffer (#12587) @galipremsagar
Remove column names (#12578) @vuule
Default cudf::io::read_json to nested JSON parser (#12544) @vuule
Switch engine=cudf to the new JSON reader (#12509) @galipremsagar
Add trailing comma support for nested JSON reader (#12448) @karthikeyann
Upgrade to arrow-10.0.1 (#12327) @galipremsagar
Fail loudly to avoid data corruption with unsupported input in read_orc (#12325) @vuule
CSV, JSON reader to infer integer column with nulls as int64 instead of float64 (#12309) @karthikeyann
Remove deprecated code for 23.02 (#12281) @vyasr
Null element for parsing error in numeric types in JSON, CSV reader (#12272) @karthikeyann
Purge non-empty nulls for superimpose_nulls and push_down_nulls (#12239) @ttnghia
Rename cudf::structs::detail::superimpose_parent_nulls APIs (#12230) @ttnghia
Remove JIT type names, refactor id_to_type. (#12158) @bdice
Floor division uses integer division for integral arguments (#12131) @wence-

🐛 Bug Fixes

Fix update-version.sh (#12745) @raydouglass
Fix a mask data corruption in UDF (#12647) @galipremsagar
pre-commit: Update isort version to 5.12.0 (#12645) @wence-
tests: Skip cuInit tests if cuda-gdb is not found or not working (#12644) @wence-
Revert regex program java APIs and tests (#12639) @cindyyuanjiang
Fix leaks in ColumnVectorTest (#12625) @jlowe
Handle when spillable buffers own each other (#12607) @madsbk
Fix incorrect null counts for sliced columns in JCudfSerialization (#12589) @jlowe
lists: Transfer dtypes correctly through list.get (#12586) @wence-
timedelta: Don't go via float intermediates for floordiv (#12585) @wence-
Fixing BUG, get_next_chunk() should use the blocking function device_read() (#12584) @madsbk
Make JNI QuoteStyle accessible outside ai.rapids.cudf (#12572) @mythrocks
partition_by_hash(): support index (#12554) @madsbk
Mixed Join benchmark bug due to wrong conditional column (#12553) @divyegala
Update List Lexicographical Comparator (#12538) @divyegala
Dynamically read PTX version (#12534) @brandon-b-miller
build.sh switch to use RAPIDS magic value (#12525) @robertmaynard
Loosen runtime arrow pinning (#12522) @vyasr
Enable metadata transfer for complex types in transpose (#12491) @galipremsagar
Fix issues with parquet chunked reader (#12488) @nvdbaranec
Fix missing metadata transfer in concat for ListColumn (#12487) @galipremsagar
Rename libcudf substring source files to slice (#12484) @davidwendt
Fix compile issue with arrow 10 (#12465) @ttnghia
Fix List offsets bug in mixed type list column in nested JSON reader (#12447) @karthikeyann
Fix xfail incompatibilities (#12423) @vyasr
Fix bug in Parquet column index encoding (#12404) @etseidl
When building Arrow shared look for a shared OpenSSL (#12396) @robertmaynard
Fix get_json_object to return empty column on empty input (#12384) @davidwendt
Pin arrow 9 in testing dependencies to prevent conda solve issues (#12377) @vyasr
Fix reductions any/all return value for empty input (#12374) @davidwendt
Fix debug compile errors in parquet.hpp (#12372) @davidwendt
Purge non-empty nulls in cudf::make_lists_column (#12370) @ttnghia
Use correct memory resource in io::make_column (#12364) @vyasr
Add code to detect possible malformed page data in parquet files. (#12360) @nvdbaranec
Fail loudly to avoid data corruption with unsupported input in read_orc (#12325) @vuule
Fix NumericPairIteratorTest for float values (#12306) @davidwendt
Fixes memory allocation in nested JSON tokenizer (#12300) @elstehle
Reconstruct dtypes correctly for list aggs of struct columns (#12290) @wence-
Fix regex \A and \Z to strictly match string begin/end (#12282) @davidwendt
Fix compile issue in json_chunked_reader.cpp (#12280) @ttnghia
Change reductions any/all to return valid values for empty input (#12279) @davidwendt
Only exclude join keys that are indices from key columns (#12271) @wence-
Fix spill to device limit (#12252) @madsbk
Correct behaviour of sort in concat for singleton concatenations (#12247) @wence-
Purge non-empty nulls for superimpose_nulls and push_down_nulls (#12239) @ttnghia
Patch CUB DeviceSegmentedSort and remove workaround (#12234) @davidwendt
Fix memory leak in udf_string::assign(&&) function (#12206) @davidwendt
Workaround thrust-copy-if limit in json get_tree_representation (#12190) @davidwendt
Fix page size calculation in Parquet writer (#12182) @etseidl
Add cudf::detail::sizes_to_offsets_iterator to allow checking overflow in offsets (#12180) @davidwendt
Workaround thrust-copy-if limit in wordpiece-tokenizer (#12168) @davidwendt
Floor division uses integer division for integral arguments (#12131) @wence-

📖 Documentation

Fix link to NVTX (#12598) @sameerz
Include missing groupby functions in documentation (#12580) @quasiben
Fix documentation author (#12527) @bdice
Update libcudf reduction docs for casting output types (#12526) @davidwendt
Add JSON reader page in user guide (#12499) @GregoryKimball
Link unsupported iteration API docstrings (#12482) @galipremsagar
strings_udf doc update (#12469) @brandon-b-miller
Update cudf_assert docs with correct NDEBUG behavior (#12464) @robertmaynard
Update pre-commit hooks guide (#12395) @bdice
Update test docs to not use detail comparison utilities (#12332) @PointKernel
Fix doxygen description for regex_program::compute_working_memory_size (#12329) @davidwendt
Add eval to docs. (#12322) @vyasr
Turn on xfail_strict=true (#12244) @wence-
Update 10 minutes to cuDF (#12114) @wence-

🚀 New Features

Use kvikIO as the default IO backend (#12574) @vuule
Use has_nonempty_nulls instead of may_contain_non_empty_nulls in superimpose_nulls and push_down_nulls (#12560) @ttnghia
Add strings methods removeprefix and removesuffix (#12557) @davidwendt
Add regex_program java APIs and unit tests (#12548) @cindyyuanjiang
Default cudf::io::read_json to nested JSON parser (#12544) @vuule
Make string quoting optional on CSV write (#12539) @mythrocks
Use new nvCOMP API to optimize the compression temp memory size (#12533) @vuule
Support "values" orient (array of arrays) in Nested JSON reader (#12498) @karthikeyann
one_hot_encode to use experimental row comparators (#12478) @divyegala
Support %W and %w format specifiers in cudf::strings::to_timestamps (#12475) @davidwendt
Add JSON Writer (#12474) @karthikeyann
Refactor thrust_copy_if into cudf::detail::copy_if_safe (#12455) @ttnghia
Add trailing comma support for nested JSON reader (#12448) @karthikeyann
Extract tokenize_json.hpp detail header from src/io/json/nested_json.hpp (#12432) @ttnghia
JNI bindings to write CSV (#12425) @mythrocks
Nested JSON depth benchmark (#12371) @karthikeyann
Implement lists::reverse (#12336) @ttnghia
Use device_read in experimental read_json (#12314) @vuule
Implement JNI for strings::reverse (#12283) @ttnghia
Null element for parsing error in numeric types in JSON, CSV reader (#12272) @karthikeyann
Add cudf::strings::like function with multiple patterns (#12269) @davidwendt
Add environment variable to control host memory allocation in hostdevice_vector (#12251) @vuule
Add cudf::strings::reverse function (#12227) @davidwendt
Selectively use dictionary encoding in Parquet writer (#12211) @etseidl
Support replace in strings_udf (#12207) @brandon-b-miller
Add support to read binary encoded decimals in parquet (#12205) @PointKernel
Support regex EOL where the string ends with a new-line character (#12181) @davidwendt
Updating stream_compaction/unique to use new row comparators (#12159) @divyegala
Add device buffer datasource (#12024) @PointKernel
Implement groupby apply with JIT (#11452) @bwyogatama

🛠️ Improvements

Update shared workflow branches (#12696) @ajschmidt8
Pin dask and distributed for release (#12695) @galipremsagar
Don't upload libcudf-example to Anaconda.org (#12671) @ajschmidt8
Pin wheel dependencies to same RAPIDS release (#12659) @sevagh
Use CTK 118/cp310 branch of wheel workflows (#12602) @sevagh
Change ways to access ptr in Buffer (#12587) @galipremsagar
Version a parquet writer xfail (#12579) @galipremsagar
Remove column names (#12578) @vuule
Parquet reader optimization to address V100 regression. (#12577) @nvdbaranec
Add support for category dtypes in CSV reader (#12571) @galipremsagar
Remove spill_lock parameter from SpillableBuffer.get_ptr() (#12564) @madsbk
Optimize cudf::make_lists_column (#12547) @ttnghia
Remove cudf::strings::repeat_strings_output_sizes from Java and JNI (#12546) @ttnghia
Test that cuInit is not called when RAPIDS_NO_INITIALIZE is set (#12545) @wence-
Rework repeat_strings to use sizes-to-offsets utility (#12543) @davidwendt
Replace exclusive_scan with sizes_to_offsets in cudf::lists::sequences (#12541) @davidwendt
Rework nvtext::ngrams_tokenize to use sizes-to-offsets utility (#12540) @davidwendt
Fix binary-ops gtests coded in namespace cudf::test (#12536) @davidwendt
More @acquire_spill_lock() and as_buffer(..., exposed=False) (#12535) @madsbk
Guard CUDA runtime APIs with error checking (#12531) @PointKernel
Update TODOs from issue 10432. (#12528) @bdice
Update rapids-cmake definitions version in GitHub Actions style checks. (#12511) @bdice
Switch engine=cudf to the new JSON reader (#12509) @galipremsagar
Fix SUM/MEAN aggregation type support. (#12503) @bdice
Stop using pandas._testing (#12492) @vyasr
Fix ROLLING_TEST gtests coded in namespace cudf::test...

Contributors

benfred, robertmaynard, and 30 other contributors
