Release v22.08.00 · rapidsai/cudf

🚨 Breaking Changes

Remove legacy join APIs (#11274) @vyasr
Remove lists::drop_list_duplicates (#11236) @ttnghia
Remove Index.replace API (#11131) @vyasr
Remove deprecated Index methods from Frame (#11073) @vyasr
Remove public API of cudf.merge_sorted. (#11032) @bdice
Drop python 3.7 in code-base (#11029) @galipremsagar
Return empty dataframe when reading a Parquet file using empty columns option (#11018) @vuule
Remove Arrow CUDA IPC code (#10995) @shwina
Buffer: make .ptr read-only (#10872) @madsbk
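
The .ptr change (#10872) means code that reassigned a buffer's pointer will now fail. Below is a minimal sketch of a read-only attribute in Python, assuming the usual approach of a property with no setter. The Buffer class and the can_rebind_ptr helper are hypothetical stand-ins and are not cudf's implementation:

```python
class Buffer:
    """Hypothetical stand-in for a device buffer; not cudf's Buffer class."""

    def __init__(self, ptr: int, size: int) -> None:
        # Keep the address in a private attribute so callers cannot rebind it.
        self._ptr = ptr
        self._size = size

    @property
    def ptr(self) -> int:
        """Device address of the allocation (read-only: no setter defined)."""
        return self._ptr

    @property
    def size(self) -> int:
        """Size of the allocation in bytes."""
        return self._size


def can_rebind_ptr(buf: Buffer, new_ptr: int) -> bool:
    """Return True if assigning to buf.ptr succeeds, False if it is rejected."""
    try:
        buf.ptr = new_ptr  # a property without a setter raises AttributeError
    except AttributeError:
        return False
    return True


buf = Buffer(ptr=0x7F000000, size=64)
print(can_rebind_ptr(buf, 0))  # False
```

Code that previously wrote to .ptr would need to construct a new buffer instead.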

🐛 Bug Fixes

Fix distributed error related to loop_in_thread (#11428) @galipremsagar
Relax arrow pinning to just 8.x and remove cuda build dependency from cudf recipe (#11412) @kkraus14
Revert "Allow CuPy 11" (#11409) @jakirkham
Fix moto timeouts (#11369) @galipremsagar
Set +/-infinity as the identity values for floating-point numbers in device operators min and max (#11357) @ttnghia
Fix memory_usage() for ListSeries (#11355) @thomcom
Fix constructing Column from column_view with expired mask (#11354) @shwina
Handle parquet corner case: Columns with more rows than are in the row group. (#11353) @nvdbaranec
Fix DatetimeIndex & TimedeltaIndex constructors (#11342) @galipremsagar
Fix unsigned-compare compile warning in IntPow binops (#11339) @davidwendt
Fix performance issue and add a new code path to cudf::detail::contains (#11330) @ttnghia
Pin pytorch to temporarily unblock from libcupti errors (#11289) @galipremsagar
Workaround for nvcomp zstd overwriting blocks for orc due to underestimate of sizes (#11288) @jbrennan333
Fix inconsistency when hashing two tables in cudf::detail::contains (#11284) @ttnghia
Fix issue related to numpy array and category dtype (#11282) @galipremsagar
Raise NotImplementedError when the on parameter is specified in DataFrame.join. (#11275) @vyasr
Fix invalid allocate_like() and empty_like() tests. (#11268) @nvdbaranec
Return DataFrame when concatenating along axis 1 (#11263) @isVoid
Fix compile error due to missing header (#11257) @ttnghia
Fix a memory aliasing/crash issue in scatter for lists. (#11254) @nvdbaranec
Fix tests/rolling/empty_input_test (#11238) @ttnghia
Fix const qualifier when using host_span<bitmask_type const*> (#11220) @ttnghia
Avoid using nvcompBatchedDeflateDecompressGetTempSizeEx in cuIO (#11213) @vuule
Generate benchmark data with correct run length regardless of cardinality (#11205) @vuule
Fix cumulative count index behavior (#11188) @brandon-b-miller
Fix assertion in dask_cudf test_struct_explode (#11170) @rjzamora
Provides a method for the user to remove the hook and re-register the hook in a custom shutdown hook manager (#11161) @res-life
Fix compatibility issues with pandas 1.4.3 (#11152) @vyasr
Ensure cuco export set is installed in cmake build (#11147) @jlowe
Avoid redundant deepcopy in cudf.from_pandas (#11142) @galipremsagar
Fix compile error due to missing header (#11126) @ttnghia
Fix __cuda_array_interface__ failures (#11113) @galipremsagar
Support octal and hex within regex character class pattern (#11112) @davidwendt
Fix split_re matching logic for word boundaries (#11106) @davidwendt
Handle multiple files metadata in read_parquet (#11105) @galipremsagar
Fix index alignment for Series objects with repeated index (#11103) @shwina
FindcuFile now searches in the current CUDA Toolkit location (#11101) @robertmaynard
Fix regex word boundary logic to include underscore (#11099) @davidwendt
Exclude CudaFatalTest when selecting all Java tests (#11083) @jlowe
Fix duplicate cudatoolkit pinning issue (#11070) @galipremsagar
Maintain the input index in the result of a groupby-transform (#11068) @shwina
Fix bug with row count comparison for expect_columns_equivalent(). (#11059) @nvdbaranec
Fix BPE uninitialized size value for null and empty input strings (#11054) @davidwendt
Include missing header for usage of get_current_device_resource() (#11047) @AtlantaPepsi
Fix warn_unused_result error in parquet test (#11026) @karthikeyann
Return empty dataframe when reading a Parquet file using empty columns option (#11018) @vuule
Fix small error in page row count limiting (#10991) @etseidl
Fix a row index entry error in the ORC writer (#10989) @vuule
Fix grouped covariance to require both values to be convertible to double. (#10891) @bdice
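
The min/max identity fix (#11357) relies on a property of reductions: the identity e must satisfy min(e, x) == x for every input x, and for floats only +infinity does this for min (and -infinity for max). In a parallel reduction, the identity also seeds partitions that receive no input. The plain-Python sketch below shows what goes wrong with a finite sentinel. Using the largest finite float as that sentinel is an assumption made for illustration, and the code does not use libcudf's device operators:

```python
import math
import sys
from functools import reduce


def reduce_min(values, identity):
    """Fold values with min, starting from the given identity element."""
    return reduce(min, values, identity)


def reduce_max(values, identity):
    """Fold values with max, starting from the given identity element."""
    return reduce(max, values, identity)


finite_identity = sys.float_info.max  # assumed finite sentinel, for illustration
inf = math.inf

# With a finite sentinel, a column holding only +inf reduces to the wrong value.
print(reduce_min([inf], finite_identity))  # 1.7976931348623157e+308
# With +inf as the identity, min(identity, x) == x for every float x.
print(reduce_min([inf], inf))              # inf
print(reduce_max([-inf], -inf))            # -inf
```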

📖 Documentation

Fix issues with day & night modes in python docs (#11400) @galipremsagar
Update missing data handling APIs in docs (#11345) @galipremsagar
Add lists filtering APIs to doxygen group. (#11336) @bdice
Remove unused import in README sample (#11318) @vyasr
Note null behavior in where docs (#11276) @brandon-b-miller
Update docstring for spans in get_row_data_range (#11271) @vyasr
Update nvCOMP integration table (#11231) @vuule
Add dev docs for documentation writing (#11217) @vyasr
Documentation fix for concatenate (#11187) @dagardner-nv
Fix unresolved links in markdown (#11173) @karthikeyann
Fix cudf version in README.md install commands (#11164) @jvanstraten
Switch language from None to "en" in docs build (#11133) @galipremsagar
Remove docs mentioning scalar_view since no such class exists. (#11132) @bdice
Add docstring entry for DataFrame.value_counts (#11039) @galipremsagar
Add docs to rolling var, std, count. (#11035) @bdice
Fix docs for Numba UDFs. (#11020) @bdice
Replace column comparison utilities functions with macros (#11007) @karthikeyann
Fix Doxygen warnings in multiple headers files (#11003) @karthikeyann
Fix doxygen warnings in utilities/ headers (#10974) @karthikeyann
Fix Doxygen warnings in table header files (#10964) @karthikeyann
Fix Doxygen warnings in column header files (#10963) @karthikeyann
Fix Doxygen warnings in strings / header files (#10937) @karthikeyann
Generate Doxygen Tag File for Libcudf (#10932) @isVoid
Fix doxygen warnings in structs, lists headers (#10923) @karthikeyann
Fix doxygen warnings in fixed_point.hpp (#10922) @karthikeyann
Fix doxygen warnings in ast/, rolling, tdigest/, wrappers/, dictionary/ headers (#10921) @karthikeyann
Fix Doxygen warnings in cudf/io/types.hpp, other header files (#10913) @karthikeyann
Fix Doxygen warnings in cudf/io/ avro, csv, json, orc, parquet header files (#10912) @karthikeyann
Fix doxygen warnings in cudf/*.hpp (#10896) @karthikeyann
Add missing documentation in aggregation.hpp (#10887) @karthikeyann
Revise PR template. (#10774) @bdice

🚀 New Features

Change cmake to allow controlling Arrow version via cmake variable (#11429) @kkraus14
Adding support for list<int8> columns to be written as byte arrays in parquet (#11328) @hyperbolic2346
Adding byte array view structure (#11322) @hyperbolic2346
Adding byte_array statistics (#11303) @hyperbolic2346
Add column indexes to Parquet writer (#11302) @etseidl
Provide an Option for Default Integer and Floating Bitwidth (#11272) @isVoid
FST benchmark (#11243) @karthikeyann
Adds the Finite-State Transducer algorithm (#11242) @elstehle
Refactor collect_set to use cudf::distinct and cudf::lists::distinct (#11228) @ttnghia
Treat zstd as stable in nvcomp releases 2.3.2 and later (#11226) @jbrennan333
Add 24 bit dictionary support to Parquet writer (#11216) @devavret
Enable positive group indices for extractAllRecord on JNI (#11215) @anthony-chang
JNI bindings for NTH_ELEMENT window aggregation (#11201) @mythrocks
Add JNI bindings for extractAllRecord (#11196) @anthony-chang
Add cudf.options (#11193) @isVoid
Add thrift support for parquet column and offset indexes (#11178) @etseidl
Adding binary read/write as options for parquet (#11160) @hyperbolic2346
Support nth_element for window functions (#11158) @mythrocks
Implement lists::distinct and cudf::detail::stable_distinct (#11149) @ttnghia
Implement Groupby pct_change (#11144) @skirui-source
Add JNI for set operations (#11143) @ttnghia
Remove deprecated PER_THREAD_DEFAULT_STREAM (#11134) @jbrennan333
Added a Java method to check the existence of a list of keys in a map (#11128) @razajafri
Feature/python benchmarking (#11125) @vyasr
Support nan_equality in cudf::distinct (#11118) @ttnghia
Added JNI for getMapValueForKeys (#11104) @razajafri
Refactor semi_anti_join (#11100) @ttnghia
Replace remaining instances of rmm::cuda_stream_default with cudf::default_stream_value (#11082) @jbrennan333
Adds the Logical Stack algorithm (#11078) @elstehle
Add doxygen-check pre-commit hook (#11076) @karthikeyann
Use new nvCOMP API to optimize the decompression temp memory size (#11064) @vuule
Add Doxygen CI check (#11057) @karthikeyann
Support duplicate_keep_option in cudf::distinct (#11052) @ttnghia
Support set operations (#11043) @ttnghia
Support for ZLIB compression in ORC writer (#11036) @vuule
Add swaplevel feature (#11027) @VamsiTallam95
Use nvCOMP for ZLIB decompression in ORC reader (#11024) @vuule
Add bfill and ffill functions for #9591 (#11022) @Sreekiran096
Generate group offsets from element labels (#11017) @ttnghia
Feature axes (#10979) @VamsiTallam95
Generate group labels from offsets (#10945) @ttnghia
Add missing cuIO benchmark coverage for duration types (#10933) @vuule
Dask-cuDF cumulative groupby ops (#10889) @brandon-b-miller
Reindex Improvements (#10815) @brandon-b-miller
Implement value_counts for DataFrame (#10813) @martinfalisse
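
The group label and offset utilities (#10945, #11017) convert between two encodings of contiguous groups. One is an offsets array marking where each group starts. The other is a label array giving each row's group. The plain-Python sketch below illustrates only the idea. The function names are hypothetical, and this is neither the libcudf API nor its GPU implementation. It assumes the labels are sorted, so each group's rows are contiguous, and that the group count is passed in so trailing empty groups are preserved:

```python
from typing import List


def labels_from_offsets(offsets: List[int]) -> List[int]:
    """Expand group offsets such as [0, 2, 5, 6] into one group label per row."""
    labels: List[int] = []
    for group, (start, end) in enumerate(zip(offsets, offsets[1:])):
        labels.extend([group] * (end - start))  # empty groups contribute no rows
    return labels


def offsets_from_labels(labels: List[int], num_groups: int) -> List[int]:
    """Recover offsets from sorted labels by counting rows per group."""
    counts = [0] * num_groups
    for label in labels:
        counts[label] += 1
    offsets = [0]
    for count in counts:
        offsets.append(offsets[-1] + count)  # running sum gives each group's end
    return offsets


offsets = [0, 2, 5, 6]
labels = labels_from_offsets(offsets)
print(labels)                                     # [0, 0, 1, 1, 1, 2]
print(offsets_from_labels(labels, num_groups=3))  # [0, 2, 5, 6]
```

The two functions are inverses for sorted labels, which is what makes it cheap to switch between a per-row view and a per-group view of the same grouping.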

🛠️ Improvements

Pin dask & distributed for release (#11433) @galipremsagar
Use documented header template for doxygen (#11430) @galipremsagar
Relax arrow version in dev env (#11418) @galipremsagar
Allow CuPy 11 (#11393) @jakirkham
Improve multibyte_split performance (#11347) @cwharris
Switch death test to use explicit trap. (#11326) @vyasr
Add --output-on-failure to ctest args. (#11321) @vyasr
Consolidate remaining DataFrame/Series APIs (#11315) @vyasr
Add JNI support for the join_strings API (#11309) @revans2
Add cupy version to setup.py install_requires (#11306) @vyasr
Remove some unused code (#11305) @hyperbolic2346
Add test of wildcard selection (#11300) @vyasr
Update parquet reader to take stream parameter (#11294) @PointKernel
Spark list hashing (#11292) @bdice
Remove legacy join APIs (#11274) @vyasr
Fix cudf recipes syntax (#11273) @ajschmidt8
Fix cudf recipe (#11267) @ajschmidt8
Cleanup config files (#11266) @vyasr
Run mypy on all packages (#11265) @vyasr
Update to isort 5.10.1. (#11262) @vyasr
Consolidate flake8 and pydocstyle configuration (#11260) @vyasr
Remove redundant black config specifications. (#11258) @vyasr
Ensure DeprecationWarnings are not introduced via pre-commit (#11255) @wence-
Optimization to gpu::PreprocessColumnData in parquet reader. (#11252) @nvdbaranec
Move rolling impl details to detail/ directory. (#11250) @mythrocks
Remove lists::drop_list_duplicates (#11236) @ttnghia
Use cudf::lists::distinct in Python binding (#11234) @ttnghia
Use cudf::lists::distinct in Java binding (#11233) @ttnghia
Use cudf::distinct in Java binding (#11232) @ttnghia
Pin dask-cuda in dev environment (#11229) @galipremsagar
Remove cruft in map_lookup (#11221) @mythrocks
Deprecate skiprows & num_rows in parquet reader (#11218) @galipremsagar
Remove Frame._index (#11210) @vyasr
Improve performance for cudf::contains when searching for a scalar (#11202) @ttnghia
Document why the Development component is needed for CMake. (#11200) @vyasr
Clean up unused code in rolling_test.hpp (#11195) @karthikeyann
Standardize join internals around DataFrame (#11184) @vyasr
Move character case table declarations from src to detail (#11183) @davidwendt
Remove usage of Frame in StringMethods (#11181) @vyasr
Expose get_json_object_options to Python (#11180) @SrikarVanavasam
Fix decimal128 stats in parquet writer (#11179) @etseidl
Modify CheckPageRows in parquet_test to use datasources (#11177) @etseidl
Pin max version of cuda-python to 11.7.0 (#11174) @Ethyling
Refactor and optimize Frame.where (#11168) @vyasr
Add npos const static member to cudf::string_view (#11166) @davidwendt
Move _drop_rows_by_label from Frame to IndexedFrame (#11157) @vyasr
Clean up _copy_type_metadata (#11156) @vyasr
Add nvcc conda package in dev environment (#11154) @galipremsagar
Struct binary comparison op functionality for Spark RAPIDS (#11153) @rwlee
Refactor inline conditionals. (#11151) @bdice
Refactor Spark hashing tests (#11145) @bdice
Add new _from_data_like_self factory (#11140) @vyasr
Update get_cucollections to use rapids-cmake (#11139) @vyasr
Remove unnecessary extra function for libcudacxx detection (#11138) @vyasr
Allow initial value for cudf::reduce and cudf::segmented_reduce. (#11137) @SrikarVanavasam
Remove Index.replace API (#11131) @vyasr
Move char-type table function declarations from src to detail (#11127) @davidwendt
Clean up repo root (#11124) @bdice
Improve print formatting of strings containing newline characters. (#11108) @nvdbaranec
Fix cudf::string_view::find() to return pos for empty string argument (#11107) @davidwendt
Forward-merge branch-22.06 to branch-22.08 (#11086) @bdice
Take iterators by value in clamp.cu. (#11084) @bdice
Performance improvements for row to column conversions (#11075) @hyperbolic2346
Remove deprecated Index methods from Frame (#11073) @vyasr
Use per-page max compressed size estimate for compression (#11066) @devavret
Refactor column-to-row conversion for performance (#11063) @hyperbolic2346
Include skbuild directory into build.sh clean operation (#11060) @galipremsagar
Unpin dask & distributed for development (#11058) @galipremsagar
Add support for Series.between (#11051) @galipremsagar
Fix groupby include (#11046) @bwyogatama
Regex cleanup internal reclass and reclass_device classes (#11045) @davidwendt
Remove public API of cudf.merge_sorted. (#11032) @bdice
Drop python 3.7 in code-base (#11029) @galipremsagar
Addition & integration of the integer power operator (#11025) @AtlantaPepsi
Refactor lists::contains (#11019) @ttnghia
Change build.sh to find C++ library by default and avoid shadowing CMAKE_ARGS (#11013) @vyasr
Clean up parquet unit test (#11005) @PointKernel
Add missing #pragma once to header files (#11004) @karthikeyann
Cleanup iterator.cuh and add fixed point support for scalar_optional_accessor (#10999) @ttnghia
Refactor cudf::contains (#10997) @ttnghia
Remove Arrow CUDA IPC code (#10995) @shwina
Change file extension for groupby benchmark (#10985) @ttnghia
Sort recipe include checks. (#10984) @bdice
Update cuCollections for thrust upgrade (#10983) @PointKernel
Expose row-group size options in cudf ParquetWriter (#10980) @rjzamora
Cleanup cudf::strings::detail::regex_parser class source (#10975) @davidwendt
Handle missing fields as nulls in get_json_object() (#10970) @SrikarVanavasam
Fix license families to match all-caps expected by conda-verify. (#10931) @bdice
Include <optional> for GCC 11 compatibility. (#10927) @bdice
Enable builds with scikit-build (#10919) @vyasr
Improve distinct by using cuco::static_map::retrieve_all (#10916) @PointKernel
Update cudfjni to 22.08.0-SNAPSHOT (#10910) @pxLi
Improve the capture of fatal cuda error (#10884) @sperlingxx
Cleanup regex compiler operators and operands source (#10879) @davidwendt
Buffer: make .ptr read-only (#10872) @madsbk
Configurable NaN handling in device_row_comparators (#10870) @rwlee
Register cudf.core.groupby.Grouper objects to dask grouper_dispatch (#10838) @brandon-b-miller
Upgrade to arrow-8 (#10816) @galipremsagar
Remove getattr method in RangeIndex class (#10538) @skirui-source
Adding bins to value counts (#8247) @marlenezw
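
Series.between (#11051) returns a boolean mask indicating which values lie between two bounds. Below is a plain-Python sketch of those semantics, modeled on the pandas method it mirrors. The inclusive options shown ("both", "neither", "left", "right") come from pandas and are assumed, not confirmed, to match cudf 22.08 exactly. Null handling is not modeled:

```python
from typing import Callable, Dict, List


def between(values: List[float], left: float, right: float,
            inclusive: str = "both") -> List[bool]:
    """Element-wise bounds test, modeled on pandas' Series.between options."""
    checks: Dict[str, Callable[[float], bool]] = {
        "both": lambda v: left <= v <= right,
        "neither": lambda v: left < v < right,
        "left": lambda v: left <= v < right,
        "right": lambda v: left < v <= right,
    }
    if inclusive not in checks:
        raise ValueError(f"inclusive must be one of {sorted(checks)}")
    check = checks[inclusive]
    return [check(v) for v in values]


data = [1, 2, 3, 4]
print(between(data, 2, 3))                       # [False, True, True, False]
print(between(data, 2, 3, inclusive="neither"))  # [False, False, False, False]
```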
