forked from apache/arrow
[pull] main from apache:main #179
Open. pull wants to merge 1,032 commits into sysfce2:main from apache:main
…kaging workflows to build against MATLAB `R2024b` (#44704) ### Rationale for this change MATLAB [R2024b](https://www.mathworks.com/products/new_products/latest_features.html) is now available for use with the [matlab-actions/setup-matlab](https://github.com/matlab-actions/setup-matlab) GitHub Action. We should update the [matlab.yml CI workflow](https://github.com/apache/arrow/blob/aab7d81aeec1e8f106bdda953cdeb00f7f78b355/.github/workflows/matlab.yml#L126), as well as the [crossbow packaging workflows for the MATLAB MLTBX files](https://github.com/apache/arrow/blob/aab7d81aeec1e8f106bdda953cdeb00f7f78b355/dev/tasks/matlab/github.yml#L34) to build against R2024b. ### What changes are included in this PR? 1. Updated the `.github/workflows/matlab.yml` CI workflow file to build the MATLAB Interface against MATLAB R2024b. 2. Updated the `dev/tasks/matlab/github.yml` `crossbow` packaging workflow to build the MATLAB MLTBX files against MATLAB R2024b. 3. Bumped `mathworks/libmexclass` version to commit [cac7c3630a086bd5ba41413af44c833cef189c09](mathworks/libmexclass@cac7c36) to work around mathworks/libmexclass#92 ### Are these changes tested? Yes. 1. CI workflow [successfully passed on all platforms in `mathworks/arrow`](https://github.com/mathworks/arrow/actions/runs/11805483560). 2. Crossbow job: https://github.com/ursacomputing/crossbow/actions/runs/11805816426. ### Are there any user-facing changes? Yes. 1. All changes to the MATLAB interface will now be built against R2024b. 2. The MATLAB MLTBX release artifacts will now be built against R2024b. ### Notes 1. Thank you @ sgilmore10 for your help with this pull request! * GitHub Issue: #44703 Authored-by: Kevin Gurney <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
Bumps [cross-spawn](https://github.com/moxystudio/node-cross-spawn) from 7.0.3 to 7.0.5.

Changelog (sourced from [cross-spawn's changelog](https://github.com/moxystudio/node-cross-spawn/blob/master/CHANGELOG.md)):

- [7.0.5](https://github.com/moxystudio/node-cross-spawn/compare/v7.0.4...v7.0.5) (2024-11-07), Bug Fixes: fix escaping bug introduced by backtracking ([640d391](https://github.com/moxystudio/node-cross-spawn/commit/640d391fde65388548601d95abedccc12943374f))
- [7.0.4](https://github.com/moxystudio/node-cross-spawn/compare/v7.0.3...v7.0.4) (2024-11-07), Bug Fixes: disable regexp backtracking ([#160](https://redirect.github.com/moxystudio/node-cross-spawn/issues/160)) ([5ff3a07](https://github.com/moxystudio/node-cross-spawn/commit/5ff3a07d9add449021d806e45c4168203aa833ff))

Commits:

- [`0852683`](https://github.com/moxystudio/node-cross-spawn/commit/085268352dcbcad8064c64c5efb25268b4023184) chore(release): 7.0.5
- [`640d391`](https://github.com/moxystudio/node-cross-spawn/commit/640d391fde65388548601d95abedccc12943374f) fix: fix escaping bug introduced by backtracking
- [`bff0c87`](https://github.com/moxystudio/node-cross-spawn/commit/bff0c87c8b627c4e6d04ec2449e733048bebb464) chore: remove codecov
- [`a7c6abc`](https://github.com/moxystudio/node-cross-spawn/commit/a7c6abc6fee79641d45b452fe6217deaa1bd0973) chore: replace travis with github workflows
- [`9b9246e`](https://github.com/moxystudio/node-cross-spawn/commit/9b9246e0969e86656d7ccd527716bc3c18842a19) chore(release): 7.0.4
- [`5ff3a07`](https://github.com/moxystudio/node-cross-spawn/commit/5ff3a07d9add449021d806e45c4168203aa833ff) fix: disable regexp backtracking ([#160](https://redirect.github.com/moxystudio/node-cross-spawn/issues/160))
- [`9521e2d`](https://github.com/moxystudio/node-cross-spawn/commit/9521e2da94d94998f948e0455903e62d87884600) chore: fix tests in recent node js versions
- [`97ded39`](https://github.com/moxystudio/node-cross-spawn/commit/97ded399e9c9ae325040fc52274c1cd4def357f8) chore: convert package lock
- [`d52b6b9`](https://github.com/moxystudio/node-cross-spawn/commit/d52b6b9da499ca464e609162a6afeb326f1dbbb1) chore: remove unused argument ([#156](https://redirect.github.com/moxystudio/node-cross-spawn/issues/156))
- [`5d84384`](https://github.com/moxystudio/node-cross-spawn/commit/5d843849e1ed434b7030e0aa49281c2bf4ad2e71) chore: add travis jobs on ppc64le ([#142](https://redirect.github.com/moxystudio/node-cross-spawn/issues/142))
- Additional commits viewable in [compare view](https://github.com/moxystudio/node-cross-spawn/compare/v7.0.3...v7.0.5)

[![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=cross-spawn&package-manager=npm_and_yarn&previous-version=7.0.3&new-version=7.0.5)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@ dependabot rebase`.

Dependabot commands and options. You can trigger Dependabot actions by commenting on this PR:

- `@ dependabot rebase` will rebase this PR
- `@ dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@ dependabot merge` will merge this PR after your CI passes on it
- `@ dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@ dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@ dependabot reopen` will reopen this PR if it is closed
- `@ dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@ dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
- `@ dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@ dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@ dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/apache/arrow/network/alerts).

Authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Signed-off-by: Sutou Kouhei <[email protected]>
### Rationale for this change If we have a nightly CI job to test offline builds, we will be able to notice when offline build support is broken. ### What changes are included in this PR? * Add the `ARROW_OFFLINE` option that uses `cpp/thirdparty/download_dependencies.sh` and disables DNS resolution * Add the `ubuntu-cpp-bundled-offline` service that enables the `ARROW_OFFLINE` option * Add the `test-ubuntu-24.04-cpp-bundled-offline` nightly job ### Are these changes tested? Yes. ### Are there any user-facing changes? No. * GitHub Issue: #27919 Lead-authored-by: Raúl Cumplido <[email protected]> Co-authored-by: Sutou Kouhei <[email protected]> Co-authored-by: Sutou Kouhei <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
### Rationale for this change This PR aims to upgrade `orc-core` to 1.9.5. ### What changes are included in this PR? To use the latest bug-fixed version in testing. - https://orc.apache.org/news/2024/11/14/ORC-1.9.5/ ### Are these changes tested? Pass the CIs. ### Are there any user-facing changes? No, this is a test dependency. Authored-by: Dongjoon Hyun <[email protected]> Signed-off-by: David Li <[email protected]>
…uilds (#44755) ### Rationale for this change `<Windows.h>` doesn't work when cross-compiling to Windows on a case-sensitive file system. ### What changes are included in this PR? Use the lowercase `<windows.h>`. ### Are these changes tested? Yes. ### Are there any user-facing changes? No. * GitHub Issue: #44754 Authored-by: Maarten Pronk <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
### Rationale for this change I noticed a few typos in the documentation and a single typo in the pyproject file. This change corrects these typos. ### What changes are included in this PR? The following spelling corrections have been applied: fo -> to scenerios -> scenarios wich -> which suiteable -> suitable avaiable -> available (this is in the pyproject.toml file regarding a NumPy comment for older Python versions) ### Are these changes tested? No executable code is modified with these changes. ### Are there any user-facing changes? Only the documentation is being updated. Authored-by: Tyler White <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
### Rationale for this change This PR aims to upgrade ORC to 2.0.3 to bring the following bug fixes. - https://orc.apache.org/news/2024/08/15/ORC-2.0.2/ - apache/orc#1963 - apache/orc#1997 - apache/orc#1981 - https://orc.apache.org/news/2024/11/14/ORC-2.0.3/ - apache/orc#2055 ### What changes are included in this PR? To use the latest bug fixed version. ### Are these changes tested? This should pass the CIs. ### Are there any user-facing changes? No. * GitHub Issue: #44744 Authored-by: Dongjoon Hyun <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
### Rationale for this change There is a minor typo in the test case. ### What changes are included in this PR? Fix the simple typo in the test. ### Are these changes tested? Yes. ### Are there any user-facing changes? No. Authored-by: c8ef <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
#44750) ### Rationale for this change

```console
$ pre-commit run --show-diff-on-failure --color=always --all-files shellcheck
ShellCheck v0.10.0.......................................................Failed
- hook id: shellcheck
- exit code: 1

In ci/scripts/c_glib_test.sh line 25:
: ${ARROW_GLIB_VAPI:=true}
^----------------------^ SC2223 (info): This default assignment may cause DoS due to globbing. Quote it.

In ci/scripts/c_glib_test.sh line 37:
pushd ${source_dir}
^-----------^ SC2086 (info): Double quote to prevent globbing and word splitting.

Did you mean:
pushd "${source_dir}"

In ci/scripts/c_glib_test.sh line 54:
pushd ${build_dir}
^----------^ SC2086 (info): Double quote to prevent globbing and word splitting.

Did you mean:
pushd "${build_dir}"

For more information:
https://www.shellcheck.net/wiki/SC2086 -- Double quote to prevent globbing ...
https://www.shellcheck.net/wiki/SC2223 -- This default assignment may cause...
```

### What changes are included in this PR? * Add missing quotes. ### Are these changes tested? Yes. ### Are there any user-facing changes? No. * GitHub Issue: #44749 Authored-by: Sutou Kouhei <[email protected]> Signed-off-by: Jacob Wujciak-Jens <[email protected]>
…on (#44701) ### Rationale for this change Currently `from_buffers` does not work with StringView in Python because we validate against `num_buffers`. This only takes into account the mandatory buffers and ignores the `variadic_spec` that can be present for both `string_view` and `binary_view`. ### What changes are included in this PR? Take into account whether the type contains a `variadic_spec` for the non-mandatory buffers and, if so, only check the lower bound on the number of buffers. ### Are these changes tested? Yes, I've added a couple of tests. ### Are there any user-facing changes? We are exposing a new method on the Python DataType, `has_variadic_buffers`, which tells us whether the number of buffers expected is only lower-bounded by `num_buffers`. * GitHub Issue: #44651 Authored-by: Raúl Cumplido <[email protected]> Signed-off-by: Raúl Cumplido <[email protected]>
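The buffer-count rule described above can be sketched in plain Python. This is a minimal illustration, not pyarrow's implementation: `check_buffer_count` and its parameters are hypothetical names, and only the idea (exact match for fixed layouts, lower bound for variadic ones) comes from the commit message.

```python
def check_buffer_count(n_buffers: int, num_buffers: int,
                       has_variadic_buffers: bool) -> None:
    """Raise ValueError if the supplied buffer count does not fit the type.

    Hypothetical helper: `num_buffers` is the number of mandatory buffers,
    `has_variadic_buffers` says whether extra buffers are allowed.
    """
    if has_variadic_buffers:
        # Variadic layouts (e.g. string_view) only have a lower bound.
        if n_buffers < num_buffers:
            raise ValueError(
                f"expected at least {num_buffers} buffers, got {n_buffers}")
    elif n_buffers != num_buffers:
        # Fixed layouts must match the mandatory count exactly.
        raise ValueError(
            f"expected exactly {num_buffers} buffers, got {n_buffers}")


def is_accepted(n_buffers: int, num_buffers: int, variadic: bool) -> bool:
    """Convenience wrapper returning True/False instead of raising."""
    try:
        check_buffer_count(n_buffers, num_buffers, variadic)
    except ValueError:
        return False
    return True
```

With this rule, three buffers are accepted for a variadic type that declares two mandatory buffers, but rejected for a fixed layout that declares two.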
…es (#44768) ### Rationale for this change See issue. ### What changes are included in this PR? For `ToLittleEndian`/`ToBigEndian`, the result should always be in the specified endianness, not depend on host order. In the test, instead of casting the `uint8_t` data into a `uint16_t` (with unspecified endianness handling), compare the bytes directly in their expected orders. ### Are these changes tested? Tested on little-endian, still building for big-endian. ### Are there any user-facing changes? Fixes #44767 * GitHub Issue: #44767 Authored-by: Elliott Sales de Andrade <[email protected]> Signed-off-by: Antoine Pitrou <[email protected]>
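A small Python sketch of the property stated above: the output byte order is fixed by the requested endianness and does not depend on the host. The function names here are illustrative stand-ins, not Arrow's C++ API, and the comparison is done byte by byte, as the updated test does.

```python
import sys


def to_little_endian(value: int, width: int) -> bytes:
    """Serialize an unsigned integer with the least significant byte first."""
    return value.to_bytes(width, "little")


def to_big_endian(value: int, width: int) -> bytes:
    """Serialize an unsigned integer with the most significant byte first."""
    return value.to_bytes(width, "big")


# The results are the same on every host, whatever sys.byteorder reports.
host_order = sys.byteorder  # "little" or "big"; deliberately unused below
little = to_little_endian(0x1234, 2)
big = to_big_endian(0x1234, 2)

# Compare individual bytes in their expected order instead of
# reinterpreting them as a native 16-bit integer.
assert list(little) == [0x34, 0x12]
assert list(big) == [0x12, 0x34]
```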
…g a map type via the C data interface (#44715) ### Rationale for this change Importing a map type from the C data interface drops field metadata (including extension type information), which does not happen when importing a map type from IPC or a list of structs. This affects the ability to roundtrip data through pyarrow/Arrow C++ if extension types are not registered. ### What changes are included in this PR? The mechanism to import the map type was changed to align with the method used for IPC import. ### Are these changes tested? Yes. ### Are there any user-facing changes? The current behaviour was surprising/inconsistent, so I think this PR brings it more in line with the current expectation/documentation. * GitHub Issue: #44714 Authored-by: Dewey Dunnington <[email protected]> Signed-off-by: Antoine Pitrou <[email protected]>
…sabled (#44737) ### Rationale for this change The Async C Device Stream Interface unit tests require threading to be enabled, but a couple of our CI runs have `ARROW_ENABLE_THREADING` disabled. ### What changes are included in this PR? The Async C Device Stream interface tests are guarded with `#ifdef ARROW_ENABLE_THREADING` to prevent CI timeouts. * GitHub Issue: #44734 Authored-by: Matt Topol <[email protected]> Signed-off-by: Antoine Pitrou <[email protected]>
…5285) ### Rationale for this change The level histogram of size statistics can be omitted if its max level is 0. We haven't implemented this yet and enforce the histogram size to be equal to `max_level + 1`. As a result, an exception is thrown when reading a Parquet file with an omitted level histogram. ### What changes are included in this PR? Omit the level histogram when the max level is 0. ### Are these changes tested? Yes, a test case has been added to reflect the change. ### Are there any user-facing changes? No. * GitHub Issue: #45283 Lead-authored-by: Gang Wu <[email protected]> Co-authored-by: Antoine Pitrou <[email protected]> Signed-off-by: Gang Wu <[email protected]>
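The relaxed size check can be sketched as follows. This is an assumption-laden illustration of the rule in the commit message, not the Parquet C++ code: `validate_level_histogram` is a hypothetical name, and the histogram is modelled as a plain list of counts.

```python
from typing import Sequence


def validate_level_histogram(histogram: Sequence[int], max_level: int) -> None:
    """Check the size of a level histogram against its max level.

    Hypothetical helper: an omitted (empty) histogram is accepted when
    max_level is 0; otherwise there must be one bucket per level.
    """
    if max_level == 0 and len(histogram) == 0:
        return  # the histogram was omitted, which is allowed at max level 0
    if len(histogram) != max_level + 1:
        raise ValueError(
            f"histogram has {len(histogram)} entries, "
            f"expected {max_level + 1}")


def is_valid(histogram: Sequence[int], max_level: int) -> bool:
    """Convenience wrapper returning True/False instead of raising."""
    try:
        validate_level_histogram(histogram, max_level)
    except ValueError:
        return False
    return True
```

Under the old strict rule the first case below would have been rejected; with the relaxed rule an empty histogram is only an error when `max_level` is greater than 0.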
…45249) ### Rationale for this change Benchmark data shows that enabling page index and size stats by default does not have significant penalty. ### What changes are included in this PR? Enable the parquet writer to generate page index and size stats by default. ### Are these changes tested? Pass CIs. ### Are there any user-facing changes? No. * GitHub Issue: #45227 Authored-by: Gang Wu <[email protected]> Signed-off-by: Gang Wu <[email protected]>
### Rationale for this change GitHub hosted arm runner is available: https://github.blog/changelog/2025-01-16-linux-arm64-hosted-runners-now-available-for-free-in-public-repositories-public-preview/ ### What changes are included in this PR? Use `ubuntu-24.04-arm` instead of self-hosted runner. ### Are these changes tested? Yes. ### Are there any user-facing changes? No. * GitHub Issue: #45307 Authored-by: Sutou Kouhei <[email protected]> Signed-off-by: Raúl Cumplido <[email protected]>
### Rationale for this change See #44363. This improves compatibility with other Flight implementations and means user code works with empty data without needing to treat it as a special case to work around this limitation. ### What changes are included in this PR? * Adds new async overloads of `FlightClient.StartPut` that immediately send the schema, before any data batches are sent. * Updates the test server to send the schema on `DoGet` even when there are no data batches. * Enables the `primitive_no_batches` test case for C# Flight. ### Are these changes tested? Yes, using a new unit test and with the integration tests. ### Are there any user-facing changes? Yes. New overloads of the `FlightClient.StartPut` method have been added that are async and accept a `Schema` parameter, and ensure the schema is sent when no data batches are sent. * GitHub Issue: #44363 Authored-by: Adam Reeve <[email protected]> Signed-off-by: Curt Hagenlocher <[email protected]>
### What changes are included in this PR? Use latest Minio server release, which includes a fix for minio/minio#20845 This allows us to remove the boto3 version constraint. ### Are these changes tested? Yes, by existing CI tests. ### Are there any user-facing changes? Yes. * GitHub Issue: #45305 Authored-by: Antoine Pitrou <[email protected]> Signed-off-by: Raúl Cumplido <[email protected]>
…pyarrow (#45189) ### Rationale for this change We are using two deprecated C++ APIs: - we are using `decimal` instead of `smallest_decimal` - we are using `arrow::Status` on `GetRecordBatchReader` instead of `arrow::Result` ### What changes are included in this PR? Update the code to use non-deprecated functions. ### Are these changes tested? Yes, via CI with existing tests. ### Are there any user-facing changes? No * GitHub Issue: #45129 Authored-by: Raúl Cumplido <[email protected]> Signed-off-by: Raúl Cumplido <[email protected]>
…safe integer check for fractional part (#45251)

### Rationale for this change

### What changes are included in this PR?

This PR uses the string constructor for BigInt to calculate the power of 10, with the scale as the exponent, to fix precision loss like:

```typescript
> BigInt(10) ** BigInt(25);
10000000000000000000000000n
> BigInt(Math.pow(10, 25))
10000000000000000905969664n
```

Also, we remove the unnecessary safe integer check for the fraction part.

### Are these changes tested?

Yes, some unit tests were added.

### Are there any user-facing changes?

No.

* GitHub Issue: #45250 --------- Co-authored-by: Paul Taylor <[email protected]>
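The same precision loss can be reproduced outside JavaScript. The Python sketch below is only an analogy to the fix described above, assuming IEEE 754 doubles: the helper names are hypothetical, and building the power of ten from a digit string mirrors the "string constructor" idea rather than the actual Arrow JS code.

```python
def pow10_via_float(scale: int) -> int:
    """Compute 10**scale through a double, as Math.pow does in JavaScript.

    The value is rounded to the nearest representable double first,
    so large powers of ten come back slightly wrong.
    """
    return int(float(10 ** scale))


def pow10_via_string(scale: int) -> int:
    """Build 10**scale from its decimal digits, with no float involved."""
    return int("1" + "0" * scale)


exact = pow10_via_string(25)
lossy = pow10_via_float(25)
error = lossy - exact  # non-zero: the double cannot hold 10**25 exactly
```

Powers of ten up to `10**22` fit exactly in a double, so the two helpers agree there; the difference only shows up for larger scales.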
### Rationale for this change [Array::Validate](https://arrow.apache.org/docs/cpp/api/array.html#_CPPv4NK5arrow5Array8ValidateEv) is available in the C++ API, but GLib doesn't support that method yet. ### What changes are included in this PR? This PR adds a validation method to the array class. ### Are these changes tested? Yes. ### Are there any user-facing changes? Yes. * GitHub Issue: #44757 Authored-by: Hiroyuki Sato <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
…d MinIO (#45310) ### Rationale for this change Some AWS SDK versions have faulty chunked encoding when the body is 0 bytes: aws/aws-sdk-cpp#3259 ### What changes are included in this PR? Work around faulty chunked encoding implementation by only setting a body stream if non-empty. ### Are these changes tested? Locally for now, but will be picked by CI (and conda-forge) at some point. ### Are there any user-facing changes? No. * GitHub Issue: #45304 Authored-by: Antoine Pitrou <[email protected]> Signed-off-by: Antoine Pitrou <[email protected]>
### Rationale for this change We don't need to use Artifactory for NuGet packages because they are not published to Artifactory. ### What changes are included in this PR? * Use GitHub Releases for easier maintenance * Use `.github/workflows/csharp.yml` instead of a nightly job for easier maintenance ### Are these changes tested? Almost yes. ### Are there any user-facing changes? No. * GitHub Issue: #45316
### Rationale for this change Add a "rank_quantile" function following the Wikipedia definition: https://en.wikipedia.org/wiki/Percentile_rank ### Are these changes tested? Yes. ### Are there any user-facing changes? Yes, an additional compute function. * GitHub Issue: #45190 Lead-authored-by: Antoine Pitrou <[email protected]> Co-authored-by: Rossi Sun <[email protected]> Co-authored-by: Antoine Pitrou <[email protected]> Signed-off-by: Antoine Pitrou <[email protected]>
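For reference, the percentile rank definition on the linked Wikipedia page can be sketched in pure Python. This is an illustration of that formula only, expressed as a fraction in [0, 1] instead of a percentage; it is an assumption that this matches the new compute function's output scale, and null handling and options are not modelled.

```python
from bisect import bisect_left, bisect_right
from typing import List, Sequence


def rank_quantile(values: Sequence[float]) -> List[float]:
    """Quantile rank of each value: (count below + 0.5 * count equal) / N.

    This is the Wikipedia percentile rank (CF - 0.5 * F) / N, where CF
    counts values less than or equal to the score and F counts ties.
    """
    ordered = sorted(values)
    n = len(ordered)
    ranks = []
    for v in values:
        below = bisect_left(ordered, v)          # values strictly less than v
        ties = bisect_right(ordered, v) - below  # values equal to v
        ranks.append((below + 0.5 * ties) / n)
    return ranks


ranks = rank_quantile([1, 2, 2, 3])
```

Tied values share the same rank, and the ranks are symmetric around 0.5 for a symmetric input such as `[1, 2, 2, 3]`.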
…t for encrypted file (#44043) ### Rationale for this change Limit the number of row groups when building a Parquet file. ### What changes are included in this PR? Limit the number of row groups when building a Parquet file. ### Are these changes tested? No ### Are there any user-facing changes? No * GitHub Issue: #44042 Lead-authored-by: mwish <[email protected]> Co-authored-by: mwish <[email protected]> Co-authored-by: Antoine Pitrou <[email protected]> Signed-off-by: mwish <[email protected]>
### Rationale for this change [Array::ValidateFull](https://arrow.apache.org/docs/cpp/api/array.html#_CPPv4NK5arrow5Array12ValidateFullEv) is available in the C++ API, but GLib doesn't support that method yet. ### What changes are included in this PR? This PR adds a validation method to the array class. ### Are these changes tested? Yes. ### Are there any user-facing changes? Yes. * GitHub Issue: #44758 Authored-by: Hiroyuki Sato <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
…nalPackage (#45349) ### Rationale for this change We need to convert it on Windows. ### What changes are included in this PR? Convert it to CMake path. ### Are these changes tested? Yes. ### Are there any user-facing changes? Yes. * GitHub Issue: #39023 Authored-by: Sutou Kouhei <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
### Rationale for this change [RecordBatch::Validate](https://arrow.apache.org/docs/cpp/api/table.html#_CPPv4NK5arrow11RecordBatch8ValidateEv) is available in the C++ API, but GLib doesn't support that method yet. ### What changes are included in this PR? This PR adds a validation method to the record-batch class. Before this change, `Validate()` was called implicitly in `garrow_record_batch_new`. This PR removes that call and exposes validation as a separate method, so users need to call `garrow_record_batch_validate()` explicitly. This is a backward-incompatible change. ### Are these changes tested? Yes. ### Are there any user-facing changes? Yes. **This PR includes breaking changes to public APIs.** * GitHub Issue: #44759 Authored-by: Hiroyuki Sato <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
…#45355) ### Rationale for this change #45353 implemented the `garrow_record_batch_validate()` method, but placed it in the wrong location. We need to move the code so that it can be used as a C API. ### What changes are included in this PR? Move `garrow_record_batch_validate()` to between `G_BEGIN_DECLS` and `G_END_DECLS`. ### Are these changes tested? Yes. ### Are there any user-facing changes? Yes. * GitHub Issue: #45354 Authored-by: Hiroyuki Sato <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
…om.xml to detect version (#45348) ### Rationale for this change java/pom.xml was moved to apache/arrow-java from apache/arrow. ### What changes are included in this PR? Detect version from cpp/CMakeLists.txt. ### Are these changes tested? Yes. ### Are there any user-facing changes? No. * GitHub Issue: #45347 Authored-by: Sutou Kouhei <[email protected]> Signed-off-by: Raúl Cumplido <[email protected]>
### Rationale for this change See #45120 ### What changes are included in this PR? Disable pointless test ### Are these changes tested? N/A ### Are there any user-facing changes? No * GitHub Issue: #45357 Lead-authored-by: David Li <[email protected]> Co-authored-by: Raúl Cumplido <[email protected]> Signed-off-by: Raúl Cumplido <[email protected]>
…5359) ### Rationale for this change Add a MemoryPool method to print allocator-specific statistics to stderr, to help diagnose perceived memory consumption issues. Also add missing Python bindings for `MemoryPool::total_bytes_allocated` and `MemoryPool::num_allocations`. ### Are these changes tested? Yes. ### Are there any user-facing changes? No. * GitHub Issue: #45358 Authored-by: Antoine Pitrou <[email protected]> Signed-off-by: Antoine Pitrou <[email protected]>
…alculation for fixed length and null masks (#45336) ### Rationale for this change #45334 ### What changes are included in this PR? 1. An all-mighty test case that can effectively reveal all the bugs mentioned in the issue; 2. Other than directly fixing the bugs (actually simply casting to 64-bit somewhere in the multiplication will do), I did some refinement to the buffer accessors of the row table, in order to eliminate more potential similar issues (which I believe do exist): 1. `null_masks()` -> `null_masks(row_id)` which does overflow-safe indexing inside; 2. `is_null(row_id, col_pos)` which does overflow-safe indexing and directly gets the bit of the column; 3. `data(1)` -> `fixed_length_rows(row_id)` which first asserts the row table being fixed-length, then does overflow-safe indexing inside; 4. `data(2)` -> `var_length_rows()` which only asserts the row table being var-length. It is supposed to be paired by the `offsets()` (which is already 64-bit by #43389 ); 5. The `data(0/1/2)` members are made private. 3. The AVX2 specializations are fixed individually by using 64-bit multiplication and indexing. ### Are these changes tested? Yes. ### Are there any user-facing changes? None. * GitHub Issue: #45334 Authored-by: Rossi Sun <[email protected]> Signed-off-by: Rossi Sun <[email protected]>
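The kind of overflow being fixed can be illustrated in Python by masking the product to 32 bits. This is a sketch under the assumption of unsigned 32-bit wraparound; the actual C++ types and accessors in the row table may differ, and the function names are hypothetical.

```python
UINT32_MASK = 0xFFFFFFFF  # keeps only the low 32 bits, like a uint32_t


def row_offset_32bit(row_id: int, row_width: int) -> int:
    """Offset computed as if the multiplication were done in 32 bits."""
    return (row_id * row_width) & UINT32_MASK


def row_offset_64bit(row_id: int, row_width: int) -> int:
    """Offset computed with enough bits to hold the full product."""
    return row_id * row_width


# 70 million rows of 64 bytes need about 4.48 GB of offsets,
# which exceeds 2**32 and silently wraps in the 32-bit version.
wrapped = row_offset_32bit(70_000_000, 64)
exact = row_offset_64bit(70_000_000, 64)
```

For small tables both functions agree, which is why this class of bug tends to surface only with very large inputs.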
…updated flags used with delvewheel repair (#45323) ### Rationale for this change It is already explained in the issue. ### What changes are included in this PR? This PR installs delvewheel by using its latest version instead of using a github branch. The new flag `--with-mangle` introduced in the latest version of delvewheel is used with `delvewheel repair` command. Removed comments that referred to the use of the github branch for delvewheel installation. ### Are these changes tested? No, these changes are not tested because to run Windows containers I will need [Windows 11 Pro or Enterprise](https://docs.docker.com/desktop/setup/install/windows-install/#:~:text=To%20run%20Windows%20containers%2C%20you%20need%20Windows%2010%20or%20Windows%2011%20Professional%20or%20Enterprise%20edition.%20Windows%20Home%20or%20Education%20editions%20only%20allow%20you%20to%20run%20Linux%20containers.). I do not have a machine that satisfies this requirement. * GitHub Issue: #45278 Authored-by: anubhav <[email protected]> Signed-off-by: Raúl Cumplido <[email protected]>
### Rationale for this change Our CI config for vcpkg declares superfluous Boost dependencies. ### Are these changes tested? Yes, by existing CI tests. ### Are there any user-facing changes? No. * GitHub Issue: #45361 Authored-by: Antoine Pitrou <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
### Rationale for this change CRAN uses 11.6 as the minimal macOS version now: https://cran.r-project.org/web/checks/check_flavors.html ### What changes are included in this PR? Update to 11.6 from 10.13. ### Are these changes tested? Yes. ### Are there any user-facing changes? No. * GitHub Issue: #45356 Authored-by: Sutou Kouhei <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
…nd multiple row groups (#45350) ### Rationale for this change The logic for loading `arrow::ArrayStatistics` depends on `parquet::ColumnChunkMetaData`. We can't get `parquet::ColumnChunkMetaData` when the requested row groups are empty because no associated row group and column chunk exist. We can't use multiple `parquet::ColumnChunkMetaData`s for now because we don't have statistics merge logic. So we can't load statistics when we use multiple row groups. ### What changes are included in this PR? * Don't load statistics when no row groups are used * Don't load statistics when multiple row groups are used * Add `parquet::ArrowReaderProperties::{set_,}should_load_statistics()` to enforce loading statistics by loading row groups one by one ### Are these changes tested? Yes. ### Are there any user-facing changes? Yes. * GitHub Issue: #45339 Authored-by: Sutou Kouhei <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
### Rationale for this change After #44945, the Java implementation lives in its own repo. Update the docs to point there. ### What changes are included in this PR? Updates to a few locations that reference the old Java implementation location. ### Are these changes tested? Rendered the Sphinx ones locally to check. ### Are there any user-facing changes? No Lead-authored-by: parthchonkar <[email protected]> Co-authored-by: Parth Chonkar <[email protected]> Co-authored-by: Sutou Kouhei <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
…pandas>=2.3 (#45383) ### Rationale for this change The option already exists in pandas 2.2, but for that version our code does not work, so restricting it to pandas >= 2.3 * GitHub Issue: #45296 Authored-by: Joris Van den Bossche <[email protected]> Signed-off-by: Raúl Cumplido <[email protected]>
### Rationale for this change

#45390 implemented `garrow_array_validate_full()`, but it used `validate_full` instead of `validate-full` for the error tag. We should use hyphen-separated words in error tags for consistency.

### What changes are included in this PR?

`validate_full` -> `validate-full`

### Are these changes tested?

Yes.

### Are there any user-facing changes?

Yes.

* GitHub Issue: #45390

Authored-by: Hiroyuki Sato <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
### Rationale for this change

[RecordBatch::ValidateFull](https://arrow.apache.org/docs/cpp/api/table.html#_CPPv4NK5arrow11RecordBatch12ValidateFullEv) is available in the C++ API, but GLib doesn't support that method yet.

### What changes are included in this PR?

This PR adds a validation method to the record-batch class.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

Yes.

* GitHub Issue: #44760

Lead-authored-by: Hiroyuki Sato <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
…#45395)

### Rationale for this change

Ubuntu 20.04 will reach EOL by 2025-04, so we must upgrade the MATLAB workflow's GitHub runner from Ubuntu 20.04 to Ubuntu 22.04 or Ubuntu 24.04.

### What changes are included in this PR?

1. Updated the Ubuntu MATLAB GitHub workflow to use Ubuntu 22.04 as the GitHub runner.
2. Updated the Ubuntu MATLAB crossbow task to use Ubuntu 22.04 as the GitHub runner.

### Are these changes tested?

1. All GitHub checks passed.
2. Manually triggered the MATLAB crossbow task and installed the MATLAB-Arrow Interface Toolbox on Debian-12.

### Are there any user-facing changes?

N/A

* GitHub Issue: #45388

Authored-by: Sarah Gilmore <[email protected]>
Signed-off-by: Sarah Gilmore <[email protected]>
### Rationale for this change

Conan 1 is deprecated. We should use Conan 2.

### What changes are included in this PR?

Use "conania/gcc11-ubuntu16.04:2.12.1" because it's the latest version. Based on https://github.com/conan-io/conan-docker-tools/blob/master/images/README.md#official-docker-images, "gcc11-ubuntu16.04" is the only supported image.

`ci/conan/` is synchronized with the latest https://github.com/conan-io/conan-center-index .

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.

* GitHub Issue: #45381

Authored-by: Sutou Kouhei <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
Labels
awaiting review
Component: C++ - Gandiva
Component: C++
Component: C#
Component: Documentation
Component: FlightRPC
Component: Gandiva
Component: GLib
Component: Go
Component: Java
Component: JavaScript
Component: MATLAB
Component: Parquet
Component: Python
Component: R
Component: Ruby
Component: Swift
See Commits and Changes for more details.
Created by pull[bot]