ENH: Support more return columns (#17)
Each column takes up some space in the return and this space is
currently limited but settable.

This bumps the limit for all tasks so that we should be able to use well
above 2500 columns everywhere.
However, using that many columns may require passing `--zcmem=100` or similar.

(Clearly, it is very plausible that we might bump this further if we run
into a use-case that needs that.)
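
To get a feel for the new limit, the arithmetic below divides the value passed to `with_return_size` (131072) by a few column counts. This is only a rough sketch: it assumes the value is in bytes and that every column takes an equal share of the return space, neither of which this commit states. The helper name `bytes_per_column` is made up for illustration.

```python
# Value passed to with_return_size() in this commit (assumed to be bytes).
RETURN_SIZE_BYTES = 131072


def bytes_per_column(return_size: int, ncols: int) -> float:
    """Return-space budget per column, assuming an even split (hypothetical)."""
    return return_size / ncols


# Budget for the column counts mentioned in the commit message and test.
budget = {ncols: bytes_per_column(RETURN_SIZE_BYTES, ncols)
          for ncols in (1250, 2000, 2500)}

for ncols, nbytes in budget.items():
    print(f"{ncols} columns -> {nbytes:.1f} bytes per column")
```

Under these assumptions, 2500 columns leave roughly 52 bytes of return space per column.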

---

Tested only with CSV, since that is where the problem came up (and when
CSV reading is used, it tends to be the first call). Either way, the
limit should affect every task, so set it globally.

---------

Signed-off-by: Sebastian Berg <[email protected]>
seberg authored Jan 29, 2025
1 parent a870b4f commit ac35896
Showing 2 changed files with 16 additions and 2 deletions.
3 changes: 2 additions & 1 deletion cpp/src/core/library.cpp
@@ -122,7 +122,8 @@ legate::Library create_and_registrate_library()
GlobalMemoryResource::set_as_default_mmr_resource();
}
// Set with_has_allocations globally since currently all tasks allocate (and libcudf may also)
-auto options = legate::VariantOptions{}.with_has_allocations(true);
+// Also ensure we can generally work with 2000+ return columns.
+auto options = legate::VariantOptions{}.with_has_allocations(true).with_return_size(131072);
auto context =
legate::Runtime::get_runtime()->find_or_create_library(library_name,
legate::ResourceConfig{},
15 changes: 14 additions & 1 deletion python/tests/test_csv.py
@@ -1,4 +1,4 @@
-# Copyright (c) 2024, NVIDIA CORPORATION
+# Copyright (c) 2024-2025, NVIDIA CORPORATION
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -58,6 +58,19 @@ def test_read_single_rows(tmp_path):
    assert_frame_equal(tbl, df)


+def test_read_single_many_columns(tmp_path):
+    # Legate has a limit on the number of returns, which limits the
+    # number of columns (currently). Make sure we support 1250.
+    # 2500+ are OK, but require a higher `--zcmem`.
+    file = tmp_path / "file.csv"
+    # Write a file with many columns (and a few rows)
+    ncols = 1250
+    for i in range(5):
+        file.write_text(",".join([str(i) for i in range(ncols)]) + "\n")
+
+    csv_read(file, dtypes=["str"] * ncols)


def test_read_many_files_per_rank(tmp_path):
    # Use uneven number to test splitting
    filenames = str(tmp_path) + "/*.csv"
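
For anyone reproducing the many-column case outside the test suite, here is a minimal sketch of generating a wide CSV with several rows using only the standard library. Note that `Path.write_text` replaces the file contents on each call, so a loop that calls it repeatedly leaves only the final row. The sketch therefore builds all rows first and writes once. The helper name `write_wide_csv` is hypothetical and not part of this repository, and the read-back uses the stdlib `csv` module rather than `csv_read`.

```python
import csv
import tempfile
from pathlib import Path


def write_wide_csv(path: Path, ncols: int, nrows: int) -> None:
    """Write nrows identical rows, each holding the integers 0..ncols-1."""
    row = ",".join(str(c) for c in range(ncols))
    # Build the full text first: write_text truncates the file on every call.
    path.write_text("".join(row + "\n" for _ in range(nrows)))


with tempfile.TemporaryDirectory() as tmp:
    wide_file = Path(tmp) / "file.csv"
    write_wide_csv(wide_file, ncols=1250, nrows=5)

    # Read the file back with the stdlib reader to check its shape.
    with wide_file.open(newline="") as fh:
        rows = list(csv.reader(fh))

print(len(rows), "rows x", len(rows[0]), "columns")
```

Writing once keeps the row count independent of how the file is opened. Opening the file in append mode inside the loop would work equally well.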
