Add support for pagination to lists of files #572

Merged
stevemessick merged 8 commits into main from page-file-lists on May 2, 2024

Conversation

@stevemessick stevemessick commented Apr 23, 2024

This does not include the generated code, which should make reviewing easier. Once this is merged and the final handler PR is in, I'll do a release and commit the new generated code.

@stevemessick stevemessick marked this pull request as draft April 23, 2024 23:07
@stevemessick stevemessick marked this pull request as ready for review April 26, 2024 21:40
@stevemessick stevemessick requested a review from jmasukawa April 26, 2024 21:41
@jmasukawa jmasukawa requested a review from jplotts April 27, 2024 02:04
@jmasukawa jmasukawa left a comment

Nice! Overall this LG, but i'll let Jim have the final approval here.

src/KaggleSwagger.yaml (outdated, resolved)
src/KaggleSwagger.yaml (outdated, resolved)
@@ -93,8 +92,9 @@ def __init__(self, fullpath, format):
   def __enter__(self):
     self._temp_dir = tempfile.mkdtemp()
     _, dir_name = os.path.split(self._fullpath)
-    self.path = shutil.make_archive(os.path.join(self._temp_dir, dir_name),
-                                    self._format, self._fullpath)
+    self.path = shutil.make_archive(
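
For readers without the full file, the hunk above sits inside a context manager that archives a directory into a temporary location. The following is a minimal sketch of that pattern, not the project's actual class: the name `TempArchive` and the cleanup in `__exit__` are assumptions, while the attribute names and the `shutil.make_archive` call are taken from the hunk.

```python
import os
import shutil
import tempfile


class TempArchive(object):
    """Hypothetical name: archives a directory into a throwaway temp dir."""

    def __init__(self, fullpath, format):
        self._fullpath = fullpath
        self._format = format
        self._temp_dir = None
        self.path = None

    def __enter__(self):
        self._temp_dir = tempfile.mkdtemp()
        _, dir_name = os.path.split(self._fullpath)
        # make_archive(base_name, format, root_dir) appends the extension
        # for the chosen format and returns the path of the archive.
        self.path = shutil.make_archive(
            os.path.join(self._temp_dir, dir_name), self._format,
            self._fullpath)
        return self

    def __exit__(self, *exc_info):
        # Assumed cleanup step: remove the temp dir and the archive in it.
        shutil.rmtree(self._temp_dir)
```

Whether the call is wrapped after the opening parenthesis or aligned under it is purely a formatter choice; both spellings behave identically.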
Contributor

[no action] just curious, what environment did you develop in? was this kaggle-web-dev container?

going forward, we should make sure we're all using the same workflow. Mod had used his mac (developed locally) so that's why the formatting diffs are different here.

if you've used kaggle-web-dev, that's totally fine, and we'll just make sure to keep using that (at least until we have a public docker container for dev for this project).

the whitespace difference is from differing yapf versions

Contributor Author

Yes, I'm running it in the container. I pretty much have to if I want to work on the C# and Python together. I tried opening the project in IntelliJ (because the Python plugin for Rider is currently not operational), but quickly realized I'd have to go through and install all the dependencies again. I didn't want to do that, since I don't know how many are pre-installed in the container. I mention that because I thought that's where the reformatting occurred (but yapf is more likely). I did restore almost all the format changes.

Contributor

I agree that running from the container has the best chance of making sure anyone (inside Kaggle) can make a release. Can this be documented somewhere?

Contributor Author

In the Development section, I'm adding this text. Let me know if there's any problem with it:

Kaggle Internal

Obviously, this depends on Kaggle services. When you're extending the API and modifying
or adding to those services, you should be working in your Kaggle mid-tier development
environment. You'll run Kaggle locally, in the container, and test the Python code by
running it in the container so it can connect to your local testing environment.

Contributor Author

That's the Development section of README.md.

Contributor

In case you are not aware, the releases for kagglehub and kaggle-api are done using Google Cloud Build. Anyone can make a release in a consistent environment. You just need to trigger Cloud Build with a CLI command.

Documentation for kagglehub: https://g3doc.corp.google.com/cloud/kaggle/models/g3doc/kagglehub.md#step-2-create-a-new-release

For kaggle-api, it is documented here: https://g3doc.corp.google.com/company/teams/kaggle/cli/index.md#release

src/kaggle/api/kaggle_api_extended.py (outdated, resolved)
src/kaggle/api/kaggle_api_extended.py (outdated, resolved)
@@ -2664,21 +2759,29 @@ def model_initialize(self, folder):
raise ValueError('Invalid folder: ' + folder)

meta_data = {
'ownerSlug': 'INSERT_OWNER_SLUG_HERE',
Contributor

[no action on this PR] we may want to figure out how to use the latest version of yapf for formatting, sooner rather than later. the one in kaggle-web-dev is stuck on an older version because of that specific linux OS version.

IMHO, the other syntax here "before" (using later yapf) is much nicer.

src/kaggle/api/kaggle_api_extended.py (outdated, resolved)
src/kaggle/api/kaggle_api_extended.py (outdated, resolved)
@jplotts jplotts left a comment

Looking good! Can you please revert the license removals? Also, I'm a bit unsure about the models API w.r.t. framework/instance/versions being required or not (specific questions below). Thanks for tackling this!

name: pageSize
type: integer
default: 1
description: Page size
Contributor

tiny nit - Some say "Page size" and others "Number of items per page (default 20)". Can this be consistent for all entities?

Contributor Author

Changed to "Number of items per page (default 20)".

Comment on lines 10 to 11
kaggle kernels files hermengardo/ps4e4-ensemble-eda --page-size=5 # valid page token required
kaggle datasets files nelgiriyewithana/apple-quality --page-size=7 --page-token=abcd
Contributor

nit - consider using Kaggle-owned data for tests. For datasets, kaggle/meta-kaggle seems like a good option, and possibly use one of our learn notebooks or lastplacelarry for kernels.

Contributor Author

It seems kaggle/meta-kaggle is not available when running on localhost, and lastplacelarry doesn't have any kernels on localhost. But I found some alternates.
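
The `--page-size` and `--page-token` flags in the test commands above follow the common token-based pagination pattern: each response carries a token for the next page, and the client repeats the call until no token comes back. A minimal sketch of that loop, under stated assumptions: `list_files_page`, `list_all_files`, the in-memory `FILES` list, and the token encoding (a stringified offset) are all hypothetical stand-ins, not the Kaggle API. Only the default of 20 items per page comes from the discussion above.

```python
# Stand-in data; a real service would query its own storage.
FILES = ["file_%02d.csv" % i for i in range(12)]


def list_files_page(page_size=20, page_token=None):
    """Hypothetical server call returning (items, next_page_token)."""
    # Assumption: the token is simply the offset of the next item.
    start = int(page_token) if page_token else 0
    end = start + page_size
    next_token = str(end) if end < len(FILES) else None
    return FILES[start:end], next_token


def list_all_files(page_size=20):
    """Follows next_page_token until the server stops returning one."""
    results, token = [], None
    while True:
        items, token = list_files_page(page_size, token)
        results.extend(items)
        if not token:
            return results
```

In practice a client should treat the token as opaque and only ever pass back a value it received from a previous response, which is why the first test command above notes that a valid page token is required.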

[**models_create_instance**](KaggleApi.md#models_create_instance) | **POST** /models/{ownerSlug}/{modelSlug}/create/instance | Create a new model instance
[**models_create_instance_version**](KaggleApi.md#models_create_instance_version) | **POST** /models/{ownerSlug}/{modelSlug}/{framework}/{instanceSlug}/create/version | Create a new model instance version
[**models_create_new**](KaggleApi.md#models_create_new) | **POST** /models/create/new | Create a new model
[**models_list**](KaggleApi.md#models_list) | **GET** /models/list | Lists models
[**models_list_files**](KaggleApi.md#models_list_files) | **GET** /models/list/{ownerSlug}/{modelSlug} | List model files
Contributor

What does this list if there are multiple instances / variations? (Not important to respond to this comment, but can it be documented somewhere?)

@stevemessick stevemessick May 1, 2024

I think this is incorrect and I'm not sure why it works. The CLI does not permit framework and instance to be omitted. I'm changing it to allow the version number to be optional, though. (MT PR is linked below.)

@@ -1,19 +1,3 @@
#!/usr/bin/python
Contributor

Is this intentionally removed? I would think we'd want to keep it.

Contributor Author

It was removed as part of file generation, not intentionally.

Contributor Author

The installation of /tmp/autogen.sh failed when running in the container. I added instructions to the readme to work around that problem.

[**metadata_get**](KaggleApi.md#metadata_get) | **GET** /datasets/metadata/{ownerSlug}/{datasetSlug} | Get the metadata for a dataset
[**metadata_post**](KaggleApi.md#metadata_post) | **POST** /datasets/metadata/{ownerSlug}/{datasetSlug} | Update the metadata for a dataset
[**model_instance_versions_download**](KaggleApi.md#model_instance_versions_download) | **GET** /models/{ownerSlug}/{modelSlug}/{framework}/{instanceSlug}/{versionNumber}/download | Download model instance version files
[**model_instance_versions_files**](KaggleApi.md#model_instance_versions_files) | **GET** /models/{ownerSlug}/{modelSlug}/{framework}/{instanceSlug}/{versionNumber}/files | List model instance version files
Contributor

Is the versionNumber required? Would it be possible for it to be optional (like for datasets, kernels, etc) and to assume the latest if it's not provided?

Contributor Author

https://github.com/Kaggle/kaggleazure/pull/29404 adds support for using the latest version if none is specified.
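
The "use the latest version if none is specified" behavior requested above can be sketched as follows. This is an illustration only: `resolve_version` and its arguments are hypothetical, and the linked PR may implement the fallback differently (for example, on the server side).

```python
def resolve_version(available_versions, version_number=None):
    """Hypothetical helper: pick a version, defaulting to the latest."""
    if not available_versions:
        raise ValueError('No versions available')
    if version_number is None:
        # No explicit version requested: assume the highest number is latest.
        return max(available_versions)
    if version_number not in available_versions:
        raise ValueError('Unknown version: %s' % version_number)
    return version_number
```

With this shape, callers that pass an explicit version keep their current behavior, and omitting it becomes a convenience rather than an error.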

@stevemessick stevemessick requested a review from jplotts May 1, 2024 20:20
@stevemessick stevemessick merged commit be3906e into main May 2, 2024
4 checks passed
@stevemessick stevemessick deleted the page-file-lists branch May 2, 2024 14:47
4 participants