-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for pagination to lists of files #572
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Overall this LG, but i'll let Jim have the final approval here.
@@ -93,8 +92,9 @@ def __init__(self, fullpath, format): | |||
def __enter__(self): | |||
self._temp_dir = tempfile.mkdtemp() | |||
_, dir_name = os.path.split(self._fullpath) | |||
self.path = shutil.make_archive(os.path.join(self._temp_dir, dir_name), | |||
self._format, self._fullpath) | |||
self.path = shutil.make_archive( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[no action] just curious, what environment did you develop in? was this kaggle-web-dev
container?
going forward, we should make sure we're all using the same workflow. Mod had used his mac (developed locally) so that's why the formatting diffs are different here.
if you've used kaggle-web-dev
, that's totally fine, and we'll just make sure to keep using that (at least until we have a public docker container for dev for this project).
the whitespace difference is from differing yapf
versions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'm running it in the container. I pretty much have to if I want to work on the C# and Python together. I tried opening the project in IntelliJ (because the Python plugin for Rider is currently not operational), but quickly realized I'd have to go through and install all the dependencies again. I didn't want to do that, since I don't know how many are pre-installed in the container. I mention that because I thought that's where the reformatting occurred (but yapf
is more likely). I did restore almost all the format changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that running from the container has the best chance of making sure anyone (inside Kaggle) can make a release. Can this be documented somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the Development
section, I'm adding this text. Let me know if there's any problem with it:
Kaggle Internal
Obviously, this depends on Kaggle services. When you're extending the API and modifying
or adding to those services, you should be working in your Kaggle mid-tier development
environment. You'll run Kaggle locally, in the container, and test the Python code by
running it in the container so it can connect to your local testing environment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's the Development
section of README.md
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case you are not aware, the releases for kagglehub
and kaggle-api
are done using Google Cloud Build. Anyone can make a release in a consistent environment. You just need to trigger Cloud Build with a CLI command.
Docmentation for kagglehub
: https://g3doc.corp.google.com/cloud/kaggle/models/g3doc/kagglehub.md#step-2-create-a-new-release
For kaggle-api
, it is documented here: https://g3doc.corp.google.com/company/teams/kaggle/cli/index.md#release
@@ -2664,21 +2759,29 @@ def model_initialize(self, folder): | |||
raise ValueError('Invalid folder: ' + folder) | |||
|
|||
meta_data = { | |||
'ownerSlug': 'INSERT_OWNER_SLUG_HERE', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[no action on this PR] we may want to figure out how to use the latest version of yapf
for formatting, sooner rather than later. the one in kaggle-web-dev
is stuck on an older version because of that specific linux OS version.
IMHO, the other syntax here "before" (using later yapf
) is much nicer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good! Can you please revert the license removals? Also, I'm a bit unsure about the models API w.r.t. framework/instance/versions being required or not (specific questions below). Thanks for tackling this!
src/KaggleSwagger.yaml
Outdated
name: pageSize | ||
type: integer | ||
default: 1 | ||
description: Page size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tiny nit - Some say "Page size" and others "Number of items per page (default 20)". Can this be consistent for all entities?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to "Number of items per page (default 20)".
tests/test_commands.sh
Outdated
kaggle kernels files hermengardo/ps4e4-ensemble-eda --page-size=5 # valid page token required | ||
kaggle datasets files nelgiriyewithana/apple-quality --page-size=7 --page-token=abcd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit - consider using Kaggle-owned data for tests. For datasets, kaggle/meta-kaggle
seems like a good option, and possibly use one of our learn notebooks or lastplacelarry
for kernels.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems kaggle/meta-kaggle
is not available when running on localhost, and lastplacelarry
doesn't have any kernels on localhost. But I found some alternates.
docs/KaggleApi.md
Outdated
[**models_create_instance**](KaggleApi.md#models_create_instance) | **POST** /models/{ownerSlug}/{modelSlug}/create/instance | Create a new model instance | ||
[**models_create_instance_version**](KaggleApi.md#models_create_instance_version) | **POST** /models/{ownerSlug}/{modelSlug}/{framework}/{instanceSlug}/create/version | Create a new model instance version | ||
[**models_create_new**](KaggleApi.md#models_create_new) | **POST** /models/create/new | Create a new model | ||
[**models_list**](KaggleApi.md#models_list) | **GET** /models/list | Lists models | ||
[**models_list_files**](KaggleApi.md#models_list_files) | **GET** /models/list/{ownerSlug}/{modelSlug} | List model files |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this list if there are multiple instances / variations? (Not important to respond to this comment, but can it be documented somewhere?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is incorrect and I'm not sure why it works. The CLI does not permit framework and instance to be omitted. I'm changing it to allow the version number to be optional, though. (MT PR is linked below.)
kaggle/__init__.py
Outdated
@@ -1,19 +1,3 @@ | |||
#!/usr/bin/python |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this intentionally removed? I would think we'd want to keep it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was removed as part of file generation, not intentionally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The installation of /tmp/autogen.sh
failed when running in the container. I added instructions to the readme to work around that problem.
docs/KaggleApi.md
Outdated
[**metadata_get**](KaggleApi.md#metadata_get) | **GET** /datasets/metadata/{ownerSlug}/{datasetSlug} | Get the metadata for a dataset | ||
[**metadata_post**](KaggleApi.md#metadata_post) | **POST** /datasets/metadata/{ownerSlug}/{datasetSlug} | Update the metadata for a dataset | ||
[**model_instance_versions_download**](KaggleApi.md#model_instance_versions_download) | **GET** /models/{ownerSlug}/{modelSlug}/{framework}/{instanceSlug}/{versionNumber}/download | Download model instance version files | ||
[**model_instance_versions_files**](KaggleApi.md#model_instance_versions_files) | **GET** /models/{ownerSlug}/{modelSlug}/{framework}/{instanceSlug}/{versionNumber}/files | List model instance version files |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the versionNumber
required? Would it be possible for it to be optional (like for datasets, kernels, etc) and to assume the latest if it's not provided?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/Kaggle/kaggleazure/pull/29404 adds support for using the latest version if none is specified.
@@ -93,8 +92,9 @@ def __init__(self, fullpath, format): | |||
def __enter__(self): | |||
self._temp_dir = tempfile.mkdtemp() | |||
_, dir_name = os.path.split(self._fullpath) | |||
self.path = shutil.make_archive(os.path.join(self._temp_dir, dir_name), | |||
self._format, self._fullpath) | |||
self.path = shutil.make_archive( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that running from the container has the best chance of making sure anyone (inside Kaggle) can make a release. Can this be documented somewhere?
This does not include the generated code, which should make reviewing easier. Once this is merged and the final handler PR is in, I'll do a release and commit the new generated code.