Make field more robust with None/nan repeated input #757

Merged: 2 commits, Oct 23, 2024

Contributor

@ccl-core ccl-core commented Oct 22, 2024

github-actions bot commented Oct 22, 2024

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@ccl-core ccl-core requested a review from marcenacp October 22, 2024 13:47
@ccl-core ccl-core marked this pull request as ready for review October 22, 2024 13:52
@ccl-core ccl-core requested a review from a team as a code owner October 22, 2024 13:52
Contributor

@marcenacp marcenacp left a comment

Thanks!

@@ -0,0 +1,3 @@
{"default/custom_instruction": "None", "default/topic": "None", "default/model_name": "None", "default/model": "None", "default/skip_prompt_formatting": false, "default/category": "orca", "default/views": null, "default/language": "None", "default/id": "None", "default/title": "None", "default/idx": "None", "default/hash": "None", "default/avatarUrl": "None", "default/system_prompt": "None", "default/source": "airoboros2.2"}

Contributor

Why are some of the values "None" and some of the values null?
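
For context, one way such a mix can arise is when a Python `None` and the string `"None"` are both serialized to JSON. The sketch below only illustrates that mechanism; it does not claim to be how the file in this PR was produced, and the record keys are shortened, hypothetical stand-ins.

```python
import json

# Hypothetical record: "topic" holds the *string* "None" (for example the
# result of str(None) somewhere upstream), while "views" holds a real None.
record = {"topic": str(None), "views": None, "skip_prompt_formatting": False}

# json.dumps maps Python None to JSON null, but leaves strings untouched.
serialized = json.dumps(record)
print(serialized)
# {"topic": "None", "views": null, "skip_prompt_formatting": false}
```

Reading the line back with `json.loads` gives `None` for `views` and the four-character string `"None"` for `topic`, so the two are not interchangeable for a missing-value check.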

@@ -57,7 +57,7 @@ def _apply_transform_fn(value: Any, transform: Transform, field: Field) -> Any:
             return pd.Timestamp(value).strftime(transform.format)
         else:
             raise ValueError(f"`format` only applies to dates. Got {field.data_type}")
-    elif transform.separator is not None:
+    elif transform.separator is not None and not _is_na(value):

Contributor

Should the is_na case be handled at the top of the function?

It seems like it could also be a problem for other transforms (regex, json_path, etc.).

Contributor Author

Yes, done.
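
As an illustration of the suggestion, here is a minimal sketch of handling the missing-value case once at the top of a transform function. The `Transform` dataclass, `apply_transform`, and this version of `_is_na` are simplified, hypothetical stand-ins (the real `_is_na` semantics are assumed, not taken from the diff).

```python
import math
import re
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Transform:
    """Hypothetical stand-in for the library's Transform (two options only)."""

    regex: Optional[str] = None
    separator: Optional[str] = None


def _is_na(value: Any) -> bool:
    """Assumed semantics: None or a float NaN counts as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def apply_transform(value: Any, transform: Transform) -> Any:
    # Handle missing input once, before any transform-specific branch,
    # so that regex, separator, etc. never see None/NaN.
    if _is_na(value):
        return value
    if transform.regex is not None:
        match = re.search(transform.regex, value)
        return match.group(0) if match else value
    elif transform.separator is not None:
        return value.split(transform.separator)
    return value


print(apply_transform(None, Transform(separator=",")))   # None
print(apply_transform("a,b", Transform(separator=",")))  # ['a', 'b']
```

With the guard at the top, each branch can assume a real value, which avoids repeating `not _is_na(value)` in every `elif`.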

        [{field.id: v} for v in value] if not _is_na(value) else [{field.id: value}]
    )
else:
    if not _is_na(value) and len(value) != len(result[parent_id]):

Contributor

Could this function be simplified? Something like:

value = [{field.id: None}] if _is_na(value) else value
existing_value = result.get(parent_id, value)
if not _is_na(value) and len(value) != len(existing_value):
  raise ValueError(...)
for i in range(len(existing_value)):
  result[parent_id][i][field.id] = None if _is_na(value) else value[i]

Contributor Author

I simplified it a bit, hope it goes in the direction you had in mind!
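
For readers following along, a runnable sketch in the spirit of the suggested simplification might look like the following. It is not the code that was merged: `add_repeated_subfield`, its signature, the field names in the usage lines, and this `_is_na` are all assumptions made for illustration. It also stores `None` for a missing value, as in the reviewer's suggestion.

```python
import math
from typing import Any


def _is_na(value: Any) -> bool:
    """Assumed semantics: None or a float NaN counts as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def add_repeated_subfield(
    result: dict, parent_id: str, field_id: str, value: Any
) -> None:
    """Merges one sub-field's repeated values into result[parent_id] in place."""
    na = _is_na(value)
    if parent_id not in result:
        # First sub-field for this parent: one row per element,
        # or a single row when the value is missing.
        result[parent_id] = [{} for _ in range(1 if na else len(value))]
    rows = result[parent_id]
    if not na and len(value) != len(rows):
        raise ValueError(
            f"{field_id} has {len(value)} values but {parent_id} has {len(rows)} rows"
        )
    for i, row in enumerate(rows):
        # A missing value is broadcast as None to every existing row.
        row[field_id] = None if na else value[i]


result: dict = {}
add_repeated_subfield(result, "conversations", "conversations/from", ["human", "gpt"])
add_repeated_subfield(result, "conversations", "conversations/weight", None)
print(result)
```

One limitation of this sketch: if the first sub-field seen is missing, the parent gets a single row, so a later sub-field with more than one element raises `ValueError`.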

value = (
    [_cast_value(self.node.ctx, v, field.data_type) for v in value]
    if not _is_na(value)
    else value

Contributor

Would this function benefit from an early return as well?

Contributor Author

Not sure I understand (we return result, not value), but I changed it to an if..elif..else, so it should be more readable now :)
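
A rough sketch of what an if..elif..else form of the casting step could look like is shown below. The branch conditions, the `cast_field_value` name, the `is_repeated` flag, and the use of a plain callable as the data type are hypothetical; the thread does not show the final code.

```python
import math
from typing import Any, Callable


def _is_na(value: Any) -> bool:
    """Assumed semantics: None or a float NaN counts as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def cast_field_value(value: Any, data_type: Callable[[Any], Any], is_repeated: bool) -> Any:
    """Casts a value (or each element of a repeated value), leaving NA untouched."""
    if _is_na(value):
        # Missing input is passed through unchanged instead of being iterated.
        cast = value
    elif is_repeated:
        cast = [data_type(v) for v in value]
    else:
        cast = data_type(value)
    return cast


print(cast_field_value(["1", "2"], int, is_repeated=True))  # [1, 2]
print(cast_field_value(None, int, is_repeated=True))        # None
```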

Contributor

@marcenacp marcenacp left a comment

Approving to unblock, but I left a few comments.

@ccl-core ccl-core merged commit 426c964 into main Oct 23, 2024
12 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Oct 23, 2024
2 participants