Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dump Error Keys #1139

Closed

Conversation

deckar01
Copy link
Member

Store errors by key when dumping collections to prevent item errors from being reported as their parent.

Fixes #1132


Example:

from marshmallow import Schema, ValidationError, fields


class Test(Schema):
    foo = fields.List(fields.Int)

Test().dump({'foo': ['five']})
# Before: {'foo': ['Not a valid integer.']}
# After: {'foo': {0: ['Not a valid integer.']}}

result[keys[key]] = error.valid_data
else:
if key in keys:
result[keys[key]] = deser_val
Copy link
Member Author

@deckar01 deckar01 Feb 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really happy with copy and pasting this much code. It is effectively 6 8 4 copies of the same thing, with slight variations. I suspect there is a utility interface that can abstract this functionality while also improving the code's readability.

Copy link
Member Author

@deckar01 deckar01 Feb 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a first pass at deduplicating the logic. I didn't see a clean way to abstract the operations across field classes, but I avoided introducing any duplicate logic in this PR and deduplicated a few operations that were already present.

I need to test what the performance implications are for this abstraction. It should be minimal, because the primary difference is getattr being used in place of static attribute access once in each method. An alternative would be to use a lambda instead, but I don't have an intuition for which is preferable or faster.

A few methods were passing None into the attr and obj/data arguments instead of passing them through like the rest. Normalizing this did not affect the tests, but I suspect the original behavior was an obscure bug. I need to look at the implications of this change a little closer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect the original behavior was an obscure bug.

Confirmed. See #1176.

@sloria sloria added this to the 3.0 milestone Feb 15, 2019
Copy link
Member

@sloria sloria left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation looks correct; just a minor spelling change suggested.

I'm also not sure about how to deduplicate the logic across field classes without introducing a confusing amount of indirection. This could happen after @lafrech's suggested ContainerField becomes a reality, but no need to block merging of this PR nor #1066 for it.

Granted there are no performance regressions with this, I think this is on the right track.

marshmallow/fields.py Outdated Show resolved Hide resolved
@lafrech
Copy link
Member

lafrech commented Feb 15, 2019

I pushed a ContainerMixin proposal in #1066.

@lafrech
Copy link
Member

lafrech commented Jun 4, 2019

I finally dropped the ContainerMixin I proposed in #1066 and submitted a simpler PR (#1229) so don't count on it.

@deckar01
Copy link
Member Author

I will get this up to date now that #1229 has landed.

@sloria
Copy link
Member

sloria commented Jul 14, 2019

@deckar01 Are you still working on this? Let us know if you need help with it.

@deckar01 deckar01 force-pushed the 1132-nested-dump-errors branch from 76cf369 to 07f37e2 Compare July 15, 2019 14:41
@deckar01
Copy link
Member Author

Benchmark:

Setup: benchmark.py

# dev
load 18.1108 µs ± 0.65%
dump 10.8205 µs ± 1.14%
load no err 18.0016 µs ± 0.69%
dump no err 10.7058 µs ± 2.38%
# 1132-nested-dump-errors
load 18.7329 µs ± 0.85%
dump 10.8929 µs ± 0.42%
load no err 18.6097 µs ± 0.85%
dump no err 10.9445 µs ± 0.97%
# change
load +3.43%
dump +0.66%
load +3.37%
dump +2.23%

@sloria
Copy link
Member

sloria commented Jul 16, 2019

It's a minor performance regression, but I'm wondering if we should incur it at all...we've since decided that validation should only happen on deserialization and that the dump should be considered valid. Do we want to continue supporting validation-on-serialization?

@deckar01
Copy link
Member Author

The performance regression is a side effect of refactoring to avoid copy and pasting validation code. It could be omitted. I will continue the discussion about dump validation on the issue.

@deckar01
Copy link
Member Author

deckar01 commented Jul 17, 2019

Closing in favor of #1304.

@deckar01 deckar01 closed this Jul 17, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove validation on serialization
3 participants