-
Notifications
You must be signed in to change notification settings - Fork 61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improved fast serializer code generation logic for Enum type #54
Improved fast serializer code generation logic for Enum type #54
Conversation
Cool result! And how does it compare with vanilla Avro now? Is it equal or better? Can we get a diff of the generated code? Actually, I'm wondering... should we make the build trigger the code-gen for a set of schemas we care about, and check it under source control (not in the main resource, but perhaps in tests or in some kind of |
Fast-avro 1.4 serialization speed is ~10% faster than vanilla 1.4 now for Enum array. Below is the benchmark result of vanilla avro:
Here are fast-serializers of EnumArray schema:
Like the idea of source control of code-gen to review related changes more carefully and properly. |
Thanks for the gist... the change looks good. It looks quite weird to have a map lookup/set in this code. I wonder why it was done like this in the first place, and if that's a hint that there may be edge cases where this is important (I can't think of any, though). Also, you'll notice that there is yet another map lookup hidden in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, but I'd like to suggest some readability / code-style improvement while we're in this section of the code, if you don't mind...
thenBody.invoke(enumSchemaMapField, "put").arg(JExpr.lit(enumSchemaFingerprint)).arg(enumSchemaVar); | ||
|
||
codeModel.ref(Schema.class).staticInvoke("parse").arg(enumSchema.toString())); | ||
enumSchemaVarMap.put(enumSchemaFingerprint, enumSchemaVar); | ||
valueToWrite = JExpr.invoke(enumSchemaVar, "getEnumOrdinal").arg(enumValueCasted.invoke("toString")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this code to be a bit spaghetti... could we instead follow a pattern of
JVar enumSchemaVar = enumSchemaVarMap.computeIfAbsent(enumSchemaFingerprint, () ->
serializerClass.field(
JMod.PRIVATE | JMod.FINAL,
Schema.class,
getVariableName(enumSchema.getName() + "EnumSchema"),
codeModel.ref(Schema.class).staticInvoke("parse").arg(enumSchema.toString()))
);
valueToWrite = JExpr.invoke(enumSchemaVar, "getEnumOrdinal").arg(enumValueCasted.invoke("toString"));
... or something like that? Otherwise, the JExpr.invoke
part duplicated twice, and since it is meta-code, that makes things extra confusing :D ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion. Fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Beautiful! Love the red tint :D !
27c2c74
to
373916b
Compare
Support Avro 1.4 Enum type serialization here was initially introduced by @gaojieliu . Do you have any concern? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thank you!
Improved fast serialization speed by avoiding the cost of storing and retrieving Enum schemas in a Hashmap in Avro 1.4
373916b
to
14b17a3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot Bingfeng! Looks great!
@@ -14,7 +11,6 @@ | |||
implements FastSerializer<IndexedRecord> | |||
{ | |||
|
|||
private Map<Long, Schema> enumSchemaMap = new ConcurrentHashMap<Long, Schema>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heh... deleting an unused variable... that sounds like the right call!
Improved fast serialization speed by avoiding the cost of storing and retrieving Enum schemas
in a Hashmap in Avro 1.4. We have seen ~44% improvement of serialization latency for Enum array.
This PR helps to resolve issue #50
JMH benchmark results of fast serialization time of an 200 elements EnumArray under Avro 1.4
Before
After