This repository has been archived by the owner on Apr 22, 2022. It is now read-only.

Mapping JSON values to enum fields is broken #205

Closed
aspruds opened this issue Apr 9, 2018 · 5 comments

Comments

@aspruds

aspruds commented Apr 9, 2018

It seems that enums are not working in the latest (1.8.2) version of Avro which is bundled with divolte-collector-0.7.0. When collecting data defined as enum, Divolte Collector throws the following exception: org.apache.avro.AvroTypeException: Not an enum: OW

Additional information is available at https://issues.apache.org/jira/browse/AVRO-1810. One possible solution is to downgrade Avro to 1.7.7.

For reference, I'm using the following Avro field definition:
{
  "name": "search",
  "type": [
    "null",
    {
      "name": "SearchRecord",
      "type": "record",
      "fields": [
        { "name": "screenName", "type": "string", "doc": "Screen name (e.g. fb_avail)" },
        { "name": "language", "type": "string", "doc": "Language code in ISO 639-1 (lv, et)" },
        { "name": "pointOfSale", "type": "string", "doc": "Point of Sale, for example, LV" },
        { "name": "origin", "type": "string", "doc": "Origin, e.g. RIX" },
        { "name": "destination", "type": "string", "doc": "Destination, e.g. TLL" },
        { "name": "tripType", "type": {"name": "TripType", "type": "enum", "symbols" : ["OW", "RT"]}, "doc": "RT for round-trip, OW for one-way" },
        { "name": "outboundDate", "type": "string", "doc": "Outbound date in format 2018-04-29" },
        { "name": "inboundDate", "type": ["null", "string"], "default": null, "doc": "Inbound date in format 2018-04-29" },
        { "name": "adults", "type": "int", "doc": "Number of adults" },
        { "name": "children", "type": ["null", "int"], "default": null, "doc": "Number of children" },
        { "name": "infants", "type": ["null", "int"], "default": null, "doc": "Number of infants" }
      ]
    }
  ],
  "default": null
}

@asnare asnare added the bug label Apr 10, 2018
@asnare
Member

asnare commented Apr 10, 2018

This is unfortunate; thanks for the report.

I don't think downgrading is the right solution, especially since I'd like to support some of the 1.8 features like the logical timestamp types. Instead I'd like to see us work around the issue (with test cases).

Some things the test cases need to check:

  • Mapping directly.
  • Mapping via the jsonpath custom extractor path.
  • If the value being mapped doesn't match a known value, how that's handled.

For invalid values I think it should be treated as an empty value instead of throwing an exception.
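
The "empty value instead of an exception" behaviour for unknown symbols could be sketched roughly as follows. This is only an illustration using the JDK, not Divolte's implementation: the class `EnumSymbolLookup` and its `resolve` method are hypothetical names, and the symbols are taken from the `TripType` enum in the report.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper (not part of Divolte or Avro): resolves a raw string
// against the symbols declared for an enum field.
public class EnumSymbolLookup {
    private final Set<String> symbols;

    public EnumSymbolLookup(String... symbols) {
        this.symbols = new LinkedHashSet<>(Arrays.asList(symbols));
    }

    // Returns the symbol when it is declared; otherwise an empty Optional,
    // so the caller can leave the field unset rather than fail.
    public Optional<String> resolve(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        return symbols.contains(raw) ? Optional.of(raw) : Optional.empty();
    }

    public static void main(String[] args) {
        EnumSymbolLookup tripType = new EnumSymbolLookup("OW", "RT");
        System.out.println(tripType.resolve("OW")); // prints Optional[OW]
        System.out.println(tripType.resolve("XX")); // prints Optional.empty
    }
}
```

With something along these lines, a field whose type is a union with "null" would simply stay at its default when the incoming value is not a declared symbol.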

@asnare
Member

asnare commented Apr 13, 2018

It turns out the problem is not related to the Avro version; the linked report describes a different problem. Downgrading will not resolve the problem.

I've uncovered a few problems with the enum support as things stand:

  • Although there's support for mapping JSON to enum fields, the tests were incomplete. Mapping succeeds, but the resulting Avro record cannot be serialized. This is the bug you encountered.
  • We've never supported mapping to enumerations directly (either as a literal or anything else that produces a string value). This fails on startup because the type of the mapping and schema field don't match.

With this in mind I'm going to update this issue to address the former. Extending the mapping DSL so that string-based values can be applied to enum fields is an enhancement and I've created issue #208 to deal with that.

@asnare asnare self-assigned this Apr 13, 2018
@asnare asnare changed the title from "Enum support is broken for Avro 1.8.2" to "Mapping JSON values to enum fields is broken" Apr 13, 2018
@aspruds
Author

aspruds commented Apr 15, 2018

Hello,

First of all, thank you for having a look at this! I would like to respectfully disagree with your conclusion that downgrading Avro will not help in this specific situation. Note that I'm not claiming that downgrading is otherwise a good idea (I agree with your argument about supporting logical timestamp values).

While the stock Divolte distribution (0.7.0 and also 0.8.0) throws "org.apache.avro.AvroTypeException", no such exception is thrown (and the Avro file is successfully written) if I manually replace avro-1.8.2.jar with avro-1.7.7.jar. My schema and mapping are included below:

BTEventRecord.avsc.txt
BTEventMapping.groovy.txt

The exception I'm seeing is:
java.util.concurrent.CompletionException: org.apache.avro.AvroTypeException: Not an enum: OW
    at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
    at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
    at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1629)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.avro.AvroTypeException: Not an enum: OW
    at org.apache.avro.generic.GenericDatumWriter.writeEnum(GenericDatumWriter.java:177)
    at org.apache.avro.specific.SpecificDatumWriter.writeEnum(SpecificDatumWriter.java:59)
    at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:119)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
    at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:166)
    at org.apache.avro.specific.SpecificDatumWriter.writeField(SpecificDatumWriter.java:90)
    at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:156)
    at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:118)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
    at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:125)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
    at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:166)
    at org.apache.avro.specific.SpecificDatumWriter.writeField(SpecificDatumWriter.java:90)
    at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:156)
    at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:118)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:62)
    at io.divolte.server.AvroRecordBuffer.<init>(AvroRecordBuffer.java:61)
    at io.divolte.server.AvroRecordBuffer.fromRecord(AvroRecordBuffer.java:79)
    at io.divolte.server.Mapping.map(Mapping.java:107)

@asnare
Member

asnare commented Apr 18, 2018

You're right.

Downgrading to a pre-1.8.0 version of Avro does resolve the problem; I missed that AVRO-997 introduced a breaking change in 1.8.0 that triggers this issue.
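
To illustrate why the same record can serialize under one version and fail under another, here is a simplified, library-free model. It assumes the stricter writer rejects any value that is not an enum-symbol object, while the older behaviour only looked at the value's string form. All names here (`EnumWriteCheckSketch`, `Symbol`, `writeLenient`, `writeStrict`) are hypothetical and this is not Avro's source code.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model only; the real check lives inside Avro's datum writer.
public class EnumWriteCheckSketch {
    // Stand-in for an enum-symbol wrapper object.
    public static final class Symbol {
        private final String name;
        public Symbol(String name) { this.name = name; }
        @Override public String toString() { return name; }
    }

    private static final List<String> SYMBOLS = Arrays.asList("OW", "RT");

    // Lenient model: only the string form of the value matters,
    // so a plain String "OW" is accepted.
    public static int writeLenient(Object datum) {
        int ordinal = SYMBOLS.indexOf(datum.toString());
        if (ordinal < 0) {
            throw new IllegalArgumentException("Unknown symbol: " + datum);
        }
        return ordinal;
    }

    // Strict model: the value must be a Symbol object, so a plain
    // String is rejected even when its text matches a declared symbol.
    public static int writeStrict(Object datum) {
        if (!(datum instanceof Symbol)) {
            throw new IllegalStateException("Not an enum: " + datum);
        }
        return writeLenient(datum);
    }

    public static void main(String[] args) {
        System.out.println(writeLenient("OW"));            // accepted
        System.out.println(writeStrict(new Symbol("OW"))); // accepted
        try {
            writeStrict("OW");                              // rejected
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under this model, a mapping that stores a plain string in an enum field works with the lenient writer and fails with the strict one, which matches the symptom reported above.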

@asnare asnare added this to the 0.9 milestone Apr 18, 2018
@asnare
Member

asnare commented Apr 20, 2018

This issue has been resolved by #211.

@asnare asnare closed this as completed Apr 20, 2018