Explicit polymorphism discriminator #2903

Open
Tmpod opened this issue Jan 19, 2025 · 3 comments

Tmpod commented Jan 19, 2025

What is your use-case and why do you need this feature?

I've been experimenting with MongoDB through KMongo and KtMongo, specifically around polymorphic documents. Both kbson (used by KMongo) and the official Kotlin Mongo driver (used by KtMongo) support polymorphic serialization, but always through a virtual field (whose name is dependent on the format).

This isn't ideal. You have to resort to clunky workarounds to use the type-safe query API, you have to use @SerialName everywhere, and in general it isn't intuitive at all. Implicit fields make code harder to reason about, and often require diving into library sources to really understand what's happening.

Here's an example with current implicit fields
enum class FooType { A, B }

@Serializable
sealed interface Foo {
    val common: Int
    val type: FooType
}

@Serializable
@SerialName("A")
class FooA(
    override val common: Int,
    val bar: Long,
) : Foo {
    // Can't be concrete otherwise you get weird behaviour between
    // the type field generated by kx.ser (with value from @SerialName)
    // and the actual concrete field
    override val type get() = FooType.A
}

@Serializable
@SerialName("B")
class FooB(
    override val common: Int,
    val baz: String,
) : Foo {
    override val type get() = FooType.B
}

suspend fun someQueries(db: Database) {
    db.getCollection<FooA>().insertOne(FooA(common = 1, bar = 2))
    db.getCollection<FooB>().insertOne(FooB(common = 1, baz = "3"))
    
    // Only works if the format-specific discriminator is called "type".
    // If it is something else you either have to change the property name
    // or use @SerialName
    val fooA = db.getCollection<FooA>().findOne { Foo::type eq FooType.A }
    val fooB = db.getCollection<FooB>().findOne { Foo::type eq FooType.B }
    val foo = db.getCollection<Foo>().findOne()
    println(fooA.bar)
    println(fooB.baz)
    println(foo.type)
}

Describe the solution you'd like

Let's consider a new @PolymorphicDiscriminator annotation that tells kx.ser and implementing formats which field should be used for polymorphic (de)serialization. One could write the following instead:

The same example but using this suggested annotation
enum class FooType { A, B }

@Serializable
sealed class Foo(
    @PolymorphicDiscriminator
    // Could also use @SerialName here if wanted
    val type: FooType,
    val common: Int,
)

@Serializable
class FooA(
    common: Int,
    val bar: Long,
) : Foo(FooType.A, common)
// EDIT: I'm aware these constructors are invalid with @Serializable,
// you'd have to override instead, like in the previous example.

@Serializable
class FooB(
    common: Int,
    val baz: String,
) : Foo(FooType.B, common)

suspend fun someQueries(db: Database) {
    db.getCollection<FooA>().insertOne(FooA(common = 1, bar = 2))
    db.getCollection<FooB>().insertOne(FooB(common = 1, baz = "3"))

    // We are now sure the type-safe API will resolve to the correct discriminator name,
    // because that's defined by the user, not the format developer
    val fooA = db.getCollection<FooA>().findOne { Foo::type eq FooType.A }
    val fooB = db.getCollection<FooB>().findOne { Foo::type eq FooType.B }
    val foo = db.getCollection<Foo>().findOne()
    println(fooA.bar)
    println(fooB.baz)
    println(foo.type)
}

I'm not well versed in compiler plugins or annotation processors, so I'm not entirely sure about the implementation details, but all the semantic information required for such an annotation exists. When you're processing a class that is part of a sealed hierarchy, you check that the discriminator value it defines is always known at compile time and that it is unique (e.g. you can't have two implementors use the same discriminator value). I'd even say the value could be anything directly matchable in when expressions.
If the values are all consistent, you can generate the polymorphic deserialization function which selects the correct implementor serializer based on the decoded discriminator, akin to what already happens on the current implementation based on an implicit field.
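To make the idea concrete, here is a minimal sketch in plain Kotlin of the dispatch such generated code might perform. It does not use kotlinx.serialization at all: `Decoder`, `buildDispatch`, `decodeA`, `decodeB` and `decodeFoo` are hypothetical names, and the "document" is just an already-parsed `Map`. The uniqueness check runs at runtime here, whereas a plugin would presumably do it at compile time.

```kotlin
enum class FooType { A, B }

sealed interface Foo { val type: FooType }

data class FooA(val common: Int, val bar: Long) : Foo {
    override val type get() = FooType.A
}

data class FooB(val common: Int, val baz: String) : Foo {
    override val type get() = FooType.B
}

// Hypothetical decoder: turns an already-parsed document into a concrete subtype.
typealias Decoder = (Map<String, Any?>) -> Foo

fun decodeA(doc: Map<String, Any?>): Foo = FooA(doc["common"] as Int, doc["bar"] as Long)
fun decodeB(doc: Map<String, Any?>): Foo = FooB(doc["common"] as Int, doc["baz"] as String)

// Builds the lookup table and rejects duplicate discriminator values,
// mirroring the uniqueness check described above.
fun buildDispatch(entries: List<Pair<FooType, Decoder>>): Map<FooType, Decoder> {
    val table = mutableMapOf<FooType, Decoder>()
    for ((value, decoder) in entries) {
        require(value !in table) { "Duplicate discriminator value: $value" }
        table[value] = decoder
    }
    return table
}

val dispatch = buildDispatch(listOf(FooType.A to ::decodeA, FooType.B to ::decodeB))

// Reads the explicit discriminator field first, then delegates to the matching decoder.
fun decodeFoo(doc: Map<String, Any?>): Foo {
    val raw = doc["type"] as? String ?: error("Missing discriminator field 'type'")
    return dispatch.getValue(FooType.valueOf(raw))(doc)
}

fun main() {
    println(decodeFoo(mapOf("type" to "B", "common" to 1, "baz" to "3")))
}
```

The point of the sketch is only that the discriminator is an ordinary, user-named field that is read before anything else; how a real format would stream that field is left open.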


Please correct me if anything I've said is wrong or if my suggestion is unattainable at the moment. Thank you!

@Tmpod Tmpod added the feature label Jan 19, 2025
@CLOVIS-AI

Hey, KtMongo author here. I'll give a bit more context.

KtMongo relies on KotlinX.Serialization to convert Kotlin types to and from MongoDB BSON. Through this, we support sealed class and sealed interface simply because KotlinX.Serialization supports them, using an imaginary field type which contains a discriminant.

KtMongo provides an extensive DSL over MongoDB operators. This DSL is implemented through extension functions and generics, allowing us to refer to fields when writing requests:

@Serializable
class User(
    val name: String,
    val age: Int,
)

users.find {
    User::age gt 18
}

So far, so good. The problem comes because of the imaginary field KotlinX.Serialization creates on sealed types:

@Serializable
sealed class Foo

@Serializable
class FooA(val a: Int) : Foo()

@Serializable
class FooB(val b: String) : Foo()

An instance of FooB("foo") will be serialized as:

{
    "type": "FooB",
    "b": "foo"
}

We would like to be able to write a query for specifically instances of FooB, but this requires referring to the imaginary "field" type:

users.find {
    User::foo / Foo::type eq "FooB"    // ⚠ Foo::type does not exist
}

This could be worked around by creating an extension function and hard-coding the name type, but KotlinX.Serialization will actually name the discriminant something else if there is already a field named type, so there is no good typesafe way of knowing what the field is called.

By using explicit enums, this issue removes the magic around the discriminant, so it can be accessed through regular Kotlin code.
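As a rough illustration of why a real property helps, the sketch below builds a toy filter from a property reference. It is not KtMongo's DSL: `Filter`, the `eq` infix function and the in-memory `findOne` are hypothetical stand-ins, and only the Kotlin standard library is used. Because `Foo::type` is a declared property, both its name and its value are available through ordinary code.

```kotlin
import kotlin.reflect.KProperty1

enum class FooType { A, B }

sealed interface Foo { val type: FooType }

data class FooA(val bar: Long) : Foo {
    override val type get() = FooType.A
}

data class FooB(val baz: String) : Foo {
    override val type get() = FooType.B
}

// Hypothetical stand-in for a query filter: a property reference plus the expected value.
data class Filter<T, V>(val property: KProperty1<T, V>, val expected: V) {
    // Evaluates the filter against an in-memory document.
    fun matches(document: T): Boolean = property.get(document) == expected

    // The field name comes from the property itself, so nothing is hard-coded.
    fun render(): String = "${property.name} == $expected"
}

infix fun <T, V> KProperty1<T, V>.eq(value: V): Filter<T, V> = Filter(this, value)

// Toy in-memory lookup; a real driver would translate the filter into a database query.
fun <T> List<T>.findOne(filter: Filter<T, *>): T? = firstOrNull { filter.matches(it) }

fun main() {
    val documents: List<Foo> = listOf(FooA(2), FooB("3"))
    println((Foo::type eq FooType.B).render())
    println(documents.findOne(Foo::type eq FooType.B))
}
```

With an implicit discriminator there is no `Foo::type` to reference, which is exactly the gap described above.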

@pdvrieze
Contributor

@Tmpod The information you are looking for is where @SerialName is applied to the type of the serialized object (defaulting to the FQN of the type). You can give it any desired value (but it should be unique within the context of a single format instance).

@CLOVIS-AI You can get this information as text from the descriptor: serializer<Foo>().descriptor.serialName
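A small sketch of that lookup, assuming a project with the kotlinx.serialization compiler plugin and core library set up. `discriminatorValue` is a hypothetical helper name; `serializer<T>()` and `descriptor.serialName` are the library calls. The opt-in is included because the descriptor API has been marked experimental in some library versions.

```kotlin
@file:OptIn(ExperimentalSerializationApi::class)

import kotlinx.serialization.ExperimentalSerializationApi
import kotlinx.serialization.SerialName
import kotlinx.serialization.Serializable
import kotlinx.serialization.serializer

@Serializable
sealed class Foo

@Serializable
@SerialName("A")
class FooA(val a: Int) : Foo()

@Serializable
@SerialName("B")
class FooB(val b: String) : Foo()

// Hypothetical helper: returns the serial name of a subtype, which formats using the
// default polymorphic encoding write as the discriminator value for that subtype.
inline fun <reified T : Foo> discriminatorValue(): String =
    serializer<T>().descriptor.serialName

fun main() {
    println(discriminatorValue<FooA>())
    println(discriminatorValue<FooB>())
}
```

Note that this yields the discriminator value for a subtype, not the name of the field that holds it; the field name remains a format-level setting.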

@Tmpod
Author

Tmpod commented Jan 20, 2025

@CLOVIS-AI Thank you for adding more context!

This could be worked around by creating an extension function and hard-coding the name type, but KotlinX.Serialization will actually name the discriminant something else if there is already a field named type, so there is no good typesafe way of knowing what the field is called.

What I'm proposing would avoid all that hardcoding. By explicitly marking a field as the discriminant, you remove the unintuitive behaviour (imo) of virtual fields. The serialization library knows which field it should encode/decode first and how to map values to serializers. I just don't know how feasible it is to get the discriminant value from each implementor in the context of an annotation processor or compiler plugin. It seems possible to me, since the value is static, but I really am not familiar with the capabilities exposed to processors/plugins (though I've already noticed limitations, like annotations being unable to take values I'd consider compile-time constants, such as enum variants).

The information you are looking for is where @SerialName is applied to the type of the serialized object (defaulting to the FQN of the type). You can give it any desired value (but it should be unique within the context of a single format instance).

I'm not sure I follow. I used @SerialName in my first example, but the point is that, because it is useful to access the type of documents both for querying Mongo and other situations (without having to resort to when matches), kx.ser should allow explicit marking of discriminants in closed polymorphic hierarchies. It would also have the added benefit of making the code clearer to developers, since there's less "magic" involved — you know exactly which field is used to decide how to deserialize.
