Skip to content

[AVRO] #589: Fix schema not including base class for records with subclasses #593

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

MichalFoksa
Copy link
Contributor

@MichalFoksa MichalFoksa commented May 14, 2025

Supported cases:

1. Concrete class annotated with @JsonSubTypes

When a concrete (non-abstract) class is annotated with @JsonSubTypes, then Avro type of the annotated class is part of resulting union.

 @JsonSubTypes({
        @JsonSubTypes.Type(value = Apple.class),
        @JsonSubTypes.Type(value = Pear.class),
 })
class Fruit { }

class Apple extends Fruit {}
class Pear extends Fruit {}

Fruit schema:

[
  {name: Fruit, type: record, fields: [...]},
  {name: Apple, type: record, fields: [...]},
  {name: Orange, type: record, fields: [...]}
]

Annotated class, Fruit, is part of the schema:

This was not supported case, annotated class was missing in union.

2. Abstract class annotated with @JsonSubTypes

When an abstract class is annotated with @JsonSubTypes, then abstract class is not in union.

 @JsonSubTypes({
        @JsonSubTypes.Type(value = Apple.class),
        @JsonSubTypes.Type(value = Pear.class),
 })
class AbstractFruit { }          // <----  abstract class

class Apple extends Fruit {}
class Pear extends Fruit {}

AbstractFruit schema:

[
  {name: Apple, type: record, fields: [...]},
  {name: Orange, type: record, fields: [...]}
]

Annotated abstract base class is not part of the schema:

This was supported case.

3. Abstract class somewhere in the middle of @JsonSubTypes hierarchy

When an abstract class is somewhere along @JsonSubTypes hierarchy, it does not end up in union.

@JsonSubTypes({
        @JsonSubTypes.Type(value = AbstractWaterVehicle.class),
})
class Vehicle {
}

@JsonSubTypes({
        @JsonSubTypes.Type(value = Boat.class),
})
abstract class AbstractWaterVehicle extends Vehicle { }   // <---- abstract class
class Boat extends AbstractWaterVehicle { }

Vehicle schema:

[
  {name: Vehicle, type: record, fields: [...]},
  {name: Boat, type: record, fields: [...]},
]

AbstractWaterVehicle class is not in union.

This was not supported case, AbstractWaterVehicle would be part of union

4. Class referenced twice in @JsonSubTypes hierarchy

When a class is referenced twice in @JsonSubTypes hierarchy, it occurs only once in union.

// Helium is twice in subtypes hierarchy, once as `ElementInterface` subtype and second time as subtype of `AbstractGas` subtype.
@JsonSubTypes({
        @JsonSubTypes.Type(value = Gas.class),
        @JsonSubTypes.Type(value = Helium.class),       // <---- 1. Helium 
})
interface ElementInterface { }

@JsonSubTypes({
        @JsonSubTypes.Type(value = Helium.class),       // <---- 2. Helium
        @JsonSubTypes.Type(value = Oxygen.class),
})
static class Gas implements ElementInterface { }

class Helium extends Gas { }
class Oxygen extends Gas { }

ElementInterface schema:

[
  {name: Gas, type: record, fields: [...]},
  {name: Helium, type: record, fields: [...]},
  {name: Oxygen, type: record, fields: [...]}
]

Helium class is not duplicated in union.

5. Base class explicitly in @JsonSubTypes or @union annotations

When a class is subclass of itself.

 @JsonSubTypes({
        @JsonSubTypes.Type(value = Fruit.class),   // <----  Fruit is annotated class and also in @JsonSubTypes
        @JsonSubTypes.Type(value = Apple.class),
        @JsonSubTypes.Type(value = Pear.class),
 })
class Fruit { } // or interface

class Apple extends Fruit {}
class Pear extends Fruit {}
 @Uion({
        Fruit.class,                              // <----  Fruit is annotated class and also in @Union
        Apple.class, Pear.class,
 })
class Fruit { }  // or interface

class Apple extends Fruit {}
class Pear extends Fruit {}

Both cases above would lead to endless loop and StackOverflowError.


See PolymorphicTypeAnnotationsTest.class.

@MichalFoksa MichalFoksa force-pushed the feature/2.19/589-schema-does-not-include-base-class-for-records-with-subclasses branch from 952cbb7 to f43adf5 Compare May 14, 2025 15:57
@MichalFoksa MichalFoksa changed the title #589 Fix schema not including base class for records with subclasses #589: Fix schema not including base class for records with subclasses May 14, 2025
@MichalFoksa MichalFoksa force-pushed the feature/2.19/589-schema-does-not-include-base-class-for-records-with-subclasses branch from f43adf5 to ee0c73c Compare May 15, 2025 20:08
@cowtowncoder
Copy link
Member

Sounds good. Was about to suggest that due to RecordVisitor changes (addition of a Field) should go against 2.x not 2.19 but since it's private field maybe it can be considered internal (non-API) change.

@MichalFoksa MichalFoksa force-pushed the feature/2.19/589-schema-does-not-include-base-class-for-records-with-subclasses branch from ee0c73c to 99311ae Compare May 15, 2025 21:34
@MichalFoksa
Copy link
Contributor Author

I have problem to choose most obvious handling of duplicates in unionSchemas (inside if (subTypes != null && !subTypes.isEmpty()) condition).

  1. when I use HashSet then _typeSchema must be added into union last.

  2. when I use ArrayList, then I have to call unionSchemas .contains(subTypeSchema) before adding subTypeSchema into unionSchemas. However ArrayList.contains() is only object reference-equality check.

  3. If reference-equality check is enough for unionSchemas , then IdentityHashMap can be used to create a Set (Set<Schema> unionSchemas = Collections.newSetFromMap(new IdentityHashMap<>()) ) and we have simple code.

@cowtowncoder
Copy link
Member

I have problem to choose most obvious handling of duplicates in unionSchemas (inside if (subTypes != null && !subTypes.isEmpty()) condition).

1. when I use HashSet then `_typeSchema` must be added into union last.

2. when I use ArrayList, then I have to call `unionSchemas .contains(subTypeSchema)` before adding subTypeSchema into `unionSchemas`. However  ArrayList.contains() is only object reference-equality check.

3. If reference-equality check is enough for `unionSchemas` , then `IdentityHashMap` can be used to create a Set (`Set<Schema> unionSchemas = Collections.newSetFromMap(new IdentityHashMap<>())` ) and we have simple code.

Would it make sense to first use a Map with Java type being resolved (either JavaType, if available, or Class if not) as key; and only after collecting use "values" to create whatever is needed (if it must be Set, then identity; otherwise List if we can use Collection)?

I may be missing/misunderstanding something, so apologies if above makes no sense.

@MichalFoksa MichalFoksa marked this pull request as ready for review May 17, 2025 06:35
@MichalFoksa MichalFoksa force-pushed the feature/2.19/589-schema-does-not-include-base-class-for-records-with-subclasses branch from ef29a98 to c540e07 Compare May 17, 2025 06:39
@MichalFoksa MichalFoksa force-pushed the feature/2.19/589-schema-does-not-include-base-class-for-records-with-subclasses branch from bff3e2b to 88fa1dc Compare May 17, 2025 13:30
@MichalFoksa MichalFoksa changed the title #589: Fix schema not including base class for records with subclasses [AVRO] #589: Fix schema not including base class for records with subclasses May 19, 2025
@MichalFoksa MichalFoksa changed the title [AVRO] #589: Fix schema not including base class for records with subclasses [AVRO] 589: Fix schema not including base class for records with subclasses May 19, 2025
@MichalFoksa MichalFoksa changed the title [AVRO] 589: Fix schema not including base class for records with subclasses [AVRO] #589: Fix schema not including base class for records with subclasses May 19, 2025
@MichalFoksa
Copy link
Contributor Author

@cowtowncoder

I am done. Have a look at it PLS.

* _typeSchema points to Fruit.class without subtypes record schema
*
* FIXME: When _typeSchema is not null, then _overridden must be true, therefore (_overridden == true) can be replaced with (_typeSchema != null),
* but it might be considered API change cause _overridden has protected access modifier.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do that as a follow-up for 2.20 (2.x branch)

@cowtowncoder
Copy link
Member

@MichalFoksa Thank you! Had a quick look and things look better. Will try to properly review later tonight, get it merged hopefully.

// using hashCode() for equality check).
// Set ensures that each subType schema is once in resulting union.
// IdentityHashMap is used because it is using reference-equality.
final Set<Schema> unionSchemas = Collections.newSetFromMap(new IdentityHashMap<>());
Copy link
Member

@cowtowncoder cowtowncoder May 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if we have identity-based Set, where the only match is pure identity, why not just use ArrayList instead of Set to add/addAll to? Or do we get literal same Schema instances somehow?

Making that chance will not fail any of added tests at least.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or do we get literal same Schema instances somehow?

Yes. It happens when a class is references multiple times in hierarchy (maybe by mistake).

In the example below Helium is subtype of Element and also subtype of Gas.

Where Gas schema is union of

  • Gas (itself),
  • Helium
  • Oxygen.

Element schema is union of all type in

  • Gas schema (Gas, Helium and Oxygen)
  • Helium

When you collect all types togetjer without identity check you would get invalid union of

  • Gas
  • Helium
  • Oxygen
  • Helium again

because Helium is twice.

See use case 4 "Class referenced twice in @JsonSubTypes hierarchy**" above or PolymorphicTypeAnnotationsTest.class_is_referenced_twice_in_hierarchy_test test.

Change unionSchemas to ArrayList and the test will fail with

Failed to generate AvroSchema for ElementInterface, problem: Duplicate in union: Helium

 @JsonSubTypes({
   Type( Gas.class )
   Type( Helium.class ) })                         <---- First Helium occurrence 
 +---------------------+
 |      Element        |
 |    ( interface )    |
 +---------------------+
     ▲              ▲   @JsonSubTypes({
     |               |    Type( Helium.class )     <---- Second Helium occurrence
     |               |    Type( Oxygen.class ) })
     |     +---------------------+
     |     |        Gas          |
     |     +---------------------+
     |              ▲
     |              | 
     |     +--------+--------+
     |     |                 |
+-----------------+  +-----------------+
|     Helium      |  |     Oxygen      |
+-----------------+  +-----------------+

Copy link
Member

@cowtowncoder cowtowncoder May 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok yes, right, but: I think duplicates are actually caught by alreadySeenClasses -- so I don't think unionSchemas prevent any duplicates. So that one could -- I think? -- be simple ArrayList

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both duplicate checks are needed, each check is on different level.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But like I said, changing Set to List did not fail any of the tests. So something is odd.
I think I'll try to do that again just to make sure I did not imagine it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:( Hmmm, just tested:

final ArrayList<Schema> unionSchemas = new ArrayList<>();

Fails PolymorphicTypeAnnotationsTest#class_is_referenced_twice_in_hierarchy_test test.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I must have been hallucinating. Apologies for noise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no problem, it happens.

@cowtowncoder cowtowncoder merged commit 0d7f007 into FasterXML:2.19 May 22, 2025
4 checks passed
@cowtowncoder cowtowncoder modified the milestones: 2.19.0, 2.19.1 May 22, 2025
@cowtowncoder
Copy link
Member

Merged all the way to 3.x, things working. Phew!

Next PR could be for minor changes for 2.20 (2.x), making fields final.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants