
Conversation

@edvald-garden

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

This enables adding new fields to nested struct/message types on events and running ALTER SOURCE ... REFRESH SCHEMA (and related ALTER SOURCE statements). Previously this was only possible if fields were added at the root level of the event structure.

To achieve this, the schema compatibility check was re-implemented. An additional benefit is better error messages when a breaking change is made to an individual field.
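The reworked check operates on RisingWave's internal column types rather than on any format-specific schema. As a rough, self-contained sketch (the `DataType` enum and `check_compatible` function below are simplified stand-ins, not the actual RisingWave definitions), the intended semantics are: identical scalars pass, structs may gain new fields, and any type change or dropped field yields an error naming the offending field:

```rust
// Illustrative sketch only: `DataType` here is a toy stand-in for the
// internal column type, not the real RisingWave enum.
#[derive(Clone, Debug, PartialEq)]
enum DataType {
    Int64,
    Varchar,
    Struct(Vec<(String, DataType)>),
}

/// Returns Ok(()) if `new` can replace `old`: scalars must be identical,
/// and structs may gain fields as long as existing fields stay compatible.
fn check_compatible(path: &str, old: &DataType, new: &DataType) -> Result<(), String> {
    match (old, new) {
        (DataType::Struct(old_fields), DataType::Struct(new_fields)) => {
            for (name, old_ty) in old_fields {
                match new_fields.iter().find(|(n, _)| n == name) {
                    // Recurse so additions deep inside nested structs are allowed too.
                    Some((_, new_ty)) => {
                        check_compatible(&format!("{path}.{name}"), old_ty, new_ty)?
                    }
                    None => return Err(format!("field `{path}.{name}` was dropped")),
                }
            }
            Ok(()) // extra entries in `new_fields` are additions: allowed
        }
        (old, new) if old == new => Ok(()),
        (old, new) => Err(format!("field `{path}` changed from {old:?} to {new:?}")),
    }
}
```

The per-field path carried in the error (e.g. `event.payload.id`) is what would enable the clearer diagnostics for breaking changes to an individual field mentioned above.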

See this Slack discussion for some more context: https://risingwave-community.slack.com/archives/C03BW71523T/p1760962872810179

This is my first PR here, so I'd be quite surprised if it's good on the first attempt; any feedback would be much appreciated :)

Checklist

  • I have written necessary rustdoc comments.
  • I have added necessary unit tests and integration tests.
  • I have added test labels as necessary.
  • I have added fuzzing tests or opened an issue to track them.
  • My PR contains breaking changes.
  • My PR changes performance-critical code, so I will run (micro) benchmarks and present the results.
  • I have checked the Release Timeline and Currently Supported Versions to determine which release branches I need to cherry-pick this PR into.

Documentation

  • My PR needs documentation updates.
Release note

@edvald-garden edvald-garden force-pushed the schema-refresh-allow-new-nested-fields branch from 1619c35 to eae9396 Compare October 21, 2025 15:59
Some(old_col) => {
    // Column exists in both schemas, check compatibility
    if let Err(_reason) =
        is_protobuf_compatible(old_col.data_type(), new_col.data_type())
Contributor

It seems there is a problem here: this generic method only checks whether the fields are compatible under the Protobuf protocol. However, the upstream schema is not guaranteed to be Protobuf (i.e. for struct fields, it could also be generated by Avro or other possible nested types).

In addition, I believe that protobuf compatibility should not be checked within RisingWave, but rather ensured by external systems. Even if the protobuf is not compatible, RisingWave can still parse and compute the data as long as the message itself and the schema are updated synchronously. This is because protobuf compatibility issues arise from the specifics of binary serialization/deserialization, while RisingWave merely parses upstream messages into internal types. It does not concern itself with whether the schema is compatible over time, as long as each message is compatible with its schema.

Author
@edvald-garden edvald-garden Oct 21, 2025

I actually think the name of the function was misleading here, which is in and of itself a good thing to fix, so I've gone ahead and renamed it to avoid confusion. The function itself basically just checks compatibility based on the internal DataType types and doesn't consider anything protobuf-specific; I simply had the compatibility semantics of protobuf in mind when first preparing it.

The question (and I'm really asking, I'm admittedly in some deep waters here) is whether the semantics of the function hold up. Are there any potential downstream implications that I'm missing?

@edvald-garden edvald-garden force-pushed the schema-refresh-allow-new-nested-fields branch from eae9396 to c2a78c1 Compare October 21, 2025 18:04
@BugenZhao BugenZhao self-requested a review October 22, 2025 02:23
@BugenZhao
Member

Hi and thanks for your contribution.

Just want to share some context before we proceed with this PR: adding nested fields is actually supported in ALTER TABLE .. ALTER COLUMN .. TYPE or ALTER TABLE .. REFRESH SCHEMA, but not for ALTER SOURCE. So there's already code logic for determining column type compatibility here:

impl ColumnIdGenerator {
    /// Creates a new [`ColumnIdGenerator`] for altering an existing table.
    pub fn new_alter(original: &TableCatalog) -> Self {
        fn handle(existing: &mut Existing, path: &mut Path, id: ColumnId, data_type: DataType) {
            macro_rules! with_segment {
                ($segment:expr, $block:block) => {
                    path.push($segment);
                    $block
                    path.pop();
                };
            }
            match &data_type {
                DataType::Struct(fields) => {
                    for ((field_name, field_data_type), field_id) in
                        fields.iter().zip_eq_fast(fields.ids_or_placeholder())
                    {
                        with_segment!(Segment::Field(field_name.to_owned()), {
                            handle(existing, path, field_id, field_data_type.clone());
                        });
                    }
                }
                DataType::List(list) => {
                    // There's no id for the element as list's own structure won't change.
                    with_segment!(Segment::ListElement, {
                        handle(existing, path, ColumnId::placeholder(), list.elem().clone());
                    });
                }
                DataType::Map(map) => {
                    // There's no id for the key/value as map's own structure won't change.
                    with_segment!(Segment::MapKey, {
                        handle(existing, path, ColumnId::placeholder(), map.key().clone());
                    });
                    with_segment!(Segment::MapValue, {
                        handle(existing, path, ColumnId::placeholder(), map.value().clone());
                    });
                }
                data_types::simple!() => {}
            }
            existing
                .try_insert(path.clone(), (id, data_type))
                .unwrap_or_else(|_| panic!("duplicate path: {:?}", path));
        }

Would you help to check whether we can directly reuse this utility for ALTER SOURCE, and furthermore, unify the handling paths for schema changes of SOURCE and TABLE to ensure they behave the same? There's also a tracking issue for this: #20475
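The core idea in the quoted code can be sketched in a self-contained form (the `DataType`, `Segment`, and `collect` names below are simplified stand-ins for the RisingWave types shown above): each nested field gets a stable path, so comparing two schemas becomes a join on paths rather than on positional order, which is what makes nested additions safe to track.

```rust
// Simplified sketch of enumerating stable paths for nested fields, in the
// spirit of the `handle` function quoted above; all types are toy stand-ins.
#[derive(Clone, Debug, PartialEq)]
enum DataType {
    Int64,
    Struct(Vec<(String, DataType)>),
    List(Box<DataType>),
}

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Segment {
    Field(String),
    ListElement,
}

type Path = Vec<Segment>;

/// Collects one entry per (nested) field so that an old and a new schema
/// can be matched up by path when generating or reusing column ids.
fn collect(path: &mut Path, ty: &DataType, out: &mut Vec<(Path, DataType)>) {
    match ty {
        DataType::Struct(fields) => {
            for (name, field_ty) in fields {
                path.push(Segment::Field(name.clone()));
                collect(path, field_ty, out);
                path.pop();
            }
        }
        DataType::List(elem) => {
            // Mirrors the quoted code: the list itself has no per-element id.
            path.push(Segment::ListElement);
            collect(path, elem, out);
            path.pop();
        }
        DataType::Int64 => {}
    }
    // One entry per node, analogous to `try_insert` in the quoted code.
    out.push((path.clone(), ty.clone()));
}
```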

@edvald-garden
Copy link
Author

Thanks for the added info @BugenZhao! Looks like I'll need to get back to this next week, but happy to look into whether the code you linked can be re-used, at least to some degree.
