
[Cosmos] Refactor query responses to their own Pager and Page types #2393


Merged
merged 7 commits into from
Apr 17, 2025


analogrelay
Member

@analogrelay commented Mar 26, 2025

Cosmos DB queries for Items, Containers, Databases, or Offers aren't guaranteed to be satisfied through HTTP requests. For example, Direct mode clients, such as the .NET SDK, use a custom TCP protocol to communicate directly with the replicas. In addition, cross-partition requests return "synthetic" pages of data that merge the results of several requests (one to each partition). To support this in the future, I'm changing the return type of the query-related APIs from azure_core::Pager<T> to a new azure_data_cosmos::FeedPager<T> type. Instead of yielding azure_core::Response values, it yields azure_data_cosmos::FeedPage<T> values. A FeedPage is a lot like a Response, except in two ways:

  1. It doesn't have a status code, since I don't believe the status code is relevant in this case and it may not be an HTTP status code. We can always add it back if necessary.
  2. It contains fully deserialized items. We could augment it to support some form of streaming in the future if we want.

It does retain the azure_core::http::Headers collection, because a collection can simply be empty when we don't have any headers, and because all the Cosmos transports support some form of headers that we may want to expose to the user.
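The FeedPage shape described above can be sketched as follows. This is a hypothetical, std-only illustration, not the azure_data_cosmos API: a plain HashMap stands in for azure_core::http::Headers, and merge_partition_pages is an invented helper showing how a cross-partition query could assemble a "synthetic" page. The sketch leaves the merged page's headers empty, which the description notes the collection allows.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for azure_core::http::Headers in this std-only sketch.
pub type Headers = HashMap<String, String>;

/// Illustrative FeedPage<T>: fully deserialized items plus headers,
/// and deliberately no status code.
#[derive(Debug, Clone, PartialEq)]
pub struct FeedPage<T> {
    items: Vec<T>,
    headers: Headers,
}

impl<T> FeedPage<T> {
    pub fn new(items: Vec<T>, headers: Headers) -> Self {
        Self { items, headers }
    }

    /// The deserialized items on this page.
    pub fn items(&self) -> &[T] {
        &self.items
    }

    /// Transport headers; may be empty when no single response backs the page.
    pub fn headers(&self) -> &Headers {
        &self.headers
    }

    /// Consumes the page and returns its items.
    pub fn into_items(self) -> Vec<T> {
        self.items
    }
}

/// Hypothetical helper: concatenates per-partition pages into one "synthetic"
/// page. Headers are left empty because no single response produced it.
pub fn merge_partition_pages<T>(pages: Vec<FeedPage<T>>) -> FeedPage<T> {
    let items = pages.into_iter().flat_map(FeedPage::into_items).collect();
    FeedPage::new(items, Headers::new())
}
```

Because the items are already deserialized, a caller simply iterates page.items(); any streaming support could be layered on later, as the description suggests.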

@heaths The easiest way for me to make this split was to refactor azure_core::Pager<T> a bit so that I could reuse it. I'm not wedded to that design if you have concerns. I wanted to build on the existing work done on Pager<T>, but did not want it to yield azure_core::Response<T> values. An alternative I considered, and am willing to fall back to, is to copy the Pager<T> code rather than use the type itself. I refactored it by creating two separate types:

  1. PageStream<T>, which contains all the paging logic but yields bare T values.
  2. Pager<T>, which is an alias for PageStream<Response<T>>, so existing usages work as they did before. In most cases a Pager<T> should yield a Response<T>, so I didn't want to force every other paginated service API to spell out Pager<Response<T>> in its return type.
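A rough, synchronous analogue of that split can be sketched with std only. The real azure_core types are asynchronous streams, and the names PageResult and from_callback, as well as the simplified Response and FeedPage structs below, are hypothetical; the point is only that one generic paging type can back both aliases.

```rust
/// Result of fetching one page (hypothetical names).
pub enum PageResult<P> {
    /// A page plus the continuation token for the next request.
    Continue { page: P, next: String },
    /// The last page; no further requests are made.
    Complete { page: P },
}

enum State {
    Initial,
    More(String),
    Done,
}

/// Generic paging logic that yields bare `P` values.
pub struct PageStream<P> {
    fetch: Box<dyn FnMut(Option<String>) -> PageResult<P>>,
    state: State,
}

impl<P> PageStream<P> {
    /// `fetch` receives `None` for the first page, then each continuation token.
    pub fn from_callback<F>(fetch: F) -> Self
    where
        F: FnMut(Option<String>) -> PageResult<P> + 'static,
    {
        Self { fetch: Box::new(fetch), state: State::Initial }
    }
}

impl<P> Iterator for PageStream<P> {
    type Item = P;

    fn next(&mut self) -> Option<P> {
        // Take the current state, leaving Done in case this is the last page.
        let token = match std::mem::replace(&mut self.state, State::Done) {
            State::Initial => None,
            State::More(token) => Some(token),
            State::Done => return None,
        };
        match (self.fetch)(token) {
            PageResult::Continue { page, next } => {
                self.state = State::More(next);
                Some(page)
            }
            PageResult::Complete { page } => Some(page),
        }
    }
}

/// Simplified stand-ins for azure_core::Response<T> and a Cosmos FeedPage<T>.
pub struct Response<T> {
    pub status: u16,
    pub body: T,
}
pub struct FeedPage<T> {
    pub items: Vec<T>,
}

/// Existing HTTP pagers keep yielding responses through the alias...
pub type Pager<T> = PageStream<Response<T>>;
/// ...while Cosmos yields its own page type from the same paging logic.
pub type FeedPager<T> = PageStream<FeedPage<T>>;
```

The alias keeps existing signatures such as "-> Pager<T>" unchanged while letting Cosmos return FeedPager<T>; the trade-off is that Pager<T> now reads as a pager of Response<T> rather than of T.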

@github-actions bot added the Azure.Core (The azure_core crate) and Cosmos (The azure_cosmos crate) labels Mar 26, 2025
@analogrelay marked this pull request as ready for review March 27, 2025 18:03
Member

@heaths left a comment


This looks nice and clean.

@JeffreyRichter
Member

@analogrelay Are you saying that today these client methods make HTTP requests but that in the future, they might not?
If so, I don't love the idea of changing the underlying implementation this drastically on customers. Customers will have an expectation of HTTP with headers, context, perhaps they added a pipeline policy that would no longer function as expected, etc.

Instead, I'd keep these methods always making HTTP requests and then introduce NEW methods (perhaps on a new/different client) that had different (transport) behavior.

@analogrelay
Member Author

analogrelay commented Mar 28, 2025

> Are you saying that today these client methods make HTTP requests but that in the future, they might not?

Correct. In fact, the equivalent methods often don't use HTTP in several other languages. That's why several Cosmos clients use their own "Response" type (.NET, Java, JavaScript). This change essentially aligns the Rust SDK with those SDKs.

> If so, I don't love the idea of changing the underlying implementation this drastically on customers. Customers will have an expectation of HTTP with headers, context, perhaps they added a pipeline policy that would no longer function as expected, etc.

Customers should not have this expectation; it will lead to surprises as we bring the more feature-rich transports[1] online, and I'd rather make it very clear to customers that queries are NOT guaranteed to go through the HTTP pipeline. When we do make HTTP requests, we use that pipeline (though perhaps, to reinforce that point, we should not use the core pipeline and should instead use a separate pipeline for queries that works for all transports; I'm not sure about that yet).

Again, it's worth pointing out that several of our other clients work this way: HTTP requests are made through the "standard" HTTP pipeline and can be observed by policies, but activity on the non-HTTP transport cannot.
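A minimal sketch of that distinction, assuming a hypothetical QueryTransport trait and Policy trait (neither is the azure_core or azure_data_cosmos API): a policy registered on the HTTP path observes each request, while a direct transport returns the same kind of result without passing through it. The string results are placeholders for real network calls.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical request type shared by both transports in this sketch.
#[derive(Debug, Clone)]
pub struct QueryRequest {
    pub query: String,
}

/// Hypothetical pipeline policy: sees every request routed through the HTTP pipeline.
pub trait Policy {
    fn on_request(&self, req: &QueryRequest);
}

/// Records the queries it observed, standing in for a customer's logging policy.
pub struct RecordingPolicy {
    pub seen: Rc<RefCell<Vec<String>>>,
}

impl Policy for RecordingPolicy {
    fn on_request(&self, req: &QueryRequest) {
        self.seen.borrow_mut().push(req.query.clone());
    }
}

/// A transport that can satisfy a query; both variants return the same result shape.
pub trait QueryTransport {
    fn execute(&self, req: &QueryRequest) -> Vec<String>;
}

/// HTTP transport: runs every policy before "sending" the request.
pub struct HttpTransport {
    pub policies: Vec<Box<dyn Policy>>,
}

impl QueryTransport for HttpTransport {
    fn execute(&self, req: &QueryRequest) -> Vec<String> {
        for policy in &self.policies {
            policy.on_request(req);
        }
        vec![format!("http:{}", req.query)] // placeholder for a real HTTP call
    }
}

/// Direct (non-HTTP) transport: bypasses the HTTP pipeline entirely,
/// so HTTP policies never observe its traffic.
pub struct DirectTransport;

impl QueryTransport for DirectTransport {
    fn execute(&self, req: &QueryRequest) -> Vec<String> {
        vec![format!("direct:{}", req.query)] // placeholder for a custom TCP call
    }
}
```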

> Instead, I'd keep these methods always making HTTP requests and then introduce NEW methods (perhaps on a new/different client) that had different (transport) behavior.

This would be much more surprising to users, particularly since they are usually not directly aware that their request spans multiple HTTP responses OR uses a different transport (we don't hide it, of course, but they would certainly be surprised to have to make this decision themselves).


I think it's also worth noting a key distinction here with Cosmos. Cosmos data plane clients should not really be considered standard wrapper SDKs that call REST APIs. They are more akin to the client driver for a database, like PostgreSQL or SQL Server. Part of the transport surface area uses, or at least supports, REST, but a significant portion of that transport surface is non-HTTP (more like EventHubs AMQP support, for example).

I think it's reasonable for us to reuse the Core HTTP primitives for APIs that are entirely REST-based (essentially the "ControlPlane-over-DataPlane" APIs like CRUD for Databases/Containers). But this is the first step in moving the Data Plane operations to be independent of HTTP primitives (I expect to move point read/write operations like Item CRUD to this model as well).

Footnotes

  1. The non-HTTP transport is the only way for customers to get a guarantee of the latency SLA we publish, for example.

@heaths
Member

heaths commented Mar 28, 2025

@JeffreyRichter it's also worth pointing out that @analogrelay's changes to azure_core not only don't adversely affect HTTP support, but are also in line with how we want to deserialize the vast majority of models within the pipeline anyway. See my comment. In the cases where we do want to provide a true stream (as we always do now), e.g., BlobClient::download_blob, we wouldn't use paging anyway; I just can't see how that would work.

@JeffreyRichter
Member

First, just because other language SDKs do something does not mean it is the right or best thing to do.

My main problem is that potential breaking changes occur if today these methods make HTTP requests using client options like retry policy, logging, api-version, customer policies, and so on, and then, in the future when the protocol changes, the same client method ignores all the options or does something very different.

I am also concerned that customers expect our clients to make HTTP requests (we never attempted to hide or abstract this away) but that these methods will not use HTTP. However, this is much less of a concern to me than the breaking change, as we already have clients with methods that are not 1:1 with an HTTP request (like parallel blob upload/download). But in these scenarios, we must make it very clear to customers that these methods use some rich algorithm on top of HTTP or do not use HTTP at all. In other words, I can live with this decision, but it's hard for me to live with a very likely breaking change.

@Pilchie
Member

Pilchie commented Mar 28, 2025

In other SDKs where we support other transports, we have a configuration option that selects the transport (for example: https://learn.microsoft.com/azure/cosmos-db/nosql/sdk-connection-modes), so the behavior change you are concerned about only happens in response to a customer code change. What @analogrelay is proposing is to let customers change that connectivity mode without also having to change all their query and point operation calls to something different.
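To illustrate the idea, here is a hypothetical sketch in which the transport is chosen once through client options, modelled loosely on the Gateway and Direct connection modes that the linked page describes. ConnectionMode, ClientOptions, ContainerClient, query_items, and the served_by field are illustrative names, not the Rust SDK's API, and the string results stand in for real network calls. The query call site is identical in both modes; only the options differ.

```rust
/// Hypothetical connection-mode option chosen once, at client construction.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum ConnectionMode {
    /// Requests go over HTTPS through a gateway.
    #[default]
    Gateway,
    /// Requests go over a custom TCP protocol directly to replicas.
    Direct,
}

#[derive(Debug, Clone, Default)]
pub struct ClientOptions {
    pub connection_mode: ConnectionMode,
}

/// Transport-neutral page type (simplified); served_by exists only for the demo.
#[derive(Debug, PartialEq)]
pub struct FeedPage {
    pub items: Vec<String>,
    pub served_by: ConnectionMode,
}

pub struct ContainerClient {
    options: ClientOptions,
}

impl ContainerClient {
    pub fn new(options: ClientOptions) -> Self {
        Self { options }
    }

    /// Same signature regardless of mode: only the options decide the transport.
    pub fn query_items(&self, query: &str) -> FeedPage {
        let items = match self.options.connection_mode {
            ConnectionMode::Gateway => vec![format!("via HTTPS gateway: {query}")],
            ConnectionMode::Direct => vec![format!("via direct TCP: {query}")],
        };
        FeedPage { items, served_by: self.options.connection_mode }
    }
}
```

Switching modes is a one-line change to the options, which is the property the comment above describes: the behavior change follows a deliberate customer code change rather than a silent SDK update.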

@JeffreyRichter
Member

OK, that does make me feel better.
I'll say this: I worked on the IoT SDKs years ago, and in those, customers could set a transport flag to select HTTP, AMQP, or MQTT. The end result was disastrous for both customers and the SDK team: the team tried to abstract away the differences between three fundamentally different transports, customers couldn't write code that just worked, and it wasn't clear to them what the pros and cons of selecting one transport over another were.

So my past experience makes me wary of this proposal, but it is your SDK, and if you're OK with maintaining it and supporting your customers, go ahead.

@analogrelay
Member Author

I should clarify: I never intended to imply that something is right just because other SDKs do it, but those SDKs do provide context for the decision, in that we can look at the reasons why they made it (which is what I was referencing).

> My main problem is that potential breaking changes occur if today these methods make HTTP requests using client options like retry policy, logging, api-version, customer policies, and so on, and then, in the future when the protocol changes, the same client method ignores all the options or does something very different.

This is precisely why I want to make this transition now, well before our initial GA. We have other SDKs that have not considered this in advance (the Go SDK in particular), and we expect some fairly disruptive changes there in order to support it (perhaps moving, as you suggest above, to brand-new APIs that properly abstract the two transports).

We have a fair amount of experience with abstracting these two transports in .NET and Java, and I plan to tap into that as we continue to move forward.

@JeffreyRichter
Member

I see. I misunderstood, then. I thought the plan was to GA with an HTTP implementation and then change the protocol in a subsequent release. Doing this before GA also makes me feel much better.

@analogrelay force-pushed the ashleyst/change-query-responses branch from b760316 to b49fce2 April 15, 2025 16:40
@analogrelay
Member Author

I've rebased this and revived it after my week off last week. Looking for sign-off from Cosmos folks (@Pilchie @kirankumarkolli)

@azure-sdk
Collaborator

azure-sdk commented Apr 15, 2025

API change check

APIView has identified API-level changes in this PR and created the following API reviews:

azure_core
azure_data_cosmos

Member

@heaths left a comment


No new commits.

@analogrelay merged commit af09583 into Azure:main Apr 17, 2025
18 checks passed
Labels: Azure.Core (The azure_core crate), Cosmos (The azure_cosmos crate)

5 participants