Handle rate limits (429 responses) better #574

Open
gavinclarkeuk opened this issue Feb 25, 2025 · 5 comments
Labels
question Further information is requested

Comments

@gavinclarkeuk

We have a Terraform module for creating topics that does a couple of data source lookups for environment and cluster details. If users call the module in a for_each loop, they very quickly run into rate-limit issues. We mainly hit these just doing simple environment lookups, but I imagine it could happen for any resource.

Given that the API responds with a 429 and headers telling the caller when to retry, couldn't the Terraform provider respect those and back off accordingly? Also, I'm not sure whether the provider follows the other recommendations from the Confluent API docs (e.g. introducing jitter).
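
For illustration, the shape of the setup described above is roughly this (module and variable names are made up, not our real code):

module "topics" {
  source   = "./modules/confluent-topic"
  for_each = toset(["orders", "payments", "invoices"])

  topic_name       = each.key
  environment_name = "staging"
  cluster_name     = "main"
}

# Inside the module, every instance repeats the same two lookups,
# so each topic adds two more API calls:
data "confluent_environment" "this" {
  display_name = var.environment_name
}

data "confluent_kafka_cluster" "this" {
  display_name = var.cluster_name
  environment {
    id = data.confluent_environment.this.id
  }
}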

@linouk23
Copy link
Contributor

linouk23 commented Feb 25, 2025

@gavinclarkeuk thanks for creating this issue!

The Terraform Provider for Confluent uses a smart HTTP client that retries up to four times on 429 and 5xx errors, using an exponential backoff strategy.

If you're still running into issues, we recommend overriding the max_retries attribute:

provider "confluent" {
    cloud_api_key    = "..."
    cloud_api_secret = "..."
    max_retries = 10 # defaults to 4
}

Let us know if that helps!

Note: for the data source lookups for environment and cluster details, consider looking up by id rather than by display_name to reduce the number of API calls.
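
A minimal sketch of the id-based lookup, assuming the IDs can be passed in as variables (names are illustrative):

data "confluent_environment" "this" {
  id = var.environment_id # e.g. "env-abc123"
}

data "confluent_kafka_cluster" "this" {
  id = var.cluster_id # e.g. "lkc-xyz789"
  environment {
    id = data.confluent_environment.this.id
  }
}

Looking up by id avoids the list-and-filter call that a display_name lookup has to make.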

@linouk23 added the question (Further information is requested) label Feb 25, 2025
@gavinclarkeuk
Author

Setting max_retries has fixed our immediate issue, but I still think there is room for improvement here.

I appreciate we could restructure our Terraform module so it didn't need to do the lookups, but that would impact its ease of use, so solving this upstream would be preferable (and useful for others).

I guess another approach could be to cache the results of duplicate data source lookups, or to batch up and dedupe the API requests. I don't know enough about provider internals to know whether that's possible, but it would certainly reduce pressure on the API.
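
For example, one way to dedupe the calls today would be to do the lookups once in the root module and pass the resolved IDs into each module instance (a rough sketch with illustrative names):

# Root module: resolve the environment and cluster once...
data "confluent_environment" "staging" {
  display_name = "staging"
}

data "confluent_kafka_cluster" "main" {
  display_name = "main"
  environment {
    id = data.confluent_environment.staging.id
  }
}

# ...then fan out without any per-topic lookups.
module "topics" {
  source   = "./modules/confluent-topic"
  for_each = toset(["orders", "payments", "invoices"])

  topic_name     = each.key
  environment_id = data.confluent_environment.staging.id
  cluster_id     = data.confluent_kafka_cluster.main.id
}

That keeps the API pressure constant regardless of how many topics the module creates, at the cost of a slightly less self-contained module interface.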

@linouk23
Contributor

@gavinclarkeuk could you share the data sources where you can observe 429 issues? Thank you!

@gavinclarkeuk
Author

The two we are hitting the most are confluent_environment and confluent_kafka_cluster doing lookups by display name.

@linouk23
Contributor

linouk23 commented Mar 4, 2025

I guess another approach could be to cache the results of duplicate data source lookups, or to batch up and dedupe the API requests. I don't know enough about provider internals to know whether that's possible, but it would certainly reduce pressure on the API.

That's a great idea, though I'm not sure it's possible 😁

The catch is that our API doesn't really support filtering by display_name for the majority of resources. We did add these filter parameters in Terraform (TF), accepting display_name instead of the id attribute for a number of data sources, due to customer demand. That said, we now believe it might have been a mistake. It's okay to use them as long as there are no 429 errors, but once you do run into them, I'm not sure we want to encourage relying on this approach long-term.

Given that the API responds with a 429 and headers telling the caller when to retry, couldn't the Terraform provider respect those and back off accordingly?

We do use a smart HTTP client, https://github.com/hashicorp/go-retryablehttp, which parses these rate-limit headers automatically.

The two we are hitting the most are confluent_environment and confluent_kafka_cluster doing lookups by display name.

Could you file a support ticket asking for a rate limit increase for the org/v2 (confluent_environment) and cmk/v2 (confluent_kafka_cluster) list API calls, so our Product team can prioritize accordingly? Thank you!
