Lines 629 to 639 in 0aa76af
    try {
        return makeRequestCall( method, path, body )
    } catch ( K8sResponseException | SocketException | SocketTimeoutException e ) {
        if ( e instanceof K8sResponseException && e.response.code != 500 )
            throw e
        if ( ++attempt > maxRetries )
            throw e
        log.debug "[K8s] API request threw socket exception: $e.message for $method $path - Retrying request (attempt=$attempt)"
        final long delay = (Math.pow(3, attempt - 1) as long) * 250
        sleep( delay )
    }
I wonder if it is retrying but the tunnel disconnect lasts longer than the duration of the 8 retries. Seems plausible if it is a regular outage. Can you see the "API request threw socket exception..." messages in your log?
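To put a rough number on "the duration of the 8 retries", here is a small sketch in plain Java that applies the same delay formula as the snippet above (`250 ms * 3^(attempt - 1)`). The class and method names are made up for illustration, and `maxRetries = 8` is taken from the comment above rather than checked against the actual configuration. It only adds up the sleep time; the time each request spends waiting on the socket before failing is not included.

```java
public class BackoffSchedule {

    // Same formula as the snippet above: 250 ms * 3^(attempt - 1).
    static long delayMillis(int attempt) {
        return (long) Math.pow(3, attempt - 1) * 250;
    }

    // Sum of the sleeps for attempts 1..maxRetries (request time not included).
    static long totalMillis(int maxRetries) {
        long total = 0;
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            total += delayMillis(attempt);
        }
        return total;
    }

    public static void main(String[] args) {
        int maxRetries = 8; // assumption: the value mentioned in the comment above
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            System.out.printf("attempt %d: sleep %d ms%n", attempt, delayMillis(attempt));
        }
        // Prints "total sleep: 820000 ms" for maxRetries = 8
        System.out.println("total sleep: " + totalMillis(maxRetries) + " ms");
    }
}
```

With 8 retries the sleeps add up to 820 seconds (a bit under 14 minutes), and the last sleep alone is about 9 minutes, so an outage would have to outlast that for every retry to fail.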
@bentsherman, apologies if this should be raised as a separate question/issue, but this issue and conversation seem closely related to the problem I am having.
I am running in a cluster that, under heavy load, sometimes returns temporary 503 errors. This causes Nextflow to terminate, as can be seen in the logs:
`Request GET /api...returned an error code=503...`
It seems that similar codes have been made retryable for AWS Batch in #4709.
I am using Nextflow 24.04.3, and when the 503 occurs and the workflow exits, the running processes are not terminated. Testing with 24.12.0-edge, which includes fix #5561, resolves that: running processes are terminated even if a 503 occurs.
Is there a reason why 503 is not also treated as a retryable error code, as it is for AWS Batch?
Could this be added, or could there be a k8s setting that lets the user specify which errors to treat as retryable, e.g. `k8s.retryable_errors = [500, 502, 503]`?
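To make the suggestion concrete, here is a rough sketch in plain Java of the retry loop above with the hard-coded `!= 500` check replaced by a configurable set of codes. Everything in it is hypothetical: `ResponseException` is a stand-in for `K8sResponseException`, `withRetry` is an invented helper, and the set passed in only mimics what a `k8s.retryable_errors` option might supply (no such option exists as far as this thread shows).

```java
import java.util.Set;
import java.util.function.Supplier;

public class RetryableCodes {

    // Hypothetical stand-in for K8sResponseException, carrying only the HTTP status code.
    static class ResponseException extends RuntimeException {
        final int code;
        ResponseException(int code) {
            super("Request returned an error code=" + code);
            this.code = code;
        }
    }

    // Retries the call while the error code is in the retryable set,
    // using the same 3^(attempt - 1) backoff as the existing loop.
    static <T> T withRetry(Set<Integer> retryable, int maxRetries,
                           long baseDelayMillis, Supplier<T> call) {
        int attempt = 0;
        while (true) {
            try {
                return call.get();
            } catch (ResponseException e) {
                if (!retryable.contains(e.code))
                    throw e;                      // e.g. 404: fail immediately
                if (++attempt > maxRetries)
                    throw e;                      // retries exhausted
                long delay = (long) Math.pow(3, attempt - 1) * baseDelayMillis;
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new RuntimeException(ie);
                }
            }
        }
    }

    public static void main(String[] args) {
        // Assumption: these codes would come from user configuration.
        Set<Integer> retryable = Set.of(500, 502, 503);
        int[] calls = {0};
        // Simulated API call: two 503 responses, then success.
        String result = withRetry(retryable, 8, 1, () -> {
            if (calls[0]++ < 2)
                throw new ResponseException(503);
            return "ok";
        });
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

The socket exceptions handled by the existing loop are left out here to keep the sketch short; a real change would still need to retry those as well.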
@jorgee when you have some time, can you update the K8s client to handle the same 5xx errors as AWS Batch? Would also be a good chance to use the same failsafe RetryPolicy pattern here like we do throughout the codebase
Originally posted by @crossthet in #5604