Checkboxes for prior research
Describe the bug
GetObjectCommand can hang forever while reading the response body when the underlying TCP connection drops or stalls mid-transfer, with no error and no timeout to break the deadlock.
The failure is invisible to every safety mechanism the SDK offers:
- client.send() resolves successfully as soon as the response headers arrive (HTTP 200). From the SDK's point of view the operation has already succeeded, so the failure happens entirely in the body-streaming phase that follows.
- The returned Body (a ChecksumStream when the object carries a checksum, which is the default for newly uploaded objects) never emits end or error when the underlying socket is destroyed or goes silent mid-body. The teardown is swallowed by the wrapper, so the consumer waits forever.
- This affects every way of reading the body: the SDK's own Body.transformToString(), Node's stream/consumers (text()/json()), and manual data/error/end collectors all hang identically.
- Built-in retries don't help. Because send() already succeeded, the retry strategy is complete by the time the body stalls: with maxAttempts: 5 the server receives exactly one request and the read still hangs.
- requestTimeout doesn't help either: it only bounds the request up to the response headers, not the body transfer.
The net effect is a silent, unrecoverable deadlock: a process that has received a partial object body sits idle indefinitely (we observed 7+ hours, 0% CPU, no open socket, no exception) with no way to detect or recover without killing it. Any caller streaming object bodies over a connection that can be dropped mid-flight (NAT idle eviction, load-balancer reset, transient network blip, half-open peer) is exposed.
Regression Issue
SDK version number
@aws-sdk/client-s3: 3.1038.0, @smithy/node-http-handler 4.7.8
Which JavaScript Runtime is this issue in?
Node.js
Details of the browser/Node.js/ReactNative version
v24.12.0
Reproduction Steps
The bug is a stalled/dropped TCP connection after the response headers are received but before the body finishes. The script below reproduces it deterministically with a local mock endpoint — no AWS credentials or real bucket required.
npm install @aws-sdk/client-s3
node repro.mjs
// repro.mjs
import http from 'node:http';
import { S3Client, GetObjectCommand } from '@aws-sdk/client-s3';

// Mock S3 endpoint: returns 200 + a checksum header (so the SDK wraps Body in a
// validating ChecksumStream), sends 44 KB of a 500 KB-declared body, then drops
// the socket mid-transfer, i.e. a connection killed in flight (NAT eviction,
// LB idle reset, transient network blip).
const server = http.createServer((req, res) => {
  res.writeHead(200, {
    'Content-Length': '500000',
    'x-amz-checksum-crc32': 'AAAAAA==',
    'x-amz-checksum-type': 'FULL_OBJECT'
  });
  res.write('{"data":[' + 'x'.repeat(44000));
  setTimeout(() => req.socket.destroy(), 300); // connection drops mid-body
});
await new Promise(r => server.listen(0, '127.0.0.1', r));
const { port } = server.address();

const s3 = new S3Client({
  region: 'us-east-1',
  endpoint: `http://127.0.0.1:${port}`,
  forcePathStyle: true,
  credentials: { accessKeyId: 'x', secretAccessKey: 'y' },
  maxAttempts: 5 // retries do not help (see below)
});

let outcome = '>>> HANG (never settled) <<<';
const work = (async () => {
  const { Body } = await s3.send(new GetObjectCommand({ Bucket: 'b', Key: 'k', ChecksumMode: 'ENABLED' }));
  console.log('send() resolved; Body =', Body.constructor.name);
  await Body.transformToString(); // <-- never settles
  outcome = 'resolved';
})().catch(e => (outcome = 'rejected: ' + (e?.name || e?.message)));

// Give the read 5 seconds to settle, then report whatever state it is in.
await Promise.race([work, new Promise(r => setTimeout(r, 5000))]);
console.log('outcome after 5s:', outcome);
server.close();
process.exit(0);
Output:
send() resolved; Body = ChecksumStream
outcome after 5s: >>> HANG (never settled) <<<
Variations that all reproduce the same hang:
- Replace req.socket.destroy() with never finishing the body (a silent stall: the server stops sending, no FIN/RST). Hangs identically.
- Consume the body with node:stream/consumers text()/json(), or with a manual data/error/end collector, instead of transformToString(). All hang.
- Remove the checksum headers (raw IncomingMessage body): the destroy() case then surfaces as an error and rejects, but the silent stall case still hangs (no body-read timeout).
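The first variation (silent stall) can be sketched as a replacement for the mock server in repro.mjs. This is a minimal sketch only: createStallingServer is a hypothetical name, and the headers are copied from the script above. The handler writes the same partial body and then simply never calls res.end() or destroys the socket, so the connection stays open and silent.

```javascript
import http from 'node:http';

// Hypothetical helper: mock endpoint that sends a partial body and then goes
// quiet. No FIN, no RST, no further bytes: the socket just stays open.
function createStallingServer() {
  return http.createServer((req, res) => {
    res.writeHead(200, {
      'Content-Length': '500000',
      'x-amz-checksum-crc32': 'AAAAAA==',
      'x-amz-checksum-type': 'FULL_OBJECT'
    });
    res.write('{"data":[' + 'x'.repeat(44000));
    // Intentionally no res.end() and no req.socket.destroy().
  });
}
```

To use it, swap it in for the http.createServer(...) call in repro.mjs and leave the rest of the script unchanged.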
Observed Behavior
- client.send() resolves successfully as soon as the response headers arrive (HTTP 200). From the SDK's perspective the operation has already succeeded.
- The returned Body (a ChecksumStream when the object has a checksum, which is the default for new objects) then never emits end or error when the underlying connection is destroyed/stalls mid-body.
- The body-consuming promise (transformToString(), node:stream/consumers, or any data/end collector) is therefore orphaned and hangs forever.
- Built-in retries do not fire: with maxAttempts: 5 the mock server receives exactly one request. Because send() already succeeded, the retry strategy is done; the failure is in the body phase, which it doesn't cover.
- requestTimeout on NodeHttpHandler does not help either: it only bounds the request up to the response headers, not the body transfer.
In our deployed system this manifested as a job that consumed a partial S3 object then hung indefinitely (7+ hours, 0% CPU) with no error, no socket, and no way to recover without killing the process.
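For comparison, here is a baseline sketch of the raw IncomingMessage case mentioned in the variations, with no SDK and no checksum wrapper involved. The helper name readTruncatedBody is hypothetical. With a plain node:http client, a socket destroyed mid-body is expected to surface as an error on the response stream, so the read rejects instead of hanging.

```javascript
import http from 'node:http';
import { text } from 'node:stream/consumers';

// Hypothetical baseline: plain node:http client reading a body that the
// server cuts off after 44 KB of a declared 500 KB.
async function readTruncatedBody() {
  const server = http.createServer((req, res) => {
    res.writeHead(200, { 'Content-Length': '500000' });
    res.write('x'.repeat(44000));
    setTimeout(() => req.socket.destroy(), 100); // drop mid-body
  });
  await new Promise((r) => server.listen(0, '127.0.0.1', r));
  const { port } = server.address();
  try {
    const res = await new Promise((resolve, reject) => {
      http.get({ host: '127.0.0.1', port }, resolve).on('error', reject);
    });
    await text(res); // consumes the raw IncomingMessage
    return 'resolved';
  } catch (err) {
    return `rejected: ${err.code ?? err.message}`;
  } finally {
    server.close();
  }
}

readTruncatedBody().then((outcome) => console.log('raw http outcome:', outcome));
```

This only covers the destroy() case. As noted in the variations, a silent stall still hangs even on the raw stream because nothing bounds the body read.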
Expected Behavior
A GetObject body read should not be able to hang forever on a dropped/stalled connection. Concretely, at least one of:
- A configurable response/body read (socket inactivity) timeout that applies to the whole operation including streaming the body — so a stalled body eventually rejects.
- The body stream (including the ChecksumStream wrapper) should propagate the underlying socket teardown as an error/premature-close on the stream the consumer is reading, so transformToString() / for await / data+end collectors reject instead of hanging.
Either would let callers catch the failure and retry, instead of silently deadlocking.
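To illustrate the second option, the sketch below shows general Node.js stream behavior, not the actual ChecksumStream implementation (which this report does not inspect). A wrapper fed with source.pipe() is left open when the source is destroyed early, whereas a wrapper fed with stream.pipeline() is destroyed along with it. The function names and the PassThrough stand-in for a wrapper are assumptions made for illustration.

```javascript
import { PassThrough, pipeline } from 'node:stream';

// pipe(): if `source` is destroyed before it ends, the wrapper is neither
// ended nor errored, so a consumer of the wrapper waits forever.
function wrapWithPipe(source) {
  const wrapper = new PassThrough();
  source.pipe(wrapper);
  return wrapper;
}

// pipeline(): a premature close of `source` destroys the wrapper with an
// error, so a consumer of the wrapper rejects.
function wrapWithPipeline(source) {
  const wrapper = new PassThrough();
  // The failure is observed on the wrapper itself, so the callback is a no-op.
  pipeline(source, wrapper, () => {});
  return wrapper;
}
```

A for await loop over the result of wrapWithPipeline(source) rejects once source is torn down, which is the behavior callers need in order to catch and retry.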
Possible Solution
- Attach a socket inactivity timeout (socket.setTimeout) that remains armed through the body-streaming phase, not just until response headers, and destroy + error the body stream when it fires (configurable via the existing requestTimeout, or a new bodyTimeout/socketTimeout option).
- Ensure @smithy/util-stream's ChecksumStream (and any other body wrappers) forward error/aborted/close-before-end from their source IncomingMessage to consumers, so a teardown is never swallowed.
- Until then, document clearly that consumers must impose their own body-read timeout, since maxAttempts/requestTimeout do not cover this.
For reference, the workaround we shipped is a per-read idle watchdog that destroy()s the body stream if no bytes arrive within a timeout (resetting on each chunk), which makes the standard consumer reject; we then retry the whole GetObject. It works, but every SDK user reading object bodies needs this and most won't know to.
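A minimal sketch of such an idle watchdog follows. It is not the exact code we shipped: collectWithIdleTimeout is a hypothetical helper that reads the body itself through the stream's async iterator and races every chunk against a timer, which avoids attaching extra data listeners to the Body. It assumes the Body is a Node.js Readable, as it is under Node.

```javascript
// Hypothetical helper (not part of the SDK): collects a Node.js Readable into
// a Buffer and fails if no chunk arrives within `idleMs`. The timer is
// re-armed for every chunk, so it bounds idle time, not total transfer time.
async function collectWithIdleTimeout(stream, idleMs) {
  const chunks = [];
  const iterator = stream[Symbol.asyncIterator]();
  try {
    while (true) {
      let timer;
      const idle = new Promise((_, reject) => {
        timer = setTimeout(
          () => reject(new Error(`no body bytes for ${idleMs} ms`)),
          idleMs
        );
      });
      try {
        const { value, done } = await Promise.race([iterator.next(), idle]);
        if (done) break;
        chunks.push(Buffer.from(value));
      } finally {
        clearTimeout(timer); // never leave a stale timer behind
      }
    }
  } catch (err) {
    stream.destroy(err); // release the socket before surfacing the failure
    throw err;
  }
  return Buffer.concat(chunks);
}
```

A caller would use it as (await collectWithIdleTimeout(Body, 30000)).toString('utf8') in place of Body.transformToString(), and retry the whole GetObject when it rejects. The 30-second value is only an example.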
Additional Information/Context
- Body.transformToString() (the SDK's own helper), node:stream/consumers, and manual data/error/end collectors all hang, so this is upstream of any consumer choice; the body stream simply never signals completion or failure.
- @aws-sdk/client-s3 from 3.1038.0 through 3.1068.0.