The Scala application is designed to upload large files (including directories with subdirectories) to Amazon S3 using the AWS SDK v2 S3AsyncClient and S3TransferManager. However, despite enabling multipart upload and configuring MultipartConfiguration, the application consistently fails to upload large files: the upload times out without ever completing.
1. Observed Errors:
- Timeouts persist: despite increasing apiCallTimeout, connectionTimeout, and writeTimeout to 60+ minutes, uploads of large files still time out.
- Multipart configuration ignored: the multipart upload configuration does not appear to be respected by S3AsyncClient; the threshold and part-size settings do not trigger splitting large files into multiple parts.
- S3 error (RequestTimeTooSkewed): "The difference between the request time and the current time is too large."
- ERROR - Failed to upload directory [file] to bucket [bucket]. Error message: software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Acquire operation took longer than the configured maximum time. This indicates that a request cannot get a connection from the pool within the specified maximum time. This can be due to high request rate.
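The "Acquire operation took longer than the configured maximum time" error usually points at connection-pool pressure rather than a slow individual request: with multipart enabled, each 50 MB part becomes its own HTTP request, so a single large file fans out into hundreds of part uploads contending for the Netty pool. A rough sketch of the arithmetic (sizes taken from the script's configuration; the part-count math itself is just ceiling division):

```scala
object MultipartMath {
  // Number of parts a multipart upload produces: ceiling(fileSize / partSize).
  def partCount(fileSizeBytes: Long, partSizeBytes: Long): Long =
    (fileSizeBytes + partSizeBytes - 1) / partSizeBytes

  def main(args: Array[String]): Unit = {
    val MB = 1024L * 1024L
    val GB = 1024L * MB
    // A 19 GB file with 50 MB parts yields 390 part requests, all of which
    // compete for the 20 connections allowed by maxConcurrency(20).
    println(s"Parts for 19 GB at 50 MB: ${partCount(19L * GB, 50L * MB)}")
  }
}
```

S3 caps a multipart upload at 10,000 parts, so 50 MB parts are well within limits; the pool pressure comes from how many of those ~390 part requests are queued against 20 connections at once.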
2. Reproduction Steps: try to upload a file of roughly 19 GB with the script below.
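To reproduce without keeping a real 19 GB file around, a sparse file of the right logical size works. This is a sketch under two assumptions: the filesystem supports sparse files, and the file name and size are illustrative placeholders:

```scala
import java.io.RandomAccessFile

object MakeTestFile {
  // Creates a sparse file of the given logical size; blocks are not
  // physically allocated until written, so this returns almost instantly.
  def create(path: String, sizeBytes: Long): Unit = {
    val raf = new RandomAccessFile(path, "rw")
    try raf.setLength(sizeBytes)
    finally raf.close()
  }

  def main(args: Array[String]): Unit = {
    val GB = 1024L * 1024L * 1024L
    create("big-test-file.bin", 19L * GB)
  }
}
```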
3. Script:
```scala
import org.slf4j.{Logger, LoggerFactory}
import software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider
import software.amazon.awssdk.core.client.config.ClientOverrideConfiguration
import software.amazon.awssdk.core.retry.RetryMode
import software.amazon.awssdk.http.nio.netty.NettyNioAsyncHttpClient
import software.amazon.awssdk.regions.Region
import software.amazon.awssdk.services.s3.S3AsyncClient
import software.amazon.awssdk.services.s3.model.PutObjectRequest
import software.amazon.awssdk.services.s3.multipart.MultipartConfiguration
import software.amazon.awssdk.transfer.s3.S3TransferManager
import software.amazon.awssdk.transfer.s3.model.{CompletedFileUpload, UploadFileRequest}
import software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener

import scala.jdk.CollectionConverters.*

import java.io.File
import java.nio.file.{Files, Path}
import java.time.Duration

object Main {
  private val logger: Logger = LoggerFactory.getLogger(getClass)

  def main(args: Array[String]): Unit = {
    val bucketName = "bucket_name"
    val keyPrefix = "obj_key"
    val dirPath = "file"
    val includeSubDir = true
    val MB = 1024 * 1024

    logger.info(s"Starting S3 upload from directory: $dirPath")

    val httpClient = NettyNioAsyncHttpClient.builder()
      .maxConcurrency(20) // Increase max connections
      .connectionAcquisitionTimeout(Duration.ofMinutes(60))
      .readTimeout(Duration.ofMinutes(60)) // Increase read timeout
      .writeTimeout(Duration.ofMinutes(70)) // Increase write timeout
      .tcpKeepAlive(true)
      .connectionTimeout(Duration.ofMinutes(70)) // Increase connection timeout

    val overrideConfig = ClientOverrideConfiguration.builder()
      .retryStrategy(RetryMode.STANDARD) // Use the AWS SDK standard retry strategy
      .apiCallTimeout(Duration.ofMinutes(30)) // Timeout for the whole API call
      .apiCallAttemptTimeout(Duration.ofMinutes(30)) // Timeout per retry attempt
      .build()

    val s3AsyncClient = S3AsyncClient.builder()
      .region(Region.US_EAST_1) // Set your AWS region
      .credentialsProvider(ProfileCredentialsProvider.create("default")) // Set your AWS profile
      .overrideConfiguration(overrideConfig)
      .httpClientBuilder(httpClient)
      .multipartEnabled(true)
      .multipartConfiguration(MultipartConfiguration.builder()
        .thresholdInBytes(50L * MB)
        .minimumPartSizeInBytes(50L * MB)
        .apiCallBufferSizeInBytes(50L * MB)
        .build())
      .build()

    val dir = new File(dirPath)
    if (!dir.exists() || !dir.isDirectory) {
      logger.error(s"Directory does not exist or is not a directory: $dirPath")
      return
    }

    val transferManager = S3TransferManager.builder().s3Client(s3AsyncClient).build()
    try {
      val filesToUpload: List[Path] = if (includeSubDir) {
        Files.walk(dir.toPath).iterator().asScala.filter(p => Files.isRegularFile(p)).toList
      } else {
        Files.list(dir.toPath).iterator().asScala.filter(p => Files.isRegularFile(p)).toList
      }

      if (filesToUpload.isEmpty) {
        logger.warn("No files found to upload.")
        return
      }

      filesToUpload.foreach { filePath =>
        val relativePath = dir.toPath.relativize(filePath).toString.replace("\\", "/")
        val s3Key = keyPrefix + relativePath

        val uploadRequest = UploadFileRequest.builder()
          .putObjectRequest(PutObjectRequest.builder().bucket(bucketName).key(s3Key).build())
          .source(filePath)
          .addTransferListener(LoggingTransferListener.create())
          .build()

        // Upload the file and block until the transfer completes.
        val completed: CompletedFileUpload =
          transferManager.uploadFile(uploadRequest).completionFuture().join()
        logger.info(s"Uploaded $s3Key, ETag: ${completed.response().eTag()}")
      }
    } catch {
      case e: Exception =>
        logger.error("Error during upload", e)
    } finally {
      transferManager.close()
    }
  }
}
```
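An alternative worth trying is the AWS CRT-based S3 client, which the AWS SDK v2 documentation recommends pairing with S3TransferManager for large transfers: it performs multipart splitting and request parallelism natively rather than going through the Netty connection pool. A minimal sketch (region, profile name, and tuning values are placeholders, not verified settings for this workload):

```scala
import software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider
import software.amazon.awssdk.regions.Region
import software.amazon.awssdk.services.s3.S3AsyncClient
import software.amazon.awssdk.transfer.s3.S3TransferManager

object CrtClientSketch {
  // The CRT-based client splits large objects into parts and manages
  // connection parallelism internally, bypassing the Netty pool entirely.
  val crtClient: S3AsyncClient = S3AsyncClient.crtBuilder()
    .region(Region.US_EAST_1)
    .credentialsProvider(ProfileCredentialsProvider.create("default"))
    .targetThroughputInGbps(5.0)               // tune to available bandwidth
    .minimumPartSizeInBytes(50L * 1024 * 1024) // 50 MB parts, as in the script
    .build()

  val transferManager: S3TransferManager =
    S3TransferManager.builder().s3Client(crtClient).build()
}
```

This is a configuration sketch only; it still needs the same directory walk and UploadFileRequest loop as the script above.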
4. Platform Details: