BQ connector - Unable to allocate buffer of size #26400

@archangel2018

Description

After upgrading Trino from v471 to v475, queries through the BigQuery connector fail with an OutOfMemoryException ("Unable to allocate buffer of size") when bigquery.arrow-serialization.enabled is set to true, which is the default. The failure occurs when the connector allocates memory for Arrow record batches.

The issue does not occur when bigquery.arrow-serialization.enabled is set to false in the connector configuration, which indicates that the problem lies in the Arrow serialization path.

The query is a simple SELECT statement on a large BigQuery table. The worker node has 768 GiB of memory and the query was the only one running, so a lack of overall system resources is unlikely to be the cause. The error logs point to an allocation failure inside the connector's Arrow buffer allocator (BigQueryArrowBufferAllocator), which suggests either a memory leak or an incorrectly sized allocator limit in the connector's Arrow implementation.

Steps to Reproduce

  1. Configure the Trino BigQuery connector with the following settings:
    connector.name=bigquery
    bigquery.project-id=REDACTED
    bigquery.credentials-file=/etc/secrets/gcp/REDACTED.json
    bigquery.views-enabled=true
    bigquery.skip-view-materialization=true
    bigquery.metadata.parallelism=32
  2. Execute the following query:
    select * from "REDACTED"."REDACTED"."REDACTED" limit 5;
  3. Observe the OutOfMemoryException error in the worker logs.

Expected Behavior

The query should execute successfully, retrieving the data from BigQuery without memory allocation errors.

Actual Behavior

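The query fails with an OutOfMemoryException and a "Memory was leaked by query" message. According to the log, the Arrow allocator named after the read stream cannot reserve the requested 262144 bytes: its limit is 557753 bytes and 361472 bytes are already in use. These figures are tiny compared with the memory available on the worker. The "Memory was leaked" message appears as a suppressed exception raised when the allocator is closed after the failed allocation, so it may be a consequence of the failure rather than its cause.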
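The numbers in the error log can be checked by hand. The following is a minimal sketch, not connector code: it uses only the values printed in the log, and it assumes that the "rounded from 138955" figure reflects rounding up to the next power of two (the helper `next_power_of_two` is a hypothetical name introduced here for illustration).

```python
def next_power_of_two(n: int) -> int:
    """Return the smallest power of two that is >= n (n must be positive)."""
    return 1 << (n - 1).bit_length()


# Values copied from the error log in this report.
limit = 557_753              # allocator limit in bytes
used = 361_472               # bytes already allocated when the request failed
requested_raw = 138_955      # size before rounding ("rounded from 138955")
requested_rounded = 262_144  # size the allocator actually tried to reserve

# Assumption: the rounding seen in the log is "next power of two".
rounding_matches = next_power_of_two(requested_raw) == requested_rounded

headroom = limit - used
fits_unrounded = requested_raw <= headroom
fits_rounded = requested_rounded <= headroom
shortfall = requested_rounded - headroom

print("rounding matches next power of two:", rounding_matches)
print("headroom under the limit:", headroom)
print("unrounded request fits:", fits_unrounded)
print("rounded request fits:", fits_rounded)
print("shortfall for the rounded request:", shortfall)
```

Under these assumptions the unrounded request would have fit under the limit, while the rounded one exceeds it. That is consistent with a limit that does not account for allocator rounding, although the log alone does not prove it.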


Environment

Worker Config and JVM

config.properties:

coordinator=false
http-server.http.port=8080
query.max-memory=1000000GB
query.max-memory-per-node=650GB
memory.heap-headroom-per-node=50GB
discovery.uri=http://trinoetl-cluster-trino:8080
internal-communication.shared-secret=REDACTED
exchange.http-client.max-content-length=2047MB
http-server.process-forwarded=true
query.max-execution-time=1h
jmx.rmiregistry.port=9080
jmx.rmiserver.port=9081
shutdown.grace-period=120s

jvm.config:

-server
-agentpath:/usr/lib/trino/bin/libjvmkill.so
-Xmx700G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
-XX:-OmitStackTraceInFastThrow
-XX:ReservedCodeCacheSize=512M
-XX:PerMethodRecompilationCutoff=10000
-XX:PerBytecodeRecompilationCutoff=10000
-Djdk.attach.allowAttachSelf=true
-Djdk.nio.maxCachedBufferSize=2000000
# Allow loading dynamic agent used by JOL
-XX:+EnableDynamicAgentLoading
-Djava.security.krb5.conf=/etc/secrets/krb5.conf/REDACTED.conf
--add-opens=java.base/java.nio=ALL-UNNAMED
-Dcom.sun.management.jmxremote.rmi.port=9081

Coordinator Config and JVM

config.properties:

coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=8080
query.max-memory=1000000GB
query.max-memory-per-node=1GB
discovery.uri=http://localhost:8080
http-server.authentication.type=PASSWORD
internal-communication.shared-secret=REDACTED
exchange.http-client.max-content-length=2047MB
http-server.process-forwarded=true
query.max-execution-time=1h
jmx.rmiregistry.port=9080
jmx.rmiserver.port=9081
shutdown.grace-period=120s

jvm.config:

-server
-agentpath:/usr/lib/trino/bin/libjvmkill.so
-Xmx26G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
-XX:-OmitStackTraceInFastThrow
-XX:ReservedCodeCacheSize=512M
-XX:PerMethodRecompilationCutoff=10000
-XX:PerBytecodeRecompilationCutoff=10000
-Djdk.attach.allowAttachSelf=true
-Djdk.nio.maxCachedBufferSize=2000000
# Allow loading dynamic agent used by JOL
-XX:+EnableDynamicAgentLoading
-Djava.security.krb5.conf=/etc/secrets/krb5.conf/REDACTED.conf
--add-opens=java.base/java.nio=ALL-UNNAMED
-Dcom.sun.management.jmxremote.rmi.port=9081

System Information

  • Trino Version: v475
  • Java Version: Corretto-23.0.2.7.1 (build 23.0.2+7-FR)
  • Platform: amd64
  • Worker AWS VM Type: r6a.24xlarge (96 cores, 768 GiB)
  • Hosted Platform: EKS
  • System Load: The query was executed when no other query was running or scheduled.

Error Log

2025-08-13T08:47:53.891Z    DEBUG   SplitRunner-20250813_084752_00003_t2t8i.1.0.0-1-364 io.trino.plugin.bigquery.BigQueryStorageArrowPageSource Starting to read from projects/GCPPROJECT/locations/us/sessions/REDACTED/streams/REDACTED
2025-08-13T08:47:53.891Z    DEBUG   SplitRunner-20250813_084752_00003_t2t8i.1.0.0-1-364 io.trino.plugin.bigquery.ReadRowsHelper Reading rows from projects/GCPPROJECT/locations/us/sessions/GCPPROJECT/streams/REDACTED offset 0
2025-08-13T08:47:54.248Z    DEBUG   bigquery-bq_ibp_aws-1   io.trino.plugin.bigquery.ReadRowsHelper ReadRowsResponse from BigQuery: stats {
  progress {
    at_response_end: 0.008838748559355736
  }
}
arrow_record_batch {
  serialized_record_batch: "REDACTED"
}
row_count: 4096
arrow_schema {
  serialized_schema: "REDACTED"
}
2025-08-13T08:47:54.249Z    DEBUG   SplitRunner-20250813_084752_00003_t2t8i.1.0.0-1-364 io.trino.plugin.bigquery.BigQueryStorageArrowPageSource Read 53324 bytes (total 53324) from projects/GCPPROJECT/locations/us/sessions/REDACTED/streams/REDACTED
2025-08-13T08:47:54.250Z    WARN    SplitRunner-20250813_084752_00003_t2t8i.1.0.0-1-364 io.trino.plugin.bigquery.BigQueryArrowBufferAllocator   Failed to allocate 262144 bytes for allocator 'projects/GCPPROJECT/locations/us/sessions/GCPPROJECT/streams/REDACTED' due to FAILED_LOCAL
2025-08-13T08:47:54.250Z    WARN    SplitRunner-20250813_084752_00003_t2t8i.1.0.0-1-364 io.trino.plugin.bigquery.BigQueryArrowBufferAllocator   Allocation failure details: Allocation outcome details:
allocator[projects/REDACTED/locations/us/sessions/GCPPROJECT/streams/REDACTED] reservation: 0 limit: 557753 used: 361472 requestedSize: 262144 allocatedSize: 0 localAllocationStatus: fail
2025-08-13T08:47:54.250Z    ERROR   SplitRunner-20250813_084752_00003_t2t8i.1.0.0-1-364 org.apache.arrow.memory.BaseAllocator   Memory was leaked by query. Memory leaked: (361472)
Allocator(projects/REDACTED/locations/us/sessions/GCPPROJECT/streams/REDACTED) 0/361472/361472/557753 (res/actual/peak/limit)
2025-08-13T08:47:54.250Z    DEBUG   task-notification-1 io.trino.execution.TaskStateMachine Task 20250813_084752_00003_t2t8i.1.0.0 is FAILING
2025-08-13T08:47:54.250Z    DEBUG   SplitRunner-67  io.trino.execution.executor.dedicated.SplitProcessor    Unable to allocate buffer of size 262144 (rounded from 138955) due to memory limit. Current allocation: 361472
org.apache.arrow.memory.OutOfMemoryException: Unable to allocate buffer of size 262144 (rounded from 138955) due to memory limit. Current allocation: 361472
    at org.apache.arrow.memory.BaseAllocator.buffer(BaseAllocator.java:330)
    at org.apache.arrow.memory.BaseAllocator.buffer(BaseAllocator.java:298)
    at org.apache.arrow.compression.ZstdCompressionCodec.doDecompress(ZstdCompressionCodec.java:62)
    at org.apache.arrow.vector.compression.AbstractCompressionCodec.decompress(AbstractCompressionCodec.java:78)
    at org.apache.arrow.vector.VectorLoader.loadBuffers(VectorLoader.java:128)
    at org.apache.arrow.vector.VectorLoader.load(VectorLoader.java:88)
    at io.trino.plugin.bigquery.BigQueryArrowToPageConverter.convert(BigQueryArrowToPageConverter.java:105)
    at io.trino.plugin.bigquery.BigQueryStorageArrowPageSource.getNextSourcePage(BigQueryStorageArrowPageSource.java:126)
    at io.trino.operator.TableScanOperator.getOutput(TableScanOperator.java:269)
    at io.trino.operator.Driver.processInternal(Driver.java:403)
    at io.trino.operator.Driver.lambda$process$8(Driver.java:306)
    at io.trino.operator.Driver.tryWithLock(Driver.java:709)
    at io.trino.operator.Driver.process(Driver.java:298)
    at io.trino.operator.Driver.processForDuration(Driver.java:269)
    at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:890)
    at io.trino.execution.executor.dedicated.SplitProcessor.run(SplitProcessor.java:77)
    at io.trino.execution.executor.dedicated.TaskEntry$VersionEmbedderBridge.lambda$run$0(TaskEntry.java:201)
    at io.trino.$gen.Trino_475____20250813_083524_2.run(Unknown Source)
    at io.trino.execution.executor.dedicated.TaskEntry$VersionEmbedderBridge.run(TaskEntry.java:202)
    at io.trino.execution.executor.scheduler.FairScheduler.runTask(FairScheduler.java:177)
    at io.trino.execution.executor.scheduler.FairScheduler.lambda$submit$0(FairScheduler.java:164)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:128)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:74)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:80)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1575)
Suppressed: java.lang.IllegalStateException: Memory was leaked by query. Memory leaked: (361472)
Allocator(projects/GCPPROJECT/locations/us/sessions/GCPPROJECT/streams/REDACTED) 0/361472/361472/557753 (res/actual/peak/limit)
        at org.apache.arrow.memory.BaseAllocator.close(BaseAllocator.java:504)
        at io.trino.plugin.bigquery.BigQueryStorageArrowPageSource.getNextSourcePage(BigQueryStorageArrowPageSource.java:124)
        ... 20 more
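
When many splits fail, it can help to pull the allocator figures out of the worker log automatically. The sketch below parses the "(res/actual/peak/limit)" summary line shown in the log above. The function name `parse_allocator_summary` and the regular expression are hypothetical helpers written for this report, not part of Trino or Arrow.

```python
import re

# Matches the four slash-separated counters that precede the literal
# "(res/actual/peak/limit)" suffix in the allocator summary line.
SUMMARY = re.compile(r"(\d+)/(\d+)/(\d+)/(\d+) \(res/actual/peak/limit\)")


def parse_allocator_summary(line: str) -> dict[str, int]:
    """Extract reservation, actual, peak and limit (bytes) from a summary line."""
    match = SUMMARY.search(line)
    if match is None:
        raise ValueError("no allocator summary found in line")
    reservation, actual, peak, limit = map(int, match.groups())
    return {
        "reservation": reservation,
        "actual": actual,
        "peak": peak,
        "limit": limit,
    }


# Line copied from the error log in this report.
log_line = (
    "Allocator(projects/REDACTED/locations/us/sessions/GCPPROJECT/streams/REDACTED) "
    "0/361472/361472/557753 (res/actual/peak/limit)"
)

stats = parse_allocator_summary(log_line)
remaining = stats["limit"] - stats["actual"]
print(stats)
print("bytes remaining under the limit:", remaining)
```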
