[JENKINS-75827] Deadlock on KubernetesProvisioningLimits during initialization #2755


Description


There is a potential deadlock around KubernetesProvisioningLimits during initialization:

==============
Deadlock Found
==============
"jenkins.util.Timer [#3]" id=44 (0x2c) state=WAITING cpu=81%
    - waiting on <0x5bf81c5c> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    - locked <0x5bf81c5c> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
      owned by "Computer.threadPoolForRemoting [#14]" id=257 (0x101)
    at java.base@21.0.7/jdk.internal.misc.Unsafe.park(Native Method)
    at java.base@21.0.7/java.util.concurrent.locks.LockSupport.park(LockSupport.java:221)
    at java.base@21.0.7/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:754)
    at java.base@21.0.7/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:990)
    at java.base@21.0.7/java.util.concurrent.locks.ReentrantLock$Sync.lock(ReentrantLock.java:153)
    at java.base@21.0.7/java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:322)
    at hudson.model.Queue._withLock(Queue.java:1408)
    at hudson.model.Queue.withLock(Queue.java:1284)
    at PluginClassLoader for kubernetes//org.csanchez.jenkins.plugins.kubernetes.KubernetesProvisioningLimits.initInstance(KubernetesProvisioningLimits.java:46)
    at PluginClassLoader for kubernetes//org.csanchez.jenkins.plugins.kubernetes.KubernetesProvisioningLimits.register(KubernetesProvisioningLimits.java:78)
    at PluginClassLoader for kubernetes//org.csanchez.jenkins.plugins.kubernetes.LimitRegistrationResults.register(LimitRegistrationResults.java:29)
    at PluginClassLoader for kubernetes//org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud.provision(KubernetesCloud.java:698)
    at hudson.slaves.Cloud.lambda$provision$0(Cloud.java:192)
    at hudson.slaves.Cloud$$Lambda/0x00007a35ad7b7c18.get(Unknown Source)
    at hudson.Util.ifOverridden(Util.java:1553)
    at hudson.slaves.Cloud.provision(Cloud.java:192)
    at PluginClassLoader for kube-agent-management//com.cloudbees.jenkins.plugins.kube.KubernetesNodeProvisionerStrategy.apply(KubernetesNodeProvisionerStrategy.java:128)
    at hudson.slaves.NodeProvisioner.update(NodeProvisioner.java:325)
    at hudson.slaves.NodeProvisioner$NodeProvisionerInvoker.doRun(NodeProvisioner.java:823)
    at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:92)
    at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:67)
    at java.base@21.0.7/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
    at java.base@21.0.7/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:358)
    at java.base@21.0.7/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
    at java.base@21.0.7/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base@21.0.7/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base@21.0.7/java.lang.Thread.runWith(Thread.java:1596)
    at java.base@21.0.7/java.lang.Thread.run(Thread.java:1583)

"Computer.threadPoolForRemoting [#14]" id=257 (0x101) state=BLOCKED cpu=76%
    - waiting to lock <0x07bae317> (a org.csanchez.jenkins.plugins.kubernetes.KubernetesProvisioningLimits)
      owned by "jenkins.util.Timer [#3]" id=44 (0x2c)
    at PluginClassLoader for kubernetes//org.csanchez.jenkins.plugins.kubernetes.KubernetesProvisioningLimits.unregister(KubernetesProvisioningLimits.java:120)
    at PluginClassLoader for kubernetes//org.csanchez.jenkins.plugins.kubernetes.KubernetesProvisioningLimits$NodeListenerImpl.onDeleted(KubernetesProvisioningLimits.java:169)
    at jenkins.model.NodeListener.lambda$fireOnDeleted$2(NodeListener.java:97)
    at jenkins.model.NodeListener$$Lambda/0x00007a35ad351140.accept(Unknown Source)
    at jenkins.util.Listeners.lambda$notify$0(Listeners.java:59)
    at jenkins.util.Listeners$$Lambda/0x00007a35acb37708.run(Unknown Source)
    at jenkins.util.Listeners.notify(Listeners.java:70)
    at jenkins.model.NodeListener.fireOnDeleted(NodeListener.java:97)
    at jenkins.model.Nodes.removeNode(Nodes.java:307)
    at jenkins.model.Jenkins.removeNode(Jenkins.java:2197)
    at hudson.slaves.AbstractCloudSlave.terminate(AbstractCloudSlave.java:91)
    at PluginClassLoader for durable-task//org.jenkinsci.plugins.durabletask.executors.OnceRetentionStrategy.lambda$done$5(OnceRetentionStrategy.java:142)
    at PluginClassLoader for durable-task//org.jenkinsci.plugins.durabletask.executors.OnceRetentionStrategy$$Lambda/0x00007a35ac9c1ba0.run(Unknown Source)
    at hudson.model.Queue._withLock(Queue.java:1410)
    at hudson.model.Queue.withLock(Queue.java:1284)
    at PluginClassLoader for durable-task//org.jenkinsci.plugins.durabletask.executors.OnceRetentionStrategy.lambda$done$6(OnceRetentionStrategy.java:137)
    at PluginClassLoader for durable-task//org.jenkinsci.plugins.durabletask.executors.OnceRetentionStrategy$$Lambda/0x00007a35ac9c1988.run(Unknown Source)
    at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
    at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:68)
    at jenkins.util.ErrorLoggingExecutorService.lambda$wrap$0(ErrorLoggingExecutorService.java:51)
    at jenkins.util.ErrorLoggingExecutorService$$Lambda/0x00007a35ad2c67f0.run(Unknown Source)
    at java.base@​21.0.7/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
    at java.base@​21.0.7/java.util.concurrent.FutureTask.run(FutureTask.java:317)
    at java.base@​21.0.7/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base@​21.0.7/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base@​21.0.7/java.lang.Thread.runWith(Thread.java:1596)
    at java.base@​21.0.7/java.lang.Thread.run(Thread.java:1583)

This deadlock holds the Queue lock indefinitely, blocking the queue entirely.

Initialization of KubernetesProvisioningLimits synchronizes on the singleton and, while holding that monitor, acquires the Queue lock (register → initInstance → Queue.withLock).

Node terminations, however (which can happen at startup as soon as a RetentionStrategy kicks in), take the Queue lock first and then need the KubernetesProvisioningLimits monitor to unregister the node, as the sketch below illustrates:
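
To make the inverted lock ordering concrete, here is a minimal, self-contained sketch (not plugin or Jenkins core code; the class and method names are invented for illustration) in which a ReentrantLock stands in for the Queue lock and a synchronized singleton stands in for KubernetesProvisioningLimits. Run as-is, the two threads almost always park against each other in the same cycle shown in the dump above.

```java
import java.util.concurrent.locks.ReentrantLock;

public class LockOrderingDeadlockDemo {

    // Stand-in for Jenkins' Queue lock.
    static final ReentrantLock queueLock = new ReentrantLock();

    // Stand-in for the KubernetesProvisioningLimits singleton.
    static class Limits {
        static final Limits INSTANCE = new Limits();

        // Path 1 (provisioning): take the singleton's monitor first, then the
        // Queue lock, mirroring register() -> initInstance() -> Queue.withLock(...).
        synchronized void register() {
            sleep(200); // widen the race window so the deadlock reproduces reliably
            queueLock.lock();
            try {
                System.out.println("register: acquired queue lock");
            } finally {
                queueLock.unlock();
            }
        }

        // Reached from path 2 while the Queue lock is already held, mirroring
        // NodeListenerImpl.onDeleted() -> unregister().
        synchronized void unregister() {
            System.out.println("unregister: acquired singleton monitor");
        }
    }

    // Path 2 (node termination): take the Queue lock first, then the singleton's
    // monitor, mirroring OnceRetentionStrategy -> Queue.withLock ->
    // Jenkins.removeNode -> NodeListener.fireOnDeleted.
    static void terminateNode() {
        queueLock.lock();
        try {
            sleep(200); // widen the race window
            Limits.INSTANCE.unregister();
        } finally {
            queueLock.unlock();
        }
    }

    static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        new Thread(() -> Limits.INSTANCE.register(), "jenkins.util.Timer").start();
        new Thread(LockOrderingDeadlockDemo::terminateNode, "Computer.threadPoolForRemoting").start();
        // Once both threads are past their first acquisition, neither can proceed:
        // the timer thread holds the Limits monitor and waits for the queue lock,
        // while the remoting thread holds the queue lock and waits for the Limits
        // monitor: exactly the cycle reported in the thread dump above.
    }
}
```

The conventional remedy for this class of bug is to make both code paths acquire the two locks in the same order, for example by performing the Queue-dependent initialization eagerly or outside the singleton's monitor.
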


Originally reported by allan_burdajewicz, imported from: Deadlock on KubernetesProvisioningLimits during initialization
  • assignee: jgarciacloudbees
  • status: In Review
  • priority: Critical
  • component(s): kubernetes-plugin
  • resolution: Unresolved
  • votes: 0
  • watchers: 3
  • imported: 2025-12-02