Jenkins stops processing builds in the build queue after an error appears in the logs

Article ID: 360034998632

Issue

Jenkins jobs sit in the build queue and never start, even though build agents matching the requested label are available, and the logs show stack traces similar to the one below:

SEVERE  hudson.triggers.SafeTimerTask#run: Timer task hudson.model.Queue$MaintainTask@XXXX failed

How do we know what is causing this queue freeze?

Resolution

Queue.MaintainTask is a periodic task that runs on the controller and performs queue maintenance operations such as adding items to the queue and assigning queued items to nodes or executors. If this task fails for any reason, the queue becomes unresponsive and jobs eventually stop running because they remain stuck in the queue.

To determine the cause of this problem, pay special attention to the full stack trace of the error that shows up in the logs.

Some potential causes are listed below. The intent of the list is to illustrate the pattern you can follow to determine the root cause of the failure, helping you recover the instance as quickly as possible. Whenever possible, we also include workaround or solution details.
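If the log entry is truncated or hard to find, one way to reproduce the failure interactively is to run the same maintenance cycle from the Script Console (Manage Jenkins > Script Console). The following is a minimal sketch, not an official procedure; it assumes the extension breaking the queue lets its exception propagate out of Queue.maintain(), which is the case for the examples in this article:

    // Run the same maintenance logic that Queue$MaintainTask executes periodically.
    // If a plugin extension is breaking the queue, its exception and full stack trace
    // will be printed in the Script Console output instead of only in the system log.
    import jenkins.model.Jenkins

    try {
        Jenkins.instance.queue.maintain()
        println 'Queue maintenance completed without errors'
    } catch (Throwable t) {
        t.printStackTrace()
    }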

A Nodes Plus Plugin

SEVERE  hudson.triggers.SafeTimerTask#run: Timer task hudson.model.Queue$MaintainTask@XXX failed
Also:   hudson.remoting.Channel$CallSiteStackTrace: Remote call to JNLP4-connect connection from XXXX/XXX:XXX
                at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1743)
                at hudson.remoting.UserRequest$ExceptionResponse.retrieve(UserRequest.java:357)
                at hudson.remoting.Channel.call(Channel.java:957)
                at hudson.Launcher$RemoteLauncher.launch(Launcher.java:1059)
                at hudson.Launcher$ProcStarter.start(Launcher.java:455)
                at com.cloudbees.jenkins.plugins.nodesplus.CustomNodeProbeBuildFilterProperty.getProbeResult(CustomNodeProbeBuildFilterProperty.java:180)

In the stack trace we can clearly see the correlation between the getProbeResult method and the task failure.

A.1 Solution/Workaround

As an initial remediation step, check your nodes for any custom node probes and disable them, as shown in the sketch below.
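To see at a glance which nodes carry a probe contributed by this plugin, you can list node properties from the Script Console. This is a read-only sketch; the nodesplus class-name filter is an assumption based on the package shown in the stack trace above:

    import jenkins.model.Jenkins

    // List every node property whose class belongs to the cloudbees-nodes-plus plugin,
    // so you know which nodes define a custom probe before disabling it in the node configuration.
    Jenkins.instance.nodes.each { node ->
        node.nodeProperties.each { prop ->
            if (prop.class.name.contains('nodesplus')) {
                println "${node.displayName}: ${prop.class.name}"
            }
        }
    }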

Verify that you are using cloudbees-nodes-plus 1.18 or later, as this version added extra verification that prevents faulty custom probes from locking the queue.

If the issue persists after upgrading the plugin, disable any custom node probes and open a support ticket.

B Micro Focus Plugin

 SEVERE  hudson.triggers.SafeTimerTask#run: Timer task hudson.model.Queue$MaintainTask@XXX failed
java.lang.NoClassDefFoundError: Could not initialize class org.apache.logging.log4j.core.impl.Log4jLogEvent
    at org.apache.logging.log4j.core.impl.DefaultLogEventFactory.createEvent(DefaultLogEventFactory.java:54)
    at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:401)
    at org.apache.logging.log4j.core.config.DefaultReliabilityStrategy.log(DefaultReliabilityStrategy.java:49)
    at org.apache.logging.log4j.core.Logger.logMessage(Logger.java:146)
    at org.apache.logging.log4j.spi.AbstractLogger.tryLogMessage(AbstractLogger.java:2116)
    at org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2100)
    at org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:1994)
    at org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:1966)
    at org.apache.logging.log4j.spi.AbstractLogger.error(AbstractLogger.java:739)
    at com.microfocus.application.automation.tools.octane.events.WorkflowListenerOctaneImpl.onNewHead(WorkflowListenerOctaneImpl.java:79)

In this case the exception thrown is different, but the effect is the same: the periodic task starts failing.

B.1 Solution/Workaround

For the Micro Focus plugin, the recommended workaround is to upgrade to at least version 5.6.2, as earlier versions are also impacted by JENKINS-6070.
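To quickly confirm which version of the plugin is installed, you can query the plugin manager from the Script Console. A minimal sketch, assuming the plugin ID is hp-application-automation-tools-plugin; adjust it if your installation uses a different ID:

    import jenkins.model.Jenkins

    // Print the installed version of the Micro Focus plugin so it can be compared against 5.6.2.
    // The plugin ID below is an assumption; check Manage Jenkins > Plugins for the exact name.
    def plugin = Jenkins.instance.pluginManager.getPlugin('hp-application-automation-tools-plugin')
    if (plugin == null) {
        println 'Plugin not found under that ID'
    } else {
        println "${plugin.shortName} ${plugin.version} (upgrade if older than 5.6.2)"
    }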

C Build Blocker Plugin

SEVERE  hudson.triggers.SafeTimerTask#run: Timer task hudson.model.Queue$MaintainTask@XXX failed
java.util.regex.PatternSyntaxException: Dangling meta character '*' near index 0
*.XXX.*XXX
^
    at java.util.regex.Pattern.error(Pattern.java:1957)
    at java.util.regex.Pattern.sequence(Pattern.java:2125)
    at java.util.regex.Pattern.expr(Pattern.java:1998)
    at java.util.regex.Pattern.compile(Pattern.java:1698)
    at java.util.regex.Pattern.<init>(Pattern.java:1351)
    at java.util.regex.Pattern.compile(Pattern.java:1028)
    at java.util.regex.Pattern.matches(Pattern.java:1133)
    at java.lang.String.matches(String.java:2121)
    at hudson.plugins.buildblocker.BlockingJobsMonitor.checkForPlannedBuilds(BlockingJobsMonitor.java:162)
    at hudson.plugins.buildblocker.BlockingJobsMonitor.checkForQueueEntries(BlockingJobsMonitor.java:86)
    at hudson.plugins.buildblocker.BuildBlockerQueueTaskDispatcher.checkAccordingToProperties(BuildBlockerQueueTaskDispatcher.java:151)

Again, the stack trace allows us to determine the source of the problem affecting the queue.

C.1 Solution/Workaround

If you are impacted by this issue, either verify that the regular expression configured on the page of the job shown in the referenced thread is valid, or disable the plugin completely. As this is an old plugin that was last released 5 years ago, we recommend the latter.
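The PatternSyntaxException above is thrown because java.util.regex does not accept an expression that starts with a bare *, unlike a shell-style glob. You can validate the blocking-job expressions with the same JDK class the plugin uses before saving the job configuration; the patterns below are only illustrative:

    import java.util.regex.Pattern
    import java.util.regex.PatternSyntaxException

    // '*.XXX.*XXX' fails because the leading '*' has nothing to repeat;
    // '.*XXX.*' is the equivalent valid Java regular expression.
    ['*.XXX.*XXX', '.*XXX.*'].each { expression ->
        try {
            Pattern.compile(expression)
            println "OK      : ${expression}"
        } catch (PatternSyntaxException e) {
            println "INVALID : ${expression} -> ${e.description}"
        }
    }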

D Pipeline Graph Analysis Plugin

SEVERE    hudson.triggers.SafeTimerTask#run: Timer task hudson.model.Queue$MaintainTask@XXX failed
java.lang.IndexOutOfBoundsException: Index: 0
    at java.util.Collections$EmptyList.get(Collections.java:4454)
    at org.jenkinsci.plugins.workflow.graph.StandardGraphLookupView.bruteForceScanForEnclosingBlock(StandardGraphLookupView.java:150)
    at org.jenkinsci.plugins.workflow.graph.StandardGraphLookupView.findEnclosingBlockStart(StandardGraphLookupView.java:197)
    at org.jenkinsci.plugins.workflow.graph.StandardGraphLookupView.findAllEnclosingBlockStarts(StandardGraphLookupView.java:217)

D.1 Solution/Workaround

The solution for this error is to upgrade the workflow-api plugin to version 2.35 or later, which includes the fix for the edge case that triggers this issue.
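You can check whether the installed Pipeline: API (workflow-api) plugin already contains the fix from the Script Console; a minimal sketch:

    import jenkins.model.Jenkins
    import hudson.util.VersionNumber

    // Compare the installed workflow-api version against the first release containing the fix.
    def plugin = Jenkins.instance.pluginManager.getPlugin('workflow-api')
    if (plugin == null) {
        println 'workflow-api is not installed'
    } else if (plugin.versionNumber.isOlderThan(new VersionNumber('2.35'))) {
        println "workflow-api ${plugin.version} is older than 2.35 - upgrade required"
    } else {
        println "workflow-api ${plugin.version} already contains the fix"
    }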

E Block Queued Job Plugin

SEVERE  hudson.triggers.SafeTimerTask#run: Timer task hudson.model.Queue$MaintainTask@XXX failed
java.lang.NullPointerException
    at org.jenkinsci.plugins.blockqueuedjob.condition.JobResultBlockQueueCondition.isBlocked(JobResultBlockQueueCondition.java:70)
    at org.jenkinsci.plugins.blockqueuedjob.BlockItemQueueTaskDispatcher.canRun(BlockItemQueueTaskDispatcher.java:35)
    at hudson.model.Queue.getCauseOfBlockageForItem(Queue.java:1197)

E.1 Solution/Workaround

The solution for this error is to disable the plugin. It was last released 5 years ago and does not have many installations, so disabling it is the most direct way to solve the problem.
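If you prefer to disable it from the Script Console instead of the plugin manager UI, a minimal sketch is shown below. The plugin ID block-queued-job-plugin is an assumption, and a restart is required for the change to take effect:

    import jenkins.model.Jenkins

    // Disable the Block Queued Job plugin; Jenkins must be restarted afterwards.
    // The plugin ID is an assumption; verify it in Manage Jenkins > Plugins.
    def plugin = Jenkins.instance.pluginManager.getPlugin('block-queued-job-plugin')
    if (plugin != null) {
        plugin.disable()
        println "Disabled ${plugin.shortName}; restart Jenkins to apply the change"
    } else {
        println 'Plugin not found under that ID'
    }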

F Kubernetes Plugin

The queue is blocked and no builds are being processed. Shortly after, the instance goes down. After capturing a thread dump for the instance, you get a stack trace similar to the one shown below:

	java.lang.Object.wait(Native Method)
	hudson.remoting.Request.call(Request.java:177)
	hudson.remoting.Channel.call(Channel.java:954)
	org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave._terminate(KubernetesSlave.java:236)
	hudson.slaves.AbstractCloudSlave.terminate(AbstractCloudSlave.java:67)
	hudson.slaves.CloudRetentionStrategy.check(CloudRetentionStrategy.java:59)
	hudson.slaves.CloudRetentionStrategy.check(CloudRetentionStrategy.java:43)
	hudson.slaves.SlaveComputer$4.run(SlaveComputer.java:843)
	hudson.model.Queue._withLock(Queue.java:1380)

The queue is locked due to the KubernetesSlave._terminate() call.
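If you cannot easily capture a thread dump from the operating system (for example with jstack), you can print one from the Script Console while the controller is still responsive; a minimal sketch:

    // Print the stack trace of every live JVM thread. Look for the hudson.model.Queue._withLock
    // frame holding the queue lock while KubernetesSlave._terminate() waits on the remoting call.
    Thread.getAllStackTraces().each { thread, stack ->
        println "\"${thread.name}\" ${thread.state}"
        stack.each { frame -> println "\t${frame}" }
        println ''
    }

Administrators can also view the same information on the instance's /threadDump page.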

F.1 Solution/Workaround

This is a known issue, reported as JENKINS-54988, and it is caused by a problem in the Kubernetes plugin.

The fix for this issue was released in Kubernetes Plugin 1.21.1.