Issue
In a High Availability (HA) configuration, the following stack trace may appear in the logs during startup:
```
2016-08-24 11:06:12.763-0400 [id=53] SEVERE jenkins.InitReactorRunner$1#onTaskFailed: Failed Messaging.afterExtensionsAugmented
java.lang.Error: java.lang.reflect.InvocationTargetException
    at hudson.init.TaskMethodFinder.invoke(TaskMethodFinder.java:110)
    at hudson.init.TaskMethodFinder$TaskImpl.run(TaskMethodFinder.java:176)
    at org.jvnet.hudson.reactor.Reactor.runTask(Reactor.java:282)
    at jenkins.model.Jenkins$8.runTask(Jenkins.java:926)
    at org.jvnet.hudson.reactor.Reactor$2.run(Reactor.java:210)
    at org.jvnet.hudson.reactor.Reactor$Node.run(Reactor.java:117)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at hudson.init.TaskMethodFinder.invoke(TaskMethodFinder.java:106)
    ... 8 more
Caused by: java.io.IOError: java.io.EOFException
    at org.mapdb.Volume$FileChannelVol.getDataInput(Volume.java:1011)
    at org.mapdb.Volume$FileChannelVol.getDataInput(Volume.java:781)
    at org.mapdb.StoreDirect.get2(StoreDirect.java:469)
    at org.mapdb.StoreWAL.get2(StoreWAL.java:336)
    at org.mapdb.StoreWAL.get(StoreWAL.java:320)
    at org.mapdb.Caches$HashTable.get(Caches.java:246)
    at org.mapdb.EngineWrapper.get(EngineWrapper.java:58)
    at org.mapdb.BTreeMap.<init>(BTreeMap.java:541)
    at org.mapdb.DB.getTreeMap(DB.java:805)
    at com.cloudbees.opscenter.context.Messaging$Local.open(Messaging.java:611)
    at com.cloudbees.opscenter.context.Messaging$Local.access$400(Messaging.java:541)
    at com.cloudbees.opscenter.context.Messaging.open(Messaging.java:484)
    at com.cloudbees.opscenter.context.Messaging.afterExtensionsAugmented(Messaging.java:59)
    ... 13 more
Caused by: java.io.EOFException
    at org.mapdb.Volume$FileChannelVol.readFully(Volume.java:947)
    at org.mapdb.Volume$FileChannelVol.getDataInput(Volume.java:1008)
    ... 25 more
```
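To tell this specific failure apart from other startup errors, a minimal Groovy sketch can scan log text for three markers visible in the trace: the failed `Messaging.afterExtensionsAugmented` task, the root `java.io.EOFException`, and frames from `org.mapdb`. The function name `hasCorruptMessagingDbSignature` is hypothetical. Matching on these strings is an assumption based only on this one trace, not an official detection rule.

```groovy
// Hypothetical helper: true when the log text shows the failed
// Messaging.afterExtensionsAugmented task with an EOFException raised from MapDB.
boolean hasCorruptMessagingDbSignature(String logText) {
    boolean failedTask = logText.contains('Failed Messaging.afterExtensionsAugmented')
    boolean eofRootCause = logText.contains('Caused by: java.io.EOFException')
    boolean mapdbFrames = logText.contains('at org.mapdb.')
    return failedTask && eofRootCause && mapdbFrames
}

// Example on an excerpt of the trace (single quotes keep the $ characters literal).
String excerpt = '''SEVERE jenkins.InitReactorRunner$1#onTaskFailed: Failed Messaging.afterExtensionsAugmented
Caused by: java.io.EOFException
    at org.mapdb.Volume$FileChannelVol.readFully(Volume.java:947)'''
println hasCorruptMessagingDbSignature(excerpt)  // prints: true
```

To check a real log, pass its contents instead, for example `new File(path).getText('UTF-8')`.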
Resolution
The messaging database files are corrupted. This prevents the cluster from restarting, and the files cannot be recreated automatically.
- Stop all HA nodes.
- Back up and remove the files `$JENKINS_HOME/messaging`, `$JENKINS_HOME/messaging.p` and `$JENKINS_HOME/messaging.t`.
- Start the cluster.
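All HA nodes are stopped at this point, so the Script Console is unavailable and the backup has to happen outside Jenkins. The following standalone Groovy script is a minimal sketch for that step. It assumes `JENKINS_HOME` is set in the environment and points at the shared home directory. It moves the three messaging files into a timestamped backup directory, which backs them up and removes them from `$JENKINS_HOME` in one step. The function name `backUpMessagingFiles` and the `jenkins-messaging-backup-<timestamp>` location are hypothetical choices.

```groovy
import java.nio.file.Files
import java.nio.file.Path
import java.nio.file.Paths
import java.nio.file.StandardCopyOption
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

// Moves the messaging files from jenkinsHome into backupDir and returns
// the new locations. Run only while every HA node is stopped.
List<Path> backUpMessagingFiles(Path jenkinsHome, Path backupDir) {
    Files.createDirectories(backupDir)
    List<Path> moved = []
    for (String name : ['messaging', 'messaging.p', 'messaging.t']) {
        Path source = jenkinsHome.resolve(name)
        if (Files.exists(source)) {
            Path target = backupDir.resolve(name)
            // Moving relocates the file into backupDir, removing it from jenkinsHome
            Files.move(source, target, StandardCopyOption.REPLACE_EXISTING)
            moved << target
        } else {
            println "Not found, skipping: ${source}"
        }
    }
    return moved
}

String home = System.getenv('JENKINS_HOME')
if (home) {
    Path jenkinsHome = Paths.get(home)
    String stamp = LocalDateTime.now().format(DateTimeFormatter.ofPattern('yyyyMMdd-HHmmss'))
    // Hypothetical backup location: a sibling directory next to $JENKINS_HOME
    Path backupDir = jenkinsHome.resolveSibling('jenkins-messaging-backup-' + stamp)
    backUpMessagingFiles(jenkinsHome, backupDir).each { println "Backed up: ${it}" }
} else {
    println 'JENKINS_HOME is not set; nothing was moved.'
}
```

Moving the files, rather than copying and then deleting them, keeps the backup and the removal in a single operation per file. Keeping the backup outside `$JENKINS_HOME` avoids leaving stray directories in the live home.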
Resulting Issue that needs to be fixed
Because the messaging database was deleted, messaging between the Operations Center and the controller is now out of sync. Correct this as follows:
- Run this script from Manage Jenkins > Script Console on the Operations Center (CJOC). The script prints a list of connected controllers and their respective instance IDs.
- Find the instance ID of the controller whose messaging database files you removed, then look for that instance ID in the section of the output that begins with `maxPulls:`.
- Record the number shown in that entry, which has the form `- $INSTANCE_ID: $NUMBER`.
- On the controller, open Manage Jenkins > Script Console and run the following script, replacing `$NUMBER` with the value recorded from the CJOC script output:

```groovy
import com.cloudbees.opscenter.context.Messaging

// Print the current outbox sequence ID
println Messaging.getInstance().local.outboxSequenceId
// Set it to the recorded value plus one (replace $NUMBER before running)
Messaging.getInstance().local.outboxSequenceId.set($NUMBER + 1)
// Print the value again to confirm the change
println Messaging.getInstance().local.outboxSequenceId
```

This restores the controller's `outboxSequenceId` to the value it had before the database files were removed.
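Reading the `maxPulls:` section by eye is error-prone when many controllers are connected. The following Groovy sketch extracts `$NUMBER` for one instance ID from the text printed by the CJOC script. It assumes each entry in that section is a line of the form `- $INSTANCE_ID: $NUMBER`, as described in the steps above, and that the section ends at the first non-blank line that is not such an entry. The function name `findMaxPull` and that end-of-section rule are assumptions, not part of any CloudBees API.

```groovy
// Hypothetical helper: returns the number recorded for instanceId in the
// "maxPulls:" section of the CJOC script output, or null if it is absent.
Long findMaxPull(String cjocOutput, String instanceId) {
    boolean inSection = false
    for (String raw : cjocOutput.readLines()) {
        String line = raw.trim()
        if (line.startsWith('maxPulls:')) {
            inSection = true
            continue
        }
        if (!inSection || line.isEmpty()) {
            continue
        }
        def m = line =~ /^-\s*(\S+):\s*(\d+)$/
        if (!m.matches()) {
            break // assumed: a non-entry line ends the maxPulls section
        }
        if (m.group(1) == instanceId) {
            return Long.parseLong(m.group(2))
        }
    }
    return null
}

// Example with made-up output; real instance IDs and numbers will differ.
String sampleOutput = '''maxPulls:
- 0123abcd: 41
- 4567ef01: 7'''
Long recorded = findMaxPull(sampleOutput, '0123abcd')
println "Value to set on the controller: ${recorded + 1}"  // prints: Value to set on the controller: 42
```

The returned number plus one is the argument to pass to `outboxSequenceId.set(...)` in the controller script.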