Issue
When a large number of CloudBees CD/RO pipelines are running, occasionally the pipelines may take a long time to progress to the next stage or they may appear to be stuck.
In this case, you will see that the previous stage or task has completed, but the next stage or task does not immediately move to the running state.
You can identify this scenario by reviewing the CloudBees CD/RO server log file, commander.log, for entries that match the pattern runFlow.*_stageflow.*Retryable OptimisticLockException, similar to:
2024-04-15T10:46:23.874 | cdro-server1 | INFO | pool-005-302 | 1833987 | | ... runFlow#..._stageflow | TransactionRetryAspectImpl | Retryable OptimisticLockException: 'Batch update returned unexpected row count from update [0]; actual row count: 0; expected: 1'
A few such messages are harmless. The system is designed to retry and recover in these cases, so there is no functional impact. Under heavy load, however, when a large number of running pipelines are trying to start a new stage or task at the same time, the retries can make pipelines appear slow or stuck.
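To tell whether these messages are occasional or arriving in bursts, you can count the matching entries per minute. The following is a minimal Python sketch, not a CloudBees tool. It reuses the pattern above, assumes each log entry sits on one line and begins with a timestamp like the sample entry, and the function names and the threshold are hypothetical choices.

```python
import re
from collections import Counter

# Pattern from the article: stage-flow transactions that hit a
# retryable optimistic-lock failure.
PATTERN = re.compile(r"runFlow.*_stageflow.*Retryable OptimisticLockException")

# Assumption: each entry starts with a timestamp such as 2024-04-15T10:46:23.874,
# as in the sample entry; the part up to the minute is used as the bucket key.
MINUTE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2})")


def count_per_minute(lines):
    """Count entries matching PATTERN, grouped by the minute they were logged."""
    counts = Counter()
    for line in lines:
        if PATTERN.search(line):
            m = MINUTE.match(line)
            counts[m.group(1) if m else "unknown"] += 1
    return counts


def busy_minutes(counts, threshold):
    """Return (minute, count) pairs with at least `threshold` matches, busiest first.

    The threshold is a caller-chosen (hypothetical) cutoff; the article does not
    define how many messages count as "a few".
    """
    return sorted(
        ((minute, n) for minute, n in counts.items() if n >= threshold),
        key=lambda item: (-item[1], item[0]),
    )


def scan_file(path, threshold):
    """Scan a log file such as commander.log and return its busiest minutes."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return busy_minutes(count_per_minute(f), threshold)
```

For example, scan_file("commander.log", threshold=20) lists the minutes with 20 or more matches. Minutes with many matches that line up with reports of slow pipelines point to this issue rather than to the occasional, harmless retry.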
Environment
- CloudBees CD/RO 2024.06 and older versions
Resolution
To solve this issue, change the server setting Flow name template from flow_$[/increment /server/ec_counters/flowCounter]_$[/timestamp yyyyMMddHHmmss] to flow_$[/myFlowRuntime/id]_$[/timestamp yyyyMMddHHmmss].
- Before updating the server setting, run the following command to verify that its current value is flow_$[/increment /server/ec_counters/flowCounter]_$[/timestamp yyyyMMddHHmmss]:
  ectool getProperty /server/settings/flowNameTemplate
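- Run the following command to change the server setting to flow_$[/myFlowRuntime/id]_$[/timestamp yyyyMMddHHmmss]. Quote the value (single quotes in a Linux or macOS shell) so that the shell does not split it at the space or interpret the $[...] sequences:
  ectool setProperty /server/settings/flowNameTemplate 'flow_$[/myFlowRuntime/id]_$[/timestamp yyyyMMddHHmmss]'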
- Finally, run the following command to confirm that the server setting is now flow_$[/myFlowRuntime/id]_$[/timestamp yyyyMMddHHmmss]:
  ectool getProperty /server/settings/flowNameTemplate
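The verify, update, and confirm steps can also be scripted so that the template is changed only when the current value is the expected old default. The following is a minimal Python sketch, not an official CloudBees utility. It assumes ectool is on the PATH, that you are already logged in to the server, and that ectool getProperty prints the bare property value to standard output. The function names and the choice to stop on an unexpected value are illustrative.

```python
import subprocess

SETTING = "/server/settings/flowNameTemplate"
OLD_TEMPLATE = "flow_$[/increment /server/ec_counters/flowCounter]_$[/timestamp yyyyMMddHHmmss]"
NEW_TEMPLATE = "flow_$[/myFlowRuntime/id]_$[/timestamp yyyyMMddHHmmss]"


def run_ectool(*args):
    """Run ectool and return its stdout with surrounding whitespace removed.

    Assumes ectool is on PATH and the session is already authenticated.
    Arguments are passed as a list, so no shell is involved and the template
    needs no quoting.
    """
    result = subprocess.run(["ectool", *args], capture_output=True, text=True, check=True)
    return result.stdout.strip()


def update_flow_name_template(run=run_ectool):
    """Verify, update, and confirm the flow name template.

    `run` is injectable so the logic can be exercised without a server.
    Returns "updated" or "already updated"; raises RuntimeError otherwise.
    """
    current = run("getProperty", SETTING)
    if current == NEW_TEMPLATE:
        return "already updated"
    if current != OLD_TEMPLATE:
        # Hypothetical safety choice: do not overwrite a customized template.
        raise RuntimeError(f"unexpected flowNameTemplate: {current!r}")
    run("setProperty", SETTING, NEW_TEMPLATE)
    confirmed = run("getProperty", SETTING)
    if confirmed != NEW_TEMPLATE:
        raise RuntimeError(f"update not confirmed: {confirmed!r}")
    return "updated"
```

Running update_flow_name_template() twice is safe: the second call finds the new value and returns "already updated" without calling setProperty again.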