CloudBees Core automatically monitors its infrastructure elements and all of the Jenkins masters that it manages, and emits alerts that can be helpful in troubleshooting.
CloudBees Core infrastructure element monitoring covers Operations Center and Managed Masters. For the various infrastructure nodes, it monitors the following metrics:
Available disk space
CPU utilization for the most recent 5 minutes
RAM utilization for the most recent 5 minutes
If any data point for these metrics reaches 90% or higher, a threshold that cannot currently be changed, CloudBees Core emits an alert, for example:
Health checks failing: [worker-14: Disk util at 95%, worker-9: Worker down]
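When alerts like this are collected in logs, it can help to split them into per-node entries. The following is a minimal sketch that assumes every alert follows the shape of the single example above (a fixed prefix, then a bracketed, comma-separated list of `node: message` pairs). The function names `parse_health_alert` and `utilization_percent` are hypothetical helpers, not part of CloudBees Core.

```python
import re
from typing import Dict, Optional

# Prefix copied from the example alert; treating it as fixed is an assumption.
PREFIX = "Health checks failing: "


def parse_health_alert(alert: str) -> Dict[str, str]:
    """Split an alert line into a {node name: message} mapping."""
    if not alert.startswith(PREFIX):
        raise ValueError("not a health check alert")
    body = alert[len(PREFIX):].strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError("expected a bracketed list of node entries")

    failures: Dict[str, str] = {}
    for entry in body[1:-1].split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate an empty list or a trailing comma
        # Split on the first colon only: node name on the left, message on the right.
        node, _, message = entry.partition(":")
        failures[node.strip()] = message.strip()
    return failures


def utilization_percent(message: str) -> Optional[int]:
    """Return the number from a '<metric> util at <number>%' message, else None."""
    match = re.search(r"util at (\d+)%", message)
    return int(match.group(1)) if match else None
```

Applied to the example alert, `parse_health_alert` yields one entry for worker-14 and one for worker-9, and `utilization_percent` returns a number only for messages that report a utilization figure.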
The following table shows the possible error messages and their descriptions.
Error message | Description
Disk util at <number>% | Disk utilization reaches 90% or higher.
RAM util at <number>% | RAM utilization reaches 90% or higher for five or more minutes.
CPU util at <number>% | Total CPU utilization reaches 90% or higher for five or more minutes. The percent utilization is normalized to 100% across all CPUs on the node.
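The CPU row combines two ideas: normalizing utilization across CPUs and requiring the condition to hold for five minutes. The sketch below illustrates one plausible reading, assuming that normalization means averaging the per-CPU percentages and that "five or more minutes" means every sample in a five-minute window is at or above the threshold. Both interpretations, and the function names, are assumptions for illustration rather than the product's documented behavior.

```python
from typing import Sequence

THRESHOLD_PERCENT = 90.0  # the fixed threshold described in this section


def normalized_cpu_percent(per_cpu_percent: Sequence[float]) -> float:
    """Average per-CPU utilization so the result stays within 0-100%.

    Assumption: "normalized to 100% across all CPUs" means a plain average.
    """
    if not per_cpu_percent:
        raise ValueError("need at least one CPU reading")
    return sum(per_cpu_percent) / len(per_cpu_percent)


def sustained_breach(samples: Sequence[float],
                     threshold: float = THRESHOLD_PERCENT) -> bool:
    """Return True if every sample in the window is at or above the threshold.

    `samples` is a hypothetical series of readings covering the most
    recent five minutes; an empty window never counts as a breach.
    """
    return bool(samples) and all(s >= threshold for s in samples)
```

Under these assumptions, a four-CPU node with two CPUs at 100% and two at 80% reports 90% and would qualify for an alert only if that level held across the whole window.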
Additional monitoring is available with the Elasticsearch Reporter plugin.