The Jenkins Metrics plugin from the open source community defines an API for integrating the Dropwizard Metrics API, defines a number of standard metrics, and provides some basic health checks.
Authentication
To access the Metrics API you need an API authentication token. See API authentication for more information.
Authorization
Authenticated users can access the Dropwizard Metrics Servlet at $JENKINS_URL/metrics/currentUser/ if they have the Metrics/View permission.
Access to thread dumps and health checks is controlled separately through the Metrics/ThreadDump and Metrics/HealthCheck permissions, respectively.
This finer-grained control exists because thread dumps and health checks may expose job names or build node details, which may violate an organization's security policies.
Machine access
Access to the Metrics Servlet can also be granted by issuing API keys, so that metrics are available only to clients that present a valid key. API keys are configured on the CloudBees CI global configuration page (Manage Jenkins > Configure System) under the Metrics section. Multiple access keys can be generated, and the permissions associated with each key can be restricted at this level.
Each API key can be configured with unique permissions for the four servlets that the Dropwizard Metrics Servlet offers and each API key can be configured with its own CORS supported origins.
- HTTP GET requests use the base URL $JENKINS_URL/metrics/$KEY/
- HTTP POST requests use the base URL $JENKINS_URL/metrics/ with the key provided by a standard URL-encoded form parameter called key
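The two access patterns above can be sketched with the Python standard library. This is a minimal illustration, not an official client: the host name, the key value, and the helper names are hypothetical, and appending a servlet name such as `ping` to the base URL is an assumption based on the `currentUser` paths shown elsewhere on this page. The functions only build the requests; they do not send them.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_get_request(jenkins_url: str, key: str, servlet: str) -> Request:
    """GET form: the API key is part of the URL path."""
    base = jenkins_url.rstrip("/")
    return Request(f"{base}/metrics/{key}/{servlet}", method="GET")


def build_post_request(jenkins_url: str, key: str, servlet: str) -> Request:
    """POST form: the API key travels as the URL-encoded form parameter `key`."""
    base = jenkins_url.rstrip("/")
    body = urlencode({"key": key}).encode("ascii")
    return Request(
        f"{base}/metrics/{servlet}",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Placeholder values for illustration only (hypothetical host and key).
get_req = build_get_request("https://jenkins.example.com/", "EXAMPLE_KEY", "ping")
post_req = build_post_request("https://jenkins.example.com/", "EXAMPLE_KEY", "ping")
```

Passing either object to `urllib.request.urlopen` would send the request. The POST form keeps the key out of the URL, which may matter if URLs are written to access logs.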
Endpoints and methods
Endpoint | Required permissions | Description | Parameters |
---|---|---|---|
GET /metrics/currentUser/metrics | Metrics.VIEW | Returns metrics in JSON format. | |
GET /metrics/currentUser/healthcheck | Metrics.HEALTHCHECK | Returns health check details in JSON format. | |
GET /metrics/currentUser/threads | Metrics.THREADDUMP | Returns thread dump details. | none |
GET | none | Returns "pong". | none |
Request examples
Endpoint | Request example |
---|---|
GET |
|
GET |
|
GET |
|
GET |
|
In the request examples, "admin" is the name of the user sending the request and "https://my-operations-center.com/cjoc/" is the URL of the operations center.
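As a rough sketch of what such a request could look like from a script, the following Python builds an authenticated GET request for the user "admin" against the operations center URL named above. It assumes the API token is sent with HTTP Basic authentication (user name and token), which is the usual way Jenkins API tokens are used; the token value and the helper name are placeholders.

```python
import base64
from urllib.request import Request


def build_authenticated_request(base_url: str, user: str, api_token: str, servlet: str) -> Request:
    """Build a GET request for one of the currentUser servlets using HTTP Basic auth."""
    credentials = f"{user}:{api_token}".encode("utf-8")
    auth_value = "Basic " + base64.b64encode(credentials).decode("ascii")
    url = f"{base_url.rstrip('/')}/metrics/currentUser/{servlet}"
    return Request(url, headers={"Authorization": auth_value}, method="GET")


# "admin" and the operations center URL come from the text above;
# the token value is a placeholder.
req = build_authenticated_request(
    "https://my-operations-center.com/cjoc/", "admin", "API_TOKEN", "healthcheck"
)
```

Sending the request with `urllib.request.urlopen(req)` and decoding the body with `json.load` would yield the parsed health check response.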
Response examples
The sample responses for GET /metrics/currentUser/metrics and GET /metrics/currentUser/threads are lengthy and therefore not fully included in this documentation.
Full samples of both responses are located in the CloudBees documentation samples repo on GitHub.
The metrics response consists of the version and five objects, one for each of the metrics types described in the Response model.
GET /metrics/currentUser/metrics
sample response:
{ "version": "4.0.0", "gauges": {...}, "counters": {...}, "histograms": {...}, "meters": {...}, "timers": {...} }
GET /metrics/currentUser/healthcheck
sample response:
{
  "disk-space": { "healthy": true },
  "plugins": {
    "healthy": false,
    "message": "There are 11 failed plugins: operations-center-server; operations-center-updatecenter; operations-center-monitoring; operations-center-clusterops; master-provisioning-core; operations-center-jnlp-controller; operations-center-sso; operations-center-rbac; master-provisioning-kubernetes; bluesteel-cjoc; operations-center-kubernetes-cloud"
  },
  "temporary-space": { "healthy": true },
  "thread-deadlock": { "healthy": true }
}
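A response shaped like the health check sample above can be reduced to the failing checks with a few lines of Python. This is a sketch only: the function name is hypothetical and the sample document abbreviates the plugin message.

```python
import json

# Abbreviated copy of the sample response; the message text is shortened.
SAMPLE = """
{
  "disk-space": {"healthy": true},
  "plugins": {"healthy": false, "message": "There are 11 failed plugins"},
  "temporary-space": {"healthy": true},
  "thread-deadlock": {"healthy": true}
}
"""


def failing_checks(healthcheck_json: str) -> dict:
    """Map each unhealthy check name to its message (empty string if none)."""
    report = json.loads(healthcheck_json)
    return {
        name: result.get("message", "")
        for name, result in report.items()
        if not result.get("healthy", False)
    }


failures = failing_checks(SAMPLE)
```

A monitoring script could alert whenever the returned dictionary is not empty and include the messages in the alert text.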
Response model
The following is a very basic set of standard servlets provided by Dropwizard:
- Metrics: The Metrics servlet returns the metrics in JSON format. JSONP format is also supported by including Content-Type: text/javascript in the request.
- Ping: The Ping servlet returns the text "pong" and an HTTP/200 status code.
- Threads: The Threads servlet returns a thread dump from the master only.
- Healthcheck: The Healthcheck servlet runs the health checks defined against the Metrics API and returns a detailed status in JSON (or JSONP) format, while the high-level status is reported by the HTTP status code.
Metrics
The metrics JSON is organized into the following five metrics types:
- gauges: a gauge is an instantaneous measurement of a value.
- counters: a counter is a gauge that tracks the count of something.
- histograms: a histogram measures the statistical distribution of values in a stream of data. Histograms also maintain a reservoir sample of the stream data. In the Jenkins Metrics plugin the standard metric histograms use exponentially decaying reservoirs based on a forward-decaying priority reservoir with an exponential weighting towards newer data. Unlike some other exponentially decaying reservoirs, this strategy has the advantage of maintaining a statistically representative sampling reservoir. Histograms provide the following metrics:
  - the number of observed values
  - the average of all observed values
  - the standard deviation of observed values
  - the minimum observed value
  - the maximum observed value
  - the 50th percentile observed value
  - the 75th percentile observed value
  - the 95th percentile observed value
  - the 98th percentile observed value
  - the 99th percentile observed value
  - the 99.9th percentile observed value
- meters: a meter measures the rate of events over time. Meters provide the following metrics:
  - the number of observed events
  - the average rate of all observed events
  - the average rate of observed events in the past minute
  - the average rate of observed events in the past five minutes
  - the average rate of observed events in the past fifteen minutes
- timers: a timer is a histogram of the duration of events coupled with a meter of the rate of event occurrence. Timers also maintain an exponentially decaying reservoir sample of the event duration data. These exponentially decaying reservoirs use a forward-decaying priority reservoir with an exponential weighting towards newer data. Unlike some other exponentially decaying reservoirs, this strategy has the advantage of maintaining a statistically representative sampling reservoir. Timers provide the following metrics:
  - the number of observed events
  - the average rate of all observed events
  - the average rate of observed events in the past minute
  - the average rate of observed events in the past five minutes
  - the average rate of observed events in the past fifteen minutes
  - the average duration of all observed events
  - the standard deviation of observed event durations
  - the minimum observed event duration
  - the maximum observed event duration
  - the 50th percentile observed event duration
  - the 75th percentile observed event duration
  - the 95th percentile observed event duration
  - the 98th percentile observed event duration
  - the 99th percentile observed event duration
  - the 99.9th percentile observed event duration
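To make the histogram statistics listed above concrete, the following Python sketch computes them for a plain list of sampled values. It is not the Dropwizard implementation: it ignores the exponential weighting of the reservoir and uses a simple nearest-rank percentile, and the function names are hypothetical.

```python
import math
import statistics


def percentile(sorted_values, q):
    """Nearest-rank percentile of an already sorted, non-empty list (0 < q <= 1)."""
    rank = max(1, math.ceil(q * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize(values):
    """Summary statistics for a non-empty sample, using the field names from the metrics JSON."""
    ordered = sorted(values)
    return {
        "count": len(ordered),
        "mean": statistics.mean(ordered),
        "stddev": statistics.pstdev(ordered),  # population standard deviation
        "min": ordered[0],
        "max": ordered[-1],
        "p50": percentile(ordered, 0.50),
        "p75": percentile(ordered, 0.75),
        "p95": percentile(ordered, 0.95),
        "p98": percentile(ordered, 0.98),
        "p99": percentile(ordered, 0.99),
        "p999": percentile(ordered, 0.999),
    }


# Illustrative sample: the integers 1 to 100.
summary = summarize(range(1, 101))
```

With a weighted reservoir, newer observations would count more heavily toward each statistic, so real values reported by the plugin can differ from an unweighted calculation like this one.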
gauges
Element | Type | Category | Description |
---|---|---|---|
jenkins.executor.count.value | object | Jenkins specific metrics | The number of executors available to Jenkins. This corresponds to the sum of all the executors of all the on-line nodes. |
jenkins.executor.free.value | object | Jenkins specific metrics | The number of executors available to Jenkins that are not currently in use. |
jenkins.executor.in-use.value | object | Jenkins specific metrics | The number of executors available to Jenkins that are currently in use. |
jenkins.health-check.count | object | Jenkins specific metrics | The number of health checks associated with the HealthCheckRegistry defined within the Jenkins Metrics plugin. |
jenkins.health-check.inverse-score | object | Jenkins specific metrics | The ratio of health checks reporting failure to the total number of health checks. Larger values indicate decreasing health as measured by the health checks. (This is a value between 0 and 1 inclusive). |
jenkins.health-check.score | object | Jenkins specific metrics | The ratio of health checks reporting success to the total number of health checks. Larger values indicate increasing health as measured by the health checks. (This is a value between 0 and 1 inclusive). |
jenkins.job.count.value | object | Jenkins specific metrics | The number of jobs in Jenkins. |
jenkins.node.count.value | object | Jenkins specific metrics | The number of build nodes available to Jenkins, both on-line and off-line. |
jenkins.node.offline.value | object | Jenkins specific metrics | The number of build nodes available to Jenkins but currently off-line. |
jenkins.node.online.value | object | Jenkins specific metrics | The number of build nodes available to Jenkins and currently on-line. |
jenkins.plugins.active | object | Jenkins specific metrics | The number of plugins in the Jenkins instance that started successfully. |
jenkins.plugins.failed | object | Jenkins specific metrics | The number of plugins in the Jenkins instance that failed to start. A value other than 0 typically indicates an issue within the Jenkins installation that can be solved either by explicitly disabling the plugin(s) or by resolving the plugin dependency issues. |
jenkins.plugins.inactive | object | Jenkins specific metrics | The number of plugins in the Jenkins instance that are not currently enabled. |
jenkins.plugins.withUpdate | object | Jenkins specific metrics | The number of plugins in the Jenkins instance that have a newer version reported as available in the current Jenkins update center metadata held by Jenkins. This value is not indicative of an issue with Jenkins, but high values can be used as a trigger to review the plugins with updates, to see whether those updates contain fixes for issues that could be affecting your Jenkins instance. |
jenkins.queue.blocked.value | object | Jenkins specific metrics | The number of jobs that are in the Jenkins build queue and currently in the blocked state. |
jenkins.queue.buildable.value | object | Jenkins specific metrics | The number of jobs that are in the Jenkins build queue and currently in the buildable state. |
jenkins.queue.pending.value | object | Jenkins specific metrics | The number of jobs that are in the Jenkins build queue and currently in the pending state. |
jenkins.queue.size.value | object | Jenkins specific metrics | The number of jobs that are in the Jenkins build queue. |
jenkins.queue.stuck.value | object | Jenkins specific metrics | The number of jobs that are in the Jenkins build queue and currently in the stuck state. |
system.cpu.load | object | System and Java Virtual Machine metrics | The system load on the Jenkins master as reported by the JVM’s Operating System JMX MBean. The calculation of system load is operating system dependent. Typically this is the sum of the number of processes that are currently running plus the number that are waiting to run. This is typically comparable against the number of CPU cores. |
vm.blocked.count | object | System and Java Virtual Machine metrics | The number of threads in the Jenkins master JVM that are currently blocked waiting for a monitor lock. |
vm.count | object | System and Java Virtual Machine metrics | The total number of threads in the Jenkins master JVM. This is the sum of: |
vm.cpu.load | object | System and Java Virtual Machine metrics | The rate of CPU time usage by the JVM per unit time on the Jenkins master. This is equivalent to the number of CPU cores being used by the Jenkins master JVM. |
vm.daemon.count | object | System and Java Virtual Machine metrics | The number of threads in the Jenkins master JVM that are marked as Daemon threads. |
vm.deadlocks | object | System and Java Virtual Machine metrics | The number of threads that have a currently detected deadlock with at least one other thread. |
vm.file.descriptor.ratio | object | System and Java Virtual Machine metrics | The ratio of used to total file descriptors. (This is a value between 0 and 1 inclusive). |
vm.memory.heap.committed | object | System and Java Virtual Machine metrics | The amount of memory, in the heap that is used for object allocation, that is guaranteed by the operating system as available for use by the Jenkins master JVM. (Units of measurement: bytes). |
vm.memory.heap.init | object | System and Java Virtual Machine metrics | The amount of memory, in the heap that is used for object allocation, that the Jenkins master JVM initially requested from the operating system. (Units of measurement: bytes). |
vm.memory.heap.max | object | System and Java Virtual Machine metrics | The maximum amount of memory, in the heap that is used for object allocation, that the Jenkins master JVM is allowed to request from the operating system. This amount of memory is not guaranteed to be available for memory management if it is greater than the amount of committed memory. The JVM may fail to allocate memory even if the amount of used memory does not exceed this maximum size. (Units of measurement: bytes). |
vm.memory.heap.usage | object | System and Java Virtual Machine metrics | The ratio of |
vm.memory.heap.used | object | System and Java Virtual Machine metrics | The amount of memory, in the heap that is used for object allocation, that the Jenkins master JVM is currently using. (Units of measurement: bytes). |
vm.memory.non-heap.committed | object | System and Java Virtual Machine metrics | The amount of memory, outside the heap that is used for object allocation, that is guaranteed by the operating system as available for use by the Jenkins master JVM. (Units of measurement: bytes). |
vm.memory.non-heap.init | object | System and Java Virtual Machine metrics | The amount of memory, outside the heap that is used for object allocation, that the Jenkins master JVM initially requested from the operating system. (Units of measurement: bytes). |
vm.memory.non-heap.max | object | System and Java Virtual Machine metrics | The maximum amount of memory, outside the heap that is used for object allocation, that the Jenkins master JVM is allowed to request from the operating system. This amount of memory is not guaranteed to be available for memory management if it is greater than the amount of committed memory. The JVM may fail to allocate memory even if the amount of used memory does not exceed this maximum size. (Units of measurement: bytes). |
vm.memory.non-heap.usage | object | System and Java Virtual Machine metrics | The ratio of |
vm.memory.non-heap.used | object | System and Java Virtual Machine metrics | The amount of memory, outside the heap that is used for object allocation, that the Jenkins master JVM is currently using. (Units of measurement: bytes). |
vm.memory.pools..usage | objects | System and Java Virtual Machine metrics | The usage level of the memory pool, where a value of 0 represents an unused pool while a value of 1 represents a pool that is at capacity. The names are supplied by and dependent on the JVM. There will be one metric for each of the memory pools reported by the JVM. |
vm.memory.total.committed | object | System and Java Virtual Machine metrics | The total amount of memory that is guaranteed by the operating system as available for use by the Jenkins master JVM. (Units of measurement: bytes). |
vm.memory.total.init | object | System and Java Virtual Machine metrics | The total amount of memory that the Jenkins master JVM initially requested from the operating system. (Units of measurement: bytes). |
vm.memory.total.max | object | System and Java Virtual Machine metrics | The maximum amount of memory that the Jenkins master JVM is allowed to request from the operating system. This amount of memory is not guaranteed to be available for memory management if it is greater than the amount of committed memory. The JVM may fail to allocate memory even if the amount of used memory does not exceed this maximum size. (Units of measurement: bytes). |
vm.memory.total.used | object | System and Java Virtual Machine metrics | The total amount of memory that the Jenkins master JVM is currently using. (Units of measurement: bytes). |
vm.new.count | object | System and Java Virtual Machine metrics | The number of threads in the Jenkins master JVM that have not yet started execution. |
vm.runnable.count | object | System and Java Virtual Machine metrics | The number of threads in the Jenkins master JVM that are currently executing in the JVM. Some of these threads may be waiting for other resources from the operating system such as the processor. |
vm.terminated.count | object | System and Java Virtual Machine metrics | The number of threads in the Jenkins master JVM that have completed execution. |
vm.timed_waiting.count | object | System and Java Virtual Machine metrics | The number of threads in the Jenkins master JVM that have suspended execution for a defined period of time. |
vm.uptime.milliseconds | object | System and Java Virtual Machine metrics | The number of milliseconds since the Jenkins master JVM started. |
vm.waiting.count | object | System and Java Virtual Machine metrics | The number of threads in the Jenkins master JVM that are currently waiting on another thread to perform a particular action. |
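As an example of combining gauges from the table above, the following Python sketch derives an executor utilization ratio from a parsed metrics response. It assumes each gauge object carries its reading in a `value` field, as in the standard Dropwizard JSON output; the sample numbers and helper names are invented for illustration.

```python
import json

# Hypothetical, heavily trimmed metrics response with invented values.
SAMPLE_METRICS = """
{
  "version": "4.0.0",
  "gauges": {
    "jenkins.executor.count.value": {"value": 8},
    "jenkins.executor.in-use.value": {"value": 6},
    "jenkins.executor.free.value": {"value": 2}
  }
}
"""


def gauge(metrics: dict, name: str):
    """Read the current value of a named gauge (assumes a `value` field)."""
    return metrics["gauges"][name]["value"]


def executor_utilization(metrics: dict) -> float:
    """Fraction of on-line executors currently in use; 0.0 when there are none."""
    total = gauge(metrics, "jenkins.executor.count.value")
    if total == 0:
        return 0.0
    return gauge(metrics, "jenkins.executor.in-use.value") / total


utilization = executor_utilization(json.loads(SAMPLE_METRICS))
```

A sustained utilization close to 1 together with a growing `jenkins.queue.size.value` would suggest that builds are waiting for executors.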
counters
Element | Type | Category | Description |
---|---|---|---|
http.activeRequests |
object |
Web UI metrics |
The number of currently active requests against the Jenkins master Web UI. |
histograms
Element | Type | Category | Description |
---|---|---|---|
jenkins.executor.count.history | object | Jenkins specific metrics | The historical statistics of |
jenkins.executor.free.history | object | Jenkins specific metrics | The historical statistics of |
jenkins.executor.in-use.history | object | Jenkins specific metrics | The number of executors available to Jenkins that are currently in use. |
jenkins.job.count.history | object | Jenkins specific metrics | The historical statistics of |
jenkins.node.count.history | object | Jenkins specific metrics | The historical statistics of |
jenkins.node.offline.history | object | Jenkins specific metrics | The historical statistics of |
jenkins.node.online.history | object | Jenkins specific metrics | The historical statistics of |
jenkins.queue.blocked.history | object | Jenkins specific metrics | The historical statistics of |
jenkins.queue.buildable.history | object | Jenkins specific metrics | The historical statistics of |
jenkins.queue.pending.history | object | Jenkins specific metrics | The historical statistics of |
jenkins.queue.size.history | object | Jenkins specific metrics | The historical statistics of |
jenkins.queue.stuck.history | object | Jenkins specific metrics | The historical statistics of |
count | number | N/A | The number of observed values. |
max | number | N/A | The maximum observed value. |
mean | number | N/A | The average of all observed values. |
min | number | N/A | The minimum observed value. |
p50 | number | N/A | The 50th percentile observed value. |
p75 | number | N/A | The 75th percentile observed value. |
p95 | number | N/A | The 95th percentile observed value. |
p98 | number | N/A | The 98th percentile observed value. |
p99 | number | N/A | The 99th percentile observed value. |
p999 | number | N/A | The 99.9th percentile observed value. |
values | array | N/A | A reservoir sample of stream data. |
stddev | number | N/A | The standard deviation of observed values. |
meters
Element | Type | Category | Description |
---|---|---|---|
http.responseCodes.badRequest | object | Web UI metrics | The rate at which the Jenkins master Web UI is responding to requests with a |
http.responseCodes.created | object | Web UI metrics | The rate at which the Jenkins master Web UI is responding to requests with a |
http.responseCodes.forbidden | object | Web UI metrics | The rate at which the Jenkins master Web UI is responding to requests with a |
http.responseCodes.noContent | object | Web UI metrics | The rate at which the Jenkins master Web UI is responding to requests with a |
http.responseCodes.notFound | object | Web UI metrics | The rate at which the Jenkins master Web UI is responding to requests with a |
http.responseCodes.notModified | object | Web UI metrics | The rate at which the Jenkins master Web UI is responding to requests with a |
http.responseCodes.ok | object | Web UI metrics | The rate at which the Jenkins master Web UI is responding to requests with a |
http.responseCodes.other | object | Web UI metrics | The rate at which the Jenkins master Web UI is responding to requests with a non-informational status code that is not in the list: |
http.responseCodes.serverError | object | Web UI metrics | The rate at which the Jenkins master Web UI is responding to requests with a |
http.responseCodes.serviceUnavailable | object | Web UI metrics | The rate at which the Jenkins master Web UI is responding to requests with a |
jenkins.job.scheduled | object | Jenkins specific metrics | The rate at which jobs are scheduled. If a job is already in the queue and an identical request for scheduling the job is received then Jenkins will coalesce the two requests. This metric gives a reasonably pure measure of the load requirements of the Jenkins master as it is unaffected by the number of executors available to the system. Multiplying this metric by A more accurate measure can be obtained from a job-by-job summation of the scheduling rate for that job and the average build duration of that job. The most accurate measure would require maintaining separate sums partitioned by the labels that each job can run against in order to determine the number of each type of executor required. Such calculations assume that: every build node is equivalent and/or the build times are comparable across all build nodes; and build times are unaffected by other jobs running in parallel on other executors on the same node. However in most cases even the basic result from multiplying |
count | number | N/A | The number of observed events. |
m15_rate | number | N/A | The average rate of observed events in the past fifteen minutes. |
m1_rate | number | N/A | The average rate of observed events in the past minute. |
m5_rate | number | N/A | The average rate of observed events in the past five minutes. |
mean_rate | number | N/A | The average rate of all observed events. |
units | string | N/A | The units used for calculations - "events/minute". |
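The job-by-job summation mentioned in the jenkins.job.scheduled description can be illustrated with a short Python sketch. It assumes the summation is a sum of products (scheduling rate multiplied by average build duration for each job), that rates are in events per minute and durations in seconds, and that all build nodes are equivalent. The function name and the sample figures are hypothetical.

```python
def estimated_executors(jobs) -> float:
    """Estimate the average number of busy executors.

    `jobs` is an iterable of (scheduling rate in builds/minute,
    average build duration in seconds) pairs, one pair per job.
    """
    return sum(
        rate_per_minute * duration_seconds / 60.0
        for rate_per_minute, duration_seconds in jobs
    )


# Invented figures: three jobs with different rates and build times.
jobs = [(0.5, 120.0), (0.25, 240.0), (2.0, 15.0)]
needed = estimated_executors(jobs)
```

The result is an average demand, so provisioning exactly that many executors would leave no headroom for bursts. Partitioning the pairs by label before summing would give a per-label estimate.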
timers
Element | Type | Category | Description |
---|---|---|---|
http.requests | object | Web UI metrics | The rate at which the Jenkins master Web UI is receiving requests and the time spent generating the corresponding responses. |
jenkins.health-check.duration | object | Jenkins specific metrics | The rate at which the health checks are being run and the duration of each health check run. The Jenkins Metrics plugin, by default, will run the health checks once per minute. The frequency can be controlled by the |
jenkins.job.blocked.duration | object | Jenkins specific metrics | The rate at which jobs in the build queue enter the blocked state and the amount of time they spend in that state. |
jenkins.job.building.duration | object | Jenkins specific metrics | The rate at which jobs are built and the time they spend building. |
jenkins.job.queuing.duration | object | Jenkins specific metrics | The rate at which jobs are queued and the total time they spend in the build queue. |
jenkins.job.total.duration | object | Jenkins specific metrics | The rate at which jobs are queued and the total time they spend from entering the build queue to completing building. |
jenkins.job.waiting.duration | object | Jenkins specific metrics | The rate at which jobs enter the quiet period and the total amount of time that jobs spend in their quiet period. Jenkins allows configuring a quiet period for most job types. While in the quiet period, multiple identical requests for building the job will be coalesced. Traditionally this was used with source control systems that do not provide an atomic commit facility, such as CVS, in order to ensure that all the files in a large commit were picked up as a single build. With more modern source control systems the quiet period can still be useful, for example to ensure that push notifications of the same commit via redundant parallel notification paths get coalesced. |
count | number | N/A | The number of observed events. |
max | number | N/A | The maximum observed event duration. |
mean | number | N/A | The average duration of all observed events. |
min | number | N/A | The minimum observed event duration. |
p50 | number | N/A | The 50th percentile observed event duration. |
p75 | number | N/A | The 75th percentile observed event duration. |
p95 | number | N/A | The 95th percentile observed event duration. |
p98 | number | N/A | The 98th percentile observed event duration. |
p99 | number | N/A | The 99th percentile observed event duration. |
p999 | number | N/A | The 99.9th percentile observed event duration. |
values | array | N/A | An exponentially decaying reservoir sample of the event duration data. |
stddev | number | N/A | The standard deviation of observed event durations. |
m15_rate | number | N/A | The average rate of observed events in the past fifteen minutes. |
m1_rate | number | N/A | The average rate of observed events in the past minute. |
m5_rate | number | N/A | The average rate of observed events in the past five minutes. |
mean_rate | number | N/A | The average rate of all observed events. |
duration_units | string | N/A | The units used for calculations - "seconds". |
rate_units | string | N/A | The units used for calculations - "calls/minutes". |
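One way to use the timer fields above is to flag timers whose 95th percentile duration exceeds a limit. The Python sketch below works on an already parsed `timers` object; the sample values and the function name are invented, and only timers reporting `duration_units` of "seconds" are compared.

```python
# Hypothetical excerpt of the "timers" object with invented values.
SAMPLE_TIMERS = {
    "jenkins.job.queuing.duration": {"count": 120, "p95": 42.0, "duration_units": "seconds"},
    "jenkins.job.building.duration": {"count": 118, "p95": 310.5, "duration_units": "seconds"},
    "http.requests": {"count": 9000, "p95": 0.8, "duration_units": "seconds"},
}


def slow_timers(timers: dict, p95_limit_seconds: float) -> list:
    """Sorted names of timers whose p95 duration (in seconds) exceeds the limit."""
    return sorted(
        name
        for name, timer in timers.items()
        if timer.get("duration_units") == "seconds"
        and timer.get("p95", 0.0) > p95_limit_seconds
    )


slow = slow_timers(SAMPLE_TIMERS, 60.0)
```

A single limit for all timers is a simplification; in practice a limit per timer name is likely more useful, since build durations and HTTP request durations differ by orders of magnitude.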
Healthcheck
Element | Type | Category | Description |
---|---|---|---|
disk-space | object | Standard health checks | Returns FAIL if any of the Jenkins disk space monitors are reporting the disk space as less than the configured threshold. The message will reference the first node which fails this check. There may be other nodes that fail the check, but this health check is designed to fail fast. |
plugins | object | Standard health checks | Returns FAIL if any of the Jenkins plugins failed to start. A failure is typically indicative of a potential issue within the Jenkins installation that will either be solved by explicitly disabling the failing plugin(s) or by resolving the corresponding plugin dependency issues. |
temporary-space | object | Standard health checks | Returns FAIL if any of the Jenkins temporary space monitors are reporting the temporary space as less than the configured threshold. The message will reference the first node which fails this check. There may be other nodes that fail the check, but this health check is designed to fail fast. |
thread-deadlock | object | Standard health checks | Returns FAIL if there are any deadlocked threads in the Jenkins master JVM. |
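The jenkins.health-check.score and jenkins.health-check.inverse-score gauges are described as ratios of passing and failing checks to the total number of checks. The following Python sketch reproduces that arithmetic from a parsed health check response. It illustrates the ratios only and is not the plugin's own code; the function name is hypothetical and the sample mirrors the shape of the response shown earlier.

```python
def health_scores(report: dict) -> tuple:
    """Return (score, inverse_score) for a parsed healthcheck response."""
    if not report:
        raise ValueError("no health checks in report")
    total = len(report)
    healthy = sum(1 for result in report.values() if result.get("healthy"))
    return healthy / total, (total - healthy) / total


# Same four checks as the sample response: three healthy, one failing.
sample_report = {
    "disk-space": {"healthy": True},
    "plugins": {"healthy": False, "message": "There are 11 failed plugins"},
    "temporary-space": {"healthy": True},
    "thread-deadlock": {"healthy": True},
}
score, inverse_score = health_scores(sample_report)
```

For the four standard checks with one failure, this gives a score of 0.75 and an inverse score of 0.25, and the two values always sum to 1.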