KBEC-00500 - CD Performance impact via configuration choices

Problem

The CD product performs well at large scale, but performance can degrade as certain limits are approached. Environment adjustments can usually mitigate such issues, but changes involving hardware or sizing may take time to be approved or purchased. Which configuration settings can help with performance concerns until the environment changes take effect?

Summary

Any user logged into CD can affect the performance of the underlying system, and specific UI pages place demand on the underlying DB. Such load patterns won’t be mitigated by distributing user requests across additional web servers, so the following suggestions can provide some relief until other aspects of the environment can be changed:

System UI Adjustments

System: UI auto-refresh frequency (as of v10.1)

Increase the default to 60s.
This limits the number of requests made by all active user sessions.
In particular, some users land on a page and then leave it open, so reducing the refresh rate spares the database repetitive lookups.

Security: idle login session timeout - 4320 (3 days)

Reduce the idle login session timeout from its 3-day default to something shorter, such as 1 hour, to limit the impact of users leaving auto-refreshing pages open (see User UI Adjustments).
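As a rough illustration of how the two system settings above interact, the following sketch estimates how many auto-refresh requests a single abandoned page can generate before its session expires. It assumes, as the text above implies, that auto-refresh requests do not extend the idle session and that each refresh costs one request; the function name and this simplified model are illustrative only and are not taken from the product.

```python
def refreshes_before_timeout(timeout_minutes: int, refresh_seconds: int) -> int:
    """Upper bound on refresh requests sent by one abandoned page.

    Assumption: the session ends after `timeout_minutes` regardless of
    auto-refresh traffic, and each refresh is a single request.
    """
    if timeout_minutes < 0 or refresh_seconds <= 0:
        raise ValueError("timeout must be >= 0 and refresh interval must be > 0")
    return (timeout_minutes * 60) // refresh_seconds


if __name__ == "__main__":
    # 3-day timeout (4320 minutes) with a 30s refresh rate
    print(refreshes_before_timeout(4320, 30))
    # 1-hour timeout with a 60s refresh rate
    print(refreshes_before_timeout(60, 60))
```

Under these assumptions, one forgotten browser tab drops from 8640 refresh requests to 60 when both settings are adjusted, and the saving multiplies across every user who leaves a page open.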

User UI Adjustments

User: Pipeline Runs Page

Recommend that users set this to the lowest value, 20.

User: Releases Page

Recommend that users set this to the lowest value, 20.

Jobs page

Use the filter options to limit the number of jobs presented on each page.

Discourage users from leaving Pipeline, Release, or Jobs pages open for hours.

Pipeline and Release pages auto-refresh at a 30s rate (see UI auto-refresh frequency above).

Best Practice Suggestions

Avoid using search or findObject queries that are too open-ended.

Such requests can take a long time for the DB to answer and tie up cycles, which impacts other users and running work.
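One way to picture "too open-ended" is a query with no filters and no cap on the number of results. The sketch below is a hypothetical pre-flight check that a team might run on its own query definitions before sending them. The dictionary keys (objectType, maxIds, filters, propertyName, operator, operand1) are assumed to mirror findObjects arguments and should be verified against the API reference; the is_bounded helper, the project name, and the 1000-row limit are invented for illustration.

```python
from typing import Any


def is_bounded(query: dict[str, Any], max_ids_limit: int = 1000) -> bool:
    """Return True if the query has at least one filter and a capped result size.

    Hypothetical helper: the key names and the default limit are assumptions.
    """
    filters = query.get("filters") or []
    max_ids = query.get("maxIds")
    has_cap = isinstance(max_ids, int) and 0 < max_ids <= max_ids_limit
    return bool(filters) and has_cap


# Open-ended: every job in the system is a candidate result.
open_ended = {"objectType": "job"}

# Bounded: restricted to one project and one status, with a result cap.
bounded = {
    "objectType": "job",
    "maxIds": 200,
    "filters": [
        {"propertyName": "projectName", "operator": "equals", "operand1": "MyProject"},
        {"propertyName": "status", "operator": "equals", "operand1": "running"},
    ],
}

if __name__ == "__main__":
    print(is_bounded(open_ended))
    print(is_bounded(bounded))
```

The check is deliberately simple: it does not judge how selective a filter is, only whether the query narrows the search at all and limits what comes back.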

Change Tracking

Review which projects have Change Tracking turned on and consider turning it off for them. (Requires a system restart.)

Review whether Change Tracking is required in your system at all. (Requires a system restart.) Read: https://docs.cloudbees.com/docs/cloudbees-cd/latest/change-tracking/performance

System configuration

The default memory allocation for the CD server JVM is 40% of physical memory. This setting is located in the wrapper.conf file.
This setting is appropriate in most instances, but for customers with larger physical memory allotments it can be raised.
Here is a guideline to consider for traditional environments:

Under 16GB: 40%

16-32GB: 50%

32-64GB: 60%

Over 64GB: 70%
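The guideline above can be expressed as a small lookup, shown here as a sketch. The function name is hypothetical. Because the listed ranges share the value 32GB, the sketch assumes the more conservative percentage at that boundary; exactly 16GB falls in the 16-32GB tier and exactly 64GB in the 32-64GB tier, following the "Under" and "Over" wording.

```python
def suggested_heap_percent(physical_gb: float) -> int:
    """Map physical memory (GB) to the suggested JVM memory percentage.

    Assumption: a value of exactly 32GB takes the lower tier (50%),
    since the guideline does not say which side that boundary belongs to.
    """
    if physical_gb <= 0:
        raise ValueError("physical memory must be positive")
    if physical_gb < 16:
        return 40
    if physical_gb <= 32:
        return 50
    if physical_gb <= 64:
        return 60
    return 70


if __name__ == "__main__":
    for gb in (8, 16, 48, 128):
        print(f"{gb}GB -> {suggested_heap_percent(gb)}%")
```

The result is only a starting point for the value set in wrapper.conf; other processes sharing the same host still need memory of their own.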

Database Performance Considerations

Work with your DBA to monitor database performance to see if any modifications to your DB settings could improve results.

SQL Server can benefit from regular index defragmentation (typically weekly or monthly).

Separating Development work from the Production system

The separation of Dev/QE/Prod is likely part of how you configure pipelines to handle your SDLC, so a similar model is recommended for configuring your CD system.
Development work for creating new pipelines and procedures involves exploratory and iterative cycles. Formally separating Dev from Prod keeps dev explorations from impacting Production resources and overall system performance. It also keeps the disk space consumed by transitory changes out of both the Production DB and the Production workspaces.
Separating the systems can also ensure that only a limited set of users applies changes to the Production CD system, which adds a level of security and improves overall stability.

Review implementation selections:

When designing solutions for CD/RO, procedures and processes may work fine under limited scope when testing, yet ultimately prove inefficient under larger-scale usage. A formalized code-review process should therefore always be encouraged for CD/RO designs, to consider how they might behave at larger scales.

However, if that level of oversight hasn’t previously been applied in your design, and new slowdowns are being observed, then a review of historical designs may also be required.

Areas to explore in this regard would be:

a) Use of dedicated resources vs pools

b) Number of resources inside of pools

c) Use of resource exclusivity

For any of the three patterns above, see: https://docs.cloudbees.com/d/kb-360056061452

d) The rate at which schedules are being initiated

Schedules cut across numerous projects, so their combined implications can be difficult to spot.
Schedules that launch minor work that completes quickly may not be a concern, while those that run for extended periods, or that hold certain agents for a long time, need to be understood.

e) The timing selections of various disparate schedules vs. peak-hour use

Schedules that overlap with regular daily demand should be discussed with the owning teams to understand which runs could have their start time shifted, or have their priority lowered to limit the risk of interrupting more important work.

f) The timing of Data Management schedules vs. peak-hour use

Separate from project-based schedules, the schedules created for Data Retention/Management also create a drag on DB performance, so timing these runs to minimize overlap with heavy usage hours is recommended.

g) Webhook triggers

How often are these being triggered?
Which resources are involved?
Might some form of buffering be possible by using a faux-resource? See: https://docs.cloudbees.com/d/kb-360056061452

h) Use of pre-conditions/wait-until code

Look for code written to watch for results internally using loops that constantly poll the system.

See: https://docs.cloudbees.com/d/kb-360057488331

j) Pipelines left incomplete.

Pipelines with gates or wait tasks that are never completed add to the number of items being scanned for updates. Once it is known that such a pipeline run is no longer going to be used, formally resolving it so that it completes is recommended.

k) Job/Pipeline cleanup

Using the Data Management feature to automatically clean up old work is important for keeping system queries running efficiently.
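For items e) and f) above, a review can start from a simple list of schedule start times and typical run durations. The sketch below flags runs that overlap a peak-usage window. The function name and the 09:00-17:00 peak window used in the examples are hypothetical, the peak window is assumed not to cross midnight, and runs are assumed to last less than a day.

```python
from datetime import time


def overlaps_peak(start: time, duration_minutes: int,
                  peak_start: time, peak_end: time) -> bool:
    """True if a run starting at `start` overlaps the daily peak window.

    Assumptions: peak_start < peak_end on the same day, and the run lasts
    less than 24 hours. Runs that cross midnight are checked against the
    next day's peak window as well.
    """
    def to_minutes(t: time) -> int:
        return t.hour * 60 + t.minute

    run_start = to_minutes(start)
    run_end = run_start + duration_minutes
    p_start, p_end = to_minutes(peak_start), to_minutes(peak_end)
    # Offset 0 is today's peak window; offset 1440 is tomorrow's.
    return any(run_start < p_end + offset and p_start + offset < run_end
               for offset in (0, 1440))


if __name__ == "__main__":
    peak = (time(9, 0), time(17, 0))
    # A one-hour run at 08:30 spills into the peak window.
    print(overlaps_peak(time(8, 30), 60, *peak))
    # A two-hour cleanup run at 02:00 stays clear of it.
    print(overlaps_peak(time(2, 0), 120, *peak))
```

Runs that are flagged are candidates for a shifted start time or a lower priority, as described in e), and Data Management schedules can be checked the same way for f).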