The following information should help guide you through setting up NFS storage for use with CloudBees CI. It is assumed that a storage admin team will be responsible for providing a storage solution and an NFS volume for you to use, and that this team can assist with configuring your clients (CloudBees CI servers) to mount the volume. Our recommendations are based on our experience configuring NFS for the best performance under the typical I/O workload of Jenkins.
1) Use SSD disks if possible
Your NFS volume should use SSD disks if possible: in our experience, SSDs deliver a 10-20x performance improvement over spindles/spinning disks for this workload.
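If you want to quantify the difference on your own storage, a quick synthetic test that approximates Jenkins' many-small-writes pattern can help. Below is a minimal sketch using the fio tool; the target directory, file size, job count, and run time are illustrative values you would adjust for your environment:
fio --name=jenkins-io-sim --directory=/mnt/nfs_jenkins_home \
    --rw=randwrite --bs=4k --size=64m --numjobs=4 \
    --runtime=60 --time_based --group_reporting
Compare the reported IOPS and completion latencies between the candidate volumes you are evaluating.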
2) Use NFS v4.1 or higher
CloudBees engineering has validated NFS v4.1, and v4.1 or higher is the recommended NFS version for all CloudBees CI installations.
NFS v4.0 is not supported: file storage vendors have reported to CloudBees customers that there are known performance issues with v4.0. For more details, see the CloudBees CI supported platforms documentation.
NFS v3 is known to be performant, but is considered insecure in most environments.
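Whichever version you settle on, it is worth confirming what the client actually negotiated, since a server can silently fall back to an older version. On Linux, nfsstat shows the options in effect for each NFS mount:
nfsstat -m
The Flags line of the output includes the negotiated version, for example vers=4.1.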
3) Configure NFS client mount point
An example /etc/fstab entry with our recommended mount options:
10.0.0.200:/mnt/jenkins_home /mnt/nfs_jenkins_home nfs _netdev,rw,bg,hard,intr,rsize=32768,wsize=32768,vers=4.1,proto=tcp,timeo=600,retrans=2,noatime,nodiratime,async 0 0
Note that the _netdev param is essential. It prevents the OS from trying to mount the volume before the network interfaces have a chance to negotiate a connection.
We recommend the rsize=32768,wsize=32768 read and write block sizes because Jenkins performs a high volume of small reads and writes (mainly for working with build log files); these block sizes should yield the best performance for that pattern.
The noatime,nodiratime,async settings are also important for best performance.
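Before committing the entry to /etc/fstab, you may want to mount the volume by hand with the same options to confirm the server accepts them. A sketch using the example server and paths from above (_netdev is omitted because it only affects boot-time mounting):
sudo mkdir -p /mnt/nfs_jenkins_home
sudo mount -t nfs -o rw,hard,rsize=32768,wsize=32768,vers=4.1,proto=tcp,timeo=600,retrans=2,noatime,nodiratime,async \
    10.0.0.200:/mnt/jenkins_home /mnt/nfs_jenkins_home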
4) Configure Jenkins to use the NFS mount point
You can configure Jenkins to use a different home directory. To do this, edit the service config file (the location depends on your OS and version of CloudBees CI - see this guide for details) and change the JENKINS_HOME variable:
JENKINS_HOME="/nfs/jenkinsHome"
Save the file and restart Jenkins.
service jenkins stop
service jenkins start
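On newer systemd-based installations, the service may be managed by a unit file rather than an init script. In that case, one way to achieve the same result (a sketch, assuming the package's unit is named jenkins) is a drop-in override:
sudo systemctl edit jenkins
In the editor that opens, add the following two lines, then save and run sudo systemctl restart jenkins:
[Service]
Environment="JENKINS_HOME=/nfs/jenkinsHome"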
5) General Troubleshooting
Check if the I/O operations are causing the Jenkins controller slowness
The top command on Unix reports the percentage of time the CPU spends waiting for I/O completion:
wa, IO-wait : time waiting for I/O completion
In the example below, we can see that the CPU was waiting for I/O completion only 0.3% of the time.
top - 11:12:06 up 1:11, 1 user, load average: 0.03, 0.02, 0.05
Tasks: 74 total, 2 running, 72 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.3 us, 0.3 sy, 0.0 ni, 99.3 id, 0.3 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem:  501732 total, 491288 used, 10444 free, 4364 buffers
KiB Swap: 0 total, 0 used, 0 free. 42332 cached Mem

  PID USER    PR NI    VIRT    RES  SHR S %CPU %MEM   TIME+ COMMAND
 1054 jenkins 20  0 4189920 332444 6960 S  0.7 66.3 1:01.26 java
 1712 vagrant 20  0  107700   1680  684 S  0.3  0.3 0:00.20 sshd
    1 root    20  0   33640   2280  792 S  0.0  0.5 0:00.74 init
    2 root    20  0       0      0    0 S  0.0  0.0 0:00.00 kthreadd
    3 root    20  0       0      0    0 S  0.0  0.0 0:00.05 ksoftirqd/0
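If you want to log this metric over time rather than watch it interactively, vmstat reports the same value in its wa column; for example, one sample per second for sixty seconds:
vmstat 1 60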
Determine if the disk Jenkins is using might be causing the performance issues
At this point, we can use nfsiostat to understand whether the mount we are using is causing the slowness. It's best to run it several times, a few seconds apart, to get a sense of performance over time, because results will vary with the amount of I/O activity. For example, nfsiostat 2 20 samples every two seconds for twenty iterations.
# nfsiostat

10.130.12.150:/data01 mounted on /data01:

   op/s         rpc bklog
   0.08            0.00

read:             ops/s           kB/s          kB/op        retrans    avg RTT (ms)    avg exe (ms)
                  0.052          6.436        124.154       0 (0.0%)           9.365           9.617
write:            ops/s           kB/s          kB/op        retrans    avg RTT (ms)    avg exe (ms)
                  0.001          0.214        199.536       0 (0.0%)           5.673          72.526
The main figures of interest are the avg RTT time (the duration from when the client's kernel sends the RPC request until it receives the reply) and the avg exe time (the duration from when the NFS client issues the RPC request to its kernel until the request completes; this includes the RTT time). It's normal for reads to be faster than writes, but you would not want to see exe times above 100 ms on a busy system.
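When escalating a suspected storage problem, a timestamped capture of these samples is usually more useful to your storage team than a single reading. A minimal sketch (the mount point and log file name are placeholders you would adjust):
( date; nfsiostat 2 20 /mnt/nfs_jenkins_home ) | tee nfsiostat-$(date +%Y%m%d-%H%M).log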
Conclusion
For further assistance with using NFS, including performance issues, we recommend working with your storage admin team, or contacting CloudBees Support.