EC2 agent unable to connect to Controller with host key verification error

Article ID:360052957212

Issue

EC2 agents fail to connect to the controller. The following error appears when the controller tries to connect to an EC2 instance:

The instance EC2 (aws-xxx-xxx-xxxx) - docker-2020-09-18 (i-052d99921a8f6fc42) has a blank console. Maybe the console is yet not available. If enough time has passed, consider changing the key verification strategy or the AMI used by one printing out the host key in the instance console
Failed to connect via ssh: There was a problem while connecting to X.X.X.X:22
Jun 19, 2020 10:54:07 AM hudson.plugins.ec2.EC2Cloud
INFO: The instance console is blank. Cannot check the key. The connection to EC2 (Jenkins) - Default Slave (i-052d99921a8f6fc42) is not allowed
Jun 19, 2020 10:54:07 AM hudson.plugins.ec2.EC2Cloud
INFO: Failed to connect via ssh: There was a problem while connecting to X.X.X.X:22
Jun 19, 2020 10:54:07 AM hudson.plugins.ec2.EC2Cloud

Resolution

The error above is related to host key verification, which was introduced in version 1.50.3 of the Amazon EC2 plugin. When you set up a template for a Unix instance (Type AMI field), you can select a strategy that guarantees the instance you are connecting to is the expected one. Four options are available in the Host Key Verification Strategy field, under the Advanced… configuration of every configured AMI:

  • Check New Hard: Checks the key presented by the instance against the instance console and stores it to check subsequent connections. If the key is not printed on the console, the connection is not trusted. This is the default behavior for new AMIs.

  • Check New Soft: Checks the key against the instance console and stores it to check subsequent connections. If the key is not printed on the console, the connection is trusted anyway. This is the default behavior for existing AMIs (when upgrading from a previous plugin version). This protects against future attacks but cannot guarantee the instance is the right one if a man-in-the-middle attack has already taken place.

  • Accept New: Accepts the key on the first connection and stores it to check subsequent connections. Unlike Check New Soft, this strategy does not try to check the key against the console.

  • Off: Does not check the host key on any connection.

If the Connect by SSH Process field is checked, the equivalent host key verification options are:

  • check-new-hard = yes

  • check-new-soft = accept-new

  • accept-new = accept-new

  • off = no
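
The mapping above can be written down as a small lookup table. This is only an illustrative sketch, not the plugin's code: the dictionary and the helper name `ssh_option_for` are hypothetical, and the values are copied from the list above.

```python
# Strategy name -> OpenSSH StrictHostKeyChecking value, as listed above.
# The dictionary and helper are hypothetical names used for illustration.
STRICT_HOST_KEY_CHECKING = {
    "check-new-hard": "yes",
    "check-new-soft": "accept-new",
    "accept-new": "accept-new",
    "off": "no",
}


def ssh_option_for(strategy: str) -> str:
    """Return the value passed to `ssh -o` for a given strategy name."""
    value = STRICT_HOST_KEY_CHECKING[strategy.lower()]
    return f"StrictHostKeyChecking={value}"


print(ssh_option_for("check-new-soft"))
```

Note that Check New Soft and Accept New map to the same OpenSSH value, so with the external SSH process both depend on an OpenSSH version that understands accept-new.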

This error usually occurs when the Check New Hard or Check New Soft option is used and any one of the following requirements is not met:

  • The IAM credentials configured for the plugin must allow the "ec2:GetConsoleOutput" permission.

  • The AMI must print its host key to the instance console. This is common behaviour; for example, the Amazon Linux 2 AMI prints it. Consult the AMI documentation to confirm.

  • The launch timeout must be long enough to allow the plugin to check the instance console. With this strategy, the plugin waits for the console to become available, which can take a few minutes. The Launch Timeout in seconds field should hold a value that allows for this, for example 600 (10 minutes). Some EC2 instance types, such as m5.metal, require a longer timeout of approximately 25 to 30 minutes. Setting it lower could lead to unpredictable issues during provisioning. By default there is no timeout, so the default is safe.
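
To see which of these requirements is failing, it can help to look at the console text yourself (for example with `aws ec2 get-console-output`, which needs the same permission) and check whether the host key is in it. The sketch below is not the plugin's implementation. It assumes the console prints each key as an `algorithm base64-key` line, and the function name, return values, and sample key are made up for illustration.

```python
def check_console(console_text, algorithm, key_b64):
    """Classify a console dump as 'blank', 'match', or 'missing'.

    Assumption: host keys appear on the console as lines that start with
    '<algorithm> <base64 key>'. Other AMIs may print them differently.
    """
    if not console_text or not console_text.strip():
        # Corresponds to the "instance console is blank" message in the log.
        return "blank"
    for line in console_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == algorithm and parts[1] == key_b64:
            return "match"
    return "missing"


# Made-up sample data, for illustration only.
sample = "boot messages...\nssh-ed25519 AAAAexampleKeyData root@host\n"
print(check_console(sample, "ssh-ed25519", "AAAAexampleKeyData"))
print(check_console("", "ssh-ed25519", "AAAAexampleKeyData"))
```

A "blank" result points at the launch timeout or the IAM permission, while "missing" suggests the AMI does not print its host key.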

The long-term fix is to ensure your environment is set up to meet the requirements above.

Workaround

  • Use the Accept New verification strategy, where the first key is accepted blindly without being verified against the instance console output. This should only be used as a temporary fix. Eventually you should set up your environment to use Check New Hard, which is considered the safest option.

  • In some environments you may see the error below when using Accept New, even though your environment is configured according to the recommendations above.

[03/23/21 14:36:00] Launching agent
$ ssh -o StrictHostKeyChecking=accept-new -i /tmp/ec2_4261174220549794768.pem ec2-user@X.X.X.X -p 22 java -jar /tmp/remoting.jar -workDir /home/ec2-user
command-line line 0: unsupported option "accept-new".

This normally happens when you are using the Connect by SSH Process option in the Amazon EC2 plugin AMI configuration and the version of OpenSSH running on your controller host is older than 7.6. The accept-new value of the StrictHostKeyChecking option was introduced in OpenSSH 7.6. You can verify the OpenSSH version by running the ssh -V command. The workaround is either to upgrade OpenSSH on your controller to 7.6 or later, or to uncheck Connect by SSH Process and use the in-process Java SSH client, which does not depend on an external SSH process.
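
The version check can be scripted. The following is a minimal sketch, assuming the banner has the usual `OpenSSH_<major>.<minor>` form. The function name `supports_accept_new` is hypothetical.

```python
import re


def supports_accept_new(version_banner: str) -> bool:
    """Return True if the banner reports OpenSSH 7.6 or later."""
    match = re.search(r"OpenSSH_(\d+)\.(\d+)", version_banner)
    if match is None:
        raise ValueError(f"unrecognized ssh version banner: {version_banner!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Tuple comparison handles cases such as 8.0 being newer than 7.6.
    return (major, minor) >= (7, 6)


print(supports_accept_new("OpenSSH_7.4p1, OpenSSL 1.0.2k-fips  26 Jan 2017"))
```

On a controller host, the banner can be captured with `subprocess.run(["ssh", "-V"], capture_output=True, text=True).stderr`, since `ssh -V` writes its version to standard error rather than standard output.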

Limitations with external ssh process

When connecting to EC2 agents using the external SSH process (Connect by SSH Process field is checked), the known hosts file is populated using the IP addresses of the EC2 instances. You are likely to face host verification issues if new instances reuse IP addresses that are already in the known hosts file. You will see messages like the one below when this occurs.

[03/29/21 15:26:34] Launching agent
$ ssh -o StrictHostKeyChecking=accept-new -i /tmp/ec2_4365876265509662772.pem buildagent@X.X.X.X -p 22 java -jar /tmp/remoting.jar -workDir /home/buildagent/jenkins
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the ECDSA key sent by the remote host is
SHA256:Sd9gv2jpwAkVMXTcIYxEyKnEL8nhd5yWmCoVz8OqfYA.
Please contact your system administrator.
Add correct host key in /home/jenkins/.ssh/known_hosts to get rid of this message.
Offending ECDSA key in /home/jenkins/.ssh/known_hosts:375
ECDSA host key for 172.26.2.172 has changed and you have requested strict checking.
Host key verification failed.
ERROR: Unable to launch the agent for EC2 (jenkins-test-m1-hsv.adtran.com) - lightweight (i-08bbe0ff83e8e603f)
java.io.EOFException: unexpected stream termination

If you encounter this situation, the best workaround is to use the in-process Java SSH client (Connect by SSH Process field is unchecked). It uses the instance ID instead of the IP address to populate the list of known hosts.
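
If you cannot switch away from the external SSH process right away, one option that this article does not otherwise cover is to remove the stale entry with the standard `ssh-keygen -R` command. The sketch below only builds that command from the log text and does not run it. The function name `removal_command` is hypothetical, and the regular expressions assume the message wording shown in the log above.

```python
import re


def removal_command(log_text):
    """Build an `ssh-keygen -R` command from a host key mismatch log.

    Returns None if the expected lines are not present in the log.
    """
    file_match = re.search(r"Offending \w+ key in (\S+):(\d+)", log_text)
    host_match = re.search(r"host key for (\S+) has changed", log_text)
    if file_match is None or host_match is None:
        return None
    known_hosts_path = file_match.group(1)
    host = host_match.group(1)
    # -R removes all keys for the host; -f selects the known hosts file.
    return ["ssh-keygen", "-R", host, "-f", known_hosts_path]


log = (
    "Offending ECDSA key in /home/jenkins/.ssh/known_hosts:375\n"
    "ECDSA host key for 172.26.2.172 has changed and you have requested strict checking.\n"
)
print(removal_command(log))
```

Only remove an entry after confirming that the IP address was really reassigned to a new instance, because the same warning is what a genuine man-in-the-middle attack would produce.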