Electric Make (emake) appears to hang, and messages like the following appear repeatedly in the system agent log on the Cluster Manager (CM):
Emake reported error: Removing failed agent winb1-1 from pool: Emake-agent handshake failed: couldn't connect to server at: 2006/09/14 13:36:2
This situation is caused by routing problems between the emake machine and the agent machine. The CM can communicate with the agent, and the agent responds with an OK status, so the CM allocates the agent machine to the build. However, emake is unable to open a socket to the agent. Eventually its connection attempt times out and it reports an error to the CM, which removes the agent from the pool. Because the CM still gets a good status response from the agent, it adds the agent back to the pool and assigns it to the build again. The cycle then repeats, which is why the messages recur and emake appears to hang.
First, check that the emake machine can ping the agent in question. Typically the ping fails, indicating a basic routing problem. Often the agents are on a private switch that is visible only to the CM: the CM has two network interfaces, one on the private network and one on the larger LAN. In this configuration, only the CM can communicate with the agents. The only solution is to ensure that the agent network is bridged to the rest of the LAN.
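The ping check above can be scripted so it is easy to repeat for several agents from the emake machine. The following is a minimal sketch, not part of the product: the helper names `ping_command` and `can_ping` are hypothetical, and it simply shells out to the operating system's own `ping` utility. Treat the exit status as a rough signal only, since some platforms report success even when the reply is an "unreachable" notice from an intermediate router.

```python
import platform
import subprocess


def ping_command(host, count=3, system=None):
    """Build the ping command line for the given (or current) platform."""
    system = system or platform.system()
    # Windows ping takes -n for the number of echo requests;
    # Linux and macOS take -c.
    count_flag = "-n" if system == "Windows" else "-c"
    return ["ping", count_flag, str(count), host]


def can_ping(host, count=3):
    """Return True if the system ping utility exits with status 0."""
    try:
        result = subprocess.run(
            ping_command(host, count),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # The ping utility is missing or could not be started.
        return False
    return result.returncode == 0


# Example use, run on the emake machine (agent name taken from the
# log message shown earlier):
#     print("winb1-1", "reachable" if can_ping("winb1-1") else "NOT reachable")
```

If this reports the agent as unreachable from the emake machine while the CM still shows the agent as OK, that matches the routing problem described above.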
A similar situation can occur if the CM and agents are behind a firewall. In this case, the emake machine can contact the CM because port 80 is allowed, but all other ports are denied. Again, the only solution is to allow emake unrestricted access to the agent ports. The list of ports to open can be obtained from the agent details pages in the CM: the "Port:" field shows the port on each agent that emake must be able to reach.
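Because a firewall can allow ping while still blocking the agent ports, it can help to test the TCP connection itself from the emake machine. The sketch below is an illustration under stated assumptions: `can_connect` and `check_agents` are hypothetical helper names, and the port number in the usage comment is a placeholder that must be replaced with the value from the "Port:" field on each agent's details page.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, unreachable networks,
        # and host names that do not resolve.
        return False


def check_agents(agents, timeout=5.0):
    """Map each (host, port) pair to whether it accepted a connection."""
    return {
        (host, port): can_connect(host, port, timeout)
        for host, port in agents
    }


# Example use, run on the emake machine. The port is a placeholder:
# substitute the "Port:" value shown for each agent in the CM.
#     for (host, port), ok in check_agents([("winb1-1", 12345)]).items():
#         print(host, port, "open" if ok else "BLOCKED or unreachable")
```

A result of False for an agent that the CM reports as OK suggests that the path from the emake machine to that agent port is being blocked or is not routed.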