Description
This error means that a job failed on three separate agents and caused each of those agents to become unresponsive. When this occurs, Electric Make fails the job and aborts the build. The build continues only if you have set -k, -i, or another option that allows the build to continue after an error.
This ElectricAccelerator behavior is designed to prevent a single command from bringing down an entire cluster.
Reasons
This message could be displayed if any of the following occurs:
- someone restarts agent machines without shutting down their agents first
- an agent/host loses network connectivity
- any other non-agent connectivity issue
For possible hints about why you received this message, examine the Messages tab in the Cluster Manager UI for the affected agents, looking at the entries logged while that build was running.
Example
This is an example of the job annotation for a job that triggered EC1073. The node attribute of each "timing" element identifies an agent involved in the failure.
<job id="J02532fa8" thread="f8932b70" type="rule" name="all" file="Makefile" line="2">
  ...
  <output>ERROR EC1073: Job caused multiple agents to fail. </output>
  <timing invoked="1.122699" completed="1.161900" node="someagenthost-1"/>
  <timing invoked="1.164346" completed="1.241695" node="someagenthost-2"/>
  <timing invoked="1.244078" completed="1.321730" node="someagenthost-3"/>
  <failed code="1"/>
</job>
There may be clues in earlier XML elements of this form:
<message ...>Lost connection to agent someagenthost-...</message>
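For a large annotation file, the search described above can be scripted. The following is a minimal sketch in Python, not an ElectricAccelerator tool. It assumes the annotation is well-formed XML that fits in memory and that job, output, timing, and message elements look like the excerpts shown here. The function names, the "build" wrapper element, and the second job in the sample data are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET


def agents_for_ec1073(annotation_xml):
    """Map the id of each job whose output mentions EC1073 to its agent nodes."""
    root = ET.fromstring(annotation_xml)
    failures = {}
    for job in root.iter("job"):
        # Join all text found under every <output> element of this job.
        output_text = "".join(
            "".join(out.itertext()) for out in job.iter("output")
        )
        if "EC1073" in output_text:
            # Each <timing> element records one attempt; "node" names the agent.
            failures[job.get("id")] = [t.get("node") for t in job.iter("timing")]
    return failures


def lost_connection_messages(annotation_xml):
    """Return the text of every <message> that reports a lost agent connection."""
    root = ET.fromstring(annotation_xml)
    texts = ("".join(m.itertext()).strip() for m in root.iter("message"))
    return [t for t in texts if t.startswith("Lost connection to agent")]


# Sample data modeled on the excerpts in this article. The <build> wrapper and
# the job "J00000001" are assumptions made up for this sketch.
SAMPLE = """<build>
  <message>Lost connection to agent someagenthost-...</message>
  <job id="J00000001" type="rule" name="ok" file="Makefile" line="1">
    <timing invoked="0.500000" completed="0.600000" node="someagenthost-1"/>
  </job>
  <job id="J02532fa8" thread="f8932b70" type="rule" name="all" file="Makefile" line="2">
    <output>ERROR EC1073: Job caused multiple agents to fail. </output>
    <timing invoked="1.122699" completed="1.161900" node="someagenthost-1"/>
    <timing invoked="1.164346" completed="1.241695" node="someagenthost-2"/>
    <timing invoked="1.244078" completed="1.321730" node="someagenthost-3"/>
    <failed code="1"/>
  </job>
</build>"""

if __name__ == "__main__":
    print(agents_for_ec1073(SAMPLE))
    print(lost_connection_messages(SAMPLE))
```

The agent names returned for the failed job are the ones to look up on the Messages tab of the Cluster Manager UI. To run the sketch against a real annotation, read the file contents into a string and pass that string to both functions.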