Building with Electrify


Accelerator lets you use a large number of inexpensive machines to build a cluster of unlimited size. Electrify provides an alternate way to use this infrastructure by parallelizing the build process and distributing build steps across the cluster without having to use the built-in eMake tool. Electrify is installed as part of Accelerator.

Electrify can run an executable and monitor it as it spawns processes. Electrify intercepts these processes and sends them to the cluster for parallel execution.

How an Electrify Build Differs from an eMake Build

Electrify is a front end to the Accelerator cluster that lets you distribute work from a wide variety of processes in addition to the make-based processes that Accelerator traditionally manages.

Though all of the usual eMake command-line options are available, Electrify uses only a subset of them.

Local Versus Remote Execution

An important difference between an eMake build and an Electrify build is what part of the build activity occurs remotely versus locally. In an eMake build, effectively all build activity (except “#pragma runlocal” jobs) takes place on the cluster, where the EFS monitors file system access and propagates those changes made by one job to other jobs in the build.

In an Electrify build, more of the build activity takes place on the local system. At the very least, the build process itself (such as SCons) runs locally. File system modifications made by processes running locally are typically “invisible” to Electrify, and therefore invisible to processes running on the cluster, just as “#pragma runlocal” jobs might make changes that are invisible to eMake.

Monitoring File System Modifications for Virtualization

Electrify uses a program called electrifymon to detect the file system modifications made by commands invoked by the “electrified” build tool but not distributed to the cluster. electrifymon provides a means to update the virtual file system state in Electrify in response to the file system modifications made by the local processes.

If you know that a build will not run any processes locally that modify the file system, you need not use electrifymon when invoking Electrify. However, some build tools themselves will make changes to the file system (for example, when SCons employs its build-avoidance mechanism by copying a previously-built object instead of invoking the compiler), so the safest choice is to use electrifymon to start.

On Windows, in addition to file system monitoring, electrifymon provides a sophisticated way to intercept process invocations and determine which processes to distribute to the cluster. On Linux, process interception is handled by the explicit use of proxy commands.

Electrify Limitations

  • Electrify does not provide eMake’s dependency detection or correction features.

  • The information written into annotation is more limited with Electrify than what eMake provides.

    Electrify annotation provides information only on the commands executed on the cluster, including command lines, file usage, and raw command output. Electrify does not provide information about dependencies, job relationships, targets, or other logical build structure data.

Prerequisites for Using Electrify

On all platforms, the tool that you want to monitor, such as SCons, must provide parallel support. That is, it must be capable of accurate parallel execution on its own. Platform-specific prerequisites appear below.

Linux and Solaris Platforms

electrifymon must locate the electrifymon.so compiled library file so it can tell monitored programs to load the monitoring library that reports back to electrifymon. By default, electrifymon looks in the following locations:

Platform          32-bit                             64-bit
Linux             /opt/ecloud/i686_Linux/32/lib      /opt/ecloud/i686_Linux/64/lib
Solaris (SPARC)   /opt/ecloud/sun4u_SunOS/lib        /opt/ecloud/sun4u_SunOS/64/lib
Solaris (x86)     /opt/ecloud/i686_SunOS.5.10/lib    /opt/ecloud/i686_SunOS.5.10/64/lib

You can override these locations by using the ELECTRIFYMON32DIR and ELECTRIFYMON64DIR environment variables.
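The override behavior can be sketched in Python as follows. This illustrates only the lookup order described above, not how electrifymon is actually implemented. The helper name monitor_library_dir is hypothetical, and only the Linux defaults are shown.

```python
import os

# Default Linux locations, as listed above.
DEFAULT_DIRS = {
    32: "/opt/ecloud/i686_Linux/32/lib",
    64: "/opt/ecloud/i686_Linux/64/lib",
}

# Environment variables that override the defaults.
OVERRIDE_VARS = {32: "ELECTRIFYMON32DIR", 64: "ELECTRIFYMON64DIR"}


def monitor_library_dir(bits, environ=None):
    """Return the directory expected to hold electrifymon.so.

    Hypothetical helper: the override variable wins when it is set and
    non-empty; otherwise the default location is used.
    """
    environ = os.environ if environ is None else environ
    return environ.get(OVERRIDE_VARS[bits]) or DEFAULT_DIRS[bits]
```

For example, monitor_library_dir(64) returns the default 64-bit directory unless ELECTRIFYMON64DIR is set in the environment.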

Windows Platforms

  • Ensure that cl.exe, link.exe, and so on, are those of Microsoft Visual Studio.

    The wrapper application might have changed them to its version.

  • On 64-bit Windows platforms, if you did not install Accelerator in its default location, you must specify the complete location of electrifymon.exe (including the executable name) in the EMAKE_ELECTRIFYMON environment variable.

    For example, if your custom install location is C:\programs\ECloud, set the variable with the following command (with no spaces around the equals sign):

    set EMAKE_ELECTRIFYMON=C:\programs\ECloud\i686_win32\64\bin\electrifymon.exe

CloudBees has evaluated the following build tools for use with Electrify.

  • SCons—CloudBees recommends using SCons with Electrify.

    Using SCons has no known limitations.

  • Ninja—See the Accelerating Ninja with ElectricAccelerator blog post for details.

  • Ant—CloudBees does not currently recommend using Ant with Electrify.

Selecting Commands to Parallelize in Electrify

Most build acceleration occurs through parallelizing a few specific commands, such as compiling and linking. Selecting many different additional commands to parallelize might not provide much more acceleration.

If a file is created or modified by one or more parallelized commands, then you should parallelize all commands that use that file.

Running a Build Using Electrify

The following sections describe how to run builds using Electrify.

Changing the Electrify Monitoring Mode

You can change the mode that Electrify uses for monitoring. To do so, add the --emake-electrify=mode eMake option to your Electrify command-line options, where mode is either preload (interception through LD_PRELOAD) or trace (interception through the ptrace system call).

Electrify Command Syntax

electrify [<options>] <other tools’ command line>

List of Electrify Command-Line Options

Command-line options are listed in alphabetical order.

Each entry gives the option, its corresponding environment variable (if it has one), and a description.

--electrify-allow-regexp=<perl-regular-expression>

    (Linux only) Sends to the cluster all processes whose full command line matches the regular expression.

--electrify-deny-regexp=<perl-regular-expression>

    (Linux only) Executes locally all processes whose command line matches the regular expression. You can use this after the --electrify-allow-regexp option to more precisely select the processes that are sent to the cluster.

--electrify-localfile=<x>

    (Windows only) Integrates local file access (create, rename, and so on) by locally running tools with the remote file system. You can set x to nt or y.

    Set nt to monitor undocumented low-level Nt file access functions. This monitors the following functions: NtCreateFile, NtDeleteFile, NtClose, NtWriteFile, and NtSetInformationFile. Though this includes only five functions, their functionality is rich, so this selection covers nearly all scenarios where the local file system changes.

    Set y to monitor documented Win32 APIs for file access. This monitors the following Win32 APIs: CreateFileW, CreateDirectoryA, CreateDirectoryExA, CreateDirectoryExW, CreateDirectoryW, DeleteFileA, DeleteFileW, MoveFileA, MoveFileExA, MoveFileExW, MoveFileW, RemoveDirectoryA, RemoveDirectoryW, SetFileAttributesA, and SetFileAttributesW. Though there are many functions, they cover less than the Nt functions, particularly because some tools such as Cygwin cp.exe call NtCreateFile and related functions directly. In general, use the y flag for testing only.

--electrify-log=<fullpath>

    Environment variable: ELECTRIFY_LOG

    fullpath is the path of the log file. The log records all process creation and interception information.

--electrify-not-intercept=<x;y;>

    Environment variable: ELECTRIFY_NOT_INTERCEPT

    x and y are commands that you do not want monitored. The monitoring process does not inject a DLL into them or their child processes, so they are not distributed.

--electrify-not-remote=<x;y;>

    x and y are commands that are not distributed to the cluster. Use the command’s full name, such as cl.exe, link.exe, gcc.exe, without the path. The name is case insensitive. In a Cygwin environment, you can use ' : ' (colon) instead of ' ; ' (semicolon).

    --electrify-not-remote and --electrify-remote are mutually exclusive.

    If you use --electrify-not-remote, every command that is not in this list is executed remotely by default. This is generally undesirable, so add each command that must run locally to this list.

--electrify-remote=<x;y>

    Environment variable: ELECTRIFY_REMOTE

    x and y are commands that are distributed to the cluster. Use the command’s full name, such as cl.exe, link.exe, gcc.exe, without the path. The name is case insensitive. In a Cygwin environment, you can use ' : ' (colon) instead of ' ; ' (semicolon).

    Limited to 2048 characters.
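The name-matching rules for the remote and not-remote lists (full command name, no path, case insensitive, semicolon or colon separators) can be illustrated with a short Python sketch. This is an assumption-laden illustration, not Electrify's implementation. The function names are hypothetical, and accepting both separators at once is a simplification.

```python
import ntpath
import re


def parse_tool_list(value):
    """Split a list such as "cl.exe;link.exe;" into lowercase names.

    Simplification: ';' and ':' are both accepted as separators here.
    """
    return {name.lower() for name in re.split(r"[;:]", value) if name}


def is_listed(executable_path, tool_list_value):
    """Return True if the executable's name (without its path) is in the list.

    ntpath.basename strips both backslash and forward-slash directories.
    """
    name = ntpath.basename(executable_path).lower()
    return name in parse_tool_list(tool_list_value)
```

Under these assumptions, is_listed(r"C:\VS\bin\CL.EXE", "cl.exe;link.exe;") is true because the path is ignored and the comparison is case insensitive.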

Example: Running Electrify on Linux (SCons with JobCache)

On Linux, Electrify supports JobCache for SCons builds. In most cases, just add --emake-jobcache=all to your Electrify command-line options to enable it. This example demonstrates how to build using SCons with JobCache and then check that the job cache works. For more information about JobCache, see Job Caching.

SConstruct File

Following is the SConstruct file for this example:

import os

env = Environment()
_env = env['ENV']
for k in [
        'EMAKE_JOB_SERVER',
        'ELECTRIFYPROXY',
        'ELECTRIFYMON32DIR',
        'ELECTRIFYMON64DIR',
        'EMAKE_ELECTRIFYMON',
        'ECLOUD_MONITOR',
        'LD_PRELOAD',
        'LD_LIBRARY_PATH',
        ]:
    try:
        # Copy the variable into the SCons execution environment if it is set.
        _env[k] = os.environ[k]
    except KeyError:
        pass
env.Program('helloworldthankyou', ['sentence.c', 'hello.c', 'world.c', 'thank.c', 'you.c'])

Source Files

Each entry below shows one C source file and, where applicable, its corresponding header file:

Source file sentence.c (no corresponding header file):

#include <stdio.h>
#include "hello.h"
#include "thank.h"
#include "world.h"
#include "you.h"
int main(int argc, char *argv[]){
   hello();
   world();
   thank();
   you();
}

Source file hello.c:

#include <stdio.h>
#include "hello.h"
void hello() {
   printf("Hello");
}

Header file hello.h:

void hello();

Source file world.c:

#include <stdio.h>
#include "world.h"
void world() {
   printf(" world");
}

Header file world.h:

void world();

Source file thank.c:

#include <stdio.h>
#include "thank.h"
void thank() {
   printf(" and thank");
}

Header file thank.h:

void thank();

Source file you.c:

#include <stdio.h>
#include "you.h"
void you() {
   printf(" you!\n");
}

Header file you.h:

void you();

Building the Source Code

To build the source code for the first build, enter:

/opt/ecloud/i686_Linux/64/bin/electrify --emake-cm=rhodes  --electrify-remote=gcc --emake-jobcache=all -- scons -j 4
Starting build: 178
scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Building targets ...
cc -o hello.o -c hello.c
cc -o sentence.o -c sentence.c
cc -o world.o -c world.c
cc -o thank.o -c thank.c
cc -o you.o -c you.c
cc -o helloworldthankyou sentence.o hello.o world.o thank.o you.o
scons: done building targets.
Finished build: 178   Duration: 0:04 (m:s)   Cluster availability: 100%

Checking That There Were No Cache Hits After the First Build

The first run did not have a job cache to use and therefore had to populate the cache. To check for hits, enter:

grep jobcache.hit\" emake.xml
<metric name="jobcache.hit">0</metric>

where emake.xml is the annotation file that is produced by every Electrify build. The grep command output should show no cache hits.

Cleaning the Tree

This removes any files that would be built by the next build:

scons -c
scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Cleaning targets ...
Removed hello.o
Removed sentence.o
Removed world.o
Removed thank.o
Removed you.o
Removed helloworldthankyou
scons: done cleaning targets.

Rebuilding the Source Code

For the second run, enter:

/opt/ecloud/i686_Linux/64/bin/electrify --emake-cm=rhodes  --electrify-remote=gcc --emake-jobcache=all -- scons -j 4
Starting build: 179
scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Building targets ...
cc -o hello.o -c hello.c
cc -o sentence.o -c sentence.c
cc -o world.o -c world.c
cc -o thank.o -c thank.c
cc -o you.o -c you.c
cc -o helloworldthankyou sentence.o hello.o world.o thank.o you.o
scons: done building targets.
Finished build: 179   Duration: 0:01 (m:s)   Cluster availability: 100%

Checking for Cache Hits for a Subsequent Build

The first run populated the job cache for subsequent runs. To check for hits for the rebuild, enter:

grep jobcache.hit\" emake.xml
<metric name="jobcache.hit">5</metric>

The grep command output should show five cache hits.
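If you prefer to check the metric from a script instead of using grep, the following Python sketch extracts the value. It assumes only that the annotation file contains a metric line in the form shown in the grep output. The function name jobcache_hits is hypothetical, and the rest of the annotation file's structure is not assumed.

```python
import re

# Matches the metric line exactly as it appears in the grep output.
METRIC_RE = re.compile(r'<metric name="jobcache\.hit">(\d+)</metric>')


def jobcache_hits(annotation_text):
    """Return the jobcache.hit count, or None if the metric is absent."""
    match = METRIC_RE.search(annotation_text)
    return int(match.group(1)) if match else None
```

To use it, read the annotation file and pass its contents, for example jobcache_hits(open("emake.xml").read()).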

Example: Running Electrify on Linux (GNU Make C++)

To run a GNU Make C++ build with Electrify, enter a command such as:

electrify --emake-cm=<CM name> --electrify-remote=g++:gcc:<any other tools used in the build> make -j 4 -f makefile

If you intend to use Electrify with GNU Make, CloudBees recommends using eMake instead. eMake provides superior performance and correctness as well as full annotation information.

Using Whole Command-Line Matching and efpredict (Linux Only)

Whole Command-Line Matching

On Linux platforms, Electrify lets you use a process’s entire command line to determine whether to send it to the cluster for execution.

Electrify Options for Whole Command-Line Matching

You use the following additional Electrify command-line options:

--electrify-allow-regexp=<perl-regular-expression>
--electrify-deny-regexp=<perl-regular-expression>

These options let you specify which sub-processes to execute in the cluster. You use a Perl-style regular expression that is matched against both the process name and all of its arguments, such as the name of the script or JAR file that is being executed. When Electrify detects that a process is started, it constructs the command line for that process by joining all of the components of its "argv" array together with spaces and then applying the list of "allow" and "deny" regular expressions in the sequence that they were supplied on the command line.

Whole Command-Line Matching Example

For example, for the following three processes:

java -jar runstep.jar -x86
java -jar otherjar.jar
java -jar runstep.jar -armv7

the following options send the first process to the cluster but not the second or third:

--electrify-regexp-allow="[^ ]+java\s.*runstep.jar.*"  --electrify-regexp-deny=".*\-armv7.*"

A process is initially considered to be for local execution only, but successive regexp-allow options can change this state if any of them match. Any regexp-deny in the sequence will, if it matches, short-circuit the decision immediately and cause that process to be executed locally.
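The decision sequence described above can be sketched in Python. This is a model of the documented behavior, not Electrify's code, and it rests on several assumptions: Python's re module stands in for Perl regular expressions, each expression must match the whole command line, and the intercepted command line begins with the full path of the executable (here /usr/bin/java, a hypothetical path). The function name decide and the rule tuples are also hypothetical.

```python
import re


def decide(argv, rules):
    """Return "remote" or "local" for one intercepted process.

    rules is a list of ("allow" | "deny", pattern) pairs in the order
    they were supplied on the command line.
    """
    command_line = " ".join(argv)  # join the argv components with spaces
    remote = False  # every process starts out as local-only
    for kind, pattern in rules:
        if re.fullmatch(pattern, command_line) is None:
            continue
        if kind == "deny":
            return "local"  # a matching deny short-circuits the decision
        remote = True  # a matching allow marks the process for the cluster
    return "remote" if remote else "local"


# The two expressions from the example above.
RULES = [
    ("allow", r"[^ ]+java\s.*runstep.jar.*"),
    ("deny", r".*\-armv7.*"),
]
```

With these rules, the -x86 process matches only the allow expression and goes to the cluster, the otherjar.jar process matches nothing and stays local, and the -armv7 process matches the deny expression and is forced local.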

efpredict

On Linux platforms, Electrify lets you use efpredict to verify that the expressions you entered actually select the correct processes. Without it, you must perform a full build and then examine the annotation file to see if the correct decisions were made, and repeat this process each time there is a mistake, which could be time consuming. Instead, you can test settings with efpredict. You provide the same options as you would for Electrify and then enter command lines into efpredict's standard input to see whether it selects them for local or cluster execution. One easy way to do this is to pipe an old build log into efpredict and view its output to see which processes would be executed remotely.

One regexp can match many commands. For example, to send both the gcc and ld commands to the cluster, you can use:

--electrify-regexp-allow='[^ ]*((gcc)|(ld))(\s.*)?'

efpredict for Processes Executed by a Shell

If a process is executed by a shell, variables are expanded, quotes are removed, and white space between tokens is replaced with single spaces before Electrify matches the process. This means that the text of a process invocation in a shell script or makefile might not be the exact text that Electrify sees when it intercepts the invocation of that process.

For example, in a script you might see:

'gcc "$SOURCE/myfile.c" -o "$OUTPUT/myfile.o" -c'

but when Electrify intercepts this and tries to reconstruct the command line, it will see:

"gcc src/myfile.c -o out/myfile.o -c".

Regular expressions must be written to match what Electrify will be able to see.
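To preview roughly what text a regular expression will be matched against, you can approximate the shell's processing in Python. The sketch below is only an approximation: shlex and string.Template stand in for the real shell's quote removal and variable expansion, the function name command_line_seen is hypothetical, and the variable values src and out are taken from the example above.

```python
import shlex
from string import Template


def command_line_seen(script_text, variables):
    """Approximate the command line that results from a script line.

    Removes quotes, expands $NAME variables, and joins the resulting
    tokens with single spaces.
    """
    tokens = shlex.split(script_text)  # tokenize and strip quotes
    expanded = [Template(token).substitute(variables) for token in tokens]
    return " ".join(expanded)


line = 'gcc   "$SOURCE/myfile.c" -o "$OUTPUT/myfile.o"  -c'
seen = command_line_seen(line, {"SOURCE": "src", "OUTPUT": "out"})
```

Here seen holds the normalized text gcc src/myfile.c -o out/myfile.o -c, which is the form a regular expression needs to match.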

efpredict Example

For example:

cat oldlog | efpredict --electrify-regexp-allow="[^ ]+java\s.*runstep.jar.*"  --electrify-regexp-deny=".*\-armv7.*"

gives this output:

remote_allow: java -jar runstep.jar -x86
remote_deny: java -jar otherjar.jar
remote_deny: java -jar runstep.jar -armv7