Bril: A Compiler Intermediate Representation for Learning

Bril, the Big Red Intermediate Language, is a programming language for learning about compilers. It’s the intermediate representation we use in CS 6120, a PhD-level compilers course. Bril’s design tenets include:

  • Bril is an instruction-oriented language, like most good IRs.
  • The core is minimal and ruthlessly regular. Extensions make it interesting.
  • The tooling is language agnostic. Bril programs are just JSON.
  • Bril is typed.

See the language reference for the complete language specification and the tool documentation for details on the “batteries included” monorepo.

Language Reference

This section describes the Bril language exhaustively. The Syntax chapter is about the structure of the language---the abstract syntax. Then, there are several chapters about the actual operations that fit into this general structure. The core language has the basics, including integers, Booleans, and control flow. Other extensions add optional functionality on top of this simple core language. You are cordially invented to add your own similar extensions to the Bril language!

Syntax Reference

Bril programs are JSON objects that directly represent abstract syntax. This chapter exhaustively describes the structure of that syntax. All objects are JSON values of one sort or another.

Program

{ "functions": [<Function>, ...] }

A Program is the top-level object. It has one key:

  • functions, a list of Function objects.

There should be at least one function with the name main. When execution starts, this function will be invoked.

Type

"<string>"
{"<string>": <Type>}

There are two kinds of types: primitive types, whose syntax is just a string, and parameterized types, which wrap a smaller type. The semantics chapters list the particular types that are available---for example, core Bril defines the basic primitive types int and bool and the memory extension defines a parameterized pointer type.

Function

{
  "name": "<string>",
  "args": [{"name": "<string>", "type": <Type>}, ...]?,
  "type": <Type>?,
  "instrs": [<Instruction>, ...]
}

A Function object represents a (first-order) procedure consisting of a sequence of instructions. There are four fields:

  • name, a string.
  • args, optionally, a list of arguments, which consist of a name and a type. Missing args is the same as an empty list.
  • Optionally, type, a Type object: the function’s return type, if any.
  • instrs, a list of Label and Instruction objects.

When a function runs, it creates an activation record and transfers control to the first instruction in the sequence.

Label

{ "label": "<string>" }

A Label marks a position in an instruction sequence as a destination for control transfers. It only has one key:

  • label, a string. This is the name that jump and branch instructions will use to transfer control to this position and proceed to execute the following instruction.

Instruction

{ "op": "<string>", ... }

An Instruction represents a unit of computational work. Every instruction must have this field:

  • op, a string: the opcode that determines what the instruction does. (See the Operations section, below.)

Depending on the opcode, the instruction might also have:

  • dest, a string: the name of the variable where the operation’s result is stored.
  • type, a Type object: the type of the destination variable.
  • args, a list of strings: the arguments to the operation. These are names of variables.
  • funcs, a list of strings: any names of functions referenced by the instruction.
  • labels, a list of strings: any label names referenced by the instruction.

There are three kinds of instructions: constants, value operations, and effect operations.

Constant

{ "op": "const", "dest": "<string>", "type": <Type>,
  "value": <literal> }

A Constant is an instruction that produces a literal value. Its op field must be the string "const". It has the dest and type fields described above, and also:

  • value, the literal value for the constant. This is either a JSON number or a JSON Boolean value. The type field must match—i.e., it must be “int” or “bool”, respectively.

Value Operation

{ "op": "<string>", "dest": "<string>", "type": <Type>,
  "args": ["<string>", ...]?,
  "funcs": ["<string>", ...]?,
  "labels": ["<string>", ...]? }

A Value Operation is an instruction that takes arguments, does some computation, and produces a value. Like a Constant, it has the dest and type fields described above, and also any of these three optional fields:

  • args, a list of strings. These are variable names defined elsewhere in the same function.
  • funcs, a list of strings. The names of any functions that this instruction references. For example, core Bril‘s call instruction takes one function name.
  • labels, a list of strings. The names of any labels within the current function that the instruction references. For example, core Bril‘s jump and branch instructions have target labels.

In all three cases, these keys may be missing and the semantics are identical to mapping to an empty list.

Effect Operation

{ "op": "<string>",
  "args": ["<string>", ...]?,
  "funcs": ["<string>", ...]?,
  "labels": ["<string>", ...]? }

An Effect Operation is like a Value Operation but it does not produce a value. It also has the optional args, funcs, and labels fields.

Well Formedness

Not every syntactically complete Bril program is well formed. Here is an incomplete list of rules that well-formed Bril programs must follow:

  • Instructions may name variables as arguments when they are defined elsewhere in the function. Similarly, they may only refer to labels that exist within the same function, and they can only refer to functions defined somewhere in the same file.
  • Dynamically speaking, during execution, instructions may refer only to variables that have already been defined earlier in execution. (This is a dynamic property, not a static property.)
  • Every variable may have only a single type within a function. It is illegal to have two assignments to the same variable with different types, even if the function’s logic guarantees that it is impossible to execute both instructions in a single call.
  • Many operations have constraints on the types of arguments they can take; well-formed programs always provide the right type of value.

Tools do not need to handle ill-formed Bril programs. As someone working with Bril, you never need to check for well-formedness and can do anything when fed with ill-formed code, including silently working just fine, producing ill-formed output, or crashing and burning.

To help check for well-formedness, the reference interpreter has many dynamic checks and the type inference tool can check types statically.

Core Language

This section describes the core Bril instructions. Any self-respecting Bril tool must support all of these operations; other extensions are more optional.

Types

Core Bril defines two primitive types:

  • int: 64-bit, two’s complement, signed integers.
  • bool: True or false.

Arithmetic

These instructions are the obvious binary integer arithmetic operations. They all take two arguments, which must be names of variables of type int, and produce a result of type int:

  • add: x + y.
  • mul: x × y.
  • sub: x - y.
  • div: x ÷ y.

Comparison

These instructions compare integers. They all take two arguments of type int and produce a result of type bool:

  • eq: Equal.
  • lt: Less than.
  • gt: Greater than.
  • le: Less than or equal to.
  • ge: Greater than or equal to.

Logic

These are the basic Boolean logic operators. They take arguments of type bool and produce a result of type bool:

  • not (1 argument)
  • and (2 arguments)
  • or (2 arguments)

Control

These are the control flow operations. Unlike the value operations above, they take labels and functions in addition to normal arguments.

  • jmp: Unconditional jump. One label: the label to jump to.
  • br: Conditional branch. One argument: a variable of type bool. Two labels: a true label and a false label. Transfer control to one of the two labels depending on the value of the variable.
  • call: Function invocation. Takes the name of the function to call and, as its arguments, the function parameters. The call instruction can be a Value Operation or an Effect Operation, depending on whether the function returns a value.
  • ret: Function return. Stop executing the current activation record and return to the parent (or exit the program if this is the top-level main activation record). It has one optional argument: the return value for the function.

Only call may (optionally) produce a result; the rest appear only as Effect Operations.

Miscellaneous

  • id: A type-insensitive identity. Takes one argument, which is a variable of any type, and produces the same value (which must have the same type, obvi).
  • print: Output values to the console (with a newline). Takes any number of arguments of any type and does not produce a result.
  • nop: Do nothing. Takes no arguments and produces no result.

Static Single Assignment (SSA) Form

This language extension lets you represent Bril programs in static single assignment (SSA) form. As in the standard definition, an SSA-form Bril program contains only one assignment per variable, globally—that is, variables within a function cannot be reassigned. This extension adds ϕ-nodes to the language.

Operations

There is one new instruction:

  • phi: Takes n labels and n arguments, for any n. Copies the value of the ith argument, where i is the index of the second-most-recently-executed label. (It is an error to use a phi instruction when two labels have not yet executed, or when the instruction does not contain an entry for the second-most-recently-executed label.)

Intuitively, a phi instruction takes its value according to the current basic block’s predecessor.

Examples

In the text format, you can write phi instructions like this:

x: int = phi a .here b .there;

The text format doesn’t care how you interleave arguments and labels, so this is equivalent to (but more readable than) phi a b .here .there. The “second-most-recent label” rule means that the labels refer to predecessor basic blocks, if you imagine blocks being “named” by their labels.

Here’s a small example:

.top:
  a: int = const 5;
  br cond .here .there;
.here:
  b: int = const 7;
.there:
  c: int = phi a .top b .here;
  print c;

A phi instruction is sensitive to the incoming CFG edge that execution took to arrive at the current block. The phi instruction in this program, for example, gets its value from a if control came from the .top block and b if control came from the .here block.

The reference interpreter can supports programs in SSA form because it can faithfully execute the phi instruction.

Manually Managed Memory

While core Bril only has simple scalar stack values, the memory extension adds a manually managed heap of array-like allocations. You can create regions, like with malloc in C, and it is the program’s responsibility to delete them, like with free. Programs can manipulate pointers within these regions; a pointer indicates a particular offset within a particular allocated region.

You can read more about the memory extension from its creators, Drew Zagieboylo and Ryan Doenges.

Types

The memory extension adds a parameterized ptr type to Bril:

{"ptr": <Type>}

A pointer value represents a reference to a specific offset within a uniformly-typed region of values.

Operations

These are the operations that manipulate memory allocations:

  • alloc: Create a new memory region. One argument: the number of values to allocate (an integer). The result type is a pointer; the type of the instruction decides the type of the memory region to allocate. For example, this instruction allocates a region of integers:

    {
        "op": "alloc",
        "args": ["size"],
        "dest": "myptr",
        "type": {"ptr": "int"}
    }
    
  • free: Delete an allocation. One argument: a pointer produced by alloc. No return value. It is an error to access or free a region that has already been freed.

  • store: Write into a memory region. Two arguments: a pointer and a value. The pointer type must agree with the value type (e.g., if the second argument is an int, the first argument must be a ptr<int>). No return value.

  • load: Read from memory. One argument: a pointer. The return type is the pointed-to type for that pointer.

  • ptradd: Adjust the offset for a pointer, producing a new pointer to a different location in the same memory region. Two arguments: a pointer and an offset (an integer, which may be negative). The return type is the same as the original pointer type.

Floating Point

Bril has an extension for computing on floating-point numbers.

You can read more about the extension, which is originally by Dietrich Geisler and originally included two FP precision levels.

Types

The floating point extension adds one new base type:

"float"

Floating point numbers are 64-bit, double-precision IEEE 754 values. (There is no single-precision type.)

Operations

There are the standard arithmetic operations, which take two float values and produce a new float value:

  • fadd
  • fmul
  • fsub
  • fdiv

There are also comparison operators, which take two float values and produce a bool:

  • feq
  • flt
  • fle
  • fgt
  • fge

Speculative Execution

This extension lets Bril programs use a form of explicit speculative execution with rollback.

In general, speculation is when programs perform work that might not actually be necessary or even correct, under the assumption that it is likely to be right and useful. If this assumption turns out to be wrong, speculation typically needs some rollback mechanism to undo incorrect side effects and recover to a correct state.

In this Bril extension, programs can explicitly enter a speculative mode, where variable assignments are temporary. Then, they can either abort or commit those assignments, discarding them or making them permanent.

Operations

  • speculate: Enter a speculative execution context. No arguments.
  • commit: End the current speculative context, committing the current speculative state as the “real” state. No arguments.
  • guard: Check a condition and possibly abort the current speculative context. One argument, the Boolean condition, and one label, to which control is transferred on abort. If the condition is true, this is a no-op. If the condition is false, speculation aborts: the program state rolls back to the state at the corresponding speculate instruction, execution jumps to the specified label.

Speculation can be nested, in which case aborting or committing a child context returns execution to the parent context. Aborting speculation rolls back normal variable assignments, but it does not affect the memory extension‘s heap—any changes there remain. It is an error to commit or abort outside of speculation. It is not an error to perform side effects like print during speculation, but it is probably a bad idea.

Examples

Committing a speculative update makes it behave like normal:

v: int = const 4;
speculate;
v: int = const 2;
commit;
print v;

So this example prints 2. However, when a guard fails, it rolls back any modifications that happened since the last speculate instruction:

  b: bool = const false;

  v: int = const 4;
  speculate;
  v: int = const 2;
  guard b .failed;
  commit;

.failed:
  print v;

The guard here fails because b is false, then v gets restored to its pre-speculation value, and then control transfers to the .failed label. So this example prints 4. You can think of the code at .failed as the “recovery routine” that handles exceptional conditions.

Interpreter

The reference interpreter supports speculative execution. However, it does not support function calls during speculation, so you will get an error if you try to use a call or ret instruction while speculating.

Bril Tools

These sections describe tools for dealing with Bril programs.

Interpreter

brili is the reference interpreter for Bril. It is written in TypeScript. You can find brili in the bril-ts directory in the Bril repository.

The interpreter supports core Bril along with the memory, floating point, SSA, and speculation extensions.

Install

To set up the interpreter, you will need Node and Yarn. Go to the bril-ts directory and do this:

$ yarn
$ yarn build
$ yarn link

The last thing will install symlinks to the two utility programs, but they may not be in a standard location. To find where these tools were installed, run yarn global bin. You probably want to add this to your $PATH.

Run

The brili program takes a Bril program as a JSON file on standard input:

$ brili < my_program.json

It emits any print outputs to standard output. To provide inputs to the main function, you can write them as command-line arguments:

$ brili 37 5 < add.json
42

Profiling

The interpreter has a rudimentary profiling mode. Add a -p flag to print out a total number of dynamic instructions executed to stderr:

$ brili -p 37 5 < add.json
42
total_dyn_inst: 9

Bril Text Format

While Bril’s canonical representation is a JSON AST, humans don’t like to read and write JSON. To accommodate our human foibles, we also have a simple textual representation. There is a parser and pretty printer tool that can convert the text representation to and from JSON.

For example, this Bril program in JSON:

{
  "functions": [{
    "name": "main",
    "instrs": [
      { "op": "const", "type": "int", "dest": "v0", "value": 1 },
      { "op": "const", "type": "int", "dest": "v1", "value": 2 },
      { "op": "add", "type": "int", "dest": "v2", "args": ["v0", "v1"] },
      { "op": "print", "args": ["v2"] },
      { "op": "alloc", "type": { "ptr" : "int" }, "dest": "v3", "args": ["v0"] },
      { "op": "free", "args": ["v3"] },
    ]
  }]
}

Gets represented in text like this:

@main {
  v0: int = const 1;
  v1: int = const 2;
  v2: int = add v0 v1;
  print v2;
  v3: ptr<int> = alloc v0;
  free v3;
}

TypeScript-to-Bril Compiler

Bril comes with a compiler from a very small subset of TypeScript to Bril called ts2bril.

It is not supposed to make it easy to port existing JavaScript code to Bril; it is a convenient way to write larger, more interesting programs without manually fiddling with Bril directly. It also emits somewhat obviously inefficient code to keep the compiler simple; some obvious optimizations can go a long way.

Install

You can build and install the TypeScript compiler the same way as the interpreter, in the bril-ts directory:

$ yarn
$ yarn build
$ yarn link

Use

Compile a TypeScript program to Bril by giving a filename on the command line:

$ ts2bril mycode.ts

The compiler supports both integers (from core Bril) and floating point numbers. Perhaps somewhat surprisingly, plain JavaScript numbers and the TypeScript number type map to float in Bril. For integers, use JavaScript big integers whenever you need an integer, like this:

var x: bigint = 5n;
printInt(x);

function printInt(x: bigint) {
    console.log(x);
}

The n suffix on literals distinguishes integer literals, and the bigint type in TypeScript reflects them.

Fast Interpreter in Rust

The brilirs directory contains a fast Bril interpreter written in Rust. It is a drop-in replacement for the reference interpreter that prioritizes speed over completeness and hacakability. It implements core Bril and the SSA, memory, and floating point extensions.

Read more about the implementation, which is originally by Wil Thomason and Daniel Glus.

Install

To use brilirs you will need to install Rust and add the nightly channel with rustup toolchain install nightly. Use echo $PATH to check that $HOME/.cargo/bin is on your path.

In the brilirs directory, build the interpreter with:

cargo +nightly install --path .

Run a program by piping a JSON Bril program into it:

bril2json < myprogram.bril | brilirs

Similar to type-infer, brilirs can be used to typecheck and validate your Bril JSON program by passing the --check flag (similar to cargo --check).

Syntax Plugin for Text Editors

There is a Vim syntax highlighting plugin for Bril’s text format available in bril-vim. You can use it with a Vim plugin manager. For example, if you use vim-plug, you can add this to your .vimrc:

Plug 'sampsyo/bril', { 'for': 'bril', 'rtp': 'bril-vim' }

You can read more about the plugin, which is originally by Edwin Peguero.

Type Inference

Bril requires exhaustive type annotations on every instruction, which can quickly get tedious. The type-infer directory contains a simple global type inference tool that fills in missing type annotations. For example, it can turn this easier-to-write program:

@main(arg: int) {
  five = const 5;
  ten = const 10;
  res = add arg five;
  cond = le res ten;
  br cond .then .else;
.then:
  print res;
.else:
}

Into this actually executable program:

@main(arg: int) {
  five: int = const 5;
  ten: int = const 10;
  res: int = add arg five;
  cond: bool = le res ten;
  br cond .then .else;
.then:
  print res;
.else:
}

The tool is a simple Python program, infer.py, that takes JSON programs that are missing types and adds types to them. It is also useful even on fully-typed programs as a type checker to rule out common run-time errors. The included text format tools support missing types for both parsing and printing, so here’s a shell pipeline that adds types to your text-format Bril program:

cat myprog.bril | bril2json | python type-infer/infer.py | bril2txt

You can read more about the inference tool, which is originally by Christopher Roman.

Benchmarks

The bench directory in the Bril repository contains a fledgling suite of microbenchmarks that you can use to measure the impact of your optimizations. (Benchmarks are different from tests because they are meant to actually calculate something instead of just exercising a language feature.)

The current benchmarks are:

  • ackermann: Print the value of Ack(m, n), the two-argument Ackermann–Péter function.
  • check-primes: Check the first n natural numbers for primality, printing out a 1 if the number is prime and a 0 if it’s not.
  • collatz: Print the Collatz sequence starting at n. Note: it is not known whether this will terminate for all n.
  • digial-root: Computes the digital root of the input number.
  • eight-queens: Counts the number of solutions for n queens problem, a generalization of Eight queens puzzle.
  • euclid: Calculates the greatest common divisor between two large numbers using the Euclidean Algorithm with a helper function for the modulo operator.
  • fib: Calculate the nth Fibonacci number by allocating and filling an array of numbers up to that point.
  • fizz-buzz: The infamous programming test.
  • gcd: Calculate Greatest Common Divisor (GCD) of two input positive integer using Euclidean algorithm.
  • loopfact: Compute n! imperatively using a loop.
  • mat-mul: Multiplies two nxn matrices using the naive matrix multiplication algorithm. The matrices are randomly generated using a linear congruential generator.
  • newton: Calculate the square root of 99,999 using the newton method
  • orders: Compute the order ord(u) for each u in a cyclic group <Zn,+> of integers modulo n under the group operation + (modulo n). Set the second argument is_lcm to true if you would like to compute the orders using the lowest common multiple and otherwise the program will use the greatest common divisor.
  • perfect: Check if input argument is a perfect number. Returns output as Unix style return code.
  • pythagorean_triple: Prints all Pythagorean triples with the given c, if such triples exist. An intentionally very naive implementation.
  • quadratic: The quadratic formula, including a hand-rolled implementation of square root.
  • recfact: Compute n! using recursive function calls.
  • sieve: Print all prime numbers up to n using the Sieve of Eratosthenes.
  • sum-bit: Print the number of 1-bits in the binary representation of the input integer.
  • sum-divisors: Prints the positive integer divisors of the input integer, followed by the sum of the divisors.
  • sqrt: Implements the Newton–Raphson Method of approximating the square root of a number to arbitrary precision
  • sum-sq-diff: Output the difference between the sum of the squares of the first n natural numbers and the square of their sum.
  • binary-fmt: Print the binary format for the given positive integer.

Credit for several of these benchmarks goes to Alexa VanHattum and Gregory Yauney, who implemented them for their global value numbering project.

OCaml Library

The OCaml bril library, which lives in the bril-ocaml directory, provides an OCaml interface and parser for Bril’s JSON files.

Install

To build the library, you first need to install OCaml. Then, install the dependencies with opam install core yojson.

To install the bril-ocaml library:

git clone https://github.com/sampsyo/bril path/to/my/bril
opam pin add -k path bril path/to/brill/bril-ocaml
opam install bril

Thats it! Now, you can then include it in your Dune files as bril, like any other ocaml library!

Use

The interface for the library can be found in bril.mli—good starting points are from_string, from_file, and to_string. A small code example for the library lives in the count subdirectory.

If you wish to make changes to the bril OCaml library, simply hack on the git clone.

When you are done, simply reinstall the package with opam reinstall bril. Restart the build of your local project to pick up changes made to bril-ocaml.

For Development

ocamlformat is recommended for style consistency. The dune documentation on Automatic Formatting has information about using ocamlformat with dune.

Rust Library

This is a no-frills interface between Bril’s JSON and your Rust code. It supports the Bril core along with the SSA, memory, floating point, and speculative execution extensions.

Use

Include this by adding the following to your Cargo.toml:

[dependencies.bril-rs]
version = "0.1.0"
path = "../bril-rs"
features = ["ssa", "memory", "float", "speculate"]

Each of the extensions to Bril core is feature gated. To ignore an extension, remove its corresponding string from the features list.

There are two helper functions, load_program, which will read a valid Bril program from stdin, and output_program which writes your Bril program to stdout. Otherwise, this library can be treated like any other serde JSON representation.

Examples

There is currently a very trivial example that uses this interface to create a rust implementation of bril2txt. Run it with cargo run --example bril2txt.

Development

To maintain consistency and cleanliness, run:

cargo fmt
cargo clippy

Brench

Brench is a simple benchmark runner to help you measure the impact of optimizations. It can run the same set of benchmarks under multiple treatments, check that they still produce the correct answer, and report their performance under every condition.

Set Up

Brench is a Python tool. There is a brench/ subdirectory in the Bril repository. Get Flit and then type:

$ flit install --symlink --user

Configure

Write a configuration file using TOML. Start with something like this:

extract = 'total_dyn_inst: (\d+)'
benchmarks = '../benchmarks/*.bril'

[runs.baseline]
pipeline = [
    "bril2json",
    "brili -p {args}",
]

[runs.myopt]
pipeline = [
    "bril2json",
    "myopt",
    "brili -p {args}",
]

The global options are:

  • extract: A regular expression to extract the figure of merit from a given run of a given benchmark. The example above gets the simple profiling output from the Bril interpreter in -p mode.
  • benchmarks (optional): A shell glob matching the benchmark files to run. You can also specify the files on the command line (see below).
  • timeout (optional): The timeout of each benchmark run in seconds. Default of 5 seconds.

Then, define an map of runs, which are the different treatments you want to give to each benchmark. Each one needs a pipeline, which is a list of shell commands to run in a pipelined fashion on the benchmark file, which Brench will send to the first command’s standard input. The first run constitutes the “golden” output; subsequent runs will need to match this output.

Run

Just give Brench your config file and it will give you results as a CSV:

$ brench example.toml > results.csv

You can also specify a list of files after the configuration file to run a specified list of benchmarks, ignoring the pre-configured glob in the configuration file.

The command has only one command-line option:

  • --jobs or -j: The number of parallel jobs to run. Set to 1 to run everything sequentially. By default, Brench tries to guess an adequate number of threads to fill up your machine.

The output CSV has three columns: benchmark, run, and result. The latter is the value extracted from the run’s standard output and standard error using the extract regular expression or one of these three status indicators:

  • incorrect: The output did not match the “golden” output (from the first run).
  • timeout: Execution took too long.
  • missing: The extract regex did not match in the final pipeline stage’s standard output or standard error.