Letting templates do what you mean

Hi everybody, today I'd like to promote a minor, but important improvement in the 'expression template compiler' for the new JIT backend. This is a tool designed to make it easy to develop expression templates, which are themselves a way to make it easy to generate the 'expression tree' intermediate representation used by the new JIT backend. This is important because MoarVM instructions operate on a perl-like level of abstraction - single instructions can perform operations such as 'convert object to string', 'find first matching character in string' or 'access the last element of an array'. Such operations require rather more instructions to represent as machine code.

This level of abstraction is rather convenient for the rakudo compiler, which doesn't have to consider low-level details when it processes your perl6 code. But it is not very convenient for the JIT compiler which does. The 'expression' intermediate representation is designed to be much closer to what hardware can support directly. Basic operations include loading from and storing to memory, memory address computation, integer arithmetic, (conditional) branching, and function calls. At some point in the future, floating point operations will also be added. But because of this difference in abstraction level, a single MoarVM instruction will often map to many expression tree nodes. So what is needed is an efficient way to convert between the two representations, and that is what expression templates are supposed to do.

Expression templates are very much like the expression tree structure itself, in that both are represented as arrays of integers. Some of the elements represent instructions, some are constants, and some are references (indexes into the same array), forming a directed acyclic graph (not a tree). The only difference is that the template is associated with a set of instructions that indicate how it should be linked into the tree. (Instruction operands, i.e. the data that each instruction operates on, are prepared and linked by the template application process as well).

Surprisingly, arrays of integers aren't a very user-friendly way to write instruction templates, and so the template compiler was born. It takes as input a text file with expression templates defined as symbolic expressions, best known from the LISP world, and outputs a header file that contains the templates, ready for use by the JIT compiler. Note that the word 'template' has become a bit overloaded, referring to the textual input of the template compiler as well as to the binary input to the JIT compiler. That's okay, I guess, since they're really two representations of the same thing. The following table shows how template text, binary, and expression tree relate to each other:

Text 'Binary' Tree

(template: unless_i
  (when
    (zr $0)
    (branch (label $1))
  ))

template: {
  MVM_JIT_ZR,
  0,
  MVM_JIT_LABEL,
  1,
  MVM_JIT_BRANCH,
  2,
  MVM_JIT_WHEN,
  0,
  4,
},
info: ".f.f.l.ll",
len: 9,
root: 6

I hope it isn't too hard to see how one maps to the other. The unless_i instruction executes a branch if its integer argument is zero, specified by a constant as its second argument. All symbols (like when, label and zr) have been replaced by uppercase prefixed constants (MVM_JIT_WHEN), and all nesting has been replaced by references (indexes) into the template array. The 'info' string specifies how the template is to be linked into the tree. Instruction operands are indicated by an 'f', and internal links by an 'l'. In the tree representation the operands have been linked into the tree by the JIT; they form the LOAD and CONST nodes and everything below them.

Anyway, my improvement concerns a more complex form of template, such as the following example, an instruction to load an object value from the instance field of an object:

(template: sp_p6oget_o
  (let: (($val (load (add (^p6obody $1) $2) ptr_sz)))
    (if (nz $val) $val (^vmnull))))

This template contains a let: expression, which declares the $val variable. This value can be used in the subsequent expression by its name. Without such declarations the result of a computation could only have one reference, its immediate syntactic parent. (Or in other words, without let:, every template can only construct a tree). That is very inconvenient in case a result should be checked for null-ness, as in this case. (vmnull is a macro for the global 'null object' in MoarVM. The null object represents NULL wherever an object is needed, but isn't actually NULL, as that would mean it couldn't be dereferenced; it saves the interpreter from checking if a pointer to an object is NULL everywhere it is accessed).

The let: construct has another purpose: it ensures the ordering of operations. Although most operations can be ordered in whatever way suits the compiler, some do not, most notably function calls. (Function calls may have numerous unpredictable side effects, after all). All statements declared in the 'let declaration body' are compiled to run before any statements in the 'expression body'. This enables the programmer to ensure that a value is not needlessly computed twice, and more importantly, it ensures that a value that is used in multiple branches of a conditional statement is defined in both of them. For instance:


(let (($foo (...)))
     (if (...)
         (load $foo)
         $foo))

This pseudo-snippet of template code would dereference $foo if some condition is met (e.g. $foo is not NULL) and returns $foo directly otherwise. Without let to order the computation of $foo prior to the blocks of if, the first (conditional) child of if would be the first reference to $foo. That would mean that the code to compute $foo is only compiled in the first conditional block, which would not be executed whenever the if condition was not true, meaning that $foo would be undefined in the alternative conditional block. This would mean chaos. So in fact let does order expressions. All is good.

Except... I haven't told you how this ordering works, which is where my change comes in. Prior to commit 7fb1b10 the let expression would insert a hint to the JIT compiler to add the declared expressions as tree roots. The 'tree roots' are where the compiler starts converting the expression tree (graph) to a linear sequence of byte code. Hence the declaring expressions are compiled prior to the dependent expressions. But this has, of course, one big disadvantage, which is that the set of roots is global for the tree. Every declaration, no matter how deep into the tree, was to be compiled prior to the head of the tree. As a result, the following template code would not at all do what you want:


(let ($foo (...))
     (if (nz $foo)
         (let (($bar (load $foo))) # dereference $foo !
              (... $bar))
        ...)


The declaration of $bar would cause $foo to be dereferenced prior to checking whether it is non-null, causing a runtime failure. Chaos is back. Well, that's what I've changed. Fortunately, we have another ordering mechanism at our disposal, namely DO lists. These are nodes with a variable number of children that are also promised to be compiled in order. After the patch linked above, the compiler now transforms let expressions into the equivalent DO expressions. Because DO expressions can be nested safely, $bar is not computed prior to the null-check of $foo, as the programmer intended. I had originally intended to implement analysis to automatically order the expressions with regard to the conditionals, but I found that this was more complicated to implement and more surprising to the programmer. I think that in this case, relying on the programmer is the right thing.

One thing that I found interesting is that this reduces the number of mechanisms in the compiler. The 'root-hint' was no longer useful, and subsequently removed. At the same time, all but the last child of a DO list must be void expressions, i.e. yield no value, because DO can only return the value of its last child. Since all expressions in a let declaration must yield some value - otherwise they would be useless - they required a new operation type: discard. Thus with a new node type (extension of data range) we can remove a class of behavior.

After I had implemented this, I've started working on adding basic block analysis. That is a subject for a later post, though. Until next time!

Reacties

Populaire posts van deze blog

Reverse Linear Scan Allocation is probably a good idea

Retrospective of the MoarVM JIT

Something about IR optimization