Threading Macros in Scheme

October 02, 2020

I’d always heard about the power of Lisp macros, which let you extend the language by rewriting code on the fly. You call macros just like functions, but unlike functions, the arguments aren’t eagerly evaluated before being passed into the macro body. The macro takes them as unevaluated data (numbers, strings, lists), rewrites and evaluates them as it sees fit, and hands them over to the interpreter.

As an example, Clojure (a popular modern Lisp) provides “thread-first” (->) and “thread-last” (->>) macros, which pipe a value through a sequence of functions in either the first or last position among arguments. For example:

(->> '(1 2 3 4)
     (filter even?) ; incremental value: (2 4)
     (map double)) ; final value: (4 8)

Note how filter and map, which both take two arguments, are provided with only one. The macro rewrites the expression while incrementally evaluating it, “threading” the results through in the last position (e.g., (filter even? '(1 2 3 4))). (You can see why ->> can’t be an ordinary function: if we eagerly evaluated either filter or map this way, it would break with a missing argument!) This achieves basically the same code style as compose/pipe and currying, but in a way that pushes complexity away from the individual steps (which no longer need to be curried) and up to the pipeline itself.

Though I’m familiar with the high-level notion of macros, I only recently peeked under their hood. I’ve been working through the venerable Structure and Intepretation of Computer Problems, which uses the minimal Lisp dialect Scheme as a teaching language. In problem 2.42, we solve for all the ways to place eight (or simply n) queens on a chessboard such that none are in check. The problem comes in the context of a discussion of map/filter/fold, and in particular the clarity introduced by piping data through these list operations.

Yet, despite this clarity, I still find it super hard to eyeball the solution and intuit what happens in what order:

(define (queens board-size)
    (define (queen-cols col-idx)
        (if (= col-idx 0)
            (list empty-queens)
                    (lambda (prev-queens)
                        (map (lambda (row-idx)
                                (adjoin-position row-idx col-idx prev-queens))
                             (enumerate-rows board-size)))
                    (queen-cols (- col-idx 1))))))
    (queen-cols board-size))

The tricky part isn’t the recursion—it’s this inner expression, in which the order of execution hops around both vertically and horizontally. Can you tell what runs in what order?

        (lambda (prev-queens)
            (map (lambda (row-idx)
                    (adjoin-position row-idx col-idx prev-queens))
                 (enumerate-rows board-size)))
        (queen-cols (- col-idx 1))))

This code could use a way to make the data flow more apparent. So, I took a detour from SICP to implement Clojure’s ->> in Scheme.

And it turns out it’s remarkably simple for something so powerful! define-syntax creates the macro, and syntax-rules uses pattern matching to define valid ways to invoke it: each pattern corresponds to a template that rewrites the expression. Macros can get quite a bit more involved than this, but it’s enough for our purposes.

(define-syntax ->>
    (syntax-rules ()
        ; Each syntax rule is a pair: a pattern to match on, and an expression to
        ; rewrite to. This first list is the pattern: both args (`->>`, `value`)
        ; are just variables, which we could name however we want. Since the first
        ; part of the pattern matches the macro name, we bind it to `->>`, but `_`
        ; is another popular choice.
        ; So, this pattern matches, e.g., (->> 3)...
        ((->> value)
            ; ...and rewrites to the expression 3, our expected result

        ; Here we match a more complex pattern, where the second var is a list--
        ; that is, a function call. Note the ellipses, which gather multiple items
        ; into `args` and `rest`.
        ; This would match, e.g., (->> 3 (+ 1 2) (/ 3))...
        ((->> value (fn args ...) rest ...)
            ; ...and rewrite it to (->> (+ 1 2 3) (/ 3)), or (->> 6 (/ 3)).
            ; Note that we just accomplished the "thread-last" part!
            ; Obviously we're not at a final value yet: (->> 6 (/ 3)) will invoke
            ; the macro again. So, we "cycle through," evaluating incrementally,
            ; until we reach a final value.
            (->> (fn args ... value) rest ...))

        ; Finally, here we match a named function in the second position:
        ; e.g.. (->> 3 inc (/ 3))
        ((->> value fn rest ...)
            ; ...and pipe it through just like we did above: (->> 4 (/ 3))
            (->> (fn value) rest ...))))

…And that’s all we need!

Here’s the same complex expression from above, with ->> helping make the data flow clear: we recursively call queen-cols to get solutions so far, call flatmap to add possible new solutions, and finally call filter to keep only the solutions that work.

(->> (queen-cols (- col-idx 1))
     (flatmap (lambda (prev-queens)
                (->> (enumerate-rows board-size)
                     (map (lambda (row-idx)
                            (adjoin-position row-idx col-idx prev-queens))))))
     (filter safe?))

Lisp enthusiasts place a lot of emphasis on the language(s) being homoiconic (that is, “self-representing”): written in the exact same syntax that represents raw data. Lists, for example, represent both literal lists (data) and function invocations (language). The idea sounds a little academic, but this example shows its power. Programs already manipulate strings and lists in the course of regular data processing; with macros, they can manipulate strings and lists to write other programs. Lisp helped popularize higher-order functions: functions that take and build other functions. Macros make Lisp a higher-order language: one that parses and writes itself.