Productive Rust: Implementing Traits with Macros

  • Friends don't let friends write boilerplate trait impls
  • If you find yourself needing to write the same code twice for different types, generate it instead
  • You don't even need fancy "procedural" macros; good old macro_rules! will do.

It's never a bad idea to take a stroll through the source code for Rust's standard library. There's a lot to see, including high-performance data structures, meticulously-designed system interfaces, and rock-solid concurrency primitives. Personally, I've learned a lot just from studying (and using) the elegant APIs provided by the Result and Option types.

But, for a Rust developer, the standard library serves another vital purpose: it is chock full of clever ideas for how to manage various ergonomics issues you'll encounter initially when writing Rust. Indeed, it is a particularly valuable source of such techniques because, given the language's young age, the solution to every problem isn't exactly plastered all over Stack Overflow quite yet.

It was during just such a stroll that one day — I can still remember it distinctly, even though it happened long ago — I "discovered" a technique that freed me from the chains of tedious boilerplate forever.

I had come across something like this — can you guess what it does?

impl u32 {
    uint_impl! { u32, u32, 32, 4294967295, "", "", 8, "0x10000b3", "0xb301", "0x12345678",
    "0x78563412", "0x1e6a2c48", "[0x78, 0x56, 0x34, 0x12]", "[0x12, 0x34, 0x56, 0x78]", "", "" }
}

If you guessed "implements checked_add, checked_div, checked_div_euclid, checked_mul, checked_neg, checked_next_power_of_two, and 61 other methods for the primitive type u32," then congratulations!

uint_impl! is a 1,889 line macro in core::num that not only implements most of u32's functionality, but also that of u8, u16, u64, u128 and usize.

Macros are used heavily in the standard library to generate the same code for different types.

Now, depending on your point of view, the possibility of generating a bunch of boring boilerplate code with a clever macro might be laughably obvious. But to me, someone who'd never used a language with macros before, it was a revelation.

And, once I learned how to do it myself, the tedium of implementing traits largely became a distant memory.

Back to Basics: What are Macros?

Macros are a special mini-language within Rust that allows you to generate Rust code programmatically.

println! is a macro, which is why it can have a variable number of arguments, unlike a normal Rust function:

println!();
println!("{}", 1);
println!("{} {}", 1, 2);
println!("{} {} {}", 1, 2, 3);

Another builtin, vec!, creates a new Vec from a list of items. This code:

let xs: Vec<i32> = vec![1, 2, 3];

... is functionally equivalent to:

let xs: Vec<i32> = {
    let mut out = Vec::with_capacity(3);
    out.push(1);
    out.push(2);
    out.push(3);
    out
};

Fairly early during the compilation process, macros are "expanded" into normal Rust code. The generated code is then combined with the surrounding (non-macro) Rust code, type-checked, optimized, and linked into the final program.
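To make the expansion step concrete, here is a minimal sketch of a vec!-like macro. The name my_vec! and the helper function build are made up for illustration; this is a simplified stand-in, not the standard library's actual definition of vec!.

```rust
// `my_vec!` is a hypothetical, simplified stand-in for `vec!`.
macro_rules! my_vec {
    // Match zero or more comma-separated expressions,
    // with an optional trailing comma.
    ( $( $item:expr ),* $(,)? ) => {{
        let mut out = Vec::new();
        // This statement is emitted once per matched `$item`.
        $( out.push($item); )*
        out
    }};
}

// Hypothetical helper that uses the macro like a normal expression.
fn build() -> Vec<i32> {
    my_vec![1, 2, 3]
}
```

By the time the type checker sees build, the invocation has already been replaced by a block containing three push calls, much like the hand-written version above.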

Macros are powerful tools that ought to be treated with respect. But unlike C and C++ preprocessor macros, which are stunted in capability (functioning by simple string substitution), and dangerously easy to misuse (e.g. no distinct syntax to help tell a macro apart from a normal function call), Rust macros have several safeguards that limit the damage you (or your favorite colleague) can inflict.

That includes obvious syntax — a_macro_call! invocations are easily distinguished by the exclamation point — and "partially hygienic" scoping rules, which limit the surrounding context they "capture."
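Here is a small sketch of what the hygiene rules buy you. The macro double! and the function hygiene_demo are hypothetical names used only for this example. The hygiene is "partial" because it covers local variables like the one below, but not items such as functions or types.

```rust
// Hypothetical macro that evaluates its argument once and doubles it.
macro_rules! double {
    ($e:expr) => {{
        // This `tmp` belongs to the macro's own syntax context.
        let tmp = $e;
        tmp + tmp
    }};
}

fn hygiene_demo() -> i32 {
    let tmp = 10;
    // `$e` is `tmp + 1`, which still refers to the caller's `tmp` (10),
    // not to the binding introduced inside the macro body.
    double!(tmp + 1)
}
```

With C-style textual substitution, the two variables named tmp would collide. Here the call evaluates to 22, because each tmp is resolved in the context where it was written.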

Declarative Macros vs. Procedural Macros

Macros defined using macro_rules! are called "declarative" macros, or sometimes "macros by example." This article is about declarative macros. There is another type of macros called procedural macros. If you've ever used serde, it uses a procedural macro to allow you to derive Serialize and Deserialize. Procedural macros are declared in special proc-macro crates that are only used for defining procedural macros. Unlike macro_rules! declarative macros, procedural macros are written in normal Rust code, using normal Rust types to operate on the source code.

Procedural macros are much more powerful than declarative macros, but they also have a steeper learning curve, require more effort to declare and use, and are still an evolving part of the language. For quickly working around boilerplate, macro_rules! is still a valuable tool.

Simple Example: debug_display!

This is a situation I've come across a number of times:

I had derived Debug on a type, but now wanted to pass it to a generic function that required it to implement Display. But for my case, Debug was good enough; it was only for me to see. What I wanted was to just "delegate" Display to the derived Debug implementation without too much fuss.

And now I can:

use std::fmt::Display;

#[derive(Debug)]
struct Point {
    pub x: f64,
    pub y: f64,
}

macro_rules! debug_display {
    ($t:ident) => {
        impl std::fmt::Display for $t {
            fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
                write!(f, "{:?}", self)
            }
        }
    }
}

debug_display!(Point); // poof! `Point` now implements `Display`
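To see the generated impl in action, here is a minimal sketch. The describe function is a hypothetical stand-in for whatever generic function demands Display; the macro is repeated so the example compiles on its own.

```rust
use std::fmt::Display;

#[derive(Debug)]
struct Point {
    x: f64,
    y: f64,
}

// Same macro as above, repeated so this sketch is self-contained.
macro_rules! debug_display {
    ($t:ident) => {
        impl std::fmt::Display for $t {
            fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
                write!(f, "{:?}", self)
            }
        }
    };
}

debug_display!(Point);

// Hypothetical generic function that only accepts `Display` types.
fn describe<T: Display>(value: &T) -> String {
    format!("value = {}", value)
}
```

Calling describe(&Point { x: 1.0, y: 2.0 }) now compiles, and the output is simply the derived Debug representation.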

Philosophical Objections

Now, before we go any further: a lot of people feel quite strongly that macros are a bad idea altogether. Just this week on r/rust, user u/ragnese expressed representative objections to the "magic" of macros:

I don't care for macros in general. I treat macros the same as dependencies: only if necessary or if it's going to save me a lot of time. I hate magic. I already know Rust. Why would I invoke a macro that could be doing anything with some obscure, ad hoc, DSL that its author created?

ragnese

Alexis Beingessner writes that macros are "fragile" and create debugging difficulties:

The compiler can't evaluate the body of a macro matches its signature, and that the macro is being invoked correctly for its signature. It just has to expand the macro out to some code and then check that code. This leads to the standard problem with dynamic programming: late binding of errors. With macros, we can get the spiritual equivalent of "undefined is not a function" in the compiler.

Alexis Beingessner

Macros, especially when mixed with Rust's new async syntax, can occasionally explode into "spectacularly huge" compiler errors — take a look at this doozy.

Related to debugging, macros cause problems for IDEs, if you're into that sort of thing. The author of IntelliJ's IDEA plugin for Rust describes macros as one of several "IDE-hostile" parts of the language's syntax. They are a "nightmare" for IDEs, responds another person on the thread.

Also, macros are downright ugly, says Brenden Matthews:

Rust macros feel like a left turn compared to the rest of the language. To be fair, I haven’t been able to grok them yet, and yet they feel out of place like some strange bolt-on appendage, one which only came about after the language was designed, inspired by Perl. I will eventually take some time to understand them properly in the future, but right now I want to avoid them. Like the plague.

Brenden Matthews

All of these are reasonable opinions. Some of them I identify with — strongly, even. I'm certainly no friend to magic. In my book, the fact that Rust is significantly more readable than any other language I've coded in is the very best thing it has going for it.

The first point I'd like to propose, however, is that macros can be an exceedingly efficient tool — not as a replacement for Rust's rich type system — but in service of it. By all means, define the public API of your crate with a trait. To me, macros are mostly a tool to help you use traits effectively. Traits with less typing, if you will. (Rust's fantastic readability can involve a lot of typing).

The second point is maintainability. Consider the uint_impl! from the standard library: would it be better if there were duplicate implementations of checked_add for each of u8, u16, u32, u64 and u128? Code duplication is a liability — it increases the surface area for possible bugs. Changing code that's repeated in many places is tedious and error-prone.

Finally, while I'll admit to occasionally going overboard with macros in the past (check this out if you want something that'll really blow your hair back), what I'm suggesting here is a simple, understandable pattern that provides a significant amount of efficiency in practice: don't duplicate, generate.

For code that would otherwise be duplicated for multiple types, write it as a macro that takes the type as a parameter, and use it to generate the code you would have duplicated.
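As a minimal sketch of the pattern (in the spirit of uint_impl!, but far smaller): the trait BitWidth and the macro impl_bit_width! below are hypothetical names invented for this example, not anything from the standard library.

```rust
// Hypothetical trait used only for illustration.
trait BitWidth {
    fn bit_width() -> u32;
}

macro_rules! impl_bit_width {
    // Accept one or more comma-separated types.
    ( $( $t:ty ),+ ) => {
        $(
            // One `impl` block is generated per type.
            impl BitWidth for $t {
                fn bit_width() -> u32 {
                    // `BITS` is an associated constant on the primitive integers.
                    <$t>::BITS
                }
            }
        )+
    };
}

impl_bit_width!(u8, u16, u32, u64, u128);
```

The method body is written once, and a fix to it applies to all five impls at the same time.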

Learning Macros: the 'Little Book'

[Image: the Little Book of Rust Macros]

Don't let its 'Little' title mislead you: boilerplate code everywhere quivers in fear at the mighty black book!

As I learned how to escape the tyranny of boilerplate, no resource was more valuable to me than the "Little Book of Rust Macros" by Daniel Keep.

Although the book has not been updated in several years, it was an incredible resource when published. In my opinion, it still contains some of the best treatments of the subject available. I've returned to it literally dozens of times.

Particularly good chapters include:

Scoping Subtleties

With macros, type-checking happens after expansion (similar to how C++ templates work). This is a key difference from Rust's generics, which are type-checked at their definition, before they are instantiated for any concrete type.

Consider the macro below: the code it expands to would quite obviously run afoul of the borrow checker, but it compiles without incident, as simply declaring the macro (without invoking it) expands to nothing:

macro_rules! ah_ah_ah_didnt_say_the_magic_word {
    ($x:ident) => {
        let x_mut_1 = &mut $x;
        let x_mut_2 = &mut $x;
        *x_mut_1 += 1;
    }
}

The minute you call the macro (i.e. ah_ah_ah_didnt_say_the_magic_word!(x)), the generated code triggers this error message:

error[E0499]: cannot borrow `x` as mutable more than once at a time
  --> src/main.rs:8:27
   |
7  |             let x_mut_1 = &mut $x;
   |                           ------- first mutable borrow occurs here
8  |             let x_mut_2 = &mut $x;
   |                           ^^^^^^^ second mutable borrow occurs here
9  |             *x_mut_1 += 1;
   |             ------------- first borrow later used here
...
16 |     ah_ah_ah_didnt_say_the_magic_word!(x);
   |     -------------------------------------- in this macro invocation

Rust generics, on the other hand, are eagerly type-checked.

For example, this generic min function won't compile without adding a PartialOrd bound on T:

fn min<T>(a: T, b: T) -> T {
    if a < b { a } else { b }
}

// rustc: "binary operation `<` cannot be applied to type `T`"

But the equivalent macro will compile:

macro_rules! min {
    ($a:expr, $b:expr) => {
        if $a < $b { $a } else { $b }
    }
}
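For a side-by-side comparison, here is a sketch with both versions made to compile. The names min_generic and min_macro! are chosen here only to avoid clashing with the snippets above.

```rust
// Generic version: the `PartialOrd` bound is checked once, at the definition.
fn min_generic<T: PartialOrd>(a: T, b: T) -> T {
    if a < b { a } else { b }
}

// Macro version: nothing is checked until an invocation is expanded,
// and then only for the concrete expressions that were passed in.
macro_rules! min_macro {
    ($a:expr, $b:expr) => {
        if $a < $b { $a } else { $b }
    };
}

// Hypothetical usage: both forms work for any comparable operands.
fn demo() -> (i32, f64) {
    (min_generic(3, 7), min_macro!(2.5, 1.5))
}
```

One difference worth keeping in mind: the macro pastes each argument expression into the expansion twice, so an argument with side effects may be evaluated more than once, whereas the function evaluates each argument exactly once.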

The code generated by a macro will still be type-checked in the context where it is expanded, and macro invocations are also checked against the syntax and typing rules of the macro mini-language. In this example, the first call is a correct invocation of add_one!, the second runs afoul of the post-expansion type checking, and the third violates the macro_rules! "type" checks (which are more like "syntax checks", as you are matching against categories of Rust syntax):

macro_rules! add_one {
    ($x:ident) => {
        $x + 1i32
    }
}
let a: i32 = 42;
let b: std::collections::HashMap<String, u32> = Default::default();

add_one!(a);     // Ok: expands to `a + 1i32`, where `a` is an i32
add_one!(b);     // (type) Err: expands to `b + 1i32`, where `b` is a HashMap<String, u32>, a type error
add_one!(42i32); // (macro) Err: "no rules expected the token `42i32`" - 42i32 is not an ident

The distinction creates different uses for each tool. Traits' strengths include:

  • Traits are composable
    • a generic function can require any combination of traits
    • new traits can build on existing traits
  • Traits are explicit and expressive
    • a good way for communicating in code what the expectations of a type are.

However,

  • Traits are verbose, with significant syntactical overhead
  • Traits are abstract, and can be confusing
  • Some patterns, even good ones, are difficult to express with traits (as they currently exist in Rust)

To me, the shortfalls and annoyances of traits are hugely reduced by having macros handy to fill in the gaps as needed.

Advice and Best Practices

Some general recommendations from my experiences using macros:

Use Fully Qualified Paths

In the implementation of debug_display! above, the Display trait and the Formatter and Result types from the standard library are referred to using their full paths (i.e. std::fmt::Display, std::fmt::Formatter, std::fmt::Result).

Using fully-qualified paths in the body of a macro eliminates possible name ambiguity if, for instance, the macro referred to a name that had been redefined in the context it was expanded in.
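Here is a small sketch of the kind of clash a full path avoids. Everything in it (zero_default!, the units module, its local Default trait, and Meters) is hypothetical and exists only to set up the name collision.

```rust
// Hypothetical macro: implements the standard `Default` for a tuple struct
// wrapping an integer. The leading `::` anchors the path outside the current
// module, so a local item named `Default` cannot capture it.
macro_rules! zero_default {
    ($t:ident) => {
        impl ::std::default::Default for $t {
            fn default() -> Self {
                $t(0)
            }
        }
    };
}

mod units {
    // A local trait that shadows the prelude's `Default` inside this module.
    #[allow(dead_code)]
    pub trait Default {
        fn default_label() -> &'static str;
    }

    pub struct Meters(pub u32);

    // Had the macro written a bare `Default`, this would try to implement
    // the local trait above and fail to compile.
    zero_default!(Meters);
}
```

The author of the macro has no way of knowing what names exist at the call site, which is why spelling out the path inside the macro body is the safer default.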

Dynamically-Generating Names Doesn't Work Well

Once you realize you can programmatically generate code, it's only a matter of time until you'll want to name the items you generate programmatically, something like:

macro_rules! make_adder_fn {
    ($n:expr) => {
        fn add_$n(rhs: i32) -> i32 {  // this does not work
            $n + rhs
        }
    }
}
make_adder_fn!(42);

However, this is not really possible with macro_rules!. Solutions have been discussed, and there are some things (like concat_idents!) that seem promising at first glance, but nothing has the capabilities you would need to generate names cleanly.

(This is coming from someone who's wasted more time than I care to admit trying to work around this, without success.)

Instead, provide the name of your function, type or other item as a parameter to the macro:

macro_rules! make_adder_fn {
    ($f:ident, $n:expr) => {
        fn $f(rhs: i32) -> i32 {
            $n + rhs
        }
    }
}
make_adder_fn!(add_42, 42);

Import/Export Rules Are Confusing

First, there are several different ways to import and export macros. Rust's 2018 edition added improved syntax for this, but the 2015 edition syntax is still legal, so it's pretty easy to get tripped up.

In the 2015 edition, you would use another crate's macros with the macro_use attribute above its extern crate declaration:

#[macro_use]
extern crate clap;

macro_use was also used to make a macro available internally inside the crate it was defined in (akin to pub(crate)). There's also a separate attribute, macro_export, used to make the macro available outside the crate (akin to pub).

With the 2018 edition, you can just import the macro like any other public item (the extern crate isn't required anymore, either):

use clap::crate_version;

const VERSION: &str = crate_version!();

(One remaining case where the 2015 #[macro_use] extern crate crate_name; syntax is still useful is if you want to import all of a crate's macros without tediously listing them all by name.)

For further reading, the 2018 edition guide has a section on changes to macro syntax. The "Scoping" and "Import/Export" sections of the "Little Book" are also clarifying.

Code Order Can Matter

Another easy pitfall: unlike a fn, struct, enum, or impl block, a macro_rules! macro is only visible to code that comes after (below) its declaration.


fn a() -> i32 { b() } // this is ok, even though `b` is below

const B: i32 = b(); // also ok, even though `b` is below

const fn b() -> i32 { 42 }

d!(); // not ok

macro_rules! d {
    () => {
        fn spooky_numbers() -> [usize; 6] {
            [4, 8, 15, 16, 23, 42]
        }
    }
}
}

A common practice is to put all macros in a mod that can easily be introduced at the tops of files with a use statement.
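One way to sketch the "macros module" idea inside a single file is below. Note the assumptions: this version relies on the #[macro_use] attribute on an inline module rather than a use statement, and the names macros, spooky_numbers! and spooky_sum are made up for the example.

```rust
// `#[macro_use]` extends the textual scope of the macros declared inside
// `macros` to the code that follows the module declaration.
#[macro_use]
mod macros {
    macro_rules! spooky_numbers {
        () => {
            [4usize, 8, 15, 16, 23, 42]
        };
    }
}

// This function comes after the module, so the macro is in scope here.
fn spooky_sum() -> usize {
    let numbers = spooky_numbers!();
    numbers.iter().sum()
}
```

The ordering rule still applies: the module holding the macros has to be declared before any code that invokes them.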

ident: the Flexible "Fragment"

The basic structure of a macro is similar to a match block, except that the patterns match against raw syntax tokens.

Each named parameter in the macro corresponds to one of about 10 syntax "fragments," or categories, including expr (an expression), ty (a type), and so on.

Some of these are fairly self-evident in terms of usage, but I've been surprised how often I ended up using the ident fragment, which stands for "identifier", i.e. a name.

ident parameters can refer to variable names, functions, traits, enum variants, struct fields, and methods. Although there is also the specific ty fragment for types, ident can also stand in for a type in many situations (namely, when the type has no generic parameters).

One frequent use of ident is for supplying the name of an item you are using a macro to generate. In this case, a ty fragment won't work, because there is no such type just yet; you're about to create it with the macro:

macro_rules! make_ident_struct {
    ($t:ident) => {
        struct $t {
            pub x: f64,
            pub y: f64,
        }
    }
}
make_ident_struct!(A); // works

Using $t:ty instead of $t:ident would not work.

However, since $t:ident will match against type name A, but not type A<T> (with a parameterized type), the need to use an ident when generating new types with generic parameters becomes somewhat tricky. Sometimes, tinkering is involved.
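As an example of the kind of tinkering involved, one possible approach (a sketch, not the only way) is to match the name and its single generic parameter as separate ident fragments, with the angle brackets as literal tokens. The macro make_wrapper! and the type Wrapper are hypothetical.

```rust
macro_rules! make_wrapper {
    // Match `Name<Param>` token by token: an ident, a literal `<`,
    // another ident, and a literal `>`.
    ($name:ident < $param:ident >) => {
        struct $name<$param> {
            inner: $param,
        }

        impl<$param> $name<$param> {
            fn new(inner: $param) -> Self {
                Self { inner }
            }

            fn into_inner(self) -> $param {
                self.inner
            }
        }
    };
}

// Generates `struct Wrapper<T>` along with its `impl<T>` block.
make_wrapper!(Wrapper<T>);
```

This only handles exactly one type parameter with no bounds. Supporting several parameters, lifetimes, or trait bounds takes additional rules, which is where the pattern starts to get complicated.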

A final note: another fragment, tt, corresponds to a single token tree, which can be almost anything. That's obviously much more flexible than ident, and is exactly what you're looking for when constructing advanced DSL-type macros. In my experience, tt matches against too many things to be used cleanly and easily for simple macros.
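To give a feel for what "a single token tree" means, here is a sketch of a recursive token counter. This is a well-known style of macro; the name count_tts! and the constant COUNT are just labels chosen for this example.

```rust
macro_rules! count_tts {
    // Base case: no tokens left.
    () => { 0usize };
    // Recursive case: peel off one token tree and count the rest.
    ($head:tt $($tail:tt)*) => { 1usize + count_tts!($($tail)*) };
}

// `a`, `b`, `(c d)` and `e`: the parenthesized group is ONE token tree.
const COUNT: usize = count_tts!(a b (c d) e);
```

A tt will match an identifier, a literal, a punctuation token, or an entire delimited group, which is exactly why it is too permissive for most simple macros.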

Case Study: State Machine Enum

Let's say you have a struct Event<T>:

use chrono::{DateTime, Utc};
use uuid::Uuid;

pub struct Event<T> {
    pub time: DateTime<Utc>,
    pub id: Uuid,
    pub event: T,
}

And some types to hold the data for certain types of events:

use std::net::Ipv4Addr;

pub struct Pending {
    pub recipient: Ipv4Addr,
}

#[derive(Default)]
pub struct Sending {
    pub payload: Vec<u8>,
    pub bytes_sent: usize,
    pub prev: Event<Pending>,
}

#[derive(Default)]
pub struct Sent {
    pub ack_req: bool,
    pub prev: Event<Sending>,
}

#[derive(Default)]
pub struct Ack {
    pub data: Vec<u8>,
    pub prev: Event<Sent>,
}

#[derive(Default)]
pub struct Finished<T> {
    pub prev: Event<T>,
}

And an enum to represent the types of events that might occur in your application, perhaps as a means of storing all the Event<T> instances you're tracking in a single collection:

pub enum Active {
    Pending(Event<Pending>),
    Sending(Event<Sending>),
    Sent(Event<Sent>),
    Acked(Event<Ack>),
    FinishedNoAck(Event<Finished<Sent>>),
    FinishedAcked(Event<Finished<Ack>>),
}

Now you're looking at all these types and starting to notice some patterns. Every Active inner-Event has a time and id, for instance. And four out of five of them have a prev field that refers to the last state, which could be used to calculate duration relative to the time of each Event<T>.

Pretty soon, you start getting trait happy:

pub trait Timestamped {
    fn time(&self) -> DateTime<Utc>;
}

pub trait Chronological {
    type Prev;
    type Next;

    fn prev(&self) -> Self::Prev;
    fn next(&self) -> Self::Next;
}

pub trait Elapsed<P, N>: Chronological<Prev = P, Next = N> {
    fn elapsed(&self) -> chrono::Duration;
}

impl<T> Timestamped for Event<T> {
    fn time(&self) -> DateTime<Utc> { self.time }
}

impl<P, N, T> Elapsed<P, N> for T
    where T: Chronological<Prev = P, Next = N>,
          P: Timestamped,
          N: Timestamped
{
    fn elapsed(&self) -> chrono::Duration {
        self.next().time().signed_duration_since(self.prev().time())
    }
}

impl<N, T> Elapsed<(), N> for T
    where T: Chronological<Prev = (), Next = N>
{
    fn elapsed(&self) -> chrono::Duration {
        chrono::Duration::seconds(0)
    }
}

The parametric polymorphism: it's glorious!

[Image: Jafar before he realizes]

Hindley-Milner! But usable! The absolute power!

But...what's this? The shackles of boilerplate code materialize out of thin air!

[Image: Jafar before he realizes]

10,000 years in the Cave of Wonders implementing Chronological for every possible Event<T> ought to chill him out!

Like, who wants to do more than one of these:

impl Timestamped for Active {
    fn time(&self) -> DateTime<Utc> {
        use Active::*;
        match self {
            Pending(event) => event.time,
            Sending(event) => event.time,
            Sent(event) => event.time,
            Acked(event) => event.time,
            FinishedNoAck(event) => event.time,
            FinishedAcked(event) => event.time,
        }
    }
}

Instead, write it once:

macro_rules! event_attr {
    ($method:ident, $t:ty, $attr:ident) => {
        fn $method(&self) -> $t {
            use Active::*;
            match self {
                Pending(event) => event.$attr,
                Sending(event) => event.$attr,
                Sent(event) => event.$attr,
                Acked(event) => event.$attr,
                FinishedNoAck(event) => event.$attr,
                FinishedAcked(event) => event.$attr,
            }
        }
    }
}

impl Timestamped for Active {
    event_attr!(time, DateTime<Utc>, time);
}

To generate a method marked by an optional visibility specifier (pub or pub(crate)), you can use the vis fragment:

macro_rules! optionally_public_event_attr {

    // note: `vis` also matches an empty visibility, so it doesn't need to be
    // wrapped in an optional (0 or 1) repetition, i.e. `$($pub:vis)?`.
    // this will work as-is if there is nothing there
    
    ($pub:vis $method:ident, $t:ident, $attr:ident) => {
        $pub fn $method(&self) -> $t {
            use Active::*;
            match self {
                Pending(event) => event.$attr,
                Sending(event) => event.$attr,
                Sent(event) => event.$attr,
                Acked(event) => event.$attr,
                FinishedNoAck(event) => event.$attr,
                FinishedAcked(event) => event.$attr,
            }
        }
    }
}

impl Active {
    optionally_public_event_attr!(pub id, Uuid, id);
}

What about this, does this look fun? How about six times in a row?

impl From<Event<Pending>> for Active {
    fn from(pending: Event<Pending>) -> Active {
        Active::Pending(pending)
    }
}

Or, you could write one macro:

macro_rules! from_event {
    ($t:ty, $variant:ident) => {
        impl From<Event<$t>> for Active {
            fn from(event: Event<$t>) -> Active {
                Active::$variant(event)
            }
        }
    }
}

from_event!(Pending, Pending);
from_event!(Sending, Sending);
from_event!(Sent, Sent);
from_event!(Ack, Acked);
from_event!(Finished<Sent>, FinishedNoAck);
from_event!(Finished<Ack>, FinishedAcked);

Rust's Result and Option have helper methods like Result::is_ok(&self) -> bool and Option::is_none(&self) -> bool that can be very convenient. We should have those!

macro_rules! variant_check {
    ($f:ident, $variant:ident) => {
        impl Active {
            pub fn $f(&self) -> bool {
                match self {
                    Active::$variant(..) => true,
                    _ => false,
                }
            }
        }
    }
}

variant_check!(is_pending, Pending);
variant_check!(is_sending, Sending);
variant_check!(is_sent, Sent);
variant_check!(is_acked, Acked);
variant_check!(is_finished_noack, FinishedNoAck);
variant_check!(is_finished_acked, FinishedAcked);

Usage:

let pending = Pending { recipient: Ipv4Addr::new(127, 0, 0, 1) };
let event = Event { time: Utc::now(), id: Uuid::new_v4(), event: pending };
let active = Active::from(event);
assert!(active.is_pending());

Here's the same idea, but matching against several enum variants at once:

macro_rules! multiple_variant_check {
    ($f:ident; $( $variant:ident ),* ) => {
        impl Active {
            pub fn $f(&self) -> bool {
                match self {
                    $( Active::$variant(..) => { true } )*

                    _ => false,
                }
            }
        }
    }
}

multiple_variant_check!(is_finished; FinishedNoAck, FinishedAcked);
multiple_variant_check!(is_still_unfinished; Pending, Sending, Sent, Acked);
multiple_variant_check!(anything_but_sent; Pending, Sending, Acked, FinishedNoAck, FinishedAcked);

In multiple_variant_check!, the $( $variant:ident ),* is used to match against a variable number of comma-separated identifiers.
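The same repetition syntax can be seen in isolation in the following self-contained sketch. The Light enum and the light_check! macro are hypothetical, and this variation uses the standard matches! macro with `|` as the repetition separator instead of generating one match arm per variant.

```rust
#[allow(dead_code)]
enum Light {
    Red,
    Yellow,
    Green,
}

macro_rules! light_check {
    // A method name, a semicolon, then one or more comma-separated
    // variant names (a trailing comma is allowed).
    ($f:ident; $( $variant:ident ),+ $(,)?) => {
        impl Light {
            fn $f(&self) -> bool {
                // Expands to e.g. `matches!(self, Light::Red | Light::Yellow)`.
                matches!(self, $( Light::$variant )|+)
            }
        }
    };
}

light_check!(must_stop; Red, Yellow);
light_check!(may_go; Green);
```

In the matcher, the comma after the closing parenthesis is the separator expected between repetitions. In the expansion, the `|` plays the same role, joining the generated patterns into a single or-pattern.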

Next, if you patch a few missing Default implementations, you can generate constructors that put a default Event<T> in the correct Active variant:

// needed because Ipv4Addr does not implement Default
// other event type structs derive Default instead
impl Default for Pending {
    fn default() -> Self {
        Self { recipient: Ipv4Addr::new(127, 0, 0, 1) }
    }
}

impl<T> Default for Event<T>
    where T: Default
{
    fn default() -> Self {
        Self {
            time: Utc::now(),
            id: Uuid::new_v4(),
            event: T::default(),
        }
    }
}

impl Default for Active {
    fn default() -> Self {
        Active::Pending(Default::default())
    }
}


macro_rules! new_with_default {
    ($f:ident, $t:ty) => {
        impl Active {
            pub fn $f() -> Self {
                let inner: Event<$t> = Default::default();
                Self::from(inner)
            }
        }
    }
}

new_with_default!(new_pending, Pending);
new_with_default!(new_sending, Sending);
new_with_default!(new_sent, Sent);
new_with_default!(new_acked, Ack);
new_with_default!(new_finished_noack, Finished<Sent>);
new_with_default!(new_finished_acked, Finished<Ack>);

And when it comes to testing all this brand new code, you know what to do!

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn generated_method_check_for_variant() {
        macro_rules! check_for_variant {
            ($new:ident, $single:ident, $multiple:ident, $not:ident) => {
                let active = Active::$new();
                assert!(active.$single());
                assert!(active.$multiple());
                assert!( ! active.$not());
            }
        }

        check_for_variant!(new_pending, is_pending, is_still_unfinished, is_sent);
        check_for_variant!(new_sending, is_sending, is_still_unfinished, is_acked);
        check_for_variant!(new_sent, is_sent, is_still_unfinished, is_finished_noack);
        check_for_variant!(new_acked, is_acked, is_still_unfinished, is_pending);
        check_for_variant!(new_finished_noack, is_finished_noack, is_finished, is_still_unfinished);
        check_for_variant!(new_finished_acked, is_finished_acked, is_finished, is_still_unfinished);
    }
}

Conclusion

Don't duplicate, generate!

Declarative macros are a powerful tool for generating code that would be either entirely or largely duplicative.

Although some tinkering is often required, especially when you start out, learning macros is still a tremendous win for productivity in the long run.

Coding Rust has never been the same for me after learning how to eliminate a lot of the repetitive stuff with macros. Hopefully I can pass this lesson on to others so that they, too, can be freed from the shackles of boilerplate once and for all.

Notes and Further Reading

Huge thanks to u/Quxxy for reviewing this article and providing excellent feedback!

Some other pieces of interest: