It's never a bad idea to take a stroll through the source code for Rust's standard library. There's a lot to see, including high-performance data structures, meticulously-designed system interfaces, and rock-solid concurrency primitives. Personally, I've learned a lot just from studying (and using) the elegant APIs provided by the Result and Option types.
But, for a Rust developer, the standard library serves another vital purpose: it is chock full of clever ideas for how to manage various ergonomics issues you'll encounter initially when writing Rust. Indeed, it is a particularly valuable source of such techniques because, given the language's young age, the solution to every problem isn't exactly plastered all over Stack Overflow quite yet.
It was during just such a stroll that one day — I can still remember it distinctly, even though it happened long ago — I "discovered" a technique that freed me from the chains of tedious boilerplate forever.
I had come across something like this — can you guess what it does?
impl u32 {
    uint_impl! {
        u32, u32, 32, 4294967295, "", "", 8,
        "0x10000b3", "0xb301", "0x12345678", "0x78563412", "0x1e6a2c48",
        "[0x78, 0x56, 0x34, 0x12]", "[0x12, 0x34, 0x56, 0x78]", "", ""
    }
}
If you guessed, "implements checked_add, checked_div, checked_div_euclid, checked_mul, checked_neg, checked_next_power_of_two, and 61 other methods for the primitive type u32," then congratulations!
uint_impl! is a 1,889-line macro in core::num that not only implements most of u32's functionality, but also that of u8, u16, u64, u128 and usize.
Macros are used heavily in the standard library to generate the same code for different types.
Now, depending on your point of view, the possibility of generating a bunch of boring boilerplate code with a clever macro might be laughably obvious. But to me, someone who'd never used a language with macros before, it was a revelation.
And once I learned how to do it myself, the tedium of implementing traits largely became a distant memory.
Back to Basics: What are Macros?
Macros are a special mini-language within Rust that allow you to generate Rust code programmatically.
println! is a macro, which is why it can take a variable number of arguments, unlike a normal Rust function:
println!();
println!("{}", 1);
println!("{} {}", 1, 2);
println!("{} {} {}", 1, 2, 3);
Another builtin, vec!, creates a new Vec from a list of items. This code:
let xs: Vec<i32> = vec![1, 2, 3];
... is functionally equivalent to:
let xs: Vec<i32> = {
    let mut out = Vec::with_capacity(3);
    out.push(1);
    out.push(2);
    out.push(3);
    out
};
Fairly early during the compilation process, macros are "expanded" into normal Rust code. The generated code is then combined with the surrounding (non-macro) Rust code, type-checked, optimized, and linked into the final program.
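To make the expansion step concrete, here is a minimal sketch of a vec!-like macro. The name my_vec! and the helper build_example are hypothetical, and the real vec! in the standard library is implemented differently; the sketch only illustrates how a list of items can be expanded into a series of push calls.

```rust
// Hypothetical, simplified stand-in for `vec!` (not the standard library's implementation).
macro_rules! my_vec {
    // Match zero or more comma-separated expressions, with an optional trailing comma.
    ($($item:expr),* $(,)?) => {{
        let mut out = Vec::new();
        // This statement is repeated once per matched `$item`.
        $( out.push($item); )*
        out
    }};
}

// Hypothetical helper showing the macro in expression position.
fn build_example() -> Vec<i32> {
    my_vec![1, 2, 3]
}
```

After expansion, the body of build_example is ordinary Rust: a block that creates a Vec, pushes three values, and returns it.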
Macros are powerful tools that ought to be treated with respect. But unlike C and C++ preprocessor macros, which are stunted in capability (functioning by simple string substitution), and dangerously easy to misuse (e.g. no distinct syntax to help tell a macro apart from a normal function call), Rust macros have several safeguards that limit the damage you (or your favorite colleague) can inflict.
That includes obvious syntax (a_macro_call! invocations are easily distinguished by the exclamation point) and "partially hygienic" scoping rules, which limit the surrounding context they "capture."
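As a small illustration of the hygiene rules for local variables, consider the sketch below. The macro double! and the function hygiene_demo are hypothetical names chosen for this example; the point is that a variable declared inside the macro body does not collide with a variable of the same name at the call site.

```rust
// Hypothetical macro: binds its argument to a local named `tmp`, then doubles it.
macro_rules! double {
    ($e:expr) => {{
        // This `tmp` belongs to the macro's own syntax context.
        let tmp = $e;
        tmp * 2
    }};
}

fn hygiene_demo() -> i32 {
    let tmp = 10;
    // The caller's `tmp` is used inside the argument expression;
    // the macro's internal `tmp` neither shadows nor overwrites it.
    let doubled = double!(tmp + 1);
    doubled + tmp
}
```

The hygiene is only partial: it covers local variables and labels, while items such as types, traits, and functions named in a macro body are resolved where the macro is expanded.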
Declarative Macros vs. Procedural Macros
Macros defined using macro_rules! are called "declarative" macros, or sometimes "macros by example." This article is about declarative macros. There is another type of macro called a procedural macro. If you've ever used serde, you have used one: it relies on a procedural macro to let you derive Serialize and Deserialize. Procedural macros are declared in special proc-macro crates that are only used for defining procedural macros. Unlike macro_rules! declarative macros, procedural macros are written in normal Rust code, using normal Rust types to operate on the source code.
Procedural macros are much more powerful than declarative macros, but they also have a steeper learning curve, require more effort to declare and use, and are still an evolving part of the language. For quickly working around boilerplate, macro_rules! is still a valuable tool.
Simple Example: debug_display!
This is a situation I've come across a number of times:
I had derived Debug on a type, but now wanted to pass it to a generic function that required it to implement Display. But for my case, Debug was good enough; it was only for me to see. What I wanted was to just "delegate" Display to the derived Debug implementation without too much fuss.
And now I can:
use std::fmt::Display;

#[derive(Debug)]
struct Point {
    pub x: f64,
    pub y: f64,
}

macro_rules! debug_display {
    ($t:ident) => {
        impl std::fmt::Display for $t {
            fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
                write!(f, "{:?}", self)
            }
        }
    }
}

debug_display!(Point); // poof! `Point` now implements `Display`
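For completeness, here is a sketch of the scenario that motivated the macro: a generic function with a Display bound. The function describe is a hypothetical stand-in for whatever generic API required Display; the rest repeats the definitions above so the example is self-contained.

```rust
use std::fmt::Display;

#[derive(Debug)]
struct Point {
    pub x: f64,
    pub y: f64,
}

macro_rules! debug_display {
    ($t:ident) => {
        impl std::fmt::Display for $t {
            fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
                // Delegate to the derived `Debug` implementation.
                write!(f, "{:?}", self)
            }
        }
    };
}

debug_display!(Point);

// Hypothetical generic function that only accepts `Display` types.
fn describe<T: Display>(value: &T) -> String {
    format!("value = {}", value)
}
```

Calling describe(&Point { x: 1.0, y: 2.0 }) now compiles, and the resulting string carries the Debug rendering of the point.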
Philosophical Objections
Now, before we go any further: a lot of people feel quite strongly that macros are a bad idea altogether. Just this week on r/rust, user u/ragnese expressed representative objections to the "magic" of macros:
I don't care for macros in general. I treat macros the same as dependencies: only if necessary or if it's going to save me a lot of time. I hate magic. I already know Rust. Why would I invoke a macro that could be doing anything with some obscure, ad hoc, DSL that its author created?
Alexis Beingessner writes that macros are "fragile" and create debugging difficulties:
The compiler can't evaluate the body of a macro matches its signature, and that the macro is being invoked correctly for its signature. It just has to expand the macro out to some code and then check that code. This leads to the standard problem with dynamic programming: late binding of errors. With macros, we can get the spiritual equivalent of "undefined in not a function" in the compiler.
Macros, especially when mixed with Rust's new async syntax, can occasionally explode into "spectacularly huge" compiler errors; take a look at this doozy.
On a related note, macros cause problems for IDEs, if you're into that sort of thing. The author of IntelliJ's IDEA plugin for Rust describes macros as one of several "IDE-hostile" parts of the language's syntax. They are a "nightmare" for IDEs, responds another person on the thread.
Also, macros are downright ugly, says Brenden Matthews:
Rust macros feel like a left turn compared to the rest of the language. To be fair, I haven’t been able to grok them yet, and yet they feel out of place like some strange bolt-on appendage, one which only came about after the language was designed, inspired by Perl. I will eventually take some time to understand them properly in the future, but right now I want to avoid them. Like the plague.
All of these are reasonable opinions. Some of them I identify with — strongly, even. I'm certainly no friend to magic. In my book, the fact that Rust is significantly more readable than any other language I've coded in is the very best thing it has going for it.
The first point I'd like to propose, however, is that macros can be an exceedingly efficient tool — not as a replacement for Rust's rich type system — but in service of it. By all means, define the public API of your crate with a trait. To me, macros are mostly a tool to help you use traits effectively. Traits with less typing, if you will. (Rust's fantastic readability can involve a lot of typing).
The second point is maintainability. Consider the uint_impl! from the standard library: would it be better if there were duplicate implementations of checked_add for each of u8, u16, u32, u64 and u128? Code duplication is a liability: it increases the surface area for possible bugs. Changing code that's repeated in many places is tedious and error-prone.
Finally, while I'll admit to occasionally going overboard with macros in the past (check this out if you want something that'll really blow your hair back), what I'm suggesting here is a simple, understandable pattern that provides a significant amount of efficiency in practice: don't duplicate, generate.
For code that would otherwise be duplicated for multiple types, write it as a macro that takes the type as a parameter, and use it to generate the code you would have duplicated.
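A minimal sketch of the pattern, in the spirit of uint_impl! but far smaller. The trait BitWidth and the macro impl_bit_width! are hypothetical names invented for this illustration; nothing here is taken from the standard library's source.

```rust
// Hypothetical trait, used only to have something to implement repeatedly.
trait BitWidth {
    fn bit_width() -> u32;
}

// The macro takes the type as a parameter and generates one `impl` block for it.
macro_rules! impl_bit_width {
    ($t:ty) => {
        impl BitWidth for $t {
            fn bit_width() -> u32 {
                (std::mem::size_of::<$t>() * 8) as u32
            }
        }
    };
}

// One line per type instead of one hand-written `impl` per type.
impl_bit_width!(u8);
impl_bit_width!(u16);
impl_bit_width!(u32);
impl_bit_width!(u64);
impl_bit_width!(u128);
```

If the body of bit_width ever needs to change, it changes in exactly one place.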
Learning Macros: the 'Little Book'
Don't let its 'Little' title mislead you: boilerplate code everywhere quivers in fear at the mighty black book!
As I learned how to escape the tyranny of boilerplate, no resource was more valuable to me than the "Little Book of Rust Macros" by Daniel Keep.
Although the book has not been updated in several years, it was an incredible resource when published. In my opinion, it still contains some of the best treatments of the subject available. I've returned to it literally dozens of times.
Particularly good chapters include:
- the overview of macro_rules!
- Keep's explanation of the complex interaction between identifiers and keywords
- A clever fix for problems with trailing separators
- "Incremental TT munchers"
Scoping Subtleties
With macros, type-checking happens after expansion (similar to how C++ templates work). This is a key difference from Rust's generics, which are type-checked before expansion.
Consider the macro below: the code it expands to would quite obviously run afoul of the borrow checker, but it compiles without incident, as simply declaring the macro (without invoking it) expands to nothing:
macro_rules! ah_ah_ah_didnt_say_the_magic_word {
    ($x:ident) => {
        let x_mut_1 = &mut $x;
        let x_mut_2 = &mut $x;
        *x_mut_1 += 1i32;
    }
}
The minute you call the macro (e.g. ah_ah_ah_didnt_say_the_magic_word!(x) with a mutable i32 variable x), the generated code triggers this error message:
error[E0499]: cannot borrow `x` as mutable more than once at a time
--> src/main.rs:8:27
|
7 | let x_mut_1 = &mut $x;
| ------- first mutable borrow occurs here
8 | let x_mut_2 = &mut $x;
| ^^^^^^^ second mutable borrow occurs here
9 | *x_mut_1 += 1;
| ------------- first borrow later used here
...
16 | ah_ah_ah_didnt_say_the_magic_word!(x);
| -------------------------------------- in this macro invocation
Rust generics, on the other hand, are eagerly type-checked.
For example, this generic min function won't compile without adding a PartialOrd bound on T:
fn min<T>(a: T, b: T) -> T {
    if a < b { a } else { b }
}
// rustc: "binary operation `<` cannot be applied to type `T`"
But the equivalent macro will compile:
macro_rules! min {
    ($a:expr, $b:expr) => {
        if $a < $b { $a } else { $b }
    }
}
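Because the macro carries no trait bounds of its own, the same definition can be invoked with any operands for which the expanded code happens to type-check. The sketch below repeats the macro and adds a hypothetical helper, smallest_values, that uses it with integers, floats, and string slices.

```rust
macro_rules! min {
    ($a:expr, $b:expr) => {
        if $a < $b { $a } else { $b }
    };
}

// Hypothetical helper: each invocation is type-checked only after expansion,
// at the place where it is used.
fn smallest_values() -> (i32, f64, &'static str) {
    (min!(3, 7), min!(2.5, 1.5), min!("pear", "apple"))
}
```

One caveat of this particular definition: each argument expression appears twice in the expansion (once in the comparison, once in the selected branch), so an argument with side effects would be evaluated more than once.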
The code generated by a macro will still be type-checked in the context where it is expanded, and macro invocations are also checked against the syntax and typing rules of the macro mini-language. In this example, the first call is a correct invocation of add_one!, the second runs afoul of the post-expansion type checking, and the third violates the macro_rules! "type" checks (which are more like "syntax checks", as you are matching against categories of Rust syntax):
macro_rules! add_one {
    ($x:ident) => { $x + 1i32 }
}

let a: i32 = 42;
let b: std::collections::HashMap<String, u32> = Default::default();

add_one!(a);     // Ok: expands to `a + 1i32`, where `a` is an i32
add_one!(b);     // (type) Err: expands to `b + 1i32`, where `b` is a HashMap<String, u32>: a type error
add_one!(42i32); // (macro) Err: "no rules expected the token `42i32`" - 42i32 is not an ident
The distinction creates different uses for each tool. Traits' strengths include:
- Traits are composable
  - a generic function can require any combination of traits
  - new traits can build on existing traits
- Traits are explicit and expressive
  - a good way of communicating in code what the expectations of a type are
However,
- Traits are verbose, with significant syntactical overhead
- Traits are abstract, and can be confusing
- Some patterns, even good ones, are difficult to express with traits (as they currently exist in Rust)
To me, the shortfalls and annoyances of traits are hugely reduced by having macros handy to fill in the gaps as needed.
Advice and Best Practices
Some general recommendations from my experiences using macros:
Use Fully Qualified Paths
In the implementation of debug_display! above, the Display trait and the other items it needs from the standard library are referred to using their full paths (i.e. std::fmt::Display, std::fmt::Formatter, std::fmt::Result).
Using fully-qualified paths in the body of a macro eliminates possible name ambiguity if, for instance, the macro referred to a name that had been redefined in the context it was expanded in.
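Here is a sketch of the kind of ambiguity this guards against. The module shadowed, its Result alias, and the Celsius type are all hypothetical; the macro is a variant of debug_display! that uses paths with a leading :: so that they always start from the std crate.

```rust
macro_rules! debug_display {
    ($t:ident) => {
        // Leading `::` anchors each path at the `std` crate itself.
        impl ::std::fmt::Display for $t {
            fn fmt(&self, f: &mut ::std::fmt::Formatter<'_>) -> ::std::fmt::Result {
                write!(f, "{:?}", self)
            }
        }
    };
}

mod shadowed {
    // Hypothetical crate-local alias that shadows the prelude's `Result` in this module.
    #[allow(dead_code)]
    pub type Result = std::result::Result<(), String>;

    #[derive(Debug)]
    pub struct Celsius(pub f64);

    // Expands inside a module where a bare `Result` would mean the alias above
    // and a bare `Display` would not be in scope at all.
    debug_display!(Celsius);
}
```

Item names in a macro_rules! body are resolved at the expansion site, so a version of the macro written with a bare Result or Display would have picked up (or failed to find) whatever those names meant inside shadowed.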
Dynamically-Generating Names Doesn't Work Well
Once you realize you can programmatically generate code, it's only a matter of time until you'll want to name the items you generate programmatically, something like:
macro_rules! make_adder_fn {
    ($n:expr) => {
        fn add_$n(rhs: i32) -> i32 { // this does not work
            $n + rhs
        }
    }
}

make_adder_fn!(42);
However, this is not really possible with macro_rules!. Solutions have been discussed, and there are some things (like concat_idents!) that seem promising at first glance, but nothing has the capabilities you would need to generate names cleanly.
(This is coming from someone who's wasted more time than I care to admit trying to work around this, without success.)
Instead, provide the name of your function, type or other item as a parameter to the macro:
macro_rules! make_adder_fn {
    ($f:ident, $n:expr) => {
        fn $f(rhs: i32) -> i32 {
            $n + rhs
        }
    }
}

make_adder_fn!(add_42, 42);
Import/Export Rules Are Confusing
First, there are several different ways to import and export macros. Rust's 2018 edition added improved syntax for this, but the 2015 edition syntax is still legal, so it's pretty easy to get tripped up.
In the 2015 edition, you would use another crate's macros with the macro_use attribute above its extern crate declaration:
#[macro_use] extern crate clap;
macro_use was also used to make a macro available internally inside the crate it was defined in (akin to pub(crate)). And there's a separate attribute, macro_export, used to make the macro available outside the crate (akin to pub).
With the 2018 edition, you can just import the macro like any other public item (the extern crate isn't required anymore, either):
use clap::crate_version;

const VERSION: &str = crate_version!();
(One remaining case where the 2015 #[macro_use] extern crate crate_name; syntax is still useful is if you want to import all of a crate's macros without tediously listing them all by name.)
For further reading, the 2018 edition guide has a section on changes to macro syntax. The "Scoping" and "Import/Export" sections of the "Little Book" are also clarifying.
Code Order Can Matter
Another easy pitfall: unlike a fn, struct, enum, or impl block, a macro_rules! macro is only visible to code that comes after (below) the point where it is declared.
fn a() -> i32 { b() } // this is ok, even though `b` is below

const B: i32 = b(); // also ok, even though `b` is below

const fn b() -> i32 { 42 }

d!(); // not ok

macro_rules! d {
    () => {
        fn spooky_numbers() -> [usize; 6] {
            [4, 8, 15, 16, 23, 42]
        }
    }
}
A common practice is to put all macros in a mod that can easily be introduced at the tops of files with a use statement.
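One way to keep ordering under control inside a single crate is sketched below. It uses a variation on the practice just described: rather than a use statement, the module is annotated with #[macro_use] and declared before any code that relies on it. The module name macros and the macro name spooky_numbers_fn! are hypothetical.

```rust
// Declared first and marked `#[macro_use]`, so the macros defined inside
// remain in scope for everything that follows in the enclosing module.
#[macro_use]
mod macros {
    macro_rules! spooky_numbers_fn {
        () => {
            fn spooky_numbers() -> [usize; 6] {
                [4, 8, 15, 16, 23, 42]
            }
        };
    }
}

// Works: the invocation comes after the module that defines the macro.
spooky_numbers_fn!();

fn spooky_sum() -> usize {
    spooky_numbers().iter().sum()
}
```

Moving the mod macros declaration below the invocation would bring back the "not ok" situation from the previous example.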
ident: the Flexible "Fragment"
The basic structure of a macro is similar to a match block, except with pattern matching based on raw syntax tokens.
Each named parameter in the macro corresponds to one of about 10 syntax "fragments," or categories, including expr (an expression), ty (a type), and so on.
Some of these are fairly self-evident in terms of usage, but I've been surprised how often I ended up using the ident fragment, which stands for "identifier", i.e. a name.
ident parameters can refer to variable names, functions, traits, enum variants, struct fields, and methods. Although there is also the specific ty fragment for types, ident can also stand in for a type in many situations (namely, when the type has no generic parameters).
One frequent use of ident is for supplying the name of an item you are using a macro to generate. In this case, a ty fragment won't work, because there is no such type yet: you're about to create it with the macro.
macro_rules! make_ident_struct {
    ($t:ident) => {
        struct $t {
            pub x: f64,
            pub y: f64,
        }
    }
}

make_ident_struct!(A); // works
Using $t:ty instead of $t:ident would not work.
However, since $t:ident will match against type name A, but not type A<T> (with a parameterized type), the need to use an ident when generating new types with generic parameters becomes somewhat tricky. Sometimes, tinkering is involved.
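One possible form of that tinkering is sketched below: a second rule that matches the angle brackets literally and captures each generic parameter as its own ident. The macro make_wrapper! and the generated types Meters and Tagged are hypothetical, and this is only one of several ways to approach the problem.

```rust
macro_rules! make_wrapper {
    // Plain name, e.g. `make_wrapper!(Meters)`.
    ($t:ident) => {
        struct $t {
            pub value: f64,
        }
    };
    // Name followed by generic parameters, e.g. `make_wrapper!(Tagged<A, B>)`.
    // The `<` and `>` are matched as literal tokens; each parameter is an `ident`.
    ($t:ident < $($g:ident),+ >) => {
        struct $t<$($g),+> {
            pub value: f64,
            // A tuple field so that every generic parameter is actually used.
            pub tags: ($($g,)+),
        }
    };
}

make_wrapper!(Meters);
make_wrapper!(Tagged<A, B>);
```

This sketch handles only simple parameter lists; lifetimes, bounds, and defaults would each need further rules.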
A final note: another fragment, tt, corresponds to a single token tree, which can be almost anything. That's obviously much more flexible than ident, and is exactly what you're looking for when constructing advanced DSL-type macros. In my experience, tt matches against too many things to be used cleanly and easily for simple macros.
Case Study: State Machine Enum
Let's say you have a struct Event<T>:
use chrono::{DateTime, Utc};
use uuid::Uuid;

pub struct Event<T> {
    pub time: DateTime<Utc>,
    pub id: Uuid,
    pub event: T,
}
And some types to hold the data for certain types of events:
use std::net::Ipv4Addr;

pub struct Pending {
    pub recipient: Ipv4Addr,
}

#[derive(Default)]
pub struct Sending {
    pub payload: Vec<u8>,
    pub bytes_sent: usize,
    pub prev: Event<Pending>,
}

#[derive(Default)]
pub struct Sent {
    pub ack_req: bool,
    pub prev: Event<Sending>,
}

#[derive(Default)]
pub struct Ack {
    pub data: Vec<u8>,
    pub prev: Event<Sent>,
}

#[derive(Default)]
pub struct Finished<T> {
    pub prev: Event<T>,
}
And an enum to represent the types of events that might occur in your application, perhaps as a means of storing all the Event<T> instances you're tracking in a single collection:
pub enum Active {
    Pending(Event<Pending>),
    Sending(Event<Sending>),
    Sent(Event<Sent>),
    Acked(Event<Ack>),
    FinishedNoAck(Event<Finished<Sent>>),
    FinishedAcked(Event<Finished<Ack>>),
}
Now you're looking at all these types and starting to notice some patterns. Every Active inner-Event has a time and id, for instance. And four out of five of them have a prev field that refers to the last state, which could be used to calculate duration relative to the time of each Event<T>.
Pretty soon, you start getting trait happy:
pub trait Timestamped {
    fn time(&self) -> DateTime<Utc>;
}

pub trait Chronological {
    type Prev;
    type Next;

    fn prev(&self) -> Self::Prev;
    fn next(&self) -> Self::Next;
}

pub trait Elapsed<P, N>: Chronological<Prev = P, Next = N> {
    fn elapsed(&self) -> chrono::Duration;
}

impl<T> Timestamped for Event<T> {
    fn time(&self) -> DateTime<Utc> { self.time }
}

impl<P, N, T> Elapsed<P, N> for T
    where T: Chronological<Prev = P, Next = N>,
          P: Timestamped,
          N: Timestamped
{
    fn elapsed(&self) -> chrono::Duration {
        self.next().time().signed_duration_since(self.prev().time())
    }
}

impl<N, T> Elapsed<(), N> for T
    where T: Chronological<Prev = (), Next = N>
{
    fn elapsed(&self) -> chrono::Duration {
        chrono::Duration::seconds(0)
    }
}
The parametric polymorphism: it's glorious!
Hindley-Milner! But usable! The absolute power!
But...what's this? The shackles of boilerplate code materialize out of thin air!
10,000 years in the Cave of Wonders implementing Chronological for every possible Event<T> ought to chill him out!
Like, who wants to do more than one of these:
impl Timestamped for Active {
    fn time(&self) -> DateTime<Utc> {
        use Active::*;
        match self {
            Pending(event) => event.time,
            Sending(event) => event.time,
            Sent(event) => event.time,
            Acked(event) => event.time,
            FinishedNoAck(event) => event.time,
            FinishedAcked(event) => event.time,
        }
    }
}
Instead, write it once:
macro_rules! event_attr {
    ($method:ident, $t:ty, $attr:ident) => {
        fn $method(&self) -> $t {
            use Active::*;
            match self {
                Pending(event) => event.$attr,
                Sending(event) => event.$attr,
                Sent(event) => event.$attr,
                Acked(event) => event.$attr,
                FinishedNoAck(event) => event.$attr,
                FinishedAcked(event) => event.$attr,
            }
        }
    }
}

impl Timestamped for Active {
    event_attr!(time, DateTime<Utc>, time);
}
To generate a method marked by an optional visibility specifier (pub or pub(crate)), you can use the vis fragment:
macro_rules! optionally_public_event_attr {
    // note: a `vis` fragment can match an empty visibility, so it does not
    // need to be wrapped in an optional repetition such as `$($pub:vis)?`.
    // this will work as-is if there is nothing there
    ($pub:vis $method:ident, $t:ident, $attr:ident) => {
        $pub fn $method(&self) -> $t {
            use Active::*;
            match self {
                Pending(event) => event.$attr,
                Sending(event) => event.$attr,
                Sent(event) => event.$attr,
                Acked(event) => event.$attr,
                FinishedNoAck(event) => event.$attr,
                FinishedAcked(event) => event.$attr,
            }
        }
    }
}

impl Active {
    optionally_public_event_attr!(pub id, Uuid, id);
}
What about this, does this look fun? How about six times in a row?
impl From<Event<Pending>> for Active {
    fn from(pending: Event<Pending>) -> Active {
        Active::Pending(pending)
    }
}
Or, you could write one macro:
macro_rules! from_event {
    ($t:ty, $variant:ident) => {
        impl From<Event<$t>> for Active {
            fn from(event: Event<$t>) -> Active {
                Active::$variant(event)
            }
        }
    }
}

from_event!(Pending, Pending);
from_event!(Sending, Sending);
from_event!(Sent, Sent);
from_event!(Ack, Acked);
from_event!(Finished<Sent>, FinishedNoAck);
from_event!(Finished<Ack>, FinishedAcked);
Rust's Result and Option have helper methods like Result::is_ok(&self) -> bool and Option::is_none(&self) -> bool that can be very convenient. We should have those!
macro_rules! variant_check {
    ($f:ident, $variant:ident) => {
        impl Active {
            pub fn $f(&self) -> bool {
                match self {
                    Active::$variant(..) => true,
                    _ => false,
                }
            }
        }
    }
}

variant_check!(is_pending, Pending);
variant_check!(is_sending, Sending);
variant_check!(is_sent, Sent);
variant_check!(is_acked, Acked);
variant_check!(is_finished_noack, FinishedNoAck);
variant_check!(is_finished_acked, FinishedAcked);
Usage:
let pending = Pending { recipient: Ipv4Addr::new(127, 0, 0, 1) };
let event = Event { time: Utc::now(), id: Uuid::new_v4(), event: pending };
let active = Active::from(event);
assert_eq!(active.is_pending(), true);
Here's the same idea, but matching against several enum variants at once:
macro_rules! multiple_variant_check {
    ($f:ident; $( $variant:ident ),* ) => {
        impl Active {
            pub fn $f(&self) -> bool {
                match self {
                    $(
                        Active::$variant(..) => { true }
                    )*
                    _ => false,
                }
            }
        }
    }
}

multiple_variant_check!(is_finished; FinishedNoAck, FinishedAcked);
multiple_variant_check!(is_still_unfinished; Pending, Sending, Sent, Acked);
multiple_variant_check!(anything_but_sent; Pending, Sending, Acked, FinishedNoAck, FinishedAcked);
In multiple_variant_check!, the $( $variant:ident ),* is used to match against a variable number of comma-separated identifiers.
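The repetition syntax may be easier to see in isolation. In the sketch below, variant_names! and finished_names are hypothetical names; the macro matches a comma-separated list of identifiers and expands each one into a string literal inside an array.

```rust
macro_rules! variant_names {
    // `$( ... ),*` in the matcher: zero or more identifiers separated by commas.
    ($($variant:ident),*) => {
        // `$( ... ),*` in the body: repeat the fragment once per match,
        // again separated by commas.
        [ $( stringify!($variant) ),* ]
    };
}

fn finished_names() -> [&'static str; 2] {
    variant_names!(FinishedNoAck, FinishedAcked)
}
```

The separator written between the closing parenthesis and the * applies to both sides: it is expected between matches in the input and emitted between repetitions in the output.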
Next, if you patch a few missing Default implementations, you can generate constructors that put a default Event<T> in the correct Active variant:
// needed because Ipv4Addr does not implement Default
// other event type structs derive Default instead
impl Default for Pending {
    fn default() -> Self {
        Self { recipient: Ipv4Addr::new(127, 0, 0, 1) }
    }
}

impl<T> Default for Event<T>
    where T: Default
{
    fn default() -> Self {
        Self {
            time: Utc::now(),
            id: Uuid::new_v4(),
            event: T::default(),
        }
    }
}

impl Default for Active {
    fn default() -> Self {
        Active::Pending(Default::default())
    }
}

macro_rules! new_with_default {
    ($f:ident, $t:ty) => {
        impl Active {
            pub fn $f() -> Self {
                let inner: Event<$t> = Default::default();
                Self::from(inner)
            }
        }
    }
}

new_with_default!(new_pending, Pending);
new_with_default!(new_sending, Sending);
new_with_default!(new_sent, Sent);
new_with_default!(new_acked, Ack);
new_with_default!(new_finished_noack, Finished<Sent>);
new_with_default!(new_finished_acked, Finished<Ack>);
And when it comes to testing all this brand new code, you know what to do!
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn generated_method_check_for_variant() {
        macro_rules! check_for_variant {
            ($new:ident, $single:ident, $multiple:ident, $not:ident) => {
                let active = Active::$new();
                assert!(active.$single());
                assert!(active.$multiple());
                assert!( ! active.$not());
            }
        }

        check_for_variant!(new_pending, is_pending, is_still_unfinished, is_sent);
        check_for_variant!(new_sending, is_sending, is_still_unfinished, is_acked);
        check_for_variant!(new_sent, is_sent, is_still_unfinished, is_finished_noack);
        check_for_variant!(new_acked, is_acked, is_still_unfinished, is_pending);
        check_for_variant!(new_finished_noack, is_finished_noack, is_finished, is_still_unfinished);
        check_for_variant!(new_finished_acked, is_finished_acked, is_finished, is_still_unfinished);
    }
}
Conclusion
Don't duplicate, generate!
Declarative macros are a powerful tool for generating code that would be either entirely or largely duplicative.
Although some tinkering is often required, especially when you start out, learning macros is still a tremendous win for productivity in the long run.
Coding Rust has never been the same for me after learning how to eliminate a lot of the repetitive stuff with macros. Hopefully I can pass this lesson on to others so that they, too, can be freed from the shackles of boilerplate once and for all.
Notes and Further Reading
Huge thanks to u/Quxxy for reviewing this article and providing excellent feedback!
Some other pieces of interest:
- Small crate with all code examples used in this article
- Zig Blurs Line Between Compile-Time and Run-Time: on Zig's innovative compile-time reflection capabilities, with a comparison to Rust's macro system
- An amazing display of macro_rules! prowess that dances the line between genius and insanity
- Rust Reference entry for macro_rules!
- Rust Book chapter on macros
- For Daniel Keep acolytes: earlier article on macros by author of the "Little Book" (somewhat out of date, but very detailed)
- Special rules for which type of syntax fragment can come after which in a macro_rules! macro