Today we are going to look at something that came up a bunch when I finally got to write some thirty hours of Rust at work: how do I decide between different implementations of something in Rust? Big deal, right? It's not, but there can certainly be some syntax involved, and piecing it together from various chapters of books was a pain, so I'm writing it down.
Example
In Rust there is a complexity cliff once you introduce async fn and Futures. The types and annotations get heavier and the compiler errors become much less helpful than the wonderful ones you normally get. So in this example we are going to look at changing out behavior in a component that returns a Future / uses an async fn.
We'll run into the following concepts:
- async fns return impl Futures (see here and here). We are going to be focusing on dynamic dispatch. When we return an impl Future we aren't returning a specific named type, but promising that the type we return implements this trait, so the caller can use methods from the trait on the object they get back.
- Future is a trait. It is not a type. It has no size and holds no data. It's used as an interface.
- Box<dyn Future<Output=SomeType>> means storing a value where all we know is that it implements this trait. We don't know the size of this object at compile time, but at runtime we can follow a pointer to the actual implementation (think vtables). It's allocated on the heap. The Box is a pointer of known size, so we can store it on stack frames, where we can only put things of known size.
- Pin<Box<dyn Future<Output=SomeType>>> is the same as the above, but we also tell the compiler that the data on the heap can never move. This is out of my wheelhouse, but I put some links below for more information. The compiler is making sure you cannot get a plain mutable reference to the stored data in safe code, since that is what would let you move it.
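To make the difference between those return types a bit more tangible, here is a minimal sketch. The names as_impl, as_boxed and block_on are mine, not from any library. block_on is a tiny stand-in for a real runtime like tokio so that the snippet has no dependencies; it just busy-polls, which is fine for toy futures that are ready immediately and a bad idea for anything doing real I/O.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Static dispatch: the caller only knows "some type that implements Future".
fn as_impl(i: u32) -> impl Future<Output = u32> {
    async move { i }
}

// Dynamic dispatch: the two async blocks have two different anonymous types,
// so both are erased behind a pinned, heap-allocated trait object.
fn as_boxed(i: u32, square: bool) -> Pin<Box<dyn Future<Output = u32>>> {
    if square {
        Box::pin(async move { i * i })
    } else {
        Box::pin(async move { i })
    }
}

// Waker that does nothing; enough for futures that never actually wait.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Toy executor (an assumption of this sketch, not tokio): polls in a loop.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

With as_impl the compiler picks one concrete future type per function. With as_boxed we can pick between different futures at runtime, at the price of an allocation and a pointer hop.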
As far as implementing this, in an OOP language this would be interfaces + subclassing. You could do the same in Scala, or you can solve it with: an ADT, passing an anonymous function (the simplest and most general interface in reality), or something with type classes.
We'll use a trivial example for exploring this in Rust
- we accept integers
- in one implementation we return the integer as is
- in another implementation we return the square of the integer
- we're going to pretend a bunch of work is happening somewhere else and concurrency is involved so we'll be using async/await and Futures
Fire up a new project and add the following to your Cargo.toml:
[package]
name = "async-example"
version = "0.1.0"
edition = "2021"

[dependencies]
async-trait = "0.1.64"
tokio = { version = "1.25.0", features = ["macros", "rt-multi-thread"] }
Option 1 - Enum/ADT
The simplest approach is the Enum/ADT approach followed by some pattern matching. It's not really an option for what we are after, but if you can get away with it, it is certainly the lightest on syntax. You have to know all your options at compile time, aka already know all the possible things you are going to construct. It has a number of other drawbacks as well.
enum Printer {
    Identity,
    Square,
}

impl Printer {
    pub async fn do_work(&self, i: u32) -> u32 {
        match self {
            Printer::Identity => i,
            Printer::Square => i * i,
        }
    }
}

#[tokio::main]
async fn main() {
    let i = 10;

    let case = Printer::Identity;
    println!("one option: {}", case.do_work(i).await);

    let case2 = Printer::Square;
    println!("other option: {}", case2.do_work(i).await);
}
Running this gives us the expected
one option: 10
other option: 100
Pros:
- Straightforward, provided you can set up all your cases ahead of time.
- Not heavy on the syntax
Cons:
- Need to know all your cases ahead of time, not dynamic
- Expression Problem. You can add more variants but you have to update every pattern match
- Not extensible by third parties. No one else could extend Printer for their own type.
  - Even in your own code, what if you wanted a TestPrinter or RecordingPrinter or something? You can't define it just in your tests. It would bleed into your production code.
- Not compositional
It works, but it's not what we need here.
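To make the Expression Problem con concrete, here is a sketch with a hypothetical third variant, Cube, and a hypothetical second method, name. Neither is part of the example above. block_on is a dependency-free stand-in for tokio that I am assuming just for this snippet.

```rust
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

enum Printer {
    Identity,
    Square,
    Cube, // hypothetical new variant, not part of the original example
}

impl Printer {
    pub async fn do_work(&self, i: u32) -> u32 {
        match self {
            Printer::Identity => i,
            Printer::Square => i * i,
            // Without this arm the compiler rejects the match as non-exhaustive.
            Printer::Cube => i * i * i,
        }
    }

    // Hypothetical second method: it needs its own new arm as well.
    pub fn name(&self) -> &'static str {
        match self {
            Printer::Identity => "identity",
            Printer::Square => "square",
            Printer::Cube => "cube",
        }
    }
}

// Waker that does nothing; enough for futures that never actually wait.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Toy executor (an assumption of this sketch, not tokio): polls in a loop.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

The upside is that the compiler finds every match you forgot. The downside is that one new case touches every method, and only the owner of the enum can add it.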
Option 2 - Trait Objects
Trait objects come up as a (handwavy) way to sort of do interfaces/implementations in Rust. The book above does not do them great justice, given this obtuse example text:
However, trait objects are more like objects in other languages in the sense that they combine data and behavior. But trait objects differ from traditional objects in that we can’t add data to a trait object. Trait objects aren’t as generally useful as objects in other languages: their specific purpose is to allow abstraction across common behavior.
TLDR; instead of returning a concrete type you are returning some type that at least implements the trait. This comes at a cost for us:
- A bare trait isn't a type, so we can't throw one in a struct field as-is
- We are storing the value in a field rather than returning it from a function, so we can't use impl Trait, and need to use dyn Trait to indicate that calls are dynamically dispatched.
- dyn Trait is a type we can name, but we don't know its size at compile time, so we need to box it and put it on the heap
Say I have a struct and I want to embed different implementations of something onto it:
use async_trait::async_trait;

#[async_trait]
pub trait Printer {
    async fn print(&self, i: u32) -> u32;
}

pub struct IdentityPrinter;
pub struct SquarePrinter;

#[async_trait]
impl Printer for IdentityPrinter {
    async fn print(&self, i: u32) -> u32 {
        i
    }
}

#[async_trait]
impl Printer for SquarePrinter {
    async fn print(&self, i: u32) -> u32 {
        i * i
    }
}

struct Print<S: Printer> {
    printer: S,
}

impl<S: Printer> Print<S> {
    async fn print(&self, i: u32) -> u32 {
        self.printer.print(i).await
    }
}
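You immediately run into the first problem, which is that (at the time of writing) Rust cannot have async functions in traits. You can get around it by using the async-trait crate, which rewrites each async fn to return a boxed, pinned Future, plus an ungodly amount of other annotations. You can read more on it here if you care. (Rust 1.75 and later accept async fn in traits natively, but such traits still can't be used as dyn trait objects, so the crate stays relevant for what we are doing here.)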
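If you are curious what the macro saves you from, here is roughly the same trait written by hand, with each method returning a pinned, boxed future. This is a simplified sketch: the real expansion carries more lifetime bounds than I show here, and block_on is a toy executor I am assuming so the snippet runs without tokio.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

pub trait Printer {
    // Roughly what `async fn print(&self, i: u32) -> u32` turns into.
    fn print<'a>(&'a self, i: u32) -> Pin<Box<dyn Future<Output = u32> + Send + 'a>>;
}

pub struct IdentityPrinter;
pub struct SquarePrinter;

impl Printer for IdentityPrinter {
    fn print<'a>(&'a self, i: u32) -> Pin<Box<dyn Future<Output = u32> + Send + 'a>> {
        // The async block is boxed so every impl returns the same type.
        Box::pin(async move { i })
    }
}

impl Printer for SquarePrinter {
    fn print<'a>(&'a self, i: u32) -> Pin<Box<dyn Future<Output = u32> + Send + 'a>> {
        Box::pin(async move { i * i })
    }
}

// Waker that does nothing; enough for futures that never actually wait.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Toy executor (an assumption of this sketch, not tokio): polls in a loop.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

Because every method now returns the same boxed type, the trait can be used behind dyn, which is the whole point for the rest of this section.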
This feels identical to what you would do in an OOP language. Does it work?
#[tokio::main]
async fn main() {
    let i = 10;

    let case_one = Print { printer: IdentityPrinter };
    let case_two = Print { printer: SquarePrinter };

    println!("{}", case_one.printer.print(i).await);
    println!("{}", case_two.printer.print(i).await);
}
It seems like it does, but it has a dirty little secret: we did not specify the types of case_one and case_two. The compiler has inferred the type of case_one to be Print<IdentityPrinter> and case_two to be Print<SquarePrinter>. While correct, this isn't the whole story. If we wanted to pass this around generically as Print<Printer>, which is what we'll want to do when solving real problems, we are out of luck.
let case_one: Print<Printer> = Print { printer: IdentityPrinter };
let case_two: Print<Printer> = Print { printer: SquarePrinter };
And we explode with many errors. A shortlist:
error[E0277]: the size for values of type `dyn Printer` cannot be known at compilation time
--> src/main.rs:132:18
|
132 | printer: SquarePrinter
| ^^^^^^^^^^^^^ doesn't have a size known at compile-time
|
= help: the trait `Sized` is not implemented for `dyn Printer`
note: required by a bound in `Print`
--> src/main.rs:111:14
|
111 | struct Print<S: Printer> {
| ^ required by this bound in `Print`
help: you could relax the implicit `Sized` bound on `S` if it were used through indirection like `&S` or `Box<S>`
--> src/main.rs:111:14
|
111 | struct Print<S: Printer> {
| ^ this could be changed to `S: ?Sized`...
112 | printer: S
| - ...if indirection were used here: `Box<S>`
Doing all the necessary boxing, keywords and syntax, you will end up with the finished implementation:
use async_trait::async_trait;

#[async_trait]
pub trait Printer {
    async fn print(&self, i: u32) -> u32;
}

pub struct IdentityPrinter;
pub struct SquarePrinter;

#[async_trait]
impl Printer for IdentityPrinter {
    async fn print(&self, i: u32) -> u32 {
        i
    }
}

#[async_trait]
impl Printer for SquarePrinter {
    async fn print(&self, i: u32) -> u32 {
        i * i
    }
}

// ?Sized lets S be the unsized `dyn Printer`; the Box gives the field a known size.
struct Print<S: Printer + ?Sized> {
    printer: Box<S>,
}

impl<S: Printer + ?Sized> Print<S> {
    async fn print(&self, i: u32) -> u32 {
        self.printer.print(i).await
    }
}

#[tokio::main]
async fn main() {
    let i = 10;

    let case_one: Print<dyn Printer> = Print { printer: Box::new(IdentityPrinter) };
    let case_two: Print<dyn Printer> = Print { printer: Box::new(SquarePrinter) };

    println!("{}", case_one.printer.print(i).await);
    println!("{}", case_two.printer.print(i).await);
}
Not so bad once you wrap your head around it. We can do typeclasses and pass around a generic version of our typeclass, but we need to store it on the heap and deal w/ a bunch of syntax to make it happen. Not the worst.
Pros:
- easy to provide new implementations later
- do not need to know them all ahead of time
- can separate out a Test implementation later
- can add new implementations w/o having to update code (no need to change pattern matches, etc.)
- other people can extend
Cons:
- syntax heavy w/ a bit of indirection (generics + trait bounds, the ?Sized bound, etc.)
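Two of those points can be sketched together. If you never need the static Print<IdentityPrinter> flavor, you can drop the generic parameter and the ?Sized bound and just store a Box<dyn Printer>. And a test-only implementation can live wherever your tests live. RecordingPrinter is hypothetical. I also hand-write the boxed-future trait instead of using async-trait and use a toy block_on instead of tokio, purely so the snippet has no dependencies.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + Send + 'a>>;

// Hand-written stand-in for the #[async_trait] version of the trait.
pub trait Printer {
    fn print<'a>(&'a self, i: u32) -> BoxFuture<'a, u32>;
}

pub struct SquarePrinter;

impl Printer for SquarePrinter {
    fn print<'a>(&'a self, i: u32) -> BoxFuture<'a, u32> {
        Box::pin(async move { i * i })
    }
}

// Hypothetical test double: remembers every input it was asked to print.
pub struct RecordingPrinter {
    pub seen: Arc<Mutex<Vec<u32>>>,
}

impl Printer for RecordingPrinter {
    fn print<'a>(&'a self, i: u32) -> BoxFuture<'a, u32> {
        Box::pin(async move {
            self.seen.lock().unwrap().push(i);
            i
        })
    }
}

// No generic parameter: the field itself is the trait object.
pub struct Print {
    printer: Box<dyn Printer>,
}

impl Print {
    pub async fn print(&self, i: u32) -> u32 {
        self.printer.print(i).await
    }
}

// Waker that does nothing; enough for futures that never actually wait.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Toy executor (an assumption of this sketch, not tokio): polls in a loop.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

The trade-off is that Print can no longer be monomorphized for a specific printer, so every call goes through the vtable. For this kind of plumbing that is usually fine.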
Option 3 - A more functional approach by passing around a function
My motto from FP work in Scala/Haskell is: when in doubt, pass a function. It is the lightest of interfaces and often a great choice. Rust supports higher order functions so let's give that a whirl.
This is really easy in Scala or Haskell. It looks something like this:
val identity: Int => IO[Int] = i => IO(i)
val square: Int => IO[Int] = i => IO(i * i)
So let's treat our Print struct as a bag of data and throw a field on it that holds onto a function. You see this in Haskell all the time as something called a record of functions. Since we can define functions like any other piece of data, we can pass them around, accept them into methods (vec![1, 2, 3].into_iter().map(|i| i + 1) for instance), put them on structs, etc. Unfortunately, while passing around functions is easy in Rust, passing around functions that return a Future (which is hiding behind every async fn) is anything but.
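For contrast, here is the easy half of that claim: a record of plain, non-async functions. This is a minimal sketch and the names run_all and offset are made up for illustration.

```rust
// A record of functions: the field holds any callable from u32 to u32.
struct Print {
    printer: Box<dyn Fn(u32) -> u32>,
}

fn identity(i: u32) -> u32 {
    i
}

fn run_all(i: u32) -> Vec<u32> {
    let offset: u32 = 1;
    let cases = vec![
        // a plain function
        Print { printer: Box::new(identity) },
        // a closure
        Print { printer: Box::new(|i: u32| i * i) },
        // a closure that captures a local variable
        Print { printer: Box::new(move |i: u32| i + offset) },
    ];
    // The parentheses are needed to call a field rather than a method.
    cases.iter().map(|c| (c.printer)(i)).collect()
}
```

Calling run_all(10) gives back 10, 100 and 11. No Pin, no Future, and only one Box. It is the async version that gets hairy.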
The setup:
struct Print {
    printer: ??? // it's not something simple like Fn(u32) -> u32
}

// alternatively
// let identity = |i: u32| async move { i };
async fn identity(i: u32) -> u32 {
    i
}

// alternatively
// let square = |i: u32| async move { i * i };
async fn square(i: u32) -> u32 {
    i * i
}

#[tokio::main]
async fn main() {
    let i = 10;

    let case_one = Print { printer: identity };
    let case_two = Print { printer: square };

    // calling a function stored in a field needs the extra parentheses
    println!("{}", (case_one.printer)(i).await);
    println!("{}", (case_two.printer)(i).await);
}
This is an area where the compiler is less nice. You immediately run into problems:
- Future is a trait, not a concrete type, so we are back in dyn territory
- We'll need to not only Box our Future because we don't know its size at compile time, we'll need to Pin it as well
  - I'm still wrapping my head around Pin. Suggested reading: Pin and Suffering
- We need to throw the whole thing on the heap
use std::{future::Future, pin::Pin};

// type alias so I can start fitting this on a screen
type PinnedFuture<T> = Pin<Box<dyn Future<Output = T>>>;

struct Print {
    pub print: Box<dyn FnOnce(u32) -> PinnedFuture<u32>>,
}

...

// first attempt, this does not compile yet
let case_one = Print { print: Box::pin(identity) };
let case_two = Print { print: Box::pin(square) };

...
Unfortunately we have three problems here:
- No implicit sugar to turn an async fn into a FnOnce that returns our PinnedFuture. We need to handle this explicitly
- We need to box the whole thing, including the function
- Rust sees every async fn as returning its own distinct, anonymous Future type, even when the Output is the same (investigate Futures and opaque types). You run into this if you try to choose between two functions in a match statement or an if statement. So you need an explicit cast to the same type.
Doing that:
let case_one = Print {
    print: Box::new(|u: u32| Box::pin(identity(u)) as PinnedFuture<u32>),
};
let case_two = Print {
    print: Box::new(|u: u32| Box::pin(square(u)) as PinnedFuture<u32>),
};
And now it works, but wow, at what a cost compared to the Scala solution.
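One way to take some of the sting out is to pay the wrapping cost once in a small generic helper. The sketch below is one possible shape, not the only one. boxed, AsyncFn and choose are hypothetical names of mine. I also swapped FnOnce for Fn so a stored function can be called more than once, and block_on is a toy executor standing in for tokio so the snippet runs without dependencies.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

type PinnedFuture<T> = Pin<Box<dyn Future<Output = T>>>;
type AsyncFn = Box<dyn Fn(u32) -> PinnedFuture<u32>>;

// Hypothetical helper: does the closure + Box::pin + cast dance in one place.
fn boxed<F, Fut>(f: F) -> AsyncFn
where
    F: Fn(u32) -> Fut + 'static,
    Fut: Future<Output = u32> + 'static,
{
    Box::new(move |i: u32| Box::pin(f(i)) as PinnedFuture<u32>)
}

struct Print {
    print: AsyncFn,
}

async fn identity(i: u32) -> u32 {
    i
}

async fn square(i: u32) -> u32 {
    i * i
}

// After boxing, both branches have the same type, so the `if` type-checks.
fn choose(use_square: bool) -> Print {
    let print = if use_square { boxed(square) } else { boxed(identity) };
    Print { print }
}

// Waker that does nothing; enough for futures that never actually wait.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Toy executor (an assumption of this sketch, not tokio): polls in a loop.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

Call sites shrink to boxed(identity) or boxed(square), and calling the stored function looks like (p.print)(10).await. The helper is fixed to u32 in and out. Making it generic over argument and output types is possible but brings back a good chunk of the syntax we were trying to hide.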
Pros:
- passing a function is the lightest of interfaces
- easy to construct the functions themselves (con: the wrapping sucks)
- opens up possibilities for function composition
- easy to extend by third parties
Cons:
- syntax heavy with poor error messages from the compiler