- Part 1 - Data / Types / Referential Transparency / Value prop
- Part 2 - Programming with effects
- Part 3 - Typeclasses
- Part 4 - Practical effect manipulation with traverse and friends (
this post
)- based on a workshop repository you can work through
- the workshop questions
- the workshop solutions
- Part 5 - Basics of Final-tagless / ZIO
They are originally based on a series of 5 presentations I ran at my company in an attempt to get in front of some anti-Scala sentiment that pervaded much of the org at the time. The org suffered from the typical problem of hiring Java devs and then telling them to write to Scala. It assumes someone at least has a few months of Scala programming experience.
Workshop
So you've bought into effects and start using them. Inevitably, you will start running into some horrendous type signatures. The first time it happens you will throw your hands up in despair. We all did it. On my team people first hit it when they take list of urls, map some side-effecting function over it to make an HTTP request, and then maps some other function over it for parsing. You end up with something like:
val result: List[Future[Either[ParsingError, User]]]
Or something else suitably nested. Usually at this point, with a hot jira ticket burning a hole in their pocket, people resort to doing all sorts of weird stuff to brute force through it:
- pattern matching all the things
- this gets unreadable quickly
- blocking on futures
- dealing with
iterator
from thegrouped
function rather than a list - lots of
.map(_.map(_.map)
which gets unreadable quickly - falling back to exceptions for error handling
- intermediate variables in the wrong places
- unsafe
get
s - weird folds all over the place
- while loops and ListBuffers w/
Try/Catch
I was seeing this so often at work that I put together a workshop on it:
- the workshop questions
- the workshop solutions
If you don't know how to tackle such a nested type or it's never come up, stop reading and give the workshop a try. The goal is to work through the exercises and make all the tests go green. It's structured after fake work, that is, there is an initial design in exercise 1 that needs modification due to changing requirements in later exercises. Goals of the exercises are:
- get used to manipulating such types and realize they aren't so bad after all
- investigate what the type signature is trying to tell you about decisions you have to make in your business logic and that these gnarly type signatures are actually your friend
- learn about common pitfalls
- intermediate variables are a good idea but doing them naively will actually make the code harder to read
- type-first design
- it's kind of like Tetris
So stop here and give the workshop a try. I don't give many hints because the idea is to struggle a little. In the workshop I do this so people can have a feeling of when they are going down the wrong path. If everything feels difficult when encountering these types the first few times don't worry, you are not a bad programmer. I've been there myself! If you find yourself pattern matching all over the place, or your solution starts getting 10, 20, etc. lines long. You've probably gone a little too far. Scroll down a little for the combinators you are likely missing and then try again! We've all been there. You got this.
There are some language/library features I will point out that make dealing with these kind of problems pleasant. Hopefully I can convince you that those ugly type signatures are actually a strength, not just crufty noise.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Great! You've made it this far. If your solutions don't look like mine that's
ok! The real point of this workshop is to learn about Sequence
and Traverse
.
Bonus points for combineAll
from cats
which was introduced at the end of
Part 3 of this series.
Type-first development
A tool that can help us out is type-first development. I'm not sure if you would
call this type-driven development but that name also probably works. The idea
is to introduce some intermediate variables/functions, in a smart way, to
improve clarity, by just implementing the types. Unlike the naive approach, we don't do this from the outside
in so we aren't having to .map
to get into the first effect all the time.
Also, in scala, just because we can introduce an anonymous closure anywhere doesn't mean it's a good idea. We have all seen for-comprehensions that open up huge anonymous functions resulting in a mess that is hard to understand. This is not a dig on scala --no compiler in the world stops you from writing. If you are using anonymous functions, nested forcomps, folds, then they should be small. A good intermediate function can do wonders for readability.
So let's do that. We know there are three distinct parts of this problem:
- making the http request
- expanding the
batchResponse
intoList[Response]
- parsing each response
- doing all of the above to a list of urls
This isn't a big problem so we could inline this into a single function. But since we are trying to prove a point, let's flesh out parts 1 to 3. We'll do this by just implementing the type signatures but not the actual implementations. If it compiles, we know we're on the right track. In fact, I think the version with these intermediate functions is more readable. As you will see from the refactorings for other exercises, they also provide good hooks for changes.
When we do this, we want to factor out our effects as much as possible. We don't want effects on our inputs. We want to break it down into elements that we put together somehow (mapping, flatMapping, etc) into a bigger program. I don't want to take in a List[Responses], or a Future[Response]. I just want Response. In the exercise though we have a batchResponse, which will expand into a number of parsed responses:
def parseResponse(raw: BatchResponse, parser: String => Try[ParsedResponse] ): Either[ServiceError, List[ParsedResponse]] = ???
Now we have just implemented what we want on the type level. The ???
is valid
syntax and this will type check/compile fun but explode at runtime with a
not-implemented exception. Right now, we are just using types to play with the
shape of our problem and help us flesh out what we want to happen.
Now let's do the same with making a function that makes an http request and also parses the response. Again, we just want to deal with a single element. We are doing the smallest thing we can possibly do.
// fn that makes an http request and attempts to parse the response def getAndParseBatch(batch: BatchRequest): Future[Either[ServiceError, List[ParsedResponse]]] = ???
Now we have all the pieces to implement the top level of our program. It doesn't matter that we don't have implementations. We just want to line up the shape of our types. We will do the straight forward approach to surface our ugly type signature first:
// I've put in the types for clarity def requestBatch(requests: List[Request], batchSize: Int): Future[Either[ServiceError, List[ParsedResponse]]] = { val batches: List[BatchRequest] = requests.grouped(batchSize).toList.map(BatchRequest.apply)) val result: List[Future[Either[ServiceError, List[ParsedResponse]]]] = batches.map(getAndParseBatch) ??? }
Great. We've surfaced our ugly type. This will compile and typecheck. We don't care that it explodes at runtime right now. We haven't done any implementations but we have described the nature of our problem with types. Notice what else has happened here: we've worked from the inside out. When tackling problems like this, I find this is better approach compared to making intermediate variables from the outside-in.
Before going further into the solution, let's talk about what contracts get implied by the types encountered.
Type signatures and contracts
The types are telling us a lot about the contract. Yes, yes, I setup the problem but they give you a map of what the contract might be. And if they aren't signalling the contract then it is a smell to address. If teammates are giving misleading type signatures then it's a teaching opportunity because clear intentions are for readability and maintainability.
Looking at the first exercise we have
Future[Either[ServiceError, List[ParsedResponse]]]
Sure this looks ugly but it's telling us a lot:
- We have some non-blocking action going on
- We have an IO channel (
Future
) and an (Error
) channel - The error channel is interesting: it's
Either[ServiceError, List[ParsedResponse]]
- we either get an error, or (presumably) all the answers
- If the type was
List[Either[ServiceError, ParsedResponse]]
then we would care about every individual parsing error. But since we can only have one error, we must be caring only about the first
- If the type was
- our caller cares about the error to recover from
- if they didn't, then using the error channel of the
Future
would be fine to handle all errors (including parsing)
- if they didn't, then using the error channel of the
- we either get an error, or (presumably) all the answers
- There is still room for improvement
- the type could be
Futuer[Either[ServiceError, NonEmptyList[ParsedResponse]]]
- the type could be
This is a lot of info! If we don't like this contract, we can shift it. This is what we do in later exercises. The dual error channel is an interesting problem. There are pros and cons to this approach and the industry seems pretty mixed on it. My rule of thumb:
Use Future[Either[E, R]]
to separate out channel related exceptions from
business logic errors:
- Keep channel errors inside the failure channel of
Future
. In our case, this is network errors. It could be file io errors or whatever - Keep business logic errors inside the Either channel. In our case, this is parsing errors. I.e. out network call was successful, but our response was something we didn't expect. In a real program this could be some business rule violation.
There are caveats though. If no one is going to recover from the business logic
error and a failure of the IO channel and a business failure both end up doing
the same thing, don't bother separating them out. I.e., if this was a webserver
and they are both going to return error jsons, then
Future[NonEmptyList[ParsedResponse]]
is probably the more clear response.
There is a small discussion the pros and cons of both approaches in Practical
FP in scala
You can do the same exercise of sussing out the contract with the remainder of the types.
Sequence and Traverse
The first tool to learn when tackling these gnarly types is Sequence. Or Traverse. One is
defined in terms of the other. These two abilities allow you to invert two
effects. That is, if you have an F[G[X]]
you can get a G[F[X]]
. It's ok to
think about it in simple terms like this. A more full definition can be found
in Haskellbook: "Traversable allows you to transform elements inside the structure
like a functor, producing applicative effects along the way, and lift
those potentially multiple instances of applicative structure outside
of the traversable structure. It is commonly described as a way to
traverse a data structure, mapping a function inside a structure while
accumulating the applicative contexts along the way." Luka Jacobowitz has a
number of talks, like this one, on traverse
Like I said, it lets you invert two effects (heh).
You may be familiar with this already from the Scala standard library as these
two functions exist for Future
. Cats expands the amount of types
defined so that they are more broadly useful.
This only works if my effect / container / box / whatever makes sense in your
head has an instance of the Traversable
typeclass:
trait Traverse[F[_]] extends Functor[F] { def traverse[G[_]: Applicative, A, B](fa: F[A])(f: A => G[B]): G[F[B]] def map[A, B](fa: F[A])(f: A => B): F[B] = traverse(fa)(a => Id(f(a))).value }
So traverse applies some effectful function and turns things inside out. The
interesting thing is say, we have a list of effectful functions then well, they
are already applied effectful functions! We can traverse using identity
to turn the
effects inside out:
List(1.some, 2.some).sequence == List(1.some, 2.some).traverse(identity)
In both cases we are turning a List[Option[Int]]
into a Option[List[Int]]
.
The latter is much simpler to unroll even further via pattern matching into some
other result at the end of the world. Note that there is some structure
preservation in the same way Functor
doesn't destroy structure. If the outer list has 5 elements, then the inner list must also
have 5 elements.
Now that we know about traverse we can easily solve the exercises. It's up to you how to fill in the missing types. It doesn't really matter if we work top down or bottom up here. Some problems may. Since I'm trying to show type-lead design, let's start from the top and get the shape of our problem solved and then go and flesh out the building blocks. Remember we were here:
// I've put in the types for clarity def requestBatch(requests: List[Request], batchSize: Int): Future[Either[ServiceError, List[ParsedResponse]]] = { val batches: List[BatchRequest] = requests.grouped(batchSize).toList.map(BatchRequest.apply)) val result: List[Future[Either[ServiceError, List[ParsedResponse]]]] = batches.map(getAndParseBatch) ??? }
Moving this to use traverse
:
def requestBatch(requests: List[Request], batchSize: Int): Future[Either[ServiceError, List[ParsedResponse]]] = { val batches = requests.grouped(batchSize).toList.map(BatchRequest.apply) batches.traverse(getAndParseBatch) }
But this won't compile. We get stuck in the following type:
Future[List[Either[ServiceError, List[ParsedResponse]]]]
This is close. But we're stuck with an extra List that we don't want. What we want is to combine each element of the list if it's a Right, but if it's a Left we want to fail out early with that error.
We need another tool in our toolbox to collapse those lists together.
Help from Monoids
The problem is we need to combine the outer List
with the inner List
. What
people reach for first here is usually one of two things:
- Some listBuffer and a foreach loop with a pattern match to unpack everything and pack it all back into the type they want
- If they are slightly more FP aware, a fold using
val acc: Either[ServiceError, List[ParsedResponse]] = List().asRight[ServiceError]
as the accumulator:- if the element is a
Left
, abort early and return that error - if the element is a
Right
, combine right the existingRight
- if the element is a
But from Part 3 - Typeclasses) this looks a lot like a Monoid. We want to key into
some plumbing and higher abstractions and do less work. We learned about
combineAll
for generically folding a list of monoids. A refresher:
List(1.some, 2.some, 3.some).combineAll // Some(6) List(1.some, 2.some, None, 3.some).combineAll // Some(6), remember the Monoid for [Int] treats None as 0
But we're not an Option. We're an either, which behaves a bit different in the
Left
case than then None
case for Option in exactly the way we want:
List(1.asRight[String], 2.asRight[String]).combineAll // result: Either[String, Int] = Right(3) List(1.asRight[String], "boop".asLeft[Int]).combineAll // result: Either[String, Int] = Left("boop") // very different than option!
Combing this with traverse
gives us what we want using Monoid for Either[E, A]
and
the Monoid for List[A]
:
val a = List(1).asRight[String] val b = List(2).asRight[String] val c = "bop".asLeft[List[Int]] val good = List(a, b).combineAll // result: Either[String, List[Int]] = Right(List(1,2)) val bad = List(a,b,c).combineAll // result: Either[String, List[Int]] = Left("bop")
Now we have what we want to solve exercise 1:
def requestBatch(requests: List[Request], batchSize: Int): Future[Either[ServiceError, List[ParsedResponse]]] = { val batches = requests.grouped(batchSize).toList.map(BatchRequest.apply) batches.traverse(getAndParseBatch).map(_.combineAll) }
I think this is really neat. When I solved this, I fleshed out the types of the
other two functions, left their implementation as ???
and solved the shape
of my requestBatch
function before going on to do the rest of the implementations. I had
all the types lined up as a map and just had to follow the types to do the
implementations.
Continuing on for getAndParseBatch
which is straight forward. Note we could
have inlined this in requestBatch
and that would have been fine:
def getAndParseBatch(batch: BatchRequest): Future[Either[ServiceError, List[ParsedResponse]]] = Http.getBatch(batch).map(parseResponse(_, ParsedResponse.parser))
Traverse
makes an appearance again when writing our parseResponse
. When we
expand BatchResponse
and map an effectful function over it, we have the types
inside out again:
private def parseResponse(responses: BatchResponse, parser: String => Try[ParsedResponse]): Either[ServiceError, List[ParsedResponse]] = responses .responses .traverse(raw => parser(raw.value).toEither) .leftMap(e => ServiceError(e.getMessage)))
We sorted out the shape of our problem using just types and then worked through implementations. This can be a great way to code. Figure out what kind of contract you want to express, put it in types, then fill in the blanks. Honestly, it feels like programming on easy mode.
Remaining exercises
The remaining exercises are variations of the first:
- exercise 2: Fail if future fails, but if parsing fails that's ok. Return
Future[List[ParsedResponse]]
- exercise 3: Fail if future fails, fail if parsing fails but collect all the
parsing failures. Return is
Future[ValidatedNel[ServiceError, List[ParsedResponse]]]
- exercise 4: Like exercise 2 but network failures are also ignored. Return is
Future[List[ParsedResponse]]
- exercise 5: If errors happen, either network or parsing error, collect all of
them. Return is
Future[(List ServiceError), List[ParsedResponse])]
. There are many better ways to tackle this one.
If you attempted all of these, great! If you look at my solutions you will
notice that we managed to change a lot of business behavior without really
changing much code. The requestBatch
function doesn't actually change between
all five exercises other than changing the return type. What I'm really doing
is playing with types and letting a few functions do all the work.
Exercise 2
- We change
parseResponse
to return aList
of responses, possibly empty (if nothing parses). We don't change the other two functions except to change the return type. We lettraverse
andcombineAll
do all the work.
Exercise 3
- We change
parseResponse
slightly to convertEither[ServiceError, List[ParsedResponse]]
toValidatedNel[ServiceError, List[ParsedResponse]]
but otherwise this function is the same. We update the type signatures. The applicative instance, and thus thetraverse
for Validated handles the rest by combining our failures for us. This is a very useful type.
Exercise 4
- as per Exercise 2 except we convert
Future.failed
toFuture.success(List())
. - there are other ways we could have done this
Exercise 5
- Not a great solution but it works. You could also look into the IOR type
or a case class that holds both values with a custom monoid definition.
Either
doesn't work because it's a disjunction. We use a tuple and rely on the fact for any type(A, B)
then we have aMonoid[(A, B)]
provided that bothA
andB
have a monoid.
To me, having these types guide the refactoring by just playing with type
signatures and relying on the underlying type capabilities
(can be combined,
can be traversed) do the work for me is really powerful. We kept referential
transparency, wrangled some gnarly types, and easily manipulated our code in a
few lines. But these types are not a silver bullet for capturing all behavior
--you still need to read code. They can give you an intuition of what's going
on but it's not 100%. For example, Exercise 2 and Exercise 4 have the same
return type but have different underlying business logic.
Future / Task / IO and Referential Transparency
All the solutions use Future
as an effect. There is a problem though: Future
is not referentially transparent. Referentially transparency doesn't mean that
the answer we get is always the same (this viewpoint will come in handy
shortly), it means that we can replace a value with it's bound expression and
get the same result. For example:
val ex1 = { val r = new Random(0L) val f = Future(r.nextInt) for { a <- f b <- f } yield (a, b) } // Same as ex1, but we replace f // with it's bound expression val ex2 = { val r = new Random(0L) for { a <- Future(r.nextInt) b <- Future(r.nextInt) } yield (a, b) } ex1.onComplete(println) ex2.onComplete(println)
Are ex1 and ex2 the same? No! Because Future is not referentially transparent. Ex1 will assign (a,b) to be the same answer since r is only evaluated once and then memoized. In the second example, (a,b) will be different. We can't substitute an expression for the bound value.
Future has other weirdness going on too. It represents an already running computation and is actually hard to kill a running future. It leads to a lot of cache missing as well. Mapping a Future-effecting function over a list of values will potentially run each function on a different thread. This may not be what you want.
This has implications for traverse
. I was reviewing a PR earlier, and someone
had traversed an effectful future with a comment stating "return the results in
order". Some alarm bells went off because this is a subtle bug if you aren't
familiar with how Futures execute and don't look too deeply into traverse
.
Let's look at the following:
def futureThing(i: Int)(implicit r: Random, ec: ExecutionContext): Future[Int] = { Future { Thread.sleep(r.nextInt(250).toLong) println(s"processing $i") i } } val x = List(1,2,3,4,5) val res = Await.result(x.traverse(Blah.futureThing), 10.seconds) println(res)
On my computer, for one particular run I have the following output:
processing 3
processing 1
processing 5
processing 4
processing 2
List(1,2,3,4,5)
The person had introduced a bug in that PR. Yes the result is returned in order
but it's not necessarily executed in order. Remember, a Future
is an
already running computation. If we map over a list and apply a Future
effecting function, these are now running on different threads. The result is
ordered as the input-order, but the execution of them is interleaved.
Traverse
is really traversable functor
so it's no surprise that the behavior
is the same here.
Because the results were ordered, the person assumed that the execution was also
ordered. In their case, since they were using Future
and wanted to execute
sequentially over the list then they really wanted to
use fold
or foldM
(the monadic version where your accumulator is inside an effect)
If we take Task
from Monix or IO
from cats-effect or ZIO
well things are a bit different:
def taskThing(i: Int)(implicit r: Random): Task[Int] = { val delay = FiniteDuration(r.nextInt(250).toLong, TimeUnit.MILLISECONDS) Task { println(s"processing $i") i }.delayExecution(delay) } val x = List(1,2,3,4,5).traverse(taskThing).runToFuture val res = Await.result(x, 10.seconds) println(res)
No matter how many times you run this, you will get the results processed in order:
processing 1
processing 2
processing 3
processing 4
processing 5
List(1,2,3,4,5)
The effects are always processed in order. In one thread*
. Note that the end
result is the same between the Future
and Task
version. If we want to run
this as fast as possible, on as many threads as we can get from the execution
context then we need to replace
traverse
with parTraverse
for parallel traverse
. By going more
functional, we are actually being more explicit with what's going on. It's
much easier to recognize when things are running in parallel using ZIO/IO/TASK
than having to remember that Future
is kind of weird.
Side-effects and Referential Transparency
The big question is why is Task
(or IO
or ZIO
) so different than Future
.
It turns out that Task
and friends are all referentially transparent. Yes.
They give us the ability to do referentially transparent side-effects.
Referentially transparency has nothing to do with subsequent DB calls, for the
same key, returning the same result. We know that the state of the world may
have changed between those two calls. All referential transparency gives us is
that we can replace a value with it's bound expression. For me, this was the
first big mental leap I had to get over when really getting into FP.
There is a large reddit post on this topic that is worth reading. What it
really boils down to is the difference between evaluation and execution. In
the case of IO/Task/Zio
we aren't building up a stack of function calls like
we are with Future
. Instead we are building up set of data structures that describe
what we are doing. We can make them until the cows go home and nothing happens
until we execute (interpret) these data structures. Step into the source for
Monix Task and you'll
find out that flatMap
is actually a case class called FlatMap
:
/** [[Task]] state describing an immediate synchronous value. */ final case class Eval[A](thunk: () => A) extends Task[A] /** Internal state, the result of [[Task.defer]] */ final case class Suspend[+A](thunk: () => Task[A]) extends Task[A] /** Internal [[Task]] state that is the result of applying `flatMap`. */ final case class FlatMap[A, B](source: Task[A], f: A => Task[B]) extends Task[B] /** Internal [[Coeval]] state that is the result of applying `map`. */ final case class Map[S, +A](source: Task[S], f: S => A, index: Int) extends Task[A] with (S => Task[A])
So how IO
is referentially transparent is really just a stupid trick. When
we substitute a value for it's expression, all we are saying is that we can
build the same data structure. The fact that we may get a different answer
between subsequent calls during interpretation of that data structure has
nothing to do with referential transparency.
The following definitions may be useful:
- evaluation: refers to what scala does aka strict evaluation. Val vs def. By name vs. by value
- execution: refers to what you want your program to do when run
- evaluating:
unsafeRunSync
causes theexecution
of the IO program
Evaluating an IO expression does not cause execution. This is what we mean when we say IO is pure.
def foo(i: Int): Task[Int] { println("evaluation") Task(println(s"execution: $i")) } val triggersEvaluation: Task[Int] = foo(1) val triggersExecution = triggersEvaluation.unsafeRunSync
In foo
we have some side-effect being evaluated outside of our data
structure. It gets printed when we evaluate foo
. But when we evaluate foo
we are really building a data structure called Task
. If we ended our program
without ever executing anything, the Task
is not interpretted. Whatever
side-effect is embedded in the Task (in our case, printing to the screen but it could be DB
update or whatever) is never run. Only when we execute (interpret) our data
structure at the end of the world via unsafeRunSync
or runtoFuture
or
others, does this actually happen.
IO
, Task
, and ZIO
are actually runtimes. They build up data structures
that are then interpretted in a m:n threaded environment built on top of JVM
threads. In this environment, threads are cheap and simple to cancel. There is a whole world of referentially transparent functional
concurrency primitives that this opens up. I would recommend this talk by Gavin Busesi before looking up all of
Fabio Labella's talks in this domain. The folks in cats-effect gitter are
also helpful. Though different, there is a lot of overlap between
IO
and ZIO
such that learning one will get you going in the other. If you
are interested in the runtimes themselves, this talk on implementing a ZIO
runtime and building an effect
system is an amazing talk.