Getting Func-ey Part 4 - Practical effect manipulation

24 minute read Published: 2020-01-28

Part 1 - Data / Types / Referential Transparency / Value prop
Part 2 - Programming with effects
Part 3 - Typeclasses
Part 4 - Practical effect manipulation with traverse and friends (this post)
- based on a workshop repository you can work through
- the workshop questions
- the workshop solutions
Part 5 - Basics of Final-tagless / ZIO

They are originally based on a series of 5 presentations I ran at my company in an attempt to get in front of some anti-Scala sentiment that pervaded much of the org at the time. The org suffered from the typical problem of hiring Java devs and then telling them to write to Scala. It assumes someone at least has a few months of Scala programming experience.

Workshop

So you've bought into effects and start using them. Inevitably, you will start running into some horrendous type signatures. The first time it happens you will throw your hands up in despair. We all did it. On my team people first hit it when they take list of urls, map some side-effecting function over it to make an HTTP request, and then maps some other function over it for parsing. You end up with something like:

val result: List[Future[Either[ParsingError, User]]]

Or something else suitably nested. Usually at this point, with a hot jira ticket burning a hole in their pocket, people resort to doing all sorts of weird stuff to brute force through it:

pattern matching all the things
- this gets unreadable quickly
blocking on futures
dealing with iterator from the grouped function rather than a list
lots of .map(_.map(_.map) which gets unreadable quickly
falling back to exceptions for error handling
intermediate variables in the wrong places
unsafe gets
weird folds all over the place
while loops and ListBuffers w/ Try/Catch

I was seeing this so often at work that I put together a workshop on it:

the workshop questions
the workshop solutions

If you don't know how to tackle such a nested type or it's never come up, stop reading and give the workshop a try. The goal is to work through the exercises and make all the tests go green. It's structured after fake work, that is, there is an initial design in exercise 1 that needs modification due to changing requirements in later exercises. Goals of the exercises are:

get used to manipulating such types and realize they aren't so bad after all
investigate what the type signature is trying to tell you about decisions you have to make in your business logic and that these gnarly type signatures are actually your friend
learn about common pitfalls
- intermediate variables are a good idea but doing them naively will actually make the code harder to read
type-first design
- it's kind of like Tetris

So stop here and give the workshop a try. I don't give many hints because the idea is to struggle a little. In the workshop I do this so people can have a feeling of when they are going down the wrong path. If everything feels difficult when encountering these types the first few times don't worry, you are not a bad programmer. I've been there myself! If you find yourself pattern matching all over the place, or your solution starts getting 10, 20, etc. lines long. You've probably gone a little too far. Scroll down a little for the combinators you are likely missing and then try again! We've all been there. You got this.

There are some language/library features I will point out that make dealing with these kind of problems pleasant. Hopefully I can convince you that those ugly type signatures are actually a strength, not just crufty noise.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

Great! You've made it this far. If your solutions don't look like mine that's ok! The real point of this workshop is to learn about Sequence and Traverse. Bonus points for combineAll from cats which was introduced at the end of Part 3 of this series.

Type-first development

A tool that can help us out is type-first development. I'm not sure if you would call this type-driven development but that name also probably works. The idea is to introduce some intermediate variables/functions, in a smart way, to improve clarity, by just implementing the types. Unlike the naive approach, we don't do this from the outside in so we aren't having to .map to get into the first effect all the time.

Also, in scala, just because we can introduce an anonymous closure anywhere doesn't mean it's a good idea. We have all seen for-comprehensions that open up huge anonymous functions resulting in a mess that is hard to understand. This is not a dig on scala --no compiler in the world stops you from writing. If you are using anonymous functions, nested forcomps, folds, then they should be small. A good intermediate function can do wonders for readability.

So let's do that. We know there are three distinct parts of this problem:

making the http request
expanding the batchResponse into List[Response]
parsing each response
doing all of the above to a list of urls

This isn't a big problem so we could inline this into a single function. But since we are trying to prove a point, let's flesh out parts 1 to 3. We'll do this by just implementing the type signatures but not the actual implementations. If it compiles, we know we're on the right track. In fact, I think the version with these intermediate functions is more readable. As you will see from the refactorings for other exercises, they also provide good hooks for changes.

When we do this, we want to factor out our effects as much as possible. We don't want effects on our inputs. We want to break it down into elements that we put together somehow (mapping, flatMapping, etc) into a bigger program. I don't want to take in a List[Responses], or a Future[Response]. I just want Response. In the exercise though we have a batchResponse, which will expand into a number of parsed responses:

def parseResponse(raw: BatchResponse,
                  parser: String => Try[ParsedResponse]
                  ): Either[ServiceError, List[ParsedResponse]] = ???

Now we have just implemented what we want on the type level. The ??? is valid syntax and this will type check/compile fun but explode at runtime with a not-implemented exception. Right now, we are just using types to play with the shape of our problem and help us flesh out what we want to happen.

Now let's do the same with making a function that makes an http request and also parses the response. Again, we just want to deal with a single element. We are doing the smallest thing we can possibly do.

// fn that makes an http request and attempts to parse the response
def getAndParseBatch(batch: BatchRequest): Future[Either[ServiceError, List[ParsedResponse]]] = ???

Now we have all the pieces to implement the top level of our program. It doesn't matter that we don't have implementations. We just want to line up the shape of our types. We will do the straight forward approach to surface our ugly type signature first:

// I've put in the types for clarity 
def requestBatch(requests: List[Request], 
  batchSize: Int): Future[Either[ServiceError, List[ParsedResponse]]] = {
  val batches: List[BatchRequest] = requests.grouped(batchSize).toList.map(BatchRequest.apply))
  val result: List[Future[Either[ServiceError, List[ParsedResponse]]]] =
    batches.map(getAndParseBatch)
  ???
  }

Great. We've surfaced our ugly type. This will compile and typecheck. We don't care that it explodes at runtime right now. We haven't done any implementations but we have described the nature of our problem with types. Notice what else has happened here: we've worked from the inside out. When tackling problems like this, I find this is better approach compared to making intermediate variables from the outside-in.

Before going further into the solution, let's talk about what contracts get implied by the types encountered.

Type signatures and contracts

The types are telling us a lot about the contract. Yes, yes, I setup the problem but they give you a map of what the contract might be. And if they aren't signalling the contract then it is a smell to address. If teammates are giving misleading type signatures then it's a teaching opportunity because clear intentions are for readability and maintainability.

Looking at the first exercise we have

Future[Either[ServiceError, List[ParsedResponse]]]

Sure this looks ugly but it's telling us a lot:

We have some non-blocking action going on
We have an IO channel (Future) and an (Error) channel
The error channel is interesting: it's Either[ServiceError, List[ParsedResponse]]
- we either get an error, or (presumably) all the answers
  - If the type was List[Either[ServiceError, ParsedResponse]] then we would care about every individual parsing error. But since we can only have one error, we must be caring only about the first
- our caller cares about the error to recover from
  - if they didn't, then using the error channel of the Future would be fine to handle all errors (including parsing)
There is still room for improvement
- the type could be Futuer[Either[ServiceError, NonEmptyList[ParsedResponse]]]

This is a lot of info! If we don't like this contract, we can shift it. This is what we do in later exercises. The dual error channel is an interesting problem. There are pros and cons to this approach and the industry seems pretty mixed on it. My rule of thumb:

Use Future[Either[E, R]] to separate out channel related exceptions from business logic errors:

Keep channel errors inside the failure channel of Future. In our case, this is network errors. It could be file io errors or whatever
Keep business logic errors inside the Either channel. In our case, this is parsing errors. I.e. out network call was successful, but our response was something we didn't expect. In a real program this could be some business rule violation.

There are caveats though. If no one is going to recover from the business logic error and a failure of the IO channel and a business failure both end up doing the same thing, don't bother separating them out. I.e., if this was a webserver and they are both going to return error jsons, then Future[NonEmptyList[ParsedResponse]] is probably the more clear response. There is a small discussion the pros and cons of both approaches in Practical FP in scala

You can do the same exercise of sussing out the contract with the remainder of the types.

Sequence and Traverse

The first tool to learn when tackling these gnarly types is Sequence. Or Traverse. One is defined in terms of the other. These two abilities allow you to invert two effects. That is, if you have an F[G[X]] you can get a G[F[X]]. It's ok to think about it in simple terms like this. A more full definition can be found in Haskellbook: "Traversable allows you to transform elements inside the structure like a functor, producing applicative effects along the way, and lift those potentially multiple instances of applicative structure outside of the traversable structure. It is commonly described as a way to traverse a data structure, mapping a function inside a structure while accumulating the applicative contexts along the way." Luka Jacobowitz has a number of talks, like this one, on traverse

Like I said, it lets you invert two effects (heh).

You may be familiar with this already from the Scala standard library as these two functions exist for Future. Cats expands the amount of types defined so that they are more broadly useful.

This only works if my effect / container / box / whatever makes sense in your head has an instance of the Traversable typeclass:

trait Traverse[F[_]] extends Functor[F] {
  def traverse[G[_]: Applicative, A, B](fa: F[A])(f: A => G[B]): G[F[B]]

  def map[A, B](fa: F[A])(f: A => B): F[B] =
    traverse(fa)(a => Id(f(a))).value
}

So traverse applies some effectful function and turns things inside out. The interesting thing is say, we have a list of effectful functions then well, they are already applied effectful functions! We can traverse using identity to turn the effects inside out:

List(1.some, 2.some).sequence == List(1.some, 2.some).traverse(identity)

In both cases we are turning a List[Option[Int]] into a Option[List[Int]]. The latter is much simpler to unroll even further via pattern matching into some other result at the end of the world. Note that there is some structure preservation in the same way Functor doesn't destroy structure. If the outer list has 5 elements, then the inner list must also have 5 elements.

Now that we know about traverse we can easily solve the exercises. It's up to you how to fill in the missing types. It doesn't really matter if we work top down or bottom up here. Some problems may. Since I'm trying to show type-lead design, let's start from the top and get the shape of our problem solved and then go and flesh out the building blocks. Remember we were here:

// I've put in the types for clarity 
def requestBatch(requests: List[Request], 
  batchSize: Int): Future[Either[ServiceError, List[ParsedResponse]]] = {
  val batches: List[BatchRequest] = requests.grouped(batchSize).toList.map(BatchRequest.apply))
  val result: List[Future[Either[ServiceError, List[ParsedResponse]]]] =
    batches.map(getAndParseBatch)
  ???
  }

Moving this to use traverse:

def requestBatch(requests: List[Request], batchSize: Int): Future[Either[ServiceError, List[ParsedResponse]]] = {
    val batches = requests.grouped(batchSize).toList.map(BatchRequest.apply)
    batches.traverse(getAndParseBatch)
}

But this won't compile. We get stuck in the following type:

Future[List[Either[ServiceError, List[ParsedResponse]]]]

This is close. But we're stuck with an extra List that we don't want. What we want is to combine each element of the list if it's a Right, but if it's a Left we want to fail out early with that error.

We need another tool in our toolbox to collapse those lists together.

Help from Monoids

The problem is we need to combine the outer List with the inner List. What people reach for first here is usually one of two things:

Some listBuffer and a foreach loop with a pattern match to unpack everything and pack it all back into the type they want
If they are slightly more FP aware, a fold using val acc: Either[ServiceError, List[ParsedResponse]] = List().asRight[ServiceError] as the accumulator:
- if the element is a Left, abort early and return that error
- if the element is a Right, combine right the existing Right

But from Part 3 - Typeclasses) this looks a lot like a Monoid. We want to key into some plumbing and higher abstractions and do less work. We learned about combineAll for generically folding a list of monoids. A refresher:

List(1.some, 2.some, 3.some).combineAll
// Some(6) 
List(1.some, 2.some, None, 3.some).combineAll
// Some(6), remember the Monoid for [Int] treats None as 0

But we're not an Option. We're an either, which behaves a bit different in the Left case than then None case for Option in exactly the way we want:

List(1.asRight[String], 2.asRight[String]).combineAll
// result: Either[String, Int] = Right(3) 
List(1.asRight[String], "boop".asLeft[Int]).combineAll
// result: Either[String, Int] = Left("boop")
// very different than option!

Combing this with traverse gives us what we want using Monoid for Either[E, A] and the Monoid for List[A]:

val a = List(1).asRight[String]
val b = List(2).asRight[String]
val c = "bop".asLeft[List[Int]]

val good = List(a, b).combineAll
// result: Either[String, List[Int]] = Right(List(1,2)) 

val bad = List(a,b,c).combineAll
// result: Either[String, List[Int]] = Left("bop")

Now we have what we want to solve exercise 1:

def requestBatch(requests: List[Request], batchSize: Int): Future[Either[ServiceError, List[ParsedResponse]]] = {
    val batches = requests.grouped(batchSize).toList.map(BatchRequest.apply)
    batches.traverse(getAndParseBatch).map(_.combineAll)
}

I think this is really neat. When I solved this, I fleshed out the types of the other two functions, left their implementation as ??? and solved the shape of my requestBatch function before going on to do the rest of the implementations. I had all the types lined up as a map and just had to follow the types to do the implementations.

Continuing on for getAndParseBatch which is straight forward. Note we could have inlined this in requestBatch and that would have been fine:

  def getAndParseBatch(batch: BatchRequest): Future[Either[ServiceError, List[ParsedResponse]]] =
    Http.getBatch(batch).map(parseResponse(_, ParsedResponse.parser))

Traverse makes an appearance again when writing our parseResponse. When we expand BatchResponse and map an effectful function over it, we have the types inside out again:

private def parseResponse(responses: BatchResponse,
    parser: String => Try[ParsedResponse]): Either[ServiceError, List[ParsedResponse]] =
        responses
            .responses
            .traverse(raw => parser(raw.value).toEither)
            .leftMap(e => ServiceError(e.getMessage)))

We sorted out the shape of our problem using just types and then worked through implementations. This can be a great way to code. Figure out what kind of contract you want to express, put it in types, then fill in the blanks. Honestly, it feels like programming on easy mode.

Remaining exercises

The remaining exercises are variations of the first:

exercise 2: Fail if future fails, but if parsing fails that's ok. Return Future[List[ParsedResponse]]
exercise 3: Fail if future fails, fail if parsing fails but collect all the parsing failures. Return is Future[ValidatedNel[ServiceError, List[ParsedResponse]]]
exercise 4: Like exercise 2 but network failures are also ignored. Return is Future[List[ParsedResponse]]
exercise 5: If errors happen, either network or parsing error, collect all of them. Return is Future[(List ServiceError), List[ParsedResponse])]. There are many better ways to tackle this one.

If you attempted all of these, great! If you look at my solutions you will notice that we managed to change a lot of business behavior without really changing much code. The requestBatch function doesn't actually change between all five exercises other than changing the return type. What I'm really doing is playing with types and letting a few functions do all the work.

Exercise 2

We change parseResponse to return a List of responses, possibly empty (if nothing parses). We don't change the other two functions except to change the return type. We let traverse and combineAll do all the work.

Exercise 3

We change parseResponse slightly to convert Either[ServiceError, List[ParsedResponse]] to ValidatedNel[ServiceError, List[ParsedResponse]] but otherwise this function is the same. We update the type signatures. The applicative instance, and thus the traverse for Validated handles the rest by combining our failures for us. This is a very useful type.

Exercise 4

as per Exercise 2 except we convert Future.failed to Future.success(List()).
there are other ways we could have done this

Exercise 5

Not a great solution but it works. You could also look into the IOR type or a case class that holds both values with a custom monoid definition. Either doesn't work because it's a disjunction. We use a tuple and rely on the fact for any type (A, B) then we have a Monoid[(A, B)] provided that both A and B have a monoid.

To me, having these types guide the refactoring by just playing with type signatures and relying on the underlying type capabilities (can be combined, can be traversed) do the work for me is really powerful. We kept referential transparency, wrangled some gnarly types, and easily manipulated our code in a few lines. But these types are not a silver bullet for capturing all behavior --you still need to read code. They can give you an intuition of what's going on but it's not 100%. For example, Exercise 2 and Exercise 4 have the same return type but have different underlying business logic.

Future / Task / IO and Referential Transparency

All the solutions use Future as an effect. There is a problem though: Future is not referentially transparent. Referentially transparency doesn't mean that the answer we get is always the same (this viewpoint will come in handy shortly), it means that we can replace a value with it's bound expression and get the same result. For example:

val ex1 = { 
  val r = new Random(0L)
  val f = Future(r.nextInt)
  for { 
    a <- f
    b <- f
  } yield (a, b) 
}

// Same as ex1, but we replace f 
// with it's bound expression
val ex2 = { 
  val r = new Random(0L)
  for { 
    a <- Future(r.nextInt)
    b <- Future(r.nextInt)
  } yield (a, b) 
}

ex1.onComplete(println)
ex2.onComplete(println)

Are ex1 and ex2 the same? No! Because Future is not referentially transparent. Ex1 will assign (a,b) to be the same answer since r is only evaluated once and then memoized. In the second example, (a,b) will be different. We can't substitute an expression for the bound value.

Future has other weirdness going on too. It represents an already running computation and is actually hard to kill a running future. It leads to a lot of cache missing as well. Mapping a Future-effecting function over a list of values will potentially run each function on a different thread. This may not be what you want.

This has implications for traverse. I was reviewing a PR earlier, and someone had traversed an effectful future with a comment stating "return the results in order". Some alarm bells went off because this is a subtle bug if you aren't familiar with how Futures execute and don't look too deeply into traverse. Let's look at the following:

def futureThing(i: Int)(implicit r: Random, ec: ExecutionContext): Future[Int] = {
  Future {
    Thread.sleep(r.nextInt(250).toLong)
    println(s"processing $i")
    i
  }
}

val x = List(1,2,3,4,5)
val res = Await.result(x.traverse(Blah.futureThing), 10.seconds)
println(res)

On my computer, for one particular run I have the following output:

processing 3 
processing 1 
processing 5
processing 4
processing 2 
List(1,2,3,4,5)

The person had introduced a bug in that PR. Yes the result is returned in order but it's not necessarily executed in order. Remember, a Future is an already running computation. If we map over a list and apply a Future effecting function, these are now running on different threads. The result is ordered as the input-order, but the execution of them is interleaved. Traverse is really traversable functor so it's no surprise that the behavior is the same here.

Because the results were ordered, the person assumed that the execution was also ordered. In their case, since they were using Future and wanted to execute sequentially over the list then they really wanted to use fold or foldM (the monadic version where your accumulator is inside an effect)

If we take Task from Monix or IO from cats-effect or ZIO well things are a bit different:

def taskThing(i: Int)(implicit r: Random): Task[Int] = {
  val delay = FiniteDuration(r.nextInt(250).toLong, TimeUnit.MILLISECONDS)
  Task {
    println(s"processing $i")
    i
  }.delayExecution(delay)
}

val x = List(1,2,3,4,5).traverse(taskThing).runToFuture

val res = Await.result(x, 10.seconds)
println(res)

No matter how many times you run this, you will get the results processed in order:

processing 1
processing 2
processing 3
processing 4
processing 5
List(1,2,3,4,5)

The effects are always processed in order. In one thread*. Note that the end result is the same between the Future and Task version. If we want to run this as fast as possible, on as many threads as we can get from the execution context then we need to replace traverse with parTraverse for parallel traverse. By going more functional, we are actually being more explicit with what's going on. It's much easier to recognize when things are running in parallel using ZIO/IO/TASK than having to remember that Future is kind of weird.

Side-effects and Referential Transparency

The big question is why is Task (or IO or ZIO) so different than Future. It turns out that Task and friends are all referentially transparent. Yes. They give us the ability to do referentially transparent side-effects. Referentially transparency has nothing to do with subsequent DB calls, for the same key, returning the same result. We know that the state of the world may have changed between those two calls. All referential transparency gives us is that we can replace a value with it's bound expression. For me, this was the first big mental leap I had to get over when really getting into FP.

There is a large reddit post on this topic that is worth reading. What it really boils down to is the difference between evaluation and execution. In the case of IO/Task/Zio we aren't building up a stack of function calls like we are with Future. Instead we are building up set of data structures that describe what we are doing. We can make them until the cows go home and nothing happens until we execute (interpret) these data structures. Step into the source for Monix Task and you'll find out that flatMap is actually a case class called FlatMap:

  /** [[Task]] state describing an immediate synchronous value. */
  final case class Eval[A](thunk: () => A) extends Task[A]

  /** Internal state, the result of [[Task.defer]] */
  final case class Suspend[+A](thunk: () => Task[A]) extends Task[A]

  /** Internal [[Task]] state that is the result of applying `flatMap`. */
  final case class FlatMap[A, B](source: Task[A], f: A => Task[B]) extends Task[B]

  /** Internal [[Coeval]] state that is the result of applying `map`. */
  final case class Map[S, +A](source: Task[S], f: S => A, index: Int)
    extends Task[A] with (S => Task[A])

So how IO is referentially transparent is really just a stupid trick. When we substitute a value for it's expression, all we are saying is that we can build the same data structure. The fact that we may get a different answer between subsequent calls during interpretation of that data structure has nothing to do with referential transparency.

The following definitions may be useful:

evaluation: refers to what scala does aka strict evaluation. Val vs def. By name vs. by value
execution: refers to what you want your program to do when run
evaluating: unsafeRunSync causes the execution of the IO program

Evaluating an IO expression does not cause execution. This is what we mean when we say IO is pure.

def foo(i: Int): Task[Int] {
    println("evaluation")
    Task(println(s"execution: $i"))
}
val triggersEvaluation: Task[Int] = foo(1)
val triggersExecution = triggersEvaluation.unsafeRunSync

In foo we have some side-effect being evaluated outside of our data structure. It gets printed when we evaluate foo. But when we evaluate foo we are really building a data structure called Task. If we ended our program without ever executing anything, the Task is not interpretted. Whatever side-effect is embedded in the Task (in our case, printing to the screen but it could be DB update or whatever) is never run. Only when we execute (interpret) our data structure at the end of the world via unsafeRunSync or runtoFuture or others, does this actually happen.

IO, Task, and ZIO are actually runtimes. They build up data structures that are then interpretted in a m:n threaded environment built on top of JVM threads. In this environment, threads are cheap and simple to cancel. There is a whole world of referentially transparent functional concurrency primitives that this opens up. I would recommend this talk by Gavin Busesi before looking up all of Fabio Labella's talks in this domain. The folks in cats-effect gitter are also helpful. Though different, there is a lot of overlap between IO and ZIO such that learning one will get you going in the other. If you are interested in the runtimes themselves, this talk on implementing a ZIO runtime and building an effect system is an amazing talk.