How to avoid correctness space leaks in a lazy setting in Haskell


Some posters on HN made the observation that in my previous post I did not give general advice on how to avoid space leaks. This is especially important given that I ceded the point that there exists a cluster (b) of space leaks that are a correctness concern.

The tools at our disposal come in two varieties:

  • Defensive coding patterns
  • Running program profiling

I think the first variety is the most important: we want to avoid problems in the first place. To justify and discover the space leaks we will use a combination of by-hand expansions and/or analysis of the STG output of a piece of code. The STG code will be used as the operational model, even though that is strictly the case only for GHC-compiled Haskell code. It is confusing at first but not difficult, and learning to read it has a number of benefits:

  • All allocations are explicit in let bindings, especially thunks. Laziness is explicit.
  • All evaluation occurs in case expressions.
  • Free variables of closures are explicitly provided.
  • No parentheses: every object is explicitly constructed, given a name and passed around by that name.
  • All applications are saturated.

Variables have odd names, though; we will get used to them :-).

1. A classification of space leaks

There are other classifications of space leaks. I will use my own taxonomy here, and restrict myself to the leaks that generally cause correctness concerns. We end up with three different kinds:

  1. Strictness leaks: a recursive-ish function generates a series of thunks that depend on each other, only to be immediately reduced to a value once evaluation is demanded. The number of thunks is positively correlated with the depth of the recursion.
  2. Liveness leaks: a reference to a big value r is kept alive, instead of being dead from the point of view of the GC, because it is referenced by the thunk of a pipeline. Once the pipeline value is forced, the value becomes dead and is collected.
  3. Excessive sharing leaks: a generator is shared by two different pipelines, and the realized size of the generator is big. Once the first pipeline evaluates the generator, the realized structure remains in memory until the second pipeline consumes it.

This classification is useful because strictness leaks are a local program property: they are present in the recursive function definition, or expected in functions that bind expressions in an eDSL, such as (>>=). Liveness leaks are a global program property, though; they require upfront planning to avoid. Excessive sharing leaks require care when dealing with generators (which are big data structures when realized).

2. Avoiding strictness leaks

These occur in:

  • Explicitly recursive functions
  • Binding functions of an eDSL such as (>>=)
  • Functions that are expected to be called repeatedly by the programmer such as (++).

You should be conscientious about these kinds of functions when writing them. The classic example is the foldl function. The leak is completely contained in the function definition and is clear in the function's expansion. The STG corroborates what we can derive by hand:

module Gato where

myFoldl :: (a -> b -> a) -> a -> [b] -> a
myFoldl f acc [] = acc
myFoldl f acc (b : bs) = myFoldl f (f acc b) bs

results in the following STG when compiled with ghc -c -O1 -ddump-stg-final -ddump-to-file -fforce-recomp:

Rec {
Gato.myFoldl [Occ=LoopBreaker]
  :: forall a b. (a -> b -> a) -> a -> [b] -> a
[GblId, Arity=3, Str=<LCL(C1(L))><L><1L>, Unf=OtherCon []] =
    {} \r [f_s1fN acc_s1fO ds_s1fP]
        case ds_s1fP of {                                 -- Only evaluation
          [] -> acc_s1fO;
          : b1_s1fR [Occ=Once1] bs_s1fS [Occ=Once1] ->
              let {                                       -- Thunk allocation
                sat_s1fT [Occ=Once1] :: a_aNS
                [LclId] =
                    -- {acc_s1fO, b1_s1fR, f_s1fN} are free variables (and
                    -- references) of an \u-ptable closure.
                    {acc_s1fO, b1_s1fR, f_s1fN} \u [] f_s1fN acc_s1fO b1_s1fR;
              } in  Gato.myFoldl f_s1fN sat_s1fT bs_s1fS;
        };
end Rec }

We see that the only evaluation is the pattern match on the list. No other reduction is done in the recursive case of this function. The thunk is passed to the recursive call, so in the next iteration acc_s1fO will be a thunk and it will be wrapped again.
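The by-hand expansion shows the same chain of thunks building up. Repeating the definition so the block is self-contained, with the expansion written out as comments:

```haskell
-- The leaky left fold from above.
myFoldl :: (a -> b -> a) -> a -> [b] -> a
myFoldl f acc []       = acc
myFoldl f acc (b : bs) = myFoldl f (f acc b) bs

-- By-hand expansion of myFoldl (+) 0 [1, 2, 3]:
--
--   myFoldl (+) 0 [1, 2, 3]
-- = myFoldl (+) (0 + 1) [2, 3]             -- (0 + 1) is a thunk (sat_s1fT in the STG)
-- = myFoldl (+) ((0 + 1) + 2) [3]          -- a new thunk referencing the previous one
-- = myFoldl (+) (((0 + 1) + 2) + 3) []
-- = ((0 + 1) + 2) + 3                      -- the whole chain is reduced only here
```

The accumulator grows linearly with the list; nothing is reduced until the base case returns it to a consumer.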

The fix is well known: that thunk allocation can be forced before doing the recursive call, liberating the free variables {acc_s1fO, b1_s1fR, f_s1fN} and passing a value to the next recursive call. This is a general pattern for functions that should be tail recursive. We can do this with seq.

myFoldl2 :: (a -> b -> a) -> a -> [b] -> a
myFoldl2 f acc [] = acc
myFoldl2 f acc (b : bs) =
  let t = f acc b
  in t `seq` myFoldl2 f t bs

which compiles to the following STG:

Gato.myFoldl2 :: forall a b. (a -> b -> a) -> a -> [b] -> a
[GblId, Arity=3, Unf=OtherCon []] =
    {} \r [f_s15n acc_s15o ds_s15p]
        case ds_s15p of {
          [] -> acc_s15o;
          : b1_s15r [Occ=Once1] bs_s15s [Occ=Once1] ->
              case f_s15n acc_s15o b1_s15r of t_s15t [Occ=Once1] {
              __DEFAULT -> Gato.myFoldl2 f_s15n t_s15t bs_s15s;
              };
        };

Here we have two case expressions and no allocation generated, nice.
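In practice we rarely write the seq by hand: Data.List.foldl' from base implements exactly this strict-accumulator pattern, and the BangPatterns extension expresses the same forcing more succinctly. A small sketch (sumStrict and myFoldl3 are illustrative names):

```haskell
{-# LANGUAGE BangPatterns #-}

import Data.List (foldl')

-- The library version of the strict left fold.
sumStrict :: [Int] -> Int
sumStrict = foldl' (+) 0

-- The same pattern with a bang on the accumulator: the ! plays the
-- role of the seq in myFoldl2, forcing acc before recursing further.
myFoldl3 :: (a -> b -> a) -> a -> [b] -> a
myFoldl3 _ !acc []       = acc
myFoldl3 f !acc (b : bs) = myFoldl3 f (f acc b) bs
```

Either form keeps the accumulator as a value instead of a growing chain of thunks.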

2.1. Repeatedly used functions over the same expressions

By these I mean functions such as (>>=) or mappend. These bind different expressions together, so we may see many applications of them in the same pipeline. The most interesting example here is the bind of the strict writer monad.

instance (Monoid w, Monad m) => Monad (WriterT w m) where
    return a = writer (a, mempty)

    m >>= k  = WriterT $ do
        (a, w)  <- runWriterT m
        (b, w') <- runWriterT (k a)
        return (b, w `mappend` w')

has the following STG code

$c>>=_rKER
  :: forall w (m :: * -> *) a b.
    (GHC.Base.Monoid w, GHC.Base.Monad m) =>
    Control.Monad.Trans.Writer.Strict.WriterT w m a
    -> (a -> Control.Monad.Trans.Writer.Strict.WriterT w m b)
    -> Control.Monad.Trans.Writer.Strict.WriterT w m b
[GblId, Arity=4, Unf=OtherCon []] =
    {} \r [$dMonoid_sKLr $dMonad_sKLs eta_sKLt eta1_sKLu]
        let {
          sat_sKLK [Occ=Once1] :: m_aKdo (b_aKdA, w_aKdn)                 -- A wrapper over f or eta1_sKLu
          [LclId] =
              {$dMonoid_sKLr, $dMonad_sKLs, eta1_sKLu, eta_sKLt} \u []
                  let {
                    sat_sKLJ [Occ=Once1] :: (a_aKdz, w_aKdn) -> m_aKdo (b_aKdA, w_aKdn)
                    [LclId] =
                        {$dMonoid_sKLr, $dMonad_sKLs, eta1_sKLu} \r [ds_sKLx]
                            case ds_sKLx of {
                            (,) a1_sKLz [Occ=Once1] w1_sKLA [Occ=OnceL1] ->
                            let {
                              sat_sKLI [Occ=Once1] :: (b_aKdA, w_aKdn) -> m_aKdo (b_aKdA, w_aKdn)
                              [LclId] =
                                  {$dMonoid_sKLr, w1_sKLA, $dMonad_sKLs} \r [ds1_sKLC] -- Actual thunk using mappend
                                      case ds1_sKLC of {                               -- captures w1_sKLA
                                      (,) b1_sKLE [Occ=Once1] w'_sKLF [Occ=Once1] ->
                                      let {
                                        sat_sKLG [Occ=Once1] :: w_aKdn
                                        [LclId] =
                                            {$dMonoid_sKLr, w1_sKLA, w'_sKLF} \u []
                                                GHC.Base.mappend $dMonoid_sKLr w1_sKLA w'_sKLF; } in
                                      let {
                                        sat_sKLH [Occ=Once1] :: (b_aKdA, w_aKdn)
                                        [LclId] =
                                            CCCS (,)! [b1_sKLE sat_sKLG];
                                      } in  GHC.Base.return $dMonad_sKLs sat_sKLH;
                                      }; } in
                            let {
                              sat_sKLB [Occ=Once1] :: m_aKdo (b_aKdA, w_aKdn)
                              [LclId] =
                                  {eta1_sKLu, a1_sKLz} \u [] eta1_sKLu a1_sKLz;
                            } in  GHC.Base.>>= $dMonad_sKLs sat_sKLB sat_sKLI; -- Threads sat_sKLI, which builds the mappend thunk sat_sKLG at the end.
                            };
                  } in  GHC.Base.>>= $dMonad_sKLs eta_sKLt sat_sKLJ; } in
        let {
          sat_sKLw [Occ=Once1]
            :: m_aKdo (b_aKdA, w_aKdn)
              -> Control.Monad.Trans.Writer.Strict.WriterT w_aKdn m_aKdo b_aKdA
          [LclId] =
              {} \r [ds_sKLv] ds_sKLv;
        } in  GHC.Base.$ sat_sKLw sat_sKLK;

That thunk at the end is the problem: it is unforceable, we cannot use seq to force its evaluation because it is built at the end of the sequence of (>>=). The STG clarifies the operational aspect of this code, making the thunk at the end explicit.

You have to ask yourself, with these kinds of functions: what happens if we call (>>=) multiple times? For example, let us expand by hand the following definition:

m >>= \a -> f a >>= g

which results in

m >>= \a -> f a >>= g =
  WriterT $ do
    (a, w)  <- runWriterT m
    (b, w') <- runWriterT $ WriterT $ do
        (a1, w1)  <- runWriterT (f a)
        (b1, w1') <- runWriterT (g a1)
        return (b1, w1 `mappend` w1')
    return (b, w `mappend` w')

Now we have two thunks, following the point made in the previous paragraph. The more (>>=) we end up using, the more unforceable thunks at the end we will have. This is why you should use neither the strict nor the lazy writer transformer; for the common use case, use a logging library instead.
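One way out, sketched here with a hypothetical tell' helper rather than any particular logging library, is to accumulate the log in a strict state monad: modify' forces each append to WHNF as it happens, instead of deferring an unforceable mappend chain to the end of the (>>=) sequence (for deeply nested monoids you may still want a deeper force):

```haskell
import Control.Monad.Trans.State.Strict (State, execState, modify')

-- Hypothetical logging helper: modify' forces the updated log at every
-- call, so no chain of pending mappend thunks accumulates across binds.
tell' :: Monoid w => w -> State w ()
tell' w = modify' (<> w)

logs :: State [String] ()
logs = do
  tell' ["step 1"]
  tell' ["step 2"]
```

Running execState logs [] produces the accumulated log, with each append already reduced while the computation ran.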

2.2. Using cost centres and one weird trick to catch them at run-time

Cost centres inserted by cabal build --enable-profiling tag every top-level definition of our program and give us nice statistics on the memory usage and allocation done by the wrapped expression. But there is just too much information to extract a good signal from all the noise. For this exact reason I prefer to insert SCC annotations by hand, like this:

instance (Monoid w, Monad m) => Monad (WriterT w m) where
    return a = writer (a, mempty)

    {-# SCC (>>=) "myBind" #-}
    m >>= k  = WriterT $ do
        (a, w)  <- runWriterT m
        (b, w') <- ({-# SCC "bind_mine" #-} runWriterT (k a))
        return (b, w `mappend` w')

and just collect that info in test programs.

Apart from this, there is a really nice technique described by Neil Mitchell to discover precisely this kind of leak. Eventually the thunks will be evaluated in a recursive call tree; if we limit the stack depth that the RTS can use for this evaluation, it will error out and print an exception with the line and function that triggered it. This has worked for big programs in the wild; it is something to do once in a while on your own programs.

3. Avoiding liveness leaks

Long-lived data structures that are not part of a pipeline should be forced to a value, to free their captured free variables. These leaks commonly manifest when storing a computation with put in a state monad, in an MVar, or in another reference type. The compiler cannot put demands on these values because it is opaque to it when (and whether) they will be demanded, so it has to fall back completely on lazy semantics.

Here the general solution is to have every pipeline married to a good consumer. In particular this means having strictness annotations on the result value of a pipeline. How deep that annotation needs to be is context specific.

Given that these leaks are a global program property, it is better to avoid them with the previous principle than to diagnose them. That said, call-site cost centres work best to diagnose the possible culprits. For example, in

import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.State.Strict (StateT, put)
import Data.Map.Strict (Map)
import qualified Data.ByteString as B

-- f, g and h stand for the stages of the pipeline
comp :: StateT (Map Char Int) IO ()
comp = do
  r <- liftIO (B.readFile "/path/to/big/file")
  let a :: Map Char Int
      a = ({-# SCC "my_pipeline" #-} (f . g . h) r)
  put a

the value r is the complete contents of the file (we won't use lazy IO). That reference is syntactically discarded at the end of comp, yet the closure/thunk of a keeps a live reference to the value of r, so the GC cannot collect it. With that set cost centre (SCC) we can measure how much total memory is attributed to this pipeline over time. Setting cost centres on call sites instead of top-level identifiers is a powerful tool that is not commonly used.

The solution is to have a good consumer for the output value of the pipeline in a. A single deepseq in comp can fix this issue. We need deepseq here instead of seq because Data.Map.Strict has layers that need to be evaluated too.

import Control.DeepSeq (deepseq)
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.State.Strict (StateT, put)
import Data.Map.Strict (Map)
import qualified Data.ByteString as B

comp :: StateT (Map Char Int) IO ()
comp = do
  r <- liftIO (B.readFile "/path/to/big/file")
  let a :: Map Char Int
      a = (f . g . h) r
  a `deepseq` put a

This is the most common space leak in programs organized around an event loop. Here is an example of a leak I solved in xmonad-contrib a year ago.

4. Avoiding excessive sharing leaks

These leaks are not that common in real programs, but they are pervasive in Project Euler-like problems. They involve generators: closures for big data structures that should not be evaluated in full, but instead consumed layer by layer in a lazy fashion. If a generator is passed to two different functions that evaluate it completely, the full realization will be stored on the heap until the second function finishes.

The classic example here is the mean function.

mean :: (Fractional a, Foldable t) => t a -> a
mean xs = sum xs / fromIntegral (length xs)

A good signal that this code has a leak is that it takes a Foldable and uses it more than once.

The remedies come from three different approaches.

4.1. Merge multiple passes over the data structure into a single pass

My preferred approach. You need to use the foldl package to make it ergonomic. The generator is only traversed once, with the right evaluation, avoiding strictness leaks along the way.
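The foldl package packages this idea as composable strict folds. As a dependency-free sketch of the same technique (mean1 is an illustrative name), mean can be written as a single traversal that accumulates a strict sum and count, so the list can be collected as it is consumed:

```haskell
{-# LANGUAGE BangPatterns #-}

-- Single-pass mean: one traversal over the list, accumulating a strict
-- (sum, count) pair instead of folding the structure twice.
mean1 :: [Double] -> Double
mean1 = finish . go 0 0
  where
    go :: Double -> Int -> [Double] -> (Double, Int)
    go !s !n []       = (s, n)
    go !s !n (x : xs) = go (s + x) (n + 1) xs

    finish (s, n) = s / fromIntegral n
```

The bangs on both accumulators matter: without them, merging the passes would just trade an excessive sharing leak for a strictness leak.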

4.2. Move the creation of the generator inside the function and wrap it in a dumb function

This amounts to something like this:

mean :: Int -> Double
mean i =
  let gen () = [1 .. i]
  in fromIntegral (sum (gen ())) / fromIntegral (length (gen ()))

Not pretty, though.

4.3. Bite the bullet and use linear types

What would happen if we restricted the use of an identifier to just once in the code above?

{-# Language LinearTypes #-}
module Gato where

import Data.Foldable

mean :: (Fractional a, Foldable t) => t a %1 -> a
mean xs = sum xs / fromIntegral (length xs)

Here we have the following error

> ghci linear.hs
GHCi, version 9.2.6: https://www.haskell.org/ghc/  :? for help
Loaded GHCi configuration from /home/slack/.ghci
[1 of 1] Compiling Gato             ( linear.hs, interpreted )

linear.hs:7:6: error:
    • Couldn't match type ‘'Many’ with ‘'One’
        arising from multiplicity of ‘xs’
    • In an equation for ‘mean’:
          mean xs = sum xs / fromIntegral (length xs)
  |
7 | mean xs = sum xs / fromIntegral (length xs)
  |      ^^
Failed, no modules loaded.

Nice. How do we go on from here? We basically have to make the whole consumption of xs explicit and only call linear functions on it. This is made easier by using linear-base.

5. Conclusion

With these patterns in mind, you can avoid most space leaks that are a correctness concern. Sure, it requires care, although I was never of the opinion that lazy evaluation means you should not care about evaluation. Eventually code will run on a real computer; we need to care. I just think that these defensive coding patterns let me use lazy evaluation for more compositional power; it is a trade-off.

I would urge people to learn more about STG and what is behind the curtain. GHC has a bag of tricks to make common code performant while preserving non-strict semantics; some of them, such as the worker/wrapper transformation, are difficult not to get excited about. We should use intermediate languages like STG as the operational model for Haskell programs in the real world. That will inform us of patterns to deal with space leaks more thoroughly.

Created: 2023-04-05 mié 01:53
