How to avoid correctness space leaks in a lazy setting in Haskell
Some posters on HN made the observation that in my previous post I did not give general advice on how to avoid space leaks. This is especially important given that I conceded the point that there exists a cluster (b) of space leaks that are a correctness concern.
The tools at our disposal come in two varieties:
- Defensive coding patterns
- Profiling the running program
I think the first variety is the most important: we want to avoid problems in the first place. To justify and discover the space leaks we will use a combination of by-hand expansions and analysis of the STG output of a piece of code. The STG code will be used as the operational model, even though that is strictly the case only for GHC-compiled Haskell code. It is confusing at first but not difficult, and learning to read it has a number of benefits:
- All allocations are explicit in let bindings, especially thunks. Laziness is explicit.
- All evaluation occurs in case expressions.
- Free variables of closures are explicitly provided.
- No parentheses: every object is explicitly constructed, given a name, and passed around by that name.
- All applications are saturated.
Variables have odd names, though; we will get used to them :-) .
1. A classification of space leaks
There are other classifications of space leaks. I will use my own taxonomy and restrict myself to the leaks that generally cause correctness concerns. We end up with three different kinds:
- Strictness leaks: when a recursive-ish function generates a series of thunks that depend on each other and, upon being evaluated, are immediately reduced to a value. The number of thunks is positively correlated with the amount of recursion.
- Liveness leaks: when a reference to a big value r is kept alive, instead of dead, from the point of view of the GC because it is referenced by the thunk of a pipeline. Once the pipeline value is forced, the value is marked dead and collected.
- Excessive sharing leaks: a generator is shared by two different pipelines, and the realized size of the generator is big. Once the first pipeline evaluates the generator, the realized value will remain in memory until the second pipeline consumes it.
This classification is useful because strictness leaks are a local program
property: they are present in the recursive function definition or expected
in functions that bind expressions in an eDSL, such as (>>=). Liveness leaks
are a global program property, though; they require upfront planning to avoid.
Excessive sharing leaks require care when dealing with generators (which are
big data structures when realized).
2. Avoiding strictness leaks
These occur in:
- Explicitly recursive functions
- Binding functions of an eDSL, such as (>>=)
- Functions that are expected to be called repeatedly by the programmer, such as (++).
You should be conscientious about these kinds of functions when writing them.
The classic example is the foldl function. The leak is completely contained
in the function definition and is clear from expanding the function by hand.
The STG corroborates what we can work out by hand:
module Gato where

myFoldl :: (a -> b -> a) -> a -> [b] -> a
myFoldl f acc []       = acc
myFoldl f acc (b : bs) = myFoldl f (f acc b) bs
results in the following STG compiled with ghc -c -O1 -ddump-stg-final
-ddump-to-file -fforce-recomp
Rec {
Gato.myFoldl [Occ=LoopBreaker]
  :: forall a b. (a -> b -> a) -> a -> [b] -> a
[GblId, Arity=3, Str=<LCL(C1(L))><L><1L>, Unf=OtherCon []] =
    {} \r [f_s1fN acc_s1fO ds_s1fP]
        case ds_s1fP of {            -- Only evaluation
          [] -> acc_s1fO;
          : b1_s1fR [Occ=Once1] bs_s1fS [Occ=Once1] ->
              let {                  -- Thunk allocation
                sat_s1fT [Occ=Once1] :: a_aNS
                [LclId] =
                    -- {acc_s1fO, b1_s1fR, f_s1fN} are free variables (and
                    -- references) of an updatable (\u) closure.
                    {acc_s1fO, b1_s1fR, f_s1fN} \u [] f_s1fN acc_s1fO b1_s1fR;
              } in Gato.myFoldl f_s1fN sat_s1fT bs_s1fS;
        };
end Rec }
We see that the only evaluation is the pattern matching on the list. No
other reduction is done in the recursive case of this function. The thunk is
passed in the recursive call, so on the next iteration acc_s1fO will be a
thunk and it will be wrapped again.
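A by-hand expansion of a small call shows the same thing: the accumulator grows a chain of nested thunks that is only reduced once the list is exhausted.
myFoldl (+) 0 [1, 2, 3]
  = myFoldl (+) (0 + 1) [2, 3]
  = myFoldl (+) ((0 + 1) + 2) [3]
  = myFoldl (+) (((0 + 1) + 2) + 3) []
  = ((0 + 1) + 2) + 3    -- the whole chain is forced only here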
The fix is well known: the thunk can be forced before doing the recursive
call, freeing the free variables {acc_s1fO, b1_s1fR, f_s1fN} and passing a
value to the next recursive call. This is a general pattern for functions
that should be tail recursive. We can do this with seq.
myFoldl2 :: (a -> b -> a) -> a -> [b] -> a
myFoldl2 f acc []       = acc
myFoldl2 f acc (b : bs) = let t = f acc b
                          in t `seq` myFoldl2 f t bs
Gato.myFoldl2 :: forall a b. (a -> b -> a) -> a -> [b] -> a
[GblId, Arity=3, Unf=OtherCon []] =
    {} \r [f_s15n acc_s15o ds_s15p]
        case ds_s15p of {
          [] -> acc_s15o;
          : b1_s15r [Occ=Once1] bs_s15s [Occ=Once1] ->
              case f_s15n acc_s15o b1_s15r of t_s15t [Occ=Once1] {
                __DEFAULT -> Gato.myFoldl2 f_s15n t_s15t bs_s15s;
              };
        };
Here we have two case expressions and no allocation generated, nice.
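The same fix can be spelled with a bang pattern on the accumulator, which is morally what foldl' from Data.List does. A small sketch; myFoldl3 is my own name:
{-# LANGUAGE BangPatterns #-}

-- The bang forces the accumulator on every call, so at most one shallow
-- thunk is alive at a time instead of a chain proportional to the input.
myFoldl3 :: (a -> b -> a) -> a -> [b] -> a
myFoldl3 _ !acc []       = acc
myFoldl3 f !acc (b : bs) = myFoldl3 f (f acc b) bs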
2.1. Repeatedly used functions over the same expressions
By these I mean functions such as (>>=) or mappend. These bind different
expressions, so we can see many applications of them in the same pipeline.
The most interesting example here is the bind of the strict writer monad.
instance (Monoid w, Monad m) => Monad (WriterT w m) where
    return a = writer (a, mempty)
    m >>= k  = WriterT $ do
        (a, w)  <- runWriterT m
        (b, w') <- runWriterT (k a)
        return (b, w `mappend` w')
has the following STG code
$c>>=_rKER
  :: forall w (m :: * -> *) a b.
     (GHC.Base.Monoid w, GHC.Base.Monad m) =>
     Control.Monad.Trans.Writer.Strict.WriterT w m a
     -> (a -> Control.Monad.Trans.Writer.Strict.WriterT w m b)
     -> Control.Monad.Trans.Writer.Strict.WriterT w m b
[GblId, Arity=4, Unf=OtherCon []] =
  {} \r [$dMonoid_sKLr $dMonad_sKLs eta_sKLt eta1_sKLu]
    let {
      sat_sKLK [Occ=Once1] :: m_aKdo (b_aKdA, w_aKdn)
      -- A wrapper over f or eta1_sKLu
      [LclId] =
        {$dMonoid_sKLr, $dMonad_sKLs, eta1_sKLu, eta_sKLt} \u []
          let {
            sat_sKLJ [Occ=Once1]
              :: (a_aKdz, w_aKdn) -> m_aKdo (b_aKdA, w_aKdn)
            [LclId] =
              {$dMonoid_sKLr, $dMonad_sKLs, eta1_sKLu} \r [ds_sKLx]
                case ds_sKLx of {
                  (,) a1_sKLz [Occ=Once1] w1_sKLA [Occ=OnceL1] ->
                    let {
                      sat_sKLI [Occ=Once1]
                        :: (b_aKdA, w_aKdn) -> m_aKdo (b_aKdA, w_aKdn)
                      [LclId] =
                        {$dMonoid_sKLr, w1_sKLA, $dMonad_sKLs} \r [ds1_sKLC]
                          -- Actual thunk using mappend
                          case ds1_sKLC of {  -- captures w1_sKLA
                            (,) b1_sKLE [Occ=Once1] w'_sKLF [Occ=Once1] ->
                              let {
                                sat_sKLG [Occ=Once1] :: w_aKdn
                                [LclId] =
                                  {$dMonoid_sKLr, w1_sKLA, w'_sKLF} \u []
                                    GHC.Base.mappend $dMonoid_sKLr w1_sKLA w'_sKLF;
                              } in
                                let {
                                  sat_sKLH [Occ=Once1] :: (b_aKdA, w_aKdn)
                                  [LclId] =
                                    CCCS (,)! [b1_sKLE sat_sKLG];
                                } in GHC.Base.return $dMonad_sKLs sat_sKLH;
                          };
                    } in
                      let {
                        sat_sKLB [Occ=Once1] :: m_aKdo (b_aKdA, w_aKdn)
                        [LclId] =
                          {eta1_sKLu, a1_sKLz} \u [] eta1_sKLu a1_sKLz;
                      } in
                        GHC.Base.>>= $dMonad_sKLs sat_sKLB sat_sKLI;  -- Thread the thunk sat_sKLj at the end.
                };
          } in GHC.Base.>>= $dMonad_sKLs eta_sKLt sat_sKLJ;
    } in
      let {
        sat_sKLw [Occ=Once1]
          :: m_aKdo (b_aKdA, w_aKdn)
             -> Control.Monad.Trans.Writer.Strict.WriterT w_aKdn m_aKdo b_aKdA
        [LclId] =
          {} \r [ds_sKLv] ds_sKLv;
      } in GHC.Base.$ sat_sKLw sat_sKLK;
That thunk at the end is the problem: it is unforceable, we cannot use
seq to force its evaluation because it happens at the end of the sequence
of (>>=). The STG clarifies the operational aspect of this code, making
the thunk at the end explicit.
You have to ask yourself with these kinds of functions: what happens if we
call (>>=) multiple times? For example, let us expand the following
definition by hand:
m >>= \a -> f a >>= g
which results in
m >>= \a -> f a >>= g
  = WriterT $ do
      (a, w)  <- runWriterT m
      (b, w') <- runWriterT $ WriterT $ do
          (a1, w1)  <- runWriterT (f a)
          (b1, w1') <- runWriterT (g a1)
          return (b1, w1 `mappend` w1')
      return (b, w `mappend` w')
Now we have two thunks, following the point we made in the previous paragraph.
The more (>>=) we end up using, the more unforceable thunks at the end we
will have. This is why you should use neither the strict nor the lazy writer
transformer; for the common use case, use a logging library instead.
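If what you actually need is a pure accumulator threaded through a monadic computation, one alternative I reach for is the strict state transformer with modify', which seqs the new state at every step so the combination happens as we go instead of piling up at the end. A minimal sketch; tell' is a hypothetical helper of mine, not a library function:
import Control.Monad.Trans.State.Strict (StateT, modify')

-- modify' forces the new accumulator to WHNF at each step; for monoids
-- whose mappend does its work when forced to WHNF (counters, strict text,
-- strict maps) no chain of mappend thunks survives to the end.
tell' :: (Monoid w, Monad m) => w -> StateT w m ()
tell' w = modify' (`mappend` w)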
2.2. Using cost centres and one weird trick to catch them at run-time
Cost centres inserted by cabal build --enable-profiling tag every top-level
definition of our program and give us nice statistics of the memory usage
and allocation done by the wrapped expression. There is just too much info
to extract a good signal from all the noise, though. For this exact reason
I prefer to insert SCC annotations by hand, like this:
instance (Monoid w, Monad m) => Monad (WriterT w m) where
    return a = writer (a, mempty)

    {-# SCC >>= "myBind" #-}
    m >>= k = WriterT $ do
        (a, w)  <- runWriterT m
        (b, w') <- ({-# SCC "bind_mine" #-} runWriterT (k a))
        return (b, w `mappend` w')
and just collect that info from test programs.
Apart from this, there is a really nice technique described by Neil Mitchell to discover precisely this kind of leak. Eventually the thunks will be evaluated in a deep recursive call tree; if we limit the depth that the RTS can use for this tree, it will error out and print an exception with the line and function that triggered it. This has worked for big programs in the wild, and it is something to do once in a while on your own programs.
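A sketch of the recipe, assuming a test build of Main.hs (the module name is a placeholder): compile with ghc -prof -fprof-auto -rtsopts Main.hs so exceptions carry cost-centre stacks and RTS options are accepted, then run with ./Main +RTS -K1K -xc -RTS so that any long chain of thunk evaluations overflows the tiny stack limit early. If legitimate deep recursion also trips the limit, raise -K until only the leaky chains do.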
3. Avoiding liveness leaks
Long-lived data structures that are not part of a pipeline should be
forced to a value in order to free the captured free variables. These leaks
commonly manifest when storing a computation with put in a state monad,
in an MVar, or in another reference type. The compiler cannot place demands
on these values because it is opaque to it when (and if) they will be
demanded, so it has to fall back completely on lazy semantics.
Here the general solution is to have every pipeline married to a good consumer. In particular this means having strictness annotations on the result value of a pipeline. How deep that annotation needs to be is context specific.
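For instance, when the long-lived structure lives in an MVar, the defensive pattern is to force the new value before writing it back, instead of storing the update as a thunk. A minimal sketch with illustrative names, assuming a Data.Map.Strict map as the stored state:
import Control.Concurrent.MVar (MVar, modifyMVar_)
import qualified Data.Map.Strict as M

-- ($!) forces the updated map before it is stored back; without it the
-- MVar would hold a growing chain of insert thunks that also keep the
-- old maps and the inserted values alive.
record :: MVar (M.Map Char Int) -> Char -> Int -> IO ()
record ref k v = modifyMVar_ ref (\m -> return $! M.insert k v m)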
Given that these leaks are a global program property, it is better to avoid them with the previous principle than to diagnose them after the fact. Still, call-site cost centres work best to find the possible culprits. For example, in
import Data.Map.Strict
import qualified Data.ByteString as B

comp :: StateT (Map Char Int) IO ()
comp = do
    r <- liftIO (B.readFile "/path/to/big/file")
    -- f, g and h are some pure pipeline stages (not shown)
    let a :: Map Char Int
        a = ({-# SCC "my_pipeline" #-} (f . g . h) r)
    put a
the value r is read completely from the file (we won't use lazy IO).
That reference is syntactically discarded at the end of comp, yet the
closure/thunk of a keeps a live reference to the value of r, so the GC
cannot collect it. With that set cost centre (SCC) annotation we can measure
how much total memory is attributed to this pipeline over time.
Setting them on call sites instead of top-level identifiers is a powerful
tool that is not commonly used.
The solution is to have a good consumer for the output value of the pipeline
bound to a. A single deepseq in comp can fix this issue. We need deepseq
here instead of seq given that Data.Map.Strict has layers that need to
be evaluated too.
import Data.Map.Strict
import qualified Data.ByteString as B
import Control.DeepSeq

comp :: StateT (Map Char Int) IO ()
comp = do
    r <- liftIO (B.readFile "/path/to/big/file")
    let a :: Map Char Int
        a = (f . g . h) r
    a `deepseq` put a
This is the most common space leak in programs that are organized around an event loop. Here is an example of a leak I solved in xmonad-contrib a year ago.
4. Avoiding excessive sharing leaks
These leaks are not that common in real programs, but they are pervasive in Project Euler style problems. They involve generators: closures for big data structures that should not be evaluated in full, but rather consumed layer by layer in a lazy fashion. If a generator is passed to two different functions that evaluate it completely, the full realization will be stored on the heap until the second function finishes.
The classic example here is the mean function.
mean :: (Fractional a, Foldable t) => t a -> a
mean xs = sum xs / fromIntegral (length xs)
A good signal that this code has a leak is that it takes a Foldable and uses it more than once.
The remedies come from three different approaches.
4.1. Merge multiple passes over the data structure into a single pass
My preferred approach. You need the foldl package to make it ergonomic. The generator is only traversed once, with the right evaluation to avoid strictness leaks, as sketched below.
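A minimal sketch with Control.Foldl from the foldl package: the two folds are combined with the Applicative instance and the structure is traversed exactly once (the package also ships a ready-made mean fold, if memory serves).
import qualified Control.Foldl as L

-- Sum and length are computed in the same single pass, so the realized
-- structure is never needed twice.
mean :: (Fractional a, Foldable t) => t a -> a
mean = L.fold ((/) <$> L.sum <*> L.genericLength)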
4.2. Move the creation of the generator inside the function and wrap it in a dumb function
This means doing something like this:
mean :: Int -> Double
mean i = let gen () = [1 .. i]
         in fromIntegral (sum (gen ())) / fromIntegral (length (gen ()))
not pretty though.
4.3. Bite the bullet and use linear types
What would happen if we restricted the use of an identifier to just once in the above code?
{-# LANGUAGE LinearTypes #-}
module Gato where

import Data.Foldable

mean :: (Fractional a, Foldable t) => t a %1 -> a
mean xs = sum xs / fromIntegral (length xs)
Here we get the following error:
> ghci linear.hs
GHCi, version 9.2.6: https://www.haskell.org/ghc/  :? for help
Loaded GHCi configuration from /home/slack/.ghci
[1 of 1] Compiling Gato             ( linear.hs, interpreted )

linear.hs:7:6: error:
    • Couldn't match type ‘'Many’ with ‘'One’
        arising from multiplicity of ‘xs’
    • In an equation for ‘mean’:
          mean xs = sum xs / fromIntegral (length xs)
  |
7 | mean xs = sum xs / fromIntegral (length xs)
  |      ^^
Failed, no modules loaded.
Nice. How do we go on from here? We basically have to make the whole
consumption of xs explicit and only call linear functions on it. This is
made easier by using linear-base.
5. Conclusion
With these patterns in mind, you can avoid most space leaks that are a correctness concern. Sure, it requires care, but I was never of the opinion that lazy evaluation means you should not care about evaluation. Eventually code runs on a real computer; we need to care. I just think that these defensive coding patterns let me use lazy evaluation for its extra compositional power; it is a trade-off.
I would urge people to learn more about STG and what is behind the curtain. GHC has a bag of tricks to make common code performant while preserving non-strict semantics. Some of them, such as the worker/wrapper transformation, are difficult not to get excited about. We should use intermediate languages like STG as the operational model for Haskell programs in the real world. That will inform us of patterns to deal with space leaks more thoroughly.
Created: 2023-04-05 Wed 01:53