Presenting data-forced, an alternative to bang patterns for long-lived data structures
As a consequence of my previous post, I crafted a new way to deal with liveness leaks: data-forced. It relies on call-by-value (CBV) functions and on tagging values once we know they have been evaluated. It is not complete, and it can still be misused in ways that give no benefit, but it shows promise and offers stronger guarantees than the current status quo. Also, I am starting a new job next week, so I will have less time for hacking and writing.
1. The problem
Long-lived data structures that store thunks will keep the references needed to evaluate those thunks alive for a long time. This is a consequence of not pairing the denotation of the values with a good consumer. The current ways to deal with this in Haskell are:
- Use bang patterns and/or seq to force evaluation.
- Make laziness unrepresentable at the data type level.
Both have their pitfalls. With bang patterns and/or seq you can forget to put the bang at the binding site, and given the signature of seq, nothing will signal the missing evaluation. You can discipline yourself to never forget the bangs, but once you start collaborating with other people, you cannot rely on discipline alone.
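As a quick illustration of that failure mode (my own sketch, not code from the library): the leak appears as soon as a thunk referencing a large structure is stored, and the fix is a bang that nothing forces you to write.

{-# LANGUAGE BangPatterns #-}
import Data.Map.Lazy as ML

-- Leak: the unevaluated 'length xs' thunk keeps the whole list xs alive
-- for as long as the leaf sits in the map.
recordLength :: [Int] -> ML.Map Char Int -> ML.Map Char Int
recordLength xs m = ML.insert 'a' (length xs) m

-- Fix: a bang on an intermediate binding forces it at insertion time,
-- but nothing warns you if the bang is forgotten.
recordLength' :: [Int] -> ML.Map Char Int -> ML.Map Char Int
recordLength' xs m = let !len = length xs in ML.insert 'a' len m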
The second alternative helps more with strictness leaks than with liveness leaks. It requires rewriting our data structures (or using the strict-wrapper library) and the functions that operate over them. Code reuse suffers because of it.
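A minimal sketch of that second option (a hypothetical leaf type, not something from the post): strict fields make thunks unstorable, but force a bespoke type.

-- Strict fields: a thunk can never be stored here, so no references are
-- retained, but Maybe/Map/etc. cannot be reused directly for the leaves.
data StrictLeaf = StrictLeaf !Int !String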
2. The data-forced approach
At its core, data-forced provides two newtypes:
newtype ForcedWHNF a = ForcedWHNF a
newtype ForcedNF a = ForcedNF a
The library does not export the constructors (so you cannot coerce). It does export a unidirectional pattern so you can pattern match. The idea is that if you hold a value of type ForcedWHNF a, you know that the underlying value has been forced to WHNF and thus that the references needed to evaluate it to that level are no longer kept alive by it. This is a nice invariant, and just what we need for our long-lived data structures.
Now you can design your long-lived data structure like this:
import Data.Map.Lazy as ML -- Spine strict, lazy leaves

type MyMap1 = ML.Map Char (ForcedWHNF Int)
type MyMap2 = ForcedWHNF (ML.Map Char (ForcedNF (Maybe Int)))
Here MyMap1 is equivalent to a Data.Map.Strict map in terms of strictness: the leaves cannot hold on to extra references through their thunks. MyMap2 goes even further, as the outer ForcedWHNF guarantees prompt removal of objects upon deletion. Given the type signature, you cannot forget to force the evaluation, in contrast to bang patterns.
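To make that concrete, here is a sketch of what the type checker rules out (hypothetical code, shown in a comment because it deliberately does not compile):

-- leak :: MyMap1 -> MyMap1
-- leak m = ML.insert 'a' (2 + 2) m
--
-- Rejected: (2 + 2) :: Int is not a ForcedWHNF Int. The only way to obtain
-- a ForcedWHNF Int is through the library's CBV functions, shown next.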
You must be thinking: how do we generate these magical ForcedNF a values? Well, the answer certainly feels like magic.
3. UnliftedDatatypes have a different calling convention (CBV)
This extension was proposed by Sebastian Graf some years ago. It does a lot of things, but one interesting addition is the semantics of values bound by a let when their type has an unlifted kind. In that case, call-by-value semantics are used.
For example:
{-# LANGUAGE UnliftedDatatypes, GADTSyntax #-}
import GHC.Exts

data UMaybe (a :: TYPE UnliftedRep) :: TYPE UnliftedRep where
  UNothing :: UMaybe a
  UJust    :: a -> UMaybe a

-- error is levity polymorphic.
foo :: Int -> Int
foo v = let a = UJust (error "boom!") in v + 2
Here foo will return bottom with the "boom!" message: the binding of a evaluates its right-hand side before a is bound.
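For contrast, the same shape with an ordinary lifted Maybe (my own sketch) is bound lazily, so the error never fires:

-- Lifted binding: 'b' is never demanded, so fooLazy v simply returns v + 2.
fooLazy :: Int -> Int
fooLazy v = let b = Just (error "boom!") in v + 2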
Obviously, for this to be useful we need to evaluate foo v at some point. Yet for long-lived data structures, if the leaf values are bound close to main (in a strict monadic context), we can ensure that evaluation happens automatically, without bangs.
Another pitfall is that CBV computations cannot be inlined without affecting the semantics. The library tries to avoid these issues and steer you in the right direction, yet it is still possible to hide a CBV computation inside a lazy value and render the whole exercise pointless. This is what I mean when I say the library is not complete. I am experimenting with Template Haskell based solutions to this problem, but there are some roadblocks. In the meantime I rely on documentation to show how to use the library correctly. I still think it is a worthwhile addition to the ecosystem.
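To make the pitfall concrete, here is the kind of misuse I have in mind (a sketch reusing the UMaybe type from above): wrapping the unlifted binding inside a lifted closure delays it again.

-- Anti-pattern sketch: the CBV binding now lives inside a lifted closure,
-- so nothing is evaluated when 'hidden v' is bound; the error only fires
-- once the closure is applied, and the early-evaluation guarantee is lost.
hidden :: Int -> (() -> Int)
hidden v = \_ -> let a = UJust (error "boom!") in v + 2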
The archetypal usage of the library uses two CBV functions:
strictlyWHNF :: forall a. a -> StrictValueExtractor (ForcedWHNF a)
strictlyNF :: forall a. NFData a => a -> StrictValueExtractor (ForcedNF a)
The argument is the computation that you want to evaluate. You get back an Unlifted value holding a Lifted value that no longer holds its references hostage. It is to be used like this:
-- No references retained by the added leaves, even though it is a lazy map.
basicEvent :: ML.Map Char (ForcedWHNF Int) -> IO (ML.Map Char (ForcedWHNF Int))
basicEvent map0 = do
  let
    -- Step 1: bind the strict value with a strict let. (2 + 2) is reduced
    -- before val0 is bound.
    val0 :: StrictValueExtractor (ForcedWHNF Int)
    val0 = strictlyWHNF (2 + 2)
    -- val0 = strictlyWHNF (error "argument evaluated") -- would fail
    -- Step 2: extract the strict value to be used in a lazy setting. A
    -- necessary idiom to avoid a pitfall.
    val1 = case val0 of { Pairy val0' ext -> ext val0' }
    -- Step 3: store the value, free of references. Even though map1 is a
    -- lazy map, the references needed to evaluate val1 were already freed.
    map1 = ML.insert 'a' val1 map0
  pure map1
Here val0 performs the evaluation as soon as the value is bound. The val1 computation is there to extract the value from the unlifted environment through the hidden continuation. It is done this way to avoid the common pitfall of lazily binding the result of the strict computation, turning it into a lazy computation without noticing; check the library documentation. From then on, you can use the value as you see fit in a lazy environment.
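On the consumption side, the unidirectional pattern mentioned earlier lets you read the leaves back; assuming the pattern is named after the newtype (check the library documentation for the exact export), usage looks roughly like this:

-- The leaves are already evaluated, so no bang or seq is needed here.
sumLeaves :: ML.Map Char (ForcedWHNF Int) -> Int
sumLeaves m = sum [ n | ForcedWHNF n <- ML.elems m ]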
4. Conclusion
It still irks me that I could not force the correct usage of this library through the type system and had to rely on documentation. I will keep experimenting with Template Haskell to see if I can reach a more ergonomic solution. Another alternative is to bite the bullet and hack on GHC to add a syntactic check enforcing this pattern, along the lines of -Wunbanged-strict-patterns.
In the meantime, this library is my go-to alternative for avoiding liveness leaks in Haskell. Most of my programs have their long-lived data structures planned upfront; now I will also be avoiding these leaks at the planning phase, as purity guarantees that these references can only live in certain places. Who said Haskell couldn't be reliable? :-) Feeling that your programs are reliable and predictable (CBV) in their long-lived values and compositional in their leaf computations (CBN) is such an interesting paradigm. I wonder what other experiments will come from this fusion.
Happy hacking!
Created: 2023-04-20 Thu 22:40