Presenting data-forced, an alternative to bang patterns for long lived data structures

As a follow-up to my previous post, I crafted a new way to deal with liveness leaks: data-forced. It relies on CBV (call-by-value) functions and on tagging values once we know they have been evaluated. It is not complete: it can still be misused in ways that yield no benefit, but it shows promise and gives stronger guarantees than the current status quo. Also, I am starting a new job next week, so I will have less time for hacking and writing.

1. The problem

Long lived data structures that store thunks will keep the references needed to evaluate those thunks alive for a long time. This is a consequence of not pairing the denotation of the values with a good consumer. The current ways to deal with this in Haskell are:

  1. Use Bang patterns and/or seq to force evaluation.
  2. Make laziness unrepresentable at the data type level.

Both have their pitfalls. With bang patterns and/or seq, you can forget to put the bang at the binding site. Given the type signature of seq, nothing will signal the missing evaluation. You can discipline yourself to never forget the bangs, but once you start collaborating with other people, you cannot rely on discipline alone.

The second alternative helps more with strictness leaks than with liveness leaks. It also requires rewriting our data structures (or using the strict-wrapper library) and the functions that operate over them, so code reuse suffers.
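To make the first pitfall concrete, here is a minimal sketch (insertLeaky and insertForced are names made up for this example): nothing in the types distinguishes the leaky version from the disciplined one, so the compiler stays silent when the bang is missing.

```haskell
{-# LANGUAGE BangPatterns #-}

import qualified Data.Map.Lazy as ML

-- Forgetting the bang: (v + 1) is stored as a thunk that keeps v alive.
insertLeaky :: Char -> Int -> ML.Map Char Int -> ML.Map Char Int
insertLeaky k v m = ML.insert k (v + 1) m

-- The disciplined version. Both share the same type signature, so nothing
-- forces you to write the bang.
insertForced :: Char -> Int -> ML.Map Char Int -> ML.Map Char Int
insertForced k v m = let !v' = v + 1 in ML.insert k v' m
```

Both functions denote the same map; they differ only in when the leaf is reduced, which is exactly the property the type system does not track here.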

2. The data-forced approach

At its core, data-forced provides two newtypes:

  • newtype ForcedWHNF a = ForcedWHNF a
  • newtype ForcedNF a = ForcedNF a

The library does not export the constructors (so you cannot coerce). It does export a unidirectional pattern, so you can still pattern match. The idea is that if you hold a value of type ForcedWHNF a, you know the underlying value has been forced to WHNF, and thus no references are kept alive just to evaluate it to that level. This is exactly the invariant we need for our long lived data structures.
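As a sketch of what this buys a consumer, here is a self-contained stand-in for the newtype. The hidden constructor MkForcedWHNF, the pattern declaration, and sumLeaves are illustrative only, not the package's actual export list:

```haskell
{-# LANGUAGE PatternSynonyms #-}

import qualified Data.Map.Lazy as ML

-- Stand-in for the library's newtype: the constructor stays hidden and
-- only a unidirectional pattern would be exported.
newtype ForcedWHNF a = MkForcedWHNF a

pattern ForcedWHNF :: a -> ForcedWHNF a
pattern ForcedWHNF a <- MkForcedWHNF a

-- A consumer matches on the pattern to read the value; the type already
-- promises it is in WHNF, so no hidden thunk keeps references alive.
sumLeaves :: ML.Map Char (ForcedWHNF Int) -> Int
sumLeaves = ML.foldr (\(ForcedWHNF v) acc -> v + acc) 0
```

With only the pattern exported, callers can destructure values but never fabricate a ForcedWHNF out of an unevaluated thunk.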

Now you can design your long lived data structures like this:

import qualified Data.Map.Lazy as ML -- Spine strict, lazy leaves

type MyMap1 = ML.Map Char (ForcedWHNF Int)
type MyMap2 = ForcedWHNF (ML.Map Char (ForcedNF (Maybe Int)))

Here MyMap1 is equivalent to a Data.Map.Strict map in terms of strictness: the leaves cannot hide extra references behind their thunks. MyMap2 goes even further, as the outer ForcedWHNF guarantees prompt removal of objects upon deletion. Given these type signatures you cannot forget to force the evaluation, in contrast to bang patterns.

You must be wondering: how do we generate these magical ForcedNF a values? Well, the answer certainly feels like magic.

3. UnliftedDatatypes have a different calling convention (CBV)

This extension was proposed by Sebastian Graf some years ago. It does a lot of things, but one interesting addition is the semantics of let bindings whose bound value has an unlifted kind: in that case, call-by-value semantics are used.

For example:

{-# LANGUAGE UnliftedDatatypes, GADTSyntax #-}

import GHC.Exts (UnliftedType)

data UMaybe (a :: UnliftedType) :: UnliftedType where
  UNothing :: UMaybe a
  UJust :: a -> UMaybe a

-- error is levity polymorphic.
foo :: Int -> Int
foo v = let a = UJust (error "boom!")
         in v + 2

Here foo throws the "boom!" error instead of returning: the right hand side of a is evaluated before a is bound.
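For contrast, a lifted counterpart (fooLifted is a made-up name for this sketch) binds the same shape lazily and never evaluates the error:

```haskell
-- With an ordinary lifted Maybe, the let binding is lazy: the thunk is
-- never demanded, so no exception is thrown and the function returns.
fooLifted :: Int -> Int
fooLifted v = let a = Just (error "boom!") in v + 2
```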

Obviously, for this to be useful we need to evaluate foo v at some point. But for long lived data structures, if the leaf values are bound close to main (in a strict monad context), we can ensure that evaluation happens automatically, without bangs.

Another pitfall is that CBV computations cannot be inlined without affecting the semantics. The library tries to avoid these issues and steer you in the right direction, yet it is still possible to hide a CBV computation inside a lazy value and render the whole exercise pointless. This is what I mean by the library not being complete yet. I am experimenting with Template Haskell based solutions to this problem, but there are some roadblocks. In the meantime I rely on documentation to show how to use the library correctly. I still think it is a worthwhile addition to the ecosystem.

The archetypal usage of the library uses two CBV functions:

  • strictlyWHNF :: forall a. a -> StrictValueExtractor (ForcedWHNF a)
  • strictlyNF :: forall a. NFData a => a -> StrictValueExtractor (ForcedNF a)

The first argument is the computation you want to evaluate. The function returns an unlifted value holding a lifted value that no longer holds its references hostage. It is to be used like this:

-- No references on added leaves even though it is a lazy map.
basicEvent :: ML.Map Char (ForcedWHNF Int) -> IO (ML.Map Char (ForcedWHNF Int))
basicEvent map0 = do
  let
    -- Step 1: bind the strict value with a strict let. (2 + 2) is reduced
    -- before val0 is bound.
    val0 :: StrictValueExtractor (ForcedWHNF Int)
    val0 = strictlyWHNF (2 + 2)
    -- val0 = strictlyWHNF (error "argument evaluated") -- would throw here

    -- Step 2: extract the strict value to be used in a lazy setting. A
    -- necessary idiom to avoid a pitfall.
    val1 = case val0 of { Pairy val0' ext -> ext val0' }

    -- Step 3: store the value, free of references. Even though map1 is a
    -- lazy map, the references needed to evaluate val1 were already dropped.
    map1 = ML.insert 'a' val1 map0
  pure map1

Here val0 performs the evaluation as soon as the value is bound. The val1 computation extracts the value from the unlifted environment through a hidden continuation. It is done this way to avoid a common pitfall: lazily binding the result of the strict computation, which would silently turn it back into a lazy computation. Check the library documentation for details. From then on, you can use the value as you see fit in a lazy environment.
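To make the mechanics of this idiom concrete, here is a hypothetical, simplified reconstruction. The real package's internals may well differ; MkForcedWHNF and this particular layout of Pairy are guesses for illustration only:

```haskell
{-# LANGUAGE UnliftedDatatypes, GADTSyntax, KindSignatures, BangPatterns #-}

import GHC.Exts (UnliftedType)

-- Stand-in for the hidden newtype (the real constructor is not exported).
newtype ForcedWHNF a = MkForcedWHNF a

-- Guessed shape: an unlifted pair of the tagged value and an extractor
-- continuation back into the lifted world.
data StrictValueExtractor a :: UnliftedType where
  Pairy :: a -> (a -> a) -> StrictValueExtractor a

-- Because the result kind is unlifted, a let binding of this application
-- is strict; the bang then forces the argument to WHNF at that moment.
strictlyWHNF :: a -> StrictValueExtractor (ForcedWHNF a)
strictlyWHNF !a = Pairy (MkForcedWHNF a) id
```

Under these assumptions, matching `Pairy v ext` and applying `ext v` yields a ForcedWHNF value whose payload was reduced before the tag was attached, which is exactly the invariant the map types above rely on.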

4. Conclusion

It still irks me that I couldn't enforce correct usage of this library through the type system and had to rely on documentation. I will keep experimenting with Template Haskell to see if I can reach a more ergonomic solution. Another alternative is to bite the bullet and hack on GHC to add a syntactic check enforcing this pattern, along the lines of -Wunbanged-strict-patterns.

In the meantime, this library is my go-to alternative for avoiding liveness leaks in Haskell. Most of my programs have their long lived data structures planned upfront; now I will also be avoiding these leaks in the planning phase, since purity guarantees that these references can only live in certain places. Who said Haskell couldn't be reliable? :-) Having programs that are reliable and predictable (CBV) in their long lived values and compositional (CBN) in the leaf computations is such an interesting paradigm. I wonder what other experiments will come from this fusion.

Happy hacking!

Created: 2023-04-20 Thu 22:40
