Are you frustrated trying to do any kind of string manipulation in Haskell? The same kind of interpolation or string building that would require zero thought in other languages seems to always turn into a mess of (<>) s, mconcat s, and show s in vanilla Haskell. Throw in Text and ByteString and now we have to juggle collisions between different definitions of pack , module imports, lazy versus strict versions of the same functions… not to mention having to manually handle Unicode conversions between Text and ByteString, even when 99% of the time you know it’ll be UTF-8. In short, it just becomes a huge mess.

I got fed up with this situation and wrote string-interpolate .

Here’s how you use it:

{-# LANGUAGE QuasiQuotes #-} import Data.String.Interpolate ( i ) ( i ) import Data.Text firstName :: String = "Yuuko" firstName lastName :: String = "Shirasawa" lastName message :: Text = [i|Good morning, #{firstName} #{lastName}!|] message[i|Good morning, #{firstName} #{lastName}!|]

Note how we can directly interpolate Strings into a Text without needing to do any explicit conversion; interpolation does what you expect, where textual types and Chars will be interpolated without the surrounding quotes.

You don’t have to worry about lazy versus strict:

import qualified Data.Text as T import qualified Data.Text.Lazy as LT description :: T.Text = "a very, very old strategy game" description payload :: LT . Text = [i|{ 'description': '#{description}' }|] payload[i|{ 'description': '#{description}' }|]

Anything which implements Show gets interpolated as you’d expect:

data Direction = Left | Right deriving Show λ > dir = Left dir λ > [i|Heading: #{dir}|] :: String [i|Heading: #{dir}|] >>> "Heading: Left"

Conversions between Text/String and ByteString automatically convert to/from UTF-8:

import Data.Text import Data.ByteString text :: Text = "💩" text λ > [i|Bytes: #{text}|] :: ByteString [i|Bytes: #{text}|] >>> "Bytes: \240\159\146\169"

In short, it makes building up strings much, much easier. It’s also fast!

The rest of this post is about the internal architecture of string-interpolate; while the tricks used involving -XOverlappingInstances and related type hackery should be old hat to advanced Haskellers, I haven’t seen a case study of using them in practice.

While using string-interpolate is extremely simple, internally we have to somehow transform the quasiquotes that users give us into code that:

handle taking in values of the various textual types to interpolate, and outputting every textual type

Since quasiquote handlers don’t allow you to look at the type of identifiers, we can’t actually emit some code conditionally; we have to spit out the same code no matter what types our users give us. Sounds like a job for typeclasses.

The simplest way to implement string-interpolate would just be to use Show and IsString together, since every textual type already implements both of these. So a quasiquote like:

would emit code like:1

mconcat [ "Username: " show username) , fromString (username) , " | Updoots: " show updoots) , fromString (updoots) ]

While this immediately gets us to a working, convenient solution, it’s painfully slow; using show and toString means we have to construct a String for every interpolated value and then convert it into our target type. Just as importantly, it doesn’t handle Unicode strings correctly! If we took our emoji-containing Text above and tried to interpolate it into a ByteString, we’d end up just getting a string containing the Unicode code point as a sequence of digit characters. Ditto if we tried to interpolate the UTF-8 bytes into a Text, just with the individual bytes. Weirdly, interpolating a ByteString into another ByteString won’t even give you the same string if that ByteString isn’t ASCII, which certainly seems counterintuitive for any users.

Another idea would be to create our own Interpolatable typeclass. Since we want the behavior to depend both on what we’re interpolating in and what we want out, we’d have to take in two parameters:

{-# LANGUAGE MultiParamTypeClasses #-} -- from `text-conversions' import Data.Text.Conversions -- from `utf8-string' import qualified Data.ByteString.UTF8 as UTF8 class Interpolatable src dst where src dst interpolate :: src -> dst srcdst instance Interpolatable Text Text where = id interpolate instance Interpolatable ByteString ByteString where = id interpolate instance Interpolatable Text ByteString where = unUTF8 . convertText interpolateunUTF8convertText instance Interpolatable ByteString Text where -- emit replacement character � for anything -- which isn't valid UTF-8 = UTF8.foldr interpolateUTF8.foldr -> Data.Text.singleton char <> output) (\char outputData.Text.singleton charoutput) ""

So that our quasiquote from earlier now emits something like:

mconcat [ "Username: " , interpolate username , " | Updoots: " , interpolate updoots ]

Now we’re fast and handle Unicode properly, but it means that we’re kind of limited in what we can interpolate. Do we really want to provide instances for, say, Int , Integer , Float , Double , Bool , and so on? Not to mention that if someone wants to use their own type with our interpolation, they’ll also have to write their own instances for Interpolatable. Wasn’t our library supposed to be convenient?

It would be nice if we could simply use Show as a default, and use the specific instances if we’ve defined one. What happens if we just try to add an instance for anything that’s Show?

{-# LANGUAGE FlexibleInstances #-} instance Show src => Interpolatable src Text where srcsrc = Data.Text.pack . show interpolateData.Text.pack instance Show src => Interpolatable src ByteString where srcsrc = unUTF8 . convertText . show interpolateunUTF8convertText

This seems to compile fine, but when we try it out, GHC starts complaining at us:

λ > text = "abc" :: Text text λ > (interpolate text) :: ByteString (interpolate text) < interactive >: 76 : 1 : error : interactive • Overlapping instances for Interpolatable Text ByteString instances for of ‘interpolate’ arising from a use‘interpolate’ Matching instances : instances instance [safe] Show src => Interpolatable src ByteString [safe]srcsrc -- Defined at <interactive>:70:10 instance [safe] Interpolatable Text ByteString [safe] -- Defined at <interactive>:44:10 • In the expression : interpolate text :: ByteString the expressioninterpolate In an equation for ‘it’ : it = interpolate text :: ByteString an equation for ‘it’itinterpolate

Hmm. Well, that’s not great.

What GHC is telling us is that it doesn’t know which typeclass instance to use for our call to interpolate ; it could use the Interpolatable Text ByteString instance, which is probably what we want. But the Interpolatable src ByteString instance is also possible, since src could unify with Text .

We’re able to describe in plain English what we want the compiler to do: only fall back to the Show -based instance if we don’t have something more specific. Is there a way to explain that to the compiler? Thankfully, yes!

If we mark our instances as overlapping, GHC will attempt to pick the most specific instance when there are multiple that might work. The exact instance matching rules are explained in detail here.

instance {-# OVERLAPS #-} Interpolatable Text ByteString where = ... interpolate instance {-# OVERLAPS #-} Show src => Interpolatable src ByteString where srcsrc = ... interpolate λ > text = "abc" :: Text text λ > (interpolate text) :: ByteString (interpolate text) >>> "abc" λ > (interpolate True ) :: ByteString (interpolate >>> "True"

Success!

In practice, I’ve found overlapping instances to be useful for exactly this sort of behavior: providing a “default” typeclass behavior that can be given specific overrides.

At this point we’ve sketched out the implementation of a fast, Unicode-aware interpolation backend; you could totally take this and spin off a new library with it. However, there’s one last tweak we want to make to how this works for string-interpolate. Relying on the Monoid-based implementation to handle assembling the final output is okay in most cases, but would blow up if the interpolation gets really large. Both Text and ByteString have their own Builder types that are designed for efficiently constructing instances of their associated textual types. Could we turn our interpolation fragments into that intermediate type first, and then use that to output the result, instead of having to directly convert our fragments into the target type?

As it turns out, yes, but here’s where the type hacking starts to get really hairy.

We need to associate with each output type what its builder type is, which sounds like a job for type families. If you’re not familiar with type families, all they are are functions that operate on type terms instead of data. So we can have a function that takes in the output type and returns the builder type. One nice thing is that there’s a convenient syntax for doing this right in the typeclass instance itself:

{-# LANGUAGE TypeFamilies #-} {-# LANGUAGE AllowAmbiguousTypes #-} import Data.ByteString import qualified Data.ByteString.Lazy as LB import qualified Data.ByteString.Builder as B import Data.Text.Conversions class InterpSink dst where dst type SinkBuilder dst :: * finalize :: SinkBuilder dst -> dst dstdst combine :: SinkBuilder dst -> SinkBuilder dst -> SinkBuilder dst dstdstdst -- For the verbatim parts of the interpolation quasiquote literal :: String -> SinkBuilder dst dst class InterpSink dst => Interpolatable src dst where dstsrc dst interpolate :: src -> SinkBuilder dst srcdst instance {-# OVERLAPS #-} InterpSink ByteString where type SinkBuilder ByteString = B.Builder = LB.toStrict . B.toLazyByteString finalizeLB.toStrictB.toLazyByteString = ( <> ) combine = B.lazyByteString . unUTF8 . convertText literalB.lazyByteStringunUTF8convertText

Once again, though, we run into an issue when we try to define our “default” implementation:

{-# LANGUAGE FlexibleInstances #-} {-# LANGUAGE UndecidableInstances #-} import Data.String import Text.Show ( ShowS , showString ) , showString ) instance {-# OVERLAPS #-} IsString dst => InterpSink dst where dstdst type SinkBuilder dst = ShowS dst = fromString . ( $ "" ) finalizefromString = ( . ) combine = showString literal -- <interactive>:33:8: error: -- Conflicting family instance declarations: -- SinkBuilder ByteString = B.Builder -- Defined at <interactive>:33:8 -- SinkBuilder dst = ShowS -- Defined at <interactive>:62:8

Unfortunately, type families don’t play nice with overlaps. The problem is exactly as it was with our basic overlapping instances: there are two possibilities for what the type SinkBuilder ByteString might be, and the compiler doesn’t currently have the ability to decide between them. Here there doesn’t seem to be any way to get around this restriction on overlapping instances. So instead, we have to figure out a way to make our instances not overlapping.

The approach I chose was to add another parameter to all the involved typeclasses purely for the purpose of making the instance heads not collide. So the typeclasses look like this:

{-# LANGUAGE DataKinds #-} {-# LANGUAGE KindSignatures #-} data SinkDefinition = Default | Specific type family HasCustomSink dst where dst HasCustomSink ByteString = 'Specific HasCustomSink _ = 'Default class ( HasCustomSink dst ~ flag) => InterpSink ( flag :: SinkDefinition ) dst where dstflag)) dst type SinkBuilder flag dst :: * flag finalize :: SinkBuilder flag dst -> dst flag dstdst combine :: SinkBuilder flag dst -> SinkBuilder flag dst -> SinkBuilder flag dst flag dstflag dstflag dst literal :: String -> SinkBuilder flag dst flag dst class ( HasCustomSink dst ~ flag) => Interpolatable ( flag :: SinkDefinition ) src dst where dstflag)) src dst interpolate :: src -> SinkBuilder flag dst srcflag dst

And once that’s set up, we can write our instances using the flag to make them distinct:

instance InterpSink 'Specific ByteString where type SinkBuilder 'Specific ByteString = B.Builder ... instance ( IsString dst, HasCustomSink dst ~ 'Default ) dst,dst => InterpSink 'Default dst where dst type SinkBuilder 'Default dst = ShowS dst ...

Since the flag parameter in the instances differ, a type can’t match both of these at once, solving our overlap problem. The rest is then implementing all the instances for InterpSink and Interpolatable that we want to offer.

So… success! We have our working interpolation backend, using the most efficient builder types that we can for each textual type. Unfortunately, we gave up the ability for library users to override the Show/IsString default behavior for their own types, since there’s no way to define a new instance of HasCustomSink . If that was a priority, we could stick with our two-parameter version using overlapping instances and take the performance hit on more complicated interpolations.

Unfortunately, I haven’t been able to figure out how to get string-interpolate to have all three of 1) usage of efficient intermediate types, 2) defaulting to Show/IsString, and 3) allowing library users to customise the behavior for their own types. Any two together seem doable, but getting all three seems not to be.

Have you run into obvious use cases that Haskell makes way harder than other languages? Do you like string-interpolate and find it useful? Talk to me!