The other day I threw together a quick program to solve an annoyance some people on our team were expressing. We sometimes do our development on remote machines, but would still like to use local file editors. There are plenty of solutions for this (SSHFS, inotify + rsync ), but none of us ever found a prebaked solution that was low-latency and robust enough to use regularly.

The program I put together is a simple client/server combo, where the client watches for local file changes and sends the updates to the server, which writes them to disk. This approach reduces latency significantly by keeping an open TCP connection, and can be tunneled over SSH for security. It's simple, and the code is short and readable.

This program turned out to be a good demonstration of a few different common problem domains when writing practical, real world Haskell code:

Communication protocols

Streaming of data

Network communications

File monitoring (eh, probably not that common, but cool)

common, but cool) Command line option parsing

This blog post series will build up from basics to being able to implement a program like the simple file mirror. This first post of three planned posts will cover communication protocols and streaming of data. You can also read:

Part 2, on network communication and concurrency

Prereqs

This series will contain a number of self-contained code samples, presented as runnable scripts. In order to run these scripts, copy them into a file ending with .hs (e.g., practical.hs ), and then run it with stack practical.hs ). You will need to install the Stack build tool. Stack will take care of downloading and installing any compiler and libraries necessary. Because of that: your first time running a script will take a bit of time while downloading and installing.

If you'd like to test that you're ready to proceed, try running this program:

#!/ usr / bin / env stack main :: IO () main = putStrLn "Hello world!"

Also, this series of posts will use the classy-prelude-conduit library, which provides a lot of built-in functionality to make it easier to follow. If you'd like to kick off a build of that library now so it's available when you want to use it, run:

$ stack --resolver nightly-2016-09-10 --install-ghc build classy-prelude-conduit

Textual versus binary data

We humans have an interesting tendancy to want to communicate - with each other and our computers - in human languages. Computers don't care: they just see binary data. Often times, this overlaps just fine, specifically when dealing with the ASCII subset of the character set. But generally speaking, we need some way to distinguish between textual and binary data.

In Haskell, we do this by using two different data types: Text and ByteString . In order to convert between these two, we need to choose a character encoding, and most often we'll choose UTF-8. We can perform this conversion with the encodeUtf8 and decodeUtf8 functions:

#!/ usr / bin / env stack import ClassyPrelude . Conduit main :: IO () main = do let someText :: Text someText = unlines [ "Hello, this is English." , "Hola, este es el español." , "שלום, זה עברית." ] binary :: ByteString binary = encodeUtf8 someText filePath :: FilePath filePath = "say-hello.txt" writeFile filePath binary binary2 <- readFile filePath :: IO ByteString let text2 = decodeUtf8 binary2 putStrLn text2

If you're coming from a language that doesn't have this strong separation, it may feel clunky at first. But after years of experience with having a type-level distinction between textual and binary data, I can attest to the fact that the compiler has many times prevented me from making some mistakes that would have been terrible in practice. One example popped up when working on the simple-file-mirror tool itself: I tried to send the number of characters in a file path over the wire, instead of sending the number of bytes after encoding it.

The OverloadedStrings language extension lets us use string literals for all of these string-like things, including (as you saw above) file paths.

NOTE: For historical reasons which are being worked on, there is additionally a String type, which is redundant with Text but far less efficient. I mention it because you may see references to it when working with some Haskell documentation. Whenever possible, stick to Text or ByteString , depending on your actual data content.

Some simple data streaming

OK, that was a bit abstract. Let's do something more concrete. We're going to use the conduit library to represent streaming data. To start off simple, let's stream the numbers 1 to 10 and print each of them:

#!/ usr / bin / env stack import ClassyPrelude . Conduit main :: IO () main = yieldMany [ 1 .. 10 ] $$ mapM_C print

Our yieldMany function will take a sequence of values - in this case a list from 1 to 10 - and yield them downstream. For those familiar, this is similar to yield ing in Python with generators. The idea is that we will build a pipeline of multiple components, each await ing for values from upstream and yield ing values downstream.

Our mapM_C print component will apply the function print to every value it receives from upstream. The C suffix is used for disambiguating conduit functions from non-conduit functions. Finally, the $$ operator in the middle is the "connect" function, which connects a source of data to a sink of data and runs it. As you might guess, the above prints the numbers 1 to 10.

We can also put other components into our pipeline, including functions that both await values from upstream and yield them downstream. For example:

#!/ usr / bin / env stack import ClassyPrelude . Conduit main :: IO () main = yieldMany [ 1 .. 10 ] $$ mapC ( * 10 ) =$ mapM_C print

The mapC function applies a function to each value in the stream and yields it downstream, while the =$ operator connects two components together. Take a guess at what the output from this will be, and then give it a try.

NOTE There's a subtle but important different between $$ and =$ . $$ will connect two components and then run the result to completion to get a result. =$ will connect two components without running them so that it can be further connected to other components. In a single pipeline, you'll end up with one $$ usage.

We can also do more interesting consumption of a stream, like summing:

#!/ usr / bin / env stack import ClassyPrelude . Conduit main :: IO () main = do total <- yieldMany [ 1 .. 10 ] $$ mapC ( * 10 ) =$ sumC print total total <- yieldMany [ 1 .. 10 ] $$ mapC ( * 10 ) =$ foldlC ( + ) 0 print total

Or limit how much of the stream we want to consume:

#!/ usr / bin / env stack import ClassyPrelude . Conduit main :: IO () main = do total <- yieldMany [ 1 .. 10 ] $$ mapC ( * 10 ) =$ takeC 5 =$ sumC print total

This only scratches the surface of what we can do with conduit, but hopefully it gives enough of a basic intuition for the library to get started. If you're interested in diving in deep on the conduit library, check out the previously linked tutorial.

Streaming chunked data

Having a stream of individual bytes turns out to be inefficient in practice. It's much better to chunk a series of bytes into an efficient data structure like a ByteString . Let's see what it looks like to stream data from a file to standard output:

#!/ usr / bin / env stack import ClassyPrelude . Conduit import qualified System . IO as IO main :: IO () main = do writeFile "some-file.txt" ( "This is just some text." :: ByteString ) IO . withBinaryFile "some-file.txt" IO . ReadMode $ \ fileHandle -> sourceHandle fileHandle $$ decodeUtf8C =$ stdoutC

This is good, but what if we want to deal with the individual bytes in the stream instead of the chunks. For example, let's say we want to get just the first 10 bytes of our file. The takeC function we used above would take the first five chunks of data. We instead need a function which will work on the elements of the bytestrings (the individual bytes). Fortunately, conduit provides for this with the E -suffixed element-specific functions:

#!/ usr / bin / env stack import ClassyPrelude . Conduit import qualified System . IO as IO main :: IO () main = do writeFile "some-file.txt" ( "This is just some text." :: ByteString ) IO . withBinaryFile "some-file.txt" IO . ReadMode $ \ fileHandle -> sourceHandle fileHandle $$ takeCE 10 =$ decodeUtf8C =$ stdoutC

In the simple-file-mirror program, we will be sending files over the network, and will need to limit some operations to the actual file sizes in question. Functions like takeCE will be vital for doing this.

Managing resources

While the withBinaryFile approach above worked, it felt kind of clunky. And for more complicated control flows, opening up the file in advance won't be an option (like when we'll only know which file to open after the network connection tells us the file path). To allow for these cases, we're going to introduce the runResourceT , which allows us to acquire resources in an exception-safe manner. Let's rewrite the above example with runResourceT :

#!/ usr / bin / env stack import ClassyPrelude . Conduit main :: IO () main = do writeFile "some-file.txt" ( "This is just some text." :: ByteString ) runResourceT $ sourceFile "some-file.txt" $$ takeCE 10 =$ decodeUtf8C =$ stdoutC

Internally, sourceFile uses the bracketP function, which runs some initialization function (in our case, opening a file handle), registers some cleanup function (in our case, closing that file handle), and then performs an action with the resource. To demonstrate what that looks like more explicitly, let's write a modified sourceFile function which will return some default file contents if the requested file can't be read from.

#!/ usr / bin / env stack import ClassyPrelude . Conduit import qualified System . IO as IO sourceFileDef :: MonadResource m => FilePath -> Source m ByteString sourceFileDef fp = do let open = tryIO $ IO . openBinaryFile fp IO . ReadMode close ( Left e ) = putStrLn $ "No file to close, got an exception: " ++ tshow e close ( Right h ) = hClose h bracketP open close $ \ eitherHandle -> case eitherHandle of Left ex -> do yield "I was unable to open the file in question:

" yield $ encodeUtf8 $ tshow ex ++ "

" Right fileHandle -> sourceHandle fileHandle main :: IO () main = runResourceT $ sourceFileDef "some-file.txt" $$ decodeUtf8C =$ stdoutC

Implementing our protocol

Let's at least get started on our actual simple-file-mirror code. The wire protocol we're going to use is defined in the README, but we can describe it briefly as:

Client sends data to server, server never sends to client

A line like 9:hello.txt11:Hello World means "write the file hello.txt with the content Hello World "

means "write the file with the content " We provide lengths of both the file paths and the file contents with a decimal-encoded integer followed by a colon (similar to netstrings, but without the trailing comma)

If a file has been deleted, we use a special length of -1, e.g. 9:hello.txt-1: means " hello.txt was deleted"

So let's get started with implementing the integer send/receive logic in conduit. There are many ways of doing this, some more efficient than others. For this program, I elected to go with the simplest approach possible (though you can see some historical more complicated/efficient versions). Let's start with sending, which is pretty trivial:

#!/ usr / bin / env stack import ClassyPrelude . Conduit sendInt :: Monad m => Int -> Producer m ByteString sendInt i = yield $ encodeUtf8 $ tshow i ++ ":" main :: IO () main = yieldMany sampleInts $$ awaitForever sendInt =$ stdoutC where sampleInts = [ 1 , 10 , 5 , 0 , 60 ]

Here we've introduced a new function, awaitForever , which repeatedly applies a function as long as data exists on the stream. Take a guess at what the output of this program will be, and then try it out.

Now let's try out the receiving side of this, which is slightly more complicated, but not too bad:

#!/ usr / bin / env stack import ClassyPrelude . Conduit import Data . Word8 ( _colon ) sendInt :: Monad m => Int -> Producer m ByteString sendInt i = yield $ encodeUtf8 $ tshow i ++ ":" recvInt :: MonadThrow m => Consumer ByteString m Int recvInt = do intAsText <- takeWhileCE ( /= _colon ) =$ decodeUtf8C =$ foldC dropCE 1 case readMay $ unpack intAsText of Nothing -> error $ "Invalid int: " ++ show intAsText Just i -> return i main :: IO () main = yieldMany sampleInts $$ awaitForever sendInt =$ peekForeverE ( do i <- recvInt print i ) where sampleInts = [ 1 , 10 , 5 , 0 , 60 ]

peekForeverE is similar to awaitForever , in that it repeatedly performs an action as long as there is data on the stream. However, it's different in that it doesn't grab the data off of the stream itself, leaving it to the action provided to do that, and it deals correctly with chunked data by ignoring empty chunks.

We've also introduced takeWhileCE , which is like takeCE , but instead of giving it a fixed size of the stream to consume, it continues consuming until it finds the given byte. In our case: we consume until we get to a colon. Then we decode into UTF-8 data, and use foldC to concatenate multiple chunks of Text into a single Text value. Then we use readMay to parse the textual value into an Int . (And yes, we could do a much better job at error handling, but using error is the simplest approach.)

Building on top of these two functions, it becomes a lot easier to send and receive complete file paths. Let's put that code in here, together with a test suite to prove it's all working as expected:

#!/ usr / bin / env stack import ClassyPrelude . Conduit import Data . Word8 ( _colon ) import Test . Hspec import Test . Hspec . QuickCheck ( prop ) sendInt :: Monad m => Int -> Producer m ByteString sendInt i = yield $ encodeUtf8 $ tshow i ++ ":" sendFilePath :: Monad m => FilePath -> Producer m ByteString sendFilePath fp = do let bs = encodeUtf8 $ pack fp :: ByteString sendInt $ length bs yield bs recvInt :: MonadThrow m => Consumer ByteString m Int recvInt = do intAsText <- takeWhileCE ( /= _colon ) =$ decodeUtf8C =$ foldC dropCE 1 case readMay $ unpack intAsText of Nothing -> error $ "Invalid int: " ++ show intAsText Just i -> return i recvFilePath :: MonadThrow m => Consumer ByteString m FilePath recvFilePath = do fpLen <- recvInt fpText <- takeCE fpLen =$= decodeUtf8C =$= foldC return $ unpack fpText main :: IO () main = hspec $ do prop "sendInt/recvInt are inverses" $ \ i -> do res <- sendInt i $$ recvInt res `shouldBe` i prop "sendFilePath/recvFilePath are inverses" $ \ fp -> do res <- sendFilePath fp $$ recvFilePath res `shouldBe` fp

We've used the hspec test framework library and its QuickCheck support to create a test suite which automatically generates test cases based on the types in our program. In this case, it will generate 100 random Int s and 100 random FilePath s each time it's run, and ensure that our properties hold. This is a great way to quickly get significant test coverage for a program.

Sending the files themselves

OK, finally time to put all of this together. We're going to add in some new functions for sending and receiving files themselves. This is a fairly simple composition of all of the work we've done until now. And this is the nice thing about Haskell in general, as well as the conduit library: purity and strong typing often make it possible to trivially combine smaller functions into more complicated beasts. Let's see this all in action:

#!/ usr / bin / env stack import ClassyPrelude . Conduit import Data . Word8 ( _colon ) import System . Directory ( createDirectoryIfMissing , doesFileExist , removeFile ) import System . FilePath ( takeDirectory ) import System . IO ( IOMode ( ReadMode ) , hFileSize , openBinaryFile ) import System . IO . Temp ( withSystemTempDirectory ) import Test . Hspec import Test . Hspec . QuickCheck ( prop ) sendInt :: Monad m => Int -> Producer m ByteString sendInt i = yield $ encodeUtf8 $ tshow i ++ ":" sendFilePath :: Monad m => FilePath -> Producer m ByteString sendFilePath fp = do let bs = encodeUtf8 $ pack fp :: ByteString sendInt $ length bs yield bs sendFile :: MonadResource m => FilePath -> FilePath -> Producer m ByteString sendFile root fp = do sendFilePath fp let open = tryIO $ openBinaryFile fpFull ReadMode close ( Left _err ) = return () close ( Right h ) = hClose h bracketP open close $ \ eh -> case eh of Left _ex -> sendInt ( 1 ) Right h -> do size <- liftIO $ hFileSize h sendInt $ fromInteger size sourceHandle h where fpFull = root </> fp recvInt :: MonadThrow m => Consumer ByteString m Int recvInt = do intAsText <- takeWhileCE ( /= _colon ) =$ decodeUtf8C =$ foldC dropCE 1 case readMay $ unpack intAsText of Nothing -> error $ "Invalid int: " ++ show intAsText Just i -> return i recvFilePath :: MonadThrow m => Consumer ByteString m FilePath recvFilePath = do fpLen <- recvInt fpText <- takeCE fpLen =$= decodeUtf8C =$= foldC return $ unpack fpText recvFile :: MonadResource m => FilePath -> Sink ByteString m () recvFile root = do fpRel <- recvFilePath let fp = root </> fpRel fileLen <- recvInt if fileLen == ( 1 ) then liftIO $ void $ tryIO $ removeFile fp else do liftIO $ createDirectoryIfMissing True $ takeDirectory fp takeCE fileLen =$= sinkFile fp main :: IO () main = hspec $ do prop "sendInt/recvInt are inverses" $ \ i -> do res <- sendInt i $$ recvInt res `shouldBe` i prop "sendFilePath/recvFilePath are inverses" $ \ fp -> do res <- sendFilePath fp $$ recvFilePath res `shouldBe` fp it "create and delete files" $ withSystemTempDirectory "src" $ \ srcDir -> withSystemTempDirectory "dst" $ \ dstDir -> do let relPath = "somepath.txt" content = "This is the content of the file" :: ByteString writeFile ( srcDir </> relPath ) content runResourceT $ sendFile srcDir relPath $$ recvFile dstDir content' <- readFile ( dstDir </> relPath ) content' `shouldBe` content removeFile ( srcDir </> relPath ) runResourceT $ sendFile srcDir relPath $$ recvFile dstDir exists <- doesFileExist ( dstDir </> relPath ) exists `shouldBe` False

Our sendFile function looks very similar to our sourceFileDef example at the beginning of the post. But instead of streaming a default value, we just send a length of -1, as our protocol dictates. The recvFile function relies heavily on recvFilePath and recvInt . In the case of a -1 , it removes the file in question. Otherwise, it creates the containing directory if necessary, and then composes takeCE with sinkFile to stream the correct number of bytes into the file.

We also have a unit test covering the interaction of these two new functions. While some kind of property could perhaps be devised for testing this with QuickCheck, a more standard unit test seemed far more straightforward in this case.

Next time on Practical Haskell

This part of the tutorial covered quite a number of topics, so this is a good place to take a break. Next time, we'll dive into the network communication aspect of things, including:

Implementing a really simple HTTP client

Implementing an echo server

Using some basic concurrency in Haskell to have a client and server in the same process

If you have feedback on how to make this tutorial series more useful, please share it in the comments below, on Twitter (@snoyberg), or in any Reddit discussions about it. I'll try to review feedback and make changes to parts 2 and 3.

Do you like this blog post and need help with DevOps, Rust or functional programming? Contact us.

Share this