Better F# data science with FsLab and Ionide

At NDC Oslo 2016, I did a talk about some of the recent F# projects that are making data science with F# even nicer than it used to be. The talk covered a wider range of topics, but one of the nice new things I showed was the improved F# Interactive in the Ionide plugin for Atom and the integration with FsLab libraries that it provides.

In particular, with the latest version of Ionide for Atom and the latest version of the FsLab package, you can run code in F# Interactive and see the resulting time series, data frames, matrices, vectors and charts as nicely pretty-printed HTML objects, right in the editor. The following shows some of the features:

In this post, I'll write about how the new Ionide and FsLab integration works, how you can use it with your own libraries and also about some of the future plans. You can also learn more by getting the FsLab package, or watching the NDC talk:

FsLab formatters for Ionide

FsLab is just a NuGet package that references a number of other F# packages for doing data science with F#. The one thing that it adds is an easy-to-use load script that loads all the packages from F# Interactive. This means that when you download the template, the sample script file starts with something like this:

    #load "packages/FsLab/Themes/AtomChester.fsx"
    #load "packages/FsLab/FsLab.fsx"

    open Deedle
    open FSharp.Data
    open XPlot.GoogleCharts
    open XPlot.GoogleCharts.Deedle

The first line loads a default theme that configures how embedded charts and tables will be formatted. It sets things like float formatting options, colours, fonts etc. You can find and contribute themes in the FsLab.Formatters repository - the current choice covers only one white and one dark theme for Atom. The second line is the more important one, which loads the FsLab dependencies.

The basic template comes with a minimal example that downloads two time series from the World Bank and finds the years when they were the most different:

    let wb = WorldBankData.GetDataContext()
    let cz = wb.Countries.``Czech Republic``.Indicators
    let eu = wb.Countries.``European Union``.Indicators

    let czschool = series cz.``Gross enrolment ratio, tertiary, both sexes (%)``
    let euschool = series eu.``Gross enrolment ratio, tertiary, both sexes (%)``

    // Get 5 years with the largest difference between EU and CZ
    abs (czschool - euschool)
    |> Series.sort
    |> Series.rev
    |> Series.take 5

When you run the code in Atom, a formatter for Deedle series should make it easy to see the result of the last expression. Make sure to run the last 4 lines of the snippet as a separate interaction, because Ionide only shows the formatted output when the formattable object is the result of the evaluated snippet. Alternatively, you can select czschool or euschool and press Alt+Enter to see one of the source series.

Aside from Deedle series, the FsLab package registers formatters for the charting libraries that it comes with. This includes F# Charting (Windows-only), XPlot Google charts and also XPlot Plotly charts. The following example plots the two time-series using the XPlot wrapper for Google charts:

    [ czschool.[1975 .. 2010]; euschool.[1975 .. 2010] ]
    |> Chart.Line
    |> Chart.WithOptions (Options(legend = Legend(position = "bottom")))
    |> Chart.WithLabels ["CZ"; "EU"]

The Google chart is formatted according to the theme that we loaded on the first line of the script, so it looks nicely integrated with the F# Interactive window (but as I mentioned, we need your help with adding more than just the two standard Atom themes).

One of the nice aspects of how the FsLab and Ionide integration works is that it is not an ad-hoc integration for just a couple of selected libraries - quite the opposite! All the FsLab formatters live in a separate repository from Ionide and you can create your own formatters that will work in exactly the same way. The following section has more details about the underlying mechanism behind all this.

Creating custom HTML formatters

The latest release of ionide-fsi, which is the F# Interactive plugin for Atom, no longer runs fsi.exe in the background (like Visual Studio or all other editors). Instead, it is based on the brand new FsInteractiveService. This is a light-weight server that wraps the F# Interactive functionality. It can be consumed by any editor via HTTP and it exposes an API for evaluating F# code, but also for getting autocompletion and other hints.

The FsInteractiveService extends the standard F# Interactive functionality with the ability to format objects as HTML. The idea is very simple. You call fsi.AddHtmlPrinter and specify a function that turns your object into an HTML string! When you evaluate an expression that returns a value that has a registered formatter, Ionide will then display it using your provided HTML formatter.

Creating HTML formatter for tables

As a basic example, say you have a type that represents a table:

    type Table = Table of string [,]

Now, we want to create an HTML formatter that will render the table as a <table> element. To do this, all you need is to call fsi.AddHtmlPrinter. The FsInteractiveService also defines a symbol HAS_FSI_ADDHTMLPRINTER and so it is a good idea to wrap the following code in a big #if HAS_FSI_ADDHTMLPRINTER block - this way, the code will be compatible with F# Interactive in Visual Studio and other editors that do not support HTML formatters (yet).

    fsi.AddHtmlPrinter(fun (Table t) ->
      let body =
        [ yield "<table>"
          for i in 0 .. t.GetLength(0) - 1 do
            yield "<tr>"
            for j in 0 .. t.GetLength(1) - 1 do
              yield "<td>" + t.[i, j] + "</td>"
            yield "</tr>"
          yield "</table>" ]
        |> String.concat ""
      seq [ "style", "<style>table { background:#f0f0f0; }</style>" ],
      body )

The result of the formatter function is actually seq<string * string> * string. The tuple consists of two things:

The first element is a sequence of key-value pairs that represents additional styles and scripts required by the body. The keys can be style or script (or other custom keys supported by editors) and can be treated in a special way by the editors (e.g. loading JavaScript dynamically in Atom requires placing the HTML content in an <iframe>).

The second element is the HTML body that represents the formatted value. Typically, editors will embed this into HTML output.
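To make the shape of that tuple concrete, here is a minimal sketch that pulls the formatter out into an ordinary function, so it can be tried in any F# Interactive session. The name formatTable and the HTML-escaping of cell text via WebUtility.HtmlEncode are my additions for illustration, not something FsLab or FsInteractiveService requires; the registration is guarded by the HAS_FSI_ADDHTMLPRINTER symbol mentioned above.

```fsharp
open System.Net

// Same shape as the Table type defined earlier in the post.
type Table = Table of string [,]

// Hypothetical helper: Table -> (extra styles and scripts) * (HTML body).
// HtmlEncode escapes cell text so that "<" or "&" in the data cannot break the markup.
let formatTable (Table t) : seq<string * string> * string =
    let body =
        [ yield "<table>"
          for i in 0 .. t.GetLength(0) - 1 do
              yield "<tr>"
              for j in 0 .. t.GetLength(1) - 1 do
                  yield "<td>" + WebUtility.HtmlEncode(t.[i, j]) + "</td>"
              yield "</tr>"
          yield "</table>" ]
        |> String.concat ""
    let extras = seq [ "style", "<style>table { background:#f0f0f0; }</style>" ]
    extras, body

#if HAS_FSI_ADDHTMLPRINTER
// Only compiled where the symbol is defined, e.g. under FsInteractiveService.
fsi.AddHtmlPrinter(formatTable)
#endif

// The function can be inspected without any editor support.
let extras, html = formatTable (Table (array2D [ [ "a<b"; "1" ] ]))
printfn "%s" html
```

Keeping the formatter as a named function is just one possible design; it makes the HTML easy to check on its own, while editors without HTML printer support simply skip the registration.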

You can now define a table as follows:

    let table =
      [ [ "Test"; "More" ]
        [ "1234"; "5678" ] ]
      |> array2D |> Table

In the current version, the value is only formatted when Table is returned as a direct result of an expression. This means that you need to evaluate an expression of type Table rather than, for example, a value binding as above:

    table

When you run the above in Atom, you will see a table formatted as an HTML <table> element. (Some more styling is needed to actually make this pretty, but this is a good start. Oh and did you know that Atom supports the <marquee> tag?!)

Themes, parameters and servers

In practice, there are a few other concerns that make formatting objects as HTML harder. For example, some of the HTML formatters can implement lazy loading where they use a simple web server running in the background to provide data to the view (which calls the server using JavaScript). Also, it is nice if all the HTML formatters can share the same visual theme. To make these possible, the FsInteractiveService also defines fsi.HtmlPrinterParameters which is a global value of type IDictionary<string, obj> that can be used for storing various shared configuration.

For example, the html-standalone-output parameter specifies whether the generated HTML code should be stand-alone, or whether it is allowed to use JavaScript to load data lazily (the latter is used for Deedle frames in the talk and it means you can scroll through the data, but you need to have a server running in the background):

    #if HAS_FSI_ADDHTMLPRINTER
    let standaloneHtmlOutput =
      fsi.HtmlPrinterParameters.["html-standalone-output"] :?> bool
    #endif
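Because the dictionary stores plain obj values, a formatter may want a fallback when a key is missing or holds a value of an unexpected type. The following is only a sketch of one way to do that: the getParameter helper and the example-missing-key key are hypothetical, and an ordinary Dictionary stands in for fsi.HtmlPrinterParameters so that the snippet runs outside of Ionide too.

```fsharp
open System.Collections.Generic

// Stand-in for fsi.HtmlPrinterParameters (assumption: same IDictionary<string, obj> shape).
let parameters : IDictionary<string, obj> =
    Dictionary<string, obj>() :> IDictionary<string, obj>
parameters.["html-standalone-output"] <- box true

// Hypothetical helper: return the value stored under `key` when it has type 'T,
// otherwise return `fallback` (missing key, null value, or a different type).
let getParameter (ps: IDictionary<string, obj>) (key: string) (fallback: 'T) : 'T =
    match ps.TryGetValue(key) with
    | true, (:? 'T as value) -> value
    | _ -> fallback

let standalone = getParameter parameters "html-standalone-output" false
// "example-missing-key" is a made-up key that was never set, so the fallback is used.
let missing = getParameter parameters "example-missing-key" 12
printfn "standalone=%b missing=%d" standalone missing
```

Compared with the direct :?> cast, this does not throw when a theme script has not set the parameter, at the cost of silently hiding misspelled keys.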

There are a couple of examples of how this dictionary can be used in the standard FsLab formatters:

The DefaultWhite.fsx file shows the different kinds of parameters that you can specify for default FsLab formatters. You can copy & edit it to create new visual styles for FsLab (and send a PR to FsLab.Formatters if they correspond to a common Atom theme!)

The XPlot formatter in XPlot.fs is a good example of a formatter that reads the above visual styles and uses them to customize the look of the HTML it generates.

The Deedle formatter in Deedle.fs uses a lightweight Suave server running in the background to load data from a frame or series on demand. This is a good example of a more sophisticated formatter.

FsLab Journal and looking ahead

Formatting in FsLab journals

The FsLab downloads page also lets you download a FsLab Journal template. This is something that has been available in FsLab for a longer time, but I never wrote much about it. The summary is:

FsLab Journal lets you turn your F# scripts consisting of F# code snippets and Markdown formatted comments into a nice HTML report.

When you download the template, you can just run build run and your script will be turned into an HTML report in the background. When you change your script, the background runner will update and reload your report. If you want to produce stand-alone HTML (that does not require a background server), you can run build html. The following is an opened journal, running on my machine.

In the latest version of FsLab, the formatting for journals is based on the same fsi.AddHtmlPrinter formatters. This means we get to reuse the code for it, but most importantly, when you write your own formatter, it will work with both Ionide and FsLab journals.

Formatting in Jupyter notebooks

One of the related projects in the F# and data science space is the F# bindings for Jupyter Notebooks. This does not yet use the same model for registering HTML formatters via fsi.AddHtmlPrinter. Instead, it has its own mechanism for registering printers, but I expect that it will be possible to merge the two so that you can just write fsi.AddHtmlPrinter once and use it in Ionide, FsLab Journals as well as Jupyter.
