Announcing FsLab: Data science package

After over a year of working on FsLab and talking about it at conferences, it is finally time for an official announcement. So, today, I'm excited to announce FsLab - a cross-platform package for doing data science with .NET and Mono.

It is probably not necessary to explain why data science is an important area. We live surrounded by information, but extracting useful knowledge from the vast amounts of data is not an easy task. You have to access data in different formats (JSON-based REST services, XML, CSV files or even HTML tables), you need to deal with missing values, combine and align data from multiple sources and then build visualizations (or reports) to tell the right story.

The goal of FsLab is to make this process easier. FsLab combines the power of F# type providers, the efficiency and robustness of Mono and .NET, and the high-quality engineering of the open-source ecosystem around F# and C#.

FsLab links and resources

FsLab questions and answers

Rather than writing a long introduction about FsLab, the following tries to answer the most important questions that you might have about FsLab using the Q & A format.

Why should I choose FsLab over X?

There are a couple of things that FsLab does exceptionally well. With F# Data type providers, you get type-safe access to a wide range of external data sources with tooling that no other data science package can offer. FsLab also runs on Mono and .NET, so it is extremely easy to turn your experiments into production-quality code. For many other tasks, you can easily call other tools such as R using the R type provider.

Is FsLab only for F#?

No. Some of the libraries that are a part of FsLab have excellent C# support - most importantly Deedle, the core library for working with data frames and data series. The libraries that rely on type providers are F#-only, but you can use them from F# and then expose the functionality to C#, Visual Basic .NET or any other .NET language.

Who is behind FsLab?

FsLab is a community effort with a large number of contributors - both individuals and companies. BlueMountain Capital is funding the development of the R type provider and Deedle, F# Data is maintained by Gustavo Guerra and contributors, and Math.NET is maintained by Christoph Rüegg and contributors. Finally, I'm the maintainer of the FsLab package. Commercial support and training for FsLab are available from fsharpWorks.

What is the FsLab roadmap?

There is no official roadmap yet. Please help us shape it by joining the discussion! However, there are a couple of things that are coming to FsLab very soon:

We're integrating FsLab with XPlot to provide cross-platform HTML5 charting.

We're working on the FsLab Journal template, which lets you generate reports from scripts.

We're integrating FsLab with M-Brace, which lets you scale your scripts to the cloud.

We're working on BigDeedle, a new backend for Deedle that makes it possible to treat big data as ordinary frames and series.

Demonstrating the FsLab approach

I don't want to turn this announcement into a technical post about FsLab, but since FsLab is very much about technology, I'll give you at least a quick demo. The demo illustrates the two key ideas that FsLab follows:

Access, analyze, visualize cycle - when doing data science, you typically follow this cycle a number of times. You get some data, try to explore it, visualize the results and then repeat. FsLab gives you great tools for all three steps.

Integrate with leading technologies - FsLab has some great libraries and excels in some areas (like data access). For other tasks, it can integrate with other technologies - it lets you call R packages and visualize data using Google Charts.

To start with FsLab, you need to download the FsLab package or a template. Then you can write an F# script file that references FsLab and opens all necessary namespaces:

    #load "packages/FsLab/FsLab.fsx"
    open Deedle
    open FSharp.Data
    open XPlot.GoogleCharts
    open XPlot.GoogleCharts.Deedle

The example uses F# Data for data access, Deedle for working with time series and XPlot for producing Google Charts.

We'll use the World Bank type provider to get the population in the largest city of the Czech Republic as a time series. When writing the code in an F#-enabled editor, you'll get auto-completion offering all countries of the world and thousands of indicators:

    let wb = WorldBankData.GetDataContext()
    let pop =
      wb.Countries.``Czech Republic``
        .Indicators.``Population in largest city``
      |> series

The |> operator passes the data from the World Bank to the series function, which creates a Deedle series that gives you a nice way to explore the data. When you run the code in the F# REPL, you'll see a printout showing the first few and last few years of the time series (Prague had 1,000,830 inhabitants in 1960 and 1,302,883 inhabitants in 2014).
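To make the role of the pipe operator concrete, here is a minimal sketch that runs without any packages. It assumes a plain F# Map as a stand-in for a Deedle series (an illustrative substitution, not how Deedle is implemented), and the names square, observations, lookup and growth are hypothetical. The only data used are the two Prague figures quoted above.

```fsharp
// `x |> f` means the same as `f x`: the value on the left
// is passed as the last argument of the function on the right.
let square x = x * x
let direct = square 4
let piped = 4 |> square          // same result as `direct`

// Year/value observations, using the two figures quoted in the text.
let observations = [ 1960, 1000830.0; 2014, 1302883.0 ]

// A plain Map stands in for a Deedle series here (assumption made
// so that the snippet has no package dependencies).
let lookup = observations |> Map.ofList

// Look up values by key, much like indexing a series by year.
let growth = lookup.[2014] - lookup.[1960]
printfn "Growth between 1960 and 2014: %.0f" growth
```

With Deedle loaded, the same list of key/value pairs could be piped into series instead of Map.ofList, which is what the World Bank example does.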

Next, we'll use the R type provider to call the R stats package to calculate linear regression:

    open RProvider
    open RProvider.stats

    let df = frame [ "pop" => pop ]
    df?years <- pop.Keys
    df?predict <- R.predict(R.lm("pop~years", df)).GetValue<float[]>()

The first two lines reference the R type provider. Again, thanks to the type provider mechanism, you get auto-completion on RProvider. (with all installed R packages) and on R. (with all available R functions).

The code then creates a Deedle data frame df with the columns pop (from the World Bank data) and years (the keys of the pop series). It then uses R.lm to fit a linear regression model and R.predict to predict values for the same range of years, storing them in a third column, predict.
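For readers who want to see what a fit of the form pop~years involves, the following is a rough, dependency-free sketch of ordinary least squares with a single predictor. It is not the code that R runs, and the helpers fitLine and predict are hypothetical names introduced for illustration. The input is synthetic data lying exactly on y = 1 + 2x, not World Bank data, so the expected coefficients are known in advance.

```fsharp
// Ordinary least squares for one predictor: y ~ intercept + slope * x.
let fitLine (xs: float[]) (ys: float[]) =
    let meanX = Array.average xs
    let meanY = Array.average ys
    // slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    let sxy =
        Array.fold2 (fun acc x y -> acc + (x - meanX) * (y - meanY)) 0.0 xs ys
    let sxx = xs |> Array.sumBy (fun x -> (x - meanX) ** 2.0)
    let slope = sxy / sxx
    let intercept = meanY - slope * meanX
    intercept, slope

// Fitted values for given x values, analogous to calling predict
// on a fitted model with the original inputs.
let predict (intercept, slope) (xs: float[]) =
    xs |> Array.map (fun x -> intercept + slope * x)

// Synthetic inputs on the line y = 1 + 2x (made up for illustration).
let years = [| 0.0; 1.0; 2.0; 3.0 |]
let values = years |> Array.map (fun x -> 1.0 + 2.0 * x)

let model = fitLine years values
let fitted = predict model years
printfn "intercept = %f, slope = %f" (fst model) (snd model)
```

Because the synthetic points lie exactly on a line, the fit recovers an intercept of 1 and a slope of 2, and the fitted values match the inputs. Real data such as the population series will leave residuals.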

With three more lines of code, we can build a Google Charts chart comparing the actual data with the data predicted by the linear regression model:

    [ df?predict; df?pop ]
    |> Chart.Line
    |> Chart.WithOptions(Options(title = "Prague Population"))

I embedded the chart below by hand, but you can also use the FsLab Journal template, which produces the HTML automatically from your F# script:

Summary

FsLab is a collection of high-quality libraries for doing data science on Mono and .NET. It combines the power of F# type providers for data access with tools that let you easily explore ideas, all while writing code for a robust platform that is easy to deploy.

Many of the libraries that are included in FsLab have been around for some time, have been used in production and have a large number of contributors, both from the open-source community and from commercial companies.

The FsLab package itself has existed for some time - but with this announcement, the project reaches a new milestone. We've done a lot of work on making FsLab stable, well documented and truly cross-platform over the last few months, and many more things are coming in the near future. So stay tuned, send us feedback, contribute and try FsLab now!
