I ever only want a float64 matrix or vector | use gonum/mat or gorgonia/tensor
I want to focus on doing statistical/scientific work | use gonum/mat
I want to focus on doing machine learning work | use gonum/mat or gorgonia/tensor
I want multidimensional arrays | use gorgonia/tensor, or []mat.Matrix
I want to work with different data types | use gorgonia/tensor
I want to wrangle data like in Pandas or R - with data frames | use kniren/gota
I want to use external computation devices like GPUs to process my data | use gorgonia/tensor

</table> </div>

The last row is not included in the cheatsheet. This is because I don’t really want to falsely promote the idea that doing data science is all about doing things on the GPU. In fact, short of deep learning related work, most of data science can be done with matrices, and float64 values. Speedups from using a smaller data type or an external computation device should be considered extreme optimizations.

API

The powerhouse behind a lot of these is the Gonum family of packages. Gorgonia’s tensor package has a different structure from Gonum’s Mat type, but it also leverages the algorithms in the Gonum packages.

The APIs for these packages are different, mainly due to the different design philosophies. The APIs are documented in the cheatsheet.

Gorgonia’s tensor package was designed to be much closer to Numpy’s API due to my familiarity with Numpy. It returns errors when possible, except in object creation functions, because one of the earliest uses of Gorgonia was to build an interactive neural network explorer that I used for my teaching courses. (The program was last properly used in March, and I recorded the asciinema piece months later, so I had to look up some of the syntax I wrote, hence the random pauses in the video.)

Gonum’s API rationale can be found in this excellent presentation. Both families of packages share Rule 1: No Silent Bugs. I like that philosophy a lot.

At this point, it wouldn’t be amiss to also enumerate the points of commonality and differences in design philosophy with Gorgonia:

| Topic | Gonum | Gorgonia |
|---|---|---|
| Rule 1 | No Silent Bugs | No Silent Bugs |
| Panic when | Error is easy to check before call | Impossible parameters are passed into object creation functions |
| Return errors when | It's impossible to check without performing the operation | Almost all functions and methods may return error. |
| Memory Reuse | If a function creates a type, allow a destination as a first argument. But allow a nil destination for ease | Copies always created, unless WithReuse or UseUnsafe is passed in. Function options are useful |
| "Functional" programming | Functions and methods should not modify state, unless that is their only purpose. | Functions and methods should not modify state, unless the function options WithReuse or UseUnsafe is passed in. |
| Idioms | Try to reuse as much Go idioms as possible | Provide commonly used methods, even if it violates some idiomatic Go. Aim to bridge the gap between Numpy and Go. |
| Package functions | Perform operations on interface types | Perform operations on interface types |
| Type methods | Perform operations intrinsic to the type | Perform operations intrinsic to the type. Perform operations with parameters of the same type |
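To make the "Memory Reuse" and "Return errors when" rows a bit more concrete, here is a minimal standard-library sketch of the function-options idea. The names `addVec`, `option` and `withReuse` are hypothetical and are not Gorgonia's actual API; the sketch only mimics the stated behaviour: a copy is created by default, a destination is reused only when asked for, and problems that show up during the operation come back as errors.

```go
package main

import (
	"errors"
	"fmt"
)

// config collects the optional behaviours requested by the caller.
type config struct {
	reuse []float64 // destination to write into; nil means "allocate a copy"
}

// option is a functional option (hypothetical, modelled on the idea of WithReuse).
type option func(*config)

// withReuse asks addVec to write its result into dst instead of allocating.
func withReuse(dst []float64) option {
	return func(c *config) { c.reuse = dst }
}

// addVec returns a+b element-wise. It never modifies a or b, and it returns
// an error rather than panicking when the lengths do not match.
func addVec(a, b []float64, opts ...option) ([]float64, error) {
	if len(a) != len(b) {
		return nil, errors.New("addVec: length mismatch")
	}
	var c config
	for _, o := range opts {
		o(&c)
	}
	dst := c.reuse
	if dst == nil {
		dst = make([]float64, len(a)) // default: a fresh copy
	} else if len(dst) != len(a) {
		return nil, errors.New("addVec: reuse buffer has the wrong length")
	}
	for i := range a {
		dst[i] = a[i] + b[i]
	}
	return dst, nil
}

func main() {
	a, b := []float64{1, 2, 3}, []float64{10, 20, 30}

	sum, err := addVec(a, b) // allocates; a and b are untouched
	fmt.Println(sum, err)

	buf := make([]float64, 3)
	_, err = addVec(a, b, withReuse(buf)) // writes into buf instead
	fmt.Println(buf, err)

	_, err = addVec(a, []float64{1}) // mismatch is reported as an error
	fmt.Println(err)
}
```

The caller who never passes an option gets the safe, copying behaviour; state is only modified when the call site says so explicitly.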

Algorithms

Another thing I’d like to highlight is that Gonum comes with a bunch of other packages that are useful for data science work. For a lot of that work I find Gonum more than ample: almost all my needs are met. For smaller-scale things (which is most cases), I just use Gonum.

When I need to use gorgonia.org/tensor, the package provides interop with Gonum’s Mat and *Dense types. The methods that come with Gorgonia tensors are basic methods. Any additional algorithms would require extension, which can be done by creating a new ExecutionEngine. That’s fairly advanced Gorgonia work though.

Workflow

The datasets I work with are fairly well known ahead of time. Exploration of said datasets is done either in a SQL client or in Jupyter. But when it comes to data science work that can be pushed to production (typically in an executable), I drop straight into Go.

The packages I use are your bog standard SQL libraries, the CSV encoding libraries and Gorgonia or Gonum. That’s one of the things about Go: it’s such a simple language that there is really nothing fancy to show off. It’s straightforward - what you read is what happens. Code becomes boring, and there are no “One Weird Trick”s. So there isn’t really anything to blog about on that end.

I personally prefer not to use any frameworks to perform operations. This is mainly because the operations I write tend to be quite task-specific. For a lot of functions I just write them. Few things are truly reusable as-is. If you set out to write a truly reusable function, you tend to end up with many, many parameters and overly complicated code to handle all the edge cases. Not particularly my cup of tea. And this is from the guy who wrote a fairly generic multidimensional array for Go.

Generic frameworks that claim to do everything for everyone don’t really suit my workflow - I typically end up bending the logic of my programs in weird and potentially buggy ways to fit the ideals of the framework.

What’s Missing

Do I miss nice APIs from Scikit Learn? Occasionally. I especially miss the classification_report and in fact all of the sklearn.metrics functions. I don’t miss the fit()-style APIs tho.
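In the spirit of "I just write them", here is a minimal sketch of the per-class numbers that a classification report contains. The names `classMetrics` and `report` are hypothetical, labels are assumed to be integers in [0, nClasses), and a metric with a zero denominator is simply reported as 0, which is my assumption rather than a rule from any library.

```go
package main

import "fmt"

// classMetrics holds the per-class numbers that a classification report shows.
type classMetrics struct {
	Precision, Recall, F1 float64
	Support               int
}

// report computes precision, recall and F1 for each of nClasses classes.
func report(yTrue, yPred []int, nClasses int) ([]classMetrics, error) {
	if len(yTrue) != len(yPred) {
		return nil, fmt.Errorf("report: %d true labels but %d predictions", len(yTrue), len(yPred))
	}
	tp := make([]int, nClasses)        // true positives per class
	predicted := make([]int, nClasses) // times each class was predicted (tp + fp)
	actual := make([]int, nClasses)    // times each class occurs (tp + fn)
	for i, t := range yTrue {
		p := yPred[i]
		if t < 0 || t >= nClasses || p < 0 || p >= nClasses {
			return nil, fmt.Errorf("report: label out of range at index %d", i)
		}
		actual[t]++
		predicted[p]++
		if t == p {
			tp[t]++
		}
	}
	out := make([]classMetrics, nClasses)
	for c := range out {
		m := classMetrics{Support: actual[c]}
		if predicted[c] > 0 {
			m.Precision = float64(tp[c]) / float64(predicted[c])
		}
		if actual[c] > 0 {
			m.Recall = float64(tp[c]) / float64(actual[c])
		}
		if m.Precision+m.Recall > 0 {
			m.F1 = 2 * m.Precision * m.Recall / (m.Precision + m.Recall)
		}
		out[c] = m
	}
	return out, nil
}

func main() {
	yTrue := []int{0, 0, 1, 1, 2, 2}
	yPred := []int{0, 1, 1, 1, 2, 0}
	ms, err := report(yTrue, yPred, 3)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for c, m := range ms {
		fmt.Printf("class %d  precision=%.2f recall=%.2f f1=%.2f support=%d\n",
			c, m.Precision, m.Recall, m.F1, m.Support)
	}
}
```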

I think the design of those APIs is quite silly. I think those APIs shouldn’t have a one-size-fits-all method. You’ll end up with silliness like fit_transform(). (Before anyone asks: I think the transform process should be a clear and separate step. Mashing it all into one just makes things confusing for your future self.) Interface definitions should in my opinion be lazily done. That’s the beauty of Go’s implicit interfaces.
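A minimal sketch of what that could look like, with estimation and transformation kept as two explicit steps and the interface declared only at the point where a consumer needs it. All names here (`zscore`, `estimate`, `transformer`, `applyAll`) are hypothetical, and the code assumes non-empty input that is not constant.

```go
package main

import (
	"fmt"
	"math"
)

// zscore holds parameters estimated from some reference data.
type zscore struct{ mean, std float64 }

// estimate computes the mean and population standard deviation of xs.
// It only reads the data; nothing is transformed here.
func estimate(xs []float64) zscore {
	var sum float64
	for _, x := range xs {
		sum += x
	}
	mean := sum / float64(len(xs))
	var ss float64
	for _, x := range xs {
		ss += (x - mean) * (x - mean)
	}
	return zscore{mean: mean, std: math.Sqrt(ss / float64(len(xs)))}
}

// Transform is the separate, explicit step. It returns a new slice.
func (z zscore) Transform(xs []float64) []float64 {
	out := make([]float64, len(xs))
	for i, x := range xs {
		out[i] = (x - z.mean) / z.std
	}
	return out
}

// transformer is declared only here, where a consumer first needs it.
// zscore satisfies it implicitly and never mentions the interface.
type transformer interface {
	Transform([]float64) []float64
}

// applyAll runs any transformer over several batches.
func applyAll(t transformer, batches ...[]float64) [][]float64 {
	out := make([][]float64, len(batches))
	for i, b := range batches {
		out[i] = t.Transform(b)
	}
	return out
}

func main() {
	train := []float64{2, 4, 4, 4, 5, 5, 7, 9}
	z := estimate(train)                             // step 1: estimate parameters
	fmt.Println(applyAll(z, train, []float64{5, 9})) // step 2: transform
}
```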

Plotting is another thing that is missing in the Go ecosystem. That usually isn’t a problem for my deep learning related work - I just pipe the JSON out to a file and have a separate webserver that reads the file and plots it using plotly or something.

What I think is really missing is dynamic data exploration: the entire experience of exploring data interactively in Go.

Gophernotes and Gota aim to fix that. But at this point, they’re still quite young. As are Gonum and Gorgonia, to be honest. As the developer of the Gorgonia family of packages, I keep getting the anxiety-inducing feeling from time to time that the packages are somehow still broken.

Other Resources