Parens for Python - UMAP & Trimap

We are going to explore some more Python libraries through the use of libpython-clj.

This time, we are going to focus on a couple of dimensionality reduction libraries called UMAP and Trimap. We will need a few supporting libraries installed to work through the examples, starting with the Clojure dependencies:

    ;; deps.edn
    {:deps {org.clojure/clojure     {:mvn/version "1.10.1"}
            cnuernber/libpython-clj {:mvn/version "1.36"}}}

Install the python dependencies

    pip3 install seaborn
    pip3 install matplotlib
    pip3 install scikit-learn
    pip3 install numpy
    pip3 install pandas
    pip3 install umap-learn
    pip3 install trimap

We also need to set up a plotting namespace that wraps matplotlib:

    (ns gigasquid.plot
      (:require [libpython-clj.require :refer [require-python]]
                [libpython-clj.python :as py :refer [py. py.. py.-]]))

First, we define a quick macro for rendering plots on our local system. Together with the Agg backend, it lets matplotlib (the library that seaborn is built on) render headlessly and save each plot to a file.

    ;; Select the non-interactive Agg backend before pyplot is loaded.
    (def mplt (py/import-module "matplotlib"))
    (py. mplt "use" "Agg")

    (require-python 'matplotlib.pyplot)
    (require-python 'matplotlib.backends.backend_agg)

    (defmacro with-show
      "Takes forms with matplotlib.pyplot to then show locally"
      [& body]
      `(let [_# (matplotlib.pyplot/clf)
             fig# (matplotlib.pyplot/figure)
             agg-canvas# (matplotlib.backends.backend_agg/FigureCanvasAgg fig#)]
         ~(cons 'do body)
         (py. agg-canvas# "draw")
         ;; (gensym) is called so that every plot gets a fresh file name.
         (matplotlib.pyplot/savefig (str "results/" (gensym) ".png"))))

UMAP

UMAP is a dimensionality reduction library. That sounds like a mouthful, but it basically takes a complicated dataset with many variables and reduces it to something much simpler while preserving as much of its fundamental structure as it can.
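To make "reducing many variables to fewer" concrete, here is a minimal plain-Python sketch. It does not use UMAP's algorithm: it swaps in a much simpler linear technique (projection onto the first principal component, found by power iteration), and the five 3-variable points are made up for illustration.

```python
import math

def center(points):
    """Subtract the per-column mean from every point."""
    n, dims = len(points), len(points[0])
    means = [sum(p[d] for p in points) / n for d in range(dims)]
    return [[p[d] - means[d] for d in range(dims)] for p in points]

def covariance(centered):
    """Sample covariance matrix of already-centered points."""
    n, dims = len(centered), len(centered[0])
    return [[sum(p[i] * p[j] for p in centered) / (n - 1)
             for j in range(dims)] for i in range(dims)]

def top_component(cov, iterations=100):
    """Approximate the dominant eigenvector by power iteration."""
    dims = len(cov)
    v = [1.0] * dims
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dims)) for i in range(dims)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def reduce_to_1d(points):
    """Project each point onto the direction of greatest variance."""
    centered = center(points)
    direction = top_component(covariance(centered))
    return [sum(a * b for a, b in zip(p, direction)) for p in centered]

# Made-up data: three variables, but the points lie close to a line.
points = [[1.0, 2.1, 0.5], [2.0, 3.9, 0.4], [3.0, 6.2, 0.6],
          [4.0, 7.8, 0.5], [5.0, 10.1, 0.5]]
embedding_1d = reduce_to_1d(points)
print(embedding_1d)  # one number per point instead of three
```

Each point is now described by a single number, and the order of the points along their line survives the reduction. UMAP pursues the same goal with a nonlinear method, which lets it handle data that does not sit near a straight line.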

    (ns gigasquid.umap
      (:require [libpython-clj.require :refer [require-python]]
                [libpython-clj.python :as py :refer [py. py.. py.-]]
                [gigasquid.plot :as plot]))

    (require-python '[seaborn :as sns])
    (require-python '[matplotlib.pyplot :as pyplot])
    (require-python '[sklearn.datasets :as sk-data])
    (require-python '[sklearn.model_selection :as sk-model])
    (require-python '[numpy :as numpy])
    (require-python '[pandas :as pandas])
    (require-python '[umap :as umap])

Next, we are going to follow along with the code tutorial at https://umap-learn.readthedocs.io/en/latest/basic_usage.html

We then set up the defaults for plotting and get some data to work with. We'll look at the Iris dataset. It isn't very representative of real-world data, since both the number of points and the number of features are small, but it will illustrate what is going on with dimensionality reduction.

    (sns/set)
    (def iris (sk-data/load_iris))
    (py.- iris DESCR)

We define a data frame and a series for the data set and can then plot the species.

    (def iris-df (pandas/DataFrame (py.- iris data)
                                   :columns (py.- iris feature_names)))
    (py/att-type-map iris-df)

    ;; Map each numeric target (0, 1, 2) to its species name.
    (def iris-name-series
      (let [iris-name-map (zipmap (range 3) (py.- iris target_names))]
        (pandas/Series (map (fn [item] (get iris-name-map item))
                            (py.- iris target)))))

    (py. iris-df __setitem__ "species" iris-name-series)
    (py/get-item iris-df "species")

    (plot/with-show
      (sns/pairplot iris-df :hue "species"))
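For readers coming from the Python side, the `zipmap` lookup above corresponds to building a dictionary from index to name and mapping it over the targets. The sketch below uses only the standard library; the three species names are the real Iris target names, while the short `targets` list is made up for illustration.

```python
# The three Iris species, in the order scikit-learn numbers them.
target_names = ["setosa", "versicolor", "virginica"]

# Hypothetical sample of numeric targets (the real array has 150 entries).
targets = [0, 0, 1, 2, 1]

# Equivalent of (zipmap (range 3) target-names).
name_map = dict(zip(range(3), target_names))

# Equivalent of mapping (get iris-name-map item) over the targets.
species = [name_map[t] for t in targets]
print(species)
```

The resulting list of names is what ends up in the "species" column and drives the `:hue` coloring in the pair plot.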

Now it's time to reduce! First we define a reducer and then train it to learn about the manifold. The fit_transform method first fits the data and then transforms it, returning the result as a numpy array.

    (def reducer (umap/UMAP))
    (def embedding (py. reducer fit_transform (py.- iris data)))
    (py.- embedding shape)
    (str (first embedding))
    ;; => "[14.31796 -4.056695]"
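The `fit_transform` call above follows the scikit-learn estimator convention: `fit` learns something from the data, `transform` applies what was learned, and `fit_transform` does both in one step. The sketch below shows that convention with a toy class. `MeanCenterer` is hypothetical and far simpler than UMAP; it only learns column means and subtracts them.

```python
class MeanCenterer:
    """Toy estimator (hypothetical, not part of UMAP or scikit-learn)."""

    def fit(self, rows):
        # Learn one mean per column and remember it on the instance.
        n = len(rows)
        self.means_ = [sum(col) / n for col in zip(*rows)]
        return self

    def transform(self, rows):
        # Apply the learned means to any rows with the same columns.
        return [[v - m for v, m in zip(row, self.means_)] for row in rows]

    def fit_transform(self, rows):
        # Convenience method: fit on the rows, then transform the same rows.
        return self.fit(rows).transform(rows)


rows = [[1.0, 10.0], [3.0, 30.0]]
one_step = MeanCenterer().fit_transform(rows)
two_step = MeanCenterer().fit(rows).transform(rows)
print(one_step)
```

For this toy class both paths give the same result. Keeping `fit` and `transform` separate is what allows a fitted reducer to be applied later to data it has not seen.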

    (let [colors (mapv #(py/get-item (sns/color_palette) %) (py.- iris target))
          x (mapv first embedding)
          y (mapv last embedding)]
      (plot/with-show
        (pyplot/scatter x y :c colors)
        (py. (pyplot/gca) set_aspect "equal" "datalim")
        (pyplot/title "UMAP projection of the Iris dataset" :fontsize 24)))
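The bindings in the `let` above do two things: they split the two-column embedding into separate x and y sequences, and they pick one palette color per point by using its numeric target as an index. A plain-Python sketch of the same data shaping follows. The first embedding row is the one printed earlier; the other rows, the targets, and the color names (standing in for seaborn's RGB tuples) are made up.

```python
# Placeholder palette; seaborn's color_palette() returns RGB tuples instead.
palette = ["blue", "orange", "green"]

# First row is the value shown earlier; the remaining rows are invented.
embedding = [[14.31796, -4.056695], [1.0, 2.0], [3.0, 4.0]]
targets = [0, 1, 2]

xs = [row[0] for row in embedding]       # like (mapv first embedding)
ys = [row[-1] for row in embedding]      # like (mapv last embedding)
colors = [palette[t] for t in targets]   # one color per point, by class

print(xs, ys, colors)
```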

UMAP with Digits Data

Now let's use a dataset with more complicated data: the handwritten digits set we all know and love.

    (def digits (sk-data/load_digits))
    (str (py.- digits DESCR))

Let's take a look at the images to see what we are dealing with:

    (plot/with-show
      (let [[fig ax-array] (pyplot/subplots 20 20)
            axes (py. ax-array flatten)]
        ;; Draw one digit image per subplot.
        (doall (map-indexed
                 (fn [i ax]
                   (py. ax imshow (py/get-item (py.- digits images) i)
                        :cmap "gray_r"))
                 axes))
        (pyplot/setp axes :xticks [] :yticks [] :frame_on false)
        (pyplot/tight_layout :h_pad 0.5 :w_pad 0.01)))

Now, let's do a scatterplot matrix of the first 10 of the 64 grayscale pixel values for each image.

    ;; Keep only the first 10 of the 64 pixel features per image.
    (def digits-df
      (pandas/DataFrame (mapv #(take 10 %) (py.- digits data))))
    (def digits-target-series
      (pandas/Series (mapv #(str "Digit " %) (py.- digits target))))
    (py. digits-df __setitem__ "digit" digits-target-series)
    (plot/with-show
      (sns/pairplot digits-df :hue "digit" :palette "Spectral"))

Let's reduce it!

    (def reducer (umap/UMAP :random_state 42))
    (py. reducer fit (py.- digits data))
    (def embedding (py. reducer transform (py.- digits data)))
    (str (py.- embedding shape))
    ;; => "(1797, 2)"

We now have a dataset with 1797 rows but only 2 columns. We can plot the resulting embedding, coloring the data points by the class to which they belong (the digit).

    (plot/with-show
      (let [x      (mapv first embedding)
            y      (mapv last embedding)
            colors (py.- digits target)
            bounds (numpy/subtract (numpy/arange 11) 0.5)
            ticks  (numpy/arange 10)]
        (pyplot/scatter x y :c colors :cmap "Spectral" :s 5)
        (py. (pyplot/gca) set_aspect "equal" "datalim")
        (py. (pyplot/colorbar :boundaries bounds) set_ticks ticks)
        (pyplot/title "UMAP projection of the Digits dataset" :fontsize 24)))
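The `bounds` and `ticks` values in the plot above are worth a closer look. Subtracting 0.5 from 0..10 gives eleven boundaries that fence off ten color bands, and each integer tick then sits in the middle of its band. The sketch below reproduces that arithmetic in plain Python lists rather than numpy arrays, which is an assumption made only to keep the example dependency-free.

```python
# Same values as (numpy/subtract (numpy/arange 11) 0.5): -0.5, 0.5, ..., 9.5
bounds = [i - 0.5 for i in range(11)]

# Same values as (numpy/arange 10): 0, 1, ..., 9
ticks = list(range(10))

# Consecutive boundaries form one band per digit class.
bands = list(zip(bounds[:-1], bounds[1:]))

for tick, (low, high) in zip(ticks, bands):
    # Each digit label falls exactly halfway between its two boundaries.
    print(f"digit {tick}: band [{low}, {high}]")
```

With the boundaries placed at the half-integers, the colorbar reads as ten discrete swatches, each labeled at its center with the digit it represents.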

Trimap

Trimap is another dimensionality reduction library that uses a different algorithm: https://pypi.org/project/trimap/

    (ns gigasquid.trimap
      (:require [libpython-clj.require :refer [require-python]]
                [libpython-clj.python :as py :refer [py. py.. py.-]]
                [gigasquid.plot :as plot]))

    (require-python '[trimap :as trimap])
    (require-python '[sklearn.datasets :as sk-data])
    (require-python '[matplotlib.pyplot :as pyplot])
    ;; numpy is used for the colorbar bounds in the plot further down.
    (require-python '[numpy :as numpy])

We can do the digit example using it too.

    (def digits (sk-data/load_digits))
    (def digits-data (py.- digits data))
    (def embedding (py. (trimap/TRIMAP) fit_transform digits-data))
    (str (py.- embedding shape))
    ;; => "(1797, 2)"

Finally, we can visualize it as before

    (plot/with-show
      (let [x      (mapv first embedding)
            y      (mapv last embedding)
            colors (py.- digits target)
            bounds (numpy/subtract (numpy/arange 11) 0.5)
            ticks  (numpy/arange 10)]
        (pyplot/scatter x y :c colors :cmap "Spectral" :s 5)
        (py. (pyplot/gca) set_aspect "equal" "datalim")
        (py. (pyplot/colorbar :boundaries bounds) set_ticks ticks)
        (pyplot/title "Trimap projection of the Digits dataset" :fontsize 24)))

I hope that you have enjoyed this example and that it will spur your curiosity to try Python interop for yourself. You can find this code example, along with others, at https://github.com/gigasquid/libpython-clj-examples