

Convex hull polygon, demonstrating round trip from Spotfire® to R data function and back.

A flexible way to extend the capabilities of TIBCO Spotfire® is to use a data function, for example an R data function executed with Spotfire's built-in R engine, TERR. One of my favorite methods of quickly developing a new TERR data function for TIBCO Spotfire® uses a snippet from the "Top 10 Tricks for TERR Data Functions", which appears on the TIBCO Community's TIBCO Spotfire® Tips & Tricks homepage.

The snippet of R code can be used in a data function to save all of the incoming parameters to an RData file:

save(list=ls(), file="C:/debug.RData", RFormat=TRUE)

The Tips and Tricks article presents this snippet as a debugging aid, but I find it even more useful as a basic development tool.
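To see what this enables, here is a minimal sketch of the save-and-reload round trip. The values of alpha and beta, and the temporary-file fallback, are stand-ins for illustration; inside a data function the path would be the hard-coded one above, and TERR's save() additionally accepts the RFormat argument.

```r
# Simulate the save/load round trip the snippet enables.
alpha <- c(1.2, 3.4)               # stand-ins for inputs a data function would receive
beta  <- c(5.6, 7.8)
f <- tempfile(fileext = ".RData")  # in the data function this is "C:/debug.RData"
save(list = ls(), file = f)        # what the snippet does inside the data function
rm(alpha, beta)                    # simulate moving to a fresh interactive session
load(f)                            # every saved object comes back by name
print(ls())                        # alpha and beta are restored
```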

Let's say I have some point data in Spotfire®, which I'm calling simply "alpha" and "beta" to be general (left panel, below). I have a scatter plot of these in Spotfire® (center panel), and I want to create a data function that computes the "convex hull" which is a bounding polygon around my points (finished result shown in right panel below):

This convex hull might later be used to limit an interpolated surface to the area covered by these points. R has a built-in function, "chull()", that computes the convex hull, so the basic calculation will be straightforward. Here the main task is creating a simple TERR data function wrapper around this function, sending data to the function, and collecting the results.
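To get a feel for what chull() returns before wiring anything into Spotfire, here is a quick interactive sketch (the random sample data is made up for illustration):

```r
# chull() returns the row indices of the points that lie on the convex hull,
# listed in clockwise order.
set.seed(42)                             # made-up sample data
x <- rnorm(20)
y <- rnorm(20)
hull_idx <- chull(cbind(x, y))           # integer indices of the hull points
hull_closed <- c(hull_idx, hull_idx[1])  # repeat the first index to close the polygon
print(hull_idx)
```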

Here I'll show my favorite work flow to develop the new data function.

Setting up the new data function

I start off the creation of the new TERR data function by focusing on the input data that will be used. I bring up the Spotfire® dialog for new data functions, from the Spotfire menu "Tools - Register Data Functions". In this case the inputs are just the two columns alpha and beta.

One design choice could be to send both inputs to the data function as a single table, which is appealing since the alpha and beta columns already appear in the "mydata" table. With this choice I would configure the data function to use a table as a single input. The downside is that the data function must know about the names "alpha" and "beta"; if the data function were to be re-used with different variables the code would need to be changed, or would need a way to handle the new column names.

I prefer to send inputs like these to the data function as separate columns, mostly because the names of the input parameters as they appear in the R code are fixed. In the data function definition I can name these columns "x" and "y" so the R code can be written to use input parameters named x and y no matter what the original names are. I use the "Add" button to configure these separate x and y column inputs; for example the x column:

The completed set of input parameters appears like this:

With these inputs set up, I populate the data function with just a stub of simple code similar to the "TERR Tricks" code; all this script does is save variables to disk and then exit. The code I use also creates a time stamp, and organizes the file in a sub folder, explained below:

TimeStamp = timestamp()
save(list=ls(), file="C:/Temp/abc.RData", RFormat=T)
# remove(list=ls()); load(file='C:/Temp/abc.RData'); print(TimeStamp) # use in development

Here is how the code looks in the "Script" portion of the dialog:



You can keep this stub as you develop the full script (perhaps commenting it out once the script is working).

Notes:

I normally uncheck the "Allow caching" checkbox. If checked, the calculations will be reused if the inputs have not changed, but when actively making edits to the code it's best to make sure the code runs every time the data function is saved.

I specify a scratch folder, C:\Temp, which I have created for organizing the temporary .RData files. (In the R code this path is written as either "C:/Temp" or "C:\\Temp".)

You can either use a generic name for this file (here: "abc.RData") across several data functions, or tailor the name to the specific data function. Note that this name also appears in the commented-out load line. That commented-out line is meant to be run from the interactive environment, as demonstrated below. The ".RData" suffix is not required but is a good way to remember what kind of file it is.

At this stage I do not set up any output variables; these are added later.

Embedding the Data Function in the Spotfire® file

When you are ready to run your data function for the first time, look for the button labeled "Run" at the top right of the dialog:

Hovering the mouse over this reveals that the button's purpose is to "Save Data Function in Analysis and Run". Push this button only once; pushing it a second time embeds a second copy of the data function into the analysis, which quickly becomes confusing. This action brings up the Edit Parameters dialog, where you specify which Spotfire® variables get mapped to the data function inputs. Here I associate alpha with x, and beta with y:

In this same dialog you can optionally specify if you want to limit the data by marking or filtering. You will also be able to return to this dialog later and make changes.

When you have completed setting up all the required inputs (here x and y), the "OK" button will become active. Push the "OK" button; this is the point at which the data function is actually embedded and runs. Then close the "Register Data Functions" dialog box because you have finished embedding the data function. It is important to close the "Register Data Functions" dialog at this point to avoid generating duplicate copies, even if you spot a typo in your code!

To make further edits, go into Data - Data Function Properties, locate your data function, and make your edits. From here you can also change which columns are sent to the data function, and which marking or filtering affects the data. If you have checked the "Refresh function automatically" checkbox, as I have here, the data function will then update whenever the marking changes, which is an intuitive way of making the data function interactive.

When the data function runs, the code saves whatever variables are in its environment to the disk file. If you have defined any optional variables but have not populated these, they will appear as NULL objects in the R environment.

I then move over to an R IDE like RStudio, and bring up an editor with the data function code. I'll highlight the last line, without the pound sign, and run it (see screenshot below). This will clear my variables and load the saved file.

The "TimeStamp" variable is saved along with the normal inputs; I print it out when I load the data, as a sanity check that I'm loading the correct version.

It is occasionally useful to insert a corresponding snippet at the very end of the data function, using the name "abc.out.RData", to capture all the variables immediately before returning to Spotfire. This is handy when the objects coming back to Spotfire aren't what you expected.
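For example, such a snippet might look like this (same pattern as the input-saving stub; the name "abc.out.RData" and the C:/Temp folder are just my conventions):

```r
# At the very end of the data function, just before the outputs return to Spotfire:
save(list=ls(), file="C:/Temp/abc.out.RData", RFormat=T)
# remove(list=ls()); load(file='C:/Temp/abc.out.RData') # use in development
```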

At this point I have embedded and run one copy of the new data function in the dxp file, mapping the input columns alpha and beta to R variables x and y. I now turn to RStudio to develop the R code. The RStudio IDE can be started from the usual start menu or shortcut, or from a convenient button in the Spotfire TERR Tools menu:

With RStudio (or other R IDE) running and the code visible in the editor, I highlight and execute the line of code that loads the data. This loads vectors x and y into my R environment:

These x and y vectors are now the starting point of my R code development. I can now interactively develop and debug code in the interactive R environment. The advantage of this method is that whatever code I develop in the IDE will work in Spotfire, so when I am finished I will be able to simply paste the code as-is into Spotfire's data function Script Editor.

After a bit of experimentation I develop this code:

xy <- cbind(x=x, y=y)
u0 <- chull(xy)
u <- c(u0, u0[1])
xychull <- data.frame(xy[u,])
xychull$order <- 1:nrow(xychull)

line 1: assembles the two incoming vectors into a matrix.

line 2: invokes the built-in function chull(). Variable u0 is a vector of integers: the indices of the points that define the convex hull.

> u0
[1]  7  9 18 10 11 13 16 12

line 3: replicates the first point so the line in Spotfire will appear as a closed polygon.

> u <- c(u0, u0[1])
> u
[1]  7  9 18 10 11 13 16 12  7

line 4: uses the index u to form a new data frame to return to Spotfire.

line 5: inserts a new column for Spotfire to use to order the drawn line.

> xychull
           x          y order
1 -1.0153384 -1.2470843     1
2 -1.5022591 -0.7912453     2
3 -1.5645259 -0.6075356     3
4 -2.0575395  1.5801117     4
5 -0.2492311  1.4572451     5
6  1.3914470  0.9499062     6
7  2.3272803 -0.6425621     7
8  0.1241502 -1.9491346     8
9 -1.0153384 -1.2470843     9

The updated code is now complete, and ready to use in Spotfire. There will be one output object, the data frame "xychull".

I'm now ready to edit the TERR data function, and augment the existing code stub with the newly developed code.
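Assembled, the replacement script is just the five developed lines. In the sketch below I add stand-in values for x and y so the script runs on its own; in Spotfire those two lines are omitted, since x and y arrive as the mapped input columns.

```r
# Stand-ins for the Spotfire input parameters (omit these in the data function):
x <- c(0, 1, 1, 0, 0.5)
y <- c(0, 0, 1, 1, 0.5)

xy <- cbind(x = x, y = y)         # assemble the two input vectors into a matrix
u0 <- chull(xy)                   # indices of the points on the convex hull
u  <- c(u0, u0[1])                # repeat the first point to close the polygon
xychull <- data.frame(xy[u, ])    # data frame to return to Spotfire
xychull$order <- 1:nrow(xychull)  # drawing order for the polygon line
print(xychull)
```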

In Spotfire, I invoke the data function editor from the menu Data - Data Function Properties; I select my data function and hit the "Edit Script..." button.

I can now delete the temporary code and replace it with the new code:

In the "Output Parameters" tab, I now define a new output table named xychull. Note that this name must exactly match the name of the object defined in the R code:

After I close the editor, I set up how this table is mapped to a Spotfire object. I hit the "Edit Parameters..." button, go to the Output tab of the Edit Parameters dialog, and set up the output table xychull to go to a new table in Spotfire:

Hitting OK in this dialog will automatically run my data function, since I've made a fundamental change to the inputs/outputs; alternatively, I can select my data function in the Data Function Properties dialog and hit "Refresh".

The new data table xychull will now be present in Spotfire, and can be used to draw the polygon in the Spotfire® scatter plot (using add line from table).

This simple data function is now fully functional.

This example can be set up to be more interactive, by responding to only marked points. To do this, we can set the inputs x and y to use only the marked points, and we can set the data function to refresh automatically (on the parameters page). A caveat here is that the code as-is does not handle the case where no points are marked, but we'll cover this in a separate article.
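One possible guard is sketched below, with empty stand-in inputs to mimic nothing being marked; this is a hypothetical minimal approach, not the full treatment promised above.

```r
# Stand-in for "nothing marked": empty input vectors from Spotfire.
x <- numeric(0)
y <- numeric(0)

# Minimal guard: return an empty table when fewer than 3 points are available,
# since fewer than 3 points cannot form a polygon.
if (length(x) < 3) {
  xychull <- data.frame(x = numeric(0), y = numeric(0), order = integer(0))
} else {
  xy <- cbind(x = x, y = y)
  u0 <- chull(xy)
  u  <- c(u0, u0[1])
  xychull <- data.frame(xy[u, ])
  xychull$order <- 1:nrow(xychull)
}
print(nrow(xychull))   # an empty table draws no polygon in Spotfire
```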

In summary, I've presented a common workflow that I use whenever I develop a TERR data function or make alterations; I've used the convex hull example, but this is a general method. The basic steps are:

I determine which input parameters will be needed, and set up a new data function with these inputs.

For the body of the code, at first I use just the stub that saves inputs to a file.

I then connect the actual Spotfire data to the input parameters and run the code.

I then move to an R IDE like RStudio, and develop my code in this interactive environment.

When I have something to test out, I copy the code to the Script Editor in Spotfire, set up and map any output variables, and try it out.

This is an interactive, iterative approach that is quite flexible. I frequently add or remove inputs or outputs to my data function, making sure that all the inputs are populated and that variables are defined for all the outputs.

Notes: The topic of setting up TERR data functions in Spotfire has been covered in previous articles, with slightly different emphasis than this article.

Peter Shaw is a staff data scientist in the TIBCO Data Science team, based in Seattle. His interests include geospatial analysis, mapping, pattern recognition, optimization, time series and routing. He views data science as a contact sport, with the analyst, the data, and analytical models as the players. Other interests include photography, drawing, music, and partner dancing.