Having a really broad toolset and an open mind on how to approach data can lead to interesting insights that are missed when data is looked at only through the lens of statistics or machine learning. It’s something we at Wolfram Research call multiparadigm data science, which I use here for a small excursion through calculus, graph theory, signal processing, optimization and statistics to gain some interesting insights into the engineering of supersonic cars.

The story started with a conversation about data with some of the Bloodhound team, which is trying to create a 1000 mph car. I offered to spend an hour or two looking at some sample data to give them some ideas of what might be done. They sent me a curious binary file that somehow contained the output of 32 sensors recorded from a single subsonic run of the ThrustSSC car (the current holder of the world land speed record).

Import

The first thing I did was code the information that I had been given about the channel names and descriptions, in a way that I could easily query:

✕ channels={"SYNC"->"Synchronization signal","D3fm"->"Rear left active suspension position","D5fm"->"Rear right active suspension position","VD1"->"Unknown","VD2"->"Unknown","L1r"->"Load on front left wheel","L2r"->"Load on front right wheel","L3r"->"Load on rear left wheel","L4r"->"Load on rear right wheel","D1r"->"Front left displacement","D2r"->"Front right displacement","D4r"->"Rear left displacement","D6r"->"Rear right displacement","Rack1r"->"Steering rack displacement rear left wheel","Rack2r"->"Steering rack displacement rear right wheel","PT1fm"->"Pitot tube","Dist"->"Distance to go (unreliable)","RPM1fm"->"RPM front left wheel","RPM2fm"->"RPM front right wheel","RPM3fm"->"RPM rear left wheel","RPM4fm"->"RPM rear right wheel","Mach"->"Mach number","Lng1fm"->"Longitudinal acceleration","EL1fm"->"Engine load left mount","EL2fm"->"Engine load right mount","Throt1r"->"Throttle position","TGTLr"->"Turbine gas temperature left engine","TGTRr"->"Turbine gas temperature right engine","RPMLr"->"RPM left engine spool","RPMRr"->"RPM right engine spool","NozLr"->"Nozzle position left engine","NozRr"->"Nozzle position right engine"};

✕ SSCData[]=First/@channels;

✕ SSCData[name_,"Description"]:=Lookup[channels,name,Missing[]]; TextGrid[{#,SSCData[#,"Description"]}&/@SSCData[],Frame->All]

Then on to decoding the file. I had no guidance on format, so the first thing I did was pass it through the 200+ fully automated import filters:

✕ DeleteCases[Map[Import["BLK1_66.dat",#]&,$ImportFormats],$Failed]

Thanks to the automation of the Import command, that only took a couple of minutes to do, and it narrowed down the candidate formats. Knowing that there were channels and repeatedly visualizing the results of each import and transformation to see if they looked like real-world data, I quickly tumbled on the following:

✕ MapThread[Set,{SSCData/@SSCData[],N[Transpose[Partition[Import["BLK1_66.dat","Integer16"],32]]][[All,21050;;-1325]]}];

✕ Row[ListPlot[SSCData[#],PlotLabel->#,ImageSize->170]&/@SSCData[]]

The ability to automate all 32 visualizations without worrying about details like plot ranges made it easy to see when I had gotten the right import filter and combination of Partition and Transpose . It also let me pick out the interesting time interval quickly by trial and error.

OK, data in, and we can look at all the channels and immediately see that SYNC and Lng1fm contain nothing useful, so I removed them from my list:

✕ SSCData[] = DeleteCases[SSCData[], "SYNC" | "Lng1fm"];

Graphs & Networks: Looking for Families of Signals

The visualization immediately reveals some very similar-looking plots—for example, the wheel RPMs. It seemed like a good idea to group them into similar clusters to see what would be revealed. As a quick way to do that, I used an idea from social network analysis: to form graph communities based on the relationship between individual channels. I chose a simple family relationship—streams with a correlation with of at least 0.4, weighted by the correlation strength:

✕ correlationEdge[{v1_,v2_}]:=With[{d1=SSCData[v1],d2=SSCData[v2]}, If[Correlation[d1,d2]^2<0.4,Nothing,Property[UndirectedEdge[v1,v2],EdgeWeight->Correlation[d1,d2]^2]]];

✕ edges = Map[correlationEdge, Subsets[SSCData[], {2}]]; CommunityGraphPlot[Graph[ Property[#, {VertexShape -> Framed[ListLinePlot[SSCData[#], Axes -> False, Background -> White, PlotRange -> All], Background -> White], VertexLabels -> None, VertexSize -> 2}] & /@ SSCData[], edges, VertexLabels -> Automatic], CommunityRegionStyle -> LightGreen, ImageSize -> 530]

I ended up with three main clusters and five uncorrelated data streams. Here are the matching labels:

✕ CommunityGraphPlot[Graph[ Property[#, {VertexShape -> Framed[Style[#, 7], Background -> White], VertexLabels -> None, VertexSize -> 2}] & /@ SSCData[], edges, VertexLabels -> Automatic], CommunityRegionStyle -> LightGreen, ImageSize -> 530]

Generally it seems that the right cluster is speed related and the left cluster is throttle related, but perhaps the interesting one is the top, where jet nozzle position, engine mount load and front suspension displacement form a group. Perhaps all are thrust related.

The most closely aligned channels are the wheel RPMs. Having all wheels going at the same speed seems like a good thing at 600 mph! But RPM1fm, the front-left wheel is the least correlated. Let’s look more closely at that:

✕ TextGrid[ Map[SSCData[#, "Description"] &, MaximalBy[Subsets[SSCData[], {2}], Abs[Correlation[SSCData[#[[1]]], SSCData[#[[2]]]]] &, 10]], Frame -> All]

Optimization: Data Comparison

I have no units for any instruments and some have strange baselines, so I am not going to assume that they are calibrated in an equivalent way. That makes comparison harder. But here I can call on some optimization to align the data before we compare. I rescale and shift the second dataset so that the two sets are as similar as possible, as measured by the Norm of the difference. I can forget about the details of optimization, as FindMinimum takes care of that:

✕ alignedDifference[d1_,d2_]:=With[{shifts=Quiet[FindMinimum[Norm[d1-(a d2+b),1],{a,b}]][[2]]},d1-(a #+b&/.shifts)/@d2];

Let’s look at a closely aligned pair of values first:

✕ ListLinePlot[MeanFilter[alignedDifference[SSCData["RPM3fm"],SSCData["RPM4fm"]],40],PlotRange->All,PlotLabel->"Difference in rear wheel RPMs"]

Given that the range of RPM3fm was around 0–800, you can see that there are only a few brief events where the rear wheels were not closely in sync. I gradually learned that many of the sensors seem to be prone to very short glitches, and so probably the only real spike is the briefly sustained one in the fastest part of the run. Let’s look now at the front wheels:

✕ ListLinePlot[MeanFilter[alignedDifference[SSCData["RPM1fm"],SSCData["RPM2fm"]],40],PlotRange->All,PlotLabel->"Difference in front wheel RPMs"]

The differences are much more prolonged. It turns out that desert sand starts to behave like liquid at high velocity, and I don’t know what the safety tolerances are here, but that front-left wheel is the one to worry about.

I also took a look at the difference between the front suspension displacements, where we see a more worrying pattern:

✕ ListLinePlot[MeanFilter[alignedDifference[SSCData["D1r"],SSCData["D2r"]],40],PlotRange->All,PlotLabel->"Difference in front suspension displacements"]

Not only is the difference a larger fraction of the data ranges, but you can also immediately see a periodic oscillation that grows with velocity. If we are hitting some kind of resonance, that might be dangerous. To look more closely at this, we need to switch paradigms again and use some signal processing tools. Here is the Spectrogram of the differences between the displacements. The Spectrogram is just the magnitude of the discrete Fourier transforms of partitions of the data. There are some subtleties about choosing the partitioning size and color scaling, but by default that is automated for me. We should read it as time along the axis, frequency along the , and darker values are greater magnitude:

✕ Spectrogram[alignedDifference[SSCData["D1r"],SSCData["D2r"]],PlotLabel->"Difference in front suspension displacements"]

We can see the vibration as a dark line from 2000 to 8000, and that its frequency seems to rise early in the run and then fall again later. I don’t know the engineering interpretation, but I would suspect that this reduces the risk of dangerous resonance compared to constant frequency vibration.

Calculus: Velocity and Acceleration

It seems like acceleration should be interesting, but we have no direct measurement of that in the data, so I decided to infer that from the velocity. There is no definitive accurate measure of velocity at these speeds. It turned out that the Pitot measurement is quite slow to adapt and smooths out the features, so the better measure was to use one of the wheel RPM values. I take the derivative over a 100-sample interval, and some interesting features pop out:

✕ ListLinePlot[Differences[SSCData["RPM4fm"], 1, 100], PlotRange -> {-100, 80}, PlotLabel -> "Acceleration"]

The acceleration clearly goes up in steps and there is a huge negative step in the middle. It only makes sense when you overlay the position of the throttle:

✕ ListLinePlot[ {MeanFilter[Differences[SSCData["RPM4fm"],1,100],5], MeanFilter[SSCData["Throt1r"]/25,10]}, PlotLabel->"Acceleration vs Throttle"]

Now we see that the driver turns up the jets in steps, waiting to see how the car reacts before he really goes for it at around 3500. The car hits peak acceleration, but as wind resistance builds, acceleration falls gradually to near zero (where the car cruises at maximum speed for a while before the driver cuts the jets almost completely). The wind resistance then causes the massive deceleration. I suspect that there is a parachute deployment shortly after that to explain the spikiness of the deceleration, and some real brakes at 8000 bring the car to a halt.

Signal Processing

I was still pondering vibration and decided to look at the load on the suspension from a different point of view. This wavelet scalogram turned out to be quite revealing:

✕ WaveletScalogram[ContinuousWaveletTransform[SSCData["L1r"]],PlotLabel->"Suspension frequency over time"]

You can read it the same as the Spectrogram earlier, time along , and frequency on the axis. But scalograms have a nice property of estimating discontinuities in the data. There is a major pair of features at 4500 and 5500, where higher-frequency vibrations appear and then we cross a discontinuity. Applying the scalogram requires some choices, but again, the automation has taken care of some of those choices by choosing a MexicanHatWavelet[1] out of the dozen or so wavelet choices and the choice of 12 octaves of resolution, leaving me to focus on the interpretation.

I was puzzled by the interpretation, though, and presented this plot to the engineering team, hoping that it was interesting. They knew immediately what it was. While this run of the car had been subsonic, the top edge of the wheel travels forward at twice the speed of the vehicle. These features turned out to detect when that top edge of the wheel broke the sound barrier and when it returned through the sound barrier to subsonic speeds. The smaller features around 8000 correspond to the deployment of the physical brakes as the car comes to a halt.

Deployment: Recreating the Cockpit

There is a whole sequence of events that happen in a data science project, but broadly they fall into: data acquisition, analysis, deployment. Deployment might be setting up automated report generation, creating APIs to serve enterprise systems or just creating a presentation. Having only offered a couple of hours, I only had time to format my work into a slide show notebook. But I wanted to show one other deployment, so I quickly created a dashboard to recreate a simple cockpit view:

✕ CloudDeploy[ With[{data = AssociationMap[ Downsample[SSCData[#], 10] &, {"Throt1r", "NozLr", "RPMLr", "RPMRr", "Dist", "D1r", "D2r", "TGTLr"}]}, Manipulate[ Grid[List /@ { Grid[{{ VerticalGauge[data[["Throt1r", t]], {-2000, 2000}, GaugeLabels -> "Throttle position", GaugeMarkers -> "ScaleRange"], VerticalGauge[{data[["D1r", t]], data[["D2r", t]]}, {1000, 2000}, GaugeLabels -> "Displacements"], ThermometerGauge[data[["TGTLr", t]] + 1600, {0, 1300}, GaugeLabels -> Placed[ "Turbine temperature", {0.5, 0}]]}}, ItemSize -> All], Grid[{{ AngularGauge[-data[["RPMLr", t]], {0, 2000}, GaugeLabels -> "RPM L", ScaleRanges -> {1800, 2000}], AngularGauge[-data[["RPMRr", t]], {0, 2000}, GaugeLabels -> "RPM R", ScaleRanges -> {1800, 2000}] }}, ItemSize -> All], ListPlot[{{-data[["Dist", t]], 2}}, PlotMarkers -> Magnify["", 0.4], PlotRange -> {{0, 1500}, {0, 10}}, Axes -> {True, False}, AspectRatio -> 1/5, ImageSize -> 500]}], {{t, 1, "time"}, 1, Length[data[[1]]], 1}]], "SSCDashboard", Permissions -> "Public"]

In this little meander through the data, I have made use of graph theory, calculus, signal processing and wavelet analysis, as well as some classical statistics. You don’t need to know too much about the details, as long as you know the scope of tools available and the concepts that are being applied. Automation takes care of many of the details and helps to deploy the data in an accessible way. That’s multiparadigm data science in a nutshell.

Download this post as a Wolfram Notebook