La La State

We’ve made good progress on this front. First, some background:

The “La La” state is a term coined by TextBlade customer and heavy user Trace R. (himself a software developer). It refers to a state where the TextBlade remains on and linked via Bluetooth, but temporarily becomes unresponsive. In the LaLa state, the SpaceBlade LED display stays dark, and characters don’t stream to the host.

The LaLa state has been reported by a few users (about 5 percent). Those users saw it perhaps 1-3 times in a day, on the few days when it appeared at all. It’s hard to reproduce on demand. Restarting the TextBlade generally restores operation.

Occasional confused states like this can happen on our iPhones and other devices, and are usually fixed by a reset. The bar is set a bit higher here, however, since you tend to see your keyboard as an instrument rather than as a computer in its own right (even though TextBlade actually contains 4 cores). You like to use your keyboard without thinking about it.

So minimizing resets is actually meaningful to maintaining high user satisfaction. If your controls are ever interrupted, you notice right away, and you want them back online fast. Users who have seen this state can vouch for why minimizing interruption is so important to our cognitive connection with the instrument. This is especially true with TextBlade’s advances in human factors and workflow. When you’re in the groove of producing lots of text with minimal effort, you expect to stay in that zone. Any minor distractions are all the more noticeable while you’re in the satisfying sweet spot, gliding through text. So we pay attention to what users report.

LaLa Causes and Cures

Eliminating the causes of the LaLa state has required a fairly deep dive into certain details, but it has been productive. Any intermittent state like this is, by definition, hard to chase down. But unless the underlying cause is well understood, you can’t be certain how many general release customers could be affected. So you have to dig until you uncover the operative causes.

To get good data on this, we had to define and build more event monitoring methods for our logging systems. After studying a lot of those logs, we were able to infer that some of the events of interest might actually occur during a brief interval within boot-up itself, before the iOS logger was even linked and recording.

So we had to build a fast, special-purpose monitor that could operate right inside the boot routines, before booting even completes, to locally record new types of events. Once the iOS app links up to TextBlade, the boot logger then hands off its observations to the app so they can be integrated into the full record. Without these onion-peeling steps and several others, we couldn’t drill down to the root cause.
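
The record-then-hand-off idea can be sketched in a few lines. This is only an illustration, written in Python for readability rather than as firmware code. The names `BootEvent`, `BootLogger`, `record`, and `hand_off`, the event codes, and the buffer size are all hypothetical and are not taken from the TextBlade firmware or the iOS app.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class BootEvent:
    tick: int   # hypothetical boot-time tick counter, not wall-clock time
    code: str   # hypothetical event name


class BootLogger:
    """Holds events in a small fixed-size buffer until a host logger links."""

    def __init__(self, capacity: int = 32):
        # A bounded deque silently drops the oldest entries if boot produces
        # more events than the buffer can hold.
        self._events = deque(maxlen=capacity)

    def record(self, tick: int, code: str) -> None:
        self._events.append(BootEvent(tick, code))

    def hand_off(self, host_log: list) -> int:
        """Move buffered events to the front of host_log; return how many moved."""
        moved = list(self._events)
        host_log[:0] = moved  # boot events precede anything the host recorded
        self._events.clear()
        return len(moved)


# Usage: events recorded during boot are merged once the app has linked.
boot = BootLogger(capacity=4)
boot.record(1, "sensor_init")
boot.record(2, "anomaly_flagged")
host_log = [BootEvent(10, "app_linked")]
boot.hand_off(host_log)
```

A bounded buffer is assumed here because a boot routine typically has little memory to spare; the trade-off is that the earliest events are lost if the buffer overflows before the hand-off.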

We built and tested more than a dozen firmware releases internally, and then shared the later ones with TREG users to gather more data from the field. After a lot of enhancements and iterations to the logs, and a lot of very helpful posts from users, we were able to get a clear picture. As is often the case with deeply buried issues, it turns out there were several layers that contributed to this LaLa state.

The primary cause centered around TextBlade’s built-in monitor and notification system. This system proactively and continuously monitors the hardware and firmware for any anomaly that could affect user operation. It knows about measurement accuracy, power levels, environmental tracking, inter-blade communications, and many other parameters that work together to produce TextBlade’s performance. It constantly monitors them and makes sure that all these systems are performing to specifications. If anything isn’t right, it’s the monitor’s job to alert the user, and to act automatically to correct or prevent any errors in output.

Jet aircraft and Tesla electric cars have similar monitors that alert the pilot or driver to any anomalies in the subsystems that make up the jet or car, and the monitors work to prevent malfunctions that could materially compromise performance. An important reason for this kind of architecture is to respond proactively to anomalies so as to prevent failures, rather than simply react after the fact. A pilot who gets an early warning can usually avoid most failures. TextBlade’s monitor works the same way: it lets the user know to change something before the TextBlade becomes unable to deliver full performance.

TextBlade’s monitor system classifies anomalies into two different grades: Major and Minor. Minor anomalies won’t affect current performance, but may give an early indication that something may need attention. For example, when the battery gets down to 20%, a notification is generated so the user knows to charge soon. Nothing will fail, but it’s better for the user to know to charge ahead of time.

A major anomaly is one that could prevent accurate operation, so the monitor will notify the user, and then intervene to prevent character output until it is resolved. For example, if the ambient temperature suddenly drops 50 degrees in less than a minute, the tracking systems for the finger capacitance sensors would react to correct for it. For a minute or so, the display would show a zigzag animation to indicate that key sensing is paused, and then all the sensors would realign in equilibrium with the new environmental conditions. Once adjusted, the animation would stop, and characters would resume streaming out again. This steep temperature drop scenario is a very unusual condition that is rarely seen, but there is in fact firmware inside TextBlade specifically to handle this case. Typical temperature changes of a few degrees are automatically compensated in seconds, just as they occur, without the user even knowing it’s happening.
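
The two grades can be illustrated with a small classification sketch. It is a simplified model in Python, not the firmware’s logic. The 20% battery level and the 50-degree drop in under a minute come from the examples above; the function names, the `Grade` type, and the exact comparison rules are assumptions made for illustration.

```python
from enum import Enum


class Grade(Enum):
    NONE = 0
    MINOR = 1  # early warning; character output continues
    MAJOR = 2  # could affect accuracy; output pauses until resolved


# The figures below mirror the examples in the text. How the real monitor
# compares against them is an assumption.
LOW_BATTERY_PERCENT = 20
STEEP_DROP_DEGREES = 50
STEEP_DROP_WINDOW_S = 60


def grade_battery(percent: float) -> Grade:
    """A low battery is only an early warning, so it is graded Minor."""
    return Grade.MINOR if percent <= LOW_BATTERY_PERCENT else Grade.NONE


def grade_temperature(drop_degrees: float, elapsed_s: float) -> Grade:
    """Grade a temperature drop observed over elapsed_s seconds."""
    steep = (drop_degrees >= STEEP_DROP_DEGREES
             and elapsed_s < STEEP_DROP_WINDOW_S)
    # Small or slow changes are compensated silently, so they get no grade.
    return Grade.MAJOR if steep else Grade.NONE


# Usage
grade_battery(18)          # Grade.MINOR
grade_temperature(50, 45)  # Grade.MAJOR
grade_temperature(3, 45)   # Grade.NONE
```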

With that foundation in place, we can distill some of the points we found and corrected through careful review of the logs. One layer of the problem was that although minor anomalies are supposed to be logged, they’re not supposed to trigger certain visual notifications or lock-outs of the character stream. In some logs, however, we were able to confirm that under certain conditions some minor anomalies were improperly pausing the character stream due to a subtle gap in the notification logic. That’s been corrected, and now those minor anomalies don’t block the character stream.

A second layer to this issue was that under certain conditions, for some major anomalies, the monitor would correctly lock out the character stream, but the visual display wouldn’t confirm for the user why the system had paused characters. To the user, this looked like a simply unresponsive system, rather than one showing the proper indication that it was busy correcting something. This too has now been fixed, so the user can see there’s a reason for the pause in characters.
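
The intended behavior after these two fixes can be written down as a small decision function. This is a hedged sketch in Python of the rules described above, not the actual notification logic; `Grade`, `Response`, and `respond` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class Grade(Enum):
    MINOR = 1
    MAJOR = 2


@dataclass(frozen=True)
class Response:
    log_it: bool
    stream_paused: bool
    show_pause_animation: bool


def respond(active: list[Grade]) -> Response:
    """Decide the response to the anomalies that are currently active."""
    # Only a major anomaly may pause the character stream.
    paused = Grade.MAJOR in active
    return Response(
        log_it=bool(active),          # every anomaly is logged
        stream_paused=paused,
        show_pause_animation=paused,  # a pause is always shown to the user
    )


# Usage
respond([Grade.MINOR])               # logged, stream keeps running
respond([Grade.MINOR, Grade.MAJOR])  # logged, paused, animation shown
```

Deriving the pause flag and the animation flag from the same value means the two cannot disagree, which is one way to rule out a silent pause by construction.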

The third layer we found involved rare cases where an anomaly was reported even though the criteria were not fully met. Effectively, a false alarm had paused the character stream. We put in some protections to guard against this scenario.
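
The specific protections are not described here. One common guard against false alarms, shown purely as an illustration and not as what the firmware does, is to require the criteria to hold for several consecutive samples before raising the alarm. The class name and the default of three samples are assumptions.

```python
class DebouncedAlarm:
    """Raises an alarm only after the criteria hold for several samples in a row."""

    def __init__(self, required_consecutive: int = 3):
        self.required = required_consecutive
        self._streak = 0

    def update(self, criteria_met: bool) -> bool:
        """Feed one sample; return True while the alarm should be active."""
        # Any sample that fails the criteria resets the streak to zero.
        self._streak = self._streak + 1 if criteria_met else 0
        return self._streak >= self.required


# Usage: a single stray reading does not trip the alarm, a sustained one does.
alarm = DebouncedAlarm(required_consecutive=3)
states = [alarm.update(s) for s in [True, False, True, True, True]]
```

The cost of this kind of guard is a short delay before a genuine anomaly is reported, so the number of required samples has to be balanced against how quickly the monitor needs to react.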

We’ve provided here a representative sample of the details we addressed, and you can get a feel for how complicated and layered this forensic work is. Doing it methodically does produce a high-quality result. It’s costly, but it’s still more efficient to resolve these points up front.

These cases were pretty rare, but we now collectively have enough users and hardware in the field to find, statistically, the unusual cases where these conditions could align and produce these effects. It’s been very helpful to find them with input from our TREG users, and to clean them up ahead of general release, when many more users will come online.

Each of these issues was addressed in recent firmware builds. Since the release of our latest firmware, build 7864, we have not observed any of these cases of the LaLa state. Based on what we see, we’re hopeful that we’ve found the core causes. If the reports from the field continue to be clear, that’ll be a good validation that we’ve fully diagnosed and corrected this issue.

FYI, to help scrutinize this area, we also recently turned up the sensitivity intentionally, to report even very mild variations as anomalies. This hair-trigger setting artificially exaggerates the frequency of the effect, which is useful to speed up testing. As the LaLa state debugging is settling down now, we’ll return the thresholds to the normal settings so our users won’t see the extra reports.
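
The effect of a hair-trigger setting can be shown with a toy example. The threshold value, the sensitivity multiplier, and the sample readings below are made up for illustration and do not reflect the real settings.

```python
NORMAL_THRESHOLD = 10.0   # hypothetical deviation limit, arbitrary units
DEBUG_SENSITIVITY = 4.0   # hypothetical multiplier for the hair-trigger mode


def is_anomaly(deviation: float, sensitivity: float = 1.0) -> bool:
    """Report an anomaly when the deviation exceeds the scaled threshold."""
    # Higher sensitivity shrinks the threshold, so milder variations count.
    return abs(deviation) > NORMAL_THRESHOLD / sensitivity


def count_reports(deviations, sensitivity: float = 1.0) -> int:
    return sum(is_anomaly(d, sensitivity) for d in deviations)


# Usage: the same readings produce more reports at the debug sensitivity.
samples = [1.0, 3.0, -4.0, 6.0, 12.0]
normal_reports = count_reports(samples)
debug_reports = count_reports(samples, DEBUG_SENSITIVITY)
```

With these made-up readings, only one exceeds the normal threshold, while four exceed the tightened one. That is the sense in which the setting exaggerates how often the effect is reported.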

Bluetooth

In parallel with the LaLa state debugging, there’s another distinct branch of work that relates primarily to Bluetooth, and we’ve put similar forensic work and a great deal of engineering effort into it as well. We’ve made good progress in reducing intermittent Bluetooth link interruptions on platforms like macOS and Windows. We’ll be releasing a new firmware update and new logging tools to users to further advance Bluetooth performance. We’ll make a separate post at the end of next week to drill into some of the Bluetooth topics.

Hope these technical details are of interest, and that they provide some helpful insight into the focus of our work.

Thank you, and Happy 4th of July to all.