There are at least three areas where I think bufferbloat intersects the currently hot topic of network neutrality. This is entirely personal opinion, and not that of my employer.

- by happenstance, broadband ISP’s are enjoying a serious competitive advantage over any other provider of telephony and gaming services. I believe it unlikely this advantage was put in place with malice aforethought, though I expect conspiracy theorists will enjoy trying to prove otherwise. I think we’d have heard of it by now; either that, or there are a very few, very smart people out there who have managed to figure bufferbloat out and keep their mouths firmly shut. But if they were that smart, why did they not foresee the pain of the next bullet?

- the impact of bufferbloat on ISP’s needs to be well understood in order to understand their motivations. I now believe bittorrent hit ISP’s and their customers much harder than everyone understands, but that the ISP’s diagnosis of the root cause was flawed.

- we need to preserve future innovation for new applications and services.

We should not set public policy going forward on the basis of a possibly flawed understanding of technical problems; we must first understand what may actually have happened.

Unfortunately, everyone has now taken very public positions based on a probably flawed analysis of the very real, painful problems they have experienced. Getting everyone to stop, revisit their presumptions, and rethink and possibly change their positions is going to be hard. Please help!

Sherman, please set the Wayback machine to when bittorrent first deployed (2004).



Telephony and Bufferbloat

If you get your conventional telephone service from your broadband carrier, it is probably provisioned independently of your data service. This is certainly typically true for DSL (it has been one of DSL’s easy upgrade-path features), and I believe it is typically true of telephone services provided by cable providers. I don’t personally have any clue how fiber services typically provision telephony. Perhaps some of you know the answers.

You can think of these systems as giving telephony access to a separately provisioned “class of service” over a different channel on the last mile (it may be implemented as IP QOS classification internally, though I gather they may also use different signalling channels in the broadband access itself). I personally don’t think that some traffic classification is entirely bad. To get really reliable telephony under high load, some sort of traffic classification may be needed (at times), and QOS also enables strong guarantees that are hard to provide otherwise. But at current common broadband speeds there is enough bandwidth that, even without traffic classification, we would be able to get a lot of calls to work very well even with many competing flows, if we did not suffer from bufferbloat (but see my previous and future posts regarding recent changes to web browser behavior).

The problem is that (I believe unintentionally by all concerned) bufferbloat in broadband services has put independently provided VOIP or Skype telephony, carried over the IP data service, at a serious disadvantage: the QOS classification is not available to the user’s devices, which have to fight against the high latency and jitter imposed by the stupid, bufferbloated broadband data devices that provide no traffic classification to customer devices. Mitigating or solving bufferbloat makes alternative telephony services much more viable and more competitive. Whether this is good or bad depends on where you sit. Certainly mitigating bufferbloat in your home router can and does make such services work very much better (I’m much less unhappy using Skype than I used to be), and that is the subject of tomorrow’s installment.

Bittorrent and Bufferbloat

I’ve heard claims (made by people independent of Comcast) that blocking bittorrent completely was unintentional on their part. I do know first hand that when the controversy hit, I personally tried testing bittorrent at home and found I could not make it function at all, and that Comcast was the responsible party, acting without disclosure. I strongly believe that new applications should be able to deploy without playing “Mommy May I” with all the different ISP’s, which stifles innovation.

Please also remember in this discussion that bittorrent can induce several problems of its own (e.g., transit and traffic-stability problems when bittorrent’s guess at locality is poor), and bittorrent’s issues are certainly not limited to the customer and operator suffering caused directly by bufferbloat. For an ISP there are multiple bittorrent pain points, most of them not visible to home users.

Start by remembering that any protocol can trigger bufferbloat suffering if it saturates a link; what I demonstrated was that a single TCP connection can and does do so. Personally, I started using bittorrent to download Linux distributions and similar large images somewhat later than 2004, and have educated my kids carefully about copyright. In my household it has clearly been Dad who usually did in the wife and kids, rather than vice versa; this may be common among many other readers of this blog. In most households, however, it has likely been the reverse, with the kids inflicting pain on their parents.

Video uploading to YouTube was in the future; video downloads were mostly in the future, certainly the large HD content streamed to disk that is most likely to saturate a customer’s links. Uploading of dead application carcasses for crash analysis was less common. Uplink bandwidth was so low that using cloud storage for backup was infeasible for most. So many of today’s applications that trigger bufferbloat misbehavior were significantly less common. The dominant desktop operating system was overwhelmingly Windows XP (and older), with > 90% market share. Windows XP and earlier never had more than 64KB of data in flight at once, and browsers of that era still primarily obeyed the RFC 2068 rule about the number of active connections (no more than 2), and so were unlikely to fill buffers (though they might cause significant latency, a single user would not routinely saturate connections given these limits). When bittorrent deployed, it was the first time many uplinks were routinely filled for long periods. Any other application or web service with similar characteristics (e.g. YouTube uploads) could have triggered the problem.
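A quick back-of-the-envelope sketch (my own illustration, not from the original experiments) shows why a single XP-era TCP flow could hurt but not routinely keep a buffer full: a flow can never queue more than one window’s worth of data, so its worst-case contribution to queuing delay is bounded by window size divided by link rate.

```python
# Upper bound on the queuing delay a single TCP flow can cause: at most one
# window of data can be in flight (and hence sitting in a buffer) at once.

WINDOW_BYTES = 64 * 1024  # Windows XP's maximum in-flight data, per the text

def max_queue_delay(window_bytes: float, link_bps: float) -> float:
    """Worst-case queuing delay in seconds if the whole window sits in the buffer."""
    return window_bytes * 8 / link_bps

# A 2004-era 384 kbps uplink versus a (hypothetical) modern 10 Mbps uplink:
for label, bps in [("384 kbps uplink", 384_000), ("10 Mbps uplink", 10_000_000)]:
    print(f"{label}: {max_queue_delay(WINDOW_BYTES, bps):.2f} s max delay per flow")
```

On a 384 kbps uplink one window is already well over a second of delay (significant latency, as the text says), but two browser connections still cannot keep a multi-second buffer permanently full the way dozens of bittorrent flows can.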

By the 2004-2005 era, bufferbloat was already well established in broadband networks.

The Motorola SB5100 series modems I experimented with were introduced in April 2003, for example; they were among the standard cable modems provided by Comcast (until prices went up recently, I rented my modem). Bufferbloat had already been noticed by some, though not recognized as a generic problem. DSL has similar trouble and a similar history; I don’t know the fiber history. Both cable and DSL broadband services are very asymmetric; the uplink bandwidth in that era was very low (but the buffer sizes were the same as I observed). Uplink speeds of 384 or 768Kbps were commonplace; IIRC, as a computer person I had paid for 768Kbps uplink service in that era (and was happy when, at the same cost, it was increased greatly a few years later). Many or most customers only paid for 384Kbps uplinks. The buffer sizes I see, as the Netalyzr data shows, are not unusual. And I’m really not picking on Motorola here; they just happen to be the vendor of the modems I have used. I have no reason to believe their bufferbloat is any larger or smaller than their competitors’; I have no information at all there, and what tiny anecdotal information I have suggests there are likely far worse vendors. Netalyzr shows many different buffer sizes are present.

What happened to customers when their kids (or they) started using bittorrent for whatever purpose?

Bad things™.

Let’s examine my data and make the direct extrapolation from my experiments. Where I saw 1 to 1.5 seconds of latency on that hardware, I would have seen 3-4.5 seconds of latency in 2004 on my 768Kbps uplink: the buffer in the modem was the same size, but my uplink bandwidth was 1/3 of what I had when I took my ConPing2 dataset recently on the SB5101 modem. Customers who only bought 384Kbps uplinks would have been completely dying, with latencies in the 10 second range.
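The extrapolation above can be sketched directly: the drain time of a fixed-size buffer scales inversely with the link rate. The modern uplink rate used here (3× the 768Kbps of 2004) is my assumption to match the “1/3 of the bandwidth” claim; the 1.5 s figure is the upper end of the observed latency.

```python
# Drain time of a fixed-size buffer scales inversely with link rate.

def drain_time(buffer_bytes: float, link_bps: float) -> float:
    """Seconds needed to drain a full buffer at the given link rate."""
    return buffer_bytes * 8 / link_bps

MODERN_BPS = 3 * 768_000        # assumed modern uplink: 3x the 2004 rate
OBSERVED_LATENCY = 1.5          # seconds of latency observed when saturated

# Implied buffer size from the modern measurement:
buffer_bytes = OBSERVED_LATENCY * MODERN_BPS / 8

# Same buffer behind the slower 2004-era uplinks:
for label, bps in [("768 kbps (2004)", 768_000), ("384 kbps (2004)", 384_000)]:
    print(f"{label}: {drain_time(buffer_bytes, bps):.1f} s latency")
```

With the buffer held constant, 1.5 s at the modern rate becomes 4.5 s at 768Kbps and 9 s at 384Kbps, matching the ranges quoted in the text.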

Here’s serious speculation: I know the problems I saw over a year ago were enough to cause me to log multiple service calls, and I attempted to debug them with second and third level support. I now think it likely, though not certain, that those problems were bufferbloat in some guise. Unfortunately, that equipment is now scrap due to lightning, so I can’t attempt to reproduce what I saw then, now that I have a better understanding. Alternatively, it could in fact have been the cause I diagnosed at the time: hidden damage from lightning in a NIC in either my home router or the cable modem. In my recent experiments I’ve not seen the hideously high loss rates I sometimes observed then (though I’ve also had a few reports of others reproducing my bufferbloat result of extreme loss rates; nothing reproducible so far). But I’ve not run enough experiments looking for bufferbloat packet loss to rule out much worse behavior than you see in my traces. I don’t know if I’ll ever know for sure.

I do know with 100% certainty I’d have been on the phone with support incessantly with 3-10 second latencies and multi-percent observable packet loss.

I believe many ISP’s with limited uplink bandwidth and bufferbloated infrastructure started to see a serious rise where it hurts them most in the pocketbook: in service calls from customers (in addition to the other bittorrent issues, which I’m not trying to minimize). No company wants to admit to severe problems in its product in public. But I also believe they misdiagnosed the root cause of their pain, and shot the messenger of the broken network (bittorrent) rather than fixing the network. So part of ISP’s motivations (far from all; I can’t see the CEO of a shareholder-beholden corporation ignoring the opportunity to extract rent from everyone), I hypothesize, came from the very real pain they felt when a major new application deployed and caused major headaches to both them and their customers.

At a later date, broadband ISP’s upped their uplink bandwidth; this brought the effects of bufferbloat back to a semi-manageable situation (by reducing service calls). But bufferbloat is getting worse again: the same phenomenon that has pushed ethernet and wireless toward larger buffers is at work. The static buffers in the latest equipment appear to be sized for the absolute highest delay/bandwidth product the devices could ever possibly need (and then some), sized to paper over whatever performance bugs they may have. As you will see when I double back to cover why ethernet and wireless NIC buffers have been growing, buffers are often being used to cover a number of sins.

If my bittorrent/bufferbloat hypothesis is correct, it helps explain ISP’s wish to control applications; they may see loss of control as an existential problem. But I believe the underlying, undiagnosed bufferbloat problem made the situation much worse in a way most of us have not appreciated. There are indeed valid times when traffic management may be needed to protect the network, and may even need to be imposed quickly; in the modern era applications can deploy much faster than in the past, and I can certainly see that emergency action may be needed (but not in secret). Whatever you may think of network neutrality, I believe you need to understand that the very real pain that ISP’s and some of you endured had a different cause than you may have understood.

Chilling of Innovation

Beyond the concrete harm of bufferbloat to the competitive market illustrated by the telephony example, let me note the following: so long as there is bufferbloat in broadband (and home routers), and only carriers have mechanisms to separately provision different classes of service (as happens with telephony on broadband today), without users having any access to that quality-of-service provisioning, any deployment of new innovative low-latency applications (such as the immersive teleconferencing I work on for Bell Labs) will be greatly slowed. In systems such as ours, carriers have a somewhat advantageous position in any case: they are closer to most users than the rest of the network, and it makes sense for them to host much of the infrastructure that can optimize what we are doing; their advantage is locality and the speed of light!

If bufferbloat is not solved in the Internet, not only are current low-latency applications such as telephony and gaming problematic and fraught with fair-competition issues, but so are future applications. If deployment of a system like ours requires separate infrastructure and provisioning to work well, I fear what we are inventing can never succeed. Separate paths and provisioning are less efficient and more expensive, prone to abusive rents, and deployment of new applications may languish for years or decades. If we can only make immersive teleconferencing work with separate provisioning of the kind done for telephony, I feel our project is doomed. If broadband worked properly today, we would face none but the usual problems in taking our innovations to market.

ISP’s would have incentive to invest, as they would have both additional service opportunities and ways to reduce load on the network. I see arguments that such separate provisioning of low-latency services such as telephony is a “good thing” as fundamentally flawed. I want a single pipe that works well, and where I (the consumer) can decide how much to pay for what quality of service. I have no problem at all with congestion pricing: if I want to do my immersive teleconference at peak hours and object to flaws in the service, I am happy to pay for the privilege. I am happy to pay for additional “added value” services (when priced fairly and competitively).

We must make the Internet work well to preserve innovation; to do that, bufferbloat must be overcome.

Conclusions

I may be incorrect about details in the above points, but I think I’m right. So my personal conclusions are:

We should not set public policy going forward on the basis of a possibly flawed understanding of technical problems; we must first understand what may actually have happened.

Unfortunately, everyone has now taken very public positions based on a probably flawed analysis of the very real, painful problems they have experienced. Getting everyone to stop, revisit their presumptions, and rethink and possibly change their positions is going to be hard. Please help!
