Insouciant

William Chan's blag

Sonic DSL & Bufferbloat

[TL;DR: Sonic’s DSL modem (Pace 5268AC) suffers from terrible bufferbloat that renders the DSL service unusable whenever devices upload substantial data (e.g. whenever I come home and my phone syncs photos to the cloud). Otherwise, Sonic provides solid service. I’d consider going back for their fiber service if/when that becomes available in my neighborhood.]

I moved within SF last year and was excited to finally ditch Comcast in favor of…well…any other ISP. I had heard great things about Sonic, especially regarding user privacy, so I figured I’d give them a try. Sadly, despite their many good qualities, I’ve ultimately decided to switch to Monkeybrains. I’ve scheduled an appointment with them in a few weeks, and will see how Sonic’s cancellation process works shortly thereafter.

First, the good things about Sonic. Their installation was wonderfully uneventful. I ordered Sonic Fusion DSL before I moved in, and they showed up a week later and set everything up. No hassle whatsoever. I was without internet for a day or so, since I had timed the installation to happen after I moved in, and after that it just worked.

Second, their customer service has been pretty good. When I first ran into slow internet, I emailed them late at night, and they responded first thing in the morning with the correct diagnosis (WiFi interference). They seem to have good monitoring infrastructure in place to help diagnose network issues for customers using their provided DSL modem/router combination.

Later, I contacted them about some issues with streaming media, and it took them 5 days to respond. That was disappointing, and by that point, they weren’t able to diagnose it. That said, the problem never recurred, so I didn’t mind so much. It could very well have been a problem on my end for all I know.

I continued to use Sonic for the past half year or so, and many times a week the internet connectivity would just suck. Browsing was super slow, and many HTTP requests would just time out. It would pass after a while, so I never got around to diagnosing it. But one time it was absolutely terrible, and I had friends over who couldn’t use my WiFi, so I figured I’d track it down. I contacted Sonic again, and again it took them like 4 days to get back to me. In the meantime, I kept sending Sonic ping output to demonstrate the high latency I was seeing. When they finally got back to me, they pointed out that the periods of high latency appeared to correlate with high uplink bandwidth utilization. Props to them for identifying that. Maybe I have low standards if that impresses me, but Comcast was never helpful with this stuff.

Anyhow, from there, it was pretty easy to debug everything. I made a screen recording of the bufferbloat in action and will break it down here. The initial setup is pretty simple. I use apenwarr’s Blip tool to visualize latency. Basically, it pings a fast site (Google / gstatic.com) and a slower site (apenwarr.ca, presumably his personal website) with XHRs. Simultaneously, I have ping commands (every 1s) running to www.google.com and to the DSL modem/router’s LAN address. With this monitoring going, I use Google’s speedtest (run by Measurement Lab) to test downloads and uploads. See this setup below:

In short, when I run the test, latency is fine during downloads but goes to shit during uploads. During downloads, latency looks stable both in Blip and in the command-line pings:

Uploads, on the other hand, immediately look terrible. The pings to www.google.com start timing out, even though the pings to the router complete fine (they never cross the congested uplink). Since Blip sends XHRs at 100ms intervals until they time out, you can see the latency of the first few XHR pings jump until it maxes out at 1000ms, the point at which Blip times out the XHRs.
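To turn ping output like this into numbers (say, to hand an ISP something more convincing than raw terminal scrollback), a tiny parser is enough. This is a minimal Python sketch, not part of the original setup: it assumes the usual macOS/Linux reply format (icmp_seq=19 ... time=1234.5 ms) plus the macOS "Request timeout for icmp_seq N" line, and the helper names parse_ping and summarize are made up for illustration.

```python
import re
from statistics import median

# Assumed line formats (macOS/Linux ping); other platforms may need different regexes.
REPLY_RE = re.compile(r"icmp_seq=(\d+).*?time=([\d.]+) ms")
TIMEOUT_RE = re.compile(r"Request timeout for icmp_seq (\d+)")


def parse_ping(lines):
    """Return ({seq: rtt_ms}, [seqs reported as timed out]) from raw ping lines."""
    replies = {}
    timeouts = []
    for line in lines:
        m = REPLY_RE.search(line)
        if m:
            replies[int(m.group(1))] = float(m.group(2))
            continue
        m = TIMEOUT_RE.search(line)
        if m:
            timeouts.append(int(m.group(1)))
    return replies, timeouts


def summarize(replies):
    """Return (median RTT, max RTT) in ms, or None if there were no replies."""
    if not replies:
        return None
    rtts = list(replies.values())
    return median(rtts), max(rtts)


if __name__ == "__main__":
    # Synthetic sample (documentation address 192.0.2.1), not real measurements.
    # Note seq 1 is reported as timed out, then its reply shows up late anyway.
    sample = [
        "64 bytes from 192.0.2.1: icmp_seq=0 ttl=55 time=12.1 ms",
        "Request timeout for icmp_seq 1",
        "64 bytes from 192.0.2.1: icmp_seq=1 ttl=55 time=1980.4 ms",
        "64 bytes from 192.0.2.1: icmp_seq=2 ttl=55 time=11.9 ms",
    ]
    replies, timeouts = parse_ping(sample)
    print(replies, timeouts, summarize(replies))
```

A sequence number that appears both in the timeout list and in the replies is exactly the "timed out but still completed" pattern: the packet wasn't lost, just stuck in a queue.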

For the entirety of the upload test, all the pings time out, as you can see below. I captured this screenshot right after the speedtest ended:

It’s interesting to see what happens shortly afterward, as the uplink interface gets a chance to drain the buffer filled by the speedtest’s upload portion. Blip’s XHR latencies immediately start recovering, and all the pings to www.google.com that had timed out (icmp_seq 19-31) actually still complete. They weren’t dropped by network queues; they were just excessively buffered. The primary component of their network delay was not the propagation delay across the network hops to the nearest Google server, but the queueing delay from excessive buffering at the DSL modem interface.
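The arithmetic behind that is simple: a packet joining the back of a FIFO has to wait for every byte ahead of it to be serialized onto the uplink, so queueing delay is roughly buffered bytes × 8 / uplink rate. Neither Sonic’s actual upload rate nor the Pace modem’s buffer size is stated here, so the sketch below plugs in hypothetical numbers (a 1 Mbit/s uplink, 200 full-size packets queued).

```python
def queueing_delay_s(buffered_bytes: int, uplink_bps: float) -> float:
    """Seconds a newly enqueued packet waits behind `buffered_bytes` on the uplink."""
    return buffered_bytes * 8 / uplink_bps


def bytes_for_delay(delay_s: float, uplink_bps: float) -> float:
    """Buffer occupancy (bytes) that produces `delay_s` of queueing delay."""
    return delay_s * uplink_bps / 8


if __name__ == "__main__":
    # Hypothetical figures, not measured from the Pace modem.
    uplink = 1_000_000          # 1 Mbit/s DSL upstream
    queued = 200 * 1500         # 200 full-size (1500-byte) packets
    print(f"{queueing_delay_s(queued, uplink):.2f} s")    # 2.40 s
    # Conversely, 1 s of delay on that uplink takes only ~125 KB of buffer.
    print(f"{bytes_for_delay(1.0, uplink):.0f} bytes")    # 125000 bytes
```

On a slow uplink, even a modest amount of buffering blows past Blip’s 1000ms timeout, while propagation delay to a nearby Google server is typically tens of milliseconds at most.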

Nothing super surprising here. It’s pretty textbook bufferbloat. I guess I’m just frustrated because this is 2017, and I didn’t expect such a highly recommended ISP to be providing DSL modems with so much bufferbloat. I Googled [sonic bufferbloat] and found numerous hits. I wish I had seen these before signing up for Sonic’s DSL service. It’s extra unfortunate because it seems like Dane Jasper, their CEO, is aware of the problem but doesn’t think bufferbloat is to blame:

I think that buffer bloat is a red herring here, and suspect that QoS is the real challenge. This is something we are actively working with Pace on, the 4111N doesn’t currently support upstream ACK prioritization, and that can impair performance during times of upstream saturation with some applications.

It looks like Dave Taht from the Bufferbloat project chimed in to correct Dane, but still nothing has been done:

I do not think bufferbloat is a “red herring” here. The vast majority of modern dsl interfaces are dramatically overbuffered, and your 1+second results in line with rather large datasets. www.dslreports.com/speedtest/results/bufferbloat?up=1

Best results would come from using a modem with absolute minimal firmware buffering using bytes rather than packets,”BQL”, essentially, then combined with a latency sensitive aqm/fq system like fq_codel on top of that. Older DSL modems did this, with hardware flow control. The only DSL device I know of getting it right this way, today, is free.fr’s revolution V6 modems. Newer ones overbuffer and connect to switches that cannot do hardware flow control.

Second best results are using a good QoS system that also does fq and aqm underneath, running at slightly less than line rate - and many QoS systems do do that, nowadays. But I was reluctant to call it QoS because that implies that packet prioritization is useful when there are seconds of uncontrolled buffers underneath. We ended up inventing a new term - “smart queue management” that described things better. See openwrt’s sqm-scripts for details.

I am not a fan of ack prioritization - what’s an ack? In IPv6? In QUIC? In other protocols? - but of a combined fair queuing and aqm approach as per the above.
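To make the fair-queuing half of that concrete: under a single FIFO, one ping stuck behind a bulk upload waits for every upload packet ahead of it. The sketch below is a minimal deficit round robin (DRR) scheduler, the per-flow technique that fq_codel uses a modified version of, written in Python purely for illustration. It has no AQM (no CoDel), and the flow labels and packet sizes are made up. It schedules by an arbitrary flow key rather than by packet type, which is the point of the "what’s an ack?" objection: nothing needs to recognize ACKs.

```python
from collections import deque
from typing import Dict, Deque, Iterator, Tuple


class DRRScheduler:
    """Deficit round robin over per-flow FIFO queues (no AQM)."""

    def __init__(self, quantum: int = 1514):
        self.quantum = quantum                       # bytes of credit per flow per round
        self.queues: Dict[str, Deque[int]] = {}      # flow id -> packet sizes (bytes)
        self.deficit: Dict[str, int] = {}

    def enqueue(self, flow: str, size: int) -> None:
        if flow not in self.queues:
            self.queues[flow] = deque()
            self.deficit[flow] = 0
        self.queues[flow].append(size)

    def dequeue_all(self) -> Iterator[Tuple[str, int]]:
        """Drain all queues, yielding (flow, size) in transmission order.

        Don't enqueue new flows while this generator is running.
        """
        while any(self.queues.values()):
            for flow, q in self.queues.items():
                if not q:
                    continue
                self.deficit[flow] += self.quantum
                while q and q[0] <= self.deficit[flow]:
                    size = q.popleft()
                    self.deficit[flow] -= size
                    yield flow, size
                if not q:
                    self.deficit[flow] = 0  # an emptied flow doesn't bank credit


if __name__ == "__main__":
    sched = DRRScheduler()
    for _ in range(100):              # hypothetical bulk upload: 100 full-size packets
        sched.enqueue("upload", 1500)
    sched.enqueue("ping", 84)         # one small ICMP echo arriving afterwards
    order = [flow for flow, _ in sched.dequeue_all()]
    print(order.index("ping"))        # 1 (a plain FIFO would give 100)
```

The catch is that the scheduler has to own the bottleneck queue. If the modem keeps its own deep FIFO downstream, reordering packets upstream of it doesn’t help, which is why the quote pairs fq with minimal modem buffering or with shaping slightly below line rate.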

It was also doubly frustrating as a user to have Sonic’s customer service ask me what cloud software I use, as if the onus were on the user to make sure he/she doesn’t use too much upstream bandwidth. I mean, do they really expect me to rate limit the upload bandwidth utilization of all my devices, and of any guests’ devices, in order to have usable internet?

apenwarr told me I could fix this by putting a Linux box between my devices and the DSL modem to rate limit upstream bandwidth, so that the DSL modem’s upstream buffer never fills up. I mean, yes, I could totally do this, but I don’t think a normal user should have to. I don’t really want to maintain my own networking setup; I just want it to work out of the box. I really wish they’d get Pace to take Dave Taht’s advice and fix the DSL modems they provide to customers.
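The Linux-box fix works by moving the bottleneck: if something upstream of the modem releases traffic slightly slower than the DSL line can drain it, the modem’s buffer stays nearly empty. The toy simulation below (Python, 1 ms steps) models that with a token-bucket shaper. The rates, burst size, and function name are hypothetical, and it deliberately ignores TCP congestion-control feedback.

```python
from typing import Optional


def simulate(duration_s: float, offered_bps: float, line_bps: float,
             shaper_bps: Optional[float], burst_bytes: int = 3000,
             pkt_bytes: int = 1500, step_s: float = 0.001) -> float:
    """Return the worst modem-queue delay (seconds) seen during the run.

    A sender offers `offered_bps` of upload traffic. If `shaper_bps` is set,
    packets first pass a token-bucket shaper (the hypothetical Linux box);
    the modem then drains its own FIFO at `line_bps`.
    """
    shaper_q = 0.0                 # bytes waiting in the Linux box
    modem_q = 0.0                  # bytes waiting in the modem's uplink buffer
    tokens = float(burst_bytes)    # token bucket starts full
    worst_delay = 0.0
    for _ in range(int(duration_s / step_s)):
        shaper_q += offered_bps / 8 * step_s          # new upload bytes this step
        if shaper_bps is None:
            modem_q += shaper_q                       # no shaper: straight to the modem
            shaper_q = 0.0
        else:
            tokens = min(float(burst_bytes), tokens + shaper_bps / 8 * step_s)
            while shaper_q >= pkt_bytes and tokens >= pkt_bytes:
                shaper_q -= pkt_bytes                 # release one packet to the modem
                tokens -= pkt_bytes
                modem_q += pkt_bytes
        modem_q = max(0.0, modem_q - line_bps / 8 * step_s)  # modem drains at line rate
        worst_delay = max(worst_delay, modem_q * 8 / line_bps)
    return worst_delay


if __name__ == "__main__":
    # Hypothetical: 1 Mbit/s uplink, a cloud sync offering 2 Mbit/s for 10 s.
    print(f"unshaped:      {simulate(10, 2e6, 1e6, None):.2f} s")
    print(f"shaped at 90%: {simulate(10, 2e6, 1e6, 0.9e6):.3f} s")
    # On a real Linux box you'd use the kernel's traffic control instead, e.g.
    # tc qdisc replace dev eth0 root cake bandwidth 900kbit
    # (interface name and rate are placeholders).
```

The backlog doesn’t vanish; it moves into the Linux box’s queue, where a qdisc like fq_codel or cake can manage it, and where in real life TCP would notice the delay or drops and slow down.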

But since there’s no sign that it’ll be fixed in the near future, and I don’t know when they’ll expand their fiber service to my SF neighborhood, I’m sadly giving up on Sonic for now. I had been holding off on canceling my Sonic DSL service because I was hoping to switch to their fiber service whenever it expands to the Mission. But it’s pretty lame to have the internet go out to lunch whenever I come home and my devices sync photos/videos/files to the cloud, or whenever guests come over and their devices do the same thing. It’s simply unacceptable to have to ask people not to upload any large files so that other folks can use the internet while they’re at my place. And when I’m videoconferencing and some cloud sync process fires, I don’t want to hunt around to figure out which device or process is causing it.

Anyway, hopefully Monkeybrains is better. We’ll see!
