Over the years your PCs, laptops, and smartphones have gotten cheaper while simultaneously getting more memory (RAM, flash, etc.). It's been a great thing. But the same has been happening to the network infrastructure: everything from your home WiFi router to the big iron routers interconnecting huge networks has also benefited from falling memory prices, resulting in more and more memory on board.
As it turns out, this memory increase in network routing equipment might not be a good thing. Network routers have always had some sort of memory for buffering network traffic, but the answer for smoothing out traffic flows and congestion has always been in the computers on the sending and receiving ends. The protocols have built-in mechanisms for closing the valves when the pipes are overflowing, so to speak. When a computer sends out data, at some point it knows to shut up and send no more until it has heard back from the other side. But with network routers buffering more and more of that data, the computer and the router end up in a game of waiting on each other to act.
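That "close the valves" behavior is the essence of window-based flow control. Here's a toy sketch of the idea (not real TCP; the class name and window size are made up for illustration): a sender may have only so much unacknowledged data in flight, and once that window is full it must stop and wait for the other side.

```python
# Toy sketch of window-based flow control (not a real TCP implementation).
# WINDOW is an arbitrary illustrative number, not a real protocol constant.
WINDOW = 4  # max unacknowledged packets allowed in flight


class ToySender:
    def __init__(self, window=WINDOW):
        self.window = window
        self.in_flight = 0  # packets sent but not yet acknowledged

    def can_send(self):
        return self.in_flight < self.window

    def send_packet(self):
        if not self.can_send():
            raise RuntimeError("window full: must wait for an ACK")
        self.in_flight += 1

    def receive_ack(self):
        self.in_flight = max(0, self.in_flight - 1)


s = ToySender()
for _ in range(WINDOW):
    s.send_packet()
print(s.can_send())   # False: the valve is closed until an ACK arrives
s.receive_ack()
print(s.can_send())   # True: an ACK reopens the valve
```

The trouble described below is that a big router buffer delays those ACKs (and the congestion signals) without the sender ever finding out why.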
Jim Gettys, the guy credited with finally articulating the problem, has a video demonstrating it: he can actually get better network performance by turning down buffer sizes.
As he states in a blog post on the topic:
The buffers are confusing TCP’s RTT estimator; the delay caused by the buffers is many times the actual RTT on the path. Remember, TCP is a servo system, which is constantly trying to “fill” the pipe. So by not signalling congestion in a timely fashion, there is *no possible way* that TCP’s algorithms can possibly determine the correct bandwidth it can send data at (it needs to compute the delay/bandwidth product, and the delay becomes hideously large). TCP increasingly sends data a bit faster (the usual slow start rules apply), reestimates the RTT from that, and sends data faster. Of course, this means that even in slow start, TCP ends up trying to run too fast. Therefore the buffers fill (and the latency rises).

It has been a particularly devilish problem to diagnose because isolating the variables, something any good scientist would do, actually exacerbates the problem. The more you try to remove interference and noise and other things that are hard to account for, the worse the problem gets. Again, Jim Gettys:
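A back-of-envelope calculation shows why the inflated delay is so damaging. The bandwidth-delay product (bandwidth × RTT) is roughly how much data the sender keeps in flight; the numbers below are hypothetical, just to show the scale of the error:

```python
# Bandwidth-delay product sketch with made-up but plausible numbers.
link_mbps = 20.0      # hypothetical link bandwidth, megabits/second
true_rtt_s = 0.010    # 10 ms: the actual round-trip time on the path
bloated_rtt_s = 1.0   # 1 s: the RTT TCP measures once a fat buffer fills


def bdp_bytes(mbps, rtt_s):
    """Bandwidth-delay product: bytes the sender will keep in flight."""
    return mbps * 1e6 / 8 * rtt_s


print(bdp_bytes(link_mbps, true_rtt_s))     # 25000.0   -> ~25 KB is plenty
print(bdp_bytes(link_mbps, bloated_rtt_s))  # 2500000.0 -> ~2.5 MB queued up
```

Same link, hundredfold difference: because the buffer hides the congestion signal, TCP sizes its window for a pipe a hundred times longer than the real one, and all that extra data just sits in the router's queue adding latency.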
Ironically, I have realized that you don’t see the full glory of TCP RTT confusion caused by buffering if you have a bad connection as it reset TCP’s timers and RTT estimation; packet loss is always considered possible congestion. This is a situation where the “cleaner” the network is, the more trouble you’ll get from bufferbloat. The cleaner the network, the worse it will behave. And I’d done so much work to make my cable as clean as possible…

And it's not just your home router that has the problem. The problem is everywhere, even in the big iron in your ISP's data center and the even bigger iron used to connect your ISP to other ISPs. Here's a video where researchers isolate the problem and show that backing off the buffer size actually makes things better.
So there. It's not you. You are not crazy. Things are not as they should be. But don't worry, your friendly neighborhood Internet gurus are working on the problem.