Traceroute Won't Help A Lot When Someone Plows Through Your Junction Box
It's still a pretty damn clever piece of software though
You never know how fragile our modern-day infrastructure is until a literal delivery truck takes down a junction box about 200 yards (or meters, if you're that way inclined) away from your home. That happened to me a few days ago. Needless to say, it’s not ideal for internet connectivity if there is a gap the size of a Volkswagen Crafter between one half of the connection cable and the other.
Now, I’m the last person to complain about being able to catch a glimpse into the inner workings of a junction box, but it’d be more enjoyable if I came back to a home with a functioning internet connection afterwards. I die a little more inside each time I have to press that light switch myself instead of making Alexa do it for me.
These days, when I’m sitting at home and the internet goes down, my immediate reaction is usually “damn it, what the hell did they do now?”. Not because I have kids at an age where they start trying to bypass screen and internet time limits and break stuff in the process, but because I am a customer of ████████, which is redacted not because I worry about shaming them, but because I am scared that their customer service will call when I say their name too often. Since signing my contract with them I have regularly had my internet taken down by issues like broken routers, broken cables and “successful” cyber attacks. Adding a van on top of that list is just the cherry on the cake.
As a side note, the reason I put “successful” in quotes is that, back when the Mirai botnet was released, presumably the whole IPv4 address space was scanned for vulnerable routers that had an issue in some remote management protocol.
This took a whole bunch of ████████ routers down, not because they were affected by the specific vulnerability the attackers tried to exploit, but because they were ████████-provided routers, and therefore not of the best software quality. Basically they were overwhelmed by the repeated connection attempts to a debugging port and instead of executing attacker-provided commands they just flatlined. Can't get your router hacked if it doesn't work anymore. 3D chess at its finest.
Anyway, after all of those… experiences you skip the “I wonder what happened to my internet” stage of grief whenever a new outage occurs and go straight to the “Please don’t make me call customer service to get this fixed” one. A younger, more hopeful version of myself, or a customer of another ISP would maybe wonder why the internet isn’t working and how it can be fixed.
There are several tools that you can use for that purpose. One of them is ping, which is a command line utility in both Windows and Unix-like operating systems, such as Linux or macOS. It’s not like it will help casual users like you and me to fix anything most of the time, but it’s somewhat of a tradition to run it I guess, so I’ll give you a quick rundown on what it does.
It will send messages using a special network protocol called ICMP, or Internet Control Message Protocol. The data that ping sends is a so-called echo message. Simply put, it will send some data to the destination (often, but not always, a web server) and wait for a reply, timing the whole round trip. If the web server is reachable, it will generally repeat the very same data back to you that you have already given. If it’s not, you will just never receive an answer and eventually ping will realise that nobody will listen to its calls. There is a ████████ customer service metaphor in there somewhere if you look very closely.
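If you want to see what such an echo message looks like on the wire, here is a minimal sketch in Python. It only builds and checks the bytes and does not send anything. The function names (internet_checksum, build_echo_request, is_matching_reply) are my own inventions for illustration, not part of ping or any library; the field layout and the type numbers (8 for the request, 0 for the reply) follow RFC 792.

```python
import struct

ICMP_ECHO_REQUEST = 8  # RFC 792: type 8 = echo request
ICMP_ECHO_REPLY = 0    # RFC 792: type 0 = echo reply


def internet_checksum(data: bytes) -> int:
    """16-bit one's complement checksum used by ICMP."""
    if len(data) % 2:
        data += b"\x00"  # pad to an even number of bytes
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # fold any carry bits back into the low 16 bits
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes) -> bytes:
    """Hypothetical helper: type, code, checksum, id, sequence, then the data."""
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, checksum, identifier, sequence)
    return header + payload


def is_matching_reply(request: bytes, reply: bytes) -> bool:
    """A reply matches if it is an echo reply that repeats id, sequence and data."""
    r_type, r_code, _, r_id, r_seq = struct.unpack("!BBHHH", reply[:8])
    _, _, _, q_id, q_seq = struct.unpack("!BBHHH", request[:8])
    return (
        r_type == ICMP_ECHO_REPLY
        and r_code == 0
        and (r_id, r_seq) == (q_id, q_seq)
        and reply[8:] == request[8:]
    )


request = build_echo_request(identifier=0x1234, sequence=1, payload=b"hello")
# What a well-behaved server would send back (checksum left at 0 for brevity)
fake_reply = struct.pack("!BBHHH", ICMP_ECHO_REPLY, 0, 0, 0x1234, 1) + b"hello"
print(is_matching_reply(request, fake_reply))
```

The real ping additionally takes a timestamp before sending and after receiving, which is where the response times come from.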
Anyway, the other traditional form of internet-free entertainment, commonly celebrated while the ISP cleaning personnel puts the plug of the cable they tripped over back in, is running the traceroute command. Again, for the average Joe, its cryptic output is completely useless for pretty much any purpose, but it’s a mandatory ritual. What it basically does is give you a list of network devices that are on the path between yourself and the target. Just like ping it does so using ICMP - at least on Windows (on Unix-like systems, traceroute typically sends UDP probes by default).
Let’s run it and see what it does.
There you go, a list of intermediate devices that are responsible for routing the message to its destination, including their IPs and response times. The question you’re probably asking yourself is: How did traceroute obtain that data? When sending an ICMP echo request to a server, you do not get a reply from each and every device along the way, but only from the final one you want to talk to. So where’s the difference and how does it work?
Well for one, the behaviour that ping uses is pretty much a feature of ICMP. It’s written in its very long and very boring RFC document. (If you have trouble falling asleep at night I can highly recommend it: https://datatracker.ietf.org/doc/html/rfc792). This is basically a kind of construction manual for the developers of network devices who want their products to be, well… functional. (The average ████████ technician has obviously stopped reading now).
However if you search for keywords like “trace” you will be disappointed to find not a single mention of it in that RFC. So what gives? Well traceroute was never meant to be a thing. At least not on purpose. It’s not a built-in tool for debugging that someone wrote into a specification. There is no “traceroute message” which tells each device along the way to reveal itself before passing on the message to the next one*. Instead, it’s an exploit, and a pretty clever one at that. Before you can understand how it works, I have to give you a quick rundown on what an ICMP packet looks like.
(Image by Michel Bakni - CC BY-SA 4.0)
The type field will contain data about what kind of message is being sent (for example an echo request). But you don't see any IP address in the packet, so how does it know where it needs to go? That’s because ICMP is encapsulated in an IPv4 packet (there is also ICMPv6 which uses IPv6 packets). One interesting field in the IP header is called TTL.
This is short for Time To Live, which is a bit of a misnomer (and also a good Bond movie title). As far as I remember, the time it refers to was initially meant to be in seconds; in practice, however, it limits the number of hops a packet may take to reach its destination. So if you send a packet from your laptop over the internet, your router would be the first hop, the next network device that handles that packet would be hop two and so on, until the destination server is reached.
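To make the encapsulation a bit more tangible, here is a rough sketch of where the TTL sits inside an IPv4 header. The helper name parse_ipv4_header and the sample addresses are made up for illustration; the layout (TTL as the ninth byte, followed by the protocol number, where 1 means ICMP) is the standard 20-byte IPv4 header without options.

```python
import socket
import struct


def parse_ipv4_header(packet: bytes) -> dict:
    """Hypothetical helper: pick a few fields out of the fixed 20-byte IPv4 header."""
    (version_ihl, _tos, _total_length, _ident, _flags_frag,
     ttl, protocol, _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    return {
        "version": version_ihl >> 4,
        "header_length": (version_ihl & 0x0F) * 4,  # IHL counts 32-bit words
        "ttl": ttl,
        "protocol": protocol,  # 1 = ICMP
        "source": socket.inet_ntoa(src),
        "destination": socket.inet_ntoa(dst),
    }


# A hand-built sample header (documentation addresses, checksum left at 0)
sample = struct.pack(
    "!BBHHHBBH4s4s",
    0x45, 0, 28, 0, 0,
    64, 1, 0,
    socket.inet_aton("192.0.2.1"),
    socket.inet_aton("198.51.100.7"),
)
print(parse_ipv4_header(sample))
```

So the ICMP message itself never carries an address or a TTL. Both live one layer down, in the IP header wrapped around it.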
On each of these hops the respective device needs to decrement the TTL value. If the result is 0, it must send back a timeout message (ICMP calls it “Time Exceeded”) to the original sender of the packet. Not to the previous intermediary, but to the machine that sent that packet in the first place. Here is a rough example:
You have a packet with a TTL of 5
Your router picks up the packet and decrements it by 1.
The packet now has a TTL of 4
It sends it over to the next network device which also decrements it by 1 and so on
Once the TTL becomes 0 the device that handles it must not send it any further
It must send an ICMP packet back to you, stating that the request timed out
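The steps above can be played through in a few lines of Python. This is a toy simulation and not real networking: the function send_packet and the addresses in the path are invented for the example, and I assume that only the routers decrement the TTL while the destination simply accepts the packet.

```python
from typing import List, Tuple


def send_packet(path: List[str], ttl: int) -> Tuple[str, str]:
    """Walk a packet along `path` (routers first, destination last).

    Returns ("time_exceeded", router_ip) if a router had to drop the packet,
    or ("delivered", destination_ip) if it made it all the way.
    """
    *routers, destination = path
    for router in routers:
        ttl -= 1  # every router decrements the TTL before forwarding
        if ttl <= 0:
            # The packet is dropped here; the Time Exceeded message goes
            # back to the original sender, from this router's own IP.
            return ("time_exceeded", router)
    return ("delivered", destination)


# Hypothetical path: home router, two ISP routers, then the server
path = ["192.168.1.1", "10.0.0.1", "10.0.0.2", "203.0.113.5"]

print(send_packet(path, ttl=5))  # enough TTL to get through all three routers
print(send_packet(path, ttl=2))  # runs out at the second router
```

The interesting part is the return value in the failure case: you learn the address of the device that gave up, which is exactly the bit of information that matters later on.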
So here’s the clever bit: traceroute abuses this behaviour to reveal any devices along the way - at least if they actually send back ICMP messages on timeouts. Here is a simplified version of what it does:
Send a packet that has a TTL of 1 in its IP header
The next network device, often your home router, will decrement the TTL by 1
Since it’s now 0, the device will not route it any further
Additionally it will send a timeout message back to your machine
So you now know the IP of the next network device in line and how long it took to get a response.
Next traceroute will send another packet, this time with a TTL of 2.
Your router will decrement it by 1 (meaning the TTL is now 1) and send it over to the next device
Once that device decrements it too, the TTL becomes 0 again, so it will send a timeout response back to us - again from its own IP.
That means by starting at TTL 1 and steadily increasing the TTL value, we can map out which route a packet would take until it reaches its destination.
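Boiled down to code, the whole trick is one loop. The sketch below is not the actual traceroute source; trace_route, make_fake_probe and the sample path are names I made up. The probe is passed in as a function so the loop can be tried against a pretend network without needing raw sockets or root privileges.

```python
from typing import Callable, List, Optional, Tuple

# A probe takes a TTL and returns (responder_ip, reached_destination),
# or None if nobody answered before the timeout.
Probe = Callable[[int], Optional[Tuple[str, bool]]]


def trace_route(probe: Probe, max_hops: int = 30) -> List[Optional[str]]:
    """Send probes with TTL 1, 2, 3, ... and collect whoever answers."""
    hops: List[Optional[str]] = []
    for ttl in range(1, max_hops + 1):
        answer = probe(ttl)
        if answer is None:
            hops.append(None)  # silent hop, real tools print "*" here
            continue
        responder, reached_destination = answer
        hops.append(responder)
        if reached_destination:
            break  # the target itself answered, we are done
    return hops


def make_fake_probe(path: List[str], silent: Tuple[str, ...] = ()) -> Probe:
    """Pretend network: routers first, destination last in `path`."""
    def probe(ttl: int) -> Optional[Tuple[str, bool]]:
        if ttl >= len(path):
            return (path[-1], True)  # TTL was large enough to arrive
        hop = path[ttl - 1]          # the router where the TTL hits 0
        if hop in silent:
            return None              # some devices never send the timeout
        return (hop, False)
    return probe


path = ["192.168.1.1", "10.0.0.1", "10.0.0.2", "203.0.113.5"]
print(trace_route(make_fake_probe(path)))
print(trace_route(make_fake_probe(path, silent=("10.0.0.1",))))
```

A real probe would set the TTL on an outgoing socket (in Python, for example, via setsockopt with socket.IP_TTL) and then wait for the ICMP answer, which usually needs elevated privileges. The loop around it stays the same.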
Again, this was never meant to be a feature. It was just a clever way to use existing behaviour of a system in order to get information you were not normally able to see. Originally nobody intended for you to get all of that information and if research about it was released today, it would be absolutely wild and break a lot of assumptions!
But since it has been around for so long, and was genuinely welcome at a time when it was easy enough to get hold of someone responsible for a malfunctioning network device, administrators got used to it and accepted it as a useful debugging tool. You can also read a long rant on gekk.info [1] on why it’s not so useful anymore.
However to me, the way in which it was implemented is what’s so fascinating about it. The author saw an intentional behaviour and found a way to (ab)use it in order to get some data that was not meant to be available.
We humans put cleverly designed systems in place all the time and envision how a user will interact with them. And we also try to think about every possible way the system will be abused to perform unintended actions. But quite often it only takes one particularly creative individual to make it do what it’s not supposed to do and reveal more information than intended. It’s pretty much what hacking is about. (That and wearing ski masks inside).
And even though traceroute is not 100% reliable for LOTS of reasons, it’s still nice that it exists and it shows that understanding a system in detail often pays off and can lead to cool results.
So next time a Volkswagen van takes out your junction box and you feel the urge to perform the ancient traceroute ritual, remember how much ingenuity went into its creation.
* Technically, there was an RFC (RFC 1393) describing a proper traceroute protocol after the original traceroute was invented. It did not take off, however, and its ICMP type (30) was eventually deprecated.
[1] https://gekk.info/articles/traceroute.htm