---
created_at: '2014-09-22T23:19:54.000Z'
title: Larry and the “Ping of Death” (2007)
url: http://blogs.msdn.com/b/larryosterman/archive/2007/10/16/larry-and-the-ping-of-death.aspx
author: yuhong
points: 59
story_text: ''
comment_text:
num_comments: 11
story_id:
story_title:
story_url:
parent_id:
created_at_i: 1411427994
_tags:
- story
- author_yuhong
- story_8353232
objectID: '8353232'
year: 2007
---
Also known as "Larry mounts a DDoS attack against every single machine
running Windows NT"

Or: No stupid mistake goes unremembered.

I was recently in the office of a very senior person at Microsoft
debugging a problem on his machine.  He introduced himself, and
commented "We've never met, but I've heard of you.  Something about a
ping of death?"

Oh. My. Word.  People still remember the "ping of death"?  Wow.  I
thought I was long past the ping of death (after all, it's been 15
years), but apparently not.  I'm not surprised when people who were
involved in the PoD incident remember it (it was pretty spectacular),
but to have a very senior person who wasn't even working at the company
at the time remember it is not a good thing :).

So, for the record, here's the story of Larry and the Ping of Death.

First I need to describe my development environment at the time
(actually, it's pretty much the same as my dev environment today).  My
primary development machine ran a version of NT, along with a kernel
debugger connected to my test machine over a serial cable.  When my test
machine crashed, I would use the kernel debugger on my dev machine to
debug it.  There was nothing debugging my dev machine, because NT was
pretty darned reliable at that point and I didn't need a kernel debugger
99% of the time.  In addition, the corporate network wasn't a switched
network, so each machine received datagram traffic from every other
machine on the network.

Back in that day, I was working on the NT 3.1 browser (I've written
about the browser
[here](http://blogs.msdn.com/larryosterman/archive/2005/01/11/350800.aspx) and
[here](http://blogs.msdn.com/larryosterman/archive/2005/01/12/351634.aspx) before). 
As I was working on some diagnostic tools for the browser, I wrote a
[tool](http://support.microsoft.com/kb/188305) to manually generate some
of the packets used by the browser service.

One day, as I was adding some functionality to the tool, my dev machine
crashed, and my test machine locked up.

\*CRUD\*.  I can't debug the problem to see what happened because I lost
my kernel debugger.  OK, I'll reboot my machines, and hopefully whatever
happened will hit again.

The failure didn't hit, so I went back to working on the tool.

And once again, my machine crashed.

At this point, everyone in the offices around me started to get noisy -
there was a great deal of cursing going on.  What I'd not realized was
that every machine had crashed at the same time as my dev machine had
crashed.  And I do mean EVERY machine.  Every single machine in the
corporation running Windows NT had crashed.  Twice (after allowing just
enough time between crashes to allow people to start getting back to
work).

I quickly realized that my test application was the cause of the crash,
and I isolated my machines from the network and started digging in.  I
quickly root-caused the problem: the broadcast that was sent by my test
application was malformed, and it exposed a bug in the bowser.sys
driver.  When the bowser received this packet, it crashed.

I quickly fixed the problem on my machine and added the change to the
checkin queue so that it would be in the next day's build.
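
The post doesn't say what was malformed in the packet or what the
bowser.sys bug was, so the following is only a generic sketch of the
defensive habit this kind of crash argues for: a receiver should check
every length field against the bytes that actually arrived before
trusting it.  The header layout, `parse_datagram`, and `MalformedPacket`
are hypothetical names made up for illustration; they are not the
browser protocol's real format.

```python
import struct


class MalformedPacket(ValueError):
    """Raised when a datagram fails validation (hypothetical name)."""


# Hypothetical wire format, invented for this sketch:
# 1-byte opcode followed by a 2-byte little-endian payload length.
HEADER = struct.Struct("<BH")


def parse_datagram(data: bytes) -> tuple[int, bytes]:
    """Return (opcode, payload), rejecting anything inconsistent."""
    # Never read a header that is not fully present.
    if len(data) < HEADER.size:
        raise MalformedPacket("datagram shorter than header")

    opcode, declared_length = HEADER.unpack_from(data)
    payload = data[HEADER.size:]

    # Never trust the sender's length field; compare it with reality.
    if declared_length != len(payload):
        raise MalformedPacket(
            f"declared {declared_length} bytes, received {len(payload)}"
        )
    return opcode, payload
```

A handler built this way would drop a bad broadcast with an error rather
than act on a length the sender made up, which matters most on a network
where every machine receives every datagram.
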
 
I then walked around the entire building and personally apologized to
every single person on the NT team for causing them to lose hours of
work.  And 15 years later, I'm still apologizing for that one moment of
utter stupidity.