=============================================================================
Section 1:  Creative uses of rm(1)...
=============================================================================

From: dbrillha@dave.mis.semi.harris.com (Dave Brillhart)
Organization: Harris Semiconductor

We can laugh (almost) about it now, but...

Our operations group, a VMS group but trying to learn UNIX, was assigned
account administration. They were cleaning up a few non-used accounts
like they do on VMS - backup and purge. When they came across the
account "sccs", which had never been accessed, away it went. The
"deleteuser" utility from DEC asks if you would like to delete all
the files in the account. Seems reasonable, huh?

-----------------------------------------------------------------------------

From: broadley@neurocog.lrdc.pitt.edu (Bill Broadley)
Organization: University of Pittsburgh

On an old decstation 3100 I was deleting last semester's users to try to
dig up some disk space. I also deleted some test users at the same time.

One user took longer than usual, so I hit control-c and tried ls.
"ls: command not found"

Turns out that the test user had / as the home directory and the remove
user script in ultrix just happily blew away the whole disk.
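A remove-user script can refuse an obviously dangerous home directory
before it deletes anything.  What follows is only a minimal sketch, not the
Ultrix script: the function name is_removable_home and the rule "at least
two levels below /" are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical guard for a remove-user script (not from any vendor tool).
# Succeeds only if the argument is an absolute path that resolves to a
# directory at least two levels below /, for example /home/alice.
is_removable_home() {
    home_dir=$1
    # Reject empty and relative paths outright.
    case $home_dir in
        /*) ;;
        *) return 1 ;;
    esac
    # Resolve symlinks and forms such as "//" or "/home/.." to a physical path.
    resolved=$(cd "$home_dir" 2>/dev/null && pwd -P) || return 1
    # Require two non-empty path components, so "/" and "/usr" are refused.
    case $resolved in
        /*[!/]*/*[!/]*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use inside a removal script:
#   is_removable_home "$home_dir" || { echo "refusing $home_dir" >&2; exit 1; }
```

A site would probably want a stricter rule (for instance, an explicit list
of allowed parent directories), but even this much would have stopped a
test user whose home directory was /.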

ftp, telnet, rcp, rsh, etc were all gone.  Had to go to tapes, and had
one LONG rebuild of X11R5.

Fortunately it wasn't our primary system, and I'm only a student....

-----------------------------------------------------------------------------

From: cjc@ulysses.att.com (Chris Calabrese)
Organization: AT&T Bell Labs, Murray Hill, NJ, USA

We have a home-grown admin system that controls accounts on all of our
machines.  It has a remove user operation that removes the user from
all machines at the same time in the middle of the night.

Well, one night, the thing goes off and tries to remove a user with
the home directory '/'.  All the machines went down, with varying
amounts of stuff missing (depending on how soon the script, rm, find,
and other important things were clobbered).

Nobody knew what was going on!  The systems were restored from
backup, and things seemed to be going OK, until the next night when
the remove-user script was fired off by cron again.

This time, Corporate Security was called in, and the admin group's
supervisor was called back from his vacation (I think there's something in
there about a helicopter picking the guy up from a rafting trip
in the Grand Canyon).

By chance, somebody checked the cron scripts, and all was well for the
next night...

-----------------------------------------------------------------------------

From: tzs@stein.u.washington.edu (Tim Smith)
Organization: University of Washington, Seattle

I was working on a line printer spooler, which lived in /etc.  I wanted
to remove it, and so issued the command "rm /etc/lpspl."  There was only
one problem.  Out of habit, I typed "passwd" after "/etc/" and removed
the password file.  Oops.

I called up the person who handled backups, and he restored the password
file.

A couple of days later, I did it again!  This time, after he restored it,
he made a link, /etc/safe_from_tim.

About a week later, I overwrote /etc/passwd, rather than removing it.

After he restored it again, he installed a daemon that kept a copy of
/etc/passwd, on another file system, and automatically restored it if
it appeared to have been damaged.

Fortunately, I finished my work on /etc/lpspl around this time, so we
didn't have to see if I could find a way to wipe out a couple of
filesystems...

-----------------------------------------------------------------------------

From: bill@chaos.cs.umn.edu ( bill pociengel )
Organization: University of Minnesota

After a real bad crash (tm) and having been an admin (on an RS/6000)
for less than a month (honest it wasn't my fault, yea right stupid)
we got to test our backup by doing:
# cd /
# rm -rf *
ohhhhhhhh sh*t i hope those tapes are good.

Ya know it's kinda funny (in a perverse way) to watch the system just
slowly go away.

-----------------------------------------------------------------------------

From: barrie@calvin.demon.co.uk (Barrie Spence)
Organization: DataCAD Ltd, Hamilton, Scotland

My mistake on SunOS (with OpenWindows) was to try and clean up all the
'.*' directories in /tmp. Obviously "rm -rf /tmp/*" missed these, so I
was very careful and made sure I was in /tmp and then executed
"rm -rf ./.*".

I will never do this again. If I am in any doubt as to how a wildcard
will expand I will echo it first.
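Echoing first is the cheap check.  A narrower pattern also helps: the
sketch below lists dotfiles with two patterns that cannot expand to "." or
"..".  The function name list_dotfiles is made up for this example, and
some newer shells already skip "." and ".." when expanding ".*", though
that is not something to rely on.

```shell
#!/bin/sh
# Print what the dotfile patterns match in a directory, removing nothing.
# ".[!.]*" needs a non-dot second character and "..?*" needs at least three
# characters, so neither pattern can ever expand to "." or "..".
list_dotfiles() {
    (
        cd "$1" || exit 1
        for name in .[!.]* ..?*; do
            # An unmatched pattern is left as literal text; skip it.
            if [ -e "$name" ] || [ -L "$name" ]; then
                printf '%s\n' "$name"
            fi
        done
    )
}
```

Once the list looks right, the same two patterns can be handed to rm
instead of ".*".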

-----------------------------------------------------------------------------

From: robjohn@ocdis01.UUCP (Contractor Bob Johnson)
Organization: Tinker Air Force Base, Oklahoma

Cleaning out an old directory, I did 'rm *', then noticed several files
that began with dot (.profile, etc) still there.  So, in a fit of obtuse
brilliance, I typed...

    rm -rf .* &

By the time I got it stopped, it had chewed through 3 filesystems which
all had to be restored from tape (.* matches .. as well, and the -r makes
rm descend into the parent directory and everything below it).  Live and learn...

-----------------------------------------------------------------------------

From: JRowe@cen.ex.ac.uk (John Rowe)
Organization: Computer Unit. - University of Exeter. UK

rik@nella15.cc.monash.edu.au (Rik Harris) writes:
[snippet about "using 'find' in an auto-cleanup script which blew away
 half of the source" deleted.  -ed.]

If you're doing this using find always put -xdev in:

 find /tmp/ -xdev -fstype 4.2 -type f -atime +5 -exec rm {} \;

This stops find from working its way down filesystems mounted under
/tmp/.  If you're using, say, perl you have to stat . and .. and see if
they are mounted on the same device.  The fstype 4.2 is pure paranoia.
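A dry run makes it easy to see what such a cleanup would touch.  This is a
sketch only: the function name list_stale_files is invented here, and
-fstype is left out because its accepted values differ between systems.

```shell
#!/bin/sh
# Dry run: list regular files under a directory that have not been accessed
# for more than the given number of days, without crossing onto any other
# filesystem mounted below it (-xdev).
#   $1 = directory to scan, $2 = age threshold in days
list_stale_files() {
    find "$1" -xdev -type f -atime +"$2" -print
}

# Example: list_stale_files /tmp 5
```

Swapping -print for -exec rm {} \; turns the listing into the cleanup
itself, once the output has been checked.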

Needless to say, I once forgot to do this.  All was well for some weeks
until Convex's version of NQS decided to temporarily mount /mnt under
/tmp... Interestingly, only two people noticed.  Yes, the chief op.
keeps good backups!

Other triumphs: I created a list of a user's files that hadn't been
accessed for three months and a perl script for him to delete them.
Of course, it had to be tested, I mislaid a quote from a print
statement... This did turn into a triumph, he only wanted a small
fraction of them back so we saved 20 MB.

I once deleted the only line from within an if..then statement in
rc.local, the sun refused to come up, and it was surprisingly
difficult to come up single user with a writeable file system.

AIX is a whole system of nightmares strung together.  If you stray
outside of the sort of setup IBM implicitly assume you have (all IBM
kit, no non IBM hosts on the network, etc.) you're liable to end up in
deep doodoo.

One thing I would like all vendors to do (I know one or two do) is
to give root the option of logging in using another shell.  Am I the
only one to have mangled a root shell?

-----------------------------------------------------------------------------

From: rheiger@renext.open.ch (Richard H. E. Eiger)
Organization: Olivetti (Schweiz) AG, Branch Office Berne

Just imagine having the sendmail.cf file in /etc. Now, I was working on
the sendmail stuff and had come up with lots of sendmail.cf.xxx which I
wanted to get rid of so I typed "rm -f sendmail.cf. *". At first I was
surprised about how much time it took to remove some 10 files or so. Hitting
the interrupt key, when I finally saw what had happened was way too late,
though.

Fortune has it that I'm a very lazy person. That's why I never bothered
to just back up directories with data that changes often. Therefore I
managed to restore /etc successfully before rebooting... :-) Happy end,
after all. Of course I had lost the only well working version of my
sendmail.cf...

-----------------------------------------------------------------------------

From: gfowler@javelin.sim.es.com (Gary Fowler)
Organization: Evans & Sutherland Computer Corporation

Once I was going to make a new file system using mkfs.  The device I wanted
to make it on was /dev/c0d1s8.  The device name that I used, however, was
/dev/c0d0s8 which held a very important application.  I had always been a
little annoyed by the 10 second wait that mkfs has before it actually makes
the file system.  I'm sure glad it waited that time though.  I probably waited
9.9 seconds before I realized my mistake and hit that DEL key just in time.
That was a near disaster avoided.
  [ I wish all systems were like that.  Linux mkfs doesn't wait, but at ]
  [ least I have the source!  -ed. ]

Another time I wasn't so lucky.  I was a very new SA, and I was trying to
clean some junk out of a system.  I was in /usr/bin when I noticed a sub
directory that didn't belong there.  A former SA had put it there.  I did
an ls on it and determined that it could be zapped.  Forgetting that I was
still in /usr/bin, I did an rm *.  No 10 second idiot proofing with rm.  Now
if some one would only create an OS with a "Do what I mean, not what I say"
feature.

Gary "Experience is what allows you to recognize a mistake the second time
you make it." Fowler

-----------------------------------------------------------------------------

From: russells@ccu1.aukuni.ac.nz (Russell Street)
Organization: University of Auckland, New Zealand.

I once had "gnu-emacs" aliased to 'em' (and 'emacs' etc)

One day I wanted to edit the start up file and mistyped
        # rm /etc/rc.local
instead of the obvious.

*Fortunately* I had just finished a backup and was now finding
out the joys of tar and its love of path names. [./etc/rc.local
and /etc/rc.local and etc/rc.local are *not* the same for tar
and TK-50s take a *long* time to search for non-existent files :(]

Of course the BREAK (Ctrl-P) key on a VAX and an Ultrix manual
and a certain /etc/ttys line are just a horror story waiting
to happen!  Especially when the VAX and manuals are in an
unsupervised place :)

-----------------------------------------------------------------------------

From: rik@nella15.cc.monash.edu.au (Rik Harris)
Organization: Monash University, Melb., Australia.

Most of our disks reside on a single, high-powered server.  We decided
this probably wasn't too good an idea, and put a new disk on one of
the workstations (particularly since the w/s has a faster transfer
rate than the server does!).  It's still really useful to be able to
use all disks from the one machine, so I mounted the w/s disk on the
server.  I said to myself (being a Friday afternoon...see previous
post) "it's only temporary.../mnt is already being used...I'll mount
it in /tmp".  So, I mounted on /tmp/a (or something).  This was fine
for a few hours, but then the auto-cleanup script kicked in, and blew
away half of my source (the stuff over 2 weeks old).  I didn't notice
this for a few days, though.  After I figured out what had happened,
and restored the files (we _do_ have a good backup strategy),
everything was OK.

Until a few months later.  We were trying to convince a sysadmin from
another site that he shouldn't NFS export his disks rw,root to everyone,
so I mounted the disk to put a few suid root programs in his home
directory to convince him.  Well, it's only a temporary mount, so....

You guessed it, another Friday afternoon.  I did a umount /tmp/b, and
forgot about it.  I noticed this one about halfway through the next
day.  (NFS over a couple of 64k links is pretty slow).  The disk had
not unmounted because it was busy...busy with two find scripts, happily
checking for suid programs, and deleting anything over a week old.  A
df on the filesystem later showed about 12% full :-(    Sorry Craig.

Now, I create /mnt1, /mnt2, /mnt3.... :-)

Remember....Friday afternoons are BAD news.

-----------------------------------------------------------------------------

From: ranck@joesbar.cc.vt.edu (Wm. L. Ranck)

Well, after reading some of the stories in this thread I guess I can
tell mine.  I got an RS/6000 mod. 220 for my office about 6 months ago.
The OS was preloaded so I had little chance to learn that process.  Being
used to a full-screen editor I was not happy with vi so I read in the manual
that INED (IBM's editor for AIX) was full-screen and I logged in as root and
installed it.  I immediately started to play with the new editor and somehow
found a series of keys that told the editor to delete the current directory.
To this day I don't know what that sequence of keys was, but I was
unfortunately in the /etc directory when I found it, and I got a prompt that
said "do you want to remove this?" and I thought I was just removing the
file I had been playing with but instead I removed /etc!

I got the chance to learn how to install AIX from scratch.  I did reinstall
INED even though I was a little gun-shy but I made sure that whenever I used
it from then on I was *not* root.  I have since decided that EMACS may be a
better choice.

-----------------------------------------------------------------------------

From: root@rulcvx.LeidenUniv.nl (root)
Organization: CRI, institute for telecommunication and computerservices.

Well, waddya know...  Some half hour ago, coming back from root (I was
installing m4 on our system) [Shit, all my neato emacs tricks won't
work.  Damn, damn, damn kill, kill, KILL] to my own userid, I got this
little message: "Can't find home directory /mnt0/crissl." and an
other: "Can't lstat .".  [Grrrrr, ^S and ^Q haven't been remapped...]

Guess what happened, not an hour ago...  A colleague of mine was emptying
some directories of computer-course accounts.  As I did a "ps -t" on
his tty, what did I see?  "rm -rf .*"

Well, I'm not alone, he got sixteen other homedirectories as well.
And guess what filesystems we don't make incremental backups of...
And why not?  Beats me...

I haven't killed him yet, he first has to restore the lot.

And for those "touch \-i" fans out there: you wouldn't have been
protected...

-----------------------------------------------------------------------------

From: jcm@coombs.anu.edu.au (J. McPherson)
Organization: Australian National University

A few months ago in comp.sys.hp, someone posted about their repairs to an
HP 7x0, after a new sysadmin had just started work. They {the new
person} had been looking through the file system to try to make some
space, saw /dev and the mainly 0 length files therein. Next command was "rm
-f /dev/*" and they wondered why they couldn't login ;)

I think the result was that the new person was sent on a sysadmin's
course a.s.a.p

-----------------------------------------------------------------------------

From: msb@sq.sq.com (Mark Brader)
Organization: SoftQuad Inc., Toronto, Canada

> ... if you're trying rm -rf / you'll NEVER get a clear disk - at least
> /bin/rm (and if it reached /bin/rmdir before scanning some directories
> then add a lot of empty directories).  I've seen it once...

Then it must be version-dependent.  On this Sun, "cp /bin/rm foo"
followed by "./foo foo" does not leave a foo behind, and strings
shows that rm appears not to call rmdir (which makes sense, as it
can just use unlink()).

In any case, I'm reminded of the following article.  This is a classic
which, like the story of Mel, has been on the net several times;
it was in this newsgroup in January.  It was first posted in 1986.

-----

Have you ever left your terminal logged in, only to find when you came
back to it that a (supposed) friend had typed "rm -rf ~/*" and was
hovering over the keyboard with threats along the lines of "lend me a
fiver 'til Thursday, or I hit return"?  Undoubtedly the person in
question would not have had the nerve to inflict such a trauma upon
you, and was doing it in jest.  So you've probably never experienced the
worst of such disasters....

It was a quiet Wednesday afternoon.  Wednesday, 1st October, 15:15
BST, to be precise, when Peter, an office-mate of mine, leaned away
from his terminal and said to me, "Mario, I'm having a little trouble
sending mail."  Knowing that msg was capable of confusing even the
most capable of people, I sauntered over to his terminal to see what
was wrong.  A strange error message of the form (I forget the exact
details) "cannot access /foo/bar for userid 147" had been issued by
msg.  My first thought was "Who's userid 147?; the sender of the
message, the destination, or what?"  So I leant over to another
terminal, already logged in, and typed
        grep 147 /etc/passwd
only to receive the response
        /etc/passwd: No such file or directory.

Instantly, I guessed that something was amiss.  This was confirmed
when in response to
        ls /etc
I got
        ls: not found.

I suggested to Peter that it would be a good idea not to try anything
for a while, and went off to find our system manager.

When I arrived at his office, his door was ajar, and within ten
seconds I realised what the problem was.  James, our manager, was
sat down, head in hands, hands between knees, as one whose world has
just come to an end.  Our newly-appointed system programmer, Neil, was
beside him, gazing listlessly at the screen of his terminal.  And at
the top of the screen I spied the following lines:
        # cd
        # rm -rf *

Oh, shit, I thought.  That would just about explain it.

I can't remember what happened in the succeeding minutes; my memory is
just a blur.  I do remember trying ls (again), ps, who and maybe a few
other commands beside, all to no avail.  The next thing I remember was
being at my terminal again (a multi-window graphics terminal), and
typing
        cd /
        echo *
I owe a debt of thanks to David Korn for making echo a built-in of his
shell; needless to say, /bin, together with /bin/echo, had been
deleted.  What transpired in the next few minutes was that /dev, /etc
and /lib had also gone in their entirety; fortunately Neil had
interrupted rm while it was somewhere down below /news, and /tmp, /usr
and /users were all untouched.
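The trick works because the shell expands * itself and echo never leaves
the shell.  The same idea stretches a little further, as in this sketch.
It assumes a shell where printf, read and [ are built in (true of most
current shells, not necessarily of the one in the story), and the names
list_dir and show_file are invented for the example.

```shell
#!/bin/sh
# Poor man's ls: the shell expands the glob itself, so /bin/ls is not needed.
list_dir() {
    for entry in "$1"/*; do
        # Skip the literal pattern left behind when the directory is empty.
        [ -e "$entry" ] && printf '%s\n' "${entry##*/}"
    done
    return 0
}

# Poor man's cat for text files, using only read and printf.
show_file() {
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\n' "$line"
    done < "$1"
}
```

Neither function sees dotfiles or copes with binary data, but both keep
working after /bin has gone.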

Meanwhile James had made for our tape cupboard and had retrieved what
claimed to be a dump tape of the root filesystem, taken four weeks
earlier.  The pressing question was, "How do we recover the contents
of the tape?".  Not only had we lost /etc/restore, but all of the
device entries for the tape deck had vanished.  And where does mknod
live?  You guessed it, /etc.  How about recovery across Ethernet of
any of this from another VAX?  Well, /bin/tar had gone, and
thoughtfully the Berkeley people had put rcp in /bin in the 4.3
distribution.  What's more, none of the Ether stuff wanted to know
without /etc/hosts at least.  We found a version of cpio in
/usr/local, but that was unlikely to do us any good without a tape
deck.

Alternatively, we could get the boot tape out and rebuild the root
filesystem, but neither James nor Neil had done that before, and we
weren't sure that the first thing to happen wouldn't be that the whole
disk would be re-formatted, losing all our user files.  (We take dumps
of the user files every Thursday; by Murphy's Law this had to happen
on a Wednesday).  Another solution might be to borrow a disk from
another VAX, boot off that, and tidy up later, but that would have
entailed calling the DEC engineer out, at the very least.  We had a
number of users in the final throes of writing up PhD theses and the
loss of maybe a week's work (not to mention the machine down time)
was unthinkable.

So, what to do?  The next idea was to write a program to make a device
descriptor for the tape deck, but we all know where cc, as and ld
live.  Or maybe make skeletal entries for /etc/passwd, /etc/hosts and
so on, so that /usr/bin/ftp would work.  By sheer luck, I had a
gnuemacs still running in one of my windows, which we could use to
create passwd, etc., but the first step was to create a directory to
put them in.  Of course /bin/mkdir had gone, and so had /bin/mv, so we
couldn't rename /tmp to /etc.  However, this looked like a reasonable
line of attack.

By now we had been joined by Alasdair, our resident UNIX guru, and as
luck would have it, someone who knows VAX assembler.  So our plan
became this: write a program in assembler which would either rename
/tmp to /etc, or make /etc, assemble it on another VAX, uuencode it,
type in the uuencoded file using my gnu, uudecode it (some bright
spark had thought to put uudecode in /usr/bin), run it, and hey
presto, it would all be plain sailing from there.  By yet another
miracle of good fortune, the terminal from which the damage had been
done was still su'd to root (su is in /bin, remember?), so at least we
stood a chance of all this working.

Off we set on our merry way, and within only an hour we had managed to
concoct the dozen or so lines of assembler to create /etc.  The
stripped binary was only 76 bytes long, so we converted it to hex
(slightly more readable than the output of uuencode), and typed it in
using my editor.  If any of you ever have the same problem, here's the
hex for future reference:
        070100002c000000000000000000000000000000000000000000000000000000
        0000dd8fff010000dd8f27000000fb02ef07000000fb01ef070000000000bc8f
        8800040000bc012f65746300

I had a handy program around (doesn't everybody?) for converting ASCII
hex to binary, and the output of /usr/bin/sum tallied with our
original binary.  But hang on---how do you set execute permission
without /bin/chmod?  A few seconds' thought (which as usual, lasted a
couple of minutes) suggested that we write the binary on top of an
already existing binary, owned by me...problem solved.
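For anyone without such a handy program, the conversion can be sketched
with nothing but the shell's printf.  This is not the program used in the
story; hex_to_bin is a name made up here, and the input is assumed to
contain only hex digits.  As a sanity check, the digits 2f657463 near the
end of the listing above are the ASCII codes for "/etc".

```shell
#!/bin/sh
# Write the bytes described by a string of hex digits to standard output.
# Relies only on printf, which is a builtin in most current shells.
hex_to_bin() {
    hex=$1
    # Refuse an odd number of digits: every byte needs exactly two.
    [ $(( ${#hex} % 2 )) -eq 0 ] || return 1
    while [ -n "$hex" ]; do
        rest=${hex#??}          # everything after the first two digits
        byte=${hex%"$rest"}     # the first two digits
        # Turn the hex pair into an octal escape and let printf emit the byte.
        printf "\\$(printf '%03o' "0x$byte")"
        hex=$rest
    done
}

# Redirection truncates an existing file rather than recreating it, so its
# mode bits survive.  That is the "write on top of an existing binary" trick:
#   hex_to_bin "$hex_string" > /path/to/existing_executable
```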

So along we trotted to the terminal with the root login, carefully
remembered to set the umask to 0 (so that I could create files in it
using my gnu), and ran the binary.  So now we had a /etc, writable by
all.  From there it was but a few easy steps to creating passwd,
hosts, services, protocols, (etc), and then ftp was willing to play
ball.  Then we recovered the contents of /bin across the ether (it's
amazing how much you come to miss ls after just a few, short hours),
and selected files from /etc.  The key file was /etc/rrestore, with
which we recovered /dev from the dump tape, and the rest is history.

Now, you're asking yourself (as I am), what's the moral of this story?
Well, for one thing, you must always remember the immortal words,
DON'T PANIC.  Our initial reaction was to reboot the machine and try
everything as single user, but it's unlikely it would have come up
without /etc/init and /bin/sh.  Rational thought saved us from this
one.

The next thing to remember is that UNIX tools really can be put to
unusual purposes.  Even without my gnuemacs, we could have survived by
using, say, /usr/bin/grep as a substitute for /bin/cat.

And the final thing is, it's amazing how much of the system you can
delete without it falling apart completely.  Apart from the fact that
nobody could login (/bin/login?), and most of the useful commands
had gone, everything else seemed normal.  Of course, some things can't
stand life without say /etc/termcap, or /dev/kmem, or /etc/utmp, but
by and large it all hangs together.

I shall leave you with this question: if you were placed in the same
situation, and had the presence of mind that always comes with
hindsight, could you have got out of it in a simpler or easier way?
Answers on a postage stamp to:

Mario Wolczko

------------------------------------------------------------------------------
*NEW*

From: samuel@cs.ubc.ca (Stephen Samuel)
Organization: University of British Columbia, Canada

Some time ago, I was editing our cron file to remove core files more than a
day old.  Unfortunately, thru recursing into VI sessions, I ended up saving an
intermediate (wrong) version of this file with an extra '-o' in it.

find / -name core -o -atime +1 -exec /bin/rm {} \;

The cute thing about this is that it leaves ALL core files intact, and
removes any OTHER file that hasn't been accessed in the last 24 hours.
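The reason is operator precedence: find joins adjacent tests with an
implicit "and" that binds tighter than -o, so the saved line reads as
"-name core, or else (-atime +1 and remove it)".  The sketch below
reproduces that harmlessly in a scratch directory, with -print standing in
for the rm; the file and directory names are made up for the demonstration.

```shell
#!/bin/sh
# Build a scratch tree with an old and a new copy of two files.
demo=$(mktemp -d)
mkdir "$demo/old" "$demo/new"
: > "$demo/old/core"
: > "$demo/old/notes.txt"
: > "$demo/new/core"
: > "$demo/new/notes.txt"
# Backdate the access time of the files in old/ (POSIX touch -a -t).
touch -a -t 200001010000 "$demo/old/core" "$demo/old/notes.txt"

# As saved: parsed as  -name core  -o  ( -atime +1 -print )
as_saved=$(find "$demo" -name core -o -atime +1 -print)
# With the stray -o removed: only old files named core are selected.
as_intended=$(find "$demo" -name core -atime +1 -print)

printf 'as saved:    %s\n' "$as_saved"
printf 'as intended: %s\n' "$as_intended"
```

The first expression selects only old/notes.txt and the second only
old/core, which is the behaviour described above.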

Although the script ran at 4AM, I was the first person to notice this,
in the early afternoon.  I started to get curious when I noticed that
SOME man pages were missing, while others weren't. Up till then, I was pleased
to see that we finally had some free disk space.  Then I started to notice
the pattern.

Really unpleasant was the fact that no system backups had taken place all
summer (and this was a research lab).

The only saving grace is that most of the really active files had been
accessed in the previous day (thank god I didn't do this on a Saturday).
I was also lucky that I'd used tar the previous day, as well.

I still felt sick having to tell people in the lab what happened.

-----------------------------------------------------------------------------

From: Stephen Samuel 
Organization: University of British Columbia, Canada

As some older sys admins may remember, BSD 4.1 used to display unprintable
characters as a question mark.

An unfortunate friend of mine had managed to create an executable with a 
name consisting of a single DEL character, so it showed up as  "?*".

He tried to remove it.

 "rm ?*"

He was quite frustrated by the time he asked me for help, because
he had such a hard time getting his files restored. Every time he walked
up to a sys-admin type and explained what happened, they'd go "you did
WHAT?", he'd explain again, and they'd go into a state of uncontrollable
giggles, and he'd walk away.  I only giggled controllably.

This was at a time (~star wars) when it was known to many as "the mythical
rm star".

-------------------------------------------------------------------------------

From: jjr@ctms.gwinnett.com (J.J. Reynolds)
Organization: Consolidated Traffic Management Services (CTMS)

The SCO man page for the rm command states:

          It is also forbidden to remove the root directory of a given
          file system.

Well, just to test it out, I one day decided to try "rm -r /" on one of our
test machines.   The man page is correct, but if you read carefully, it
doesn't say anything about all of the files underneath that filesystem....

-------------------------------------------------------------------------------

From: bcutter@pdnis.paradyne.com (Brooks Cutter)

A while back I installed System V R4 on my 386 at home for development
purposes...  I was compiling programs both in my home directory, and
in /usr/local/src ... so in order to reduce unnecessary disk space I
decided to use cron to delete .o files that weren't accessed for
over a day...

I put the following command in the root cron...

find / -type f -name \*.o -atime +1 -exec /usr/bin/rm -f {} \;

(instead of putting)

find /home/bcutter -type f -name \*.o -atime +1 -exec /usr/bin/rm -f {} \;
find /usr/local/src -type f -name \*.o -atime +1 -exec /usr/bin/rm -f {} \;

The result was that a short time later I was unable to compile software.
What the first line was doing was zapping the files like /usr/lib/crt1.o
.. and later I found out all the Kernel object files...

OOPS!  After this happened a second time (after re-installing the files
from tape) I tracked down the offending line and fixed it....

Yet another case of creating work by trying to avoid extra work (in this
case a second find line)
=============================================================================
Section 2: How not to free up some space on your drives...
=============================================================================

From: mitch@cirrus.com (Mitch Wright)
Organization: Cirrus Logic Inc.

A fellow sysadmin was looking to free up some much needed disk space.  Since
it was purely a production machine I suggested that he go through and "strip"
his binaries.  Unfortunately I made the assumption that he knew what strip
does and would use it wisely -- flashes of the Bad News Bears come to mind
now.  To make it short, he stripped /vmunix which didn't destroy the system,
but certainly caused some interesting problems.

-----------------------------------------------------------------------------

From: hirai@cc.swarthmore.edu (Eiji Hirai)
Organization: Information Services, Swarthmore College, Swarthmore, PA, USA

I heard this from a fellow sysadmin friend.  My friend was forced to
work with some sysadmins who didn't have their act together.  One day, one
of them was "cleaning" the filesystem and saw a file called "vmunix" in /.
"Hmm, this is taking up a lot of space - let's delete it".  "rm /vmunix".

My friend had to reinstall the entire OS on that machine after his coworker
did this "cleanup".  Ahh, the hazards of working with sysadmins who really
shouldn't be sysadmins in the first place.

Moral of all these stories:  if I had to hire a Unix sysadmin, the first
thing I'd look for is experience.  NOTHING can substitute for down-to-earth,
real-life grungy experience in this field.

-----------------------------------------------------------------------------

From: djs@jet.uk (David J Stevenson)
Organization: Joint European Torus

hirai@cc.swarthmore.edu (Eiji Hirai) writes:
[story about "deleting /vmunix to save space" deleted - to save space!  -ed.]

When this happened to a colleague (when I worked somewhere else) he restored
vmunix by copying from another machine.  Unfortunately, a 68000 kernel does
not run very well on a Sparc...

-----------------------------------------------------------------------------

From: smckinty@sunicnc.France.Sun.COM (Steve McKinty - Sun ICNC)
Organization: SunConnect

hirai@cc.swarthmore.edu (Eiji Hirai) writes:
[story about "deleting /vmunix to save space" deleted - to save space!  -ed.]

Hmm. A colleague of mine did much the same by accident on one of
our test machines. After discovering it, fortunately while the machine
was still up & running, he FTPed a copy of /vmunix from the other lab
system (both running exactly the same kernel).

After rebooting his machine everything (to his relief) worked fine.

-----------------------------------------------------------------------------

From: greep@Speech.SRI.COM (Steven Tepper)
Organization: SRI International

At one place where I worked, someone had set up cron to delete any
file named "core" more than a few days old, since disk space was
always tight and most users wouldn't know what core files were or care
about them.  Unfortunately not everyone knew about this and one user
lost a plain text file (a project proposal) he'd spent a lot of
time working on because he called it "core".  This was around 1976,
when Unix was still considered exotic and before bookstores carried
entire sections of Unix books.

-----------------------------------------------------------------------------

From: tjm@hrt213.brooks.af.mil (Tim Miller)
Organization: AL/HRTI, Brooks AFB

This one qualified for Stupid Act of the Month:

All this happened on my sparcII...

I was making room on / because I needed to test run something
(which was using a tmp file in, of all places, /var/tmp.  I could have
recompiled the application to use more memory and/or /tmp, but I'm too
lazy for that), so I figure "I'll just compress this, and this, and
this..."  One of those "this'" was vmunix.

Well, of course the application crashes the machine, and stupid
me had forgotten that I'd compressed vmunix, so the damn thing won't
boot.  checksum: Bad value or some such error.  Took me most of the day
to figure out just what I'd done to the dang thing.  8)

Moral(s):
1) Never, ever, EVER play with vmunix.
2) Always keep a log of what you do to the root file system.


-----------------------------------------------------------------------------

From: corwin@ensta.ensta.fr (Gilles Gravier)
Organization: ENSTA, Paris, France

Well, talk about horror stories... We have a DataGeneral Aviion machine
where I work. I was doing regular admin tasks on it and decided, logged
in as root, to clean /tmp... (I can already see you laughing there!). So,
as usual, I typed "cd / tmp" then "rm *" as I was placed in / when the
dreaded rm was entered... My root directory was erased...

I realized my error fast enough... So, since I had deleted the kernel, and
the administration kernels (that both reside in /), I had to recreate a
new kernel. Luckily for me, DG/UX allows to recreate one "on the fly", using
parameters of the running kernel (in memory!)... So I did, and then rebooted.

Things started getting bad when I still couldn't work on my machine, logins
didn't work (No Shell messages...)... Until I could access the /etc/passwd
file using a trojan shell through an NFS mounted directory, and create a root
account whose shell was not /sbin/sh...

On a DG, /sbin and /bin are both links to /usr/sbin... The links were killed
when I did my "rm"...

-----------------------------------------------------------------------------

From: grover@ccai.clv.oh.us (grover davidson)
Organization: CCAI

Several months ago here, we were reorganizing our disk space on an
RS/6000 with AIX 3.1. I have done this many times before, but for some
reason, I was rushing through expanding a file system. Instead of entering
the new file system size where it belongs, I entered it into the mount
point. It also turns out that I was attached 2 levels down in the file
system. Since the size was entered as a number ('234567') and was
INTERPRETED as a mount point directory, the result was a
circular hard link that basically left the file system unusable.
IBM was not able to help, and since we had done quite a bit of work that day,
we had to somehow recover some of the stuff. We ended up doing a dd of the
raw volume, and then read it back in a couple MB at a time and extracted
the pieces that we needed from the mess.

The other day while reading Stevens' new book, "Advanced Programming in
the UNIX Environment", he stated that he had done the exact same thing
during the preparation of his book. At least I am not alone.....

-----------------------------------------------------------------------------

From: hillig@U.Chem.LSA.UMich.EDU (Kurt Hillig)
Organization: Department of Chemistry, University of Michigan, Ann Arbor

Just so nobody gets the impression that you can only screw up
U**X systems....

Several years ago I was sysadmin for the department's VAX/VMS system.
One day, trying to free up some space on the system disk, I noticed
there were a bunch of files like COBRTL.EXE, BASRTL.EXE etc. - i.e.
the Cobol, Basic, etc. run-time libraries.  Since the only language
used was Fortran, I nuked them.

Three weeks later, a visiting professor came over from Greece for a few
weeks, mostly to do some calculations on the VAX.  He got in on a Friday
morning, and started work that afternoon.  About 7 PM I got a call at
home - he'd accidentally bumped the reset switch (on the VAX 3200, it
was just at knee height!) and it wouldn't reboot.  I went back in and
took a look, and the reason it wouldn't come up was that the run-time
libraries were missing.

I ended up booting stand-alone backup from tape, dumping another data
disk to tape, restoring an old system from tape, copying the RTL's,
then restoring the data disk from tape again - all with TK50's.  Took
me until 3 AM.

-----------------------------------------------------------------------------

From: adb@geac.com (Anthony DeBoer)
Organization: Geac Computer Corporation

At a former employer, I once watched our sysadmin reboot from the
distribution tape after making a typing error editing the root line in
/etc/passwd.  After munging the colon count in this line, nobody could
login or su, and he hadn't left himself in root in another session while
testing his changes (a rule I've adopted for myself).
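
A munged colon count is the kind of error that can be caught mechanically before the edited file goes live. Here is a minimal sketch, assuming the traditional seven-field layout (name, password, UID, GID, GECOS, home, shell); the function name malformed_passwd_lines is invented for this example, and it does not understand NIS "+" entries or any vendor-specific extensions.

```python
def malformed_passwd_lines(text, expected_fields=7):
    """Return (line_number, line) pairs whose colon-separated field count is wrong.

    Hypothetical helper for illustration. Assumes the traditional
    seven-field passwd layout; blank lines are skipped by this sketch.
    """
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines are ignored rather than reported
        if len(line.split(":")) != expected_fields:
            bad.append((number, line))
    return bad
```

Running this over a copy of the edited file, and installing the copy only if the returned list is empty, is one cheap safeguard. Keeping a second root session open, as described above, still covers everything such a check cannot see.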

My "big break", the moment I became sysadmin, was partly by virtue of
being the only one to ask him for the root password the day he went out
the door for the last time.

What I've found preferable, when wanting to set up an alternative shell
for root (bash, in my case), is to add a second line in /etc/passwd with
a slightly different login name, same password, UID 0, and the other
shell.  That way, if /usr/local/bin/bash or /usr/local/bin or the /usr
partition itself ever goes west, I still have a login with good ol'
/bin/sh handy.  (I know, installing it as /bin/bash might bypass some
potential problems, but not all of them.)

This might, of course, be harder to do on a security fascist system like
AIX.  Simply trying to create a "backup" login with UID 0 there once so
that the operator didn't get a prompt and have to remember what to type
next was a nightmare.  (I wound up giving "backup" a normal UID, put it
in a group by itself, and gave it setuid-root copies of find and cpio,
with owner root, group backup, and permissions 4550).  BTW, this was to
make things easier for the backup operator, not to make it secure from
that person.

-----------------------------------------------------------------------------

From: exudnw@exu.ericsson.se (Dave Williams)
Organization: Ericsson Network Systems

A sysadmin was told to change the root passwd on a dozen or so Sun servers
serving 400 diskless sun clients.  He changed the passwd string to the wrong
encrypted string (with a sed-like string editor) and locked root out from
everywhere.  Took hours to untangle.

---------------------------------------------------------------------------

From: rick@sadtler.com (Rick Morris)
Organization: Sadtler Research Laboratories

Okay, I'll bite.  We had Zenith Data System's Z-286's, boosted to 386's
via an excellerator (imagine a large boot stomping lots of data through
a small 16 bit funnel...).  We were running SCO's Xenix.  The user filesystem
crashed in such a way that it couldn't be repaired via fsck.  fsck would
try to repair a specific file and then just stop, leaving the filesystem
dirty.  The "dirty bit" in the superblock said that it couldn't be mounted
because it was dirty.  But it couldn't be cleaned.  But there was lots of
data on it and I hadn't been doing backups because the only I/O device to
do backups was the floppy drive and I wasn't about to sit there every night
or even once a week and slam 30 odd floppies into the drive while the backups
ran, even worse try to restore a file from a backup of 30 floppies....

Anyway, to recover the data I used fsdb to edit the superblock and change
the dirty bit to clean, mounted the disk, got off all the good data,
and remade the filesystem.  Thanks, Xenix.  fsck couldn't clean it,
but you did supply fsdb!   *whew*

----------------------------------------------------------------------------

From: valdis@vttcf.cc.vt.edu (Valdis Kletnieks)
Organization: Virginia Tech, Blacksburg, VA

Well, here's a few contributions of mine, over 10 years of hacking
Unixoid systems:

1) yesterday's panic:  Applying a patch tape to an AIX 3.2 system
to bring it to 3.2.3.  Having had reasonable success at this before,
I used an xterm window from my workstation.  Well, at some point,
a shared library got updated.. I'd seen this before on other machines -
what happens is that 'more', 'su', and a few other things start failing
mysteriously.  Unfortunately, I then managed to nuke ANOTHER window
on my workstation - and the SIGHUP semantics took out all windows I
spawned from the command line of that window.

So - we got a system that I can login to, but can't 'su' to root.
And since I'm not root, I can't continue the update install, or clean
things up.  I was in no mood to pull the plug on the machine when
I didn't know what state it was in - was kind of in no mood to reboot
and find out it wasn't rebootable.

I finally ended up using FTP to coerce all the files in /etc/security
so that I could login as root and finish cleaning up....

Ended up having to reboot *anyhow* - just too much confusion with the
updated shared library...

2) Another time, our AIX/370 cluster managed to trash the /etc/passwd
file.  All 4 machines in the cluster lost their copies within
milliseconds.  In the next few minutes, I discovered that (a) the
nightly script that stashed an archive copy hadn't run the night before
and (b) that our backups were pure zorkumblattum as well.  (The joys
of running very beta-test software).

I finally got saved when I realized the cluster had *5* machines in it -
a lone PS/2 had crashed the night before, and failed to reboot.  So
it had a propagated copy of /etc/passwd as of the previous night.

Go to that PS/2, unplug its Ethernet.. reboot it.  Copy /etc/passwd
to floppy, carry to a working (?) PS/2 in the cluster, tar it off,
let it propagate to other cluster sites.  Go back, hook up the
crashed PS/2's Ethernet.. All done.

Only time in my career that having beta-test software crash a machine
saved me from bugs in beta-test software. ;)

3) Once I was in the position of upgrading a Gould PN/9080.  I was
a good sysadmin, took a backup before I started, since the README said
that they had changed the I-node format slightly.  I do the upgrade,
and it goes with unprecedented (for Gould) smoothness.  mkfs all
the user partitions, start restoring files.  Blam.

I/O error on the tape.  All 12 tapes.  Both Sets of backups.

However, 'dd' could read the tape just fine.

36 straight hours later, I finally track it down to a bad chip on the
tape controller board - the chip was involved in the buffer/convert
from a 32-bit backplane to an 8-bit I/O cable.  Every 4 bytes, the
5th bit would reverse sense.  20 mins later, I had a program
written, and 'dd | my_twiddle | restore -f -' running.
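
The original my_twiddle program is not shown, so the following is only a guess at its shape: a stream filter that flips one bit in every fourth byte. Which byte of each group and which bit were affected is not stated precisely in the story; this sketch assumes the last byte of each 4-byte group and the mask 0x10 (the fifth bit counting from the least significant), and both are parameters so they can be changed. The names twiddle and filter_stream are made up for this illustration.

```python
def twiddle(data, start=0, period=4, mask=0x10):
    """Flip the bits in `mask` in the last byte of every `period`-byte group.

    `start` is the absolute stream offset of data[0], so the pattern stays
    aligned even when the stream is processed in arbitrary chunks.
    The choice of byte and mask is an assumption, not taken from the story.
    """
    fixed = bytearray(data)
    # Index within this chunk of the first byte whose absolute offset
    # is congruent to period - 1 modulo period.
    first = (period - 1 - start) % period
    for i in range(first, len(fixed), period):
        fixed[i] ^= mask
    return bytes(fixed)


def filter_stream(src, dst, chunk_size=65536):
    """Copy binary stream `src` to `dst`, applying twiddle() to every chunk."""
    position = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(twiddle(chunk, start=position))
        position += len(chunk)
```

To use it in a pipeline like the one above, a small script would call filter_stream(sys.stdin.buffer, sys.stdout.buffer). Because the fix is an XOR, applying it twice gives back the original data, which makes it easy to test on a known file first.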

Moral: Always *verify* the backups - the tape drive didn't report a
write error, because what it *received* and what went on the tape
were the same....

I'm sure I have other sagas, but those are some of the more memorable
ones I've had...

---------------------------------------------------------------------------
*NEW*

From: mccalld@Sonoma.EDU

I was an engineer from the CYBER world (Control Data Corp.) when they got
involved with MIPS.  They sold a contract to the Army Corps of Engineers
and I got a crash course in the EP/IX, Enhanced Performance Unix, for the
San Francisco customer base.  These were RISC 4000 machines with 128mb of
memory and several 1.5 gig disks and connected to the world's largest LAN.
One day the site administrator called me and said his machine was
continuously crashing with core dumps and many other bizarre error messages...
After arriving at the site and calling for help, it was determined that I
needed a kit of spares to swap for the problem...24 hours later a kit
arrived and all cards (3) were swapped to no avail.  Software support was
then consulted and we booted to mini-root and then mounted the back door
partition into the regular root directory and went searching for the real
problem. After about 15 minutes of examining /etc it was apparent to the
support person that inittab had been deleted, and so we had to restore it
from backups.  We found out later that one of the Corps network software
engineers was given su and told to learn the machine. Enough said. In this
day and age, the hardware is usually quite reliable and there are a number
of files which, if corrupted, could easily simulate a hardware failure...
MORAL: never give a network engineer the su password; he might attempt to
build bridges into non-existent file systems, or just tear down all the
existing bridges hoping to get the bigger picture and maybe build a
better system!? Geeze.

------------------------------------------------------------------------------

From: Tatjana Heuser 

I once thought it a nice idea to leave root *without* password at all
on my little Sun 3/50 at home.  (I'm using that one to play with things
I don't dare to mess with at work)

So I started with setting every tty including the console to insecure,
put only myself in group wheel and made sure that ftp denied access to
every account without a password.

Everything worked fine and I couldn't imagine anything against it.

Then, after maybe a month or so, I decided for some reasons I have
entirely forgotten, to set my own login shell from /usr/local/bin/tcsh
to /bin/sh.  Trying to make things as small as possible I just deleted
the entire shell entry in the passwd so /bin/sh would get the default shell.
As a short test logging in in just another xterm went fine, I didn't spend
any more thoughts on it and logged off a few hours later.

Next time I wanted to su to root I was plain denied it!
(Needless to say that I was somewhat surprised)

`id` quickly revealed I had no other group than my login group
(which wasn't wheel) -hence no su for me :(

- booting single-user asked for the root password and wasn't content 
  with a 
- logging in as root had been disabled by myself
- ftp denies access to accounts without password
- I didn't have an /.rhosts 
- my tape drive stopped working (I later found out the head was blocked in a
  faraway position) 

Eventually I ended up inviting another 3/50 owner to my home with his 
disk and booting from that one.

-since then I've moved experiments to diskless clients :-)

-------------------------------------------------------------------------------

From: Tatjana Heuser 

Being responsible for a small network where every single user had the root
passwd and mucked around with things (me being the lowliest person there
and not allowed to change this then) I started putting all important
configuration files under SCCS control.  Of course I did this on the main
server, leaving instructions to all the other would-be administrators how
to use this.  Everything went fine until all the machines were taken down
during x-mas vacation (no reboot of the server for quite some time).

Well, the first working day in January I got a phone call at the
place I spent that time.  Missing /etc/rc* the server would drop a
desperate shell at a rather helpless state of things. :-}  At my last
change of the rc's I obviously had checked them in with the 'delta'
command only :( having the original files deleted (or rather stored in
the SCCS directory) :-}

I had to return the 800 km to work a week earlier than planned.
(and learned a lot about startup :)

No mistake any user ever made as root has ever outscored this one...
(oh yeah, extending the swap partition over the next one (almost one GB
without backup, but that was the boss of the department...)

-------------------------------------------------------------------------------
=============================================================================
Section 3: Dealing with /dev files...
=============================================================================

From: nickp@BNR.CA ("Nick  Pitfield", N.T.)

One of my colleagues had been itching to get into sys admin for some time,
so last week he was finally sent on a 5-day sys admin course run by HP in
Bracknell.

On the following Sunday, he decided to try out his new-found knowledge by
trying to connect and configure a DAT drive on one of our critical test
systems. He connected the cables up okay, and then created the device file
using 'mknod'.

Unfortunately, he gave the device file the same minor & major device numbers
as the root disk; so as soon as he tried to write to this newly installed
'DAT drive', the machine went tits up with a corrupt root disk....ho hum.

-----------------------------------------------------------------------------

From: philip@haas.berkeley.edu (Philip Enteles)
Organization: Haas School of Business, Berkeley

As a new system administrator of a Unix machine with limited space I
thought I was doing myself a favor by keeping things neat and clean.  One
day as I was 'cleaning up' I removed a file called 'bzero'.  Strange
things started to happen like vi didn't work then the complaints started
coming in.  Mail didn't work.  The compilers didn't work.  About this time
the REAL system administrator poked his head in and asked what I had
done.  Further examination showed that bzero is the zeroed memory without
which the OS had no operating space so anything using temporary memory
was non-functional.  The repair?  Well things are tough to do when most of
the utilities don't work.  Eventually the REAL system administrator took
the system to single user and rebuilt the system including full
restores from a tape system.  The Moral is don't be too anal about things
you don't understand.  Take the time to learn what those strange files are before
removing them and screwing yourself.

-----------------------------------------------------------------------------

From: broberts@waggen.twuug.com (Bill Roberts)
Organization: Brite Systems

My most interesting experience in this regard was when I deleted "/dev/null".  Of
course it was soon recreated as a "regular file", then permission problems
started to show up.

I was new at the game at the time and couldn't figure out what happened!
It looked good to me.  I didn't know about "special files" and "mknod" and
major and minor device codes.  A friend finally helped out and started
laughing and put me on the right track.  That one episode taught me a
lot about my system.
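
The difference between a real /dev/null and a regular file that merely has the same name is visible in the file's mode bits. As a rough sketch of how one might check, the helper below (describe_node and is_char_device are names invented for this example) reports whether a path is a character device, a block device, or something else, along with its major and minor numbers. The correct numbers for any given device differ between Unix variants, so none are assumed here.

```python
import os
import stat


def describe_node(path):
    """Return (kind, major, minor) for `path`.

    kind is "char" or "block" for device special files, and "other"
    (with None for both numbers) for regular files, directories, etc.
    Hypothetical helper for illustration.
    """
    st = os.stat(path)
    if stat.S_ISCHR(st.st_mode):
        return ("char", os.major(st.st_rdev), os.minor(st.st_rdev))
    if stat.S_ISBLK(st.st_mode):
        return ("block", os.major(st.st_rdev), os.minor(st.st_rdev))
    return ("other", None, None)


def is_char_device(path):
    """True if `path` is a character special file, as /dev/null should be."""
    return describe_node(path)[0] == "char"
```

On a healthy system is_char_device("/dev/null") should be true; once the node has been deleted and silently recreated by a shell redirection, it comes back as "other".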

-----------------------------------------------------------------------------

From: Frank T Lofaro 
Organization: Sophomore, Math/Computer Science, Carnegie Mellon, Pittsburgh, PA

    Well one time I was installing a minimal base system of Linux on a
friend's PC, so that we would have all the necessary utilities to bring
over the rest of the stuff.  His 3 1/2 inch disk was dead, so we had to
get the 5 1/4 inch version of the boot/root disk.  Too bad that version,
having to fit in 1.2M instead of 1.44, didn't have tar.  We could get a
version of tar, but it was in a tar file (nice chicken and egg
scenario).  I said, okay, since we don't have tar, we can't use that to
copy the files from floppy to the hard disk, I'll use cp instead (bad
move).  It actually seemed to work for a while, then the machine
rebooted!  I did it again, the same thing happened.  Then I realize cp
wouldn't work on device files! (this is what happens when you try to
install un*x at 3 AM).  It just read the contents of the device and made
a file containing such, which is undesirable in any event.  (when it
read /dev/port, the device file that references I/O ports, it must've
done something to reboot the machine, that was the file that was causing
the reboots).

    I finally got it working by having him get the tar archive of the
linux binaries (including the tar we needed), and untarring it on one of
the public decstations here, so we could ftp tar to his PC using his dos
tcp/ip stuff.  A funny aside was that it untarred into ~/bin, and
superseded all his normal commands.  We were wondering why everything
wouldn't run.  Luckily it wasn't too hard to fix after we realized what
happened.

-----------------------------------------------------------------------------

From: hirai@cc.swarthmore.edu (Eiji Hirai)
Organization: Information Services, Swarthmore College, Swarthmore, PA, USA

A consultant we had hired (and not a very good one) was installing Unix
on one of our workstations.  He was mucking with creating and deleting
/dev/tty* files and made /dev/tty a regular file.  Weird things started to
happen.  Commands would only print their output if you pressed return twice,
etc.  Fortunately, we solved the problem by re-mknod-ing /dev/tty.  However,
it took a while to realize what was causing this problem.

-----------------------------------------------------------------------------

From: lingnau@math.uni-frankfurt.de (Anselm Lingnau)
Organization: University of Frankfurt/Main, Dept. of Mathematics

broberts@waggen.twuug.com (Bill Roberts) writes:
[story about deleting /dev/null deleted.  -ed.]

Years ago when I was working in the Graphics Workshop at Edinburgh University,
we used to have a small UNIX machine for testing. The machine wasn't used too
much, so nobody bothered to set up user accounts, and so everybody was running
as root all the time. Now one of the chaps who used to come in was fond of
reading fortunes (/usr/games/fortune having been removed from the University's
real machines along with all the other games). Guess what happened when the
machine said

# fortune
fortune: write error on /dev/null --- please empty the bit bucket

Quite a lot of stuff wouldn't work after the chap was done with the machine
for the day. You bet we put up proper accounts after that!