Worst IT screw up for you...

Discussion in 'OT Technology' started by trouphaz, May 18, 2008.

  1. trouphaz

    trouphaz New Member

    Joined:
    Sep 22, 2003
    Messages:
    2,666
    Likes Received:
    0
    What is your biggest screwup in your IT career? I'd like to hear about accidentally typing "rm -r *" as root and later realizing you were in / instead of the right directory.


    host A held the primary copy of a 4TB database for host B (which normally ran in parallel and would have mirrors synced periodically) and host C which was for backups. database on host A was screwed and image on backup was worthless. i accidentally synced some of A's disks over B and some of B's disks over A, thus making all data on both servers worthless. had to restore from tape to C, sync back C over A and then sync back A over B for full copies across the board. we had about a 16 hour outage while we made all kinds of changes to our production SAN on the fly.
     
  2. tyrionlannister

    tyrionlannister New Member

    Joined:
    Jun 13, 2006
    Messages:
    710
    Likes Received:
    0
    Location:
    New York
    Still work there?
     
  3. Coottie

    Coottie BOOMER......SOONER OT Supporter

    Joined:
    Jun 6, 2006
    Messages:
    32,407
    Likes Received:
    0
    Location:
    OKC
    Worst was on my home system. Backed up Quicken to floppy, then wiped the HD. Backup was corrupt and I lost about 10 years' worth of accounting data. :(

    The thing that still gets me, before I wiped the HD I could have easily made multiple backups to other computers' HDs, CDs, DVDs, all that shit. Not just backups but I could have copied the live file. But I thought no, one backup should be fine....no reason to be paranoid.
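
The "one backup should be fine" trap above can be guarded against with a copy-then-verify step before anything gets wiped. What follows is only a minimal sketch in Python: `backup_and_verify` and `sha256_of` are hypothetical helper names, not part of Quicken or any backup tool, and a matching checksum only shows the copy reads back identical to the source. It does not prove the application can restore from it, and a read-back served from the OS cache may not exercise flaky media.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source: Path, destinations: list[Path]) -> list[Path]:
    """Copy `source` into each destination directory.

    Returns only the copies whose checksum matches the original.
    Hypothetical helper for illustration, not a real backup product.
    """
    expected = sha256_of(source)
    verified = []
    for directory in destinations:
        directory.mkdir(parents=True, exist_ok=True)
        copy = directory / source.name
        shutil.copy2(source, copy)  # copies data plus timestamps
        if sha256_of(copy) == expected:
            verified.append(copy)
    return verified
```

A cautious rule of thumb would be to wipe the original only once at least two verified copies exist on different media.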
     
  4. trouphaz

    trouphaz New Member

    Joined:
    Sep 22, 2003
    Messages:
    2,666
    Likes Received:
    0
    i don't now, but i did work there for about 4 more years after that incident. it was the first time we had ever attempted a restore and the SAN hadn't been in place all that long. so, luckily i had a very understanding manager and HP (who had done the initial install) had done enough wrong to take a lot of the heat off of me. :)
     
  5. Mike99TA

    Mike99TA I don't have anything clever to put here right now

    Joined:
    Oct 3, 2001
    Messages:
    4,553
    Likes Received:
    0
    Location:
    Greenville, SC
    I haven't actually screwed up anything nearly bad enough for them to consider any sort of disciplinary action - probably a lot of luck involved.

    The only thing jumping into my head right now is something that wasn't really my fault. I was a new Unix System Admin for a medium sized ISP (400k customers or so) and had to make a change to the front-end RADIUS servers. Well, when the change was done I went ahead and restarted all 9 of them in sequence, one after the other. Each one said stopping, then starting, successful, so I kept going.

    Well, after about 15 minutes my boss walks by and asks me and the other Unix Admin if anything was wrong. We said no, why? He said they were getting hundreds and hundreds of calls that no one could connect to the internet. Well guess what the other admin hadn't told me? Apparently the start scripts were broken and said successful whether they started successfully or not, and you had to tail the logfile afterward and make sure people were authenticating successfully. Caused about a 30 minute outage where no one could connect to the internet through our ISP.

    Anyway, like I said, it wasn't really my fault (I think I had been there a week). The only things I can think of other than that are accidentally rebooting systems (never a production system though, but I've accidentally rebooted a couple of Windows servers that were in use, because the last option selected was restart and I hit Enter without double-checking it was set to log off).

    So, nothing all that bad, like I said, lucky maybe :)
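
The RADIUS story above boils down to: restart one server, check real evidence that it is working, and only then move on to the next. A rough sketch of that loop in Python follows. Everything here is an assumption made for illustration: `rolling_restart`, `RestartFailed` and `log_shows_success` are made-up names, the `restart` and `is_healthy` callables are supplied by the caller, and the `"Access-Accept"` marker is just a guess at what a successful authentication might look like in a log (it is the RADIUS packet type for an accepted request, but real log formats vary).

```python
from typing import Callable, Iterable, List


class RestartFailed(RuntimeError):
    """Raised when a server does not pass its health check after a restart."""


def rolling_restart(
    servers: Iterable[str],
    restart: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> List[str]:
    """Restart servers one by one, stopping at the first that fails its check.

    `restart` and `is_healthy` are hypothetical callables. `is_healthy`
    should look at real evidence (for example, recent successful
    authentications in the log), not the start script's own message.
    """
    done: List[str] = []
    for server in servers:
        restart(server)
        if not is_healthy(server):
            raise RestartFailed(
                f"{server} did not come back; restarted so far: {done}"
            )
        done.append(server)
    return done


def log_shows_success(log_lines: Iterable[str], marker: str = "Access-Accept") -> bool:
    """Return True if any log line contains the (assumed) success marker."""
    return any(marker in line for line in log_lines)
```

Stopping at the first failure means a broken start script takes out one of nine front ends instead of all nine.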
     
  6. trouphaz

    trouphaz New Member

    Joined:
    Sep 22, 2003
    Messages:
    2,666
    Likes Received:
    0
    yeah, i recently had a problem sort of like that. at my new job, they have a pair of DNS servers to be the authoritative resolvers for a bunch of domains we have on the internet. they failed a security audit, so i was asked to get them up to spec. well, one was all locked down (no recursive queries, no domain transfers other than for the slaves that our ISP maintains... oh and no direct queries except for internal hosts and the ISP slaves) and the other was wide open, allowing full domain transfers and recursive queries.

    i made the mistake of assuming everything was fully tested, so i just made our own internal slave match the primary. found out 2 issues: 1) the primary didn't actually resolve anything to the internet, it basically just provided the maps to our ISP slaves (remember, no queries allowed) and 2) one of our domains wasn't set up for the ISP to be a slave. so, the only host that existed on the internet that would actually respond to queries was our own slave... that i had now locked down to not permit any queries.

    the next day we found that no emails going to that domain worked. luckily i had backed up the entire config, so i was able to back it out and then figure out what the last guy screwed up. it is all up with proper levels of security now and we fixed it so our ISP will actually act as a slave. good thing it wasn't our main domain that went down.
     
  7. makaze

    makaze New Member

    Joined:
    Jun 14, 2006
    Messages:
    15
    Likes Received:
    0
    Location:
    Baltimore, MD
    Worst I did was leave a console connection from a machine to the ALOM port on a Sun :)

    Couldn't figure out why it kept halting.. then one day I realized it only halted when I rebooted the other machine :big grin:
     
  8. deusexaethera

    deusexaethera OT Supporter

    Joined:
    Jan 27, 2005
    Messages:
    19,712
    Likes Received:
    0
    I'd just started working at my current company, and my boss was going out of town for a week. While he was gone, one of the things I needed to do was to wipe and rebuild three servers, two HP NetServers and a Dell PowerEdge. Well, I wiped all three HP NetServers and left the Dell PowerEdge alone, because he didn't give me any written notes and I didn't think to write any myself. Turns out the third HP NetServer was the domain controller for about 75% of the servers on the two racks.

    Yeah.

    Fortunately, those servers were all slated to get wiped anyway, but that was an uncomfortable meeting when I had to explain why all the equipment from the company we'd recently bought didn't work anymore. And my boss was cool enough to say "okay, next time I'll put Post-It notes on the right machines so there's no mistake." He was a good guy, sorry to see him go.
     
  9. deusexaethera

    deusexaethera OT Supporter

    Joined:
    Jan 27, 2005
    Messages:
    19,712
    Likes Received:
    0
    I have no idea what you just said (I'm not really an IT guy), but that definitely sounds like one of those "goddamnit, what the hell is goi...oh FUCK!" kinda problems.
     
    Last edited: May 19, 2008
  10. trouphaz

    trouphaz New Member

    Joined:
    Sep 22, 2003
    Messages:
    2,666
    Likes Received:
    0
    hehe, yeah, that is one of those things that always pissed me off about Sun consoles. i think there was an issue with connecting it to a PC serial port too. it has been so long that i don't remember the details, but something you would do on the PC would drop the Sun machine to an OK prompt.
     
  11. makaze

    makaze New Member

    Joined:
    Jun 14, 2006
    Messages:
    15
    Likes Received:
    0
    Location:
    Baltimore, MD
    Yep anytime a break was sent (happens each time the machine rebooted) :) I disabled that feature so it doesn't do it anymore, and since hooked it up to a terminal server that doesn't reboot.
     
  12. Peyomp

    Peyomp New Member

    Joined:
    Jan 11, 2002
    Messages:
    14,017
    Likes Received:
    0
    Ginormous SQL Server logging was on, and the server was under-specified. So we kept filling up the disk. There was a 5GB DB file that was 6 months old, so I deleted it. Whoops. There was no backup. That file was needed for billing, which they were many months behind on. Oh, and this was in a casino.

    :big grin:
     
  13. trouphaz

    trouphaz New Member

    Joined:
    Sep 22, 2003
    Messages:
    2,666
    Likes Received:
    0
    holy shit. did they take you out back and break your knees or put your head in a vice or anything like that?
     
  14. Peyomp

    Peyomp New Member

    Joined:
    Jan 11, 2002
    Messages:
    14,017
    Likes Received:
    0
    No. The system was on the fritz at least once an hour at that time. Everything was fucked. I was working 100 hour weeks out of state, traveling long distances by car, and it wasn't my fault they didn't back up or keep up with their billing.

    So, I didn't actually feel too bad. Ever since then, I am much more careful on production systems, though. I think through everything several times before executing a command.
     
  15. deusexaethera

    deusexaethera OT Supporter

    Joined:
    Jan 27, 2005
    Messages:
    19,712
    Likes Received:
    0
    Is that why they didn't want to buy your product, by any chance?
     
  16. Peyomp

    Peyomp New Member

    Joined:
    Jan 11, 2002
    Messages:
    14,017
    Likes Received:
    0
    Fuck no. My product worked. This was 5 years ago, for someone else. You think I would use SQL Server? :rofl:
     
  17. deusexaethera

    deusexaethera OT Supporter

    Joined:
    Jan 27, 2005
    Messages:
    19,712
    Likes Received:
    0
    How the hell would I know? It stands to reason you'd use the database preferred by the client.
     
  18. Peyomp

    Peyomp New Member

    Joined:
    Jan 11, 2002
    Messages:
    14,017
    Likes Received:
    0
    The databases used by the client varied, so it was easiest to just pull it all into MySQL and do the analysis there. Used a Linked Server in SQL Server to push data there. Although I would have ultimately ended up pulling data with Perl or Java or something.
     
  19. deusexaethera

    deusexaethera OT Supporter

    Joined:
    Jan 27, 2005
    Messages:
    19,712
    Likes Received:
    0
    Okay, I've got a new "worst fuckup ever". On Saturday I was in the server room, cutting the ends off old, loose, disconnected telecom cables left over from when my building was a call center. It was a god-awful mess, with patches on top of patches on top of patches. It looked like this:

    [image: photo of the wiring]

    (the blue and white cables are my work. Note how they're actually organized, wooo...)

    Anyway, somewhere in that hideous clusterfuck of all blue/blue-white pairs were eight conductors that were still in use -- they supplied the phone service to my office. Well, I cut six of them as they snaked up behind one of those big grey wrapped cables, leaving the office with one phone line. Thank god it was also the phone line the DSL came in on.

    So I come in this morning and the phones are dead. No idea why, though I figured I must've done something over the weekend to fuck them up. When I tugged on the 4-in-1 line that plugged into the PBX blade and saw the cut end pop out from behind the other cables I cut, I just stared for a second, then tried to run in about ten directions at once. I haven't felt my face get that pale since I got caught stealing in middle school.

    The telecom service guy gave me a look that said "nice work, Slick" when he saw what I'd done, but when he started to trace the wires back to where they plugged in, he discovered why I had so much trouble; the four lines had actually been inserted inside one of the big grey cables, through a slit in the casing. There was no way to avoid it. Then he spent a half-hour chasing from patch to patch to patch, figuring out where they actually hooked up to the main feed from the phone company.

    In the end, I got two lines up and working myself before he arrived, and the whole thing was taken care of by lunchtime -- not to mention, the tangled rat's nest of bare wires has been cleaned out and replaced by nice, easy patch cables.

    But damn, today was a very uncomfortable morning.
     
  20. Peyomp

    Peyomp New Member

    Joined:
    Jan 11, 2002
    Messages:
    14,017
    Likes Received:
    0
    Are you an electrician? You do mostly cables?
     
  21. Coottie

    Coottie BOOMER......SOONER OT Supporter

    Joined:
    Jun 6, 2006
    Messages:
    32,407
    Likes Received:
    0
    Location:
    OKC
    haha I thought of another fuck up.

    It was about 4 months ago I was working for a company that handled large text files and did a lot of matching between tables. Their main source of revenue was this batch file processing application that took customer data and matched it against public records and returned the matches. Customers would drop files at all hours of the day and night and our systems would simply open the files and process them and return them and we'd never have to touch the file.

    So we got this really large file in. We didn't want the customer to drop it and launch the job automatically because we were still configuring their job. So they FTPd it to another location and when we were ready, we dropped the file to launch the job.

    The file had some 12 million records in it and we were joining it to another table that had 20+ million records and another table that had something like 30 million records.

    Well with all the joins going on the transaction logs filled up all the available hard drive space on our SAN and the application, which runs 24/7, came crashing down.....oh and our DBA was unreachable (on vacation).

    Good thing I was new because I didn't get blamed for it. In fact, the person that got blamed for it was the network admin because we didn't have enough space in our SAN.....and he'd just upgraded it like 4 months earlier. lol
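
A pre-flight check on free disk space before dropping a huge file into an automated job is one cheap guard against the outage described above. The sketch below is Python and purely illustrative: `has_headroom` is a hypothetical helper, the 10% reserve is an arbitrary assumption, and the hard part (estimating how much transaction log a multi-table join will generate) is left to the caller.

```python
import shutil


def has_headroom(path: str, needed_bytes: int, reserve_fraction: float = 0.10) -> bool:
    """Return True if the filesystem holding `path` can absorb `needed_bytes`
    while still leaving `reserve_fraction` of its total capacity free.

    Hypothetical helper; `needed_bytes` is the caller's own estimate.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    reserve = int(usage.total * reserve_fraction)
    return usage.free - needed_bytes >= reserve
```

A job launcher could refuse to start (or page someone) when this returns False, rather than finding out when the 24/7 application falls over.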
     
  22. deusexaethera

    deusexaethera OT Supporter

    Joined:
    Jan 27, 2005
    Messages:
    19,712
    Likes Received:
    0
    No, I'm not an electrician, I've just been doing a lot of work involving wires lately. Before this, I was working on process flowcharts and GUI designs. After this, I'll be working on a user manual in three different languages (fortunately pre-translated). My company is pretty small, so as one of the managers describes it, "we wear a lot of hats".
     
  23. midcalbrew

    midcalbrew OT Supporter

    Joined:
    Jun 14, 2006
    Messages:
    2,376
    Likes Received:
    0
    Location:
    MidCal
    Was relatively new, was having some problem on our HP core routing switch. Turned debug all on ... switch locked, network went down, all on a busy afternoon during registration (work for a college). The switch evidently can't log debug info AND route IP.
     
