Crappy problem with FreeBSD / RAID5

Discussion in 'OT Technology' started by KingNothing, Jan 7, 2004.

  1. KingNothing

    KingNothing New Member

    Joined:
    Mar 4, 2002
    Messages:
    11,729
    Likes Received:
    0
    Location:
    Atlanta & Auburn
    For the past month I've been running three 200GB drives in RAID 5 on FreeBSD 4.8. This morning I was reading and writing to the array via Samba when the box crashed due to disk I/O errors. It rebooted on its own, and when everything came back up vinum reported one of the drives as down, the plex as degraded, and one of the subdisks as R 0% (recovering). The recovery failed.

    All of the physical connections to and from the hard drives are good, so it appears as though one of my new Maxtors has died.

    I tried running fsck on the degraded plex, which I should be able to do from what I know about RAID 5, and I got a couple hundred bad or dupe block errors. After that it asked me if I wanted to start deleting about a hundred or so files that are bad (but definitely were not being written to at the time). I said no to deleting those files. When that finished, it told me I need to run fsck again.

    I've run it a total of three times so far with the same results each time.

    I should also mention that something similar happened a few days ago. I had a few people reading from the array (no writing) when I received a kernel dump and had to reboot. When it came up I fscked the drive with no errors and that was that. This leads me to believe that the one drive has been dying for the past few days and finally bit the dust this morning.

    What's next? Did RAID 5 completely fail at what it's supposed to do (provide redundancy)?
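
    For reference, the redundancy RAID 5 is supposed to provide comes from XOR parity: any one missing block in a stripe can be recomputed from the others. Below is a minimal sketch of that idea, not vinum's actual code; the block contents and the `xor_blocks` helper are made up for illustration.

```python
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe across three drives: two data blocks plus one parity block.
# (Illustrative contents only.)
data_a = b"movie-01"
data_c = b"movie-02"
parity = xor_blocks(data_a, data_c)

# Pretend the drive holding data_c is gone; rebuild it from the survivors.
rebuilt = xor_blocks(data_a, parity)
print(rebuilt == data_c)
```

    This only works while the surviving blocks and the parity still agree with each other, which is why a degraded array is fragile.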

    Here's what vinum currently reports, if it's of any use to anyone who can help:

    Code:
    2 drives:
    D a                     State: up       Device /dev/ad4s1e      Avail: 0/194474 MB (0%)
    D c                     State: up       Device /dev/ad7s1e      Avail: 0/194474 MB (0%)
    D b                     State: referenced       Device  Avail: 0/0 MB
    
    1 volumes:
    V storage               State: up       Plexes:       1 Size:        379 GB
    
    1 plexes:
    P storage.p0         R5 State: degraded Subdisks:     3 Size:        379 GB
    
    3 subdisks:
    S storage.p0.s0         State: up       PO:        0  B Size:        189 GB
    S storage.p0.s1         State: obsolete PO:      512 kB Size:        189 GB
    S storage.p0.s2         State: up       PO:     1024 kB Size:        189 GB
    
    I'm not sure why it says available is 0 / 200GB, but it has always said that, so I assume it's an error with vinum, or perhaps just how it's designed to run in RAID 5.
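
    A rough sketch of the arithmetic behind those numbers, under two assumptions that are not confirmed anywhere in this thread: that "Avail" means space on the drive not yet allocated to any subdisk, and that each subdisk takes up its whole 194474 MB drive. The function names are hypothetical.

```python
def raid5_usable_mb(subdisk_mb: int, drives: int) -> int:
    """Usable capacity of a RAID 5 plex: one subdisk's worth goes to parity."""
    if drives < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return subdisk_mb * (drives - 1)

def drive_avail_mb(drive_mb: int, allocated_mb: int) -> int:
    """Space on a drive not yet handed out to any subdisk (assumed meaning of Avail)."""
    return drive_mb - allocated_mb

drive_mb = 194474  # per-drive size taken from the vinum listing
usable = raid5_usable_mb(drive_mb, drives=3)
print(usable, "MB, about", usable // 1024, "GB")
print(drive_avail_mb(drive_mb, allocated_mb=drive_mb), "MB free")
```

    If those assumptions hold, both the 379 GB volume size and the 0 MB available figure are what you'd expect rather than an error.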
     
  2. diranged

    diranged New Member

    Joined:
    Nov 30, 2003
    Messages:
    2,399
    Likes Received:
    0
    You're using vinum for your RAID, right? That is a purely software-based RAID solution. Honestly, if you have multiple people using this thing you should be using a real RAID card that does it all without the OS knowing. They are not expensive anymore... I don't really know what to say. The last time I used software RAID I had a similar situation: one drive died, and I lost all data. Made me sad :(
     
  3. col_panic

    col_panic calm like a bomb Moderator

    Joined:
    Sep 19, 2003
    Messages:
    188,160
    Likes Received:
    0
    Location:
    winter haven, fl
    Completely agree. Hardware RAID is the only way to go.
     
  4. Rob

    Rob OT Supporter

    Joined:
    Jul 6, 2002
    Messages:
    88,625
    Likes Received:
    40
    Location:
    Atlanta, GA
    Once a RAID 5 array fails you should not touch the remaining disks until you can restore the array. By changing the drives you will make all the parity information old and therefore ruin any chance of recovery.
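
    To illustrate the point about parity going out of date, here is a toy sketch (not how vinum is implemented; the disk names and contents are invented): once the surviving disks are written to, the contents of the disk that dropped out no longer match the stripe.

```python
def xor(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

# Healthy stripe: two data blocks and their parity.
disk_a, disk_b = b"AAAA", b"BBBB"
disk_c = xor(disk_a, disk_b)  # parity block

# disk_b drops out. A later write in degraded mode changes disk_a and
# updates parity so that the *missing* block should now read b"bbbb".
new_b = b"bbbb"
disk_a = b"aaaa"
disk_c = xor(disk_a, new_b)

# Treating the old disk_b as current gives a stripe whose parity no longer matches.
stale_matches = xor(disk_a, disk_b) == disk_c
# Rebuilding from the two survivors still yields the up-to-date block.
rebuilt_ok = xor(disk_a, disk_c) == new_b
print(stale_matches, rebuilt_ok)
```

    So the disk that dropped out has to be rebuilt from the survivors, not trusted as-is, and anything that scribbles on the survivors in the meantime (an fsck that deletes files, for example) changes what can be rebuilt.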

    I am not too familiar with software RAID on BSD, but is there a way that you can try to force a rebuild of the array?
     
  5. KingNothing

    KingNothing New Member

    Joined:
    Mar 4, 2002
    Messages:
    11,729
    Likes Received:
    0
    Location:
    Atlanta & Auburn
    Well, it turns out the drive "died" because of the Molex power connector attached to it. I've since rebuilt everything on that particular drive, but I think my fucking around with the array in the meantime might have caused some problems. Hopefully I can find a way to reverse whatever I've done and not see complete data loss, but it isn't the end of the world if I do, as I can recover most of my data from people I know.
     
  6. KingNothing

    KingNothing New Member

    Joined:
    Mar 4, 2002
    Messages:
    11,729
    Likes Received:
    0
    Location:
    Atlanta & Auburn
    If I can't recover my data, I may buy a hardware RAID 5 card. I don't regularly have more than two people using the server, but I took it over to a friend's house so he could rip my movies. :sad2:
     
