I recently had a drive fail on a Rocketraid 2640 2 channel Raid 1 card. The process of replacing the failed hard drive was a bit more involved than I thought it would be.
If you work with a lot of data, you will eventually find yourself using some form of Raid solution to protect your data from loss. There are many implementations of raid (redundant array of inexpensive disks), but the most common are Raid 0, 1, and 5. Each has its own advantages and disadvantages with regard to speed, protection and implementation. Here is a brief overview of the 3.
- Raid 0 writes data across multiple discs (called striping) but does not write any redundant copies. So for example, if you have a 2 disc Raid 0 array, your data will be spread over both discs. However, there will be no redundancy, so if you lose 1 of the 2 discs, you have lost everything. The only real advantage to Raid 0 is that it’s very fast: spreading your data over multiple discs decreases your overall write times.
- Raid 1 (also called mirroring) requires at least 2 discs. The raid implementation writes each record to both discs at the same time. The theory is that both discs won’t fail at once, and if one does, you can continue to work in a degraded state until you replace the failed disc. The usable capacity is n/2, n being the total amount of storage. So if you have (2) 2 terabyte hard drives in a Raid 1 array (4 TB in total), you will have 2 TB of usable space.
- Raid 5 allows you to have multiple discs in an array and gives you better utilization of storage with an n-1 capacity solution, n being the total number of drives. So if you have (4) 1 terabyte hard drives in a Raid 5 array, your available storage will be the total (4 TB) minus one drive (1 TB), for 3 TB of usable storage. Raid 5 can have 1 drive in the array fail and then run in a degraded state. After you replace the damaged hard drive, your array will run slower until the data has been rebuilt onto the new drive. Raid 5 tends to have more overhead and slower overall performance than a Raid 1 solution.
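The capacity rules in the overview above can be written out as a short Python sketch. This is only an illustration of the arithmetic: the function name `usable_capacity_tb` is made up for this example, it assumes every drive in the array is the same size, and it treats Raid 1 as a plain mirror where every disc holds the same data (which matches the n/2 rule for a 2 disc array).

```python
def usable_capacity_tb(level: int, drive_count: int, drive_size_tb: float) -> float:
    """Return usable capacity in TB for an array of identical drives.

    Hypothetical helper for illustration only; it is not part of any raid tool.
    """
    if level == 0:
        minimum = 2
        usable_drives = drive_count       # striping: every disc holds unique data
    elif level == 1:
        minimum = 2
        usable_drives = 1                 # mirroring: every disc holds the same data
    elif level == 5:
        minimum = 3
        usable_drives = drive_count - 1   # one drive's worth of space goes to parity
    else:
        raise ValueError(f"unsupported raid level: {level}")

    if drive_count < minimum:
        raise ValueError(f"raid {level} needs at least {minimum} drives")
    return usable_drives * drive_size_tb


if __name__ == "__main__":
    print("Raid 0, 2 x 1 TB:", usable_capacity_tb(0, 2, 1.0), "TB")
    print("Raid 1, 2 x 2 TB:", usable_capacity_tb(1, 2, 2.0), "TB")
    print("Raid 5, 4 x 1 TB:", usable_capacity_tb(5, 4, 1.0), "TB")
```

The Raid 1 and Raid 5 calls use the same drive counts and sizes as the examples in the list.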
Only Raid 1 and 5 offer any data protection. When using a PC or Mac, you have many ways to implement a raid solution.
- Raid implemented from the system board (most times a raid 1 solution)
- Raid implemented from a raid card installed in the PC or Mac. This offloads the processing of the raid array from the system board and is much easier to work with.
- Raid implemented from an external device like a Drobo. The Drobo can allow the user to pick the level of raid, and all of the processing is handled by the Drobo. The Drobo is attached to your PC or Mac via FireWire, USB3 or network.
In my setup I use both internal and external raid arrays. Over the years, I have started using the Rocketraid brand of card for my internal raid arrays. Rocketraid cards are inexpensive, work in all PCs and have a very good track record on both performance and reliability. The Rocketraid card I use is the 2640X8 card, which plugs into a slot on the system board. I tend to use the Rocketraid cards that have the 2 channel implementation in a Raid 1 array. The cards are made in China or Taiwan, and they really don’t have a very good tech support setup. They list a phone number for tech support which is a California number, but when you call it, you won’t ever get an answer. They expect you to open your issue over the web, which requires the serial number of the raid card. Once a card is installed, the only way to read the serial number is to physically remove the card, something I prefer not to do, especially if the card has been installed for more than 6 months.
The bios on a Rocketraid card checks the array each time you boot. If there are any problems, the system will stop the boot and bring up the Rocketraid bios screen. Most times you will get a message that your array is “critical”. Critical means that a drive or channel on the card has failed or is failing. Modern hard drives like the Western Digital lineup allow raid cards to query the hard drive to see if any errors are starting to show up. The raid card has a certain tolerance level, and once this level has been exceeded, the array will go “critical”. NOTE: you may get this on one boot and then have the machine boot up clean the next time, but it still means you need to watch your array, since once a drive starts to show errors, it will eventually fail.
The Rocketraid 2 channel cards in a raid 1 array don’t allow you to have a “hot spare” since they only have 2 channels and both have to be active.
Recently I booted up my main production machine and found my Raid 1 array was critical. I knew I had a hard drive that had either failed or was starting to throw enough error codes that the card was hitting its tolerance level. I opened the web based utility that allows you to view the array and get its status, and found that one of the 2 drives was “critical”. The software tool shows which hard drive needs to be replaced by serial number, so I knew which drive to pull. However, getting the array back online was a bit more trouble. As I already mentioned, Rocketraid has no tech support by phone (at least not a realistic one). The manual that ships with the card does not cover drive replacement after a failure, which surprised me.
My first step was to power down and pull the bad hard drive. I installed a new 2 TB drive and powered back up into the raid bios utility. Here is where things got a bit hard to understand. What needs to happen is that the array needs to see the new hard drive as an available spare; then it should start to rebuild the array over both drives. From the bios, I could not get the hard drive to a spare status. I could see my other original drive in a “configured” status and knew I did not want to do anything to affect it. The new drive showed in the bios as “new”. Here are the steps you need to perform to get the new drive identified as a spare.
- Select the new drive (listed as new) and initialize it. Make sure you only initialize the new drive: if you initialize the “configured” drive, you will lose your array and all the data. Initialization will only take a few seconds, and the drive will change from “new” to “initialized”.
- There are a series of tabs across the top of the bios screen, find the one assigned to “spare” and open it.
- Select the initialized drive and it will now say initialized spare.
- Close out of the bios utility, making sure to save your changes, and boot back into Windows.
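The order of the steps above matters, so here is a small Python model of the status changes the new drive goes through. To be clear, this is not the Rocketraid bios or any real API: the `Drive` class, the function names and the serial numbers are all invented for illustration. Only the status labels come from what I saw on screen. The point of the sketch is the safety check: initialization is only ever allowed on a drive listed as “new”.

```python
from dataclasses import dataclass

# Status labels as they appeared on the bios screen.
NEW = "new"
INITIALIZED = "initialized"
SPARE = "initialized spare"
CONFIGURED = "configured"


@dataclass
class Drive:
    """Hypothetical stand-in for one drive entry on the bios screen."""
    serial: str
    status: str


def initialize(drive: Drive) -> None:
    """Step 1: initialize the drive, but only if it is listed as new."""
    if drive.status != NEW:
        # Initializing the configured drive would destroy the array.
        raise ValueError(
            f"refusing to initialize drive {drive.serial} with status {drive.status!r}"
        )
    drive.status = INITIALIZED


def mark_as_spare(drive: Drive) -> None:
    """Steps 2 and 3: assign the initialized drive as a spare."""
    if drive.status != INITIALIZED:
        raise ValueError(f"drive {drive.serial} must be initialized first")
    drive.status = SPARE


if __name__ == "__main__":
    replacement = Drive(serial="EXAMPLE-NEW", status=NEW)  # made-up serial
    initialize(replacement)
    mark_as_spare(replacement)
    print(replacement.serial, "is now:", replacement.status)
```

Calling `initialize` on a drive whose status is “configured” raises an error instead of changing anything, which is the behavior you want to hold yourself to when working in the bios by hand.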
On boot, the raid utility should go through its normal process. When you get back into Windows (in my case Windows 7 64 bit), open the raid utility program. Under the main tab you should see the message “rebuilding” with a status number beside it. My particular array was 1.3 TB in size, and it took about 11 hours to rebuild. While the array is rebuilding I would not try to access the data on it, but you can work on other files/data on the PC. After the rebuild is done, you will get a message saying that the process is at 100%. I rebooted, and the machine came back up fine. I noticed that the access speeds on the array appeared to be faster, so it was apparent that I had been having problems with my hard drive for a while. It’s always a good idea to back up your array before you attempt to replace the “critical” hard drive.
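If you want a rough idea of how long your own rebuild might take, you can work backwards from my numbers. The Python sketch below is just arithmetic: the function names are made up, it assumes decimal units (1 TB = 1,000,000 MB), and it assumes the rebuild runs at a steady rate, which a real rebuild may not do.

```python
def rebuild_rate_mb_per_s(array_size_tb: float, hours: float) -> float:
    """Average rebuild rate in MB/s, assuming 1 TB = 1,000,000 MB."""
    return array_size_tb * 1_000_000 / (hours * 3600)


def estimated_rebuild_hours(array_size_tb: float, rate_mb_per_s: float) -> float:
    """Hours needed to rebuild an array of the given size at a steady rate."""
    return array_size_tb * 1_000_000 / rate_mb_per_s / 3600


if __name__ == "__main__":
    # My rebuild: a 1.3 TB array in about 11 hours.
    rate = rebuild_rate_mb_per_s(1.3, 11)
    print(f"Average rebuild rate: {rate:.1f} MB/s")

    # Rough estimate for a full 2 TB array at the same rate.
    print(f"Estimated time for 2 TB: {estimated_rebuild_hours(2.0, rate):.1f} hours")
```

My rebuild works out to roughly 33 MB/s, so a full 2 TB array at the same rate would take around 17 hours. Treat that as a ballpark only, since the rate will depend on your drives, your card and what else the PC is doing.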
Overall I was very impressed with how this entire process worked. I only wish the process had been better documented.