Tuesday, January 24, 2012

Maintenance Best Practices for Adaptec RAID Solutions


RAID is the most common method of data protection, and most companies rely on the redundancy provided by the various RAID levels to protect them from disk drive failures. Protecting data with RAID has become increasingly challenging with the exponential increase in drive capacities and the increased use of less reliable drives. RAID cannot protect data against virus attack, human error, data deletion, or natural or unnatural disaster. Nor can RAID protect data beyond its advertised disk drive redundancy (for example, one drive failure for RAID-1, RAID-10, and RAID-5, and two drive failures for RAID-6). Adaptec Technical Support often sees cases where an array is left in a degraded state for an extended period and data loss occurs when a further drive finally fails. Even the best RAID controller cannot help in this situation. In addition to timely maintenance, periodic backup remains one of the most critical practices in data operations.

THE EFFECT OF MODERN LARGER DISK SIZES AND DRIVE QUALITY ISSUES ON RAID

Media defect rates and other measures of hard drive quality have steadily improved over time, even as drive sizes have grown substantially. However, hard drives are not expected to be totally free of flaws. In addition, normal wear on a drive may result in an increase in media defects, or “grown defects,” over time. The data block containing the defect becomes unusable and must be “remapped” to another location on the drive. If a bad block is encountered during a normal write operation, the controller marks that block as bad and the block is added to the “grown defects list” in the drive’s NVRAM. That write operation is not complete until the data is properly written in a remapped location. When a bad block is encountered during a normal read operation, the controller reconstructs the missing data from parity operations and remaps the data to the new location.
A condition known as a double fault (“bad stripe”) occurs when a RAID controller encounters a bad block on a drive in a RAID volume and then encounters an additional bad block on another hard drive in the same data stripe. This double fault scenario can also occur while rebuilding a degraded array, leaving the controller with insufficient parity information to reconstruct the data stripe. The end result is a rebuild failure with the loss of any data in that stripe, assuming the stripe is in the user data area.
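To make the parity reconstruction and the double fault concrete, here is a minimal Python sketch. It assumes simple XOR parity across one stripe (as in RAID-5); the function names and the sample bytes are illustrative only and are not taken from any Adaptec firmware or tool.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(stripe: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild the single unreadable block (None) in a stripe.

    The stripe holds the data blocks plus one parity block. With exactly one
    block missing, the XOR of the survivors yields the missing block. With two
    or more missing, single parity is not enough: a double fault.
    """
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError("double fault: stripe cannot be reconstructed")
    rebuilt = list(stripe)
    if missing:
        survivors = [block for block in stripe if block is not None]
        rebuilt[missing[0]] = xor_blocks(survivors)
    return rebuilt


# Illustrative stripe: three data blocks plus their parity block.
data = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]
healthy = data + [xor_blocks(data)]

# One bad block: it can be recovered from the other drives.
recovered = reconstruct([healthy[0], None, healthy[2], healthy[3]])

# Two bad blocks in the same stripe: recovery is impossible.
try:
    reconstruct([None, None, healthy[2], healthy[3]])
    double_fault = False
except ValueError:
    double_fault = True
```

The same arithmetic explains why a rebuild is fragile: during a rebuild one block of every stripe is already missing, so a single further bad block in any stripe is enough to trigger the double fault.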
As hard drive capacities have increased remarkably, so has the likelihood that one or more media defects will occur over the lifespan of a drive. In addition, large arrays take longer to rebuild than small arrays, which increases the amount of time the array is not redundant.
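A rough back-of-the-envelope sketch shows how both effects scale with capacity. Every number here is an assumption chosen for illustration: the 50 MB/s rebuild rate, the 1-in-10^14-bits unrecoverable read error rate (a figure commonly quoted on drive datasheets), and the four-drive RAID-5 layout. None of them comes from Adaptec.

```python
import math


def rebuild_hours(drive_bytes: float, rebuild_bytes_per_sec: float) -> float:
    """Hours needed to rewrite one full drive at a constant rebuild rate."""
    return drive_bytes / rebuild_bytes_per_sec / 3600.0


def prob_unreadable_sector(bytes_read: float, errors_per_bit: float) -> float:
    """Probability of at least one unrecoverable read error while reading bytes_read.

    Computes 1 - (1 - errors_per_bit) ** bits in a numerically stable way.
    """
    bits = bytes_read * 8
    return -math.expm1(bits * math.log1p(-errors_per_bit))


REBUILD_RATE = 50e6       # assumed sustained rebuild rate, bytes per second
ERRORS_PER_BIT = 1e-14    # assumed datasheet-style unrecoverable read error rate
SURVIVING_DRIVES = 3      # assumed four-drive RAID-5 with one failed drive

for drive_bytes in (500e9, 3e12):
    hours = rebuild_hours(drive_bytes, REBUILD_RATE)
    risk = prob_unreadable_sector(SURVIVING_DRIVES * drive_bytes, ERRORS_PER_BIT)
    print(f"{drive_bytes / 1e12:.1f} TB drives: {hours:.1f} h rebuild, "
          f"{risk:.0%} chance of a read error during the rebuild")
```

Under these assumptions, moving from 500 GB to 3 TB drives lengthens the non-redundant window sixfold and sharply raises the chance of hitting a bad block before the rebuild completes.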

OVERVIEW OF STEPS THAT CAN BE TAKEN IN KEEPING WITH RAID BEST PRACTICES

Perform all recommended driver, controller firmware, and Storage Management application (Adaptec Storage Manager) updates.

Install Adaptec Storage Manager:
Adaptec Storage Manager helps you to monitor and maintain Adaptec RAID controllers, enclosures, and disk drives in your storage space from a single location. When Adaptec Storage Manager is installed on a system, the Adaptec Storage Manager Agent is also installed automatically as a service. It’s designed to run in the background, without user intervention, and its job is to monitor and manage system health, event notifications, task schedules, and other ongoing processes on that system. It sends notices when tasks are completed successfully, and sounds an alarm when errors or failures occur on that system.

Run regular consistency checks on the system: 
Verification is designed to proactively detect hard disk media defects while the array is online and redundant. A RAID-5 or RAID-6 array is inconsistent when the data and parity do not match. Likewise, a RAID-1 array is inconsistent when the data and mirror do not match.
The verification process issues commands to each drive in the array to test all sectors. When a bad sector is found, the RAID controller instructs the hard drive to reassign the bad sector, and then reconstructs the data using the other drives. The affected hard drive then writes data to the newly assigned good sector. These operations continue so that all sectors of each configured drive are checked, including hot spares. As a result, bad sectors can be remapped before data loss occurs.
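The scan described above can be sketched in a few lines of Python. This is a simplified model, not Adaptec's implementation: each drive is a list of integer "sectors", `None` stands for an unreadable sector, parity is plain XOR stored on the last drive (a simplifying assumption), and an inconsistent stripe is fixed by recomputing parity from the data (also an assumption about what the fix does).

```python
from functools import reduce
from typing import Dict, List, Optional

Drive = List[Optional[int]]  # one int per sector; None marks an unreadable sector


def xor_all(values) -> int:
    """XOR a sequence of integers together."""
    return reduce(lambda a, b: a ^ b, values)


def verify_with_fix(drives: List[Drive]) -> Dict[str, int]:
    """Scan every stripe of a single-parity array and repair what can be repaired."""
    report = {"remapped": 0, "parity_fixed": 0, "unrecoverable": 0}
    parity_drive = len(drives) - 1  # assumption: parity lives on the last drive
    for sector in range(len(drives[0])):
        bad = [d for d, drive in enumerate(drives) if drive[sector] is None]
        if len(bad) > 1:
            # Double fault: nothing left to reconstruct from.
            report["unrecoverable"] += 1
        elif len(bad) == 1:
            # Rebuild the bad sector from the other drives and write it back
            # (the write stands in for the drive reassigning the sector).
            others = [drive[sector] for d, drive in enumerate(drives) if d != bad[0]]
            drives[bad[0]][sector] = xor_all(others)
            report["remapped"] += 1
        elif xor_all(drive[sector] for drive in drives) != 0:
            # All sectors readable but data and parity disagree: recompute parity.
            data = [drive[sector] for drive in drives[:parity_drive]]
            drives[parity_drive][sector] = xor_all(data)
            report["parity_fixed"] += 1
    return report


# Two data drives plus parity; one unreadable sector and one stale parity value.
array = [[1, 2, 3], [4, None, 6], [5, 7, 9]]
result = verify_with_fix(array)
```

Because the scan runs while the array is still redundant, the single bad sector is repaired long before a second fault in the same stripe could make it unrecoverable.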

Two run modes are available to enhance flexibility and data protection:
1. Background Consistency Check (auto mode): In this mode, the tool is always on. Adaptec Storage Manager continually and automatically checks your logical drives once they’re in use. Once Background Consistency Check has checked all sectors of the array, it repeats this check indefinitely. As its name indicates, Background Consistency Check is always a background or secondary process. Data I/O remains the highest priority for the RAID subsystem.
Note: With this feature enabled, there may be an impact on performance.

To enable Background Consistency Check using Adaptec Storage Manager:
• In the Enterprise View, right-click the controller.
• Select Background Consistency Check and then click Enable.

Once enabled, the Background Consistency Check period can be adjusted: 
• In the Enterprise View, right-click the controller.
• Click on Background Consistency Check, then select Change period. The Change Background Consistency Check period window opens.
• Adjust the slider control from Very Slow (365 days) to Fast (10 days). Alternatively, in the New Period field, use the arrow keys to increase or decrease the setting.
• Click OK.
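When choosing a period, it can help to estimate how much background read traffic a given setting implies. The sketch below assumes the period is the time allowed for one full pass over each drive and that the scan is spread evenly across it; both are assumptions, as is the 2 TB drive size.

```python
def required_scan_rate_mb_s(drive_bytes: float, period_days: float) -> float:
    """Average per-drive read rate (MB/s) needed to scan a drive once per period."""
    if period_days <= 0:
        raise ValueError("period_days must be positive")
    return drive_bytes / (period_days * 86400) / 1e6


DRIVE_BYTES = 2e12  # assumed 2 TB drive

for days in (10, 30, 365):
    rate = required_scan_rate_mb_s(DRIVE_BYTES, days)
    print(f"{days:>3} days: about {rate:.2f} MB/s of background reads per drive")
```

A shorter period finds defects sooner but competes more with data I/O, which is the trade-off the slider exposes.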

2. Verify with fix (manual mode): This mode is used to perform a single, quick check of the array. After the verification process has checked all sectors of the array, it stops and will not start again until started manually by the administrator. In manual mode, the verification process commands are given a higher priority than in auto mode so that the check completes significantly faster.
Verify with fix is a data-level check and requires more controller resources to read and compare data. Also, because of the additional resources required, verify with fix is not designed to run continuously. Rather, it should be scheduled to run at a regular interval, preferably during periods of low drive activity, or during system maintenance.

To verify and fix a logical drive using Adaptec Storage Manager:
• In the Logical Devices View, right-click the logical drive.
• Select Verify with fix and confirm that you want to verify.
• To begin the verification immediately, click Yes. To schedule the verification, click Schedule, and then set the date and time. You can also choose to set the verification as a recurring task.
While the verification is in progress, the logical drive is shown as an animated icon to indicate that the task is in progress. When the verification is complete, an event notice is generated in the local system’s event log.

Monitor Storage Manager Event Logs:
You can see status information and messages about the activity (or events) occurring on your storage space by checking component properties and looking at the Event Viewer and status icons in Adaptec Storage Manager. To open a full-screen version of the event log, click the Events button in the tool bar. The event log lists activity occurring in your storage space, with the most recent event listed at the top. Double-click any event to open the Configuration Event Detail window to see more information in an easier-to-read format.
Adaptec Storage Manager can be configured to send email messages (or notifications) about events on a system in your storage space. We recommend doing this if your storage space is not managed by a dedicated person, or if that particular system is off-site or not connected to a monitor. Email notifications can help you monitor activity on your entire storage space from any location, and are especially useful in storage spaces that include multiple systems running the Adaptec Storage Manager Agent only.
To set up email notifications:
1. In the Configure menu (on the tool bar), select the system you want, and then select Email Notifications.
2. The Email Notifications window opens. The SMTP Server Settings window opens if you haven’t set up email notifications previously.
3. Enter the address of your SMTP server and the “From” address to appear in email notifications. If an email recipient will be replying to email notifications, be sure that the “From” address belongs to a system that is actively monitored.
4. Click OK to save the settings.
5. In the Email Notifications window tool bar, click Add email recipient. The Add Email Recipient window opens.
6. Enter the recipient’s email address, select the level of events for which the recipient will receive an email, and then click Add. Repeat this step to add more email recipients. Click Cancel to close the window.
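The settings collected in these steps (SMTP server, “From” address, recipients, and an event level per recipient) map onto a very small amount of logic. The Python sketch below is only an illustration of that idea using the standard library; it is not how Adaptec Storage Manager is implemented. The level names, their ordering, the "this level and above" rule, and all addresses and host names are assumptions.

```python
import smtplib
from email.message import EmailMessage
from typing import Dict, List

# Hypothetical severity ordering; the actual level names in the tool may differ.
LEVELS = {"informational": 0, "warning": 1, "error": 2}


def recipients_for(event_level: str, subscriptions: Dict[str, str]) -> List[str]:
    """Return recipients whose chosen minimum level is at or below the event's level."""
    severity = LEVELS[event_level]
    return sorted(
        address
        for address, minimum in subscriptions.items()
        if LEVELS[minimum] <= severity
    )


def build_notification(sender: str, recipients: List[str], system: str,
                       event_level: str, text: str) -> EmailMessage:
    """Build (but do not send) a notification email for one event."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = ", ".join(recipients)
    message["Subject"] = f"[{event_level.upper()}] RAID event on {system}"
    message.set_content(text)
    return message


def send_notification(smtp_host: str, message: EmailMessage) -> None:
    """Hand the message to the SMTP server entered in the settings."""
    with smtplib.SMTP(smtp_host) as server:
        server.send_message(message)


# Illustrative configuration; every address here is a placeholder.
subscriptions = {"ops@example.com": "informational", "oncall@example.com": "error"}
to_notify = recipients_for("error", subscriptions)
notification = build_notification(
    "raid-monitor@example.com", to_notify, "fileserver01", "error",
    "Logical drive 0 is degraded: a member drive has failed.",
)
# send_notification("smtp.example.com", notification)  # requires a reachable SMTP server
```

Keeping message construction separate from sending makes it easy to check who would be notified for a given event level before any mail leaves the system.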

You can also set Adaptec Storage Manager to send status alerts about a specified system to all users who are logged into your storage space. When you set Adaptec Storage Manager to broadcast event alerts, all logged-in users receive messages about all types of events. In Windows, these alerts appear as pop-up messages; in all other operating systems, these alerts appear as console messages.

Note: Immediately replace drives that have either failed completely or are starting to show signs of failing (medium errors, S.M.A.R.T. errors, etc.).
