A power outage (are rolling blackouts back in California??) earlier in the morning, and the resulting mess. The new cluster (Windows 2003/Exchange 2003) with NetApp’s iSCSI filer came up happily. The older one (which uses NetApp’s older VLD protocol) failed.
Issue: the L: drive (Storage Group 1, Logs) would not connect or map, so the Exchange group in the cluster would not come online.
Disconnected drives, reconnected. Did not work.
NetApp tech support recommended upgrading to SnapDrive 2.1 (3.0 onwards supports only iSCSI and Fibre Channel, with no support for the older VLD protocol). That also required a post-SP3 hotfix (Q816990), available only through Microsoft PSS.
Recent experiences with NetApp and specific Microsoft hotfixes have resulted in long conference calls with PSS, so I decided to apply Windows 2000 SP4 instead.
Once this was done, the drive mappings were totally inconsistent. 4 out of 5 drives would connect, but how many mapped to drive letters varied from one attempt to the next: sometimes none, sometimes 1 (Q), sometimes 2 (Q and L, or Q and E), sometimes 4. On reboot, some would disappear completely. Completely random behaviour.
After more than half a day of troubleshooting, NetApp concluded, to our dismay, that the problem might have been caused by Spanning Tree Protocol (STP), which is used on switches to allow multiple redundant paths while avoiding loops. Apparently, NetApp filers and STP don’t work well together: it’s documented by NetApp and they won’t support it.
Can’t really turn STP off on a network. The solution was to enable PortFast on the particular ports where the NetApp filers and Exchange cluster nodes were connected. On Cisco switches:
set spantree portfast module/port enable
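With several filer and cluster-node ports to change, it can help to generate the command lines instead of typing each one. A minimal sketch, assuming the same command form as above; the module/port values are hypothetical placeholders, not the ports from this incident:

```python
def portfast_commands(ports):
    """Return one 'set spantree portfast' line per module/port string."""
    return [f"set spantree portfast {port} enable" for port in ports]


# Hypothetical module/port pairs; substitute the ports where the
# filers and Exchange cluster nodes are actually connected.
ports = ["3/1", "3/2", "4/1"]

for line in portfast_commands(ports):
    print(line)
```

The output can be reviewed and pasted into the switch console one line at a time.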
Another fix was to increase the NfsAdminRetryCount value to 7 in the Registry. Location: HKLM\SYSTEM\CurrentControlSet\Services\NAScsipt\Parameters
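To apply the same change on both cluster nodes, the setting can be written out as a .reg file and imported on each one. A minimal sketch that only renders the file text; it assumes NfsAdminRetryCount is a DWORD value, which the notes above do not confirm, so check the existing value's type in regedit first:

```python
# Key path from the notes above, written out in full form for a .reg file.
KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NAScsipt\Parameters"


def reg_file_text(key, name, value):
    """Render .reg file text that sets one DWORD value under the given key.

    Assumption: the value is a DWORD. .reg files store DWORDs as
    eight hexadecimal digits.
    """
    return (
        "Windows Registry Editor Version 5.00\r\n"
        "\r\n"
        f"[{key}]\r\n"
        f'"{name}"=dword:{value:08x}\r\n'
    )


text = reg_file_text(KEY, "NfsAdminRetryCount", 7)
print(text)
```

Save the printed text to a file with a .reg extension, export the existing key as a backup, then import the new file on each node.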
Finally, we were down to 2 drives mapping consistently after reboots (Q and L), and 3 drives connecting but not mapping to drive letters. Needed to go to Cluster Admin and bring the drive resources online. This made them map, and they showed up in Windows Explorer.
The Exchange Virtual Server was up shortly before 7:30 PM.
The infamous “Crazy Friday Breakdowns” law at work again!