Bharat Suneja

Friday, July 23, 2004

 

NetApp Volume Fails To Mount After Power Outage

Posted by Bharat Suneja at 6:54 PM
A power outage (are rolling blackouts back in California??) earlier in the morning, and the resulting mess. The new cluster (Windows 2003/Exchange 2003) with NetApp's iSCSI filer came up happily. The older one (that uses NetApp's older VLD protocol) failed.

Issues: L: drive (Storage Group 1, Logs) would not connect/map, so Exchange Group in cluster would not come online.

Disconnected drives, reconnected. Did not work.
NetApp tech support recommended upgrading to SnapDrive 2.1 (3.0 onwards is only for iSCSI and Fibre Channel - no support for the older VLD protocol). That also required a post-SP3 hotfix (Q816990) - only available through Microsoft PSS.

Recent experiences with NetApp and specific Microsoft hotfixes have resulted in long conference calls with PSS, so I decided to apply Windows 2000 SP4 instead.

Once this was done, the drive mappings were totally inconsistent - 4 out of 5 drives would connect, but the number that mapped to drive letters changed every time: sometimes none, sometimes 1 (Q), sometimes 2 (Q and L, or Q and E), sometimes all 4. On reboot, some would disappear completely. Completely random behaviour.

After more than half a day of troubleshooting, NetApp concluded - to our dismay - that it might have been caused by Spanning Tree Protocol (STP), which is used on switches to allow multiple redundant paths and avoid loops. Apparently, NetApp filers and STP don't work well together - it's documented by NetApp and they won't support it.

Can't really turn STP off on a network. Solution was to use PortFast on those particular ports (the ones with NetApp filers and Exchange cluster nodes connected). On Cisco switches running CatOS:
set spantree portfast module/port enable

Another fix was to increase the NfsAdminRetryCount value to 7 in the Registry. Location: HKLM\SYSTEM\CurrentControlSet\Services\NAScsipt\Parameters

Finally, we were down to 2 drives mapping consistently after reboots (Q and L), and 3 drives connecting but not mapping to drive letters. For those 3, I needed to go to Cluster Admin and bring the drive resources online. This made them map, and they showed up in Windows Explorer.

The Exchange Virtual Server was up shortly before 7:30 PM.

The infamous "Crazy Friday Breakdowns" law at work again!





Wednesday, July 21, 2004

 

Import user attributes from OpenLDAP

Posted by Bharat Suneja at 5:24 PM
Recently I imported the employeeID (and a few other attributes) from an OpenLDAP directory into Active Directory. The problem: the distinguishedName values differed between the two directories - the OpenLDAP directory had a different OU structure.

So we have an LDIF dump from OpenLDAP in the format:

dn: distinguishedName not consistent with AD
uid:  *******  (Active Directory equivalent - sAMAccountName)
employeeNumber:    (Mapped to employeeID attrib in AD, could have mapped to employeeNumber in AD 2003 as well).
-

How do you import this when the dn doesn't match?
Process: Read the uid attribute in the LDIF, look up the matching sAMAccountName in AD, get the correct AD distinguishedName for the user, and dump the info into a new LDF in the format:

dn: correct AD dn
changetype: modify
replace: employeeID
employeeID: ****
-
replace: attribute2
attribute2: ***
-
replace: attribute3
attribute3: ***
-

dn: user2
.....

To do this, VBScript needs to:

1) Open the existing OpenLDAP LDIF file
2) Read the first 3 characters into a variable
3) Figure out if it's dn:, uid, or first 3 characters of any of the other attributes being imported
4) Skip a certain number of characters (different for each attribute)
5) Read rest of the line
6) Dump the stuff read into another variable
7) If the line contains the uid attribute, look up AD for a user with a matching sAMAccountName
8) Get the correct distinguishedName for that account
9) Read the next line --- till end of one user's details
10) Determine if one user's details have been read (based on the extra carriage return after - )
11) Write the correctly formatted info to a new file.
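
The steps above can be sketched in a few lines of Python. This is not the original VBScript, just a minimal illustration under some assumptions: it splits each line on the first colon instead of counting characters, it ignores LDIF line folding and base64 ("::") values, and the AD lookup is passed in as a function (lookup_dn is a hypothetical stand-in for whatever actually queries AD by sAMAccountName). ATTRIBUTE_MAP is also just an illustrative name.

```python
from typing import Callable, Dict, Iterable, List, Optional

# Assumed mapping: OpenLDAP attribute name -> AD attribute name.
ATTRIBUTE_MAP = {"employeeNumber": "employeeID"}


def parse_ldif_records(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Split an LDIF dump into one dict per user (a blank line ends a record)."""
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            if current:
                records.append(current)
            current = {}  # every user starts with a fresh, empty record
            continue
        if line.strip() == "-" or ":" not in line:
            continue  # separator line or something we don't understand
        name, _, value = line.partition(":")
        current[name.strip()] = value.strip()
    if current:
        records.append(current)
    return records


def build_modify_ldif(
    records: Iterable[Dict[str, str]],
    lookup_dn: Callable[[str], Optional[str]],
) -> str:
    """Emit 'changetype: modify' records keyed on the correct AD dn."""
    out: List[str] = []
    for rec in records:
        uid = rec.get("uid")
        dn = lookup_dn(uid) if uid else None
        if dn is None:
            continue  # no matching sAMAccountName in AD
        changes = [
            (ad_attr, rec[src]) for src, ad_attr in ATTRIBUTE_MAP.items() if rec.get(src)
        ]
        if not changes:
            continue  # nothing to import for this user (e.g. no employeeNumber)
        out.append(f"dn: {dn}")
        out.append("changetype: modify")
        for attr, value in changes:
            out.append(f"replace: {attr}")
            out.append(f"{attr}: {value}")
            out.append("-")
        out.append("")  # blank line ends the record
    return "\n".join(out)
```

For a quick dry run, lookup_dn can simply be the .get method of a dict that maps uid to distinguishedName. Because each user gets its own dict, a user without an employeeNumber produces no output at all instead of picking up values from the previous user.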

Once the new LDF was produced with the correct format and distinguishedName attribute, it was simple to import it into AD using LDIFDE.

LEARNING EXPERIENCES (aka GOOF-UPs): Each project has its own peculiar goof-ups - some major ones, some minor ones. These are the learning experiences - wouldn't be fun if everything worked smoothly the first time around, right? 

Some users had DUPLICATE employeeIDs! 2, 3, even 5 users with the same EID!! How did that happen? Well, some users exported from OpenLDAP did not have an employeeNumber. Mostly contractors. When the script read stuff for a user with an employee ID, finished writing it to the new file, then went back to read the second user's details - it saw no employeeNumber and used the previous user's employeeID instead... Why? Because the variable had not been "flushed" after details of the first user were written to the new file.

More scripts were written to search AD based on employeeID - that quickly showed the employees with duplicate employeeIDs.
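
The duplicate check itself is simple once you have the data. A minimal Python sketch, assuming the (distinguishedName, employeeID) pairs have already been pulled out of AD by some other means (the function name and the input shape are my own, not what the original scripts used):

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def find_duplicate_employee_ids(
    users: Iterable[Tuple[str, Optional[str]]],
) -> Dict[str, List[str]]:
    """Group user DNs by employeeID and keep only IDs shared by 2+ users."""
    by_id: Dict[str, List[str]] = defaultdict(list)
    for dn, employee_id in users:
        if employee_id and employee_id.strip():
            by_id[employee_id.strip()].append(dn)
    return {eid: dns for eid, dns in by_id.items() if len(dns) > 1}
```

Users with no employeeID at all (the contractors, in this case) are skipped rather than reported as duplicates of each other.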

There is no user interface to add the employeeID attribute to a user's account. Either you write a COM object, or a web-based interface, or a script to do it. The script was the shortest path. Now, the challenge is to make that part of the new account creation process that's followed consistently - I'm sure this is one step that'll be missed every once in a while, and users who want to use the application based on this particular attribute will be unable to do so!

It'll make sense to either script the entire new account creation process, or build a web interface to it. Time consuming projects.


 

Where's LDIFDE on XP Professional?

Posted by Bharat Suneja at 5:13 PM
The Windows XP Professional docs tell you the LDIFDE utility is available on that OS, but I haven't been able to find it.

Simply copy it from a Windows Server 2003 installation - it's in %Systemroot%\System32.


Friday, July 09, 2004

You've just built a shiny new Windows Server 2003 cluster, installed Exchange Server 2003, created an Exchange Virtual Server (EVS) and tested MAPI, HTTP, Cluster Failover, et al - things look great!

User calls, can't access mail on new EVS using IMAP4.

When you create the Exchange Virtual Server by creating the System Attendant resource in Cluster Admin, the IMAP4 and POP3 servers are not created automatically (unlike Exchange 2000) because "Exchange Server 2003 is designed to comply with the Microsoft Trustworthy Computing initiative" according to KB818480 - which is all good.

And here's when you have a fleeting moment of monumental stupidity - YOU DON'T READ THE KNOWLEDGEBASE ARTICLE COMPLETELY! (contrary to public opinion, many KB articles are in fact quite accurate, provide the complete solution, and help you avoid making blunders).

And here's what happens when you don't do that... :

You go ahead and try to create the IMAP4 virtual server resource in Cluster Admin. IMAP4 virtual server created successfully. Wonderful! All that's left is right-click, bring resource online.

And DISASTER! (Don't try this on a production box.. :)

This brings down your entire EVS. Goes offline. You hope it's once, to initialize the IMAP4 virtual server perhaps. Comes back online. Great! Goes back offline. Oh noooo... ! Flip flops between offline and online... mom, look what I did to the Exchange cluster! :)

Bottom line, the flip-flopping goes on. You can't delete the IMAP4 VS if it's in transition (Offline pending, Online). At the right time - just when it goes online and before any other resource goes offline, you hit delete and the resource is deleted. And investigate.

What's wrong? The IMAP4 service is set to DISABLED by default. The Application Event Log will show you 3 errors:

Event ID: 1009 | Source: MSExchangeCluster | Description: IMAP Virtual Server (MAILBOX): Failed to start the service 'IMAP4SVC' because it has been disabled. Check the Services manager to change its startup type.
Event ID: 1010 | Source: MSExchangeCluster | Description: IMAP Virtual Server (MAILBOX): Failed to start the service 'IMAP4SVC'.
Event ID: 1003 | Source: MSExchangeCluster | Description: IMAP Virtual Server (MAILBOX): Failed to bring the resource online.

Resolution: Set the IMAP4 service on all nodes to MANUAL (not AUTOMATIC).
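
If there are more than a couple of nodes, the startup type can be changed from the command line with the built-in sc utility instead of clicking through the Services console on each one. A small Python sketch that only builds the commands - the node names are whatever you pass in, and the service name is taken from the event log entries above:

```python
import subprocess
from typing import List

SERVICE_NAME = "IMAP4SVC"  # service name as shown in the event log entries


def build_sc_commands(nodes: List[str], start_type: str = "demand") -> List[List[str]]:
    """Return one 'sc config' command per cluster node.

    In sc's vocabulary "demand" is the Manual startup type.
    """
    return [
        ["sc", rf"\\{node}", "config", SERVICE_NAME, "start=", start_type]
        for node in nodes
    ]


def apply_commands(commands: List[List[str]]) -> None:
    """Run the commands; only meaningful on Windows with admin rights on the nodes."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling build_sc_commands(["NODE1", "NODE2"]) with your own node names gives the equivalent of typing sc \\NODE1 config IMAP4SVC start= demand for each node. Nothing is executed until apply_commands is called.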

Now bring back the Exchange Virtual Server online. Comes online. Stable.

Test IMAP4 access - telnet to port 143. It works!
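
The telnet test can also be scripted, which is handy for re-checking after a failover. A minimal Python sketch, assuming the server sends the usual IMAP4 greeting line starting with "* OK" when you connect (the function names are mine):

```python
import socket


def imap4_greeting(host: str, port: int = 143, timeout: float = 5.0) -> str:
    """Connect to the IMAP4 port and return the first line the server sends."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        with conn.makefile("rb") as reader:
            line = reader.readline()
    return line.decode("ascii", errors="replace").strip()


def looks_like_imap4_ok(greeting: str) -> bool:
    """True if the greeting is an untagged OK response."""
    return greeting.upper().startswith("* OK")
```

Point imap4_greeting at the EVS network name and pass the result to looks_like_imap4_ok. A refused connection or a timeout raises an exception, which is about what you'd see from telnet when the IMAP4 virtual server is offline.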


Wednesday, July 07, 2004

I've always had a love/hate relationship with NetApp's SnapManager for Exchange (SME) backup utility.

To restore a store, SME needs to take the Physical Disk resource offline. (Actually swaps it with the volume/LUN that has the backup). When this happens, the Exchange System Attendant resource (and thereby the entire Exchange Virtual Server) goes down - because it depends on all the Physical Disk resources. Weird!

That means even if some disks and the stores residing on them are OK, all your users will suffer an outage when SME does a restore of a single store or a storage group.

I wish NetApp would find a better way to do this - or perhaps the cluster could be set up to not depend on the Physical Disk resources - or at least not on ALL of them.

Alternatively, I'm looking at changing the System Attendant resource's dependencies - just try removing the disks that hold the Storage Group to be restored from the list.

From the SME docs:
In a Windows cluster, all resources that are directly or indirectly dependent on the LUN that is to be restored are taken offline, as well as the LUN itself. This means that the entire Exchange virtual server (all storage groups) is offline.

Caution:
Do not attempt to manage any cluster resources while the restore is running.

If a cluster move group operation occurs during the restore (for example, if the node that owns the resources goes down) you must restart the SnapManager user interface and the restore.
