
Thursday, 27 October 2016

Exchange Server 2013/2016 native Load Balancing & Failover options

Although Windows Network Load Balancing (WNLB) is a supported method for load balancing Exchange Server 2013/2016, it does have some limitations, especially for small to medium environments. If you are trying to architect a highly available Exchange solution you will almost certainly be using Database Availability Groups (DAGs). DAGs were introduced in Exchange 2010 and are used to replicate Mailbox Databases between Exchange Servers. The DAG technology is built on Windows Failover Clustering.

Exchange Server 2013 is broken down into two server roles: the Client Access Server (CAS) and the Mailbox Server (MBX). The DAG technology only protects against failure at the Mailbox Server level; it does not fail over incoming client connections to Exchange. In small networks you want to keep the number of servers to a minimum. Both Exchange roles can be installed on the same server, so a highly available Exchange setup is possible with just two servers in total.

Network Load Balancing and Failover Clustering cannot be installed on the same server. Therefore, you cannot use WNLB to load balance the CAS role if you are using dual-role (CAS + MBX) servers. For WNLB to be compatible with DAGs you must place the CAS and MBX roles on separate servers.

DNS Round Robin

Although DNS RR can be used to load balance traffic across multiple CAS servers, it does not offer any kind of failover. This is mainly because the native Windows DNS Server does not offer any kind of weight or priority on A records, and it does not check whether the target server is actually up. All you can do is create multiple A records for your Exchange namespace (e.g. mail.domain.com), each pointing to the IP of one of the CAS servers.

For this to offer even basic load balancing you have to lower the TTL of the DNS records, both at the DNS Server level and on the end clients.

To reduce the TTL on the DNS A records, enable the Advanced view in DNS Manager (View > Advanced). When you create an A record you will then have the option to set a TTL on the record.


In this example I have set it very low, to only 15 seconds. This may be too low for most production environments, especially if you only have a couple of DC/DNS servers.

So I have DNS RR set up to load balance across 3 x CAS servers.

· mail.ryanbetts.co.uk > 192.168.1.104
· mail.ryanbetts.co.uk > 192.168.1.105
· mail.ryanbetts.co.uk > 192.168.1.106
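To picture what these three records do, here is a minimal Python sketch of round robin. It is only a simulation and does not query a real DNS server. It assumes the server rotates the record set by one position per query and that the client simply uses the first address it gets back; the function name `round_robin_answers` is made up for this illustration.

```python
from collections import Counter, deque


def round_robin_answers(addresses, queries):
    """Yield the answer list a round-robin DNS server might return per query.

    Assumption: the record set is rotated by one position for each query,
    and the client connects to the first address in the answer.
    """
    ring = deque(addresses)
    for _ in range(queries):
        yield list(ring)
        ring.rotate(-1)  # the next query starts with the next address


CAS_SERVERS = ["192.168.1.104", "192.168.1.105", "192.168.1.106"]

# Count which CAS server each of 9 simulated lookups would land on.
first_choice = Counter(
    answer[0] for answer in round_robin_answers(CAS_SERVERS, 9)
)
for ip, hits in sorted(first_choice.items()):
    print(ip, hits)
```

Under those assumptions the lookups spread evenly across the three addresses, but nothing in the rotation knows or cares whether a server is actually reachable.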

Although the TTL has been lowered on the server side, the clients also keep their own DNS cache, so the cache lifetime should be capped there as well. To configure the DNS cache TTL locally on a Windows client, open the Registry at HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Dnscache\Parameters


Create a 32-bit DWORD called "MaxCacheTtl" and set it to a value in seconds (Decimal).
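If you need the same value on more than one client, a .reg file can be easier than editing by hand. The Python sketch below only builds the text of such a file; it does not touch the registry itself. The helper name `dns_cache_reg_file` is hypothetical, and the key path is the Dnscache parameters key described above.

```python
def dns_cache_reg_file(max_cache_ttl_seconds):
    """Return .reg file text that sets MaxCacheTtl to the given seconds."""
    if not 0 <= max_cache_ttl_seconds <= 0xFFFFFFFF:
        raise ValueError("MaxCacheTtl must fit in a 32-bit DWORD")
    key = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Dnscache\Parameters"
    # .reg files store DWORD values as 8 hexadecimal digits.
    return (
        "Windows Registry Editor Version 5.00\n"
        "\n"
        f"[{key}]\n"
        f'"MaxCacheTtl"=dword:{max_cache_ttl_seconds:08x}\n'
    )


# 15 seconds matches the TTL used in this example.
print(dns_cache_reg_file(15))
```

Save the printed text with a .reg extension and import it on the client. Note the value is written in hex in the file, even though you would type it as Decimal in the Registry Editor.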


For the changes to take effect you must restart the DNS Client service. Although these configuration changes load balance the traffic across the CAS servers, they do not offer a very good user experience.

Here is DNS RR in action on a Windows 8.1 client: when the DNS cache expires and the client does another query to its primary DNS server, the Outlook client remains connected to Exchange.


However, I have noticed that when the cache expires (I'm guessing this is the trigger) the client is sometimes prompted to enter credentials again. This is annoying for users, especially if you set the TTL to something extremely low like 15 seconds.

DNS Round Robin no "failover"

To prove this to myself, so that I really understand it, I ran a test. I failed over my only DAG to the 2nd Exchange server (192.168.1.105).


I then turned the 1st server off (192.168.1.104). However, as I did not change the DNS RR configuration, the client was still resolving mail.ryanbetts.co.uk to the IP address of the 1st Exchange server. When the TTL expired it started resolving to 192.168.1.105, which made a successful connection.
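The behaviour in this test can be sketched as a toy model in Python. It is a simplification and not how the Windows DNS Client is actually implemented: the `CachedResolver` class is made up for this example, the TTL is the 15 seconds used earlier, and DNS is assumed to hand out .104 first and .105 on the next query.

```python
class CachedResolver:
    """Toy model of a client-side DNS cache with a fixed TTL (in seconds)."""

    def __init__(self, lookup, ttl):
        self.lookup = lookup      # callable returning the next A record
        self.ttl = ttl
        self.cached_ip = None
        self.expires_at = 0.0

    def resolve(self, now):
        # Only ask DNS again once the cached answer has expired.
        if self.cached_ip is None or now >= self.expires_at:
            self.cached_ip = self.lookup()
            self.expires_at = now + self.ttl
        return self.cached_ip


answers = iter(["192.168.1.104", "192.168.1.105"])  # assumed answer order
online = {"192.168.1.105"}                          # .104 is powered off
client = CachedResolver(lookup=lambda: next(answers), ttl=15)

timeline = []
for now in (0, 5, 10, 15):
    ip = client.resolve(now)
    timeline.append((now, ip, ip in online))
    print(now, ip, "connected" if ip in online else "unreachable")
```

In this model the client is stuck on the dead address until the cached entry expires, and only then picks up the surviving server. That gap is why round robin on its own is not failover.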

The only way to make this a "highly available" configuration is to effectively disable DNS RR in the event one of the CAS servers fails. In this scenario you would simply delete the mail.ryanbetts.co.uk A record that points to the failed server, so all clients are directed to the surviving server once their cached entry expires. This might be acceptable for some businesses, but as it's not active/active it won't fit all requirements.
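Working out which record to delete could be scripted. The Python sketch below is one possible approach, not something I have run in this lab: it assumes a CAS server counts as "up" if it accepts a TCP connection on port 443 (HTTPS), and the function names `tcp_probe` and `records_to_remove` are hypothetical. It only reports the stale addresses; deleting the A record is still a manual step in DNS.

```python
import socket


def tcp_probe(ip, port=443, timeout=2.0):
    """Return True if a TCP connection to ip:port succeeds within the timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def records_to_remove(cas_ips, probe=tcp_probe):
    """Return the CAS addresses whose A records should be deleted."""
    return [ip for ip in cas_ips if not probe(ip)]


CAS_SERVERS = ["192.168.1.104", "192.168.1.105", "192.168.1.106"]

# Demo with a stand-in probe so the example runs without touching the network.
down = {"192.168.1.104"}
stale = records_to_remove(CAS_SERVERS, probe=lambda ip: ip not in down)
print(stale)
```

Calling `records_to_remove(CAS_SERVERS)` without the `probe` argument would use the real TCP check instead. A port check is a fairly weak health test, since a server can accept connections on 443 while Exchange itself is unhealthy.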

Next I am going to look at the free Kemp Layer 4-7 Load Balancer to load balance Exchange Server. It is free but has a limit of 20 Mbps throughput and a maximum of 50 concurrent sessions.