You know, I’ve learned something today. When something does not work as expected, there is a reason for that, even when you don’t see it… right away.
One of our customers had an issue with Lync Phone Edition calendar integration. The phones simply refused to connect to EWS, period. To make the issue more puzzling, a phone signing in from the public Internet worked, but one on the LAN did not. The customer uses an F5 BIG-IP as the hardware load balancer (HLB).
The troubleshooting process started with reviewing the phone logs (.CLG2 in particular). I know, these logs are very hard to read, but TextPad did the job. I was looking for the calls to the Autodiscover web services. LPE uses Autodiscover to obtain both the internal and external EWS URLs. In the log I found the following: “NAutoDiscover::DnsAutodiscoverTask::TryAutodiscoverUrls: Exception with this url. hr=0x80072f0d”. This error translates to “The security certificate on the server is invalid”. What??? A VeriSign certificate is not trusted??? Tens of thousands of users connect to their Exchange accounts from desktop and mobile devices, inside and outside the network, and yet LPE does not like the certificate.
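For anyone curious how that hr value breaks down, here is a minimal Python sketch that splits an HRESULT into its standard fields. The function name `decode_hresult` is made up for this illustration. As far as I know, the low 16 bits here (12045) correspond to the WinINet error ERROR_INTERNET_INVALID_CA, which fits the "certificate is not trusted" symptom.

```python
from typing import Tuple

def decode_hresult(hr: int) -> Tuple[int, int, int]:
    """Split a 32-bit HRESULT into (severity, facility, code)."""
    severity = (hr >> 31) & 0x1      # 1 means failure
    facility = (hr >> 16) & 0x7FF    # 7 is FACILITY_WIN32
    code = hr & 0xFFFF               # the wrapped Win32/WinINet error number
    return severity, facility, code

# The value taken from the phone log in this post.
severity, facility, code = decode_hresult(0x80072F0D)
print(severity, facility, code)  # 1 7 12045
```

A facility of 7 means the HRESULT simply wraps a Win32-style error number, so the code part can be looked up in the WinINet error list.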
There are three things I need to mention here:
- The customer uses a certificate signed by a public CA on the CAS array
- Because of this, the customer had already added the VeriSign Root CA to the Lync certificate store by following the procedure described in Kevin’s blog: http://ocsguy.com/2012/05/19/lync-phone-edition-connection-to-microsoft-exchange-is-unavailable
- The customer set up “internal” and “external” VIPs on the F5 for all Exchange services
The first step was to examine the certificates deployed to the CAS array. This is very important in order to establish the number of intermediate certificates. In the first example we have one intermediate – VeriSign Class 3 Secure Server CA.
Now, let’s look at another certificate:
In this case we have two intermediates: the issuer (VeriSign Class 3 Secure Server CA – G3), which is subordinate to VeriSign Class 3 Secure Server CA.
I was puzzled as to why a web browser did not fail when visiting autodiscover.domain.com but the phone did, until I realized that browsers are capable of following the chain all the way up to the Root CA. In other words, the F5 presents only the identity certificate (server.domain.com), and the browser follows the chain up to the Root CA on its own.
There are many, many posts on the Internet about mobile devices failing to connect to EWS. Additional research shows that mobile devices, due to the limited size of the certificate store and platform specifics, require ALL INTERMEDIATE certificates to be presented along with the identity certificate in the initial handshake. This way the device can follow the chain “internally” until it reaches a Root CA in the local store, which validates the chain.
We already know the limitations of the Aries platform in this regard, so I thought: what if LPE has the same limitation and behavior as mobile devices? This prompted a close examination of the handshake between client and server. Below is an example of a successful handshake:
What is important here is that we have an immediate flow of type “Application Data”, which means the handshake completed successfully, trust was established, and data exchange has begun.
Let’s now look closely at what happens:
After the initial Client and Server “Hello”, the server offers its certificate(s).
As we continue to drill down, we see that two certificates were offered
…server identity and the intermediate issuer, which matches the chain as we saw already:
All Lync Phone Edition has to do now is check whether “VeriSign Class 3 Secure Server CA” can be trusted, and it can, because the Root, “VeriSign Class 3 Public Primary CA”, is in the Trusted Root CA store of LPE.
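To make the "two certificates were offered" observation concrete, here is a rough Python sketch of how the body of a TLS 1.0 to 1.2 Certificate handshake message is laid out: a 3-byte total length, then each certificate prefixed by its own 3-byte length. Both function names are hypothetical, and the byte strings are placeholders rather than real DER certificates. A network trace viewer does this parsing for you; the sketch only shows what is being counted.

```python
from typing import List

def split_certificate_message(body: bytes) -> List[bytes]:
    """Return the certificate blobs carried in a TLS 1.0-1.2 Certificate message body."""
    total = int.from_bytes(body[0:3], "big")  # length of the whole certificate_list
    if total != len(body) - 3:
        raise ValueError("certificate_list length does not match the body")
    certs, pos = [], 3
    while pos < len(body):
        size = int.from_bytes(body[pos:pos + 3], "big")  # length of this certificate
        pos += 3
        if pos + size > len(body):
            raise ValueError("truncated certificate entry")
        certs.append(body[pos:pos + size])
        pos += size
    return certs

def build_certificate_message(certs: List[bytes]) -> bytes:
    """Hypothetical helper: build a message body from placeholder blobs for the demo."""
    entries = b"".join(len(c).to_bytes(3, "big") + c for c in certs)
    return len(entries).to_bytes(3, "big") + entries

# Placeholder bytes stand in for real DER-encoded certificates.
with_intermediate = build_certificate_message([b"identity-cert", b"intermediate-cert"])
identity_only = build_certificate_message([b"identity-cert"])

print(len(split_certificate_message(with_intermediate)))  # 2
print(len(split_certificate_message(identity_only)))      # 1
```

A count of 2 matches the healthy handshake described above; a count of 1 means the server sent only its identity certificate.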
Now let’s see what was happening in the customer environment… we would expect to see this same flow internally…
During the certificate presentation, we note that the ONLY certificate offered by the F5 is the “Identity Certificate”, and no intermediate was present:
Of course, LPE could not follow the chain and the handshake failed miserably. The phone asked for autodiscover.domain.com and received said server identity certificate, but could not validate the issuer, VeriSign Class 3 Secure Server CA.
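The difference between the two handshakes can be modeled with a small Python sketch. It is a simplified illustration, not how LPE is actually implemented: certificates are reduced to subject/issuer names, signatures and validity dates are ignored, and the client is assumed to use only what the server presented plus its local root store. The function name `chain_validates` and the dictionary layout are invented for this example.

```python
from typing import Dict, List, Set

def chain_validates(presented: List[Dict[str, str]], trusted_roots: Set[str]) -> bool:
    """Walk issuer links using only the presented certificates and the local root store."""
    by_subject = {cert["subject"]: cert for cert in presented}
    current = presented[0]  # the identity certificate comes first
    seen = set()
    while current["subject"] not in seen:
        seen.add(current["subject"])
        issuer = current["issuer"]
        if issuer in trusted_roots:
            return True    # reached a root the device already trusts
        if issuer not in by_subject:
            return False   # issuer was not presented and cannot be fetched
        current = by_subject[issuer]
    return False           # issuer links loop back on themselves

identity = {"subject": "autodiscover.domain.com",
            "issuer": "VeriSign Class 3 Secure Server CA"}
intermediate = {"subject": "VeriSign Class 3 Secure Server CA",
                "issuer": "VeriSign Class 3 Public Primary CA"}
roots = {"VeriSign Class 3 Public Primary CA"}

print(chain_validates([identity, intermediate], roots))  # True
print(chain_validates([identity], roots))                # False
```

The second call is the internal VIP case: the root is in the store, but nothing links the identity certificate to it.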
We spoke with the F5 team, and they confirmed that the internal VIP for the CAS array had only the server identity certificate assigned, but not the intermediate. After assigning the correct intermediate certificate to the internal services for Autodiscover and EWS, the issue was resolved. I am still baffled by the fact that one can/must assign an intermediate certificate to a service on the F5. This should be done automatically during the certificate import…
***This troubleshooting approach can be applied when investigating SSL connectivity issues between CAS and mobile devices, not only with F5 but with other HLBs as well.