Wednesday, July 4, 2012

Connection to Microsoft Exchange is unavailable on Lync 2010 Phone Edition

You know, I’ve learned something today. When something does not work as expected, there is a reason for that, even when you don’t see it… right away.

One of our customers had an issue with Lync Phone Edition (LPE) calendar integration. The phones simply refused to connect to EWS, period. To make the issue more puzzling, a phone signing in from the public Internet worked, but one on the LAN did not. The customer uses an F5 BIG-IP appliance as the hardware load balancer (HLB).

The troubleshooting process started with reviewing the phone logs (.CLG2 in particular). I know, these logs are very hard to read, but TextPad did the job. I was looking for the calls to the Autodiscover web service, since LPE uses Autodiscover to obtain both the internal and external EWS URLs. In the log I found the following: “NAutoDiscover::DnsAutodiscoverTask::TryAutodiscoverUrls: Exception with this url. hr=0x80072f0d”. This error translates to “The security certificate on the server is invalid”. What??? The VeriSign certificate is not trusted??? Tens of thousands of users connect to their Exchange mailboxes from desktop and mobile devices, inside and outside the network, and yet LPE does not like the certificate.
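If you want to see where that hr value comes from, it is an HRESULT: the top bit marks failure, bits 16 to 26 carry the facility, and the low 16 bits carry the code. The sketch below splits it apart. The lookup table is a small hand-picked subset I am assuming for illustration; as far as I know, facility 7 (FACILITY_WIN32) with code 12045 is WinInet's ERROR_INTERNET_INVALID_CA, i.e. the client does not trust the issuing authority, which fits the certificate problem described above.

```python
FACILITY_WIN32 = 7  # HRESULTs that wrap a Win32 / WinInet error code

# Small, hand-picked subset of WinInet error codes (INTERNET_ERROR_BASE = 12000).
WININET_ERRORS = {
    12037: "ERROR_INTERNET_SEC_CERT_DATE_INVALID",
    12038: "ERROR_INTERNET_SEC_CERT_CN_INVALID",
    12045: "ERROR_INTERNET_INVALID_CA",
}


def decode_hresult(hr: int) -> dict:
    """Split an HRESULT into its failure bit, facility and code fields."""
    hr &= 0xFFFFFFFF              # accept signed values as printed by some logs
    failed = bool(hr >> 31)       # severity bit: 1 = failure
    facility = (hr >> 16) & 0x7FF  # 11-bit facility field
    code = hr & 0xFFFF            # 16-bit error code
    name = WININET_ERRORS.get(code) if facility == FACILITY_WIN32 else None
    return {"failed": failed, "facility": facility, "code": code, "name": name}


print(decode_hresult(0x80072F0D))
```

Masking with 0xFFFFFFFF first means the same function works whether a log prints the value in hex or as a negative signed decimal.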

There are three things I need to mention here:

  1. The customer uses a certificate issued by a public CA on the CAS array
  2. Because of this, the customer had already added the VeriSign Root CA to the Lync certificate store, following the procedure described in Kevin’s blog:
  3. The customer had set up separate “internal” and “external” VIPs on the F5 for all Exchange services
The first step was to examine the certificates deployed to the CAS array. This is very important in order to establish the number of intermediate certificates in the chain. In the first example we have one intermediate: VeriSign Class 3 Secure Server CA.

Now, let’s look at another certificate:

In this case we have two intermediates: the issuer (VeriSign Class 3 Secure Server CA - G3), which is itself subordinate to VeriSign Class 3 Secure Server CA.

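I was puzzled as to why a web browser did not fail when visiting the URL while the phone did, until I realized that browsers are capable of building the chain all the way up to the Root CA on their own (for example, by downloading a missing intermediate from the location listed in the certificate's Authority Information Access extension). In other words, the F5 presents only the identity certificate, and the browser fills in the rest of the chain up to the Root CA.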

There are many, many posts on the Internet about mobile devices failing to connect to EWS. Additional research shows that mobile devices, due to the limited size of their certificate store and platform specifics, require ALL INTERMEDIATE certificates to be presented along with the identity certificate in the initial handshake. This way the device can follow the chain “internally” until it reaches a Root CA in its local store that validates the chain.
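To make the browser versus device difference concrete, here is a deliberately simplified model. It walks issuer names from the leaf toward a trusted root, using only the certificates the server presented plus, optionally, intermediates the client can fetch on its own (standing in for a browser's AIA download). It ignores signatures, dates and host names, so treat it as an illustration of chain building, not a real validator; the Cert class, the chain_validates function and the host name mail.example.com are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    subject: str
    issuer: str


def chain_validates(presented, trusted_roots, fetchable=()):
    """Return True if the leaf (presented[0]) chains up to a trusted root.

    presented     -- certificates the server sent, leaf first
    trusted_roots -- the client's local trusted root store
    fetchable     -- intermediates the client can obtain by itself
                     (models a browser downloading them via AIA)
    Signatures, validity dates and names are NOT checked in this model.
    """
    roots = {c.subject for c in trusted_roots}
    pool = {c.subject: c for c in [*presented[1:], *fetchable]}
    current, visited = presented[0], set()
    while current.issuer not in roots:
        # Give up if the issuer is unknown or we are going round in a loop.
        if current.issuer not in pool or current.issuer in visited:
            return False
        visited.add(current.issuer)
        current = pool[current.issuer]
    return True


# The one-intermediate chain from this post ("mail.example.com" is hypothetical).
leaf = Cert("mail.example.com", "VeriSign Class 3 Secure Server CA")
intermediate = Cert("VeriSign Class 3 Secure Server CA", "VeriSign Class 3 Public Primary CA")
root = Cert("VeriSign Class 3 Public Primary CA", "VeriSign Class 3 Public Primary CA")

print(chain_validates([leaf, intermediate], [root]))              # full chain sent: True
print(chain_validates([leaf], [root]))                            # phone, leaf only: False
print(chain_validates([leaf], [root], fetchable=[intermediate]))  # browser fills the gap: True
```

The only thing that changes between the second and third calls is whether the client can find the intermediate by itself, which is exactly the difference between the browser and the phone.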

We already know the limitations of the Aries platform in this regard, so I thought: what if LPE has the same limitation and behavior as mobile devices? This prompted a close examination of the handshake between client and server. Below is an example of a successful handshake:

What is important here is the immediate flow of “Application Data” records, which means the handshake was completed successfully, trust was established, and data exchange has begun.

Let’s now look closely at what happens:

After the initial Client Hello and Server Hello, the server offers its certificate(s) in the Certificate message.
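If you have the raw bytes of that Certificate message (for example, copied out of a capture), counting the certificates does not require a full protocol parser. In TLS 1.0 to 1.2 the handshake message is a 1-byte type (11 for Certificate), a 3-byte length, then a 3-byte certificate_list length followed by entries that are each a 3-byte length plus a DER certificate. The sketch below assumes the 5-byte record header has already been stripped; the function names are my own, not from any tool.

```python
def certificate_lengths(handshake: bytes) -> list:
    """Return the DER length of each certificate in a TLS 1.0-1.2
    Certificate handshake message (record-layer header already removed)."""
    if len(handshake) < 4 or handshake[0] != 11:  # HandshakeType certificate(11)
        raise ValueError("not a Certificate handshake message")
    body_len = int.from_bytes(handshake[1:4], "big")
    body = handshake[4:4 + body_len]
    if len(body) != body_len or body_len < 3:
        raise ValueError("truncated handshake message")
    end = 3 + int.from_bytes(body[0:3], "big")  # end of certificate_list
    if end > len(body):
        raise ValueError("truncated certificate_list")
    lengths, pos = [], 3
    while pos < end:
        if pos + 3 > end:
            raise ValueError("truncated certificate length")
        cert_len = int.from_bytes(body[pos:pos + 3], "big")
        pos += 3
        if pos + cert_len > end:
            raise ValueError("truncated certificate")
        lengths.append(cert_len)
        pos += cert_len
    return lengths


def build_certificate_message(certs) -> bytes:
    """Helper for trying the parser: wrap opaque blobs as a Certificate message."""
    entries = b"".join(len(c).to_bytes(3, "big") + c for c in certs)
    body = len(entries).to_bytes(3, "big") + entries
    return bytes([11]) + len(body).to_bytes(3, "big") + body


# Dummy blobs stand in for the identity and intermediate certificates.
print(len(certificate_lengths(build_certificate_message([b"L" * 5, b"I" * 7]))))
```

A healthy VIP in this scenario should yield two entries (identity plus intermediate); the broken internal VIP described below would yield only one.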

Drilling further, we see that two certificates were offered: the server identity certificate and the intermediate issuer, which matches the chain we saw earlier:

All Lync Phone Edition has to do now is check whether “VeriSign Class 3 Secure Server CA” can be trusted, and it can, because the root, “VeriSign Class 3 Public Primary CA”, is in the Trusted Root CA store of LPE.

Now let’s see what was happening in the customer environment, where we would expect to see this same flow internally…

During the certificate presentation, we note that the ONLY certificate offered by the F5 is the identity certificate; no intermediate is present:

Of course, LPE could not follow the chain, and the handshake failed miserably. The phone asked for and received the server identity certificate, but could not validate its issuer, VeriSign Class 3 Secure Server CA.

We spoke with the F5 team, and they confirmed that the internal VIP for the CAS array had only the server identity certificate assigned, without the intermediate. After assigning the correct intermediate certificate to the internal Autodiscover and EWS services (on BIG-IP, this is the Chain setting of the Client SSL profile used by the virtual server), the issue was resolved. I am still baffled by the fact that one can (and must) assign the intermediate certificate to a service on the F5 separately. This should be done automatically during the certificate import…
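For what it's worth, the "automatic" step I have in mind is not complicated: starting from the identity certificate, look up each issuer in the available CA certificates and collect the intermediates, stopping before the self-signed root, which the client is expected to have already. The sketch below is purely hypothetical (it is not how BIG-IP or any other product works internally), uses made-up CA names, and matches certificates by subject and issuer name only, with no signature checks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    subject: str
    issuer: str

    @property
    def self_signed(self) -> bool:
        return self.subject == self.issuer


def served_chain(leaf, ca_pool):
    """Return the list a server should present: leaf first, then every
    intermediate up to (but not including) the self-signed root."""
    by_subject = {c.subject: c for c in ca_pool}
    chain, current = [leaf], leaf
    while not current.self_signed:
        issuer = by_subject.get(current.issuer)
        if issuer is None:
            raise LookupError(f"issuer not found in pool: {current.issuer}")
        if issuer.self_signed:
            break  # the root lives in the client's trust store; don't send it
        if issuer in chain:
            raise ValueError("issuer loop detected")
        chain.append(issuer)
        current = issuer
    return chain


# Hypothetical two-intermediate chain, like the second certificate above.
leaf = Cert("mail.example.com", "Example Issuing CA")
issuing = Cert("Example Issuing CA", "Example Policy CA")
policy = Cert("Example Policy CA", "Example Root CA")
root = Cert("Example Root CA", "Example Root CA")

print([c.subject for c in served_chain(leaf, [root, policy, issuing])])
```

The pool can be given in any order; the walk follows issuer names, so the output always comes back leaf first, then intermediates from the issuing CA upward.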

***This troubleshooting approach can also be applied when investigating SSL connectivity issues between CAS and mobile devices, whether the HLB is an F5 or another vendor's appliance.
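A quick way to mimic the phone from a workstation is to connect with a TLS client that trusts only the root and never fetches missing intermediates. OpenSSL, which Python's ssl module uses, does not download intermediates, so a VIP that sends only the identity certificate should fail here even though a browser succeeds. This is a minimal sketch under that assumption; the function name is hypothetical, and root_ca_pem must be a PEM file containing only the root certificate.

```python
import socket
import ssl


def check_presented_chain(host: str, port: int, root_ca_pem: str,
                          timeout: float = 5.0) -> str:
    """Return "ok" if the chain the server presents validates against
    root_ca_pem alone, otherwise OpenSSL's verification message."""
    # PROTOCOL_TLS_CLIENT requires a valid certificate and checks the host
    # name, but unlike create_default_context() it loads no system CAs.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Trust ONLY the root, like a device with a small trusted-root store.
    ctx.load_verify_locations(cafile=root_ca_pem)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        try:
            with ctx.wrap_socket(sock, server_hostname=host):
                return "ok"
        except ssl.SSLCertVerificationError as err:
            return f"verify failed ({err.verify_code}): {err.verify_message}"
```

For example, check_presented_chain("autodiscover.example.com", 443, "root.pem") (host and file name are placeholders) run once against the internal VIP and once against the external one should show the difference; a missing intermediate typically shows up as "unable to get local issuer certificate".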


Ranjan said...

The VeriSign certificate chain keeps changing, and VeriSign strongly recommends installing their full chain with every certificate install, but many people miss that step.

Jason Shave [MSFT] said...

So what's unclear to me is the need to configure the web service to dole out intermediate and root certs.

Do we need to follow Kevin's advice and import the root and intermediate certs, or just the intermediate?

My customer's environment uses a DigiCert certificate for Autodiscover but an internally issued certificate for EWS. The same internal CA is used for Lync.

Drago said...

Kevin, where exactly does the failure occur?

Garry Williams said...

Hi Drago,

Thanks for the post. We are experiencing the 'Connection to Microsoft Exchange is unavailable' error message when attempting to access Exchange calendar or call logs from Polycom CX600 phones running the latest firmware. We use a Geotrust cert on our Exchange CAS servers and F5 Big IP load balancers to publish and I already added the Geotrust root CA thumbprint to our Lync EWS configuration as a trusted root CA but that didn't fix it for us. I want to try this fix that you describe above but I'm unsure as to exactly what I need to do. Our cert chain is server cert --> Geotrust intermediate --> Geotrust root. I'm not familiar with Big IP configuration: how can it be configured to present both the server and intermediate certs? Can I import two certs in the HTTPS publishing rule? If you can advise of the steps I'd really appreciate it. Many thanks.


Anonymous said...

very useful, thanks.

Hugh Kelley said...

After updating our CX600 phones to 4.0.7577.4387 (from 4.0.7577.4100) we're seeing similar symptoms, though I think the cause is slightly different.

I've verified with NetMon that the SSL negotiation to the autodiscover service is happening. Looking at the IIS logs for that service, it seems like the client (LPE) never sends Windows credentials to authenticate. I just see a lot of HTTP 403 return codes. Naturally, EWS doesn't respond to an unauthenticated request so the phone just says the connection to Exchange is unavailable.

Any thoughts?

Drago said...

I am not aware of any change in regard to the authentication methods. Have you tried "Reset to factory default" on one unit and then signing in?

Anonymous said...

I just left the 4.0.7577.4100 firmware in place, since the later versions will not work.
It must be cert related.

Anonymous said...

This may help. October 2013 CU update with newer certs.