Two days of debugging hell have once again brought home the importance of knowing the fundamentals. We've recently been working on a project that involves several components: a SQL database, SharePoint, and an ASP.NET application. There are several points of contact between these components, and the one causing grief in this case was a web service call from the ASP.NET application into SharePoint's native web service API.
The system had been working on our development boxes, but when we uploaded it to our hosting provider we got errors on the web service calls into SharePoint. A little debugging revealed that the web service call was getting back HTTP status code 401: Unauthorized. What was interesting was that a single web service call actually turned into 2 or 3 HTTP requests, and the first request always got a 401 response. Sometimes other requests in the sequence failed as well. I managed to convince myself that SharePoint was somehow rejecting the web service requests. So we spent 2 days altering permissions in SharePoint, changing the directory security in IIS, and fiddling with the client web services code, all in the hope of getting SharePoint to accept the requests.
When I help people debug, one of the most common pieces of advice I give is to understand what is happening at a fundamental level. It's not enough to change the code and make the problem go away. You've got to understand why. Sadly, I failed to take my own advice in this case.
Some of you have probably already realized that the 401 response noted above is just the usual way web servers and browsers interact to carry out authentication: the server answers the first request with a 401 challenge, and the client retries with credentials. I had this tucked away in the back of my head, but I was so fixated on SharePoint being the issue that I didn't pay attention. Those of you familiar with SharePoint will also know that it doesn't do authentication on its own; it relies on IIS for this, so the notion that SharePoint was rejecting these requests was just wrong-headed.
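The WWW-Authenticate header on that first 401 names the handshake the client is expected to perform next. Here is a minimal Python sketch of reading it, assuming each challenge arrives in its own header line (IIS with Windows authentication typically sends Negotiate and NTLM this way). The function name is hypothetical, and a header that packs several comma-separated challenges into one line would need a real parser.

```python
from typing import Iterable, List


def offered_schemes(status: int, www_authenticate: Iterable[str]) -> List[str]:
    """Return the auth schemes a 401 response invites the client to use.

    Assumes one challenge per WWW-Authenticate header line; comma-separated
    challenges within a single line are not split apart here.
    """
    if status != 401:
        return []
    schemes: List[str] = []
    for value in www_authenticate:
        # The scheme name is the first token; any token or parameters follow a space.
        scheme = value.strip().split(" ", 1)[0]
        if scheme:
            schemes.append(scheme)
    return schemes


# A typical first response from IIS with Windows authentication enabled:
print(offered_schemes(401, ["Negotiate", "NTLM"]))  # ['Negotiate', 'NTLM']
```

Seeing NTLM in that list means the next round trips are part of the handshake, so a 401 on the first request is expected rather than a sign of a permissions problem.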
During the debugging process, I had a network sniffer attached and noticed the WWW-Authenticate header requesting NTLM. Right then and there I should have looked up what an NTLM authentication sequence looks like. If I'd gone back to the fundamentals, I would have quickly seen that the authentication sequence was messed up. I'm still not completely certain of what is happening, but it turns out that the problem was caused by having a VPN connection to the server. For some reason, when the VPN is connected, the web service client fails to complete the NTLM authentication process. Had I checked the sequence at the start, I would have known that SharePoint was not the problem and could at least have guessed that the VPN was an issue.
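For reference, the textbook NTLM-over-HTTP sequence has three legs: an anonymous request answered with 401 and "WWW-Authenticate: NTLM"; a request carrying a Type 1 (Negotiate) message, answered with 401 and a Type 2 (Challenge) message; and a request carrying a Type 3 (Authenticate) message, which should succeed. Because NTLM authenticates the connection rather than each request, the last two legs must travel over the same persistent connection. The Python sketch below checks a sniffed sequence against that pattern. The Exchange record, its fields, and the function names are hypothetical; it assumes one WWW-Authenticate header per response, only decodes raw NTLM tokens (not SPNEGO-wrapped Negotiate ones), and treats 200 as success. It is a diagnostic aid, not a protocol implementation.

```python
import base64
import struct
from dataclasses import dataclass
from typing import List, Optional

# Every raw NTLM message starts with this 8-byte signature,
# followed by a 4-byte little-endian message type (1, 2 or 3).
NTLM_SIGNATURE = b"NTLMSSP\x00"


@dataclass
class Exchange:
    """One sniffed request/response pair (hypothetical capture format)."""
    connection_id: int                # e.g. client TCP port, to tell connections apart
    authorization: Optional[str]      # request's Authorization header, if any
    status: int                       # response status code
    www_authenticate: Optional[str]   # response's WWW-Authenticate header, if any


def ntlm_message_type(header: Optional[str]) -> Optional[int]:
    """Return the NTLM message type carried in an 'NTLM <base64>' header value."""
    if not header:
        return None
    parts = header.split(None, 1)
    if len(parts) != 2 or parts[0] != "NTLM":
        return None
    try:
        raw = base64.b64decode(parts[1], validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return None
    if len(raw) < 12 or not raw.startswith(NTLM_SIGNATURE):
        return None
    return struct.unpack("<I", raw[8:12])[0]


def check_ntlm_handshake(exchanges: List[Exchange]) -> List[str]:
    """Compare a capture with the three-leg NTLM sequence; [] means it looks right."""
    if len(exchanges) != 3:
        return [f"expected 3 request/response pairs, saw {len(exchanges)}"]
    first, second, third = exchanges
    problems: List[str] = []

    # Leg 1: anonymous request; the server challenges with a bare "NTLM".
    if first.status != 401 or (first.www_authenticate or "").strip() != "NTLM":
        problems.append("leg 1: expected 401 with 'WWW-Authenticate: NTLM'")

    # Leg 2: client sends Type 1 (Negotiate); server replies 401 + Type 2 (Challenge).
    if ntlm_message_type(second.authorization) != 1:
        problems.append("leg 2: request should carry an NTLM Type 1 message")
    if second.status != 401 or ntlm_message_type(second.www_authenticate) != 2:
        problems.append("leg 2: expected 401 carrying an NTLM Type 2 challenge")

    # Leg 3: client sends Type 3 (Authenticate); the server should now answer 200.
    if ntlm_message_type(third.authorization) != 3:
        problems.append("leg 3: request should carry an NTLM Type 3 message")
    if third.status != 200:
        problems.append(f"leg 3: expected 200, got {third.status}")

    # NTLM authenticates the connection, so legs 2 and 3 must share one.
    if second.connection_id != third.connection_id:
        problems.append("legs 2 and 3 ran on different connections")
    return problems
```

For example, if the Type 3 request goes out on a fresh connection and draws another 401, the function reports both the wrong status and the connection change, which points at the transport rather than at the application behind IIS.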
Ah well, another 2 days, another lesson (re-)learned.