Sunday, February 22, 2009

DNS vs Firewall

Today’s odd one came about from my own mail server. I was helping my friend Doug with a Vista networking problem (Note: It was not Vista’s fault), when I found I could not send Doug an email. It always seems that when you are troubleshooting one problem, you end up finding and having to fix others first.

After a quick look at my Exchange server, I found the email was stuck in the outbound queue; even odder, there were no delivery attempts in the SMTP log. At this point, you start walking through how mail is sent. I started NSLOOKUP and found that I could not resolve the MX record for Doug’s email domain. For those search engines out there, Doug’s email is hosted with Google Mail, GMAIL, or googlemail. I thought, “That’s odd,” so out came the packet sniffer on my internal DNS server.
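For anyone who wants to see what that MX lookup looks like on the wire, here is a minimal sketch in Python that builds the same kind of query NSLOOKUP sends. The function name `build_mx_query` and the query ID are my own placeholders, and `example.com` stands in for the real domain.

```python
import struct

def build_mx_query(domain: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query asking for the MX records of `domain`."""
    # Header: ID, flags (only RD "recursion desired" set), QDCOUNT=1,
    # and zero for the answer, authority, and additional counts.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, ending with a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in domain.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=15 (MX), QCLASS=1 (IN).
    question = struct.pack("!HH", 15, 1)
    return header + qname + question

query = build_mx_query("example.com")
print(len(query), query.hex())
```

Sending these bytes in a UDP datagram to port 53 of a DNS server is all a basic lookup does, which is why a sniffer on the DNS server is a quick way to see whether a reply ever comes back.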

My internal DNS server was making the requests, but was never getting a reply back. I used NSLOOKUP to make the same request to a DNS server I had at a colocation facility and the MX record came through fine. Hmmm. OK, so next was to attempt the NSLOOKUP outside my PIX firewall. Yep, it worked, so something about my firewall was blocking the DNS reply and, oddly, only certain DNS replies.

I checked the Cisco PIX 501 configuration and everything was fine. DNS Fixup was enabled and the allowed DNS packet size was 2000. I had already learned the hard way that Internet DNS servers were no longer limiting themselves to 512-byte messages, so I had increased the limit.

I found some old hubs and set up packet sniffers on the inside and outside of the firewall, cleared the DNS cache, and attempted the MX lookup again. The DNS request went out, the reply came back, and the PIX was dumping the reply, never allowing it to make it to the inside network.

Turns out the outside sniffer saw the problem. The reply was malformed. The packet was being truncated and the last part of the DNS answer was being removed. To put it simply, the header of the DNS reply would say that 5 answers were coming, but only 4 would be listed in the packet. Since DNS Fixup was enabled on the PIX, the packet was dropped.
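The mismatch between the header counts and the records actually in the packet can be checked with a few lines of Python. This is only a sketch of that kind of sanity check, not what the PIX actually runs; `skip_name` and `count_records` are names I made up for illustration.

```python
import struct

def skip_name(msg: bytes, pos: int) -> int:
    """Return the offset just past the (possibly compressed) name at pos."""
    while True:
        if pos >= len(msg):
            raise ValueError("name runs past end of packet")
        length = msg[pos]
        if length == 0:                 # root label ends the name
            return pos + 1
        if length & 0xC0 == 0xC0:       # 2-byte compression pointer ends the name
            if pos + 2 > len(msg):
                raise ValueError("pointer runs past end of packet")
            return pos + 2
        pos += 1 + length               # ordinary label: length byte plus text

def count_records(msg: bytes):
    """Return (claimed, found): records the header promises vs. records that parse."""
    if len(msg) < 12:
        raise ValueError("shorter than a DNS header")
    _id, _flags, qd, an, ns, ar = struct.unpack("!HHHHHH", msg[:12])
    pos = 12
    for _ in range(qd):
        pos = skip_name(msg, pos) + 4   # question: name, then QTYPE and QCLASS
    claimed = an + ns + ar
    found = 0
    try:
        for _ in range(claimed):
            pos = skip_name(msg, pos)
            if pos + 10 > len(msg):     # TYPE, CLASS, TTL, RDLENGTH = 10 bytes
                raise ValueError("record header cut off")
            rdlength = struct.unpack("!H", msg[pos + 8:pos + 10])[0]
            pos += 10 + rdlength
            if pos > len(msg):
                raise ValueError("record data cut off")
            found += 1
    except ValueError:
        pass                            # stop at the first record that does not fit
    return claimed, found
```

When `found` comes back lower than `claimed`, the packet ends before the records it promised, which matches what the outside sniffer showed.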

Part of the reason for the truncation was that Google Mail was returning 11 mail hosts and the DNS was hosted on 5 DNS servers. This combination was big enough to overflow a 512-byte DNS reply. Sometimes too much redundancy can be a bad thing.

What also made this interesting was how the DNS clients were completing the DNS request when the first reply was returned malformed. Both the Windows DNS server and Windows XP NSLOOKUP clients would open a TCP connection to the DNS server after the malformed UDP packet was received. Since the PIX blocked the malformed UDP reply, the server would never try TCP.
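Here is a rough Python sketch of that UDP-then-TCP behavior. It assumes the retry is triggered by the TC (truncated) bit in the reply header, which is the standard DNS signal for "ask again over TCP"; I did not confirm that this is exactly what the Windows clients key on. The function names are hypothetical, and the transports are passed in as parameters so the logic can be exercised without a network.

```python
import socket
import struct

def is_truncated(reply: bytes) -> bool:
    """True if the TC bit is set, or the reply is too short to have a header."""
    if len(reply) < 12:
        return True
    flags = struct.unpack("!H", reply[2:4])[0]
    return bool(flags & 0x0200)         # TC is bit 0x0200 of the flags word

def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from a stream socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed early")
        buf += chunk
    return buf

def send_udp(query: bytes, server: str, timeout: float = 3.0) -> bytes:
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(query, (server, 53))
        reply, _addr = s.recvfrom(4096)
        return reply

def send_tcp(query: bytes, server: str, timeout: float = 3.0) -> bytes:
    with socket.create_connection((server, 53), timeout=timeout) as s:
        # DNS over TCP prefixes each message with a 2-byte length.
        s.sendall(struct.pack("!H", len(query)) + query)
        length = struct.unpack("!H", recv_exact(s, 2))[0]
        return recv_exact(s, length)

def resolve(query: bytes, server: str, udp=send_udp, tcp=send_tcp) -> bytes:
    reply = udp(query, server)          # a dropped reply raises a timeout here...
    if is_truncated(reply):
        return tcp(query, server)       # ...so this retry is never reached
    return reply
```

The important detail is in `resolve`: the TCP retry only happens after a UDP reply has been seen. If a firewall silently drops that reply, the client just times out and the fallback never gets its chance.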

One workaround was to disable DNS Fixup on the PIX, allowing the malformed packet to traverse the PIX. After the malformed DNS reply traversed the PIX, the Windows 2003 DNS server would then open a TCP connection, retry the request, and receive a valid DNS reply. Mail flowed again and there was some rejoicing.

Looks like it may be coming to the time where I have to replace my old PIX 501 running 6.3 code with something a little newer. Also, if the kind people at DynDNS.com or Enom would locate the problem with their DNS servers (that’s where Doug’s DNS was being hosted), it would be appreciated by all of us on the Internet. Again, for those search engines out there, the DNS servers were:

dns1.name-services.com internet address = 98.124.192.1
dns2.name-services.com internet address = 216.52.184.248
dns3.name-services.com internet address = 98.124.193.1
dns4.name-services.com internet address = 69.64.145.225
dns5.name-services.com internet address = 70.42.37.7

Hope this helped
BK