A late night tale of DNS
DNS has always been my adversary. Ever since I first started learning about Internet protocols, it has given me grief. Up until now, I thought we had a truce. I found a great book on DNS, wrote a few articles, and set up some nice production and testing servers. Life was good. Until tonight.
I was using my DDNS Script to add a simple subdomain to terrarum.net. The output of the script was not what I expected -- meaning, it didn't work correctly. I had used the script successfully on another server earlier, so I knew there wasn't a problem with the code. Debugging, I found the problem: though the subdomain didn't exist, the script was saying it did.
With an interactive Python session, I found the culprit:
$ python
>>> import dns.resolver
>>> a = dns.resolver.query('nonexistant.terrarum.net','a')
>>> for r in a:
...     print r
...
67.15.129.30
>>>
$ host 67.15.129.30
30.129.15.67.in-addr.arpa domain name pointer ev1s-67-15-129-30.ev1servers.net.
If you copy and paste 67.15.129.30 into a web browser, it will take you to http://net.net. net.net: the slimy .net TLD wildcard domain.
Here's how it works: Sometimes when resolving a domain name, software will append a .net thinking the user just forgot to type it. If, by chance, the domain name already has a .net at the end, it now becomes .net.net. .net.net has a wildcard system set up that will resolve any domain ending in .net.net. If this is all being done in a web browser, you'll be taken to their webpage.
Now that I found the reason behind my subdomain problem, I needed to figure out why I was getting this reaction from one server and not the other. The difference between the servers was major: the problematic one is Debian Stable while the working one is Debian Testing.
I checked to see if both were using the same Python DNS library -- they were. I checked for any problems with DNS resolution such as root servers or other nameserver differences. No problems there. In a major move, I decided to upgrade to the Debian Testing version of BIND9. The problem was still there. Finally, I decided it was time to comb through the code.
To paraphrase this part (because talking about code debugging is so exciting), I found the problem on lines 274 and 478 of the resolver.py file:
274: self.domain = dns.name.Name(dns.name.from_text(socket.gethostname())[1:])
478: qnames_to_try.append(qname.concatenate(self.domain))
The first line asks the system for its hostname with socket.gethostname() (on Debian, that value comes from the /etc/hostname file), drops the first label, and stores the rest in the variable self.domain. In my case, my /etc/hostname file was set to terrarum.net and the portion it stored was .net. Next, on line 478, it appends the self.domain variable (.net) to the domain name it's trying to resolve. So terrarum.net now becomes terrarum.net.net. This only causes trouble with domain names ending in .net, since those are the only ones that end up under the .net.net wildcard.
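To make the effect of those two lines concrete, here is a minimal pure-Python sketch of the same idea. It is not the library's code: the function names are made up for illustration, and the order in which names are tried (the name as typed first, then the name plus the search domain) is an assumption about how a resolver of this kind behaves.

```python
# Illustrative sketch only; these helpers are hypothetical and are not
# part of the Python DNS library.

def search_domain_from_hostname(hostname):
    """Drop the first label of the hostname and keep the rest."""
    labels = hostname.rstrip(".").split(".")
    return ".".join(labels[1:])


def candidate_names(qname, hostname):
    """Return the absolute names a resolver like this might try, in order."""
    if qname.endswith("."):
        # A trailing dot marks the name as absolute: nothing is appended.
        return [qname]
    candidates = [qname + "."]
    domain = search_domain_from_hostname(hostname)
    if domain:
        # Assumed fallback: retry with the search domain appended.
        candidates.append(qname + "." + domain + ".")
    return candidates


# Hostname set to "terrarum.net": the search domain becomes "net".
print(candidate_names("nonexistant.terrarum.net", "terrarum.net"))
# Hostname set to just "terrarum": there is no search domain to append.
print(candidate_names("nonexistant.terrarum.net", "terrarum"))
```

With the hostname set to terrarum.net, the second candidate is nonexistant.terrarum.net.net., which is exactly the name the wildcard answers for.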
In the end, the problem turned out to be the contents of my /etc/hostname file. This file was something automatically generated by my hosting provider and I never took the time to double check it. Interestingly enough, Debian Stable and Debian Testing use two different versions of the hostname command. The newer one in Testing will not allow you to set the hostname to a name with a domain at the end.
So to fix my problem, I could do two things: simply set the value of /etc/hostname to just terrarum or, as I found out doing some other testing, add a . at the end of any .net domain I'm adding a subdomain for with my DDNS script.
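The second workaround can be applied in code rather than by hand. This is a small sketch, assuming the script handles domain names as plain strings; to_absolute is a hypothetical helper, not something taken from the DDNS script itself.

```python
def to_absolute(name):
    """Return the name with exactly one trailing dot (fully qualified)."""
    return name.rstrip(".") + "."


print(to_absolute("nonexistant.terrarum.net"))
print(to_absolute("nonexistant.terrarum.net."))
```

Passing the dotted form to the resolver marks the name as absolute, so no search domain gets appended to it.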
Events like this are why I dislike DNS so much. I'm fairly confident that no one else runs into DNS issues like this. However, if you do, please share. In the meantime, all truces are off, DNS. It's back to war.
