In a lesson for Registry Operators everywhere, the .me ccTLD appears to have suffered an embarrassingly serious DNSSEC failure in late March, one that could have escalated into a catastrophic outage with the potential to disrupt every domain name in the zone file.
So what happened? Domain names in the .ME zone are signed with Domain Name System Security Extensions (DNSSEC). In a DNSSEC-signed TLD, when an internet user attempts to reach one of the TLD’s domain names, such as by visiting a website, a validating resolver checks the signatures on the zone’s records to confirm that the answer is authentic. DNSSEC is the mechanism by which users can validate that the data they receive from authoritative DNS servers has not been altered between the server and themselves, as it would be in a man-in-the-middle (MITM) attack.
In the case of the .ME failure, responses could not be verified because the zone file signatures, which have a finite life, had expired, with expiry dates listed as 20 March or earlier. This should never happen, but it did because the software responsible for re-signing the zone with the zone signing key (ZSK) and the key signing key (KSK) failed. The two keys work together: the KSK signs the zone’s key set and the ZSK signs the rest of the zone’s records. The signing software should have regenerated those signatures several days before they expired, and it did not. In short, the fact that the signatures didn’t regenerate in time indicates a significant technical failure at the Registry.
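To make the expiry mechanics concrete, here is a minimal sketch of the kind of check a monitoring job could run against a signature’s expiration field. It assumes the timestamp is in the `YYYYMMDDHHmmSS` UTC form used in RRSIG presentation format; the function names, the seven-day refresh window, and the dates in the usage example are illustrative assumptions, not details of the .ME Registry’s systems.

```python
from datetime import datetime, timedelta, timezone

# RRSIG expiration/inception are commonly written as YYYYMMDDHHmmSS in UTC.
RRSIG_TIME_FORMAT = "%Y%m%d%H%M%S"


def parse_rrsig_time(stamp: str) -> datetime:
    """Parse an RRSIG timestamp string into a timezone-aware UTC datetime."""
    return datetime.strptime(stamp, RRSIG_TIME_FORMAT).replace(tzinfo=timezone.utc)


def signature_status(
    expiration: str,
    now: datetime,
    refresh_window: timedelta = timedelta(days=7),  # hypothetical threshold
) -> str:
    """Classify a signature as 'expired', 'refresh-due', or 'ok'."""
    expires_at = parse_rrsig_time(expiration)
    if now > expires_at:
        # Validating resolvers must reject the signature from this point on.
        return "expired"
    if expires_at - now <= refresh_window:
        # Still valid, but the signer should already have replaced it.
        return "refresh-due"
    return "ok"


if __name__ == "__main__":
    # Made-up dates, chosen only to exercise the three outcomes.
    expiry = "20300320000000"
    for day in (1, 15, 25):
        when = datetime(2030, 3, day, tzinfo=timezone.utc)
        print(when.date(), signature_status(expiry, when))
```

An alert on the "refresh-due" state is what gives an operator days of warning rather than discovering the problem only after validation has started to fail.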
The result was that many .ME domain names would have been inaccessible for several days. Websites that weren’t cached would have been unreachable, and email is likely to have been disrupted. For popular websites, content would have been cached but could not be updated. In the world of Registry operations, this is an unacceptable impact on end users.
We don’t know why signatures weren’t refreshed, but we do know that it applied to both the KSK and ZSK, which suggests a system failure, not a one-off procedural failure. It’s also likely that this failure occurred many days prior to its discovery and remediation. (If you are technically inclined, you can review the incident in greater detail at DNSViz.net.)
The outage might have been a key-roll gone bad. The ZSK listed in the screen capture taken during the failure is different to the ZSK used in nic.me today. So the key has certainly been rolled since the incident. The Hardware Security Module (HSM) used for the .ME ccTLD is incredibly quirky, so maybe they lost access to the private key? Maybe re-signing required manual intervention? Without access to the incident report, and in the absence of any official communications, it is hard to understand why it took the Registry so long to detect the issue and identify and remedy the root cause.
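For readers wondering how two keys are told apart in tools such as DNSViz, each DNSKEY record is usually identified by a short key tag derived from its contents. The sketch below implements the key tag calculation from RFC 4034 Appendix B; the helper name `tags_differ` and the sample bytes are made up for illustration and are not real .ME key material.

```python
def key_tag(dnskey_rdata: bytes) -> int:
    """Compute the DNSKEY key tag as described in RFC 4034 Appendix B.

    The input is the DNSKEY RDATA in wire format: flags (2 bytes),
    protocol (1 byte), algorithm (1 byte), then the public key.
    Note: algorithm 1 (RSA/MD5) uses a different rule and is not handled.
    """
    acc = 0
    for index, byte in enumerate(dnskey_rdata):
        # Even-indexed bytes are the high byte of each 16-bit word.
        acc += byte if index & 1 else byte << 8
    acc += (acc >> 16) & 0xFFFF  # fold the carry back in
    return acc & 0xFFFF


def tags_differ(old_rdata: bytes, new_rdata: bytes) -> bool:
    """True if the key tags differ, which means the keys are different.

    Equal tags are NOT proof of equal keys, since tags can collide.
    """
    return key_tag(old_rdata) != key_tag(new_rdata)


if __name__ == "__main__":
    # Toy RDATA: flags=256 (zone key), protocol=3, algorithm=8, 2-byte "key".
    before = bytes([0x01, 0x00, 0x03, 0x08, 0x01, 0x02])
    after = bytes([0x01, 0x00, 0x03, 0x08, 0x01, 0x03])
    print(key_tag(before), key_tag(after), tags_differ(before, after))
```

A changed key tag only shows that the key was replaced at some point; it says nothing about when or why the roll happened.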
When there are significant security incidents, it’s becoming imperative to report them. Governments are increasingly mandating such transparency, and frankly, end users deserve accountability. So the question is: should the domain name industry be open and report such incidents, as we all strive for continual improvement and to support the advancement of internet services? The domain name industry is unique in that it operates a multi-stakeholder model for governance. Sharing such information would help other Registry Operators and backend providers to improve and to learn from the mistakes of others.
Consider the past two industry security incidents: the Afilias Registrar Credentials incident of December 2017, which was grudgingly reported after complaints from Registrars, and the .io ccTLD Registry breach in July 2017, a hack that was luckily self-declared by Matthew Bryant. Sadly, it seems increasingly that some operators in the industry prefer to keep things quiet and hope nobody notices.
Comment was sought from doMEn Ltd but wasn’t available at the time of publication. This article will be updated when comment is available.