CENTR crawler: accuracy in classifying domains

News 30-03-2022

What is the CENTR ‘signs of life’ domain crawler?

Domain crawling is the process of scanning the web content and technical characteristics of a domain name.  Running these scans regularly over millions of domains and multiple TLDs gives us a picture of how the domains managed by CENTR members (as well as others) are used.  The CENTR crawler aims to provide a simple technical analysis of domain usage based on a decision tree approach to classification.  The crawler's results provide standardised data for any TLD, which can be benchmarked and used in aggregated market analysis, including long-term trends.
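To illustrate what a decision tree approach to classification can look like, here is a minimal sketch in Python.  It is not the CENTR crawler's actual logic: the scan fields, the category names, the parking phrases and the 20-word threshold are all assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScanResult:
    """Hypothetical per-domain scan record; field names are illustrative only."""
    resolves: bool               # did a DNS lookup return an address?
    http_status: Optional[int]   # landing page status, None if no web response
    redirects_off_domain: bool   # did the landing page redirect to another domain?
    page_text: str               # visible text extracted from the landing page


# Illustrative phrases only (assumption); real detection would need richer signals.
PARKING_HINTS = ("domain is for sale", "buy this domain", "parked")


def classify(scan: ScanResult) -> str:
    """Walk a fixed sequence of yes/no checks and return the first matching label."""
    if not scan.resolves:
        return "no_dns"
    if scan.http_status is None:
        return "no_web_response"
    if scan.http_status >= 400:
        return "http_error"
    if scan.redirects_off_domain:
        return "redirect"
    text = scan.page_text.lower()
    if any(hint in text for hint in PARKING_HINTS):
        return "parked"
    if len(text.split()) < 20:   # arbitrary threshold chosen for the sketch
        return "low_content"
    return "active"
```

Each branch asks one question about the scan and either assigns a label or passes the domain on to the next check, which is what makes the result easy to explain and to compare across TLDs.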

Why should we care how domains are being used?

Domain names are the core business of CENTR members.  As with any product, it is prudent to have at least a basic understanding of how it is being used in the market.  Domain usage is linked to trust in the TLD and to the financial stability and future prospects of each registry.  The data can also play a role in the strategic decisions of a registry.  For these reasons and others, we consider domain usage to be an important metric in assessing general registry health.

Improving accuracy in the CENTR crawler

An important feature of the CENTR crawler is the ability to detect and classify web pages by the content found on their initial landing pages.  To help with this effort, CENTR has built a tool which allows users to review how domains have been classified and to either validate or suggest a re-classification (see image 1).  To use the tool, go to the Labtools page (members only) and click ‘start the validation’. Non-members can also validate domains using a public version.

Once you have started the validator, you will be presented with a batch of 10 screenshots of webpages the crawler has recently scanned.  For each domain, you can validate the classification, suggest a re-classification, or skip if you are unsure.  The batch of domains you see will favour your own ccTLD (if we have scanned it) to ensure that as much local knowledge as possible is incorporated into the machine learning of the crawler.  You can, however, select any other TLDs (e.g. gTLDs) or a random selection of TLDs based on domains we need more help with.
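As a rough illustration of how a batch might favour a reviewer's own ccTLD, the sketch below fills the batch with home-TLD domains first and tops it up from other TLDs.  The function name `select_batch`, its parameters and the selection rule are assumptions for this example and do not describe how the validator actually chooses domains.

```python
import random
from typing import List, Optional


def select_batch(domains: List[str], home_tld: str, size: int = 10,
                 rng: Optional[random.Random] = None) -> List[str]:
    """Return up to `size` domains, placing those under `home_tld` first."""
    rng = rng or random.Random()
    suffix = "." + home_tld.lower().lstrip(".")
    home = [d for d in domains if d.lower().endswith(suffix)]
    other = [d for d in domains if not d.lower().endswith(suffix)]
    # Shuffle each group so repeated calls surface different domains.
    rng.shuffle(home)
    rng.shuffle(other)
    return (home + other)[:size]


# Example: three .be domains are always included, the rest come from other TLDs.
pool = ["a.be", "b.be", "c.be"] + [f"x{i}.com" for i in range(20)]
batch = select_batch(pool, "be", rng=random.Random(0))
```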

All data from user validations is fed back into the machine learning models of the crawler and helps it continually improve its accuracy.

Start the validation here: https://stats.centr.org/domains#labtools

(Please ensure you are already logged into the CENTR website before clicking on the link above.)

 

Image 1 | Example of a single domain validation

More about the CENTR crawler

The CENTR ‘signs-of-life’ domain crawler runs monthly across 20 member ccTLDs as well as the top 100 gTLDs.  The data gives a useful view of the technical classifications and characteristics of domains, and makes it possible to track trends over time and compare TLDs.  View reporting and analysis here: https://stats.centr.org/domains.

Published By Patrick Myles
Patrick Myles is the Data Analyst of CENTR, the European country code top-level domain association. Patrick is responsible for data management, member surveys and the development and technical maintenance of the CENTRstats data platform.