Fighting Template Sites

Recently, I translated an article about a new product from DeepSee that helps advertisers evaluate publisher quality. Now the company has shared details about one of its initiatives: identifying template sites.

DeepSee.io is a company specializing in detecting and preventing ad fraud. They provide a platform for analyzing and assessing publisher quality, helping advertisers avoid placing ads on low-quality or fraudulent sites.

Template sites typically lack unique design and content, which is characteristic of many made-for-advertising (MFA) sites. They often go unnoticed by advertisers because they get lost among numerous other sites. As creating such sites becomes easier, moderation on any platform becomes more challenging.

Measuring Design Uniqueness

Why do template sites look alike? This question can be answered by analyzing data on the prevalence of specific website design options. Template sites are usually created using a few simple tools like WordPress (the clear leader) and Squarespace. DeepSee's approach to determining design uniqueness is simple but effective: they track the themes and plugins used.

For example, let's take the site daysinncollinsville[.]com.

Among other things, this site uses an old SEO hacker trick. An expired domain with an existing history is purchased, and a template site with the desired content is loaded onto it.

This site is built on WordPress and uses only one additional plugin: gp-premium.

DeepSee found about 2,000 sites that also use only this plugin. This makes the site's lack of uniqueness easy to spot, as most high-quality sites have a unique combination of plugins.

The generatepress theme chosen by the site creators is also very common: it is used on over 40,000 other sites.
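
The theme-and-plugin check described above can be sketched roughly as follows. This is a minimal illustration, not DeepSee's actual implementation. It assumes that a WordPress page references its assets under the conventional wp-content/themes/<slug>/ and wp-content/plugins/<slug>/ paths, and the function names, domains, and sample HTML are hypothetical.

```python
import re
from collections import Counter

# Assumption: WordPress assets are served from the conventional
# wp-content/themes/<slug>/ and wp-content/plugins/<slug>/ paths.
THEME_RE = re.compile(r"wp-content/themes/([A-Za-z0-9_-]+)/")
PLUGIN_RE = re.compile(r"wp-content/plugins/([A-Za-z0-9_-]+)/")


def fingerprint(html):
    """Return (theme, plugins) for one page; theme is None if not found."""
    themes = THEME_RE.findall(html)
    theme = themes[0] if themes else None
    plugins = frozenset(PLUGIN_RE.findall(html))
    return theme, plugins


def sites_sharing_fingerprint(pages):
    """Map each site to the number of sites with the same fingerprint."""
    prints = {site: fingerprint(html) for site, html in pages.items()}
    counts = Counter(prints.values())
    return {site: counts[fp] for site, fp in prints.items()}


# Hypothetical sample data, for illustration only.
sample_pages = {
    "site-a.example": (
        '<link href="/wp-content/themes/generatepress/style.css">'
        '<script src="/wp-content/plugins/gp-premium/menu.js"></script>'
    ),
    "site-b.example": (
        '<link href="/wp-content/themes/generatepress/main.css">'
        '<script src="/wp-content/plugins/gp-premium/nav.js"></script>'
    ),
    "site-c.example": (
        '<link href="/wp-content/themes/custom-theme/style.css">'
        '<script src="/wp-content/plugins/seo-tool/a.js"></script>'
        '<script src="/wp-content/plugins/gallery/b.js"></script>'
    ),
}

shared = sites_sharing_fingerprint(sample_pages)
```

In this toy data set the first two sites share one fingerprint and the third is unique. A high count for a site's fingerprint is the kind of signal the article describes: many sites with exactly the same theme and plugin set.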

Trust in the Author

How often have you visited a low-quality site and seen the author listed as admin?

This is a clear signal of low content quality. Another common pattern is using the site's URL as the author. DeepSee looks for something resembling a real human name, which lazy template creators rarely bother to provide.
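
A byline check of this kind might look like the sketch below. It is a crude heuristic under stated assumptions, not DeepSee's method: the placeholder list and the name pattern are illustrative guesses, and the pattern only handles names written in Latin script.

```python
import re

# Assumption: placeholder bylines commonly seen on template sites.
PLACEHOLDER_AUTHORS = {"admin", "administrator", "editor", "staff", "user"}

# Two to four capitalized words, allowing hyphens, apostrophes, and initials.
NAME_RE = re.compile(r"^[A-Z][A-Za-z'-]+(?: [A-Z][A-Za-z'.-]*){1,3}$")


def looks_like_human_name(author, site_domain=""):
    """Return True if the byline resembles a personal name."""
    cleaned = " ".join(author.split())  # normalize whitespace
    lowered = cleaned.lower()
    if not cleaned or lowered in PLACEHOLDER_AUTHORS:
        return False
    # The byline is the site's own domain or URL.
    if site_domain and site_domain.lower() in lowered:
        return False
    return bool(NAME_RE.match(cleaned))
```

For example, `looks_like_human_name("Jane Doe")` passes, while `"admin"` and a byline equal to the site's own domain are rejected. A plausible-looking name can still be invented, so this check is best treated as one weak signal among several.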

Detecting Low-Quality Content

Many are interested in detecting low-quality content with artificial intelligence, but current AI implementations cannot reliably perform this task on their own. That does not mean AI cannot serve as one of the tools for identifying such sites! DeepSee is inspired by the report "Generative Models are Unsupervised Predictors of Page Quality: A Colossal-Scale Study", released in 2020, which showed that classifiers built to find machine-generated text can also detect low-quality content.

The report explores the use of large generative language models, such as GPT-2, for assessing the quality of web pages. The study demonstrates that classifiers trained to distinguish human-written text from machine-generated text turn out to be unsupervised predictors of page quality: they can flag low-quality content without any quality-specific training, which makes it possible to bootstrap quality indicators quickly with limited resources.

The authors conducted a qualitative and quantitative analysis of over 500 million web articles, which they describe as the largest study on the topic. The main conclusion: such models can effectively distinguish between high-quality and low-quality content, which can improve content filtering and evaluation algorithms on the internet.

In their report, they identified four main types of content where AI is most prevalent:

  • Text translations
  • Essay farms
  • SEO sites
  • NSFW content

NSFW (Not Safe For Work) is a term used to denote content that should not be viewed in public places or at the workplace. Such content may include explicit materials, violent images, coarse language, or other elements inappropriate for public viewing.

DeepSee actively uses this method to identify AI-generated content, which also helps in the overall assessment of publisher quality.
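
One way per-page detector scores could feed a publisher-level assessment is sketched below. Everything here is an assumption made for illustration: the detector is a hypothetical callable that returns the probability that a text is machine-generated, the 0.5 threshold is arbitrary, and the source does not describe how DeepSee actually aggregates its results.

```python
from statistics import mean


def site_machine_text_summary(pages, detector, threshold=0.5):
    """Summarize detector scores over one site's pages.

    `detector` is a hypothetical callable mapping page text to a
    probability in [0, 1] that the text is machine-generated.
    Returns None when there are no pages to score.
    """
    scores = [detector(text) for text in pages]
    if not scores:
        return None
    flagged = sum(score >= threshold for score in scores)
    return {
        "mean_score": mean(scores),
        "flagged_share": flagged / len(scores),
    }


# Stand-in scores for demonstration only; a real detector would be
# a trained classifier, not a lookup table.
demo_scores = {
    "page one": 0.9,
    "page two": 0.2,
    "page three": 0.7,
    "page four": 0.1,
}
summary = site_machine_text_summary(list(demo_scores), demo_scores.get)
```

With the stand-in scores above, two of the four pages meet the threshold, so half of the site's pages are flagged. Keeping the detector as a parameter means the aggregation logic does not depend on any particular model.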

Impact on the Market

Identifying and analyzing template sites, as DeepSee does, plays an important role in ensuring the quality of advertising campaigns. Careful evaluation of design uniqueness, author trust, and content quality helps advertisers choose platforms for their campaigns more wisely. Services like this make the market more transparent and efficient, allowing advertisers to focus on high-quality platforms and avoid fraud.
