AI Licensing Comparison: RSL vs. Pay-Per-Crawl

Yesterday, a group of internet companies and publishers announced the launch of a new standard for content licensing: Really Simple Licensing (RSL).
The coalition includes big names such as Reddit, Yahoo, Medium, Quora, Ziff Davis, The MIT Press and more. The effort is aimed squarely at AI companies and seeks to compel them to pay to license content for AI training.
If this sounds familiar, it’s because, in July, Cloudflare announced its own scheme to monetize AI crawlers, called Pay-Per-Crawl.
However, while both systems aim to compel AI companies to pay for crawling websites, they differ significantly in how they go about it.
Both systems have their advantages and drawbacks. To weigh them, we first need to examine RSL and understand how it works.
It’s a system that is both surprisingly simple in concept and complex in execution. Even though the word “simple” is in the name, and it is simple for publishers, implementing this will be a complicated challenge for the RSL Collective.
The Basics of RSL
To fully understand RSL, it is essential to understand who is behind it. One of the three leaders is Eckart Walther, who co-created the Really Simple Syndication (RSS) standard. RSS is the standard used by blogs, news sites, forums, and other online services to provide updates to third parties, including RSS readers.
For example, you can subscribe to this site via RSS. Even the email newsletter relies on RSS: it’s how my site informs Mailerlite, my newsletter provider, of new articles.
The goal of RSL is to create something similar, but instead of syndicating content, it offers a way to license it.
For the publisher, implementing RSL is straightforward. You join the RSL Collective, which is free, and then add the relevant license text to your robots.txt file. You can choose to license your content for a royalty fee, require attribution or create your own custom license.
The process even allows companies to license nonpublic and proprietary content, including paywalled material.
The RSL Collective, through a series of APIs, will handle the billing and reporting of any collected license fees. In short, for the publisher, this is meant to be a “set and forget” system.
In that regard, RSL is very similar to RSS. Once implemented, it requires no further attention from the publisher. However, RSS asks very little of the third parties that consume it, which is not true of RSL. That, in turn, may be its most significant limitation.
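To make the robots.txt step described earlier in this section more concrete, here is a minimal sketch of how a crawler might discover a publisher's license. It assumes the license is advertised through a `License:` line that points at a license file. The directive name, the example URL and the helper `find_license_urls` are illustrative assumptions, not details confirmed in this article, so check the RSL specification before relying on them.

```python
def find_license_urls(robots_txt: str) -> list[str]:
    """Return the value of every `License:` line in a robots.txt body.

    The `License` directive name is an assumption made for illustration.
    """
    urls = []
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        field, value = line.split(":", 1)
        # Field names in robots.txt are matched case-insensitively.
        if field.strip().lower() == "license":
            urls.append(value.strip())
    return urls


# Hypothetical robots.txt for a publisher; the URL is a placeholder.
SAMPLE_ROBOTS_TXT = """\
# Licensing terms for automated clients
License: https://example.com/license.xml

User-agent: *
Allow: /
"""

print(find_license_urls(SAMPLE_ROBOTS_TXT))
```

Note that nothing here happens unless the crawler chooses to run code like this. The publisher only publishes the pointer; reading and honoring it is left to the other side.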
Significant Challenges, Bigger Problems
The biggest challenge that RSL will face as a standard is getting AI companies on board with it.
Simply put, there is no mechanism for enforcement. Since RSL works through robots.txt, AI companies can simply ignore it. As we have seen before with ChatGPT and this site, this is something they do on a regular basis.
This marks the biggest difference between RSL and Cloudflare’s Pay-Per-Crawl system. Pay-Per-Crawl does not rely on robots.txt and will actively block bots that do not pay the toll. RSL, on the other hand, requires AI companies to comply with what is fundamentally an optional standard.
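The difference between an advisory rule and an enforced one can be sketched in a few lines. The first function uses Python's standard `urllib.robotparser` to show the check a well-behaved crawler runs voluntarily. The second is a toy server-side gate that refuses unpaid requests. The gate is not Cloudflare's implementation: the bot name `ExampleAIBot`, the header `x-crawl-payment-token` and the use of HTTP 402 here are all assumptions chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that asks one AI crawler to stay away.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /
"""


def polite_crawler_may_fetch(user_agent: str, url: str) -> bool:
    """Advisory check: it only binds crawlers that choose to call it."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


def gate_response_status(headers: dict[str, str]) -> int:
    """Enforced check: the server runs it on every request.

    The header name and token value are made up for this sketch.
    """
    if headers.get("x-crawl-payment-token") == "paid":
        return 200  # OK, content is served
    return 402  # Payment Required, content is withheld


url = "https://example.com/article"
print(polite_crawler_may_fetch("ExampleAIBot", url))
print(gate_response_status({}))
print(gate_response_status({"x-crawl-payment-token": "paid"}))
```

A crawler that never calls the first function faces no consequence from robots.txt itself, which is the enforcement gap described above. The second function cannot be skipped, because it runs on the publisher's side of the connection.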
Both systems share a common weakness in that bots can crawl the same material on other websites. If your content is on a third-party site, even without your permission, they can still access it, regardless of whether you use Pay-Per-Crawl or RSL.
That said, the technical measures of the Pay-Per-Crawl system may provide greater legal protection, as bypassing it could amount to circumventing a copyright protection system.
But this doesn’t mean that Pay-Per-Crawl is the better system. First, it requires that you use Cloudflare’s content delivery network. RSL is significantly more open and can work on any platform that supports the robots.txt standard.
Second, RSL has backing from major websites and publishers, most notably Reddit. Some of these companies are already in lawsuits with AI companies or have reached licensing agreements with them. These are publishers who are already at the opposite end of the table from AI companies and are in a good position to push the standard.
Personally, I think Pay-Per-Crawl is much better from a technical standpoint, but RSL is much better from a practical and diplomatic standpoint. Pay-Per-Crawl is a powerful tool created by a single, albeit influential, company, while RSL is a broader diplomatic effort bringing together a variety of stakeholders.
Bottom Line
One of the things I find most interesting about RSL is its connection to RSS. Back in the 2000s and early 2010s, RSS scraping was one of the most common methods of stealing content. Though the approach fell out of favor after Google updates in 2011, it was not until 2021 that we got legal clarity on the issue in the United States.
RSS, perhaps unfairly, became synonymous with content theft and scraping for many webmasters.
Now, all these years later, Walther is introducing a new standard that he hopes will help protect content and ensure payment to creators from AI companies.
But, while I think the idea is a good one, without buy-in from AI companies, it will not succeed. The pitch to publishers is straightforward. The pitch to AI companies is less so.
While the RSL Collective claims that this can provide AI companies with greater legal certainty, the current legal climate raises doubts about this. The early fair use rulings appear to suggest that piracy, rather than crawling and training, is the primary legal concern.
This makes a technological solution like Pay-Per-Crawl much more appealing, as it works regardless of what the courts find.
Ultimately, I believe a combination of these approaches may be necessary. Rules without enforcement have little meaning. But enforcement without open standards and broad implementation will be limited.
