
Reddit Blocks Internet Archive Over AI Scraping Concerns

After learning that AI companies were using the Internet Archive’s Wayback Machine to collect Reddit data, Reddit restricted the Wayback Machine’s access to its site. The restriction prevents the Wayback Machine from indexing post details, user profiles, and comments.

As a result, the Wayback Machine will be limited to archiving the Reddit homepage, which offers only a record of trending headlines on a given day. Reddit spokesperson Tim Rathschmidt explained: “Internet Archive provides a service to the open web, but we’ve been made aware of instances where AI companies violate platform policies, including ours, and scrape data from the Wayback Machine.”

The Internet Archive’s mission is to preserve digital history by archiving web pages and cultural artifacts. Its Wayback Machine allows anyone to revisit sites as they appeared in the past. Reddit, by contrast, holds that significant portions of its data should not be freely archived, in order to prevent misuse.

According to Rathschmidt, Reddit will restrict access until the Internet Archive can better defend against large-scale scraping and comply with content policies, such as respecting deletions and user privacy.

This move is the latest in Reddit’s ongoing crackdown on unauthorized scraping by AI developers.

In 2024, Reddit struck a deal with Google, giving the search giant access to Reddit data for both search and AI training. The year before, Reddit had limited its API, citing misuse by AI model developers. Because of that change, many popular third-party Reddit client apps were forced to shut down, raising concerns among users.

By blocking the Internet Archive, Reddit is extending this policy of limiting “free” scraping while reserving access for paid partnerships.

The Internet Archive has long positioned itself as a guardian of the open web, so Reddit’s restrictions are a significant blow to researchers and users who rely on historical archives of online discussions.

Mark Graham, director of the Wayback Machine, acknowledged Reddit’s actions in a statement: “We have a longstanding relationship with Reddit and continue to have ongoing discussions about this matter.”

The restriction has consequences for both Redditors and the wider internet community:

  • Archived Reddit conversations, often used in research, may no longer be accessible.
  • Content removed from Reddit will be harder to track historically.
  • Users may gain more assurance that deleted posts or profiles won’t live on in the Wayback Machine indefinitely.

Dieudonné
Reviewing gadgets and apps. Mail: quarmecaptainn@gmail.com
