News organizations block AI crawler bots amid copyright concerns

IT World Canada Staff

2 years ago

Twenty per cent of the world’s top 1,000 websites now block AI crawler bots, which collect web data for AI services, according to Originality.AI.

The move comes after OpenAI introduced its GPTBot crawler in early August. GPTBot was designed to collect data from the web to improve future AI models. However, many websites, including the New York Times, Reuters, and CNN, have blocked it over concerns that it could be used to scrape their content without permission.
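The article does not say how these sites block the crawler. A common mechanism, and one OpenAI says GPTBot respects, is a robots.txt rule naming the crawler's user agent. The sketch below assumes that approach; the sample file contents and the `is_blocked` helper are hypothetical illustrations, not taken from any site named above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: disallow GPTBot everywhere, allow all other crawlers.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_blocked(robots_txt: str, user_agent: str, path: str = "/") -> bool:
    """Return True if the given robots.txt text disallows user_agent from path."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for agent in ("GPTBot", "Googlebot"):
        print(agent, "blocked:", is_blocked(SAMPLE_ROBOTS_TXT, agent))
```

A survey like the one described would presumably fetch each site's live robots.txt and run a check of this kind. Note that robots.txt is a voluntary convention, so it only stops crawlers that choose to honour it.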

The blocking of AI crawler bots is a sign of the growing tension between websites and AI companies over the use of data. While AI companies argue that they need to collect data to train their models, websites are concerned about protecting their content and intellectual property.

According to Originality.AI, the share of the top 1,000 websites blocking OpenAI’s GPTBot rose from 9.1 per cent on August 22 to 12 per cent on August 29. The situation is further complicated by the lack of clear legal guidelines governing the use of AI crawler bots. As a result, websites are taking matters into their own hands by blocking these bots.

The sources for this piece include an article in Axios.