Date: 2023-08-25

Major news outlets are rushing to block OpenAI from scraping information from their websites, content the company has previously used to better train its artificial intelligence software.

Reuters made one of the first moves against OpenAI, blocking its so-called "crawler" tool from its site on Aug. 8, according to Originality AI, an AI analysis website. Other organizations followed Reuters, including CNN, the Australian Broadcasting Corporation and the New York Times. In addition, Business Insider and German outlet Die Welt have put up blocks, though Politico, which shares the same parent company, Axel Springer, hasn't done the same.

OpenAI and its viral AI chatbot ChatGPT ushered in an AI arms race fueled by data pulled from the web. OpenAI gathers that data with a crawler, GPTBot, which works much like the crawlers search engines use.



Websites use a file called robots.txt to tell crawlers where they can go on the site and what information they can take from it. In the case of Google, that means a website can tell the search engine which pages to list in search results and which to leave out. The same logic can now be applied to GPTBot.
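As a rough illustration of how such a block works, the sketch below uses Python's standard-library robots.txt parser to check a made-up file that shuts out GPTBot while leaving most of the site open to other crawlers. The file contents, the news.example.com address and the can_fetch helper are assumptions for the example, not any outlet's actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example news site (not any outlet's real file).
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def can_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to crawl url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Compare how the same rules treat GPTBot and a search engine crawler.
    for agent in ("GPTBot", "Googlebot"):
        for path in ("/news/story.html", "/private/drafts.html"):
            url = "https://news.example.com" + path
            print(agent, path, can_fetch(agent, url))
```

Under these example rules GPTBot is turned away from every page, while a search crawler is kept out only of the /private/ section. The file is a request rather than an enforcement mechanism: it works only when the crawler chooses to honor it.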



Google Search recently updated its privacy policy, giving itself the right to aggregate and use publicly available information to train its AI tools.



But some content creators complain that AI models feed off their work without permission, citation or compensation, leading to a number of class-action copyright lawsuits against OpenAI and Google and an FTC investigation.



In early August, the New York Times updated its terms of service to block companies from scraping its content to build AI models.



It’s unclear how restricting web crawlers might impact the accuracy of answers from AI chatbots like ChatGPT and Google’s Bard.