Generative AI is not just reshaping the internet; some fear it may be killing the internet as we know it. In 2024, some of those fears are materializing.
Researchers have warned that models could poison themselves by training on other AIs' output, and some evidence of that decay is already emerging.
Elon Musk's chatbot Grok has reportedly produced responses that were obviously written by an OpenAI model, exposing that Grok had absorbed ChatGPT output in its training data.
Grok is not the only place OpenAI responses surface. Amazon has been flooded with product listings whose names are, oddly, OpenAI policy error messages.
This has some researchers predicting what they call "model collapse." As AIs scrape more web data, including one another's made-up text, they lose touch with reality. Output quality declines, and that degraded output is then absorbed into the training of other models. With models continuing to consume and generate data at this scale, there may be no reversing the damage.
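To make the feedback loop concrete, here is a toy sketch (my illustration, not drawn from any specific study): a trivial "model" is trained each generation by sampling, with replacement, from the previous generation's output. Rare items drop out of the data and never return, so diversity shrinks generation after generation, a crude stand-in for how detail and nuance can erode when models learn from other models.

```python
# Toy illustration of the "model collapse" feedback loop (an assumption-laden
# sketch, not a real training pipeline): each generation is "trained" only on
# the previous generation's output.
import numpy as np

rng = np.random.default_rng(42)

# Generation 0: a "human-written" corpus with 1,000 distinct phrases.
data = np.arange(1_000)

for generation in range(1, 11):
    # Train the next model on the previous model's output: sample with replacement.
    data = rng.choice(data, size=data.size, replace=True)
    distinct = np.unique(data).size
    print(f"generation {generation:2d}: {distinct} distinct phrases remain")

# Each pass permanently loses some phrases; the corpus drifts toward a
# narrower, more repetitive version of the original.
```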
Even when the output is inaccurate, AI models cannot reliably distinguish human-written content from AI-generated content. As long as that is true, models will keep soaking up machine falsehoods.
In the end, last year's AI promise may become this year's unreliability nightmare. The technology could generate and spread AI-written spam and lies faster than humans can detect and correct the problem.
There are two factors that might mitigate the impact of inaccurate data. First, Sam Altman has hinted at developments that might allow AI models to be trained on synthetic data, which could, presumably, be subject to greater quality control. Second, there is Retrieval-Augmented Generation (RAG), which leverages a model's conversational ability and understanding while restricting it to reliable information sources, as the sketch below illustrates.
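As a rough sketch of the RAG idea (my own minimal example, not any vendor's implementation; the document list and scoring are placeholders): relevant passages are retrieved from a trusted store and stitched into the prompt, so the model is instructed to answer only from those sources rather than from whatever it absorbed on the open web.

```python
# Minimal Retrieval-Augmented Generation sketch (assumed, simplified setup):
# retrieve trusted passages, then build a prompt that grounds the model in them.
from collections import Counter

# Hypothetical "reliable sources" the model is restricted to.
TRUSTED_DOCS = [
    "The 2024 product launch was announced on March 5 by the company.",
    "Retrieval-Augmented Generation pairs a language model with a document store.",
    "Model collapse describes quality loss when models train on AI-generated text.",
]

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by naive word overlap with the query and return the top k."""
    query_words = Counter(query.lower().split())
    def score(doc: str) -> int:
        return sum((Counter(doc.lower().split()) & query_words).values())
    return sorted(docs, key=score, reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Assemble a prompt instructing the model to answer only from the sources."""
    context = "\n".join(f"- {d}" for d in docs)
    return (
        "Answer using ONLY the sources below. If they do not contain the answer, "
        "say you do not know.\n"
        f"Sources:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

if __name__ == "__main__":
    question = "What is model collapse?"
    grounded_prompt = build_prompt(question, retrieve(question, TRUSTED_DOCS))
    print(grounded_prompt)  # this prompt would then be sent to the language model
```

In a real deployment the retriever would be a vector or keyword search over curated sources, and the assembled prompt would be passed to the language model; the point is that the model's answers are anchored to vetted material rather than to the open web.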
By seeking deals with news outlets and other reliable information sources, Altman could be signaling one way of preserving the accuracy of AI, even if, at the same time, the internet itself becomes synonymous with misinformation.
Sources include: Analytics India