BOMBSHELL: Major Tech Companies Trained AI with STOLEN YouTube Videos

Major tech companies such as Apple, Nvidia, and Anthropic have been training their AI models on YouTube videos taken without creators’ permission, sourced from the Pile dataset. The controversy has sparked legal battles, with creators such as MrBeast and MKBHD raising concerns over the ethical and legal complexities of using copyrighted material in AI training sets.

The video reveals that major tech companies, including Apple, Nvidia, and Anthropic, have been using YouTube videos taken without permission to train their AI models. The data came from a dataset called the Pile, which contains subtitles from thousands of YouTube videos scraped without the creators’ consent. The dataset is open-source, but it was never supposed to include unauthorized material. Google, which owns YouTube, is particularly concerned, since it regards the platform’s content as a valuable data source for training its own AI models. The investigation found that Silicon Valley companies, including Apple, Nvidia, and Salesforce, had used subtitles from 173,000 YouTube videos across more than 48,000 channels to train their AI.
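For readers curious what "subtitles in the Pile" looks like in practice, here is a minimal sketch of how one might tally the YouTube-subtitle records in a locally downloaded Pile shard. It assumes the Pile's published JSONL layout, in which each record carries a "meta" object whose "pile_set_name" field names the component subset (e.g. "YoutubeSubtitles"); the shard filename is a placeholder, not a real path.

```python
import json
from collections import Counter

# Placeholder path to a locally downloaded Pile shard (JSONL, one record per line).
SHARD_PATH = "pile_shard_00.jsonl"

def count_records_by_subset(path: str) -> Counter:
    """Count records per Pile component subset.

    Assumes each line is a JSON object with a "text" field and a "meta"
    object whose "pile_set_name" names the subset (e.g. "YoutubeSubtitles").
    """
    counts = Counter()
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            subset = record.get("meta", {}).get("pile_set_name", "unknown")
            counts[subset] += 1
    return counts

if __name__ == "__main__":
    counts = count_records_by_subset(SHARD_PATH)
    print("YouTube subtitle records:", counts.get("YoutubeSubtitles", 0))
    print("All subsets in this shard:", dict(counts))
```

This is only an illustration of how a researcher (or a curious creator) could check whether subtitle text appears in such a dataset; it is not how the companies named in the investigation processed the data.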

Popular YouTubers such as MrBeast, MKBHD, and PewDiePie have had their content used without authorization to train AI models. This has raised concerns among content creators who want control over how their work is used. MKBHD, for example, pays for accurate transcriptions of his videos, only to have them scraped and used without permission. The issue extends beyond YouTube: authors, musicians, and other creators have faced similar challenges with their work being used without consent.

The use of stolen data in AI training sets has sparked legal battles, with some creators suing AI companies for copyright infringement. However, companies like Meta, OpenAI, and Bloomberg argue that their actions constitute fair use. The debate over fair use in AI training sets is likely to escalate and reach the highest levels of the US courts. This issue highlights the ethical and legal complexities surrounding the use of data, especially when it involves copyrighted material.

The video also touches on the implications of AI companies using unauthorized data in their models. It raises questions about the fairness of extracting value from creators’ hard work without their permission. It also mentions accusations that AI companies such as Midjourney have trained their models on data taken from movies. The lack of transparency from some AI companies, like OpenAI, about the sources of their training data adds further to the controversy.

Overall, the video delves into the ethical, legal, and practical implications of major tech companies using stolen YouTube videos to train their AI models. It sheds light on the ongoing debate surrounding fair use in AI training sets and the challenges faced by content creators whose work is being used without consent. The issue raises concerns about data privacy, intellectual property rights, and the need for greater accountability and transparency in the AI industry.