Artificial intelligence is coming for YouTube creators

More than 15.8 million YouTube videos from over 2 million channels appear in at least 13 public data sets used to train generative artificial intelligence video tools, often without creators’ permission. Creators and legal advocates are contesting whether such mass downloading and training is lawful or ethical.

An investigation found that tech companies, universities, and research groups have collected at least 15.8 million YouTube videos from more than 2 million channels and placed them in at least 13 public data sets. Nearly 1 million of those videos are how-to clips. Many entries are anonymized, but researchers identified videos by extracting unique YouTube identifiers from the data sets. Among the most represented sources are news and educational channels, with the BBC appearing at least 33,000 times and TED nearly 50,000 times. The downloads are distinct from YouTube’s subscriber download features: videos are being ripped en masse, a practice that violates YouTube’s terms of service. The platform did not respond to requests for comment.
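The identifier-extraction step can be illustrated with a minimal Python sketch. It is not the investigation's actual method: the URL patterns, the assumption that video identifiers are 11 characters long, and the function names `extract_video_ids` and `count_unique_ids` are all hypothetical choices made for this example.

```python
import re
from collections import Counter

# Assumption: video IDs are 11 characters drawn from letters, digits, "-" and "_",
# and appear in common URL shapes (watch?v=, embed/, shorts/, youtu.be/).
# Real data sets may store identifiers differently.
_ID_PATTERN = re.compile(
    r"(?:youtube\.com/(?:watch\?(?:[^\s#]*&)?v=|embed/|shorts/)|youtu\.be/)"
    r"([A-Za-z0-9_-]{11})(?![A-Za-z0-9_-])"
)


def extract_video_ids(records):
    """Return every video ID found in an iterable of text records, in order."""
    ids = []
    for record in records:
        ids.extend(_ID_PATTERN.findall(record))
    return ids


def count_unique_ids(records):
    """Count how often each video ID appears across all records."""
    return Counter(extract_video_ids(records))


if __name__ == "__main__":
    sample = [
        "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
        "clip taken from https://youtu.be/dQw4w9WgXcQ?t=5",
        "a record with no link",
    ]
    print(count_unique_ids(sample))
```

Counting identifiers across several data sets in this way would show how many distinct videos they contain and how often the same video recurs.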

The collected footage is being prepared for training generative artificial intelligence models by splitting videos into short clips and pairing them with English-language captions. Data set creators used view counts, automated models, or human curation to prioritize content described as cinematic or high quality, and curators often avoid videos with overlaid text or logos. Captions are produced either by paid workers or by other models. Companies and research teams that have used or published such data sets include Microsoft, Meta, Amazon, Nvidia, Runway, ByteDance, Snap, and Tencent. Meta, Amazon, and Nvidia responded to inquiries saying they respect creators and view their work as legally usable under current copyright law, while several other companies did not comment.
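As a rough illustration of the clip-and-caption preparation described above, the sketch below cuts a video's duration into fixed-length windows and attaches one caption to each. The fixed clip length, the `Clip` record, and both function names are assumptions made for this example; the article does not describe how any particular data set chose its clip boundaries.

```python
import math
from dataclasses import dataclass


@dataclass
class Clip:
    video_id: str
    start: float   # seconds from the start of the video
    end: float     # seconds from the start of the video
    caption: str   # English-language description of the clip


def clip_windows(duration, clip_length):
    """Split [0, duration] into consecutive windows of at most clip_length seconds."""
    if duration <= 0 or clip_length <= 0:
        return []
    count = math.ceil(duration / clip_length)
    windows = []
    for i in range(count):
        start = i * clip_length
        end = min(start + clip_length, duration)
        windows.append((start, end))
    return windows


def pair_with_captions(video_id, duration, clip_length, captions):
    """Attach one caption per window; extra windows or extra captions are dropped."""
    return [
        Clip(video_id, start, end, caption)
        for (start, end), caption in zip(clip_windows(duration, clip_length), captions)
    ]
```

In practice the captions would come from paid annotators or a captioning model, as the article notes; here they are simply passed in as a list.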

The presence of these videos in training corpora has immediate industry and legal implications. Generative artificial intelligence videos are already competing with human-made content on YouTube, and the article links that shift to earlier disruptions caused by text-generation tools in online publishing. Creators and rights holders have mounted lawsuits and public complaints, including major studio suits against image generators and a recent incident in which a deepfaked TED talk was repurposed in an ad that lost an award and prompted litigation. Developers and platforms are simultaneously building commercial video-generation tools, offering consumer editing and face-swap products, and in some cases paying users to post synthetic content. The uncertainty over whether training on downloaded videos is lawful could reshape creators’ incentives to publish on YouTube and similar platforms.

Artificial Intelligence tool targets forged radiology reports

University at Buffalo researchers developed a detection system aimed at identifying radiology reports generated by artificial intelligence rather than clinicians. The work targets a growing risk of fraud in health care, insurance, and other record-driven industries.

NSF funds teacher training to expand artificial intelligence education nationwide

The U.S. National Science Foundation is awarding $11 million to the Computer Science Teachers Association to train K-12 educators in computer science and artificial intelligence instruction. The multistate initiative is designed to scale classroom-ready teaching capacity and broaden high-quality learning opportunities for students across the country.

NVIDIA DLSS 5 uses 2D frames and motion vectors

NVIDIA has outlined DLSS 5 as a system that takes 2D frames and motion vectors as input, then uses a generative artificial intelligence model to produce its final output. The approach focuses on 2D imagery rather than full 3D scene generation to improve computational efficiency.
