Artificial intelligence training and fair use in the shadow of Geoffrey Hinton

September 30, 2025

A new appeal in Thomson Reuters v. ROSS Intelligence will test whether using copyrighted works to train Artificial Intelligence is fair use. The piece argues that the practice emerged in academia alongside Geoffrey Hinton’s scaling breakthroughs, not in Silicon Valley.

The article contends that the most contentious element of modern Artificial Intelligence is its reliance on training models with copyrighted works without permission, often at massive scale. The author argues that this practice, framed as the technology’s alleged “original sin,” began in academia rather than in Silicon Valley, and lays out a historical and technical context that, in the author’s view, supports a fair use defense. That debate is now squarely before the courts: Thomson Reuters v. ROSS Intelligence is the first federal appeal of a decision rejecting fair use in Artificial Intelligence training.

ROSS Intelligence’s founders were students at the University of Toronto, a hub of neural network research led by Geoffrey Hinton, who later received the Nobel Prize. The piece traces today’s training norms to the deep learning breakthrough popularized by Hinton and collaborators, especially the 2012 AlexNet paper, which showed model performance improves as datasets and compute scale. That insight, often summarized as “scaling,” is presented as the technological rationale for exposing models to ever larger and more diverse corpora. Even Chief Justice John Roberts, in his 2023 year-end report, highlighted that Artificial Intelligence fuses algorithms with enormous datasets to solve problems.

As these techniques moved from universities to startups and large platforms, researchers widely used unlicensed materials. The article cites BookCorpus, an early books dataset compiled without author permission, as a source later leveraged in influential systems and papers, including BERT, RoBERTa, OpenAI’s GPT, and XLNet. The author notes there are no U.S. copyright lawsuits against university researchers, but warns that if training is deemed non-transformative, academic labs could face direct or secondary liability, for example under a willful blindness theory. The author disagrees with that view and points to two federal rulings that have called Artificial Intelligence training a highly transformative fair use in cases involving Anthropic and Meta.

In contrast, Judge Stephanos Bibas held that ROSS Intelligence’s training on Westlaw headnotes was neither transformative nor fair use, a decision now on appeal to the Third Circuit. The article urges the appellate court to consider the origins and purpose of large-scale training in the “shadow of Geoffrey Hinton,” treating the use of unlicensed works for model development as transformative when aimed at technological progress with broad public benefits. The forthcoming decision will shape how courts weigh the history, method, and societal value of training data in determining fair use.

