The article contends that the most contentious element of modern Artificial Intelligence is its foundation on training models with copyrighted works without permission, often at massive scale. The author frames this practice as the technology's alleged "original sin," argues that it began in academia rather than in Silicon Valley, and lays out historical and technical context that, in their view, supports a fair use defense. That debate is now squarely before the courts in Thomson Reuters v. ROSS Intelligence, the first federal appeal of a decision rejecting fair use for Artificial Intelligence training.
ROSS Intelligence's founders were students at the University of Toronto, a hub of neural network research led by Geoffrey Hinton, who later received the Nobel Prize. The piece traces today's training norms to the deep learning breakthrough popularized by Hinton and his collaborators, especially the 2012 AlexNet paper, which showed that model performance improves as datasets and compute grow. That insight, often summarized as "scaling," is presented as the technological rationale for exposing models to ever larger and more diverse corpora. Even Chief Justice John Roberts, in his 2023 year-end report, highlighted that Artificial Intelligence fuses algorithms with enormous datasets to solve problems.
As these techniques moved from universities to startups and large platforms, researchers widely used unlicensed materials. The article cites BookCorpus, an early books dataset compiled without author permission, as a source later leveraged in influential systems and papers, including BERT, RoBERTa, OpenAI's GPT, and XLNet. The author notes that no U.S. copyright lawsuits have been filed against university researchers, but warns that if training is deemed non-transformative, academic labs could face direct or secondary liability, for example under a willful blindness theory. The author disagrees with that view and points to two federal rulings, in cases involving Anthropic and Meta, that called Artificial Intelligence training a highly transformative fair use.
In contrast, Judge Stephanos Bibas held that ROSS Intelligence’s training on Westlaw headnotes was neither transformative nor fair use, a decision now on appeal to the Third Circuit. The article urges the appellate court to consider the origins and purpose of large-scale training in the “shadow of Geoffrey Hinton,” treating the use of unlicensed works for model development as transformative when aimed at technological progress with broad public benefits. The forthcoming decision will shape how courts weigh the history, method, and societal value of training data in determining fair use.