Artificial intelligence training and fair use in the shadow of Geoffrey Hinton

A new appeal in Thomson Reuters v. ROSS Intelligence will test whether using copyrighted works to train Artificial Intelligence is fair use. The piece argues that the practice emerged in academia alongside Geoffrey Hinton’s scaling breakthroughs, not in Silicon Valley.

The article contends that the most contentious element of modern Artificial Intelligence is its reliance on training models with copyrighted works without permission, often at massive scale. The author treats this practice as the technology’s alleged “original sin,” argues that it began in academia rather than in Silicon Valley, and lays out a historical and technical context that, in their view, supports a fair use defense. That debate is now squarely before the courts: Thomson Reuters v. ROSS Intelligence is the first federal appeal of a decision rejecting fair use in Artificial Intelligence training.

ROSS Intelligence’s founders were students at the University of Toronto, a hub of neural network research led by Geoffrey Hinton, who later received the 2024 Nobel Prize in Physics. The piece traces today’s training norms to the deep learning breakthrough popularized by Hinton and collaborators, especially the 2012 AlexNet paper, which showed that model performance improves as datasets and compute scale. That insight, often summarized as “scaling,” is presented as the technological rationale for exposing models to ever larger and more diverse corpora. Even Chief Justice John Roberts, in his 2023 year-end report, highlighted that Artificial Intelligence fuses algorithms with enormous datasets to solve problems.

As these techniques moved from universities to startups and large platforms, researchers widely used unlicensed materials. The article cites BookCorpus, an early books dataset compiled without author permission, as a source later leveraged in influential systems and papers, including BERT, RoBERTa, OpenAI’s GPT, and XLNet. The author notes there are no U.S. copyright lawsuits against university researchers, but warns that if training is deemed non-transformative, academic labs could face direct or secondary liability, for example under a willful blindness theory. The author rejects the premise that training is non-transformative and points to two federal rulings, in cases involving Anthropic and Meta, that have called Artificial Intelligence training a highly transformative fair use.

In contrast, Judge Stephanos Bibas held that ROSS Intelligence’s training on Westlaw headnotes was neither transformative nor fair use, a decision now on appeal to the Third Circuit. The article urges the appellate court to consider the origins and purpose of large-scale training in the “shadow of Geoffrey Hinton,” treating the use of unlicensed works for model development as transformative when aimed at technological progress with broad public benefits. The forthcoming decision will shape how courts weigh the history, method, and societal value of training data in determining fair use.
