Artificial intelligence training and fair use in the shadow of Geoffrey Hinton

A new appeal in Thomson Reuters v. ROSS Intelligence will test whether using copyrighted works to train artificial intelligence models is fair use. The piece argues that the practice emerged in academia alongside Geoffrey Hinton’s scaling breakthroughs, not in Silicon Valley.

The article contends that the most contentious element of modern artificial intelligence is its reliance on training models on copyrighted works without permission, often at massive scale. The author argues that this practice, framed as the technology’s alleged “original sin,” began in academia rather than in Silicon Valley, and lays out a historical and technical context that, in their view, supports a fair use defense. That debate is now squarely before the courts in Thomson Reuters v. ROSS Intelligence, the first federal appeal of a decision rejecting fair use for artificial intelligence training.

ROSS Intelligence’s founders were students at the University of Toronto, a hub of neural network research led by Geoffrey Hinton, who later received the Nobel Prize. The piece traces today’s training norms to the deep learning breakthrough popularized by Hinton and collaborators, especially the 2012 AlexNet paper, which showed that model performance improves as datasets and compute scale. That insight, often summarized as “scaling,” is presented as the technological rationale for exposing models to ever larger and more diverse corpora. Even Chief Justice John Roberts, in his 2023 year-end report, highlighted that artificial intelligence fuses algorithms with enormous datasets to solve problems.

As these techniques moved from universities to startups and large platforms, researchers widely used unlicensed materials. The article cites BookCorpus, an early books dataset compiled without author permission, as a source later used in influential systems and papers, including BERT, RoBERTa, OpenAI’s GPT, and XLNet. The author notes that no U.S. copyright lawsuits have been brought against university researchers, but warns that if training is deemed non-transformative, academic labs could face direct or secondary liability, for example under a willful blindness theory. The author rejects the view that training is non-transformative and points to two federal rulings, in cases involving Anthropic and Meta, that have called artificial intelligence training a highly transformative fair use.

By contrast, Judge Stephanos Bibas held that ROSS Intelligence’s training on Westlaw headnotes was neither transformative nor fair use, a decision now on appeal to the Third Circuit. The article urges the appellate court to consider the origins and purpose of large-scale training in the “shadow of Geoffrey Hinton,” and to treat the use of unlicensed works for model development as transformative when it is aimed at technological progress with broad public benefits. The forthcoming decision will shape how courts weigh the history, method, and societal value of training data in determining fair use.
