
From fair use to foul play? Whistleblower’s last stand against OpenAI’s copyright tactics
Whistleblower allegations and mounting legal challenges suggest OpenAI may be straying from its original mission.

Sejal Sharma is IE’s AI columnist, offering deep dives into the world of artificial intelligence and its transformative impact across industries. Her bi-monthly AI Logs column explores the latest trends, breakthroughs, and ethical dilemmas in AI, delivering expert analysis and fresh insights.
W. Mark Felt (Deep Throat), Edward Snowden, and Chelsea Manning. To some, they’re traitors who turned against their countries; to others, they’re heroes who risked everything to expose the truth. They all paid a heavy price for their actions, not in dollars, but in exile and infamy.
Now, an alleged whistleblower from OpenAI, 26-year-old Suchir Balaji, has reportedly paid an even steeper price—his life. According to a police ruling, Balaji died by suicide in his San Francisco apartment on November 26, 2024. While details remain unverified, the allegations and timing have fueled speculation about the pressures he faced.
Balaji was a researcher at OpenAI. He quit the job in August 2024, and shortly after, in October 2024, he gave an interview to The New York Times claiming that his former employer unlawfully collected vast amounts of data from the internet to train its generative AI models—a practice he argued amounted to copyright infringement.
OpenAI has long faced accusations of training its models on copyrighted material without consent. The AI company faces a sizable stack of legal challenges—13 copyright lawsuits in the US, two in Canada, one in Germany, and one in India. Authors, comedians, news publications, and others have accused OpenAI of using their content to train AI models without permission or compensation.
OpenAI has said that publicly available data is “protected by fair use” and that they view this principle as “fair to creators, necessary for innovators, and critical for US competitiveness.” Fair use lets people use copyrighted material without permission for things like criticism, news, or education, as long as it doesn’t harm the original work or its market.
So far, no court has issued an injunction to halt OpenAI’s operations, nor has any definitive ruling declared its data collection methods copyright infringement.
It’s a curious situation where the scales of justice are tipping in the direction of the machine.
Balaji’s exposé of OpenAI
Balaji spent four years working on OpenAI’s AI models, including ChatGPT. At first, he gave little thought to the legal implications of using massive amounts of internet data to train these systems. But after ChatGPT launched in 2022, he began to worry, The New York Times reported.
On the day NYT published the interview with him, Balaji published an essay titled ‘When does generative AI qualify for fair use?,’ in which he challenged the company’s stance by dissecting the four factors traditionally used to determine fair use under US law.
He argued that these systems don’t just copy content—they often compete with it, harming creators and businesses. He also argued that the outputs aren’t original enough to justify the ‘fair use’ claim.
First, is the use transformative enough, or is it simply copying for profit? Second, is the original work creative (more protected under copyright law) or factual (less protected)? Third, how much of the original is being used, and are its most important parts copied?
And finally, does the AI harm the original work’s market, for instance by diverting traffic from websites or replacing paid tools? On most of these counts, Balaji concluded, the factors weigh against fair use.
“None of the four factors seem to weigh in favor of ChatGPT being a fair use of its training data. That being said, none of the arguments here are fundamentally specific to ChatGPT either, and similar arguments could be made for many generative AI products in a wide variety of domains,” he said in the essay.
Balaji also warned that the increasing prevalence of AI-generated content could erode the reliability of the internet itself, potentially leading to more inaccuracies and misinformation. What’s the solution? “The only way out of all this is regulation,” Balaji told the NYT.
Some lawsuits against OpenAI have already been dismissed. In early November, a federal judge in New York threw out a suit brought by Raw Story and AlterNet, which claimed their articles were used without permission to train ChatGPT.
And then there is the classic ‘if you can’t beat them, join them’ scenario playing out. Several news organizations have taken OpenAI to court over the use of their articles for training; others, such as the Associated Press, have instead signed deals with OpenAI to license their content for AI training.
Balaji addresses these licensing deals in his essay: “It’s unclear why these agreements would be signed if training on this data were fair use.”
Copyright law falls short in shielding original publishers from big tech
Balaji was not the only figure raising concerns. A prominent figure in AI, Ed Newton-Rex, resigned last year from his role at Stability AI over similar disagreements about using copyrighted works to train generative AI models. He wrote that he couldn’t support the company’s stance that such use falls under ‘fair use.’
In the case of OpenAI, though, it gets murkier. Engineers working for OpenAI have also “accidentally” erased potential evidence in a copyright lawsuit filed by the NYT and Daily News.
The NYT, in its filing, said the data that the newspaper’s team had spent over 150 hours collecting as potential evidence was erased. Although OpenAI managed to recover much of the data, the NYT’s legal team said the original file names and folder structure are still missing. This makes it impossible to track where the newspaper’s copied articles may have been used in OpenAI’s AI models.
The lack of regulation isn’t helping anyone, least of all whistleblowers. No law currently shields people like Balaji and Newton-Rex—letting them report wrongdoing inside companies while protecting them from retaliation by those same companies. While this is not to speculate about the cause of Balaji’s death, it underscores how little support he likely had in such a challenging situation.
A missed opportunity lay in California’s AI bill SB 1047, which included whistleblower protections but was ultimately vetoed amid concerns over enforcement, the balance between innovation and regulation, and a heavy lobbying push from big tech to derail it.
The current lack of regulation and oversight means that tech giants like OpenAI can operate with little accountability, reaping the rewards of their AI innovations while sidestepping their responsibilities toward the creators whose works feed into these systems.
The ethical misconduct alleged by whistleblowers like Balaji and the mounting legal challenges suggest that OpenAI may be drifting away from its original mission—to develop AGI openly and beneficially for humanity. It is now on both the company and regulators to confront these issues head-on, ensuring that AI innovation does not come at the cost of creators’ rights and societal equity.
Balaji’s revelations mark a significant turning point in exposing the copyright infringement practices of big tech companies, ensuring that his courage and dedication will not be forgotten.
Sejal is a Delhi-based journalist, currently dedicated to reporting on technology and culture. She is particularly enthusiastic about covering artificial intelligence, the semiconductor industry and helping people understand the powers and pitfalls of technology. Outside of work, she likes to play badminton and spend time with her dogs. Feel free to email her for pitches or feedback on her work.