- Stability AI, the startup that makes the popular AI art tool Stable Diffusion, faces two lawsuits.
- They allege the company infringes on copyrights by scraping the web to train its art algorithms.
- Tech insiders say any loss would influence the ability of AI startups to get the data they need.
An artificial-intelligence company is facing two lawsuits that may change how generative-AI startups source their training material.
Stability AI, a London startup founded in 2019 that uses the tagline “AI by the people, for the people” on its website, produces free and open-source software tools that can be used to create art, music, or pretty much anything else that used to be solely the province of humanity.
The company’s best-known product is the controversial Stable Diffusion (also known as DreamStudio to users). Enter text into a search bar, and Stable Diffusion will, for lack of a better word, draw an image to match, right on the spot. Ask for a woman holding a puppy in the rainforest, and the AI will produce images of exactly that (though the people it generates are almost always white by default, an issue for another day).
The issue here is that Stable Diffusion trains its system by scraping images and artwork from the web, essentially copying and pasting images made by human artists into its vast databases of training material. It’s a cool tool. But it’s courted backlash, especially from artists who say they were neither asked nor compensated for the rights to use their art in this way.
Now a group of tech insiders say the two lawsuits against Stability AI — one from a group of artists and one from the photo-licensing giant Getty Images — could hurt not only the company but also generative AI in general.
In short, a loss for Stability in these suits would negatively influence the gathering of training material for the company and its peers. If Stability won, it would underscore the wild, new world of copyrights in the AI era.
Tech insiders who spoke with Insider said Stability AI, a company founded by CEO Emad Mostaque, would have to prove two things to the courts: that its AI image generator does not reproduce copied work but transforms the source images into something new (even though there’s a strong possibility that it does copy), and that its web-scraping technology collected those images legitimately.
What’s old is new again
Stability AI released Stable Diffusion in August, a time when the generative-AI market was starting to heat up. Two months later, the company raised $101 million in a seed round led by Coatue Management, Lightspeed Venture Partners, and O’Shaughnessy Ventures at a valuation of $1 billion.
Getty’s suit alleges Stability AI chose a backdoor to scrape the images it hosted instead of paying for access like everyone else. Likewise, the group of artists says Stability AI, along with Midjourney (another AI image generator) and DeviantArt (an online art community), did not seek any type of contractual agreement with creators before using copyrighted artwork for commercial gain.
Stability AI’s legal issues are reminiscent of the early days of Google Search, Kieran McCarthy, a tech attorney at McCarthy Law Group, said.
About the time Google went public in 2004, the then-Silicon Valley startup was being hit with copyright-infringement suits over the thumbnail images in its search results. The courts ruled that Google’s thumbnail images were fair use because they were “sufficiently transformative,” McCarthy said.
“Google is obviously copying pictures of famous artworks and posting it in search results,” he said.
But the way the image is produced in the search results was deemed “transformative” and therefore fair use.
McCarthy anticipates that Stability AI will turn to the same argument.
“If you are the attorneys for the defendants in this case — and likely in other scraping cases that are likely to come from other AI technologies in the future — I think you’re going to lean on that transformative argument,” he told Insider. “You’re going to say, ‘What we’re doing is a different work.'”
Other generative-AI tools are going to have similar things to prove: In November, Matthew Butterick, the attorney representing the group of artists in the case against Stability AI, launched a class-action lawsuit against GitHub and its parent company, Microsoft, alleging the company used code stashed on the platform to train an AI programming assistant known as Copilot without giving attribution to the code’s original authors.
It’s a matter of how you got it
To make Stable Diffusion work, Stability AI scrapes data from around the internet to build a robust library of images for training its AI systems. While it has never disclosed publicly what exactly those sources are, many artists and creatives are skeptical about its collection methods.
Michael B. Johnson, a former Pixar executive, tweeted at Mostaque in January, saying, “I would certainly like to see a present where the corpus of images used for training were ethically sourced.” He added: “Do you have a stated policy on this? Would be great to see.”
Mostaque responded by saying he believed the images were “ethically, morally and legally sourced and used” but that “some folks disagree.”
The backlash has landed the startup in the courts. A judge, and maybe a jury, in California and a judge in the UK will examine the terms and conditions used to obtain the copyrighted content.
Tony Caldwell, an attorney at Snell & Wilmer, has a favorite analogy for this: If there’s a gate up and you cross it, that’s generally prohibited conduct. Courts look more favorably on web scraping when no gate is up. In Getty’s case, the company had a login, a license structure, and terms of service that created a type of “gate” around the content on its platform.
“That gate mechanism, if you will, really determines whether or not the court views the type of activity as conduct that scraper or that party should or should not engage in,” he said.
To that, Mark Beccue, an AI analyst at the research company Omdia, said: “Stability should be asking the artists or the intellectual-property owners if they can use it. It’s backwards. Why should an intellectual-property owner have to do that proactively?”