Stephen King: My Books Were Used to Train AI

L4sBot@lemmy.world · 1 year ago

ashok36@lemmy.world · 1 year ago

I mean, yeah, duh. Just ask any of them to write a paragraph “in the style of INSERT AUTHOR”.

If it can, then it was trained on that author. I’m not sure how that’s a problem though.

ForgotAboutDre@lemmy.world · 1 year ago

We don’t have the legal framework for this type of thing, so people are going to disagree about how training data for a commercial AI product should be handled.

I imagine Stephen King would argue they didn’t have licenses or permission to use his books to train their AI, so he should be compensated or the AI deleted/retrained. He would argue that buying a copy of the book only allows humans to read it, much as buying a CD doesn’t allow you to put that song in your advert.

Drewelite@lemmynsfw.com · 1 year ago

I would argue we do have a legal precedent for this sort of thing. Companies hire creatives all the time and ask them to do things in the style of other creatives. You can’t copyright a style. You don’t own what you inspire.

IchNichtenLichten@lemmy.world · 1 year ago

That’s not what’s happening though. His works are being incorporated into an LLM without permission. I hope he sues the hell out of these people.

Drewelite@lemmynsfw.com · edit-2 1 year ago

But that is what’s happening in the minds of creatives. Reading a book and taking inspiration is functionally the same mechanism that an LLM uses to learn. They read Stephen King, they copy some part of the style. Potentially very closely and for a corporation’s gain if that’s what’s asked of them.

IchNichtenLichten@lemmy.world · 1 year ago

One person being influenced by a prose style isn’t the same as a company using a copyrighted work without permission to train an LLM.

Drewelite@lemmynsfw.com · 1 year ago

Every learning material a company or university has ever used has been used to train an LLM. Us.

Okay, I’m being a bit facetious here. I know people and ChatGPT aren’t equivalent. But the gap is closing. Maybe LLMs will never bridge the gap, but something will. I hesitate to write into law now that no work can ever be ingested or emulated by another intelligent entity. While the differences between a machine and a human are clear to you now, one day they won’t be.

The longer we hold onto the idea that our brains are somehow magically different from the way computers are learning (or will learn) to think, the harder we’ll get blindsided by reality when they’re indistinguishable from us.

IchNichtenLichten@lemmy.world · 1 year ago

There’s very little an LLM has in common with the human brain. We can’t do AGI yet and there’s no evidence that we will be able to create AGI any time soon.

The main issue as I see it is that we have companies trying to make money by creating LLMs. The people who created the source materials for these LLMs are not only not getting paid, they’re not even being asked permission. To me that’s dead wrong and I hope the courts agree.

BetaDoggo_@lemmy.world · 1 year ago

Is that illegal though? As long as the model isn’t reproducing the original then copyright isn’t being violated. Maybe in the future there will be laws against it but as of now the grounds for a lawsuit are shaky at best.

IchNichtenLichten@lemmy.world · 1 year ago

There are already laws around what you can and can’t do with copyrighted material. If the owners of the LLM didn’t obtain written permission I’d say they are on very shaky ground here.

BetaDoggo_@lemmy.world · 1 year ago

What laws specifically? The only ones I can find refer to limits on redistribution, which isn’t happening here. If the models were able to reproduce the contents of the books that would be another issue that would need to be resolved. But I can’t find anything that would prohibit training.

IchNichtenLichten@lemmy.world · edit-2 1 year ago

“What laws specifically?”

Existing laws to protect copyrighted material.

“AI systems are ‘trained’ to create literary, visual, and other artistic works by exposing the program to large amounts of data, which may consist of existing works such as text and images from the internet. This training process may involve making digital copies of existing works, carrying a risk of copyright infringement. As the U.S. Patent and Trademark Office has described, this process ‘will almost by definition involve the reproduction of entire works or substantial portions thereof.’ OpenAI, for example, acknowledges that its programs are trained on ‘large, publicly available datasets that include copyrighted works’ and that this process ‘necessarily involves first making copies of the data to be analyzed.’ Creating such copies, without express or implied permission from the various copyright owners, may infringe the copyright holders’ exclusive right to make reproductions of their work.”

https://crsreports.congress.gov/product/pdf/LSB/LSB10922