freetime2 19 days ago
So it sounds like they definitely scraped the content and used it for training, which is legal:<p>&gt; Japan’s copyright law allows AI developers to train models on copyrighted material without permission. This leeway is a direct result of a 2018 amendment to Japan’s Copyright Act, meant to encourage AI development in the country’s tech sector. The law does not, however, allow for wholesale reproduction of those works, or for AI developers to distribute copies in a way that will “unreasonably prejudice the interests of the copyright owner.”<p>The article is almost completely lacking in details though about <i>how</i> the information was reproduced&#x2F;distributed to the public. It could be a very cut-and-dry case where the model would serve up the entire article verbatim. Or it could be a much more nuanced case where the model will summarize portions of an article in its own words. I would need to read up on Japanese copyright law, as well as see specific examples of infringement, to be able to make any sort of conclusion.<p>It seems like a lot of people are very quick to jump to conclusions in the absence of any details, though, which I find frustrating.
stubish 18 days ago
&gt; So it sounds like they definitely scraped the content and used it for training, which is legal<p>It certainly seems legal to train. But the case is about scraping without permission. Does downloading an article from a website, probably violating some small print user agreement in the process, count as distribution or reproduction? I guess the court will decide.
incompatible 18 days ago
According to the article, they are complaining that the downloaded content had &quot;been used by Perplexity to reproduce the newspaper’s copyrighted articles in responses to user queries.&quot; Derived works.
mvdtnz 18 days ago
Reproducing articles is not &quot;deriving&quot; anything. It&#x27;s reproducing.
staticautomatic 18 days ago
“Reproduce” in this context reads like “copy&#x2F;republish”, which would not be a derivative work.
incompatible 18 days ago
Yes, if it&#x27;s an exact copy, but I don&#x27;t know if their system is actually presenting entire articles, or just fragments (copyrightable, perhaps) and perhaps mixing them with other text.
alexey-salmin 18 days ago
Generally the court practice so far was that if you don&#x27;t register or login, you never accept the user agreement. If the website is still willing to serve content to non-registered users, you&#x27;re free to archive it. How you can use it afterwards is a separate question.
bgwalter 18 days ago
LLMs are able to reproduce the entire IP. Sometimes it requires more than one prompt. I&#x27;ve seen examples in the wild where a single prompt was sufficient:<p><a href="https:&#x2F;&#x2F;jskfellows.stanford.edu&#x2F;theft-is-not-fair-use-474e11f0d063" rel="nofollow">https:&#x2F;&#x2F;jskfellows.stanford.edu&#x2F;theft-is-not-fair-use-474e11...</a><p>Therefore, their output is a derivative work and violates copyright. The 2018 amendment is driven by big capital and should be reverted. Machines can plagiarize at huge scale and should have no human rights.
freetime2 18 days ago
I&#x27;m aware of the fact that LLMs can reproduce IP used in training data, and consider the example NYT article in your link to be &quot;a very cut-and-dry case&quot; of copyright infringement. And commercial AI companies especially should be held liable for damages if they can&#x27;t or won&#x27;t implement effective guardrails to prevent this from happening.<p>I&#x27;m somewhat optimistic this problem can be solved, though, with filters and usage policies. YouTube, another platform with basically unlimited potential for copyright infringement, has managed to implement a system that is good enough at preventing infringement to keep lawsuits at bay.<p>It&#x27;s also not clear if that&#x27;s what Yomiuri Shimbun is alleging here. In their 2023 &quot;Opinion on the Use of News Content by Generative AI&quot; [1] they give this example:<p>&gt; Newspaper companies have long provided databases containing past newspaper pages and articles for a fee, and in recent years, they have also sold article data for AI development. If AI imports large quantities of articles, photos, images, and other data from news organizations’ digital news sites without permission, commercial AI services for third parties developing it could conflict with the existing database sales market and “unreasonably prejudice the interests of the copyright owner” (Article 30-4 of the Act). Also, even if all or part of a particular article communicates nothing further than facts and hardly constitutes a copyright, many contents deserve legal protection because of the effort and cost invested by the newspaper companies. 
Even if an AI collects and uses only the factual part, it does not mean it will always be legal.<p>So they are basically arguing that the 2018 amendment, which allows the use of copyrighted works to train AI models without permission from the copyright holder, is not applicable because the use &quot;would unreasonably prejudice the interests of the copyright owner in light of the nature or purpose of the work or the circumstances of its exploitation&quot;. [2]<p>... which I think is a <i>much</i> more nuanced argument. I don&#x27;t think we can just lump all of these cases together and say &quot;it&#x27;s infringement&quot; or &quot;it&#x27;s fair use&quot; without actually considering the details in each case. Or the specific laws in each country.<p>[1] <a href="https:&#x2F;&#x2F;www.pressnet.or.jp&#x2F;statement&#x2F;20230517_en.pdf" rel="nofollow">https:&#x2F;&#x2F;www.pressnet.or.jp&#x2F;statement&#x2F;20230517_en.pdf</a><p>[2] <a href="https:&#x2F;&#x2F;www.cric.or.jp&#x2F;english&#x2F;clj&#x2F;cl2.html" rel="nofollow">https:&#x2F;&#x2F;www.cric.or.jp&#x2F;english&#x2F;clj&#x2F;cl2.html</a>
SilverElfin 19 days ago
I don’t understand why corporations can violate copyright laws at hyper scale but individuals are banned from small scale piracy through authoritarian internet governance.
presentation 19 days ago
Maxwell Tabarrok has a take on this, basically in his words:<p>&gt; The confusion of intellectual property and property rights is fair enough given the name, but intellectual property is not a property right at all. Property rights are required because property is rivalrous and exclusive: When one person is using a pair of shoes or an acre of land, other people’s access is restricted. This central feature is not present for IP: an idea can spread to an infinite number of people and the original author’s access to it remains untouched.<p>&gt; There is no inherent right to stop an idea from spreading in the same way that there is an inherent right to stop someone from stealing your wallet. But there are good reasons why we want original creators to be rewarded when others use their work: Ideas are positive externalities.<p>&gt; When someone comes up with a valuable idea or piece of content, the welfare maximizing thing to do is to spread it as fast as possible, since ideas are essentially costless to copy and the benefits are large.<p>&gt; But coming up with valuable ideas often takes valuable inputs: research time, equipment, production fixed costs etc. So if every new idea is immediately spread without much reward to the creator, people won’t invest these resources upfront, and we’ll get fewer new ideas than we want. 
A classic positive externalities problem.<p>&gt; Thus, we have an interest in subsidizing the creation of new ideas and content.<p>And so you can reframe whether or not IP rights should be assigned in this case, based on whether you believe that the welfare generated by making AI better by providing it with content is more valuable for society than the welfare generated by subsidizing copyright holders.<p>[1] <a href="https:&#x2F;&#x2F;open.substack.com&#x2F;pub&#x2F;maximumprogress&#x2F;p&#x2F;ai-copyright-cases-will-shape-the?r=1bmwua&amp;utm_medium=ios" rel="nofollow">https:&#x2F;&#x2F;open.substack.com&#x2F;pub&#x2F;maximumprogress&#x2F;p&#x2F;ai-copyright...</a>
greysphere 18 days ago
There&#x27;s no inherent right to anything, really. The statements in whatever declaration or philosophy are just arbitrary lines. Physical property rights are just as arbitrary as the divine right of kings (and incredibly closely related when that property is inherited!)<p>The argument really isn&#x27;t based on rights; it&#x27;s based on the fact that the rules of the game have been that people who make things get to decide what folks get to do with those things via licensing agreements, except for a very small set of carve-outs that everyone knew about when they made the thing. The argument is consent. The counterargument is one&#x2F;all of: AI training falls under one of those carve-outs, and&#x2F;or it&#x27;s undefined so it should default to whatever anyone wants, and&#x2F;or we should pass laws that change the rules. Most of these are just as logical as saying that if someone invented resurrection tomorrow, then murder would no longer be a crime.
philipallstar 18 days ago
&gt; the divine right of kings (and incredibly closely related when that property is inherited!)<p>These seem to be very different indeed. You only need to be able to own and give property to have inheritance.<p>If your property is owned by a monarch or de facto the state, and you work your lifetime to rent it from them, then you don&#x27;t get inheritance.
greysphere 18 days ago
The similarity between divine right of kings and inheritance is that something unearned is transferred via circumstances of birth.<p>Your statements seem to extend that further: If you rent an apartment, the property is owned by a landlord (lord is literally in the title!) and passed down by their wishes. Similarly if you work for Walmart for life, the company is owned and passed down by the Waltons. In these cases the property rights extend beyond life and are transferred via circumstances of birth, while the rights of labor end.<p>Interesting that IP rights are ended by death (or death+n years) as well. This line of reasoning suggests maybe that should apply to all property.
Hamuko 18 days ago
&gt;<i>the welfare generated by making AI better by providing it with content is more valuable for society than the welfare generated by subsidizing copyright holders.</i><p>Isn&#x27;t the AI in this case also copyrighted intellectual property that benefits its owners and not the society? As far as I know, Perplexity is a private, for-profit corporation.<p>I don&#x27;t see how improving Perplexity&#x27;s proprietary models is any more beneficial to society than YouTube blocking ad blockers.
presentation 18 days ago
Because there is arguably more societal value in commercial AI being able to do tasks well than there is in users being able to avoid looking at ads on an ad-supported platform.
pjc50 18 days ago
That&#x27;s the standard rubric, but it doesn&#x27;t actually answer the question of differential enforcement, which comes down to the usual questions: money and power.
presentation 18 days ago
In this case, the money and power are there precisely because people perceive AI as having the potential to reshape society, resulting in its creators receiving money and power, so it’s a bit of a chicken and egg situation.
wat10000 19 days ago
You should also look at the welfare generated by showing that all are equal under the law, versus showing that companies can get away with blatant lawbreaking if they can convince people that it’s for the greater good.<p>The proper way to decide this would be to pass a law in the legislature. But of course our system in general and tech companies in particular don’t work that way.
presentation 18 days ago
The USA tends to invest a great deal of legislative power in the courts, and as you mention the legislature isn’t very responsive nor effective, so this is what we get.
impossiblefork 18 days ago
I actually think physical property rights are much more problematic than copyright.<p>Works are so sparse, and there is such an explosion in how many texts there are that when someone has a right to the exclusive use of one of these huge numbers that are almost unrepresentable, you lose almost nothing.<p>If someone didn&#x27;t announce that they had written, let&#x27;s say, Harry Potter and there was a secret law forbidding you from distributing it, that would be really bad, but it would <i>never</i> matter.<p>Copyright infringement is a pure theft of service. You took it because it was there, because someone had already spent the effort to make it, and that was the only reason you took it.<p>Land, physical property, etc. meanwhile, is something that isn&#x27;t created only by human effort.<p>For this reason copyright, rather than some fake pseudo-property of lower status than physical property, is actually much more legitimate than physical property.
aspenmayer 18 days ago
How one adjudicates ownership or authorship disputes under copyright is fundamentally different than disputes about land and property ownership. We can go to records and so on in each case, but a resolution would be different in each case, because they are different sorts of potential violations or transgressions.<p>I don’t think it’s as clear who is at fault if I mention “he who must not be named” in a hypothetical scenario where Harry Potter was never published, and then start telling people about the manuscript I found. If I violated someone’s rights to privacy or property to get or keep the original manuscript, that’s one thing, but merely having it even if the author didn’t want me to have it as a copy especially is another issue. If I never published it but merely described it to others, I’m not sure if I’m any less culpable, but it seems like I should be.<p>I’m not sure how much more I can explore your thought experiment, but I appreciate you for sharing it with me.
mlinhares 19 days ago
The law only exists for those without enough money and influence to control the enforcers.
wat10000 19 days ago
They don’t even need control. It’s a version of the old saying that if you owe the bank a million dollars then you have a problem, but if you owe a billion dollars then the bank has a problem. If your company is important enough then it’s not possible (at least not politically) to punish it significantly. See also: 2008 and “too big to fail.”
nradov 19 days ago
Perplexity is still a small startup. If Enron and Theranos could be punished then Perplexity can be punished. So far it&#x27;s unclear whether they&#x27;ve done anything illegal.
aspenmayer 18 days ago
&gt; Perplexity is still a small startup.<p><a href="https:&#x2F;&#x2F;www.crunchbase.com&#x2F;organization&#x2F;perplexity-ai" rel="nofollow">https:&#x2F;&#x2F;www.crunchbase.com&#x2F;organization&#x2F;perplexity-ai</a><p>I don’t know how to parse this. I don’t think of them as small. Though they were only founded in 2022 and may not have a huge number of employees, they have had 8 funding rounds. They’re private, so I don’t know what they have raised, but some say that the company could have a $18B valuation.<p><a href="https:&#x2F;&#x2F;www.bloomberg.com&#x2F;news&#x2F;articles&#x2F;2025-07-17&#x2F;ai-startup-perplexity-valued-at-18-billion-with-new-funding" rel="nofollow">https:&#x2F;&#x2F;www.bloomberg.com&#x2F;news&#x2F;articles&#x2F;2025-07-17&#x2F;ai-startu...</a> | <a href="https:&#x2F;&#x2F;archive.is&#x2F;6DZpo" rel="nofollow">https:&#x2F;&#x2F;archive.is&#x2F;6DZpo</a><p>Is that small?
wat10000 19 days ago
It’s very difficult to punish Perplexity without also hitting OpenAI, Grok, Google, Facebook, etc.<p>It’s plenty clear to me that they’ve broken copyright law a lot. They’ve downloaded copyrighted material without permission for their own use, which we’ve been assured is Not Good for us individual people. Some of them even redistributed it by seeding torrents, which is even more Not Good.
nradov 19 days ago
It only seems &quot;plenty clear&quot; to you because you&#x27;re ignorant about the basics of copyright law in the USA and Japan. Fortunately we have actual courts to decide these issues. The applicable laws (including centuries of case law in the USA) are complex and whether particular actions are legal often depends on nuances that aren&#x27;t covered in news articles.
wat10000 18 days ago
I’m not talking about Japan. In the US, seeding a torrent containing copyrighted material without authorization from the copyright holder is unambiguously a copyright violation.
freetime2 18 days ago
Presumably you are talking about this case, where Meta is accused of having torrented a bunch of copyrighted works. [1]<p>Of relevance here is the fact that 1) Meta denies having seeded the content, and there looks to be no hard evidence that they distributed the content to other users, 2) the case is ongoing, so a decision has not yet been reached about whether they broke any laws, and 3) the fact that Meta is being sued for this shows that even corporations worth trillions of dollars are not immune to the consequences of breaking the law.<p>[1] <a href="https:&#x2F;&#x2F;www.tomshardware.com&#x2F;tech-industry&#x2F;artificial-intelligence&#x2F;meta-defends-using-pirated-material-claims-its-legal-if-you-dont-seed-content" rel="nofollow">https:&#x2F;&#x2F;www.tomshardware.com&#x2F;tech-industry&#x2F;artificial-intell...</a>
wat10000 18 days ago
Of course they’re not immune to consequences. It’s just that the consequences are so relatively small that they don’t really care. Reminds me of the quote about how the law treats everyone equally: both rich and poor are forbidden to sleep under bridges, beg, and steal bread.
JimDabell 19 days ago
Learning isn’t copying and copyright only restricts copying. Are you comparing cases where individuals distribute copies to cases where corporations are not distributing copies? The difference seems clear.
freetime2 19 days ago
&gt; I don’t understand why corporations can violate copyright laws at hyper scale<p>Can they, though? Isn&#x27;t that why Perplexity is being sued?
suspended_state 18 days ago
Disclaimer: I am not a lawyer, this is just my interpretation of the situation from the comments above.<p>I don&#x27;t have an answer to your question, which seems more general and doesn&#x27;t correspond to the situation described by the article anyway: here the corporations have the right to use copyrighted materials to train their model, in the same way that you are allowed to learn from the same materials. You might even learn it by heart if you want to, but copyright laws forbid you from reproducing it, and in this instance the Japanese law tries to follow the same principle for AI models.<p>How the corporations should implement their training to prevent their models from reproducing the material verbatim is their problem, not the copyright holder&#x27;s, in exactly the same way that, if you learn an article by heart, it&#x27;s on you to make sure you won&#x27;t recite it to the public.
_DeadFred_ 18 days ago
For profit products are not individual humans putting in effort to learn. Stop making that comparison.<p>Humans are human. Humans can human when there is no profit motive without it being a copyright violation. Effectively infinitely scaling, for profit products, can&#x27;t &#x27;human&#x27; without it being a copyright violation. The two are much different cases, in no way comparable.<p>For profit products are PRODUCTS intended to make money for companies. AIs are scalable past an individual human.<p>Rules&#x2F;concepts for humans are not relevant at all for for profit products.
yorwba 18 days ago
Both corporations and individuals are banned from piracy, but both corporations and individuals can violate copyright laws at hyper scale until somebody stops them. Corporations are probably more likely to get sued, but also more likely to get a lawyer instead of completely losing their head over a legal threat.
prasadjoglekar 18 days ago
Two different issues IMO. Piracy is depriving someone of payment for an item for which payment was expected. Neither you nor Perplexity may pirate a DVD that you didn&#x27;t buy.<p>Copyright usually doesn&#x27;t prevent copying per se, it&#x27;s the redistribution that is violative. You, as well as Perplexity are free to scrape public sites. You&#x27;ll both be sued if you distribute it.
charcircuit 18 days ago
Transformative use of copyrighted material is very different from people consuming content, for free, the way it was meant to be consumed.
thrance 18 days ago
Is it? Bulk downloading every article of a journal is OK if I train a neural network on it later, but accessing a single one without paying is not?
mightysashiman 18 days ago
I&#x27;m not pirating, I&#x27;m AI model training. Got it!
thrance 18 days ago
Yes, you do understand why. In our societies, capital is king.
t0lo 18 days ago
Yanis Varoufakis would like to have a word with you
rr808 19 days ago
It&#x27;s the same reason Uber could run a ride service without taxi medallions and Airbnb can open home stays in your neighborhood. If there is enough money involved, the VCs in Silicon Valley know who to pay to get what they want.
eviks 18 days ago
Do you understand this for other laws?
hulitu 18 days ago
It is because corporations can pay lawmakers for this, just as they did in the case of copyright law. Welcome to &quot;democracy&quot;.
pluto_modadic 19 days ago
Anthropic has lawyers and buys senators; Aaron Swartz was one dude corporations could make an example of via the courts.
ujkhsjkdhf234 19 days ago
Before someone mentions Japan effectively making all data fair use for AI training, Japan specifically forbids direct recreation which is what this lawsuit is about.
daedrdev 19 days ago
Japan has extremely favorable copyright laws to the holders. My understanding is that without explicit permission, there is no fair use and so any reproduction or modified work is only allowed as long as they don&#x27;t request a takedown.
beepbooptheory 19 days ago
From tfa:<p>&gt; Japan’s copyright law allows AI developers to train models on copyrighted material without permission. This leeway is a direct result of a 2018 amendment to Japan’s Copyright Act, meant to encourage AI development in the country’s tech sector. The law does not, however, allow for wholesale reproduction of those works, or for AI developers to distribute copies in a way that will “unreasonably prejudice the interests of the copyright owner.”
stubish 18 days ago
I wonder if you can download the copyrighted material without permission though? The article specifically states &#x27;the scraping has been used by Perplexity to reproduce the newspaper’s copyrighted articles in responses to user queries without authorization&#x27;. They don&#x27;t seem to be complaining about the training (legal), but the scraping.
kazinator 19 days ago
Training a model isn&#x27;t redistribution; only when you give someone a copy of the model can we think about there being a problem. At that point, you are not training, but redistributing a derived work.
Alex4386 19 days ago
tl;dr: If you are not directly affecting the &quot;sales&quot; of the product, you are good to go. But it seems Perplexity did, and is (as they might call it) directly trying to compete as a news source.<p>Personally, I find the news summarization in their news service kind of misleading, with AI hallucinations in some places.
AraceliHarker 18 days ago
The belief that it&#x27;s acceptable to copy or alter copyrighted material unless the rights holder objects is merely an assertion by those who violate copyright law. Barring a few exceptions such as citation or non-commercial use without internet distribution, you are generally prohibited from using someone else&#x27;s creative work without their consent.
anticensor 18 days ago
Japanese copyright law still has a few statutory exceptions.
ants_everywhere 19 days ago
If they are copying and pasting news articles on their site, that&#x27;s a pretty straightforward copyright case I would think.<p>In the US at least this should be pretty well covered by the case law on news aggregators.
AraceliHarker 18 days ago
It was the Yomiuri Shimbun, which boasts the world&#x27;s largest circulation, that established the mass reproduction of not just article bodies, but even headlines, as a violation of copyright.
ants_everywhere 18 days ago
Thanks, I was aware of the distinction between article bodies and headlines for news aggregators, but I was not aware of Yomiuri Shimbun&#x27;s role
tjpnz 18 days ago
Off-topic: Yomiuri Shimbun operates its own theme park and it&#x27;s an absolute delight, especially during winter months when there&#x27;s a spectacular light show during the evenings. I prefer it to Tokyo Disneyland because there&#x27;s plenty there to occupy young children but with reasonable waiting times.<p>Give it a try on your next visit to Tokyo. I recommend arriving on the cablecar - almost feels like you&#x27;re descending into Jurassic Park by helicopter (wife gets quite annoyed when I predictably start humming John Williams).<p><a href="https:&#x2F;&#x2F;www.yomiuriland.com&#x2F;en&#x2F;" rel="nofollow">https:&#x2F;&#x2F;www.yomiuriland.com&#x2F;en&#x2F;</a>
Shaddox 18 days ago
The fundamental problem is that everyone is expected to pitch in to help train these AIs, but only a handful of people benefit from it.
lvl155 18 days ago
This is what I call the Zuckerberg business model.
aspenmayer 19 days ago
Original title edited for length:<p>&gt; Japan’s largest newspaper, Yomiuri Shimbun, sues AI startup Perplexity for copyright violations
ronsor 19 days ago
I don&#x27;t know why Perplexity in particular gets everyone in a nit. It&#x27;s not even particularly special: a user inputs a query, an AI model does a web search and fetches some pages on the user&#x27;s behalf, and then it serves the result to the user.<p>Putting aside that other products, such as OpenAI&#x27;s ChatGPT and modern Google Search have the same &quot;AI-powered web search&quot; functionality, I can&#x27;t see how this is meaningfully different from a user doing a web search and pasting a bunch of webpages into an LLM chat box.<p>&gt; But what about ad revenue?<p>The user could be using an ad blocker. If they&#x27;re using Perplexity at all, they probably already are. There&#x27;s no requirement for a user agent to render ads.<p>&gt; But robots.txt!!!11<p>`robots.txt` is for recursive, fully automated requests. If a request is made on behalf of a user, through direct user interaction, then it may not be followed and IMO shouldn&#x27;t be followed. If you really want to block a user agent, it&#x27;s up to you to figure out how to serve a 403.<p>&gt; It&#x27;s breaking copyright by reproducing my content!<p>Yes, so does the user&#x27;s browser. The purpose of a user agent is to fetch and display content how the user wants. The manner in which that is done is irrelevant.
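For illustration only, here is a minimal sketch of what "serve a 403" could look like on the site's side, assuming a plain Python WSGI application. The names `BLOCKED_AGENT_TOKENS`, `examplebot`, `is_blocked`, and `app` are hypothetical and not taken from any real site or from Perplexity. The `User-Agent` header is self-reported, so a check like this only stops clients that identify themselves honestly.

```python
# Hypothetical list of lowercase substrings that mark user agents to refuse.
BLOCKED_AGENT_TOKENS = ("examplebot",)


def is_blocked(user_agent: str) -> bool:
    """Return True if the self-reported User-Agent contains a blocked token."""
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_AGENT_TOKENS)


def app(environ, start_response):
    """Tiny WSGI app: answer 403 to blocked agents, 200 to everyone else."""
    if is_blocked(environ.get("HTTP_USER_AGENT", "")):
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"Forbidden\n"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Article text\n"]
```

Any WSGI server can host `app`, for example `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` from the standard library.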
Alex4386 19 days ago
Well, some bots even spoof User-Agents and send tons of requests without proper rate-limiting (looking at you, ByteSpider).<p>People weren&#x27;t playing fair even before LLMs, which is why we now get PoW challenges everywhere.<p>And what is the conclusion? That since ad blockers are used everywhere, it is OK for corporations not to license articles directly and to just yank them and put them into a curation service, especially one without ads? That&#x27;s a licensing issue. The author allowed you to view the article if you provide them monetary support (i.e. ads); they didn&#x27;t allow you to reproduce and republish the work by default.<p>Also, calling the browser itself a reproduction? Yes, the data might be copied in memory (though I wouldn&#x27;t call that reproducing the material, more like a transfer from the server to another machine), but redistribution is the main point here.<p>It&#x27;s like saying, &quot;part of the variable is replicated from the L2 cache to a register, so reproducing the whole file in DRAM is authorized&quot;. The copying you describe as &quot;it&#x27;s reproducing and should not be reproduced in the first place&quot; can&#x27;t be prevented unless you bring in non-Turing computers that don&#x27;t use active memory.
kazinator 19 days ago
The only reason you can say &quot;looking at you ByteSpider&quot; is that it identifies itself. In 2025, that qualifies it as a nice bot.<p>The nasty bots make a single access from an IP, and don&#x27;t use it again (for your server), and are disguised to look like a browser hit out of the blue with few identifying marks.
jaredwiener 19 days ago
There&#x27;s a difference between what is technically feasible and what is allowed, legally or even morally.<p>Just because it is possible -- or even easy -- to essentially steal from newspapers&#x2F;other media outlets, doesn&#x27;t make it right, or legal. The people behind it put in labor, financial resources, and time to create a product that, like almost every other service, has terms attached -- and those usually come with some form of monetization. Maybe it is a paywall, maybe it is advertisements -- but it is there.<p>Using an adblocker, or finding some loophole around a paywall, etc, are all very easy to do technically, as any reader of this site knows. That said, the media outlet doesn&#x27;t have to allow it. And when it is violated on an industrial scale, like Perplexity, then they can be understandably upset and take legal action. And that includes any AI (or other technology, for that matter) that is a wrapper around plagiarism.<p>Sites opted in to Google originally because it fed them traffic. They most likely did not opt in to an AI rewriter that takes their work and republishes it without any compensation.
petesergeant 18 days ago
&gt; I don&#x27;t know why Perplexity in particular gets everyone in a nit<p>I suspect they seem easier to sue than OpenAI, Anthropic, Meta, Google, and literally anything coming out of China.
totetsu 19 days ago
The Japan Newspaper Publishers &amp; Editors Association is very active lobbying about this area <a href="https:&#x2F;&#x2F;www.pressnet.or.jp&#x2F;english&#x2F;" rel="nofollow">https:&#x2F;&#x2F;www.pressnet.or.jp&#x2F;english&#x2F;</a>
ekianjo 18 days ago
&gt; If quality news content, which underpins democracy, decreases, the public’s right to know may be hampered.<p>quality news content has not been a thing for a long time now, so the public will not notice any change
charcircuit 19 days ago
It&#x27;s best not to crawl Japanese newspapers. Japan does not have the same kind of fair use. Even reproducing facts from a newspaper can be infringing.
Hamuko 18 days ago
Most of the world doesn&#x27;t have fair use.
pyrale 18 days ago
I suspect we&#x27;ll see AI&#x27;s claim to fair use be challenged even in the US. The claim to be transformative is mostly based on the &quot;shape&quot; of the information being delivered (i.e. the AI rephrases the information).<p>However, the transformative nature of derivative work is not only about its appearance. It also factors in whether the transformation changes the nature of the message, and whether the derivative work is in direct competition with the original work [1]. I suspect for e.g. news articles, there&#x27;s a good case that people get information that way instead of going to the newspaper, which means the derivative work competes with the original. Also when it comes to reporting news, there aren&#x27;t many ways to make the message different that don&#x27;t make the AI service bad.<p>[1]: <a href="https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Andy_Warhol_Foundation_for_the_Visual_Arts,_Inc._v._Goldsmith" rel="nofollow">https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Andy_Warhol_Foundation_for_the...</a>
charcircuit 18 days ago
Japan does have fair use.<p><a href="https:&#x2F;&#x2F;www.cric.or.jp&#x2F;english&#x2F;clj&#x2F;cl2.html#chapter2sect3sub5" rel="nofollow">https:&#x2F;&#x2F;www.cric.or.jp&#x2F;english&#x2F;clj&#x2F;cl2.html#chapter2sect3sub...</a>
Hamuko 17 days ago
That&#x27;s not fair use.
mattigames19 days ago
I wish there were an open fund anyone could donate to, with the exclusive aim of suing Perplexity, OpenAI and others for copyright violations, where a team of lawyers would back the cases most likely to win. It would try to highlight that the way such systems are &quot;learning&quot; has little similarity to the intent of the law when it was written to give leeway for other artists&#x2F;authors to create similar creations.
CamperBob219 days ago
Amazing how many copyright maximalists there are on a site called &quot;Hacker News.&quot;<p>Seems to be a fairly recent trend. Wonder what changed.
mattigames18 days ago
Nothing changed in my case (or for many others). Perhaps you never grasped the big picture of our view: copyright law should be soft on consumers who violate it (for non-profit reasons) and hard on corporations that do.
CamperBob218 days ago
Let&#x27;s see if training a model is actually considered a copyright violation. I don&#x27;t know that, and neither do you.<p>If it <i>is</i> adjudicated to be a violation, well, that&#x27;s the end of copyright, for better or worse. AI is more important. Don&#x27;t fight to lock down information; fight for equitable access instead.
wat1000019 days ago
What changed is that copyright violation used to be something individuals did quietly, and got punished for. Now it’s something big companies are doing openly and they’re getting tons of money for it and zero consequences.
CamperBob218 days ago
&quot;Copyright violation?&quot; That remains to be seen, doesn&#x27;t it? Which court do you sit on, and how many trillions of dollars in future value do you feel comfortable tossing away?<p>The copyright industry has done all it can for us, even in the most charitable interpretation. They literally, by constitutional mandate, can&#x27;t be allowed to stand in the way of progress. We&#x27;re not talking Napster 2.0 here.
wat1000018 days ago
You’re going to give me shit for calling out a clear copyright violation because I’m not a judge, and yet you feel comfortable saying that it’s unconstitutional(?!) to stand in their way? What court do <i>you</i> sit on?
CamperBob217 days ago
A literal, plain-language reading of the Constitution is sufficient. Article I, Section 8, Clause 8: <i>[The Congress shall have Power . . . ] To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries.</i><p>Copyright doesn&#x27;t promote the progress of science. Rather the opposite, as it allows journals that contribute nothing to progress to charge the rest of us to access research our taxes paid for.<p>As for &quot;arts,&quot; useful and otherwise, those are secured these days via unbreakable permanent DRM, which overtly violates the constitutional basis of copyright law as a time-limited bargain with the public domain. You should be at least as outraged about that as you are about AI, but evidently you&#x27;re not.<p>Meanwhile, you&#x27;d have to have rocks in your head to argue that AI <i>doesn&#x27;t</i> constitute scientific progress at a bare minimum.
wat1000016 days ago
Actual judges on actual courts seem to think DRM is fine. So I’m confused. Do you reject laymen interpreting the law and only accept the evaluation of a judge, as indicated by your first comment? Or do you reject what judges say and go with your own “plain reading”? Seems like you’re confused about who’s qualified to say what constitutes lawbreaking.
CamperBob216 days ago
You do understand who the Constitution was written for, right? It wasn&#x27;t written primarily for interpretation by judges. Judicial review came along later. It was written for you and me, and for the legislators we elect.<p>I don&#x27;t view any decision or legislation that grants unbreakable DRM the force of law as legitimate. A work should benefit from temporary legal protection or permanent technical protection, but not both. My position is that if the founders had meant something other than a &quot;Limited Time,&quot; they would have said so. If you disagree, great, but that means we&#x27;re done here.<p>Matters such as whether AI training is fair use are better subjects for judicial review, IMO, because there&#x27;s no plain language to go by. Of course I reserve the right to disagree with <i>that</i> decision, and to subsequently ignore it, in keeping with the spirit of the times. :)<p>And a billion people in China will respect a copyright-maximalist decision even less than I will.
miohtama19 days ago
I wish there were an open fund that let me do the opposite: the fund would countersue copyright holders for holding development back and demanding excessive mafia payments.
bluefirebrand19 days ago
People getting paid for the work they do is offensive to you?
wand3r19 days ago
I personally find this argument really lazy. In a very reductionist reframing, independent artists who uploaded some art to the internet for fun believe that AI shouldn&#x27;t be allowed to exist without them being paid, essentially alleging their contribution to AI is fundamental to its existence. I would be a lot more receptive to the idea that all humans generally contributed to the information this system consumed and that we enact some democratic law under which 15% of all profits flow into some public tax fund, rather than litigate every single instance of potential copyright infringement at the per-person or organizational level.<p>There are obviously laws that differ in every region but at a philosophical level I believe in the ideal of fair use. An AI is a distinctly different &quot;work&quot; than these originals and much like a human&#x27;s own output is informed by all the information they have taken in over their lifetime, so is the output of a model.
sensanaty18 days ago
If these AIs can&#x27;t exist without also gobbling up those artists&#x27; work, then yes? You can&#x27;t have it both ways: either their artwork is worthless for the purposes of training an AI (in which case there should be no problem not hoovering up their art, right?) or it&#x27;s worth <i>something</i> and they should be compensated for it.
wand3r18 days ago
You are entitled to your opinion. Personally, I would only be able to accept your worldview if these artists grew up on something like an island without books or internet and pursued their craft 100% intuitively without any external influence. Then they could make a claim their work was 100% original. Otherwise, I find all human output to be derivative and to build on the body of work of the entire race. This is one of mankind&#x27;s greatest advantages IMO.<p>edit: When many make this argument, what they are really saying is &quot;big fucks small&quot;. This may not be what you are saying, but it seems to be the general philosophy of many who make this argument. I am sympathetic to that, which is why I believe we should have something like a 15% tax or 2% of revenue of AI paid into a general tax fund. I find it impossible to litigate how much a news article should be &quot;worth&quot; when 400 versions of the same news article were written the same day, with the value immediately diminishing after the &quot;news&quot; was new.
lmm18 days ago
Copyright is bad, but one rule for the rich and another for the poor is even worse.
jongjong19 days ago
IMO the legal system is in disarray due to extreme asymmetries in how the law is selectively applied.<p>First of all, the way certain platforms get sued for certain activities while others are left alone is unfair and creates significant market distortions.<p>Then there is the fact that wealthy individuals have much better legal representation than non-wealthy individuals.<p>Then there are tax loopholes which create market asymmetries on top of that.<p>The word &#x27;fair&#x27; doesn&#x27;t even make sense anymore. We&#x27;ve got to start asking: fair for whom?