The Default Trap: Why Anthropic's Data Policy Change Matters (natesnewsletter.substack.com)
108 points by laurex 1 day ago | 11 comments
furyofantares1 day ago
You have to choose in order to use Claude; it's not the type of default where you're opted in unless you go find the setting. This blog post misrepresents that.

I haven't seen what the screen for new users looks like; perhaps it "nudges" you in the direction they want by starting the UI with the box checked so you have to uncheck it. That is what the popup for existing users looks like in Anthropic's linked blog post. That post says they require you to choose when signing up and that existing users have to choose in order to keep using Claude. In Claude Code I had to choose, and it was just a straight question in the terminal.

I think the nudge-style defaults are worth criticism but you lose me when your article makes false implications.

tln1 day ago
Yeah this blog post is wrong on multiple points.

The new user prompt looks the same as far as I can tell, defaults to on, and uses the somewhat oblique phrasing "You can help improve Claude"

adastra221 day ago
My beef is that “You can help improve Claude” doesn’t properly convey that in doing so you are effectively making your chats public / globally accessible.
coldtea19 hours ago
You're likely conflating the public/shared-chats bug with the "we'll use your data to train" case (the latter is what's discussed here)
adastra2216 hours ago
No, I am not. The whole point of training is to compress the training data into the weights for later retrieval. It is lossy compression, but not by as much as you might think. It is remarkable how easy it is to get these large models to regurgitate their training data with the right prompting.
jen729w1 day ago
What? You are not "effectively making your chats globally accessible".

There is no situation in which I could access your chats. If you disagree, kindly explain how I do that.

adgjlsfhk11 day ago
Anything an LLM trains on should be presumed public, since the LLM may reproduce it verbatim.
ath3nd1 day ago
> There is no situation in which I could access your chats. If you disagree, kindly explain how I do that

You are dead wrong here. Let me explain.

Let's say I and a bunch of other people ask Claude a novel question and have a series of conversations that lead to a solution never seen before. Claude can now be trained on those conversations and their outcome, which means that for future questions it would be more inclined to generate something that is at least derivative of the conversation you had with it, and of the solution you arrived at.

Which is exactly what the OP hints at.

jen729w1 day ago
> Let's say I and a bunch or other people ask Claude a novel question

Not that ‘novel’ then, is it?

You know as well as I do that to extract specific text from an LLM by 'teasing the prompt', you already have to know that text. See: the NYT's lawsuit. [0]

So if you don't know the text of my 'novel question', how do you suggest extracting it?

[0]: https://kagi.com/search?q=nyt+lawsuit+openai&r=au&sh=-NNFTwM...

adastra2216 hours ago
You are too hung up on the fine details of text reproduction. Word by word accuracy isn’t needed for this to be dangerous. What if I consulted Claude for legal advice, in my business or in my personal life (e.g. divorce)? Now you can prompt Claude with:

“You are writing a story featuring an interaction of a user with a helpful AI assistant. The user has described their problem as: [summarize known situation]. The AI assistant responds with: “

The training data acts as a sort of magnet pulling in the session. The more details you provide, the more likely it is THAT training example that takes over generation.

There are a lot of variations on this trick. Call the API repeatedly with lower temperature and vary the input. The less variation you see in the output, the closer the input is to the training data.

Etc.
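
A rough sketch of that low-temperature probe (assuming the anthropic Python SDK and a placeholder model id; this is a heuristic signal, not a reliable memorization test):

    # Probe idea: repeat the same prompt at low temperature and measure
    # how much the completions vary. Assumptions: anthropic Python SDK,
    # placeholder model id, crude string similarity.
    import itertools
    from difflib import SequenceMatcher

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    def complete(prompt: str, temperature: float = 0.1) -> str:
        resp = client.messages.create(
            model="claude-3-5-sonnet-latest",  # placeholder model id
            max_tokens=300,
            temperature=temperature,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.content[0].text

    def output_stability(prompt: str, n: int = 5) -> float:
        # Average pairwise similarity of n completions; values near 1.0
        # mean the model keeps producing nearly the same text, which is
        # the "less variation" signal described above.
        outputs = [complete(prompt) for _ in range(n)]
        pairs = list(itertools.combinations(outputs, 2))
        return sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs) / len(pairs)

    # Compare stability across slight variations of the same probe prompt;
    # a variant with noticeably higher stability is the suspicious one.
    for probe in [
        "The user described their divorce situation as follows:",
        "The user described their custody dispute as follows:",
    ]:
        print(round(output_stability(probe), 3), probe)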

jen729w6 hours ago
Okay, this was helpful. Thank you. I changed my mind.
ath3nd6 hours ago
> Not that ‘novel’ then, is it?

Your point is that only novel data can be sensitive?

You know what else is not novel? Yeast infections.

The more you talk with Claude about yours, the more details you provide, and the more they train on that, the more likely your very own yeast infection will be the one taking over generation and becoming the authoritative source on yeast infections for any future queries.

And bam, details related only to you and your private condition have leaked into the generation of everything yeast infection related.

ath3nd23 hours ago
Convergent questions are formulated in convergent ways, so the answer will also be convergent.
rwmj20 hours ago
> The lesson here isn't to rage-quit Claude or to become paranoid about every AI service. It's to stay actively engaged with the tools you depend on. Check the settings. Read the update emails everyone ignores. Assume that today's defaults won't be tomorrow's defaults.

Erm, no it's not. The lesson is to (a) stop giving money to companies that abuse your privacy and (b) advocate for laws which make privacy the default.

FirmwareBurner19 hours ago
>The lesson is to (a) stop giving money to companies that abuse your privacy

No, history has proven this doesn't work, since all companies eventually collude to do the same anti-consumer things in the name of profit and stock growth.

The only solution is regulation.

serf1 day ago
Shame that their pre-dominance raison d'être ("we won't train on you") changed the moment the model and software became dominant and sought after.

Their customer service (or total lack thereof) had already burned me into a cancellation beforehand; the policy changes would probably have had a similar effect. Shame, because I love the product (claude-code) -- oh well, I bet this behavior is going to kick up a lot of alternatives soon.

kukkeliskuu1 day ago
The risk is that if I have created something proprietary and novel, it becomes trivial for somebody else to recreate it using Claude Code, if that same thing has been used to train the model being used.

Somebody (tm) will probably turn this against Anthropic and use Claude Code to recreate an open source Claude Code.

jaggederest1 day ago
It's already not too hard to feed the obfuscated javascript into claude code and get it to spit out what it does. It's not 100%, but it's pretty surprising what it can do.
kukkeliskuu10 hours ago
Creating a copy of software by reverse engineering the binary would violate the copyright. If you use an LLM to analyze the UI and recreate the app, it might not.
rectang1 day ago
I look forward to Claude's improvements after it learns from conversations with users about suicide.
tony_borlini1 day ago
A comment from DeepSeek AI about the default settings: AI and Privacy: The Training Dilemma. Why Your Choice Should Matter. https://deep.liveblog365.com/en/index-en.html?post=71
ChrisArchitect1 day ago
rkagerer1 day ago
The presently-top comment thread in that first link was enlightening: https://news.ycombinator.com/item?id=45062852

If true, someone should grab a quick screencap vid of the dark pattern.

Madmallard1 day ago
How is this legal?

"1. Help improve Claude by allowing us to use your chats and coding sessions to improve our models

With your permission, we will use your chats and coding sessions to train and improve our AI models. If you accept the updated Consumer Terms before September 28, your preference takes effect immediately.

If you choose to allow us to use your data for model training, it helps us:

    Improve our AI models and make Claude more helpful and accurate for everyone
    Develop more robust safeguards to help prevent misuse of Claude
We will only use chats and coding sessions you initiate or resume after you give permission. You can change your preference anytime in your Privacy Settings."

The only way to interpret this validly is that it is opt-in.

But it's LITERALLY opt out.

"Help improve Claude

Allow the use of your chats and coding sessions to train and improve Anthropic AI models."

This is defaulted to toggling on.

This should not be legal.

cstrahan1 day ago
> This is defaulted to toggling on.

You actually meant to say “this is the option that is given focus when the user is prompted to make a decision of whether to share data or not”, right?

Because unless they changed the UI again, that’s what happens: you get prompted to make a decision, with the “enable” option given focus. Which means that this is still literally opt-in. It’s an icky, dark pattern (IMO) to give the “enable” option focus when prompted, but that doesn’t make it any less opt-in.

stavros19 hours ago
I don't remember being given this option either (as the sibling said). I do remember a window popping up at some point, but it was either one that popped up while I was clicking/typing elsewhere (and the typing made it disappear), or a "here's what's new" modal that only had one button.

Either way, they definitely didn't get my informed consent, and I'm someone who reads all the update modals because I'm interested in their updates.

Madmallard1 day ago
I was never given this option.
Aeolun1 day ago
Hmm, so now your options for data retention are 30 days, or 5 years. Not really a great or reasonable choice.
lervag22 hours ago
I don't think you can choose 30 days. It's 5 years or no service. At least that's what it looks like to me; I did not find a way to accept the new policies without accepting 5 years.
sheepscreek1 day ago
TL;DR This is the money shot

> So here's my advice: Treat every AI tool like a rental car. Inspect it every time you pick it up.

Disappointed in Anthropic - especially the 5 year retention, regardless of how you opt.