4/21/2025 at 7:49:55 PM
> To train, develop, and improve the artificial intelligence, machine learning, and models that we use to support our Services. We may use your Log and Usage Information and Prompts and Outputs Information for this purpose.
https://windsurf.com/privacy-policy
Am I the only one bothered by this? Same with Gemini Advanced (paid) training on your prompts. It feels like I’m paying with money, but also handing over my entire codebase to improve your products. Can’t you do synthetic training data generation at this point, along with the massive amount of Q/A online to not require this?
by rudedogg
4/21/2025 at 8:50:55 PM
Oh, that's not great. Cursor has a privacy mode where you can avoid this.
> If you enable "Privacy Mode" in Cursor's settings: zero data retention will be enabled, and none of your code will ever be stored or trained on by us or any third-party.
by graeme
4/22/2025 at 7:17:31 AM
Important notice: this is off by default; if you use Cursor, consider activating this option.
by jeanlucas
4/22/2025 at 7:20:14 AM
this kind of flag is the "trust me bro" we've been hearing forever
by nbittich
4/21/2025 at 7:58:08 PM
Yeah that's a bad look. If I have an API key visible in my code does that get packaged up as a "prompt" automatically? Could it be spat out to some other user of a model in the future?(I assume that there's a reason that wouldn't happen, but it would be nice to know what that reason is.)
by simonw
4/22/2025 at 12:38:25 AM
I wonder how hard it is to fish the keys out of the model weights later with prompting. Presumably it's possible to literally brute-force it by giving the model the first couple of chars and maybe an env variable name and asking it to complete the rest.
by Havoc
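One practical mitigation for the concern in this sub-thread is to scan and redact anything key-shaped before code ever leaves your machine. Below is a minimal sketch in Python; the regex patterns cover a few well-known key prefixes (OpenAI `sk-`, AWS `AKIA`, GitHub `ghp_`), and the function name `redact_secrets` is illustrative, not part of any real tool or editor:

```python
import re

# Illustrative patterns for common credential formats. The prefixes
# sk-, AKIA, and ghp_ are real conventions used by OpenAI API keys,
# AWS access key IDs, and GitHub personal access tokens respectively.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),   # OpenAI-style API keys
    re.compile(r"AKIA[0-9A-Z]{16}"),      # AWS access key IDs
    re.compile(r"ghp_[A-Za-z0-9]{36}"),   # GitHub personal access tokens
]

def redact_secrets(source: str) -> str:
    """Replace anything that looks like a credential with a placeholder
    before the text is sent off as prompt context."""
    for pattern in SECRET_PATTERNS:
        source = pattern.sub("[REDACTED]", source)
    return source
```

Pattern lists like this are necessarily incomplete (they miss generic high-entropy strings), so this is a last line of defense, not a substitute for keeping secrets out of source files in the first place.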
4/21/2025 at 10:57:09 PM
I'm also interested in the details on how this works in practice. I know that there was a front page post a few weeks ago about how Cursor worked, and there was a short blurb about how sets of security prompts told the LLM not to do things like hard-code API keys, but nothing on the training side.
by isjustintime
4/21/2025 at 8:30:00 PM
Gemini doesn't use paid API prompts for training. [1] I believe it's just for free usage and the web app.
by Workaccount2
4/21/2025 at 8:45:28 PM
Yeah, I was referring to their webapp/Chat, aka Gemini Advanced. It uses your prompts for training unless you turn off chat history completely, or are in their “Workspace” enterprise version.https://support.google.com/gemini/answer/13594961?hl=en
> What data is collected and how it’s used
> Google collects your chats (including recordings of your Gemini Live interactions), what you share with Gemini Apps (like files, images, and screens), related product usage information, your feedback, and info about your location. Info about your location includes the general area from your device, IP address, or Home or Work addresses in your Google Account. Learn more about location data at g.co/privacypolicy/location.
Google uses this data, consistent with our Privacy Policy, to provide, improve, and develop Google products and services and machine-learning technologies, including Google’s enterprise products such as Google Cloud.
Gemini Apps Activity is on by default if you are 18 or older. Users under 18 can choose to turn it on. If your Gemini Apps Activity setting is on, Google stores your Gemini Apps activity with your Google Account for up to 18 months. You can change this to 3 or 36 months in your Gemini Apps Activity setting.
by rudedogg
4/21/2025 at 8:34:46 PM
That's what I thought
by Alifatisk
4/21/2025 at 8:45:37 PM
Without exception, every AI company is a play for your data. AI requires a continuing supply of new data to train on; it does not "get better" merely by using the existing training sets with more compute.
Furthermore, synthetic data is a flawed concept. At a minimum, it tends to propagate and amplify biases in the model generating the data. If you ignore that, there's also the fundamental issue that data doesn't exist purely to run more gradient descent, but to provide new information that isn't already compressed into the existing model. Providing additional copies of the same information cannot help.
by kmeisthax
4/21/2025 at 9:07:41 PM
> it does not "get better" merely by using the existing trainsets with more compute.
Pretty sure it does - that's the whole point of using more test-time compute. Also, a lot of research effort goes into improving data efficiency.
by kadushka
4/21/2025 at 8:48:14 PM
> Same with Gemini Advanced (paid) training on your prompts
I'm not sure if this is true.
> 17. Training Restriction. Google will not use Customer Data to train or fine-tune any AI/ML models without Customer's prior permission or instruction.
https://cloud.google.com/terms/service-terms
> This Generative AI for Google Workspace Privacy Hub covers... the Gemini app on web (i.e. gemini.google.com) and mobile (Android and iOS).
> Your content is not used for any other customers. Your content is not human reviewed or used for Generative AI model training outside your domain without permission.
> The prompts that a user enters when interacting with features available in Gemini are not used beyond the context of the user trust boundary. Prompt content is not used for training generative AI models outside of your domain without your permission.
> Does Google use my data (including prompts) to train generative AI models? No. User prompts are considered customer data under the Cloud Data Processing Addendum.
by parliament32
4/21/2025 at 9:00:32 PM
Right, it's the free Gemini that has this: https://ai.google.dev/gemini-api/terms#unpaid-services
> When you use Unpaid Services, including, for example, Google AI Studio and the unpaid quota on Gemini API, Google uses the content you submit to the Services and any generated responses to provide, improve, and develop Google products and services and machine learning technologies, including Google's enterprise features, products, and services, consistent with our Privacy Policy.
by simonw
4/21/2025 at 9:02:11 PM
That’s for Google Cloud APIs. See my post here about Gemini Advanced (the web chat app): https://news.ycombinator.com/item?id=43756269
by rudedogg
4/21/2025 at 8:41:37 PM
Windsurf: where the users provide the wind and they do all the surfing.
by amelius
4/22/2025 at 5:01:52 AM
Looks like the correct page is this: https://windsurf.com/security
It says:
> Zero-data retention mode is the default for any user on a team or enterprise plan and can be enabled by any individual from their profile page.
> With zero-data retention mode enabled, code data is not persisted at our servers or by any of our subprocessors. The code data is still visible to our servers in memory for the lifetime of the request, and may exist for a slightly longer period (on the order of minutes to hours) for prompt caching. The code data submitted by zero-data retention mode users will never be trained on. Again, zero-data retention mode is on by default for teams and enterprise customers.
by ayi
4/21/2025 at 11:52:47 PM
Nope, literally never going to use it because of this.
by bn-l
4/21/2025 at 10:51:05 PM
Hey, we all want to have our cake and eat it too, but I'm (kinda?) surprised that people expect to use these services, which have been trained on large swaths of "available" data, while refusing to contribute. Even if you're paying: why the selfishness?
by 627467
4/21/2025 at 10:56:14 PM
I think it's more that LLMs should be treated as a utility service. Unless Google and others can clearly show the training data involved, the price that providers can charge for LLMs should be capped. I have no issue with contributing my conversations and my open source code, and I should expect a fair price in return.
by sdesol
4/21/2025 at 8:18:04 PM
it's the reason they bought it...
by blibble
4/21/2025 at 8:34:34 PM
No way Gemini Advanced user content is also being used for training?by Alifatisk