JetBrains wants your data to build smarter AI

A few years ago, “AI in coding” felt like science fiction. The idea that an editor might guess your next line or even write a method for you was relegated to demos, research labs, or sci-fi blog posts. But gradually, we’ve seen that become reality: tools like GitHub Copilot, ChatGPT-powered plugins, and built-in assistants in IDEs are now part of many developers’ daily workflows.

Still, there’s a problem. These tools often struggle when faced with real-world codebases—big, messy, full of legacy baggage, tangled dependencies, and domain-specific behavior. Their suggestions sometimes break things, hallucinate logic, or just feel disconnected from your project’s architecture.

That mismatch isn’t surprising. Most models are trained on public open-source repositories or synthetic examples. They don’t see the daily realities you deal with in private, production, or enterprise code. So while the “AI hype” is real, the usefulness of these tools is still constrained by the data they’re built on.

That’s the backdrop for JetBrains’ recent announcement about real-usage data sharing. The question they’re asking: can tools learn from how you code (securely, privately, and only with your consent) to become truly helpful in your daily work?

Let’s explore how we got here, what JetBrains is proposing, and what you should know before opting in.

A short history

Most developers are familiar with classic autocompletion—typing pri and getting printf(...) or print(...) suggestions. That’s been in IDEs for decades. Over time, these features got richer (method suggestions, type-aware completions, smart imports).

Recently, the shift has been toward AI-powered models that can generate multiple lines, infer intent, or even refactor code. JetBrains launched its AI Assistant, which is still evolving and connects your IDE to large language models (LLMs).

They also introduced Full Line Code Completion, a model that runs locally inside the IDE, offering suggestions without needing to call the cloud.

And then came Mellum, JetBrains’ own purpose-built LLM for code completion. It was built with developer workflows in mind, and in 2025 JetBrains open-sourced it.

Mellum is designed to understand context more deeply, reason about your project, and improve over time. But to reach its full potential, JetBrains needs real developer behavior data—how you code, how often suggestions are accepted, how AI features are used. That’s what the new data-sharing initiative is about.

What JetBrains is proposing

JetBrains is rolling out a data-sharing program in upcoming IDE updates (starting with 2025.2.4) to collect more detailed data about how developers use their tools.

They already collect anonymous telemetry (feature usage, time spent, clicks).

With your permission, they now also want to collect:

Edit history
Interaction with terminals
Prompts you send to the AI features and their responses
Code snippets around those prompts

They believe this kind of detailed, real usage data is critical for making AI assistants smarter, safer, and more aligned with how you actually build software.

They promise:

You stay in control—opt in or opt out at any time
Only specific JetBrains teams working on LLM/dev tools see the data
No selling or sharing with third parties
Compliance with EU data laws
Mechanisms to withdraw consent and limit retention

They also distinguish anonymous telemetry from detailed data: the telemetry is already collected today, while the detailed data is new and more sensitive.

What changes, what stays the same

Let’s break down who is affected and how:

Non-commercial (hobby / open-source / educational): Detailed data sharing is on by default, but you can disable it in settings.
Commercial / paid / enterprise users: Nothing changes by default. You’ll need to opt in, and if your organization doesn’t allow sharing, you may not see any change at all.
Organizations / teams: Admins must explicitly enable detailed sharing. This helps avoid accidental leaks of proprietary IP.
Retention & deletion: JetBrains states that collected detailed data will not be stored indefinitely. Earlier documents mention retention periods (e.g. one year) and policies for removal after opt-out.
Transmission when not opted in: If you don’t opt into detailed data sharing, your AI inputs and outputs are still sent to LLM providers as needed, but JetBrains won’t persist them.
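
The defaults above can be read as a small decision table. The sketch below is only an illustration of that reading, not JetBrains’ implementation: the names LicenseType and detailed_sharing_enabled are hypothetical, and it assumes that under an organization license the admin setting alone decides, which the announcement does not spell out.

```python
from enum import Enum
from typing import Optional


class LicenseType(Enum):
    """Hypothetical license categories, mirroring the groups in this article."""
    NON_COMMERCIAL = "non-commercial"  # hobby, open-source, educational
    COMMERCIAL = "commercial"          # paid individual licenses
    ORGANIZATION = "organization"      # licenses managed by a team admin


def detailed_sharing_enabled(
    license_type: LicenseType,
    user_choice: Optional[bool] = None,
    admin_enabled: bool = False,
) -> bool:
    """Return True if detailed data sharing would be active.

    user_choice is None when the user never touched the setting,
    otherwise True (opted in) or False (opted out).
    admin_enabled is only consulted for organization licenses.
    """
    if license_type is LicenseType.ORGANIZATION:
        # Assumption: the admin's explicit setting is the only gate.
        return admin_enabled
    if license_type is LicenseType.NON_COMMERCIAL:
        # On by default; an explicit user choice overrides the default.
        return True if user_choice is None else user_choice
    # Commercial: off unless the user explicitly opts in.
    return bool(user_choice)
```

For example, detailed_sharing_enabled(LicenseType.NON_COMMERCIAL) is True until the user opts out, while the same call with LicenseType.COMMERCIAL stays False until the user opts in.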

You might ask: is this safe? Is it worth opting in?

Benefits

Over time, AI features could actually become reliable in your everyday work—not just toy examples
Smarter completions, fewer hallucinated suggestions, better alignment with your coding style
Better security checks and fewer false positives
Being part of shaping the tools you use

Risks/concerns

Even with filters and anonymization, there’s always a residual risk of exposing sensitive code
You must trust that JetBrains enforces strict access, audits, and privacy controls
Opting in today only helps future models—not necessarily what you see right now
If your organization forbids external data sharing, enabling might be legally or contractually disallowed

What to read (carefully)

The FAQ from JetBrains’ legal docs (what is collected, who sees it, what’s allowed)
Data retention policy (how long your data might be kept)
How JetBrains handles semantic indexing and embeddings (they may upload representations, not original code)