JetBrains wants your data to build smarter AI

AI coding tools are only as good as the data they learn from, and JetBrains now wants developers' real-world coding activity to push its assistants beyond toy examples. Whether you opt in comes down to how much you trust its promises of security and control.

A few years ago, “AI in coding” felt like science fiction. The idea that an editor might guess your next line or even write a method for you was relegated to demos, research labs, or sci-fi blog posts. But gradually, we’ve seen that become reality: tools like GitHub Copilot, ChatGPT-powered plugins, and built-in assistants in IDEs are now part of many developers’ daily workflows.

Still, there’s a problem. These tools often struggle when faced with real-world codebases—big, messy, full of legacy baggage, tangled dependencies, and domain-specific behavior. Their suggestions sometimes break things, hallucinate logic, or just feel disconnected from your project’s architecture.

That mismatch isn’t surprising. Most models are trained on public or open-source repositories, or synthetic examples. They don’t see the daily realities you deal with in private, production, or enterprise code. So while the “AI hype” is real, its usefulness is still constrained by the data it’s built on.

That’s the backdrop for JetBrains’ recent announcement about real-usage data sharing. They’re asking: if developers allow it, can tools learn from how you code—securely, privately, and only when you consent—to become truly helpful in your daily work?

Let’s explore how we got here, what JetBrains is proposing, and what you should know before opting in.

A short history

Most developers are familiar with classic autocompletion—typing pri and getting printf(...) or print(...) suggestions. That’s been in IDEs for decades. Over time, these features got richer (method suggestions, type-aware completions, smart imports).

Recently, the shift has been toward AI-powered models that can generate multiple lines, infer intent, or even refactor code. JetBrains launched its AI Assistant (still evolving), which connects your IDE to large language models (LLMs).

They also introduced Full Line Code Completion, a model that runs locally inside the IDE, offering suggestions without needing to call the cloud.

And then came Mellum, JetBrains' own LLM built specifically for code completion. It was designed with developer workflows in mind, and in 2025 JetBrains open-sourced it.

Mellum is designed to understand context more deeply, reason about your project, and improve over time. But to reach its full potential, JetBrains needs real developer behavior data—how you code, how often suggestions are accepted, how AI features are used. That’s what the new data-sharing initiative is about.

What JetBrains is proposing

JetBrains is rolling out a data-sharing program in upcoming IDE updates (starting with 2025.2.4) to collect more detailed data about how developers use its tools.

They already collect anonymous telemetry (feature usage, time spent, clicks).

Now, with your permission, they also want to collect:

  • Edit history
  • Interaction with terminals
  • Prompts you send to the AI features and their responses
  • Code snippets around those prompts

They believe this kind of detailed, real usage data is critical for making AI assistants smarter, safer, and more aligned with how you actually build software.

They promise:

  • You stay in control—opt in or opt out at any time
  • Only specific JetBrains teams working on LLM/dev tools see the data
  • No selling or sharing with third parties
  • Compliance with EU data laws
  • Mechanisms to withdraw consent and limit retention

They also distinguish between anonymous telemetry and detailed data: only the detailed data collection is new, and it is the more sensitive of the two.

What changes, what stays the same

Let’s break down who is affected and how:

  • Non-commercial (hobby / open-source / educational)
    Detailed data sharing is on by default, but you can disable it in settings.
  • Commercial / paid / enterprise users
    Nothing changes by default. You'll need to opt in, and the option may not be available if your organization doesn't allow it.
  • Organizations / teams
    Admins must explicitly enable detailed sharing. This helps avoid accidental leaks of proprietary IP.
  • Retention & deletion
    JetBrains states that collected detailed data will not be stored indefinitely. Earlier documents mention retention periods (e.g., one year) and policies for removing data after opt-out.
  • Transmission when not opted in
    If you don't opt in to detailed data sharing, your AI inputs and outputs are still sent to LLM providers as needed, but JetBrains won't persist them.

You might ask: is this safe? Is it worth opting in?

Benefits

  • Over time, AI features could actually become reliable in your everyday work—not just toy examples
  • Smarter completions, fewer hallucinated suggestions, better alignment with your coding style
  • Improvements to security / false-positive detection
  • Being part of shaping the tools you use

Risks/concerns

  • Even with filters and anonymization, there’s always a residual risk of exposing sensitive code
  • You must trust that JetBrains enforces strict access, audits, and privacy controls
  • Opting in today only helps future models—not necessarily what you see right now
  • If your organization forbids external data sharing, enabling might be legally or contractually disallowed

What to think about before opting in

Take time to read JetBrains’ FAQ and legal documents carefully, line by line.

Enable detailed data sharing only if you’re confident in their privacy and security practices.

If you do opt in, keep an eye on how the AI behaves. Ask yourself: are completions getting better? Is anything surfacing that feels too revealing?

Remember, you can withdraw consent at any time. If something doesn't sit right, turn it off.

If enough developers choose to participate, the shared insights could make AI tools far more effective in real-world coding.
