Cursor New Tab Model Promises Significant Autocompletion Improvement

Cursor uses online reinforcement learning in its Tab feature to learn not just what to suggest but when to stay silent. Result: 21% fewer suggestions and a 28% higher accept rate.

TL;DR

  • Tab feature at scale: predicts the next editing action and serves inline suggestions accepted via Tab, handling over 400 million requests per day
  • Problem framed: irrelevant suggestions interrupt flow; the decision to show a suggestion is integrated into the policy itself rather than delegated to a separate binary filter (see https://thakkarparth007.github.io/copilot-explorer/posts/copilot-internals)
  • Measured impact: Tab now issues 21% fewer suggestions while achieving a 28% higher accept rate on presented suggestions
  • Further reading: https://cursor.com/blog/tab-rl

Cursor applies reinforcement learning to reduce noisy inline suggestions

Cursor’s Tab feature predicts the next editing action across a codebase and offers inline suggestions that can be accepted with Tab. Because Tab runs on every user action—handling over 400 million requests per day—there is a rich signal about which suggestions are accepted and which are ignored. That signal has been used to retrain the model with online reinforcement learning, producing a new default Tab model with measurable improvements in suggestion relevance.

The problem: noisy suggestions, not just smarter models

A high accept rate matters because irrelevant suggestions interrupt flow. Improving accept rates requires not only better prediction but also a sense of when not to offer anything at all. Previous approaches have added a separate binary filter that predicts whether a suggestion should be shown; an example of this approach can be found in Parth Thakkar’s examination of GitHub Copilot’s contextual filter score: https://thakkarparth007.github.io/copilot-explorer/posts/copilot-internals.

Instead of a separate filter, Cursor integrates the decision into the Tab policy itself. The objective is to have the model learn both what to suggest and when to stay silent, leveraging the model’s internal representations rather than a downstream binary classifier.
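
To make the contrast concrete, here is a minimal sketch of both designs. All function names and the threshold value are hypothetical illustrations, not Cursor's or Copilot's actual interfaces:

```python
from typing import Callable, Optional

# Illustrative contrast between the two designs. All names and the
# threshold value are hypothetical, not Cursor's or Copilot's APIs.

# (a) Separate binary filter: generate first, then gate the output
# with a downstream classifier score.
def filtered_suggest(
    generate: Callable[[str], str],
    accept_prob: Callable[[str, str], float],
    context: str,
    threshold: float = 0.15,
) -> Optional[str]:
    suggestion = generate(context)
    if accept_prob(context, suggestion) >= threshold:
        return suggestion
    return None  # filter suppresses the suggestion

# (b) Integrated policy: "show nothing" is one of the actions the
# model itself can choose, so a single forward pass decides both
# what to suggest and whether to suggest at all.
def policy_suggest(
    sample_action: Callable[[str], Optional[str]],  # None encodes "stay silent"
    context: str,
) -> Optional[str]:
    return sample_action(context)
```

In the integrated design, the show/hide decision shares the model's internal representations and can be trained end to end with the same reward signal.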

Policy-gradient methods for suggestion policies

Cursor frames Tab as a policy optimization problem and applies policy-gradient techniques to maximize an expected reward tied to suggestion utility. An explicit reward scheme encourages accepted suggestions and penalizes shown-but-rejected ones. As an illustrative example of the training setup:

  • a reward of 0.75 is given for an accepted suggestion,
  • a reward of −0.25 for a rejected suggestion,
  • a reward of 0 when nothing is shown.

With that scheme, showing a suggestion is beneficial only when the model's estimated accept probability exceeds 25%: the expected reward of showing is 0.75·p − 0.25·(1 − p) = p − 0.25, which is positive exactly when p > 0.25. In practice, the deployed reward is more complex, accounting for suggestion size and scenarios where the editor jumps to other locations and triggers further suggestions. The policy-gradient update uses samples of (state, action, reward) and the gradient of log-probabilities to push the model toward actions that yield higher reward.
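
As a concrete illustration, here is a minimal REINFORCE-style update using the reward values above. The policy interface, batch format, and action encoding are assumptions made for this sketch; Cursor's actual training setup is not public beyond the blog post:

```python
import torch

# Minimal REINFORCE-style sketch of the reward scheme above, applied to
# logged on-policy samples. The policy interface, batch format, and
# action encoding are assumptions for illustration, not Cursor's code.

R_ACCEPT, R_REJECT, R_SILENT = 0.75, -0.25, 0.0

def reinforce_step(policy: torch.nn.Module,
                   optimizer: torch.optim.Optimizer,
                   batch) -> float:
    """One update from (state, action, reward) tuples, where each action
    was sampled from the current policy (on-policy data)."""
    losses = []
    for state, action, reward in batch:
        logits = policy(state)                         # one logit per action
        log_probs = torch.log_softmax(logits, dim=-1)
        # REINFORCE: minimize -reward * log pi(action | state), which
        # ascends the reward-weighted log-probability of taken actions.
        losses.append(-reward * log_probs[action])
    loss = torch.stack(losses).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because silence earns 0 while a rejected suggestion earns −0.25, this update shifts probability mass toward the silent action whenever the estimated accept probability falls below the 25% break-even point.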

The need for on-policy data and fast rollouts

A key theoretical point is that policy gradient estimates are unbiased only when actions are sampled from the policy being optimized. That forces a workflow where new checkpoints are deployed to users to collect fresh data reflecting the current policy. Cursor’s infrastructure enables relatively rapid iteration: deploying a new checkpoint and collecting the subsequent interaction data takes 1.5 to 2 hours. This short loop is central to keeping updates on-policy and to the practical success of the method.
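
Schematically, the workflow looks something like the following sketch. The deploy and collection functions are hypothetical placeholders for infrastructure the post does not detail; the essential property is that every training batch was generated by the checkpoint currently being optimized:

```python
from typing import Callable, Iterable, Tuple

# Hypothetical sketch of the on-policy loop described above; all names
# are placeholders, not a real API.

Sample = Tuple[object, int, float]  # (state, action, reward)

def online_rl_loop(
    deploy_checkpoint: Callable[[], None],
    collect_interactions: Callable[[float], Iterable[Sample]],
    policy_gradient_update: Callable[[Iterable[Sample]], None],
    rounds: int,
) -> None:
    for _ in range(rounds):
        deploy_checkpoint()                # ship the current policy to users
        batch = collect_interactions(2.0)  # ~1.5-2 hours of usage data
        policy_gradient_update(batch)      # gradient stays unbiased: actions
                                           # were sampled from this policy
```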

Results

Using this online reinforcement learning approach, Cursor reports a new Tab model that makes 21% fewer suggestions while achieving a 28% higher accept rate on the suggestions it does present. Those numbers indicate a move toward fewer, more relevant inline completions by integrating the decision to suggest directly into the suggestion policy.

For more detail, see the original post: https://cursor.com/blog/tab-rl
