Amp boosts codebase search with Gemini 3 Flash for 3× speed

Amp swapped Haiku 4.5 for Gemini 3 Flash in its codebase search subagent, yielding about 3× faster responses while matching F1 quality. The new model issues more parallel calls and finishes in fewer iterations.

TL;DR

  • Gemini 3 Flash replaces Haiku 4.5 in Amp’s codebase search subagent
  • ≈3× faster while maintaining the same result quality (F1)
  • Increased parallelism: ~2.5 → ~8 parallel tool calls per iteration
  • Fewer iterations: typical searches drop from ~9 turns to ~3 turns
  • Exploration shift: issues more diverse queries and can conclude early once confidence thresholds are met
  • Backend-only change to the codebase search subagent; on the performance vs. latency chart, Gemini 3 Flash lands in the high-F1, low-latency zone

Amp swaps Haiku 4.5 for Gemini 3 Flash in its codebase search subagent

Amp’s codebase search subagent now uses Gemini 3 Flash in place of Haiku 4.5 and runs roughly 3× faster at the same result quality. The difference comes from how the model coordinates tool calls and drives search iterations, which lowers latency at an equivalent F1 score.

What changed under the hood

The new subagent behavior is characterized by greater parallelism and shorter search loops:

  • Parallel tool calls: Haiku 4.5 averaged about 2.5 parallel calls per iteration, whereas Gemini 3 Flash issues approximately 8 parallel calls.
  • Iterations to completion: Searches that needed around 9 turns with Haiku now tend to finish in about 3 turns with Gemini 3 Flash.
  • Exploration strategy: Gemini 3 Flash issues more diverse queries and can conclude early when sufficient evidence is gathered.

These shifts enable the subagent to explore the codebase more broadly in each turn and to terminate once confidence thresholds are met, rather than continuing longer multi-turn exchanges.
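To illustrate why more parallel calls per turn can shorten a search, here is a minimal Python sketch of a single turn that issues about 8 tool calls concurrently. Everything in it is an assumption made for illustration: fake_tool_call, run_turn, timed_turn, and the 50 ms delay are hypothetical stand-ins, not Amp's implementation or API.

```python
import asyncio
import time


async def fake_tool_call(query: str, delay_s: float = 0.05) -> str:
    """Stand-in for one search tool call (hypothetical; not Amp's API)."""
    await asyncio.sleep(delay_s)  # simulate I/O latency of a single call
    return f"results for {query!r}"


async def run_turn(queries: list[str]) -> list[str]:
    """Issue every query of one turn concurrently and wait for all results."""
    return await asyncio.gather(*(fake_tool_call(q) for q in queries))


async def timed_turn(queries: list[str]) -> tuple[list[str], float]:
    """Run one turn and report how long it took in wall-clock seconds."""
    start = time.perf_counter()
    results = await run_turn(queries)
    return results, time.perf_counter() - start


if __name__ == "__main__":
    queries = [f"query-{i}" for i in range(8)]  # ~8 parallel calls in one turn
    results, elapsed = asyncio.run(timed_turn(queries))
    print(len(results), f"{elapsed:.3f}s")
```

Because the calls overlap, the turn takes roughly as long as its slowest call rather than the sum of all eight, which is why widening each turn is cheaper than adding turns.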

Performance profile

A performance vs. latency chart accompanying the announcement positions Gemini 3 Flash in the optimal zone: high F1 score combined with low latency. The chart emphasizes the improved tradeoff: similar quality delivered with substantially shorter response times.

[Figure: Performance vs. latency chart showing Gemini 3 Flash in the optimal zone: high F1 score, low latency]
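For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. The announcement does not describe how Amp computes it for search results, so the sketch below is only the standard set-based definition, and the file names are made up for the example.

```python
def f1_score(returned: set[str], relevant: set[str]) -> float:
    """Standard F1: harmonic mean of precision and recall over two sets."""
    true_positives = len(returned & relevant)
    if true_positives == 0:
        return 0.0  # also covers empty inputs, avoiding division by zero
    precision = true_positives / len(returned)  # share of returned items that are relevant
    recall = true_positives / len(relevant)     # share of relevant items that were returned
    return 2 * precision * recall / (precision + recall)


# Hypothetical search result: 2 of 3 returned files are relevant,
# and 2 of the 4 relevant files were found.
returned = {"a.py", "b.py", "c.py"}
relevant = {"a.py", "b.py", "d.py", "e.py"}
score = f1_score(returned, relevant)  # precision 2/3, recall 1/2, F1 = 4/7
```

Holding F1 constant while lowering latency means the faster model finds an equally good set of results, not merely a quicker one.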

Context and effect

The swap affects Amp’s codebase search subagent specifically and is presented as a backend model change that improves speed without degrading search accuracy. The core measurable benefits reported are more parallel tool calls per iteration and fewer iterations to reach the same quality, which together produce the overall ~3× speedup.
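A back-of-envelope calculation with the reported averages shows how the two effects combine. It assumes that every turn costs about the same wall-clock time and that calls within a turn run concurrently; the announcement gives no per-turn timings, so treat this as a rough consistency check rather than a measurement.

```python
def total_calls(calls_per_turn: float, turns: int) -> float:
    """Approximate number of tool calls issued over a whole search."""
    return calls_per_turn * turns


def sequential_speedup(old_turns: int, new_turns: int) -> float:
    """Speedup if each turn costs the same wall-clock time (an assumption)."""
    return old_turns / new_turns


haiku_calls = total_calls(2.5, 9)   # 22.5 calls spread over ~9 turns
gemini_calls = total_calls(8, 3)    # 24 calls packed into ~3 turns
speedup = sequential_speedup(9, 3)  # 3.0
```

Under these assumptions the total number of tool calls stays roughly the same (about 22.5 versus 24), while the number of sequential round trips falls by a factor of three, which is consistent with the reported ~3× speedup.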

Original source: https://ampcode.com/news/gemini-3-flash-search
