The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for MCP Dev Summit Bengaluru to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.
IMPORTANT NOTE: Timing of sessions and room locations are subject to change.
As MCP servers mature, a gap emerges between benchmark performance and production behaviour:
1. Servers are tested in isolation, but in production they run alongside hundreds of other servers, which affects tool selection. Typical evaluation frameworks cannot reproduce this scale.
2. Single tool calls cannot test workflow compliance, that is, the order and dependency of tool calls across multi-step tasks (see the sketch below).
3. Benchmarks do not measure user experience or quantify transcript quality.
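As a rough illustration of the second point (not the speakers' framework), the sketch below checks whether a transcript of tool calls respects the dependency order a multi-step task implies; the task, tool names, and the `workflow_compliant` helper are all hypothetical.

```python
# Minimal sketch, assuming a hypothetical evaluation harness that records
# the sequence of tool calls an agent made for one multi-step task.
from typing import Dict, List, Set


def workflow_compliant(calls: List[str], deps: Dict[str, Set[str]]) -> bool:
    """Return True if every tool call appears only after all of its
    declared prerequisite tools have already been called."""
    seen: Set[str] = set()
    for tool in calls:
        missing = deps.get(tool, set()) - seen
        if missing:
            # A prerequisite tool was skipped or called too late.
            return False
        seen.add(tool)
    return True


# Hypothetical task: a refund must be looked up and validated before issuing.
deps = {
    "validate_refund_policy": {"lookup_order"},
    "issue_refund": {"lookup_order", "validate_refund_policy"},
}

good_run = ["lookup_order", "validate_refund_policy", "issue_refund"]
bad_run = ["lookup_order", "issue_refund"]  # skipped the policy check

print(workflow_compliant(good_run, deps))  # True
print(workflow_compliant(bad_run, deps))   # False
```

A single-tool-call benchmark scores both runs the same; only a transcript-level check like this distinguishes them.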
This talk presents the design philosophy for robust MCP evaluation, grounded in field data and traffic analysis from 400+ production MCP servers.
We introduce a set of success metrics for MCP server authors and show how these metrics reorder benchmark leaderboards, why servers that top toy evals regress in production, and what server authors should measure before shipping.
Attendees leave with a framework they can apply directly, data to benchmark against, and a clearer view of how to adopt MCP confidently at scale for enterprise and internal use cases.
Founder of Concierge AI and ex-Uber, where he built MCP systems at scale. Concierge AI manages 400+ public MCP deployments; Arnav focuses on MCP tool complexity and researches token-overhead reduction at scale.