What follows is a version of an email I sent our entire R&D team about an explicit goal and deliberate action we’ll take to become twice as productive through our embrace of AI.
Advancements in AI are precipitating the most transformative shift in software engineering of the past 25 years. They’re dramatically expanding what software can do and transforming how we build it. We are all novices in this new world, and that’s filling me with wide-eyed excitement.
If we were to literally hit pause on further advancements, I’m convinced any engineering team that leverages the already-existing tools effectively should expect to at least double their current productivity – a 2x improvement. Yet most people and teams in the industry at large are not getting close to this today: they aren’t trying, they probably don’t believe it’s possible, and even if they do, behavior change is hard and the forces and incentives aren’t clear yet.
Those who get the greatest yield will be those who believe in the power and potential of these new tools and push themselves to best understand and wield their power.
Intercom is p99+ in the industry at building AI products. We’ve been investing in AI for a decade, long before it was cool. When ChatGPT dropped, we were ready: we moved fast, made AI our #1 priority, and launched Fin the same day GPT-4 was released. Fin is the #1 AI Agent for customer service, powered by one of the strongest ML, engineering, and product teams in the industry.

When we initially used these new tools we were underwhelmed, but the pace of progress has been unprecedented. Models, and the tools built around them, were getting better, FAST. As time passed, the anecdotes we’d share or hear internally ramped up in volume and impressiveness. Our internal Slack channels #ai-can-do-that and #ai-in-engineering-feedback are treasure troves – I’m inspired each day by the ways my colleagues are leaning in hard. We try to remove any friction that gets in the way of people using or trying new tools. I think we are doing far better than most companies on this journey – but we should aspire to be p99+ here too.
So, over the next 12 months, achieving this 2x productivity goal is not merely an aspiration – it’s an explicit goal. It’s critical, timely, and within our reach. We will make whatever changes are necessary to achieve this. Achieving this goal will be the combined focus of all disciplines within our R&D org.
I’m convinced the changes will be far bigger and more dramatic than this in the medium term. I toyed with titling this `10x`, which brings with it all sorts of connotations, but even then, on some relatively short time horizon 10x will underpitch the potential. With 2x we are simply trying to be grounded in what’s a realistic and tangible expectation for us to set for our first milestone.
Our pace of innovation has always been a competitive strength. We must fight now to maintain that, or we will lose it to companies born in an AI-First world. This goal represents that fight. It’s explicitly not a “nice to have”. It’s a hardened goal, and it’s part of our expectations for designers, engineers, and PMs. We will track ourselves as leaders, managers, teams, and ICs against it. Companies formed in this era will naturally be optimised around this; funding will flow towards the most visionary who are also showing they can execute faster than others; they’ll hire people who are leaning hardest into exploiting the frontier of what’s possible. Established companies will have to adapt fast or die slow-ish.
What would 2x actually mean?
Twice as much output is possible with the same team, and it’s within reach. This would mean:
- Executing our strategy faster.
- Taking on the projects that we’ve painfully had to deprioritise while we are capacity constrained. The ROI of these projects changes dramatically when the `I` is halved.
- Raising the ceiling on our ambition, on how bold and creative we can be.
- Raising the floor on quality: edge-case bugs and papercuts in the product no longer need to be starved for attention.
- Having far more capacity to tackle the technical debt in our codebase that contributes to us moving slower.
It’s worth noting this is an entirely different vector from “just hire more people”. Even if we allocated the budget to hire 2x as many people, at our scale it’s highly improbable we’d double our team size in 12 months. Even if we did, it would come with huge costs and tradeoffs: hiring and onboarding take time and carry risk, so we’d be slower for a year or two while hoping to then catch up. Many of those folks wouldn’t work out, which means more hiring and onboarding. And even if you pull all that off, 2x the people almost certainly doesn’t give you 2x the output – maybe it gives you +50% – so now you’re taking the checkbook out again.
How do we make this tangible?
We’ve all seen the mindblowing 0-to-1 vibe coding demos (e.g. a multiplayer 3D flight game): a couple of prompts doing what could take a competent engineer days or weeks, which can translate into a 10-100x speedup. If all your work is of that nature – you’re in for a treat. But building out an initial idea is very different from iterating on real mission-critical systems, and collaborating with hundreds of humans to do so.
We have a couple of massive codebases – our Rails monolith (2M+ lines of code) and our JS front end for Intercom (1M+ lines of code) – and a number of others for our AI systems, our infrastructure automation, our mobile apps, our monetization systems, our messenger, our websites, and a long tail beyond that.
Most meaningful changes are complex and span several of these – not to mention multiple people, disciplines, and teams – AND we need to take into consideration the impact on the tens of thousands of companies that use Fin or Intercom as a mission-critical part of how they run their business.
Suffice to say, this isn’t easy mode; even getting to the first 2x will be hard fought. Here are some of the areas of impact that I see aggregating together to get us past that 2x milestone. They are roughly ordered by complexity and potential ROI.
- Make humans faster at what they do, with AI tools and AI augmentation – e.g. better IDEs with inline support, tab completion, and agent modes. A reasonable target here is that we all get about 50% faster at our work as a result.
- Enable non-engineers to contribute. This might be a PM or designer or EM making changes directly themselves (fix styling, make copy changes, perhaps fix bugs), rather than see those changes languish in a queue, and/or interrupt the focus of engineers.
- Reduce sync and collaboration overheads. Writing code is often not the bottleneck – it’s how we work together as humans. AI presents an opportunity to simplify this and be less dependent on syncing or handing off with other humans to make progress. Tooling that better enables engineers to ship code that looks exactly as designed reduces painful roundtrips and reviews with a designer. Excellent AI code review means you are not waiting to interrupt a colleague to get actionable feedback on your work. We have an advantage here, as many of our team can flex into adjacent roles to some extent – engineers who could be product managers, designers who can code, PMs with great design instincts – and these overlaps enable us to be more autonomous as individuals at times.
- Make ourselves redundant for parts of the jobs we do today. Some parts of our jobs, that others rely on us for, *could* instead be done directly by the people that need them. As the expert in that job, find a way for AI (or automation more generally) to enable your ‘customers’ to do it themselves as easily as they could ask you to do it. In doing so you’ve not just freed up your focus for higher-impact work, you’ve reduced their wait time, enabling them to complete their task quicker too. (e.g. infrastructure engineers, instead of manually fielding requests to spin up additional clusters, can provide strong templates, guardrails, and automation so that the teams that need them can do it themselves.)
- Recognise the problems where AI can give you 20-100x lift. This is especially relevant to work that follows a repeated pattern. Or problems that can be shaped to look like this. For example, we’ve worked hard to enable Fin to work on any customer service platform. It was months of hard work to figure out how to do that well for the first, weeks for the second, and now we’ve an understanding of the shape of the problem such that with use of our AI tools, it’s closer to hours or days to extend this to the next system we encounter demand for. Sahil had a great example of ~40x lift on the How I AI podcast with Claire Vo.
- Send large quantities of relatively simple work to an army of AI agents. There are some types of work that are plentiful but only have high impact if you can do a huge volume of them. Rather than making your engineers 50% faster at tackling this endless stream, find ways to harness AI agents to do this work entirely, breaking the human scaling bottleneck. Often this is work that is hard to prioritise but is constant toil, which you either pay for as friction that makes it harder to do other work, or as a tax you keep paying in the background to keep that friction at bay. Early examples of this are dead code/test deletion, unused feature flag removal, issue and exception triage and fixing, framework or dependency upgrades, and codebase refactoring.
In short, getting to 2x is a blend of raising the floor, making humans faster at what they do, finding occasional opportunities for home run hits, and finding ways to get work done without bottlenecking on other humans.
How will we measure it?
Ideally I’d like to measure how long it takes us, on average, to bring something from idea to being in the hands of our customers. Can we cut that in half consistently? This is a full team sport involving all disciplines in our R&D org. If you were to think about t-shirt size estimates, we’d be putting them all in a hot wash and shrinking them down a couple of sizes. I can’t think of a great way to operationalize that today, so we will start with something potentially flawed but pragmatic.
Our culture is already quite optimised around continuous delivery; Shipping is our heartbeat (our teams ship hundreds of PRs per day). A change that makes it to production is a reasonable proxy for impact. We have separate mechanisms that ensure we are working on the right things, and that the work we do is good quality.
So, we will measure our progress here by measuring the average number of merged pull requests (PRs) per engineer per month (others have used this measure too). Our emphasis should be on the system that results in this output metric; on the efficiency in how we work, how we collaborate, how we leverage AI and other tools. We’ll be sure to look at other signals to make sure quality remains high and increases in this metric are actually correlated with impact.
This number goes up when non-engineers can make changes directly. It goes up when AI agents can make changes autonomously (or at least with human approval). It goes up when engineers find ways to be more productive themselves, or less reliant on handoffs or syncing with other humans. Obviously not all PRs are equal – small PRs can be super high value, large PRs can be low value – but as a proxy it’s good enough. We’ve also already seen some durable improvement on this metric in the last few months (~20% YoY), giving me confidence that we can move it much further.
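To make this concrete, here’s a minimal sketch of how the raw numbers could be pulled, assuming the code lives on GitHub; the org, logins, and token handling are placeholders, not a description of our actual tooling.

```typescript
// Minimal sketch: count merged PRs per engineer for a month via GitHub's
// search API. Org, logins, and dates below are placeholders.
const token = process.env.GITHUB_TOKEN;

async function mergedPrCount(org: string, login: string, from: string, to: string): Promise<number> {
  const query = `org:${org} is:pr is:merged author:${login} merged:${from}..${to}`;
  const res = await fetch(`https://api.github.com/search/issues?q=${encodeURIComponent(query)}`, {
    headers: { Authorization: `Bearer ${token}`, Accept: "application/vnd.github+json" },
  });
  const body = await res.json();
  return body.total_count; // the search API reports a total_count for the query
}

// Average merged PRs per engineer for a given month.
async function avgMergedPrsPerEngineer(org: string, engineers: string[], from: string, to: string): Promise<number> {
  let total = 0;
  for (const login of engineers) {
    total += await mergedPrCount(org, login, from, to);
  }
  return total / engineers.length;
}
```

The interesting work is everything upstream of a number like this – the system of people, tools, and collaboration that produces it – but tracking it consistently is what lets us see whether that system is actually improving.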
There are of course ways this measure could be misused, and dumb unintended behaviors that could emerge as a result – so some caution is needed. We will keep track of how this metric is working in practice, and iterate where we see opportunities to sharpen it. Despite valid concerns – such as the risk of cynical gamification – we believe it’s important to measure this and to be transparent about it. We trust ourselves and our teams to use it responsibly and with high utility.
Using this metric in isolation to try and measure an individual’s performance can be problematic; comparing two individuals on this metric alone is not a high-quality signal. But using it as a lens to assess the health and productivity of a team or an organisation is incredibly powerful. Your org is a factory: the production line spits out pull requests, and we need to understand and optimise the factory to maximise the throughput of PRs.
These two similar teams have quite different PR throughput – why is that? What is different about the team and environment that is causing it? What can we learn or change? This engineer has the lowest throughput in the org – why is that? This team reviews and approves PRs far more quickly than the rest – what can we learn from this team?
How will we achieve this 2x goal?
Honestly, that’s not all figured out. This message isn’t attempting to figure out the entire plan; it’s about setting the direction and initial goal. Some amount of the necessary change is happening organically already – there is good situational awareness amongst our team of how things are changing in our industry, and a curiosity to adapt. Our managers and leaders are already adapting their leadership focus to support this goal.
One excellent piece of writing and advice I often go back to is called ‘Demanding and Supportive’ (by Ravi Gupta). We should be very demanding of ourselves and each other, that’s reflected in our goal, AND we should do everything we can to support each other to be successful against this goal. A huge part of achieving this goal will be providing the support that makes it easier for everyone to be successful; this will include everything from frictionless access to the best tools available to regular and high quality training and enablement that accelerates the rate of learning across the team.
Over the coming weeks and months we’ll share and discuss various ideas or changes we are making that will help contribute towards this goal. We’ll share these internally and publicly on our blog. These will include everything from the changes we are asking of leaders, changes to our hiring, tweaks to our performance process, changes to cultural practices like code review, how our various roles are evolving, such as how designers are changing their tooling and workflows or how designers and PMs contribute directly to our software or how our researchers and analysts use AI to provide better insight faster. All this and more.
As an aside, the higher level of demand we will place on ourselves simply reflects how our industry will adjust over time too – I know we are already screening potential interview candidates based on them having strong examples of using AI to be more effective in their jobs.
What have I been doing so far?
Especially in times of change, I feel strongly that managers and leaders need to be close to the detail: close to understanding how teams are working and the problems they face, and seeing first hand what’s working or not and where the opportunities are. This is very much in the spirit of genchi genbutsu. I’ll be spending at least one week per month embedding with our engineering teams. Last week I spent a few days working with one of our staff engineers, Peter, exploring the challenges we face shifting from our now-deprecated use of Ember.js to our preferred use of React for our front-end applications. We see far higher productivity and, frankly, developer happiness working in React – and the AI tools are far more effective there. We were exploring ways we can make this shift faster, including porting/rebuilding parts of our application, and how we can encourage the shift. I’ve also spent more time meeting with founders of companies building AI tooling for engineers, to understand how these tools will evolve and which are most promising. And I’ve spent time talking to industry peers 1:1 and on podcasts about this topic, and writing about and discussing it with leadership teams internally.
My advice to you?
A common sentiment I hear is that “I’m too busy to properly try these new tools”. I get it – we are all working against challenging deadlines, or snowed under in meetings or interviews or reactive work. With the belief that the payback will be worth it, we have to punch through those excuses. Take time off the production line to just play, to experiment. In our developer surveys, ‘not having time’ is the primary reason people cite for not trying new tools. Managers and leaders will support you taking the time to skill up. Aside from time during your work day, if ever there was a time in your career to lean into your craft it’s now: turn off Netflix, put down the doomscroll device, and just start playing with the new tools. Try a task once, aiming for AI to solve it entirely. If it works first time, you fluked it; when it doesn’t work, reflect on what you could try differently, then try that – rinse and repeat 10 times. Use different tools. Get a feel for which works more naturally for you. This is more fun in pairs, as you can spark off each other and both try different approaches to the same task in parallel.
Start every piece of work with a mindset of “there must be a way AI can help me do this quicker/better, I’m determined to find it”. I expect every manager will be supportive and encouraging of you carving out time, pushing back the occasional deadline if necessary in the near term. I also expect managers to themselves get more hands on and close to the work on their teams, and familiar with the tools too, so they can better support and influence the journey to 2x and beyond.
Buckle up!
Like I started with, 2x is at once a conservative milestone to aim for and one that will be a massive game changer for our customers and our business. We have by no means figured this out – I know there’s a degree of naivety and optimism behind this, but also complete belief and determination. It’s quite probable that changes in what’s possible will completely change the calculus here. Tools are improving at an insane rate (e.g. the recent Claude 4 release, OpenAI Codex, etc.), and there are so many companies building products that chase the potential here. Fundamentally, this is about the interplay between humans and tools – the upside primarily hinges on shifts in human behaviour. If we merely sit back and wait passively for our tools to improve, we’ll fall short both in absolute terms and relative to others. We must acknowledge that we’ve all become novices again, and commit to charting a fresh path toward mastery with these new and evolving tools.
It’s enticingly easy to build AI-powered software. But if you do it for the wrong reasons – or don’t grasp the trade-offs – you’ll likely waste time and money.
The breakthrough capability of LLMs has flipped SaaS on its head: the biggest prize is in building software that replaces human work, instead of building tools that help humans do that work.
Our focus is on AI-first customer service, and our vision is that in a short number of years, the vast majority of customer service interactions will be with AI, not humans. This is a multi-hundred-billion-dollar opportunity and success means enabling all businesses to be leaner and more nimble, while also providing world-class customer service.
Every company can benefit from this shift by redeploying talent and capital to innovation and growth. But we are still early in the adoption curve, and great outcomes are not guaranteed – they hinge on the choices you make. Two stand out:
- Whether to build or buy an AI agent.
- If buying, the AI agent you choose. (Hint: Fin.)
I have a simple mental model for ROI (return on investment): What percentage of CS volume can I automate? And how much does it cost relative to human costs?
A further important nuance: as you increase the automation rate, the additional conversations you automate are harder and more costly ones, so the ROI gains accelerate.
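To make that mental model concrete, here’s a minimal sketch of the arithmetic; the field names are mine and any numbers you plug in are illustrative, not pricing guidance.

```typescript
// Minimal sketch of the ROI mental model. All inputs are illustrative.
interface RoiInputs {
  monthlyConversations: number;      // total support conversations per month
  automationRate: number;            // fraction resolved end-to-end by the AI agent
  humanCostPerConversation: number;  // fully loaded cost of a human-assisted interaction
  aiCostPerResolution: number;       // vendor fee, or your own run cost, per AI resolution
  fixedMonthlyCost: number;          // e.g. the team maintaining a self-built agent
}

function monthlySavings(i: RoiInputs): number {
  const resolved = i.monthlyConversations * i.automationRate;
  const savedPerResolution = i.humanCostPerConversation - i.aiCostPerResolution;
  return resolved * savedPerResolution - i.fixedMonthlyCost;
}
```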
I respect anyone who wants to build their own AI agent, but here are the pitfalls you must overcome to match – let alone surpass – a “buy” approach.
The fundamental pitfall: Falling short of “best-in-class”
It’s easy to automate low-hanging fruit. The real question is how high you can push your automation rate – and how much you’re willing to invest to keep improving it.
Assuming an average cost of $6.60 per human-assisted support interaction (ref Deloitte), each AI-resolved conversation can save you roughly $5-$6. Reaching even a 40% resolution rate is not easy, but if an off-the-shelf solution can get you to 50% or higher, that difference quickly means you are falling short on ROI.
To illustrate this, I’ve worked out savings showing that even if your self-built solution costs you $0 (which is obviously unrealistic), it will still provide worse ROI than a solution with higher resolution rates.
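Plugging this example’s numbers into the sketch above shows the gap directly; the volume and per-resolution fee are assumptions for the sake of the arithmetic, not real pricing.

```typescript
// Illustrative only: a "free" self-built agent at 40% vs a bought agent at
// 50% with an assumed $1.00 per-resolution fee, at an assumed volume.
const volume = 100_000; // conversations per month (assumption)

const selfBuilt = monthlySavings({
  monthlyConversations: volume,
  automationRate: 0.40,
  humanCostPerConversation: 6.60,
  aiCostPerResolution: 0,    // pretend the self-built agent costs nothing to run
  fixedMonthlyCost: 0,       // and that the team building it is free
}); // => $264,000/month

const bought = monthlySavings({
  monthlyConversations: volume,
  automationRate: 0.50,
  humanCostPerConversation: 6.60,
  aiCostPerResolution: 1.00, // assumed fee per resolution
  fixedMonthlyCost: 0,
}); // => $280,000/month – the higher resolution rate wins even against "free"
```

Adding a realistic fixedMonthlyCost on the self-built side (like the team cost discussed below) only widens that gap.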
This is the fundamental pitfall: building or picking a tool that falls short of best-in-class will leave money on the table. Your top priority should be to achieve and sustain best-in-class performance.
Underestimating the total cost of ownership
Launching your own AI agent is a forever project, not a one-time build. Spinning up a prototype can be cheap and easy, but a production-grade system demands constant attention from a talented team. Best-in-class is a constantly moving target, and the game is to match or beat it. This means ongoing feature improvements, model upgrades, prompt tuning, as well as the non-trivial, unglamorous work around availability, data privacy, and security.
Once you get going, you’ll realize that most of the work isn’t in the bits of the system you can see; it’s in all the parts around it that you use to train and control it, how you integrate with other systems, and how you define policies and processes it can reliably follow.
There is a lot of software to build, and if you under-invest you’ll quickly become a bottleneck, holding your company back from higher resolution rates and once again missing out on higher ROI.
To make the previous example more realistic, I’ve factored in a conservative $50,000/month for an engineering team to own this. To just break even on ROI, you’ll need to match the performance of best-in-class tools you can buy and you’ll need massive volume (approximately one million conversations per month) to offset the ongoing costs of your team.
This makes me think that most people pursuing this path either haven’t figured out the economics or are doing it for other reasons without thinking about the price tag or opportunity cost.
As you drive higher and higher resolution rates, your AI agent will be solving more difficult and time-consuming cases, delivering increasingly higher ROI. However, you should anticipate higher engineering costs, as the work gets much more difficult too.
Deflection is easy, delight is hard – speed of improvement matters
Automation rate alone isn’t the whole story. If you care about customer experience, not all deflections are equal. Quality matters.
When one of our customers compared Fin to a competitor, they saw that while both handled simple queries at a similar rate, Fin achieved 15 percentage points higher on CSAT.
This is no accident – it’s the result of hundreds of experiments run over many months against significant volume by dozens of specialized machine learning PhDs and engineers, enabling us to consistently improve both automation rate and customer satisfaction.
You’ll struggle to ever match this with the smaller investment of a single team and the longer experimentation cycles that come with lower volume.
High scale leads to faster feedback loops on experiments, leading to faster improvements in performance, leading to stronger demand and usage, further compounding the advantage. It’s incredibly hard to compete against this.
While you’re struggling to keep up with high-quality support automation, the best-in-class AI agents will be leaning into customer success, generating real value for your business.
Don’t just take my word for it
If you have a strong AI team, massive support volume, and very specialized needs, it might make sense to build your own agent. But even Anthropic – one of the leading AI labs – uses our agent Fin because they recognize the constant iteration required to stay safe, accurate, and deeply integrated with support workflows.
Ultimately, Anthropic decided their engineering capacity was better spent improving their core products.
Invest in what makes you unique
This logic applies to companies in general: invest in what makes you unique. This is why we leverage vendors like AWS or PlanetScale – not because we don’t have excellent engineers who could self-host systems of our scale, but because it’s undifferentiated heavy lifting. Leaning on great partners enables us to apply greater focus on our primary mission.
Winning in software often looks like small teams generating tens or hundreds of millions in revenue. Cost-saving side quests rarely justify the diversion of engineering talent, especially when an off-the-shelf solution can get you there faster.
I get the appeal of building your own system – it’s fun, it’s a great learning experience, and there’s something special about shipping your own AI code. But my advice? Channel that energy into your own product, not a non-strategic side quest.
If you’re on the fence, I’m always happy to chat about how we approached building our AI agent and the lessons we learned along the way – whether you end up building your own or not.