{"id":325,"date":"2025-09-11T22:42:30","date_gmt":"2025-09-11T22:42:30","guid":{"rendered":"https:\/\/fin.ai\/research\/?p=325"},"modified":"2025-09-12T09:38:38","modified_gmt":"2025-09-12T09:38:38","slug":"to-escalate-or-not-to-escalate-that-is-the-question","status":"publish","type":"post","link":"https:\/\/fin.ai\/research\/to-escalate-or-not-to-escalate-that-is-the-question\/","title":{"rendered":"To escalate, or not to escalate, that is the question"},"content":{"rendered":"\n<p>One of Fin AI Agent&#8217;s most critical tasks is deciding when to escalate customer interactions to human support. This challenge has only grown as <a href=\"https:\/\/www.intercom.com\/help\/en\/articles\/11433030-a-more-conversational-human-fin-experience\">Fin has become more conversational<\/a>, and now most escalations happen through natural language, not the <span style=\"background-color: #f0f0f0;border: 1px solid #ddd;border-radius: 20px;padding: 6px 16px;font-size: 14px;color: #333;font-weight: 500;cursor: default;vertical-align: middle\">Talk to a person \ud83d\udc64<\/span> button.<\/p>\n\n\n\n<p>Get this wrong, and you either flood support teams with unnecessary escalations or leave users stuck without human help. 
This decision needs to be both <strong>fast and very accurate<\/strong>.<\/p>\n\n\n\n<p class=\"has-very-light-gray-background-color has-background\">Today, we&#8217;re sharing how we built a custom multi-task model for escalation routing, achieving <strong>&gt;98% escalation accuracy<\/strong>, reducing latency, and increasing resolution rate.<\/p>\n\n\n\n<h2 id=\"understanding-the-escalation-challenge\" class=\"wp-block-heading\">Understanding the Escalation Challenge<\/h2>\n\n\n\n<p>Whenever a user interacts with Fin, our system needs to make a real-time, three-way decision:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Escalate immediately<\/strong> &#8211; Hand off to a human agent or trigger the custom escalation workflow<\/li>\n\n\n\n<li><strong>Offer to escalate<\/strong> &#8211; Ask the user if they&#8217;d like to talk to a human<\/li>\n\n\n\n<li><strong>Let Fin answer<\/strong> &#8211; Continue the AI-powered conversation<\/li>\n<\/ul>\n\n\n\n<p>This decision is informed by two key inputs: the <strong>conversation history<\/strong> and business-defined <strong>escalation guidelines<\/strong>. These guidelines are rules that businesses configure, such as <em>&#8220;Escalate immediately if the user expresses anger about billing&#8221;.<\/em><\/p>\n\n\n\n<p>The system must also provide reasoning for its decisions. When escalating due to a guideline match, we cite the specific guideline. 
Internally, we also log broader categories like <em>angry<\/em>, <em>request<\/em>, or <em>guideline<\/em>.<\/p>\n\n\n\n<p>For example, if a user writes <em>&#8220;I&#8217;d like to check the status of my order #12345&#8221;<\/em> and there&#8217;s a guideline saying <em>&#8220;If the user asks about a specific order, hand off to a human agent&#8221;<\/em>, the router would escalate right away, cite the guideline ID, and mark the reason as &#8220;guideline&#8221;.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"677\" src=\"https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/09\/Escalate-router-7-cropped-1024x677.png\" alt=\"\" class=\"wp-image-454\" srcset=\"https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/09\/Escalate-router-7-cropped-1024x677.png 1024w, https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/09\/Escalate-router-7-cropped-300x198.png 300w, https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/09\/Escalate-router-7-cropped-768x508.png 768w, https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/09\/Escalate-router-7-cropped-1536x1015.png 1536w, https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/09\/Escalate-router-7-cropped-2048x1354.png 2048w, https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/09\/Escalate-router-7-cropped-1320x872.png 1320w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<h2 id=\"starting-point-llm-based-routing\" class=\"wp-block-heading\">Starting Point: LLM-Based Routing<\/h2>\n\n\n\n<p>Our first setup used a large language model (LLM) to decide: should we escalate, what\u2019s the reason, and which guidelines matched. 
We also added guardrails to avoid edge cases like offering escalation twice in a row or escalating on the very first user message, unless there\u2019s a guideline explicitly allowing it.<\/p>\n\n\n\n<p>While it worked well, the LLM-based approach had limitations around latency and how much control we had over decision thresholds.<\/p>\n\n\n\n<h2 id=\"attempt-1-fine-tuning-smaller-llms\" class=\"wp-block-heading\">Attempt 1: Fine-Tuning Smaller LLMs<\/h2>\n\n\n\n<p>We first tried replacing our LLM with fine-tuned models. We experimented with Gemma and Qwen models of various sizes, training on 100,000 multilingual examples labeled with LLM outputs. This approach achieved a solid 97% escalation accuracy, proving that custom models could compete with our LLM baseline.<\/p>\n\n\n\n<p>At the same time, we saw excellent results with encoder-based models on other tasks like issue classification and reranking, which made us curious about using them for escalation routing too. Encoder models looked promising for faster inference and more reliable predictions.<\/p>\n\n\n\n<h2 id=\"attempt-2-classification-without-guidelines\" class=\"wp-block-heading\">Attempt 2: Classification Without Guidelines<\/h2>\n\n\n\n<p>Our next approach was intentionally simple: use a BERT-style encoder for three-way classification (<span style=\"background-color: #f5f5f5;padding: 2px 6px;border-radius: 4px;font-family: monospace;font-size: 0.9em\">not escalate<\/span> \/ <span style=\"background-color: #fff4e6;padding: 2px 6px;border-radius: 4px;font-family: monospace;font-size: 0.9em\">offer<\/span> \/ <span style=\"background-color: #ffe6e6;padding: 2px 6px;border-radius: 4px;font-family: monospace;font-size: 0.9em\">escalate<\/span>) on English conversations without any escalation guidelines.<\/p>\n\n\n\n<p>We treated it as a standard text classification problem. 
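Concretely, the decision head for this setup can be sketched as follows (a minimal numpy illustration of the classification math only; the hidden size, random weights, and pooling are made up for the example, not the production model):

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a logit vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Hypothetical pooled encoder output for one conversation. The dimension 8 is
# illustrative; a real BERT-style encoder produces a much larger vector.
rng = np.random.default_rng(0)
h_conversation = rng.normal(size=8)

# Linear classification head: one logit per routing decision.
W = rng.normal(size=(3, 8))
b = np.zeros(3)
probs = softmax(W @ h_conversation + b)

labels = ["not escalate", "offer", "escalate"]
decision = labels[int(np.argmax(probs))]
```

Keeping the full probability vector, rather than just the argmax, is what makes threshold tuning over the decision possible later.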
The model takes the conversation history as input and outputs probabilities for each of the three escalation options.<\/p>\n\n\n\n<p><strong>The results surprised us. <\/strong>The custom model achieved 98% accuracy and often made better decisions than the &#8220;teacher\u201d LLM.<\/p>\n\n\n\n<p>However, this approach couldn\u2019t scale: the share of conversations with escalation guidelines was growing fast (now 75% of the traffic), and this model couldn\u2019t handle guideline citations. We needed something more powerful.<\/p>\n\n\n\n<h2 id=\"attempt-3-multi-task-architecture-with-citations\" class=\"wp-block-heading\">Attempt 3: Multi-Task Architecture with Citations<\/h2>\n\n\n\n<p>For our final approach, we built a single multi-task model that predicts three things at once:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Escalation decision<\/strong> (3-way classification)<\/li>\n\n\n\n<li><strong>Escalation reason<\/strong> (8 categories)<\/li>\n\n\n\n<li><strong>Guideline citations<\/strong> (which guidelines to cite)<\/li>\n<\/ul>\n\n\n\n<p>This approach gives us the accuracy we need and full control over the decision process. 
The multi-task design allows the model to learn <strong>shared representations that improve performance across all three tasks<\/strong>: the escalation decision informs the reason prediction, and both help with accurate guideline citation.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Architecture Deep-Dive<\/h3>\n\n\n\n<p>Our model uses a single encoder backbone with three classifier heads: escalation and reason classifiers use linear layers with softmax, and the guidelines classifier uses a linear layer with sigmoid for multi-label classification.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"433\" src=\"https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/07\/escalate_router_design_transparent-1024x433.png\" alt=\"Routing in AI agents: ModernBERT for multi-task classification\" class=\"wp-image-328\" style=\"width:623px;height:auto\" srcset=\"https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/07\/escalate_router_design_transparent-1024x433.png 1024w, https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/07\/escalate_router_design_transparent-300x127.png 300w, https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/07\/escalate_router_design_transparent-768x325.png 768w, https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/07\/escalate_router_design_transparent-1536x650.png 1536w, https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/07\/escalate_router_design_transparent.png 1600w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n\n\n\n<p>The most complex component is guideline citation. 
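The head math described above can be sketched numerically (a numpy illustration only; the hidden size, random weights, and number of guidelines are made up):

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
d = 8  # hidden size, illustrative

# Shared backbone outputs: a pooled conversation embedding, plus one pooled
# embedding per guideline span (3 guidelines in this toy example).
h_pooled = rng.normal(size=d)
h_guidelines = rng.normal(size=(3, d))

# Escalation head: 3-way classification with softmax.
W_esc = rng.normal(size=(3, d))
p_esc = softmax(W_esc @ h_pooled)

# Reason head: 8 categories with softmax.
W_reason = rng.normal(size=(8, d))
p_reason = softmax(W_reason @ h_pooled)

# Guideline citation head: an independent sigmoid per guideline (multi-label).
w_cite = rng.normal(size=d)
p_cite = sigmoid(h_guidelines @ w_cite)
```

Because the sigmoid scores are independent, any subset of guidelines can be cited at once, unlike the softmax heads, which each pick exactly one class.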
The encoder processes the entire input and produces contextual embeddings for every token:<\/p>\n\n\n\n<p>$$\\mathbf{h}_t = \\mathrm{ModernBERT}(x)_t$$<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"253\" src=\"https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/07\/guideline_classifier_transparent-1024x253.png\" alt=\"Substring citations using ModernBERT\" class=\"wp-image-329\" srcset=\"https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/07\/guideline_classifier_transparent-1024x253.png 1024w, https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/07\/guideline_classifier_transparent-300x74.png 300w, https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/07\/guideline_classifier_transparent-768x190.png 768w, https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/07\/guideline_classifier_transparent-1536x380.png 1536w, https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/07\/guideline_classifier_transparent.png 1600w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>To represent a guideline (a span of tokens in the input), we:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Identify its \\([\\mathrm{start}, \\mathrm{end})\\) token positions<\/li>\n\n\n\n<li>Extract the contextual embeddings for those tokens and average them (mean pooling):<br>\n$$ \\mathbf{h}_{\\mathrm{guideline}} = \\frac{1}{\\mathrm{end} - \\mathrm{start}} \\sum_{t = \\mathrm{start}}^{\\mathrm{end} - 1} \\mathbf{h}_t $$<\/li>\n\n\n\n<li>Score each guideline by passing its embedding through a linear layer and sigmoid:<br>\n$$P(\\mathrm{guideline\\,is\\,cited}) = \\sigma ( \\mathbf{W} \\mathbf{h}_{\\mathrm{guideline}} + b )$$<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Training details<\/h3>\n\n\n\n<p>We trained the multi-task model on 4M examples using a combined loss function that optimizes all three objectives end-to-end:<\/p>\n\n\n\n<p>$${\\mathscr{L}}_{\\mathrm{total}} 
= \\mathrm{CE}(P_{\\mathrm{esc}}, Y_{\\mathrm{esc}}) + \\mathrm{CE}(P_{\\mathrm{reason}}, Y_{\\mathrm{reason}}) + \\sum_{i=1}^N \\mathrm{BCE}(P_{\\mathrm{guideline}_i}, c_i)$$<\/p>\n\n\n\n<p>\\(\\mathrm{CE}\\) is the cross-entropy loss for escalation and reason classification, and \\(\\mathrm{BCE}\\) is the binary cross-entropy loss for guideline citations. Training resulted in stable loss convergence with strong performance metrics on the evaluation set:<\/p>\n\n\n\n<div class=\"wp-block-media-text has-media-on-the-right is-stacked-on-mobile\" style=\"grid-template-columns:auto 52%\"><div class=\"wp-block-media-text__content\">\n<ul class=\"wp-block-list\">\n<li><strong>Escalation accuracy<\/strong>: 97.4%<\/li>\n\n\n\n<li><strong>Reason accuracy<\/strong>: 97%<\/li>\n\n\n\n<li><strong>Citation AUC<\/strong>: 98.7%<\/li>\n<\/ul>\n<\/div><figure class=\"wp-block-media-text__media\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"652\" src=\"https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/07\/loss_decrease_transparent-1024x652.png\" alt=\"Training loss curve\" class=\"wp-image-330 size-full\" srcset=\"https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/07\/loss_decrease_transparent-1024x652.png 1024w, https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/07\/loss_decrease_transparent-300x191.png 300w, https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/07\/loss_decrease_transparent-768x489.png 768w, https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/07\/loss_decrease_transparent.png 1164w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure><\/div>\n\n\n\n<h2 id=\"testing-and-optimization\" class=\"wp-block-heading\">Testing and Optimization<\/h2>\n\n\n\n<p>We conducted thorough offline testing, covering both in-distribution and out-of-distribution cases. The model performed well across the board, especially at spotting escalation requests and handling ambiguous situations. 
It often outperformed our original LLM-based system.<\/p>\n\n\n\n<p>To improve further, we analyzed conversations where the custom model and the old production model disagreed. This helped us spot edge cases and refine how we deal with tricky guideline-related situations.<\/p>\n\n\n\n<p>Instead of relying solely on a single model, we implemented a hybrid strategy:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Our custom model handles 90% of cases with <strong>&gt;98% accuracy<\/strong>.<\/li>\n\n\n\n<li>For the other 10% (mostly very long inputs or complex cases), we fall back to an LLM, which can handle longer context and has better generalization capabilities.<\/li>\n<\/ul>\n\n\n\n<p>This setup gives us the best of both: the reliability and speed of a custom encoder model for common cases, and the adaptability of an LLM for the most challenging interactions.<\/p>\n\n\n\n<h2 id=\"impact\" class=\"wp-block-heading\">Impact<\/h2>\n\n\n\n<p>An A\/B test showed clear improvements:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Resolution rate increased significantly (p &lt; 0.01), including a significant gain in confirmed &#8220;hard&#8221; resolutions<\/li>\n\n\n\n<li>Escalation detection latency dropped by <strong>0.5s<\/strong><\/li>\n\n\n\n<li>Cost per resolution decreased by <strong>~3%<\/strong><\/li>\n<\/ul>\n\n\n\n<p>Refining escalation thresholds also improved accuracy and cut false negatives \u2013 an issue we had seen with our earlier LLM-based method.<\/p>\n\n\n\n<h2 id=\"key-learnings-and-discussion\" class=\"wp-block-heading\">Key Learnings and Discussion<\/h2>\n\n\n\n<p>A big part of building this system safely and effectively was treating fine-tuning code like production-grade software. 
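As one example of this kind of check, the combined loss from the training section can be verified on a toy input: with near-perfect predictions it should come out close to zero (a numpy sketch; all probabilities and labels are illustrative):

```python
import numpy as np

def cross_entropy(p, y):
    """Cross-entropy for one example: p is a probability vector, y the true class index."""
    return -np.log(p[y])

def binary_cross_entropy(p, c):
    """Binary cross-entropy for one guideline: p is P(cited), c is the 0/1 label."""
    return -(c * np.log(p) + (1 - c) * np.log(1 - p))

def total_loss(p_esc, y_esc, p_reason, y_reason, p_cite, c_cite):
    """CE(escalation) + CE(reason) + sum of BCE over guideline citations."""
    return (cross_entropy(p_esc, y_esc)
            + cross_entropy(p_reason, y_reason)
            + sum(binary_cross_entropy(p, c) for p, c in zip(p_cite, c_cite)))

# Near-perfect predictions for one toy example: the loss should be tiny.
p_esc = np.array([0.001, 0.001, 0.998]); y_esc = 2    # true decision: escalate
p_reason = np.full(8, 0.001); p_reason[3] = 0.993; y_reason = 3
p_cite = np.array([0.999, 0.001]); c_cite = [1, 0]    # only the first guideline cited

loss = total_loss(p_esc, y_esc, p_reason, y_reason, p_cite, c_cite)  # ~0.01
```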
For our multi-task model, we validated each component separately, from data collection, guideline position matching, and batch construction through to output shapes, evaluation metrics, and loss computation.<\/p>\n\n\n\n<p>Before scaling up, we trained on small toy datasets to make sure outputs looked correct and the loss converged to zero. These early checks caught subtle issues that would&#8217;ve been painful to discover later during full training runs.<\/p>\n\n\n\n<p>Another key to our success was having access to high-quality, domain-specific customer support data. Combined with guidance from LLMs, this let us train a smaller model to outperform its original LLM teacher for our specific support scenarios.<\/p>\n\n\n\n<p>But developing robust ML systems takes more than just good performance <em>on average<\/em>. It requires careful handling of edge cases, using fallback strategies when the model isn\u2019t confident, and thorough testing with out-of-distribution data.<\/p>\n\n\n\n<p>Our custom escalation router demonstrates all these principles in action. As a result, we built a system that\u2019s not just more accurate, but also faster, cheaper, and more controllable.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>One of Fin AI Agent&#8217;s most critical tasks is deciding when to escalate customer interactions to human support. 
This challenge has only grown as Fin has become more conversational, and now most escalations happen through natural&hellip;<\/p>\n","protected":false},"author":36,"featured_media":166,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[30],"tags":[],"coauthors":[24],"class_list":["post-325","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-text-classification"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v24.6 (Yoast SEO v24.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>To escalate, or not to escalate, that is the question - \/research AI Agent routing<\/title>\n<meta name=\"description\" content=\"How we fine-tuned ModernBERT multi-task model for escalation routing in AI Agent for customer support, achieving &gt;98% accuracy.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/fin.ai\/research\/to-escalate-or-not-to-escalate-that-is-the-question\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"To escalate, or not to escalate, that is the question\" \/>\n<meta property=\"og:description\" content=\"How we fine-tuned ModernBERT multi-task model for escalation routing in AI Agent for customer support, achieving &gt;98% accuracy.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/fin.ai\/research\/to-escalate-or-not-to-escalate-that-is-the-question\/\" \/>\n<meta property=\"og:site_name\" content=\"\/research\" \/>\n<meta property=\"article:published_time\" content=\"2025-09-11T22:42:30+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-09-12T09:38:38+00:00\" \/>\n<meta property=\"og:image\" 
content=\"https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/03\/image-6-1.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1344\" \/>\n\t<meta property=\"og:image:height\" content=\"896\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Ramil Yarullin\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@intercom\" \/>\n<meta name=\"twitter:site\" content=\"@intercom\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Ramil Yarullin\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"7 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/fin.ai\/research\/to-escalate-or-not-to-escalate-that-is-the-question\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/fin.ai\/research\/to-escalate-or-not-to-escalate-that-is-the-question\/\"},\"author\":{\"name\":\"Ramil Yarullin\",\"@id\":\"https:\/\/fin.ai\/research\/#\/schema\/person\/f9421a715135d2012ef2d39e6dade5d2\"},\"headline\":\"To escalate, or not to escalate, that is the question\",\"datePublished\":\"2025-09-11T22:42:30+00:00\",\"dateModified\":\"2025-09-12T09:38:38+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/fin.ai\/research\/to-escalate-or-not-to-escalate-that-is-the-question\/\"},\"wordCount\":1254,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/fin.ai\/research\/#organization\"},\"image\":{\"@id\":\"https:\/\/fin.ai\/research\/to-escalate-or-not-to-escalate-that-is-the-question\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/03\/image-6-1.png\",\"articleSection\":[\"Text 
Classification\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/fin.ai\/research\/to-escalate-or-not-to-escalate-that-is-the-question\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/fin.ai\/research\/to-escalate-or-not-to-escalate-that-is-the-question\/\",\"url\":\"https:\/\/fin.ai\/research\/to-escalate-or-not-to-escalate-that-is-the-question\/\",\"name\":\"To escalate, or not to escalate, that is the question - \/research AI Agent routing\",\"isPartOf\":{\"@id\":\"https:\/\/fin.ai\/research\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/fin.ai\/research\/to-escalate-or-not-to-escalate-that-is-the-question\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/fin.ai\/research\/to-escalate-or-not-to-escalate-that-is-the-question\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/03\/image-6-1.png\",\"datePublished\":\"2025-09-11T22:42:30+00:00\",\"dateModified\":\"2025-09-12T09:38:38+00:00\",\"description\":\"How we fine-tuned ModernBERT multi-task model for escalation routing in AI Agent for customer support, achieving >98% 
accuracy.\",\"breadcrumb\":{\"@id\":\"https:\/\/fin.ai\/research\/to-escalate-or-not-to-escalate-that-is-the-question\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/fin.ai\/research\/to-escalate-or-not-to-escalate-that-is-the-question\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/fin.ai\/research\/to-escalate-or-not-to-escalate-that-is-the-question\/#primaryimage\",\"url\":\"https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/03\/image-6-1.png\",\"contentUrl\":\"https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/03\/image-6-1.png\",\"width\":1344,\"height\":896},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/fin.ai\/research\/to-escalate-or-not-to-escalate-that-is-the-question\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/fin.ai\/research\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"To escalate, or not to escalate, that is the question\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/fin.ai\/research\/#website\",\"url\":\"https:\/\/fin.ai\/research\/\",\"name\":\"Intercom.ai\",\"description\":\"Insights and blogs from the AI Group building Fin at 
Intercom\",\"publisher\":{\"@id\":\"https:\/\/fin.ai\/research\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/fin.ai\/research\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/fin.ai\/research\/#organization\",\"name\":\"Intercom.ai\",\"url\":\"https:\/\/fin.ai\/research\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/fin.ai\/research\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/03\/favicon.png\",\"contentUrl\":\"https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/03\/favicon.png\",\"width\":1024,\"height\":1024,\"caption\":\"Intercom.ai\"},\"image\":{\"@id\":\"https:\/\/fin.ai\/research\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/intercom\",\"https:\/\/www.linkedin.com\/company\/intercom\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/fin.ai\/research\/#\/schema\/person\/f9421a715135d2012ef2d39e6dade5d2\",\"name\":\"Ramil Yarullin\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/fin.ai\/research\/#\/schema\/person\/image\/fd5365c919200a70cc952ae6bb3c256b\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/76cba499e063bf235208f2e6a4339cda?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/76cba499e063bf235208f2e6a4339cda?s=96&d=mm&r=g\",\"caption\":\"Ramil Yarullin\"},\"description\":\"is a Staff Machine Learning Scientist at Intercom with 8+ years of experience in engineering and applied research.\",\"sameAs\":[\"https:\/\/www.linkedin.com\/in\/ramil-yarullin\/\"],\"url\":\"https:\/\/fin.ai\/research\/author\/ramil-yarullin\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. 
-->","_links":{"self":[{"href":"https:\/\/fin.ai\/research\/wp-json\/wp\/v2\/posts\/325","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/fin.ai\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/fin.ai\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/fin.ai\/research\/wp-json\/wp\/v2\/users\/36"}],"replies":[{"embeddable":true,"href":"https:\/\/fin.ai\/research\/wp-json\/wp\/v2\/comments?post=325"}],"version-history":[{"count":0,"href":"https:\/\/fin.ai\/research\/wp-json\/wp\/v2\/posts\/325\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/fin.ai\/research\/wp-json\/wp\/v2\/media\/166"}],"wp:attachment":[{"href":"https:\/\/fin.ai\/research\/wp-json\/wp\/v2\/media?parent=325"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/fin.ai\/research\/wp-json\/wp\/v2\/categories?post=325"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/fin.ai\/research\/wp-json\/wp\/v2\/tags?post=325"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/fin.ai\/research\/wp-json\/wp\/v2\/coauthors?post=325"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}