{"id":482,"date":"2025-09-11T22:41:07","date_gmt":"2025-09-11T22:41:07","guid":{"rendered":"https:\/\/fin.ai\/research\/?p=482"},"modified":"2025-09-12T09:39:01","modified_gmt":"2025-09-12T09:39:01","slug":"building-a-better-language-detection-model-for-fin","status":"publish","type":"post","link":"https:\/\/fin.ai\/research\/building-a-better-language-detection-model-for-fin\/","title":{"rendered":"Building a Better Language Detection Model for Fin"},"content":{"rendered":"\n<p>When users ask Fin a question, they expect Fin to respond in the same language they asked it in. Detecting the language a user is speaking is a key step in the Fin pipeline. If we detect the wrong language, Fin might respond in a language the user doesn&#8217;t understand &#8211; leading to a poor (and often frustrating) experience. In this blog, we look at why language detection is tricky, why it needs to be better, and how we&#8217;re solving it.<\/p>\n\n\n\n<h2 id=\"why-do-we-need-language-detection\" class=\"wp-block-heading\">Why do we need language detection?\u00a0<\/h2>\n\n\n\n<p>On the surface, the problem looks simple: <strong>Fin needs to reply in the same language the user is using. <\/strong>Can\u2019t we just do prompt engineering, like below?<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code class=\"\">You are a customer support agent. You need to answer the customer's question. Reply in the language they asked the question in.\n\nHistory:\nFin: Hey There, How can I help you?\nCustomer: Hey, im need help.\n\n...\n\nReply:\n(Fin generates a reply in the user\u2019s language)<\/code><\/pre>\n\n\n\n<p>But here&#8217;s the catch: while large language models can reply in <em>virtually<\/em> <em>any<\/em> language, most Fin customers have human agents who speak only a few. That means Fin can only support a few languages <em>per customer<\/em>. 
In short, customers want control over the languages Fin responds in: the ability to constrain it to only the languages they have explicitly enabled.<\/p>\n\n\n\n<p>Consider a hypothetical setup of Fin by AlpNet Communications, a telecom operator based in Switzerland. They&#8217;ve set English as the default language for their Fin workspace and enabled support for German, French, and Italian (the country&#8217;s official languages). So far, so good: if a customer writes in French, Fin replies in French; if they use Italian, Fin responds in Italian.<\/p>\n\n\n\n<p>But what happens when someone messages in Dutch? Technically, Fin can reply in Dutch, but it shouldn&#8217;t, as the customer has &#8220;prevented&#8221; it from happening. AlpNet doesn\u2019t support that language, and chances are, their agents can\u2019t either. This is where fallback logic comes in.<\/p>\n\n\n\n<p>First, Fin checks whether the user&#8217;s language is supported. If it\u2019s not, it looks at the user\u2019s browser locale. If the locale is also unsupported, say it\u2019s Dutch, Fin falls back to English, the workspace\u2019s default.<\/p>\n\n\n\n<p>You <em>could<\/em> try to encode all that logic directly into a single prompt, but that quickly gets messy. A better approach is to handle the decision-making upstream and keep the prompt simple. This upstream logic can itself be handled by a separate LLM call, like the one below:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code class=\"\">Determine the language the user is writing in this message: \n\nUser: Hey, im need help.\n\nAI: &lt;detected_language><\/code><\/pre>\n\n\n\n<p>That helps, but extra LLM calls slow things down, and we want Fin to be fast. 
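As a sketch, the fallback decision described above might look like this (function and parameter names are illustrative, not Fin's actual implementation):

```python
# Illustrative sketch of the fallback logic described above.
# Names are hypothetical, not Fin's actual implementation.
def resolve_language(detected, supported, browser_locale, default):
    """Pick the language Fin should reply in."""
    if detected in supported:          # user's language is enabled
        return detected
    if browser_locale in supported:    # fall back to the browser locale
        return browser_locale
    return default                     # last resort: workspace default

# AlpNet example: Dutch is detected but not enabled, and the browser
# locale is also Dutch, so Fin falls back to English, the default.
supported = {"en", "de", "fr", "it"}
print(resolve_language("nl", supported, "nl", "en"))  # -> en
print(resolve_language("fr", supported, "nl", "en"))  # -> fr
```

Keeping this decision out of the prompt means the reply prompt only ever sees one already-resolved target language.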
What if we use lighter models, trained for just one job: detecting language?<\/p>\n\n\n\n<h2 id=\"how-do-we-do-language-detection\" class=\"wp-block-heading\">How do we do language detection?<\/h2>\n\n\n\n<p>We used <a href=\"https:\/\/fasttext.cc\/docs\/en\/language-identification.html\">FastText<\/a> in production to detect the language. FastText\u2026<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>\u2026is open source (CC-BY-SA licensed) and available for commercial use<\/li>\n\n\n\n<li>\u2026is trained on a large corpus covering 176 languages (data from <a href=\"https:\/\/www.wikipedia.org\/\">Wikipedia<\/a>, <a href=\"https:\/\/tatoeba.org\/eng\/\">Tatoeba<\/a> and <a href=\"https:\/\/web.archive.org\/web\/20240303014117\/http:\/\/nlp.ffzg.hr\/resources\/corpora\/setimes\/\">SETimes<\/a>)<\/li>\n\n\n\n<li>\u2026is FAST (inference takes tens of microseconds on CPU)\u00a0<\/li>\n\n\n\n<li>\u2026and is <a href=\"https:\/\/modelpredict.com\/language-identification-survey\">more accurate<\/a> than other models like Google\u2019s CLD3 and langdetect<\/li>\n<\/ul>\n\n\n\n<p>FastText predicts the probability that a piece of text belongs to each of the 176 languages. For example, given the text <em>Comment \u00e7a va?<\/em>, FastText says there&#8217;s a 95% chance it&#8217;s French. The remaining 5% is spread across the other 175 languages.<\/p>\n\n\n\n<p>However, we\u2019ve encountered issues with FastText in real-world use. The model was trained on clean, well-structured text, but Fin users often type in a more casual, imperfect way. Spelling errors and missing accents are common. Unfortunately, even minor deviations can significantly impact FastText&#8217;s predictions. For example, removing the cedilla in <em>Comment ca va?<\/em> drops the French confidence from 95% to 68%.<\/p>\n\n\n\n<p>FastText also struggles with short inputs or when the script doesn\u2019t match the language. 
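To make the confidence drop concrete, here is a minimal sketch of reading off FastText-style `__label__xx` predictions; the predictions are hard-coded so the example is self-contained (real usage loads the published `lid.176.bin` weights, as shown in the comment):

```python
# Sketch of reading FastText-style output. Real usage would be roughly:
#   import fasttext
#   model = fasttext.load_model("lid.176.bin")
#   labels, probs = model.predict("Comment ca va?", k=3)
# Here the predictions are hard-coded so the example is self-contained.

def pick_language(labels, probs, threshold=0.8):
    """Return the top language code, or None if confidence is too low."""
    best_label, best_prob = max(zip(labels, probs), key=lambda lp: lp[1])
    if best_prob < threshold:
        return None  # abstain rather than risk a wrong reply language
    return best_label.replace("__label__", "")

# With the cedilla: high confidence, keep the prediction.
print(pick_language(["__label__fr", "__label__en"], [0.95, 0.02]))  # -> fr
# Without the cedilla: confidence drops, so we abstain.
print(pick_language(["__label__fr", "__label__en"], [0.68, 0.12]))  # -> None
```

The threshold value here is arbitrary for illustration; choosing it is exactly the precision/recall trade-off discussed next.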
Here are a few examples where it misclassifies the language:<\/p>\n\n\n\n<figure class=\"wp-block-table is-style-stripes\"><table class=\"has-fixed-layout\"><thead><tr><th>Input Text<\/th><th class=\"has-text-align-center\" data-align=\"center\">Correct Language<\/th><th class=\"has-text-align-center\" data-align=\"center\">FastText Detection<\/th><th class=\"has-text-align-center\" data-align=\"center\">FastText Confidence<\/th><\/tr><\/thead><tbody><tr><td>im need help<\/td><td class=\"has-text-align-center\" data-align=\"center\">English<\/td><td class=\"has-text-align-center\" data-align=\"center\">German<\/td><td class=\"has-text-align-center\" data-align=\"center\">92%<\/td><\/tr><tr><td>im julia<\/td><td class=\"has-text-align-center\" data-align=\"center\">English<\/td><td class=\"has-text-align-center\" data-align=\"center\">German<\/td><td class=\"has-text-align-center\" data-align=\"center\">99%<\/td><\/tr><tr><td>buenas dias<\/td><td class=\"has-text-align-center\" data-align=\"center\">Spanish<\/td><td class=\"has-text-align-center\" data-align=\"center\">Portuguese<\/td><td class=\"has-text-align-center\" data-align=\"center\">70%<\/td><\/tr><tr><td>How to compute csat?<\/td><td class=\"has-text-align-center\" data-align=\"center\">English<\/td><td class=\"has-text-align-center\" data-align=\"center\">Hungarian<\/td><td class=\"has-text-align-center\" data-align=\"center\">77%<\/td><\/tr><tr><td>App store?<\/td><td class=\"has-text-align-center\" data-align=\"center\">English<\/td><td class=\"has-text-align-center\" data-align=\"center\">Italian<\/td><td class=\"has-text-align-center\" data-align=\"center\">88%<\/td><\/tr><tr><td>aap kaise ho?<\/td><td class=\"has-text-align-center\" data-align=\"center\">Hindi<\/td><td class=\"has-text-align-center\" data-align=\"center\">Finnish<\/td><td class=\"has-text-align-center\" data-align=\"center\">68%<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>To err on the side of caution, we predict the language 
only if the confidence crosses a certain threshold. Unfortunately, this too leads to many cases, like the ones below, where we fail to predict the language and the fallback picks a different language than the one the user wrote in.<\/p>\n\n\n\n<figure class=\"wp-block-table is-style-stripes\"><table class=\"has-fixed-layout\"><thead><tr><th>Input Text<\/th><th class=\"has-text-align-center\" data-align=\"center\">Detected Language<\/th><th class=\"has-text-align-center\" data-align=\"center\">Confidence<\/th><\/tr><\/thead><tbody><tr><td>combien coute le pass eleve<\/td><td class=\"has-text-align-center\" data-align=\"center\">French<\/td><td class=\"has-text-align-center\" data-align=\"center\">47%<\/td><\/tr><tr><td>how can i write a query?<\/td><td class=\"has-text-align-center\" data-align=\"center\">English<\/td><td class=\"has-text-align-center\" data-align=\"center\">69%<\/td><\/tr><tr><td>kontingentregel zuweisen<\/td><td class=\"has-text-align-center\" data-align=\"center\">German<\/td><td class=\"has-text-align-center\" data-align=\"center\">51%<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h2 id=\"why-is-language-detection-hard\" class=\"wp-block-heading\">Why is language detection hard?<\/h2>\n\n\n\n<p>There are many challenges a real-life language detector has to solve. 
Here are some of them:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Spelling mistakes<\/strong> &#8211; The model has to handle typos and bad grammar<\/li>\n\n\n\n<li><strong>Short context<\/strong> &#8211; Many messages are short (like ok, si, da, or greetings)<\/li>\n\n\n\n<li><strong>Ambiguous messages<\/strong> &#8211; Many messages are just email addresses or identifiers\u00a0<\/li>\n\n\n\n<li><strong>Mutual intelligibility<\/strong> &#8211; Some languages are very similar, like Hindi and Urdu, or Spanish and Portuguese<\/li>\n\n\n\n<li><strong>Script mismatch<\/strong> &#8211; Some users type in Latin script instead of their native script (like Bangla, Armenian, or Hindi)<\/li>\n\n\n\n<li><strong>Code switching<\/strong> &#8211; It is not uncommon to have multiple languages in a single message (e.g., \u5982\u4f55\u914d\u7f6ehelp center?)<\/li>\n\n\n\n<li><strong>Long tail<\/strong> &#8211; Most languages don\u2019t have enough training data to teach the model fine details<\/li>\n<\/ul>\n\n\n\n<h2 id=\"our-solution\" class=\"wp-block-heading\">Our solution<\/h2>\n\n\n\n<p>A key limitation of FastText lies in its simplicity. It represents a text input by averaging the vectors of its subwords and then applies a linear classifier to make predictions. While efficient and effective for many tasks, this approach lacks the capacity to capture complex contextual relationships. In contrast, transformer-based models like BERT have been shown to consistently outperform simpler models like FastText across a wide range of benchmarks. We believe that we can do better than FastText by training a BERT-like language model on carefully curated data. 
To that end, we&#8217;ve curated both general and Fin-specific data for the 45 languages<sup data-fn=\"9813c3c8-b5b4-4d2b-bfe0-deeac1442c29\" class=\"fn\"><a id=\"9813c3c8-b5b4-4d2b-bfe0-deeac1442c29-link\" href=\"#9813c3c8-b5b4-4d2b-bfe0-deeac1442c29\">1<\/a><\/sup> we support, and trained our in-house language detection model. We use the general data to pre-train the language detection model for better generalization, and then fine-tune the model on Fin-specific data.\u00a0<\/p>\n\n\n\n<p><strong>General Data: <\/strong>This is a subset of <a href=\"https:\/\/huggingface.co\/datasets\/HuggingFaceFW\/fineweb-2\">FineWeb2<\/a> &#8211; an 18TB dataset with content from about 1800 language-script combinations. We curated around 800K labeled text examples. Since FineWeb2 includes multiple scripts per language, it helps solve the script mismatch problem.\u00a0<\/p>\n\n\n\n<p><strong>Fin-Specific Data: <\/strong>We collected this from random user conversations with Fin<sup data-fn=\"c619b319-0970-4533-b7a0-2b91d9755800\" class=\"fn\"><a id=\"c619b319-0970-4533-b7a0-2b91d9755800-link\" href=\"#c619b319-0970-4533-b7a0-2b91d9755800\">2<\/a><\/sup>. We used an LLM to identify the true language for each chat. If a message was too ambiguous, the LLM was told not to guess the language. We collected about 100K labeled conversations.<\/p>\n\n\n\n<p>A problem with randomly sampling Fin chats is that around 50% of samples are in English, and the top few languages make up 90% of the data (see figure below). That means many tail languages had only 100-200 examples. To fix this, we randomly translated some English messages into each tail language to increase their volume. 
After this, every language has at least 2K examples<sup data-fn=\"7c6e601f-f60d-447c-93e1-4e6260e35439\" class=\"fn\"><a id=\"7c6e601f-f60d-447c-93e1-4e6260e35439-link\" href=\"#7c6e601f-f60d-447c-93e1-4e6260e35439\">3<\/a><\/sup>.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"617\" src=\"https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/09\/image-3-1024x617.png\" alt=\"\" class=\"wp-image-484\" srcset=\"https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/09\/image-3-1024x617.png 1024w, https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/09\/image-3-300x181.png 300w, https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/09\/image-3-768x463.png 768w, https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/09\/image-3-1536x925.png 1536w, https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/09\/image-3-1320x795.png 1320w, https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/09\/image-3.png 1600w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>We experimented with two models from the BERT family:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/huggingface.co\/distilbert\/distilbert-base-multilingual-cased\">DistilBERT Multilingual<\/a> (135M parameters, Apache-2 licensed)<\/li>\n\n\n\n<li><a href=\"https:\/\/huggingface.co\/FacebookAI\/xlm-roberta-base\">XLM RoBERTa<\/a> (280M parameters, MIT licensed)<\/li>\n<\/ul>\n\n\n\n<p>We chose them as our base models because they are pre-trained on around 100 languages for general language modelling, so they already have some understanding of multilingual content. First, we added a classification head to each base model and trained the model end-to-end on 800K examples from FineWeb2 for three epochs. 
Then we fine-tuned these models on Intercom\u2019s data with a lower learning rate.<\/p>\n\n\n\n<h2 id=\"evaluation\" class=\"wp-block-heading\">Evaluation<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Offline Evaluation<\/h3>\n\n\n\n<p>We have a held-out test set of about 6K LLM-labelled examples<sup data-fn=\"20a0eef2-fdb7-4876-9666-1b6d5037fdf0\" class=\"fn\"><a id=\"20a0eef2-fdb7-4876-9666-1b6d5037fdf0-link\" href=\"#20a0eef2-fdb7-4876-9666-1b6d5037fdf0\">4<\/a><\/sup>, collected similarly to the Fin-specific training data described above. We measure the quality of the model with the following metrics:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Precision<\/strong>: when the model says it&#8217;s (e.g.) English, how often is it really English?<\/li>\n\n\n\n<li><strong>Recall<\/strong>: of all actual (e.g.) English texts, how many did the model find?<\/li>\n\n\n\n<li><strong>F1 Score<\/strong>: a single metric that combines precision and recall<\/li>\n<\/ul>\n\n\n\n<p>For all metrics we report the macro average (all languages are treated equally) and the weighted average (weighted by the number of samples in each language):<\/p>\n\n\n\n<figure class=\"wp-block-table is-style-stripes\"><table class=\"has-fixed-layout\"><thead><tr><th>Corpus<\/th><th class=\"has-text-align-center\" data-align=\"center\">Macro Precision<\/th><th class=\"has-text-align-center\" data-align=\"center\">Macro Recall<\/th><th class=\"has-text-align-center\" data-align=\"center\">Macro F1<\/th><th class=\"has-text-align-center\" data-align=\"center\">Weighted Precision<\/th><th class=\"has-text-align-center\" data-align=\"center\">Weighted Recall<\/th><th>Weighted F1<\/th><\/tr><\/thead><tbody><tr><td>Lingua<\/td><td class=\"has-text-align-center\" data-align=\"center\">81.83<\/td><td class=\"has-text-align-center\" data-align=\"center\">81.82<\/td><td class=\"has-text-align-center\" data-align=\"center\">80.14<\/td><td class=\"has-text-align-center\" 
data-align=\"center\">85.31<\/td><td class=\"has-text-align-center\" data-align=\"center\">84.94<\/td><td>83.27<\/td><\/tr><tr><td>FastText<\/td><td class=\"has-text-align-center\" data-align=\"center\">85.60<\/td><td class=\"has-text-align-center\" data-align=\"center\">79.93<\/td><td class=\"has-text-align-center\" data-align=\"center\">80.15<\/td><td class=\"has-text-align-center\" data-align=\"center\">86.65<\/td><td class=\"has-text-align-center\" data-align=\"center\">83.36<\/td><td>82.44<\/td><\/tr><tr><td>DistilBERT<\/td><td class=\"has-text-align-center\" data-align=\"center\">92.17<\/td><td class=\"has-text-align-center\" data-align=\"center\">87.16<\/td><td class=\"has-text-align-center\" data-align=\"center\">88.45<\/td><td class=\"has-text-align-center\" data-align=\"center\">92.77<\/td><td class=\"has-text-align-center\" data-align=\"center\">91.63<\/td><td>91.36<\/td><\/tr><tr><td><strong>XLM RoBERTa<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong>93.12<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong>88.59<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong>89.81<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong>94.14<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong>93.28<\/strong><\/td><td><strong>93.09<\/strong><\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Along with FastText (our current model), we also tested <a href=\"https:\/\/github.com\/pemistahl\/lingua\">Lingua<\/a>, because of its claimed strong performance on short texts. The BERT-family models do much better than Lingua and FastText across all metrics: F1 scores (both macro and weighted) are approximately 10 percentage points higher than FastText\u2019s. 
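The difference between the macro and weighted averages can be sketched in a few lines of plain Python (toy labels, not our evaluation code): every language contributes equally to the macro average, while the weighted average lets head languages like English dominate.

```python
from collections import Counter

def per_class_f1(y_true, y_pred, label):
    """F1 for one language label."""
    tp = sum(t == p == label for t, p in zip(y_true, y_pred))
    predicted = sum(p == label for p in y_pred)
    actual = sum(t == label for t in y_true)
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0

def macro_and_weighted_f1(y_true, y_pred):
    labels = sorted(set(y_true))
    f1s = {lab: per_class_f1(y_true, y_pred, lab) for lab in labels}
    macro = sum(f1s.values()) / len(labels)      # every language counts equally
    counts = Counter(y_true)                     # weight by per-language support
    weighted = sum(f1s[lab] * counts[lab] for lab in labels) / len(y_true)
    return macro, weighted

# Toy example: English dominates, so a model that is strong on English but
# weak on Hindi looks much better under the weighted average.
y_true = ["en"] * 8 + ["hi"] * 2
y_pred = ["en"] * 8 + ["en", "hi"]
macro, weighted = macro_and_weighted_f1(y_true, y_pred)
print(round(macro, 3), round(weighted, 3))  # -> 0.804 0.886
```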
Between DistilBERT and XLM-RoBERTa, XLM-RoBERTa\u2019s F1 is about 1.5pp higher.\u00a0<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">In Production\u00a0<\/h3>\n\n\n\n<p>We ran the RoBERTa-based language detection model in an A\/B test, with FastText as the control. The new model made predictions on 6pp more conversations than FastText<sup data-fn=\"8f8a4f28-5cdb-4e71-a3c5-92ceb1d38d27\" class=\"fn\"><a href=\"#8f8a4f28-5cdb-4e71-a3c5-92ceb1d38d27\" id=\"8f8a4f28-5cdb-4e71-a3c5-92ceb1d38d27-link\">5<\/a><\/sup>. To make sure we don&#8217;t use the wrong language on these \u201cextra\u201d conversations, we checked a random sample of them using an LLM as a judge, and found that the new model was >99% accurate on these predictions.<\/p>\n\n\n\n<p>To further validate performance in production, we used an LLM to assess a random selection of conversations where both models would have made a prediction<sup data-fn=\"0f21fea9-1366-4ba2-9035-176466ccaa5c\" class=\"fn\"><a href=\"#0f21fea9-1366-4ba2-9035-176466ccaa5c\" id=\"0f21fea9-1366-4ba2-9035-176466ccaa5c-link\">6<\/a><\/sup>. Both models achieved over 99% accuracy overall<sup data-fn=\"b0d37562-f295-4097-8dcc-e769df37e11f\" class=\"fn\"><a href=\"#b0d37562-f295-4097-8dcc-e769df37e11f\" id=\"b0d37562-f295-4097-8dcc-e769df37e11f-link\">7<\/a><\/sup>. For each conversation, we simulated the output of the alternative model to identify disagreements. These occurred in fewer than 1% of cases. Notably, on those disagreements, RoBERTa outperformed FastText by a margin of 20pp in accuracy. 
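The disagreement analysis boils down to a simple computation: restrict to conversations where the two models disagree, then score each model against the judge's label. A toy sketch (illustrative data, not our evaluation pipeline):

```python
# Toy sketch of the disagreement analysis: for conversations where the two
# models disagree, score each model against an LLM judge's label.
# Data and numbers are illustrative, not our evaluation pipeline.
conversations = [
    # (judge_label, fasttext_pred, roberta_pred)
    ("en", "en", "en"),   # agreement: both right
    ("es", "pt", "es"),   # disagreement: RoBERTa right
    ("en", "de", "en"),   # disagreement: RoBERTa right
    ("fr", "fr", "it"),   # disagreement: FastText right
]

disagreements = [(j, f, r) for j, f, r in conversations if f != r]
ft_acc = sum(j == f for j, f, _ in disagreements) / len(disagreements)
rb_acc = sum(j == r for j, _, r in disagreements) / len(disagreements)
print(f"FastText: {ft_acc:.0%}, RoBERTa: {rb_acc:.0%}")  # -> FastText: 33%, RoBERTa: 67%
```

Since disagreements are rare (under 1% of cases), even a large accuracy gap on them moves the overall accuracy only slightly, which is why both models can sit above 99% overall.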
As a final validation of the new model: since the full rollout of the RoBERTa model, we haven\u2019t seen any customer-reported issues related to wrong language detection.<\/p>\n\n\n\n<h2 id=\"conclusion\" class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p>Language detection is a deceptively complex problem, especially in a real-world product like Fin, where the language detector must balance accuracy and recall across a diverse and noisy input space. FastText gave us a good starting point, but to go further, we needed better data and a model tailored to the kinds of messages Fin actually sees.<\/p>\n\n\n\n<p>By developing our own in-house language detection model, we\u2019ve unlocked a couple of key advantages: we can now support new languages more easily, and we can control precision and recall by training on additional data where needed. The gains in recall and coverage are already improving user experiences, and we\u2019re set up to keep improving as Fin continues to grow.<\/p>\n\n\n<ol class=\"wp-block-footnotes\"><li id=\"9813c3c8-b5b4-4d2b-bfe0-deeac1442c29\"><a href=\"https:\/\/www.intercom.com\/help\/en\/articles\/8322387-use-fin-ai-agent-in-multiple-languages\">https:\/\/www.intercom.com\/help\/en\/articles\/8322387-use-fin-ai-agent-in-multiple-languages<br><\/a> <a href=\"#9813c3c8-b5b4-4d2b-bfe0-deeac1442c29-link\" aria-label=\"Jump to footnote reference 1\">\u21a9\ufe0e<\/a><\/li><li id=\"c619b319-0970-4533-b7a0-2b91d9755800\">\u00a0We only use data from the apps that allow their data to be used for training <a href=\"#c619b319-0970-4533-b7a0-2b91d9755800-link\" aria-label=\"Jump to footnote reference 2\">\u21a9\ufe0e<\/a><\/li><li id=\"7c6e601f-f60d-447c-93e1-4e6260e35439\">\u00a0We verified the quality of these labels by computing accuracy on 100 random conversations per language, using a separate LLM as a judge. The goal here is just to make sure that we don\u2019t have noisy examples for any particular language. 
We did identify some bugs in our earlier prompts using this approach. <a href=\"#7c6e601f-f60d-447c-93e1-4e6260e35439-link\" aria-label=\"Jump to footnote reference 3\">\u21a9\ufe0e<\/a><\/li><li id=\"20a0eef2-fdb7-4876-9666-1b6d5037fdf0\">\u00a0The dataset is mostly balanced (all but 8 languages have 200 examples each; the remaining languages have >= 40 examples) <a href=\"#20a0eef2-fdb7-4876-9666-1b6d5037fdf0-link\" aria-label=\"Jump to footnote reference 4\">\u21a9\ufe0e<\/a><\/li><li id=\"8f8a4f28-5cdb-4e71-a3c5-92ceb1d38d27\">\u00a0Technically FastText made predictions too, but with lower confidence <a href=\"#8f8a4f28-5cdb-4e71-a3c5-92ceb1d38d27-link\" aria-label=\"Jump to footnote reference 5\">\u21a9\ufe0e<\/a><\/li><li id=\"0f21fea9-1366-4ba2-9035-176466ccaa5c\">\u00a0That is to say, on control-bucket conversations we also ran RoBERTa, and on treatment-bucket conversations we also ran the FastText model <a href=\"#0f21fea9-1366-4ba2-9035-176466ccaa5c-link\" aria-label=\"Jump to footnote reference 6\">\u21a9\ufe0e<\/a><\/li><li id=\"b0d37562-f295-4097-8dcc-e769df37e11f\">\u00a0This apparent disconnect between offline and production results could be because the offline dataset is balanced across the languages we support. <a href=\"#b0d37562-f295-4097-8dcc-e769df37e11f-link\" aria-label=\"Jump to footnote reference 7\">\u21a9\ufe0e<\/a><\/li><\/ol>","protected":false},"excerpt":{"rendered":"<p>When users ask Fin some question, they expect Fin to respond in the same language in which they ask the question. 
Detecting the language a user is speaking is a key step in the Fin pipeline.&hellip;<\/p>\n","protected":false},"author":37,"featured_media":155,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":"[{\"content\":\"<a href=\\\"https:\/\/www.intercom.com\/help\/en\/articles\/8322387-use-fin-ai-agent-in-multiple-languages\\n\\\">https:\/\/www.intercom.com\/help\/en\/articles\/8322387-use-fin-ai-agent-in-multiple-languages<br><\/a>\",\"id\":\"9813c3c8-b5b4-4d2b-bfe0-deeac1442c29\"},{\"content\":\"\u00a0We only use data from the apps that allow their data to be used for training\",\"id\":\"c619b319-0970-4533-b7a0-2b91d9755800\"},{\"content\":\"\u00a0We verified the quality of these labels by computing accuracy on random 100 conversations per language using a separate LLM as a judge. The goal here is to just make sure that we don\u2019t have noisy examples for any particular language. We did identify some bugs in our earlier prompts using this approach.\",\"id\":\"7c6e601f-f60d-447c-93e1-4e6260e35439\"},{\"content\":\"\u00a0The dataset is mostly balanced (all but 8 languages have 200 examples, remaining languages have >= 40 examples)\",\"id\":\"20a0eef2-fdb7-4876-9666-1b6d5037fdf0\"},{\"content\":\"\u00a0Technically FastText made predictions, but with lower confidence\",\"id\":\"8f8a4f28-5cdb-4e71-a3c5-92ceb1d38d27\"},{\"content\":\"\u00a0That is to say, on control bucket conversations we run RoBERTa, and in treatment bucket we ran FastText model\",\"id\":\"0f21fea9-1366-4ba2-9035-176466ccaa5c\"},{\"content\":\"\u00a0This apparent disconnect between offline and production could be because the offline dataset is balanced across the languages we support.\",\"id\":\"b0d37562-f295-4097-8dcc-e769df37e11f\"}]"},"categories":[1],"tags":[],"coauthors":[25],"class_list":["post-482","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"yoast_head":"<!-- This site 
is optimized with the Yoast SEO Premium plugin v24.6 (Yoast SEO v24.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Building a Better Language Detection Model for Fin - \/research<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/fin.ai\/research\/building-a-better-language-detection-model-for-fin\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Building a Better Language Detection Model for Fin\" \/>\n<meta property=\"og:description\" content=\"When users ask Fin some question, they expect Fin to respond in the same language in which they ask the question. Detecting the language a user is speaking is a key step in the Fin pipeline.&hellip;\" \/>\n<meta property=\"og:url\" content=\"https:\/\/fin.ai\/research\/building-a-better-language-detection-model-for-fin\/\" \/>\n<meta property=\"og:site_name\" content=\"\/research\" \/>\n<meta property=\"article:published_time\" content=\"2025-09-11T22:41:07+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-09-12T09:39:01+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/03\/image-17-1.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1344\" \/>\n\t<meta property=\"og:image:height\" content=\"896\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Dhruv Patel\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@intercom\" \/>\n<meta name=\"twitter:site\" content=\"@intercom\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Dhruv Patel\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"8 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/fin.ai\/research\/building-a-better-language-detection-model-for-fin\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/fin.ai\/research\/building-a-better-language-detection-model-for-fin\/\"},\"author\":{\"name\":\"Dhruv Patel\",\"@id\":\"https:\/\/fin.ai\/research\/#\/schema\/person\/2d626225abf2c015edd6db783e119e13\"},\"headline\":\"Building a Better Language Detection Model for Fin\",\"datePublished\":\"2025-09-11T22:41:07+00:00\",\"dateModified\":\"2025-09-12T09:39:01+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/fin.ai\/research\/building-a-better-language-detection-model-for-fin\/\"},\"wordCount\":1672,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/fin.ai\/research\/#organization\"},\"image\":{\"@id\":\"https:\/\/fin.ai\/research\/building-a-better-language-detection-model-for-fin\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/03\/image-17-1.png\",\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/fin.ai\/research\/building-a-better-language-detection-model-for-fin\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/fin.ai\/research\/building-a-better-language-detection-model-for-fin\/\",\"url\":\"https:\/\/fin.ai\/research\/building-a-better-language-detection-model-for-fin\/\",\"name\":\"Building a Better Language Detection Model for Fin - 
\/research\",\"isPartOf\":{\"@id\":\"https:\/\/fin.ai\/research\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/fin.ai\/research\/building-a-better-language-detection-model-for-fin\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/fin.ai\/research\/building-a-better-language-detection-model-for-fin\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/03\/image-17-1.png\",\"datePublished\":\"2025-09-11T22:41:07+00:00\",\"dateModified\":\"2025-09-12T09:39:01+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/fin.ai\/research\/building-a-better-language-detection-model-for-fin\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/fin.ai\/research\/building-a-better-language-detection-model-for-fin\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/fin.ai\/research\/building-a-better-language-detection-model-for-fin\/#primaryimage\",\"url\":\"https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/03\/image-17-1.png\",\"contentUrl\":\"https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/03\/image-17-1.png\",\"width\":1344,\"height\":896},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/fin.ai\/research\/building-a-better-language-detection-model-for-fin\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/fin.ai\/research\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Building a Better Language Detection Model for Fin\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/fin.ai\/research\/#website\",\"url\":\"https:\/\/fin.ai\/research\/\",\"name\":\"Intercom.ai\",\"description\":\"Insights and blogs from the AI Group building Fin at 
Intercom\",\"publisher\":{\"@id\":\"https:\/\/fin.ai\/research\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/fin.ai\/research\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/fin.ai\/research\/#organization\",\"name\":\"Intercom.ai\",\"url\":\"https:\/\/fin.ai\/research\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/fin.ai\/research\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/03\/favicon.png\",\"contentUrl\":\"https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/03\/favicon.png\",\"width\":1024,\"height\":1024,\"caption\":\"Intercom.ai\"},\"image\":{\"@id\":\"https:\/\/fin.ai\/research\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/intercom\",\"https:\/\/www.linkedin.com\/company\/intercom\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/fin.ai\/research\/#\/schema\/person\/2d626225abf2c015edd6db783e119e13\",\"name\":\"Dhruv Patel\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/fin.ai\/research\/#\/schema\/person\/image\/eeacf7ba035a78d63bc6792257dc8eaa\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/8f55bd0929e330981b8ccc7d8385636d?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/8f55bd0929e330981b8ccc7d8385636d?s=96&d=mm&r=g\",\"caption\":\"Dhruv Patel\"},\"description\":\"is a Senior Machine Learning Engineer, currently working in Fin Core workstream, with past experience in recommender systems.\",\"url\":\"https:\/\/fin.ai\/research\/author\/dhruv-patel\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"Building a Better Language Detection Model for Fin - \/research","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/fin.ai\/research\/building-a-better-language-detection-model-for-fin\/","og_locale":"en_US","og_type":"article","og_title":"Building a Better Language Detection Model for Fin","og_description":"When users ask Fin some question, they expect Fin to respond in the same language in which they ask the question. Detecting the language a user is speaking is a key step in the Fin pipeline.&hellip;","og_url":"https:\/\/fin.ai\/research\/building-a-better-language-detection-model-for-fin\/","og_site_name":"\/research","article_published_time":"2025-09-11T22:41:07+00:00","article_modified_time":"2025-09-12T09:39:01+00:00","og_image":[{"width":1344,"height":896,"url":"https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/03\/image-17-1.png","type":"image\/png"}],"author":"Dhruv Patel","twitter_card":"summary_large_image","twitter_creator":"@intercom","twitter_site":"@intercom","twitter_misc":{"Written by":"Dhruv Patel","Est. 
reading time":"8 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/fin.ai\/research\/building-a-better-language-detection-model-for-fin\/#article","isPartOf":{"@id":"https:\/\/fin.ai\/research\/building-a-better-language-detection-model-for-fin\/"},"author":{"name":"Dhruv Patel","@id":"https:\/\/fin.ai\/research\/#\/schema\/person\/2d626225abf2c015edd6db783e119e13"},"headline":"Building a Better Language Detection Model for Fin","datePublished":"2025-09-11T22:41:07+00:00","dateModified":"2025-09-12T09:39:01+00:00","mainEntityOfPage":{"@id":"https:\/\/fin.ai\/research\/building-a-better-language-detection-model-for-fin\/"},"wordCount":1672,"commentCount":0,"publisher":{"@id":"https:\/\/fin.ai\/research\/#organization"},"image":{"@id":"https:\/\/fin.ai\/research\/building-a-better-language-detection-model-for-fin\/#primaryimage"},"thumbnailUrl":"https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/03\/image-17-1.png","inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/fin.ai\/research\/building-a-better-language-detection-model-for-fin\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/fin.ai\/research\/building-a-better-language-detection-model-for-fin\/","url":"https:\/\/fin.ai\/research\/building-a-better-language-detection-model-for-fin\/","name":"Building a Better Language Detection Model for Fin - 
\/research","isPartOf":{"@id":"https:\/\/fin.ai\/research\/#website"},"primaryImageOfPage":{"@id":"https:\/\/fin.ai\/research\/building-a-better-language-detection-model-for-fin\/#primaryimage"},"image":{"@id":"https:\/\/fin.ai\/research\/building-a-better-language-detection-model-for-fin\/#primaryimage"},"thumbnailUrl":"https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/03\/image-17-1.png","datePublished":"2025-09-11T22:41:07+00:00","dateModified":"2025-09-12T09:39:01+00:00","breadcrumb":{"@id":"https:\/\/fin.ai\/research\/building-a-better-language-detection-model-for-fin\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/fin.ai\/research\/building-a-better-language-detection-model-for-fin\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/fin.ai\/research\/building-a-better-language-detection-model-for-fin\/#primaryimage","url":"https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/03\/image-17-1.png","contentUrl":"https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/03\/image-17-1.png","width":1344,"height":896},{"@type":"BreadcrumbList","@id":"https:\/\/fin.ai\/research\/building-a-better-language-detection-model-for-fin\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/fin.ai\/research\/"},{"@type":"ListItem","position":2,"name":"Building a Better Language Detection Model for Fin"}]},{"@type":"WebSite","@id":"https:\/\/fin.ai\/research\/#website","url":"https:\/\/fin.ai\/research\/","name":"Intercom.ai","description":"Insights and blogs from the AI Group building Fin at 
Intercom","publisher":{"@id":"https:\/\/fin.ai\/research\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/fin.ai\/research\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/fin.ai\/research\/#organization","name":"Intercom.ai","url":"https:\/\/fin.ai\/research\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/fin.ai\/research\/#\/schema\/logo\/image\/","url":"https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/03\/favicon.png","contentUrl":"https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/03\/favicon.png","width":1024,"height":1024,"caption":"Intercom.ai"},"image":{"@id":"https:\/\/fin.ai\/research\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/intercom","https:\/\/www.linkedin.com\/company\/intercom"]},{"@type":"Person","@id":"https:\/\/fin.ai\/research\/#\/schema\/person\/2d626225abf2c015edd6db783e119e13","name":"Dhruv Patel","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/fin.ai\/research\/#\/schema\/person\/image\/eeacf7ba035a78d63bc6792257dc8eaa","url":"https:\/\/secure.gravatar.com\/avatar\/8f55bd0929e330981b8ccc7d8385636d?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/8f55bd0929e330981b8ccc7d8385636d?s=96&d=mm&r=g","caption":"Dhruv Patel"},"description":"is a Senior Machine Learning Engineer, currently working in Fin Core workstream, with past experience in recommender 
systems.","url":"https:\/\/fin.ai\/research\/author\/dhruv-patel\/"}]}},"_links":{"self":[{"href":"https:\/\/fin.ai\/research\/wp-json\/wp\/v2\/posts\/482","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/fin.ai\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/fin.ai\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/fin.ai\/research\/wp-json\/wp\/v2\/users\/37"}],"replies":[{"embeddable":true,"href":"https:\/\/fin.ai\/research\/wp-json\/wp\/v2\/comments?post=482"}],"version-history":[{"count":0,"href":"https:\/\/fin.ai\/research\/wp-json\/wp\/v2\/posts\/482\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/fin.ai\/research\/wp-json\/wp\/v2\/media\/155"}],"wp:attachment":[{"href":"https:\/\/fin.ai\/research\/wp-json\/wp\/v2\/media?parent=482"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/fin.ai\/research\/wp-json\/wp\/v2\/categories?post=482"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/fin.ai\/research\/wp-json\/wp\/v2\/tags?post=482"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/fin.ai\/research\/wp-json\/wp\/v2\/coauthors?post=482"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}