{"id":528,"date":"2026-03-30T19:06:23","date_gmt":"2026-03-30T19:06:23","guid":{"rendered":"https:\/\/fin.ai\/research\/?p=528"},"modified":"2026-03-30T19:11:43","modified_gmt":"2026-03-30T19:11:43","slug":"unsupervised-learning-meets-generative-ai-topic-modelling-for-real-world-dialogue","status":"publish","type":"post","link":"https:\/\/fin.ai\/research\/unsupervised-learning-meets-generative-ai-topic-modelling-for-real-world-dialogue\/","title":{"rendered":"Unsupervised Learning Meets Generative AI: Topic Modelling for Real-World Dialogue"},"content":{"rendered":"\n<p>We designed a topic modelling system to improve the Fin AI agent by detecting underperforming areas, identifying root causes, and enabling continuous optimisation. <\/p>\n\n\n\n<p>We wanted to get the quality benefits of modern generative AI, and the scalability and reliability of machine learning based approaches.<\/p>\n\n\n\n<p>This blog describes how we achieved the following progress:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Topic modelling is good for extracting themes from large text corpora. 
However, support conversations are short, noisy, and informal, which poses unique challenges<\/li>\n\n\n\n<li>To address this, we adopted an embedding-based clustering approach with <em>HDBSCAN<\/em> (which avoids predefined cluster counts and adapts to each customer\u2019s vocabulary, volume, and complexity)<\/li>\n\n\n\n<li>Using a bottom-up hierarchy, the system organically discovers emerging topics &#8211; such as new feature friction or early bugs &#8211; without requiring prior labels<\/li>\n\n\n\n<li>Finally, we layered generative AI on top of the unsupervised structure to name topics, summarise intent, and surface examples<\/li>\n\n\n\n<li>This hybrid approach transforms messy support conversations into actionable insights<\/li>\n<\/ul>\n\n\n\n<h2 id=\"introduction\" class=\"wp-block-heading\">Introduction<\/h2>\n\n\n\n<p>Support conversations are messy &#8211; tickets and messages flood in daily, full of critical feedback and pain points. They\u2019re also fragmented and hard to summarise at scale. We wanted to turn this chaos into clarity: transform raw conversation data into a structured, actionable view of what customers are saying.<\/p>\n\n\n\n<p>Our solution combines classic machine learning with generative AI to automatically detect and label topics within support conversations. These topics give teams a clear lens to understand user behaviour, spot issues early, and take action. Originally built for Fin, our AI agent, the system began as a feedback loop:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Detect topics with low AI performance (low resolution rate), frequent handoffs, or poor satisfaction<\/li>\n\n\n\n<li>Analyse representative conversations<\/li>\n\n\n\n<li>Act &#8211; update knowledge, add tasks, refine workflows<\/li>\n\n\n\n<li>Measure results<\/li>\n\n\n\n<li>Repeat<\/li>\n<\/ol>\n\n\n\n<p>This cycle still powers how topics drive improvement. 
But we soon realised that topics were more than a debugging tool &#8211; they became a shared language across teams: support specialists, analysts, engineers, managers, and product marketers. Topics can be very useful for&#8230;<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>AI Content Managers:<\/strong> to track which topics Fin handles well vs. those needing training or escalation (i.e. using performance data to prioritise improvements)<\/li>\n\n\n\n<li><strong>Data Analysts:<\/strong> to monitor trends over time and user segments (e.g., perhaps refund issues are spiking; or there are more onboarding questions than expected post-release)<\/li>\n\n\n\n<li><strong>Technical Support Engineers:<\/strong> to identify clusters of related bugs, measure scope, and track how long issues persist<\/li>\n\n\n\n<li><strong>Support Managers:<\/strong> to see ticket volume by topic, to allocate support resources, and evaluate operational metrics across categories<\/li>\n\n\n\n<li><strong>Marketing and Sales:<\/strong> to spot patterns in product questions to refine messaging, improve docs, and surface recurring feature requests<\/li>\n\n\n\n<li><strong>&#8230;even more roles in future.<\/strong> Topics have proven flexible and interpretable, powering both day-to-day workflows and higher-level strategic decisions.<\/li>\n<\/ul>\n\n\n\n<h2 id=\"pipeline-overview\" class=\"wp-block-heading\">Pipeline Overview<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Practical Challenge<\/h3>\n\n\n\n<p>Topic modeling has decades of research behind it. At its core, it finds abstract themes &#8211; or \u201ctopics\u201d &#8211; within large text collections without labeled data. It\u2019s been applied everywhere from academic literature and news clustering, to social media and network analysis.<\/p>\n\n\n\n<p>However, conversational data is different. Support chats and assistant transcripts are short, noisy, and informal &#8211; far from the structured documents most models were built for. 
Our challenge was to adapt topic modelling to this messy reality and make results both accurate and useful for business teams. We also had to ensure it could scale.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Choosing the Right Modelling Approach<\/h3>\n\n\n\n<p>We treated this as a feasibility analysis &#8211; not just a technical evaluation, but a product development challenge. The right model needed to balance accuracy, interpretability, performance at scale, and business usability:<\/p>\n\n\n\n<figure class=\"wp-block-table is-style-stripes has-tiny-font-size\"><table class=\"has-fixed-layout\"><thead><tr><th>Method<\/th><th>Pros<\/th><th>Cons<\/th><th>Verdict<\/th><\/tr><\/thead><tbody><tr><td><strong>Latent Dirichlet Allocation (LDA)<\/strong><\/td><td>&#8211; Interpretable topic-word distributions<br><br>&#8211; Probabilistic foundation<\/td><td>&#8211; Performs poorly on short texts due to data sparsity<br><br>&#8211; Requires strong word co-occurrence<br><br>&#8211; Sensitive to hyperparameters<br><br>&#8211; Often yields generic\/overlapping topics<\/td><td>\u274c <em>Not suitable for conversational data<\/em><\/td><\/tr><tr><td><strong>Non-Negative Matrix Factorization (NMF)<\/strong><\/td><td>&#8211; Fast and simple<br><br>&#8211; More interpretable than LDA<br><br>&#8211; Can work better on short texts than LDA<\/td><td>&#8211; Still bag-of-words based<br><br>&#8211; No semantic understanding<br><br>&#8211; Misses contextual or synonymous terms<\/td><td>\u26a0\ufe0f <em>Fast, but too limited for our needs<\/em><\/td><\/tr><tr><td><strong>Embedding-Based Clustering<\/strong><\/td><td>&#8211; Captures semantic meaning<br><br>&#8211; Works well on short, noisy text<br><br>&#8211; Highly flexible (model, clustering, reduction options)<br><br>&#8211; Produces interpretable structure<\/td><td>&#8211; Requires tuning<br><br>&#8211; Sensitive to clustering and dimensionality settings<br><br>&#8211; May group by style or length, if not tuned<br><\/td><td>\u2705 
<em>Best balance of structure + accuracy<\/em><\/td><\/tr><tr><td><strong>Rule-Based \/ Keyword Tagging<\/strong><\/td><td>&#8211; Transparent and easy to implement<br><br>&#8211; Its issues are well known<br><br>&#8211; Can be a quick win<\/td><td>&#8211; Doesn\u2019t scale well<br><br>&#8211; High maintenance<br><br>&#8211; Misses nuance or novel phrasing<\/td><td>\u274c <em>Too brittle for evolving topics<\/em><\/td><\/tr><tr><td><strong>Zero\/Few-Shot Classification with LLMs<\/strong><\/td><td>&#8211; No training data needed<br><br>&#8211; Fast experimentation<br><br>&#8211; Leverages broad model knowledge<\/td><td>&#8211; Limited control<br><br>&#8211; Poor for fine-grained or large topic sets<br><br>&#8211; Not good for discovering unknown topics<\/td><td>\u26a0\ufe0f <em>Useful for small-scale tasks, not topic discovery<\/em><\/td><\/tr><tr><td><strong>Supervised Classification<\/strong><\/td><td>&#8211; High precision for known categories<br><br>&#8211; Good for feedback loops and continuous improvement<\/td><td>&#8211; Requires labeled data<br><br>&#8211; Can\u2019t handle emerging or unknown topics<br><br>&#8211; Not useful for discovery<\/td><td>\u26a0\ufe0f<em> Can be a complementary tool, but not good for discovery<\/em><\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>After exploring approaches mentioned above, we ultimately chose embedding-based clustering because it struck the right balance between flexibility, scalability, and performance. Unlike rule-based methods or supervised classification, it allowed for fully unsupervised exploration &#8211; critical for surfacing unknown or evolving topics without relying on predefined labels.<\/p>\n\n\n\n<p>In our initial tests, traditional models like LDA or NMF struggled with short, noisy text, while embedding-based methods handled conversational nuance far better. Just as importantly, they scaled easily to millions of conversations. 
When we tested it, it just worked &#8211; producing coherent, interpretable topics out of the box. <\/p>\n\n\n\n<p>So we made a bet on this method, and it paid off!<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">How It Works: A High-Level View<\/h3>\n\n\n\n<p>Each customer gets a tailored topic model that breaks their conversations into distinct themes. Topics and subtopics are linked to key metrics like CSAT and resolution rate, and teams can drill down to review individual conversations or apply filters to explore trends. Here is how it looks in the UI:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"543\" src=\"https:\/\/fin.ai\/research\/wp-content\/uploads\/2026\/01\/image-6-1024x543.png\" alt=\"\" class=\"wp-image-530\" srcset=\"https:\/\/fin.ai\/research\/wp-content\/uploads\/2026\/01\/image-6-1024x543.png 1024w, https:\/\/fin.ai\/research\/wp-content\/uploads\/2026\/01\/image-6-300x159.png 300w, https:\/\/fin.ai\/research\/wp-content\/uploads\/2026\/01\/image-6-768x407.png 768w, https:\/\/fin.ai\/research\/wp-content\/uploads\/2026\/01\/image-6-1536x814.png 1536w, https:\/\/fin.ai\/research\/wp-content\/uploads\/2026\/01\/image-6-1320x700.png 1320w, https:\/\/fin.ai\/research\/wp-content\/uploads\/2026\/01\/image-6.png 1600w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>Here&#8217;s a simplified view of how it works under the hood:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"456\" src=\"https:\/\/fin.ai\/research\/wp-content\/uploads\/2026\/01\/image-7-1024x456.png\" alt=\"\" class=\"wp-image-531\" srcset=\"https:\/\/fin.ai\/research\/wp-content\/uploads\/2026\/01\/image-7-1024x456.png 1024w, https:\/\/fin.ai\/research\/wp-content\/uploads\/2026\/01\/image-7-300x134.png 300w, https:\/\/fin.ai\/research\/wp-content\/uploads\/2026\/01\/image-7-768x342.png 768w, 
https:\/\/fin.ai\/research\/wp-content\/uploads\/2026\/01\/image-7-1536x684.png 1536w, https:\/\/fin.ai\/research\/wp-content\/uploads\/2026\/01\/image-7-1320x587.png 1320w, https:\/\/fin.ai\/research\/wp-content\/uploads\/2026\/01\/image-7.png 1600w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>The upper part of the picture is subtopic discovery. It is based on the BERTopic framework, and here&#8217;s what it does:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Extract key questions<\/strong> \u2013 A lightweight LLM summarises the main questions from customer support chats<\/li>\n\n\n\n<li><strong>Embed questions<\/strong> \u2013 A sentence transformer converts key questions into vector embeddings (we settled on <em>sentence-transformers\/all-MiniLM-L6-v2<\/em>, based on speed and quality)<\/li>\n\n\n\n<li><strong>Reduce dimensions<\/strong> \u2013 UMAP (particularly effective for short-text data) projects the embeddings into a lower-dimensional space<\/li>\n\n\n\n<li><strong>Cluster into subtopics<\/strong> \u2013 HDBSCAN groups the reduced embeddings into coherent clusters\/subtopics<\/li>\n<\/ol>\n\n\n\n<p>The rest is topic aggregation:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>From subtopics to topics <\/strong>\u2013 Clustering on subtopic embeddings surfaces higher-level, tentative topics<\/li>\n\n\n\n<li><strong>Refine with LLM<\/strong> \u2013 An LLM polishes tentative topics into a definitive set of final topics<\/li>\n\n\n\n<li><strong>Name topics &amp; subtopics<\/strong> \u2013 Another LLM generates clear, human-readable labels, based on the final topics<\/li>\n\n\n\n<li><strong>Compute centroids <\/strong>\u2013 Average embeddings define centroids for each cluster (topic\/subtopic)<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Why HDBSCAN Beats K-Means for Conversational Data<\/h3>\n\n\n\n<p>A natural question is why we didn&#8217;t use the well-known K-Means algorithm for clustering.<\/p>\n\n\n\n<p>Overall, we found 
the choice of embedding model and dimensionality reduction method had only a marginal impact, while the clustering algorithm and its hyperparameters made the biggest difference in output quality. The main advantage of HDBSCAN for our purposes is that it doesn\u2019t require a predefined number of clusters. That flexibility was essential &#8211; each customer needs a unique topic model, and support volume, vocabulary, and conversation style vary widely (which results in very different numbers of topics for different customers).<\/p>\n\n\n\n<p>The key parameter we tuned was <em>min_cluster_size<\/em>, which sets the smallest number of messages that can form a cluster. Lower values create more granular clusters; higher values produce fewer, broader ones. We settled on a minimum cluster size of 15, which for our own Intercom data surfaced more than 700 micro-topics &#8211; a reflection of how diverse support conversations can be. Cluster counts also scale naturally with company size: more traffic means more clusters. This reinforced our intuition that fixing the number of clusters upfront wasn\u2019t an option.<\/p>\n\n\n\n<p>Here&#8217;s the relationship between company size (measured by 3-month conversation volume) and the number of discovered subtopics. 
Larger companies with more diverse customers naturally yield more granular clusters &#8211; validating our choice of HDBSCAN\u2019s adaptive approach over fixed cluster counts:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"849\" height=\"450\" src=\"https:\/\/fin.ai\/research\/wp-content\/uploads\/2026\/01\/image-8.png\" alt=\"\" class=\"wp-image-532\" srcset=\"https:\/\/fin.ai\/research\/wp-content\/uploads\/2026\/01\/image-8.png 849w, https:\/\/fin.ai\/research\/wp-content\/uploads\/2026\/01\/image-8-300x159.png 300w, https:\/\/fin.ai\/research\/wp-content\/uploads\/2026\/01\/image-8-768x407.png 768w\" sizes=\"auto, (max-width: 849px) 100vw, 849px\" \/><\/figure>\n\n\n\n<p>Another advantage of HDBSCAN over K-Means is how it handles noise. K-Means forces every data point into a cluster &#8211; even outliers &#8211; and offers no confidence scores. HDBSCAN, by contrast, assumes noise exists and can explicitly flag it. In customer support, many one-off or unusual questions don\u2019t fit neatly into a topic. HDBSCAN\u2019s ability to mark these as outliers, with associated scores, is invaluable for managing such edge cases.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">From Subtopics to Topics: Building a Bottom-Up Hierarchy<\/h3>\n\n\n\n<p>Now let&#8217;s get to the second part of the pipeline: topic aggregation.<\/p>\n\n\n\n<p>Many systems define topic hierarchies top-down: decide the categories first, then slot everything in. That works if you already know what to expect &#8211; but support data is full of surprises. We wanted the opposite: a bottom-up approach where the data speaks first, surfacing themes customers may not anticipate. The process starts with fine-grained subtopics, discovered via embedding-based clustering with HDBSCAN (as described above). 
Each subtopic captures a narrow slice of intent:<\/p>\n\n\n\n<p>&#8211; <strong>Message:<\/strong> &#8220;What is the price for resolutions?&#8221; <br>\u2192 <strong>Subtopic<\/strong>: Fin resolution rate<\/p>\n\n\n\n<p>&#8211;<strong> Message:<\/strong> &#8220;What model does Fin use under the hood?&#8221; <br>\u2192 <strong>Subtopic<\/strong>: Fin LLM usage<\/p>\n\n\n\n<p>While different in detail, these subtopics both belong to the broader <strong>Fin topic<\/strong>. The challenge was: how do we automatically group subtopics into meaningful parent topics?<\/p>\n\n\n\n<p>We first tried HDBSCAN\u2019s hierarchy features, but the results were too coarse and inconsistent. So we reapplied our pipeline &#8211; this time clustering subtopic embeddings (mean sentence vectors). To our surprise, it worked: fine-grained Fin-related subtopics clustered under \u201cFin,\u201d and so on. This recursive clustering produced flexible, two-layer hierarchies that adapt to each customer\u2019s data &#8211; without predefined categories.<\/p>\n\n\n\n<p>Still, the second clustering wasn\u2019t perfect. To refine it, we added an LLM step: cleaning up inconsistencies, correcting edge cases, and generating clear, human-readable topic names. We layered generative models on top of the unsupervised structure so each cluster becomes understandable and actionable. The LLM helps by:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Flagging spam or irrelevant data<\/li>\n\n\n\n<li>Naming subtopics and topics<\/li>\n\n\n\n<li>Summarising clusters in plain language<\/li>\n\n\n\n<li>Highlighting representative examples<\/li>\n<\/ul>\n\n\n\n<p>This hybrid approach gives us the best of both worlds: the scale of ML with the fluency of generative AI.<\/p>\n\n\n\n<h2 id=\"conversational-data\" class=\"wp-block-heading\">Conversational Data<\/h2>\n\n\n\n<p>Traditional topic modeling was built for long-form, structured text &#8211; like news articles or research papers. 
But conversations don\u2019t follow those rules. They are:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Short &amp; fragmented \u2013 messages like \u201cIt\u2019s not working\u201d or \u201clogin fails\u201d lack standalone context<\/li>\n\n\n\n<li>Informal &amp; messy \u2013 typos, abbreviations, emojis, and non-standard grammar are the norm<\/li>\n\n\n\n<li>Multi-turn &amp; multi-speaker \u2013 dialogues jump between issues across several back-and-forths<\/li>\n\n\n\n<li>Contextual &amp; implicit \u2013 meaning often depends on prior messages or hidden intent<\/li>\n<\/ul>\n\n\n\n<p>In other words, applying standard NLP pipelines to chats is like trying to summarize a movie from random one-liners. We had to rethink preprocessing, clustering, and even how we evaluate topic quality.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Preprocessing<\/h3>\n\n\n\n<p>Greetings, disclaimers, brand names, agent scripts, typos, and pleasantries can easily swamp the real signal. Without preprocessing, clustering just produces giant, meaningless groups like <em>\u201cHi, I have a question.\u201d<\/em> So we built a preprocessing pipeline that goes beyond standard NLP cleanup:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Remove brand names &amp; boilerplate. <\/strong>Frequent mentions (e.g., \u201cIntercom\u201d) or scripted phrases skew clusters, creating vague, oversized groups that add no real insight<\/li>\n\n\n\n<li><strong>Extract main messages &amp; collapse turns<\/strong><br>&#8211; A lightweight LLM pinpoints the messages that carry intent &#8211; what the customer wants, the issue they face, or their reaction<br>&#8211; When individual turns are too short or vague, we combine related ones to form richer inputs &#8211; especially helpful when intent unfolds gradually across a chat<\/li>\n\n\n\n<li><strong>Spam filtering<\/strong>. 
We use a lightweight LLM to filter conversations that aren&#8217;t related to customer support out of our training data<\/li>\n\n\n\n<li><strong>&#8220;Classic&#8221; text cleaning.<\/strong> Lowercasing, punctuation handling, stop word removal, typo normalisation, emoji stripping (or translation), etc.<\/li>\n<\/ol>\n\n\n\n<p>Most of these are self-explanatory, but here&#8217;s an example of what step #2 does:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th><strong>Raw Conversation<\/strong><\/th><th><strong>Preprocessed Output<\/strong><\/th><\/tr><\/thead><tbody><tr><td>User: Hi<br>AI: Hello, how can I help you?<br>User: I have a question<br>AI: I&#8217;m here to help you<br>User: Fin<br>User: What is the price?<\/td><td>What is the price for Fin resolution?<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Inference<\/h3>\n\n\n\n<p>Our inference strategy was heavily inspired by BERTopic. Instead of running the full sequence of trained models (Embeddings \u2192 UMAP \u2192 HDBSCAN) for each new conversation, we opted for a simpler, faster, and more production-friendly method: <strong>centroid-based inference.<\/strong><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Cluster representation<\/strong>. After training, each discovered cluster is represented by a centroid &#8211; the mean embedding of all conversations in that cluster<\/li>\n\n\n\n<li><strong>New conversation processing<\/strong>. At inference time, we:<br>&#8211; Embed the new conversation using the same sentence transformer<br>&#8211; Compute cosine similarity between the new embedding and all cluster centroids<br>&#8211; Assign the conversation to the closest cluster, based on the highest similarity score<\/li>\n\n\n\n<li><strong>Confidence thresholding.<\/strong> We define a minimum similarity threshold. 
If no centroid crosses this threshold, the conversation is considered out-of-distribution and flagged as a potential new or unknown topic.<\/li>\n<\/ol>\n\n\n\n<p>Switching to centroid-based representations gave us <strong>high-throughput, real-time inference<\/strong> with a compact model. But it came with risks. Unlike HDBSCAN, which supports arbitrary cluster shapes, centroid methods assume clusters are roughly spherical. That\u2019s not always the case &#8211; some clusters are elongated or multi-modal, so the centroid may sit outside the densest region, reducing accuracy. To check this tradeoff, we analysed cluster geometry to confirm they were compact enough to make centroids a reliable shortcut.<\/p>\n\n\n\n<div class=\"wp-block-group is-layout-constrained wp-block-group-is-layout-constrained\">\n<details class=\"wp-block-details is-layout-flow wp-block-details-is-layout-flow\" open><summary>Our geometry analysis confirmed the trade-off was worthwhile: clusters were compact enough for centroids to serve as reliable shortcuts; click to minimise this section, if you want to skip this detail.<\/summary>\n<p>To validate that centroids could reasonably represent our clusters, we needed to check if those clusters were roughly spherical &#8211; i.e., compact, well-distributed around a center, and not elongated or fragmented. 
We used Principal Component Analysis (PCA) and spectral entropy as geometric proxies.<\/p>\n\n\n\n<p>Here are the metrics we calculated for every cluster of multiple customer topic models:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>PCA Aspect Ratio<\/strong>: The ratio between the first and second PCA eigenvalues, measures how elongated clusters are <\/li>\n\n\n\n<li><strong>PCA Top Variance Ratio: <\/strong>Indicates how much variance is captured by the first principal component <\/li>\n\n\n\n<li><strong>Spectral Entropy: <\/strong>Quantifies the distribution of eigenvalues, measures how evenly variance is distributed across all components<\/li>\n<\/ul>\n\n\n\n<p>We performed this analysis across multiple clusters to validate the centroid-based approach, and it showed promising signs:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"763\" height=\"450\" src=\"https:\/\/fin.ai\/research\/wp-content\/uploads\/2026\/01\/image-9.png\" alt=\"\" class=\"wp-image-533\" srcset=\"https:\/\/fin.ai\/research\/wp-content\/uploads\/2026\/01\/image-9.png 763w, https:\/\/fin.ai\/research\/wp-content\/uploads\/2026\/01\/image-9-300x177.png 300w\" sizes=\"auto, (max-width: 763px) 100vw, 763px\" \/><\/figure>\n\n\n\n<p>PCA Aspect Ratio &amp; Top Variance Ratio mostly fell in the expected range for spherical clusters (aspect ratio 1\u20132.5; variance ratio &gt;0.35). Spectral Entropy was more mixed: most clusters scored 0.6\u20130.8 &#8211; egg-shaped rather than round &#8211; with some above 0.8, indicating near-spherical shapes.<\/p>\n\n\n\n<p>In practice, this moderate anisotropy didn\u2019t prevent centroids from being effective, especially when clusters were well-separated.&nbsp;<\/p>\n\n\n\n<p>We also visually inspected the clusters to verify their shape, density, and cohesion. 
While not all clusters are perfectly spherical, both the metrics and visual inspections of samples confirmed that most are compact and coherent enough for centroids to act as reliable representatives:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"683\" src=\"https:\/\/fin.ai\/research\/wp-content\/uploads\/2026\/01\/image-10-1024x683.png\" alt=\"\" class=\"wp-image-534\" srcset=\"https:\/\/fin.ai\/research\/wp-content\/uploads\/2026\/01\/image-10-1024x683.png 1024w, https:\/\/fin.ai\/research\/wp-content\/uploads\/2026\/01\/image-10-300x200.png 300w, https:\/\/fin.ai\/research\/wp-content\/uploads\/2026\/01\/image-10-768x512.png 768w, https:\/\/fin.ai\/research\/wp-content\/uploads\/2026\/01\/image-10.png 1200w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p><strong>Does it hurt performance?<\/strong> In short &#8211; no. To compare inference methods, we used LLM-as-judge: asking an LLM to evaluate whether each subtopic\u2013topic pair accurately captured the theme of a conversation. It\u2019s not a strict system metric, but a useful proxy for human judgment. The results were clear: centroid-based inference performed about 2 points better on average than the full pipeline, with both approaches exceeding 85% accuracy. In other words, simplifying didn\u2019t just hold up &#8211; it slightly improved results.<\/p>\n\n\n\n<p><strong>Outlier reduction:<\/strong> switching to centroids cut outliers by 50%, boosting topic coverage. This mattered because support data mixes dense clusters with diffuse, low-density groups. HDBSCAN favors well-defined clusters and often discards the rest as noise &#8211; even when those \u201cnoisy\u201d points are meaningful. Centroids let us reassign many of those borderline cases to valid topics, improving recall without sacrificing quality. 
<\/p>\n<\/details>\n<\/div>\n\n\n\n<p>To balance performance and flexibility, we split the pipeline into two stages:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Discovery with HDBSCAN \u2013 unsupervised exploration to uncover new topics without preset limits.<\/li>\n\n\n\n<li>Inference with centroids \u2013 a fast, similarity-based method optimised for scale<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Evaluating Topic Quality<\/h3>\n\n\n\n<p>Evaluating topic modelling is inherently difficult. As well as being an unsupervised task, it is also inherently ambiguous. There are many valid ways to split conversations into clusters, and the \u201cbest\u201d result often depends on human judgment and downstream usability rather than strict metrics. Still, every ML system needs some form of evaluation. We approached it from multiple angles:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Clustering quality \u2013 Are topics well-separated and coherent?<\/li>\n\n\n\n<li>Prediction accuracy \u2013 Are new conversations assigned correctly?<\/li>\n\n\n\n<li>Human validation \u2013 Do these topics make sense to real users?<\/li>\n<\/ul>\n\n\n\n<details class=\"wp-block-details is-layout-flow wp-block-details-is-layout-flow\" open><summary>Next, we&#8217;ll walk you through how we assessed clustering quality, prediction accuracy, and human validation. If you\u2019d rather skip the technical deep dive, feel free to skip\u00a0 &#8211;\u00a0 the key takeaway is that our evaluation confirmed the system\u2019s clusters are both coherent and practically useful.<br><\/summary>\n<h4 class=\"wp-block-heading\">Clustering quality<\/h4>\n\n\n\n<p>To evaluate cluster geometry in embedding space, we focused on two cosine-distance metrics:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Mean Inter-Centroid Distance (Separation)<\/strong>. Measures the average distance between cluster centroids. 
This is critical for centroid-based inference, where close centroids can cause unstable topic assignments\n<ul class=\"wp-block-list\">\n<li>Higher = better separation \u2192 topics are distinct, with less semantic overlap<\/li>\n\n\n\n<li>Lower = weaker separation \u2192 clusters risk blending, making assignments less reliable<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Mean Intra-Cluster Distance (Cohesion)<\/strong>. Measures the average distance between messages and their cluster centroid. This shows whether a centroid is truly representative of its cluster\n<ul class=\"wp-block-list\">\n<li>Lower = stronger cohesion \u2192 messages are compact and consistent<\/li>\n\n\n\n<li>Higher = weaker cohesion \u2192 clusters may contain noise or multiple subthemes<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<p>Together, these metrics capture the classic trade-off: clusters should be internally cohesive yet externally distinct &#8211; a must for our centroid-based approach.<\/p>\n\n\n\n<p>Here&#8217;s a plot of these two metrics, with each data point being an Intercom customer:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"600\" height=\"600\" src=\"https:\/\/fin.ai\/research\/wp-content\/uploads\/2026\/01\/image-11.png\" alt=\"\" class=\"wp-image-535\" srcset=\"https:\/\/fin.ai\/research\/wp-content\/uploads\/2026\/01\/image-11.png 600w, https:\/\/fin.ai\/research\/wp-content\/uploads\/2026\/01\/image-11-300x300.png 300w, https:\/\/fin.ai\/research\/wp-content\/uploads\/2026\/01\/image-11-150x150.png 150w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/figure>\n\n\n\n<p>Mean inter-centroid distance falls mostly between 0.6 and 0.8, averaging around 0.75, suggesting strong topic separation. Mean intra-cluster distance ranges from 0.0 to 0.35, with a typical value around 0.2, indicating good cohesion in most cases. We did observe a few outliers &#8211; models with lower separation and\/or weaker cohesion. 
These may reflect noisier datasets or edge cases with very small or highly variable support volumes. However, the majority of models fall within an acceptable trade-off zone, showing that our unsupervised clustering approach generalises well across different customers with diverse conversation sets.<\/p>\n\n\n\n<p>Beyond metrics, we ran sanity checks to confirm clusters behaved as expected across datasets:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Noise Points (Outliers)<\/strong>: HDBSCAN flagged ~10\u201315% of conversations as outliers &#8211; reasonable for noisy support data. Too many (50\u201370%) would suggest the model is overly strict, while too few might mean clusters are too broad<\/li>\n\n\n\n<li><strong>Uneven Cluster Sizes<\/strong>: another failure mode is one giant cluster swallowing most data, leaving only tiny ones behind &#8211; usually a sign of overly permissive parameters<\/li>\n\n\n\n<li><strong>Cluster Size Distribution<\/strong>: in practice, we saw a J-shaped distribution: many small, focused clusters and a few large, high-volume ones. 
This matches expectations &#8211; most issues are niche, while a handful dominate support traffic:<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"726\" height=\"450\" src=\"https:\/\/fin.ai\/research\/wp-content\/uploads\/2026\/01\/image-12.png\" alt=\"\" class=\"wp-image-536\" srcset=\"https:\/\/fin.ai\/research\/wp-content\/uploads\/2026\/01\/image-12.png 726w, https:\/\/fin.ai\/research\/wp-content\/uploads\/2026\/01\/image-12-300x186.png 300w\" sizes=\"auto, (max-width: 726px) 100vw, 726px\" \/><\/figure>\n\n\n\n<h4 class=\"wp-block-heading\">Prediction accuracy<\/h4>\n\n\n\n<p>It\u2019s worth emphasising the distinction between cluster discovery and cluster assignment:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Discovery is about in-sample structure &#8211; how well the model can group historical conversations into meaningful topics using unsupervised learning<\/li>\n\n\n\n<li>Assignment (prediction) is about out-of-sample generalization &#8211; how accurately we can assign new, unseen conversations to the right existing topic<\/li>\n<\/ul>\n\n\n\n<p>This framing also highlights one of the most important challenges: deciding when a new conversation doesn\u2019t fit any existing topic. That\u2019s where the similarity threshold plays a key role. If enough conversations consistently fall below that threshold and form a dense region of their own, it might signal the need for an entirely new topic cluster.<\/p>\n\n\n\n<p>To optimize the decision boundary, we inspected the distribution of cosine similarity scores. 
A threshold of 0.5 provided a good trade-off between precision and recall: conversations scoring above 0.5 are assigned to the closest cluster, and those scoring below are flagged as unassigned.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"800\" height=\"500\" src=\"https:\/\/fin.ai\/research\/wp-content\/uploads\/2026\/01\/image-13.png\" alt=\"\" class=\"wp-image-537\" srcset=\"https:\/\/fin.ai\/research\/wp-content\/uploads\/2026\/01\/image-13.png 800w, https:\/\/fin.ai\/research\/wp-content\/uploads\/2026\/01\/image-13-300x188.png 300w, https:\/\/fin.ai\/research\/wp-content\/uploads\/2026\/01\/image-13-768x480.png 768w\" sizes=\"auto, (max-width: 800px) 100vw, 800px\" \/><\/figure>\n\n\n\n<p>To validate the quality of topic assignments, we used LLM-as-a-judge &#8211; a lightweight evaluation method where an LLM was asked: \u201cDoes this topic match the conversation theme?\u201d<\/p>\n\n\n\n<p>The model returned a True\/False answer, serving as a semantic proxy for accuracy in the absence of ground-truth labels. This approach simulates human evaluation and helps assess how well the assigned topic captures the user\u2019s actual intent. We also used this method to cross-validate our cosine similarity threshold, and the judgments confirmed that 0.5 balances precision and recall. Overall, assignment accuracy by this measure exceeded 85%.<\/p>\n\n\n\n<p><strong>Unassigned: outliers or a new topic? <\/strong>When a new conversation receives a cosine similarity score below 0.5, we assume it doesn\u2019t belong to any existing cluster &#8211; and it\u2019s flagged as unassigned. However, that doesn\u2019t always mean the conversation is noise or a rare edge case. Sometimes, it reflects something more important: a new topic that has recently emerged and wasn\u2019t present in the data when the original clusters were created. 
These unassigned conversations can be early signals &#8211; new issues, product questions, or behavioural patterns that haven\u2019t yet been captured by the system. We describe how we detect and promote new topics in the section \u201cHow to Update the Model?\u201d below.<\/p>\n\n\n\n<p><strong>Multi-topic aspect and borderline assignment<\/strong>. We also explored the multi-label nature of conversational data, since a single conversation can touch on multiple topics. There are two main reasons for this:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>In longer conversations, the topic can naturally change over time. <\/strong>To handle this, we extract multiple key messages per conversation, allowing us to capture more than one topic when necessary. For example, if a chat starts with Fin pricing and then moves on to a bug in conversation assignment, our system will spot and tag both topics separately<\/li>\n\n\n\n<li><strong>One message can genuinely fit more than one topic.<\/strong> For instance, \u201cWhat\u2019s the difference between Fin AI Agent and Fin AI Copilot?\u201d belongs to both categories in the absence of a dedicated \u201ccomparison\u201d topic. Such semantic overlap is common in real-world language, especially in support data where questions blend multiple intents. Occasionally, overlap instead signals redundancy &#8211; two clusters describing the same issue &#8211; which can be spotted through small inter-centroid distances.<\/li>\n<\/ol>\n\n\n\n<p>To quantify ambiguity, we analysed secondary and tertiary topic matches via cosine similarity. Only about 5% of messages showed a gap of less than 0.05 in cosine similarity between their top two or three topic candidates &#8211; meaning true borderline cases are rare. 
While clearer separation between clusters reduces edge cases, genuine multi-intent conversations will always resist perfectly clean boundaries.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Human validation<\/h4>\n\n\n\n<p>No matter how good the metrics look, topics must make sense to people. Cohesion and separation scores are helpful, but they mean little if clusters don\u2019t reflect how users actually interpret their data.<\/p>\n\n\n\n<p>Quantitative metrics can\u2019t fully capture subjective structure &#8211; especially hierarchies. There\u2019s no single \u201cright\u201d way to group subtopics: we cluster by semantic similarity (e.g., all \u201cFin\u201d-related items under one topic), while some users prefer grouping by issue type, like bugs or pricing, across products.<\/p>\n\n\n\n<p>To stay grounded in real-world needs, we interviewed customers and gathered qualitative feedback on topic clarity and usefulness. This feedback loop remains central to our process &#8211; because in the end, quality is measured by how well the system helps people do their jobs.<\/p>\n<\/details>\n\n\n\n<h2 id=\"how-to-update-the-model\" class=\"wp-block-heading\">How to Update the Model?<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Emerging topics<\/h3>\n\n\n\n<p>Unsupervised models are typically static &#8211; they\u2019re trained once and can\u2019t easily adapt to new data. But support data evolves constantly: new products launch, features change, and fresh themes appear. A model trained weeks ago may not recognize the themes in today\u2019s conversations. 
This causes two main issues:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Misassignment<\/strong> \u2013 new topics get forced into old clusters<\/li>\n\n\n\n<li><strong>Unassignment<\/strong> \u2013 new conversations fall below the similarity threshold (e.g., cosine &lt; 0.5) and remain unlabelled<\/li>\n<\/ol>\n\n\n\n<p>To address this, we built a daily pipeline that detects emerging topics and incrementally updates the model. Thanks to the centroid-based architecture, adding a new topic is simple &#8211; just introduce a new centroid.<\/p>\n\n\n\n<p>This is how we identify new topics:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Discovery on new data<\/strong>. We apply the same topic modelling pipeline (embedding \u2192 UMAP \u2192 HDBSCAN) to a combination of new unseen support conversations (out-of-sample), and unassigned conversations from recent inference runs<\/li>\n\n\n\n<li><strong>Centroid comparison &amp; deduplication<\/strong><br>&#8211; We compare the centroids of these newly discovered clusters against existing centroids to check for similarity<br>&#8211; We retain only the clusters that are well-separated from existing topics &#8211; ensuring they represent genuinely new themes.<\/li>\n\n\n\n<li><strong>Update the model.<\/strong> The filtered centroids are simply appended to the existing centroid list &#8211; effectively extending the model without retraining. This process lets us keep the topic model fresh and adaptive, without sacrificing scalability or interpretability.<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Archiving topics<\/h3>\n\n\n\n<p>Just as new topics emerge, others naturally fade. 
Over time, two patterns appear:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Recurring topics \u2013 persistent themes like pricing, cancellations, or onboarding that remain active and should stay in the model<\/li>\n\n\n\n<li>Event-driven topics \u2013 short-lived spikes tied to launches, campaigns, or temporary bugs<\/li>\n<\/ul>\n\n\n\n<p>When these transient topics go quiet, we archive them. Centroids for inactive clusters are removed, reducing noise and keeping the model aligned with current conversation trends.<\/p>\n\n\n\n<h2 id=\"conclusion\" class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p>We\u2019ve designed the system with real-world constraints in mind: messy input data, changing user behavior, and the need for fast, high-throughput inference. With centroid-based inference, ongoing model updates, and human-in-the-loop validation, we\u2019ve built a flexible, production-ready system that evolves alongside the conversations it\u2019s built to understand.<\/p>\n\n\n\n<p>Now, topics are no longer just labels &#8211; they\u2019re a lens into customer reality. And with the right mix of structure and language, that lens becomes a powerful tool for action.<\/p>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>We designed a topic modelling system to improve the Fin AI agent by detecting underperforming areas, identifying root causes, and enabling continuous optimisation. 
We wanted to get the quality benefits of modern generative AI, and the&hellip;<\/p>\n","protected":false},"author":51,"featured_media":114,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[30],"tags":[],"coauthors":[35],"class_list":["post-528","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-text-classification"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v24.6 (Yoast SEO v24.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Unsupervised Learning Meets Generative AI: Topic Modelling for Real-World Dialogue - \/research<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/fin.ai\/research\/unsupervised-learning-meets-generative-ai-topic-modelling-for-real-world-dialogue\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Unsupervised Learning Meets Generative AI: Topic Modelling for Real-World Dialogue\" \/>\n<meta property=\"og:description\" content=\"We designed a topic modelling system to improve the Fin AI agent by detecting underperforming areas, identifying root causes, and enabling continuous optimisation. 
We wanted to get the quality benefits of modern generative AI, and the&hellip;\" \/>\n<meta property=\"og:url\" content=\"https:\/\/fin.ai\/research\/unsupervised-learning-meets-generative-ai-topic-modelling-for-real-world-dialogue\/\" \/>\n<meta property=\"og:site_name\" content=\"\/research\" \/>\n<meta property=\"article:published_time\" content=\"2026-03-30T19:06:23+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-03-30T19:11:43+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/03\/grassmirror_colorful_data_illustration_schematic_with_interse_25ea7dac-8d19-461d-9ee6-9d1b7938d353_1-1-1024x683.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1024\" \/>\n\t<meta property=\"og:image:height\" content=\"683\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Mariia Matskevichus\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@intercom\" \/>\n<meta name=\"twitter:site\" content=\"@intercom\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Mariia Matskevichus\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"20 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/fin.ai\/research\/unsupervised-learning-meets-generative-ai-topic-modelling-for-real-world-dialogue\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/fin.ai\/research\/unsupervised-learning-meets-generative-ai-topic-modelling-for-real-world-dialogue\/\"},\"author\":{\"name\":\"Mariia Matskevichus\",\"@id\":\"https:\/\/fin.ai\/research\/#\/schema\/person\/a67a2fc19c676a0ff675ff5aa84fc7b9\"},\"headline\":\"Unsupervised Learning Meets Generative AI: Topic Modelling for Real-World Dialogue\",\"datePublished\":\"2026-03-30T19:06:23+00:00\",\"dateModified\":\"2026-03-30T19:11:43+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/fin.ai\/research\/unsupervised-learning-meets-generative-ai-topic-modelling-for-real-world-dialogue\/\"},\"wordCount\":4273,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/fin.ai\/research\/#organization\"},\"image\":{\"@id\":\"https:\/\/fin.ai\/research\/unsupervised-learning-meets-generative-ai-topic-modelling-for-real-world-dialogue\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/03\/grassmirror_colorful_data_illustration_schematic_with_interse_25ea7dac-8d19-461d-9ee6-9d1b7938d353_1-1.png\",\"articleSection\":[\"Text Classification\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/fin.ai\/research\/unsupervised-learning-meets-generative-ai-topic-modelling-for-real-world-dialogue\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/fin.ai\/research\/unsupervised-learning-meets-generative-ai-topic-modelling-for-real-world-dialogue\/\",\"url\":\"https:\/\/fin.ai\/research\/unsupervised-learning-meets-generative-ai-topic-modelling-for-real-world-dialogue\/\",\"name\":\"Unsupervised Learning 
Meets Generative AI: Topic Modelling for Real-World Dialogue - \/research\",\"isPartOf\":{\"@id\":\"https:\/\/fin.ai\/research\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/fin.ai\/research\/unsupervised-learning-meets-generative-ai-topic-modelling-for-real-world-dialogue\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/fin.ai\/research\/unsupervised-learning-meets-generative-ai-topic-modelling-for-real-world-dialogue\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/03\/grassmirror_colorful_data_illustration_schematic_with_interse_25ea7dac-8d19-461d-9ee6-9d1b7938d353_1-1.png\",\"datePublished\":\"2026-03-30T19:06:23+00:00\",\"dateModified\":\"2026-03-30T19:11:43+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/fin.ai\/research\/unsupervised-learning-meets-generative-ai-topic-modelling-for-real-world-dialogue\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/fin.ai\/research\/unsupervised-learning-meets-generative-ai-topic-modelling-for-real-world-dialogue\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/fin.ai\/research\/unsupervised-learning-meets-generative-ai-topic-modelling-for-real-world-dialogue\/#primaryimage\",\"url\":\"https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/03\/grassmirror_colorful_data_illustration_schematic_with_interse_25ea7dac-8d19-461d-9ee6-9d1b7938d353_1-1.png\",\"contentUrl\":\"https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/03\/grassmirror_colorful_data_illustration_schematic_with_interse_25ea7dac-8d19-461d-9ee6-9d1b7938d353_1-1.png\",\"width\":1920,\"height\":1280},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/fin.ai\/research\/unsupervised-learning-meets-generative-ai-topic-modelling-for-real-world-dialogue\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/fin.ai\/research\/\"},{\"@type\":\"ListItem\",\"position\":2,\"
name\":\"Unsupervised Learning Meets Generative AI: Topic Modelling for Real-World Dialogue\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/fin.ai\/research\/#website\",\"url\":\"https:\/\/fin.ai\/research\/\",\"name\":\"Intercom.ai\",\"description\":\"Insights and blogs from the AI Group building Fin\",\"publisher\":{\"@id\":\"https:\/\/fin.ai\/research\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/fin.ai\/research\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/fin.ai\/research\/#organization\",\"name\":\"Intercom.ai\",\"url\":\"https:\/\/fin.ai\/research\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/fin.ai\/research\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/03\/favicon.png\",\"contentUrl\":\"https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/03\/favicon.png\",\"width\":1024,\"height\":1024,\"caption\":\"Intercom.ai\"},\"image\":{\"@id\":\"https:\/\/fin.ai\/research\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/intercom\",\"https:\/\/www.linkedin.com\/company\/intercom\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/fin.ai\/research\/#\/schema\/person\/a67a2fc19c676a0ff675ff5aa84fc7b9\",\"name\":\"Mariia Matskevichus\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/fin.ai\/research\/#\/schema\/person\/image\/d77d02d0877610a327d4a8eae8784935\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/6b8a8752b92c036ed7eb26957e746bc7?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/6b8a8752b92c036ed7eb26957e746bc7?s=96&d=mm&r=g\",\"caption\":\"Mariia Matskevichus\"},\"description\":\"is a Staff Machine Learning Scientist turning advanced machine learning 
ideas into practical, user-facing systems.\",\"url\":\"https:\/\/fin.ai\/research\/author\/mariia-matskevichus\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Unsupervised Learning Meets Generative AI: Topic Modelling for Real-World Dialogue - \/research","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/fin.ai\/research\/unsupervised-learning-meets-generative-ai-topic-modelling-for-real-world-dialogue\/","og_locale":"en_US","og_type":"article","og_title":"Unsupervised Learning Meets Generative AI: Topic Modelling for Real-World Dialogue","og_description":"We designed a topic modelling system to improve the Fin AI agent by detecting underperforming areas, identifying root causes, and enabling continuous optimisation. We wanted to get the quality benefits of modern generative AI, and the&hellip;","og_url":"https:\/\/fin.ai\/research\/unsupervised-learning-meets-generative-ai-topic-modelling-for-real-world-dialogue\/","og_site_name":"\/research","article_published_time":"2026-03-30T19:06:23+00:00","article_modified_time":"2026-03-30T19:11:43+00:00","og_image":[{"width":1024,"height":683,"url":"https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/03\/grassmirror_colorful_data_illustration_schematic_with_interse_25ea7dac-8d19-461d-9ee6-9d1b7938d353_1-1-1024x683.png","type":"image\/png"}],"author":"Mariia Matskevichus","twitter_card":"summary_large_image","twitter_creator":"@intercom","twitter_site":"@intercom","twitter_misc":{"Written by":"Mariia Matskevichus","Est. 
reading time":"20 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/fin.ai\/research\/unsupervised-learning-meets-generative-ai-topic-modelling-for-real-world-dialogue\/#article","isPartOf":{"@id":"https:\/\/fin.ai\/research\/unsupervised-learning-meets-generative-ai-topic-modelling-for-real-world-dialogue\/"},"author":{"name":"Mariia Matskevichus","@id":"https:\/\/fin.ai\/research\/#\/schema\/person\/a67a2fc19c676a0ff675ff5aa84fc7b9"},"headline":"Unsupervised Learning Meets Generative AI: Topic Modelling for Real-World Dialogue","datePublished":"2026-03-30T19:06:23+00:00","dateModified":"2026-03-30T19:11:43+00:00","mainEntityOfPage":{"@id":"https:\/\/fin.ai\/research\/unsupervised-learning-meets-generative-ai-topic-modelling-for-real-world-dialogue\/"},"wordCount":4273,"commentCount":0,"publisher":{"@id":"https:\/\/fin.ai\/research\/#organization"},"image":{"@id":"https:\/\/fin.ai\/research\/unsupervised-learning-meets-generative-ai-topic-modelling-for-real-world-dialogue\/#primaryimage"},"thumbnailUrl":"https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/03\/grassmirror_colorful_data_illustration_schematic_with_interse_25ea7dac-8d19-461d-9ee6-9d1b7938d353_1-1.png","articleSection":["Text Classification"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/fin.ai\/research\/unsupervised-learning-meets-generative-ai-topic-modelling-for-real-world-dialogue\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/fin.ai\/research\/unsupervised-learning-meets-generative-ai-topic-modelling-for-real-world-dialogue\/","url":"https:\/\/fin.ai\/research\/unsupervised-learning-meets-generative-ai-topic-modelling-for-real-world-dialogue\/","name":"Unsupervised Learning Meets Generative AI: Topic Modelling for Real-World Dialogue - 
\/research","isPartOf":{"@id":"https:\/\/fin.ai\/research\/#website"},"primaryImageOfPage":{"@id":"https:\/\/fin.ai\/research\/unsupervised-learning-meets-generative-ai-topic-modelling-for-real-world-dialogue\/#primaryimage"},"image":{"@id":"https:\/\/fin.ai\/research\/unsupervised-learning-meets-generative-ai-topic-modelling-for-real-world-dialogue\/#primaryimage"},"thumbnailUrl":"https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/03\/grassmirror_colorful_data_illustration_schematic_with_interse_25ea7dac-8d19-461d-9ee6-9d1b7938d353_1-1.png","datePublished":"2026-03-30T19:06:23+00:00","dateModified":"2026-03-30T19:11:43+00:00","breadcrumb":{"@id":"https:\/\/fin.ai\/research\/unsupervised-learning-meets-generative-ai-topic-modelling-for-real-world-dialogue\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/fin.ai\/research\/unsupervised-learning-meets-generative-ai-topic-modelling-for-real-world-dialogue\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/fin.ai\/research\/unsupervised-learning-meets-generative-ai-topic-modelling-for-real-world-dialogue\/#primaryimage","url":"https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/03\/grassmirror_colorful_data_illustration_schematic_with_interse_25ea7dac-8d19-461d-9ee6-9d1b7938d353_1-1.png","contentUrl":"https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/03\/grassmirror_colorful_data_illustration_schematic_with_interse_25ea7dac-8d19-461d-9ee6-9d1b7938d353_1-1.png","width":1920,"height":1280},{"@type":"BreadcrumbList","@id":"https:\/\/fin.ai\/research\/unsupervised-learning-meets-generative-ai-topic-modelling-for-real-world-dialogue\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/fin.ai\/research\/"},{"@type":"ListItem","position":2,"name":"Unsupervised Learning Meets Generative AI: Topic Modelling for Real-World 
Dialogue"}]},{"@type":"WebSite","@id":"https:\/\/fin.ai\/research\/#website","url":"https:\/\/fin.ai\/research\/","name":"Intercom.ai","description":"Insights and blogs from the AI Group building Fin","publisher":{"@id":"https:\/\/fin.ai\/research\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/fin.ai\/research\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/fin.ai\/research\/#organization","name":"Intercom.ai","url":"https:\/\/fin.ai\/research\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/fin.ai\/research\/#\/schema\/logo\/image\/","url":"https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/03\/favicon.png","contentUrl":"https:\/\/fin.ai\/research\/wp-content\/uploads\/2025\/03\/favicon.png","width":1024,"height":1024,"caption":"Intercom.ai"},"image":{"@id":"https:\/\/fin.ai\/research\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/intercom","https:\/\/www.linkedin.com\/company\/intercom"]},{"@type":"Person","@id":"https:\/\/fin.ai\/research\/#\/schema\/person\/a67a2fc19c676a0ff675ff5aa84fc7b9","name":"Mariia Matskevichus","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/fin.ai\/research\/#\/schema\/person\/image\/d77d02d0877610a327d4a8eae8784935","url":"https:\/\/secure.gravatar.com\/avatar\/6b8a8752b92c036ed7eb26957e746bc7?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/6b8a8752b92c036ed7eb26957e746bc7?s=96&d=mm&r=g","caption":"Mariia Matskevichus"},"description":"is a Staff Machine Learning Scientist turning advanced machine learning ideas into practical, user-facing 
systems.","url":"https:\/\/fin.ai\/research\/author\/mariia-matskevichus\/"}]}},"_links":{"self":[{"href":"https:\/\/fin.ai\/research\/wp-json\/wp\/v2\/posts\/528","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/fin.ai\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/fin.ai\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/fin.ai\/research\/wp-json\/wp\/v2\/users\/51"}],"replies":[{"embeddable":true,"href":"https:\/\/fin.ai\/research\/wp-json\/wp\/v2\/comments?post=528"}],"version-history":[{"count":0,"href":"https:\/\/fin.ai\/research\/wp-json\/wp\/v2\/posts\/528\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/fin.ai\/research\/wp-json\/wp\/v2\/media\/114"}],"wp:attachment":[{"href":"https:\/\/fin.ai\/research\/wp-json\/wp\/v2\/media?parent=528"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/fin.ai\/research\/wp-json\/wp\/v2\/categories?post=528"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/fin.ai\/research\/wp-json\/wp\/v2\/tags?post=528"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/fin.ai\/research\/wp-json\/wp\/v2\/coauthors?post=528"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}