Enterprise API
The AI engine behind the #1 AI agent for customer service.
Now available as an API.
Purpose-built models trained specifically for customer service. From full agent orchestration to granular retrieval and reranking endpoints — integrate Fin's AI infrastructure into your own product.


Choose your integration level, from turnkey agent to bare-metal endpoints.
End-to-end conversation resolution
Multi-turn context management
Intelligent escalation routing
Brand voice & policy enforcement
Source citation & accuracy validation
Semantic retrieval across your knowledge base
Cross-encoder reranking for precision
Grounded answer generation with citations
Configurable retrieval depth (K=5 to K=40)
Multilingual query support
- Retrieval: /v1/retrieve
- Reranking: /v1/rerank
- Embeddings: /v1/embed
- Escalation Routing: /v1/route
- Answer Validation: /v1/validate
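As an illustration of how the lowest-level endpoints might be called, here is a minimal sketch that builds a retrieval request body. The field names (`query`, `top_k`, `workspace_id`) are illustrative assumptions, not the documented schema; only the `/v1/retrieve` path and the K=5 to K=40 depth range come from this page.

```python
import json

def build_retrieve_request(query: str, top_k: int = 40,
                           workspace_id: str = "ws_demo") -> str:
    """Serialize a hypothetical request body for POST /v1/retrieve.

    `top_k` mirrors the configurable retrieval depth (K=5 to K=40).
    All field names are assumptions for illustration only.
    """
    if not 5 <= top_k <= 40:
        raise ValueError("retrieval depth is configurable from K=5 to K=40")
    body = {"query": query, "top_k": top_k, "workspace_id": workspace_id}
    return json.dumps(body)

print(build_retrieve_request("How do I reset my password?", top_k=10))
```

The same shape would apply to the other endpoints, with the reranker taking a query plus candidate passages and the router taking a conversation transcript.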
Purpose-built models, trained on real customer service data.
Every component in the Fin AI Engine is a custom model fine-tuned on hundreds of millions of real support interactions. Not general-purpose. Not off-the-shelf. Built for this domain.
Query Refinement
Incoming queries are optimized before they hit the retrieval pipeline. Context from the full conversation history, user metadata, and timestamp are distilled into a semantically rich search query — maximizing retrieval precision from the first hop.
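A toy stand-in for that refinement step, under assumed behavior: keep the most recent user turns and append disambiguating metadata to form one search query. The real step also folds in the timestamp and richer context; everything below is a simplified illustration.

```python
def refine_query(history: list[dict], metadata: dict) -> str:
    """Distill conversation history and user metadata into a single,
    semantically rich search query (toy sketch of query refinement).

    history: [{"role": "user"|"assistant", "text": ...}, ...]
    metadata: arbitrary user attributes; here only "plan" is used.
    """
    # Keep the last two user turns: the latest question plus the
    # context that prompted it.
    recent = [t["text"] for t in history if t["role"] == "user"][-2:]
    query = " ".join(recent)
    plan = metadata.get("plan")
    if plan:
        query += f" (plan: {plan})"
    return query
```

For example, a follow-up like "How do I update it?" becomes searchable only once the earlier turn about billing is merged in.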
Semantic Retrieval
Our proprietary fin-cx-retrieval model — an embedding model trained on signal extracted from millions of Fin retrievals — scans your knowledge base and returns the top 40 candidate documents by semantic similarity.
Training uses InfoNCE contrastive loss with hard positives and hard negatives mined from production query logs — teaching the model the nuances of customer support language across industries and languages.
Cross-Encoder Reranking
The fin-cx-reranker builds on ModernBERT-large with an 8,192-token context window. It reorders the 40 candidates by true relevance, and was trained with RankNet loss on tens of millions of passage pairs.
Replaces other commercial reranking services with lower latency, higher throughput, and significantly higher quality on customer service content.
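The pairwise RankNet loss mentioned above is simple to state: for a (relevant, irrelevant) passage pair, it is binary cross-entropy on the score difference, so the model is penalized whenever the irrelevant passage scores higher. A minimal NumPy sketch of the standard formulation (not Fin's training code):

```python
import numpy as np

def ranknet_loss(score_pos: float, score_neg: float) -> float:
    """RankNet pairwise loss: -log sigmoid(score_pos - score_neg),
    written in the numerically stable log1p form. Zero-ish when the
    relevant passage already outscores the irrelevant one by a wide
    margin; grows linearly as the ordering inverts.
    """
    diff = score_pos - score_neg
    return float(np.log1p(np.exp(-diff)))
```

Training on pairs rather than absolute labels is what lets a cross-encoder learn fine-grained orderings within the 40 retrieved candidates.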
Escalation Routing
A multi-task ModernBERT classifier making three simultaneous predictions: escalation decision (3-way classification), reason categorization, and guideline citation (multi-label). Trained on millions of multilingual examples.
Achieves >98% accuracy at a competitive cost and a fraction of the latency of larger models often used for this task.
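The multi-task head structure can be sketched as three projections off one shared encoding: a 3-way softmax for the escalation decision, a softmax over reason categories, and independent sigmoids for multi-label guideline citation. The head widths and the NumPy stand-in for the ModernBERT encoder are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN = 16  # toy encoder width; the real model uses ModernBERT

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def multi_task_heads(h, w_escalate, w_reason, w_guideline):
    """Three simultaneous predictions from one shared encoding `h`:
    escalation decision (3-way softmax), reason categorization
    (softmax), and guideline citation (multi-label sigmoid).
    """
    escalation = softmax(h @ w_escalate)                # 3 classes
    reason = softmax(h @ w_reason)                      # reason categories
    guidelines = 1.0 / (1.0 + np.exp(-(h @ w_guideline)))  # independent labels
    return escalation, reason, guidelines
```

Sharing one encoder across the three tasks is what keeps latency at a fraction of an LLM's while the auxiliary heads regularize the main escalation decision.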
Generation & Validation
The top-ranked documents, full conversation context, user metadata, and brand guidelines are synthesized into a precise, cited response. A dedicated validation layer then checks for accuracy, hallucination, tone alignment, and policy compliance before the response is returned. Every answer is grounded in your content — never fabricated.
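One part of that validation layer, the grounding check, can be illustrated with a deliberately simplified rule: every citation in the drafted answer must point at a passage that was actually retrieved, otherwise the answer is rejected before it is returned. This toy function and its field names are assumptions, not the production validator.

```python
def validate_citations(answer_citations: list[str],
                       retrieved_ids: list[str]) -> dict:
    """Toy grounding check: reject any answer that cites a passage
    outside the retrieved set, since such a citation cannot be
    grounded in the customer's own content.
    """
    known = set(retrieved_ids)
    missing = [c for c in answer_citations if c not in known]
    return {"grounded": not missing, "unknown_citations": missing}
```

The real layer layers further checks (hallucination, tone alignment, policy compliance) on top of grounding, but the fail-closed shape is the same: no validated citation, no response.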
Industry-leading performance for customer experience. Measured.
Built for trust and isolation.
Data Isolation
Per-workspace document isolation. No cross-tenant access. Query vectors discarded immediately after retrieval.
Secure Infrastructure
All fine-tuning on secure AWS infrastructure. No third-party data exposure. Per-passage independent embedding.
Audit & Transparency
Every routing decision is logged with reason categorization and guideline citation. Full explainability for compliance.
We publish our work. Read the technical details.
Finetuning Retrieval for Fin
How we fine-tuned a retrieval model on 2M real queries to achieve 72% precision — outperforming Voyage-Large-3 by 20 percentage points.
Read paper
How We Built a World-Class Reranker
Building a ModernBERT-based reranker that beats Cohere v3.5 across every metric — at 80% lower cost with no latency increase.
Read paper
To Escalate, or Not to Escalate
How a multi-task ModernBERT classifier achieves >98% routing accuracy — outperforming LLM teachers at a fraction of the cost.
Read paper
Using LLMs as a Reranker for RAG
A practical guide to parallel reranking, output token optimization, source diversity, and production deployment patterns.
Read paper
Building Intercom's AI Infrastructure
How we built GPU-native developer infrastructure for training and serving custom models at scale on AWS.
Read paper
Integrate the Fin AI Engine into your product.
The Fin API is available to qualified enterprise partners building AI-powered customer experiences. Tell us about your use case and we'll be in touch.
Custom pricing based on volume and endpoints
Dedicated onboarding and integration support
SLA-backed uptime guarantees
Enterprise security review available
We'll respond within 2 business days. Enterprise buyers only.