If you’d like to train Fin with website content, you can do this by syncing the public URL for that site.
Get started
Go to Train > Content and then select Website sync below the "Add content" section.
Now enter the public URL of your website (top-level domain) and click Next.
This fetches all pages from the website URL you provide, including the subpages nested under it.
Tips:
Provide your external help center homepage link for best results.
Use top-level domains (e.g. https://myhelpcenter.com rather than https://myhelpcenter.com/articles).
Now you can click Sync, or configure the advanced settings below before syncing.
Advanced settings
If you want to configure your website sync further (such as including or excluding certain URLs), click Advanced settings.
Additional URLs
Website structures can vary. To make sure that we sync your most relevant content, we recommend adding additional URLs for any specific subpages you want included.
For example, if you input https://myhelpcenter.com/help as the primary URL above, you might also want to add a specific URL such as https://myhelpcenter.com/help/index.html.
URLs to exclude
To exclude certain pages you don’t want to sync content from, you can add a list of URL globs.
What is a URL glob?
A glob is a string of literal and/or wildcard characters used to match file paths or URLs. Globbing is the act of locating files on a filesystem using one or more globs. URL globs are also useful for matching a range of URLs that are mostly the same, with only a small portion changing between them.
For example, the URL glob https://{store,docs}.example.com/** matches all URLs starting with https://store.example.com/ or https://docs.example.com/, and https://example.com/**/*\?*foo=* matches all URLs that contain a foo query parameter with any value.
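To check a glob against a few sample URLs before syncing, a small matcher can help. The following is a minimal sketch in Python, not the matcher Fin uses: the function names glob_to_regex and matches are hypothetical, and the supported syntax (**, *, ?, {a,b}, and backslash escapes) is an assumed subset of common glob rules.

```python
import re


def glob_to_regex(glob: str) -> re.Pattern:
    """Translate a URL glob into a compiled regular expression.

    Assumed, simplified rules (the real sync may differ):
      **/     zero or more path segments
      **      any run of characters, including "/"
      *       any run of characters except "/"
      ?       exactly one character except "/"
      {a,b}   either "a" or "b"
      a backslash makes the next character literal
    """
    out = []
    i = 0
    brace_depth = 0
    while i < len(glob):
        ch = glob[i]
        if ch == "\\" and i + 1 < len(glob):
            # Escaped character: match it literally.
            out.append(re.escape(glob[i + 1]))
            i += 2
            continue
        if glob.startswith("**/", i):
            out.append("(?:.*/)?")
            i += 3
            continue
        if glob.startswith("**", i):
            out.append(".*")
            i += 2
            continue
        if ch == "*":
            out.append("[^/]*")
        elif ch == "?":
            out.append("[^/]")
        elif ch == "{":
            out.append("(?:")
            brace_depth += 1
        elif ch == "}" and brace_depth > 0:
            out.append(")")
            brace_depth -= 1
        elif ch == "," and brace_depth > 0:
            out.append("|")
        else:
            out.append(re.escape(ch))
        i += 1
    return re.compile("".join(out))


def matches(glob: str, url: str) -> bool:
    """Return True if the whole URL matches the glob."""
    return glob_to_regex(glob).fullmatch(url) is not None


STORE_OR_DOCS = "https://{store,docs}.example.com/**"
FOO_PARAM = r"https://example.com/**/*\?*foo=*"

print(matches(STORE_OR_DOCS, "https://docs.example.com/guides/setup"))
print(matches(STORE_OR_DOCS, "https://blog.example.com/post"))
print(matches(FOO_PARAM, "https://example.com/a/b?x=1&foo=bar"))
```

Under these assumed rules, the first and third checks match and the second does not, because blog is not one of the alternatives inside the braces.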
Page elements to include
To avoid scraping content from specific sections, you can select Custom and add a list of the CSS selectors you want to include or exclude.
Use this to exclude parts of the page that aren't useful for Fin—like navbars or banners.
The value must be a valid CSS selector as accepted by the document.querySelectorAll() function (e.g. .sidebar, #newsletter-banner).
By default, we already exclude common elements such as headers, footers, modals, scripts, and inline images.
Clickable CSS selector
This allows DOM elements identified by the CSS selector to be clicked during the web sync process.
Use this to capture content hidden inside expandable sections, tabs, or dropdowns.
Enter valid CSS selectors like [aria-expanded="false"], #expand_section, or .tab.
To match elements with multiple classes, use CSS chaining (no spaces). For example, .button.blue.small targets elements that have all three classes.
To match multiple different elements, separate selectors with commas. For example, .tab, .accordion targets both tab and accordion elements.
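The difference between chaining classes and separating selectors with commas can be illustrated with a toy model. This is a simplified sketch in Python that only understands class selectors; it is not a CSS engine and not how the web sync evaluates selectors. The function names parse_selector_list and element_matches are hypothetical.

```python
def parse_selector_list(selector: str) -> list[frozenset[str]]:
    """Split a comma-separated list of class-only selectors.

    ".tab, .accordion"   becomes two groups: {"tab"} and {"accordion"}
    ".button.blue.small" becomes one group: {"button", "blue", "small"}
    """
    groups = []
    for part in selector.split(","):
        part = part.strip()
        classes = frozenset(name for name in part.split(".") if name)
        # This toy model rejects anything other than chained classes.
        if not part.startswith(".") or " " in part or not classes:
            raise ValueError(f"only chained class selectors are supported: {part!r}")
        groups.append(classes)
    return groups


def element_matches(selector: str, element_classes: set[str]) -> bool:
    """Return True if an element with these classes matches the selector.

    Chained classes must ALL be present on the element.
    Comma-separated alternatives match if ANY one of them matches.
    """
    return any(group <= element_classes for group in parse_selector_list(selector))


# Chaining: the element needs every listed class.
print(element_matches(".button.blue.small", {"button", "blue", "small", "wide"}))
print(element_matches(".button.blue.small", {"button", "blue"}))

# Commas: the element needs to match only one of the alternatives.
print(element_matches(".tab, .accordion", {"accordion"}))
```

In this model the second check fails because the element is missing the small class, while the third succeeds because matching either alternative is enough.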
Wait to load CSS selector
To target content that may have a delay in appearing on the page, you can add a CSS selector that will make the web scraper wait before scraping content.
Use this when content loads slowly or after user interaction (e.g. via JavaScript).
The value must be a valid CSS selector as accepted by the document.querySelectorAll() function.
The page will only be processed once the selected element appears; this overrides the default timing behavior.
Enter a valid CSS selector, such as #load_content_id or .article_paragraph.
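Conceptually, waiting for a selector means repeatedly checking for the element before processing the page. The sketch below shows that polling pattern in Python under stated assumptions: wait_for is a hypothetical helper, the condition callback stands in for "the selector matches something on the page", and the timeout and interval values are illustrative only, since the article does not say how long the sync waits.

```python
import time
from typing import Callable


def wait_for(condition: Callable[[], bool],
             timeout: float = 10.0,
             interval: float = 0.25) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds pass.

    Returns True if the condition was met, False if the wait timed out.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in for a page whose content appears on the third check.
checks = {"count": 0}


def element_has_appeared() -> bool:
    checks["count"] += 1
    return checks["count"] >= 3


if wait_for(element_has_appeared, timeout=2.0, interval=0.01):
    print("element appeared, page can be processed")
else:
    print("timed out waiting for element")
```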
XML Sitemap
To access pages that might not be reachable from the initial URLs, you can enable XML Sitemap for a more robust web sync on websites that provide a sitemap.
If this option is enabled, the web scraper will look for sitemaps at the domains of the provided source URL and enqueue matching URLs in the same way as links found on crawled pages. You can also reference a sitemap.xml file directly by adding it as another Start URL, e.g. https://www.example.com/sitemap.xml.
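To preview which pages a sitemap would contribute, you can list the URLs it declares. This is a minimal sketch in Python using the standard sitemap namespace; urls_from_sitemap and the SAMPLE document are hypothetical, and the sketch reads a plain urlset only (it does not follow sitemap index files).

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemap protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text: str) -> list[str]:
    """Return the page URLs listed in a sitemap <urlset> document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Illustrative sitemap content, not taken from a real site.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/help/getting-started</loc></url>
  <url><loc>https://www.example.com/help/billing</loc></url>
</urlset>"""

for url in urls_from_sitemap(SAMPLE):
    print(url)
```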
When you've finished customizing advanced settings, click Sync.
Manage website sources
Once the sync is complete, you’ll receive an email notification and the website will appear as a synced source in Train > Content under the "Content sources" section.
If you click into a website source, you can preview and manage the individual pages that were synced from the public URL.
Note: Website sources are read-only and can’t be edited within your Fin workspace; they must be edited at the source.
Configure settings
When you view a website page, you’ll find a "Details" panel on the right which contains:
Data: View the content type, language, creation date, and last update (when it was last synced with the source).
Fin: Enable/disable for Fin Agent and Fin Copilot. When enabled, the content becomes available to customers and teammates, respectively.
Audience: Ensure customers only get answers and see content from Fin Agent that is relevant for them.
Link: The public URL for this website source.
Reports: Tracks how often this content is involved in conversations and used by Fin Agent to resolve them.
Tag: Add a tag to group webpages together and keep content organized.
Make it available to Fin
To make a website source available to Fin Agent or Fin Copilot, go to Train > Content and click on the website source under the "Content sources" section, then open the relevant webpage you've synced.
From the "Details" panel, scroll down to “Fin” and toggle on:
Fin Agent - This setting will make the webpage available for Fin AI to use when responding to customers (it will respect any audience rules).
Fin Copilot - This setting will make the webpage available for Fin Copilot to use when responding to teammates.
Make it available to a specific audience
If this website source is only relevant for a specific subset of customers, you can use audience filters to make it visible to certain people.
First, you’ll need to create and define the audience you want to target.
Then go to Train > Content and click on the website source under the "Content sources" section, then open the relevant webpage you've synced.
From the "Details" panel, scroll down to “Fin” and use the audience dropdown to select one of your pre-defined audiences.
Note:
The default audience for public URLs is “Everyone”.
Fin Agent will also respect any audience you apply to a public URL and will only use that page to answer a customer's question if the customer matches the audience rules.
Re-sync or remove a website as a source
If you’d like to re-sync or remove a public URL as a source, go to Train > Content, and click on the website source under the "Content sources" section, then open the Settings dropdown in the top right.
Here, you can select whether to Re-sync or Remove this source.
Tip: Websites are usually re-synced automatically every week (depending on the size of the source), and you can re-sync them manually at any time.
View website sync history
You can view a list of past website syncs to see when they were last run, which pages were found, and any failed pages. Go to Train > Content, and click on the website source under the "Content sources" section, then select View sync history.
Each row in the table represents a past or active run, and you can filter the runs by status (started, success, failed).
It includes the following information:
Sync date
Status
Synced pages
Excluded pages
Failed pages
Duration
Sync started by
If a sync has failed, you can hover over the status to see a detailed explanation of why it failed.
Troubleshooting website sync
Common issues
When importing website content to enable Fin, you need to enter the public URL. This will search for all pages nested under that URL and sync them for Fin AI Agent to use.
If the importer didn't return the number of pages you expected, there are a few possible reasons:
The URL provided isn't the top level domain
The website sync works by going to the URL you provide and then searching for all pages nested under that URL. These pages must have the same URL pattern as the URL you provide.
For example, if the top level domain is https://myhelpcenter.com/home, then all pages you want to import must include /home prefix in the URL e.g. https://myhelpcenter.com/home/article. If they do not, remove the prefix and use the most basic URL stem e.g. https://myhelpcenter.com, then try the import again.
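To check in advance whether a page falls under the URL you plan to sync, you can compare its path against the start URL. The sketch below is one way to express that rule in Python; is_nested_under and url_stem are hypothetical helpers, and the exact matching rules used by the website sync are an assumption here (same scheme and host, with the start path as a prefix).

```python
from urllib.parse import urlsplit


def is_nested_under(start_url: str, page_url: str) -> bool:
    """Return True if page_url sits under start_url's host and path prefix."""
    start, page = urlsplit(start_url), urlsplit(page_url)
    if (start.scheme, start.netloc) != (page.scheme, page.netloc):
        return False
    prefix = start.path.rstrip("/")
    # Match the prefix itself or anything below it, but not "/homepage"
    # when the prefix is "/home".
    return page.path == prefix or page.path.startswith(prefix + "/")


def url_stem(url: str) -> str:
    """Return the most basic URL stem: scheme plus host, with no path."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


start = "https://myhelpcenter.com/home"
print(is_nested_under(start, "https://myhelpcenter.com/home/article"))
print(is_nested_under(start, "https://myhelpcenter.com/articles/1"))

# Falling back to the stem brings the second page into scope.
print(is_nested_under(url_stem(start), "https://myhelpcenter.com/articles/1"))
```

Here the second page is outside the /home prefix, so it only comes into scope once the start URL is reduced to its stem.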
The URL is private
If the content you want to use is behind a login, Fin won't be able to access or import it.
Page limits
You can sync up to 100 different top level domains and Fin will sync a maximum of 30,000 pages from each source. Syncing can sometimes fail if there is a very large amount of content on a single page (you'll be notified if a sync fails).
Websites restricted to specific regional IPs
Fin's website sync (used to add public URLs for Fin AI Agent and Copilot) does not use a dedicated, custom user-agent string at this time.
Website sync errors
When you sync website content, you may see different statuses that indicate what happened during the process. To see your website sync status, go to Train > Content and select the website source, then use the Status dropdown to filter by:
Syncing
Live
Failed
Excluded
Here’s what each one means and what you can do next:
Syncing
The page sync is still in progress. An initial sync can take anywhere from a few minutes to over an hour based on how much content you have.
Live
The page was successfully synced and can be enabled for Fin and Copilot.
Note: A successful sync doesn’t always mean we were able to scrape all of the content on the page. If you want to confirm full coverage, we recommend previewing Fin with answers you expect it to find from that page.
Excluded
These pages are intentionally not synced because you excluded them in the Advanced sync settings. They can't be retried or included unless otherwise specified.
Failed
These errors mean the sync didn’t complete and may require changes on your side before retrying:
1. Unknown error
Message: “This page couldn't be accessed. It may be slow or blocked. Try syncing again, or contact support if it fails.”
What it means: Something prevented us from accessing the page, but the cause isn’t clear.
2. Session blocked / Rate limited
Message: “The website is preventing us from accessing its content. Check if it's being blocked by an anti-crawler setting or firewall. Check your site configuration and try syncing again. If the issue persists, contact support.”
What it means: Your site is actively blocking or limiting our crawler.
3. Network, timeout, or similar errors
Message: “This page couldn't be accessed. It may be slow to load or blocked by anti-crawler settings or a firewall. Check your site configuration and try syncing again. If the issue persists, contact support.”
What it means: The page didn’t load in time or couldn’t be reached due to network issues or blocking.
4. Duplicate
Message: “This page has the same content as another that's already synced. Only one version will be included.”
What it means: We detected identical content elsewhere, so only one copy is kept.
5. Keyword filtering
Message: “Pages with keywords like category, collection, or tag in the URL are excluded by default, as they usually don't contain unique content. If this page should be included, contact support.”
What it means: These URLs often represent lists, not standalone content pages.
6. Status code 400
Message: “Page content cannot be found. Check that the URL is valid and the page loads without issues.”
What it means: The URL may be broken or returning an error on your website.
7. Blocked URL
Message: “This website domain is blocked from being synced. If you require this, contact support.”
What it means: The domain is intentionally excluded from syncing.
You can retry a failed page sync by hovering over the page, selecting the three-dot menu, and then selecting Resync.
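For the keyword filtering error described above, it can help to scan your URLs ahead of time for the keywords named in the message. This is a rough sketch in Python, not the actual filter: has_excluded_keyword is a hypothetical helper, the keyword list contains only the three examples from the message (which says "like", so the real list may be longer), and whole-word matching within the URL path is an assumption.

```python
import re
from urllib.parse import urlsplit

# Only the keywords named in the error message; the real list may differ.
DEFAULT_EXCLUDED_KEYWORDS = ("category", "collection", "tag")


def has_excluded_keyword(url: str, keywords=DEFAULT_EXCLUDED_KEYWORDS) -> bool:
    """Return True if any whole word in the URL path is an excluded keyword."""
    path = urlsplit(url).path.lower()
    # Split the path on anything that is not a letter or digit.
    words = [word for word in re.split(r"[^a-z0-9]+", path) if word]
    return any(word in keywords for word in words)


for candidate in (
    "https://example.com/blog/tag/pricing",
    "https://example.com/help/billing",
    "https://example.com/docs/staging",
):
    flagged = has_excluded_keyword(candidate)
    print(candidate, "-> may be excluded" if flagged else "-> ok")
```

Because this sketch matches whole words, a path such as /docs/staging is not flagged even though it contains the letters "tag".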