Dataset builder

Users choose folders, web tags, and managed files to form private knowledge sets.

Knowledge Sources: source scope, buyer proof

  • Approved sources: files, apps, sites
  • Core policy: scope and model lane
  • Grounded answer: cited response path
  • Visible activity: trace and usage proof

What the buyer should understand.

Satinash turns selected source paths into a governed retrieval scope with visible readiness and scan health. The buyer proof for dataset builder is simple: open the dataset builder path in the Satinash client, perform the normal user action for the Knowledge Sources workflow, and verify the visible state, evidence, limits, or artifact output that confirms the capability did its job. A strong demo narrates the user action, then pauses on the visible state before moving on: the active scope, the eligible sources or tools, the status message, the artifact output, the limit state, and the next action a normal user can take. Dataset builder is a flagship proof point, shown early in a buyer demo because it anchors the broader Satinash story. The evaluator leaves knowing what the feature does, how it is governed, and which adjacent features to test next.

Inspect the source scope

The source scope is visible in the dataset create and edit routes, connector browse surfaces, My Files, web crawler setup, selected-path review, scan actions, and source readiness panels.

Run the user workflow

Review connector labels, full paths, external IDs, and overlap warnings before saving. The step ends with a user-visible confirmation and an expected state change that the buyer can record on the evaluation checklist.

Confirm the proof path

Primary proof surface: open the dataset builder path in the Satinash client, perform the normal user action for the Knowledge Sources workflow, and verify the visible state, evidence, limits, or artifact output that confirms the capability did its job. The evaluator sees the user action and the confirmation in the same flow, then identifies the exact state, table row, message, preview, control, citation, diagnostic, or output that proves dataset builder worked.

What Dataset builder solves

Dataset builder solves the client-side problem described by its product summary: users choose folders, web tags, and managed files to form private knowledge sets. The feature is documented as a workflow a buyer can run in Satinash, with a visible beginning, a visible state change, and an inspection surface that confirms the work happened.

The strongest use case is not generic AI productivity. It is the specific Knowledge Sources moment when the people who decide which private material can ground answers (workspace owners, operations teams, support leads, and knowledge managers) need to settle four questions: which folders, websites, and uploads are eligible for retrieval; whether the dataset is enabled and ready to answer; whether a scan should run now or wait for the normal refresh; and whether plan limits, missing models, or a disabled state explain blocked ingestion. The page keeps those decisions in view so the reader understands the job, the product surface, and the business reason for the capability.

Where it appears in the client

Dataset builder appears around dataset create and edit routes, connector browse surfaces, My Files, web crawler setup, selected-path review, scan actions, and source readiness panels. Those locations give the buyer a concrete route through the product instead of a feature claim that only exists in a slide deck.

The relevant client objects are dataset, selected folders, connector labels, external folder IDs, web tags, My Files paths, and scan state. When the feature is evaluated, each object either provides scope, proves readiness, explains a limit, or shows the next action available to the user.
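
As a reading aid, the objects above can be pictured as a small data model. The sketch below is illustrative only: the class names, field names, and the three scan states are assumptions made for this example, not Satinash's actual schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class ScanState(Enum):
    # Hypothetical states; the page only says a dataset has a "scan state".
    NEVER_SCANNED = "never_scanned"
    SCANNING = "scanning"
    READY = "ready"


@dataclass
class SelectedPath:
    connector_label: str  # label shown next to the path in selected-path review
    full_path: str        # folder path, My Files path, or website scope
    external_id: str      # external folder ID kept for traceability


@dataclass
class Dataset:
    name: str
    enabled: bool = False
    selected_paths: list[SelectedPath] = field(default_factory=list)
    web_tags: list[str] = field(default_factory=list)
    scan_state: ScanState = ScanState.NEVER_SCANNED


def review_rows(dataset: Dataset) -> list[str]:
    """Render one line per selected path, the way a reviewer might read them."""
    return [
        f"{p.connector_label}: {p.full_path} (id={p.external_id})"
        for p in dataset.selected_paths
    ]
```

Each field maps to one of the roles named above: selected paths and web tags provide scope, the scan state proves readiness, and the enabled flag explains why a dataset may not answer yet.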

Proof surfaces and pitfalls

The primary proof surface is scan usage details for processed, skipped, failed, stale, and deferred-over-quota items; the secondary proof surface is document inventory rows created by the dataset and their serving readiness. Together they show the action, the state, and the evidence path a buyer can inspect during or after the demo.
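
The five scan outcomes are easy to tally when an evaluator wants a before-and-after record. This is a minimal sketch: the outcome names come from this page, while the function names and the input format (a plain list of outcome strings) are assumptions for illustration.

```python
from collections import Counter

# Outcome names taken from the page; the underscore spelling is an assumption.
OUTCOMES = ("processed", "skipped", "failed", "stale", "deferred_over_quota")


def summarize_scan(item_outcomes: list[str]) -> dict[str, int]:
    """Count scan items per outcome, always reporting all five categories."""
    unknown = set(item_outcomes) - set(OUTCOMES)
    if unknown:
        raise ValueError(f"unknown scan outcome(s): {sorted(unknown)}")
    counts = Counter(item_outcomes)
    return {name: counts.get(name, 0) for name in OUTCOMES}


def needs_attention(summary: dict[str, int]) -> bool:
    """True when a reviewer should pause: failures or a deferred backlog."""
    return summary["failed"] > 0 or summary["deferred_over_quota"] > 0
```

Reporting zero counts explicitly keeps the record comparable across scans, so a missing category is never mistaken for a missing measurement.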

The main pitfall is expecting disabled datasets or missing embedding models to answer in chat. A second pitfall is continuing to crawl after quota overflow instead of respecting the visible backlog state. The documentation names both because long-form feature pages need to explain how a buyer can misread the workflow and how the client UI resolves that confusion.
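
Both pitfalls come down to checking a few conditions before expecting answers. The sketch below shows one way to list the reasons a dataset is blocked. The parameter names and messages are hypothetical, and it assumes a simple document-count limit, which may not match how plan limits are actually enforced.

```python
from typing import Optional


def blocked_reasons(
    enabled: bool,
    embedding_model: Optional[str],
    documents_used: int,
    plan_document_limit: int,
) -> list[str]:
    """Return the reasons a dataset cannot answer or ingest more content yet."""
    reasons = []
    if not enabled:
        reasons.append("dataset is disabled")
    if not embedding_model:
        reasons.append("no embedding model is configured")
    if documents_used >= plan_document_limit:
        # Assumed behavior: over-limit items wait in a backlog, not a retry loop.
        reasons.append("plan document limit reached; new items are deferred")
    return reasons
```

An empty list means none of the three conditions applies. A non-empty list gives the evaluator the wording to look for in the client before treating a missing answer as a defect.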

What the user gets.

What it solves: Dataset builder addresses a concrete client-side problem in Satinash: users choose folders, web tags, and managed files to form private knowledge sets. It keeps the discussion anchored in a workflow a buyer can actually run, not a broad AI claim. The documentation explains the moment of need, the risk of doing the work manually, and the reason this capability belongs in the product rather than in a training note or sales promise.

Where it appears: Dataset builder lives around dataset create and edit routes, connector browse surfaces, My Files, web crawler setup, selected-path review, scan actions, and source readiness panels. The relevant users are usually workspace owners, operations teams, support leads, and knowledge managers who decide which private material can ground answers. During evaluation, the buyer can point to the control, table, drawer, route, preview, or status label that makes the capability visible, then follow it into the next Satinash surface without asking for hidden context.

User outcome: Dataset setup that mirrors how teams actually organize knowledge across folders, websites, and managed uploads. For dataset builder, that outcome is strongest when the user can start from a real task, see the scope and state, complete the action, and understand what changed. The before-and-after is clear enough that a stakeholder can retell the workflow after the demo.

Operational context: Expansion-first browsing, selected-path review, connector labels, and external IDs for traceability. The feature works with dataset, selected folders, connector labels, external folder IDs, web tags, My Files paths, and scan state. Those objects matter because they tell buyers what must already exist, what can be configured by a workspace user, and what needs inspection when the result looks different from expectation.

Decision support: Dataset builder helps teams decide which folders, websites, and uploads are eligible for retrieval, whether the dataset is enabled and ready to answer, whether a scan should run now or wait for normal refresh, and whether plan limits, missing models, or disabled state explain blocked ingestion. The documentation states those decisions directly so the page works as an evaluation aid, a sales leave-behind, and a product reference for people who were not in the live demo.

Related features: compare Dataset builder with Multi-connector datasets, Web crawler knowledge source, My Files workspace drive, and Dataset quota handling. Those nearby pages give the evaluator the rest of the workflow: the source setup, the control surface, the evidence trail, and the operational follow-through. Linking the pages this way keeps the 100-feature catalog from feeling like isolated fragments.

Scope boundary: Knowledge Source pages explain source eligibility, lifecycle, and readiness from the user's point of view; destructive repair or operator reset flows do not belong in this marketing content. For dataset builder, that boundary is important because the marketing content describes visible client behavior and buyer evidence while staying out of operator-only setup details unless they explain what the user can inspect.

Workflow documentation.

  1. Pick source systems, browse real folders or website scopes, and add selected paths to a dataset. Start the walkthrough by naming Dataset builder, the user role, and the current client location. Show the buyer exactly where the workflow begins, what object is selected, and which visible state tells the user the page is ready for action.
  2. Review connector labels, full paths, external IDs, and overlap warnings before saving. The confirmation to look for is the selected-path review listing each path with its connector label and external ID.
  3. Enable the dataset, run an eligible scan, and watch source readiness and quota state. The expected state change appears in the scan usage details and the source readiness panels.
  4. Use the dataset in a Core, chat, or widget once processed documents are serving. Document inventory rows show serving readiness, which is the signal that the dataset can ground answers.
  5. Sync manually or adjust scope when sources, models, quotas, or business needs change. Inspect the scan state again afterward so the evaluation checklist records the state before and after the change.
  6. Check configuration before judging the result. For Dataset builder, configuration includes dataset, selected folders, connector labels, and external folder IDs, plus the category-level controls listed in the page. A useful evaluation names which settings were chosen, which were inherited from a Core, plan, connector, dataset, or workspace, and which settings are intentionally not part of this feature.
  7. Inspect proof before moving to the next page. The best proof surface for this pass is scan usage details for processed, skipped, failed, stale, and deferred-over-quota items. If that surface is absent, the demo stops and explains why, because buyer confidence depends on seeing the evidence trail rather than hearing that it exists somewhere else.
  8. Close the workflow by comparing the result with Multi-connector datasets, Web crawler knowledge source, My Files workspace drive, and Dataset quota handling. That comparison helps the evaluator understand whether dataset builder is the entry point, the supporting control, the repair path, or the trust signal inside the larger knowledge sources story.
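
Step 2 mentions overlap warnings. This page does not define how overlaps are detected, so the sketch below assumes the simplest reading: two selected paths on the same connector overlap when one folder contains the other. The function name is hypothetical.

```python
from pathlib import PurePosixPath


def find_overlaps(paths: list[str]) -> list[tuple[str, str]]:
    """Return (ancestor, descendant) pairs among the selected paths.

    Assumes all paths belong to one connector and use "/" separators.
    Identical paths are not reported; only containment is.
    """
    parsed = [PurePosixPath(p) for p in paths]
    overlaps = []
    for outer in parsed:
        for inner in parsed:
            # inner.parents lists every ancestor folder of inner.
            if outer != inner and outer in inner.parents:
                overlaps.append((str(outer), str(inner)))
    return overlaps
```

For example, selecting both `/Support` and `/Support/Policies` would produce one pair, which is the kind of redundancy a reviewer would want to resolve before saving.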

Proof, configuration, and buyer concerns.

Proof to inspect

  • Primary proof surface: Open the dataset builder path in the Satinash client, perform the normal user action for the Knowledge Sources workflow, and verify the visible state, evidence, limits, or artifact output that confirms the capability completed its job. The evaluator sees the user action and the confirmation in the same flow, then identifies the exact state, table row, message, preview, control, citation, diagnostic, or output that proves dataset builder worked.
  • Category proof: Show selected paths with connector labels, full paths, external folder IDs, and folder-only browse rows. Tie this proof to Dataset builder by naming the source object, status, or control that changed. A buyer does not have to infer whether the feature is active; the surface makes the active state legible.
  • Evidence trail: scan usage details for processed, skipped, failed, stale, and deferred-over-quota items. This is the surface to pause on during a demo because it shows how Satinash keeps the workflow inspectable after the initial click, message, upload, scan, connection, plan check, or widget preview.
  • Secondary evidence: document inventory rows created by the dataset and their serving readiness. This gives reviewers a second way to validate the same claim, which is useful when the buyer cares about support handoff, source governance, billing transparency, reliability, or daily user adoption.
  • Evaluation checklist: Run an eligible manual sync and inspect scan counts, failures, and document readiness. For dataset builder, record the expected result, the state that changed, and the related feature that would be tested next. That turns the page into a reusable checklist rather than a prose-only description.
  • Table-friendly facts: Dataset builder; slug dataset-builder; category Knowledge Sources; fit primary; route /features/dataset-builder/; works with datasets, folders, My Files, websites, connectors, scans, and document readiness; primary users workspace owners, operations teams, support leads, and knowledge managers who decide which private material can ground answers; related features Multi-connector datasets, Web crawler knowledge source, My Files workspace drive, and Dataset quota handling. These facts are intentionally compact so comparison tables and sales notes can reuse them without rewriting the page.
  • Buyer proof question: if a skeptical reviewer asks where dataset builder appears, what it depends on, and how to know it worked, the answer has three parts. It appears in the dataset create and edit routes, connector browse surfaces, My Files, web crawler setup, selected-path review, scan actions, and source readiness panels. It depends on the dataset, selected folders, connector labels, external folder IDs, web tags, My Files paths, and scan state. It worked if the visible proof surfaces above show the expected state.

Configuration notes

  • Configuration model: Dataset builder appears in the Knowledge Sources client experience through visible controls, status labels, evidence panels, and adjacent workflows that evaluators can inspect without relying on behind-the-scenes implementation details. In practical terms, Dataset builder is shaped by connector roots, selectable folders, My Files folders, web tags, crawl profiles, and dataset paths, plus the category objects: dataset, selected folders, connector labels, external folder IDs, web tags, My Files paths, and scan state. User-facing choices are separated from inherited workspace, Core, connector, dataset, or plan state so evaluators know what can be changed during normal use.
  • Setup checklist: Enable/disable lifecycle, manual sync, automatic refresh cadence, and scan eligibility. Before a demo, confirm the prerequisites are present and visible. If the feature depends on a Core, dataset, connector, widget, plan, upload, or role, the docs identify how that dependency appears to the user and what message appears when it is missing or inactive.
  • Limits, plan context, and table facts: Plan document limits, deferred backlog, embedding model compatibility, and ingestion profile behavior. The buyer does not need internal limit enforcement details, but they do need to know which capacity, model, connector, upload, document, widget, or team boundary can affect dataset builder. Table-ready configuration facts: route family /datasets, /datasets/create, and /datasets/:datasetId/edit; primary evidence is selected paths, connector labels, scan diagnostics, document inventory, and lifecycle status; main dependencies are connectors, embedding model, ingestion profile, plan document limits, and Core scope; the buyer signal is that source eligibility is reviewable before the assistant uses the material.
  • Pitfall to avoid: expecting disabled datasets or missing embedding models to answer in chat. Second pitfall to avoid: continuing to crawl after quota overflow instead of respecting the visible backlog state. The evaluation record captures chosen configuration, visible state before and after the action, proof surface inspected, and related feature tested next so stakeholders can compare the feature across accounts without relying on memory.

Buyer concerns

Where does dataset builder show up for an end user? It appears around dataset create and edit routes, connector browse surfaces, My Files, web crawler setup, selected-path review, scan actions, and source readiness panels. The answer points to the route, panel, table, drawer, composer control, preview, status chip, or action row that makes the capability visible in the product.

Can users browse into folders without needing search tricks? For Dataset builder, the answer is visible in the active scope, the category-specific source objects, and the first proof surface. The buyer understands whether the feature uses approved knowledge, selected tools, a Core setting, a connector state, a plan allowance, or a public widget boundary.

Can a buyer prove exactly which sources are eligible for answers? That concern becomes a concrete evaluation check: Connect the dataset to a Core and verify chat answers can be narrowed back to that dataset. The buyer needs a visible pass or fail condition, not a vague assurance that the product can handle it.

What happens when scans hit quotas or a dataset is disabled? If the concern appears during a live demo, pause on the pitfall called out above, then show the status or configuration that resolves it. That pattern teaches evaluators how to self-serve the next time they see the same behavior.

How does a buyer compare this with related features? Start with Multi-connector datasets, Web crawler knowledge source, My Files workspace drive, and Dataset quota handling. If Dataset builder is the control, the related pages usually show the source setup, the output, the repair path, or the trust evidence that surrounds it.

What gets documented after evaluation? Capture the user role, the exact workflow, the dependency objects, the configuration choices, the proof surfaces inspected, the pitfalls observed, and the next related feature to validate. That makes dataset builder useful as long-form documentation rather than a short marketing blurb.
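
The items to capture fit naturally into a small structured record that can be shared after the demo. The sketch below is one possible shape; the class name, field names, and JSON export are assumptions for illustration and not a Satinash export format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class EvaluationRecord:
    user_role: str
    workflow: str
    dependency_objects: list[str] = field(default_factory=list)
    configuration_choices: dict[str, str] = field(default_factory=dict)
    proof_surfaces: list[str] = field(default_factory=list)
    pitfalls_observed: list[str] = field(default_factory=list)
    next_feature: str = ""


def to_json(record: EvaluationRecord) -> str:
    """Serialize the record with stable key order so records diff cleanly."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)
```

A record like this lets stakeholders compare evaluations across accounts without relying on memory of the live session.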

Evaluation tables.

These tables turn the documentation into something a buyer, sales engineer, or implementation lead can inspect during a live walkthrough.

Evaluation checklist

Check | What to inspect | Why it matters
Start with a real task | Run an eligible manual sync and inspect scan counts, failures, and document readiness. The task uses a realistic customer question and the same source, tool, plan, role, or widget context the buyer expects in production. | This proves Dataset builder in the context where it will actually be used, rather than as an isolated demo click.
Confirm visible scope | Inspect dataset create and edit routes, connector browse surfaces, My Files, web crawler setup, selected-path review, scan actions, and source readiness panels and identify the active objects: dataset, selected folders, connector labels, external folder IDs, web tags, My Files paths, and scan state. | The buyer can see what is eligible, what is excluded, and which setting explains the result.
Inspect proof | Pause on scan usage details for processed, skipped, failed, stale, and deferred-over-quota items and document inventory rows created by the dataset and their serving readiness; record the state before and after the user action. | The feature is accepted on product evidence, not on a verbal promise.
Compare adjacent features | Continue into Multi-connector datasets, Web crawler knowledge source, My Files workspace drive, and Dataset quota handling after the first pass. | The buyer sees how Dataset builder fits into the rest of the knowledge sources workflow and which capability answers the next concern.

Proof matrix

Evidence | Product proof | Buyer value
Visible proof | Open the dataset builder path in the Satinash client, perform the normal user action for the Knowledge Sources workflow, and verify the visible state, evidence, limits, or artifact output that confirms the capability completed its job. | Shows the exact client evidence a buyer can inspect during the feature walkthrough.
Category proof | Show selected paths with connector labels, full paths, external folder IDs, and folder-only browse rows. | Connects Dataset builder to the broader Knowledge Sources evaluation story.
Failure or limit proof | Pitfall to avoid: expecting disabled datasets or missing embedding models to answer in chat. | Makes confusing states understandable before they become objections.
Related proof | Related features: Multi-connector datasets, Web crawler knowledge source, My Files workspace drive, and Dataset quota handling. | Gives the evaluator a next page when they need source setup, output review, repair, or governance evidence.

Configuration matrix

Area | Control or dependency | Impact
Primary configuration | Connector roots, selectable folders, My Files folders, web tags, crawl profiles, and dataset paths. | Explains the main control or inherited setting that shapes dataset builder.
Prerequisites | Required or relevant objects: dataset, selected folders, connector labels, external folder IDs, web tags, My Files paths, and scan state. | Keeps the demo honest about what must exist before the feature can prove value.
Limits | Plan document limits, deferred backlog, embedding model compatibility, and ingestion profile behavior. | Connects blocked, unavailable, or over-limit behavior to visible product guidance.
Table facts | Route family: /datasets, /datasets/create, and /datasets/:datasetId/edit. Primary evidence: selected paths, connector labels, scan diagnostics, document inventory, and lifecycle status. Main dependencies: connectors, embedding model, ingestion profile, plan document limits, and Core scope. Buyer signal: source eligibility is reviewable before the assistant uses the material. | Provides compact comparison data for sales notes, buyer checklists, and category pages.
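
The route family in the last row can be read as three patterns, one of which carries a dataset ID. The sketch below shows how such patterns might be matched when recording which screen a demo step was on. The regular expressions and the function name are assumptions; only the three route strings come from this page.

```python
import re
from typing import Optional

# Route strings copied from the page; the regex translation is an assumption.
ROUTES = {
    "list": re.compile(r"^/datasets/?$"),
    "create": re.compile(r"^/datasets/create/?$"),
    "edit": re.compile(r"^/datasets/(?P<dataset_id>[^/]+)/edit/?$"),
}


def match_route(path: str) -> Optional[tuple[str, dict[str, str]]]:
    """Return (route name, captured parameters), or None if nothing matches."""
    for name, pattern in ROUTES.items():
        m = pattern.match(path)
        if m:
            return name, m.groupdict()
    return None
```

Note that /features/dataset-builder/ is the marketing page route, not part of this family, so it does not match any of the three patterns.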

Workflow map.

  1. Start with Dataset builder at dataset create and edit routes, connector browse surfaces, My Files, web crawler setup, selected-path review, scan actions, and source readiness panels.
  2. Confirm scope through dataset, selected folders, connector labels, external folder IDs, web tags, My Files paths, and scan state.
  3. Inspect scan usage details for processed, skipped, failed, stale, and deferred-over-quota items and document inventory rows created by the dataset and their serving readiness.
  4. Continue into Multi-connector datasets, Web crawler knowledge source, My Files workspace drive, and Dataset quota handling for the adjacent buyer questions.
  5. Capture the route, proof state, and configuration choices for the buyer handoff.

Best practices

  • Run an eligible manual sync and inspect scan counts, failures, and document readiness.
  • Connect the dataset to a Core and verify chat answers can be narrowed back to that dataset.
  • Record the route /features/dataset-builder/, proof surfaces, configuration state, and related features Multi-connector datasets, Web crawler knowledge source, My Files workspace drive, and Dataset quota handling.
  • Run the evaluation with the intended audience (executive evaluators, department leads, and first pilot teams) so it reflects the intended rollout path.

Limits to discuss

  • Expecting disabled datasets or missing embedding models to answer in chat.
  • Continuing to crawl after quota overflow instead of respecting the visible backlog state.
  • Because this is a primary feature that shapes trust in the rest of the platform, its documentation must prove the happy path, the visible limits, and the recovery behavior.
  • Knowledge Source pages explain source eligibility, lifecycle, and readiness from the user's point of view; destructive repair or operator reset flows do not belong in this marketing content.

Terms buyers will hear.

Term | Definition | Use in evaluation
Feature route | /features/dataset-builder/ | Canonical URL for the buyer-facing documentation page.
Feature fit | primary: a flagship proof point shown early in a buyer demo because it anchors the broader Satinash story. | Explains whether the feature is a flagship, focused, supporting, or trust-oriented page.
Primary users | workspace owners, operations teams, support leads, and knowledge managers who decide which private material can ground answers | Clarifies who must understand and validate the workflow.
Works with | datasets, folders, My Files, websites, connectors, scans, and document readiness | Lists the adjacent product areas that shape the feature in use.

See dataset builder in a live Satinash workflow.

Bring one source set and one customer question. The demo should prove the answer path, not just describe it.

Book a demo