# Datasources

> Individual pieces of content within your datastores. Upload documents, add websites, or create text entries that power your AI Employee knowledge base.

> **Note:**
> A **Datasource** is a single piece of content (file, URL, or text) that exists within a **Datastore**. Think of datastores as folders and datasources as the individual files inside them.

***

## 🔢 Table of Contents

1. [Understanding Datasources](#1-understanding-datasources)
2. [Types of Datasources](#2-types-of-datasources)
3. [Adding Datasources](#3-adding-datasources)
4. [Managing Datasources](#4-managing-datasources)
5. [Viewing and Editing](#5-viewing-and-editing)
6. [Best Practices](#6-best-practices)

***

## 1. Understanding Datasources

### What is a Datasource?

A **Datasource** is an individual unit of content that contributes to your knowledge base. Each datasource contains information that your AI Employees can reference when responding to queries.

**Relationship:**

* Organization
  * Datastore (e.g., "Customer Support Docs")
    * Datasource 1 (e.g., "Product Manual.pdf")
    * Datasource 2 (e.g., "FAQ Page")
    * Datasource 3 (e.g., "Return Policy")

### Datasource vs. Datastore

| Datasource                      | Datastore                          |
| ------------------------------- | ---------------------------------- |
| Individual piece of content     | Collection of datasources          |
| Single file, URL, or text entry | Container for multiple datasources |
| Example: "Setup Guide.pdf"      | Example: "Product Documentation"   |
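
The containment relationship can be made concrete with a small sketch. The class and field names below (`Datastore`, `Datasource`, `kind`) are illustrative assumptions, not ZappWay's actual data model or API.

```python
from dataclasses import dataclass, field
from typing import List, Literal

# Hypothetical model for illustration only; not ZappWay's real schema.
SourceKind = Literal["file", "url", "text"]


@dataclass
class Datasource:
    name: str
    kind: SourceKind  # one of the three datasource types covered in this guide


@dataclass
class Datastore:
    name: str
    datasources: List[Datasource] = field(default_factory=list)

    def add(self, source: Datasource) -> None:
        """Place a single piece of content inside this container."""
        self.datasources.append(source)


store = Datastore("Customer Support Docs")
store.add(Datasource("Product Manual.pdf", "file"))
store.add(Datasource("FAQ Page", "url"))
store.add(Datasource("Return Policy", "text"))

print(f"{store.name}: {len(store.datasources)} datasources")
```

The datastore only holds references to its datasources, which mirrors the folder-and-files analogy used above.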

***

## 2. Types of Datasources

### File-Based Datasources

Upload documents from your computer:

**Supported Formats:**

* **PDF** (.pdf)
* **Word** (.docx, .doc)
* **Text** (.txt)
* **Markdown** (.md)
* **Spreadsheets** (.xlsx, .xls, .csv)
* **Presentations** (.pptx, .ppt)

**Characteristics:**

* Stored permanently in ZappWay
* Processed once upon upload
* Can be downloaded later
* Versioning through re-upload
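
Before a batch upload, it can help to check filenames against the supported formats listed above. The following is a minimal local sketch; the helper names are hypothetical and nothing here calls ZappWay.

```python
from pathlib import Path

# Extensions taken from the "Supported Formats" list above.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".doc", ".txt", ".md",
    ".xlsx", ".xls", ".csv", ".pptx", ".ppt",
}


def is_supported(filename: str) -> bool:
    """Return True if the file extension appears in the supported list."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def split_by_support(filenames):
    """Partition filenames into (supported, unsupported) lists."""
    supported = [f for f in filenames if is_supported(f)]
    unsupported = [f for f in filenames if not is_supported(f)]
    return supported, unsupported


ok, rejected = split_by_support(["Setup Guide.PDF", "notes.md", "photo.png"])
print("Ready to upload:", ok)
print("Convert first:", rejected)
```

The check is case-insensitive and looks only at the extension, so it cannot tell whether a PDF contains selectable text or scanned images.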

### URL-Based Datasources

Import content from websites:

**Types:**

* Single web page
* Blog article
* Documentation page
* Product page
* Knowledge base article

**Characteristics:**

* Content snapshot taken at import time
* Can be refreshed/reprocessed
* Does not auto-update (manual refresh needed)
* Public pages only (no authentication)

### Text-Based Datasources

Manually entered content:

**Use Cases:**

* Quick FAQ answers
* Short policies
* Custom instructions
* Temporary information
* Snippets and notes

**Characteristics:**

* Fastest to create
* Easy to edit inline
* No file upload needed
* Ideal for short content

***

## 3. Adding Datasources

### From Datastore Detail Page

**Steps:**

1. Navigate to desired datastore
2. Click **"Add Datasource"** button
3. Choose datasource type:
   * File Upload
   * Website URL
   * Text Entry
4. Follow type-specific steps below

### Adding Files

**Single File Upload:**

1. Select **"File Upload"**
2. Click **"Choose File"** or drag-and-drop
3. Select file from your computer
4. File uploads automatically
5. Processing begins immediately

**Multiple File Upload:**

1. Select **"File Upload"**
2. Click **"Choose Files"** or drag multiple files
3. All files are queued
4. Each processes independently
5. Track individual progress

**Supported Operations:**

* **Drag and Drop**: Drag files directly onto upload area
* **Batch Upload**: Select multiple files at once
* **Progress Tracking**: Real-time upload and processing status

### Adding Websites

**Single URL:**

1. Select **"Website URL"**
2. Enter full URL (including `https://`)
3. Choose import mode:
   * **Single Page**: Import only this URL
   * **Crawl Site**: Follow links automatically
4. Configure options (if crawling):
   * Max depth to follow links
   * Max pages to crawl
   * URL patterns to include/exclude
5. Click **"Add"**

**Crawl Options Explained:**

* **Max Depth**: How many clicks deep to follow links
  * Depth 0: Only the page you specified
  * Depth 1: Specified page + linked pages
  * Depth 2: Above + pages linked from those pages

* **URL Patterns**: Control which pages to import
  * Include: `/docs/*` (only documentation pages)
  * Exclude: `/blog/*` (skip blog posts)
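
To illustrate how depth and URL patterns interact, here is a rough sketch of a breadth-first crawl plan over a made-up link graph. It assumes glob-style matching against the URL path; ZappWay's actual crawler and pattern semantics may differ, and `LINKS`, `allowed`, and `plan_crawl` are hypothetical names.

```python
from collections import deque
from fnmatch import fnmatchcase
from urllib.parse import urlparse

# Hypothetical in-memory link graph standing in for a real website.
LINKS = {
    "https://example.com/docs/": [
        "https://example.com/docs/setup",
        "https://example.com/blog/news",
    ],
    "https://example.com/docs/setup": ["https://example.com/docs/setup/advanced"],
    "https://example.com/blog/news": [],
    "https://example.com/docs/setup/advanced": [],
}


def allowed(url, include, exclude):
    """Glob-match the URL path against include/exclude patterns (assumed semantics)."""
    path = urlparse(url).path
    if any(fnmatchcase(path, pattern) for pattern in exclude):
        return False
    return not include or any(fnmatchcase(path, pattern) for pattern in include)


def plan_crawl(start, max_depth, include=(), exclude=()):
    """Breadth-first walk; depth 0 is the start page itself."""
    seen, order = {start}, []
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth == max_depth:
            continue  # do not follow links beyond the configured depth
        for link in LINKS.get(url, []):
            if link not in seen and allowed(link, include, exclude):
                seen.add(link)
                queue.append((link, depth + 1))
    return order


start = "https://example.com/docs/"
print(plan_crawl(start, 1, include=["/docs/*"], exclude=["/blog/*"]))
```

With depth 1 and these patterns, the plan contains the start page and `/docs/setup` only: the blog link is excluded, and the advanced page sits one level too deep.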

### Adding Text

**Manual Entry:**

1. Select **"Text Entry"**
2. Enter a descriptive name
3. Type or paste content in text area
4. Optionally use Markdown formatting
5. Click **"Save"**

**Formatting Options:**

* **Plain Text**: No formatting
* **Markdown**: Headings, lists, bold, italic
* **HTML**: Rich formatting (advanced)

**Best Practices:**

* Keep text entries focused (single topic)
* Use clear headings for structure
* Include relevant keywords
* Provide complete context

***

## 4. Managing Datasources

### Datasource List View

**Information Displayed:**

Each datasource card shows:

* **Type Icon**: File, URL, or text indicator
* **Name**: Datasource title or filename
* **Status**: Processing state
* **Size/Length**: File size or character count
* **Date Added**: Upload/creation timestamp
* **Actions Menu**: View, edit, delete options

### Datasource Status

**Status Types:**

| Status         | Icon | Meaning                   |
| -------------- | ---- | ------------------------- |
| **Ready**      | 🟢   | Processed and available   |
| **Processing** | 🟡   | Currently being indexed   |
| **Failed**     | 🔴   | Error during processing   |
| **Uploading**  | 🔵   | File transfer in progress |

### Viewing Datasource Details

**Click on any datasource to see:**

* **Full Content Preview**: Read the processed text
* **Metadata**:
  * Original filename (if applicable)
  * Source URL (if applicable)
  * Upload date and time
  * File size
  * Character/word count
* **Processing Info**:
  * Number of chunks created
  * Processing time
  * Status history
* **Usage Stats**:
  * Number of times referenced by AI
  * Connected datastores
  * Last accessed timestamp

### Editing Datasources

**What Can Be Edited:**

* **Name**: Update the datasource name
* **Text Content**: Modify text-based datasources
* **URL**: Update website address (triggers reprocess)

**What Cannot Be Edited:**

* File content (must re-upload)
* Processing settings
* Creation date

**Steps to Edit:**

1. Click the datasource name or the **"Edit"** button
2. Modify allowed fields
3. Click **"Save Changes"**
4. Reprocessing occurs if content changed

### Reprocessing Datasources

**When to Reprocess:**

* Website content has been updated
* Need to refresh URL-based datasource
* Processing failed initially
* Want to apply new chunking strategy

**How to Reprocess:**

1. Open datasource details
2. Click **"Reprocess"** button
3. Confirm action
4. Wait for processing to complete

> **Note:**
> Original content is fetched again for URL datasources.

### Deleting Datasources

> **Warning:**
> Deletion is permanent and cannot be undone. The datasource is removed from all connected AI Employees immediately.

**Steps:**

1. Locate datasource in list
2. Click **"Delete"** button (trash icon)
3. Confirm deletion in modal
4. Datasource is removed permanently

**Bulk Delete:**

1. Select multiple datasources (checkboxes)
2. Click **"Delete Selected"**
3. Confirm batch deletion
4. All selected datasources are removed

***

## 5. Viewing and Editing

### Content Preview

**Available for All Types:**

* **Processed Text**: View extracted content
* **Original Format**: Download source file (if applicable)
* **Chunks View**: See how content was segmented

**Preview Features:**

* **Search**: Find keywords within datasource
* **Highlight**: Keyword highlighting in content
* **Copy**: Copy sections of text
* **Download**: Download original file

### Chunk Viewer

**What Are Chunks?**

Chunks are the segments your datasource is divided into for AI retrieval. Each chunk is optimized for semantic search.

**Viewing Chunks:**

1. Open datasource details
2. Click **"View Chunks"** tab
3. Browse all generated chunks
4. See chunk boundaries and overlap

**Chunk Information:**

* Chunk ID (for debugging)
* Content preview
* Token count
* Position in original document

**Use Cases:**

* Verify content was chunked properly
* Troubleshoot missing information
* Understand how AI will see content
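
To give a feel for what "chunk boundaries and overlap" means, here is a simplified sketch that splits text into overlapping windows. It counts words rather than tokens, and the sizes, field names, and `chunk_words` helper are assumptions for illustration; ZappWay's actual chunking strategy is not described in this guide.

```python
def chunk_words(text, chunk_size=8, overlap=2):
    """Split text into word-based chunks where consecutive chunks share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        piece = words[start:start + chunk_size]
        chunks.append({
            "id": len(chunks),         # hypothetical chunk ID
            "position": start,         # word offset in the original text
            "word_count": len(piece),  # stand-in for a real token count
            "content": " ".join(piece),
        })
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


sample = (
    "Chunks are the segments your datasource is divided into "
    "for AI retrieval and semantic search"
)
for chunk in chunk_words(sample):
    print(chunk["id"], chunk["position"], chunk["content"])
```

Each chunk repeats the last two words of the previous one, so a sentence that straddles a boundary still appears intact in at least one chunk. This is the kind of overlap the chunk viewer lets you inspect.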

### Metadata Viewer

**Available Metadata:**

* **File Properties**:
  * Original filename
  * File type/extension
  * File size
  * Upload timestamp
* **Source Information**:
  * URL (if applicable)
  * Domain
  * Crawl date
  * Page title
* **Processing Data**:
  * Processing duration
  * Number of chunks
  * Embedding model used
  * Index date

***

## 6. Best Practices

### File Preparation

**Before Uploading:**

* **Quality Check**:
  * Ensure text is selectable (not images)
  * Check for OCR errors in scanned docs
  * Verify content is complete
* **Formatting**:
  * Use clear headings
  * Include table of contents (for long docs)
  * Structure with logical sections
* **Naming**:
  * Use descriptive filenames
  * Include version numbers
  * Add dates if time-sensitive

**Example Good Names:**

* `Product_Manual_v2.1_Jan2025.pdf`
* `Return_Policy_Updated.docx`
* `API_Reference_Latest.pdf`
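
If you prepare many files, a small helper can keep names consistent with the pattern in the examples above. This is a sketch of one possible convention; `build_filename` is a hypothetical helper, not a ZappWay feature.

```python
import re
from datetime import date
from typing import Optional

# Fixed English abbreviations so the result does not depend on the system locale.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def build_filename(title: str, ext: str,
                   version: Optional[str] = None,
                   when: Optional[date] = None) -> str:
    """Compose Title_vX.Y_MonYYYY.ext in the style of the example names above."""
    # Replace runs of non-alphanumeric characters with a single underscore.
    parts = [re.sub(r"[^A-Za-z0-9]+", "_", title).strip("_")]
    if version:
        parts.append(f"v{version}")
    if when:
        parts.append(f"{MONTHS[when.month - 1]}{when.year}")
    return "_".join(parts) + "." + ext.lstrip(".")


print(build_filename("Product Manual", "pdf", version="2.1", when=date(2025, 1, 15)))
# prints: Product_Manual_v2.1_Jan2025.pdf
```

Version and date are optional, so stable documents can keep a short name while time-sensitive ones carry both.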

### URL Best Practices

**Choosing URLs:**

✅ **Good URLs:**

* Direct documentation pages
* Specific article/blog post URLs
* Static content pages
* Well-structured sites

❌ **Avoid:**

* URLs requiring login
* JavaScript-heavy applications
* Frequently changing pages (news sites)
* Dynamic search result pages

**Crawling Tips:**

* Start with small crawls (test with 10-20 pages)
* Use URL patterns to filter content
* Avoid crawling entire large sites
* Re-crawl periodically for updates

### Content Organization

**Strategy 1: By Document Type**

Group similar datasources:

* All PDFs together
* All web pages together
* All text entries together

**Strategy 2: By Topic**

Organize by subject:

* Product A documentation
* Product B documentation
* General policies

**Strategy 3: By Freshness**

Prioritize by update frequency:

* Frequently updated (refresh monthly)
* Stable content (refresh yearly)
* Archive (reference only)
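
The freshness strategy can be turned into a simple reminder script. The sketch below assumes you keep your own inventory of datasource names, tiers, and last-refresh dates; the tier names and the 30- and 365-day intervals are illustrative approximations of "monthly" and "yearly", and nothing here reads from ZappWay.

```python
from datetime import date, timedelta

# Refresh intervals per freshness tier; None means "reference only, never refresh".
REFRESH_INTERVALS = {
    "frequent": timedelta(days=30),
    "stable": timedelta(days=365),
    "archive": None,
}


def due_for_refresh(sources, today):
    """Return names of sources whose last refresh is older than their tier allows."""
    due = []
    for name, tier, last_refreshed in sources:
        interval = REFRESH_INTERVALS[tier]
        if interval is not None and today - last_refreshed >= interval:
            due.append(name)
    return due


# Hypothetical inventory: (name, tier, last refresh date).
inventory = [
    ("Pricing Page", "frequent", date(2026, 1, 10)),
    ("Product Manual", "stable", date(2025, 9, 1)),
    ("2023 Release Notes", "archive", date(2023, 12, 1)),
]
print(due_for_refresh(inventory, today=date(2026, 3, 1)))
# prints: ['Pricing Page']
```

Archive entries are never flagged, which matches their "reference only" role.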

### Maintenance

**Regular Tasks:**

* **Weekly**:
  * Check for failed datasources
  * Reprocess important URLs
  * Remove outdated content
* **Monthly**:
  * Audit all datasources
  * Update version-controlled docs
  * Verify critical information is current
* **Quarterly**:
  * Major content refresh
  * Delete unused datasources
  * Reorganize if needed

**Version Control:**

* Include version numbers in names
* Keep only current version (delete old)
* Document changes in descriptions
* Use consistent naming conventions

***

## ✅ Quick Reference

### Adding Datasources

| Type        | Best For           | Speed             |
| ----------- | ------------------ | ----------------- |
| File Upload | Documents, manuals | Fast (1-5 min)    |
| Website URL | Web content, blogs | Medium (2-10 min) |
| Text Entry  | Quick facts, FAQs  | Instant           |

### File Size Limits

* **Maximum per file**: 50 MB (typical)
* **Recommended**: Under 20 MB for faster processing
* **For large files**: Split into multiple smaller files
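
A quick local size check can flag files that are likely to be rejected or slow to process. The thresholds below come from the limits above; treating 1 MB as 1024 × 1024 bytes is an assumption, and `classify_size` is a hypothetical helper.

```python
MAX_BYTES = 50 * 1024 * 1024          # "50 MB (typical)" maximum from the limits above
RECOMMENDED_BYTES = 20 * 1024 * 1024  # "under 20 MB" recommendation


def classify_size(size_bytes: int) -> str:
    """Classify a file size as 'ok', 'slow', or 'too_large'."""
    if size_bytes > MAX_BYTES:
        return "too_large"  # split into multiple smaller files
    if size_bytes >= RECOMMENDED_BYTES:
        return "slow"       # allowed, but expect longer processing
    return "ok"


for megabytes in (5, 30, 80):
    print(f"{megabytes} MB ->", classify_size(megabytes * 1024 * 1024))
```

For a file on disk, pass `os.path.getsize(path)` as `size_bytes`.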

### Common Formats

**Best Compatibility:**

* PDF (text-based)
* .docx
* .txt / .md
* .csv / .xlsx

***

* **Last Updated**: March 2026
* **Version**: 1.0
* **Platform**: ZappWay Datasources
