Important: Datastores are the foundation of intelligent AI responses. By connecting relevant data sources to your AI Employees, you enable them to provide accurate, context-aware answers based on your organization’s specific information.

🔢 Table of Contents

  1. What is a Datastore?
  2. Creating Your First Datastore
  3. Adding Datasources
  4. Supported Data Types
  5. Processing and Indexing
  6. Connecting to AI Employees
  7. Managing Datastores
  8. Best Practices
  9. Troubleshooting

1. What is a Datastore?

Overview

A Datastore is a collection of data sources that provides knowledge to your AI Employees. Think of it as a knowledge base that your AI can reference when answering questions or performing tasks.

Key Concepts:
  • Datastore: A container that groups related datasources together
  • Datasource: Individual pieces of content (documents, websites, text files, etc.)
  • Chunks: Small segments of processed data optimized for AI retrieval
  • Embeddings: Vector representations of your data used for semantic search

Why Use Datastores?

Benefits:
  • Accuracy: AI responses based on your actual data, not general knowledge
  • Context: Provide domain-specific information unique to your business
  • Control: Manage exactly what information the AI can access
  • Updates: Keep knowledge current by updating datasources
  • Organization: Group related information logically by department, topic, or purpose
Common Use Cases:
  • Product documentation and manuals
  • Company policies and procedures
  • FAQ databases
  • Technical specifications
  • Customer support knowledge bases
  • Training materials
  • Legal documents
  • Marketing content

2. Creating Your First Datastore

Quick Start

Steps:
  1. Navigate to Datastores in the sidebar (or visit /datastores)
  2. Click “Create Datastore” button
  3. Fill in the creation form:
    • Name (required): Descriptive name for the datastore
    • Description (optional): Purpose and contents overview
  4. Click “Create”
  5. Your new datastore appears in the list

Naming Best Practices

Good Examples:
  • “Customer Support Documentation”
  • “Product Catalog 2025”
  • “HR Policies and Procedures”
  • “Technical API Reference”
  • “Sales Training Materials”
Avoid:
  • Generic names like “Datastore 1” or “Test”
  • Unclear abbreviations
  • Names without context

Datastore Settings

Configuration Options: After creation, you can configure:
  • Name: Update the datastore name
  • Description: Add or modify description
  • Visibility: Control who can access (if applicable)
  • Connected AI Employees: View which agents use this datastore

3. Adding Datasources

What is a Datasource?

A Datasource is an individual piece of content within a datastore. Each datastore can contain multiple datasources of different types.

How to Add Datasources

Location: Inside any datastore detail page

Methods:

Method 1: File Upload

  1. Click “Add Datasource” or “Upload Files”
  2. Select “File Upload” option
  3. Choose files from your computer:
    • Single file upload
    • Multiple files (batch upload)
    • Drag and drop support
  4. Files are uploaded and processed automatically
Supported File Types:
  • PDF documents
  • Word documents (.docx, .doc)
  • Text files (.txt)
  • Markdown files (.md)
  • CSV files
  • Excel spreadsheets (.xlsx, .xls)
  • PowerPoint presentations (.pptx, .ppt)

Method 2: Website/URL

  1. Click “Add Datasource”
  2. Select “Website” option
  3. Enter the URL of the webpage
  4. Configure crawling options:
    • Single page: Import only the specified URL
    • Crawl site: Follow links and import multiple pages
    • Max depth: How many levels deep to crawl
    • URL patterns: Include/exclude specific paths
  5. Click “Add” to start import
URL Examples:
  • Documentation sites: https://docs.example.com
  • Blog posts: https://blog.example.com/article
  • Product pages: https://shop.example.com/products
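The include/exclude URL patterns from step 4 can be pictured as a small path filter. This is an illustrative sketch only, not ZappWay's actual crawler; the shell-style wildcard syntax (Python's fnmatch) is an assumption for the example.

```python
# Sketch of include/exclude URL-pattern filtering as a crawler might
# apply it. Pattern syntax (shell wildcards) is assumed for illustration.
from fnmatch import fnmatch
from urllib.parse import urlparse

def should_crawl(url, include=("*",), exclude=()):
    """Return True if the URL path matches an include pattern
    and no exclude pattern."""
    path = urlparse(url).path or "/"
    if any(fnmatch(path, pat) for pat in exclude):
        return False
    return any(fnmatch(path, pat) for pat in include)

# Keep documentation pages, skip the blog archive:
print(should_crawl("https://docs.example.com/guides/setup",
                   include=("/guides/*",)))                 # True
print(should_crawl("https://docs.example.com/blog/post",
                   exclude=("/blog/*",)))                   # False
```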

Method 3: Text Input

  1. Click “Add Datasource”
  2. Select “Text” option
  3. Paste or type content directly
  4. Give it a descriptive name
  5. Click “Save”
Use Cases:
  • Quick FAQ entries
  • Policy snippets
  • Short reference materials
  • Temporary information

Method 4: Integrations (Coming Soon)

Future support for:
  • Google Drive folders
  • Notion databases
  • Confluence spaces
  • GitHub repositories
  • SharePoint sites

Batch Operations

Upload Multiple Files:
  1. Select multiple files in the upload dialog
  2. All files are queued for processing
  3. Track progress for each file individually
  4. Failed uploads can be retried
Import Multiple URLs:
  1. Enable “Bulk URL Import”
  2. Paste multiple URLs (one per line)
  3. Configure shared settings for all URLs
  4. Start batch import
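The "one URL per line" input from step 2 is easy to sanity-check before starting a batch import. A minimal sketch (scheme and host checks only, not what ZappWay validates internally):

```python
# Split pasted "one URL per line" text into valid and invalid lists.
from urllib.parse import urlparse

def parse_url_list(text):
    valid, invalid = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = urlparse(line)
        if parts.scheme in ("http", "https") and parts.netloc:
            valid.append(line)
        else:
            invalid.append(line)
    return valid, invalid

valid, invalid = parse_url_list("""
https://docs.example.com
not-a-url
https://blog.example.com/article
""")
print(valid)    # ['https://docs.example.com', 'https://blog.example.com/article']
print(invalid)  # ['not-a-url']
```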

4. Supported Data Types

Documents

File Formats:
Type          | Extensions        | Notes
PDF           | .pdf              | Text extraction, OCR for scanned docs
Word          | .docx, .doc       | Full formatting preserved
Text          | .txt, .md         | Plain text and Markdown
Spreadsheets  | .xlsx, .xls, .csv | Table data extracted
Presentations | .pptx, .ppt       | Slide content and notes
Processing Features:
  • Text Extraction: Automatic extraction from all formats
  • OCR: Optical character recognition for image-based PDFs
  • Table Parsing: Structured data from spreadsheets
  • Metadata Extraction: Author, creation date, titles

Web Content

Supported:
  • Public websites (HTML pages)
  • Blog posts and articles
  • Documentation sites
  • Product pages
  • Knowledge base articles
Features:
  • Smart Crawling: Follow internal links automatically
  • Content Cleaning: Remove navigation, ads, footers
  • JavaScript Rendering: Support for dynamic content
  • Sitemap Support: Import via sitemap.xml
Limitations:
  • Cannot access password-protected sites
  • Rate limiting may apply for large crawls
  • Some dynamic content may not render perfectly

Structured Data

CSV/Excel Processing:
  • Each row can become a separate datasource
  • Column headers used for context
  • Numeric data preserved
  • Support for large datasets (up to 100,000 rows)
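The idea that each row becomes a separate datasource, with column headers giving values context, can be sketched in a few lines. The field names below are illustrative, not from any real catalog:

```python
# Turn each CSV row into a small, self-describing text snippet:
# "header: value" pairs keep the column context attached to the data.
import csv, io

def rows_to_snippets(csv_text):
    reader = csv.DictReader(io.StringIO(csv_text))
    return ["; ".join(f"{k}: {v}" for k, v in row.items())
            for row in reader]

data = "sku,name,price\nA100,Widget,9.99\nB200,Gadget,24.50\n"
for snippet in rows_to_snippets(data):
    print(snippet)
# sku: A100; name: Widget; price: 9.99
# sku: B200; name: Gadget; price: 24.50
```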
JSON Data:
  • Import structured JSON files
  • Nested objects supported
  • Array elements handled intelligently

Raw Text

Use Cases:
  • FAQ entries
  • Policy statements
  • Product descriptions
  • Quick reference materials
Formatting:
  • Markdown supported for rich formatting
  • Plain text for simple content
  • HTML can be pasted directly

5. Processing and Indexing

How Processing Works

When you add a datasource, ZappWay automatically:
  1. Extracts: Pulls text content from files/URLs
  2. Cleans: Removes irrelevant elements (ads, navigation)
  3. Chunks: Splits content into optimal-sized segments
  4. Embeds: Creates vector representations for semantic search
  5. Indexes: Stores in a searchable database
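The five steps above can be sketched as a toy end-to-end pipeline. The extraction, cleaning, and "embedding" here are deliberately naive stand-ins for ZappWay's internal machinery, shown only to make the flow concrete:

```python
# Toy version of the extract -> clean -> chunk -> embed -> index flow.
def extract(raw):            # 1. pull text content (already text here)
    return raw
def clean(text):             # 2. collapse stray whitespace
    return " ".join(text.split())
def chunk(text, size=20):    # 3. fixed-size word chunks
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]
def embed(piece):            # 4. toy "vector": bag of lowercased words
    return set(piece.lower().split())
def index(chunks):           # 5. store each chunk next to its vector
    return [(c, embed(c)) for c in chunks]

store = index(chunk(clean(extract("Returns are accepted  within 30 days."))))
print(len(store))  # 1
```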

Processing Status

Status Indicators:
Status     | Meaning                         | What to Do
Uploading  | File transfer in progress       | Wait
Processing | Extracting and chunking content | Wait (may take 1-5 minutes)
Indexing   | Creating embeddings             | Wait
Ready      | Available for AI use            | Nothing, it works!
Failed     | Processing error occurred       | Retry or check file
Tracking Progress:
  • Real-time status updates on datasource cards
  • Progress percentage for large files
  • Estimated time remaining
  • Error messages for failed processing

Chunking Strategy

What is Chunking? Large documents are split into smaller pieces (chunks) to optimize AI retrieval. Each chunk is typically 500-1000 tokens.

Why Chunking Matters:
  • Relevance: AI can find the exact section needed
  • Performance: Faster search and retrieval
  • Context: Each chunk maintains coherent context
Automatic Optimization: ZappWay automatically determines the best chunk size based on:
  • Document type (PDF, text, web page)
  • Content structure (headings, paragraphs)
  • Information density
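A minimal sketch of the chunking described above: split text into roughly fixed-size pieces with a small overlap so each chunk keeps some context from its neighbor. Sizes here are in words for simplicity; ZappWay's real chunker works in tokens (roughly 500-1000 per chunk) and accounts for structure like headings.

```python
# Fixed-size chunking with overlap (word-based, for illustration).
def chunk_words(text, size=100, overlap=20):
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

doc = " ".join(f"w{i}" for i in range(250))
chunks = chunk_words(doc, size=100, overlap=20)
print(len(chunks))  # 3 chunks: words 0-99, 80-179, 160-249
```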
What are Embeddings? Embeddings are numerical representations of text that capture semantic meaning. They enable the AI to find relevant information even when exact keywords do not match.

Example:
Query: “How do I reset my password?”
Match: Chunk containing “Password recovery process”
(Even though “reset” is not in the chunk text)
Search Types:
  • Semantic Search: Meaning-based matching (default)
  • Keyword Search: Exact term matching
  • Hybrid Search: Combination of both (best results)
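To make the reset/recovery example concrete, here is a toy illustration of why semantic matching beats exact keywords. A tiny hand-written synonym map stands in for real embeddings; with keyword matching alone, "reset" would never match "recovery".

```python
# Toy "semantic" scorer: a synonym map bridges related words that
# exact keyword matching would miss. Real systems use vector math.
SYNONYMS = {"reset": {"reset", "recover", "recovery"},
            "password": {"password", "credentials"}}

def semantic_score(query, chunk):
    chunk_words = set(chunk.lower().split())
    hits = 0
    for q in query.lower().split():
        related = SYNONYMS.get(q, {q})  # fall back to the word itself
        if related & chunk_words:
            hits += 1
    return hits

chunk = "password recovery process"
print(semantic_score("reset password", chunk))  # 2 (both query words match)
```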

6. Connecting to AI Employees

Why Connect Datastores?

Connecting a datastore to an AI Employee allows that agent to reference the datastore’s knowledge when responding to queries. Without a connection, the AI cannot access the data.

Connection Methods

Method 1: During AI Employee Creation

When creating a new AI Employee:
  1. In the creation form, find “Knowledge” section
  2. Click the knowledge selector
  3. Search and select datastores
  4. Multiple datastores can be connected
  5. Save the AI Employee

Method 2: From AI Employee Settings

For existing AI Employees:
  1. Navigate to AI Employees → Select employee
  2. Go to Settings tab
  3. Find “Knowledge” section
  4. Click “Edit” or knowledge selector
  5. Add/remove datastores
  6. Click “Save”

Method 3: From Integration Setup

When setting up integrations (WhatsApp, Messenger, etc.):
  1. During integration configuration
  2. Select AI Employee
  3. Knowledge selector appears automatically
  4. Choose relevant datastores
  5. Complete integration setup

Multiple Datastore Strategy

Best Practices:

Scenario 1: Specialized AI Employees
  • Support AI: Connect only support-related datastores
  • Sales AI: Connect product catalogs and pricing
  • HR AI: Connect policies and employee handbooks
Scenario 2: Comprehensive Knowledge
  • Connect multiple related datastores to one AI
  • Example: Support AI with product docs + FAQs + troubleshooting guides
Performance Considerations:
  • More datastores = more data to search
  • Keep connections relevant to avoid confusion
  • Typical limit: 5-10 datastores per AI Employee

Viewing Connections

From Datastore Page: Each datastore card shows:
  • Connected AI Employees: Count of agents using this datastore
  • Click to view: List of specific AI Employees
From AI Employee Page: Settings tab displays:
  • Connected Datastores: Full list with names
  • Quick remove: Unlink datastores easily

7. Managing Datastores

Datastore List View

Main Page Display: Each datastore card shows:
  • Name: Datastore title
  • Description: Purpose overview
  • Datasource Count: Number of sources inside
  • Connected AI Employees: How many agents use it
  • Last Updated: Most recent modification
  • Actions: Edit, Delete, View Details

Viewing Datastore Contents

Detail Page: Click on any datastore to see:
  • Overview: Name, description, stats
  • Datasources List: All contained datasources
    • File name/URL
    • Type (PDF, website, text)
    • Status (Ready, Processing, Failed)
    • Size/Length
    • Upload date
    • Actions (View, Delete)
  • Activity Log: Recent changes and updates
  • Connected AI Employees: Full list

Editing Datastores

What Can Be Edited:
  • Name: Update datastore name
  • Description: Modify description
  • Add Datasources: Upload new files or URLs
  • Remove Datasources: Delete outdated sources
  • Reprocess: Trigger re-indexing if needed
Steps:
  1. Click datastore name to open detail page
  2. Click “Edit” button (top-right)
  3. Modify fields as needed
  4. Click “Save Changes”

Updating Content

When to Update:
  • Product information changes
  • New documentation released
  • Policy updates
  • Correcting outdated information
How to Update:

Option 1: Replace Datasource
  1. Delete old datasource
  2. Upload new version
  3. Wait for processing
Option 2: Add New Version
  1. Upload new file with version number
  2. Keep old version for reference (optional)
  3. Remove old version later
Option 3: Reprocess
Some changes to websites can be captured by:
  1. Clicking “Reprocess” on a web datasource
  2. Content is re-crawled and updated

Deleting Datastores

Warning: Deleting a datastore removes all contained datasources and disconnects it from all AI Employees. This action cannot be undone.

Steps:
  1. Navigate to datastore detail page
  2. Click “Delete” button (usually in menu)
  3. Confirm deletion in modal
  4. Datastore and all datasources are permanently removed
Before Deleting:
  • Check which AI Employees are connected
  • Consider archiving instead if data might be needed later
  • Export important datasources if needed

Archiving (If Available)

Some plans support archiving:
  • Temporarily disable a datastore without deleting
  • Disconnects from AI Employees automatically
  • Can be restored later
  • Preserves all datasources and configuration

8. Best Practices

Organizing Datastores

Strategy 1: By Department
Create separate datastores for each team:
  • “Customer Support Knowledge Base”
  • “Sales Resources”
  • “HR Policies”
  • “Engineering Documentation”
Benefits:
  • Clear ownership
  • Easy access control
  • Focused knowledge per team
Strategy 2: By Topic
Organize by subject matter:
  • “Product Information”
  • “Technical Specifications”
  • “Company Policies”
  • “Training Materials”
Benefits:
  • Cross-functional access
  • Logical grouping
  • Easy to find information
Strategy 3: By AI Employee
One datastore per AI Employee:
  • “Support Bot Knowledge”
  • “Sales Bot Resources”
  • “Onboarding Assistant Data”
Benefits:
  • Direct 1:1 mapping
  • Simplified management
  • Clear scope per agent

Content Quality

Document Preparation: Before uploading:
  1. Clean Up: Remove irrelevant sections
  2. Structure: Use clear headings and hierarchy
  3. Format: Ensure text is selectable (not images)
  4. Accuracy: Verify information is current
  5. Completeness: Include all necessary context
Writing for AI:
  • Clear Language: Avoid ambiguous terms
  • Complete Sentences: Full thoughts, not fragments
  • Context: Include background information
  • Examples: Provide concrete examples where possible
  • Consistency: Use consistent terminology

Maintenance Schedule

Daily:
  • Monitor processing status for new uploads
  • Check for failed datasources
Weekly:
  • Review AI Employee response quality
  • Identify knowledge gaps
  • Add missing information
Monthly:
  • Audit all datastores for outdated content
  • Update changed information
  • Remove deprecated datasources
  • Check for duplicate content
Quarterly:
  • Major content review and refresh
  • Reorganize if needed
  • Archive unused datastores
  • Performance optimization

Performance Optimization

Keep Datastores Focused:

Good:
  • “Customer Support - Product A”
  • “Customer Support - Product B”
Avoid:
  • “Everything About Our Company”
Optimal Size:
  • Small Datastore: 10-50 datasources
  • Medium Datastore: 50-200 datasources
  • Large Datastore: 200-1000 datasources
Signs of Too Much Data:
  • Slow AI response times
  • Irrelevant information in responses
  • Difficulty finding specific information
Solutions:
  • Split into multiple focused datastores
  • Remove redundant content
  • Archive old versions

Security and Privacy

Best Practices for Handling Sensitive Information:
  1. Audit Before Upload: Review for confidential data
  2. Redact: Remove personal information, credentials
  3. Access Control: Use appropriate visibility settings
  4. Regular Reviews: Check for exposed sensitive data
What to Avoid Uploading:
  • Personally identifiable information (PII)
  • Passwords or API keys
  • Financial records (unless encrypted)
  • Confidential business strategies
  • Private customer data
Compliance Considerations:
  • GDPR: Ensure lawful processing of personal data
  • HIPAA: Do not upload protected health information
  • PCI DSS: Never upload payment card data
  • Industry-specific: Follow your sector’s regulations

9. Troubleshooting

Common Issues

Datasource Failed to Process

Problem: Status shows “Failed” after upload
Possible Causes:
  • Corrupted file
  • Unsupported format
  • File too large
  • Password-protected document
  • Network timeout
Solutions:
  1. Check File:
    • Open file on your computer
    • Verify it is not corrupted
    • Ensure file is not password-protected
  2. File Size:
    • Check file size (limit: typically 50MB per file)
    • For large files, split into smaller parts
    • Compress PDFs if possible
  3. Format:
    • Verify file extension matches actual format
    • Try converting to a different format
    • Use PDF for best compatibility
  4. Retry:
    • Click “Retry” button
    • Or delete and re-upload
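Two of the checks above (the typical 50 MB cap and "verify file extension matches actual format") can be automated before uploading. This is a local preflight sketch, not a ZappWay feature; the magic-byte signatures listed are standard for their formats.

```python
# Preflight checks before upload: size cap and extension-vs-content.
import os

SIGNATURES = {
    ".pdf": b"%PDF",
    ".docx": b"PK\x03\x04",  # docx/xlsx/pptx are ZIP containers
    ".xlsx": b"PK\x03\x04",
    ".pptx": b"PK\x03\x04",
}
MAX_BYTES = 50 * 1024 * 1024  # typical 50 MB per-file limit

def preflight(filename, data):
    """Return a list of problems found in the file's name and bytes."""
    problems = []
    if len(data) > MAX_BYTES:
        problems.append("file exceeds 50 MB")
    ext = os.path.splitext(filename)[1].lower()
    magic = SIGNATURES.get(ext)
    if magic and not data.startswith(magic):
        problems.append(f"content does not look like {ext}")
    return problems

print(preflight("report.pdf", b"%PDF-1.7 ..."))  # []
print(preflight("report.pdf", b"<html>..."))     # ['content does not look like .pdf']
```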

Website Crawl Failed

Problem: URL datasource shows error after crawling
Possible Causes:
  • Website blocks crawlers
  • Authentication required
  • JavaScript-heavy site
  • Server timeout
  • Invalid URL
Solutions:
  1. Check URL:
    • Verify URL is accessible in browser
    • Ensure no typos
    • Use direct page URL, not redirect
  2. Authentication:
    • Public pages only (no login required)
    • Contact site owner for crawler access
  3. Alternative Methods:
    • Copy content manually as text datasource
    • Use PDF print instead
    • Try different page from same site

AI Not Using Datastore Knowledge

Problem: AI responses do not reference datastore content
Checks:
  1. Connection:
    • Verify datastore is connected to AI Employee
    • Check connection from both sides
  2. Processing Status:
    • Ensure all datasources show “Ready”
    • Wait if still “Processing”
  3. Content Relevance:
    • Confirm question relates to datastore content
    • Try more specific queries
    • Check if information actually exists in datasources
  4. AI Configuration:
    • Verify datastore tool is enabled
    • Check AI Employee instructions do not prevent tool use
Testing: Ask direct questions about specific information you know is in the datastore.
Good test: “What is the return policy?” (if return policy is in datastore)
Bad test: “Tell me everything you know” (too vague)

Duplicate or Conflicting Information

Problem: AI gives inconsistent answers
Cause: Multiple datasources contain different versions of the same information
Solutions:
  1. Identify Duplicates:
    • Review datasources for overlapping content
    • Check upload dates to find old versions
  2. Clean Up:
    • Delete outdated datasources
    • Keep only the most recent version
    • Update descriptions to note version/date
  3. Version Control:
    • Include version numbers in datasource names
    • Example: “Product Manual v2.0 (Jan 2025)”
    • Remove old versions promptly

Slow Processing

Problem: Datasources stuck in “Processing” for an extended time
Expected Times:
  • Small text file: 10-30 seconds
  • PDF (10-50 pages): 1-3 minutes
  • Large PDF (100+ pages): 5-15 minutes
  • Website (single page): 30-60 seconds
  • Website (crawl 50 pages): 5-10 minutes
If Unusually Slow:
  1. Wait: Large files legitimately take time
  2. Refresh: Reload page to check updated status
  3. Check Status: Look for error messages
  4. Support: Contact support if stuck > 30 minutes

📊 Usage Limits

Limits by Plan

Typical Limits:
Plan       | Datastores | Datasources per Datastore | Storage
Free       | 1          | 10                        | 10 MB
Growth     | 5          | 50                        | 100 MB
Pro        | 20         | 200                       | 1 GB
Enterprise | 100        | 1000                      | 10 GB
Ultimate   | Unlimited  | Unlimited                 | 1 TB
Note: Limits may vary. Check your plan details in Settings → Billing.

Checking Usage

Location: Datastores page header
Information Displayed:
  • Current datastore count vs. limit
  • Total storage used vs. limit
  • Warning if approaching limits
Example Alert:
⚠️ Storage Limit Warning (9.2 GB / 10 GB)
You're approaching your storage limit. Consider upgrading or removing unused datasources.
[Upgrade Plan]

What Happens at Limit?

Datastores Limit Reached:
  • Cannot create new datastores
  • Can still add datasources to existing datastores
  • Upgrade prompt displayed
Storage Limit Reached:
  • Cannot upload new files
  • Can still add text datasources
  • Must delete datasources or upgrade
Datasource Limit Reached:
  • Cannot add more datasources to that datastore
  • Can create new datastores (if under datastore limit)
  • Can delete old datasources to free slots

🔗 Integration with AI Employees

Datastore Tool

When you connect a datastore to an AI Employee, the Datastore Tool is automatically enabled. This tool allows the AI to search and retrieve information from connected datastores.

How the AI Uses Datastores

Process:
  1. User asks question → AI Employee receives message
  2. AI determines if question requires datastore knowledge
  3. Tool is called → Datastore tool searches connected datastores
  4. Relevant chunks returned → Top matching content pieces
  5. AI synthesizes → Combines search results with its knowledge
  6. Response generated → Answer based on your data
Example Flow:
User: "What is the return policy?"

AI: [Searches datastore for "return policy"]

Datastore: [Returns relevant policy text]

AI: "According to our policy, you can return items within 30 days..."
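The six-step flow above can be reduced to a toy retrieval loop: score each chunk against the query by word overlap, take the best match, and hand it to the model. Real semantic search uses embeddings rather than overlap, and the answer step stands in for the LLM; this only illustrates the shape of the process.

```python
# Toy retrieval: rank chunks by query-word overlap, return the top hit.
import re

def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def search(chunks, query, top_k=1):
    q = tokens(query)
    return sorted(chunks, key=lambda c: -len(q & tokens(c)))[:top_k]

chunks = [
    "Return policy: items may be returned within 30 days of purchase.",
    "Shipping takes 3 to 5 business days.",
]
best = search(chunks, "What is the return policy?")[0]
print(best)  # the "Return policy" chunk wins on overlap
```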

Customizing Datastore Behavior

In AI Employee Settings: You can configure:
  • Search Threshold: How closely content must match query
  • Max Results: How many chunks to return
  • Priority: Which datastores to search first
  • Fallback: What to do if no relevant content found
Advanced Prompting: Include instructions in the AI Employee’s system prompt.

Example:
When answering questions, always check the datastore first. 
Only use your general knowledge if the datastore does not 
contain relevant information. Always cite the source document 
when referencing datastore content.

📞 Support & Resources

Getting Help

In-App Support:
  • Help button in dashboard
  • Live chat (available on Pro+ plans)
  • Documentation center

Feedback

Report Issues:
  • Use feedback button in dashboard
  • Email: [email protected]
  • Include:
    • Datastore ID
    • Datasource name
    • Error message (if any)
    • Steps to reproduce
Feature Requests:
  • Submit via in-app feedback
  • Community forum
  • Feature voting board

✅ Quick Reference

Essential Actions

Task              | Location             | Action
Create Datastore  | Datastores page      | “Create Datastore”
Add File          | Datastore detail     | “Add Datasource” → “Upload”
Add URL           | Datastore detail     | “Add Datasource” → “Website”
Connect to AI     | AI Employee settings | Knowledge selector
View Contents     | Datastore card       | Click name
Delete Datasource | Datasource card      | Delete button
Reprocess         | Datasource actions   | “Reprocess”

Processing Status Guide

🔵 Uploading    → File transfer in progress
🟡 Processing   → Extracting content
🟠 Indexing     → Creating embeddings
🟢 Ready        → Available for use
🔴 Failed       → Error occurred

Best File Formats

Priority Order:
  1. PDF - Best for documents
  2. .docx - Good for text documents
  3. .txt/.md - Simple text content
  4. .csv - Structured data
  5. URL - Web content

Last Updated: January 2025
Version: 1.0
Platform: ZappWay Datastores