Overview

PDF is one of the most versatile formats for training your AI. Upload reports, ebooks, scanned documents, and more.

Supported Types

  • Text PDFs: Native digital PDFs
  • Scanned PDFs: OCR text extraction
  • Forms: Extract form fields and values
  • Multi-page: Unlimited pages (OCR page caps apply by plan; see Limits)

Quick Upload

1. Prepare PDF
   Ensure the PDF is not password-protected.

2. Upload
   Go to Datastore → Add Datasource → File Upload.

3. OCR
   Scanned PDFs are automatically processed with OCR.
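
The "Prepare PDF" step can be checked locally before uploading. The sketch below is a heuristic only: it scans the raw bytes for the /Encrypt key that encrypted PDFs carry in their trailer. It is not how the platform itself detects protection, the function names (looks_like_pdf, looks_encrypted, preflight) are hypothetical, and a dedicated PDF library would give a more reliable answer.

```python
from pathlib import Path


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the PDF header marker appears near the start of the file."""
    return b"%PDF-" in data[:1024]


def looks_encrypted(data: bytes) -> bool:
    """Heuristic: encrypted PDFs reference an /Encrypt dictionary in the trailer.

    This can give a false positive if the literal bytes "/Encrypt" happen to
    appear elsewhere in an unencrypted file.
    """
    return b"/Encrypt" in data


def preflight(path: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issue was found."""
    data = Path(path).read_bytes()
    problems = []
    if not looks_like_pdf(data):
        problems.append("not a PDF (missing %PDF- header)")
    elif looks_encrypted(data):
        problems.append("appears to be password-protected or encrypted")
    return problems
```

If preflight returns a non-empty list, unlock or re-export the file before uploading it.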

Features

Smart Text Extraction

  • Maintains document structure
  • Preserves tables and lists
  • Extracts hyperlinks
  • Reads headers and footers

OCR for Scanned Documents

  • Converts images to searchable text
  • Supports 50+ languages
  • Processes handwriting (with limitations)

Best Practices

  • ✅ High-Quality Scans: 300 DPI or higher
  • ✅ Remove Passwords: Unlock PDFs before uploading
  • ✅ Merge Related PDFs: Combine chapters/sections
  • ✅ Bookmarks: Use for long documents
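
To see whether a scan meets the 300 DPI recommendation, divide the image width in pixels by the physical page width in inches. The helper below is a minimal sketch of that arithmetic; the function names and the choice to round to the nearest whole DPI are assumptions made for illustration.

```python
MM_PER_INCH = 25.4


def effective_dpi(pixels: int, page_mm: float) -> float:
    """Dots per inch for a scan that is `pixels` wide across a page `page_mm` wide."""
    if pixels <= 0 or page_mm <= 0:
        raise ValueError("pixels and page_mm must be positive")
    return pixels * MM_PER_INCH / page_mm


def meets_recommendation(pixels: int, page_mm: float, minimum: int = 300) -> bool:
    """Compare against the recommended minimum, rounding to the nearest whole DPI.

    Rounding avoids rejecting scans that are nominally 300 DPI but compute to
    slightly less because pixel counts are whole numbers.
    """
    return round(effective_dpi(pixels, page_mm)) >= minimum
```

For example, an A4 page (210 mm wide) scanned at 2480 pixels across passes the check, while the same page at 1240 pixels across works out to about 150 DPI and does not.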

Use Cases

  • 📄 Contracts & Legal Docs
  • 📖 Ebooks & Guides
  • 📊 Annual Reports
  • 🎓 Research Papers

Troubleshooting

OCR Quality Issues:
  • Use high-resolution scans
  • Ensure good lighting
  • Avoid skewed pages

Large File Sizes:
  • Compress PDF (reduce image quality)
  • Split into smaller files
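
When splitting, it helps to estimate page ranges up front. The sketch below plans the split under a loud assumption: that every page contributes roughly the same number of bytes, which is rarely exact for PDFs that mix text and images. The function name plan_split is hypothetical, and the actual splitting still has to be done in a PDF tool of your choice.

```python
def plan_split(total_pages: int, size_bytes: int, max_bytes: int) -> list[tuple[int, int]]:
    """Return 1-based inclusive (first_page, last_page) ranges for each part.

    Assumes a uniform size per page, so treat the result as a starting point
    and re-check the size of each part after splitting.
    """
    if total_pages < 1 or size_bytes < 0 or max_bytes < 1:
        raise ValueError("total_pages and max_bytes must be >= 1, size_bytes >= 0")
    if size_bytes == 0:
        return [(1, total_pages)]

    # Largest page count whose estimated size stays within max_bytes (at least 1).
    pages_per_part = max(1, (max_bytes * total_pages) // size_bytes)
    pages_per_part = min(pages_per_part, total_pages)

    ranges = []
    start = 1
    while start <= total_pages:
        end = min(start + pages_per_part - 1, total_pages)
        ranges.append((start, end))
        start = end + 1
    return ranges
```

A 120-page, 25 MB file planned against a 10 MB cap yields three parts: pages 1 to 48, 49 to 96, and 97 to 120.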

Limits

Plan          Max Size  OCR Pages
Free          10 MB     50
Professional  50 MB     Unlimited
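
The limits above can be checked before uploading. The following is a minimal sketch, not an official client: PLAN_LIMITS and check_limits are hypothetical names, 1 MB is assumed to mean 1,000,000 bytes (the conservative reading), and the OCR page cap is assumed to apply to the upload being checked, since the table does not say how it is counted.

```python
MB = 1_000_000  # assumption: decimal megabytes; the table does not specify

# Values copied from the Limits table; None means unlimited.
PLAN_LIMITS = {
    "Free": {"max_size_mb": 10, "ocr_pages": 50},
    "Professional": {"max_size_mb": 50, "ocr_pages": None},
}


def check_limits(plan: str, size_bytes: int, ocr_pages: int = 0) -> list[str]:
    """Return a list of limit violations; an empty list means the upload fits the plan."""
    limits = PLAN_LIMITS[plan]  # raises KeyError for an unknown plan name
    problems = []

    if size_bytes > limits["max_size_mb"] * MB:
        problems.append(
            f"file is {size_bytes / MB:.1f} MB; {plan} plan allows {limits['max_size_mb']} MB"
        )

    cap = limits["ocr_pages"]
    if cap is not None and ocr_pages > cap:
        problems.append(f"{ocr_pages} scanned pages; {plan} plan allows {cap} OCR pages")

    return problems
```

For a 12 MB scan with 80 pages, check_limits("Free", 12_000_000, 80) reports both a size and an OCR page violation, while the same file on "Professional" returns an empty list.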