Overview
PDF is one of the most versatile formats for training your AI. Upload reports, ebooks, scanned documents, and more.
Supported Types
- Text PDFs: Native digital PDFs
- Scanned PDFs: OCR text extraction
- Forms: Extract form fields and values
- Multi-page: No limit on page count (OCR page limits depend on your plan; see Limits)
Quick Upload
1. Prepare PDF: Ensure the PDF is not password-protected.
2. Upload: Go to Datastore → Add Datasource → File Upload.
3. OCR: Scanned PDFs are processed with OCR automatically.
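Because password-protected PDFs need to be unlocked before upload, it can help to screen files locally first. The following is a minimal sketch, not part of the product: the function names are hypothetical, and the check is a rough heuristic that looks for an `/Encrypt` entry in the raw file bytes. It can report a false positive if that literal text appears elsewhere in the file, so a dedicated PDF library is the better choice when a definitive answer is needed.

```python
from pathlib import Path


def has_encrypt_marker(data: bytes) -> bool:
    """Heuristic: encrypted PDFs reference an /Encrypt dictionary in the trailer."""
    return b"/Encrypt" in data


def looks_password_protected(pdf_path: str) -> bool:
    """Read the file from disk and apply the heuristic above (hypothetical helper)."""
    return has_encrypt_marker(Path(pdf_path).read_bytes())
```

Files flagged by this check should be opened and unlocked in a PDF tool before they are uploaded.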
Features
Smart Text Extraction
- Maintains document structure
- Preserves tables and lists
- Extracts hyperlinks
- Reads headers and footers
OCR for Scanned Documents
- Converts images to searchable text
- Supports 50+ languages
- Processes handwriting (with limitations)
Best Practices
- ✅ High-Quality Scans: 300 DPI or higher
- ✅ Remove Passwords: Unlock PDFs before uploading
- ✅ Merge Related PDFs: Combine chapters/sections
- ✅ Bookmarks: Use for long documents
Use Cases
- 📄 Contracts & Legal Docs
- 📖 Ebooks & Guides
- 📊 Annual Reports
- 🎓 Research Papers
Troubleshooting
OCR Quality Issues:
- Use high-resolution scans
- Ensure good lighting
- Avoid skewed pages
File Too Large:
- Compress PDF (reduce image quality)
- Split into smaller files
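To check the scan-resolution advice against the 300 DPI guideline from Best Practices, the effective resolution of a scanned page can be estimated from its pixel dimensions and the physical page size. This is a small sketch under stated assumptions: the function names are hypothetical, and the default page size is US Letter (8.5 × 11 inches), so pass different dimensions for A4 or other formats.

```python
def effective_dpi(pixel_width: int, pixel_height: int,
                  page_width_in: float = 8.5,
                  page_height_in: float = 11.0) -> float:
    """Return the lower of the horizontal and vertical resolution, in dots per inch."""
    return min(pixel_width / page_width_in, pixel_height / page_height_in)


def meets_scan_guideline(pixel_width: int, pixel_height: int,
                         min_dpi: float = 300.0) -> bool:
    """True if a Letter-size scan reaches the 300 DPI guideline (assumed threshold)."""
    return effective_dpi(pixel_width, pixel_height) >= min_dpi


print(effective_dpi(2550, 3300))         # 300.0
print(meets_scan_guideline(1275, 1650))  # False (about 150 DPI)
```

Note that compressing a PDF by reducing image quality lowers the effective resolution, so it is worth rechecking scans after compression.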
Limits
| Plan | Max Size | OCR Pages |
|---|---|---|
| Free | 10 MB | 50 |
| Professional | 50 MB | Unlimited |
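The limits in the table can be checked locally before uploading. The sketch below is illustrative only: the helper names are hypothetical, the values are copied from the table above, and it assumes 1 MB means 1024 × 1024 bytes, which the table does not specify.

```python
import os
from typing import Optional

# Values copied from the Limits table; None means "Unlimited".
MAX_SIZE_MB = {"Free": 10, "Professional": 50}
MAX_OCR_PAGES: dict[str, Optional[int]] = {"Free": 50, "Professional": None}

BYTES_PER_MB = 1024 * 1024  # assumption: binary megabytes


def fits_size_limit(size_bytes: int, plan: str) -> bool:
    """True if a file of this size is within the plan's maximum file size."""
    return size_bytes <= MAX_SIZE_MB[plan] * BYTES_PER_MB


def within_ocr_pages(page_count: int, plan: str) -> bool:
    """True if the number of scanned pages is within the plan's OCR page limit."""
    limit = MAX_OCR_PAGES[plan]
    return limit is None or page_count <= limit


def file_fits_plan(path: str, plan: str) -> bool:
    """Check a file on disk against the plan's size limit."""
    return fits_size_limit(os.path.getsize(path), plan)
```

For example, `file_fits_plan("report.pdf", "Free")` returns `False` for a 12 MB file, which signals that the file should be compressed or split first.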

