DocumentCreator 1.0.0
Overview
Description
Converts a string (Markdown, HTML, or another supported format) to a specified document format (Word, PDF, ODT, EPUB, etc.) using Pandoc and saves the result to blob storage. Alternatively, retrieves an existing file from blob storage, converts it to the specified document format using Pandoc, and saves the converted file back to blob storage.
Configuration Options
| Name | Data Type | Description | Default Value |
|---|---|---|---|
| output_file_type | FileType | Target document format for conversion. Supported formats: DOCX, PDF, ODT, EPUB, HTML, RTF, JSON, PPTX, MARKDOWN. PDF conversion uses the XeLaTeX engine for advanced formatting. | FileType.DOCX |
| title | str | Output filename (without extension) for the converted document. Used as the base name when saving to blob storage. | Document_Title |
Inputs
| Name | Data Type | Description |
|---|---|---|
| content | Union[str, File] | Input content for conversion. Can be a string containing Markdown, HTML, or other supported markup formats, OR an existing File object from blob storage (supports PDF, DOCX, XLSX, PPTX, EML, ICS, TXT, CSV, TSV, JSON, HTML formats). |
Outputs
| Name | Data Type | Description |
|---|---|---|
| result | File | Converted document file saved in blob storage. Contains file metadata (id, name, size, content_type) and can be used as input for other file processing blocks. |
Examples
# Convert Markdown text to Word document
- id: create_word_doc
  uses: DocumentCreator@1.0.0
  with:
    title: "Sales Report Q4"
    output_file_type: "docx"
    content: |
      # Q4 Sales Report

      ## Executive Summary

      Sales increased by 23% compared to Q3, reaching $2.4M total revenue.

      ## Key Metrics

      - New customers: 450
      - Retention rate: 87%
      - Average deal size: $5,300

      ## Next Steps

      Focus on enterprise accounts and product expansion.
  outputs:
    result: sales_report_docx
Error Handling
PandocConversionError
- Error Code: pandoc_conversion_failed
- Common Cause: Pandoc cannot convert the input format to the target format because of unsupported features or malformed input.
- Solution: Check input format compatibility, ensure the content is valid markup, and verify that Pandoc supports the conversion path.
BlobStorageError
- Error Code: blob_save_failed
- Common Cause: The converted file could not be saved to blob storage because of storage limits or connection issues.
- Solution: Verify blob storage connectivity, check storage quotas, and ensure the blob service is configured correctly.
FileProcessingError
- Error Code: file_processing_failed
- Common Cause: The input file is corrupted, empty, or in a format that is not supported for extraction or conversion.
- Solution: Verify input file integrity, check that the file format is supported, and ensure the file is not password-protected or corrupted.
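Of the three error codes above, only blob_save_failed is described as possibly caused by connection issues, so it is the one most likely to succeed on a retry. The following is a minimal sketch of that idea, not the platform's own error-handling API: the `BlockError` class, the `run_with_retry` helper, and the choice of which codes to retry are all assumptions made for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class BlockError(Exception):
    """Hypothetical exception carrying one of the error codes listed above."""

    def __init__(self, code: str, message: str = ""):
        super().__init__(message or code)
        self.code = code


# Assumption: only storage failures are treated as transient. Conversion and
# file-processing failures usually need a fix to the input, not a retry.
RETRYABLE_CODES = {"blob_save_failed"}


def run_with_retry(
    step: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call step(), retrying with exponential backoff on retryable codes."""
    for attempt in range(1, attempts + 1):
        try:
            return step()
        except BlockError as err:
            # Re-raise immediately for non-retryable codes or on the last attempt.
            if err.code not in RETRYABLE_CODES or attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")
```

The `sleep` parameter is injectable so the backoff can be exercised without real delays. A pandoc_conversion_failed or file_processing_failed error propagates on the first attempt, since the documented solutions for those involve correcting the input.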
FAQ
What input formats are supported for string content?
The block auto-detects the format from the content: HTML if the content contains HTML tags, Markdown if it starts with #, *, or -, and reStructuredText if it starts with "..". Plain text that matches none of these defaults to Markdown. Supported formats include all Pandoc input formats: Markdown variants, HTML, RST, LaTeX, DocBook, MediaWiki, Textile, and more.
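The detection rules described above can be sketched as a small function. This is an illustration only: the block's actual implementation is not documented here, so the regular expression, the order of the checks, and the name `detect_format` are assumptions.

```python
import re

# Assumption: an "HTML tag" is anything shaped like <name ...> or </name>.
_HTML_TAG = re.compile(r"</?[A-Za-z][^>]*>")


def detect_format(text: str) -> str:
    """Guess a Pandoc input format name from the start and shape of the text."""
    stripped = text.lstrip()
    if _HTML_TAG.search(stripped):
        return "html"
    if stripped.startswith(".."):
        return "rst"
    if stripped.startswith(("#", "*", "-")):
        return "markdown"
    # Plain text with no recognizable marker falls back to Markdown.
    return "markdown"
```

With the HTML check first, a Markdown document that embeds an inline tag such as `<br>` would be classified as HTML under this sketch. If the block behaves the same way, keeping raw HTML out of Markdown input avoids that ambiguity.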
What file formats can be converted from existing files?
Supported input files: PDF, DOCX, XLSX, PPTX (via Document Intelligence), EML email and ICS calendar files, TXT/CSV/TSV text files, and JSON and HTML files. All are converted to Markdown internally before the final format conversion.
Why use XeLaTeX for PDF conversion?
The XeLaTeX engine accepts UTF-8 input natively and can use system OpenType and TrueType fonts, which gives it better Unicode support, typography, and handling of complex layouts than pdfTeX. The result is high-quality PDF output with proper font rendering and support for international characters.
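For reference, selecting XeLaTeX on the Pandoc command line is done with the `--pdf-engine` option. The sketch below builds such a command in Python. It assumes Pandoc and a XeLaTeX installation are available on the machine, and the helper names `build_pandoc_args` and `convert` are hypothetical. It is not a description of how the block invokes Pandoc internally.

```python
import subprocess
from typing import List, Optional


def build_pandoc_args(
    input_format: str, output_path: str, mainfont: Optional[str] = None
) -> List[str]:
    """Build a pandoc argument list; PDF targets get the XeLaTeX engine."""
    args = ["pandoc", "--from", input_format, "--output", output_path]
    if output_path.lower().endswith(".pdf"):
        args.append("--pdf-engine=xelatex")
        if mainfont:
            # XeLaTeX can use an installed system font through the mainfont variable.
            args += ["--variable", f"mainfont={mainfont}"]
    return args


def convert(text: str, input_format: str, output_path: str) -> None:
    """Run pandoc, feeding the source text on standard input."""
    subprocess.run(
        build_pandoc_args(input_format, output_path),
        input=text.encode("utf-8"),
        check=True,
    )
```

Pandoc infers the output format from the file extension given to `--output`, so no explicit `--to` is needed in this sketch.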
How do I handle large files or batch conversions?
The block processes one file at a time. For large files, ensure adequate blob storage space. For batch processing, use multiple DocumentCreator blocks in parallel or implement a loop pattern with the file collection. Monitor memory usage for very large documents.
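The parallel pattern mentioned above can be sketched in plain Python. This is a generic illustration rather than the platform's workflow syntax: `convert_one` is a hypothetical stand-in for whatever call triggers a single DocumentCreator conversion, and `convert_all` is an assumed helper name.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Tuple


def convert_all(
    items: Iterable[str],
    convert_one: Callable[[str], str],
    max_workers: int = 4,
) -> Tuple[Dict[str, str], List[Tuple[str, str]]]:
    """Convert each item independently; collect results and failures separately."""
    results: Dict[str, str] = {}
    failures: List[Tuple[str, str]] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One submission per item, mirroring "one file at a time" per block run.
        submitted = [(item, pool.submit(convert_one, item)) for item in items]
    # Leaving the with-block waits for all conversions to finish.
    for item, future in submitted:
        try:
            results[item] = future.result()
        except Exception as exc:  # one bad file should not abort the batch
            failures.append((item, str(exc)))
    return results, failures
```

Capping `max_workers` limits how many documents are held in memory at once, which matters for the very large documents mentioned above.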
Can I preserve formatting when converting between formats?
Formatting preservation depends on target format capabilities. DOCX to PDF preserves most formatting. HTML to DOCX maintains structure and basic styling. Some advanced formatting may be lost in cross-format conversions due to format limitations.