RowChunk 1.0.0¶
Overview¶
Available Versions: 2.0.0 | 1.0.0 (current)
Description¶
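Splits a table document into one chunk per row. The input can be a CSV or XLSX file, a document that is converted to a table using document intelligence, or a markdown table string. The output is a list of chunks, each containing one row of the table serialized as JSON.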
Configuration Options¶
No configuration options available.
Inputs¶
| Name | Data Type | Description |
|---|---|---|
| file | File or str | Table data source. Accepts File objects (CSV, XLSX, or documents) and markdown table strings. Documents are converted to tables using document intelligence. |
Outputs¶
| Name | Data Type | Description |
|---|---|---|
| chunks | list[Chunk] | List of chunk objects, one per table row. Each chunk contains JSON-serialized row data with column names as keys and cell values as values. |
Version History¶
- 2.0.0 - Native implementation
- 1.0.0 (Current) - Native implementation
Examples¶
blocks:
  - name: process_customer_data
    type: RowChunk
    input:
      file: "customer_records.csv"

# Input CSV structure:
# customer_id,name,email,signup_date,total_orders
# cust_001,John Doe,john@example.com,2024-01-15,5
# cust_002,Sarah Smith,sarah@example.com,2024-02-20,12
# cust_003,Mike Johnson,mike@example.com,2024-03-10,3

# Output: 3 chunks with JSON content like:
# Chunk 0: {"customer_id": "cust_001", "name": "John Doe", "email": "john@example.com", "signup_date": "2024-01-15", "total_orders": 5}
# Chunk 1: {"customer_id": "cust_002", "name": "Sarah Smith", "email": "sarah@example.com", "signup_date": "2024-02-20", "total_orders": 12}
# Chunk 2: {"customer_id": "cust_003", "name": "Mike Johnson", "email": "mike@example.com", "signup_date": "2024-03-10", "total_orders": 3}
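Because each chunk carries its row as JSON, downstream code can decode it back into a dictionary keyed by column name. The sketch below is illustrative only: this page does not document how a Chunk object exposes its content, so plain strings copied from the example output stand in for it, and `parse_rows` is a hypothetical helper name.

```python
import json

# JSON contents copied from the example output above. Plain strings are used
# because the attribute that holds a Chunk's content is not documented here.
chunk_contents = [
    '{"customer_id": "cust_001", "name": "John Doe", "email": "john@example.com", "signup_date": "2024-01-15", "total_orders": 5}',
    '{"customer_id": "cust_002", "name": "Sarah Smith", "email": "sarah@example.com", "signup_date": "2024-02-20", "total_orders": 12}',
    '{"customer_id": "cust_003", "name": "Mike Johnson", "email": "mike@example.com", "signup_date": "2024-03-10", "total_orders": 3}',
]


def parse_rows(contents):
    """Decode each JSON-serialized row into a dict keyed by column name."""
    return [json.loads(content) for content in contents]


rows = parse_rows(chunk_contents)

# Numeric cells come back as numbers, so they can be aggregated directly.
total_orders = sum(row["total_orders"] for row in rows)
print(total_orders)
```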
Error Handling¶
Unsupported File Type
- Error Code: BlockError
- Common Cause: File extension is not supported for table processing
- Solution: Use supported formats: CSV, XLSX, or document formats (PDF, DOCX, images) that can be converted to tables
Table Parse Error
- Error Code: ValueError
- Common Cause: Invalid table structure or corrupt file data that cannot be parsed
- Solution: Ensure the file contains valid table data with a consistent column structure and a readable format
Document Intelligence Error
- Error Code: ServiceError
- Common Cause: Document intelligence service fails to convert the document to markdown table format
- Solution: Verify that the document intelligence service is available and that the document contains recognizable table structures
FAQ¶
What file formats are supported for table processing?
CSV and XLSX files are supported directly. Document formats (PDF, DOCX, PPTX, images) are converted to markdown tables using document intelligence. Markdown table strings can also be processed directly.
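To make the markdown path concrete, here is a minimal sketch of how a pipe-delimited markdown table could map to one JSON string per row. It is not the block's parser: `markdown_table_to_rows` is a hypothetical helper, it does not handle escaped pipes, and it leaves every cell as a string, whereas the block may infer types.

```python
import json


def split_cells(line):
    """Split one markdown table line into trimmed cell values."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def markdown_table_to_rows(table):
    """Parse a simple markdown table into one dict per data row."""
    lines = [line for line in table.strip().splitlines() if line.strip()]
    header = split_cells(lines[0])
    # lines[1] is the |---|---| separator, so data rows start at index 2.
    return [dict(zip(header, split_cells(line))) for line in lines[2:]]


table = """
| sku | qty |
|---|---|
| A1 | 2 |
| B7 | 5 |
"""

chunks = [json.dumps(row) for row in markdown_table_to_rows(table)]
for chunk in chunks:
    print(chunk)
```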
How does the block handle mixed data types in columns?
The block automatically converts columns with mixed data types to strings to ensure consistent JSON serialization. This prevents data type conflicts while preserving the original values as readable text.
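The effect of this conversion can be sketched in plain Python. How the block detects a mixed column is not documented here, so the rule below (more than one Python type among the non-null values) is an assumption, as is the choice to leave nulls untouched; `normalize_mixed_columns` is a hypothetical name.

```python
import json


def normalize_mixed_columns(rows):
    """Stringify all non-null values in columns that hold more than one type."""
    columns = list(rows[0].keys()) if rows else []
    mixed = set()
    for col in columns:
        types = {type(row[col]) for row in rows if row[col] is not None}
        if len(types) > 1:
            mixed.add(col)
    return [
        {
            col: str(val) if col in mixed and val is not None else val
            for col, val in row.items()
        }
        for row in rows
    ]


rows = [
    {"id": 1, "code": 100},
    {"id": 2, "code": "A-7"},
]

normalized = normalize_mixed_columns(rows)
# "id" holds only integers and is left alone; "code" mixes int and str,
# so both of its values become strings.
print(json.dumps(normalized[0]))
print(json.dumps(normalized[1]))
```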
What happens with empty or null values in table cells?
Empty rows (where all cells are null/NaN) are skipped entirely. Individual null cells are preserved as null values in the JSON output, maintaining the original data structure.
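A small sketch of that rule, using `None` as a stand-in for null/NaN cells. `rows_to_chunks` is a hypothetical helper, not the block's code.

```python
import json


def rows_to_chunks(rows):
    """Serialize rows to JSON, skipping rows in which every cell is None."""
    chunks = []
    for row in rows:
        if all(value is None for value in row.values()):
            continue  # fully empty row: no chunk is produced
        chunks.append(json.dumps(row))  # None is written as JSON null
    return chunks


rows = [
    {"name": "Ada", "email": None},   # one null cell: kept
    {"name": None, "email": None},    # all cells null: skipped
    {"name": "Lin", "email": "lin@example.com"},
]

chunks = rows_to_chunks(rows)
for chunk in chunks:
    print(chunk)
```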
How are Excel files with multiple sheets handled?
All sheets in the Excel file are processed sequentially. Each row from every sheet becomes a separate chunk, with the chunk index continuing across sheets rather than resetting.
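The continuing index can be pictured as a single counter shared by all sheets. In this sketch a dict of lists stands in for the workbook, and the sheet name is carried along only for illustration; whether a chunk records its source sheet is not stated here.

```python
def index_rows_across_sheets(sheets):
    """Yield (chunk_index, sheet_name, row) using one running index."""
    index = 0
    for sheet_name, rows in sheets.items():
        for row in rows:
            yield index, sheet_name, row
            index += 1  # the counter is never reset between sheets


# Hypothetical workbook: sheet name -> list of row dicts, in sheet order.
sheets = {
    "Q1": [{"region": "north", "sales": 10}, {"region": "south", "sales": 7}],
    "Q2": [{"region": "north", "sales": 12}],
}

indexed = list(index_rows_across_sheets(sheets))
for chunk_index, sheet_name, row in indexed:
    print(chunk_index, sheet_name, row)
```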
Can this block handle large tables efficiently?
The block iterates over the table row by row using pandas iterrows(), emitting one chunk per row. iterrows() is comparatively slow on large tables, so very large files may benefit from being split into smaller files before they are passed to this block.
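One way to pre-process a very large CSV before it reaches this block is to split it into smaller parts that each repeat the header row. The sketch below uses only the standard library and works on in-memory text for brevity; `split_csv` and `rows_per_part` are hypothetical names, and a real script would stream from and to files instead.

```python
import csv
import io


def write_part(header, rows):
    """Render a header plus a batch of rows as CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()


def split_csv(text, rows_per_part):
    """Split CSV text into parts of at most rows_per_part data rows."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    parts, batch = [], []
    for row in reader:
        batch.append(row)
        if len(batch) == rows_per_part:
            parts.append(write_part(header, batch))
            batch = []
    if batch:  # leftover rows that did not fill a whole part
        parts.append(write_part(header, batch))
    return parts


text = "id,name\n1,a\n2,b\n3,c\n"
parts = split_csv(text, rows_per_part=2)
print(len(parts))
```

Each part is a valid CSV on its own, so it can be passed to the block as a separate file.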