RowChunk 1.0.0

Overview

Function Beginner

Available Versions: 2.0.0 | 1.0.0 (current)

Description

Splits a table document into one chunk per row. The input can be a CSV, XLSX, or a markdown table. The output is a list of chunks, each containing a row of the table.

Configuration Options

No configuration options available.

Inputs

Name | Data Type | Description
file | File or str | Table data source. Supports File objects for CSV/XLSX/documents, or markdown table strings. Documents are converted to tables using document intelligence.

Outputs

Name | Data Type | Description
chunks | list[Chunk] | Array of chunk objects, one per table row. Each chunk contains JSON-serialized row data with column names as keys and cell values as values.

Version History

  • 2.0.0 - Native implementation
  • 1.0.0 (Current) - Native implementation

Examples

blocks:
  - name: process_customer_data
    type: RowChunk
    input:
      file: "customer_records.csv"

# Input CSV structure:
# customer_id,name,email,signup_date,total_orders
# cust_001,John Doe,john@example.com,2024-01-15,5
# cust_002,Sarah Smith,sarah@example.com,2024-02-20,12
# cust_003,Mike Johnson,mike@example.com,2024-03-10,3

# Output: 3 chunks with JSON content like:
# Chunk 0: {"customer_id": "cust_001", "name": "John Doe", "email": "john@example.com", "signup_date": "2024-01-15", "total_orders": 5}
# Chunk 1: {"customer_id": "cust_002", "name": "Sarah Smith", "email": "sarah@example.com", "signup_date": "2024-02-20", "total_orders": 12}
# Chunk 2: {"customer_id": "cust_003", "name": "Mike Johnson", "email": "mike@example.com", "signup_date": "2024-03-10", "total_orders": 3}

Error Handling

Unsupported File Type

Error Code: BlockError
Common Cause: File extension is not supported for table processing
Solution: Use supported formats: CSV, XLSX, or document formats (PDF, DOCX, images) that can be converted to tables

Table Parse Error

Error Code: ValueError
Common Cause: Invalid table structure or corrupt file data that cannot be parsed
Solution: Ensure file contains valid table data with consistent column structure and readable format

Document Intelligence Error

Error Code: ServiceError
Common Cause: Document intelligence service fails to convert document to markdown table format
Solution: Verify document intelligence service is available and document contains recognizable table structures

FAQ

What file formats are supported for table processing?

CSV and XLSX files are supported directly. Document formats (PDF, DOCX, PPTX, images) are first converted to markdown tables using document intelligence. Markdown table strings can also be passed in and processed directly.

How does the block handle mixed data types in columns?

The block automatically converts columns with mixed data types to strings to ensure consistent JSON serialization. This prevents data type conflicts while preserving the original values as readable text.
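
To make the mixed-type rule concrete, here is a small sketch in plain Python. The block's exact detection rule is not documented here, so this version assumes that a column counts as mixed when its non-null values have more than one Python type; `stringify_mixed_columns` is a hypothetical helper name.

```python
import json


def stringify_mixed_columns(rows):
    """Convert every non-null value in a column to str if the column mixes types."""
    columns = {key for row in rows for key in row}
    mixed = set()
    for col in columns:
        # Collect the distinct types seen in this column, ignoring nulls.
        kinds = {type(row[col]) for row in rows if row.get(col) is not None}
        if len(kinds) > 1:
            mixed.add(col)
    return [
        {
            col: str(val) if col in mixed and val is not None else val
            for col, val in row.items()
        }
        for row in rows
    ]


rows = [{"sku": 1001, "qty": 2}, {"sku": "A-17", "qty": 5}]
normalized = stringify_mixed_columns(rows)
print(json.dumps(normalized[0]))
```

Here `sku` holds both an integer and a string, so both values become strings, while `qty` stays numeric.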

What happens with empty or null values in table cells?

Empty rows (where all cells are null/NaN) are skipped entirely. Individual null cells are preserved as null values in the JSON output, maintaining the original data structure.
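
The two null-handling rules can be sketched as follows. This is an illustration in plain Python rather than the block's own code: `is_missing` and `chunk_non_empty_rows` are hypothetical names, and rows are modeled as dictionaries where a missing cell is either `None` or a float NaN.

```python
import json
import math


def is_missing(value):
    """Treat None and float NaN as missing cells."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def chunk_non_empty_rows(rows):
    """Skip rows where every cell is missing; keep single missing cells as null."""
    chunks = []
    for row in rows:
        if all(is_missing(v) for v in row.values()):
            continue  # fully empty row: no chunk is produced
        cleaned = {k: (None if is_missing(v) else v) for k, v in row.items()}
        chunks.append(json.dumps(cleaned))
    return chunks


rows = [
    {"id": 1, "note": None},
    {"id": None, "note": float("nan")},
    {"id": 2, "note": "ok"},
]
chunks = chunk_non_empty_rows(rows)
print(chunks)
```

The middle row is dropped because all of its cells are missing, while the `None` in the first row is serialized as JSON `null`.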

How are Excel files with multiple sheets handled?

All sheets in the Excel file are processed sequentially. Each row from every sheet becomes a separate chunk, with the chunk index continuing across sheets rather than resetting.
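
A continuing chunk index across sheets might look like the sketch below. The workbook is modeled as a dictionary mapping sheet names to lists of row dictionaries (the same shape that `pandas.read_excel(path, sheet_name=None)` yields as DataFrames); whether the block reads sheets that way is an assumption, and `chunk_workbook` along with the `index` and `content` field names are hypothetical.

```python
import json


def chunk_workbook(sheets):
    """Produce one chunk per row, numbering chunks continuously across sheets."""
    chunks = []
    for rows in sheets.values():  # dicts preserve insertion (sheet) order
        for row in rows:
            # len(chunks) is the next free index, so it never resets per sheet.
            chunks.append({"index": len(chunks), "content": json.dumps(row)})
    return chunks


workbook = {
    "Q1": [{"region": "north", "sales": 10}, {"region": "south", "sales": 7}],
    "Q2": [{"region": "north", "sales": 12}],
}
chunks = chunk_workbook(workbook)
print([c["index"] for c in chunks])
```

The first row of the second sheet receives index 2 rather than 0, which is the behavior described above.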

Can this block handle large tables efficiently?

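The block iterates over the table with pandas iterrows() and emits one chunk per row. Because iterrows() operates on a DataFrame, the table is loaded into memory before iteration, and iterrows() itself is slow on large tables. Very large files may therefore benefit from being split into smaller files before they are passed to this block.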
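
One way to pre-process a large CSV is to split it into smaller pieces that each repeat the header row, so every piece is a valid table on its own. The sketch below uses only the standard library; `iter_csv_batches` and the batch size are hypothetical choices, not part of the block.

```python
import csv
import io
from itertools import islice


def iter_csv_batches(lines, batch_size):
    """Yield CSV text batches of at most batch_size rows, each with the header."""
    reader = csv.reader(lines)
    header = next(reader, None)
    if header is None:
        return  # empty input: nothing to yield
    while True:
        batch = list(islice(reader, batch_size))
        if not batch:
            break
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(batch)
        yield buffer.getvalue()


source = io.StringIO("id,name\n1,a\n2,b\n3,c\n")
batches = list(iter_csv_batches(source, 2))
print(len(batches))
```

Because `csv.reader` accepts any iterable of lines, an open file handle can be passed in place of the `StringIO` object, and only one batch is held in memory at a time.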