RowChunk 1.0.0

Overview

Function Beginner

Available Versions: 2.0.0 | 1.0.0 (current)

Description

Splits a table document into one chunk per row. The input can be a CSV file, an XLSX file, a markdown table string, or a document that is converted to a table using document intelligence. The output is a list of chunks, each containing one row of the table.

Configuration Options

No configuration options available.

Inputs

Name	Data Type	Description
file	`File or str`	Table data source. Accepts File objects for CSV, XLSX, and document files, or a markdown table string. Documents are converted to tables using document intelligence.

Outputs

Name	Data Type	Description
chunks	`list[Chunk]`	List of chunk objects, one per table row. Each chunk contains JSON-serialized row data with column names as keys and cell values as values.
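
Because each chunk carries its row as a JSON string, downstream code can recover the row with any JSON parser. The following is a minimal sketch, not the platform's API: the `Chunk` class below is a hypothetical stand-in, and its `content` field name is an assumption, since the real `Chunk` type is not described on this page.

```python
import json
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical stand-in for the platform's Chunk type (field name assumed)."""
    content: str


def rows_from_chunks(chunks: list[Chunk]) -> list[dict]:
    """Decode each chunk's JSON-serialized row back into a dict."""
    return [json.loads(chunk.content) for chunk in chunks]


chunks = [
    Chunk('{"customer_id": "cust_001", "name": "John Doe", "total_orders": 5}'),
    Chunk('{"customer_id": "cust_002", "name": "Sarah Smith", "total_orders": 12}'),
]
rows = rows_from_chunks(chunks)
print(rows[0]["name"])          # John Doe
print(rows[1]["total_orders"])  # 12
```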

Version History

2.0.0 - Native implementation
1.0.0 (Current) - Native implementation

Examples

blocks:
  - name: process_customer_data
    type: RowChunk
    input:
      file: "customer_records.csv"

# Input CSV structure:
# customer_id,name,email,signup_date,total_orders
# cust_001,John Doe,john@example.com,2024-01-15,5
# cust_002,Sarah Smith,sarah@example.com,2024-02-20,12
# cust_003,Mike Johnson,mike@example.com,2024-03-10,3

# Output: 3 chunks with JSON content like:
# Chunk 0: {"customer_id": "cust_001", "name": "John Doe", "email": "john@example.com", "signup_date": "2024-01-15", "total_orders": 5}
# Chunk 1: {"customer_id": "cust_002", "name": "Sarah Smith", "email": "sarah@example.com", "signup_date": "2024-02-20", "total_orders": 12}
# Chunk 2: {"customer_id": "cust_003", "name": "Mike Johnson", "email": "mike@example.com", "signup_date": "2024-03-10", "total_orders": 3}

blocks:
  - name: process_sales_report
    type: RowChunk
    input:
      file: "quarterly_sales.xlsx"

# Processes all sheets in the Excel file
# Input Excel structure (Sheet1 - Product Sales):
# product_id | product_name        | category    | q1_sales | q2_sales | q3_sales
# prod_001   | Wireless Headphones | Electronics | 15000    | 18500    | 22000
# prod_002   | Phone Case          | Accessories | 8500     | 9200     | 7800
# prod_003   | Bluetooth Speaker   | Electronics | 12000    | 14500    | 16800

# Output: one chunk per row across all sheets, with JSON content like:
# Chunk 0: {"product_id": "prod_001", "product_name": "Wireless Headphones", "category": "Electronics", "q1_sales": 15000, "q2_sales": 18500, "q3_sales": 22000}
# Chunk 1: {"product_id": "prod_002", "product_name": "Phone Case", "category": "Accessories", "q1_sales": 8500, "q2_sales": 9200, "q3_sales": 7800}
# Chunk 2: {"product_id": "prod_003", "product_name": "Bluetooth Speaker", "category": "Electronics", "q1_sales": 12000, "q2_sales": 14500, "q3_sales": 16800}

blocks:
  - name: parse_markdown_table
    type: RowChunk
    input:
      file: |
        | employee_id | name | department | salary |
        | emp_001 | Alice Brown | Engineering | 85000 |
        | emp_002 | Bob Wilson | Marketing | 65000 |
        | emp_003 | Carol Davis | Sales | 70000 |
        | emp_004 | Dave Miller | Engineering | 92000 |

# Parses markdown table format using regex pattern matching
# Output: 4 chunks with JSON content:
# Chunk 0: {"employee_id": "emp_001", "name": "Alice Brown", "department": "Engineering", "salary": "85000"}
# Chunk 1: {"employee_id": "emp_002", "name": "Bob Wilson", "department": "Marketing", "salary": "65000"}
# Chunk 2: {"employee_id": "emp_003", "name": "Carol Davis", "department": "Sales", "salary": "70000"}
# Chunk 3: {"employee_id": "emp_004", "name": "Dave Miller", "department": "Engineering", "salary": "92000"}
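
The markdown example above mentions regex pattern matching. As a rough illustration of what such parsing could look like, here is a sketch under stated assumptions: the two patterns and the `markdown_table_to_chunks` helper are hypothetical and are not the block's actual implementation. Cell values stay strings, which matches the quoted `salary` values in the output above.

```python
import json
import re

# A table row is a line that starts and ends with a pipe (assumed pattern).
ROW_PATTERN = re.compile(r"^\s*\|(.+)\|\s*$")
# A separator row contains only pipes, dashes, colons, and spaces.
SEPARATOR_PATTERN = re.compile(r"^\s*\|[\s:\-|]+\|\s*$")


def markdown_table_to_chunks(table: str) -> list[str]:
    """Return one JSON string per data row, keyed by the header cells."""
    header = None
    chunks = []
    for line in table.splitlines():
        if SEPARATOR_PATTERN.match(line):
            continue  # skip "| --- | --- |" style rows
        match = ROW_PATTERN.match(line)
        if not match:
            continue  # ignore anything that is not a table row
        cells = [cell.strip() for cell in match.group(1).split("|")]
        if header is None:
            header = cells  # the first table row supplies the column names
        else:
            chunks.append(json.dumps(dict(zip(header, cells))))
    return chunks


table = """
| employee_id | name | department | salary |
| emp_001 | Alice Brown | Engineering | 85000 |
| emp_002 | Bob Wilson | Marketing | 65000 |
"""
for chunk in markdown_table_to_chunks(table):
    print(chunk)
```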

Error Handling

Unsupported File Type

Error Code: BlockError
Common Cause: File extension is not supported for table processing
Solution: Use a supported format: CSV, XLSX, or a document format (PDF, DOCX, PPTX, images) that can be converted to a table

Table Parse Error

Error Code: ValueError
Common Cause: Invalid table structure or corrupt file data that cannot be parsed
Solution: Ensure file contains valid table data with consistent column structure and readable format
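
One way to catch an inconsistent column structure before the file reaches the block is a quick pre-check. The sketch below is an assumption about what "consistent" means here (every row has as many cells as the header); `find_inconsistent_rows` is a hypothetical helper, not part of the block.

```python
import csv
import io


def find_inconsistent_rows(csv_text: str) -> list[int]:
    """Return 1-based line numbers of rows whose cell count differs from the header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        return []  # empty input: nothing to compare
    bad_lines = []
    for row in reader:
        if not row:
            continue  # csv.reader yields [] for blank lines; ignore them
        if len(row) != len(header):
            bad_lines.append(reader.line_num)
    return bad_lines


csv_text = (
    "id,name,email\n"
    "1,Ann,ann@example.com\n"
    "2,Bob\n"
    "3,Cy,cy@example.com,extra\n"
)
print(find_inconsistent_rows(csv_text))  # [3, 4]
```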

Document Intelligence Error

Error Code: ServiceError
Common Cause: Document intelligence service fails to convert document to markdown table format
Solution: Verify document intelligence service is available and document contains recognizable table structures

FAQ

What file formats are supported for table processing?

CSV and XLSX files are supported directly. Document formats (PDF, DOCX, PPTX, images) are converted to markdown tables using document intelligence. Markdown table strings can also be processed directly.

How does the block handle mixed data types in columns?

The block automatically converts columns with mixed data types to strings to ensure consistent JSON serialization. This prevents data type conflicts while preserving the original values as readable text.
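
The idea can be sketched in plain Python. This is an illustration only: `stringify_mixed_columns` is a hypothetical helper that treats any two distinct Python types in a column as "mixed", which is stricter than pandas' own type inference (pandas would, for example, store a column of ints and floats as floats). Null cells are left untouched in this sketch.

```python
def stringify_mixed_columns(rows: list[dict]) -> list[dict]:
    """Convert a column to strings when its non-null values have more than one type."""
    columns = {key for row in rows for key in row}
    mixed = set()
    for column in columns:
        types = {type(row[column]) for row in rows if row.get(column) is not None}
        if len(types) > 1:
            mixed.add(column)
    return [
        {
            key: str(value) if key in mixed and value is not None else value
            for key, value in row.items()
        }
        for row in rows
    ]


rows = [
    {"sku": 1001, "qty": 3},
    {"sku": "A-17", "qty": 5},
]
result = stringify_mixed_columns(rows)
print(result[0])  # {'sku': '1001', 'qty': 3}
```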

What happens with empty or null values in table cells?

Empty rows (where all cells are null/NaN) are skipped entirely. Individual null cells are preserved as null values in the JSON output, maintaining the original data structure.
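
A small sketch of this behavior, assuming rows arrive as dicts and that `None` and float NaN are the only null markers (both `is_missing` and `rows_to_chunks` are hypothetical helpers, not the block's code):

```python
import json
import math


def is_missing(value) -> bool:
    """True for None and float NaN, the two null markers used in this sketch."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def rows_to_chunks(rows: list[dict]) -> list[str]:
    """Serialize rows to JSON, skipping rows in which every cell is missing."""
    chunks = []
    for row in rows:
        if all(is_missing(value) for value in row.values()):
            continue  # fully empty row: no chunk is emitted
        # Individual missing cells are kept and serialized as JSON null.
        cleaned = {key: (None if is_missing(value) else value) for key, value in row.items()}
        chunks.append(json.dumps(cleaned))
    return chunks


rows = [
    {"id": "a1", "note": None},
    {"id": None, "note": float("nan")},
    {"id": "a2", "note": "ok"},
]
print(rows_to_chunks(rows))
# ['{"id": "a1", "note": null}', '{"id": "a2", "note": "ok"}']
```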

How are Excel files with multiple sheets handled?

All sheets in the Excel file are processed sequentially. Each row from every sheet becomes a separate chunk, with the chunk index continuing across sheets rather than resetting.
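
The continuing index can be pictured with a short sketch. It assumes each sheet has already been read into a list of row dicts; the `chunk_sheets` helper and the "Returns" sheet name are made up for illustration.

```python
import json


def chunk_sheets(sheets: dict[str, list[dict]]) -> list[tuple[int, str]]:
    """Return (chunk_index, json_row) pairs, numbering rows across all sheets."""
    chunks = []
    index = 0
    for rows in sheets.values():  # dicts preserve insertion order
        for row in rows:
            chunks.append((index, json.dumps(row)))
            index += 1  # not reset when the next sheet starts
    return chunks


sheets = {
    "Product Sales": [{"product_id": "prod_001"}, {"product_id": "prod_002"}],
    "Returns": [{"product_id": "prod_003"}],  # hypothetical second sheet
}
indexed = chunk_sheets(sheets)
print([index for index, _ in indexed])  # [0, 1, 2]
```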

Can this block handle large tables efficiently?

The block iterates over the table row by row using pandas iterrows(). Note that iterrows() works on a DataFrame that is already loaded in memory and is slower than vectorized pandas operations, so very large files may benefit from being split into smaller files before they are passed to this block.
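
Pre-processing a large CSV into smaller pieces could look like the sketch below. It uses only the standard library and is independent of the block; `iter_csv_batches` and the batch size are hypothetical choices. Each yielded batch could then be written to its own file and passed to a separate RowChunk block.

```python
import csv
import io
from itertools import islice
from typing import Iterator


def iter_csv_batches(handle, batch_size: int) -> Iterator[list[dict]]:
    """Yield lists of at most batch_size row dicts without reading the whole file."""
    reader = csv.DictReader(handle)
    while True:
        batch = list(islice(reader, batch_size))
        if not batch:
            return
        yield batch


# An in-memory file stands in for a large CSV on disk.
sample = io.StringIO("id,value\n1,a\n2,b\n3,c\n4,d\n5,e\n")
sizes = [len(batch) for batch in iter_csv_batches(sample, batch_size=2)]
print(sizes)  # [2, 2, 1]
```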