RowChunk 2.0.0

Overview

Function Beginner

Available Versions: 2.0.0 (current) | 1.0.0

Description

(v2) Splits a table document into one JSON chunk per row, computes embeddings, and emits a single ChunkGroup containing all rows.

Configuration Options

No configuration options available.

Inputs

Name	Data Type	Description
data	`File or str`	Table data source. Accepts File objects (CSV, XLSX, or documents) or Markdown table strings. Documents are converted with document intelligence using enhanced table extraction.

Outputs

Name	Data Type	Description
chunks	`Chunks`	Structured Chunks object containing a single ChunkGroup in which every table row is an individual chunk. Each chunk includes computed embeddings and a parent-child relationship to the source file.

Version History

2.0.0 (Current) - Native implementation
1.0.0 - Native implementation

Examples

blocks:
  - name: process_inventory_table
    type: RowChunk_2_0_0
    input:
      data: "inventory_report.csv"

# Input CSV:
# product_id,name,category,stock,price,status
# INV001,Laptop Computer,Electronics,45,899.99,active
# INV002,Wireless Mouse,Electronics,120,29.99,active
# INV003,Office Chair,Furniture,23,159.99,active

# Output: Chunks object with ChunkGroup containing:
# - Parent metadata (file info, content summary)
# - 3 child chunks with JSON content and computed embeddings
# - Structured relationships for downstream processing
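
The row-to-chunk step in the example above can be illustrated with a minimal Python sketch. It uses only the standard library and is not the block's actual implementation; the helper name `rows_to_json_chunks` is hypothetical, and values are kept as strings because no type coercion is applied here.

```python
import csv
import io
import json


def rows_to_json_chunks(csv_text: str) -> list[str]:
    """Return one JSON string per data row, keyed by the CSV header names.

    Hypothetical helper for illustration only. Values stay as strings,
    since csv.DictReader does not coerce types.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [json.dumps(row) for row in reader]


csv_text = """product_id,name,category,stock,price,status
INV001,Laptop Computer,Electronics,45,899.99,active
INV002,Wireless Mouse,Electronics,120,29.99,active
INV003,Office Chair,Furniture,23,159.99,active
"""

chunks = rows_to_json_chunks(csv_text)
for chunk in chunks:
    print(chunk)
```

Each of the three data rows yields one JSON object, matching the "3 child chunks" described in the example.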

    

blocks:
  - name: analyze_sales_data
    type: RowChunk_2_0_0
    input:
      data: "sales_performance.xlsx"

# Excel processing with embedding computation:
# Each row becomes a searchable chunk with vector embeddings
# Enables semantic search across table data
# Maintains parent-child relationships for context
# Supports multiple sheets with unified chunk grouping

# Output structure:
# chunks:
#   parent:
#     id: "file_id_or_generated"
#     name: "sales_performance.xlsx"
#     content: "summary_or_file_id"
#   chunks:
#     - id: "generated_uuid"
#       content: '{"salesperson": "John Smith", "region": "North", "revenue": 125000, "quarter": "Q1"}'
#       embeddings: [computed_vector_array]
#       position: 0
#     - id: "generated_uuid"  
#       content: '{"salesperson": "Sarah Johnson", "region": "South", "revenue": 98500, "quarter": "Q1"}'
#       embeddings: [computed_vector_array]
#       position: 1
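
To make the output structure above concrete, the following Python sketch models it with dataclasses. The class names, field names, and the `fake_embed` function are assumptions derived from the commented structure, not the platform's real `Chunks` model or embedding service.

```python
import json
import uuid
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Parent:
    id: str
    name: str
    content: str


@dataclass
class Chunk:
    id: str
    content: str             # JSON-serialized table row
    embeddings: list[float]  # vector computed from the content
    position: int            # zero-based row index


@dataclass
class ChunkGroup:
    parent: Parent
    chunks: list[Chunk] = field(default_factory=list)


def build_chunk_group(
    file_name: str,
    rows: list[dict],
    embed: Callable[[str], list[float]],
) -> ChunkGroup:
    """Group every row under one parent, one chunk per row (illustrative)."""
    parent = Parent(id=str(uuid.uuid4()), name=file_name, content=file_name)
    group = ChunkGroup(parent=parent)
    for position, row in enumerate(rows):
        content = json.dumps(row)
        group.chunks.append(
            Chunk(
                id=str(uuid.uuid4()),
                content=content,
                embeddings=embed(content),
                position=position,
            )
        )
    return group


def fake_embed(text: str) -> list[float]:
    """Placeholder embedding (assumption): a real model returns a dense vector."""
    return [float(len(text))]


rows = [
    {"salesperson": "John Smith", "region": "North", "revenue": 125000, "quarter": "Q1"},
    {"salesperson": "Sarah Johnson", "region": "South", "revenue": 98500, "quarter": "Q1"},
]
group = build_chunk_group("sales_performance.xlsx", rows, fake_embed)
print(group.parent.name, [c.position for c in group.chunks])
```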

    

blocks:
  - name: extract_document_tables
    type: RowChunk_2_0_0
    input:
      data: "financial_report.pdf"

# Document intelligence converts PDF tables to markdown
# Enhanced table extraction handles multiple tables per document
# Each table row becomes a structured chunk
# Maintains document context through parent relationships

# Processing flow:
# 1. PDF → Document Intelligence → Markdown tables
# 2. Extract all tables using improved regex patterns
# 3. Convert each table to DataFrame with type coercion
# 4. Create parent object with document metadata
# 5. Generate chunks with embeddings for each row
# 6. Group all chunks under unified parent structure

# Supported document formats:
# - PDF, DOCX, PPTX (via document intelligence)
# - Images (JPEG, PNG, TIFF, etc.) with table detection
# - Direct CSV/XLSX processing with enhanced data handling
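
Steps 2 and 3 of the processing flow above (regex-based table extraction and type coercion) can be sketched in plain Python. The pattern and helper names below are assumptions for illustration; the block's actual regex patterns and its DataFrame-based coercion are not documented here and may behave differently.

```python
import re

# A table is a header row, a separator row of dashes, then one or more body rows.
TABLE_RE = re.compile(
    r"^\|.+\|[ \t]*\n"                   # header row
    r"\|[ \t:|-]+\|[ \t]*\n"             # separator row, e.g. |---|:---:|
    r"(?:\|.*\|[ \t]*(?:\n|$))+",        # body rows
    re.MULTILINE,
)


def coerce(cell: str):
    """Convert a cell to int or float when possible, otherwise keep the string."""
    for cast in (int, float):
        try:
            return cast(cell)
        except ValueError:
            continue
    return cell


def split_row(line: str) -> list[str]:
    """Split a pipe-delimited row into trimmed cell strings."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def extract_tables(markdown: str) -> list[list[dict]]:
    """Return every Markdown table as a list of row dictionaries."""
    tables = []
    for match in TABLE_RE.finditer(markdown):
        lines = match.group(0).strip().splitlines()
        header = split_row(lines[0])
        body = lines[2:]  # skip the header and separator rows
        rows = [dict(zip(header, map(coerce, split_row(line)))) for line in body]
        tables.append(rows)
    return tables


markdown = """Quarterly summary

| region | revenue |
|--------|---------|
| North  | 125000  |
| South  | 98500.5 |

Inventory

| item | qty |
| --- | --- |
| Chair | 23 |
"""

tables = extract_tables(markdown)
print(len(tables), tables[0][0])
```

This simplified splitter does not handle escaped pipes inside cells, which a production extractor would need to consider.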

    

Error Handling

No Markdown Table Found

Error Code: BlockError
Common Cause: The converted document or the string input does not contain a recognizable Markdown table.
Solution: Ensure the document contains table structures, or that Markdown input follows table syntax with pipe (|) separators.

Unsupported File Type

Error Code: BlockError
Common Cause: The file extension is not supported for table processing or for document intelligence conversion.
Solution: Use a supported format: CSV, XLSX, or a document format (PDF, DOCX, PPTX, images) that document intelligence can convert.

Chunk Building Error

Error Code: ProcessingError
Common Cause: The block failed to build the chunks or the ChunkGroup structure during processing.
Solution: Check data integrity and ensure every table row contains values that can be JSON serialized.
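
One way to check rows for JSON serializability before running the block is sketched below. It relies on the standard `json` module; rejecting NaN via `allow_nan=False` is a deliberately strict assumption, since this page does not state how the block treats such values. The helper name is hypothetical.

```python
import json
from datetime import date


def find_unserializable_rows(rows: list[dict]) -> list[tuple[int, str]]:
    """Return (row index, error message) for each row json.dumps rejects."""
    problems = []
    for index, row in enumerate(rows):
        try:
            # allow_nan=False makes NaN and infinity raise ValueError
            json.dumps(row, allow_nan=False)
        except (TypeError, ValueError) as exc:
            problems.append((index, str(exc)))
    return problems


rows = [
    {"salesperson": "John Smith", "revenue": 125000},
    {"salesperson": "Sarah Johnson", "closed_on": date(2024, 3, 31)},  # date object
    {"salesperson": "Ann Lee", "revenue": float("nan")},               # NaN value
]

problems = find_unserializable_rows(rows)
for index, message in problems:
    print(f"row {index}: {message}")
```

Converting dates to ISO strings and replacing NaN with `None` before submission are common remedies for the two failures shown.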

FAQ

What are the key improvements in RowChunk v2.0.0?

Version 2.0.0 adds structured Chunks output with ChunkGroup organization, automatic embedding computation, enhanced markdown table extraction, and improved parent-child relationships for better data lineage and searchability.

How does the ChunkGroup structure work?

All table rows are grouped under a single parent object containing file metadata. Each row becomes a child chunk with computed embeddings, maintaining relationships that enable context-aware processing and semantic search capabilities.

Does v2.0.0 handle multiple tables in documents better?

Yes. Version 2.0.0 includes enhanced table extraction that identifies and processes multiple Markdown tables within a single document, combining all rows into one ChunkGroup while preserving table boundaries.

What embedding capabilities are included?

Each chunk automatically receives computed embeddings based on its JSON content, enabling vector similarity search, semantic clustering, and AI-powered data analysis workflows that weren't available in v1.0.0.
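
As an illustration of the vector similarity search mentioned above, the following sketch ranks chunks by cosine similarity to a query vector. The two-dimensional vectors and the dictionary layout are toy assumptions; real embeddings have many more dimensions, and the query must be embedded with the same model as the chunks.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], chunks: list[dict], k: int = 1) -> list[dict]:
    """Return the k chunks whose embeddings are most similar to the query."""
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query, chunk["embeddings"]),
        reverse=True,
    )
    return ranked[:k]


# Toy embeddings for illustration only
chunks = [
    {"id": "row-0", "content": '{"region": "North"}', "embeddings": [1.0, 0.0]},
    {"id": "row-1", "content": '{"region": "South"}', "embeddings": [0.0, 1.0]},
]
query = [0.9, 0.1]

best = top_k(query, chunks, k=1)
print(best[0]["id"])
```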

Can I migrate from RowChunk v1.0.0 to v2.0.0?

Yes, but migration requires updating downstream blocks to handle the Chunks model instead of individual Chunk objects. The core table processing logic remains compatible; the output is now organized as a single ChunkGroup whose chunks carry embeddings.
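
For downstream code that still expects a flat list of row chunks, a small adapter can bridge the two shapes. The sketch below assumes the output is available as a dictionary mirroring the structure shown in the Examples section (`parent` plus a `chunks` list with `position`); the actual Chunks model may expose different attribute names, and `flatten_chunks` is a hypothetical helper.

```python
def flatten_chunks(chunks_output: dict) -> list[dict]:
    """Flatten a parent-plus-chunks structure into per-row records.

    Each record keeps the parent identifiers so that file context
    is not lost when the grouping is removed.
    """
    parent = chunks_output["parent"]
    ordered = sorted(chunks_output["chunks"], key=lambda chunk: chunk["position"])
    return [
        {
            "parent_id": parent["id"],
            "parent_name": parent["name"],
            "content": chunk["content"],
            "embeddings": chunk["embeddings"],
        }
        for chunk in ordered
    ]


# Sample data shaped like the documented output structure (values are made up)
chunks_output = {
    "parent": {"id": "file-1", "name": "sales_performance.xlsx", "content": "file-1"},
    "chunks": [
        {"id": "b", "content": '{"region": "South"}', "embeddings": [0.2], "position": 1},
        {"id": "a", "content": '{"region": "North"}', "embeddings": [0.1], "position": 0},
    ],
}

records = flatten_chunks(chunks_output)
print([record["content"] for record in records])
```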