VectorStore 1.0.0¶
Overview¶
Available Versions: 1.0.0 (current)
Description¶
Combined block that performs semantic chunking of content and stores embeddings in an in-memory vector database for generic content.
Semantic Chunking: Splits a document into chunks, grouping semantically related sentences together.
Store and Search:
1. Updates the index with embeddings for each chunk.
2. Provides search functionality based on cosine similarity of embeddings.
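The block's internals are not exposed, but cosine-similarity ranking over stored chunk embeddings can be sketched as follows. The `index` dict and the toy 2-dimensional vectors are purely illustrative; real embeddings are high-dimensional vectors produced by an embedding model:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def search(query_vec, index, top_k=5):
    """Rank stored chunks by cosine similarity to the query vector,
    returning at most top_k of them (mirrors the top_k config option)."""
    scored = [(cosine_similarity(query_vec, vec), chunk)
              for chunk, vec in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]

# Toy 2-d "embeddings" stand in for real model output.
index = {
    "refund policy": [0.9, 0.1],
    "office hours": [0.1, 0.9],
    "billing FAQ": [0.8, 0.3],
}
print(search([1.0, 0.0], index, top_k=2))  # → ['refund policy', 'billing FAQ']
```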
Configuration Options¶
| Name | Data Type | Description | Default Value |
|---|---|---|---|
| buffer_size | int | Number of sentences grouped together during semantic chunking. Higher values create larger semantic groups but may reduce granularity. | 1 |
| breakpoint_percentile_threshold | int | Percentile threshold (0-100) for determining semantic breakpoints. Higher values require stronger semantic changes to create chunk boundaries. | 95 |
| top_k | int | Maximum number of most relevant chunks to return from semantic search queries. Controls search result volume. | 5 |
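The exact chunking algorithm is not specified here, but the `breakpoint_percentile_threshold` option suggests a percentile-based breakpoint scheme: a new chunk starts wherever the semantic distance between adjacent sentences reaches the given percentile of all such distances. A minimal sketch, with made-up per-sentence distances standing in for real embedding distances (in practice `buffer_size` would also group sentences before distances are computed):

```python
def percentile(values, pct):
    """Nearest-rank percentile of a list of numbers."""
    ordered = sorted(values)
    rank = max(0, min(len(ordered) - 1, round(pct / 100 * (len(ordered) - 1))))
    return ordered[rank]

def semantic_chunks(sentences, distances, breakpoint_percentile_threshold=95):
    """Split sentences into chunks wherever the semantic distance to the
    next sentence is at or above the percentile cutoff.
    distances[i] is the (precomputed) embedding distance between
    sentence i and sentence i + 1."""
    cutoff = percentile(distances, breakpoint_percentile_threshold)
    chunks, current = [], [sentences[0]]
    for sent, dist in zip(sentences[1:], distances):
        if dist >= cutoff:  # strong semantic change -> chunk boundary
            chunks.append(" ".join(current))
            current = []
        current.append(sent)
    chunks.append(" ".join(current))
    return chunks

sentences = ["Cats purr.", "Dogs bark.",
             "Stocks rose today.", "Markets closed higher."]
distances = [0.2, 0.9, 0.1]  # toy adjacent-sentence distances
print(semantic_chunks(sentences, distances, 80))
```

With a lower threshold, more distances clear the cutoff and chunks get smaller, which is why the config table describes higher values as requiring stronger semantic changes to create boundaries.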
Inputs¶
| Name | Data Type | Description |
|---|---|---|
| data | Any | Content to store in vector database. Accepts single objects or arrays with 'name' and 'content' fields. Auto-generates names if missing. |
| query_name | str | Name identifier to retrieve specific stored content by exact match. Used with get operation. |
| query | str | Search query text for semantic similarity search across all stored chunks. Returns most relevant content matches. |
| run | str | Trigger parameter to explicitly request store information summary. Any string value initiates the info request. |
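The `data` input accepts single objects or arrays and auto-generates names when missing. A sketch of that normalization step, assuming nothing about the block's actual naming scheme (the `doc_` + uuid prefix here is an invented placeholder):

```python
import uuid

def normalize_data(data):
    """Accept a single item or a list; ensure each item has 'name' and
    'content' fields. The uuid-based auto-name is an assumption, not
    the block's documented scheme."""
    items = data if isinstance(data, list) else [data]
    normalized = []
    for item in items:
        if isinstance(item, str):
            item = {"content": item}
        name = item.get("name") or f"doc_{uuid.uuid4().hex[:8]}"
        normalized.append({"name": name, "content": item["content"]})
    return normalized

docs = ["plain string content", {"name": "policy", "content": "text"}]
normalized = normalize_data(docs)
print([d["name"] for d in normalized])
```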
Outputs¶
| Name | Data Type | Description |
|---|---|---|
| store_info | StoreInfoResponse | Summary of stored content including recently uploaded items and previously stored items with names and content lengths. |
| get_content | str | Full content string for the specified name, or error message if name not found in store. |
| search_chunks | List[str] | List of content chunks ranked by semantic similarity to the search query, limited by top_k configuration. |
Version History¶
- 1.0.0 (Current) - Native implementation
Examples¶
```yaml
# Store documents for semantic search
- block_type: VectorStore
  name: document_store
  config:
    top_k: 3
  inputs:
    data:
      - name: "company_policy"
        content: "Our company values innovation, collaboration, and customer success. We believe in creating products that make a meaningful impact while fostering an inclusive work environment."
      - name: "project_requirements"
        content: "The new analytics dashboard must support real-time data visualization, user role management, and export functionality to CSV and PDF formats."
```
Error Handling¶
| Error | Error Code | Common Cause | Solution |
|---|---|---|---|
| ChunkingError | RuntimeError | Semantic chunking process fails due to invalid content or embedding service issues | Verify content is valid text and the embedding service is accessible. Check content length and format. |
| EmbeddingServiceError | ConnectionError | Embedding service unavailable or API limits exceeded during vector generation | Check embedding service configuration, API credentials, and rate limits. Implement retry logic with exponential backoff. |
| ContentNotFoundError | KeyError | Requested content name does not exist in the vector store | Verify the exact name used when storing content. Use store_info to list available content names. |
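The retry-with-exponential-backoff advice for `ConnectionError` can be sketched as a small wrapper. The `flaky_embed` function below is a stand-in for a real embedding call, failing twice before succeeding:

```python
import time

def with_retries(fn, max_attempts=4, base_delay=0.5):
    """Call fn(), retrying on ConnectionError with exponential backoff
    (base_delay, 2*base_delay, 4*base_delay, ...). Re-raises after the
    final attempt fails."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Fake embedding call that fails twice, then succeeds.
calls = {"n": 0}
def flaky_embed():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("embedding service unavailable")
    return [0.1, 0.2]

vec = with_retries(flaky_embed, base_delay=0.01)
print(vec)  # → [0.1, 0.2]
```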
FAQ¶
How do I optimize semantic chunking for my content type?
Adjust buffer_size and breakpoint_percentile_threshold based on content structure. For technical documentation, use buffer_size: 3 and breakpoint_percentile_threshold: 85 for more granular chunks. For narrative content, use buffer_size: 1 and breakpoint_percentile_threshold: 95 to preserve semantic coherence.
What's the difference between semantic search and exact retrieval?
Semantic search uses the query input to find content chunks with similar meaning, even if wording differs. Exact retrieval uses query_name to fetch complete documents by their stored name. Use semantic search for discovery and exact retrieval for known documents.
How many documents can I store effectively?
The in-memory vector store handles thousands of documents efficiently. Performance depends on content size and available RAM. For large-scale deployments, consider chunking very long documents before storage and monitoring memory usage during embedding generation.
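For a rough sense of the RAM cost, the vector index alone scales linearly with chunk count and embedding width. The 1536-dimension float32 figures below are assumptions (a common embedding size), not this block's specification, and ignore the stored text and index overhead:

```python
def embedding_memory_bytes(num_chunks, dim=1536, bytes_per_float=4):
    """Rough footprint of the raw vectors only, assuming float32
    embeddings of the given dimension (both values are assumptions)."""
    return num_chunks * dim * bytes_per_float

mb = embedding_memory_bytes(10_000) / (1024 ** 2)
print(f"{mb:.1f} MiB")  # → 58.6 MiB
```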
Can I update or delete stored content?
The current implementation doesn't support updates or deletions. Content persists for the workflow session. To modify content, restart the workflow or use conditional logic to skip re-processing unchanged documents.
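The "skip re-processing unchanged documents" suggestion can be implemented outside the block by fingerprinting content before upload. This is an illustrative pattern, not a feature of the block itself:

```python
import hashlib

def content_fingerprint(item):
    """Stable hash of a document's content, used to detect changes."""
    return hashlib.sha256(item["content"].encode("utf-8")).hexdigest()

def needs_processing(item, seen):
    """True if this name/content pair hasn't been stored yet; records
    the pair in `seen` so an identical re-upload is skipped."""
    key = (item["name"], content_fingerprint(item))
    if key in seen:
        return False
    seen.add(key)
    return True

seen = set()
doc = {"name": "company_policy", "content": "Our company values innovation."}
print(needs_processing(doc, seen))  # → True
print(needs_processing(doc, seen))  # → False
```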
How do I handle embedding service failures gracefully?
Implement error handling around the VectorStore block with fallback strategies. Monitor embedding service availability and implement retry logic. Consider batching large content uploads and validating content before processing to minimize failure points.