VectorStore 1.0.0

Available Versions: 1.0.0 (current)

Description

A combined block that performs semantic chunking of content and stores the resulting embeddings in an in-memory vector database.

Semantic Chunking: Splits a document into chunks, grouping semantically related sentences together.

Store and Search:

  1. Updates the index with embeddings for each chunk.
  2. Provides search functionality based on cosine similarity of embeddings.
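Conceptually, the chunking step can be sketched as follows. This is a minimal illustration, not the block's actual implementation: embedding vectors are passed in directly rather than computed by the embedding service, and `buffer_size` grouping is omitted for brevity.

```python
import math

def cosine(a, b):
    # Cosine similarity between two dense vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def percentile(values, pct):
    # Nearest-rank percentile over a sorted copy of the values.
    s = sorted(values)
    idx = min(len(s) - 1, max(0, math.ceil(pct / 100 * len(s)) - 1))
    return s[idx]

def semantic_chunks(sentences, embeddings, breakpoint_percentile_threshold=95):
    # Semantic distance between adjacent sentences: 1 - cosine similarity.
    distances = [1 - cosine(embeddings[i], embeddings[i + 1])
                 for i in range(len(embeddings) - 1)]
    cutoff = percentile(distances, breakpoint_percentile_threshold)
    chunks, current = [], [sentences[0]]
    for i, d in enumerate(distances):
        if d > cutoff:  # strong semantic shift: start a new chunk here
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[i + 1])
    chunks.append(" ".join(current))
    return chunks
```

A higher `breakpoint_percentile_threshold` raises the cutoff, so fewer adjacent-sentence distances qualify as breakpoints and chunks become larger.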

Configuration Options

  • buffer_size (int, default: 1): Number of sentences grouped together during semantic chunking. Higher values create larger semantic groups but may reduce granularity.
  • breakpoint_percentile_threshold (int, default: 95): Percentile threshold (0-100) for determining semantic breakpoints. Higher values require stronger semantic changes to create chunk boundaries.
  • top_k (int, default: 5): Maximum number of most relevant chunks to return from semantic search queries. Controls search result volume.

Inputs

  • data (Any): Content to store in the vector database. Accepts single objects or arrays with 'name' and 'content' fields. Auto-generates names if missing.
  • query_name (str): Name identifier to retrieve specific stored content by exact match. Used with the get operation.
  • query (str): Search query text for semantic similarity search across all stored chunks. Returns the most relevant content matches.
  • run (str): Trigger parameter to explicitly request a store information summary. Any string value initiates the info request.

Outputs

  • store_info (StoreInfoResponse): Summary of stored content, including recently uploaded items and previously stored items with names and content lengths.
  • get_content (str): Full content string for the specified name, or an error message if the name is not found in the store.
  • search_chunks (List[str]): List of content chunks ranked by semantic similarity to the search query, limited by the top_k configuration.

Version History

  • 1.0.0 (Current) - Native implementation

Examples

# Store documents for semantic search
- block_type: VectorStore
  name: document_store
  config:
    top_k: 3
  inputs:
    data:
      - name: "company_policy"
        content: "Our company values innovation, collaboration, and customer success. We believe in creating products that make a meaningful impact while fostering an inclusive work environment."
      - name: "project_requirements" 
        content: "The new analytics dashboard must support real-time data visualization, user role management, and export functionality to CSV and PDF formats."
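A follow-up example using semantic search (a hypothetical query against the store defined above; the input name matches the Inputs table):

```yaml
# Search stored documents by meaning rather than exact name
- block_type: VectorStore
  name: document_store
  inputs:
    query: "What export formats must the dashboard support?"
```

The `search_chunks` output would then contain up to `top_k` chunks ranked by cosine similarity to the query.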

Error Handling

ChunkingError

  • Error Code: RuntimeError
  • Common Cause: Semantic chunking fails due to invalid content or embedding service issues.
  • Solution: Verify that content is valid text and the embedding service is accessible. Check content length and format.

EmbeddingServiceError

  • Error Code: ConnectionError
  • Common Cause: Embedding service unavailable or API limits exceeded during vector generation.
  • Solution: Check embedding service configuration, API credentials, and rate limits. Implement retry logic with exponential backoff.
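The suggested retry logic can be sketched as a generic helper, assuming the embedding call raises `ConnectionError` on transient failures; the wrapped callable is a stand-in for whatever invokes the embedding service.

```python
import random
import time

def with_retry(fn, retries=3, base_delay=0.5, transient=(ConnectionError,)):
    """Call fn(), retrying transient failures with exponential backoff and jitter."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except transient:
            if attempt == retries:
                raise  # out of retries: surface the error to the caller
            # Delay doubles each attempt; random jitter spreads out
            # concurrent retries so they don't hammer the service in lockstep.
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))
```

Batching uploads and validating content before calling `with_retry` further reduces the number of calls that can fail.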

ContentNotFoundError

  • Error Code: KeyError
  • Common Cause: Requested content name does not exist in the vector store.
  • Solution: Verify the exact name used when storing content. Use store_info to list available content names.

FAQ

How do I optimize semantic chunking for my content type?

Adjust buffer_size and breakpoint_percentile_threshold based on content structure. For technical documentation, use buffer_size: 3 and breakpoint_percentile_threshold: 85 for more granular chunks. For narrative content, use buffer_size: 1 and breakpoint_percentile_threshold: 95 to preserve semantic coherence.
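These two presets can be written as block configs (hypothetical block names; the config keys match the Configuration Options table):

```yaml
# Technical documentation: more granular chunks
- block_type: VectorStore
  name: docs_store
  config:
    buffer_size: 3
    breakpoint_percentile_threshold: 85

# Narrative content: preserve semantic coherence
- block_type: VectorStore
  name: narrative_store
  config:
    buffer_size: 1
    breakpoint_percentile_threshold: 95
```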

What's the difference between semantic search and exact retrieval?

Semantic search uses the query input to find content chunks with similar meaning, even if wording differs. Exact retrieval uses query_name to fetch complete documents by their stored name. Use semantic search for discovery and exact retrieval for known documents.
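The semantic-search ranking can be illustrated in miniature. This is an assumption-laden sketch: embedding vectors are supplied directly rather than computed by the block's embedding service.

```python
import math

def cosine(a, b):
    # Cosine similarity between two dense vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search(query_vec, chunk_vecs, chunks, top_k=5):
    # Rank every stored chunk by similarity to the query, keep the best top_k.
    scored = sorted(zip(chunks, chunk_vecs),
                    key=lambda cv: cosine(query_vec, cv[1]),
                    reverse=True)
    return [chunk for chunk, _ in scored[:top_k]]
```

Exact retrieval, by contrast, is just a dictionary lookup keyed on the stored name, with no embeddings involved.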

How many documents can I store effectively?

The in-memory vector store handles thousands of documents efficiently. Performance depends on content size and available RAM. For large-scale deployments, consider chunking very long documents before storage and monitoring memory usage during embedding generation.

Can I update or delete stored content?

The current implementation doesn't support updates or deletions. Content persists for the workflow session. To modify content, restart the workflow or use conditional logic to skip re-processing unchanged documents.

How do I handle embedding service failures gracefully?

Implement error handling around the VectorStore block with fallback strategies. Monitor embedding service availability and implement retry logic. Consider batching large content uploads and validating content before processing to minimize failure points.