VectorStore 1.0.0¶
Overview¶
Available Versions: 1.0.0 (current)
Description¶
Combined block that performs semantic chunking of content and stores embeddings in an in-memory vector database for generic content.
Semantic Chunking: Splits a document into chunks, grouping semantically related sentences together.
Store and Search:
1. Updates the index with embeddings for each chunk.
2. Provides search functionality based on cosine similarity of embeddings.
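The block's internals are not exposed, but cosine-similarity ranking over stored chunk embeddings can be sketched as follows. The `index` dict and the toy 2-dimensional vectors are purely illustrative; real embeddings are high-dimensional vectors produced by an embedding model:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def search(query_vec, index, top_k=5):
    """Rank stored chunks by cosine similarity to the query vector,
    returning at most top_k of them (mirrors the top_k config option)."""
    scored = [(cosine_similarity(query_vec, vec), chunk)
              for chunk, vec in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]

# Toy 2-d "embeddings" stand in for real model output.
index = {
    "refund policy": [0.9, 0.1],
    "office hours": [0.1, 0.9],
    "billing FAQ": [0.8, 0.3],
}
print(search([1.0, 0.0], index, top_k=2))  # → ['refund policy', 'billing FAQ']
```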
Configuration Options¶
| Name | Data Type | Description | Default Value |
|---|---|---|---|
| buffer_size | int | Number of sentences grouped together during semantic chunking. Higher values create larger semantic groups but may reduce granularity. | 1 |
| breakpoint_percentile_threshold | int | Percentile threshold (0-100) for determining semantic breakpoints. Higher values require stronger semantic changes to create chunk boundaries. | 95 |
| top_k | int | Maximum number of most relevant chunks to return from semantic search queries. Controls search result volume. | 5 |
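The exact chunking algorithm is not specified here, but the `breakpoint_percentile_threshold` option suggests a percentile-based breakpoint scheme: a new chunk starts wherever the semantic distance between adjacent sentences reaches the given percentile of all such distances. A minimal sketch, with made-up per-sentence distances standing in for real embedding distances (in practice `buffer_size` would also group sentences before distances are computed):

```python
def percentile(values, pct):
    """Nearest-rank percentile of a list of numbers."""
    ordered = sorted(values)
    rank = max(0, min(len(ordered) - 1, round(pct / 100 * (len(ordered) - 1))))
    return ordered[rank]

def semantic_chunks(sentences, distances, breakpoint_percentile_threshold=95):
    """Split sentences into chunks wherever the semantic distance to the
    next sentence is at or above the percentile cutoff.
    distances[i] is the (precomputed) embedding distance between
    sentence i and sentence i + 1."""
    cutoff = percentile(distances, breakpoint_percentile_threshold)
    chunks, current = [], [sentences[0]]
    for sent, dist in zip(sentences[1:], distances):
        if dist >= cutoff:  # strong semantic change -> chunk boundary
            chunks.append(" ".join(current))
            current = []
        current.append(sent)
    chunks.append(" ".join(current))
    return chunks

sentences = ["Cats purr.", "Dogs bark.",
             "Stocks rose today.", "Markets closed higher."]
distances = [0.2, 0.9, 0.1]  # toy adjacent-sentence distances
print(semantic_chunks(sentences, distances, 80))
```

With a lower threshold, more distances clear the cutoff and chunks get smaller, which is why the config table describes higher values as requiring stronger semantic changes to create boundaries.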
Inputs¶
| Name | Data Type | Description |
|---|---|---|
| data | Any | Content to store in vector database. Accepts single objects or arrays with 'name' and 'content' fields. Auto-generates names if missing. |
| query_name | str | Name identifier to retrieve specific stored content by exact match. Used with get operation. |
| query | str | Search query text for semantic similarity search across all stored chunks. Returns most relevant content matches. |
| run | str | Trigger parameter to explicitly request store information summary. Any string value initiates the info request. |
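The `data` input accepts single objects or arrays and auto-generates names when missing. A sketch of that normalization step, assuming nothing about the block's actual naming scheme (the `doc_` + uuid prefix here is an invented placeholder):

```python
import uuid

def normalize_data(data):
    """Accept a single item or a list; ensure each item has 'name' and
    'content' fields. The uuid-based auto-name is an assumption, not
    the block's documented scheme."""
    items = data if isinstance(data, list) else [data]
    normalized = []
    for item in items:
        if isinstance(item, str):
            item = {"content": item}
        name = item.get("name") or f"doc_{uuid.uuid4().hex[:8]}"
        normalized.append({"name": name, "content": item["content"]})
    return normalized

docs = ["plain string content", {"name": "policy", "content": "text"}]
normalized = normalize_data(docs)
print([d["name"] for d in normalized])
```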
Outputs¶
| Name | Data Type | Description |
|---|---|---|
| store_info | StoreInfoResponse | Summary of stored content including recently uploaded items and previously stored items with names and content lengths. |
| get_content | str | Full content string for the specified name, or error message if name not found in store. |
| search_chunks | List[str] | List of content chunks ranked by semantic similarity to the search query, limited by top_k configuration. |
Version History¶
- 1.0.0 (Current) - Native implementation
Examples¶
```yaml
# Store documents for semantic search
- block_type: VectorStore
  name: document_store
  config:
    top_k: 3
  inputs:
    data:
      - name: "company_policy"
        content: "Our company values innovation, collaboration, and customer success. We believe in creating products that make a meaningful impact while fostering an inclusive work environment."
      - name: "project_requirements"
        content: "The new analytics dashboard must support real-time data visualization, user role management, and export functionality to CSV and PDF formats."
```
Error Handling¶
| Error | Error Code | Common Cause | Solution |
|---|---|---|---|
| ChunkingError | RuntimeError | Semantic chunking process fails due to invalid content or embedding service issues | Verify content is valid text and the embedding service is accessible. Check content length and format. |
| EmbeddingServiceError | ConnectionError | Embedding service unavailable or API limits exceeded during vector generation | Check embedding service configuration, API credentials, and rate limits. Implement retry logic with exponential backoff. |
| ContentNotFoundError | KeyError | Requested content name does not exist in the vector store | Verify the exact name used when storing content. Use store_info to list available content names. |
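The retry-with-exponential-backoff advice for `ConnectionError` can be sketched as a small wrapper. The `flaky_embed` function below is a stand-in for a real embedding call, failing twice before succeeding:

```python
import time

def with_retries(fn, max_attempts=4, base_delay=0.5):
    """Call fn(), retrying on ConnectionError with exponential backoff
    (base_delay, 2*base_delay, 4*base_delay, ...). Re-raises after the
    final attempt fails."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Fake embedding call that fails twice, then succeeds.
calls = {"n": 0}
def flaky_embed():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("embedding service unavailable")
    return [0.1, 0.2]

vec = with_retries(flaky_embed, base_delay=0.01)
print(vec)  # → [0.1, 0.2]
```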
FAQ¶
How do I optimize semantic chunking for my content type?
Adjust buffer_size and breakpoint_percentile_threshold based on content structure. For technical documentation, use buffer_size: 3 and breakpoint_percentile_threshold: 85 for more granular chunks. For narrative content, use buffer_size: 1 and breakpoint_percentile_threshold: 95 to preserve semantic coherence.
What's the difference between semantic search and exact retrieval?
Semantic search uses the query input to find content chunks with similar meaning, even if wording differs. Exact retrieval uses query_name to fetch complete documents by their stored name. Use semantic search for discovery and exact retrieval for known documents.
How many documents can I store effectively?
The in-memory vector store handles thousands of documents efficiently. Performance depends on content size and available RAM. For large-scale deployments, consider chunking very long documents before storage and monitoring memory usage during embedding generation.
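For a rough sense of the RAM cost, the vector index alone scales linearly with chunk count and embedding width. The 1536-dimension float32 figures below are assumptions (a common embedding size), not this block's specification, and ignore the stored text and index overhead:

```python
def embedding_memory_bytes(num_chunks, dim=1536, bytes_per_float=4):
    """Rough footprint of the raw vectors only, assuming float32
    embeddings of the given dimension (both values are assumptions)."""
    return num_chunks * dim * bytes_per_float

mb = embedding_memory_bytes(10_000) / (1024 ** 2)
print(f"{mb:.1f} MiB")  # → 58.6 MiB
```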
Can I update or delete stored content?
The current implementation doesn't support updates or deletions. Content persists for the workflow session. To modify content, restart the workflow or use conditional logic to skip re-processing unchanged documents.
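The "skip re-processing unchanged documents" suggestion can be implemented outside the block by fingerprinting content before upload. This is an illustrative pattern, not a feature of the block itself:

```python
import hashlib

def content_fingerprint(item):
    """Stable hash of a document's content, used to detect changes."""
    return hashlib.sha256(item["content"].encode("utf-8")).hexdigest()

def needs_processing(item, seen):
    """True if this name/content pair hasn't been stored yet; records
    the pair in `seen` so an identical re-upload is skipped."""
    key = (item["name"], content_fingerprint(item))
    if key in seen:
        return False
    seen.add(key)
    return True

seen = set()
doc = {"name": "company_policy", "content": "Our company values innovation."}
print(needs_processing(doc, seen))  # → True
print(needs_processing(doc, seen))  # → False
```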
How do I handle embedding service failures gracefully?
Implement error handling around the VectorStore block with fallback strategies. Monitor embedding service availability and implement retry logic. Consider batching large content uploads and validating content before processing to minimize failure points.