VectorSearch 2.0.0¶
Overview¶
Available Versions: 2.0.0 (current) | 1.0.0
Description¶
Searches for items in a dataset using vector embeddings and returns the top results based on similarity.
Configuration Options¶
| Name | Data Type | Description | Default Value |
|---|---|---|---|
| topn | int | Maximum number of search results to return. Controls result volume and helps manage performance for large datasets. | 10 |
| token_limit | int | Maximum total tokens allowed in the response. When the limit is exceeded, results are truncated in a fixed order until the response is under the limit: strings are shortened first, then lists are reduced, then dictionary keys are removed. | 2000 |
| dataspace_ids | list[UUID] or None | Specific dataspace IDs to search within. When empty or null, searches all dataspaces accessible in the current workspace. | [] |
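
The options in the table above can be mirrored in a small Python structure, which is handy for checking a configuration before it reaches the block. This is a minimal sketch: the class name `VectorSearchConfig`, the helper `searches_all_dataspaces`, and the "must be positive" validation rule are assumptions made for illustration and are not part of the block's API. Only the option names and default values come from the table.

```python
from dataclasses import dataclass, field
from typing import Optional
from uuid import UUID


@dataclass
class VectorSearchConfig:
    """Hypothetical mirror of the configuration table; not the block's real class."""

    topn: int = 10
    token_limit: int = 2000
    dataspace_ids: Optional[list[UUID]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Assumed rule: both limits must be positive integers.
        if self.topn <= 0:
            raise ValueError("topn must be a positive integer")
        if self.token_limit <= 0:
            raise ValueError("token_limit must be a positive integer")

    def searches_all_dataspaces(self) -> bool:
        # Per the table, an empty list or None means
        # "search every dataspace accessible in the workspace".
        return not self.dataspace_ids
```

With no arguments, `VectorSearchConfig()` carries the documented defaults and reports that all accessible dataspaces will be searched.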
Inputs¶
| Name | Data Type | Description |
|---|---|---|
| query | str | Search query text for vector similarity matching. Uses embedding-based similarity to find semantically related dataset items. |
Outputs¶
| Name | Data Type | Description |
|---|---|---|
| results | list[DataSetItem] | List of dataset items ranked by similarity score, interleaved across multiple datasets, limited by topn and token_limit configurations. |
Version History¶
- 2.0.0 (Current) - Native implementation
- 1.0.0 - Native implementation
Examples¶
# Search across all accessible dataspaces
- block_type: VectorSearch_2_0_0
  name: semantic_search
  config:
    topn: 5
    token_limit: 1500
  inputs:
    query: "machine learning model accuracy metrics"
# Returns top 5 most relevant items about ML accuracy
Error Handling¶
WorkspaceNotAvailableError
- Error Code: Exception
- Common Cause: VectorSearch block used outside of a workspace context, where dataspaces are not available.
- Solution: Ensure the block is used within a workspace that has configured dataspaces. Verify workspace setup and dataspace access permissions.
DataspaceAccessError
- Error Code: PermissionError
- Common Cause: Specified dataspace_ids are not accessible in the current workspace or don't exist.
- Solution: Verify that the dataspace IDs exist and that the current workspace has read access. Remove invalid IDs or grant appropriate permissions.
EmbeddingConfigError
- Error Code: ConfigurationError
- Common Cause: Embedding service configuration unavailable or invalid for tokenization and similarity search.
- Solution: Check the embeddings configuration in workspace settings. Ensure the embedding service is properly configured and accessible.
FAQ¶
How does the token limit affect search results?
When results exceed token_limit, the block truncates content in a fixed order: it first shortens strings, then reduces the number of list items, then removes dictionary keys, stopping once the response is under the limit. This keeps responses within downstream processing limits while preserving as much of the most relevant information as possible.
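The truncation order can be sketched in Python as follows. This is an illustration under stated assumptions, not the block's actual implementation: the token estimate (about 4 characters per token on the JSON-serialized value), the string-length schedule, and the function names `estimate_tokens`, `shorten_strings`, and `truncate_to_limit` are all hypothetical, and only the top-level result list is reduced in the second stage.

```python
import json
from typing import Any


def estimate_tokens(value: Any) -> int:
    # Assumption: roughly 4 characters per token on the JSON-serialized value.
    return len(json.dumps(value, default=str)) // 4


def shorten_strings(value: Any, max_len: int) -> Any:
    # Recursively cut every string down to at most max_len characters.
    if isinstance(value, str):
        return value[:max_len]
    if isinstance(value, list):
        return [shorten_strings(v, max_len) for v in value]
    if isinstance(value, dict):
        return {k: shorten_strings(v, max_len) for k, v in value.items()}
    return value


def truncate_to_limit(results: list[dict], token_limit: int) -> list[dict]:
    # Stage 1: progressively shorten strings (512, 256, ... down to 16 chars).
    max_len = 512
    while estimate_tokens(results) > token_limit and max_len >= 16:
        results = shorten_strings(results, max_len)
        max_len //= 2
    # Stage 2: drop trailing (lowest-ranked) items from the result list.
    while estimate_tokens(results) > token_limit and len(results) > 1:
        results = results[:-1]
    # Stage 3: drop dictionary keys, last key first, from the remaining items.
    while estimate_tokens(results) > token_limit and any(results):
        results = [dict(list(item.items())[:-1]) for item in results]
    return results
```

Each stage runs only if the previous one did not bring the estimate under the limit, so results that already fit are returned unchanged.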
How are results ranked across multiple datasets?
The block searches each dataset separately, then interleaves the results so that each dataset is fairly represented: the top result from dataset A, then the top result from dataset B, then A's second result, and so on. This prevents one large dataset from dominating the results.
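The interleaving described here is a round-robin merge. A minimal Python sketch, assuming each dataset's results arrive as a list already sorted by similarity (the function name `interleave` and the plain-list inputs are illustrative, not the block's internals):

```python
from itertools import chain, zip_longest

# Sentinel used to pad shorter result lists during the round-robin merge.
_MISSING = object()


def interleave(ranked_per_dataset: list[list], topn: int) -> list:
    # Take the 1st result of every dataset, then the 2nd of every dataset, etc.
    rounds = zip_longest(*ranked_per_dataset, fillvalue=_MISSING)
    merged = [item for item in chain.from_iterable(rounds) if item is not _MISSING]
    # Apply the topn cap after merging.
    return merged[:topn]
```

For example, `interleave([["a1", "a2", "a3"], ["b1"]], topn=3)` yields `["a1", "b1", "a2"]`: dataset B contributes its single hit before dataset A's second result is taken.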
What happens when no dataspaces are accessible?
The search returns an empty list. Ensure your workspace has configured dataspaces and your user has appropriate read permissions. Check the workspace configuration if results are unexpectedly empty.
How do I optimize performance for large-scale searches?
Use lower topn values (3-10) for faster searches. Set token_limit according to what downstream processing can accept. Consider using dataspace_ids to limit the search scope when you only need results from specific domains or projects.
Can I search across multiple workspaces simultaneously?
No, the block operates within the current workspace context only. To search across multiple workspaces, run separate VectorSearch blocks in each workspace and combine results in your workflow logic.
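One way to combine the per-workspace results in workflow logic is sketched below. It rests on several assumptions: each result is represented as a plain dict with hypothetical `id` and `score` keys (the real DataSetItem fields are not documented here), scores are comparable across workspaces (for instance because the same embedding model is used), and a higher score means a closer match.

```python
def combine_results(per_workspace: dict[str, list[dict]], topn: int) -> list[dict]:
    # Keep the best-scoring copy of each item, keyed by its (hypothetical) id.
    best: dict = {}
    for workspace, items in per_workspace.items():
        for item in items:
            # Record which workspace the hit came from.
            tagged = {**item, "workspace": workspace}
            current = best.get(item["id"])
            if current is None or tagged["score"] > current["score"]:
                best[item["id"]] = tagged
    # Rank the deduplicated items by score, highest first, and cap at topn.
    return sorted(best.values(), key=lambda r: r["score"], reverse=True)[:topn]
```

If scores are not comparable across workspaces, a round-robin merge of the per-workspace lists avoids comparing them directly.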