VectorSearch 2.0.0

Overview

Data Intermediate

Version Source

Available Versions: 2.0.0 (current) | 1.0.0

Description

Searches for items in a dataset using vector embeddings and returns the top results based on similarity.
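
To make "top results based on similarity" concrete, here is a minimal sketch of a top-n similarity search. It assumes cosine similarity over in-memory vectors; the page does not state which similarity metric or index the block actually uses, and the names `cosine_similarity`, `top_n`, and the toy item ids are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_n(query_vec, items, n):
    """Return the n (score, item_id) pairs most similar to query_vec, best first.

    `items` maps a hypothetical item id to its embedding vector.
    """
    scored = [(cosine_similarity(query_vec, vec), item_id)
              for item_id, vec in items.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n]


# Toy 2-dimensional embeddings, for illustration only.
items = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
ranked = top_n([1.0, 0.1], items, n=2)
```

Here `n` plays the role of the `topn` option: it caps how many of the best-scoring items are kept.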

Configuration Options

Name | Data Type | Description | Default Value
topn | int | Maximum number of search results to return. Controls result volume and helps manage performance for large datasets. | 10
token_limit | int | Maximum total tokens allowed in response. When exceeded, results are truncated by string length, then list size, then dictionary keys until under limit. | 2000
dataspace_ids | list[UUID] or Constant(value=None) | Specific dataspace IDs to search within. When empty or null, searches all dataspaces accessible in current workspace. | []

Inputs

Name | Data Type | Description
query | str | Search query text for vector similarity matching. Uses embedding-based similarity to find semantically related dataset items.

Outputs

Name | Data Type | Description
results | list[DataSetItem] | List of dataset items ranked by similarity score, interleaved across multiple datasets, limited by topn and token_limit configurations.

Version History

  • 2.0.0 (Current) - Native implementation
  • 1.0.0 - Native implementation

Examples

# Search across all accessible dataspaces
- block_type: VectorSearch_2_0_0
  name: semantic_search
  config:
    topn: 5
    token_limit: 1500
  inputs:
    query: "machine learning model accuracy metrics"
  # Returns top 5 most relevant items about ML accuracy

Error Handling

WorkspaceNotAvailableError

Error Code: Exception
Common Cause: VectorSearch block used outside of workspace context where dataspaces are not available
Solution: Ensure the block is used within a workspace that has configured dataspaces. Verify workspace setup and dataspace access permissions.

DataspaceAccessError

Error Code: PermissionError
Common Cause: Specified dataspace_ids are not accessible in current workspace or don't exist
Solution: Verify dataspace IDs exist and current workspace has read access. Remove invalid IDs or grant appropriate permissions.
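
Before passing values to dataspace_ids, it can help to weed out IDs that are not even well-formed UUIDs. The sketch below uses only Python's standard `uuid` module; the helper name `split_valid_uuids` is hypothetical. It checks format only and cannot tell whether a dataspace exists or is readable from the current workspace.

```python
from uuid import UUID


def split_valid_uuids(raw_ids):
    """Separate well-formed UUID strings from malformed values.

    Format check only: a well-formed UUID may still refer to a
    dataspace that does not exist or is not accessible.
    """
    valid, invalid = [], []
    for raw in raw_ids:
        try:
            valid.append(UUID(raw))
        except (ValueError, AttributeError, TypeError):
            # ValueError: badly formed string; the others: non-string input.
            invalid.append(raw)
    return valid, invalid


valid, invalid = split_valid_uuids(
    ["12345678-1234-5678-1234-567812345678", "not-a-uuid"]
)
```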

EmbeddingConfigError

Error Code: ConfigurationError
Common Cause: Embedding service configuration unavailable or invalid for tokenization and similarity search
Solution: Check embeddings configuration in workspace settings. Ensure embedding service is properly configured and accessible.

FAQ

How does the token limit affect search results?

When results exceed token_limit, the block truncates content in stages until the response is under the limit: first it shortens strings, then it reduces the number of list items, then it drops dictionary keys. This keeps responses within downstream processing limits while retaining as much of the top-ranked content as possible.
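
The staged truncation can be sketched as follows. This is an illustration under stated assumptions, not the block's implementation: the token estimate (about 4 JSON characters per token), the specific caps tried at each stage, and all function names are hypothetical, and values are assumed to be JSON-serializable.

```python
import json


def estimate_tokens(value):
    """Crude stand-in for a real tokenizer: roughly 4 JSON characters per token."""
    return (len(json.dumps(value)) + 3) // 4


def cap_strings(value, max_chars):
    """Recursively cut every string down to at most max_chars characters."""
    if isinstance(value, str):
        return value[:max_chars]
    if isinstance(value, list):
        return [cap_strings(v, max_chars) for v in value]
    if isinstance(value, dict):
        return {k: cap_strings(v, max_chars) for k, v in value.items()}
    return value


def cap_lists(value, max_items):
    """Recursively keep only the first max_items entries of every list."""
    if isinstance(value, list):
        return [cap_lists(v, max_items) for v in value[:max_items]]
    if isinstance(value, dict):
        return {k: cap_lists(v, max_items) for k, v in value.items()}
    return value


def cap_keys(value, max_keys):
    """Recursively keep only the first max_keys keys of every dictionary."""
    if isinstance(value, dict):
        kept = list(value.items())[:max_keys]
        return {k: cap_keys(v, max_keys) for k, v in kept}
    if isinstance(value, list):
        return [cap_keys(v, max_keys) for v in value]
    return value


def truncate_to_limit(value, token_limit):
    """Apply strings -> lists -> keys in order, stopping once the estimate fits.

    Best effort: if every stage is exhausted, the result may still be over the limit.
    """
    stages = [
        (cap_strings, (200, 100, 50, 20)),
        (cap_lists, (10, 5, 2, 1)),
        (cap_keys, (5, 3, 1)),
    ]
    for shrink, caps in stages:
        for cap in caps:
            if estimate_tokens(value) <= token_limit:
                return value
            value = shrink(value, cap)
    return value


results = [{"title": "x" * 1000, "tags": ["t"] * 50}]
trimmed = truncate_to_limit(results, token_limit=100)
```

In this toy input, shortening the long string is enough to fit the limit, so the list of tags is left untouched; lists and keys are only cut when the earlier stages do not suffice.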

How are results ranked across multiple datasets?

The block searches each dataset separately, then interleaves the results to ensure fair representation: the top result from dataset A, then the top result from dataset B, then A's second result, and so on. This prevents one large dataset from dominating the results.
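
The interleaving described above amounts to a round-robin merge. A minimal sketch, assuming each dataset's results are already ranked best first; the function name `interleave` and the placeholder items are hypothetical:

```python
from itertools import chain, zip_longest

_MISSING = object()  # sentinel for datasets that have run out of results


def interleave(ranked_lists, topn):
    """Round-robin merge of per-dataset rankings, keeping at most topn items."""
    rounds = zip_longest(*ranked_lists, fillvalue=_MISSING)
    merged = [item for item in chain.from_iterable(rounds) if item is not _MISSING]
    return merged[:topn]


dataset_a = ["a1", "a2", "a3"]
dataset_b = ["b1"]
print(interleave([dataset_a, dataset_b], topn=3))  # ['a1', 'b1', 'a2']
```

When one dataset runs out of results, the remaining slots are filled from the datasets that still have items.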

What happens when no dataspaces are accessible?

The search returns an empty list. Ensure your workspace has configured dataspaces and your user has appropriate read permissions. Check the workspace configuration if results are unexpectedly empty.

How do I optimize performance for large-scale searches?

Use lower topn values (3-10) for faster searches. Set token_limit according to what downstream processing can handle. Consider using dataspace_ids to limit the search scope to specific domains or projects.

Can I search across multiple workspaces simultaneously?

No, the block operates within the current workspace context only. To search across multiple workspaces, run separate VectorSearch blocks in each workspace and combine results in your workflow logic.
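
Combining the outputs of several VectorSearch blocks could look like the sketch below. It assumes each result can be treated as a dictionary with a unique "id" field; that field, and the function name `combine_results`, are hypothetical, since the shape of DataSetItem is not described here.

```python
def combine_results(result_lists, key="id"):
    """Concatenate result lists in order, dropping items whose key was already seen."""
    seen = set()
    combined = []
    for results in result_lists:
        for item in results:
            item_key = item[key]
            if item_key not in seen:
                seen.add(item_key)
                combined.append(item)
    return combined


# Hypothetical outputs from two workspaces.
workspace_1 = [{"id": "doc-1"}, {"id": "doc-2"}]
workspace_2 = [{"id": "doc-2"}, {"id": "doc-3"}]
combined = combine_results([workspace_1, workspace_2])
```

This keeps the first occurrence of each item and preserves the order in which the lists were supplied; it does not re-rank items by similarity across workspaces.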