GetAllDocuments 1.0.0
Overview
Description
Retrieves all documents (DataSetItems) from every dataset within a specified dataspace. This block aggregates documents across multiple datasets by fetching the dataspace configuration, extracting all dataset IDs, and collecting all items from each dataset into a unified collection.
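The aggregation described above can be sketched in Python as follows. This is an illustration under assumptions, not the block's implementation: `fetch_dataspace_config` and `fetch_dataset_items` are hypothetical stand-ins for the configuration API and the dataset service, backed here by in-memory dictionaries, and the shape of the configuration object is invented for the example.

```python
from typing import Any

# Hypothetical in-memory stand-ins for the configuration API and dataset service.
DATASPACES: dict[str, dict[str, Any]] = {
    "space-1": {"datasets": [{"id": "a"}, {"id": "b"}, {"id": "c"}]},
}
DATASET_ITEMS: dict[str, list[dict[str, Any]]] = {
    "a": [{"indexText": "first"}, {"indexText": "second"}],
    "b": [],  # an empty dataset contributes no items
    "c": [{"indexText": "third"}],
}


def fetch_dataspace_config(dataspace_id: str) -> dict[str, Any]:
    """Hypothetical lookup of a dataspace configuration."""
    return DATASPACES[dataspace_id]


def fetch_dataset_items(dataset_id: str) -> list[dict[str, Any]]:
    """Hypothetical retrieval of every item in one dataset."""
    return DATASET_ITEMS[dataset_id]


def get_all_documents(dataspace_id: str) -> list[dict[str, Any]]:
    # Step 1: fetch the dataspace configuration.
    config = fetch_dataspace_config(dataspace_id)
    # Step 2: extract the IDs of all datasets in the dataspace.
    dataset_ids = [dataset["id"] for dataset in config["datasets"]]
    # Step 3: collect the items of each dataset into one flat list.
    documents: list[dict[str, Any]] = []
    for dataset_id in dataset_ids:
        documents.extend(fetch_dataset_items(dataset_id))
    return documents


all_documents = get_all_documents("space-1")
```

The result is a single flat list in dataset order; the empty dataset `b` simply adds nothing.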
Configuration Options
| Name | Data Type | Description | Default Value |
|---|---|---|---|
| dataspace_id | UUID | Unique identifier of the dataspace to retrieve documents from. The block will fetch all datasets within this dataspace and aggregate their documents. | None |
Inputs
| Name | Data Type | Description |
|---|---|---|
| run | Any | Workflow run context object. Used internally for execution tracking and context management. Typically provided automatically by the workflow engine. |
Outputs
| Name | Data Type | Description |
|---|---|---|
| documents | list[DataSetItem] | Aggregated list of all DataSetItems from every dataset within the specified dataspace. Each DataSetItem contains properties, metadata (createdAt, modifiedAt), and indexText content. Items from all datasets are combined into a single flat list. |
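For orientation, the item shape described in the table might be modelled roughly like this. Only the field names (`properties`, `metadata`, `createdAt`, `modifiedAt`, `indexText`) come from the description above; the field types, the `ItemMetadata` class name, and the sample values are assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ItemMetadata:
    # Assumed to be ISO 8601 timestamp strings; the actual type is not documented here.
    createdAt: str
    modifiedAt: str


@dataclass
class DataSetItem:
    properties: dict[str, Any] = field(default_factory=dict)
    metadata: Optional[ItemMetadata] = None
    indexText: str = ""


# Hypothetical sample item, as it might appear in the `documents` output list.
item = DataSetItem(
    properties={"title": "Quarterly report"},
    metadata=ItemMetadata(
        createdAt="2024-01-15T10:30:00+00:00",
        modifiedAt="2024-02-01T08:00:00+00:00",
    ),
    indexText="Quarterly report for Q4",
)
documents = [item]
```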
Examples
# Retrieve all documents from a dataspace
- id: get_all_docs
  uses: GetAllDocuments@1.0.0
  with:
    dataspace_id: "550e8400-e29b-41d4-a716-446655440000"
    run: "Workflow run context"
  outputs:
    documents: all_documents
Error Handling
DataspaceNotFoundError
- Error Code: dataspace_not_found
- Common Cause: The specified dataspace_id does not exist or is not accessible through the configuration API.
- Solution: Verify that the dataspace_id exists and is accessible; check API connectivity and authentication.
DatasetAccessError
- Error Code: dataset_retrieval_failed
- Common Cause: One or more datasets within the dataspace cannot be accessed or queried.
- Solution: Check dataset permissions, verify dataset service connectivity, and ensure the datasets are properly initialized.
ConfigAPIError
- Error Code: config_api_connection_failed
- Common Cause: The block cannot connect to the configuration API service to fetch dataspace information.
- Solution: Verify that the configuration API service is available; check network connectivity and authentication credentials.
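As a small illustration, the three error codes above can be kept in a lookup table so that calling code can surface the documented remedy. How a workflow actually exposes the error code is not described in this page, so the sketch assumes it is available as a plain string; `remediation_for` is a hypothetical helper.

```python
# Error codes and remedies taken from the entries above.
REMEDIES: dict[str, str] = {
    "dataspace_not_found": (
        "Verify the dataspace_id exists and is accessible; "
        "check API connectivity and authentication."
    ),
    "dataset_retrieval_failed": (
        "Check dataset permissions, verify dataset service connectivity, "
        "and ensure datasets are properly initialized."
    ),
    "config_api_connection_failed": (
        "Verify configuration API service availability; "
        "check network connectivity and authentication credentials."
    ),
}


def remediation_for(error_code: str) -> str:
    """Return the documented remedy for an error code (hypothetical helper)."""
    return REMEDIES.get(error_code, "Unrecognized error code.")


hint = remediation_for("dataset_retrieval_failed")
```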
FAQ
How is this different from the GetDataSet block?
GetAllDocuments retrieves ALL documents from ALL datasets within a dataspace, while GetDataSet retrieves specific items from a single dataset with filtering and pagination. Use GetAllDocuments for bulk operations across an entire dataspace, and GetDataSet for targeted queries on specific datasets.
What happens if a dataspace has many large datasets?
The block retrieves all items from every dataset and returns them as a single list, so the result can be very large. For large dataspaces, consider the memory and performance implications. To control how much is loaded at once, use GetDataSet on individual datasets instead, since it supports filtering and pagination.
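If the full list has already been retrieved, downstream work can at least be bounded per step by processing it in fixed-size batches. The sketch below assumes the output is available as an ordinary Python list; `in_batches` is a hypothetical helper, and the sample data is invented. Note that this does not reduce the memory used by the retrieval itself.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def in_batches(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Invented stand-in for the block's `documents` output.
all_documents = [{"indexText": f"doc {i}"} for i in range(7)]

batch_sizes = [len(batch) for batch in in_batches(all_documents, 3)]
```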
Can I filter the documents returned by this block?
This block does not support filtering; it always returns all documents from the dataspace. To filter documents, pass the results to a Filter block afterward, or use GetDataSet blocks on individual datasets with specific filter criteria.
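Filtering after the fact amounts to a simple predicate over the returned list. The following sketch assumes each item is represented as a dictionary with a `properties` mapping; the `status` property and its values are hypothetical, and the Filter block's own configuration is not shown here.

```python
from typing import Any

# Invented stand-in for the block's `documents` output.
all_documents: list[dict[str, Any]] = [
    {"properties": {"status": "published"}, "indexText": "alpha"},
    {"properties": {"status": "draft"}, "indexText": "beta"},
    {"properties": {}, "indexText": "gamma"},
]


def has_property(item: dict[str, Any], name: str, expected: Any) -> bool:
    """True if the item's properties contain `name` with the expected value."""
    return item.get("properties", {}).get(name) == expected


published = [doc for doc in all_documents if has_property(doc, "status", "published")]
```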
How are documents ordered in the results?
Documents are returned in the order they are retrieved from each dataset, with all items from the first dataset, then all items from the second dataset, etc. The order within each dataset depends on the dataset service implementation. For specific ordering, process results with a Sort block.
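As an example of imposing a specific order, the returned items could be sorted by creation time. The sketch assumes items are dictionaries whose `metadata.createdAt` value is an ISO 8601 string with an explicit UTC offset; the actual timestamp format is not specified in this page, and the Sort block's configuration is not shown.

```python
from datetime import datetime
from typing import Any

# Invented stand-in for the block's `documents` output.
all_documents: list[dict[str, Any]] = [
    {"metadata": {"createdAt": "2024-03-01T09:00:00+00:00"}, "indexText": "march"},
    {"metadata": {"createdAt": "2024-01-15T10:30:00+00:00"}, "indexText": "january"},
    {"metadata": {"createdAt": "2024-02-10T08:15:00+00:00"}, "indexText": "february"},
]


def created_at(item: dict[str, Any]) -> datetime:
    """Parse the assumed ISO 8601 creation timestamp of an item."""
    return datetime.fromisoformat(item["metadata"]["createdAt"])


# sorted() is stable, so items with equal timestamps keep their retrieval order.
oldest_first = sorted(all_documents, key=created_at)
```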
What if some datasets in the dataspace are empty?
Empty datasets simply contribute no items to the final result. The block will continue processing all datasets and return documents from those that contain data. The aggregation handles empty datasets gracefully without errors.