GetDataSet 1.0.0¶
Overview¶
Description¶
Retrieves dataset items with advanced SQL-style filtering and sorting capabilities. This block provides comprehensive data retrieval features, including:

- SQL-style property filtering with full operator support (=, !=, >, >=, <, <=, IN)
- Complex logical expressions with AND/OR operators and parentheses
- Multi-datatype support (strings, numbers, booleans, dates, null values)
- Flexible pagination for large result sets
- SQL-style sorting with multiple columns and directions (ASC/DESC)

The block converts SQL-style filter and sort expressions directly to Cosmos DB queries for optimal performance.
Configuration Options¶
| Name | Data Type | Description | Default Value |
|---|---|---|---|
| filter | str or Constant(value=None) | SQL-style filter expression supporting operators (=, !=, >, >=, <, <=, IN), logical operators (AND, OR), and parentheses. Filters by properties like Name, Status, Size, createdAt, etc. | None |
| sort | str or Constant(value=None) | SQL-style sort expression for ordering results. Format: 'Column ASC/DESC' or 'Column1 ASC, Column2 DESC' for multi-column sorting. | None |
| skip | int | Number of items to skip from the beginning of results for pagination. Use with take to implement paging. | 0 |
| take | int | Maximum number of items to return in results. Combined with skip for pagination control. | 50 |
| dataset_id | UUID or Constant(value=None) | Specific dataset ID to retrieve from. If None, attempts to use the first available dataset in the workspace. | None |
Inputs¶
No inputs available.
Outputs¶
| Name | Data Type | Description |
|---|---|---|
| results | list[DataSetItem] | Ordered list of matching dataset items from the specified dataset. Each DataSetItem contains properties, metadata (createdAt, modifiedAt), and indexText content. Returns empty list if no dataset_id provided or no matches found. |
Examples¶
```yaml
# Basic dataset retrieval
- id: get_dataset_basic
  uses: GetDataSet@1.0.0
  with:
    dataset_id: "550e8400-e29b-41d4-a716-446655440000"
    take: 25
  outputs:
    results: customer_records
```
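A sketch of a filtered, multi-column-sorted retrieval, using the sort format described in the configuration options (the block id, output name, and property names are illustrative):

```yaml
# Filtered retrieval with multi-column sorting
- id: get_dataset_sorted
  uses: GetDataSet@1.0.0
  with:
    dataset_id: "550e8400-e29b-41d4-a716-446655440000"
    filter: "Status != 'Archived'"
    sort: "Name ASC, createdAt DESC"
    take: 50
  outputs:
    results: active_items
```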
Error Handling¶
| Error | Error Code | Common Cause | Solution |
|---|---|---|---|
| WorkspaceError | workspace_required | Block is used outside of a workspace context, or the workspace is null | Ensure the block is executed within a properly initialized workspace environment |
| DatasetNotFoundError | dataset_not_accessible | The specified dataset_id does not exist in the workspace or is not accessible | Verify that the dataset_id exists in the current workspace's dataspaces; the block returns an empty list if it is invalid |
| FilterParsingError | invalid_filter_syntax | Malformed SQL-style filter expression with invalid syntax or operators | Check filter syntax: proper quotes for strings, valid operators (=, !=, >, >=, <, <=, IN), and correct logical operators (AND, OR) |
FAQ¶
What's the difference between skip/take and traditional pagination?
Skip/take provides offset-based pagination where skip=20 and take=10 retrieves items 21-30. This is efficient for sequential page navigation but can be expensive for large offsets. For high-volume datasets, consider implementing cursor-based pagination using sort by unique fields like createdAt.
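The skip=20, take=10 case above can be sketched as a block configuration (the block id, dataset_id, and output name are placeholders):

```yaml
# Page 3 with a page size of 10 (retrieves items 21-30)
- id: get_dataset_page3
  uses: GetDataSet@1.0.0
  with:
    dataset_id: "550e8400-e29b-41d4-a716-446655440000"
    skip: 20
    take: 10
    sort: "createdAt ASC"  # a stable sort keeps page boundaries consistent across requests
  outputs:
    results: page_items
```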
How do I filter by multiple properties with complex logic?
Use parentheses to group conditions: `(Status = 'Published' OR Status = 'Review') AND Size > 1000 AND createdAt >= '2024-01-01'`. The parser supports nested logical expressions with AND/OR operators and proper precedence handling.
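The grouped filter above would appear in a block configuration like this (block id and output name are illustrative):

```yaml
# Grouped conditions with AND/OR and parentheses
- id: get_published_or_review
  uses: GetDataSet@1.0.0
  with:
    filter: "(Status = 'Published' OR Status = 'Review') AND Size > 1000 AND createdAt >= '2024-01-01'"
  outputs:
    results: recent_large_items
```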
What properties are available for filtering and sorting?
You can filter/sort by: (1) Built-in metadata fields: createdAt, modifiedAt, indexText (2) Custom dataset properties defined in your dataset schema (3) System properties like Size, Name, Type depending on your dataset structure. Property names are case-sensitive.
How can I optimize performance for large datasets?
Best practices: (1) Use specific filters to reduce result sets (2) Implement reasonable take limits (50-500 items) (3) Sort by indexed fields when possible (4) Avoid deep pagination with large skip values (5) Use dataset_id to target specific datasets rather than scanning all available datasets.
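A minimal sketch combining these practices, assuming a Status property exists in the dataset schema (block id and output name are illustrative):

```yaml
# Targeted query: specific dataset, narrow filter, bounded page size
- id: get_dataset_optimized
  uses: GetDataSet@1.0.0
  with:
    dataset_id: "550e8400-e29b-41d4-a716-446655440000"
    filter: "Status = 'Published'"
    sort: "createdAt DESC"
    take: 100
  outputs:
    results: recent_published
```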
Can I use wildcards or pattern matching in filters?
The current filter implementation supports exact matches and comparison operators. For text search, use the indexText field which contains searchable content. For pattern matching, consider using a separate search block or implement client-side filtering on the returned results.