GetDataSet 1.0.0¶
Overview¶
Description¶
Retrieves dataset items with advanced SQL-style filtering and sorting capabilities. This block provides comprehensive data retrieval features, including:

- SQL-style property filtering with full operator support (=, !=, >, >=, <, <=, IN)
- Complex logical expressions with AND/OR operators and parentheses
- Multi-datatype support (strings, numbers, booleans, dates, null values)
- Flexible pagination for large result sets
- SQL-style sorting with multiple columns and directions (ASC/DESC)

The block converts SQL-style filter and sort expressions directly to Cosmos DB queries for optimal performance.
Configuration Options¶
| Name | Data Type | Description | Default Value |
|---|---|---|---|
| filter | str or Constant(value=None) | SQL-style filter expression supporting operators (=, !=, >, >=, <, <=, IN), logical operators (AND, OR), and parentheses. Filters by properties like Name, Status, Size, createdAt, etc. | None |
| sort | str or Constant(value=None) | SQL-style sort expression for ordering results. Format: 'Column ASC/DESC' or 'Column1 ASC, Column2 DESC' for multi-column sorting. | None |
| skip | int | Number of items to skip from the beginning of results for pagination. Use with take to implement paging. | 0 |
| take | int | Maximum number of items to return in results. Combined with skip for pagination control. | 50 |
| dataset_id | UUID or Constant(value=None) | Specific dataset ID to retrieve from. If None, attempts to use the first available dataset in the workspace. | None |
Inputs¶
No inputs available.
Outputs¶
| Name | Data Type | Description |
|---|---|---|
| results | list[DataSetItem] | Ordered list of matching dataset items from the specified dataset. Each DataSetItem contains properties, metadata (createdAt, modifiedAt), and indexText content. Returns empty list if no dataset_id provided or no matches found. |
Examples¶
```yaml
# Basic dataset retrieval
- id: get_dataset_basic
  uses: GetDataSet@1.0.0
  with:
    dataset_id: "550e8400-e29b-41d4-a716-446655440000"
    take: 25
  outputs:
    results: customer_records
```
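A sketch of a filtered, multi-column-sorted retrieval, using the sort format described in the configuration options (the block id, output name, and property names are illustrative):

```yaml
# Filtered retrieval with multi-column sorting
- id: get_dataset_sorted
  uses: GetDataSet@1.0.0
  with:
    dataset_id: "550e8400-e29b-41d4-a716-446655440000"
    filter: "Status != 'Archived'"
    sort: "Name ASC, createdAt DESC"
    take: 50
  outputs:
    results: active_items
```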
Error Handling¶
| Error | Error Code | Common Cause | Solution |
|---|---|---|---|
| WorkspaceError | workspace_required | Block is used outside of a workspace context, or the workspace is null | Ensure the block is executed within a properly initialized workspace environment |
| DatasetNotFoundError | dataset_not_accessible | The specified dataset_id does not exist in the workspace or is not accessible | Verify that the dataset_id exists in the current workspace's dataspaces; the block returns an empty list if it is invalid |
| FilterParsingError | invalid_filter_syntax | Malformed SQL-style filter expression with invalid syntax or operators | Check filter syntax: proper quotes for strings, valid operators (=, !=, >, >=, <, <=, IN), and correct logical operators (AND, OR) |
FAQ¶
What's the difference between skip/take and traditional pagination?
Skip/take provides offset-based pagination where skip=20 and take=10 retrieves items 21-30. This is efficient for sequential page navigation but can be expensive for large offsets. For high-volume datasets, consider implementing cursor-based pagination using sort by unique fields like createdAt.
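The skip=20, take=10 case above can be sketched as a block configuration (the block id, dataset_id, and output name are placeholders):

```yaml
# Page 3 with a page size of 10 (retrieves items 21-30)
- id: get_dataset_page3
  uses: GetDataSet@1.0.0
  with:
    dataset_id: "550e8400-e29b-41d4-a716-446655440000"
    skip: 20
    take: 10
    sort: "createdAt ASC"  # a stable sort keeps page boundaries consistent across requests
  outputs:
    results: page_items
```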
How do I filter by multiple properties with complex logic?
Use parentheses to group conditions: `(Status = 'Published' OR Status = 'Review') AND Size > 1000 AND createdAt >= '2024-01-01'`. The parser supports nested logical expressions with AND/OR operators and proper precedence handling.
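The grouped filter above would appear in a block configuration like this (block id and output name are illustrative):

```yaml
# Grouped conditions with AND/OR and parentheses
- id: get_published_or_review
  uses: GetDataSet@1.0.0
  with:
    filter: "(Status = 'Published' OR Status = 'Review') AND Size > 1000 AND createdAt >= '2024-01-01'"
  outputs:
    results: recent_large_items
```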
What properties are available for filtering and sorting?
You can filter/sort by: (1) Built-in metadata fields: createdAt, modifiedAt, indexText (2) Custom dataset properties defined in your dataset schema (3) System properties like Size, Name, Type depending on your dataset structure. Property names are case-sensitive.
How can I optimize performance for large datasets?
Best practices: (1) Use specific filters to reduce result sets (2) Implement reasonable take limits (50-500 items) (3) Sort by indexed fields when possible (4) Avoid deep pagination with large skip values (5) Use dataset_id to target specific datasets rather than scanning all available datasets.
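A minimal sketch combining these practices, assuming a Status property exists in the dataset schema (block id and output name are illustrative):

```yaml
# Targeted query: specific dataset, narrow filter, bounded page size
- id: get_dataset_optimized
  uses: GetDataSet@1.0.0
  with:
    dataset_id: "550e8400-e29b-41d4-a716-446655440000"
    filter: "Status = 'Published'"
    sort: "createdAt DESC"
    take: 100
  outputs:
    results: recent_published
```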
Can I use wildcards or pattern matching in filters?
The current filter implementation supports exact matches and comparison operators. For text search, use the indexText field which contains searchable content. For pattern matching, consider using a separate search block or implement client-side filtering on the returned results.