LLM 1.0.0¶
Overview¶
v1.0.0 Native
Available Versions: 1.0.1 | 1.0.0 (current)
Description¶
Processes chat messages using an LLM with function-calling capabilities. The block runs in two steps: chat, which processes messages, and handle_tool_result, which manages tool execution results.
Configuration Options¶
| Name | Data Type | Description | Default Value |
|---|---|---|---|
| use_thread_history | bool | Whether to include previous conversation history in the context for maintaining conversational flow across multiple interactions | false |
Inputs¶
| Name | Data Type | Description |
|---|---|---|
| message | list[ContentItem] or ContentItem or str | Input message that can be a string, single content item (text/image), or list of content items for processing by the LLM |
Outputs¶
| Name | Data Type | Description |
|---|---|---|
| response | ResponseSchemaT | Output of the LLM, which will be a string if the response schema is a string, or a dictionary if the response schema is an object |
| sources | list[Source] | List of sources used in the response when citations are enabled through dataset items |
| reasoning | str | Reasoning content (chain of thought) from the LLM if available and reasoning_effort is configured |
Version History¶
- 1.0.1 - Native implementation
- 1.0.0 (Current) - Native implementation
Examples¶
```yaml
# Simple text generation with GPT-4o
- name: generate_response
  block: LLM_1_0_0
  config:
    use_thread_history: false
    llm_config:
      model: "gpt-4o"
      api_key: "sk-proj-abc123..."
      temperature: 0.7
      max_tokens: 500
      pre_prompt: "You are a helpful assistant that provides clear and concise answers."
  input:
    message: "Explain the benefits of cloud computing for small businesses."
```
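A second sketch showing the reasoning output described above. The placement of reasoning_effort under llm_config is an assumption inferred from the Outputs table; check the llm_config reference for the exact field, and note that only some models support it:

```yaml
# Reasoning example (reasoning_effort placement is an assumption;
# supported only on some OpenAI models)
- name: analyze_problem
  block: LLM_1_0_0
  config:
    llm_config:
      model: "gpt-4o"
      api_key: "sk-proj-abc123..."
      reasoning_effort: "medium"   # enables the `reasoning` output when supported
  input:
    message: "Why might a distributed cache return stale values after a failover?"
```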
Error Handling¶
LLM API Authentication Errors
Cause: Invalid or expired API key, incorrect model name, or insufficient API quota/permissions for the specified LLM provider.
Solution: Verify your API credentials and model access:
```yaml
config:
  llm_config:
    model: "gpt-4o"                # Ensure model name is correct
    api_key: "sk-proj-abc123..."   # Valid API key with sufficient quota
# Check your provider's documentation for correct model names
# OpenAI: gpt-4o, gpt-3.5-turbo
# Anthropic: claude-3.5-sonnet, claude-3-haiku
```
Token Limit Exceeded
Cause: The combination of input message, conversation history, and system prompts exceeds the model's context window limit.
Solution: Optimize token usage by adjusting configuration:
```yaml
config:
  use_thread_history: false         # Disable history if not needed
  llm_config:
    max_tokens: 1024                # Set an appropriate output limit
    pre_prompt: "Brief response."   # Keep system prompts concise
# For large inputs, consider chunking or summarizing content first
```
Tool Call Processing Failures
Cause: LLM generates invalid tool calls, tool execution fails, or tool response format is incompatible with expected schema.
Solution: Ensure proper tool configuration and error handling:
```yaml
config:
  tools:
    my_tool:
      description: "Clear, specific tool description"   # Helps the LLM understand when to use it
      parameters:
        required_param: "string"    # Define clear parameter types
  llm_config:
    temperature: 0.3   # Lower temperature for more predictable tool calls
```
FAQ¶
What's the difference between LLM v1.0.0 and v1.0.1?
The key differences between versions:
- Base Class: v1.0.0 extends WorkSpaceBlock, while v1.0.1 extends Block with DATASET scope
- Configuration: v1.0.0 includes use_thread_history config option, v1.0.1 does not
- Scope: v1.0.1 is designed specifically for dataset operations
- Thread History: v1.0.0 allows configurable conversation history, v1.0.1 uses fixed behavior
- Use Case: Choose v1.0.0 for general chat applications, v1.0.1 for dataset-focused workflows
How do I configure different LLM providers?
The block supports multiple LLM providers through the llm_config parameter:
- OpenAI: Use models like "gpt-4o", "gpt-3.5-turbo" with API key format "sk-proj-..."
- Anthropic: Use models like "claude-3.5-sonnet", "claude-3-haiku" with API key format "sk-ant-api03-..."
- Parameters: All providers support temperature (0.0-2.0), max_tokens, and pre_prompt
- Reasoning: Set reasoning_effort to "low", "medium", or "high" for supported models (currently some OpenAI models)
Always check your provider's documentation for the latest model names and authentication requirements.
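For example, switching providers typically only requires changing model and api_key inside llm_config. A minimal sketch for Anthropic, with placeholder model name and key:

```yaml
# Anthropic configuration sketch; verify the model name against Anthropic's docs
- name: generate_response
  block: LLM_1_0_0
  config:
    llm_config:
      model: "claude-3.5-sonnet"
      api_key: "sk-ant-api03-..."
      temperature: 0.5
      max_tokens: 1024
  input:
    message: "Summarize the quarterly report in three bullet points."
```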
How does conversation history work with use_thread_history?
When use_thread_history is enabled in v1.0.0:
- Context Preservation: Previous messages and responses are included in subsequent calls
- Memory Management: The block automatically manages token limits by truncating old messages when needed
- Tool Interactions: Tool calls and responses are preserved in the conversation context
- Performance Impact: Longer histories consume more tokens and may slow responses
- Best Practice: Enable for multi-turn conversations, disable for independent single queries
Note: v1.0.1 does not have this configuration option and uses default history behavior.
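A minimal multi-turn sketch for v1.0.0: with use_thread_history enabled, a follow-up question that references earlier turns can be resolved against the preserved context.

```yaml
# Multi-turn assistant: enable history so follow-ups keep context
- name: chat_turn
  block: LLM_1_0_0
  config:
    use_thread_history: true
    llm_config:
      model: "gpt-4o"
      api_key: "sk-proj-abc123..."
      max_tokens: 512
  input:
    message: "And how does that compare to last year?"  # resolved against prior turns
```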
What are the best practices for prompt engineering with this block?
Effective prompt engineering techniques:
- Clear Instructions: Use specific, actionable language in pre_prompt and input messages
- Context Setting: Define the LLM's role and expected behavior clearly
- Output Format: Use response_schema to enforce structured outputs when needed
- Temperature Control: Use low values (0.1-0.3) for factual tasks, higher (0.7-0.9) for creative tasks
- Token Management: Keep prompts concise while providing necessary context
- Examples: Include few-shot examples in your pre_prompt for consistent formatting
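Several of these techniques can be combined in one configuration. A sketch of a classification task (the task and prompt content are illustrative, not from the block's reference):

```yaml
# Few-shot examples in pre_prompt plus low temperature for a factual task
- name: classify_ticket
  block: LLM_1_0_0
  config:
    llm_config:
      model: "gpt-4o"
      api_key: "sk-proj-abc123..."
      temperature: 0.2   # low temperature for consistent classification
      pre_prompt: |
        You are a support-ticket classifier. Respond with a single category.
        Example: "My invoice is wrong" -> billing
        Example: "The app crashes on login" -> technical
  input:
    message: "I was charged twice this month."
```

For machine-readable output, response_schema can additionally enforce structure; consult the block's reference for the exact schema format.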
How do I optimize performance and manage costs?
Performance and cost optimization strategies:
- Model Selection: Use smaller models (gpt-3.5-turbo, claude-3-haiku) for simple tasks
- Token Limits: Set appropriate max_tokens to avoid unnecessary generation
- History Management: Disable use_thread_history when conversation context isn't needed
- Batch Processing: Process multiple similar requests together when possible
- Caching: Implement caching layers for repeated queries with identical inputs
- Monitoring: Track token usage and API costs to identify optimization opportunities
Remember that reasoning_effort increases token consumption but may improve response quality for complex tasks.
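The model-selection and token-limit strategies above can be sketched as a single cost-conscious configuration:

```yaml
# Cost-conscious configuration for a simple, independent query
- name: quick_answer
  block: LLM_1_0_0
  config:
    use_thread_history: false    # no history -> fewer input tokens
    llm_config:
      model: "gpt-3.5-turbo"     # smaller model for a simple task
      api_key: "sk-proj-abc123..."
      max_tokens: 256            # cap output generation
  input:
    message: "What is the capital of France?"
```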