LLM Providers

Runra Runtime supports multiple LLM providers. Configure them once and any agent adapter can use any provider. This decouples your agent choice from your model choice.

| Provider | Key | Models |
| --- | --- | --- |
| OpenAI | openai | gpt-4o, gpt-4o-mini, gpt-5, o1, o3, o4-mini |
| Anthropic | anthropic | claude-sonnet-4-20250514, claude-opus-4-20250514, claude-haiku-3-5 |
| Gemini | gemini | gemini-2.5-pro, gemini-2.5-flash |
| OpenRouter | openrouter | Any OpenRouter model |
| Ollama | ollama | Any local Ollama model |
| vLLM | vllm | Any vLLM-served model |
| Custom | custom | OpenAI-compatible endpoint |

Configure OpenAI as the LLM provider:

import { Runra } from "@runra/runtime";

const runra = new Runra({
  llm: {
    provider: "openai",
    config: {
      apiKey: process.env.OPENAI_API_KEY,
      model: "gpt-5",
      baseUrl: "https://api.openai.com/v1", // optional
      organization: "org-abc123", // optional
      temperature: 0.7, // optional (default: 0.3)
      maxTokens: 4096, // optional
      requestTimeoutMs: 30000, // optional (default: 30000)
    },
  },
  agent: {
    provider: "claude-code",
    config: {
      model: "gpt-5", // Override: agent uses this model via the OpenAI provider
    },
  },
  // ... sandbox and observability config
});

Configure Anthropic, optionally with extended thinking:

const runra = new Runra({
  llm: {
    provider: "anthropic",
    config: {
      apiKey: process.env.ANTHROPIC_API_KEY,
      model: "claude-sonnet-4-20250514",
      maxTokens: 8192,
      thinking: {
        type: "enabled",
        budgetTokens: 4000,
      },
    },
  },
  agent: {
    provider: "claude-code",
    config: {
      model: "claude-sonnet-4-20250514",
      permissionMode: "auto-approve",
    },
  },
});
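
With extended thinking enabled, the Anthropic API expects the thinking budget to be at least 1,024 tokens and smaller than the overall token limit. The following is a minimal sketch of a pre-flight check, assuming the `maxTokens` and `thinking.budgetTokens` field names from the config above. `checkThinkingBudget` is a hypothetical helper for illustration, not part of the Runra API.

```typescript
// Hypothetical shape mirroring the `thinking` block in the config above.
interface ThinkingConfig {
  type: "enabled" | "disabled";
  budgetTokens?: number;
}

// Returns a list of human-readable problems; an empty list means the
// combination looks acceptable.
function checkThinkingBudget(maxTokens: number, thinking?: ThinkingConfig): string[] {
  const problems: string[] = [];
  if (!thinking || thinking.type !== "enabled") {
    return problems; // nothing to check when thinking is off
  }
  const budget = thinking.budgetTokens ?? 0;
  if (budget < 1024) {
    problems.push(`budgetTokens (${budget}) should be at least 1024`);
  }
  if (budget >= maxTokens) {
    problems.push(`budgetTokens (${budget}) should be smaller than maxTokens (${maxTokens})`);
  }
  return problems;
}
```

The values in the example above (`maxTokens: 8192`, `budgetTokens: 4000`) pass both checks.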

Configure Gemini:

const runra = new Runra({
  llm: {
    provider: "gemini",
    config: {
      apiKey: process.env.GEMINI_API_KEY,
      model: "gemini-2.5-pro",
      temperature: 0.3,
      maxOutputTokens: 8192,
      safetySettings: [
        { category: "HARM_CATEGORY_DANGEROUS_CONTENT", threshold: "BLOCK_ONLY_HIGH" },
      ],
    },
  },
});

Use OpenRouter to access hundreds of models through a single API key:

const runra = new Runra({
  llm: {
    provider: "openrouter",
    config: {
      apiKey: process.env.OPENROUTER_API_KEY,
      model: "anthropic/claude-sonnet-4-20250514",
      baseUrl: "https://openrouter.ai/api/v1",
      appName: "my-runra-agent", // Identifies your app to OpenRouter
      headers: { // Optional extra headers
        "HTTP-Referer": "https://myapp.com",
      },
    },
  },
});

You can switch models without changing providers:

// Swap models by changing one value
const llmConfig = {
  provider: "openrouter",
  config: {
    apiKey: process.env.OPENROUTER_API_KEY,
    // model: "anthropic/claude-sonnet-4-20250514",
    // model: "openai/gpt-5",
    // model: "google/gemini-2.5-pro",
    model: "meta-llama/llama-4-maverick",
  },
};
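
Because only the `model` value changes, the choice can be moved out of the source file altogether. Here is a minimal sketch, assuming a hypothetical `RUNRA_MODEL` environment variable (not something the runtime is known to read) and passing the environment in as a plain object so the function stays easy to test:

```typescript
type Env = Record<string, string | undefined>;

interface OpenRouterLLMConfig {
  provider: "openrouter";
  config: { apiKey?: string; model: string };
}

// Fallback used when RUNRA_MODEL (hypothetical variable name) is unset or blank.
const DEFAULT_MODEL = "meta-llama/llama-4-maverick";

function openRouterConfigFromEnv(env: Env): OpenRouterLLMConfig {
  const requested = (env["RUNRA_MODEL"] ?? "").trim();
  return {
    provider: "openrouter",
    config: {
      apiKey: env["OPENROUTER_API_KEY"],
      model: requested.length > 0 ? requested : DEFAULT_MODEL,
    },
  };
}
```

In an application, this would typically be called as `openRouterConfigFromEnv(process.env)` and the result passed as the `llm` option.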

Run models locally with Ollama:

const runra = new Runra({
  llm: {
    provider: "ollama",
    config: {
      model: "llama3.3:70b",
      baseUrl: "http://localhost:11434", // Ollama default
      temperature: 0.1,
      numCtx: 32768, // Context window size
    },
  },
});

Make sure the Ollama server is running and the model is pulled:

ollama serve                # start the server; skip if the Ollama app or service is already running
ollama pull llama3.3:70b    # in another terminal: download the model
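
Both preconditions can also be checked from code before starting a run. Ollama lists locally available models at `GET /api/tags`. The sketch below assumes the default base URL from the config above, and `isModelPulled` and `checkOllama` are hypothetical helper names, not part of the Runra API.

```typescript
// Subset of the /api/tags response that this check needs.
interface OllamaTagsResponse {
  models?: Array<{ name: string }>;
}

// Pure helper: is `model` present in the tag list?
// Ollama stores untagged names as "<name>:latest".
function isModelPulled(tags: OllamaTagsResponse, model: string): boolean {
  const wanted = model.includes(":") ? model : `${model}:latest`;
  return (tags.models ?? []).some((m) => m.name === wanted);
}

type OllamaStatus = "ok" | "model-missing" | "unreachable";

async function checkOllama(
  model: string,
  baseUrl = "http://localhost:11434",
): Promise<OllamaStatus> {
  try {
    const res = await fetch(`${baseUrl}/api/tags`);
    if (!res.ok) return "unreachable";
    const tags = (await res.json()) as OllamaTagsResponse;
    return isModelPulled(tags, model) ? "ok" : "model-missing";
  } catch {
    // Connection refused, DNS failure, and similar network errors
    return "unreachable";
  }
}
```

A caller could run `await checkOllama(model)` at startup and print a hint to run `ollama serve` or `ollama pull` depending on the result.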

Use a vLLM server for high-throughput local inference:

const runra = new Runra({
  llm: {
    provider: "vllm",
    config: {
      model: "meta-llama/Llama-4-Maverick-17B-128E-Instruct",
      baseUrl: "http://localhost:8000/v1",
      apiKey: "not-needed", // vLLM typically doesn't require auth
      maxTokens: 4096,
    },
  },
});

The LLMConfig type, with the options shared across providers:

interface LLMConfig {
  /** Provider key */
  provider: "openai" | "anthropic" | "gemini" | "openrouter" | "ollama" | "vllm" | "custom";
  /** Provider-specific configuration */
  config: {
    /** API key for the provider */
    apiKey?: string;
    /** Override base URL (for proxies, local models, custom endpoints) */
    baseUrl?: string;
    /** Model name / ID */
    model: string;
    /** Temperature (0.0 - 2.0). Lower = more deterministic */
    temperature?: number;
    /** Maximum tokens per response */
    maxTokens?: number;
    /** Request timeout in milliseconds */
    requestTimeoutMs?: number;
    /** Maximum retries on transient errors */
    maxRetries?: number;
    /** Additional HTTP headers */
    headers?: Record<string, string>;
  };
}
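
The constraints documented in the comments above can be checked before the config reaches the runtime. This is a minimal sketch, assuming only the ranges stated in those comments plus the obvious sign and integer rules. `validateLLMConfig` is a hypothetical helper, and the runtime may perform its own, different validation.

```typescript
// Local copy of the relevant fields so the example is self-contained.
interface LLMConfigLike {
  provider: string;
  config: {
    model: string;
    temperature?: number;
    maxTokens?: number;
    requestTimeoutMs?: number;
    maxRetries?: number;
  };
}

function validateLLMConfig(cfg: LLMConfigLike): string[] {
  const errors: string[] = [];
  const c = cfg.config;
  if (c.model.trim().length === 0) {
    errors.push("model must not be empty");
  }
  if (c.temperature !== undefined && (c.temperature < 0 || c.temperature > 2)) {
    errors.push("temperature must be between 0.0 and 2.0");
  }
  if (c.maxTokens !== undefined && (!Number.isInteger(c.maxTokens) || c.maxTokens <= 0)) {
    errors.push("maxTokens must be a positive integer");
  }
  if (c.requestTimeoutMs !== undefined && c.requestTimeoutMs <= 0) {
    errors.push("requestTimeoutMs must be positive");
  }
  if (c.maxRetries !== undefined && (!Number.isInteger(c.maxRetries) || c.maxRetries < 0)) {
    errors.push("maxRetries must be a non-negative integer");
  }
  return errors;
}
```

Returning a list instead of throwing on the first problem lets a caller report every misconfigured field at once.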

Any service with an OpenAI-compatible API can be used as a provider:

const runra = new Runra({
  llm: {
    provider: "custom",
    config: {
      baseUrl: "https://my-proxy.internal/v1",
      apiKey: process.env.MY_PROXY_KEY,
      model: "custom-fine-tuned-model",
      temperature: 0.3,
      headers: {
        "X-Custom-Header": "value",
      },
    },
  },
});
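
"OpenAI-compatible" generally means the service accepts `POST {baseUrl}/chat/completions` with a JSON body containing `model` and `messages`, authenticated with a bearer token. The sketch below shows the request such a config would plausibly translate to. How the runtime actually builds its requests is an assumption here, and `buildChatCompletionRequest` is a hypothetical helper.

```typescript
interface CustomEndpointConfig {
  baseUrl: string;
  apiKey?: string;
  model: string;
  temperature?: number;
  headers?: Record<string, string>;
}

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildChatCompletionRequest(
  cfg: CustomEndpointConfig,
  messages: ChatMessage[],
): HttpRequest {
  // Extra headers from the config are merged over the default content type.
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
    ...cfg.headers,
  };
  if (cfg.apiKey) {
    headers["Authorization"] = `Bearer ${cfg.apiKey}`;
  }
  return {
    // Strip trailing slashes so "…/v1/" and "…/v1" produce the same URL.
    url: `${cfg.baseUrl.replace(/\/+$/, "")}/chat/completions`,
    method: "POST",
    headers,
    // JSON.stringify drops `temperature` when it is undefined.
    body: JSON.stringify({ model: cfg.model, messages, temperature: cfg.temperature }),
  };
}
```

Sending the result with `fetch(req.url, req)` is a quick way to confirm that a proxy really speaks this protocol before wiring it into an agent.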

Implement the LLMProvider interface for complete control:

import { Runra, type LLMProvider } from "@runra/runtime";

interface LLMRequest {
  messages: Array<{ role: string; content: string | Array<unknown> }>;
  tools?: Array<{ name: string; description: string; input_schema: unknown }>;
  systemPrompt?: string;
  temperature?: number;
  maxTokens?: number;
}

interface LLMResponse {
  content: string;
  toolCalls?: Array<{ id: string; name: string; input: unknown }>;
  stopReason: "end_turn" | "tool_use" | "max_tokens" | "stop";
  usage: {
    inputTokens: number;
    outputTokens: number;
  };
}

class MyCustomLLMProvider implements LLMProvider {
  readonly id = "my-custom-provider";
  private apiKey!: string;
  private baseUrl!: string;

  async initialize(config: Record<string, unknown>): Promise<void> {
    this.apiKey = config.apiKey as string;
    this.baseUrl = (config.baseUrl as string) || "https://my-api.example.com";
  }

  async chat(request: LLMRequest): Promise<LLMResponse> {
    const response = await fetch(`${this.baseUrl}/chat`, {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${this.apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model: "my-model",
        messages: this.buildMessages(request),
        // Passed through unchanged; convert here if your API expects a different tool schema
        tools: request.tools,
        temperature: request.temperature ?? 0.3,
        max_tokens: request.maxTokens ?? 4096,
      }),
    });
    // fetch() only rejects on network failures, so check the HTTP status explicitly
    if (!response.ok) {
      throw new Error(`LLM request failed: ${response.status} ${response.statusText}`);
    }
    const data = await response.json();
    return this.parseResponse(data);
  }

  // Prepend the system prompt (if any) to the conversation
  private buildMessages(request: LLMRequest) {
    const messages: LLMRequest["messages"] = [];
    if (request.systemPrompt) {
      messages.push({ role: "system", content: request.systemPrompt });
    }
    messages.push(...request.messages);
    return messages;
  }

  // Convert an OpenAI-style response body into an LLMResponse
  private parseResponse(data: any): LLMResponse {
    const choice = data.choices[0];
    const message = choice.message;
    return {
      content: message.content ?? "",
      toolCalls: message.tool_calls?.map((tc: any) => ({
        id: tc.id,
        name: tc.function.name,
        input: JSON.parse(tc.function.arguments),
      })),
      stopReason: this.mapStopReason(choice.finish_reason),
      usage: {
        inputTokens: data.usage?.prompt_tokens ?? 0,
        outputTokens: data.usage?.completion_tokens ?? 0,
      },
    };
  }

  // OpenAI-style finish reasons ("stop", "length", "tool_calls") are not the
  // same strings as the LLMResponse stopReason union, so translate them
  private mapStopReason(reason: string): LLMResponse["stopReason"] {
    switch (reason) {
      case "tool_calls":
        return "tool_use";
      case "length":
        return "max_tokens";
      default:
        return "stop";
    }
  }

  async dispose(): Promise<void> {}
}

// Register and use
Runra.registerLLMProvider("my-custom-provider", () => new MyCustomLLMProvider());

Configure fallbacks when your primary provider is unavailable:

const runra = new Runra({
  llm: {
    provider: "openai",
    config: {
      apiKey: process.env.OPENAI_API_KEY,
      model: "gpt-5",
    },
    fallbacks: [
      {
        provider: "anthropic",
        config: {
          apiKey: process.env.ANTHROPIC_API_KEY,
          model: "claude-sonnet-4-20250514",
        },
      },
      {
        provider: "openrouter",
        config: {
          apiKey: process.env.OPENROUTER_API_KEY,
          model: "meta-llama/llama-4-maverick",
        },
      },
    ],
  },
});

Fallbacks trigger when the primary provider returns 5xx errors or times out. They’re tried in order.
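
The rule described above can be sketched as a loop over providers that moves on only for 5xx responses and timeouts. This illustrates the stated behavior and is not the runtime's actual implementation. The `ChatProvider` interface and the `status` and `timedOut` error fields are assumptions made for the example.

```typescript
interface ChatProvider {
  id: string;
  chat(prompt: string): Promise<string>;
}

// Assumed error convention: providers attach `status` (HTTP code) or
// `timedOut` to the errors they throw.
function shouldFallBack(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const e = err as { status?: unknown; timedOut?: unknown };
  if (e.timedOut === true) return true;
  return typeof e.status === "number" && e.status >= 500 && e.status <= 599;
}

async function chatWithFallbacks(
  providers: ChatProvider[],
  prompt: string,
): Promise<{ providerId: string; text: string }> {
  let lastError: unknown = new Error("no providers configured");
  // Primary first, then each fallback in the order given.
  for (const provider of providers) {
    try {
      return { providerId: provider.id, text: await provider.chat(prompt) };
    } catch (err) {
      // Errors such as 401 or 400 would fail the same way elsewhere: rethrow.
      if (!shouldFallBack(err)) throw err;
      lastError = err;
    }
  }
  // Every provider failed with a retryable error.
  throw lastError;
}
```

Under this rule, a 4xx error such as an invalid API key on the primary surfaces immediately instead of being masked by a fallback.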

The hosted providers can read credentials from environment variables:

| Provider | Environment Variable |
| --- | --- |
| OpenAI | OPENAI_API_KEY |
| Anthropic | ANTHROPIC_API_KEY |
| Gemini | GEMINI_API_KEY |
| OpenRouter | OPENROUTER_API_KEY |

If apiKey is not set in config, the runtime reads it from the environment automatically.
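
The lookup order described above (explicit config first, then the environment) can be sketched as follows. `resolveApiKey` is a hypothetical helper written for illustration, and the variable names come from the table above.

```typescript
// Environment variable per provider key, as listed in the table above.
const API_KEY_ENV_VARS: Record<string, string> = {
  openai: "OPENAI_API_KEY",
  anthropic: "ANTHROPIC_API_KEY",
  gemini: "GEMINI_API_KEY",
  openrouter: "OPENROUTER_API_KEY",
};

function resolveApiKey(
  provider: string,
  configApiKey: string | undefined,
  env: Record<string, string | undefined>,
): string | undefined {
  // An explicit key in config always wins.
  if (configApiKey) {
    return configApiKey;
  }
  // Providers without a documented variable (e.g. ollama) resolve to undefined.
  const varName: string | undefined = API_KEY_ENV_VARS[provider];
  return varName ? env[varName] : undefined;
}
```

Passing the environment as an argument (typically `process.env`) keeps the function pure, which makes the precedence easy to unit test.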