Gateway Inference

Gateway inference lets your code call Islo-managed models without configuring a model provider key. Use the islo package to create a short-lived session token, then pass that token to an OpenAI-compatible or Anthropic-compatible SDK.

The islo package is used here only for token management. Inference requests are sent through SDKs that accept a compatible base URL and API key in code.
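
To make the token hand-off concrete, the sketch below builds, without sending, the HTTP requests that the two SDK families would issue with a session token. It uses only the standard library. The header names and paths reflect how the OpenAI and Anthropic SDKs normally authenticate (`Authorization: Bearer` and `x-api-key`). Whether the gateway accepts hand-built requests of this shape is an assumption, and `build_openai_request` and `build_anthropic_request` are hypothetical helper names, not part of the islo package.

```python
import json
import urllib.request

OPENAI_BASE_URL = "https://gateway.islo.dev/inference/openai/v1"
ANTHROPIC_BASE_URL = "https://gateway.islo.dev/inference/anthropic"


def build_openai_request(session_token: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a Chat Completions request, as the OpenAI SDK would."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }
    return urllib.request.Request(
        f"{OPENAI_BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            # The OpenAI SDK sends its api_key as a bearer token.
            "Authorization": f"Bearer {session_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def build_anthropic_request(session_token: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a Messages request, as the Anthropic SDK would."""
    body = {
        "model": model,
        "max_tokens": 128,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{ANTHROPIC_BASE_URL}/v1/messages",
        data=json.dumps(body).encode("utf-8"),
        headers={
            # The Anthropic SDK sends its api_key in the x-api-key header.
            "x-api-key": session_token,
            "anthropic-version": "2023-06-01",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

In both cases the session token takes the place a provider key would normally occupy, which is why the SDK examples on this page only change `api_key` and `base_url`.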

Setup

$ uv add islo
$ export ISLO_API_KEY="your-islo-api-key"
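
The examples on this page read the key with `os.environ["ISLO_API_KEY"]`, which raises a bare `KeyError` when the variable is missing. As an optional convenience, a small check like the following can fail earlier with a clearer message. `require_env` is a hypothetical helper, not something the islo package provides.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a descriptive error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before creating a session token.")
    return value
```

Calling `require_env("ISLO_API_KEY")` in place of the direct `os.environ` lookup keeps the rest of each example unchanged.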

OpenAI SDK

Use the OpenAI-compatible base URL:

https://gateway.islo.dev/inference/openai/v1

For Python:

$ uv add openai

import os

from islo.custom.auth import SyncTokenProvider
from openai import OpenAI

# Exchange the Islo API key for a short-lived session token.
session_token = SyncTokenProvider(
    "https://api.islo.dev",
    os.environ["ISLO_API_KEY"],
)()

# Point the OpenAI client at the gateway and authenticate with the session token.
client = OpenAI(
    api_key=session_token,
    base_url="https://gateway.islo.dev/inference/openai/v1",
)

response = client.chat.completions.create(
    model="kimi-k2.7-code",
    messages=[
        {"role": "user", "content": "Say hello from Islo gateway inference."},
    ],
    max_tokens=128,
)
print(response)

Call the OpenAI Responses API with the same client:

response = client.responses.create(
    model="kimi-k2.7-code",
    input="Say hello from Islo gateway inference.",
    max_output_tokens=128,
)
print(response)

Anthropic SDK

Use the Anthropic-compatible base URL:

https://gateway.islo.dev/inference/anthropic

For Python:

$ uv add anthropic

import os

from anthropic import Anthropic
from islo.custom.auth import SyncTokenProvider

session_token = SyncTokenProvider(
    "https://api.islo.dev",
    os.environ["ISLO_API_KEY"],
)()

client = Anthropic(
    api_key=session_token,
    base_url="https://gateway.islo.dev/inference/anthropic",
)

message = client.messages.create(
    model="kimi-k2.7-code",
    max_tokens=128,
    messages=[
        {"role": "user", "content": "Say hello from Islo gateway inference."},
    ],
)
print(message)

For long-running Anthropic clients, create a fresh session token before the current token expires.
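
One way to approach this is to cache the token and fetch a new one once it reaches a chosen age. The sketch below is a minimal illustration under stated assumptions: `RefreshingToken` is a hypothetical helper, and `max_age_seconds` must be chosen by you, since this page does not state how long a session token stays valid. The `fetch_token` argument is any zero-argument callable that returns a token string.

```python
import time
from typing import Callable, Optional


class RefreshingToken:
    """Cache a session token and fetch a new one once it is max_age_seconds old."""

    def __init__(
        self,
        fetch_token: Callable[[], str],
        max_age_seconds: float,  # assumption: pick a value below the real token lifetime
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_token = fetch_token
        self._max_age_seconds = max_age_seconds
        self._clock = clock
        self._token: Optional[str] = None
        self._fetched_at = 0.0

    def get(self) -> str:
        """Return the cached token, refreshing it first if it is missing or too old."""
        now = self._clock()
        if self._token is None or now - self._fetched_at >= self._max_age_seconds:
            self._token = self._fetch_token()
            self._fetched_at = now
        return self._token
```

In the example above, `fetch_token` could be the `SyncTokenProvider(...)` instance itself, since the examples call it to obtain a token. A long-running process would then call `get()` before each batch of work and construct a new `Anthropic` client whenever the returned token changes.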

Claude Agent SDK

Claude Agent SDK reads Anthropic-compatible connection settings from the process environment. Set those values from a freshly created Islo session token before constructing the client:

$ uv add claude-agent-sdk

import asyncio
import os

from claude_agent_sdk import ClaudeAgentOptions, ClaudeSDKClient
from islo.custom.auth import SyncTokenProvider

session_token = SyncTokenProvider(
    "https://api.islo.dev",
    os.environ["ISLO_API_KEY"],
)()

# The SDK reads these connection settings from the process environment.
os.environ["ANTHROPIC_BASE_URL"] = "https://gateway.islo.dev/inference/anthropic"
os.environ["ANTHROPIC_API_KEY"] = session_token

options = ClaudeAgentOptions(
    system_prompt="You are a helpful assistant.",
    model="kimi-k2.7-code",
    max_turns=20,
)


async def main() -> None:
    # "async with" is only valid inside a coroutine, so the client lives in main().
    async with ClaudeSDKClient(options=options) as client:
        await client.query("Say hello from Islo gateway inference.")

        # Print text blocks as they arrive.
        async for message in client.receive_response():
            if hasattr(message, "content"):
                for block in message.content:
                    if hasattr(block, "text"):
                        print(block.text, end="", flush=True)


asyncio.run(main())

For long-running agents, create a fresh session token before starting a new client.
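
A context manager is one way to keep the environment and the token in step each time a client is started. The sketch below is an illustration, not part of the islo package or the Claude Agent SDK: `gateway_env` is a hypothetical helper, and `fetch_token` is any zero-argument callable that returns a fresh session token.

```python
import os
from contextlib import contextmanager
from typing import Callable, Iterator

ANTHROPIC_BASE_URL = "https://gateway.islo.dev/inference/anthropic"


@contextmanager
def gateway_env(fetch_token: Callable[[], str]) -> Iterator[None]:
    """Set the Anthropic connection variables from a fresh token, then restore the old values."""
    new_values = {
        "ANTHROPIC_BASE_URL": ANTHROPIC_BASE_URL,
        "ANTHROPIC_API_KEY": fetch_token(),
    }
    # Remember what was there before so the process environment can be restored.
    previous = {name: os.environ.get(name) for name in new_values}
    os.environ.update(new_values)
    try:
        yield
    finally:
        for name, old in previous.items():
            if old is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = old
```

The client should be constructed and used inside the `with gateway_env(...)` block, because the variables are restored as soon as the block exits.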

OpenAI Agents SDK

Use the OpenAI Agents SDK with an AsyncOpenAI client configured for the gateway:

$ uv add openai-agents openai

import asyncio
import os

from agents import Agent, OpenAIChatCompletionsModel, Runner, set_tracing_disabled
from islo.custom.auth import SyncTokenProvider
from openai import AsyncOpenAI

session_token = SyncTokenProvider(
    "https://api.islo.dev",
    os.environ["ISLO_API_KEY"],
)()

openai_client = AsyncOpenAI(
    api_key=session_token,
    base_url="https://gateway.islo.dev/inference/openai/v1",
)

set_tracing_disabled(disabled=True)


async def main() -> None:
    agent = Agent(
        name="Assistant",
        instructions="You are a helpful assistant.",
        model=OpenAIChatCompletionsModel(
            model="kimi-k2.7-code",
            openai_client=openai_client,
        ),
    )

    result = await Runner.run(agent, "Say hello from Islo gateway inference.")
    print(result.final_output)


asyncio.run(main())

LangChain

Use LangChain’s OpenAI chat model with the gateway base URL:

$ uv add langchain-openai

import os

from islo.custom.auth import SyncTokenProvider
from langchain_openai import ChatOpenAI

session_token = SyncTokenProvider(
    "https://api.islo.dev",
    os.environ["ISLO_API_KEY"],
)()

chat = ChatOpenAI(
    api_key=session_token,
    base_url="https://gateway.islo.dev/inference/openai/v1",
    model="kimi-k2.7-code",
)

response = chat.invoke("Say hello from Islo gateway inference.")
print(response.content)

Instructor

Use Instructor with an OpenAI client configured for the gateway:

$ uv add instructor openai pydantic

import os

import instructor
from islo.custom.auth import SyncTokenProvider
from openai import OpenAI
from pydantic import BaseModel


class Greeting(BaseModel):
    message: str


session_token = SyncTokenProvider(
    "https://api.islo.dev",
    os.environ["ISLO_API_KEY"],
)()

openai_client = OpenAI(
    api_key=session_token,
    base_url="https://gateway.islo.dev/inference/openai/v1",
)
client = instructor.from_openai(openai_client)

greeting = client.chat.completions.create(
    model="kimi-k2.7-code",
    response_model=Greeting,
    messages=[
        {"role": "user", "content": "Say hello from Islo gateway inference."},
    ],
)
print(greeting.message)