LLM Client¶
One abstract base, one concrete implementation, multiple modalities.
BaseModelClient defines the contract. OpenAIClient implements it โ handling text generation, vision, structured output, streaming, speech-to-text, text-to-speech, real-time S2S, and image generation from a single class.
Modality map¶
graph TB
BC["BaseModelClient\n(core/llm/base_client.py)"] --> OC["OpenAIClient\n(integrations/llm/openai/)"]
OC --> TXT["๐ Text\ngenerate()\ngenerate_stream()"]
OC --> VIS["๐ผ Vision\ngenerate() with ImageContent"]
OC --> STR["๐ Structured output\nresponse_format=MyModel"]
OC --> STT["๐ค STT\ntranscribe()"]
OC --> TTS["๐ TTS\nstream_tts()"]
OC --> S2S["๐ก Realtime S2S\ncreate_s2s_session()\ns2s_ws_url()"]
OC --> IMG["๐จ Image gen\ngenerate_image()"]
style BC fill:#0d2b2b,stroke:#2dd4bf,color:#e2ecff
style OC fill:#1a1a2e,stroke:#818cf8,color:#e2ecff
style TXT fill:#1a2b1a,stroke:#4ade80,color:#e2ecff Import paths¶
# Abstract base (no external deps โ safe to import anywhere)
from ravi.core.llm.base_client import BaseModelClient
# Concrete implementation
from ravi.integrations.llm.openai.openai_client import OpenAIClient
Text generation¶
Blocking¶
from ravi.integrations.llm.openai.openai_client import OpenAIClient
from ravi.core.messages import SystemMessage, UserMessage
client = OpenAIClient(model="gpt-4o", temperature=0.7)
messages = [
SystemMessage("You are a helpful assistant."),
UserMessage(content=["What is the capital of France?"]),
]
response = await client.generate(messages)
print(response.content[0].text) # "Paris"
print(response.usage) # {prompt_tokens, completion_tokens, total_tokens}
Streaming¶
sequenceDiagram
autonumber
participant A as Agent
participant C as OpenAIClient
participant O as OpenAI API
A->>C: generate_stream(messages, tools)
C->>O: POST /chat/completions stream=true
loop Token by token
O-->>C: SSE chunk
C-->>A: yield AssistantMessage (partial)
end
O-->>C: [DONE]
C-->>A: yield AssistantMessage (final, finish_reason) async for partial in client.generate_stream(messages):
if partial.content:
print(partial.content[0].text, end="", flush=True)
Capability checks¶
Audio and image methods are optional. Always check the property before calling.
# Audio
if client.supports_audio:
transcript = await client.transcribe(
audio_bytes=audio_data,
filename="meeting.mp3",
model="gpt-4o-transcribe",
language="en",
)
async for chunk in client.stream_tts(
text="Hello world",
voice="coral", # alloy echo fable nova onyx sage shimmer
model="tts-1",
response_format="mp3",
):
audio_buffer.write(chunk)
# Real-time speech-to-speech
if client.supports_s2s:
ws_url = client.s2s_ws_url(model="gpt-4o-realtime-preview")
session = await client.create_s2s_session(
model="gpt-4o-realtime-preview",
voice="coral",
instructions="You are a voice assistant.",
)
# Image generation
if client.supports_image_generation:
urls = await client.generate_image(
"a cat in a spacesuit, realistic",
model="gpt-image-1",
size="1024x1024",
quality="high", # low / medium / high / auto
)
# Returns list[str] โ URLs or data:image/png;base64,... strings
Structured output¶
Pass a Pydantic model as response_format and the LLM is forced to return valid JSON matching your schema.
from pydantic import BaseModel
class Analysis(BaseModel):
sentiment: str
confidence: float
summary: str
response = await client.generate(
messages,
response_format=Analysis,
)
result = response.parsed # Analysis instance, or None if parse failed
if result:
print(result.sentiment, result.confidence)
Vision input¶
Pass ImageContent objects in a UserMessage to feed images to the LLM.
from ravi.core.messages import UserMessage, ImageContent
msg = UserMessage(content=[
"Describe what you see in this image.",
ImageContent(url="https://example.com/photo.jpg", detail="high"),
])
# From bytes (no network fetch)
msg = UserMessage(content=[
ImageContent(data=open("chart.png", "rb").read(), media_type="image/png"),
])
# From OpenAI Files API
msg = UserMessage(content=[
ImageContent(file_id="file-abc123"),
])
detail values: "low" ยท "high" ยท "original" ยท "auto" (default)
Token counting¶
count = await client.count_tokens(messages)
# Used by TokenBudgetContext to fit messages in the context window
Source¶
| File | What it owns |
|---|---|
core/llm/base_client.py | BaseModelClient ABC โ text, audio, image generation signatures |
integrations/llm/openai/openai_client.py | OpenAIClient โ all modalities |