The AudioAgent provides text-to-speech (TTS) capabilities using AI models through Prism PHP. Convert any text into natural-sounding speech with multiple voice options and audio formats.
Quick Start
Generate speech from text with just a few lines of code:
use Vizra\VizraADK\Agents\AudioAgent;
// Simple TTS generation
$audio = AudioAgent::run('Welcome to our application')->go();
// Save the audio
$audio->storeAs('welcome.mp3');
// Get the URL
$url = $audio->url();
Voice Selection
Choose from six distinct AI voices, each with its own character:
// Using the voice() method
$audio = AudioAgent::run('Hello world')
->voice('nova')
->go();
// Using voice preset methods
$audio = AudioAgent::run('Hello world')
->shimmer()
->go();
Available Voices
| Voice | Description |
|---|---|
| alloy | Balanced, neutral voice (default) |
| echo | Warm, conversational tone |
| fable | Expressive, storytelling style |
| onyx | Deep, authoritative voice |
| nova | Friendly, energetic delivery |
| shimmer | Clear, professional sound |
Each voice has a shortcut method: alloy(), echo(), fable(), onyx(), nova(), shimmer().
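If the voice name comes from user input or a settings table, it can help to validate it before calling voice(). The sketch below is a hypothetical app-side helper (normalizeVoice() is not part of Vizra ADK). It assumes the six voices in the table above are the complete set and falls back to the documented default, alloy.

```php
<?php

// Hypothetical app-side helper, not part of Vizra ADK.
// Assumes the six voices listed in the table are the full set.
const SUPPORTED_VOICES = ['alloy', 'echo', 'fable', 'onyx', 'nova', 'shimmer'];
const DEFAULT_VOICE = 'alloy';

function normalizeVoice(?string $requested): string
{
    // Trim and lower-case so "Nova " and "nova" are treated the same.
    $voice = strtolower(trim((string) $requested));

    // Fall back to the documented default for unknown or empty values.
    return in_array($voice, SUPPORTED_VOICES, true) ? $voice : DEFAULT_VOICE;
}

// Usage inside a Laravel app:
// $audio = AudioAgent::run($text)->voice(normalizeVoice($request->input('voice')))->go();
```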
Audio Formats
Choose the output format that suits your use case:
$audio = AudioAgent::run('Your order has been confirmed')
    ->format('wav')
    ->go();
| Format | Use Case |
|---|---|
| mp3 | Web playback, general use (default) |
| wav | High quality, editing workflows |
| opus | Streaming, low latency |
| aac | iOS/Safari compatibility |
| flac | Lossless archival |
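When you serve a stored file yourself (for example from a controller), the response needs a Content-Type that matches the format. The map below is a sketch: `audio/mpeg` for mp3 matches the mimeType() example on this page, while the other values are common registered or de facto types and are assumptions about what AudioResponse::mimeType() actually returns. mimeTypeForFormat() is a hypothetical helper.

```php
<?php

// Sketch of a format => MIME type map for serving generated audio.
// Only 'mp3' => 'audio/mpeg' is confirmed by this page; the rest are
// common values and may differ from what AudioResponse::mimeType() returns.
const AUDIO_MIME_TYPES = [
    'mp3'  => 'audio/mpeg',
    'wav'  => 'audio/wav',
    'opus' => 'audio/ogg', // assumes Opus delivered in an Ogg container
    'aac'  => 'audio/aac',
    'flac' => 'audio/flac',
];

function mimeTypeForFormat(string $format): string
{
    $key = strtolower($format);

    if (!array_key_exists($key, AUDIO_MIME_TYPES)) {
        throw new InvalidArgumentException("Unsupported audio format: {$format}");
    }

    return AUDIO_MIME_TYPES[$key];
}
```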
Speech Speed
Adjust the speaking rate between 0.25x and 4.0x:
// Slower for accessibility
$audio = AudioAgent::run('Important instructions')
    ->speed(0.8)
    ->go();
// Faster for summaries
$audio = AudioAgent::run('Quick recap of today')
    ->speed(1.5)
    ->go();
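When the speed comes from user settings, it is worth enforcing the 0.25x to 4.0x range in your own code, since this page does not say whether out-of-range values are clamped or rejected. The helpers below are a hypothetical sketch that clamps to the documented bounds. They also show the rough effect on length: playback time scales with 1/speed, so a 60-second clip at 1.5x lasts about 40 seconds.

```php
<?php

// Documented speed bounds for AudioAgent::speed().
const MIN_SPEED = 0.25;
const MAX_SPEED = 4.0;

// Hypothetical helper: clamp a user-supplied speed into the supported range.
function clampSpeed(float $speed): float
{
    return max(MIN_SPEED, min(MAX_SPEED, $speed));
}

// Rough estimate only: playback length scales with 1 / speed.
function estimatedDuration(float $secondsAtNormalSpeed, float $speed): float
{
    return $secondsAtNormalSpeed / clampSpeed($speed);
}
```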
Model Selection
Override the default model for different quality/speed tradeoffs:
// Use HD model for higher quality
$audio = AudioAgent::run('Premium narration content')
    ->using('openai', 'tts-1-hd')
    ->go();
// Use the newer gpt-4o-mini-tts model
$audio = AudioAgent::run('Quick notification')
    ->using('openai', 'gpt-4o-mini-tts')
    ->go();
Available Models
| Model | Description |
|---|---|
| tts-1 | Standard quality, faster generation (default) |
| tts-1-hd | Higher quality, slower generation |
| gpt-4o-mini-tts | Latest model with enhanced capabilities |
Storage
Auto-store with Custom Filename
$audio = AudioAgent::run('Welcome message')
    ->voice('nova')
    ->storeAs('welcome.mp3')
    ->go();
echo $audio->url(); // Returns the stored file URL
echo $audio->path(); // Returns the storage path
Auto-store with Generated Filename
$audio = AudioAgent::run('Dynamic content')
    ->store()
    ->go();
// File stored with ULID filename like: vizra-adk/generated/audio/01HXY...mp3
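A ULID is a 26-character, time-ordered identifier (a 48-bit millisecond timestamp followed by 80 random bits, in Crockford base32), so generated files sort roughly by creation time. In a Laravel app, `Str::ulid()` generates one. The standalone sketch below builds a path of the same shape to show the structure; ulid() and generatedAudioPath() are illustrative helpers, not the package's code, and the default base path is taken from the storage config on this page.

```php
<?php

// Crockford base32 alphabet used by ULIDs (omits I, L, O and U).
const CROCKFORD32 = '0123456789ABCDEFGHJKMNPQRSTVWXYZ';

// Minimal ULID sketch: 10 chars of millisecond timestamp + 16 random chars.
function ulid(?int $timeMs = null): string
{
    $timeMs ??= (int) floor(microtime(true) * 1000);

    // Encode the 48-bit timestamp, most significant character first.
    $time = '';
    for ($i = 0; $i < 10; $i++) {
        $time = CROCKFORD32[$timeMs % 32] . $time;
        $timeMs = intdiv($timeMs, 32);
    }

    // 16 characters x 5 bits = 80 bits of randomness.
    $random = '';
    for ($i = 0; $i < 16; $i++) {
        $random .= CROCKFORD32[random_int(0, 31)];
    }

    return $time . $random;
}

// Hypothetical helper mirroring the documented path shape.
function generatedAudioPath(string $format, string $base = 'vizra-adk/generated'): string
{
    return "{$base}/audio/" . ulid() . ".{$format}";
}
```

Because the timestamp comes first, plain string comparison of two ULIDs orders them by creation time (to the millisecond).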
Store to Specific Disk
$audio = AudioAgent::run('Private audio')
    ->storeAs('private/message.mp3', 's3')
    ->go();
Manual Storage
$audio = AudioAgent::run('Hello world')->go();
// Store with custom filename
$audio->storeAs('greetings/hello.mp3');
// Or auto-generate filename
$audio->store();
Working with AudioResponse
The AudioResponse object provides methods to access and manipulate the generated audio:
$audio = AudioAgent::run('Sample text')
    ->nova()
    ->format('mp3')
    ->go();
// Access audio data
$audio->data(); // Raw binary audio data
$audio->base64(); // Base64-encoded audio
$audio->toDataUri(); // Data URI for embedding (data:audio/mpeg;base64,...)
// Metadata
$audio->text(); // Original input text
$audio->voice(); // Voice used (e.g., 'nova')
$audio->format(); // Format (e.g., 'mp3')
$audio->mimeType(); // MIME type (e.g., 'audio/mpeg')
$audio->metadata(); // Full metadata array
// Storage status
$audio->isStored(); // Check if audio was stored
$audio->url(); // Get URL (stored or data URI)
$audio->path(); // Get storage path
$audio->disk(); // Get storage disk used
Embedding in HTML
$audio = AudioAgent::run('Play this message')->go();
// Using data URI (no storage needed)
$html = '<audio controls src="' . $audio->toDataUri() . '"></audio>';
// Using stored URL (escaped, since it is placed inside an HTML attribute)
$audio->store();
$html = '<audio controls src="' . htmlspecialchars($audio->url(), ENT_QUOTES) . '"></audio>';
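A data URI is just the MIME type plus the Base64-encoded bytes. That is why it needs no storage, and also why it inflates the payload by about a third (every 3 bytes become 4 characters). The standalone sketch below builds a string in the same `data:audio/mpeg;base64,...` shape that toDataUri() documents and escapes a src value for an HTML attribute; dataUri() and audioTag() are hypothetical helpers, not part of the package. For longer clips, a stored URL keeps page size down.

```php
<?php

// Build a data URI in the documented shape: data:<mime>;base64,<payload>.
function dataUri(string $bytes, string $mimeType): string
{
    return 'data:' . $mimeType . ';base64,' . base64_encode($bytes);
}

// Hypothetical helper: render an <audio> element with an escaped src.
function audioTag(string $src): string
{
    $escaped = htmlspecialchars($src, ENT_QUOTES, 'UTF-8');

    return '<audio controls src="' . $escaped . '"></audio>';
}
```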
Async/Queued Generation
For long-running generations or batch processing, use Laravel queues:
// Basic async generation
AudioAgent::run('Long article content here...')
    ->onQueue('media')
    ->go();
// With callback after completion ($user must be captured by the closure)
AudioAgent::run('Order confirmation for order #12345')
    ->shimmer()
    ->format('mp3')
    ->onQueue('media')
    ->then(function ($audio) use ($user) {
        $audio->storeAs('confirmations/order-12345.mp3');
        // Send notification, update database, etc.
        Notification::send($user, new AudioReadyNotification($audio->url()));
    })
    ->go();
Queue Options
AudioAgent::run('Background audio task')
    ->onQueue('media')    // Specify queue name
    ->delay(60)           // Delay in seconds
    ->tries(5)            // Retry attempts
    ->timeout(180)        // Timeout in seconds
    ->then(fn ($audio) => $audio->store())
    ->go();
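These options interact. Assuming tries() and timeout() map onto a Laravel job's `$tries` and `$timeout` with no backoff, a job that keeps timing out can hold a worker for up to tries × timeout seconds after its delay: 60 + 5 × 180 = 960 seconds for the values above, excluding time spent waiting in the queue. Laravel's queue documentation also advises keeping the connection's `retry_after` several seconds longer than the job timeout, so a still-running attempt is not handed to a second worker. The helpers below are a hypothetical planning sketch, not part of Vizra ADK.

```php
<?php

// Hypothetical planning helper; assumes tries() and timeout() map onto
// Laravel's job $tries and $timeout, with no backoff between attempts.
function worstCaseSeconds(int $delay, int $tries, int $timeout): int
{
    // Initial delay, then every attempt running until it times out.
    return $delay + $tries * $timeout;
}

// Laravel advises retry_after to exceed the job timeout by a few seconds.
function timeoutFitsRetryAfter(int $timeout, int $retryAfter, int $margin = 5): bool
{
    return $retryAfter >= $timeout + $margin;
}
```

The stock `config/queue.php` sets `retry_after` to 90 seconds for the database and Redis connections, so `timeout(180)` would need a larger value there.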
Async Return Value
When using async mode, go() returns job information instead of the audio:
$result = AudioAgent::run('Async generation')
    ->onQueue('media')
    ->go();
// $result contains:
// [
//     'job_dispatched' => true,
//     'job_id' => 'uuid-string',
//     'queue' => 'media',
//     'agent' => 'audio_agent',
//     'prompt' => 'Async generation',
// ]
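Because go() returns either an AudioResponse or this array depending on whether onQueue() was used, code that handles both paths needs to tell them apart. A minimal sketch, assuming the array keys shown above; isQueuedResult() is a hypothetical helper and the sample value uses the placeholder job id from this page.

```php
<?php

// Hypothetical helper: true when go() returned the async job-info array
// (keys as documented above) rather than an AudioResponse object.
function isQueuedResult(mixed $result): bool
{
    return is_array($result) && ($result['job_dispatched'] ?? false) === true;
}

// Sample value in the documented shape (placeholder job id).
$result = [
    'job_dispatched' => true,
    'job_id' => 'uuid-string',
    'queue' => 'media',
    'agent' => 'audio_agent',
    'prompt' => 'Async generation',
];

$trackingId = null;
if (isQueuedResult($result)) {
    // Keep the job id, e.g. to show a "processing" status to the user.
    $trackingId = $result['job_id'];
}
```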
User Context
Associate audio generation with a user for tracking and personalization:
AudioAgent::run('Personal greeting')
    ->forUser($user)
    ->withSession('session-123')
    ->go();
Sub-agent Delegation
Allow your LLM agents to generate audio by delegating to the AudioAgent:
use Vizra\VizraADK\Agents\BaseLlmAgent;
use Vizra\VizraADK\Tools\DelegateToMediaAgentTool;
class AssistantAgent extends BaseLlmAgent
{
    protected string $name = 'assistant';
    protected string $instructions = <<<'PROMPT'
        You are a helpful assistant. When the user asks you to
        read something aloud or create audio, use the generate_audio
        tool to convert text to speech.
        PROMPT;
    protected array $tools = [
        DelegateToMediaAgentTool::class,
    ];
    protected array $mediaAgents = [
        \Vizra\VizraADK\Agents\AudioAgent::class,
    ];
}
Or create the tool directly:
use Vizra\VizraADK\Agents\BaseLlmAgent;
use Vizra\VizraADK\Tools\DelegateToMediaAgentTool;
class NarratorAgent extends BaseLlmAgent
{
    protected function tools(): array
    {
        return [
            DelegateToMediaAgentTool::forAudio(),
        ];
    }
}
When the LLM calls the generate_audio tool, it can specify:
text (required): The text to convert to speech
voice (optional): Voice selection
format (optional): Output format
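Since the LLM fills in these arguments, it may send a voice or format outside the supported lists. If you process tool calls yourself, a defensive sketch like the one below drops unrecognised optional values so the configured defaults apply. sanitizeAudioToolArgs() is hypothetical, and this page does not say whether AudioAgent already validates these arguments.

```php
<?php

// Hypothetical helper for generate_audio tool arguments.
// Voice and format lists are the ones documented on this page.
function sanitizeAudioToolArgs(array $args): array
{
    $voices = ['alloy', 'echo', 'fable', 'onyx', 'nova', 'shimmer'];
    $formats = ['mp3', 'wav', 'opus', 'aac', 'flac'];

    $text = trim((string) ($args['text'] ?? ''));
    if ($text === '') {
        throw new InvalidArgumentException('generate_audio requires non-empty "text".');
    }

    $clean = ['text' => $text];

    // Keep optional values only when recognised; otherwise defaults apply.
    if (in_array($args['voice'] ?? null, $voices, true)) {
        $clean['voice'] = $args['voice'];
    }
    if (in_array($args['format'] ?? null, $formats, true)) {
        $clean['format'] = $args['format'];
    }

    return $clean;
}
```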
Configuration
Environment Variables
# Provider and model
VIZRA_ADK_AUDIO_PROVIDER=openai
VIZRA_ADK_AUDIO_MODEL=tts-1
# Defaults
VIZRA_ADK_AUDIO_VOICE=alloy
VIZRA_ADK_AUDIO_FORMAT=mp3
VIZRA_ADK_AUDIO_SPEED=1.0
Config File
In config/vizra-adk.php:
'media' => [
    'audio' => [
        'provider' => env('VIZRA_ADK_AUDIO_PROVIDER', 'openai'),
        'model' => env('VIZRA_ADK_AUDIO_MODEL', 'tts-1'),
        'default_voice' => env('VIZRA_ADK_AUDIO_VOICE', 'alloy'),
        'default_format' => env('VIZRA_ADK_AUDIO_FORMAT', 'mp3'),
        'default_speed' => env('VIZRA_ADK_AUDIO_SPEED', 1.0),
    ],
    'storage' => [
        'disk' => env('VIZRA_ADK_MEDIA_DISK', 'public'),
        'path' => env('VIZRA_ADK_MEDIA_PATH', 'vizra-adk/generated'),
    ],
],
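Environment variables always arrive as strings, and Laravel's `env()` only converts special values such as `true`, `false` and `null`, so `VIZRA_ADK_AUDIO_SPEED=1.5` yields the string `'1.5'`. If you read these settings in your own code, cast them. The sketch below mirrors the audio defaults from the config file in plain PHP; audioDefaults() is hypothetical and takes the environment as an array so it can be tested.

```php
<?php

// Hypothetical sketch mirroring the 'media.audio' defaults above.
// Env values are strings, so speed is cast to float explicitly.
function audioDefaults(array $env): array
{
    return [
        'provider' => $env['VIZRA_ADK_AUDIO_PROVIDER'] ?? 'openai',
        'model' => $env['VIZRA_ADK_AUDIO_MODEL'] ?? 'tts-1',
        'default_voice' => $env['VIZRA_ADK_AUDIO_VOICE'] ?? 'alloy',
        'default_format' => $env['VIZRA_ADK_AUDIO_FORMAT'] ?? 'mp3',
        'default_speed' => (float) ($env['VIZRA_ADK_AUDIO_SPEED'] ?? 1.0),
    ];
}

// Usage with the real process environment:
// $defaults = audioDefaults(getenv());
```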
Complete Example
Here’s a comprehensive example showing common patterns:
use App\Models\Chapter; // your Eloquent model (namespace assumed)
use Illuminate\Support\Str;
use Vizra\VizraADK\Agents\AudioAgent;
class PodcastService
{
    public function generateIntro(string $episodeTitle): string
    {
        $text = "Welcome to today's episode: {$episodeTitle}";
        $slug = Str::slug($episodeTitle);
        $audio = AudioAgent::run($text)
            ->using('openai', 'tts-1-hd')
            ->nova()
            ->format('mp3')
            ->speed(1.0)
            ->storeAs("podcasts/intros/{$slug}.mp3")
            ->go();
        return $audio->url();
    }
    public function generateChapterAudio(array $chapters): void
    {
        foreach ($chapters as $chapter) {
            AudioAgent::run($chapter['content'])
                ->onyx()
                ->format('mp3')
                ->onQueue('podcast-generation')
                ->then(function ($audio) use ($chapter) {
                    $audio->storeAs("chapters/{$chapter['id']}.mp3");
                    Chapter::findOrFail($chapter['id'])->update([
                        'audio_url' => $audio->url(),
                        'audio_generated_at' => now(),
                    ]);
                })
                ->go();
        }
    }
}
Error Handling
use Illuminate\Support\Facades\Log;
use Vizra\VizraADK\Agents\AudioAgent;
try {
    $audio = AudioAgent::run('Generate this audio')
        ->nova()
        ->go();
    $audio->store();
} catch (\Exception $e) {
    Log::error('Audio generation failed', [
        'error' => $e->getMessage(),
    ]);
}
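For synchronous calls, transient provider errors such as timeouts or rate limits are often worth retrying before logging a failure. Laravel ships a `retry()` helper for this; the standalone sketch below shows the same idea with exponential backoff so it runs outside a framework. withRetries() and its parameters are hypothetical.

```php
<?php

// Hypothetical helper: call $operation up to $attempts times, doubling the
// pause between attempts, and rethrow the last error if every attempt fails.
function withRetries(callable $operation, int $attempts = 3, int $baseDelayMs = 200): mixed
{
    $delayMs = $baseDelayMs;

    for ($attempt = 1; ; $attempt++) {
        try {
            return $operation($attempt);
        } catch (Throwable $e) {
            if ($attempt >= $attempts) {
                throw $e;
            }
            usleep($delayMs * 1000);
            $delayMs *= 2;
        }
    }
}

// Usage inside a Laravel app:
// $audio = withRetries(fn () => AudioAgent::run('Generate this audio')->nova()->go());
```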
API Reference
AudioAgent Methods
| Method | Description |
|---|---|
| run(string $text) | Static method to start fluent chain |
| execute($input, $context) | Direct execution with context |
| toToolDefinition() | Get tool schema for sub-agent use |
| executeFromToolCall($args, $context) | Execute from LLM tool call |
Fluent Builder Methods
| Method | Description |
|---|---|
| voice(string $voice) | Set voice |
| format(string $format) | Set output format |
| speed(float $speed) | Set speech speed (0.25-4.0) |
| alloy(), echo(), etc. | Voice presets |
| using(string $provider, string $model) | Override provider/model |
| forUser(Model $user) | Set user context |
| withSession(string $id) | Set session ID |
| withContext(array $context) | Add context data |
| store(?string $path, ?string $disk) | Auto-store with path |
| storeAs(string $filename, ?string $disk) | Auto-store with filename |
| onQueue(string $queue) | Enable async on queue |
| delay(int $seconds) | Delay queue execution |
| tries(int $count) | Set retry attempts |
| timeout(int $seconds) | Set timeout |
| then(Closure $callback) | Callback after completion |
| go() | Execute and return result |
AudioResponse Methods
| Method | Returns | Description |
|---|---|---|
| data() | string | Raw binary audio |
| base64() | string | Base64-encoded audio |
| toDataUri() | string | Data URI for embedding |
| text() | string | Original input text |
| voice() | string | Voice used |
| format() | string | Audio format |
| mimeType() | string | MIME type |
| metadata() | array | Full metadata |
| store(?string $disk) | static | Store with auto filename |
| storeAs(string $filename, ?string $disk) | static | Store with custom filename |
| url() | ?string | Get URL |
| path() | ?string | Get storage path |
| disk() | ?string | Get storage disk |
| isStored() | bool | Check if stored |
| raw() | mixed | Get raw Prism response |
| toArray() | array | Convert to array |
| toJson() | string | Convert to JSON |