The AudioAgent provides text-to-speech (TTS) capabilities using AI models through Prism PHP. Convert any text into natural-sounding speech with multiple voice options and audio formats.

Quick Start

Generate speech from text with just a few lines of code:
use Vizra\VizraADK\Agents\AudioAgent;

// Simple TTS generation
$audio = AudioAgent::run('Welcome to our application')->go();

// Save the audio
$audio->storeAs('welcome.mp3');

// Get the URL
$url = $audio->url();

Voice Selection

Choose from six distinct AI voices, each with its own character:
// Using the voice() method
$audio = AudioAgent::run('Hello world')
    ->voice('nova')
    ->go();

// Using voice preset methods
$audio = AudioAgent::run('Hello world')
    ->shimmer()
    ->go();

Available Voices

Voice     Description
alloy     Balanced, neutral voice (default)
echo      Warm, conversational tone
fable     Expressive, storytelling style
onyx      Deep, authoritative voice
nova      Friendly, energetic delivery
shimmer   Clear, professional sound
Each voice has a shortcut method: alloy(), echo(), fable(), onyx(), nova(), shimmer().
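
When the voice name comes from user input or a database column, it can help to normalise it before passing it to voice(). The sketch below is a hypothetical helper, not part of Vizra ADK: resolveVoice() and DOCUMENTED_VOICES are names invented for illustration, and the fallback to alloy simply mirrors the default listed in the table.

```php
<?php

// The six voice names listed in the table above.
const DOCUMENTED_VOICES = ['alloy', 'echo', 'fable', 'onyx', 'nova', 'shimmer'];

/**
 * Hypothetical helper (not part of Vizra ADK): normalise a requested voice
 * name and fall back to a default when it is missing or unknown.
 */
function resolveVoice(?string $requested, string $default = 'alloy'): string
{
    // Trim whitespace and lowercase so " Nova " matches "nova".
    $voice = strtolower(trim((string) $requested));

    return in_array($voice, DOCUMENTED_VOICES, true) ? $voice : $default;
}
```

The result can then be passed to the documented voice() method, for example ->voice(resolveVoice($input)).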

Audio Formats

Select your preferred output format based on your use case:
$audio = AudioAgent::run('Your order has been confirmed')
    ->format('wav')
    ->go();

Supported Formats

Format   Use Case
mp3      Web playback, general use (default)
wav      High quality, editing workflows
opus     Streaming, low latency
aac      iOS/Safari compatibility
flac     Lossless archival
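
If you change the format, the filename you later pass to storeAs() should probably carry the matching extension. A minimal sketch, assuming you build filenames yourself: audioFilename() and DOCUMENTED_FORMATS are hypothetical names, and the helper expects a base name without an extension.

```php
<?php

// Formats listed in the table above.
const DOCUMENTED_FORMATS = ['mp3', 'wav', 'opus', 'aac', 'flac'];

/**
 * Hypothetical helper (not part of Vizra ADK): append the extension that
 * matches the requested format to a base name given without an extension.
 *
 * @throws InvalidArgumentException when the format is not in the table.
 */
function audioFilename(string $baseName, string $format = 'mp3'): string
{
    $format = strtolower($format);

    if (!in_array($format, DOCUMENTED_FORMATS, true)) {
        throw new InvalidArgumentException("Unsupported audio format: {$format}");
    }

    return $baseName . '.' . $format;
}
```

For example, audioFilename('confirmations/order-12345', 'wav') yields a path ending in .wav, which keeps the stored name consistent with ->format('wav').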

Speech Speed

Adjust the speaking rate between 0.25x and 4.0x (1.0 is the configured default):
// Slower for accessibility
$audio = AudioAgent::run('Important instructions')
    ->speed(0.8)
    ->go();

// Faster for summaries
$audio = AudioAgent::run('Quick recap of today')
    ->speed(1.5)
    ->go();
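
This page does not say what happens when a speed outside the 0.25 to 4.0 range is supplied, so it may be safer to clamp the value before calling speed(). The following is a sketch under that assumption; clampSpeed(), MIN_SPEED and MAX_SPEED are hypothetical names, not library API.

```php
<?php

// Bounds taken from the documented range for speed().
const MIN_SPEED = 0.25;
const MAX_SPEED = 4.0;

/**
 * Hypothetical helper (not part of Vizra ADK): keep a requested speed
 * inside the documented range.
 */
function clampSpeed(float $speed): float
{
    return max(MIN_SPEED, min(MAX_SPEED, $speed));
}
```

A user preference could then be applied as ->speed(clampSpeed($preference)).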

Model Selection

Override the default model for different quality/speed tradeoffs:
// Use HD model for higher quality
$audio = AudioAgent::run('Premium narration content')
    ->using('openai', 'tts-1-hd')
    ->go();

// Use mini model for faster generation
$audio = AudioAgent::run('Quick notification')
    ->using('openai', 'gpt-4o-mini-tts')
    ->go();

Available Models

Model             Description
tts-1             Standard quality, faster generation (default)
tts-1-hd          Higher quality, slower generation
gpt-4o-mini-tts   Latest model with enhanced capabilities

Storage

Auto-store with Custom Filename

$audio = AudioAgent::run('Welcome message')
    ->voice('nova')
    ->storeAs('welcome.mp3')
    ->go();

echo $audio->url();  // Returns the stored file URL
echo $audio->path(); // Returns the storage path

Auto-store with Generated Filename

$audio = AudioAgent::run('Dynamic content')
    ->store()
    ->go();

// File stored with ULID filename like: vizra-adk/generated/audio/01HXY...mp3

Store to Specific Disk

$audio = AudioAgent::run('Private audio')
    ->storeAs('private/message.mp3', 's3')
    ->go();

Manual Storage

$audio = AudioAgent::run('Hello world')->go();

// Store with custom filename
$audio->storeAs('greetings/hello.mp3');

// Or auto-generate filename
$audio->store();

Working with AudioResponse

The AudioResponse object provides methods to access and manipulate the generated audio:
$audio = AudioAgent::run('Sample text')
    ->nova()
    ->format('mp3')
    ->go();

// Access audio data
$audio->data();      // Raw binary audio data
$audio->base64();    // Base64-encoded audio
$audio->toDataUri(); // Data URI for embedding (data:audio/mpeg;base64,...)

// Metadata
$audio->text();      // Original input text
$audio->voice();     // Voice used (e.g., 'nova')
$audio->format();    // Format (e.g., 'mp3')
$audio->mimeType();  // MIME type (e.g., 'audio/mpeg')
$audio->metadata();  // Full metadata array

// Storage status
$audio->isStored();  // Check if audio was stored
$audio->url();       // Get URL (stored or data URI)
$audio->path();      // Get storage path
$audio->disk();      // Get storage disk used
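
To make the relationship between data(), base64(), mimeType() and toDataUri() concrete, here is how a data URI of the shape shown above is assembled in plain PHP. This is an illustration only: buildDataUri() is a hypothetical function, and the library's own implementation may differ.

```php
<?php

/**
 * Hypothetical illustration (not the library's code): assemble a data URI
 * from raw bytes and a MIME type.
 */
function buildDataUri(string $binary, string $mimeType): string
{
    // Base64 output is roughly one third larger than the raw bytes.
    return 'data:' . $mimeType . ';base64,' . base64_encode($binary);
}
```

Because of the base64 overhead, data URIs suit short clips; longer audio is usually better stored and served by URL.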

Embedding in HTML

$audio = AudioAgent::run('Play this message')->go();

// Using data URI (no storage needed)
$html = '<audio controls src="' . $audio->toDataUri() . '"></audio>';

// Using stored URL
$audio->store();
$html = '<audio controls src="' . $audio->url() . '"></audio>';
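
When the markup is built by string concatenation rather than a Blade template, escaping the src value guards against URLs that contain characters such as & or quotes. A minimal sketch, with audioTag() as a hypothetical helper name:

```php
<?php

/**
 * Hypothetical helper (not part of Vizra ADK): render an <audio> element
 * with an HTML-escaped src attribute.
 */
function audioTag(string $src): string
{
    $escaped = htmlspecialchars($src, ENT_QUOTES, 'UTF-8');

    return '<audio controls src="' . $escaped . '"></audio>';
}
```

It accepts either a stored URL or a data URI, so audioTag($audio->url()) and audioTag($audio->toDataUri()) would both work with the methods shown above.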

Async/Queued Generation

For long-running generations or batch processing, use Laravel queues:
// Basic async generation
AudioAgent::run('Long article content here...')
    ->onQueue('media')
    ->go();

// With callback after completion
AudioAgent::run('Order confirmation for order #12345')
    ->shimmer()
    ->format('mp3')
    ->onQueue('media')
    ->then(function ($audio) use ($user) {
        $audio->storeAs('confirmations/order-12345.mp3');

        // Send notification, update database, etc.
        Notification::send($user, new AudioReadyNotification($audio->url()));
    })
    ->go();

Queue Options

AudioAgent::run('Background audio task')
    ->onQueue('media')        // Specify queue name
    ->delay(60)               // Delay in seconds
    ->tries(5)                // Retry attempts
    ->timeout(180)            // Timeout in seconds
    ->then(fn($audio) => $audio->store())
    ->go();

Async Return Value

When using async mode, go() returns job information instead of the audio:
$result = AudioAgent::run('Async generation')
    ->onQueue('media')
    ->go();

// $result contains:
// [
//     'job_dispatched' => true,
//     'job_id' => 'uuid-string',
//     'queue' => 'media',
//     'agent' => 'audio_agent',
//     'prompt' => 'Async generation',
// ]
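
Code that may run in either mode needs to tell the two return shapes apart. One possible check, based only on the job_dispatched key shown above, is sketched here; isQueuedResult() is a hypothetical helper, and the mixed type hint requires PHP 8.0 or later.

```php
<?php

/**
 * Hypothetical helper (not part of Vizra ADK): true when $result looks like
 * the job-information array above rather than a generated audio response.
 */
function isQueuedResult(mixed $result): bool
{
    return is_array($result) && ($result['job_dispatched'] ?? false) === true;
}
```

A caller could branch on isQueuedResult($result) and only call url() or store() when it returns false.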

User Context

Associate audio generation with a user for tracking and personalization:
AudioAgent::run('Personal greeting')
    ->forUser($user)
    ->withSession('session-123')
    ->go();

Sub-agent Delegation

Allow your LLM agents to generate audio by delegating to the AudioAgent:
use Vizra\VizraADK\Agents\BaseLlmAgent;
use Vizra\VizraADK\Tools\DelegateToMediaAgentTool;

class AssistantAgent extends BaseLlmAgent
{
    protected string $name = 'assistant';

    protected string $instructions = <<<'PROMPT'
        You are a helpful assistant. When the user asks you to
        read something aloud or create audio, use the generate_audio
        tool to convert text to speech.
    PROMPT;

    protected array $tools = [
        DelegateToMediaAgentTool::class,
    ];

    protected array $mediaAgents = [
        \Vizra\VizraADK\Agents\AudioAgent::class,
    ];
}
Or create the tool directly:
use Vizra\VizraADK\Agents\BaseLlmAgent;
use Vizra\VizraADK\Tools\DelegateToMediaAgentTool;

class NarratorAgent extends BaseLlmAgent
{
    protected function tools(): array
    {
        return [
            DelegateToMediaAgentTool::forAudio(),
        ];
    }
}
When the LLM calls the generate_audio tool, it can specify:
  • text (required): The text to convert to speech
  • voice (optional): Voice selection
  • format (optional): Output format
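
The argument contract in the list above can be expressed as a small validation step. The sketch below is an assumption about how you might sanitise arguments in your own code before acting on them; validateAudioToolArgs() is a hypothetical function, not how the library itself processes tool calls.

```php
<?php

/**
 * Hypothetical helper (not part of Vizra ADK): enforce the documented
 * contract of the generate_audio tool arguments.
 *
 * @param array<string, mixed> $args
 * @return array<string, string> Only the documented keys, with text trimmed.
 * @throws InvalidArgumentException when text is missing or empty.
 */
function validateAudioToolArgs(array $args): array
{
    $text = is_string($args['text'] ?? null) ? trim($args['text']) : '';

    if ($text === '') {
        throw new InvalidArgumentException('The "text" argument is required.');
    }

    $clean = ['text' => $text];

    // voice and format are optional; keep them only when they are non-empty strings.
    foreach (['voice', 'format'] as $key) {
        if (is_string($args[$key] ?? null) && $args[$key] !== '') {
            $clean[$key] = $args[$key];
        }
    }

    return $clean;
}
```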

Configuration

Environment Variables

# Provider and model
VIZRA_ADK_AUDIO_PROVIDER=openai
VIZRA_ADK_AUDIO_MODEL=tts-1

# Defaults
VIZRA_ADK_AUDIO_VOICE=alloy
VIZRA_ADK_AUDIO_FORMAT=mp3
VIZRA_ADK_AUDIO_SPEED=1.0

Config File

In config/vizra-adk.php:
'media' => [
    'audio' => [
        'provider' => env('VIZRA_ADK_AUDIO_PROVIDER', 'openai'),
        'model' => env('VIZRA_ADK_AUDIO_MODEL', 'tts-1'),
        'default_voice' => env('VIZRA_ADK_AUDIO_VOICE', 'alloy'),
        'default_format' => env('VIZRA_ADK_AUDIO_FORMAT', 'mp3'),
        'default_speed' => env('VIZRA_ADK_AUDIO_SPEED', 1.0),
    ],

    'storage' => [
        'disk' => env('VIZRA_ADK_MEDIA_DISK', 'public'),
        'path' => env('VIZRA_ADK_MEDIA_PATH', 'vizra-adk/generated'),
    ],
],
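
Values read from a .env file generally arrive as strings, so a speed set through VIZRA_ADK_AUDIO_SPEED would be "1.25" rather than 1.25 unless it is cast. The sketch below shows one way to merge raw environment values with the defaults listed above; resolveAudioDefaults() is a hypothetical helper, and whether the library performs a similar cast internally is not stated here.

```php
<?php

/**
 * Hypothetical helper (not part of Vizra ADK): merge raw environment values
 * with the documented defaults and cast speed to float.
 *
 * @param array<string, string> $env Raw values keyed by variable name.
 * @return array{provider: string, model: string, voice: string, format: string, speed: float}
 */
function resolveAudioDefaults(array $env): array
{
    return [
        'provider' => $env['VIZRA_ADK_AUDIO_PROVIDER'] ?? 'openai',
        'model'    => $env['VIZRA_ADK_AUDIO_MODEL'] ?? 'tts-1',
        'voice'    => $env['VIZRA_ADK_AUDIO_VOICE'] ?? 'alloy',
        'format'   => $env['VIZRA_ADK_AUDIO_FORMAT'] ?? 'mp3',
        // Environment values are strings; cast so numeric comparisons behave.
        'speed'    => (float) ($env['VIZRA_ADK_AUDIO_SPEED'] ?? 1.0),
    ];
}
```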

Complete Example

Here’s a comprehensive example showing common patterns:
use Vizra\VizraADK\Agents\AudioAgent;

class PodcastService
{
    public function generateIntro(string $episodeTitle): string
    {
        $text = "Welcome to today's episode: {$episodeTitle}";

        $audio = AudioAgent::run($text)
            ->using('openai', 'tts-1-hd')
            ->nova()
            ->format('mp3')
            ->speed(1.0)
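            // Str::slug() is Laravel's slug helper; the class defines no slug() method of its own
            ->storeAs('podcasts/intros/' . \Illuminate\Support\Str::slug($episodeTitle) . '.mp3')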
            ->go();

        return $audio->url();
    }

    public function generateChapterAudio(array $chapters): void
    {
        foreach ($chapters as $chapter) {
            AudioAgent::run($chapter['content'])
                ->onyx()
                ->format('mp3')
                ->onQueue('podcast-generation')
                ->then(function ($audio) use ($chapter) {
                    $audio->storeAs("chapters/{$chapter['id']}.mp3");

                    Chapter::find($chapter['id'])->update([
                        'audio_url' => $audio->url(),
                        'audio_generated_at' => now(),
                    ]);
                })
                ->go();
        }
    }
}

Error Handling

use Illuminate\Support\Facades\Log;

try {
    $audio = AudioAgent::run('Generate this audio')
        ->nova()
        ->go();

    $audio->store();

} catch (\Exception $e) {
    Log::error('Audio generation failed', [
        'error' => $e->getMessage(),
    ]);
}

API Reference

AudioAgent Methods

Method                                 Description
run(string $text)                      Static method to start fluent chain
execute($input, $context)              Direct execution with context
toToolDefinition()                     Get tool schema for sub-agent use
executeFromToolCall($args, $context)   Execute from LLM tool call

MediaAgentExecutor Methods (Fluent Chain)

Method                                     Description
voice(string $voice)                       Set voice
format(string $format)                     Set output format
speed(float $speed)                        Set speech speed (0.25-4.0)
alloy(), echo(), etc.                      Voice presets
using(string $provider, string $model)     Override provider/model
forUser(Model $user)                       Set user context
withSession(string $id)                    Set session ID
withContext(array $context)                Add context data
store(?string $path, ?string $disk)        Auto-store with path
storeAs(string $filename, ?string $disk)   Auto-store with filename
onQueue(string $queue)                     Enable async on queue
delay(int $seconds)                        Delay queue execution
tries(int $count)                          Set retry attempts
timeout(int $seconds)                      Set timeout
then(Closure $callback)                    Callback after completion
go()                                       Execute and return result

AudioResponse Methods

Method                                     Returns   Description
data()                                     string    Raw binary audio
base64()                                   string    Base64-encoded audio
toDataUri()                                string    Data URI for embedding
text()                                     string    Original input text
voice()                                    string    Voice used
format()                                   string    Audio format
mimeType()                                 string    MIME type
metadata()                                 array     Full metadata
store(?string $disk)                       static    Store with auto filename
storeAs(string $filename, ?string $disk)   static    Store with custom filename
url()                                      ?string   Get URL
path()                                     ?string   Get storage path
disk()                                     ?string   Get storage disk
isStored()                                 bool      Check if stored
raw()                                      mixed     Get raw Prism response
toArray()                                  array     Convert to array
toJson()                                   string    Convert to JSON