This tutorial demonstrates how to implement real-time text-to-speech (TTS) streaming using the parler_tts_mini model and Parler-TTS library.
The implementation consists of two files: `model.py` and `input_schema.py`. Let’s start with the `input_schema.py` file, which defines the input structure for our model. You can find the complete file in our GitHub repository.
For this tutorial, we’ll use two text inputs:

- `prompt_value`: The main text to be converted to speech.
- `input_value`: The voice instructions for the TTS model.

Both inputs use the `string` data type. The output will be streamed using Server-Sent Events (SSE), delivering audio chunks as base64-encoded strings. This approach allows for real-time audio playback as the speech is generated.
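On the client side, each SSE event carries a base64 string that can be decoded back into audio bytes. A minimal sketch (the event-parsing details depend on your SSE client library):

```python
import base64

def handle_audio_chunk(b64_chunk: str, out_file) -> None:
    # Decode one streamed chunk and append its audio bytes to a file.
    out_file.write(base64.b64decode(b64_chunk))
```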
To enable streaming with SSE, it’s crucial to set the `IS_STREAMING_OUTPUT` property to `True` in your model configuration. This tells the system to expect and handle a continuous output stream rather than a single response.
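In Inferless streaming examples this is a single module-level flag, typically set in `input_schema.py` (an assumption here; the text above only says “model configuration”):

```python
# input_schema.py
IS_STREAMING_OUTPUT = True  # stream results over SSE instead of returning one response
```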
It’s important to note the limitations when working with streaming inputs:

- Only `INT`, `STRING`, and `BOOLEAN` are supported as input datatypes.
- For multiple inputs or complex objects, use `json.dumps(object)` to convert them to a string before passing (see the sketch after this list).
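For example, several values can be packed into one `STRING` input and unpacked inside the model; the field names below are made up for illustration:

```python
import json

# Client side: collapse multiple values into a single STRING input.
packed = json.dumps({"voice": "Jenny", "speed": 1.0})

# Model side: recover the original object.
params = json.loads(packed)
```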
Now, create the `input_schema.py` file with the following content:
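A minimal sketch of the file, assuming the standard Inferless input-schema format; the example strings are placeholders rather than the repository’s exact values:

```python
# input_schema.py (sketch; example values are placeholders)
INPUT_SCHEMA = {
    "prompt_value": {
        "datatype": "STRING",
        "required": True,
        "shape": [1],
        "example": ["Hey, how are you doing today?"],
    },
    "input_value": {
        "datatype": "STRING",
        "required": True,
        "shape": [1],
        "example": ["A female speaker delivers a slightly expressive and animated speech."],
    },
}

IS_STREAMING_OUTPUT = True  # the streaming flag discussed above
```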
Next, we create the `ParlerTTSStreamer` class, which generates the audio chunk by chunk. Then, in `app.py`, we define the model class and import all the required functions.
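The exact import list lives in the repository; a plausible set, assuming the `parler_tts` and `transformers` packages from the Parler-TTS README, looks like this:

```python
# app.py — illustrative imports; the tutorial’s repository is authoritative
import base64
import io

import torch
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer
```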
`def initialize`: In this function, we create an object of the `ParlerTTSStreamer` class, which loads the model. You can also define any variables that you want to use during inference.
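A minimal sketch, assuming `ParlerTTSStreamer` takes no constructor arguments and that the class name `InferlessPythonModel` follows Inferless’s convention:

```python
class InferlessPythonModel:
    def initialize(self):
        # Build the streamer once at start-up; it loads the Parler-TTS
        # model so later infer() calls can reuse it.
        self.streamer = ParlerTTSStreamer()
```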
`def infer`: The `infer` function is the core of your model’s inference process. It’s invoked for each incoming request and is responsible for processing the input and generating the streamed output. Here’s a breakdown of its key components:
a. Output Streaming Setup: We initialize an `output_dict` with a key `'OUT'`. For each generated audio chunk (`mp3_str`), we update the `output_dict`:
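A minimal sketch of this step, using the names from the tutorial (the chunk value is a placeholder):

```python
# Step (a): the dict that will carry each streamed chunk.
output_dict = {}
mp3_str = "<base64-encoded audio chunk>"  # placeholder for a generated chunk
output_dict["OUT"] = mp3_str
```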
b. Sending the Chunks: Inferless provides a `stream_output_handler` for streaming the generated audio output chunks. Its `stream_output_handler.send_streamed_output()` function sends each chunk to the client:
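Putting (a) and (b) together, the generation loop might look like this; `audio_chunks` is a hypothetical helper standing in for the Parler-TTS generation code, and the closing `finalise_output_stream()` call follows Inferless’s published streaming examples:

```python
def infer(self, inputs, stream_output_handler):
    prompt_value = inputs["prompt_value"]  # text to speak
    input_value = inputs["input_value"]    # voice description

    output_dict = {}
    # audio_chunks: hypothetical helper yielding base64-encoded MP3 chunks.
    for mp3_str in self.audio_chunks(prompt_value, input_value):
        output_dict["OUT"] = mp3_str
        # Push this chunk to the client over SSE.
        stream_output_handler.send_streamed_output(output_dict)

    # Signal that the stream is complete.
    stream_output_handler.finalise_output_stream()
```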
`def finalize`: This function cleans up all the allocated memory.
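A minimal sketch, assuming the streamer is the only state to release:

```python
def finalize(self):
    # Drop the reference so the model and its GPU memory can be reclaimed.
    self.streamer = None
```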
For the custom runtime, we use CUDA version 12.4.1.
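The runtime is defined in `inferless-runtime-config.yaml`. A minimal sketch, assuming the standard Inferless runtime format; the package list is illustrative, not the repository’s exact list:

```yaml
build:
  cuda_version: "12.4.1"
  python_packages:
    - "parler-tts"    # illustrative; pin the versions used in the repository
    - "transformers"
    - "torch"
```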
Go to your Inferless console and click the Add a custom model button that you see on the top right. An import wizard will open up.
If you deploy through the Inferless CLI instead, two flags are worth noting:

- `--gpu A100`: Specifies the GPU type for deployment. Available options include `A10`, `A100`, and `T4`.
- `--runtime inferless-runtime-config.yaml`: Defines the runtime configuration file. If not specified, the default Inferless runtime is used.
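Putting the flags together, a deployment command would look something like this (the exact subcommand and any additional required flags may vary with your Inferless CLI version):

```bash
inferless deploy --gpu A100 --runtime inferless-runtime-config.yaml
```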