Build a Serverless Customer Service Voicebot
Welcome to an engaging tutorial designed to walk you through creating a customer support voicebot where users can voice their queries and receive solutions. You’ll learn to integrate speech recognition, large language, and text-to-speech models to develop a responsive and efficient voice-based customer support application.
Key Components of the Application
In building this application, we’ll utilize these components:
- Dataset: We will use a dataset of Twitter customer support exchanges to help the voicebot develop natural and effective conversational abilities, improving its response accuracy.
- Vector Database: We will use Pinecone to store embeddings of the dataset, aiding in the retrieval of relevant information to provide context to the language model.
- Embedding Model: We will utilize the embedding model bge-small-en-v1.5 to convert textual data from our dataset into numerical vectors. By storing these vectors in Pinecone, our bot can quickly access relevant information to generate accurate and contextually appropriate responses.
- Automatic Speech Recognition Model: We will use whisper-large-v3 to convert spoken words into text.
- Text Generation Model: The Hermes-2-Pro-Llama-3-8B will generate responses to user queries.
- Text-to-Audio Model: Piper will convert the generated text responses into speech for a seamless conversational experience.
Crafting Your Application
This tutorial guides you through creating a customer support voicebot where users can speak their queries and the bot responds with spoken solutions. It leverages technologies such as Pinecone, Faster-Whisper, LlamaIndex, Piper, and Inferless.
Document Processing and Storage in Pinecone
To process and store documents in Pinecone, we download and prepare the dataset, then load it using a SimpleDirectoryReader. We initialize Pinecone and create an index to store the document embeddings. These embeddings enable efficient retrieval and querying, providing relevant context for the language model in the application.
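A minimal sketch of this ingestion step, assuming the pinecone and llama-index packages (with their Hugging Face and Pinecone integrations) are installed; the index name, data directory, cloud region, and API key argument are illustrative placeholders, not values from this tutorial:

```python
# Sketch of loading the dataset, embedding it with bge-small-en-v1.5,
# and storing the vectors in a Pinecone index. All names here are
# placeholders; adapt them to your own setup.
def build_index(api_key: str, data_dir: str = "data/"):
    from pinecone import Pinecone, ServerlessSpec
    from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
    from llama_index.embeddings.huggingface import HuggingFaceEmbedding
    from llama_index.vector_stores.pinecone import PineconeVectorStore

    pc = Pinecone(api_key=api_key)
    pc.create_index(
        name="customer-support-index",
        dimension=384,  # bge-small-en-v1.5 produces 384-dimensional vectors
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )
    # Load the prepared dataset files from disk.
    documents = SimpleDirectoryReader(data_dir).load_data()
    embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
    vector_store = PineconeVectorStore(pinecone_index=pc.Index("customer-support-index"))
    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    # Embed the documents and push them into the Pinecone index.
    return VectorStoreIndex.from_documents(
        documents, storage_context=storage_context, embed_model=embed_model
    )
```

The returned index can later be wrapped in a query engine so the language model receives retrieved context alongside each user question.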
Core Development Steps
Speech-to-Speech Generation
- Objective: Capture user voice input, transcribe it to text, generate the text response, and convert it back to speech.
- Action: Implement a Python class (InferlessPythonModel) to handle the entire speech-to-speech process, including voice input handling, model integration, and audio response generation.
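The steps above can be sketched as a condensed class. The initialize/infer/finalize structure follows Inferless's model interface, while the input and output keys and the `query_engine`/`tts` attributes are illustrative placeholders for the full implementation:

```python
import base64
import io

class InferlessPythonModel:
    def initialize(self):
        # Runs once at container start-up: load every model up front.
        from faster_whisper import WhisperModel  # assumed dependency
        self.asr = WhisperModel("large-v3", device="cuda")
        # ... also load Hermes-2-Pro-Llama-3-8B, the Pinecone-backed
        # LlamaIndex query engine, and the Piper voice here ...
        self.query_engine = None  # placeholder for the RAG query engine
        self.tts = None           # placeholder for the Piper synthesizer

    def infer(self, inputs):
        # 1. Decode the base64-encoded audio sent by the client.
        audio_bytes = base64.b64decode(inputs["audio_base64"])
        # 2. Transcribe speech to text with Whisper.
        segments, _ = self.asr.transcribe(io.BytesIO(audio_bytes))
        question = " ".join(segment.text for segment in segments)
        # 3. Retrieve context from Pinecone and generate an answer.
        answer = str(self.query_engine.query(question))
        # 4. Synthesize the answer with Piper and return it as base64.
        wav_bytes = self.tts.synthesize(answer)
        return {"generated_audio_base64": base64.b64encode(wav_bytes).decode()}

    def finalize(self):
        # Release model references when the container is torn down.
        self.asr = None
```

Loading every model once in `initialize` keeps per-request latency down to the transcribe-generate-synthesize path.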
Setting up the Environment
Dependencies:
- Objective: Ensure all necessary libraries are installed.
- Action: Run the command below to install dependencies:
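One plausible install command for the stack named above; the exact package names and versions are assumptions, so match them to your runtime configuration:

```shell
pip install faster-whisper llama-index pinecone-client torch transformers piper-tts
```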
This command ensures your environment has all the tools required for the application.
Deploying Your Model with Inferless CLI
Inferless allows you to deploy your model using the Inferless CLI. Follow the steps below.
Clone the repository of the model
Let’s begin by cloning the model repository:
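A sketch of this step, with the placeholders standing in for the tutorial's model repository:

```shell
# Replace <repository-url> and <repository-directory> with the
# model repository referenced by this tutorial.
git clone <repository-url>
cd <repository-directory>
```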
Deploy the Model
To deploy the model using Inferless CLI, execute the following command:
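Reconstructed from the flag explanations that follow; confirm the exact sub-command against your installed version of the Inferless CLI:

```shell
inferless deploy --gpu A100 --runtime inferless-runtime-config.yaml
```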
Explanation of the Command:
- `--gpu A100`: Specifies the GPU type for deployment. Available options include `A10`, `A100`, and `T4`.
- `--runtime inferless-runtime-config.yaml`: Defines the runtime configuration file. If not specified, the default Inferless runtime is used.
Demo of the Customer Service Voicebot
Alternative Deployment Method
Inferless also supports a user-friendly UI for model deployment, catering to users at all skill levels. Refer to Inferless’s documentation for guidance on UI-based deployment.
Choosing Inferless for Deployment
Deploying your Customer Service Voicebot application with Inferless offers compelling advantages, making your development journey smoother and more cost-effective. Here’s why Inferless is the go-to choice:
- Ease of Use: Forget the complexities of infrastructure management. With Inferless, you simply bring your model, and within minutes, you have a working endpoint. Deployment is hassle-free, without the need for in-depth knowledge of scaling or infrastructure maintenance.
- Cold-start Times: Inferless’s unique load balancing ensures faster cold starts. Expect around 2.87 seconds to process each query, significantly faster than many traditional platforms.
- Cost Efficiency: Inferless optimizes resource utilization, translating to lower operational costs. Here’s a simplified cost comparison:
Scenario 1
You are looking to deploy a Customer Service Voicebot application for processing 100 queries.
Parameters:
- Total number of queries: 100 daily.
- Inference Time: All models are hypothetically deployed on an A100 (80 GB), taking 2.87 seconds of processing time per query plus a cold-start overhead of 24.01 seconds.
- Scale Down Timeout: Uniformly 60 seconds across all platforms, except Hugging Face, which requires a minimum of 15 minutes. This is assumed to happen 100 times a day.
Key Computations:
- Inference Duration:
  Processing 100 queries, each taking 2.87 seconds.
  Total: 100 x 2.87 = 287 seconds (approximately 0.08 hours)
- Idle Timeout Duration:
  Post-processing idle time before scaling down: (60 seconds - 2.87 seconds) x 100 = 5713 seconds (approximately 1.59 hours)
- Cold Start Overhead:
  Total: 100 x 24.01 = 2401 seconds (approximately 0.67 hours)

Total Billable Hours with Inferless: 0.08 (inference duration) + 1.59 (idle time) + 0.67 (cold start overhead) = 2.34 hours
Scenario 2
You are looking to deploy a Customer Service Voicebot application for processing 1000 queries per day.
Key Computations:
- Inference Duration:
  Processing 1000 queries, each taking 2.87 seconds.
  Total: 1000 x 2.87 = 2870 seconds (approximately 0.8 hours)
- Idle Timeout Duration:
  Post-processing idle time before scaling down: (60 seconds - 2.87 seconds) x 100 = 5713 seconds (approximately 1.59 hours)
- Cold Start Overhead:
  Total: 100 x 24.01 = 2401 seconds (approximately 0.67 hours)

Total Billable Hours with Inferless: 0.8 (inference duration) + 1.59 (idle time) + 0.67 (cold start overhead) = 3.06 hours
| Scenarios | On-Demand Cost | Serverless Cost |
| --- | --- | --- |
| 100 requests/day | $28.8 (24 hours billed at $1.22/hour) | $2.85 (2.34 hours billed at $1.22/hour) |
| 1000 requests/day | $28.8 (24 hours billed at $1.22/hour) | $3.73 (3.06 hours billed at $1.22/hour) |
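The billable-hours arithmetic behind this table can be reproduced in a few lines; each component is rounded to two decimals before summing, as in the scenarios above, so the totals match:

```python
# Reproduce the per-day billable hours and serverless cost for both
# scenarios. All constants come from the scenario assumptions above.
RATE = 1.22          # $/hour serverless rate used in the table
INFER = 2.87         # seconds of inference per query
COLD_START = 24.01   # seconds of cold-start overhead per occurrence
TIMEOUT = 60         # scale-down timeout in seconds
EVENTS = 100         # cold starts / scale-downs assumed per day

def billable_hours(queries):
    inference = round(queries * INFER / 3600, 2)          # e.g. 0.08 h
    idle = round((TIMEOUT - INFER) * EVENTS / 3600, 2)    # 1.59 h
    cold = round(COLD_START * EVENTS / 3600, 2)           # 0.67 h
    return round(inference + idle + cold, 2)

for q in (100, 1000):
    hours = billable_hours(q)
    print(f"{q} queries/day: {hours} billable hours -> ${hours * RATE:.2f}")
```

Running this prints 2.34 hours ($2.85) for 100 queries/day and 3.06 hours ($3.73) for 1000 queries/day, matching the table.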
By opting for Inferless, you can achieve up to 90.10% cost savings.
Please note that we have utilized the A100 (80 GB) GPU for model benchmarking purposes, while for pricing comparison, we referenced the A10G GPU price from both platforms. This is due to the unavailability of the A100 GPU in SageMaker.
Also, the above analysis is based on a smaller-scale scenario for demonstration purposes. Should the scale increase tenfold, traditional cloud services might require maintaining 2-4 GPUs constantly active to manage peak loads efficiently. In contrast, Inferless, with its dynamic scaling capabilities, adeptly adjusts to fluctuating demand without the need for continuously running hardware.
Conclusion
By following this guide, you’re now equipped to build and deploy a sophisticated Customer Service Voicebot application. This tutorial showcases the seamless integration of advanced technologies, emphasizing the practical application of creating cost-effective solutions.