Text-to-speech GUI chatbot using Tkinter in Python and ElevenLabs API

7 min read · Jan 20, 2024

Traditional chatbots rely solely on text-based interactions. As AI evolves, adding voice brings a new dimension to these interactions. With tools like ElevenLabs, you can choose from a range of voices, or even clone a voice, to convey nuances such as emotion and personality. A text-to-speech (TTS) chatbot responds to text input and converts its text responses into spoken words, which makes the interaction more dynamic and engaging. This is particularly useful where users prefer or require auditory communication, such as hands-free environments or accessibility for visually impaired users, or simply to enhance the overall user experience.

In this blog post, we will create a simple text-to-speech GUI (Graphical User Interface) chat application where users can interact with the bot.

Prerequisites

Before diving into the development process, ensure you have the following prerequisites ready:

Python installed on your system.
A suitable environment for Python development, such as an IDE or a simple text editor.
An API key from ElevenLabs for accessing their text-to-speech service. You can find the key in your Profile settings.

With these prerequisites met, you can start building your text-to-speech chatbot using Tkinter and ElevenLabs.

Step 1: Setting up the environment

We are going to begin by installing the required libraries.

1. In your terminal, install the ElevenLabs Python API to add the text-to-speech functionality.

pip install elevenlabs

Note: If you encounter an ImportError: No module named 'elevenlabs' try reinstalling it by following the commands here.

2. Tkinter usually comes bundled with Python, but you can add it if it’s not installed.

For Mac users: brew install python-tk
For Linux users: sudo apt-get install python3-tk

3. Install FFmpeg, a multimedia framework:

For Mac users: brew install ffmpeg
For Linux users: sudo apt-get install ffmpeg
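
Before moving on, it can help to confirm that all three pieces are visible to Python. The following is a minimal sketch and not part of the original setup; the helper name check_environment is made up for illustration. It only checks that the elevenlabs and tkinter modules can be located and that an ffmpeg executable is on the PATH, not that they work end to end.

```python
import importlib.util
import shutil


def check_environment():
    """Return a dict mapping each requirement to True if it was found."""
    return {
        # find_spec returns None when a top-level module cannot be located
        "elevenlabs": importlib.util.find_spec("elevenlabs") is not None,
        "tkinter": importlib.util.find_spec("tkinter") is not None,
        # shutil.which returns None when no such executable is on the PATH
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Any entry reported as MISSING points back to the matching install step above.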

After setting up the environment, you’re ready to write the chatbot code.

Step 2: Develop the chatbot logic

Start by creating a new Python script chatbot.py.
Import the os module, which is used later to read the API key from an environment variable.

import os

Next, create a function that defines how your chatbot will interact with users. This function will take user input as an argument and return a response.

def chatbot_response(user_input):
    user_input = user_input.lower()  # Convert input to lowercase for consistent comparison

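    # Greeting responses ('hi' must be a whole word, so "this" or "which" does not count)
    if 'hello' in user_input or any(word.strip('.,!?') == 'hi' for word in user_input.split()):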
        return "Hi there! How can I help you today?"
    elif 'bye' in user_input:
        return "Goodbye! Have a great day!"

    # Additional responses
    elif 'how are you' in user_input:
        return "I'm just a bot, but thanks for asking! How can I assist you?"
    elif 'name' in user_input:
        return "I am a friendly chatbot. What can I do for you?"
    elif 'time' in user_input:
        from datetime import datetime
        current_time = datetime.now().strftime("%H:%M")
        return f"The current time is {current_time}."
    elif 'date' in user_input:
        from datetime import datetime
        today_date = datetime.now().strftime("%Y-%m-%d")
        return f"Today's date is {today_date}."
    elif 'joke' in user_input:
        return "Why don't scientists trust atoms? Because they make up everything!"

    # Default response for unrecognized input
    else:
        return "I'm not sure how to respond to that. Can you ask something else?"

This chatbot will:

Respond to greetings
Respond to inquiries about its well-being
Tell its name
Provide the current time and date
Tell a simple joke

If the input doesn’t match any known patterns, it provides a default response.
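
As the number of patterns grows, the if/elif chain can also be written as a table of rules. The sketch below is one possible variant, not the code used in the rest of this tutorial: the names RULES, DEFAULT_REPLY, and match_rule are hypothetical, only three of the rules are shown, and it uses regular expressions with word boundaries (\b) so that a keyword only matches as a whole word.

```python
import re

# Hypothetical rule table: (compiled pattern, reply). Order matters: first match wins.
RULES = [
    (re.compile(r"\b(hello|hi)\b"), "Hi there! How can I help you today?"),
    (re.compile(r"\b(bye|goodbye)\b"), "Goodbye! Have a great day!"),
    (re.compile(r"\bjoke\b"), "Why don't scientists trust atoms? Because they make up everything!"),
]

DEFAULT_REPLY = "I'm not sure how to respond to that. Can you ask something else?"


def match_rule(user_input):
    """Return the reply of the first rule whose pattern matches, else the default."""
    text = user_input.lower()
    for pattern, reply in RULES:
        if pattern.search(text):
            return reply
    return DEFAULT_REPLY
```

With word boundaries, an input such as "Tell me about this" falls through to the default reply, whereas a plain substring test for 'hi' would treat it as a greeting. The trade-off is that variants like "goodbye" have to be listed explicitly.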

Add a simple loop to input text and receive responses in the console.

def run_console_test():
    print("Chatbot Test. Type 'quit' to exit.")

    while True:
        user_input = input("You: ")
        if user_input.lower() == 'quit':
            break

        response = chatbot_response(user_input)
        print("Chatbot:", response)

# Run the test
run_console_test()

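Run python chatbot.py to test your chatbot function in the console. You can type inputs and see how the chatbot responds. Before moving on to the next steps, remove or comment out the run_console_test() call; otherwise the console loop runs first and the GUI window will not open until you type 'quit'.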

Step 3: Integrating ElevenLabs API

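With the ElevenLabs Python API, we can add voice to our chatbot. The imports below match the elevenlabs package as it was when this post was written; if they fail with a newer release, check the package's current documentation for the equivalent calls. The API converts text responses into speech: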

from elevenlabs import generate, play

# Convert text to speech with the ElevenLabs API
def text_to_speech(text):
    api_key = os.getenv('ELEVEN_API_KEY', '<Your API Key>')
    audio = generate(
        text=text,
        voice="Sarah",  # Change the voice as needed
        model="eleven_multilingual_v2",  # Choose the appropriate model
        api_key=api_key  # Pass the API key here
    )
    return audio

Note: ElevenLabs has a range of voices in different languages that you can choose from. For this example, we are using the voice Sarah.
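
One detail worth handling explicitly is a missing key: with the fallback above, an unset ELEVEN_API_KEY means the placeholder string itself is passed as the key. A minimal sketch of a stricter lookup follows. The helper name load_api_key is hypothetical and not part of the ElevenLabs package; it only reads the environment variable and fails early with a readable message.

```python
import os


def load_api_key(env_var="ELEVEN_API_KEY"):
    """Return the API key stored in env_var, or raise if it is missing or blank."""
    api_key = os.getenv(env_var, "").strip()
    if not api_key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your ElevenLabs API key."
        )
    return api_key
```

Inside text_to_speech, the os.getenv call with a default could then be swapped for api_key = load_api_key().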

Next, create a function that will play the generated audio.

def chatbot_respond_and_speak(user_input):
    response = chatbot_response(user_input)  # Get the chatbot response
    audio = text_to_speech(response)  # Convert response to speech
    play(audio)  # Play the audio
    return response

This chatbot_respond_and_speak function ties everything together. It gets the response from your chatbot, converts it to speech, and then plays the audio.
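
Because the speech step depends on a network call and local audio playback, it may fail even when the text reply is fine. The sketch below shows one way to keep the text reply in that case. It reflects an assumption about how you might want to handle errors, not behavior of the ElevenLabs package; respond, synthesize, and play_audio are stand-in parameters so the example runs without an API key.

```python
def respond_and_speak_safely(user_input, respond, synthesize, play_audio):
    """Return the text reply; speak it if possible, but never fail because of audio."""
    response = respond(user_input)
    try:
        play_audio(synthesize(response))
    except Exception as error:  # broad on purpose: any audio failure is non-fatal here
        print(f"Speech unavailable: {error}")
    return response


# Example with stand-ins for chatbot_response, text_to_speech, and play
def failing_synthesize(text):
    raise ConnectionError("no network")


reply = respond_and_speak_safely("hello", lambda _: "Hi there!", failing_synthesize, print)
print(reply)
```

In the tutorial's code, the call would be respond_and_speak_safely(user_input, chatbot_response, text_to_speech, play).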

Step 4: Building the GUI with Tkinter

In this step, we will build a simple graphical user interface (GUI) for our chatbot using Tkinter, Python’s built-in library for creating GUI applications. This interface will allow users to interact with the chatbot through a windowed application rather than the command line.

Start by initializing Tkinter and setting up the main window:

import tkinter as tk

window = tk.Tk()
window.title("Chatbot")
window.geometry('400x500')  # Set the window size

Create the chat display area. We use a Frame widget to hold our chat content. A Listbox is used to display the chat messages, and a Scrollbar is added for scrolling through the chat history.

chat_frame = tk.Frame(window)
scrollbar = tk.Scrollbar(chat_frame)
msg_list = tk.Listbox(chat_frame, height=15, width=50, yscrollcommand=scrollbar.set)
scrollbar.pack(side=tk.RIGHT, fill=tk.Y)
msg_list.pack(side=tk.LEFT, fill=tk.BOTH)
scrollbar.config(command=msg_list.yview)  # Let the scrollbar drive the Listbox view
chat_frame.pack()

Add an input field for the user to type their message and a send button to submit it. The button's command is a lambda that calls the chatbot's response function, which is defined in the next step.

entry_frame = tk.Frame(window)
user_input_text = tk.Entry(entry_frame, width=40)
user_input_text.pack(side=tk.LEFT, fill=tk.BOTH, expand=True)
send_button = tk.Button(entry_frame, text="Send", command=lambda: chatbot_respond_and_speak_gui())
send_button.pack(side=tk.RIGHT)
entry_frame.pack()
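
Passing command=chatbot_respond_and_speak_gui directly would raise a NameError at this point, because the function is only defined further down in the script. Wrapping the call in a lambda delays the name lookup until the button is clicked. The same behavior can be seen without Tkinter in this small sketch, where callbacks and handle_send are hypothetical names:

```python
callbacks = {}

# handle_send does not exist yet; the lambda body is not evaluated until it is called
callbacks["send"] = lambda: handle_send()


def handle_send():
    return "sent"


# By the time the callback runs, handle_send has been defined
print(callbacks["send"]())
```

An alternative is to define the handler before creating the button and pass the function itself as command.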

Create a function to integrate the chatbot responses and text-to-speech. The chatbot_respond_and_speak_gui function captures the user’s input, sends it to the chatbot logic, and displays the response in the Listbox. It also calls the text-to-speech function to audibly play the chatbot's response.

def chatbot_respond_and_speak_gui():
    user_input = user_input_text.get()
    if user_input:
        msg_list.insert(tk.END, "You: " + user_input)
        response = chatbot_response(user_input)
        msg_list.insert(tk.END, "Chatbot: " + response)
        audio = text_to_speech(response)
        play(audio)
        user_input_text.delete(0, tk.END)
        msg_list.see(tk.END)  # Auto-scroll to the latest message
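
One thing to be aware of: text_to_speech and play both run inside the button callback, so the window cannot redraw or react to input until the audio has finished. A common way around this is to move the speech step onto a background thread. The sketch below assumes that approach; speak_in_background is a hypothetical helper, and synthesize and play_audio stand in for text_to_speech and play so the example runs on its own.

```python
import threading


def speak_in_background(text, synthesize, play_audio):
    """Run synthesize(text) and play_audio(...) on a daemon thread; return the thread."""

    def worker():
        # Only the audio work happens here; Tkinter widgets should be
        # touched from the main thread only.
        play_audio(synthesize(text))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


# Example with stand-ins for text_to_speech and play
played = []
thread = speak_in_background("Hi there!", str.upper, played.append)
thread.join()  # only for the demo; a GUI callback would return immediately
print(played)
```

In chatbot_respond_and_speak_gui, the two lines that call text_to_speech and play could then become speak_in_background(response, text_to_speech, play).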

Bind the Enter key to the send function for a better user experience, allowing users to press Enter to send their messages. Add the window.mainloop() command to start the event loop, which keeps the application window open and responsive to user inputs.

def on_enter_key(event):
    chatbot_respond_and_speak_gui()

window.bind('<Return>', on_enter_key)
window.mainloop()

Run your Python script in your terminal.

python chatbot.py

This will launch the chatbot’s GUI window.

Start by typing various messages into the chatbot’s input field and observe the responses.
Test the chatbot’s ability to handle different types of queries, including those it is designed to respond to and some it might not recognize.
Listen to the text-to-speech output to ensure the audio is clear and the responses are correctly vocalized.

Full Code

Here’s the complete Python script which includes the chatbot logic, ElevenLabs API integration for text-to-speech, and the Tkinter GUI.

import os
import tkinter as tk
from datetime import datetime
from elevenlabs import generate, play

# Convert text to speech with the ElevenLabs API
def text_to_speech(text):
    api_key = os.getenv('ELEVEN_API_KEY', '<Your API Key>')
    audio = generate(
        text=text,
        voice="Sarah",  # Change the voice as needed
        model="eleven_multilingual_v2",  # Choose the appropriate model
        api_key=api_key  # Pass the API key here
    )
    return audio
    
# Chatbot logic
def chatbot_response(user_input):
    user_input = user_input.lower()

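    # 'hi' must be a whole word, so "this" or "which" does not count as a greeting
    if 'hello' in user_input or any(word.strip('.,!?') == 'hi' for word in user_input.split()):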
        return "Hi there! How can I help you today?"
    elif 'bye' in user_input:
        return "Goodbye! Have a great day!"
    elif 'how are you' in user_input:
        return "I'm just a bot, but thanks for asking! How can I assist you?"
    elif 'name' in user_input:
        return "I am a friendly chatbot. What can I do for you?"
    elif 'time' in user_input:
        current_time = datetime.now().strftime("%H:%M")
        return f"The current time is {current_time}."
    elif 'date' in user_input:
        today_date = datetime.now().strftime("%Y-%m-%d")
        return f"Today's date is {today_date}."
    elif 'joke' in user_input:
        return "Why don't scientists trust atoms? Because they make up everything!"
    else:
        return "I'm not sure how to respond to that. Can you ask something else?"


# Initialize Tkinter window
window = tk.Tk()
window.title("Chatbot")
window.geometry('400x500')

# Chat window
chat_frame = tk.Frame(window)
scrollbar = tk.Scrollbar(chat_frame)
msg_list = tk.Listbox(chat_frame, height=15, width=50, yscrollcommand=scrollbar.set)
scrollbar.pack(side=tk.RIGHT, fill=tk.Y)
msg_list.pack(side=tk.LEFT, fill=tk.BOTH)
scrollbar.config(command=msg_list.yview)  # Let the scrollbar drive the Listbox view
chat_frame.pack()

# User input area
entry_frame = tk.Frame(window)
user_input_text = tk.Entry(entry_frame, width=40)
user_input_text.pack(side=tk.LEFT, fill=tk.BOTH, expand=True)
send_button = tk.Button(entry_frame, text="Send", command=lambda: chatbot_respond_and_speak_gui())
send_button.pack(side=tk.RIGHT)
entry_frame.pack()

# Function to update chat window and handle response
def chatbot_respond_and_speak_gui():
    user_input = user_input_text.get()
    if user_input:
        msg_list.insert(tk.END, "You: " + user_input)
        response = chatbot_response(user_input)
        msg_list.insert(tk.END, "Chatbot: " + response)
        audio = text_to_speech(response)
        play(audio)
        user_input_text.delete(0, tk.END)
        msg_list.see(tk.END)  # Auto-scroll to the latest message

# Bind Enter key for sending message
def on_enter_key(event):
    chatbot_respond_and_speak_gui()

window.bind('<Return>', on_enter_key)

# Start the Tkinter event loop
window.mainloop()

Wrapping up

I hope you enjoyed reading through this tutorial as much as I did writing it. You can take the chatbot further by adding features, refining its responses, or even deploying it for practical use.