Sunday, May 19, 2024

Building a Voice-Enabled ChatGPT Terminal with ESP32 and Google TTS

EFY Tested

This project takes the ChatGPT terminal introduced in the November 2023 issue to the next level: it now speaks, like a VoiceGPT.

In this new design, the questions and answers are spoken aloud through an I2S sound module such as the MAX98357A driving a 4-ohm loudspeaker. The output is loud and, in the author's tests, distortion-free; it must be heard to be believed.

Fig. 1: Voice-Enabled ChatGPT Terminal with ESP32 and Google TTS

Two variations are shown here. Fig. 1 shows the author's prototype, and the components required for the terminal are listed in the Bill of Materials (Table 1). Table 2 shows the pin connections between the ESP32 board (MOD1) and the 8.89cm (3.5-inch) TFT (MOD2), Table 3 shows the pin connections between the PS2 keyboard and the ESP32 board, and Table 4 shows the connections between the MAX98357A (MOD3) and the ESP32 board.

Table 1: Bill of Materials
Component | Quantity
ESP32 – MCU (MOD1) | 1
PS2 keyboard | 1
8.89cm (3.5-inch) TFT (MOD2) | 1
Wires, PCB, connector | 20
IC 7805: 5V regulator IC (IC1) | 1
I2S sound module MAX98357A (MOD3) | 1
100µF capacitor (C1, C2) | 2
Rectifier diode 1N4007 (D1) | 1

Table 2: Pin connections between ESP32 board and 8.89cm (3.5-inch) TFT
8.89cm (3.5-inch) TFT pin | ESP32 board pin | 8.89cm (3.5-inch) TFT pin | ESP32 pin
D01 | G13 | VCC/5 Volt | 5V regulator
Gnd | Gnd
Note: ESP32 Gnd is to be connected to the 5V regulator ground, and ESP32 Vin to the 5V regulator output.

Table 3: Pin connections between PS2 keyboard and ESP32 board
PS2 keyboard pin | ESP32 pin
DATA pin | G35
CLK pin | G34
5V | 5V pin of voltage regulator 7805 (IC1)

Table 4: Pin connections between MAX98357A (MOD3) and ESP32 board (MOD1)
MAX98357A pin | ESP32 pin
5V | 5V pin of voltage regulator 7805 (IC1)

Voice-Enabled ChatGPT Terminal – Circuit and Working

Fig. 2 shows the circuit diagram of the talking ChatGPT terminal. It is built around the ESP32 board (MOD1), 8.89cm (3.5-inch) TFT (MOD2), MAX98357A (MOD3), 5V voltage regulator 7805 (IC1), PS2 keyboard, and a few other components.

Fig. 2: Circuit diagram

Connect the components and the display according to the circuit diagram. The 7805 voltage regulator in the circuit powers the device from an input of up to 9V; since a 7805 needs about 2V of headroom above its 5V output, the input should be at least about 7V. Alternatively, you can replace the voltage regulator with a 5V regulated power adaptor.


Note that the GAIN pin of the MAX98357A is connected to ground to increase the output power. Removing this connection will slightly reduce the output power.
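As a rough illustration, assuming the gain settings given in the MAX98357A datasheet (GAIN tied to ground: 12dB; GAIN left unconnected: 9dB), the 3dB difference translates into output amplitude and power as follows. Here V and P with subscript GND are for GAIN tied to ground, and subscript open is for GAIN unconnected, for the same audio level and as long as the amplifier is not clipping:

```latex
\begin{aligned}
\Delta G &= 12\,\mathrm{dB} - 9\,\mathrm{dB} = 3\,\mathrm{dB} \\
\frac{V_{\mathrm{GND}}}{V_{\mathrm{open}}} &= 10^{\Delta G/20} = 10^{0.15} \approx 1.41 \\
\frac{P_{\mathrm{GND}}}{P_{\mathrm{open}}} &= 10^{\Delta G/10} = 10^{0.3} \approx 2.0
\end{aligned}
```

In other words, leaving GAIN unconnected roughly halves the power delivered for a given input, which is usually heard as a modest drop in loudness. Near full volume, the 5V supply limits the output anyway, so the real difference can be smaller.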

Like a USB keyboard, a PS2 keyboard needs only four connections to a microcontroller: 5V, ground, data, and clock (or IRQ). Since it is purely a data-in device here, input-only GPIO pins of the ESP32 such as G34 and G35 can be used for it.
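For readers curious about what travels over the data and clock lines: a PS2 keyboard generates the clock itself and sends each scan code as an 11-bit frame (a start bit of 0, eight data bits LSB first, an odd-parity bit, and a stop bit of 1). A keyboard library normally collects these bits in an interrupt on the clock pin. The sketch below covers only the decoding step; the function names are hypothetical, and it is written in plain C++ so that it can be tested off the board.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical decoder for one PS2 device-to-host frame. The ESP32 would
// build 'frame' by sampling DATA (G35) on each falling edge of CLK (G34):
// bit i of 'frame' holds the i-th bit received.
// Returns true and stores the key code in 'scancode' if the frame is valid.
bool decodePs2Frame(std::uint16_t frame, std::uint8_t &scancode) {
    const bool startBit = (frame & 0x1) != 0;                               // bit 0: must be 0
    const std::uint8_t data = static_cast<std::uint8_t>((frame >> 1) & 0xFF); // bits 1..8, LSB first
    const bool parityBit = ((frame >> 9) & 0x1) != 0;                       // bit 9: odd parity
    const bool stopBit = ((frame >> 10) & 0x1) != 0;                        // bit 10: must be 1

    if (startBit || !stopBit) return false;  // framing error

    // Odd parity: the data bits plus the parity bit must contain an odd number of 1s.
    int ones = parityBit ? 1 : 0;
    for (int i = 0; i < 8; ++i) ones += (data >> i) & 1;
    if (ones % 2 == 0) return false;         // parity error

    scancode = data;
    return true;
}

// Builds a valid frame for a given scan code (handy for testing the decoder).
std::uint16_t makePs2Frame(std::uint8_t scancode) {
    int ones = 0;
    for (int i = 0; i < 8; ++i) ones += (scancode >> i) & 1;
    const unsigned parityBit = (ones % 2 == 0) ? 1u : 0u;  // make the total count odd
    return static_cast<std::uint16_t>((scancode << 1) | (parityBit << 9) | (1u << 10));
}
```

For example, 0x1C is the scan code set 2 make code for the A key: decodePs2Frame(makePs2Frame(0x1C), code) returns true with code set to 0x1C, while the same frame with its parity bit flipped is rejected.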

GPIO pins G34 and G35 are input-only (data-in) pins; they cannot be used as outputs.


Never use them for the TFT, LEDs, relays, and so on, as these need output signals, unlike the keyboard pins, which only receive data from the keyboard. Of course, you can connect a PS2 keyboard to other GPIO pins as well.
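The input-only rule can also be checked by the compiler. A minimal sketch, assuming the original ESP32 chip (where GPIO34 to GPIO39 are input-only) and using hypothetical constant names for the pins listed in Tables 2 and 3:

```cpp
#include <cassert>

// On the original ESP32, GPIO34 to GPIO39 are input-only: they have no output
// driver (and no internal pull-up or pull-down resistors).
constexpr bool isInputOnlyPin(int gpio) {
    return gpio >= 34 && gpio <= 39;
}

// Hypothetical pin constants taken from Table 3 and Table 2.
constexpr int KBD_DATA_PIN = 35;  // PS2 DATA -> G35 (only read by the ESP32)
constexpr int KBD_CLK_PIN = 34;   // PS2 CLK  -> G34 (only read by the ESP32)
constexpr int TFT_PIN_G13 = 13;   // TFT line on G13 (driven by the ESP32)

// Compile-time checks: the keyboard sits on input-only pins, which is fine
// because it only sends data in; a TFT line must never be on such a pin.
static_assert(isInputOnlyPin(KBD_DATA_PIN) && isInputOnlyPin(KBD_CLK_PIN),
              "keyboard pins are expected on input-only GPIOs");
static_assert(!isInputOnlyPin(TFT_PIN_G13),
              "a TFT line cannot sit on an input-only GPIO");
```

If a TFT or LED pin were later moved to, say, G34 by mistake, a check like the second static_assert would stop the build instead of leaving a silently dead output.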

Fig. 3 shows the pin details of the PS2 keyboard and mouse cable, and Fig. 4 shows the I2S voice module.

Fig. 3: PS2 keyboard and mouse cable pins
Fig. 4: I2S voice module

Making ChatGPT Talk

Google TTS (text-to-speech), used with the ESP32 board, makes getting ChatGPT to talk simple.

GTTS is used here because the few free, hardware-driven sound libraries tried in earlier designs, such as ESP8266SAM.h, AudioOutputI2S.h, and AudioGeneratorRTTTL.h, fall short. Although they work on the ESP32, the speech quality of ESP8266SAM.h is disappointing.

Extended speech is hard to follow, as it sounds as if it is coming through a long pipe. And while AudioGeneratorRTTTL.h produces beautiful ringtones, it cannot speak.

Therefore, Google TTS was chosen for this system.

Fig. 5: OpenAI API creation

Only three GPIO pins and an internet connection are needed to make the I2S amplifier work with Google TTS. Four header files (available on GitHub) are also needed to drive the amplifier; they have been compiled into a zip file for ease of use with this terminal.

Google TTS can speak one million characters per month free of charge on your account. Speaking 'EFY' costs three characters, but speaking 'Electronics For You' costs 17 (19 if the spaces are counted)! Also, Google will speak only 200 characters at a time; beyond 200 characters, a paid Google account is required.
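The character budget can be tracked in a few lines. A minimal sketch, assuming every byte sent (spaces included) counts against the 1,000,000-character monthly allowance and the 200-character per-request limit quoted above; the class name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Limits as quoted in the text above.
constexpr std::size_t kFreeCharsPerMonth = 1000000;
constexpr std::size_t kMaxCharsPerRequest = 200;

// Hypothetical usage tracker. It counts bytes, which equals characters for
// plain ASCII text, and it counts spaces too.
class TtsQuota {
public:
    // Charges 'text' against the allowance. Returns false, and charges nothing,
    // if the text is over the per-request limit or would exceed the monthly total.
    bool charge(const std::string &text) {
        const std::size_t n = text.size();
        if (n > kMaxCharsPerRequest || used_ + n > kFreeCharsPerMonth) return false;
        used_ += n;
        return true;
    }

    std::size_t used() const { return used_; }
    std::size_t remaining() const { return kFreeCharsPerMonth - used_; }

    // To be called at the start of each billing month.
    void reset() { used_ = 0; }

private:
    std::size_t used_ = 0;
};
```

At the maximum of 200 characters per request, the free allowance corresponds to 1,000,000 / 200 = 5,000 full-length spoken answers a month.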

Simply type a question, ask for an explanation, or request a code snippet on the PS/2 keyboard connected to the ESP32; the ESP32 writes the problem statement on the TFT screen as you type.
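Typing the question amounts to a small line editor: printable keys are appended, Backspace deletes, and Enter hands the finished question over. The sketch below assumes the keyboard library already delivers decoded ASCII characters, with '\r' or '\n' for Enter and '\b' or 127 for Backspace; the class is hypothetical and not taken from the author's code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical line editor for the question typed on the PS2 keyboard.
class QuestionBuffer {
public:
    explicit QuestionBuffer(std::size_t maxLen = 256) : maxLen_(maxLen) {}

    // Feeds one decoded character. Returns true when Enter is pressed
    // and a non-empty question is ready.
    bool feed(char c) {
        if (c == '\r' || c == '\n') return !text_.empty();
        if (c == '\b' || c == 127) {  // Backspace or Delete
            if (!text_.empty()) text_.pop_back();
            return false;
        }
        // Keep printable ASCII only, up to the length limit.
        if (c >= 32 && c < 127 && text_.size() < maxLen_) text_ += c;
        return false;
    }

    // Hands over the finished question and clears the buffer for the next one.
    std::string take() {
        std::string out;
        out.swap(text_);
        return out;
    }

    // The text typed so far, i.e. what the TFT should show.
    const std::string &text() const { return text_; }

private:
    std::string text_;
    std::size_t maxLen_;
};
```

On the ESP32, text() would be redrawn on the TFT after each key, and take() would be called once feed() returns true.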

The ESP32 then sends the question to ChatGPT and displays the response on the same TFT. At the same time, the loudspeaker connected to the I2S amplifier (MAX98357A) speaks out the text visible on the screen.
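The article does not show how the question is packaged for ChatGPT. As a sketch, assuming OpenAI's Chat Completions endpoint (POST https://api.openai.com/v1/chat/completions), the typed question goes into a small JSON body after any quotes, backslashes, and control characters are escaped. The model name and function names are assumptions; the author's code may use another endpoint or a library.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Escapes text for use inside a JSON string literal.
std::string jsonEscape(const std::string &in) {
    std::string out;
    for (unsigned char c : in) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:
                if (c < 0x20) {  // other control characters become \u00XX
                    char buf[7];
                    std::snprintf(buf, sizeof buf, "\\u%04x", static_cast<unsigned>(c));
                    out += buf;
                } else {
                    out += static_cast<char>(c);
                }
        }
    }
    return out;
}

// Builds the request body for a single user question.
// The default model name is an assumption; use one your account supports.
std::string buildChatRequest(const std::string &question,
                             const std::string &model = "gpt-3.5-turbo") {
    return "{\"model\":\"" + jsonEscape(model) +
           "\",\"messages\":[{\"role\":\"user\",\"content\":\"" +
           jsonEscape(question) + "\"}]}";
}
```

In a Chat Completions response, the reply text to show on the TFT (and pass to TTS) is in the choices[0].message.content field.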

Fig. 6: Voice-Enabled ChatGPT Terminal code

No multimedia computer is needed for this work, though a Wi-Fi internet connection and a secret key for the ChatGPT API are required. This secret key is the single point of access to the API; no other login or password is required.

You might wonder why an old PS2 keyboard is used instead of a USB keyboard. An attempt was made to use a sleek, small USB keyboard, but it could not communicate with the ESP32. The same was true of even the sleekest Bluetooth keyboard.

The software is developed in the Arduino IDE. In the code, set the OpenAI API key and the Wi-Fi SSID and password. After configuring the code, upload it to the ESP32 by selecting the correct COM port and ESP32 as the board. Fig. 6 shows a snippet of the source code.
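The settings to fill in can be sketched as follows, with hypothetical constant names and placeholder values. OpenAI's HTTP API expects the secret key in an `Authorization: Bearer <key>` header, which is the only credential a request needs.

```cpp
#include <cassert>
#include <string>

// Placeholder settings (hypothetical names): replace the values with your own
// before uploading, and keep the real key out of any code you share.
const char *const WIFI_SSID = "your-wifi-ssid";
const char *const WIFI_PASSWORD = "your-wifi-password";
const char *const OPENAI_API_KEY = "your-secret-key";

// Builds the value of the Authorization header sent with every API request.
std::string authorizationHeader(const std::string &apiKey) {
    return "Bearer " + apiKey;
}
```

On the ESP32, the Wi-Fi values would be passed to WiFi.begin(WIFI_SSID, WIFI_PASSWORD), and authorizationHeader(OPENAI_API_KEY) would be added as the Authorization header of the HTTPS request.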

Construction and Testing

You may assemble the circuit on a general-purpose PCB, as in the author's prototype. First, upload the terminal's source code to the ESP32 board. Then refer to Tables 2 through 4 while assembling. After powering on, wait a little while for the terminal to start up.

The delays used inside the loops are quite specific. You may adjust them, but it is best to start with the provided values and change them as needed once you understand how the terminal responds.
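If a fixed delay() ever gets in the way while tuning, a common Arduino alternative (not necessarily what the author's code does) is to compare millis() timestamps so that the loop keeps running between checks. The helper below is hypothetical and takes the timestamps as arguments so that it can be tested off the board.

```cpp
#include <cassert>
#include <cstdint>

// Returns true once 'interval' ms have passed since 'start'.
// Uses unsigned 32-bit arithmetic, like Arduino's millis(), so the
// subtraction stays correct when the counter wraps around (about 49.7 days).
bool intervalElapsed(std::uint32_t now, std::uint32_t start, std::uint32_t interval) {
    return static_cast<std::uint32_t>(now - start) >= interval;
}
```

In loop(), for example: if (intervalElapsed(millis(), lastPoll, 500)) { lastPoll = millis(); /* poll or redraw */ }, where lastPoll is a global unsigned long.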

Initial questions such as 'Who are you?' were answered meticulously by ChatGPT, which produced a self-introduction on the screen that the speaker delivered nicely.

Subsequently, more advanced questions were asked, such as 'talk about EFY' and 'the distance between Earth and the Sun.' Each time, ChatGPT understood the request clearly and provided meticulous answers, and the speaker delivered the voice output clearly and loudly.

Fig. 7: Voice-Enabled ChatGPT Terminal using ESP32 and Google TTS

The most advanced level of questioning involved tasks like writing five sentences about EFY, India, NTPC, etc. Throughout all the tests, ChatGPT performed exceptionally well.

However, as these answers exceeded 200 characters, Google TTS refused to speak them. To address this, the answer string was trimmed to 200 characters, so the full answer appears on the screen while only the first 200 characters are spoken. The author's final prototype, including the keyboard used in testing, is shown in Fig. 7.
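The trimming step can be sketched as follows. Beyond a plain cut at 200 characters, this version (an assumption, not the author's code) steps back so that a multi-byte UTF-8 character is never split, and ends at the last space when that still keeps more than half of the allowed length, so the speech does not stop mid-word. The limit is counted in bytes, which equals characters for plain English text; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the part of 'answer' to send to TTS (at most 'maxBytes' bytes).
// The full answer can still be shown on the TFT.
std::string trimForTts(const std::string &answer, std::size_t maxBytes = 200) {
    assert(maxBytes > 0);
    if (answer.size() <= maxBytes) return answer;

    std::size_t cut = maxBytes;
    // Step back while the byte at 'cut' is a UTF-8 continuation byte (10xxxxxx),
    // so the kept part ends on a complete character.
    while (cut > 0 && (static_cast<unsigned char>(answer[cut]) & 0xC0) == 0x80) --cut;

    // Prefer to end at the last space, unless that would discard over half the text.
    const std::size_t space = answer.rfind(' ', cut);
    if (space != std::string::npos && space > cut / 2) cut = space;

    return answer.substr(0, cut);
}
```

Only the TTS call would receive trimForTts(answer); the TFT still displays the complete answer.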

Aftermath: The next modification of this terminal will make it fully voice-interactive, so that it listens to questions and responds like an obedient robot! This development is for EFY readers.

Somnath Bera is an electronics enthusiast. He has contributed many articles across the globe as a freelancer.

