Speech Synthesizer Development System

This system was designed to facilitate the development of a low-cost speech module using a PIC and some EEPROM memory. This has already been done (see PICTalker) using 64k of memory. So far, this device works with only 11k of memory by using the techniques discussed below.

The development board is shown here along with the programming device. To allow rapid loading of sound files, it uses a 23K256 serial RAM instead of EEPROM. A real working unit could use a serial EEPROM (25C320) with the same pinout, along with some software changes to read it.

The sound file is packed with 34 speech sounds (phonemes) so far. The phonemes were split into two groups: those that had to be recorded as complete sound snippets, and those that could be stored as a single wave cycle.

Vowel sounds, along with many other sounds like 'R', 'W', or 'L', are stored by taking a typical sample of one cycle of the audio. At an 8 kHz sample rate, this comes to about 80 bytes each.
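
The 80-byte figure follows from simple arithmetic. Here is a rough sketch in Python (used only as a calculator, not as PIC code), assuming one byte per sample and a voice pitch near 100 Hz. Neither is stated above; both are inferred from the 80-byte figure, and the 150 ms vowel length is a made-up example.

```python
SAMPLE_RATE_HZ = 8000      # from the text
PITCH_HZ = 100             # assumed fundamental pitch of the voice
BYTES_PER_SAMPLE = 1       # assumed 8-bit samples


def cycle_bytes(sample_rate_hz, pitch_hz, bytes_per_sample=1):
    """Bytes needed to store one cycle of a periodic sound."""
    samples_per_cycle = sample_rate_hz // pitch_hz
    return samples_per_cycle * bytes_per_sample


def full_recording_bytes(duration_ms, sample_rate_hz, bytes_per_sample=1):
    """Bytes needed to store the same sound recorded in its entirety."""
    return sample_rate_hz * duration_ms // 1000 * bytes_per_sample


one_cycle = cycle_bytes(SAMPLE_RATE_HZ, PITCH_HZ, BYTES_PER_SAMPLE)
whole = full_recording_bytes(150, SAMPLE_RATE_HZ, BYTES_PER_SAMPLE)
print(one_cycle)             # 80
print(whole)                 # 1200
print(whole // one_cycle)    # 15 (cycles needed to fill 150 ms)
```

Under these assumptions, a vowel held for 150 ms costs 80 bytes instead of 1200.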

Other sounds like 'S', 'TH', or 'P' need to be stored in their entirety and take from 200 to 1200 bytes each. The software in the PIC has a table translating individual letters to stored sounds. The table also specifies the repeat count for each sample. If the sample does not repeat, the repeat count is set to one (see software listing).
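
The table lookup described above might look something like this. It is only a Python model of the idea, not the PIC listing: the table layout, the offsets, the lengths, and the repeat counts are all invented for illustration.

```python
# Hypothetical table: letter -> (offset into sound memory, length, repeat count).
# Every number here is made up for illustration.
PHONEME_TABLE = {
    "A": (0, 80, 15),      # single cycle, replayed 15 times
    "S": (80, 600, 1),     # full recording, played once
}


def render(text, memory, table=PHONEME_TABLE):
    """Return the sample stream for text, skipping letters with no entry."""
    out = bytearray()
    for letter in text.upper():
        entry = table.get(letter)
        if entry is None:
            continue
        offset, length, repeats = entry
        out += memory[offset:offset + length] * repeats
    return bytes(out)


memory = bytes(680)                 # stand-in for the real sound file
stream = render("as", memory)
print(len(stream))                  # 80*15 + 600 = 1800
```

A repeat count of one makes the full-recording sounds fall out of the same loop with no special case.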

A serial interface was created using the 'steal the negative voltage from the other guy' method for generating usable RS-232 voltage levels. The UART function is done entirely in software. It runs fine at 56k baud talking to the PC.
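
For anyone who has not written a software UART, this is roughly what has to be timed out on the pin for each byte. The sketch assumes 8N1 framing and takes "56k" to mean the standard 57600 baud rate; neither is confirmed above. It is a Python model of the bit pattern, not the PIC code.

```python
def frame_8n1(byte):
    """Return the line levels for one byte: start bit, 8 data bits LSB first, stop bit."""
    bits = [0]                                    # start bit (line low)
    bits += [(byte >> i) & 1 for i in range(8)]   # data, least significant bit first
    bits.append(1)                                # stop bit (line high)
    return bits


def bit_time_us(baud):
    """Duration of one bit in microseconds."""
    return 1_000_000 / baud


print(frame_8n1(ord("!")))            # [0, 1, 0, 0, 0, 0, 1, 0, 0, 1]
print(round(bit_time_us(57600), 2))   # 17.36
```

At that rate the PIC has only about 17 microseconds per bit, so the timing loop has to be tight.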

The SPI interface to the RAM was also done in software. Some resistors and a red LED let the 23K256 run on about 4 volts - well within design limits. The resistors allow the PIC to feed 5 volt signals without damage to the RAM.
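
A software SPI read boils down to wiggling the clock and data pins in the right order. The sketch below models the pin states in Python rather than showing the actual PIC code. The READ instruction (0x03) followed by a 16-bit address is from the 23K256 data sheet; the use of SPI mode 0 and the function names are assumptions.

```python
READ = 0x03   # 23K256 read instruction


def shift_out(byte):
    """Yield (sck, mosi) pin states that clock one byte out MSB first (SPI mode 0)."""
    for i in range(7, -1, -1):
        bit = (byte >> i) & 1
        yield (0, bit)    # set the data line while the clock is low
        yield (1, bit)    # rising edge: the RAM samples the bit


def read_command(address):
    """Pin states for the READ instruction followed by a 16-bit address."""
    states = []
    for byte in (READ, (address >> 8) & 0xFF, address & 0xFF):
        states.extend(shift_out(byte))
    return states


states = read_command(0x0150)
print(len(states))   # 3 bytes * 8 bits * 2 states = 48
```

After these 24 clocks the RAM starts shifting data out, one byte per 8 further clocks, for as long as chip select stays low.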

Audio output is generated using a 10k digital potentiometer (MCP41010) as a cheap and easy-to-interface D/A converter. Its output is padded down with some resistors to feed the LM386 power amplifier. If line level audio is desired, leave out the LM386 circuit except for the output capacitor, and run a jumper to pin 6 on the MCP41010.
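
Using the pot as a D/A converter means sending it one 16-bit word per audio sample. Here is a sketch of how that word is built, again as a Python model rather than the PIC code. The 0x11 command byte (write data to potentiometer 0) is from the MCP41010 data sheet; the wiper fraction ignores wiper resistance and assumes the pot is wired as a simple voltage divider, which is not confirmed above.

```python
WRITE_POT0 = 0x11   # MCP41010 command byte: write data to potentiometer 0


def dac_word(sample):
    """Return the 16-bit word (command byte, then data byte) for one 8-bit sample."""
    if not 0 <= sample <= 255:
        raise ValueError("sample must fit in 8 bits")
    return (WRITE_POT0 << 8) | sample


def wiper_fraction(sample):
    """Approximate fraction of the end-to-end voltage seen at the wiper."""
    return sample / 256


print(hex(dac_word(0x80)))     # 0x1180
print(wiper_fraction(0x80))    # 0.5
```

At 8 kHz that is one 16-bit transfer every 125 microseconds.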

The existing software communicates in full duplex with a terminal program, with input buffering and backspacing supported at the command prompt. When the download command is executed, it waits for single binary data bytes and echoes a '!' after each one. The terminal program waits for this before sending each successive byte. When the data stream halts for about 1 second, the PIC times out and reverts to the command prompt. Load times run about 20 seconds for 11k of data.
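
The PC side of that handshake is simple enough to sketch. This is not the terminal program used here, just a Python model of the same send-one-wait-for-'!' loop. The function names and the FakePic stand-in are hypothetical; a real sender would pass in the write and read calls of whatever serial library it uses.

```python
from collections import deque


def send_with_ack(data, write, read, ack=b"!"):
    """Send data one byte at a time, waiting for ack after each byte.

    write(b) sends bytes; read(n) returns up to n bytes (empty on timeout).
    Returns the number of bytes acknowledged.
    """
    sent = 0
    for value in data:
        write(bytes([value]))
        if read(1) != ack:
            break                   # no echo: stop rather than overrun the PIC
        sent += 1
    return sent


class FakePic:
    """Stand-in for the PIC: stores each byte and queues a '!' reply."""

    def __init__(self):
        self.stored = bytearray()
        self.pending = deque()

    def write(self, chunk):
        self.stored += chunk
        self.pending.extend(b"!" * len(chunk))

    def read(self, n):
        count = min(n, len(self.pending))
        return bytes(self.pending.popleft() for _ in range(count))


pic = FakePic()
print(send_with_ack(b"\x01\x02\x03", pic.write, pic.read))   # 3
```

The per-byte round trip is what keeps the load slow: 11k in about 20 seconds is somewhere around 550 bytes per second, far below what the baud rate alone would allow.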

The sound file itself is created by using the FTALK program from the DOS (you still remember DOS?) prompt. Here is a ZIP file with all the sound sample files. The sample files are all in SND format. This format contains only the sound sample with no header information.

Here are the HEX file, Board Layout, and Schematic.

The sample output of this device saying "I am a really cheap computer" demonstrates how mechanical the voice is (like talking to a Cylon - from the original series). I am still tweaking the phonemes for more intelligibility. This type of device will not produce natural sounding speech, but it produces very interesting results for the small memory size.