Audio compression algorithm
Mike Keitz
On Mon, 9 Mar 1998 08:48:03 +0100 Morgan Olsson <INAME.COM> writes: mrt
I'd like to see that, since I've just independently developed something
very similar. The application is to delay a voice signal by up to a
second (using a single 256K x 1 DRAM chip) and also play back "canned"
messages (from the serial Flash ROM that I don't have yet). This should
give the bucket-brigade devices and ISD voicestore chips usually used for
such applications a run for their money.
I'm using CVSD (Continuously Variable-Slope Delta) encoding. This is not
a very high-performance (low bit rate) method, but it is very simple,
requiring no ADCs or DACs and practically no digital processing. I
suppose it could also be called "one-bit sigma-delta".
The present version operates at 54 Kbps and delivers what I would
consider "AM radio quality" audio -- very clear with only slight noise
and limited bandwidth. I've also tried at 32 Kbps and got a typical
telephone quality -- quite noticeably noisy but much better than the
typical digital answering machine. Rates of about 16 Kbps should be
suitable for an answering machine -- the message could be understood, but
would sound heavily "processed." 16 Kbps is about 1 Mbit/minute.
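The storage figures follow directly from the bit rates. A quick check,
assuming "Kbps" means 1000 bits per second (the helper names are made up
for illustration):

```python
# Storage arithmetic for a 1-bit-per-sample stream.
# Assumption: "Kbps" means 1000 bits per second.

def bits_per_minute(bit_rate_bps):
    """Bits consumed by one minute of audio at the given bit rate."""
    return bit_rate_bps * 60

def bits_for_delay(bit_rate_bps, delay_seconds):
    """Bits of memory needed to hold delay_seconds of audio."""
    return bit_rate_bps * delay_seconds

for rate in (16_000, 32_000, 54_000):
    print(f"{rate // 1000} Kbps: {bits_per_minute(rate):,} bits/minute, "
          f"{bits_for_delay(rate, 1):,} bits for a 1 s delay")
```

At 16 Kbps this works out to 960,000 bits per minute, which is the
"about 1 Mbit/minute" figure.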
Most that I know about CVSD I learned from the MC3417/MC3418 description
in an old Motorola book. I don't know if Motorola makes this chip any
In a CVSD decoder, a 1-bit output drives an analog integrator either up
or down each sample period. The size of the steps is dynamically changed
to match the overall envelope of the signal. The step size is controlled
by an analog filter which slowly (20 ms) decays unless it is recharged by
a sequence of either 4 ones or 4 zeros in the data stream. This sequence
indicates that the signal amplitude is higher than the integrator can
follow with the present step size. The "coincidence" pulse increases the
step size. The output of the decoder needs to be followed with a fairly
good LPF to get rid of high-frequency artifacts, which sound more like
white noise than the aliasing "squeaks" usually heard with conventional
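The decoder behavior described above can be modeled in a few lines. This
is only a behavioral sketch, not the analog circuit: the integrator is
ideal, the output LPF is left out, the 20 ms figure is treated as a decay
time constant, and step_min, step_max and attack are invented values
chosen for illustration.

```python
import math

class CvsdDecoder:
    """Behavioral model of a CVSD decoder (not the analog circuit)."""

    def __init__(self, bit_rate=32_000, decay_s=0.020,
                 step_min=0.001, step_max=0.1, attack=0.05):
        # Per-sample decay factor for a filter with time constant decay_s.
        self.decay = math.exp(-1.0 / (bit_rate * decay_s))
        self.step_min = step_min   # hypothetical idle step size
        self.step_max = step_max   # hypothetical largest step size
        self.attack = attack       # hypothetical recharge per coincidence
        self.step = step_min
        self.level = 0.0           # integrator output
        self.last_bit = None
        self.run = 0               # length of the current run of equal bits

    def decode_bit(self, bit):
        self.run = self.run + 1 if bit == self.last_bit else 1
        self.last_bit = bit
        if self.run >= 4:
            # Coincidence: the last 4 bits are equal, so recharge the
            # step-size filter.
            self.step += self.attack * (self.step_max - self.step)
        else:
            # Otherwise the step size decays toward its idle value.
            self.step = (self.step_min
                         + (self.step - self.step_min) * self.decay)
        # The bit drives the integrator up or down by the current step.
        self.level += self.step if bit else -self.step
        return self.level

decoder = CvsdDecoder()
ramp = [decoder.decode_bit(1) for _ in range(8)]
print(ramp)
```

Feeding a long run of ones makes the output climb faster and faster,
while an alternating 1,0,1,0 stream leaves the step at its idle size.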
The encoder is simply a decoder with its output constantly compared to
the input signal (an "analysis by synthesis" method). If the difference
is positive, the next bit is a 1; if negative, the next bit is a 0.
The comparator output is sampled every sample period and immediately
applied to the encoder's integrator (and coincidence detector), as well
as being stored or transmitted. Since the encoder is actually just a
couple more parts connected to a decoder, most of the hardware can be
shared in a half-duplex system like an answering machine. My audio delay
unit needs two independent circuits, of course.
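The encoder loop can be sketched the same way. To keep the example short,
a fixed-step integrator (FixedStepDecoder, a made-up stand-in) takes the
place of the variable-slope decoder; any object with a decode_bit method
could be dropped in instead. Treating a zero difference as a 0 bit is
also an assumption.

```python
class FixedStepDecoder:
    """Stand-in decoder: ideal integrator with a constant step size."""

    def __init__(self, step=0.125):
        self.step = step
        self.level = 0.0

    def decode_bit(self, bit):
        self.level += self.step if bit else -self.step
        return self.level

def encode(samples, local_decoder):
    """Compare each input sample with the local decoder's output."""
    bits = []
    estimate = 0.0
    for x in samples:
        bit = 1 if x > estimate else 0            # comparator
        bits.append(bit)                          # stored or transmitted
        estimate = local_decoder.decode_bit(bit)  # fed back immediately
    return bits

bits = encode([0.3] * 8, FixedStepDecoder())

# Playback uses a second, identical decoder fed with the stored bits.
playback = FixedStepDecoder()
reconstructed = [playback.decode_bit(b) for b in bits]
print(bits)
print(reconstructed)
```

With a constant 0.3 input the bits come out as 1,1,1,0,1,0,1,0: the
estimate climbs to the input level and then hunts around it by one step.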
I implemented the CVSD stages except for the coincidence logic in analog
circuitry external to the PIC. I used a CD4069UB biased into linear mode
for the two slope-voltage generators. These generate two complementary
voltages that idle at mid-supply. Pulses from the coincidence output
cause one voltage to increase and the other to decrease. The two
voltages go to a 4053 analog switch that selects the positive one if the
data bit is 1, and the negative one if the data bit is 0. The output of
the switch feeds the integrator, which is part of a quad op-amp. The
other op-amps are used as LPFs. The encoder's comparator is implemented
by summing the encoder integrator with the input signal and applying it
to another 4069 inverter, then to a PIC pin. The threshold of the PIC
pin sets the compare level. Do not use the RA4 pin, as it has a
Schmitt-trigger input, which will not work as well.
Inside the PIC, a single loop running at the sample rate does everything.
The comparator data is sampled and applied to the encoder slope switch
as well as being shifted into RAM to check if the last 4 bits were either
all 1 or all 0. If so, the coincidence output pin is enabled (switched
out of tri-state). The encoder bits are also sent to the DRAM.
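The coincidence check amounts to a 4-bit shift register. A sketch of that
test follows, written in Python rather than PIC code; the function names
and the starting history value (standing for prior bits 1,0,1,0) are
assumptions for illustration.

```python
def update_history(history, bit):
    """Shift one bit into a 4-bit history and test for coincidence."""
    history = ((history << 1) | (bit & 1)) & 0x0F
    coincidence = history in (0x0, 0xF)   # last 4 bits all 0 or all 1
    return history, coincidence

def coincidence_flags(bits, history=0b1010):
    """Return one coincidence flag per input bit."""
    flags = []
    for bit in bits:
        history, hit = update_history(history, bit)
        flags.append(hit)
    return flags

print(coincidence_flags([1, 1, 1, 1, 1, 0, 0, 0, 0, 1]))
```

The flag goes true on the fourth equal bit and stays true for as long as
the run continues.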
I used a 256K DRAM even though less than 64K of capacity is needed
because 64K chips do not support CAS-before-RAS cycles. In fact, some
256K chips don't either. I had a WE chip and a Toshiba chip that didn't work.
Fujitsu and Hitachi chips did work. The CAS-before-RAS refresh is key to
my DRAM interface because I can use the internal refresh counter as part
of the address. The other part of the address comes from a 4040 counter
chip. It is connected directly to the A inputs on the DRAM and supplies
only the column address. This setup gives 128K of sequentially-accessed
RAM using 7 PIC pins. Only 128K is useful because the internal refresh
counter is only 8 bits, so half of the rows in the chip aren't used.
There may be a way around this but I don't need even that much space.
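The 128K figure can be checked from the address widths. This sketch
assumes the usual 9 row + 9 column multiplexed addressing of a 256K x 1
part; the constant and function names are made up for illustration.

```python
CHIP_ROW_BITS = 9          # 256K x 1 = 2**18 cells, assumed 9 row bits
CHIP_COL_BITS = 9          # and 9 column bits
REFRESH_COUNTER_BITS = 8   # internal refresh counter width, per the text

def cells(row_bits, col_bits):
    """Number of 1-bit cells reachable with the given address widths."""
    return (1 << row_bits) * (1 << col_bits)

total_bits = cells(CHIP_ROW_BITS, CHIP_COL_BITS)
usable_bits = cells(REFRESH_COUNTER_BITS, CHIP_COL_BITS)

print(f"chip: {total_bits} bits, usable: {usable_bits} bits")
print(f"delay at 54 Kbps: {usable_bits / 54_000:.2f} s")
```

With only 8 row bits available, 256 rows x 512 columns gives 131,072
bits, half of the chip.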
I hope this description is reasonably complete, but the project is rather
complex so if anyone is interested I can clarify more.
>I believe the Hi8 video recorders use 8-bit logarithmically compressed
>data. i.e. one bit change near zero level is much less than one bit
>near top values.
This sounds like "u-law" or "a-law" depending on the exact encoding used.
Many "codec" chips exist to make a nonlinear A-D conversion internally,
supplying an 8-bit output every sample period (64 Kbps if the sample rate
is 8 KHz) that can be stored without further processing. At 8 K
samples/sec, the quality is a good telephone quality. It is used in most
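The effect of logarithmic coding on step size can be seen with the
continuous u-law curve (mu = 255). This is a sketch of the companding
curve only, not the bit-exact segmented encoding used by codec chips;
the 127-level magnitude scale and the function names are assumptions.

```python
import math

MU = 255.0

def compress(x):
    """Map a linear sample x in [-1, 1] onto the u-law curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def expand(y):
    """Inverse of compress: map a companded value back to linear."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def step_size(code, levels=127):
    """Linear amplitude change for a one-code step at a given magnitude."""
    return expand((code + 1) / levels) - expand(code / levels)

print(f"step near zero:       {step_size(0):.6f}")
print(f"step near full scale: {step_size(126):.6f}")
print(f"bit rate: {8 * 8000} bps")   # 8 bits per sample at 8 K samples/sec
```

One code step near full scale covers a far larger amplitude change than
one code step near zero, which is the behavior described in the quoted
text.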
See also: www.piclist.com/techref/io/audio.htm?key=audio