Searching \ for 'Vertical Adders' in subject line. ()
Make payments with PayPal - it's fast, free and secure! Help us get a faster server
FAQ page: www.piclist.com/techref/index.htm?key=vertical+adders
Search entire site for: 'Vertical Adders'.

Truncated match.
PICList Thread
'Vertical Adders'
1998\12\02@105243 by Scott Dattalo

face
flavicon
face
I've got bad news, good news, and some okay news.

bad news:

John's wonderful 3-instruction vertical adder does not work for the
'general' case.

good news:

It can be made to work as a phase accumulator.

okay news:

I found a 7-instruction version that works for the 'general' case. It uses
a temporary register (regardless of the number of stages), but at least it
saves a cycle over what I posted last week.


First, here's why the three cycle version doesn't work.

INPUT  | XXA   | TRUE
-------+-------+------
K R W  | R+ W+ | R+ W+
-------+-------+------
0 0 0  | 0  0  | 0  0
0 0 1  | 1  1  | 1  0
0 1 0  | 1  0  | 1  0
0 1 1  | 0  0  | 0  1
1 0 0  | 1  1  | 1  0
1 0 1  | 0  0  | 0  1
1 1 0  | 0  0  | 0  1
1 1 1  | 1  0  | 1  1

The truth table lists all of the possible combinations of the input bits:
K - 'the constant', R - 'the register', and W - 'the carry'. The goal is
to compute:

R+ = (R + K + W) & 1
W+ = (R + K + W) >> 1

Or, the next state of bit R (R+) is the sum of the three bits R, K, and W.
The ' & 1 ' keeps only the least significant bit. The most signifcant bit
is copied to the output carry (W+) and propogated to the next stage.

The first column of R+ W+ outputs is from John's algorithm which I've
abbreviated XXA for XOR, XOR, & AND, the instructions from which the
algorithm is composed. The second column of R+ W+ outputs is for the so
called 'true' value or the result obtained when you perform the addition.
Just to be clear, look at the last row in the truth table. All three
inputs are '1'. So 1 + 1 + 1 produces 3; or the result is 1 and there's a
carry. Thus for the true value outputs, you see the 1 for R+ and 1 for W+.

Now, if you examine the XXA outputs you'll see that they exactly match the
true value outputs for R+. Wonderful! Unfortunately the carry only matches
at two positions. Bummer. However, if you imagine that the carry is
inverted then the XXA matches for 6 of the input combinations (that is,
invert the W+ column for the XXA outputs). The inverted carry can also be
re-interpreted as the result of subtraction. In other words (and I believe
Dmitry made this observation), John's XXA algorithm is closer to a
vertical subtractor than a vertical adder.

To illustrate this point, I wrote a simple C program that produces the
sequence of numbers by continuously adding a constant to a register (using
the XXA algorithm for addition, of course). I arbitrarily chose 8 stages
which would correspond to 8 vertical bits. However, instead of displaying
the results vertically, I chose to display them horizontally.  Here are
the counting sequences for a selection of various k's (in each case r and
w are initially zero).


Sum sequence for k=0x1
 0xff   0xff   0xfe   0xfd   0xfc   0xfb   0xfa   0xf9
 0xf8   0xf7   0xf6   0xf5   0xf4   0xf3   0xf2   0xf1
 0xf0   0xef   0xee   0xed   0xec   0xeb   0xea   0xe9
 0xe8   0xe7   0xe6   0xe5   0xe4   0xe3   0xe2   0xe1
 0xe0   0xdf   0xde   0xdd   0xdc   0xdb   0xda   0xd9
 0xd8   0xd7   0xd6   0xd5   0xd4   0xd3   0xd2   0xd1
 0xd0   0xcf   0xce   0xcd   0xcc   0xcb   0xca   0xc9

Sum sequence for k=0x2
 0xfe   0xff   0xfd   0xfb   0xf9   0xf7   0xf5   0xf3
 0xf1   0xef   0xed   0xeb   0xe9   0xe7   0xe5   0xe3
 0xe1   0xdf   0xdd   0xdb   0xd9   0xd7   0xd5   0xd3
 0xd1   0xcf   0xcd   0xcb   0xc9   0xc7   0xc5   0xc3
 0xc1   0xbf   0xbd   0xbb   0xb9   0xb7   0xb5   0xb3

Sum sequence for k=0x3
 0x01   0xfe   0xfc   0xfd   0xfa   0xfb   0xf8   0xf9
 0xf6   0xf7   0xf4   0xf5   0xf2   0xf3   0xf0   0xf1
 0xee   0xef   0xec   0xed   0xea   0xeb   0xe8   0xe9
 0xe6   0xe7   0xe4   0xe5   0xe2   0xe3   0xe0   0xe1
 0xde   0xdf   0xdc   0xdd   0xda   0xdb   0xd8   0xd9

Sum sequence for k=0x4
 0xfc   0xff   0xfb   0xf7   0xf3   0xef   0xeb   0xe7
 0xe3   0xdf   0xdb   0xd7   0xd3   0xcf   0xcb   0xc7
 0xc3   0xbf   0xbb   0xb7   0xb3   0xaf   0xab   0xa7
 0xa3   0x9f   0x9b   0x97   0x93   0x8f   0x8b   0x87
 0x83   0x7f   0x7b   0x77   0x73   0x6f   0x6b   0x67
 0x63   0x5f   0x5b   0x57   0x53   0x4f   0x4b   0x47

Sum sequence for k=0x5
 0x 3   0xfe   0xfa   0xf5   0xf0   0xf3   0xee   0xe9
 0xe4   0xe7   0xe2   0xdd   0xd8   0xdb   0xd6   0xd1
 0xcc   0xcf   0xca   0xc5   0xc0   0xc3   0xbe   0xb9


Sum sequence for k=0x6
 0x 2   0xfc   0xf9   0xfb   0xf5   0xf7   0xf1   0xf3
 0xed   0xef   0xe9   0xeb   0xe5   0xe7   0xe1   0xe3
 0xdd   0xdf   0xd9   0xdb   0xd5   0xd7   0xd1   0xd3

Sum sequence for k=0x7
 0xfd   0xff   0xf8   0xf5   0xf6   0xf3   0xec   0xe9
 0xea   0xe7   0xe0   0xdd   0xde   0xdb   0xd4   0xd1
 0xd2   0xcf   0xc8   0xc5   0xc6   0xc3   0xbc   0xb9


Sum sequence for k=0x8
 0xf8   0xff   0xf7   0xef   0xe7   0xdf   0xd7   0xcf
 0xc7   0xbf   0xb7   0xaf   0xa7   0x9f   0x97   0x8f
 0x87   0x7f   0x77   0x6f   0x67   0x5f   0x57   0x4f
 0x47   0x3f   0x37   0x2f   0x27   0x1f   0x17   0x f
 0x 7   0xff   0xf6   0xee   0xe6   0xde   0xd6   0xce
 0xc6   0xbe   0xb6   0xae   0xa6   0x9e   0x96   0x8e


Sum sequence for k=0x9
 0x 7   0xfe   0xf6   0xed   0xe4   0xdb   0xd2   0xc9
 0xc0   0xc7   0xbe   0xb5   0xac   0xa3   0x9a   0x91
 0x88   0x8f   0x86   0x7d   0x74   0x6b   0x62   0x59
 0x50   0x57   0x4e   0x45   0x3c   0x33   0x2a   0x21
 0x18   0x1f   0x16   0x d   0x 4   0xfb   0xf3   0xea
 0xe1   0xd8   0xdf   0xd6   0xcd   0xc4   0xbb   0xb2

So as you can see, the general trend is that the sequence of numbers
decrease by 'k' each iteration. Many sequences have no errors. Some only
have errors when there is a rollover involed (e.g. 8 and 4 - though I
didn't show 4 completely).  Other sequences like 5 and 6 have bizarre
results (like sometimes counting up for one iteration).

Thus I believe with careful experimentation that you could find
appropriate k's to approximate a phase accumulator for square wave
generation.

Finally, here's a 7-instruction vertical adder:

------------
;7-cycle vertical add
vert_add:
       movwf   t               ;Save the input carries
       xorwf   k,w             ;The first 'x' in XXA
       xorwf   r,f             ;The second 'x'
       andwf   t,f             ; t = (carry_in) & (carry_in ^ r_in ^ k)
       andwf   r,w             ;The 'a' in XXA
       xorwf   t,w             ;Invert those two incorrect carries
                               ;(for krw = 001 and 011)
       xorwf   k,w             ;Invert the output carry iff the input
                               ;constant, k, is a 1. (lower half of
                               ;the truth table).


Scott

1998\12\02@175442 by Dmitry Kiryashov

flavicon
face
Hello Scott.

The only one and only good news ;) I've done successfully
squeezing your 7 words example downto 6. Probably anyone
see the way to remove couples of words more ...

Scott I'm also interesting to look into C code you mentioned
about in your letter. (realization of John's idea)

Thank you in advance.

WBR Dmitry.


movfw   Cy      ;carry cell
xorlw   K       ;constant or second var
iorwf   Cy,f    ;invertion mask
xorwf   R,f     ;result
andwf   R,w     ;pre-carry
xorwf   Cy,f    lfinal true carry

-->
+--------+-------------+
| R K Cy | w Cy R w Cy |
+--------+-------------+
| 0 0 0  | 0 0  0 0 0  |
| 0 0 1  | 1 1  1 1 0  |
| 0 1 0  | 1 1  1 1 0  |
| 0 1 1  | 0 1  0 0 1  |
| 1 0 0  | 0 0  1 0 0  |
| 1 0 1  | 1 1  0 0 1  |
| 1 1 0  | 1 1  0 0 1  |
| 1 1 1  | 0 1  1 0 1  |
+--------+-------------+

{Quote hidden}

1998\12\02@202620 by Regulus Berdin

picon face
Hi Dmitry,

Your code is nice. But usually constant K must be a register which can
add an extra cycle to put the XORed K to W.

I have also another solution (different approach also 7 cycles). Also
uses a temporary storage.  Uses the property that W+ is the invert of R+
except for the instances that W, R and K are all the same (0 or 1).

                    (RxW) (invert mask)
K R W  T=KxW R+=TxR  KxR+  T or (KxR+)  W+
0 0 0  0     0       0     0            0
0 0 1  1     1       1     1            0
0 1 0  0     1       1     1            0
0 1 1  1     0       0     1            1
1 0 0  1     1       0     1            0
1 0 1  0     0       1     1            1
1 1 0  1     0       1     1            1
1 1 1  0     1       0     0            1

Here it is:

       xorwf   K,w     ;W = W xor K
       xorwf   R       ;R+ = (T xor K) xor R
       movwf   T       ;T = W xor K
       movf    R,w     ;here 'W xor R' is extracted from R+
       xorwf   K,w     ;W = W xor R
       iorwf   T,w     ;W = (W xor R) or (W xor K) result of 1 if
                       ;either W, R or K differs
       xorwf   R,w     ;invert R with mask from W

Maybe somebody also can shave a few cycles from it.

regards,
Reggie


Dmitry Kiryashov wrote:
{Quote hidden}

1998\12\02@213746 by Dmitry Kiryashov

flavicon
face
Regulus Berdin wrote:
>
> Hi Dmitry,
>
> Your code is nice. But usually constant K must be a register which can
> add an extra cycle to put the XORed K to W.

I see no problem there. jUST write xorwf K,w instead of xorlw K :)

WBR Dmitry.

> Dmitry Kiryashov wrote:
> >
> > movfw   Cy      ;carry cell
> > xorlw   K       ;constant or second var
> > iorwf   Cy,f    ;invertion mask
> > xorwf   R,f     ;result
> > andwf   R,w     ;pre-carry
> > xorwf   Cy,f    lfinal true carry

1998\12\02@215623 by Regulus Berdin

picon face
Dmitry Kiryashov wrote:
>
> I see no problem there. jUST write xorwf K,w instead of xorlw K :)

I guess you're right. I goofed. Sorry!

regards,
Reggie

1998\12\03@001718 by Regulus Berdin

picon face
Hi again,

I had checked your code, seems that there are two errors on the result
and may not work.

regards,
Reggie

Dmitry Kiryashov wrote:
{Quote hidden}

1998\12\03@073540 by Dmitry Kiryashov

flavicon
face
Regulus Berdin wrote:

> I had checked your code, seems that there are
> two errors on the result and may not work.

Again I see no problem there. ;)

1. R=0 K=1 Cy=0

movfw   Cy      ;W=0
xorlw   K       ;W=1
iorwf   Cy,f    ;Cy=1
xorwf   R,f     ;R=1
andwf   R,w     ;W=1
xorwf   Cy,f    ;Cy=0

0 + 1 + 0 = 01b. Cy=0 R=1. Isn't it ? ;)

2. R=1 K=1 Cy=0

movfw   Cy      ;W=0
xorlw   K       ;W=1
iorwf   Cy,f    ;Cy=1
xorwf   R,f     ;R=0
andwf   R,w     ;W=0
xorwf   Cy,f    ;Cy=1

1 + 1 + 0 = 10b. Cy=1 R=0. The questions ? ;)

Reggie there are no errors in this code. Please apply
efforts to reduce it not to find an errors inside it.

WBR Dmitry.

{Quote hidden}

1998\12\03@110229 by Scott Dattalo

face
flavicon
face
On Thu, 3 Dec 1998, Dmitry Kiryashov wrote:

> Regulus Berdin wrote:
>
> > I had checked your code, seems that there are
> > two errors on the result and may not work.
>
> Again I see no problem there. ;)

I think Reggie may've been confused because your routine propogates the
carry in the 'Cy' register and not in 'W'. The routine DOES work.

> WBR Dmitry.
>
> > > movfw   Cy      ;carry cell
> > > xorlw   K       ;constant or second var
> > > iorwf   Cy,f    ;invertion mask
> > > xorwf   R,f     ;result
> > > andwf   R,w     ;pre-carry
> > > xorwf   Cy,f    lfinal true carry

Amazing! Dmitry has my vote for the Pic Nerd of the Month!


I can't see a way to simplify this anymore. But I think there are few
comments to be made about this routine when it is compared to those
old-fashioned horizontal counters. If you add two horizontal registers
together, only two instructions are needed:

 movf  r1,w
 addwf r2,f

If you're implementing phase accumulators, you need the msb of the sum
and:
 movf  r1,w
 addwf r2,f
 rlf   r2,w
 rlf   phase,f

If you're implementing multibyte adders (e.g. 16 bits) you need 6
instructions for the addition. If you're using this as phase accumulator
then two more instructions are needed to extract the msb. The number of
instructions required for say 8 counters is:
 (6 + 2) * 8 = 80

With Dmitry's 6-cycle per stage vertical adder (actually the first stage
only needs to be two cycles) you need:
 6 * (stages -1 ) + 2 instructions

And for 16 bits that comes out to 92 instructions. For 14 bits, the
horizontal and vertical counters are equivalent and for fewer than 14 (but
more than 8) the vertical is faster!

-----------------------
I did make one observation that slightly improves John's 3-cycle
sequence:

 xorlw  k
 xorwf  r,f
 iorlw  ~k
 andwf  r

I haven't fully tested this, but my (maybe faulty) analysis shows that
only one term is incorrect in the truth table (instead of two). Of course,
K now MUST be a constant (unless you want to have an inverted copy in an
extra register).

-----------------------

C-program to John Payson's 3-cycle vertical adder

http://www.interstice.com/~sdattalo/vertical_adder.c


Scott

1998\12\03@125351 by Stig Brautaset

flavicon
face
> I think Reggie may've been confused because your routine propogates the
> carry in the 'Cy' register and not in 'W'. The routine DOES work.
>
> > WBR Dmitry.
> >
> > > > movfw   Cy      ;carry cell
> > > > xorlw   K       ;constant or second var
> > > > iorwf   Cy,f    ;invertion mask
> > > > xorwf   R,f     ;result
> > > > andwf   R,w     ;pre-carry
> > > > xorwf   Cy,f    lfinal true carry
>
> Amazing! Dmitry has my vote for the Pic Nerd of the Month!

What is a vertical adder used for? What is its purpose? And what is a phase
accumulator?

curious regards, Stig


> If you're implementing phase accumulators, you need the msb of the sum
> and:
>   movf  r1,w
>   addwf r2,f
>   rlf   r2,w
>   rlf   phase,f

1998\12\03@132957 by Dmitry Kiryashov

flavicon
face
Hello Scott.

> If you add two horizontal registers together,
> only two instructions are needed:
>
>   movf  r1,w
>   addwf r2,f
>
> If you're implementing phase accumulators, you
> need the msb of the sum and:
>   movf  r1,w
>   addwf r2,f
>   rlf   r2,w
>   rlf   phase,f
>
> If you're implementing multibyte adders (e.g. 16 bits) you need 6
> instructions for the addition. If you're using this as phase accumulator
> then two more instructions are needed to extract the msb. The number of
> instructions required for say 8 counters is:
>   (6 + 2) * 8 = 80


(6 + 2) * 8 = 64 ? Where others cycles were lost ?


> With Dmitry's 6-cycle per stage vertical adder (actually the first stage
> only needs to be two cycles) you need:
>   6 * (stages -1 ) + 2 instructions
> And for 16 bits that comes out to 92 instructions. For 14 bits, the
> horizontal and vertical counters are equivalent and for fewer than 14 (but
> more than 8) the vertical is faster!


Actually I've understood another interesting thing that makes a vertical
counters faster and memory requireless in competition with horizontal
ones.
Phase accumulators usually used to generate sine & cosine square waves.
Let us recall the following trick :

X= 00, 01, 10, 11

X.1 changed as sine func ( 0_0_1_1_.. )
and X.1 xored with X.0 changed as cosine ( 0_1_1_0_..)

Generating sine and cosine will require only one additional stage of
vertical conter. In case of 7 bits counters we will see the following:

movlw   const_1         ;phase adding
addwf   count_1s,f
addwf   count_1c,f
rlf     count_1s,w
rlf     phase_l,f
rlf     count_1c,w
rlf     phase_l,f

7 clocks per sin&cos generating operation 7 * 8 = 56 and
will require 2*8 + (1 or 2) (temp_phase) cells of memory

In case of vertical implementing: ( 8 + 1 stages at all )
6 * 8 + 2 + 1(additional xorwf to obtain cos) = 51 and will
require only 9 cells of memory

Probably there are way to achieve much better performance
after understanding what John'd proposed in 3 clocks routine.

WBR Dmitry.

PS. Playing RISC uC obtain pleasure ;)

1998\12\03@142736 by John Payson

flavicon
face
|If you're implementing phase accumulators, you need the msb of the sum
|and:
|  movf  r1,w
|  addwf r2,f
|  rlf   r2,w
|  rlf   phase,f

In some phase-accumulator apps, the msb of the sum is needed (a
50% square wave) while in others, what's needed is one pulse
per cycle.  Either way, the above may be improved...

       movf    f1,w
       addwf   p1,w
       rlf     carries,f
       movf    f2,w
       addwf   p2,w
       rlf     carries,f
       ...

at this point, 'carries' holds the eight carries from the adds.
If 50%-square-waves are desired, they may be accomplished via

       movf    carries,w
       xorwf   outputs,f

at the end (in which case, the counters are effectively 9-bit).

Note also that the time for eight mov/add/rlf's is less than the
time for eight full stages of a vertical adder, and that the value
in "carries" is perfect for feeding into a vertical adder.

Thus, if you need, e.g., eight 12-bit phase accumulators, you may
be best off using conventional methods for the bottom 8 bits and
a vertical adder to handle the top 4.

This is especially true if the values you're going to be adding are
smaller than the size of the phase accumulator (often true).  In
this case, the upper parts of the phase accumulator (actually a phase
subtractor) can be implemented as simply:

       xorwf   stage0,f
       andwf   stage0,w
       xorwf   stage1,f
       andwf   stage1,w

etc. (2 cycles/stage, or 8 cycles for all 4!).

Note also, btw, that if it's necessary for the phase accumulators to
have a modulus that's not a power of two, this may be dealt with quite
easily with the addition of only one or two instructions.  For example,
xor'ing the output carry with the second-to-last stage will cause the
last two stages to count mod 3.

Cute, eh?

1998\12\04@214634 by Scott Dattalo

face
flavicon
face
On Thu, 3 Dec 1998, Dmitry Kiryashov wrote:

> >   (6 + 2) * 8 = 80
>
>
> (6 + 2) * 8 = 64 ? Where others cycles were lost ?

Brain farts!


-----------------

After studying this problem quite a bit, I got tired of thinking and
decided to bring out the sledge hammer. I wrote a program to simulate a
small number of instructions (it has nothing to do with gpsim)  and I
tailored it specifically for a single stage of the vertical adder
algorithm. Using brute force, the program exhausts all possible
combinations of the instructions. This turns out to be quite a large
number - even for just 6-cycles. I simulated 32 unique operations; thus
the total combinations was 6^32 which is over a billion!

The results: 6-cycles was the minimum solution. I set up the program so
that the carry bit between stages is propogated through W and not through
a register like Dmitry's solution.

Here's what I got:

   movwf  t
   xorwf  k,w
   iorwf  t,f
   xorwf  r,f
   andwf  r,w
   xorwf  t,w


and (the nearly identical solution):

   movwf  t
   xorwf  k,w
   xorwf  r,f
   iorwf  t,f
   andwf  r,w
   xorwf  t,w

I forgot to make the program write the comments too :).  If you compare
this to Dmitry's solution you'll see that the two are nearly identical.

I also tried making K a constant, hoping that some magical optimization
would occur. No luck. I got the same solution as above except the xorwf k
became xorlw k. Oh well.

If anyone cares to tinker with the c code, here it is:

http://www.interstice.com/~sdattalo/va_optimizer.c

and a similar version for k being a constant:

http://www.interstice.com/~sdattalo/va_optimizer2.c


Scott

More... (looser matching)
- Last day of these posts
- In 1998 , 1999 only
- Today
- New search...