Truncated match.
PICList
Thread
'Vertical Adders'
1998\12\02@105243
by
Scott Dattalo
|
I've got bad news, good news, and some okay news.
bad news:
John's wonderful 3-instruction vertical adder does not work for the
'general' case.
good news:
It can be made to work as a phase accumulator.
okay news:
I found a 7-instruction version that works for the 'general' case. It uses
a temporary register (regardless of the number of stages), but at least it
saves a cycle over what I posted last week.
First, here's why the three cycle version doesn't work.
INPUT | XXA | TRUE
-------+-------+------
K R W | R+ W+ | R+ W+
-------+-------+------
0 0 0 | 0 0 | 0 0
0 0 1 | 1 1 | 1 0
0 1 0 | 1 0 | 1 0
0 1 1 | 0 0 | 0 1
1 0 0 | 1 1 | 1 0
1 0 1 | 0 0 | 0 1
1 1 0 | 0 0 | 0 1
1 1 1 | 1 0 | 1 1
The truth table lists all of the possible combinations of the input bits:
K - 'the constant', R - 'the register', and W - 'the carry'. The goal is
to compute:
R+ = (R + K + W) & 1
W+ = (R + K + W) >> 1
Or, the next state of bit R (R+) is the sum of the three bits R, K, and W.
The ' & 1 ' keeps only the least significant bit. The most signifcant bit
is copied to the output carry (W+) and propogated to the next stage.
The first column of R+ W+ outputs is from John's algorithm which I've
abbreviated XXA for XOR, XOR, & AND, the instructions from which the
algorithm is composed. The second column of R+ W+ outputs is for the so
called 'true' value or the result obtained when you perform the addition.
Just to be clear, look at the last row in the truth table. All three
inputs are '1'. So 1 + 1 + 1 produces 3; or the result is 1 and there's a
carry. Thus for the true value outputs, you see the 1 for R+ and 1 for W+.
Now, if you examine the XXA outputs you'll see that they exactly match the
true value outputs for R+. Wonderful! Unfortunately the carry only matches
at two positions. Bummer. However, if you imagine that the carry is
inverted then the XXA matches for 6 of the input combinations (that is,
invert the W+ column for the XXA outputs). The inverted carry can also be
re-interpreted as the result of subtraction. In other words (and I believe
Dmitry made this observation), John's XXA algorithm is closer to a
vertical subtractor than a vertical adder.
To illustrate this point, I wrote a simple C program that produces the
sequence of numbers by continuously adding a constant to a register (using
the XXA algorithm for addition, of course). I arbitrarily chose 8 stages
which would correspond to 8 vertical bits. However, instead of displaying
the results vertically, I chose to display them horizontally. Here are
the counting sequences for a selection of various k's (in each case r and
w are initially zero).
Sum sequence for k=0x1
0xff 0xff 0xfe 0xfd 0xfc 0xfb 0xfa 0xf9
0xf8 0xf7 0xf6 0xf5 0xf4 0xf3 0xf2 0xf1
0xf0 0xef 0xee 0xed 0xec 0xeb 0xea 0xe9
0xe8 0xe7 0xe6 0xe5 0xe4 0xe3 0xe2 0xe1
0xe0 0xdf 0xde 0xdd 0xdc 0xdb 0xda 0xd9
0xd8 0xd7 0xd6 0xd5 0xd4 0xd3 0xd2 0xd1
0xd0 0xcf 0xce 0xcd 0xcc 0xcb 0xca 0xc9
Sum sequence for k=0x2
0xfe 0xff 0xfd 0xfb 0xf9 0xf7 0xf5 0xf3
0xf1 0xef 0xed 0xeb 0xe9 0xe7 0xe5 0xe3
0xe1 0xdf 0xdd 0xdb 0xd9 0xd7 0xd5 0xd3
0xd1 0xcf 0xcd 0xcb 0xc9 0xc7 0xc5 0xc3
0xc1 0xbf 0xbd 0xbb 0xb9 0xb7 0xb5 0xb3
Sum sequence for k=0x3
0x01 0xfe 0xfc 0xfd 0xfa 0xfb 0xf8 0xf9
0xf6 0xf7 0xf4 0xf5 0xf2 0xf3 0xf0 0xf1
0xee 0xef 0xec 0xed 0xea 0xeb 0xe8 0xe9
0xe6 0xe7 0xe4 0xe5 0xe2 0xe3 0xe0 0xe1
0xde 0xdf 0xdc 0xdd 0xda 0xdb 0xd8 0xd9
Sum sequence for k=0x4
0xfc 0xff 0xfb 0xf7 0xf3 0xef 0xeb 0xe7
0xe3 0xdf 0xdb 0xd7 0xd3 0xcf 0xcb 0xc7
0xc3 0xbf 0xbb 0xb7 0xb3 0xaf 0xab 0xa7
0xa3 0x9f 0x9b 0x97 0x93 0x8f 0x8b 0x87
0x83 0x7f 0x7b 0x77 0x73 0x6f 0x6b 0x67
0x63 0x5f 0x5b 0x57 0x53 0x4f 0x4b 0x47
Sum sequence for k=0x5
0x 3 0xfe 0xfa 0xf5 0xf0 0xf3 0xee 0xe9
0xe4 0xe7 0xe2 0xdd 0xd8 0xdb 0xd6 0xd1
0xcc 0xcf 0xca 0xc5 0xc0 0xc3 0xbe 0xb9
Sum sequence for k=0x6
0x 2 0xfc 0xf9 0xfb 0xf5 0xf7 0xf1 0xf3
0xed 0xef 0xe9 0xeb 0xe5 0xe7 0xe1 0xe3
0xdd 0xdf 0xd9 0xdb 0xd5 0xd7 0xd1 0xd3
Sum sequence for k=0x7
0xfd 0xff 0xf8 0xf5 0xf6 0xf3 0xec 0xe9
0xea 0xe7 0xe0 0xdd 0xde 0xdb 0xd4 0xd1
0xd2 0xcf 0xc8 0xc5 0xc6 0xc3 0xbc 0xb9
Sum sequence for k=0x8
0xf8 0xff 0xf7 0xef 0xe7 0xdf 0xd7 0xcf
0xc7 0xbf 0xb7 0xaf 0xa7 0x9f 0x97 0x8f
0x87 0x7f 0x77 0x6f 0x67 0x5f 0x57 0x4f
0x47 0x3f 0x37 0x2f 0x27 0x1f 0x17 0x f
0x 7 0xff 0xf6 0xee 0xe6 0xde 0xd6 0xce
0xc6 0xbe 0xb6 0xae 0xa6 0x9e 0x96 0x8e
Sum sequence for k=0x9
0x 7 0xfe 0xf6 0xed 0xe4 0xdb 0xd2 0xc9
0xc0 0xc7 0xbe 0xb5 0xac 0xa3 0x9a 0x91
0x88 0x8f 0x86 0x7d 0x74 0x6b 0x62 0x59
0x50 0x57 0x4e 0x45 0x3c 0x33 0x2a 0x21
0x18 0x1f 0x16 0x d 0x 4 0xfb 0xf3 0xea
0xe1 0xd8 0xdf 0xd6 0xcd 0xc4 0xbb 0xb2
So as you can see, the general trend is that the sequence of numbers
decrease by 'k' each iteration. Many sequences have no errors. Some only
have errors when there is a rollover involed (e.g. 8 and 4 - though I
didn't show 4 completely). Other sequences like 5 and 6 have bizarre
results (like sometimes counting up for one iteration).
Thus I believe with careful experimentation that you could find
appropriate k's to approximate a phase accumulator for square wave
generation.
Finally, here's a 7-instruction vertical adder:
------------
;7-cycle vertical add
vert_add:
movwf t ;Save the input carries
xorwf k,w ;The first 'x' in XXA
xorwf r,f ;The second 'x'
andwf t,f ; t = (carry_in) & (carry_in ^ r_in ^ k)
andwf r,w ;The 'a' in XXA
xorwf t,w ;Invert those two incorrect carries
;(for krw = 001 and 011)
xorwf k,w ;Invert the output carry iff the input
;constant, k, is a 1. (lower half of
;the truth table).
Scott
1998\12\02@175442
by
Dmitry Kiryashov
|
Hello Scott.
The only one and only good news ;) I've done successfully
squeezing your 7 words example downto 6. Probably anyone
see the way to remove couples of words more ...
Scott I'm also interesting to look into C code you mentioned
about in your letter. (realization of John's idea)
Thank you in advance.
WBR Dmitry.
movfw Cy ;carry cell
xorlw K ;constant or second var
iorwf Cy,f ;invertion mask
xorwf R,f ;result
andwf R,w ;pre-carry
xorwf Cy,f lfinal true carry
-->
+--------+-------------+
| R K Cy | w Cy R w Cy |
+--------+-------------+
| 0 0 0 | 0 0 0 0 0 |
| 0 0 1 | 1 1 1 1 0 |
| 0 1 0 | 1 1 1 1 0 |
| 0 1 1 | 0 1 0 0 1 |
| 1 0 0 | 0 0 1 0 0 |
| 1 0 1 | 1 1 0 0 1 |
| 1 1 0 | 1 1 0 0 1 |
| 1 1 1 | 0 1 1 0 1 |
+--------+-------------+
{Quote hidden}> Finally, here's a 7-instruction vertical adder:
>
> ------------
> ;7-cycle vertical add
> vert_add:
> movwf t ;Save the input carries
> xorwf k,w ;The first 'x' in XXA
> xorwf r,f ;The second 'x'
> andwf t,f ; t = (carry_in) & (carry_in ^ r_in ^ k)
> andwf r,w ;The 'a' in XXA
> xorwf t,w ;Invert those two incorrect carries
> ;(for krw = 001 and 011)
> xorwf k,w ;Invert the output carry iff the input
> ;constant, k, is a 1. (lower half of
> ;the truth table).
1998\12\02@202620
by
Regulus Berdin
|
Hi Dmitry,
Your code is nice. But usually constant K must be a register which can
add an extra cycle to put the XORed K to W.
I have also another solution (different approach also 7 cycles). Also
uses a temporary storage. Uses the property that W+ is the invert of R+
except for the instances that W, R and K are all the same (0 or 1).
(RxW) (invert mask)
K R W T=KxW R+=TxR KxR+ T or (KxR+) W+
0 0 0 0 0 0 0 0
0 0 1 1 1 1 1 0
0 1 0 0 1 1 1 0
0 1 1 1 0 0 1 1
1 0 0 1 1 0 1 0
1 0 1 0 0 1 1 1
1 1 0 1 0 1 1 1
1 1 1 0 1 0 0 1
Here it is:
xorwf K,w ;W = W xor K
xorwf R ;R+ = (T xor K) xor R
movwf T ;T = W xor K
movf R,w ;here 'W xor R' is extracted from R+
xorwf K,w ;W = W xor R
iorwf T,w ;W = (W xor R) or (W xor K) result of 1 if
;either W, R or K differs
xorwf R,w ;invert R with mask from W
Maybe somebody also can shave a few cycles from it.
regards,
Reggie
Dmitry Kiryashov wrote:
{Quote hidden}>
> movfw Cy ;carry cell
> xorlw K ;constant or second var
> iorwf Cy,f ;invertion mask
> xorwf R,f ;result
> andwf R,w ;pre-carry
> xorwf Cy,f lfinal true carry
>
> > Finally, here's a 7-instruction vertical adder:
> >
> > ------------
> > ;7-cycle vertical add
> > vert_add:
> > movwf t ;Save the input carries
> > xorwf k,w ;The first 'x' in XXA
> > xorwf r,f ;The second 'x'
> > andwf t,f ; t = (carry_in) & (carry_in ^ r_in ^ k)
> > andwf r,w ;The 'a' in XXA
> > xorwf t,w ;Invert those two incorrect carries
> > ;(for krw = 001 and 011)
> > xorwf k,w ;Invert the output carry iff the input
> > ;constant, k, is a 1. (lower half of
> > ;the truth table).
1998\12\02@213746
by
Dmitry Kiryashov
Regulus Berdin wrote:
>
> Hi Dmitry,
>
> Your code is nice. But usually constant K must be a register which can
> add an extra cycle to put the XORed K to W.
I see no problem there. jUST write xorwf K,w instead of xorlw K :)
WBR Dmitry.
> Dmitry Kiryashov wrote:
> >
> > movfw Cy ;carry cell
> > xorlw K ;constant or second var
> > iorwf Cy,f ;invertion mask
> > xorwf R,f ;result
> > andwf R,w ;pre-carry
> > xorwf Cy,f lfinal true carry
1998\12\02@215623
by
Regulus Berdin
Dmitry Kiryashov wrote:
>
> I see no problem there. jUST write xorwf K,w instead of xorlw K :)
I guess you're right. I goofed. Sorry!
regards,
Reggie
1998\12\03@001718
by
Regulus Berdin
Hi again,
I had checked your code, seems that there are two errors on the result
and may not work.
regards,
Reggie
Dmitry Kiryashov wrote:
{Quote hidden}>
> movfw Cy ;carry cell
> xorlw K ;constant or second var
> iorwf Cy,f ;invertion mask
> xorwf R,f ;result
> andwf R,w ;pre-carry
> xorwf Cy,f lfinal true carry
>
> -->
> +--------+-------------+
> | R K Cy | w Cy R w Cy |
> +--------+-------------+
> | 0 0 0 | 0 0 0 0 0 |
> | 0 0 1 | 1 1 1 1 0 |
> | 0 1 0 | 1 1 1 1 0 |final Cy should be 1 because W=1 and Cy=0
> | 0 1 1 | 0 1 0 0 1 |
> | 1 0 0 | 0 0 1 0 0 |
> | 1 0 1 | 1 1 0 0 1 |
> | 1 1 0 | 1 1 0 0 1 |also Cy should be 0 because W=0 and Cy=0
> | 1 1 1 | 0 1 1 0 1 |
> +--------+-------------+
1998\12\03@073540
by
Dmitry Kiryashov
|
Regulus Berdin wrote:
> I had checked your code, seems that there are
> two errors on the result and may not work.
Again I see no problem there. ;)
1. R=0 K=1 Cy=0
movfw Cy ;W=0
xorlw K ;W=1
iorwf Cy,f ;Cy=1
xorwf R,f ;R=1
andwf R,w ;W=1
xorwf Cy,f ;Cy=0
0 + 1 + 0 = 01b. Cy=0 R=1. Isn't it ? ;)
2. R=1 K=1 Cy=0
movfw Cy ;W=0
xorlw K ;W=1
iorwf Cy,f ;Cy=1
xorwf R,f ;R=0
andwf R,w ;W=0
xorwf Cy,f ;Cy=1
1 + 1 + 0 = 10b. Cy=1 R=0. The questions ? ;)
Reggie there are no errors in this code. Please apply
efforts to reduce it not to find an errors inside it.
WBR Dmitry.
{Quote hidden}> > movfw Cy ;carry cell
> > xorlw K ;constant or second var
> > iorwf Cy,f ;invertion mask
> > xorwf R,f ;result
> > andwf R,w ;pre-carry
> > xorwf Cy,f lfinal true carry
> >
> > -->
> > +--------+-------------+
> > | R K Cy | w Cy R w Cy |
> > +--------+-------------+
> > | 0 0 0 | 0 0 0 0 0 |
> > | 0 0 1 | 1 1 1 1 0 |
> > | 0 1 0 | 1 1 1 1 0 |final Cy should be 1 because W=1 and Cy=0
> > | 0 1 1 | 0 1 0 0 1 |
> > | 1 0 0 | 0 0 1 0 0 |
> > | 1 0 1 | 1 1 0 0 1 |
> > | 1 1 0 | 1 1 0 0 1 |also Cy should be 0 because W=0 and Cy=0
> > | 1 1 1 | 0 1 1 0 1 |
> > +--------+-------------+
1998\12\03@110229
by
Scott Dattalo
|
On Thu, 3 Dec 1998, Dmitry Kiryashov wrote:
> Regulus Berdin wrote:
>
> > I had checked your code, seems that there are
> > two errors on the result and may not work.
>
> Again I see no problem there. ;)
I think Reggie may've been confused because your routine propogates the
carry in the 'Cy' register and not in 'W'. The routine DOES work.
> WBR Dmitry.
>
> > > movfw Cy ;carry cell
> > > xorlw K ;constant or second var
> > > iorwf Cy,f ;invertion mask
> > > xorwf R,f ;result
> > > andwf R,w ;pre-carry
> > > xorwf Cy,f lfinal true carry
Amazing! Dmitry has my vote for the Pic Nerd of the Month!
I can't see a way to simplify this anymore. But I think there are few
comments to be made about this routine when it is compared to those
old-fashioned horizontal counters. If you add two horizontal registers
together, only two instructions are needed:
movf r1,w
addwf r2,f
If you're implementing phase accumulators, you need the msb of the sum
and:
movf r1,w
addwf r2,f
rlf r2,w
rlf phase,f
If you're implementing multibyte adders (e.g. 16 bits) you need 6
instructions for the addition. If you're using this as phase accumulator
then two more instructions are needed to extract the msb. The number of
instructions required for say 8 counters is:
(6 + 2) * 8 = 80
With Dmitry's 6-cycle per stage vertical adder (actually the first stage
only needs to be two cycles) you need:
6 * (stages -1 ) + 2 instructions
And for 16 bits that comes out to 92 instructions. For 14 bits, the
horizontal and vertical counters are equivalent and for fewer than 14 (but
more than 8) the vertical is faster!
-----------------------
I did make one observation that slightly improves John's 3-cycle
sequence:
xorlw k
xorwf r,f
iorlw ~k
andwf r
I haven't fully tested this, but my (maybe faulty) analysis shows that
only one term is incorrect in the truth table (instead of two). Of course,
K now MUST be a constant (unless you want to have an inverted copy in an
extra register).
-----------------------
C-program to John Payson's 3-cycle vertical adder
http://www.interstice.com/~sdattalo/vertical_adder.c
Scott
1998\12\03@125351
by
Stig Brautaset
> I think Reggie may've been confused because your routine propogates the
> carry in the 'Cy' register and not in 'W'. The routine DOES work.
>
> > WBR Dmitry.
> >
> > > > movfw Cy ;carry cell
> > > > xorlw K ;constant or second var
> > > > iorwf Cy,f ;invertion mask
> > > > xorwf R,f ;result
> > > > andwf R,w ;pre-carry
> > > > xorwf Cy,f lfinal true carry
>
> Amazing! Dmitry has my vote for the Pic Nerd of the Month!
What is a vertical adder used for? What is its purpose? And what is a phase
accumulator?
curious regards, Stig
> If you're implementing phase accumulators, you need the msb of the sum
> and:
> movf r1,w
> addwf r2,f
> rlf r2,w
> rlf phase,f
1998\12\03@132957
by
Dmitry Kiryashov
|
Hello Scott.
> If you add two horizontal registers together,
> only two instructions are needed:
>
> movf r1,w
> addwf r2,f
>
> If you're implementing phase accumulators, you
> need the msb of the sum and:
> movf r1,w
> addwf r2,f
> rlf r2,w
> rlf phase,f
>
> If you're implementing multibyte adders (e.g. 16 bits) you need 6
> instructions for the addition. If you're using this as phase accumulator
> then two more instructions are needed to extract the msb. The number of
> instructions required for say 8 counters is:
> (6 + 2) * 8 = 80
(6 + 2) * 8 = 64 ? Where others cycles were lost ?
> With Dmitry's 6-cycle per stage vertical adder (actually the first stage
> only needs to be two cycles) you need:
> 6 * (stages -1 ) + 2 instructions
> And for 16 bits that comes out to 92 instructions. For 14 bits, the
> horizontal and vertical counters are equivalent and for fewer than 14 (but
> more than 8) the vertical is faster!
Actually I've understood another interesting thing that makes a vertical
counters faster and memory requireless in competition with horizontal
ones.
Phase accumulators usually used to generate sine & cosine square waves.
Let us recall the following trick :
X= 00, 01, 10, 11
X.1 changed as sine func ( 0_0_1_1_.. )
and X.1 xored with X.0 changed as cosine ( 0_1_1_0_..)
Generating sine and cosine will require only one additional stage of
vertical conter. In case of 7 bits counters we will see the following:
movlw const_1 ;phase adding
addwf count_1s,f
addwf count_1c,f
rlf count_1s,w
rlf phase_l,f
rlf count_1c,w
rlf phase_l,f
7 clocks per sin&cos generating operation 7 * 8 = 56 and
will require 2*8 + (1 or 2) (temp_phase) cells of memory
In case of vertical implementing: ( 8 + 1 stages at all )
6 * 8 + 2 + 1(additional xorwf to obtain cos) = 51 and will
require only 9 cells of memory
Probably there are way to achieve much better performance
after understanding what John'd proposed in 3 clocks routine.
WBR Dmitry.
PS. Playing RISC uC obtain pleasure ;)
1998\12\03@142736
by
John Payson
|
|If you're implementing phase accumulators, you need the msb of the sum
|and:
| movf r1,w
| addwf r2,f
| rlf r2,w
| rlf phase,f
In some phase-accumulator apps, the msb of the sum is needed (a
50% square wave) while in others, what's needed is one pulse
per cycle. Either way, the above may be improved...
movf f1,w
addwf p1,w
rlf carries,f
movf f2,w
addwf p2,w
rlf carries,f
...
at this point, 'carries' holds the eight carries from the adds.
If 50%-square-waves are desired, they may be accomplished via
movf carries,w
xorwf outputs,f
at the end (in which case, the counters are effectively 9-bit).
Note also that the time for eight mov/add/rlf's is less than the
time for eight full stages of a vertical adder, and that the value
in "carries" is perfect for feeding into a vertical adder.
Thus, if you need, e.g., eight 12-bit phase accumulators, you may
be best off using conventional methods for the bottom 8 bits and
a vertical adder to handle the top 4.
This is especially true if the values you're going to be adding are
smaller than the size of the phase accumulator (often true). In
this case, the upper parts of the phase accumulator (actually a phase
subtractor) can be implemented as simply:
xorwf stage0,f
andwf stage0,w
xorwf stage1,f
andwf stage1,w
etc. (2 cycles/stage, or 8 cycles for all 4!).
Note also, btw, that if it's necessary for the phase accumulators to
have a modulus that's not a power of two, this may be dealt with quite
easily with the addition of only one or two instructions. For example,
xor'ing the output carry with the second-to-last stage will cause the
last two stages to count mod 3.
Cute, eh?
1998\12\04@214634
by
Scott Dattalo
|
On Thu, 3 Dec 1998, Dmitry Kiryashov wrote:
> > (6 + 2) * 8 = 80
>
>
> (6 + 2) * 8 = 64 ? Where others cycles were lost ?
Brain farts!
-----------------
After studying this problem quite a bit, I got tired of thinking and
decided to bring out the sledge hammer. I wrote a program to simulate a
small number of instructions (it has nothing to do with gpsim) and I
tailored it specifically for a single stage of the vertical adder
algorithm. Using brute force, the program exhausts all possible
combinations of the instructions. This turns out to be quite a large
number - even for just 6-cycles. I simulated 32 unique operations; thus
the total combinations was 6^32 which is over a billion!
The results: 6-cycles was the minimum solution. I set up the program so
that the carry bit between stages is propogated through W and not through
a register like Dmitry's solution.
Here's what I got:
movwf t
xorwf k,w
iorwf t,f
xorwf r,f
andwf r,w
xorwf t,w
and (the nearly identical solution):
movwf t
xorwf k,w
xorwf r,f
iorwf t,f
andwf r,w
xorwf t,w
I forgot to make the program write the comments too :). If you compare
this to Dmitry's solution you'll see that the two are nearly identical.
I also tried making K a constant, hoping that some magical optimization
would occur. No luck. I got the same solution as above except the xorwf k
became xorlw k. Oh well.
If anyone cares to tinker with the c code, here it is:
http://www.interstice.com/~sdattalo/va_optimizer.c
and a similar version for k being a constant:
http://www.interstice.com/~sdattalo/va_optimizer2.c
Scott
More... (looser matching)
- Last day of these posts
- In 1998
, 1999 only
- Today
- New search...