>
> > What's the fastest way to square a 10-bit number?
>
Try looking in the applications handbook, the maths routines include
8 and 16 bit multiplies. Use the 16 bit one but limit the first byte to
a 2 bit multiply.
> > > What's the fastest way to square a 10-bit number?
>
> Try looking in the applications handbook, the maths routines include
> 8 and 16 bit multiplies. Use the 16 bit one but limit the first byte
> to a 2 bit multiply.
Yeah, but those routines are really slow... For a specialized case
like this, a better choice (barring some clever squaring algorithm of
which I'm unaware) would be one of the following. Note that I've
only worked these out for 4-bit numbers; the 10-bit equivalents will
be a lot messier, but these examples'll give you the idea:
SOLUTION 1 -- Real fast for 4-bit numbers, probably slower than
solution #2 for 10-bit numbers:
; Enter with 4-bit number to be squared in register "X".
; Exits with X*X in the W register.
movlw 0
clrc
btfsc x,0
addwf x,w
rlf x
btfsc x,2
addwf x,w
rlf x
btfsc x,4
addwf x,w
rlf x
btfsc x,6
addwf x,w
SOLUTION #2: Slower than #1 for 4-bit numbers, but with
sufficiently-good ordering of instructions, could be pretty fast for
10-bit numbers:
BASIC version first, showing the results of squaring all
possible 4-bit numbers:
for a = 0 to 1
for b = 0 to 1
for c = 0 to 1
for d = 0 to 1
n = a*8 + b*4 + c*2 + d
w = 0
x = 0
y = 0
z = 0
if d then z = a*16 + b*8 + c*4 + 1
if c then y = a * 32 + b*16 + 4
if b then x = a*64 + 16
if a then w = 64
r = w+x+y+z
s = n*n
print n,s,r,: if s <> r then print "Error" else print "OK"
next:next:next:next
Now the PIC version:
; Enter with number to be squared in X.
; Exits with X*X in the W-register.
btfss x,0
goto doc
clrc
rlf x,w
addlw -1
doc:
btfss x,1
goto dob
btfsc x,3
addlw 32
btfsc x,2
addlw 16
dob:
btfss x,2
goto doa
btfsc x,3
addlw 64
addlw 16
doa:
btfsc x,3
addlw 64
Personally, I'd go with Solution #1... As I was working out the
assembly code for Solution #2, it was quickly becoming unmanageable
even for only 4 bits.
Stuart Taylor suggested...
>
> > > > What's the fastest way to square a 10-bit number?
> >
> > Try looking in the applications handbook, the maths routines include
> > 8 and 16 bit multiplies. Use the 16 bit one but limit the first byte to
> > a 2 bit multiply.
Thanks, Stuart.
The apps handbook is always a good place to start, but those routines
are way too slow.
I was thinking that a 10 bit by 10 bit multiply is faster than a
16 by 16, and there might be a more efficient algorithm for squaring
than multiplying... I'm a little short of general reference material
maybe I'll ask my wife for some of Knuth's book for Christmas...
>
>Is there scope to break it down into three simpler multiplications?
>
>(a+b)^2 = a^2 + b^2 + 2ab
>
>This gives one two bit multiply (trivial) one 2 bit x 8 bit multiply
>(straightforward) and one eight bit multiply (conventional but still a bit
> slow)
>If you are subsequently re-scaling the result to fit in 16-bits you can
> save a couple of cycles on the eight bit multiply too.
Thanks Keith.
I hadn't thought about this type of appproach, but I expect that taking
the calculation 'head on' will be quicker. Time permitting I'll compare
this
approach to below and let you know how it works out.
Andrew Warren worked out...
>
> SOLUTION 1 -- Real fast for 4-bit numbers, probably slower than
> solution #2 for 10-bit numbers:
>
...[snip some lines of code]...
>
> SOLUTION #2: Slower than #1 for 4-bit numbers, but with
> sufficiently-good ordering of instructions, could be pretty fast for
> 10-bit numbers:
>
...[snip a few more lines of code]...
>
> Personally, I'd go with Solution #1...
thanks, Andy, that's very helpful. I'll complete these for 10-bits and
if anyone is interested I'll post the code. As usual I've been sidetracked
today so it might be a day or 2.
For things like this, what reference material would you suggest?
- Andy.
*************************************************************
Andrew David Senior Project Engineer - Software
Keith Dowsett wrote:
>
> >Guys,
> >
> >
> >What's the fastest way to square a 10-bit number?
> >
> >
> Is there scope to break it down into three simpler multiplications?
>
> (a+b)^2 = a^2 + b^2 + 2ab
>
Precisely what I was thinking. However, let me expand upon a
couple different branches.
Branch 1.
Suppose you divide the number you wish to square in half, that
is into 5 most significant bits and 5 lsb's:
N = a*32 + b
a = 5 msb's
b = 5 lsb's
Now using Keith's observation:
N^2 = 32^2 * a^2 + 64*a*b + b^2
= (a^2)<<10 + (a*b)<<5 + b^2
And the 10-bit multiplication has been broken down into
three 5-bit multiplications. However, with a little work
the 5-bit squares may be evaluated via a table-look-up.
This would leave you with one general purpose 5-bit
multiplication.
Branch 2.
Suppose you divide the number into three nibbles. The most
significant nibble would only be 2 bits wide.
Squaring the nibbles can be done trivially with a look up
table. The other three 4X4 bit multiplications can be
quickly computed with Andy W's 13 instruction suggestion.
If you wish to take advantage of 'a' being only 2-bits
wide, then investigate
This reduces to: 3 table look-ups, one 4X4 bit multiplication
and one 2X8 bit multiplication. My guess is that this could
be coded to execute in less than 60 cycles.
Scott
P.S. Have you considered running the square root routine
backwards? :)
> > What's the fastest way to square a 10-bit number?
>
> Is there scope to break it down into three simpler multiplications?
>
> (a+b)^2 = a^2 + b^2 + 2ab
>
> This gives one two bit multiply (trivial) one 2 bit x 8 bit multiply
> (straightforward) and one eight bit multiply (conventional but still
> a bit slow)
Keith:
Have you coded this into PIC assembly language? I have a feeling
that, in practice, it'll be slower than a standard 10-bit x 10-bit
multiplication.
> The apps handbook is always a good place to start, but those
> routines are way too slow. I was thinking that a 10 bit by 10 bit
> multiply is faster than a 16 by 16, and there might be a more
> efficient algorithm for squaring than multiplying... I'm a little
> short of general reference material maybe I'll ask my wife for some
> of Knuth's book for Christmas...
> ....
> For things like this, what reference material would you suggest?
Andy:
A 10x10 multiply IS faster than 16x16, mostly because the result is
guaranteed to fit in three bytes rather than 4. If you don't have
the code space to unroll the whole multiplication into inline code
(as I showed in my "Solution #1"), looped code can still be made
pretty fast by writing three loops:
One for the first few bits of the multiplier, where the result
of each addition of the multiplicand to the intermediate result
is guaranteed to fit in the least-significant two bytes,
one for the next group of bits, where the addition may affect
all three bytes of the result, and
one for the final few bits, where the addition is guaranteed to
only affect the two most-significant bytes of the result.
As for reference books... I don't know; I looked through Knuth and a
couple of other books, but didn't find any really fast squaring
algorithms.
As usual, though, I'll probably discover a month from now, while
investigating some other problem, that Knuth's book DOES have a good
squaring algorithm.
-Andy
P.S. By the way, the "Solution #1" algorithm I posted earlier is
very similar to Keith Dowsett's suggestion; I just took it a little
farther and (for 4-bit numbers) factored the problem into:
(a + b + c + d)^2
= a^2 + b^2 + c^2 + d^2 + 2ab + 2ac + 2ad + 2bc + 2bd + 2cd.
This removes ALL multiplications from the algorithm; each term just
reduces to either "0" or a single bit.
> Keith Dowsett wrote:
> >
> > >
> > Is there scope to break it down into three simpler multiplications?
> >
> > (a+b)^2 = a^2 + b^2 + 2ab
> >
>
Scott Dattalo wrote:
> Precisely what I was thinking. However, let me expand upon a
> couple different branches.
>
hmmmm ...methinks I need an intuition implant...
my gut feel was that something like Andy W's "straight line"
solution would be a bit quicker than a few smaller calculations
whose results are combined. My gut feel isn't the most reliable
indication of anything, though :-) I'm working on improving it.
Thanks to Keith, Andy and Scott for excellent suggestions
and examples... looks like I've got a few routines to
write! If I'm brave enough, I'll post the fastest one I can
come up with this week.
- Andy.
*************************************************************
Andrew David Senior Project Engineer - Software
> > > Is there scope to break it down into three simpler multiplications?
> > >
> > > (a+b)^2 = a^2 + b^2 + 2ab
> hmmmm ...methinks I need an intuition implant...
> my gut feel was that something like Andy W's "straight line"
> solution would be a bit quicker than a few smaller calculations
> whose results are combined. My gut feel isn't the most reliable
> indication of anything, though :-) I'm working on improving it.
>
> Thanks to Keith, Andy and Scott for excellent suggestions
> and examples... looks like I've got a few routines to
> write! If I'm brave enough, I'll post the fastest one I can
> come up with this week.
Something like this? [warning: untested]
clrf DstH
clrf DstM
clrf DstL
movf SrcL,w
andlw $0F
btfss Src,4
addwf DstM
rrf DstM
rrf DstL
btfss Src,5
addwf DstM
rrf DstM
rrf DstL
btfss Src,6
addwf DstM
rrf DstM
rrf DstL
btfss Src,7
addwf DstM
call Sqr4
addwf DstL
swapf SrcL
andlw $0F
call Sqr4
addwf DstM
; At this point, 16-bit result is in DstM:DstH
; 25 words of code prior to this point (plus a
; 17-word table-lookup). Total execution time:
; 35 cycles up to this point.
btfss SrcH,0
goto NoBit8
movf SrcL,w
btfsc C
incf DstH
addwf DstM
btfsc C
incf DstH
incf DstH
; Another 9 words for bit 8; 3 or 9 cycles to exec.
NoBit8:
btfss SrcH,1
goto NoBit9
movlw 4
btfss SrcH,0
movlw 8
addwf DstH
rlf SrcL,w
btfsc C
incf DstH
btfsc C
incf DstH
addwf DstM
btfsc C
incf DstH
addwf DstM
btfsc C
incf DstH
; Another 17 words for bit 9; 3 or 17 cycles to execute
; Total worst-case time: 35+26 = 61 cycles.
NoBit9:
retlw 0 ; All done!
Sqr4:
addwf PC
db 0,1,4,9,16,25,36,49,64,81,100,121,144,169,196,225
>> > What's the fastest way to square a 10-bit number?
>
>Try looking in the applications handbook, the maths routines include
>8 and 16 bit multiplies. Use the 16 bit one but limit the first byte to
>a 2 bit multiply.
>
>Stuart Taylor
What is this "applications handbook?" Could you please elaborate a little
more about whait it is, what's in it, and where to get it?
Thanks,
-Matt
"DOS Computers manufactured by companies such as IBM, Compaq, Tandy, and
millions of others are by far the most popular, with about 70 million
machines in use wordwide. Macintosh fans, on the other hand, may note that
cockroaches are far more numerous than humans, and that numbers alone do
not denote a higher life form."