Searching \ for 'A challenge for square(r)s' in subject line. ()
Help us get a faster server
FAQ page: www.piclist.com/techref/index.htm?key=challenge+squarers
Search entire site for: 'A challenge for square(r)s'.

Truncated match.
'A challenge for square(r)s'
1999\03\22@142630 by

It's been a while since we've had a challenge. And now that Dmitry's
back, I guess we better see if he's lost any of his pic skills :).

Now this isn't really a brand new challenge because I've seen bits and
pieces before posted to the pic list. But at any rate:

1) Shortest and/or fastest code to square an 8 bit number. (The shortest
is probably Andy's 8 by 8 multiply. The fastest is about 35 cycles, as I
recall).

2) Shortest and/or fastest code to square a 4-bit number. (I can think
of one that takes only 10-cycles)

No points - just glory!

>
> It's been a while since we've had a challenge. And now that Dmitry's
> back, I guess we better see if he's lost any of his pic skills :).
>
> Now this isn't really a brand new challenge because I've seen bits and
> pieces before posted to the pic list. But at any rate:
>
> 1) Shortest and/or fastest code to square an 8 bit number. (The shortest
> is probably Andy's 8 by 8 multiply. The fastest is about 35 cycles, as I
> recall).
>
> 2) Shortest and/or fastest code to square a 4-bit number. (I can think
> of one that takes only 10-cycles)

Wouldn't the fastest simply be to have a jump table that returns the square
of the number? Only 16 entries and should take 6 cycles if the number to
square is already in W.

BAJ

If 8^2 = 64 = 2^3 x 2^3 = 2^(3+3) = 2^6
then square root of 64 is 2^(6/2) = 2^3
It works for simple binary numbers.

Take a look at:
http://www.interstice.com/~sdattalo/technical/software/pic/picsqrt.html

Byron A Jeff wrote:
{Quote hidden}

> 2) Shortest and/or fastest code to square a 4-bit number. (I can think
> of one that takes only 10-cycles)

A 4 bit number IMHO is most efficiently squared using a lookup table. That's
a bit less than 10 cycles :-)

Byron A Jeff wrote:
{Quote hidden}

Of course... What I was thinking about applies to multiplication. For
example if you wanted to multiply two 4-bit numbers stored in separate
registers how would you go about doing this? After I posted the
challenge, I realized that the solution I was thinking about requires 14
cycles. A lookup table can still be used, however 256 entries (at most I
guess) would be required.

So if I may, let's change the challenge to multiplication of 4-bit
numbers instead of squaration (new term) of them. I'll by BAJ a Bud.

Scott

|So if I may, let's change the challenge to multiplication of 4-bit
|numbers instead of squaration (new term) of them. I'll by BAJ a Bud.

Well, let's see...

movlw   0
bcf     C
btfss   n1,0
rlf     n2,f
btfss   n1,1
rlf     n2,f
btfss   n1,2
rlf     n2,f
btfss   n1,3
rlf     n2,f
swapf   n2,f

15 cycles, putting the result in W and leaving n1 and n2 untouched at
the end.  Striking the requirement that n2 be untouched would save the
last two cycles.  Note that if n1<=15 and n1*n2<=255 this routine will
produce a correct result even if n2>15; the value in n2 will be trashed
in such a case, though.

Aiming for a result in n2...

swapf   n1,w    ; Assume top 4 bits zero
btfss   n2,0
rrf     n2,f
btfss   n2,0
rrf     n2,f
btfss   n2,0
rrf     n2,f
btfss   n2,0
rrf     n2,f

13 cycles, leaving the result in n2.  This one won't work if either fac-
tor is oversized.

BTW, on the subject of squaring, a potentially useful observation in
some cases is that

(a+b)^2 = a^2 + b^2 + 2ab

Thus, any multiplication could be performed as three squaring operations,
a subtraction, and a shift.  Not generally worthwhile on a PIC, but if
one has an external OTP loaded with, e.g., a 16-bit table of squares it
could be interesting in some cases...

Another useful trick with the above squaring formula, btw, is that when
squaring a large number some of the partial products will be identical.
For example, a 16-bit number can be squared using two "general" 8x8 mul-
tiplies and an 8-bit square operation.  To square a 32-bit number (made
of 4-byte chunks) one would compute:

a   b   c   d
a   b   c   d
---------------------
ad  bd  cd  dd
ac  bc  cc  dc
ab  bb  cb  db
aa  ba  ca  da
------------------------------

aa | 2ab | bb+2ac | 2ad+2bc | 2bd+cc | 2cd | dd

Thus 10 multiplies (4 of them squares).

><snip>
>So if I may, let's change the challenge to multiplication of 4-bit
>numbers instead of squaration (new term) of them. I'll by BAJ a Bud.
>
>Scott

;  X*Y ; result left in W
;  X and Y are 4 bits
;
;
CLRW            ; Clear Result
CLRC            ; Clear Carry for RLF later                     ;
BTFSC   Y,0     ; ?*1
ADDWF   X,W     ; W = ?X
RLF     X,F     ; X = 2X
;
BTFSC   Y,1     ; ?*2
ADDWF   X,W     ; W = ?X + ?2X
RLF     X,F     ; W = 4X

BTFSC   Y,2     ; ?*4
ADDWF   X,W     ; W = ?X + ?2X + ?4X
RLF     X,F     ; W = 8X

BTFSC   Y,3     ; ?*8
ADDWF   X,W     ; ?X + ?2X + ?4X + ?8X
;;;;;;;;;;;;;;;;;;;;;;;;;

13 cycles result in W  13 bytes of code space

Looking back, I now realise that is what John Payson has already
posted. Albeit in a different form. Sigh(!)

Thanks for the challenge Scott, keep 'em coming.
Now, about that 10 cycle solution you mentioned....

Ray Gardiner rayhdc.com.au

Guys,

I've been off the list for a bit over a week, can anyone email me a summary
of this challenge? Not that I'm going to submit anything, I'm just
interested

Thanks,

- Andy.

----------------------------------------------------------
Andrew David, Software Manager, Ultronics Ltd, Cheltenham.
akdavidUltronics.co.uk          http://www.ultronics.com/
----------------------------------------------------------

John Payson wrote:
>
> |So if I may, let's change the challenge to multiplication of 4-bit
> |numbers instead of squaration (new term) of them. I'll by BAJ a Bud.
>
> Well, let's see...

<snip>

> 15 cycles, putting the result in W and leaving n1 and n2 untouched at
> the end.  Striking the requirement that n2 be untouched would save the

<snip>

>
> Aiming for a result in n2...
>

<snip>

> 13 cycles, leaving the result in n2.  This one won't work if either fac-
> tor is oversized.

and then Ray Gardiner likewise wrote:
>
<snip>

>         13 cycles result in W  13 bytes of code space
>
> Looking back, I now realise that is what John Payson has already
> posted. Albeit in a different form. Sigh(!)

These are the solutions that Andy Warren posted a couple of years ago (I
have no idea if he was the original author or not). I think this
shift-and-add trick is really clever. Good work, but what about the 8X8
squaring? Hint - it uses a variation of this same trick.

{Quote hidden}

Cute, Eh?

There's also a variation of this algorithm for multiplying two arbitrary
numbers:

a    b
c    d
---------
ac   bc
----------------
ac | ad + bc | bd

now the middle term can be rewritten

ad + bc = ac + bd - (a - b)(c - d)

Which at first glance seems like an inefficient substitution. However,
notice that the 'ac' and 'bd' terms have already been computed. So two
multiplications and an addition are replaced with one multiplication and
two additions and two 'half sized' additions. The real savings occur for
multiplying large numbers. The efficiency compared to long-hand
multiplication is comparable to efficiency of FFT's vs. DFT's.
Unfortunately, when I attempted to use this to multiply 16 bit numbers
on a pic (using 8 bit numbers for the a,b,c,d terms) I found that the
16-bit long-hand shift-and-add algorithm to be much faster (and
smaller).

Scott

John Payson wrote:

> Aiming for a result in n2...
>
>         swapf   n1,w    ; Assume top 4 bits zero
>         btfss   n2,0
Why btfss ?

>         rrf     n2,f
If we have Cy=1 before entering this code fragment and we skip
first addition we'll got error.

{Quote hidden}

At all code looks very impressive and compact.

WBR Dmitry.

John Payson wrote:

> Aiming for a result in n2...
>
>         swapf   n1,w    ; Assume top 4 bits zero
>         btfss   n2,0
Why btfss ?

>         rrf     n2,f
If we have Cy=1 before entering this code fragment and we skip
first addition we'll got error.

[snips]

Mea culpae.  Sorry 'bout that.

Hi guys. Sorry for such delayed reply.

Scott Dattalo wrote:
>
> John Payson wrote:
> >
> > 13 cycles, leaving the result in n2.  This one won't work if either fac-
> > tor is oversized.

;       4 X 4 multiplication

movfw   x       ;0000abcd
btfss   y,0     ;0000efgh
movlw   0
btfsc   y,1
rlf     x,F
btfsc   y,2
rlf     x,F
btfsc   y,3
addwf   x,W     ;result in W

;       12 clocks/words

8 bit version maybe coming soon ;-)

WBR Dmitry.

More... (looser matching)
- Last day of these posts
- In 1999 , 2000 only
- Today
- New search...