Calculating x^2.
Archive:
Summary:
; assumes W contains a 4 bit number, and squares it. ; by John Payson Sqr4: addwf PC db 0,1,4,9,16,25,36,49,64,81,100,121,144,169,196,225
(a+b)^2 = a^2 + b^2 + 2ab
Thus, any multiplication could be performed as three squaring operations, a subtraction, and a shift."
a*b = t(a+b)  ( t(a) + t(b) ) ; calculate t(W), for 0 <= W <= 22 ; by David Cary get_t_w: PAGESEL $+2 addwf PCL,f db 0,1,3,6, H'0A', H'0F', H'15', H'1C', H'24', H'2D' db H'37', H'42', H'4E', H'5B', H'69', H'78' db H'88', H'99', H'AB', H'BE', H'D2', H'E7', H'FD'
Stuart Taylor suggested starting with a standard 16x16 bit multiply and trimming it down to a 10x10 bit multiply.
Keith Dowsett suggested "break it down into three simpler multiplications:
(a+b)^2 = a^2 + b^2 + 2ab
where b is the bottom 8 bits of the number, a is the top 2 bits, and use a conventional 8x8 bit multiply for b^2, and (for speed) use a customized 2x8 bit multiply for 2ab, and a 4 entry lookup table for a^2.
N = a*32 + b a = 5 msb's b = 5 lsb's Now using Keith's observation: N^2 = 32^2 * a^2 + 64*a*b + b^2 = (a^2)<<10 + (a*b)<<5 + b^2
which can be evaluated with one generalpurpose 5x5 bit multiply and a lookup table for squaring 5 bit numbers.
Then Scott wonders if breaking the number into 3 nybbles would be even faster:
N = a*256 + b*16 + c N^2 = (a*256)^2 + (b*16)^2 + c^2 + 2*(a*256*b*16 + a*256*c + b*16*c) = (a^2)<<16 + (b^2)<<8 + c^2 + ((a*b)<<12 + (a*c)<<8 + b*c<<4))<<1
(See 4x4 bit multiply )
Andrew Warren speculates that a standard 10bit x 10bit multiplication will turn out to be faster than any of the complicated schemes above.
A 10x10 multiply IS faster than 16x16, {for calculating a square} mostly because the result is guaranteed to fit in three bytes rather than 4. If you don't have the code space to unroll the whole multiplication into inline code (as I showed in my "Solution #1"), looped code can still be made pretty fast by writing three loops:One for the first few bits of the multiplier, where the result of each addition of the multiplicand to the intermediate result is guaranteed to fit in the leastsignificant two bytes,
one for the next group of bits, where the addition may affect all three bytes of the result, and
one for the final few bits, where the addition is guaranteed to only affect the two mostsignificant bytes of the result.
John Payson implemented Andy David's tips, in a completely unrolled loop (sacrificing code space for speed) See the debugged version
; [warning: untested] ; by John Payson 19961105 clrf DstH clrf DstM clrf DstL movf SrcL,w andlw $0F btfss Src,4 addwf DstM rrf DstM rrf DstL btfss Src,5 addwf DstM rrf DstM rrf DstL btfss Src,6 addwf DstM rrf DstM rrf DstL btfss Src,7 addwf DstM call Sqr4 addwf DstL swapf SrcL andlw $0F call Sqr4 addwf DstM ; At this point, 16bit result is in DstM:DstH ; 25 words of code prior to this point (plus a ; 17word tablelookup). Total execution time: ; 35 cycles up to this point. btfss SrcH,0 goto NoBit8 movf SrcL,w btfsc C incf DstH addwf DstM btfsc C incf DstH incf DstH ; Another 9 words for bit 8; 3 or 9 cycles to exec. NoBit8: btfss SrcH,1 goto NoBit9 movlw 4 btfss SrcH,0 movlw 8 addwf DstH rlf SrcL,w btfsc C incf DstH btfsc C incf DstH addwf DstM btfsc C incf DstH addwf DstM btfsc C incf DstH ; Another 17 words for bit 9; 3 or 17 cycles to execute ; Total worstcase time: 35+26 = 61 cycles. NoBit9: retlw 0 ; All done! Sqr4: addwf PC db 0,1,4,9,16,25,36,49,64,81,100,121,144,169,196,225
Questions:
See also:
file: /Techref/microchip/math/sq/index.htm, 7KB, , updated: 2010/4/14 22:14, local time: 2019/1/20 20:41,

©2019 These pages are served without commercial sponsorship. (No popup ads, etc...).Bandwidth abuse increases hosting cost forcing sponsorship or shutdown. This server aggressively defends against automated copying for any reason including offline viewing, duplication, etc... Please respect this requirement and DO NOT RIP THIS SITE. Questions? <A HREF="http://www.piclist.com/techref/microchip/math/sq/index.htm"> PIC Microcontoller Basic Math Square Methods</A> 
Did you find what you needed? 
PICList 2019 contributors:
o List host: MIT, Site host massmind.org, Top posters @20190120 * Page Editors: James Newton, David Cary, and YOU! * Roman Black of Black Robotics donates from sales of Linistep stepper controller kits. * Ashley Roll of Digital Nemesis donates from sales of RCL1 RS232 to TTL converters. * Monthly Subscribers: Gregg Rew. ongoing support is MOST appreciated! * Contributors: Richard Seriani, Sr. 
Welcome to www.piclist.com! 
.