good in general, but bad nested loops
Henry Cejtin
henry@sourcelight.com
Mon, 9 Jul 2001 23:32:45 -0500
I have a new version of my spy program. This one takes the `-l' option to
mean don't look for a loop, just dump what the program is doing. The default
window is still 1000, so if you just run
spy -l pid
then it will show you the next 1000 instructions that process pid is running.
You can combine this with the -w option:
spy -l -w 10000 pid
will show you the next 10,000 instructions.
Doing this on the nested loop case is quite illustrative. Note that compiling
with `-detect-overflow false' makes very little difference to the running time.
I see that we are comparing registers against 0 instead of doing a testl of the
register with itself. The compare is 3 bytes vs. 2 bytes for the testl (for
the %ebp register), but it doesn't seem to make any speed difference in the
tests I performed. Still, it is something to put in.
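To make the size argument concrete, here is a small sketch in C. The byte
sequences are the standard IA-32 encodings of the two forms for %ebp; the
function names are made up for illustration and are not part of spy or the
compiler. The functions model only the zero and sign flags, which both forms
set identically (both also clear the carry and overflow flags), so a following
je or jne behaves the same either way.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard IA-32 encodings of the two forms, for the %ebp register. */
const uint8_t cmp_zero_ebp[] = { 0x83, 0xFD, 0x00 };  /* cmp  $0x0,%ebp */
const uint8_t test_ebp_ebp[] = { 0x85, 0xED };        /* test %ebp,%ebp */

/* Zero flag after `cmp $0x0,reg`: set when reg - 0 is zero. */
bool zf_after_cmp_zero(uint32_t reg) {
    return (reg - 0u) == 0u;
}

/* Zero flag after `test reg,reg`: set when reg & reg is zero. */
bool zf_after_test_self(uint32_t reg) {
    return (reg & reg) == 0u;
}

/* Sign flag after `cmp $0x0,reg`: top bit of reg - 0. */
bool sf_after_cmp_zero(uint32_t reg) {
    return ((reg - 0u) >> 31) != 0u;
}

/* Sign flag after `test reg,reg`: top bit of reg & reg. */
bool sf_after_test_self(uint32_t reg) {
    return ((reg & reg) >> 31) != 0u;
}
```

Since reg - 0 and reg & reg are both just reg, the two forms are
interchangeable for this purpose and only the encoding length differs.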
I also saw that we were loading a register from memory, incrementing the
register, and then storing it back into the same memory location. (This is
with overflow detection off; otherwise there is a test for overflow between
the increment and the store.) I thought that perhaps just incrementing the
memory location would be faster, but it doesn't seem to be. Again, it is
smaller, but with overflow detection you would have to be clever enough to
know that the memory location is dead when the overflow exception is raised,
so that it is safe to do the increment in place. (Or else have patch-up code
in the overflow true case.)
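The difference between the two increment strategies under overflow detection
can be sketched in C as follows. This is only a model of the behavior, not
what the compiler emits: the function names are hypothetical, and the wrapped
value is written out explicitly because signed overflow is undefined in C.

```c
#include <stdbool.h>
#include <stdint.h>

/* Load, increment in a register, check, then store.
   On overflow the memory slot still holds its old value. */
bool inc_via_register(int32_t *slot) {
    int32_t reg = *slot;        /* mov 0xd8(%edi),%ebp */
    if (reg == INT32_MAX)       /* inc %ebp ; jo ...    */
        return false;           /* overflow: slot is untouched */
    *slot = reg + 1;            /* mov %ebp,0xd8(%edi) */
    return true;
}

/* Increment the memory slot directly, then check.
   On overflow the slot has already wrapped around. */
bool inc_in_place(int32_t *slot) {
    bool overflow = (*slot == INT32_MAX);
    *slot = overflow ? INT32_MIN : *slot + 1;   /* incl 0xd8(%edi) */
    return !overflow;                           /* jo ...          */
}
```

With the in-place version, the slot has been clobbered by the time the
overflow is noticed, which is why it is only safe when the slot is dead on
the exception path or when patch-up code restores it.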
Perhaps for these cases it really is just a matter of not having the relevant
variables in real registers. Or maybe it is the 2 adjacent stores into
memory. (I seem to recall that this was bad for the CPU to schedule.)
Anyway, here is a sample of the hot code in nestedloop:
0x8049519: mov 0xdc(%edi),%esp
0x804951f: cmp $0x0,%esp
0x8049522: je 0x804a598
0x8049528: dec %esp
0x8049529: jo 0x804a5b8
0x804952f: mov 0xd8(%edi),%ebp
0x8049535: inc %ebp
0x8049536: jo 0x804a5c4
0x804953c: mov %esp,0xdc(%edi)
0x8049542: mov %ebp,0xd8(%edi)
0x8049548: jmp 0x8049519
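Read as C, the listing above amounts to roughly the following. This is my
reading of the disassembly, not the nestedloop source: the struct and its
field names are invented stand-ins for the two stack slots at 0xd8(%edi) and
0xdc(%edi), and the return codes stand in for the three jump targets outside
the loop.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two memory slots used by the loop. */
typedef struct {
    int32_t slot_d8;   /* 0xd8(%edi): counted up   */
    int32_t slot_dc;   /* 0xdc(%edi): counted down */
} frame_t;

typedef enum { LOOP_DONE, LOOP_OVERFLOW } loop_result;

loop_result hot_loop(frame_t *f) {
    for (;;) {
        int32_t n = f->slot_dc;        /* mov 0xdc(%edi),%esp        */
        if (n == 0)                    /* cmp $0x0,%esp ; je ...     */
            return LOOP_DONE;
        if (n == INT32_MIN)            /* dec %esp ; jo ...          */
            return LOOP_OVERFLOW;
        n -= 1;
        int32_t c = f->slot_d8;        /* mov 0xd8(%edi),%ebp        */
        if (c == INT32_MAX)            /* inc %ebp ; jo ...          */
            return LOOP_OVERFLOW;
        c += 1;
        f->slot_dc = n;                /* mov %esp,0xdc(%edi)        */
        f->slot_d8 = c;                /* mov %ebp,0xd8(%edi)        */
    }                                  /* jmp back to the first load */
}
```

Each trip around the loop does two loads and two adjacent stores for what is
logically one decrement and one increment, which is the traffic that keeping
both variables in real registers would avoid.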
And here is the latest spy program:
begin 600 spy.tgz
M'XL(`%*%2CL"`^U726_C-A3VU?X5',RDDAK;D3.9"6I,`J1`EP&*7KH"'A\8
MB;;92)1&I&([O[[?XV(KDW0[M(="[V"1;_G>PL?%NMX/_FU*9VEZ^>;-(`5=
MG*?V.[NXL%_0V[>O+P?IY>OS\XO+-[.+UZ1_F:8#E@[^`VJUX0UC@XU0S?Y/
M]42C!_\[>OGBK-7-V:U49T+=LWIO-I4:C6195XUA>J^'+QEOUO=CIDU>M29(
M*A+452W4F-7<;*:W7`O%2W$P-8U4:[(VE0Q,(\IZ)0L!=GE'D]$H%RM6<JGB
M9#X:DB=V16ZG-!P-@<+E>F/`_+Y28C3<2I576TS14^G!@#Z+V7P)^0;P;`:L
MH5PQ51DKH^FPU7PMX@1#L()1NCQJQL3G*B<)!.SJBD63*+'&MXW@=\[TL4,R
MMGBD7$16N1/U#'-1='6V3L<T>_L]9N0J-J5ZQ3XV"G8H=IFH#?N9%ZWXJFFJ
MQMD=TPD:[U4N=L]K/`U;%%H\+HN/*"OS6N9_&,^ST1PPD&@AE%5/V+MKMQ!'
M#]0@0`YM,'5-0*+M.K^-K7S,7`1CYBH#(62PJO34-EP<B9W(&#$GBDT^LLDM
M-]F&37;L1+-W9[FX/U-M4;#SZ\]F'U3$3I@%!A`6,2^DHAA@/@W3HR!.T)KZ
M3M;LMU9AO>&S51#<Q0'"5%9\9?.,9G.V.Y/LE0"+11#GW'`(%RCQJFJ89%*Q
M7<,5"N#3H8H\=6>CTIL*2FNV%5@I3$U5UR*'_@,@CS;'CGVP*YA+$4??H!QU
M4V5":R9VTHB<PK&6#PL7]'P9O+5**B,:H0VYJQNQDCN'&EK7]3'VH=OVTVT#
MR/@AZ78.Y3KE"%'E5N*#ZD+$JRT?([E2F@21K%""[ZJJCDM>QP42VHC=F!%.
M0LB?^J-T=4SB!7#8W`%1%Y(F91FGR6CTDOU"V@QVS&R$[8RL*DML93VU!XSK
MKK)V_66;*ZM:96@Q#'4BM1>UUD$GVE+YK,SG'G%C.-KL)'<]!91/-;0P[!55
M/.@X)X^UD&>42UT7?$^=4V=0)9X[N6!^?<52S\-)(FIYF`1X_$S8S+-1?3_Z
MV$J#P<%?5E2:V@4%NF$%RL[HKC,:1[*MDVI+T<B,%\4>XBVZ@?$\;ZB#Z!0$
ML,8V`/=6H)<%8,R&=W2P\KS!=XWC>\K8>\/R2F@5&:8$]2%O))"IBX-'+;)*
MY0"JLJS%KL@$L#..NY7I"D=#5N4"M\$>7/0[X-'&0/X1IH;"D]J%`"R`V)3`
MTB52$`V)E'4CE)'8/X6$2;4*`0L]MGEY!!B2`G`0D\9(DRYY-=)(L&A*:.2&
M#B.SL5.I(.8%;KV&*D,;B&('CFY7V$==,^]/4.,_-?$;S[7H86L<HJ7FM`MF
M-W_>9B(N):[;;CIAZ,YFJRWIG`_\J:0[(;8"*!1B9;KBA;,XM5<"#K0U<D3W
MC8:XTG$^*'^2=2[5<&TIL3/V$%R91S[8J;OR["UQ/)UNU&%1J"Q;KM$C]U@S
M+82R!]7!NQ^<,G(!P3&4,#IE+NXE1A3`8DZJDYF_C6/L?:C*VT+$G5JY5,<!
M)7'W>B-,"\@X2'U!7!`NKE5(=$%N4"DL]0\"C>#6>5LU18ZWP]-V2Z@YC\&X
M%@I)Q+Z'R0B[JZ2F"&V3^##`\AO'EMAS7;]TDK01/$T0^6V1A[^JK!(2JH^L
MH&A/[C!C+ZY<2KXW<.KZFI#ETE;-+NFW]&"F#*727(G(PSB76"SZX@;T*->.
MXV\_5W-ZT04KZW'NE)<4@P]G,7&\^?(Y4WITVI>*]7%J781VC:W0\=@[)!7<
M'_U9C3FCC\NNX_AQ>WA_P6$'>G30.6+.[=LQ9#`GSQ/B8X_:M?,W7VRW.(43
MZ\7\W'J/TEV4V&.CR_LU2HZ%_Y+GH<?F>/%$6/G%'-UOWS?:[5]Z>^"NPN-#
M+\[GRY!VE$94"DC(`SZ81%]$(5=KO)NEG]/P%*N7Q]DFP5U#(]@FX3&+R_`9
MH-5?`9V"R0,<[\+=/`/W]=^!NPEP-]&CMTE8%%@DH\<SNP3^30I=6].?:$K5
M9(M)L<3/UK\^)UH^B"5=]G2?TQ.T^V<G#O]4Z/SUR(2W*LV84;3VA>&?-:)I
M_!.`Y$"S<F01?;!'X.%-,P.2^TLT&O344T\]]=133SWUU%-//?744T\]]=13
33SWUU%-//?U3^AV\?O?$`"@```==
`
end