1.2

A) Graph of net speedup vs. percentage of vectorization:

   % vectorized   net speedup
        0             1.0
       10             1.1
       20             1.22
       30             1.37
       40             1.56
       50             1.82
       60             2.17
       70             2.70
       80             3.57
       90             5.26
      100            10.0

B) 55.5555...%

C) 11.1111...%

D) 88.8888...%

E) 1.0/((1-.7)+(.7/10)) = 2.7027027027   (current speedup from 70% vectorization at x10)
   1.0/((1-.7)+(.7/20)) = 2.98507462687  (speedup from 70% vectorization at x20)
   1.0/((1-.74)+(.74/10)) = 2.99401197605 (speedup from 74% vectorization at x10)

   Going to 74% vectorization beats the hardware improvement, so try the
   compiler approach first. (Then again, that's converting almost 1/5 of the
   currently non-vectorized code to vectorized, which isn't necessarily
   trivial. But it's probably still cheaper than tweaking the hardware.)

1.3

run(enhanced) = 1.0
run(unenhanced) = 0.5 + (10 * 0.5) = 5.5
fraction(enhanced) = 1 - (0.5/5.5) = 10/11
speedup = 1.0/((1.0/11) + ((10.0/11)/10)) = 5.5

A) Speedup = 5.5

B) 10/11

1.6

Potentially different instruction sets: CISC vs. RISC, FPU vs. emulation, etc.
The average work done per instruction, and hence the instruction count needed
to perform a task, could vary wildly.

1.7

Um, don't I need a clock speed to do this? Assuming 100 MHz for both processors:

A) 1.08 * 100M / 10 = 10.8 million instructions on the RISC.
   13.6 * 100M / 6 = 226.7 million instructions on the embedded processor.

B) 10 MIPS and 16.67 MIPS respectively.

C) (226.7 - 10.8) * 1000000 / 195578 = 1104 instructions.

1.11

There's a reason I'm not a math major, but if you really want me to
gratuitously whip up a proof of an abstract property of an equation...

(a+b)/2 > sqrt(a*b)        (square both sides)
(a+b)(a+b)/4 > a*b         (times 4, and expand the square)
a^2 + 2ab + b^2 > 4ab      (subtract 2ab)
a^2 + b^2 > 2ab            (divide by ab)
a/b + b/a > 2

When a and b differ, one of those two ratios is bigger than one and the other
is smaller than one, but the bigger one is above 1 by more than the smaller
one is below it: if a/b = 1+x for some x > 0, then b/a = 1/(1+x), which is
greater than 1-x because (1+x)(1-x) = 1-x^2 < 1. So adding the two ratios
together gives something bigger than 2.

The geometric and arithmetic means are equal when all the numbers being
averaged are the same. 2, 2, and 2, for example.

1.17

Using M for "million"...

A) 120M = (I + (F*Y))/W
   80M = (I + F)/B

B) 120M = (I + (8M*50))/4
   480M - 400M = I
   I = 80M

C) 80M = (80M + 8M)/B
   B = 88M/80M
   B = 1.1

D) 80M/120M = 2/3       (time spent on integer instructions)
   1.1 - (2/3) = 13/30  (time spent on floating point instructions)
   8M/(13/30) = 18.46M  (floating point instructions done in that time, normalized to 1 second)
   18.46 MFLOPS.

E) Yes. It does the work in 1.1 seconds instead of 4.

1.19

#include <stdio.h>

int main(int argc, char *argv[])
{
    int i;

    /* 100 million empty loop iterations */
    for (i = 0; i < 100000000; i++);
}

[landley@localhost arch]$ time ./a.out

real    0m1.246s
user    0m1.139s
sys     0m0.004s

100M/1.246 = 80.26M

[landley@grelber landley]$ time ./a.out

real    0m0.443s
user    0m0.440s
sys     0m0.000s

100M/0.443 = 225.73M

According to the SPEC install guide for Unix, I need to get an install CD from
somewhere. (It's not downloadable anywhere that I can find...) But I'd guess
that any real-world test wouldn't stay in L1 cache so nicely, and wouldn't
give the branch predictor such an easy time either.

1.20

#include <stdio.h>

int main(int argc, char *argv[])
{
    int i;
    float f = 0.0;

    /* 100 million floating point additions */
    for (i = 0; i < 100000000; i++) f += 1.0;
}

[landley@localhost arch]$ time ./a.out

real    0m1.376s
user    0m1.259s
sys     0m0.006s

Should I normalize out the integer padding? Page 72 doesn't say...
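Here's the arithmetic done both ways as a quick C sanity check. The 1.246
second empty-loop time from 1.19 is what I'm treating as the integer padding;
whether that's what page 72 intends is a guess:

#include <stdio.h>

int main(int argc, char *argv[])
{
    double flops = 100000000.0;  /* 100 million floating point adds */
    double run = 1.376;          /* measured run time with the float add */
    double pad = 1.246;          /* empty-loop time from 1.19 */

    /* Raw rating: charge the whole run to floating point. */
    printf("raw: %.2f MFLOPS\n", flops / run / 1e6);

    /* Normalized rating: subtract the integer loop overhead first. */
    printf("normalized: %.2f MFLOPS\n", flops / (run - pad) / 1e6);
    return 0;
}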
Either: 100 MFLOP / 1.376 seconds = 72.67 MFLOPS
or: 100 MFLOP / (1.376 - 1.246) seconds = 769.23 MFLOPS

[landley@grelber landley]$ time ./a.out

real    0m1.002s
user    0m0.980s
sys     0m0.000s

100 MFLOP / 1.002 seconds = 99.80 MFLOPS
or 100 MFLOP / (1.002 - 0.443) seconds = 178.89 MFLOPS

And again, my laptop doesn't have this benchmark, and the install CD isn't
downloadable. I need to go get a login for a university Unix account...

Problem 9 has a similar problem: it compiles fine on my machine right up until
it tries to run "condor_compile", which isn't part of Red Hat 9...
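Since the same Amdahl's law formula keeps showing up in 1.2 and 1.3 above,
here's the little C snippet I'd use to double-check those numbers (just a
sketch; the fractions and speedup factors are the ones from those answers):

#include <stdio.h>

/* Amdahl's law: overall speedup when a fraction f of the original run time
 * is sped up by a factor s. */
static double speedup(double f, double s)
{
    return 1.0 / ((1.0 - f) + f / s);
}

int main(int argc, char *argv[])
{
    printf("1.2: 70%% vectorized at x10: %f\n", speedup(0.70, 10.0));
    printf("1.2: 70%% vectorized at x20: %f\n", speedup(0.70, 20.0));
    printf("1.2: 74%% vectorized at x10: %f\n", speedup(0.74, 10.0));
    printf("1.3: 10/11 enhanced at x10:  %f\n", speedup(10.0 / 11.0, 10.0));
    return 0;
}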