Andrew Forber
I don't think I was among the first to experience the 'divide overflow'
problem. I'd heard rumours about it and was concerned, but didn't have
personal experience of it until 11:30 one night. (I think it was October
12th, 1997.) Ah, the perils of working from home. One of my company's service
people was at a customer site in Montreal, replacing a crashed web server
with a new machine. He called to tell me that he couldn't run our
Foxpro-based CGI application on it.
To understand some of what follows you need to know how the problem occurs.
In the run-time library Foxpro 2.x uses there's a small function written in
assembler that tries to estimate the CPU speed of the machine, probably for
the sake of "timing loops". It uses the BIOS's 18 Hz interrupt clock (yup,
that's right, 18.2 interrupts per second, 55 ms per tick, handled by the ROM
BIOS on the original IBM PC) to figure out how many loops it can execute per
millisecond. It does this using the following algorithm:
- Load a big number (0, which behaves as 0xFFFFFFFF+1) into a 32-bit register;
- Wait for the BIOS clock to change, so it knows a tick has started;
- Loop like mad, decrementing the counter, until the timer changes again;
- Complement the counter to see how many loops it executed;
- Divide by 55 to get a number of loops per millisecond.
The problem was that with a new fast CPU, the divide by 55 resulted in a
number bigger than 64K loops per millisecond, which meant the divide
operation overflowed. Poof, no program!
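
The arithmetic above can be modelled in a few lines of Python. This is only a sketch of the calculation, not the original assembler: it assumes the quotient had to fit in a 16-bit register (which is what the 64K limit suggests), and the names loops_per_ms and DivideOverflow are made up for illustration.

```python
TICK_MS = 55            # one BIOS timer tick, as described above
MAX_QUOTIENT = 0xFFFF   # assumption: the quotient must fit in 16 bits


class DivideOverflow(Exception):
    """Stands in for the CPU's divide-error fault (hypothetical name)."""


def loops_per_ms(loops_in_one_tick):
    """Model the calibration: count down from 0, negate, divide by 55."""
    # The 32-bit counter starts at 0 and is decremented once per loop.
    counter = (0 - loops_in_one_tick) & 0xFFFFFFFF
    # "Complementing" (negating) the counter recovers the loop count.
    loops = (-counter) & 0xFFFFFFFF
    quotient = loops // TICK_MS
    if quotient > MAX_QUOTIENT:
        raise DivideOverflow(f"{quotient} does not fit in 16 bits")
    return quotient
```

Under these assumptions the fault first appears once a machine manages 65536 * 55 = 3,604,480 loops in a single tick; anything slower divides cleanly.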
After hanging up with the technician, I started researching it on the web.
Thanks to the active developer community there were lots of reports of the
problem already, and a pointer to a fix called SlowStart. (Sorry, I don't
recall who did SlowStart; I'm sure you can look it up.) SlowStart worked by
stealing CPU cycles so that the application (sometimes) didn't count enough
loops to cause the problem. Unfortunately our web server was a dual
processor machine, so all SlowStart would have done was tie up one CPU while
the other one went madly on, counting down to destruction.
Someone - and I'm sorry I can't remember who - posted the 'more detail'
information provided by Windows NT on a discussion forum, giving me the
bytes in program memory where the problem occurred as well as the program
counter.
Now, I'm an old assembler guy. I programmed my Sol-20, a 1976-vintage kit
computer, in machine code because it wasn't possible to run an assembler in
the standard 2K of RAM. I've also done more than my share of 8086 assembler,
so the information that anonymous poster provided was very useful. I ran a
little utility that's still part of Windows that no one seems to use any
more, debug.exe. I was able to open the .ESL file, find the same bytes the
poster had listed, and, by trial and error and with the aid of my
half-forgotten experience, disassemble the entire timer function. Sure
enough, the error occurred at a divide instruction. What next?
More fiddling. I figured out that if I modified the loop to count up instead
of down, and so didn't have to complement the result, it saved a few bytes.
It happened that those few bytes were enough to add the instructions to trap
the overflow condition before the divide, and paste a 'clamped' value into
the result register. It was 3 A.M. and I still had no way to test this
(other than that my own programs still ran correctly on my slow machine), so
I hand-patched the .ESL file and emailed it to the tech in Montreal. Then I
went to sleep.
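
The idea behind the fix can be sketched the same way. This is a Python model, not the patched assembler: the clamp value 0xFFFF and the name patched_loops_per_ms are assumptions, since the account does not say which value was pasted into the result register. The overflow test uses the fact that a 32-bit by 16-bit divide overflows exactly when the high 16 bits of the dividend are at least as large as the divisor.

```python
TICK_MS = 55           # one BIOS timer tick
CLAMP_VALUE = 0xFFFF   # assumption: largest value a 16-bit result can hold


def patched_loops_per_ms(loops_in_one_tick):
    """Model the patched routine: count up, test for overflow, then divide."""
    # Counting up means the counter already holds the loop count,
    # so no negate step is needed.
    loops = loops_in_one_tick & 0xFFFFFFFF
    # The quotient would exceed 16 bits exactly when the high word of the
    # dividend is >= the divisor, so check that before dividing.
    if (loops >> 16) >= TICK_MS:
        return CLAMP_VALUE
    return loops // TICK_MS
```

A slow machine gets the same answer as before, while a fast one gets the clamped value instead of a fault.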
I called him about 10 A.M., and he told me the problem was solved. I wrote a
Foxpro program to perform the patch, with documentation, and posted it
anonymously that afternoon, inviting folks to give it a try. Of course, it
required that a slow machine be used to patch a copy of the ESL or
Foxprow.exe before transferring the patched file to another machine. After a
few folks reported success, I sent an email to Microsoft, but didn't really
know who to reach, so it went unanswered.
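
A patcher of this kind boils down to finding a known byte sequence and overwriting it in place. The following Python sketch shows that general technique only; it is not the original Foxpro or C++ patcher, the function name patch_bytes is made up, and any byte values fed to it here are placeholders, not the real signature from the .ESL file. It assumes the replacement occupies exactly the same number of bytes as the original code.

```python
def patch_bytes(image, old, new):
    """Return a copy of image with the single occurrence of old replaced by new."""
    # An in-place patch must not shift anything that follows it.
    if len(old) != len(new):
        raise ValueError("replacement must be the same length as the original")
    first = image.find(old)
    if first == -1:
        raise ValueError("signature not found (wrong file or already patched?)")
    # Refuse to guess if the signature appears more than once.
    if image.find(old, first + 1) != -1:
        raise ValueError("signature is ambiguous")
    return image[:first] + new + image[first + len(old):]
```

Working on a copy and refusing to patch when the signature is missing or ambiguous keeps a failed run from damaging the file.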
I was very aware that what I'd done was a violation of the EULA: people
applying the patch today should be aware of the same thing. On the other
hand, a multi-million dollar business relied on getting a solution that
would give us time to port all of our applications to VFP. I plead the
defense of necessity.
I started seeing reports from other folks with the same problem, some quite
desperate, and helping folks run the FP-based patch started taking lots of
time from my work, so I wrote a program in C++ to perform the patch on any
Foxpro version and packaged it and posted it, with a readme file, this time
not anonymously. A few months later Microsoft released their patch using the
same technique I developed. They asked me to remove my version from
circulation, which I did, so that everyone could be legal again. It was a
relief. Some time after Microsoft released their patch I bought a new
computer, and was actually able to reproduce the problem for myself.
In the process of this I've been impressed and surprised by the response of
the Foxpro community. Many people posted the patch on their web sites as a
public service, before Microsoft published their replacement. I've received
hundreds of emails of thanks from all over the world: from all branches
of the U.S. armed forces, the U.S. House of Representatives, from
programmers and managers and military personnel on every continent. (I also
get occasional emails from people asking for free technical support for
commercial products and even military systems that I've never even heard
of.) I've had dozens of offers of drinks "if you're ever in my part of the
world", which I won't be able to collect now because I've lost most of the emails in
a disk and tape crash. It's a measure of the strength of our little
community that the response to my few hours' work was so large and so
generous. Remember, folks, that initially, it was my own behind that I was
trying to save. Many, many people put in much more effort to help fellow
Foxers and got less thanks than I did. My hat is off to them.
See also: People That Helped FoxPro to Become a Legend: Andrew Forber