Tobyase's Blog

Recovering ancient FORTRAN FFT Code

Why?

I stumbled upon a Twitter Post by Rudiger Mohr stating that he uncovered a scan of some FORTRAN code used for FFT and that he needed help transcribing the listing attached to the document. He ended with

Warning: You might learn some Fortran

Don't tempt me with a good time I thought and got to work.

The document he found was a 1967 technical report titled "Three Fortran Programs that perform the Cooley-Tukey Fourier Transform" upon basically all of the FFT code we use today is based on. Specifically the FOUR1 subroutine. However there are three implementations in the document. And the comments documenting the FOUR1 program mention something along the lines of "btw this other implementation is faster".

PROGRAMS FOUR2 AND FOURT ARE AVAILABLE THAT RUN TWICE AS FAST

From the preamble of the FOUR1 subroutine

The FOUR2 subroutine from the same document which is nowhere else on the internet to be found. It being faster than the carbon copy of all FFT code definitely seems quite intriguing.

How hard can it be?

The task seems simple: Copy the code from the annex of the paper into a file, build it and it works. That's the theory.

But, the print of the listing is very bad in a few places and the scan doesn't make it better.

This part is easy:

Example of an easy transcript

IF(M-NP1)160,140,140
J=J+M

This is a bit harder but managable:

Example of some hard to read passages

CONTINUE
IPAR=3-IPAR

And this is basically impossible:

Unreadable part of the listing

! ?!

The Code

The programs are written in Fortran II and intended to be punched onto puchcards. Therefore some restrictions that seem very strage for the comtemporary programmer apply. The code is in fixed format, that means one line of code can b e at most 72 charachters in length and the first six columns have special meaning. If there is anything (here a C) in the first column the row is considered a comment. The columns 2-5 are reserved for labels. Those have to be numeric and unique per program but can be in any order. Being restricted to four characters means that there are at most 9999 labels in a program. But if you manage to program yourself into that corner, you have definetely different problems. Chances are that there is no computer (remember, you are in 1967) in the world with enough memory for your program anyway. Since some code lines may need more than the space available in one of the fixed rows you can continue on the next row. Anything in the 6th column means that this row is a continuation of the previous one.

There is no distinction between upper case and lower case, the program listings are in all-upper-case.

Space is precious in Fortran programs so there is no indentation or unneccessarhy whitespace whatsoever. In fact, whitespace has no meaning in a Fortran program and you can completely omit it. It would make for an absolutely unreadable code though.

The document is from 1967 so it employs some programming techniques that are considered ancient even in the Fortran 77 standard.

A simple if statement has a strage format and this way of writing if's can only be found in very old Fortran programs. The format is

IF (number1)number2,number3,number4

and is basically a three-way goto: If number1 is less than 0, goto the statement with the label number2. If it is equal to 0 goto number3 and if it is greater than 0 goto number4.

The full transcription for both the programs FOUR2 and FOURT can be found in the GitHub repository.

FOUR2.f

The FOUR2 subroutine is actually quite short (200 lines with comments, 135 code) with most of the code readable in the listing. Still, in the readable parts determining the difference between + and * requires some sort of divination.

The best thing to do in the really hard portions of the listing is guess based on context and patterns and try.

In the beginning of the PDF there is an "errata" clarifying some of the really hard to read parts. It helps a lot, but there are still some places left where our best strategy is to guess.

FOURT.f

The subroutine FOURT is a bit longer than FOUR2 (557 lines with comments, 394 code) and thus there are more places for transcription errors to occur.

I've located (through the art of googling) another Fortran 77 subprogram also called FOURT.f archived through some ESO FTP server as part of the STSDAS IRAF package. The package that the Fortran code comes from was assembled in 1993 or 1994. 25 years after the release of the 1967 document! The code in the ESO archive is definetely based on a copy of the code in the 1967 document, but is has some differences. Differences that also are interesting to study, but not in the scope of this post. It would be interesting to know where I. Brusco (he created the package where this file is used) got his copy of the FFT routine from.

The two versions of FOURT are similar enough to really aid in the transcription of the listing.

Chasing errors

The Compiler

Since reading of the listing is not as straight forward as anticipated, there inevitably are some errors in the initial transcription. Some tiimes there are guesses on what might be written based on the surrounding code or on patterns found elsewhere. And then there are typos made during the transcription. The way the listing is printed also lends itself to skipping lines when reading, which also doesn't help.

Finished the first transcription and the first barrier to pass is the compiler. It has some checks and it will yell at you if you for example reference a label that doesn't exist.

Crashing

Second hurdle: Running the code. The code given in the paper is a SUBROUTINE, so it has to be called. So, I write up a quick PROGRAM that calls the FFT code. But what to expect, how do we know that this is indeed doing a fast fourier transform and not some nonsensical mathematical manipulations? We could try to understand the FFT by heart and generate some test values. But that would require effort. Instead we do a forward transform followed by an inverse transform. One would expect (if the transform is injective, which the FFT should be) that after a forword followed by an inverse transform the original input data is reproduced. F^F^1=1^. Thankfully the preamble of the function mentions this scenario:

! NOTE THAT IF A FORWARD
! TRANSFORM IS FOLLOWED BY AN INVERSE TRANSFORM, THE ORIGINAL DATA
! WILL REAPPEAR MULTIPLIED BY NN(1)*NN(2)*...

Okay, all set for a test run, let's go!

./test-four2
Program received signal SIGSEGV: Segmentation fault

Well, not so fast then. Let's go hunting for memory invalid accesses. Probably some wrong transcriptions, calculating the index into the data array wrong or getting some loop parameters wrong. That's a good excuse to get some old gdb knowledge back. To get nice backtraces and information about where the heck we crashed exactly, we need to add the -g flag to the gfortran call.

Running correctly

After a lot of back and forth with wrong indices, pluses, minues and asterisks the code actually runs. It runs without crashing. Success! But is our goal to just run without crashing? No, we want to do the fast fourier transform! Do we actually?

Not at first, but there definetely were some correct numbers in the output of the first iteration of the code that actually did run.

I used a sample data set from the RosettaCode project and added some pretty printing to the console for checking to the test program.

There were a lot of correct numbers in the output, but they seemed to be shuffled a bit. The imaginary part appears one data point too early and has it's sign flipped. Strange. Nothing a good ol' hard read of the listing and some diff'ing with the code from the ESO archive can't fix.

And it got fixed!

The sample data from Rosetta now gets correctly transformed and after transforming the data back, everything is multiplied by 8 as expected. The 57-year-old code lives once again!

The output of the FOUR2 program is a lot worse with all the imaginary parts missing completely and the real parts also not correct.

What's still to do?

The FOUR2 program needs to be running correctly, as it now does something but certainly not the right thing.

More rigorous testing of the FOURT program would be nice. Multiple dimensions, large datasets, that kind of stuff. It would also be nice to cover the four different cases that the input data gets sorted into (see line 164).

Checking the claim that the code in FOUR2 and FOURT is twice as fast as the FOUR1 variant would be nice, as well as comparisons to current implementations.

Doing the very unfair comparison with Python transforming the 8-element complex vector from above ten million times takes about 8 times as long using numpy.

Code Time
FOURT.f 1.665 s
numpy 13.969 s

Created using hyperfine with two warmup runs

Conclusion

This was a very fun expedition into some very old code that is relevant up to this date. I learnt some Fortran for which I was looking for an excuse for quite some time now. All I need now is an 1401 to let this run on era-appropriate hardware.

If you find any errors and want to help, feel free to open a PR on Rudiger's GitHub or shoot me an E-Mail.