by Mike Pall, published with permission.

[This is a follow-up to Thomas Tempelmann’s Story of FCopy for the C-64.]

Ok, I have to make a confession … more than 25 years late:

I reverse-engineered Thomas Tempelmann’s code, added various improvements and spread the results around. I guess I’m at least partially responsible for the slew of fast loaders, fast copiers etc. that circulated in the German C64 scene and beyond. Uh, oh …

The only program I published under my real name was AFLG (auto-fast-loader-generator), in the German “RUN” magazine. It owes quite a bit to TT’s original ideas. I guess I have to apologize to Thomas for not giving proper credit. But back in the ’80s, intellectual property wasn’t exactly something a kid like me was overly concerned with.

Later on, everyone was soldering parallel-transfer cables to VIA #1 of the 1541 and plugging them into the C64 user port. This provided much more bandwidth than the standard serial cable and allowed far faster loading of programs with a tiny parallel loader (a file named “!”, which was prepended to all disks). Note that the commercial kits with cables, custom EPROMs and silly dongles only followed much later.

So I wrote “15 second copy”, which worked with a plain parallel cable. Yes, it copied a full 35-track disk in 15 seconds! There was only one downside: that was just the time spent reading from and writing to disk. You had to swap the floppies seven times (!), and that usually took quite a bit longer! ;-)

It worked by transferring the “live” GCR-encoded data from the 1541’s disk head to the C64 while simultaneously computing a fast checksum. Part of the checksumming was done on the 1541 and part on the C64, because there simply weren’t enough spare cycles on either side to do it all! Most of the transfer ran asynchronously, compensating for the slightly different CPU clock frequencies of the two machines and using only a minimal number of handshakes. This meant meticulous cycle counting and the use of some odd tricks.
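For readers who haven’t met GCR before: the 1541 stores every 4-bit nybble as a 5-bit code, so 4 data bytes occupy 5 bytes on disk and raw GCR data is 25% larger than the decoded data. The minimal Python sketch below (function names are hypothetical, for illustration only) encodes data with the standard Commodore GCR table and keeps a 5-column XOR of the raw bytes, loosely mirroring the checksum the 1541 read loop accumulates in $80 to $84. It does not reproduce the author’s actual checksum verification, which the listing only summarizes.

```python
# Standard Commodore 1541 GCR table: index = 4-bit nybble, value = 5-bit code.
# The codes are chosen so the bit stream never has more than two 0 bits in a row.
GCR = [0x0A, 0x0B, 0x12, 0x13, 0x0E, 0x0F, 0x16, 0x17,
       0x09, 0x19, 0x1A, 0x1B, 0x0D, 0x1D, 0x1E, 0x15]


def gcr_encode(data: bytes) -> bytes:
    """Encode 4 data bytes at a time into 5 GCR bytes (8 nybbles x 5 bits = 40 bits)."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(data), 4):
        bits = 0
        for b in data[i:i + 4]:
            # High nybble first, then low nybble, 5 bits each.
            bits = (bits << 10) | (GCR[b >> 4] << 5) | GCR[b & 0x0F]
        out += bits.to_bytes(5, "big")
    return bytes(out)


def column_xor(raw: bytes, columns: int = 5) -> list:
    """XOR raw byte i into accumulator i % columns (cf. $80..$84 in f_read)."""
    acc = [0] * columns
    for i, b in enumerate(raw):
        acc[i % columns] ^= b
    return acc


if __name__ == "__main__":
    print(gcr_encode(bytes(4)).hex())         # 5294a5294a
    print(len(gcr_encode(bytes(256))))        # 320
    print(column_xor(gcr_encode(bytes(8))))   # [0, 0, 0, 0, 0]
```

Because one GCR group is exactly 5 bytes, five accumulators line up with group boundaries: each column always sees the same nybble positions of every 4-byte data group.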

The raw GCR data took up more space in the C64 RAM (684*324 bytes), which is why the copy required 4 passes. Other copy programs fully decoded the GCR and needed only 3 passes. But GCR decoding was rather time-consuming, so they had to skip some sectors and read every track multiple times. My program, on the other hand, was able to read and write at the full 300 rpm, i.e. 5 tracks per second plus stepper time, which works out to about 2 × 7.5 seconds for reading and writing. Yep, you had to swap the floppies every 2 seconds …
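The pass and timing figures can be checked with a little arithmetic. This Python sketch assumes the standard 1541 layout (21, 19, 18 and 17 sectors per track across the four speed zones, i.e. 683 sectors on 35 tracks), the 324 raw bytes per sector that the transfer loops move, and two made-up parameters: a 60 KiB buffer of free C64 RAM and a 15 ms head-step overhead per track, picked so the result lands near the ~7.5 seconds per direction quoted above. The pass counts come out the same with the 684 sectors used in the text.

```python
import math

RPM = 300                      # 1541 spindle speed: 5 revolutions per second
RAW_BYTES_PER_SECTOR = 324     # raw GCR bytes the transfer loops move per sector
DECODED_BYTES_PER_SECTOR = 256
BUFFER_BYTES = 60 * 1024       # HYPOTHETICAL: C64 RAM left over for disk data
STEP_SECONDS = 0.015           # HYPOTHETICAL: head-step overhead per track


def sectors_per_track(track: int) -> int:
    """Standard 1541 zone layout: fewer sectors on the shorter inner tracks."""
    if track <= 17:
        return 21
    if track <= 24:
        return 19
    if track <= 30:
        return 18
    return 17


def passes_needed(tracks: int, bytes_per_sector: int) -> int:
    sectors = sum(sectors_per_track(t) for t in range(1, tracks + 1))
    return math.ceil(sectors * bytes_per_sector / BUFFER_BYTES)


def seconds_per_direction(tracks: int) -> float:
    """One revolution per track (5 tracks/s at 300 rpm) plus stepping."""
    return tracks / (RPM / 60) + (tracks - 1) * STEP_SECONDS


if __name__ == "__main__":
    raw = passes_needed(35, RAW_BYTES_PER_SECTOR)
    decoded = passes_needed(35, DECODED_BYTES_PER_SECTOR)
    total = 2 * seconds_per_direction(35)      # read everything + write everything
    print(f"raw GCR passes: {raw}, decoded passes: {decoded}")
    print(f"35 tracks: {total:.1f} s, disk swaps: {2 * raw - 1}, "
          f"~{total / (2 * raw):.1f} s between swaps")
    print(f"40 tracks: {2 * seconds_per_direction(40):.1f} s")
```

With these assumptions it reports 4 passes for raw GCR versus 3 for decoded data, about 15 seconds for 35 tracks with 7 swaps roughly 1.9 seconds apart, and about 17 seconds for 40 tracks.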

Ok, so I spread the program. For free. I even made a 40-track version, which took 17 seconds. Only to see them come back in various mutations, with the original credits ripped out, decorated with multiple intros, and different groups pretending they wrote it or cracked it (it was free, there was nothing to crack!). The only thing they left alone were the copy routines, probably because they were extremely fragile and hard to understand. So it was really easy to recognize my own code. Some of the commercial parallel-cable + ROM kits even boasted “Backups in 15 seconds!”. These were blatant rip-offs: they basically changed the screen colors and added a check for their dongles. Duh.

Let’s just say this rather frustrating experience taught me a lot and that’s why I’m doing open source today.

So I shelved my plans to write an enhanced version that would compress the data in memory to reduce the number of passes. Ah, yes … I wrote quite a few packers, too … but I’ll save that story for another time.

I still have the disks with the source code somewhere in my basement. But I’m not so sure I’ll be able to read them anymore. They weren’t of high quality to begin with … and I’d have to find my homegrown toolchain, too. ;-)

But I took the time to reverse-engineer my own code from one of the copies floating around on the net. For a better understanding of the C64/1541 handshake issues, refer to this article. If you’re wondering about the weird bvc * loops: the 6502 CPU of the 1541 has an SO (set overflow) pin, which is triggered whenever the shift register receiving data from the disk head is full. This directly sets the overflow flag in the CPU, so a bvc * loop can wait for the next byte and read it from the shift register with very low latency.
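A rough cycle budget shows why the low-latency bvc * polling and the cycle counting mattered. The sketch below adds up the 6502 cycle counts of the per-byte instructions in the f_read and c_read loops from the listing, and compares them with the time between bytes coming off the disk, assuming the commonly documented 1541 byte times of 26, 28, 30 and 32 µs for the four speed zones and a PAL C64 clock of 985,248 Hz. It is a back-of-the-envelope model, not an emulator: it assumes the elided bytes in f_read cost the same as the two shown, counts branches as not taken (except the loop-closing bne in c_read), and ignores polling latency and any cycles stolen by the C64’s VIC-II video chip. All names in it are hypothetical.

```python
# Per-byte 6502 cycle counts in f_read (1541 side), from the listing.
F_READ_BYTE = [
    ("bvc * (falls through once V is set)", 2),
    ("clv", 2),
    ("lda $1c01", 4),
    ("sta $1801", 4),
    ("inc $1800", 6),   # dec $1800 on the next byte costs the same
    ("eor $80", 3),
    ("sta $80", 3),
]
# dey + beq (not taken) + jmp, paid once per 10 bytes.
F_READ_LOOP_OVERHEAD = 2 + 2 + 3

# Per-byte work in c_read (C64 side): bit $dd00, bpl/bmi not taken,
# lda $dd01, sta ($5d),y, iny; plus a taken bne every 2 bytes.
C_READ_PAIR = 2 * (4 + 2 + 4 + 6 + 2) + 3

# ASSUMED 1541 byte times in 1 MHz drive cycles: (first track, last track, cycles).
ZONES = [(1, 17, 26), (18, 24, 28), (25, 30, 30), (31, 35, 32)]
C64_PAL_HZ = 985_248     # PAL C64 CPU clock
DRIVE_HZ = 1_000_000     # 1541 CPU clock


def drive_cycles_per_byte() -> float:
    return sum(c for _, c in F_READ_BYTE) + F_READ_LOOP_OVERHEAD / 10


def c64_cycles_per_byte_time(drive_cycles: int) -> float:
    """How many C64 cycles elapse while the drive spends drive_cycles."""
    return drive_cycles * C64_PAL_HZ / DRIVE_HZ


if __name__ == "__main__":
    need_drive = drive_cycles_per_byte()
    need_c64 = C_READ_PAIR / 2
    for first, last, budget in ZONES:
        have_c64 = c64_cycles_per_byte_time(budget)
        print(f"tracks {first:2}-{last:2}: 1541 {need_drive:.1f}/{budget} cycles, "
              f"C64 {need_c64:.1f}/{have_c64:.1f} cycles per byte")
```

In the densest zone the 1541 side needs about 24.7 of its 26 cycles per byte, which leaves very little room for any extra work on the drive and explains why part of the checksum had to move elsewhere.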

Yes, there’s a lot more weird code in there. For the sake of brevity, here are only the inner loops of the I/O routines for the read, write and verify passes on the C64 and 1541 sides. Enjoy!

;--- 1541: Read ---
        ldy #$20
f_read:
        bvc *           ; Wait for disk shift register to fill
        clv
        lda $1c01       ; Load data from disk
        sta $1801       ; Send byte to C64 via parallel cable
        inc $1800       ; Toggle serial pin
        eor $80         ; Compute checksum for 1st GCR byte in $80
        sta $80
        bvc *
        clv
        lda $1c01       ; Load data from disk
        sta $1801       ; Send byte to C64 via parallel cable
        dec $1800       ; Toggle serial pin
        eor $81         ; Compute checksum for 2nd GCR byte in $81
        sta $81
        ; ...
        ; Copy and checksum to $82 $83 $84
        ; And another time for $80 $81 $82 $83 $84 with inverted toggles
        ; ...
        dey
        beq f_read_end
        jmp f_read
f_read_end:
        ; Copy the remaining 4 bytes and checksum to $80 $81 $82
        ; Lots of bit-shifting and xoring to indirectly verify
        ; the sector checksum from the 5 byte xor of the raw GCR data

;--- C64: Read ---
        ; Setup ($5d) and ($5f) to point to GCR buffer
        ldy #$00
c_read:
        bit $dd00       ; Wait for serial pin to toggle
        bpl *-3
        lda $dd01       ; Read incoming data (from 1541)
        sta ($5d),y     ; Store to buffer
        iny
        bit $dd00       ; Wait for serial pin to toggle
        bmi *-3
        lda $dd01       ; Read incoming data (from 1541)
        sta ($5d),y     ; Store to buffer
        iny
        bne c_read
c_read2:
        bit $dd00       ; Wait for serial pin to toggle
        bpl *-3
        lda $dd01       ; Read incoming data (from 1541)
        sta ($5f),y     ; Store to second part of buffer
        iny
        bit $dd00       ; Wait for serial pin to toggle
        bmi *-3
        lda $dd01       ; Read incoming data (from 1541)
        sta ($5f),y     ; Store to second part of buffer
        iny
        cpy #$44
        bne c_read2

;--- C64: Write ---
        ; Setup ($5d) and ($5f) to point to GCR buffer
        ldy #$00
        tya
c_write:
        eor ($5d),y     ; Load from buffer and compute checksum
        bit $dd00       ; Wait for serial pin to toggle
        bpl *-3
        sta $dd01       ; Store xor'ed outgoing data (to 1541)
        iny
        eor ($5d),y     ; Load from buffer and compute checksum
        bit $dd00       ; Wait for serial pin to toggle
        bmi *-3
        sta $dd01       ; Store xor'ed outgoing data (to 1541)
        iny
        bne c_write
c_write2:
        eor ($5f),y     ; Load from buffer and compute checksum
        bit $dd00       ; Wait for serial pin to toggle
        bpl *-3
        sta $dd01       ; Store xor'ed outgoing data (to 1541)
        iny
        eor ($5f),y     ; Load from buffer and compute checksum
        bit $dd00       ; Wait for serial pin to toggle
        bmi *-3
        sta $dd01       ; Store xor'ed outgoing data (to 1541)
        iny
        cpy #$44
        bne c_write2
        ldx $5b
        sta $0200,x     ; Store checksum for verify pass
        inx
        stx $5b

;--- 1541: Write ---
        ldy #$a2
        lda #$00
f_write:
        bvc *           ; Wait for disk shift register to clear
        clv
        eor $1801       ; Xor with incoming data (from C64)
        sta $1c01       ; Write data to disk shift register
        dec $1800       ; Toggle serial pin
        lda $1801       ; Reload data to undo xor for next byte
        bvc *           ; Wait for disk shift register to clear
        clv
        eor $1801       ; Xor with incoming data (from C64)
        sta $1c01       ; Write data to disk shift register
        inc $1800       ; Toggle serial pin
        lda $1801       ; Reload data to undo xor for next byte
        dey
        bne f_write

;--- 1541: Verify ---
        ; Get checksum computed by c_write on the C64 side
        ldy #$a2
f_verify:
        bvc *           ; Wait for disk shift register to fill
        clv
        eor $1c01       ; Xor with data from disk
        bvc *           ; Wait for disk shift register to fill
        clv
        eor $1c01       ; Xor with data from disk
        dey
        bne f_verify
        ; Verify is ok if checksum is zero
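The write path hides a neat trick: c_write sends a running XOR instead of the raw bytes, so the C64 ends up with the checksum in its accumulator at no extra cost, and f_write recovers each byte by XORing the incoming value with the previous one it reloaded from $1801. The verify pass then XORs everything it reads back into that checksum. The Python functions below (hypothetical names) are a minimal sketch of that data flow only; they ignore timing, the handshake toggles and the split between the ($5d) and ($5f) buffer pointers.

```python
def c64_send(raw: bytes) -> tuple:
    """Model of c_write: send the running XOR of the buffer; return (stream, checksum)."""
    acc = 0                          # tya with Y = 0
    stream = []
    for b in raw:
        acc ^= b                     # eor ($5d),y / eor ($5f),y
        stream.append(acc)           # sta $dd01
    return stream, acc               # final A is stored at $0200,x for the verify pass


def drive_write(stream: list) -> bytes:
    """Model of f_write: XOR each value with the previous one to recover the byte."""
    prev = 0                         # lda #$00
    disk = bytearray()
    for value in stream:
        disk.append(prev ^ value)    # eor $1801 / sta $1c01
        prev = value                 # lda $1801
    return bytes(disk)


def drive_verify(disk: bytes, checksum: int) -> bool:
    """Model of f_verify: XOR everything read back into the checksum; zero means OK."""
    acc = checksum
    for b in disk:
        acc ^= b                     # eor $1c01
    return acc == 0


if __name__ == "__main__":
    raw = bytes((i * 7) & 0xFF for i in range(324))   # 324 raw GCR bytes per sector
    stream, checksum = c64_send(raw)
    on_disk = drive_write(stream)
    print(on_disk == raw, drive_verify(on_disk, checksum))
```

A plain XOR checksum like this catches any single corrupted byte, but two errors that flip the same bits cancel each other out.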