Update!

I got a chance to look at the European release of ICO, and pretty much immediately noticed the new files SRCFILE.TXT and TRFILE.TXT. SRCFILE is the complete 'objdump -d' output of the game, with the debugging line numbers, and TRFILE is the complete linker log. Which includes these function names:

00136cc0:0003215616:0710:ffff:huft_build():fumi/ios/inflate.c:119
00137488:0003244212:0160:ffff:inflate_codes():fumi/ios/inflate.c:335
00137bd0:0003268411:00c0:ffff:inflate_stored():fumi/ios/inflate.c:439
00137ef0:0003278744:04d0:ffff:inflate_fixed():fumi/ios/inflate.c:485
00138150:0003288348:05e0:ffff:inflate_dynamic():fumi/ios/inflate.c:549
00138a68:0003319614:ffff:ffff:inflate_start():fumi/ios/inflate.c:706
00138ab8:0003321500:0030:ffff:close_inflate_handler():fumi/ios/inflate.c:750
00138b80:0003324593:00d0:ffff:inflate():fumi/ios/inflate.c:772
00139048:0003340442:0040:ffff:open_inflate_handler():fumi/ios/inflate.c:730
001390d8:0003343118:0060:ffff:fill_inbuf():fumi/ios/inflate.c:887
001391b8:0003346411:0020:ffff:huft_free():fumi/ios/inflate.c:309
00139568:0003361590:0040:ffff:new_mblock_node():fumi/ios/mblock.c:16
00139668:0003365214:ffff:ffff:reuse_mblock1():fumi/ios/mblock.c:95
00139690:0003365880:ffff:ffff:init_mblock():fumi/ios/mblock.c:12
001396a0:0003366175:0030:ffff:new_segment():fumi/ios/mblock.c:72
00139748:0003369314:0030:ffff:reuse_mblock():fumi/ios/mblock.c:105
001397a0:0003370659:0060:ffff:strdup_mblock():fumi/ios/mblock.c:123
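If you want to pull the function names and source locations out of TRFILE.TXT yourself, each entry is just seven colon-separated fields. Here is a rough sketch of a parser for one entry. The struct and function names are made up for illustration, and I'm only guessing that the first field is a hex address; I don't know what the three middle fields mean, so the sketch keeps them as raw text.

```c
#include <assert.h>
#include <stdio.h>

/* One TRFILE.TXT entry (hypothetical struct, not from the game or libarc).
   Only address, name, file and line are interpreted; the three middle
   fields have unknown meaning and are stored verbatim. */
struct tr_entry {
    unsigned long address;  /* first field, assumed to be a hex address */
    char raw2[16];          /* second field, unknown meaning */
    char raw3[8];           /* third field, unknown meaning */
    char raw4[8];           /* fourth field, unknown meaning */
    char name[64];          /* function name without the trailing "()" */
    char file[128];         /* source path as written in the log */
    int line;               /* source line number */
};

/* Returns 1 if text matches the assumed seven-field layout, 0 otherwise. */
int parse_tr_entry(const char *text, struct tr_entry *out)
{
    assert(text != NULL && out != NULL);
    int n = sscanf(text, "%lx:%15[^:]:%7[^:]:%7[^:]:%63[^(]():%127[^:]:%d",
                   &out->address, out->raw2, out->raw3, out->raw4,
                   out->name, out->file, &out->line);
    return n == 7;
}
```

Feeding it the first entry above should give address 0x136cc0, name "huft_build", file "fumi/ios/inflate.c" and line 119.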

This matches perfectly with all the stuff down here. I'm going to stop looking now, in case the Japanese release has Fumito Ueda's credit card numbers on it or something.

I haven't succeeded in contacting anyone about this; SCEI and ONICOS/Izumo don't read their email. Someone who speaks better Japanese than me should try writing them a letter.

Summary

ICO, a video game by Sony Computer Entertainment for the PlayStation 2, seems to be using parts of the GPL library libarc for compressed data handling. It doesn't credit the author or mention libarc or the GPL.

This isn't a big problem in terms of code — the two files from libarc used are under 1500 lines put together, and one is a heavily-edited copy of inflate.c from zlib, which is public domain. But, it's a GPL violation, and they need to fix it.

Evidence

To follow along with this, you'll need:

Reverse-engineering (interesting)

ios/inflate.c
 incomplete literal tree
 incomplete distance tree
ios/mblock.c

ICO, helpfully, has all its debug logging still in the release binary. Here we can see the names of two files from libarc. Note the space before "incomplete" in both strings; this indicates a really old version of zlib. Even find-zlib, which claims to go back to zlib 0.1, doesn't have these. (It also doesn't find any data tables.)
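Finding strings like these doesn't need anything fancy. A minimal sketch of a strings-style scan over a buffer follows; it is my own illustration (the function name is hypothetical), not the tool I actually used. It treats the space character as printable, so the leading space in " incomplete" survives.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Find the next run of at least min_len printable bytes in buf[start..len).
   min_len is expected to be at least 1.
   On success, stores the run length in *run_len and returns the run's offset.
   Returns len (and sets *run_len to 0) when no further run exists. */
size_t next_printable_run(const unsigned char *buf, size_t len, size_t start,
                          size_t min_len, size_t *run_len)
{
    assert(buf != NULL && run_len != NULL);
    size_t i = start;
    while (i < len) {
        size_t j = i;
        while (j < len && isprint(buf[j]))
            j++;                    /* extend the run while bytes are printable */
        if (j - i >= min_len) {
            *run_len = j - i;
            return i;
        }
        i = j + 1;                  /* skip the byte that ended the run */
    }
    *run_len = 0;
    return len;
}
```

Call it in a loop, restarting at offset plus run length each time, and print whatever runs come back.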

From inflate.c:

/*
    Copyright (C) 2000 Masanao Izumo <mo@goice.co.jp>

    This program is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program; if not, write to the Free Software
    Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
*/
/* inflate.c -- Not copyrighted 1992 by Mark Adler
   version c10p1, 10 January 1993 */

/* You can do whatever you like with this source file, though I would
   prefer that if you modify it and redistribute it that you include
   comments to that effect with your name and the date.  Thank you.
   [The history has been moved to the file ChangeLog.]
 */

    /* build the decoding tables for literal/length and distance codes */
    bl = lbits;
    i = huft_build(ll, nl, 257, cplens, cplext, &tl, &bl, &decoder->pool);
    if(bl == 0)                     /* no literals or lengths */
        i = 1;
    if(i)
    {
        if(i == 1)
            fprintf(stderr, " incomplete literal tree\n");
        reuse_mblock(&decoder->pool);
        return -1;                  /* incomplete code set */
    }
    bd = dbits;
    i = huft_build(ll + nl, nd, 0, cpdist, cpdext, &td, &bd, &decoder->pool);
    if(bd == 0 && nl > 257)         /* lengths but no distances */
    {
        fprintf(stderr, " incomplete distance tree\n");
        reuse_mblock(&decoder->pool);
        return -1;
    }

    if(i == 1)
    {
#ifdef PKZIP_BUG_WORKAROUND
        i = 0;
#else
        fprintf(stderr, " incomplete distance tree\n");
#endif
    }
    if(i)
    {
        reuse_mblock(&decoder->pool);
        return -1;
    }

Reverse-engineering (boring)

Now that we've seen that, it's time for MIPS assembly!

I'll be using ps2dis's output here.

The equivalent to fprintf() is located at 0x001A6E28 in the binary. It's been simplified - the first argument is missing, but I'll use the same name for clarity.
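ps2dis works out call targets and string addresses for you, but you can check them by hand from the raw instruction words it prints (0c069b8a is the jal to that fprintf). A small sketch using the standard MIPS encoding rules: a j/jal keeps its target in the low 26 bits, shifted left by two, and a lui/addiu pair builds a 32-bit address from two 16-bit halves. The function names are mine.

```c
#include <assert.h>
#include <stdint.h>

/* Target of a MIPS `j` or `jal` instruction word located at address pc. */
uint32_t jump_target(uint32_t pc, uint32_t insn)
{
    /* primary opcode 2 is j, 3 is jal */
    assert((insn >> 26) == 2 || (insn >> 26) == 3);
    /* the top four bits come from the address of the delay slot */
    return ((pc + 4) & 0xF0000000u) | ((insn & 0x03FFFFFFu) << 2);
}

/* Address built by `lui rt, hi` followed by `addiu rt, rt, lo`.
   addiu sign-extends its 16-bit immediate before adding. */
uint32_t lui_addiu(uint16_t hi, uint16_t lo)
{
    int32_t simm = (lo & 0x8000) ? (int32_t)lo - 0x10000 : (int32_t)lo;
    return ((uint32_t)hi << 16) + (uint32_t)simm;
}
```

For example, jump_target(0x0013531c, 0x0c069b8a) gives 0x001a6e28, and lui_addiu(0x0055, 0x6b30) gives 0x00556b30, the address of one of the error strings.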

Searching for those error strings finds this:

    jal $001a6e28             # 0013531c:0c069b8a v fprintf
    addiu a0, a0, $6b10       # 00135320:24846b10 a0=" incomplete literal tree\n"
    beq zero, zero, $0013540c # 00135324:10000039 v __0013540c
    daddu a0, s0, zero        # 00135328:0200202d
__0013532c:
    addiu v0, zero, $0006     # 0013532c:24020006 v0=$00000006
    lui a3, $0028             # 00135330:3c070028 a3=$00280000
    lui t0, $0028             # 00135334:3c080028 t0=$00280000
    sll a0, v1, 2             # 00135338:00032080
    lw a1, $0514(sp)          # 0013533c:8fa50514
    sw v0, $04fc(sp)          # 00135340:afa204fc
    addu a0, sp, a0           # 00135344:03a42021
    addiu a3, a3, $0b20       # 00135348:24e70b20 a3=$00280b20
    addiu t0, t0, $0b60       # 0013534c:25080b60 t0=$00280b60
    daddu a2, zero, zero      # 00135350:0000302d
    addiu t1, sp, $04f8       # 00135354:27a904f8
    addiu t2, sp, $04fc       # 00135358:27aa04fc
    jal $001336c0             # 0013535c:0c04cdb0 ^ FNC_001336c0
    daddu t3, s0, zero        # 00135360:0200582d
    lw v1, $04fc(sp)          # 00135364:8fa304fc
    bne v1, zero, $00135394   # 00135368:1460000a v __00135394
    daddu s4, v0, zero        # 0013536c:0040a02d s4=$00000006
    lw v1, $0510(sp)          # 00135370:8fa30510
    sltiu v0, v1, $0102       # 00135374:2c620102
    bne v0, zero, $00135398   # 00135378:14400007 v __00135398
    addiu v0, zero, $0001     # 0013537c:24020001 v0=$00000001
    lui a0, $0055             # 00135380:3c040055 a0=$00550000
    jal $001a6e28             # 00135384:0c069b8a v fprintf
    addiu a0, a0, $6b30       # 00135388:24846b30 a0=" incomplete distance tree\n"
    beq zero, zero, $0013540c # 0013538c:1000001f v __0013540c
    daddu a0, s0, zero        # 00135390:0200202d
__00135394:
    addiu v0, zero, $0001     # 00135394:24020001 v0=$00000001
__00135398:
    bne s4, v0, $001353a8     # 00135398:16820003 v __001353a8
    lui a0, $0055             # 0013539c:3c040055 a0=$00550000
    jal $001a6e28             # 001353a0:0c069b8a v fprintf
    addiu a0, a0, $6b30       # 001353a4:24846b30 a0=" incomplete distance tree\n"

Now, let's do a Google Code Search for the errors. Almost all of these are the same — they're either commented out or there's only one call to each. The only different one is TiMidity++, which turns out to use libarc!

After the error message, all three paths jump to here:

__0013540c:
    jal $00136140             # 0013540c:0c04d850 v FNC_00136140
    nop                       # 00135410:00000000
    beq zero, zero, $00135434 # 00135414:10000007 v __00135434
    addiu v0, zero, $ffff     # 00135418:2402ffff v0=$ffffffff

which goes to:

FNC_00136140:
    addiu sp, sp, $ffd0       # 00136140:27bdffd0
    sd s1, $0010(sp)          # 00136144:ffb10010
    sd ra, $0020(sp)          # 00136148:ffbf0020
    daddu s1, a0, zero        # 0013614c:0080882d
    sd s0, $0000(sp)          # 00136150:ffb00000
    lw s0, $0000(s1)          # 00136154:8e300000
    beq s0, zero, $00136184   # 00136158:1200000a v __00136184
    ld ra, $0020(sp)          # 0013615c:dfbf0020
    daddu a0, s0, zero        # 00136160:0200202d
    nop                       # 00136164:00000000
__00136168:
    jal $00136060             # 00136168:0c04d818 ^ FNC_00136060
    lw s0, $000c(s0)          # 0013616c:8e10000c
    bne s0, zero, $00136168   # 00136170:1600fffd ^ __00136168
    daddu a0, s0, zero        # 00136174:0200202d
    [...]

FNC_00136060:
    lw v0, $0004(a0)          # 00136060:8c820004
    sltiu v0, v0, $2001       # 00136064:2c422001
    bne v0, zero, $00136078   # 00136068:14400003 v __00136078
    lw v0, $9758(gp)          # 0013606c:8f829758 v0=$00632048
    j $00139598               # 00136070:0804e566 v FNC_00139598
    lw a0, $0000(a0)          # 00136074:8c840000
__00136078:
    sw a0, $9758(gp)          # 00136078:af849758 [00632048]
    jr ra                     # 0013607c:03e00008
    sw v0, $000c(a0)          # 00136080:ac82000c
    nop                       # 00136084:00000000
__00136088:
    sw zero, $0004(a0)        # 00136088:ac800004
    jr ra                     # 0013608c:03e00008
    sw zero, $0000(a0)        # 00136090:ac800000
    nop                       # 00136094:00000000

FNC_00139598:
    addiu sp, sp, $fb70       # 00139598:27bdfb70
    sd s5, $0450(sp)          # 0013959c:ffb50450
    daddu s5, a0, zero        # 001395a0:0080a82d
    sd ra, $0480(sp)          # 001395a4:ffbf0480
    lui a0, $0055             # 001395a8:3c040055 a0=$00550000
    sd s7, $0470(sp)          # 001395ac:ffb70470
    sd s6, $0460(sp)          # 001395b0:ffb60460
    addiu a0, a0, $72d8       # 001395b4:248472d8 a0="mem:free "
    sd s4, $0440(sp)          # 001395b8:ffb40440
    sd s3, $0430(sp)          # 001395bc:ffb30430
    sd s2, $0420(sp)          # 001395c0:ffb20420
    sd s1, $0410(sp)          # 001395c4:ffb10410
    jal $001a6e28             # 001395c8:0c069b8a v fprintf
    sd s0, $0400(sp)          # 001395cc:ffb00400
    bne s5, zero, $00139618   # 001395d0:16a00011 v __00139618
    addiu s1, s5, $fff0       # 001395d4:26b1fff0
    lui a0, $0055             # 001395d8:3c040055 a0=$00550000
    jal $001a6e28             # 001395dc:0c069b8a v fprintf
    addiu a0, a0, $72e8       # 001395e0:248472e8 a0="null memory pointer\n"
    break (00000)             # 001395e4:0000000d
    lui s0, $0055             # 001395e8:3c100055 s0=$00550000
    lui a2, $0055             # 001395ec:3c060055 a2=$00550000
    addiu s0, s0, $70e0       # 001395f0:261070e0 s0="ios/memory.c"
    addiu a2, a2, $7300       # 001395f4:24c67300 a2="IOSFREE(): NULL MEMORY POINTER\n"
    daddu a0, s0, zero        # 001395f8:0200202d a0="ios/memory.c"
    jal $001ad748             # 001395fc:0c06b5d2 v FNC_001ad748
    addiu a1, zero, $0334     # 00139600:24050334 a1=$00000334
    lui a2, $0063             # 00139604:3c060063 a2=$00630000
    daddu a0, s0, zero        # 00139608:0200202d a0="ios/memory.c"
    addiu a2, a2, $20b8       # 0013960c:24c620b8 a2=$006320b8
    beq zero, zero, $001399b8 # 00139610:100000e9 v __001399b8
    addiu a1, zero, $0334     # 00139614:24050334 a1=$00000334
    [...]

That last function sure looks like free() to me.

From mblock.c:

static void reuse_mblock1(MBlockNode *p)
{
    if(p->block_size > MIN_MBLOCK_SIZE)
        free(p);
    else /* p->block_size <= MIN_MBLOCK_SIZE */
    {
        p->next = free_mblock_list;
        free_mblock_list = p;
    }
}

void reuse_mblock(MBlockList *mblock)
{
    MBlockNode *p;

    if((p = mblock->first) == NULL)
        return;         /* There is nothing to collect memory */

    while(p)
    {
        MBlockNode *tmp;

        tmp = p;
        p = p->next;
        reuse_mblock1(tmp);
    }
    init_mblock(mblock);
}

I could go further, but pointing out more of the same control flow in a bunch of assembly text isn't really needed.

Instead, I wrote a tool to decompress ICO's data archive, using libarc. libarc's compressor (in deflate.c) uses the same DEFLATE algorithm as gzip, but doesn't store a gzip or zip header. Nevertheless, it decompresses all the files perfectly* with no changes to the compressed stream. Get it in the links below.
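The "no gzip or zip header" part is easy to check on any compressed blob. Here is a minimal sketch, assuming only the well-known magic numbers: gzip data starts with 1f 8b, a zip local file header starts with "PK" 03 04, and a zlib stream starts with a two-byte header whose low nibble is 8 and whose 16-bit big-endian value is a multiple of 31. The enum and function name are hypothetical and not part of libarc or the extractor. A raw DEFLATE stream can match the zlib test by coincidence, so treat a zlib result as a hint only.

```c
#include <assert.h>
#include <stddef.h>

enum container {
    CONTAINER_NONE,   /* no recognised wrapper, possibly raw DEFLATE */
    CONTAINER_GZIP,
    CONTAINER_ZIP,
    CONTAINER_ZLIB
};

/* Guess which wrapper, if any, precedes the compressed data in buf. */
enum container sniff_container(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    if (len >= 2 && buf[0] == 0x1f && buf[1] == 0x8b)
        return CONTAINER_GZIP;
    if (len >= 4 && buf[0] == 'P' && buf[1] == 'K' && buf[2] == 3 && buf[3] == 4)
        return CONTAINER_ZIP;
    /* zlib: compression method 8 in the low nibble, header checksum mod 31 */
    if (len >= 2 && (buf[0] & 0x0f) == 8 && ((buf[0] << 8) | buf[1]) % 31 == 0)
        return CONTAINER_ZLIB;
    return CONTAINER_NONE;
}
```

If this returns CONTAINER_NONE for the start of a compressed file, that fits a bare DEFLATE stream, though it doesn't prove it.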

("advertise.pss" is an MPEG-2 video and will play in VLC, although it won't have sound.)

* It doesn't have a checksum, so it might not actually be perfect, but it doesn't error at least!

Links

MIPS64 Instruction Reference

The extractor. Run it with a copy of ICO's DFDATAS/DATA.DF as the first argument. Doesn't do sub-archives yet.

Etc.

Shadow of the Colossus, the "sequel" to ICO, doesn't seem to use any other code. I haven't disassembled it, but it's even more helpful: function names aren't stripped at all!

All of them look safe to me, aside from being as unorganized as any game code.

I tried contacting Masanao Izumo, the author of libarc, but one of his emails (mo@goice.co.jp) stopped working and I haven't received a response on the other (iz@onicos.co.jp). Maybe he can be reached through ONICOS?

Thanks !WAHa.06x36 for helping me with the format of DATA.DF.

(Why are the default colors for code2html so ugly? Why does tidy destroy text with CSS white-space: pre?)

http://astrange.ithinksw.net/