I should clarify upfront that I love PHP for its simplicity in developing web applications, and this post is not meant as PHP bashing by any stretch of the imagination. My only motivation is to plainly state certain facts that I came across while researching and experimenting with a design decision: how best to keep track of structured information within a PHP program. What I found was quite surprising, to say the least.

One of my function calls returned a collection of pairs of integers, and I was wondering whether to store each pair as an array of two named values (as in array('value1' => $value1, 'value2' => $value2)) or as an instance of a PHP5 class (as in class ValuePair { var $value1; var $value2; }). As the number of pairs could be quite large, I wanted to optimize for memory. Based on experience with compiled languages such as C/C++ and Java, I expected the class-based implementation to take less space. A simple memory measurement program, which I'll explain later, showed this expectation to be misplaced. Apparently PHP implements both arrays and objects as hash tables, and in fact objects require a little more memory than arrays with the same members. In hindsight, this doesn't appear so surprising. Compiled languages can convert member accesses to fixed offsets, but a dynamic language, where members can be added at run time, cannot do this in general.

But what did surprise me was the amount of space used by an array of two elements. Each array of two integers, when placed in another array representing the collection, was using around 300 bytes. The corresponding number for objects was around 350 bytes. I did some googling and found out that a single integer value stored within a PHP array uses 68 bytes: 16 bytes for the value structure (zval), 36 bytes for the hash bucket, and 2*8 = 16 bytes for memory allocation headers. No wonder an array with two named integer values takes up around 300 bytes.
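As a rough cross-check, the arithmetic can be written out. This is only a sketch: the three byte counts are the ones quoted above for my PHP 5.2 setup, the 160-byte default for an empty inner array is read off the arr_array measurement in the sample run (160292 bytes for 1000 arrays), and both function names are made up for illustration.

```php
<?php
// Byte counts quoted in the text for PHP 5.2; other versions and
// platforms will differ.
function estimate_element_bytes()
{
    $zval = 16;              // value structure (zval)
    $bucket = 36;            // hash bucket
    $alloc_headers = 2 * 8;  // two allocation headers of 8 bytes each
    return $zval + $bucket + $alloc_headers;
}

// Hypothetical helper: estimated cost of an inner array holding $n integers.
// The fixed part defaults to the ~160 bytes measured for an empty inner
// array (an assumption taken from the sample run, not from PHP internals).
function estimate_inner_array_bytes($n, $empty_array_bytes = 160)
{
    return $empty_array_bytes + $n * estimate_element_bytes();
}

echo estimate_element_bytes(), "\n";      // 68
echo estimate_inner_array_bytes(2), "\n"; // 296
```

The estimate of 296 bytes lands close to the roughly 304 bytes per two-element array in the sample run, so the per-element figure looks plausible.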

I am not really complaining -- PHP is not designed for writing data-intensive programs. After all, how much data are you going to display on a single web page? But it is still nice to know the actual memory usage of variables within your program. What if your PHP program is not generating an HTML page to be rendered in the browser but a PDF or Excel report to be saved on disk? Would you want your program to exceed the memory limit on a slightly larger data set?
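One way to keep an eye on this is to wrap the before/after readings in a small helper and compare them with the configured limit. The following is a minimal sketch, not a precise profiler: measure_bytes() and build_ints() are hypothetical names, and the numbers reported depend on the PHP version and build.

```php
<?php
// Hypothetical helper: call $build and report how many bytes
// memory_get_usage() grew by while the result is still alive.
function measure_bytes($build)
{
    $before = memory_get_usage();
    $result = call_user_func($build);
    $after = memory_get_usage();
    return array($result, $after - $before);
}

// Example workload: an array of 1000 integers.
function build_ints()
{
    $a = array();
    for ($i = 0; $i < 1000; $i++) {
        $a[$i] = $i;
    }
    return $a;
}

list($data, $bytes) = measure_bytes('build_ints');
echo "1000 integers: $bytes bytes\n";
echo "peak so far: ", memory_get_peak_usage(), " bytes\n";
echo "memory_limit: ", ini_get('memory_limit'), "\n";
```

Running the same helper on a small and a larger data set gives a feel for how quickly a report generator would approach memory_limit.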

Coming back to the original problem -- how should I store a collection of pairs of values? As an array of arrays or an array of objects? For memory optimization, the answer may be neither: keep two arrays, one for each value.
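A minimal sketch of the two-array layout follows. Going by the int_array figure in the sample run (about 72 bytes per integer), this should cost roughly 145 bytes per pair instead of around 300, though I have not measured this variant separately. get_pair() is a hypothetical accessor added only for illustration.

```php
<?php
// Two parallel arrays: pair $i is ($value1[$i], $value2[$i]).
$num = 1000;
$value1 = array();
$value2 = array();
for ($i = 0; $i < $num; $i++) {
    $value1[$i] = $i;
    $value2[$i] = $i + $i;
}

// Hypothetical accessor that rebuilds a single pair on demand, so only one
// small array exists at a time instead of one per pair.
function get_pair(array $value1, array $value2, $i)
{
    return array('value1' => $value1[$i], 'value2' => $value2[$i]);
}

$pair = get_pair($value1, $value2, 3);
echo $pair['value1'], ' ', $pair['value2'], "\n"; // 3 6
```

The trade-off is that the two arrays must be kept in sync by hand; nothing ties $value1[$i] to $value2[$i] except the shared index.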

For those who care about the nitty-gritty, here is the program I used for measurements:

<?php
// Two classes for comparison: one without members, one with two integers.
class EmptyObject { }

class NonEmptyObject {
    var $int1;
    var $int2;

    // __construct() replaces the PHP4-style constructor named after the
    // class, which newer PHP versions no longer treat as a constructor.
    function __construct($a1, $a2) {
        $this->int1 = $a1;
        $this->int2 = $a2;
    }
}

$num = 1000;

// Take a reading before and after building each collection; the difference
// between consecutive readings is the space used by that collection.
$u1 = memory_get_usage();

// Plain integers.
$int_array = array();
for ($i = 0; $i < $num; $i++) {
    $int_array[$i] = $i;
}
$u2 = memory_get_usage();

// Short strings.
$str_array = array();
for ($i = 0; $i < $num; $i++) {
    $str_array[$i] = "$i";
}
$u3 = memory_get_usage();

// Empty arrays.
$arr_array = array();
for ($i = 0; $i < $num; $i++) {
    $arr_array[$i] = array();
}
$u4 = memory_get_usage();

// Empty objects.
$obj_array = array();
for ($i = 0; $i < $num; $i++) {
    $obj_array[$i] = new EmptyObject();
}
$u5 = memory_get_usage();

// Arrays with two named integers.
$arr2_array = array();
for ($i = 0; $i < $num; $i++) {
    $arr2_array[$i] = array('int1' => $i, 'int2' => $i + $i);
}
$u6 = memory_get_usage();

// Objects with two integer members.
$obj2_array = array();
for ($i = 0; $i < $num; $i++) {
    $obj2_array[$i] = new NonEmptyObject($i, $i + $i);
}
$u7 = memory_get_usage();

echo "Space Used by int_array: " . ($u2 - $u1) . "\n";
echo "Space Used by str_array: " . ($u3 - $u2) . "\n";
echo "Space Used by arr_array: " . ($u4 - $u3) . "\n";
echo "Space Used by obj_array: " . ($u5 - $u4) . "\n";
echo "Space Used by arr2_array: " . ($u6 - $u5) . "\n";
echo "Space Used by obj2_array: " . ($u7 - $u6) . "\n";
?>

And here is a sample run:

[pankaj@fc7-dev ~]$ php -v
PHP 5.2.4 (cli) (built: Sep 18 2007 08:50:58)
Copyright (c) 1997-2007 The PHP Group
Zend Engine v2.2.0, Copyright (c) 1998-2007 Zend Technologies
[pankaj@fc7-dev ~]$ php -C memtest.php
Space Used by int_array: 72492
Space Used by str_array: 88264
Space Used by arr_array: 160292
Space Used by obj_array: 180316
Space Used by arr2_array: 304344
Space Used by obj2_array: 349144
[pankaj@fc7-dev ~]$