I am trying to stream-parse a roughly 4 GB XML file and write parts of it to a new XML file in PHP.

The structure of the ~4 GB XML document is shown below. I am trying to keep the <doc> elements and their <title>, <url>, and <abstract> children, and drop everything else.

But when I run this script, all I get is a file with one <doc /> per line. So it is copying the <doc> elements and making them self-closing, but not copying their children.

<?php
$interestingNodes = array('title','url','abstract');

$xmlObject = new XMLReader();
$xmlObject->open('file.xml');

$xmlOutput = new XMLWriter();
$xmlOutput->openURI('destfile.xml');
$xmlOutput->setIndent(true);
$xmlOutput->setIndentString(" ");
$xmlOutput->startDocument('1.0', 'UTF-8');

while($xmlObject->read()){
    if($xmlObject->name == 'doc'){
        $xmlOutput->startElement('doc');
        $xmlObject->readInnerXML();
        if(array_search($xmlObject->name, $interestingNodes)){
            $xmlOutput->startElement($xmlObject->name);
            $xmlOutput->text($xmlObject->value);
            $xmlOutput->endElement(); //close the current node
        }
        $xmlOutput->endElement(); //close the doc node
    }
}

$xmlObject->close();
$xmlOutput->endDocument();
$xmlOutput->flush();
?>

Here is what file.xml looks like:

<feed>
    <doc>
        <title>Title of first doc is here</title>
        <url>URL is here</url>
        <abstract>Abstract is here...</abstract>
        <links>
            <sublink>Link is here</sublink>
            <sublink>Link is here</sublink>
            <sublink>Link is here</sublink>
            <sublink>Link is here</sublink>
            <sublink>Link is here</sublink>
        </links>
    </doc>
    <doc>
        <title>Title of second doc is here</title>
        <url>URL is here</url>
        <abstract>Abstract is here...</abstract>
        <links>
            <sublink>Link is here</sublink>
            <sublink>Link is here</sublink>
            <sublink>Link is here</sublink>
            <sublink>Link is here</sublink>
            <sublink>Link is here</sublink>
        </links>
    </doc>
</feed>

And this is what I want destfile.xml to look like:

<doc>
    <title>Title of first doc is here</title>
    <url>URL is here</url>
    <abstract>Abstract is here...</abstract>
</doc>
<doc>
    <title>Title of second doc is here</title>
    <url>URL is here</url>
    <abstract>Abstract is here...</abstract>
</doc>

But when I run the script above, all I get is: