In my previous article we discovered the basics of PHP streams and how powerful they are. In this tutorial we are going to put that power to use in the real world. First I'll show you how to build your own custom filters and attach them to a stream, then we'll package our filters inside a document parser application.

You are encouraged to read the previous article in case you haven't yet, as understanding the introduction will be essential for following along with this part.

The full source code for this article is available on GitHub.

Using Filters

As previously stated, filters are pieces of code that can be attached to a stream to perform operations on the data while reading or writing. PHP has a nice set of built-in filters such as string.toupper, string.tolower or string.strip_tags. Some PHP extensions also provide their own filters. For example, the mcrypt extension installs the mcrypt.* and mdecrypt.* filters. We can use the function stream_get_filters() to fetch the list of filters available on our machine.
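As a quick check, the available filters can be listed and tested for by name. This is only a minimal sketch: the exact list depends on the PHP build and the loaded extensions, and wildcard entries such as convert.* stand for a whole family of filters.

```php
<?php
// List every stream filter registered on this PHP installation.
$filters = stream_get_filters();
sort($filters);
echo implode("\n", $filters), "\n";

// Check for a filter before relying on it. The Base64 filter is usually
// reported through the wildcard entry "convert.*".
$hasBase64 = in_array('convert.*', $filters, true)
    || in_array('convert.base64-encode', $filters, true);

echo $hasBase64 ? "Base64 filter available\n" : "Base64 filter missing\n";
```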

Once we know which filters we can count on, we can append any number of them to a stream resource using stream_filter_append():

    $h = fopen('lorem.txt', 'r');
    stream_filter_append($h, 'convert.base64-encode');
    fpassthru($h);
    fclose($h);

or open a stream using the php://filter meta wrapper:

    $filter = 'convert.base64-encode';
    $file = 'lorem.txt';
    $h = fopen('php://filter/read=' . $filter . '/resource=' . $file, 'r');
    fpassthru($h);
    fclose($h);

In both samples, fpassthru() outputs the same Base64-encoded version of the sample file. Simple, isn't it? Let's see what we can do with the php_user_filter class.
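To illustrate appending more than one filter, here is a small sketch that chains two built-in filters on the read side of a stream. It uses a php://memory stream instead of lorem.txt so that it runs without any sample file; that substitution is my own assumption, not part of the original samples.

```php
<?php
// Use an in-memory stream so the sketch does not depend on a file on disk.
$h = fopen('php://memory', 'w+');
fwrite($h, 'Hello');
rewind($h);

// Both filters are attached to the read chain only; data passes through
// them in the order in which they were appended.
stream_filter_append($h, 'string.toupper', STREAM_FILTER_READ);
stream_filter_append($h, 'string.rot13', STREAM_FILTER_READ);

$result = stream_get_contents($h);
fclose($h);

echo $result, "\n";
```

The third argument of stream_filter_append() restricts a filter to reading (STREAM_FILTER_READ) or writing (STREAM_FILTER_WRITE), which matters on streams opened in a read/write mode.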

Filtering data on read-time: the Markdown filter

Our first custom filter will be appended to a reading stream in order to convert the Markdown-formatted data from the source into HTML markup. PHP provides the base class php_user_filter, which we're extending. Among the properties of this base class are filtername and params. filtername contains the label used to register our filter with stream_filter_register(), while params can be used by stream_filter_append() to pass data to filters.

The main worker method that we must override is filter(). This method is called by the parent stream and receives four parameters:

$in: a pointer to a group of bucket objects containing the data to be filtered.

$out: a pointer to another group of buckets for storing the converted data.

$consumed: a counter passed by reference that must be incremented by the length of the converted data.

$closing: a boolean flag that is set to TRUE if we are in the last cycle and the stream is about to close.
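The four parameters are easier to follow in a filter that does almost nothing. The sketch below is a hypothetical UppercaseFilter (the class and the filter name demo.upper are made up for illustration) that modifies each bucket and passes it straight on. The int return type is an assumption about running on PHP 7 or later.

```php
<?php
// Hypothetical example filter: upper-cases the data bucket by bucket.
class UppercaseFilter extends php_user_filter
{
    public function filter($in, $out, &$consumed, $closing): int
    {
        // Pull buckets off the input group one at a time.
        while ($bucket = stream_bucket_make_writeable($in)) {
            $bucket->data = strtoupper($bucket->data);
            // Report how many input bytes were handled.
            $consumed += $bucket->datalen;
            // Hand the modified bucket over to the output group.
            stream_bucket_append($out, $bucket);
        }
        return PSFS_PASS_ON;
    }
}

stream_filter_register('demo.upper', 'UppercaseFilter');

$h = fopen('php://memory', 'w+');
fwrite($h, "buckets in, buckets out\n");
rewind($h);
stream_filter_append($h, 'demo.upper', STREAM_FILTER_READ);
$result = stream_get_contents($h);
fclose($h);

echo $result;
```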

The other two optional methods, onCreate() and onClose(), are called when our filter is created and destroyed, respectively. They are useful if our filter needs to instantiate resources such as other streams or data buffers that must be released at the end of the conversion.

Our filter uses these methods to deal with a temporary data stream managed by a private property called $bufferHandle. The onCreate() method fails, returning false, if the buffer stream is not available, while the onClose() method closes the resource. Our MarkdownFilter uses Michel Fortin's parser.
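The interplay between params, onCreate() and onClose() can be sketched with a small stand-alone filter. ReplaceFilter, the filter name demo.replace and the search/replace keys are all hypothetical names chosen for this example. The return types assume PHP 7.1 or later.

```php
<?php
// Hypothetical filter: replaces one word with another, configured via params.
class ReplaceFilter extends php_user_filter
{
    private $search;
    private $replace;

    public function onCreate(): bool
    {
        // $this->params holds the fourth argument of stream_filter_append().
        if (!is_array($this->params)
            || !isset($this->params['search'], $this->params['replace'])) {
            return false; // refuse to attach without configuration
        }
        $this->search  = $this->params['search'];
        $this->replace = $this->params['replace'];
        return true;
    }

    public function filter($in, $out, &$consumed, $closing): int
    {
        while ($bucket = stream_bucket_make_writeable($in)) {
            $consumed += $bucket->datalen;
            $bucket->data = str_replace($this->search, $this->replace, $bucket->data);
            stream_bucket_append($out, $bucket);
        }
        return PSFS_PASS_ON;
    }

    public function onClose(): void
    {
        // Release whatever onCreate() acquired; here only plain values.
        $this->search = $this->replace = null;
    }
}

stream_filter_register('demo.replace', 'ReplaceFilter');

$h = fopen('php://memory', 'w+');
fwrite($h, 'Hello world');
rewind($h);

$filter = stream_filter_append(
    $h,
    'demo.replace',
    STREAM_FILTER_READ,
    array('search' => 'world', 'replace' => 'streams')
);
$result = stream_get_contents($h);

// Without params, onCreate() returns false and the filter is not attached.
$rejected = @stream_filter_append($h, 'demo.replace', STREAM_FILTER_READ);
fclose($h);

echo $result, "\n";
```

When onCreate() returns false, stream_filter_append() raises a warning and returns false, which is why the second call is silenced with @ in this sketch.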

    <?php

    namespace MarkdownFilter;

    use \Michelf\MarkdownExtra as MarkdownExtra;

    class MarkdownFilter extends \php_user_filter
    {
        private $bufferHandle = '';

        public function filter($in, $out, &$consumed, $closing)
        {
            $data = '';

            while ($bucket = stream_bucket_make_writeable($in)) {
                $data .= $bucket->data;
                $consumed += $bucket->datalen;
            }

            $buck = stream_bucket_new($this->bufferHandle, '');

            if (false === $buck) {
                return PSFS_ERR_FATAL;
            }

            $parser = new MarkdownExtra;
            $html = $parser->transform($data);
            $buck->data = $html;

            stream_bucket_append($out, $buck);

            return PSFS_PASS_ON;
        }

        public function onCreate()
        {
            $this->bufferHandle = @fopen('php://temp', 'w+');

            if (false !== $this->bufferHandle) {
                return true;
            }

            return false;
        }

        public function onClose()
        {
            @fclose($this->bufferHandle);
        }
    }

In the main filter() method I'm collecting all the content into a $data variable that will be converted later. The first loop cycles through the input stream, using stream_bucket_make_writeable(), to retrieve the current bucket of data. The content of each bucket ($bucket->data) is appended to our container and the $consumed parameter is incremented by the length of the retrieved data ($bucket->datalen).

When all of the data is collected, we need to create a new empty bucket that will be used to pass the converted content to the output stream. We use stream_bucket_new() to do this and if the operation fails we return the constant PSFS_ERR_FATAL that will trigger a filter error. Since we need a resource pointer to create a bucket, we use the $bufferHandle property, which has been initialized earlier using the php://temp built-in stream wrapper.

Now that we have the data and the output bucket, we can instantiate a Markdown parser, convert all the data and store it in the bucket's data property. Finally the result is appended to the $out resource pointer with stream_bucket_append() and the function returns the constant PSFS_PASS_ON to communicate that the data was processed successfully.
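One caveat is worth a sketch: PHP feeds a stream through filter() in chunks, so with larger files each call sees only part of the document and a conversion applied per call would run on fragments. A possible variant, which is not the code used in this article, keeps the data in a property, returns PSFS_FEED_ME until $closing is true, and only then emits the result. WholeDocumentFilter and demo.whole are hypothetical names, and the article wrapper stands in for the Markdown conversion.

```php
<?php
// Hypothetical variant: buffer every chunk and emit the output once, at the end.
class WholeDocumentFilter extends php_user_filter
{
    private $buffer = '';
    private $done = false;

    public function filter($in, $out, &$consumed, $closing): int
    {
        while ($bucket = stream_bucket_make_writeable($in)) {
            $this->buffer .= $bucket->data;
            $consumed += $bucket->datalen;
        }

        if (!$closing || $this->done) {
            // Nothing to emit yet: ask the stream for more data.
            return PSFS_FEED_ME;
        }

        // Last call: transform the complete document in one go.
        // A real filter would call the Markdown parser here instead.
        $html = '<article>' . $this->buffer . '</article>';

        $bucket = stream_bucket_new($this->stream, $html);
        if (false === $bucket) {
            return PSFS_ERR_FATAL;
        }
        stream_bucket_append($out, $bucket);

        $this->done = true;
        $this->buffer = '';

        return PSFS_PASS_ON;
    }
}

stream_filter_register('demo.whole', 'WholeDocumentFilter');

// 20,000 bytes, so the stream delivers the data in several chunks.
$input = str_repeat('0123456789', 2000);

$h = fopen('php://memory', 'w+');
fwrite($h, $input);
rewind($h);
stream_filter_append($h, 'demo.whole', STREAM_FILTER_READ);
$result = stream_get_contents($h);
fclose($h);

echo strlen($result), " bytes written\n";
```

The trade-off is memory: the whole document is held in the buffer until the stream reaches its end.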

We can now use the filter in this way:

    // Require the MarkdownFilter or autoload

    // Register the filter
    stream_filter_register("markdown", "\MarkdownFilter\MarkdownFilter")
        or die("Failed to register filter Markdown");

    // Apply the filter
    $content = file_get_contents(
        'php://filter/read=markdown/resource=file:///path/to/somefile.md'
    );

    // Check for success...
    if (false === $content) {
        echo "Unable to read from source\n";
        exit(1);
    }

    // ...and enjoy the results
    echo $content, "\n";

Please note that the use directive has no effect and the fully qualified class name must be provided when registering a custom filter.

Filtering data on write-time: the Template filter

Once we have our content converted from Markdown to HTML, we need to pack it inside a page template. This can be anything from a basic HTML structure to a complex page layout with CSS styles. So, in the same way as we did a read-and-convert action with the input filter, we're going to write a convert-and-save action, embedding the template engine of our choice into the output stream. I chose the RainTPL parser for this tutorial, but you are free to adapt the code to the one you prefer.

The structure of the template filter is similar to our input filter. First we'll register the filter in this way:

    stream_filter_register("template.*", "\TemplateFilter\TemplateFilter")
        or die("Failed to register filter Template");

We use the format filtername.* as filter label, so that we can use that * to pass some data to our class. This is necessary because, as far as I know, there is no way to pass parameters to a filter applied using a php://filter wrapper. If you know of a way, please post it in the comments below.

The filter is then applied in this way:

    $result = file_put_contents(
        'php://filter/write=template.' .
            base64_encode('Some Document Title') . '/resource=file:///path/to/destination.html',
        $content
    );

A title for the document is passed using the second part of the filter name and will be processed by the onCreate() method. Going further, we can use this trick to pass an array of serialized data with custom configuration settings for the template engine.
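The name-suffix trick can be tried in isolation with a small filter. The sketch below is built on a few assumptions of my own: TitleFilter and demo_title.* are hypothetical names, the output goes to a temporary file, and the Base64 token is made URL-safe with strtr(). That last step is there because standard Base64 output can contain a "/" character, which php://filter treats as a path separator.

```php
<?php
// Hypothetical filter: reads a title from the filter name and prepends it.
class TitleFilter extends php_user_filter
{
    private $title = 'Untitled';
    private $written = false;

    public function onCreate(): bool
    {
        // filtername looks like "demo_title.<token>"; keep what follows the dot.
        $parts = explode('.', $this->filtername, 2);
        if (!empty($parts[1])) {
            // Undo the URL-safe alphabet before decoding.
            $decoded = base64_decode(strtr($parts[1], '-_', '+/'), true);
            if (false !== $decoded) {
                $this->title = $decoded;
            }
        }
        return true;
    }

    public function filter($in, $out, &$consumed, $closing): int
    {
        $passed = false;
        while ($bucket = stream_bucket_make_writeable($in)) {
            $consumed += $bucket->datalen;
            if (!$this->written) {
                // Put the title in front of the first chunk only.
                $bucket->data = $this->title . "\n" . $bucket->data;
                $this->written = true;
            }
            stream_bucket_append($out, $bucket);
            $passed = true;
        }
        return $passed ? PSFS_PASS_ON : PSFS_FEED_ME;
    }
}

stream_filter_register('demo_title.*', 'TitleFilter');

// Swap "+" and "/" for URL-safe characters before building the filter name.
$title = 'FAQ???';
$token = strtr(base64_encode($title), '+/', '-_');

$target = tempnam(sys_get_temp_dir(), 'flt');
file_put_contents(
    'php://filter/write=demo_title.' . $token . '/resource=' . $target,
    "body text\n"
);

$written = file_get_contents($target);
unlink($target);

echo $written;
```

When working with a stream handle rather than a php://filter URL, the fourth argument of stream_filter_append() is the more direct way to hand data to a filter.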

The TemplateFilter class:

    <?php

    namespace TemplateFilter;

    use \Rain\Tpl as View;

    class TemplateFilter extends \php_user_filter
    {
        private $bufferHandle = '';
        private $docTitle = 'Untitled';

        public function filter($in, $out, &$consumed, $closing)
        {
            $data = '';

            while ($bucket = stream_bucket_make_writeable($in)) {
                $data .= $bucket->data;
                $consumed += $bucket->datalen;
            }

            $buck = stream_bucket_new($this->bufferHandle, '');

            if (false === $buck) {
                return PSFS_ERR_FATAL;
            }

            $config = array(
                "tpl_dir"     => dirname(__FILE__) . "/templates/",
                "cache_dir"   => sys_get_temp_dir() . "/",
                "auto_escape" => false
            );
            View::configure($config);

            $view = new View();

            if (!$closing) {
                $matches = array();
                if (preg_match('/<h1>(.*)<\/h1>/i', $data, $matches)) {
                    if (!empty($matches[1])) {
                        $this->docTitle = $matches[1];
                    }
                }

                $view->assign('title', $this->docTitle);
                $view->assign('body', $data);

                $content = $view->draw('default', true);
                $buck->data = $content;
            }

            stream_bucket_append($out, $buck);

            return PSFS_PASS_ON;
        }

        public function onCreate()
        {
            $this->bufferHandle = @fopen('php://temp', 'w+');

            if (false !== $this->bufferHandle) {
                $info = explode('.', $this->filtername);
                if (is_array($info) && !empty($info[1])) {
                    $this->docTitle = base64_decode($info[1]);
                }
                return true;
            }

            return false;
        }

        public function onClose()
        {
            @fclose($this->bufferHandle);
        }
    }

We still have the $bufferHandle property pointing to the temporary stream, and we also have a property called $docTitle that will contain (by priority):

the content of the first H1 tag of the parsed document (if one exists), or

the decoded content of the second part of the filter name, or

the default fallback value 'Untitled'.

Inside the onCreate() method, after the buffer stream is initialized, we're dealing with option number two:

    $info = explode('.', $this->filtername);
    if (is_array($info) && !empty($info[1])) {
        $this->docTitle = base64_decode($info[1]);
    }

The main filter() method can be divided into five steps here. The first two steps are identical to the Markdown filter: all the data is fetched from the input buckets and stored inside the variable $data, then an empty output bucket is created to store the processed content.

In the third step the template parser class is loaded and configured. I'm asking the system for a temporary directory to use for caching, disabling the automatic escaping of HTML tags and setting the templates directory.

The default template used here is very simple, with the variables defined as {$VarName}:

    <!DOCTYPE html>
    <html lang="en">
    <head>
        <meta charset="utf-8">
        <title>{$title}</title>
    </head>
    <body>
        {$body}
    </body>
    </html>

The fourth step is where the actual parsing takes place. First I'm searching for a document title inside an H1 tag. Then I set the body and title variables defined in the template and finally process the document. The first parameter of the draw() method is the template name; the second tells it to return the string instead of printing it.
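The title lookup can be made a little more tolerant. The pattern in the filter matches only a bare h1 tag and is greedy, so a heading with attributes or a second heading on the same line could trip it up. The helper below, extractTitle(), is a hypothetical function and not part of the article's code, shown as one possible refinement.

```php
<?php
// Hypothetical helper: returns the text of the first <h1>, or a fallback.
function extractTitle($html, $fallback = 'Untitled')
{
    // [^>]* tolerates attributes such as id="..."; .*? stops at the first </h1>.
    if (preg_match('/<h1\b[^>]*>(.*?)<\/h1>/is', $html, $matches)) {
        // Drop inline markup so the title is safe inside <title>.
        $title = trim(strip_tags($matches[1]));
        if ('' !== $title) {
            return $title;
        }
    }
    return $fallback;
}

echo extractTitle('<h1 id="intro">Hello <em>World</em></h1><h1>Second</h1>'), "\n";
echo extractTitle('<p>No heading here</p>'), "\n";
```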

The last step is placing the parsed content into the output bucket and appending it to the output resource, returning PSFS_PASS_ON .

Putting it all together: a document parser

Now that we have the basic blocks in place, it's time to build our document parser utility. The utility app lives in its own directory, mddoc. Our custom filters live under the lib directory using a PSR-0 directory and namespace structure. I've used Composer to keep track of dependencies

"require": { "php": ">=5.3.0", "michelf/php-markdown": "*", "rain/raintpl": "3.*" },

and autoloading:

"autoload": { "psr-0": { "MarkdownFilter": "lib", "TemplateFilter": "lib" } }

The main application file is mddoc, which can be executed like this:

$ /path/to/mddoc -i /path/to/sourcedir -o /path/to/destdir

The app file looks like:

    #!/usr/bin/env php
    <?php
    /**
     * Markdown Tree Converter
     *
     * Recursively converts all markdown files from a source directory to HTML
     * and places them in the destination directory recreating the structure
     * of the source and applying a template parser.
     */

    // Composer autoloader
    require_once dirname(__FILE__) . '/vendor/autoload.php';

    // Deals with command-line input arguments
    function usage()
    {
        printf(
            "Usage: %s -i %s -o %s\n",
            basename(__FILE__),
            '/path/to/sourcedir',
            '/path/to/destdir'
        );
    }

    if (5 > $argc) {
        usage();
        exit;
    }

    $in = array_search('-i', $argv);
    $src = realpath($argv[$in+1]);

    if (!is_dir($src) || !is_readable($src)) {
        echo "[ERROR] Invalid source directory.\n";
        usage();
        exit(1);
    }

    $out = array_search('-o', $argv);
    $dest = realpath($argv[$out+1]);

    if (!is_dir($dest) || !is_writeable($dest)) {
        echo "[ERROR] Invalid destination directory.\n";
        usage();
        exit(1);
    }

    // Register custom read-time MarkdownFilter
    stream_filter_register("markdown", "\MarkdownFilter\MarkdownFilter")
        or die("Failed to register filter Markdown");

    // Register custom write-time TemplateFilter
    stream_filter_register("template.*", "\TemplateFilter\TemplateFilter")
        or die("Failed to register filter Template");

    // Load directory iterator for source
    $it = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($src),
        RecursiveIteratorIterator::SELF_FIRST
    );

    // For every valid item
    while ($it->valid()) {

        // Exclude dot items (., ..)
        if (!$it->isDot()) {

            // If current item is a directory, the same empty directory
            // is created on destination. getSubPathName() keeps nested
            // folders relative to the source root.
            if ($it->isDir()) {
                $path = $dest . '/' . $it->getSubPathName();
                if ((!@is_dir($path)) && !@mkdir($path, 0777, true)) {
                    echo "Unable to create folder {$path}\n";
                    exit(1);
                }
            }

            // If current item is a markdown (*.md) file it's processed and
            // saved at the corresponding destination path
            if ($it->isFile() && 'md' == $it->getExtension()) {
                $path = $it->key();
                if (!is_readable($path)) {
                    echo "Unable to read file {$path}\n";
                    exit(2);
                }

                $content = file_get_contents(
                    'php://filter/read=markdown/resource=file://' . $path
                );
                if (false === $content) {
                    echo "Unable to read from source '" . $path . "'\n";
                    exit(3);
                }

                $pathinfo = pathinfo($dest . '/' . $it->getSubPathName());
                $target = $pathinfo['dirname']
                    . '/' . $pathinfo['filename']
                    . '.html';

                $result = file_put_contents(
                    'php://filter/write=template.' .
                        base64_encode(basename($path)) . '/resource=file://' . $target,
                    $content
                );
                if (false === $result) {
                    echo "Unable to write file '" . $target . "'\n";
                    exit(4);
                }
            }
        }

        $it->next();
    }

    exit(0);

First we include our autoloader, then we go ahead with argument validation:

the command line must match the above example,

the source directory must exist and be readable,

the destination directory must exist and be writeable.
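The first of these checks can be made stricter than a simple argument count. The helper below, parseArgs(), is a hypothetical function that is not part of the mddoc script. It is sketched here on the assumption that both flags must be present, in any order, and each followed by a value.

```php
<?php
// Hypothetical helper: returns array($src, $dest), or null if a flag
// or its value is missing.
function parseArgs(array $argv)
{
    $values = array();
    foreach (array('-i', '-o') as $flag) {
        $pos = array_search($flag, $argv, true);
        // array_search() returns false when the flag is absent.
        if (false === $pos || !isset($argv[$pos + 1])) {
            return null;
        }
        $values[] = $argv[$pos + 1];
    }
    return $values;
}

var_dump(parseArgs(array('mddoc', '-i', 'docs', '-o', 'site')));
var_dump(parseArgs(array('mddoc', '-i', 'docs')));
```

The returned paths would still go through realpath(), is_dir() and the readable/writeable checks, as in the script above.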

Then we register the custom filters (with the full class path, remember) and instantiate a RecursiveIteratorIterator object to walk the source directory recursively. The main loop cycles through all valid elements fetched by the iterator. All the elements, excluding the dot entries (. and ..), are processed as follows:

if the current element is a directory, try to re-create the relative path with the same name starting from the destination path.

if the current element is a markdown file (.md), the content of the file is read into a variable using the markdown read filter, then a new file with the .html extension is written at the same relative path starting from the destination directory with the template.* filter applied.
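The path mapping described in the second item can be isolated in a few lines. targetPath() is a hypothetical helper name; the logic mirrors the pathinfo() call in the script, and the example assumes POSIX-style paths.

```php
<?php
// Hypothetical helper: maps a source sub-path to its .html destination.
function targetPath($dest, $subPath)
{
    $info = pathinfo(rtrim($dest, '/') . '/' . $subPath);
    // dirname keeps the relative folders; filename drops the .md extension.
    return $info['dirname'] . '/' . $info['filename'] . '.html';
}

echo targetPath('/path/to/destdir', 'guide/intro.md'), "\n";
echo targetPath('/path/to/destdir/', 'README.md'), "\n";
```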

The result is your documentation directory structure fully converted into HTML with one command. Not bad.

Summary

We covered a lot of useful ground here and we also have a fully functional utility to… ehm, "append" to our tool chain. I'll leave it up to you to take it further and create other tools and components like this to enhance your projects. Happy coding!