In this article, I'm going to explain you how you can extend Swift, the OpenStack Object Storage project, so it performs extra action on files at upload or at download time.

We're going to build an anti-virus filter inside Swift. The goal is to refuse uploaded data if they contain a virus. To help us with virus analyses, we'll use ClamAV.

WSGI, paste and middleware

To do our content analysis, the best place to hook in the Swift architecture is at the beginning of every request, on swift-proxy, before the file is actually stored on the cluster. Swift proxy uses, like many other OpenStack projects, paste to build his HTTP architecture.

Paste uses WSGI and provides an architecture based on a pipeline. The pipeline is composed of a succession of middleware, ending with one application. Each middleware has the chance to look at the request or at the response, can modify it, and then pass it to the following middleware. The latest component of the pipeline is the real application, and in this case, the Swift proxy server.

If you've already deployed Swift, you encountered a default pipeline in the swift-proxy.conf configuration file:

[pipeline:main] pipeline = catch_errors healthcheck cache ratelimit tempauth proxy-logging proxy-server

This is a really basic pipeline with a few middleware. The first one catches error, the second one is in charge to return 200 OK response if you send a GET /healthcheck request on your proxy server. The third one is in charge of caching, the fourth one is used for rate limiting, the fifth for authentication, the sixth one for logging, and the final one is the actual proxy server, in charge of proxying the request to the account, container, or object servers (the others components of Swift). Of course, we could remove or add any of the middleware here at our convenience.

Be aware that the order matters: for example, if you put healthcheck after tempauth, you won't be able to access the /healthcheck URL without being authenticated!

ClamAV

If you don't know ClamAV, it's an antivirus engine designed for detecting trojans, viruses, malware and other malicious threats. Wwe're going to use it to scan every incoming file. To build the middleware, we'll use the Python binding pyclamd. The API is quite simple, see:

>>> import pyclamd >>> pyclamd.init_unix_socket('/var/run/clamav/clamd.ctl') >>> print pyclamd.scan_stream(pyclamd.EICAR) {'stream': 'Eicar-Test-Signature(44d88612fea8a8f36de82e1278abb02f:68)'} >>> print pyclamd.scan_stream("safe!") None

Anatomy of a WSGI middleware

Your WSGI middleware should consist of a callable object. Usually this is done with a class implementing the __call__ method. Here's a basic boilerplate:

class SwiftClamavMiddleware(object): """Middleware doing virus scan for Swift.""" def __init__(self, app, conf): # app is the final application self.app = app def __call__(self, env, start_response): return self.app(env, start_response) def filter_factory(global_conf, **local_conf): conf = global_conf.copy() conf.update(local_conf) def clamav_filter(app): return SwiftClamavMiddleware(app, conf) return clamav_filter

I'm not going to expand more on why this is built this way, but if you want to have more info on this kind of filter middleware, you can read their documentation on Paste.

This middleware will just do nothing as it is. It's going to simply pass all requests it receives to the final application, and returns the result.

Testing our basic middleware

Now is a really good time to add unit tests. I hope you didn't think we were going to write code without some tests, right? It's really easy to test a middleware, as we're going to use WebOb for that.

import unittest from webob import Request class FakeApp(object): def __call__(self, env, start_response): return Response(body="FAKE APP")(env, start_response) class TestSwiftClamavMiddleware(unittest.TestCase): def setUp(self): self.app = SwiftClamavMiddleware(FakeApp(), {}) def test_simple_request(self): resp = Request.blank('/', environ={ 'REQUEST_METHOD': 'GET', }).get_response(self.app) self.assertEqual(resp.body, "FAKE APP")

We create a FakeApp class, that represents a fake WSGI application. You could also use a real application, or write a fake application looking like the one you want to test. It'll require more time, but your tests will be closer to the reality.

Here we write the simplest test we can for our middleware. We're just sending a GET / request to it, so it passes the request to the final application and returns the result. It is transparent, it does nothing.

Now, with that solid base we'll able to add more features and test these features incrementally.

Plugging ClamAV in

With our base ready, we can start thinking about how to plug ClamAV in. What we want to check here, is the content of the file when it's uploaded. If we refer to the OpenStack object storage API, a file upload is done via a PUT request, so we're going to limit the check to that kind of requests. Obviously, more checks could be added, but we'll keep things simple here for the sake of comprehensibility.

With WSGI, the content of the request is available in env['wsgi.input'] as an object implementing a file interface. We'll scan that stream with ClamAV to check for viruses.

import pyclamd from webob import Response class SwiftClamavMiddleware(object): """Middleware doing virus scan for Swift.""" def __init__(self, app, conf): pyclamd.init_unix_socket('/var/run/clamav/clamd.ctl') # app is the final application self.app = app def __call__(self, env, start_response): if env['REQUEST_METHOD'] == "PUT": # We have to read the whole content in memory because pyclamd # forces us to, but this is a bad idea if the file is huge. scan = pyclamd.scan_stream(env['wsgi.input'].read()) if scan: return Response(status=403, body="Virus %s detected" % scan['stream'], content_type="text/plain")(env, start_response) return self.app(env, start_response) def filter_factory(global_conf, **local_conf): conf = global_conf.copy() conf.update(local_conf) def clamav_filter(app): return SwiftClamavMiddleware(app, conf) return clamav_filter

That's it. We only check for PUT requests and if there's a virus in the file, we return a 403 Forbidden error with the name of the detected virus, bypassing entirely the rest of the middleware chain and the application handling.

Then, we can simply test it.

import unittest from cStringIO import StringIO from webob import Request, Response class FakeApp(object): def __call__(self, env, start_response): return Response(body="FAKE APP")(env, start_response) class TestSwiftClamavMiddleware(unittest.TestCase): def setUp(self): self.app = SwiftClamavMiddleware(FakeApp(), {}) def test_put_empty(self): resp = Request.blank('/v1/account/container/object', environ={ 'REQUEST_METHOD': 'PUT', }).get_response(self.app) self.assertEqual(resp.body, "FAKE APP") def test_put_no_virus(self): resp = Request.blank('/v1/account/container/object', environ={ 'REQUEST_METHOD': 'PUT', 'wsgi.input': StringIO('foobar') }).get_response(self.app) self.assertEqual(resp.body, "FAKE APP") def test_put_virus(self): resp = Request.blank('/v1/account/container/object', environ={ 'REQUEST_METHOD': 'PUT', 'wsgi.input': StringIO(pyclamd.EICAR) }).get_response(self.app) self.assertEqual(resp.status_code, 403)

The first test test_put_empty simulates an empty PUT request. The second one, test_put_no_virus simulates a regular PUT request but with a simple file containing no virus.

Finally, the third and last test simulates the upload of a virus using the EICAR test file. This is a special test file that is recognized as a virus, even if it's not real one. Very handy for testing virus detection software!

Configuring Swift proxy

Our middleware is ready! We can configure Swift's proxy server to use it. We need to add the following lines to our swift-proxy.conf to teach it how to load the filter:

[filter:clamav] paste.filter_factory = swiftclamav:filter_factory

We'll assume that our Python modules is named swiftclamava here. Now that we've defined our filter and how to load it, we can use it in our pipeline:

[pipeline:main] pipeline = catch_errors healthcheck cache ratelimit tempauth clamav proxy-logging proxy-server

Just before reaching the proxy-server, and after the user being authenticated, the content will be scanned for viruses. It's important here to put this after authentication for example, because otherwise we may scan content that will get rejected by the authtemp module, thus scanning for nothing!

Beyond scanning

And voilà, we now have a simple middleware testing uploaded content and refusing infected files. We could enhance it with various other things, like configuration handling, but I'll let that as an exercise for the interested readers.

We didn't exploited it here, but note that you can also manipulate request headers and modify them if needed. For example, we could have added a header X-Object-Meta-Scanned-By: ClamAV to indicates that the file has been scanned by ClamAV.

You should now be able to build your own middleware doing whatever you want with uploaded data. Happy hacking!