I have an application that uses Selenium WebDriver to drive PhantomJS. To scale things up, I want to run multiple PhantomJS instances and load balance them with HAProxy. This is a local application, so I'm not concerned with deploying to a production environment.

Here's my docker-compose.yml file:

    version: '2'
    services:
      app:
        build: .
        volumes:
          - .:/code
        links:
          - mongo
          - haproxy
      mongo:
        image: mongo
      phantomjs1:
        image: wernight/phantomjs:latest
        ports:
          - 8910
        entrypoint:
          - phantomjs
          - --webdriver=8910
          - --ignore-ssl-errors=true
          - --load-images=false
      phantomjs2:
        image: wernight/phantomjs:latest
        ports:
          - 8910
        entrypoint:
          - phantomjs
          - --webdriver=8910
          - --ignore-ssl-errors=true
          - --load-images=false
      phantomjs3:
        image: wernight/phantomjs:latest
        ports:
          - 8910
        entrypoint:
          - phantomjs
          - --webdriver=8910
          - --ignore-ssl-errors=true
          - --load-images=false
      phantomjs4:
        image: wernight/phantomjs:latest
        ports:
          - 8910
        entrypoint:
          - phantomjs
          - --webdriver=8910
          - --ignore-ssl-errors=true
          - --load-images=false
      haproxy:
        image: haproxy
        volumes:
          - ./haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg:ro
        ports:
          - 8910:8910
        links:
          - phantomjs1
          - phantomjs2
          - phantomjs3
          - phantomjs4

As you can see, I've got four PhantomJS instances, one HAProxy instance, and one app (written in Python).

Here's my haproxy.cfg:

    global
        log 127.0.0.1 local0
        log 127.0.0.1 local1 notice
        maxconn 4096
        daemon

    defaults
        log global
        mode http
        option httplog
        option dontlognull
        retries 3
        option redispatch
        maxconn 2000
        timeout connect 5000
        timeout client 50000
        timeout server 50000

    frontend phantomjs_front
        bind *:8910
        stats uri /haproxy?stats
        default_backend phantomjs_back

    backend phantomjs_back
        balance roundrobin
        server phantomjs1 phantomjs1:8910 check
        server phantomjs2 phantomjs2:8910 check
        server phantomjs3 phantomjs3:8910 check
        server phantomjs4 phantomjs4:8910 check

I know I need sticky sessions (or something similar) in HAProxy to get this to work, but I don't know how to set that up.

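Here's the relevant snippet of my Python app code that connects to this service:

    from selenium import webdriver
    from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

    def get_page(url):
        # Every WebDriver command goes through the haproxy service.
        driver = webdriver.Remote(
            command_executor='http://haproxy:8910',
            desired_capabilities=DesiredCapabilities.PHANTOMJS
        )
        driver.get(url)
        source = driver.page_source
        driver.close()
        return source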

This is the error I get when I run this code:

    phantomjs2_1 | [ERROR - 2016-07-12T23:35:25.454Z] RouterReqHand - _handle.error - {"name":"Variable Resource Not Found","message":"{\"headers\":{\"Accept\":\"application/json\",\"Accept-Encoding\":\"identity\",\"Connection\":\"close\",\"Content-Length\":\"96\",\"Content-Type\":\"application/json;charset=UTF-8\",\"Host\":\"172.19.0.7:8910\",\"User-Agent\":\"Python-urllib/3.5\"},\"httpVersion\":\"1.1\",\"method\":\"POST\",\"post\":\"{\\\"url\\\": \\\"\\\\\\\"http://www.REDACTED.com\\\\\\\"\\\", \\\"sessionId\\\": \\\"4eff6a60-4889-11e6-b4ad-095b9e1284ce\\\"}\",\"url\":\"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/url\",\"urlParsed\":{\"anchor\":\"\",\"query\":\"\",\"file\":\"url\",\"directory\":\"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/\",\"path\":\"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/url\",\"relative\":\"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/url\",\"port\":\"\",\"host\":\"\",\"password\":\"\",\"user\":\"\",\"userInfo\":\"\",\"authority\":\"\",\"protocol\":\"\",\"source\":\"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/url\",\"queryKey\":{},\"chunks\":[\"session\",\"4eff6a60-4889-11e6-b4ad-095b9e1284ce\",\"url\"]}}","line":80,"sourceURL":"phantomjs://code/router_request_handler.js","stack":"_handle@phantomjs://code/router_request_handler.js:80:82"}
    phantomjs2_1 |
    phantomjs2_1 | phantomjs://platform/console++.js:263 in error
    app_1 | Traceback (most recent call last):
    app_1 |   File "selenium_process.py", line 69, in <module>
    app_1 |     main()
    app_1 |   File "selenium_process.py", line 61, in main
    app_1 |     source = get_page(args.url)
    app_1 |   File "selenium_process.py", line 52, in get_page
    app_1 |     driver.get(url)
    app_1 |   File "/usr/local/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 248, in get
    app_1 |     self.execute(Command.GET, {'url': url})
    app_1 |   File "/usr/local/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 236, in execute
    app_1 |     self.error_handler.check_response(response)
    app_1 |   File "/usr/local/lib/python3.5/site-packages/selenium/webdriver/remote/errorhandler.py", line 163, in check_response
    app_1 |     raise exception_class(value)
    app_1 | selenium.common.exceptions.WebDriverException: Message: Variable Resource Not Found - {"headers":{"Accept":"application/json","Accept-Encoding":"identity","Connection":"close","Content-Length":"96","Content-Type":"application/json;charset=UTF-8","Host":"172.19.0.7:8910","User-Agent":"Python-urllib/3.5"},"httpVersion":"1.1","method":"POST","post":"{\"url\": \"\\\"http://www.REDACTED.com\\\"\", \"sessionId\": \"4eff6a60-4889-11e6-b4ad-095b9e1284ce\"}","url":"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/url","urlParsed":{"anchor":"","query":"","file":"url","directory":"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/","path":"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/url","relative":"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/url","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/url","queryKey":{},"chunks":["session","4eff6a60-4889-11e6-b4ad-095b9e1284ce","url"]}}
    app_1 |

So, how do I get load balancing working? What am I missing?

UPDATE

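I figured out that I need some kind of session management in HAProxy. Selenium WebDriver and PhantomJS communicate via sessions: the client sends a POST /session and receives a reply with the session ID in the body. Every later command carries that ID in the URL path (for example, POST /session/<id>/url in the error above), so with round-robin balancing it can land on an instance that never created the session. The reply to POST /session looks something like this: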

    {
      "sessionId": "5a27f2b0-48a5-11e6-97d7-7f5820fc7aa6",
      "status": 0,
      "value": {
        "browserName": "phantomjs",
        "version": "2.1.1",
        "driverName": "ghostdriver",
        "driverVersion": "1.2.0",
        "platform": "linux-unknown-64bit",
        "javascriptEnabled": true,
        "takesScreenshot": true,
        "handlesAlerts": false,
        "databaseEnabled": false,
        "locationContextEnabled": false,
        "applicationCacheEnabled": false,
        "browserConnectionEnabled": false,
        "cssSelectorsEnabled": true,
        "webStorageEnabled": false,
        "rotatable": false,
        "acceptSslCerts": false,
        "nativeEvents": true,
        "proxy": {"proxyType": "direct"}
      }
    }
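
To make concrete what I think the balancer has to do, here is a rough Python sketch of the sticky logic. It is only a toy model of the behaviour I'm after, not HAProxy configuration, and I don't know yet whether HAProxy can express it. The names `StickyRouter`, `pick_backend`, and `record_reply` are made up for illustration; the backend list is taken from my compose file.

```python
import itertools
import json
import re

# Backend addresses as named in the compose file above.
BACKENDS = [
    "phantomjs1:8910",
    "phantomjs2:8910",
    "phantomjs3:8910",
    "phantomjs4:8910",
]

# Matches /session/<id> and /session/<id>/anything, capturing <id>.
SESSION_PATH = re.compile(r"^/session/([^/]+)")


class StickyRouter:
    """Toy model of session-aware balancing (hypothetical, for illustration)."""

    def __init__(self, backends):
        self._round_robin = itertools.cycle(backends)
        self._owner = {}  # sessionId -> backend that created the session

    def pick_backend(self, method, path):
        """Reuse the owning backend for a known session, else round-robin."""
        match = SESSION_PATH.match(path)
        if match and match.group(1) in self._owner:
            return self._owner[match.group(1)]
        return next(self._round_robin)

    def record_reply(self, method, path, backend, body):
        """Remember which backend answered a session-creating POST /session."""
        if method == "POST" and path == "/session":
            session_id = json.loads(body).get("sessionId")
            if session_id:
                self._owner[session_id] = backend
```

With this model, the backend that answers POST /session is recorded, and a later POST /session/<id>/url for the same ID is sent back to it instead of to the next server in the rotation, which is the step my current round-robin setup is missing.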