PyCon Wireless Network

By Sean Reifschneider Date March 15, 2007

Also see my related articles on the networking I set up for PyCon 2008, 2010, and 2012.

How do you make 600 Python geeks happy? Well, wireless network access is a good start...

Last year at PyCon 2006, the hotel ran the wireless network. Despite our repeatedly telling them that we were going to be heavily using the wireless network ("No, really, we're going to be using the wireless network."), they really weren't prepared for our level of use. We also had problems with their technical support, things like the DHCP server giving out leases with a gateway in a different network from the lease, and their support people rebooting APs to "fix" it (which, not surprisingly, it didn't).

It was so bad last year that we decided to run our own wireless network this year. The wired network last year worked reasonably well, though there were some issues with DHCP there as well. So, I volunteered to run the network for 2007.

Last Year

The wireless networking for PyCon 2006 was amazingly bad. The survey results break it down as a satisfaction of 44% "very low" and 38% "low", 15% "high" and 3% "very high". Those people in the latter categories were probably using wired, I'd imagine.

I took charge last year of helping people with the network, and the largest problems we had were that people couldn't associate. Some of these people were probably running into issues on their client, but I suspect some of them were not able to get associated because the APs weren't able to handle any more associations.

The largest issue was that people who were associated were seeing huge amounts of loss pinging the gateway. I don't think I ever saw less than 10% loss pinging the gateway, and usually more like 50%. This was probably due to having only 4 APs, all sharing the same 802.11b channel, so there was huge contention in the RF spectrum.

This Year

This year things were almost entirely reversed from last year. In the survey, satisfaction with the network was 22% "very high" and 48% "high", with 20% "low" and 1% "very low". Despite having 41% more attendees than last year (and probably an even larger number of people using wireless, just because more people have 802.11a/b/g devices), well over half the people were happy with the networking.

Were they just happy because in comparison with last year it was so much better? Maybe, but a full third of attendees weren't at last year's conference. Maybe that's the third that wasn't happy with it. :-)

Another side effect of running our own networking was that we were able to collect all sorts of stats. Last year the hotel only gave us MRTG graphs which included not only our network utilization but that of the hotel in total, including guest rooms. They had the huevos to claim, based on the usage on these graphs, that there was no problem with our networking. We didn't know last year that these stats included guest rooms.

Here are some statistics:

Total attendees: 593
Total unique DHCP clients: 623
Total DHCP requests: 24,400
Max DHCP requests from a single client: 4,537
Peak number of clients connected: 340 (Saturday at 4:44pm)
95th percentile number of clients connected: 263
Peak number of 802.11a clients connected: 92 (Friday at noon)
Peak number of 802.11b clients connected: 47 (Friday at 9:10am)
Peak number of 802.11g clients connected: 198 (Saturday at 4:44pm)
Max number of clients on a single AP: 85 (Rear of ballroom, Saturday at 9:55am)
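As a quick sanity check on how these figures relate to attendance, here is a minimal Python sketch. The counts are the ones listed above. The `nearest_rank_percentile` helper and its toy sample list are illustrative assumptions, not the tooling that actually produced the 95th percentile figure.

```python
# Figures taken from the statistics listed above.
TOTAL_ATTENDEES = 593
UNIQUE_DHCP_CLIENTS = 623
PEAK_CLIENTS = 340
P95_CLIENTS = 263


def share_of_attendees(count, attendees=TOTAL_ATTENDEES):
    """Return count as a whole-number percentage of attendees."""
    return round(100 * count / attendees)


def nearest_rank_percentile(samples, pct):
    """Smallest value such that at least pct percent of samples are <= it."""
    ordered = sorted(samples)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct/100 * n
    return ordered[max(rank, 1) - 1]


print("peak share of attendees:", share_of_attendees(PEAK_CLIENTS), "%")
print("95th percentile share:", share_of_attendees(P95_CLIENTS), "%")
print("DHCP clients per attendee:", round(UNIQUE_DHCP_CLIENTS / TOTAL_ATTENDEES, 2))

# Hypothetical association counts, one per polling interval.
toy_samples = list(range(1, 101))
print("toy 95th percentile:", nearest_rank_percentile(toy_samples, 95))
```

Sizing for the 95th percentile rather than the peak means accepting that the network is over capacity for roughly one polling interval in twenty.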

So, our peak number of attendees connected at any given time was 57% of attendees. The 95th percentile number is the number of associations we have to handle to work 95% of the time, which was 44% of attendees.

Access Point Map

The wireless network access point locations and the peak number of clients.

802.11a?

One thing that surprised many people is that we had fairly high numbers of 802.11a users. Part of this is that we had better coverage for 802.11a, because of the number of available channels, than we did for b/g. Because 802.11a has more channels, we could run those APs on higher power, and so a user who saw both 802.11a and 802.11b+g APs would probably see the 802.11a AP as having a better signal.

Mostly, I did this so that the 802.11a users would just get out of the way of the 802.11b users, where the spectrum is very scarce.

I never had an 802.11a user that asked me for help. Every person who asked me for help was running 802.11b or g. I'm not sure if that's because the 802.11a users were more sophisticated, or that the 802.11a service was that much better.

WEP

I also set up WEP with a trivial hex key. At the Python Need for Speed sprint, we basically spent the week with a really bad network connection. We only had around 256kbps of bandwidth there, and other people in the hotel were using the network. We couldn't track down who among us was hammering it, so it must have been someone around or on another floor. We even had people come right into the room we were having the sprint in and sit down and start computing away...

We used a hex key, because there are two different algorithms for converting a text key into a hex key. So, we just used a couple of hex digits repeated 5 times. Mostly just to keep the random other people off the network. Probably better on the sprint days, when there were other events going on and we didn't have the WEP key posted anywhere.
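To make the key format concrete: a 40-bit WEP key is 5 bytes, or 10 hex digits, so two hex digits repeated 5 times fills it exactly. The sketch below assumes a 40-bit key, which the text implies but does not state. The byte value `a5` is a made-up placeholder, not the key used at the conference.

```python
import string

WEP40_KEY_BYTES = 5  # 40-bit WEP: 5 bytes = 10 hex digits (assumed key length)


def repeated_hex_key(byte_hex, key_bytes=WEP40_KEY_BYTES):
    """Build a WEP hex key by repeating one byte (two hex digits)."""
    if len(byte_hex) != 2 or any(c not in string.hexdigits for c in byte_hex):
        raise ValueError("expected exactly two hex digits")
    return (byte_hex * key_bytes).lower()


key = repeated_hex_key("a5")  # "a5" is a placeholder value
print(key)                    # a5a5a5a5a5
print(len(key) * 4, "bits")   # 40 bits
```

Entering the key as hex sidesteps the question of which passphrase-to-key algorithm a given client driver implements.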

Though the shaping probably would have prevented problems with random users abusing our network, it was nice to make sure our scarce bandwidth wasn't spread any more thinly.

The Problems...

It wasn't completely rosy though. We did have some problems.

The first problem we deliberately created. For the tutorials we put out too few APs, to see how they would work under the weight of many connections. It didn't work super well was the answer. At lunch I doubled the number of APs, and that solved those issues.

The bigger problem became apparent on the tutorial day though... We had based our predictions of network usage on two false assumptions: that the number of attendees was going to be roughly the same as the previous year, and that the bandwidth usage would be similar.

Tuesday morning I realized that basing our bandwidth speculation on the previous year's usage was just wrong. The previous year, something around 82% of the attendees had severe problems connecting, and therefore they weren't sucking up the bandwidth. Last year we had 3mbps of bandwidth (again, shared with the hotel), this year 4.5mbps. Next year I'm recommending we get 10 or preferably 20mbps.

Another problem we had was that someone was wandering around with their laptop set up on our ESSID, with our WEP key, running in AdHoc mode. Meaning that users close to that person would associate with that user instead of our network. This was reported to me on Friday, but it wasn't until late Saturday afternoon that he found me and asked for help getting connected.

The problem here was a faulty network configuration program. The system was running Windows, with this busticated NetGear program for configuring the wireless. You'd tell it you wanted to connect to an existing network, and it would set itself up in Ad-Hoc mode. It wouldn't tell you that it was doing this; you had to dig through the advanced information. You also couldn't change this there; you had to use the "expert" configuration to tell it not to do that.

The number one problem we had with users connecting to the network? It's called a "hardware radio switch". That's right, our most serious problem was with users who had their laptop's firmware configured to disable the wireless radio. This is like "airplane mode" on cell phones. From the software, it looks like the WiFi card is working, but it can never associate.

A Mac user, Stephan Deibel, reported that setting the "Use interference robustness" option helped. It was impossible, even searching on the Internet, to find details on what exactly this option configures. I speculated that it might reduce the "RTS" setting, but it was impossible to tell for sure.

Another problem was APs getting unplugged. The worst problem was with them getting unplugged from the Ethernet, because then users would still try to associate with them, but it wouldn't work. That only happened once. The APs had the ability to watch the Ethernet link and disable the wireless if they got disconnected. I didn't set that up because I didn't have a chance to test it before the show. I had initially expected to run a number of the APs not connected to the Ethernet, so I couldn't have used that anyway.

3 or 4 times an AP was unplugged from power, probably from people banging into them or the like. We didn't, in most cases, have safe places to mount the APs; I just set up a chair for them to sit on.

The last problem was with our shaping. I had set up fancy shaping using the HTB shaping rules. This should have allowed users to burst up to the full line speed if capacity was available, but push heavy users down to 128kbps as others used bandwidth. In my testing at home, it worked exactly as I had hoped. However, at the conference it eventually became clear that it was just restricting all users to 128kbps.

I've been over it several times, and as far as I can tell, this was a bug in the Linux traffic shaping.
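For readers who haven't used HTB: the behaviour described above maps to a class with a low guaranteed `rate` and a high `ceil` that it may borrow up to. The sketch below only prints `tc` commands for review. It is not the configuration that ran at the conference, and the interface name `eth1`, the client addresses, and the class numbering are all assumptions.

```python
LINE_KBIT = 4500       # this year's uplink, per the article
PER_USER_KBIT = 128    # rate heavy users should be pushed down to
DEV = "eth1"           # assumed LAN-facing interface name


def htb_commands(client_ips, dev=DEV):
    """Return tc commands: one HTB leaf class and one u32 filter per client IP."""
    cmds = [
        f"tc qdisc add dev {dev} root handle 1: htb",
        f"tc class add dev {dev} parent 1: classid 1:1 htb "
        f"rate {LINE_KBIT}kbit ceil {LINE_KBIT}kbit",
    ]
    for n, ip in enumerate(client_ips, start=0x10):
        minor = format(n, "x")  # tc parses class minor numbers as hex
        cmds.append(
            f"tc class add dev {dev} parent 1:1 classid 1:{minor} htb "
            f"rate {PER_USER_KBIT}kbit ceil {LINE_KBIT}kbit"
        )
        cmds.append(
            f"tc filter add dev {dev} parent 1: protocol ip prio 1 u32 "
            f"match ip dst {ip}/32 flowid 1:{minor}"
        )
    return cmds


# Hypothetical client addresses, for illustration only.
for line in htb_commands(["10.20.0.50", "10.20.0.51"]):
    print(line)
```

Here `rate` is what a client is guaranteed and `ceil` is what it may borrow up to when the link has spare capacity. If every client ends up pinned at `rate`, the borrowing path is the place to look.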

Considering the overall scarcity of the bandwidth, it was probably for the best that users were limited, providing fair sharing even when users were hitting the network. Like the one person I saw in Guido's keynote, who was streaming a Google video of another talk Guido gave, and ignoring the streaming video.

I was mostly concerned about users with a virus, worm, or doing file sharing swamping the bandwidth. This is based on my experiences at coffee shops with users doing this and just killing the network because they were swamping the fairly limited outbound bandwidth. It's easy to bring a coffee shop network to its knees by just sending 50 to 90KB/sec outbound.

We never had an instance during the conference where the network was bad because someone or a few people were hammering it. So, in general I'd call it a success.

DHCP

We had our own router, running NAT to the hotel network. On this server we ran our own DHCP. Our private network was a /22 network of 1024 addresses. I set up DHCP to give out 760-ish of these in a pool of dynamic addresses, with an 8 hour lease time. I set aside another 250-ish of these addresses to be outside the pool.

I used the "glabel" program to print out a bunch of slips of paper, one of each of these 250 IPs, including DNS, netmask, and gateway information. People who reported problems getting an address were given one of these slips of paper, effectively giving them a lease on an address.
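The address split and the hand-out slips can be sketched with Python's standard `ipaddress` module. The `10.20.0.0/22` network, the exact 760/250 boundaries, the choice of the first host as the gateway, and the slip layout are assumptions for illustration. The article only gives the "-ish" sizes, and the slips themselves were printed with glabel.

```python
import ipaddress

NETWORK = ipaddress.ip_network("10.20.0.0/22")  # hypothetical private /22

hosts = list(NETWORK.hosts())     # 1022 usable addresses out of 1024
gateway = hosts[0]                # assume the NAT router takes the first one
dynamic_pool = hosts[10:770]      # 760 addresses handed out by DHCP
static_block = hosts[770:1020]    # 250 addresses kept outside the pool


def slip(ip):
    """Text for one hand-out slip with everything needed for manual setup."""
    return (
        f"IP address: {ip}\n"
        f"Netmask:    {NETWORK.netmask}\n"
        f"Gateway:    {gateway}\n"
        f"DNS:        {gateway}\n"  # the DNS cache ran on the NAT router
    )


print(len(dynamic_pool), "dynamic,", len(static_block), "static")
print(slip(static_block[0]))
```

Keeping the static block outside the DHCP range is what makes a paper slip safe to hand out: the server never leases those addresses to anyone else.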

One person suggested that 8 hours was way too long a lease and that we'd probably run into problems because of this long lease time. I explained that we had way more IPs available in the DHCP pool than we had attendees, so the long lease time shouldn't be a problem. As far as I know, it never was. Based on a review of the logs, we never allocated all the IPs available in the pool.

I'm not surprised that we had more DHCP leases than we had attendees. I figured there would be some attendees with more than one wireless device, with cell phones and PDAs having it now.

DNS

I set up dnscache on the NAT router and published this machine as the DNS server. I've had incredibly good luck with running dnscache in the past, and in particular have found it to work well with little memory usage. This was running on a small machine with only 256MB of RAM, and was easy to set up, so I threw it into the mix.

Transparent HTTP Proxy

When it became apparent that we didn't have nearly enough network bandwidth, I tried setting up a Squid transparent proxy. The iptables REDIRECT target was just never matching, despite my double-checking the rules 4 or 5 times. I finally gave up on it. I believe this may have been related to the bug causing the shaping not to work properly, because I've successfully set up transparent proxy before, and several other references I checked showed that I was doing it right.

I would consider setting up a proxy if I did this again in the future, but I'd probably use Apache to do it. I've found Squid to be very complicated to configure; Apache is much easier to deal with.

The AP Setup

Last year the hotel provided around 4 APs. This year we had 24 APs with 12 more in reserve. These were actually 12 "dual channel" APs, each with both an 802.11a and an 802.11b+g AP built in. The remaining 12 were because I ordered the wrong model initially, and since I had 30 days to return them I decided to just ship them to the conference in case we ended up needing them.

I didn't know initially if we'd get access to the hotel wiring infrastructure, so one of the options was that we could set up the dual-channel APs to run meshing on 802.11a as our backbone to distribute the other APs around for 802.11b+g as primary access.

We did end up getting access to the wiring infrastructure of the hotel, so we ran all the APs in dual AP mode, acting as both an 802.11a and 802.11b+g AP.

I set up 802.11b+g to run in the second to lowest power mode, using the 3 available non-overlapping channels. I also mounted the APs around 2.5 feet above ground, so that people's bodies would absorb the signal and help reduce interference between adjacent APs on the same channel. I tried to organize the APs such that APs on the same channel were not close together.

802.11a has like 9 non-overlapping channels, so I set up APs on different channels as much as possible, and ran 802.11a in its highest power setting.
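Keeping same-channel APs apart is essentially a graph-colouring problem with only three colours on 802.11b+g. The following is a minimal greedy sketch, not the procedure used at the conference. The AP names and the neighbour map are a hypothetical 2x3 layout, and channels 1, 6, and 11 are assumed to be the three non-overlapping ones.

```python
CHANNELS_24GHZ = (1, 6, 11)  # assumed non-overlapping 802.11b+g channels


def assign_channels(neighbors, channels=CHANNELS_24GHZ):
    """Greedily give each AP the channel least used by its assigned neighbours."""
    assignment = {}
    # Handle the APs with the most neighbours first; they are hardest to place.
    for ap in sorted(neighbors, key=lambda a: len(neighbors[a]), reverse=True):
        used = [assignment[n] for n in neighbors[ap] if n in assignment]
        assignment[ap] = min(channels, key=used.count)
    return assignment


def conflicts(neighbors, assignment):
    """Neighbouring AP pairs that ended up on the same channel."""
    return sorted(
        (a, b)
        for a in neighbors
        for b in neighbors[a]
        if a < b and assignment[a] == assignment[b]
    )


# Hypothetical layout: which APs are close enough to interfere.
neighbors = {
    "front-left": {"front-center", "rear-left"},
    "front-center": {"front-left", "front-right", "rear-center"},
    "front-right": {"front-center", "rear-right"},
    "rear-left": {"front-left", "rear-center"},
    "rear-center": {"rear-left", "rear-right", "front-center"},
    "rear-right": {"rear-center", "front-right"},
}
plan = assign_channels(neighbors)
print(plan)
print("conflicts:", conflicts(neighbors, plan))
```

With only three channels a dense layout will eventually force neighbours to share one, which is where lowering transmit power helps. The larger 802.11a channel set makes the same assignment much easier.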

After the main conference, I set up two of the APs to use the 802.11a radios in WDS mode, and put the WDS client AP out by the lobby (where we didn't have a wired port), to provide repeater service for users in the lobby, bar, and restaurant. "Mission-critical bar coverage has been set up" I joked.

So, in the end we had a dozen 802.11a APs, and another dozen 802.11b+g APs, to cover the conference.

RTS?

I had asked Jamie Gansead of ThinAirNet, who just sold off his 802.11b-based terrestrial wireless ISP, to review my plan for the wireless network. His biggest suggestion was to set the "RTS" parameter low. However, this is a client-side setting, not something I can push from the server side.

802.11a/b/g wireless works by listening on the channel to see if anyone else is sending, and if the packet is below the RTS, and the radio doesn't hear anyone else sending, it will send the packet. However, if there are two clients that can both see the AP but can't see each other, this mechanism doesn't work.

If the packet is larger than the RTS, the client will ask the AP to reserve a time for it to send the packet, the AP will announce to everyone within hearing that the radio is reserved, and the client will send. This extra overhead hurts in small networks where all the clients can see each other most of the time, but really helps when you have users who can't see other stations.

The default RTS setting is the maximum value. I noticed some high latency on my laptop, and then set RTS down to the minimum and the latency dropped way off.
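The two ideas above, the size threshold and the "hidden" client, can be illustrated with a short sketch. This is a toy model, not a simulation of the 802.11 MAC. The value 2347 is assumed here as the usual "never use RTS" default, and the station names and who-hears-whom map are made up.

```python
RTS_OFF = 2347  # assumed default threshold; larger than any frame, so RTS is never used


def uses_rts(frame_bytes, rts_threshold=RTS_OFF):
    """A station sends an RTS first only for frames larger than its threshold."""
    return frame_bytes > rts_threshold


def hidden_pairs(can_hear, ap="ap"):
    """Pairs of clients that both hear the AP but cannot hear each other."""
    clients = sorted(s for s in can_hear if s != ap and ap in can_hear[s])
    return [
        (a, b)
        for i, a in enumerate(clients)
        for b in clients[i + 1:]
        if b not in can_hear[a]
    ]


# Hypothetical room: alice and bob sit at opposite ends of the ballroom.
can_hear = {
    "ap": {"alice", "bob", "carol"},
    "alice": {"ap", "carol"},
    "bob": {"ap", "carol"},
    "carol": {"ap", "alice", "bob"},
}
print(hidden_pairs(can_hear))              # [('alice', 'bob')]
print(uses_rts(1500))                      # False: default threshold, no RTS
print(uses_rts(1500, rts_threshold=256))   # True: low threshold, RTS/CTS used
```

Every pair that `hidden_pairs` reports is a pair whose transmissions can collide at the AP without either side noticing, which is the case a low RTS threshold is meant to cover.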

Again, this wasn't something I could centrally dictate though. I set up a wireless networking page in the Wiki which included this hint, but I don't know how many users actually used this setting.

We used the D-Link DWL-7200 AP, which is a less than $200 "enterprise" access point. It had some problems, like enabling the "Load Balancing" feature seemed to cause it to break, even if the number of associations was below the specified limit. Also, every page you changed in the configuration required that you reboot the AP, meaning every page of changes required 30 seconds to save.

It included the ability to save off and load a config file, so I just made a base config that I saved, then uploaded that config to a new AP and changed a couple of values including IP address and channel, so it went much faster. The config file was text format, meaning in theory I could have changed it in a text editor, but there was a checksum value at the end that I figured would have caused that to break. Dang.

I had originally brought in a couple of Proxim "enterprise AP"s, but the sheer price of these wouldn't have allowed me to get my target number of APs (15) within my budget. The Proxim APs seem to have a few more features, including a nice "mesh" auto-configuration that would have been nice if we ended up doing meshing for the backhaul. However, they also took a good 20 to 30 seconds per page of changes to make. The web interface was amazingly slow.

The Proxims cost $450 each new (compared to $180-ish for the DLink). I had originally looked at ebay and found they could be had for under $200, but there just weren't enough of these auctions available in the weeks leading up to the conference for me to have gotten them at this price. At full price, the Proxims alone would have been over twice my budget for the number of APs I was hoping to get.

Costs

I spent around $800 getting hardware in to evaluate. This included 4 Proxim APs (of which one was just broken, one was somehow a dual 802.11a AP and didn't have the b+g radio) and would have covered getting 2 of the DLinks. In the end I just ordered a dozen of the DLinks because time was getting short. I had a 1.5 week long business trip a week before PyCon which blew my schedule.
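The price comparison works out roughly as follows. The unit prices and AP counts are the ones quoted in this article, with the "$180-ish" D-Link price taken as exactly $180 for the sake of the arithmetic. The budget itself is not stated, so it is left out.

```python
PROXIM_EACH = 450   # new price, per the article
DLINK_EACH = 180    # "$180-ish" in the article, rounded here

TARGET_APS = 15     # the original target number of APs
ORDERED_APS = 12    # the number of D-Links actually ordered

proxim_total = PROXIM_EACH * TARGET_APS   # 15 * 450
dlink_total = DLINK_EACH * ORDERED_APS    # 12 * 180

print(f"15 Proxims at the new price: ${proxim_total}")
print(f"12 D-Links:                  ${dlink_total}")
print(f"Difference:                  ${proxim_total - dlink_total}")
```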

Total cost spent on APs was around $2200 for the APs we used for the conference. Time required to architect the network, evaluate and select the hardware, and set up, test, and deploy the network was around 70 hours.

Next Year

The hotel in Chicago where we are holding PyCon next year has already said, flat out, that they will not allow us to hook our network gear to their network. We are trying to make sure they understand the magnitude of the job they are taking on. They are also offering an SLA.

As a contingency, we are keeping the wireless gear from this year, so if the poo hits the fan next year we will have the hardware on hand to fix it, even if the hotel can't. We are also getting competitive quotes from other providers for what it'll cost to bring in our own network line for the event. Hotels charge tens of thousands of dollars for providing a network, and a full DS-3 (45mbps) can cost $2500/month for a year term, so we may be in the ballpark. Especially if we are going to use the connection two years in a row (paying off the build-out for the first year over two events).

I suspect the hotel isn't up to dealing with the networking requirements of PyCon because they have told us they are using T1 lines. 10 years ago, a T1 line was a mighty Internet connection. Today, it's pretty sad. I have over 6x that bandwidth into my house. My recommendation for bandwidth next year would require 7 to 14 T1s, which is just silly.
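The "7 to 14 T1s" figure follows from dividing the recommended 10 to 20mbps by the usable capacity of one T1. A small sketch, assuming the usable payload of a T1 is 24 channels of 64kbps (1536kbps) and that lines can be bonded with no overhead:

```python
import math

T1_PAYLOAD_KBPS = 24 * 64   # 24 DS0 channels of 64 kbps = 1536 kbps usable


def t1s_needed(target_mbps):
    """Number of bonded T1 lines needed to reach target_mbps."""
    return math.ceil(target_mbps * 1000 / T1_PAYLOAD_KBPS)


for mbps in (10, 20):
    print(mbps, "mbps ->", t1s_needed(mbps), "T1s")
```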

I also hope that the hotel networking folks at least read this to get an idea of what they're committing to. I'm also going to ask them to run their network design past me, just so we can ensure their plans are mighty enough to handle the reality of PyCon.

In Conclusion

I went into this saying "It'd be hard to make it as bad as last year." I was hoping for a perfect network experience for all, but admit I didn't quite meet that. The biggest issues were the upstream network and the shaping that didn't work as I'd hoped.

However the "better than last year" target was undeniably achieved.

If I were to do it again, I'd add even more APs, particularly in the center of the ballroom (I had them around the outside). My initial estimate of 15 APs was probably spot on; I reduced it to 12 to try to stay in budget, despite AMK saying that I could go over budget. Last minute FedExing cost nearly 20% of my budget and was responsible for nearly my entire overrun.

The big applause when AMK thanked me for the networking made all the time and effort worth it. But, to be honest, the reduction in problems over last year, which meant that I could spend more time enjoying the conference, was the big payoff. I'll admit I was kind of floored by AMK saying in the closing address "If Sean can do this for the conference, imagine what tummy.com can do for your business." That was kind of him. People stopping me in the hall and saying "Yay" was also quite uplifting.

Literally, I spent most of my time on the network helping people who were having problems with their machine rather than the problem being related to the network.

So, overall I'd call it a total success, though with a few things I'd do differently.

Shameless Plug

tummy.com has smart people who can bring a diverse set of knowledge to augment your Linux system administration and managed hosting needs. See the menu on the upper left of this page for more information about our services.