[nycphp-talk] session variables: seven deadly sins

Allen Shaw wrote:
> Really? That's a surprising assertion, though I'll agree my surprise
> probably comes more from my own lack of insight than a flaw in your
> argument. Of course a quick google shows a few people hold that
> session vars are "evil," but I can't find much to back up the idea.
>
> Can you elaborate, or give us a few links on the topic?

I'll try to reply to this and some other people who replied to my previous message.

I'll start with my background. I've often been the person who the buck stops with -- somebody else develops an application that almost works (perhaps even puts it in production) and then I have to clean up the mess. The app might be written in PHP, Java, Cold Fusion, Perl, you name it. I've learned to see session variables as a "bad smell".

When I develop my own applications, I use cookies for personalization and caching. I use the authentication system described in

http://cookies.lcs.mit.edu/pubs/webauth:sec10-slides.ps.gz

This mechanism can carry a "session id", which in turn can be used as a key against application state stored in a relational database. I think through the boundary cases, and find that my greenfield apps behave predictably -- my only woe is that browsers have a lot of undocumented behavior connected with cookies, form handling, and caching.
These are all problems that you still need to fight with if you use sessions; see the comments for

http://www.php.net/manual/en/function.session-cache-limiter.php

----

The context of this is that the average web application is poor in the areas of usability and security: recent studies show that 80% of web applications have serious security problems

http://www.whitehatsec.com/home/resources/presentations/files/wh_security_stats_webinar.pdf

Jakob Nielsen's website has been chronicling the sorry state of web application usability:

http://www.useit.com/

Perhaps the top 20% of programmers can write applications with $_SESSION that don't have serious security and usability problems, but what about the other 80%?

----

(1) Session variables are treacherous.

Odd things can happen in boundary cases, such as when sessions expire, or when you are targeted by session fixation attacks.

http://shiflett.org/articles/security-corner-feb2004

I've looked at many apps that use sessions that seem to be working... until you walk away for two hours, come back, and discover that you're logged in as somebody else. I suppose I could have spent hours or days tracking down an intermittent problem, which involved some confluence of browser oddness (IE was fine, Firefox was screwy), the behavior of the session system, and crooked logic in the application. Or I could use cryptographically signed cookies to implement an authentication system which won't give me surprises in the future.

Anybody can write applications that work 95% of the time with $_SESSION. Getting the other 5% right requires a deep understanding of state and statelessness on the web... which is what (many) people are trying to avoid when they use $_SESSION variables.

There are more than twenty configuration variables that affect the way sessions work under PHP. Incorrect configuration of any of these can cause applications to fail, often in intermittent ways.
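
To make the "cryptographically signed cookies" idea in (1) concrete, here is a minimal sketch. It is not the exact scheme from the slides linked earlier, just one plausible shape of it, assuming PHP 7.1 or later. The function names, the "user|expiry|mac" layout, and the SECRET_KEY constant are all made up for illustration; hash_hmac() and hash_equals() are the only library calls it relies on.

```php
<?php
// Hypothetical names throughout; SECRET_KEY is a placeholder, not a real key.
const SECRET_KEY = 'replace-with-a-long-random-server-side-secret';

// Build a cookie value of the form "user|expiry|mac".
// Assumption: user names never contain the '|' separator.
function make_auth_token(string $user, int $expires, string $key = SECRET_KEY): string
{
    if (strpos($user, '|') !== false) {
        throw new InvalidArgumentException('user name must not contain "|"');
    }
    $payload = $user . '|' . $expires;
    // The MAC covers both the user name and the expiry time.
    return $payload . '|' . hash_hmac('sha256', $payload, $key);
}

// Return the user name if the token is intact and unexpired, otherwise null.
function verify_auth_token(string $token, int $now, string $key = SECRET_KEY): ?string
{
    $parts = explode('|', $token);
    if (count($parts) !== 3) {
        return null;
    }
    [$user, $expires, $mac] = $parts;
    if (!ctype_digit($expires)) {
        return null;
    }
    $expected = hash_hmac('sha256', $user . '|' . $expires, $key);
    // hash_equals() compares in constant time, so the check leaks no timing hints.
    if (!hash_equals($expected, $mac)) {
        return null;
    }
    return ((int) $expires >= $now) ? $user : null;
}
```

A login page would send this with something like setcookie('auth', make_auth_token('alice', time() + 3600)), and every later page would call verify_auth_token($_COOKIE['auth'] ?? '', time()). Nothing is stored on the server, so there is no session to expire, lock, or misconfigure; the price is that you cannot revoke a token before its expiry without extra state.
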
The use of a custom session handler can have unpredictable effects on security, reliability and performance.

Other languages are a lot worse than PHP -- the use of the "scope" concept in languages such as Cold Fusion and Tango makes it easy to use a session variable without realizing it... resulting in an application that "works" sometimes, but fails in mysterious ways.

(2) Session variables are bound to a particular language.

In the real world, I work with legacy systems that might be written in other languages. I might have some old pages in Cold Fusion that work just fine, and I won't rework them in PHP until I've got a good reason. If users can set a customization parameter, such as the background of a page, it's easy to write a cookie that all languages can read. Applications stuck in the session variable roach motel aren't as maintainable and portable.

(3) PHPSESSID.

Do I need to say more? Consider the client that wants user tracking and can't accept cookies, so all the pages on their site look like

http://www.example.com/about_us.php?PHPSESSID=**pseudo-random blob**

Three months later they come back and wonder why their site isn't being indexed in Google. Yes, there's a saner way to use this feature, but this "cure" to privacy violation is worse than the cookie "disease", since session ids will leak out through referrers, bookmarks, links that people cut-and-paste...

(4) The back button.

When somebody asks a question about sessions on a forum, they'll usually ask another question a few days or weeks later: "How do I disable the back button?"

The underlying problem is a deep aspect of the structure of the web. There is certain state information that's particular to a request (GET and POST variables) and certain state information that has a more persistent scope (cookies, session information, a relational database). The back button makes it possible for these two things to get out of sync. Ultimately, we need a systematic strategy to deal with this.
One pattern is to put the complete state of the application in form variables. Applications that use this pattern always work perfectly with the back button. This pattern doesn't always work (hitting the back button shouldn't cancel your order on an e-commerce site), but it works often... for instance, you can use hidden variables to hold onto form variables for complicated forms that spread over several pages.

(5) Multiple windows.

I think it's a human right to be able to have more than one window open on a web site. If I'm shopping, for instance, I'd like to be able to look at two products simultaneously. An application that keeps state in form variables doesn't care how many windows you have open.

If you're looking for jobs at an organization that uses taleo.net's software, you'll find that it uses trickery to prevent you from having more than one window open... so you can't look at two jobs at once, or look at the job description while you're filling out the application. I suspect that they did this because they don't want to spend forever debugging "race conditions" that could be caused by a user acting in two windows simultaneously.

Session variables introduce problems of locking. PHP gets an exclusive lock on the session for each page displayed. This hurts the performance of pages that use dynamically generated images and JavaScript, and can mysteriously deadlock AJAX applications.

(6) Scalability, Reliability, and all that.

This is a tricky one, because it depends on particulars. Sessions can be lightning-fast in systems that keep them in RAM, such as Java and Cold Fusion. The default session handler in PHP uses files, and is probably faster than a relational database in a direct comparison; however, the session handler will load all of the data into RAM, whereas a relational implementation may only need to load information when it's needed.
Keeping information in POST variables or cookies also involves a tradeoff -- this is as scalable as it gets as far as server resources go, but requires that the state be passed back and forth between the browser and server. This is no big deal if the state is 500 bytes. It's unacceptable if the state is 500 megabytes. In most cases, it starts looking expensive when we're passing an extra 10k-100k around.

I've recently been working on a legacy app that contains a query function (select a subset of items) and a reporting function (display user-selected fields of those items). The interface between those modules is simple: the query system passes a comma-separated list of item identifiers to the reporting system. I liked this, because it meant that one system could be changed without affecting the other. I had to update the app so it would work with a changed database schema, so both sides needed some work.

I discovered that the app was passing the item list as a session variable. This worked, unless I was using the application in two windows at a time. In that case, a query in one window would change the report delivered in another window.

I thought about it, and realized that in this case, result sets would always be under about 10k, and usually be around 1k. Therefore, it made sense to pass this as a hidden variable in the form and ditch the session variable.

This shows the kind of problems that regularly turn up in the applications that developers "throw over the wall" to testers and clients. Choose a session variable, and your application behaves mysteriously for a user who didn't respect the "one window at a time" assumption you made. Passing hidden variables in forms, on the other hand, might work OK when you're testing with a small data set over a LAN, but could rapidly become a performance nightmare for dialup users using a production database.
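
For what it's worth, the hidden-variable version of that item list can be sketched in a few lines. This is an illustration, not the legacy app's actual code: the function names and the "items" field name are invented, and it assumes the item identifiers are non-negative integers.

```php
<?php
// Hypothetical helpers; "items" is an assumed field name.

// Query side: render the item ids as a hidden form field.
function item_list_field(array $ids, string $name = 'items'): string
{
    $value = implode(',', array_map('intval', $ids));
    return '<input type="hidden" name="' . htmlspecialchars($name, ENT_QUOTES)
         . '" value="' . htmlspecialchars($value, ENT_QUOTES) . '">';
}

// Reporting side: parse the posted value back into integer ids,
// silently dropping anything that is not a plain run of digits.
function parse_item_list(string $posted): array
{
    $ids = [];
    foreach (explode(',', $posted) as $piece) {
        $piece = trim($piece);
        if ($piece !== '' && ctype_digit($piece)) {
            $ids[] = (int) $piece;
        }
    }
    return $ids;
}
```

The reporting page would call parse_item_list($_POST['items'] ?? ''). Each window carries its own list, so two windows can't step on each other. Keep in mind that the browser can send back any list it likes, so the reporting side still has to check that the user is allowed to see each item.
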
Performance can be improved in a number of ways: for instance, by delta-sigma compressing the item list, or creating a "form scope" variable that's keyed against a unique identifier in the form. Either way, quality web applications take quality thought.

(7) Lack of engineered application state:

Engineered Application State is the gem of database-backed web applications. If you keep the state of your application in a relational database, you need to ~design~ the state of your application. You need to ~think~ every time you add or change a table in your relational database. With session variables, by contrast, you can add a new variable to your application as easily as typing '$'.

Desktop apps keep the application state in a tangle of pointers. C and C++ applications tend to contain 5 or more defects per thousand lines of code. Errors show up in data structures over time, just as mutations occur in your cells. Memory leaks, application hangs, and crashes are cancers caused by these mutations.

PHP apps die at the end of each request, and are reborn for the next request. They don't accumulate errors over time. Web application environments such as Java and Cold Fusion that involve a long-running process regularly hang or crash and require restarts. When was the last time you had to restart PHP?

A database protects you from errors in multiple ways. Transactions, for instance, protect against data corruption caused by crashing scripts. It's easy to write

$_SESSION["logged_in"]=true;

in one place and

$_SESSION["logged-in"]=false;

in another, introducing unpredictable behavior and security holes. A relational database will give you an error if you try something like that.

-------------

Can users of $_SESSION avoid the seven deadly sins? Yes. In practice they don't.