That's pretty much the size of it. We aim for 'non-disruptive enhancement' around here. That means: no Friday installs of anything that has a Customer-facing component. Of late this has become problematic, since we have a sizable User base distributed around the Globe. Someone is always inconvenienced, no matter when we install, so we have settled on Developer Convenience as the determining factor. This means that the last five or six Product releases have occurred on Wednesday evening between 1800 and 2000.



The Service DOES NOT GO DOWN. The new Service (Client enhancement, bug fix and code maintenance, new database features, etc.) runs in parallel with the Old for a period of weeks. (Or a period of years, in the case of Client code. We have Client software in the Wild that is over eight years old. The feature set is still supported; we don't accept bug reports on it, however.)
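Running old and new side by side usually means something at the front door routes each client to the right version. A minimal sketch of version-keyed routing, purely illustrative — the backend names, header, and versions here are mine, not the poster's:

```python
# Hypothetical sketch: pick the old or new service per request based on a
# version header, falling back to the old, known-good backend. Names are
# illustrative only.

BACKENDS = {
    "v1": "old-service:8080",   # long-lived legacy backend
    "v2": "new-service:8080",   # this release, running in parallel
}

def route(headers: dict, default: str = "v1") -> str:
    """Pick a backend by the client's declared API version."""
    version = headers.get("X-API-Version", default)
    # Unknown or missing versions fall back to the old service,
    # which is what lets eight-year-old clients keep working.
    return BACKENDS.get(version, BACKENDS[default])
```

The key property is the fallback: retiring the old backend is then a routing change, not a client upgrade.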



While there is a committee involved in the planning and scheduling of a Change, there is a single Change Captain. The Change Captain has final authority to say 'Oops. Back it out'.



We try to give our User Community a reasonable estimate of when a new feature set will be available. We plan to have it in play four to six hours before the announced go-live. This gives us a little 'final-checkout' time. Our Users know this, and so we sometimes get 'early adopters'. We don't discourage this.



The Usual Time Line:

Two weeks before the date -- feature freeze.
One week before the date -- code freeze; QA begins regression testing.

On the day:
1300 -- Final Change Review meeting -- are we really ready?
1600 -- New code/hardware active and checked out
1800 -- Go-time -- everybody involved gathers in a conference room and watches the logs and monitors
1810 -- Pizza delivered (on the Project Manager's nickel)
1930 -- More pizza, this time with beer (ditto; there is a line item in the project budget for this)
2000 -- Go-live for the Users; ice cream arrives
Afternoon of the following day -- Post Mortem

This seems to be a working method; it has served for the past fourteen months. We have only had one release aborted by the Change Captain -- when it was announced in the 1300 'final readiness' meeting that the primary power system to one of the co-location facilities had failed at 0300, and we were on standby generators. The power vendor 'expected to have it back online by close-of-business today.' The CC said, "That's nice. We ain't going until the generator has been up for at least 12 hours." We slipped the install a day. ----

I Go Back to Sleep, Now. OGB

Some of these issues have to do with your specific job and the nature of the systems. For instance, when I worked for a university, if a system change would require downtime, we never started before 6pm on a Friday for a planned change, so we had the whole weekend to clean up if things went wrong.* Things were put into place on Friday night, stakeholders got to review the system on Saturday, and sysadmins and programmers had to get it working by 6am on Monday. If it was something that could be done with a quick cutover, I'd prep everything in advance, get signoff on it in testing, and then cut it over at 7am on a Monday morning (specifically because people came in late on Mondays, so the trouble calls came in slower**).

These days, my work is international, and as I'm a contractor, they don't like me working overtime or odd hours. So, I'm a firm believer in the Tues-Thurs window. We had a specific rule of NO system changes after noon on Fridays. For some types of changes, we have to wait for specific gaps that occur. For other changes, we do 'em late morning, as most of Europe's left for the day, and many of our West Coast users aren't early risers. We then have people around for a full day of debugging, should something go wrong. So, I'd have to say my normal window these days is Tuesday-Thursday, 10am-1pm (if it's a local change, and not going to affect the European folks, 8am-10am).

* The 6pm rule, unfortunately, is what got us into problems when our management wouldn't let us take down a mail server when we noticed it was having disk problems, and resulted in us spending 16 days around the clock trying to get things working (and by 'working' I mean e-mail was lost for all students) ... because management didn't want to react when we noticed the problem (a little past noon), and the system failed in a cascading manner at about 4:30pm.
** Of course, that also resulted in a problem one day when I got all of the signoffs, but something wasn't transferred cleanly ... and as no one noticed until about 10am, I had to manually merge changes, which took me about 2 days ... normally, not a big deal, but the cutover was just to buy me 2 weeks to finish a project ... which then spiralled out of control, and took years to complete. (Of course, I had been fired for 'use of sarcasm', and as the lead on the project, that might've explained why they were delayed by 3+ years (well, that, and bringing in a 'third party' to review the system, who didn't understand our business needs or the software we were using, which added the first year of delay, and may have resulted in my sarcastic attitude).)
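The "Tuesday-Thursday, 10am-1pm" rule above is simple enough to encode as a pre-deploy guard. A minimal sketch, assuming the day and hour bounds from the post; the function name and the idea of wiring it into a deploy script are mine:

```python
# Hypothetical guard for a deployment window. Days/hours mirror the
# "Tues-Thurs, 10am-1pm" rule described in the post; naive local time
# is assumed for simplicity.
from datetime import datetime

def in_change_window(now: datetime,
                     days=(1, 2, 3),        # Tue, Wed, Thu (Monday == 0)
                     start_hour=10, end_hour=13) -> bool:
    """True if `now` falls inside the allowed deployment window."""
    return now.weekday() in days and start_hour <= now.hour < end_hour

# A deploy script might refuse to run outside the window, e.g.:
# if not in_change_window(datetime.now()):
#     raise SystemExit("Outside the change window -- not deploying.")
```

For the "local change" variant (8am-10am), the same function works with different bounds.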

simon.proctor++ ... that's pretty close to how we handle it. Sometimes we have pushes that would impact customers (the database needs to go down because the change can't be staged pre-live), so we also evaluate the impact on the end-user. If it's too much of an impact, we co-ordinate to do the push off-hours (weekend or after biz hours). I'd say more than 90% of our pushes follow your rules, and about 2-3 times a year I can expect to be working on a Saturday. -derby

I'll second that... much the same where I am.

One added thing - whenever possible (and it's not always possible), I try to have the "live" system up and running for a day before announcing it. One last chance to check for... unforeseen consequences :-) -- WARNING: You are logged into reality as root.

I did support for a Fortune 500 company, and typically all major changes were done on the weekend. There were two reasons for this. It supposedly gave the developers and admins extra time to test the environment without everyone banging on it during the work day; they would bring in employees on Saturday/Sunday to help with testing the implementation.



The real reason was that the Production environment was so convoluted that there was not a good 'test' environment. The only way to make sure everything was going to work was to push it into Production and try patching and fixing things as they broke. What ended up happening was that all the basic stuff got checked in Production, and they went with that. Come Monday, as the users got onto the system, a ton of bugs were consistently found, which usually caused corporate-wide problems. I have not yet worked with a company that has been consistent about how it rolls out code into production, but usually it was in the evening hours because that interrupted fewer people. Those systems typically had little testing, though, so the rollout always resulted in major problems the next day.

If it were realistic, I would push the end time back to 15:00. If you are writing from a (purely) development side, then that potentially leaves you with other teams in the mix (say, database management and operations teams). You may not have any access at all to any live machines, and you are reliant on these other departments to deploy your changes. That means you are reliant on them to help fix errors and roll back. Also, as you mention DNS, it sounds like web development, and if you are UK based, then one of your peak traffic periods may be 16:00. And all this ignores the times when you are implementing major changes (which might include hardware changes as well as software) where downtime is unavoidable (or highly likely). Then you must consider the user/customer base first and pick off-peak times in order not to cause long-lasting damage to the business. -=( Graq )=-

> that potentially leaves you with other teams in the mix (say, database management and operations teams).

Almost true in my case. Unfortunately, there are no real database teams per se. The hardware team looks after hardware and makes sure the OS is up and the capacity is OK. Beyond that, it is considered 'application' and out of their remit. Because of that, I tend to get a little more say in the stuff that I do. I'm also lucky that most of what I do is internal to the business. However, as an international business we have 24-hour access requirements. This is where our SLAs come into play. Downtime is inevitable but is generally mitigated as much as possible. I don't really mention it, as downtime is a lot rarer for my work.

> As you mention DNS it sounds like web development

That is part of my work, but I used DNS as an easy-to-understand example of not choosing a fixed time for launches. Thanks for your comments :)

I support one system that has a team working 9am-6pm in Toronto on one server, and another team working about 10am-10pm in Mumbai, India, on a satellite server. That means the system's in use from about 11pm through to about 6pm local time, so my 'maintenance window' is technically 6-8pm. So when it's time for a roll-out, it's a little tricky, because the two teams share a server for some operations, but not for everything ... so we usually make the changes at both ends sometime during the day and get our end tested. If it all checks out, we test the Mumbai satellite from Toronto, and then finally get them to try it out at about midnight when they start their day. We usually have an opportunity to roll back to the previous version, unless it's a database schema change, in which case any fixes depend on changes to live Production code. That hasn't been a problem in the four years I've been doing the job. Alex / talexb / Toronto "Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds
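That kind of maintenance window can be computed rather than eyeballed. A rough sketch using the team hours from the post and fixed UTC offsets (EST -5, IST +5:30; DST is deliberately ignored, so treat the result as approximate):

```python
# Rough sketch: find the minutes of a UTC day when neither team is working.
# Team hours are from the post; the fixed offsets (no DST) are my assumption.

def busy_minutes(start_h: float, end_h: float, utc_offset_h: float) -> set:
    """Minutes-since-midnight (UTC) during which one team is at work."""
    start = int((start_h - utc_offset_h) * 60) % 1440
    end = int((end_h - utc_offset_h) * 60) % 1440
    busy, m = set(), start
    while m != end:            # walk forward a minute at a time, wrapping midnight
        busy.add(m)
        m = (m + 1) % 1440
    return busy

toronto = busy_minutes(9, 18, -5.0)    # 9am-6pm in Toronto (UTC-5)
mumbai = busy_minutes(10, 22, 5.5)     # 10am-10pm in Mumbai (UTC+5:30)
quiet = set(range(1440)) - toronto - mumbai
# The quiet stretch begins at minute 1380, i.e. 23:00 UTC == 6pm in Toronto,
# which lines up with the start of the window described above.
```

With these inputs the combined busy span runs 04:30-23:00 UTC, leaving a 5.5-hour quiet stretch from 6pm Toronto time; the post's tighter 6-8pm window presumably reflects operational caution, not the arithmetic.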

I don't have anything to add but wanted to drop a note to say that I really enjoyed this discussion and reading the various approaches. I've worked as a sys admin in a highly heterogeneous environment where 24/7 was critical (associated with patient care) and I've worked as a programmer in research where lucky users were supported between 8am and 5pm. Clearly the needs vary depending on users, systems, etc. Clearly the demands on the admin and/or programmer also vary greatly. Thanks again for sparking such a discussion.