Here are some quick notes from the session "Scaling MySQL – Up or Out?", moderated by Kaj Arnö as part of today's keynote.

The panelists, ordered by Alexa ranking:

Monty Taylor (MySQL), Matt Ingenthron (Sun), John Allspaw (Flickr), Farhan Mashraqi (Fotolog), Domas Mituzas (Wikipedia), Jeff Rothschild (Facebook), Paul Tuckfield (YouTube)

Here are the questions and the panelists' answers:

| | How many servers | Number of DBAs | How many web servers | Number of caching servers | Version of MySQL | Language, platform | Operating system |
|---|---|---|---|---|---|---|---|
| MySQL | 1 M, 3 S | 1/10 | 2 | 2 | 5.1.23 | Perl, PHP and bash | Linux (Fedora) |
| Sun | 2 clustered, 2 individual | 1.5 | 160+ | 8 | 5.0.21 | Lots of stuff (Java mostly) | OpenSolaris |
| Flickr | 166 | At present 0 | 244 | 14 | 5.0.51 | PHP and some Java | Linux |
| Fotolog | 140 databases on 37 instances | 10 instances a DBA | 70 | 40 (2 on each, 80 total) | 4.11 and 4.4 | PHP, 90% Java | Solaris 10 |
| Wikipedia | 20 | None, but everybody is kind of a DBA | 70+200 | 40 (2 on each, 80 total) | | PHP, C++, Python | Fedora / Ubuntu |
| Facebook | 30000 databases, 1800 db servers | 2 | 1200 | 805 | 5.0.44 with relay log corruption patch | PHP, Python, C++ and Erlang | Fedora / RHEL |
| YouTube | I can not say | 3 | I can not say | I can not say | 5.0.24 | Python | SuSE 9 |

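A few more miscellaneous questions (MY = MySQL, SN = Sun, FR = Flickr, FL = Fotolog, WK = Wikipedia, FB = Facebook, YT = YouTube):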

Number of times re-architected?

MY: 2 times (once for slaves, once for memcached)

SN: depends on the site (many times over the year)

FR: 2.5 (various clusters federated)

FL: many cached replacements (about to do one change now)

WK: Never (Spaghetti)

FB: Every Tuesday, continual

YT: Pretty continual, 2-3 times (replication, sharding, federation)

What happens if a server fails? What actions do you generally take?

FR: All of our servers are federated as pairs of servers. We can lose either side of a shard, and we can lose boxes. Traffic normally goes to either side of the shard; after a failure it goes to the one remaining side until we get another one (very transparent to the user).

WK: Users shout at them on IRC, then they moderate ... fixed in seconds

FB: One of the 1800-1900 servers will always fail. We just operate well: the impact is minor, with data going away for a while. We restore from the binlog and start the server quickly, promote a slave to master, and use a number of other ways.

FL: we simply mount the snapshots to different servers and get

YT: SAN etc. for very important data. Recover the server from mirrored disk ... a mirrored hard drive is crucial.
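FR's answer above (paired servers per shard, traffic falling back to the surviving side) can be sketched roughly as follows. This is only an illustration, not Flickr's actual code: the `ShardPair` class, the host names, and the modulo-based `shard_for` mapping are all assumptions made for the example.

```python
import random


class ShardPair:
    """Two servers holding the same shard; either side can serve traffic."""

    def __init__(self, side_a, side_b):
        self.sides = [side_a, side_b]
        self.down = set()  # hosts currently considered failed

    def mark_down(self, host):
        self.down.add(host)

    def mark_up(self, host):
        self.down.discard(host)

    def pick(self, rng=random):
        """Return a live host: either side normally, the survivor after a failure."""
        alive = [h for h in self.sides if h not in self.down]
        if not alive:
            raise RuntimeError("both sides of the shard are down")
        return rng.choice(alive)


def shard_for(user_id, pairs):
    """Map a user to a shard pair with a plain modulo (an assumed scheme)."""
    return pairs[user_id % len(pairs)]


# Hypothetical host names, for illustration only.
pairs = [ShardPair("db1a", "db1b"), ShardPair("db2a", "db2b")]
pair = shard_for(42, pairs)
pair.mark_down("db1a")   # one side of the shard fails
host = pair.pick()       # traffic now goes to the remaining side
```

Once a replacement box is in place, `mark_up` puts it back into rotation, which is what keeps the failure invisible to users in this sketch.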

Any scaling technology you would recommend?

FL: UltraSPARC-T1 (excellent master, multi-threaded) and UltraSPARC-T2 for slave (single-threaded)

WK: good network switch

FB: Cheap switches cause problems; we learned that lesson. We do not use a SAN. Everything is neatly partitioned, so the pieces scale independently and fail independently.

MY: cluster very sad

Server virtualization?

Nobody uses it at this time.

FB: ETL cluster, we may run more than one in the future

Anything to worry about at present?

FB: App design is the key to using resources well; also data center power supply and consumption.

FL: Google has to approve our lab power (we cut app servers by half by moving from PHP to Java)

YT: not at all

Any recommendations or lessons for DBAs?

better you know what the systems are, then you can

Take performance and scaling seriously.

Nothing is more permanent than a temporary solution (if you don't know when you will fail, then you will).

Architect properly from the start: schema, cost of serving data.