MySQL数据库Sharding存在的一些解决方案
MySQL数据库Sharding存在的一些解决方案-MySQL Database Sharding Existing solutions
At the moment Netlog is the 67th most visited website in the world, according to Alexa's ranking. This means that there's at least 66 other websites out there probably facing similar problems as we do. 16 of the 20 most popular websites are powered by MySQL so, we are definitely not alone, are we?
Let's have a look at some of the existing technologies that implement or are somehow related to sharding and scaling database sysems, and let's see which ones could be interesting for Netlog.
MySQL Cluster is one of the technologies you could think would solve similar problems. The truth is that a database cluster is helpful when it comes to high availability and performance, but it's not designed for the distribution of writes.
MySQL Partitioning is another relatively new feature in MySQL that allows for horizontal splitting of large tables into smaller and more performant pieces. The physical storage of these partitions are limited to a single database server though, making it not relevant for when a single table grows out of the capacities of a single database server.
HSCALE and Spock Proxy, that both build on MySQL Proxy, are two other projects that help in sharding your data. MySQL Proxy introduces LUA, as an extra programming language to instruct the proxy (for eg. finding the right shard for this query). At the time we needed a solution for sharding neither of these projects seemed to support directory based sharding the way we'd wanted it to.
HiveDB is a sharding framework for MySQL in Java, that requires the Java Virtual Machine, with a php interface currently being in an infancy state. Being a Java solution makes it less interesting for us, since we prefer the technologies we are experts in and our application is written in: php.
Other technologies that aren't MySQL or php related include HyperTable (HQL), HBase, BigTable, Hibernate Shards (*shivers*), SQLAlchemy (for Python), Oracle RAC, etc ... The memcached SQL-functions or storage engine for MySQL is also a related project that we could mention here.
None of these projects really seemed to come in line with our requirements. But what exactly are they?
- Flexible for the hardware department.
We project growth and want the sharding system to be flexible. Knowing that our traffic will increase, we need to be able to add more shards quickly. With a growing amount of data, a proportional growth in hardware is requested. For this reason we opt for a directory based partitioning scheme. - No massive rewrite.
We can't introduce a whole new database layer or incompatible abstraction layer. We want to keep on using our database class as we are doing now and only implement sharding for those features that really require that amount of scaling. That's why we've opted for a solution that builds on what we have and allows for incremental implementation. We also wanted to use the sharding API, without having the data to be physically sharded, so the development and IT departments can independently decide when to do their part of the job. - Support for multiple sharding keys.
Most of our data will probably be sharded on $userID, but we want the system to be flexible so we can implement other keys and/or sharding schemes too. - Easy to understand.
We can't expect each and every of our developers to know everything about scalability and performance. Even if this was the case, the API to access and store data in a sharded environment should make it transparent to them so they shouldn't care about performance and can focus on what's really fun to do: developing and improving on features.
So, it's best if the API is a php API which makes it easy for them to use in the rest of our application.