MongoDB vs Cassandra
2011-10-19 01:44
http://blog.boxedice.com/2011/07/21/mongodb-vs-cassandra/
July 21, 2011
Tags: MongoDB, nosql, cassandra
by David Mytton
Over the 2 years we’ve been using MongoDB in production with our server monitoring tool, Server Density, we’ve built up significant experience and knowledge about how it works. Back in 2009 when I was looking at a replacement for MySQL I looked at Cassandra but dismissed it because MongoDB had several advantages, and Cassandra was still extremely early stage (even more so than MongoDB at the time). Having been invited to give a comparison at the Cassandra London Meetup, I thought I’d revisit it to see how it compares today.
Disclaimer: It’s important to note that much of what I know about MongoDB has been learnt through using it in production. We don’t use Cassandra so any comparisons are going to be fairly superficial, but they will still be relevant because that’s the stage most people will be in when they are considering which database to pick. As a result I will try to avoid making technical comparisons about specific features because these would be biased by my extensive understanding of MongoDB vs a limited understanding of Cassandra.
As such, this comparison is split into 2 types of difference – usage and operations.
Usage: The actual usage as a developer implementing the application with the database.
Operations: Points which are not directly about the core database but its suitability for production and management on an operational level.
That said, I will start with several technical comparisons because these are important to understand.
Usage – Structure
MongoDB acts much like a relational database. Its data model consists of a database at the top level, then collections which are like tables in MySQL (for example) and then documents which are contained within the collection, like rows in MySQL. Each document is made up of fields and values, similar to columns and values in MySQL. Fields can be simple key / value e.g. { 'name': 'David Mytton' } but they can also contain other documents e.g. { 'name': { 'first': 'David', 'last': 'Mytton' } }.
In Cassandra documents are known as “columns” which are really just a single key and value e.g. { 'key': 'name', 'value': 'David Mytton' }. There’s also a timestamp field which is for internal replication and consistency. The value can be a single value but can also contain another “column”. These columns then exist within column families which order data based on a specific value in the columns, referenced by a key. At the top level there is a keyspace, which is similar to the MongoDB database.
A good set of data model diagrams for Cassandra can be found here.
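As a rough illustration of the two hierarchies described above, the following Python sketch models both with plain dictionaries. No database driver is involved, and the names `make_column`, `mongo_database`, `cassandra_keyspace` and the row key `dmytton` are assumptions made up for this example.

```python
# Toy, in-memory illustration of the two data models; plain dicts stand in
# for both databases and nothing here talks to a real server.
import time

# MongoDB: database -> collection -> documents (fields may nest documents).
mongo_database = {
    "users": [  # a collection, roughly a table
        {"name": {"first": "David", "last": "Mytton"}},  # a document, roughly a row
    ]
}


def make_column(key, value):
    """Build a Cassandra-style column: a key, a value and a timestamp."""
    return {"key": key, "value": value, "timestamp": time.time()}


# Cassandra: keyspace -> column family -> row key -> columns.
cassandra_keyspace = {
    "users": {  # a column family
        "dmytton": {  # hypothetical row key
            "name": make_column("name", "David Mytton"),
        }
    }
}

# Reading a nested field from the document vs. a column value by row key.
first_name = mongo_database["users"][0]["name"]["first"]
full_name = cassandra_keyspace["users"]["dmytton"]["name"]["value"]
print(first_name, "|", full_name)
```

The point of the sketch is only the shape of the nesting: a document carries its own structure, while a column is a single key/value pair plus a timestamp that lives under a row key inside a column family.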
Usage – Indexes
MongoDB indexes work very similarly to those in relational databases. You create single or compound indexes on the collection level and every document inserted into that collection has those fields indexed.
Querying by index is extremely fast so long as you have all your indexes in memory.
Prior to Cassandra 0.7 it was essentially a key/value store, so if you wanted to query by the contents of a key (i.e. the value) then you needed to create a separate column which references the other columns i.e. you create your own indexes.
This changed in Cassandra 0.7 which allowed secondary indexes on column values, but only through the column families mechanism.
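To make "you create your own indexes" concrete, here is a minimal sketch of a key/value store in which the application maintains a lookup from a field value back to the row keys. It is a plain Python toy, not Cassandra code, and the class `KeyValueStore` and its methods are hypothetical names chosen for this example.

```python
from collections import defaultdict


class KeyValueStore:
    """Toy key/value store with a hand-maintained index on one field."""

    def __init__(self, indexed_field):
        self.indexed_field = indexed_field
        self.rows = {}                  # row key -> row (a dict of fields)
        self.index = defaultdict(set)   # field value -> set of row keys

    def put(self, key, row):
        # Remove the stale index entry if the row is being overwritten.
        old = self.rows.get(key)
        if old is not None and self.indexed_field in old:
            self.index[old[self.indexed_field]].discard(key)
        self.rows[key] = row
        # The application, not the store, keeps the index up to date.
        if self.indexed_field in row:
            self.index[row[self.indexed_field]].add(key)

    def find_by_indexed_value(self, value):
        # Look up row keys via the index, then fetch each row by key.
        return [self.rows[k] for k in sorted(self.index.get(value, ()))]


store = KeyValueStore(indexed_field="state")
store.put("alice", {"full_name": "Alice Example", "state": "UT"})
store.put("bob", {"full_name": "Bob Example", "state": "CA"})
store.put("bob", {"full_name": "Bob Example", "state": "UT"})  # bob moves

utah_users = store.find_by_indexed_value("UT")
print([row["full_name"] for row in utah_users])
```

Every write has to touch both the data and the index, which is the extra bookkeeping that built-in secondary indexes take off the application.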
Cassandra requires a lot more metadata for indexes and requires secondary indexes if you want to do range queries. E.g. if we define a new column family with 1 index:
    $ bin/cassandra-cli --host localhost
    Connected to: "Test Cluster" on localhost/9160
    Welcome to cassandra CLI.
    Type 'help;' or '?' for help. Type 'quit;' or 'exit;' to quit.
    [default@unknown] create keyspace demo;
    [default@unknown] use demo;
    [default@demo] create column family users with comparator=UTF8Type
    ... and column_metadata=[{column_name: full_name, validation_class: UTF8Type},
    ... {column_name: birth_date, validation_class: LongType, index_type: KEYS}];
A query that also filters on the unindexed state column then fails:
    [default@demo] get users where state = 'UT' and birth_date > 1970;
    No indexed columns present in index clause with operator EQ
Adding an index on state:
    update column family users with comparator=UTF8Type
    ... and column_metadata=[{column_name: full_name, validation_class: UTF8Type},
    ... {column_name: birth_date, validation_class: LongType, index_type: KEYS},
    ... {column_name: state, validation_class: UTF8Type, index_type: KEYS}];
then allows a query with state as the primary and filter based on the birth_date:
    get users where state = 'UT' and birth_date > 1970;
this blog post).
Usage – Deployment
MongoDB is written in C++ and provided in binary form for Linux, OS X, Windows and several other platforms. It’s extremely easy to “install” – download, extract and run mongod.
Cassandra is written in Java and has the overhead that brings, but also the easy ability to integrate into existing Java projects. It takes a little longer to get started but there is a demonstration of setting up a 4 node cluster in less than 2 minutes, which you’d struggle to beat with MongoDB.
I know plenty of people running MongoDB on Windows but would be interested to hear if that’s the same with Cassandra (I suspect it’s more Linux).
Operations/Usage – Consistency/Replication
In MongoDB replication is achieved through replica sets. This is an enhanced master/slave model where you have a set of nodes where one is the master. Data is replicated to all nodes so that if the master fails, another member will take over. There are configuration options to determine which nodes have priority and you can set options like sync delay to have nodes lag behind (for disaster recovery, for example).
Writes in MongoDB are “unsafe” by default; data isn’t written right away by default so it’s possible that a write operation could return success but be lost if the server fails before the data is flushed to disk. This is how Mongo attains high performance.
If you need increased durability then you can specify a safe write which will guarantee the data is written to disk before returning. Further, you can require that the data also be successfully written to n replication slaves.
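The trade-off between the default and a safe write can be pictured with a small toy model. This is a simplified sketch in plain Python and not how mongod or any driver is implemented; `ToyServer`, `write`, `flush` and `crash` are hypothetical names, and the `safe` flag here only mimics the idea of waiting for the data to reach disk.

```python
class ToyServer:
    """Toy model: writes land in memory first and reach 'disk' only on flush."""

    def __init__(self):
        self.memory = []  # acknowledged but not yet durable
        self.disk = []    # survives a crash

    def write(self, doc, safe=False):
        self.memory.append(doc)
        if safe:
            # A safe write waits for the flush before acknowledging.
            self.flush()
        return True  # the client sees success either way

    def flush(self):
        self.disk.extend(self.memory)
        self.memory.clear()

    def crash(self):
        # Anything still in memory is lost when the server fails.
        self.memory.clear()


server = ToyServer()

acknowledged = server.write({"n": 1})   # default: fast, not yet on disk
server.crash()                          # fails before the flush
lost_after_crash = list(server.disk)

server.write({"n": 2}, safe=True)       # slower: flushed before returning
server.crash()
kept_after_crash = list(server.disk)

print(acknowledged, lost_after_crash, kept_after_crash)
```

The first write is acknowledged yet never reaches disk, while the second survives the crash at the cost of waiting for the flush on every call.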
MongoDB drivers also support the ability to read from slaves. This can be done on a connection, database, collection or even query level and the drivers handle sending the right queries to the right slaves, but there is no guarantee of consistency (unless you are using the option to write to all slaves before returning). In contrast Cassandra queries go to every node and the most up to date column is returned (based on the timestamp value).
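The timestamp-based resolution described above can be sketched in a few lines of Python. This is an assumption-laden toy, not Cassandra's read path: `read_latest` is a hypothetical helper, the replica responses are hand-written dictionaries, and the timestamps are arbitrary integers.

```python
def read_latest(replica_responses):
    """Return the value from the replica response with the newest timestamp.

    Replicas that do not hold the column are represented as None.
    """
    responses = [r for r in replica_responses if r is not None]
    if not responses:
        return None
    newest = max(responses, key=lambda column: column["timestamp"])
    return newest["value"]


# Three replicas answer the same query; one is stale and one has no data.
replicas = [
    {"key": "name", "value": "David", "timestamp": 100},
    {"key": "name", "value": "David Mytton", "timestamp": 250},
    None,
]

latest = read_latest(replicas)
print(latest)
```

Because the winner is chosen purely by timestamp, a stale replica does not affect the result as long as at least one queried replica holds the newer column.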
Cassandra has much more advanced support for replication by being aware of the network topology. The server can be set to use a specific consistency level to ensure that queries are replicated locally, or to remote data centres. This means you can let Cassandra handle redundancy across nodes where it is aware of which rack and data centre those nodes are on. Cassandra can also monitor nodes and route queries away from “slow” responding nodes.
The only disadvantage with Cassandra is that these settings are done on a node level with configuration files, whereas MongoDB allows very granular ad-hoc control down to the query level through driver options which can be called in code at run time.
Operations – Who’s behind it?
Both Cassandra (Apache 2.0 license) and MongoDB (AGPL) are open source. You can freely download the code, write patches and submit them upstream. However, Cassandra is purely an open source project whereas MongoDB is “owned” by a commercial company, 10gen. The original authors of MongoDB are core contributors to the code and work for 10gen (indeed, 10gen was founded specifically to support MongoDB and the CEO and CTO are the original creators).
In contrast, Cassandra was created by 2 engineers from Facebook and is incubated by the Apache Foundation. This is not a disadvantage (indeed, the Apache Web server used by the majority of websites has similar roots and is part of the Apache Foundation)
but is important to understand when it comes to support, ongoing development and the community (below).
Operations – Support
Although there are independent consultants for MongoDB, the best place to get support is from 10gen themselves because they wrote the database so they know it best. They’re able to provide support contracts with phone and e-mail SLAs.
In contrast, Cassandra has
several companies offering commercial support and whilst they do have committers to the core Cassandra code, I’d argue it’s not the same as having access to the entire engineering team and original authors from a single contact point, as is the case with
MongoDB.
Operations – Ongoing development
Interacting directly with the company that controls the main project, especially for support purposes, means you can have bug fixes and changes implemented to the code base. We’ve had numerous fixes committed as a result of problems discovered in our production usage of MongoDB. We pay 10gen for support now but even before we did they were very responsive to bugs. We also get votes for features and improvements.
In theory this is the same in Cassandra – you’d want bugs to be fixed and features implemented but that doesn’t have to happen because of the nature of open source projects run by volunteers (becomes more complex when companies are paying developers to work
on the project e.g. Eric Evans from Rackspace working on Cassandra full time).
Of course there is a risk that the company behind the project disappears and all the engineers move on somewhere else but the project is still open source and this is the same with any piece of software you might use.
You could also argue there is more direction and focus from a commercial company working solely on the product (and more engineers dedicated to it) but I don’t want to go any further with this point as this post isn’t about open source vs commercial. This
is just one point to be aware of.
Operations – Documentation
The official Cassandra documentation is poor. Researching for this I had to visit several websites and watch videos even to get explanations for key concepts like indexes. There is better documentation from Datastax but that is still lacking in explaining concepts in any depth.
The MongoDB documentation was good when I first looked at it but is even better nowadays. It’s actually kept up to date and covers all the features, with examples. Nobody likes writing documentation and it shows with many open source projects; another advantage
of having a company behind the project, forcing developers to write the docs! Incidentally, one of the biggest advantages of the PHP language is the extensive documentation, examples and user submitted notes.
When you’re using a completely new data store then documentation is important, and is one of the reasons why I chose MongoDB back in 2009.
Operations – Community
MongoDB has to be
a case study in how to build a community around a product. There have been almost
40 MongoDB conferences in the last year,
a very active mailing list, and
user groups around the world. You know you’re well known when a phrase like “web scale” is associated with your product (as a parody).
Again, this is because there is a company behind the product actively promoting it and encouraging and managing these events.
Cassandra has had 1 conference in that time, and whilst there are user groups (I presented this talk at the London one) it’s certainly not on the same scale as MongoDB.
Does that matter? None of that existed when we chose MongoDB so we learnt everything ourselves. But for new users today, there’s a huge forum of people who are using MongoDB and sharing their knowledge freely, where it is easily accessible.
Operations/Usage – Drivers
The other main reason I chose MongoDB was the driver support. All the key drivers for MongoDB were available and most importantly, maintained by 10gen themselves. MongoDB has official drivers for C, C#, C++, Erlang, JavaScript, Java, Perl, PHP, Python, Ruby and Scala. All fully supported.
The Python and PHP drivers were most important to us but we also use the C# driver in our Windows monitoring agent and to have these well maintained just like the core server makes a massive difference.
Cassandra only has official Java and Python drivers with a few others written by 3rd parties. I’ve found that Python is usually well catered for when it comes to libraries that work well. PHP is another story and we’ve had issues with RabbitMQ and ZeroMQ in the past (specifically not working well under heavy load; they all work fine for playing around). Good PHP libraries are hard to come by.
Conclusion
There is no conclusion. This post isn’t about which is best, it’s about comparing the two. Both have advantages and disadvantages and to truly compare you need to run them both in production under significant load for a long period of time. MongoDB has worked well for us and has proven itself at scale and to have flexibility to do things like building a queueing system as well as be the main data store for our server monitoring service.
For me, the operational considerations play a major part in making a decision because these types of databases are so new. I would suspect they’re also important to companies looking to adopt this technology. We don’t need a support contract for Apache, for example, because it’s so well proven. Our support contract with 10gen has been well worth the money!