
Posts Tagged ‘Database’

NoSQL – The revelation that is gaining momentum

Tuesday, October 20th, 2009

The world of data storage is headed for a massive shift. A whole new breed of scalable data stores is gaining popularity, and at a pace too fast for traditional databases to keep up with. I am afraid to say it, but they are starting to look like a thing of the past. The whole data tier is being shaken up as Memcached appears right next to MySQL. While some might see it as a move away from MySQL and PostgreSQL, the traditional open source relational data stores, it’s actually a higher-level change. Much of this change is the result of a few revelations.

A relational database isn’t always the right model or system for every piece of data. Relational databases are tricky to scale (especially if you start with a single monolithic configuration, since they aren’t distributed by design), and when it comes to performance, normalization hurts, because a single read may have to join several tables.
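To make the normalization point concrete, here is a minimal sketch in plain Python (no real database involved). The sample data and the function names `load_post_normalized` and `load_post_denormalized` are made up for illustration. It shows the same blog post assembled from three normalized "tables" versus read back as one de-normalized document.

```python
# Normalized layout: three "tables" linked by ids (hypothetical sample data).
users = {1: {"name": "Asha"}}
posts = {10: {"user_id": 1, "title": "NoSQL"}}
comments = [
    {"post_id": 10, "text": "Nice overview"},
    {"post_id": 10, "text": "What about joins?"},
]


def load_post_normalized(post_id):
    """Assemble one page view by following references, i.e. doing the joins."""
    post = posts[post_id]
    author = users[post["user_id"]]
    texts = [c["text"] for c in comments if c["post_id"] == post_id]
    return {"title": post["title"], "author": author["name"], "comments": texts}


# De-normalized layout: the whole page view is stored under a single key.
documents = {
    "post:10": {
        "title": "NoSQL",
        "author": "Asha",
        "comments": ["Nice overview", "What about joins?"],
    }
}


def load_post_denormalized(post_id):
    """One lookup by key, no joins."""
    return documents[f"post:{post_id}"]


print(load_post_normalized(10) == load_post_denormalized(10))
```

The trade-off is that the de-normalized copy duplicates the author name, so a rename has to touch every document that embeds it.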

The new data stores vary quite a bit in their specific features, but in general they draw from a similar set of high-level characteristics. Not all of them meet all of these, of course, but just looking at the list gives you a sense of what they’re trying to accomplish.

  1. de-normalized, often schema-free, document storage
  2. key/value based, supporting lookups by key
  3. horizontal scaling
  4. built-in replication
  5. HTTP/REST or easy-to-program APIs
  6. support for MapReduce-style programming
  7. eventual consistency
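A few of these characteristics can be sketched together in plain Python. This is only an illustration, not the API of any particular data store: the `store` dictionary, the sample documents and the `map_reduce` helper are all assumptions made for the example. It shows schema-free documents looked up by key, and a MapReduce-style count computed over them.

```python
from collections import defaultdict

# A toy key/value store: each value is a schema-free document.
# Documents under different keys need not share the same fields.
store = {
    "post:1": {"title": "NoSQL", "tags": ["database", "nosql"]},
    "post:2": {"title": "Scaling", "tags": ["database"], "author": "guest"},
    "user:7": {"name": "Asha"},  # no "tags" field at all
}


def map_tags(key, doc):
    """Map step: emit a (tag, 1) pair for every tag on a document."""
    for tag in doc.get("tags", []):
        yield tag, 1


def reduce_counts(tag, values):
    """Reduce step: add up all the values emitted for one tag."""
    return sum(values)


def map_reduce(data, map_fn, reduce_fn):
    """Run map over every document, group by emitted key, then reduce."""
    groups = defaultdict(list)
    for key, doc in data.items():
        for out_key, out_value in map_fn(key, doc):
            groups[out_key].append(out_value)
    return {k: reduce_fn(k, v) for k, v in groups.items()}


print(store["post:1"]["title"])  # lookup by key
tag_counts = map_reduce(store, map_tags, reduce_counts)
print(tag_counts)
```

In a real distributed store the map step would run on each node against its own share of the keys, which is what makes this style a good fit for horizontal scaling.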

And I could probably list another half a dozen qualities that many of them share too. But to me, the first two are the biggest departure from the traditional RDBMS. Of course, you can also stick with MySQL and go non-relational.

The movement to these distributed, schema-free data stores has begun to use the name NoSQL. GeekTantra has written an overview of some of the implementations.

The Future of Scalable Databases

Thursday, October 8th, 2009

For most of us, a database is synonymous with tables, tuples, SQL, RDBMS, or normalization, but is that what databases actually mean, or is there more to them than the relational data model? The relational data model, although the most popular and most widely accepted, is not apt for all problems. How far can we go by mapping all our problems onto it? Beyond a certain table size the database starts slowing down, so we move to replication across multiple servers, which obviously increases operating expenses. When this is not enough, we employ expensive sysadmins to configure sharding, which requires still more resources, or we pay a fortune to the “Big Guys” like Oracle and Microsoft to tune our databases for performance. But is this the future of databases? I guess not. Let us have a look at some database options that are not based on the relational data model and are free from SQL:

  • MongoDB: It is a high-performance, open source, schema-free, document-oriented database. It provides a JSON-like data store that can free software architects from the limitations of the RDBMS. It also supports full indexing (including on inner objects and arrays), dynamic queries, query profiling, efficient storage of binary data including blobs, replication and fail-over, and auto-sharding for extreme loads. And we thought MySQL was the ultimate database?
  • CouchDB: It is a free and open source document-oriented database written in Erlang, a functional programming language. It is well suited for local replication and vertical scaling. Like MongoDB, it stores data as JSON documents, which need not share a schema but remain queryable via views. Views are a combination of aggregate functions and filters and are computed in parallel, much like MapReduce. With bindings for many languages, this is sure to become one of the most popular databases in the future.
  • Mnesia: It is a distributed database system written in Erlang. The data store of Mnesia can be considered relational, but it isn’t what someone familiar with SQL might expect. A database contains tables, and relationships between them are modeled as other tables. A key feature of Mnesia is that tables can be reconfigured within a schema and relocated between nodes, not only while the database is running but even while write operations are in progress, so that reads and writes stay fast and fault tolerant.
  • Cassandra: It is an open-source distributed database management system with a five-dimensional key-value hash. It was developed by Facebook and open sourced in July 2008. It provides a structured key-value store with eventual consistency. The major components of a Cassandra data model are Columns, SuperColumns, ColumnFamilies and KeySpaces. It is considered a hybrid of Google’s BigTable and Amazon’s Dynamo key-value store. It is currently used by Facebook, Twitter and Digg.
  • Hypertable: It is an open-source distributed database based on Google’s BigTable. It uses HDFS (Hadoop Distributed File System) as its storage file system.
  • Amazon Dynamo Key-Value Store: It is a proprietary, highly available key-value data store that has properties of both databases and distributed hash tables. It powers parts of Amazon Web Services.
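The Cassandra data model from the list above is easier to picture as nested hashes. The sketch below is plain Python, not the Cassandra client API. The helper names `insert` and `get` and the sample keyspace are invented for illustration, and it assumes the five levels are KeySpace, ColumnFamily, row key, SuperColumn and Column.

```python
def insert(db, keyspace, column_family, row_key, super_column, column, value):
    """Store a value at the end of the five-level path, creating levels as needed."""
    row = (
        db.setdefault(keyspace, {})
        .setdefault(column_family, {})
        .setdefault(row_key, {})
    )
    row.setdefault(super_column, {})[column] = value


def get(db, keyspace, column_family, row_key, super_column, column):
    """Read a value back by walking the same five keys."""
    return db[keyspace][column_family][row_key][super_column][column]


db = {}
insert(db, "Blog", "Posts", "post-10", "meta", "title", "NoSQL")
insert(db, "Blog", "Posts", "post-10", "meta", "author", "Asha")
insert(db, "Blog", "Posts", "post-10", "stats", "views", 42)

print(get(db, "Blog", "Posts", "post-10", "meta", "title"))
print(sorted(db["Blog"]["Posts"]["post-10"]))
```

Every read names the full path of keys, which is why there is no SQL-style query language in this picture: the application decides up front which lookups it needs and lays out its column families to match.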

Having seen these, I am sure relational databases will soon lose some of their importance wherever concerns like high scalability arise.