DATABASES predate the web by decades, yet the efficiency of the Internet's most popular services is directly related to their performance. NoSQL is looking to take this essential information technology discipline into the Internet age.
Given our dependence on databases, it is surprising to see such a fundamental technology remaining relatively unchanged in an industry that has several technology cycles every decade. As Erik Ljungstrom, technical team leader at Dedipower Managed Hosting says, most database administrators have to "hack around problems that were not around 30 years ago".
Ljungstrom made his comment as he promoted NoSQL, a relatively new database system optimised for the workloads typically found in Internet services. The idea of NoSQL was first mooted back in 1998 as a lightweight, open source relational database. It obscures the SQL interface from the user and enforces no database schema, while relationships between data are stored in a separate graph.
Although NoSQL fell out of favour in the early 2000s, interest in it was rekindled by hosting providers such as Dedipower, who promoted it as an alternative to traditional database systems. As the Internet moves from transmitting crumbs of stored data to streaming large chunks of data such as video and pictures, firms are starting to question whether database optimisation is the most efficient way forward.
Ljungstrom, keen to emphasise that NoSQL isn't a complete replacement for traditional relational database management systems (RDBMS), says that the technology provides website architects with a viable alternative in certain use cases. That's not a statement without any real world examples, since firms such as Amazon, Google, eBay and Facebook already use NoSQL implementations to power customer facing parts of their businesses.
Apache's implementation of NoSQL comes under the guise of Cassandra and has been adopted by many firms including Datasift, the outfit behind Tweetmeme. Datasift has the enviable position of having invented the popular 'Re-tweet' button that's found on thousands of websites, and it uses Cassandra to store all its live data. Nick Telford, a system developer at Datasift, said Cassandra offers unseen benefits to small companies aside from the usual free, open source benefits of low cost, good security and agile flexibility.
According to Telford, Datasift sets up Cassandra so that it makes use of low level hard drive characteristics, allowing it to get away with purchasing inexpensive hardware. This is one reason why a firm that receives around a third of all Twitter messages sent needs only 15 low-end servers to handle the load. The servers all use standard consumer grade components, including Intel Core 2 Duo processors and two 1TB SATA hard drives.
"We have optimised for sequential writes," said Telford, referring to the fact that increasing hard drive platter density brings higher sequential performance, something Datasift exploits through a process of internal data compaction. Telford said that Datasift optimises data placement to reduce head seeking, since seek-heavy workloads are better served by expensive high RPM enterprise hard disk drives and solid state drives.
Telford was proud to say that Datasift is not only able to handle this considerable load but do it "on the cheap". That might be a gung-ho message but it's also one that might inspire many small, up and coming firms to look into NoSQL.
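The sequential-write approach Telford describes can be illustrated with a toy append-only store. This is only a sketch of the general technique, not Cassandra's storage engine or Datasift's configuration: the class name `AppendOnlyStore`, the tab-separated file format and the `demo` helper are all assumptions made for illustration.

```python
import os
import tempfile


class AppendOnlyStore:
    """Toy key-value store that only ever appends to its data file.

    Hypothetical illustration; keys and values must not contain tabs
    or newlines.
    """

    def __init__(self, path):
        self.path = path
        self.index = {}  # key -> latest value, held in memory for lookups
        open(self.path, "a").close()  # make sure the file exists

    def put(self, key, value):
        # Every write goes to the end of the file, so the drive head
        # never seeks back to overwrite an older record.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(f"{key}\t{value}\n")
        self.index[key] = value

    def get(self, key):
        return self.index.get(key)

    def record_count(self):
        with open(self.path, encoding="utf-8") as f:
            return sum(1 for _ in f)

    def compact(self):
        # Rewrite the file in one sequential pass, keeping only the
        # newest value per key, then swap it in for the old file.
        tmp_path = self.path + ".compact"
        with open(tmp_path, "w", encoding="utf-8") as f:
            for key in sorted(self.index):
                f.write(f"{key}\t{self.index[key]}\n")
        os.replace(tmp_path, self.path)


def demo():
    with tempfile.TemporaryDirectory() as d:
        store = AppendOnlyStore(os.path.join(d, "data.log"))
        store.put("tweet:1", "hello")
        store.put("tweet:2", "world")
        store.put("tweet:1", "hello again")  # supersedes the first record
        before = store.record_count()
        store.compact()
        after = store.record_count()
        return before, after, store.get("tweet:1")
```

Superseded records stay on disk until `compact()` runs, which is the trade-off of this style of storage: writes stay cheap and sequential, while stale data is cleaned up later in bulk.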
As Ljungstrom said at the start of his talk, NoSQL isn't a magic bullet. The document store is not particularly good for searching and Telford claims that Datasift's testing has found unoptimised data reads can have a latency of around 100ms while write times are as low as 0.1ms. Telford proposes data caching to improve read performance.
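The caching Telford proposes could take the form of a small read-through cache placed in front of the slow reads. The sketch below is an assumption about how that might look, not Datasift's implementation: `ReadThroughCache`, `expected_read_ms` and the 0.1ms cache-hit cost are hypothetical, while the 100ms miss cost is the unoptimised read latency quoted above.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Small least-recently-used cache in front of a slow read function."""

    def __init__(self, slow_read, capacity=1000):
        self.slow_read = slow_read  # hypothetical callable: key -> value
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        value = self.slow_read(key)  # fall through to the data store
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return value

    def invalidate(self, key):
        # Call after a write so that later reads do not see stale data.
        self.entries.pop(key, None)


def expected_read_ms(hit_ratio, miss_ms=100.0, hit_ms=0.1):
    """Average read latency for a given cache hit ratio.

    miss_ms uses the 100ms figure quoted in the article; hit_ms is an
    assumed cost for an in-memory lookup.
    """
    return hit_ratio * hit_ms + (1.0 - hit_ratio) * miss_ms
```

Under these assumed numbers, a cache that serves nine reads in ten from memory would cut the average read from 100ms to roughly 10ms, which shows why the hit ratio matters more than the speed of the cache itself.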
Read performance issues aside, Telford admits that Datasift uses Sun's MySQL RDBMS internally in situations "where you need guaranteed data consistency, ad-hoc queries and relational data". It's a line that doesn't inspire total confidence in Cassandra. However, when presented with Datasift's track record of uptime along with performance and cost benefits, it's hard to argue against NoSQL and Cassandra.
While Ljungstrom claims NoSQL "often has built in resilience", Telford detailed Datasift's replication strategy. Of its 15 servers, data is replicated on three and, talking exclusively to The INQUIRER, Telford said Datasift will have to reconsider its replication strategy as the number of servers grows, with its replication ratio rising in direct proportion to the number of servers deployed.
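To make the "three of 15" figure concrete, here is a minimal sketch of how a key might be mapped to three replica servers. It is a simplified placement scheme written for illustration only, not Cassandra's actual partitioner or Datasift's setup; the `replicas_for` function and the `server-NN` names are hypothetical.

```python
import hashlib


def replicas_for(key, nodes, replication_factor=3):
    """Pick the servers that hold copies of `key`.

    Hashes the key to a starting position in the node list, then takes
    the next `replication_factor` nodes, wrapping around at the end.
    """
    if replication_factor > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    digest = hashlib.md5(key.encode("utf-8")).digest()
    start = int.from_bytes(digest, "big") % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replication_factor)]


# Hypothetical 15-server cluster, matching the server count in the article.
nodes = [f"server-{i:02d}" for i in range(1, 16)]
owners = replicas_for("tweet:42", nodes)

# With three copies spread over 15 servers, each item lives on a fifth
# of the cluster, so any two servers can fail without losing data.
fraction_holding_item = len(owners) / len(nodes)
```

The same key always maps to the same three servers, so any client can work out where to read or write without a central lookup.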
Aside from resilience, consistency is likely to be NoSQL's Achilles heel. Some implementations of NoSQL offer ACID guarantees while others don't, and that uncertainty will need to be cleared up for big businesses to seriously consider the adoption of NoSQL.
NoSQL, like many database systems before it, isn't a magical solution for all data serving and management needs. However, evidence presented by Dedipower and Datasift shows NoSQL to be an effective alternative to traditional RDBMS in certain uses.
It's unlikely that IBM and Oracle will be concerned about NoSQL implementations taking away business from DB2 and Oracle DB just yet, because while newer, smaller firms are happy to embrace new technology, older firms that are heavily invested in an RDBMS are unlikely to ditch or even supplement their existing systems with alternatives all that easily.
As Chris Miller, CEO of Dedipower said, "NoSQL is not the end of RDBMS" and of course he's correct. What NoSQL shows, however, is that traditional software and database system architectures designed decades ago might not always be the best solutions for all present and future needs. That's not to say that traditional technologies such as complex RDBMS systems will lose their places in modern technology, but that new alternatives are worthy of consideration. µ
So let me ask this: what does the NSA use (or the British equivalent)?
That will convince me more than this ad.
The only thing new, again, is that with ISAM (Indexed Sequential Access Method) and other database types (even SQL) we can fit much of, if not all of, the database in memory for now. Even fitting in much of the indices and a large subset of the database content (think super-cache) can markedly improve read-decision-write cycles.
All I knew back in the 70's and early 80's is knew again (pun intended). C'est la guerre.
Its community interprets 'NoSQL' as 'Not Only SQL'
SQL RDBMS databases are seen as a SubSet, not as an alternative.
It basically refers to all types of structure storage, INCLUDING the 'traditional' SQL-RDBMS systems.
Is it just me?