Tuesday, November 29, 2011

Search as Data Store

A while back I came across Solandra, and began wondering about using it as a general-purpose store for a typical content-heavy webapp. The idea was based on a pretty vague notion that combining search with NoSQL would be powerful. What Solandra does is bolt the SOLR search engine onto Cassandra instances, resulting in a distributed search with all the data persisted in a NoSQL store.

In poking around, I've found that it is not unheard of to use SOLR as a web backend. Perhaps the most prominent example is The Guardian's use of SOLR for their "Content API." There was also supposedly a Twitter-like example that "uses the Lucandra store exclusively and does not use any sort of relational or other type of database", though the link did not work as I wrote this (Lucandra is the predecessor to Solandra).

Why not just stuff everything into SOLR? It is effectively a document store with powerful and fast queries. Sure, it does not support transactions, and the data is not normalized in a relational sense. But do you need those things for content? What I mean by content is dynamic stuff that might be edited by humans or updated by feeds: articles, comments, tags, product descriptions, locations. In other words, data that is viewed a lot but edited infrequently, and that is not super structured and does not require crazy integrity. With the Cassandra backing, you should at least get durability and eventual consistency on the data that you shove into SOLR (Solandra).
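
To make the "document store with queries" idea a bit more concrete, here is a minimal Python sketch of what storing and fetching content might look like. It only builds the JSON payload and the query URL and does not talk to a server. The base URL, the core name "content", the field names, and the helper functions are all made up for illustration; the only SOLR pieces assumed are the standard select handler parameters (q, fq, rows, wt), and the JSON-array update body is an assumption about the SOLR version in use.

```python
import json
from urllib.parse import urlencode

# Hypothetical base URL; the real path depends on your SOLR/Solandra setup.
SOLR_BASE = "http://localhost:8983/solr/content"


def make_article_doc(doc_id, title, body, tags):
    """Flatten an article into one denormalized document (field names are assumptions)."""
    return {
        "id": doc_id,
        "type": "article",
        "title": title,
        "body": body,
        "tags": list(tags),  # multi-valued field instead of a join table
    }


def build_update_body(docs):
    """Serialize documents as a JSON array, assuming the update handler accepts that form."""
    return json.dumps(list(docs))


def build_select_url(text, tag=None, rows=10):
    """Build a select URL: full-text query in q, optional tag restriction in fq."""
    params = [("q", text), ("rows", rows), ("wt", "json")]
    if tag is not None:
        # Naive quoting; a real client would escape special characters in the tag.
        params.append(("fq", 'tags:"%s"' % tag))
    return SOLR_BASE + "/select?" + urlencode(params)


doc = make_article_doc("a1", "Search as Data Store", "Why not just...", ["solr", "nosql"])
print(build_update_body([doc]))
print(build_select_url("cassandra", tag="nosql"))
```

The point of the sketch is the shape of the data: tags live inside the article document rather than in a separate table, and "give me articles about cassandra tagged nosql" is a single query rather than a join.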

I guess now I will have to give it a try and see where the trade-offs are in practice between search, NoSQL, and relational stores on a real content project. Who's got one for me? :)