Comparing the Apache Solr and Sphinx search platforms
One of my clients recently mentioned a huge newspaper archive that he was getting ready to put online. What he was doing with the search (trying to provide access to a huge amount of data with very good throughput, all available on the web) sounded like something that the Apache Solr platform is very good at. (Long-time readers may remember I recently did a webcast and whitepaper on using Solr to do full-text search on data in an RDBMS, so it was on my mind.) When I mentioned it to him, he said he was using Sphinx.
Never having heard of Sphinx before, I took a look, and it is also an open-source search platform designed for large indexes. It seems aimed more at database content than at individual documents, and I’m not sure how building a faceted search application would go, but it seems reasonably capable.
Always wanting to provide the best advice to my clients — and on the theory that if the only tool in your toolkit is a hammer, the whole world looks like a nail — I took a look around for a comparison between Solr and Sphinx.
What I found seems to confirm what I suspected, based on what I’d heard from my client and on my own experience. Both are good platforms, and will do what you need. Sphinx seems to have the advantage when it comes to generating the initial index, but once the system is set up, that advantage no longer matters. I’ve heard complaints that Lucene (on which Solr is based) can’t handle more than a few thousand documents, but that hasn’t been my experience. I indexed 245,000+ documents on my laptop, and searching was still pretty fast. Finally, as a developer, I can’t tell you how much I appreciate how easy Solr makes it to build an application with some pretty advanced features.
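To give a flavor of what I mean by "easy", here is a rough sketch of how a faceted query against Solr's standard select handler might be put together. The core name (newspapers), the field names (section, year), and the helper function itself are all hypothetical, made up for illustration; only the request parameters (q, rows, wt, facet, facet.field) are Solr's own. It just builds the URL, so you can run it without a Solr server.

```python
from urllib.parse import urlencode


def build_facet_query_url(base_url, query, facet_fields, rows=10):
    """Build a URL for a faceted search against a Solr core's /select handler.

    base_url is assumed to point at a Solr core, for example
    http://localhost:8983/solr/newspapers (a hypothetical core name).
    """
    params = [
        ("q", query),        # the user's search terms
        ("rows", rows),      # how many documents to return
        ("wt", "json"),      # ask for a JSON response
        ("facet", "true"),   # turn faceting on
    ]
    # facet.field may be repeated, once per field to facet on
    params.extend(("facet.field", field) for field in facet_fields)
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical core and field names, for illustration only
url = build_facet_query_url(
    "http://localhost:8983/solr/newspapers",
    "election",
    ["section", "year"],
)
print(url)
```

Fetching that URL from a running Solr instance should return the matching documents along with counts for each facet field, which is most of what a faceted search page needs.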
So to me, Solr still has the advantage. Care to weigh in?