Add enterprise-level search to your site.
News and Follow-Ups – 01:00
Square now being sold in Apple’s store
Check-Ins dying out?
Dropbox: 25 million users
Geek Tools – 14:13
Yikerz! – Super fun magnet game
Webapps – 16:12
Surfboard – Flipboard as a web app
InstaLyrics – Find lyrics quickly
Full Text Search – 22:11
Google Custom Search
Super fast to set up
Easy to implement
Ability to add AdSense to search results
Unable to adjust content ranking and do custom integration
Mainly just for indexing HTML pages, not search queries or other text.
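For anyone who would rather fetch results programmatically than drop in the embedded search box, Google also offers a Custom Search JSON API. Below is a minimal sketch of building a request URL for it. The API key and engine ID are placeholders, and the helper name `build_cse_url` is made up for this example.

```python
from urllib.parse import urlencode

# Endpoint of Google's Custom Search JSON API.
CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_cse_url(api_key, engine_id, query, num=10):
    """Return the GET URL for one page of Custom Search results."""
    params = {
        "key": api_key,   # API key (placeholder value in this sketch)
        "cx": engine_id,  # ID of the custom search engine (placeholder)
        "q": query,       # the user's search terms
        "num": num,       # results per page; the API allows 1 to 10
    }
    return f"{CSE_ENDPOINT}?{urlencode(params)}"


url = build_cse_url("YOUR_API_KEY", "YOUR_ENGINE_ID", "full text search")
print(url)
```

Fetching that URL with any HTTP client returns JSON, which leaves the page layout up to you, though result ranking is still decided by Google.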
Sphinx
“Searching via SphinxAPI is as simple as 3 lines of code, and querying via SphinxQL is even simpler, with search queries expressed in good old SQL.”
Open source with commercial support
Result relevance ranking is the default. You can set up your own sorting should you wish, and give specific fields higher weightings.
The search service daemon (searchd) is pretty low on memory usage – and you can set limits on how much memory the indexer process uses too.
Client libraries for Java, PHP, Python, Ruby, Perl, C, and other languages.
Written in C++
Indexing speed of 60+ MB/sec per server
Biggest known Sphinx cluster indexes 5 billion documents, resulting in over 6 TB of data. Busiest known one is, unsurprisingly, Craigslist, which serves 50+ million search queries/day.
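To give a feel for "search queries expressed in good old SQL" and for per-field weightings, here is a rough sketch that builds a SphinxQL statement as a string. The index name `posts` and the fields `title` and `body` are hypothetical, the helper `build_sphinxql` is invented for this example, and it assumes a Sphinx version whose SphinxQL supports `WEIGHT()` and `OPTION field_weights`. The escaping is deliberately minimal.

```python
def build_sphinxql(index, terms, field_weights, limit=10):
    """Build a SphinxQL full-text query with per-field weights.

    No ORDER BY is added: results come back by relevance by default.
    """
    # Minimal escaping for the quoted MATCH() string (sketch only).
    escaped = terms.replace("\\", "\\\\").replace("'", "\\'")
    weights = ", ".join(
        f"{field}={weight}" for field, weight in field_weights.items()
    )
    return (
        f"SELECT id, WEIGHT() FROM {index} "
        f"WHERE MATCH('{escaped}') "
        f"LIMIT {int(limit)} "
        f"OPTION field_weights=({weights})"
    )


# Hypothetical index and fields: a title match counts 10x a body match.
sql = build_sphinxql("posts", "magnet game", {"title": 10, "body": 1})
print(sql)
```

Because searchd can speak the MySQL wire protocol, a statement like this can be sent with an ordinary MySQL client library pointed at the SphinxQL listener port set in your Sphinx config.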
Companies using Sphinx
Lucene and Solr
Developed by the Apache Software Foundation
Written in Java
ranked searching — best results returned first
many powerful query types: phrase queries, wildcard queries, proximity queries, range queries and more
fielded searching (e.g., title, author, contents)
sorting by any field
multiple-index searching with merged results
allows simultaneous update and searching
indexing throughput of over 95GB/hour on modern hardware
small RAM requirements — only 1MB heap
index size roughly 20-30% the size of text indexed
Lucene is a library, whereas Solr is a search server built on Lucene that you talk to over HTTP using XML and REST-style requests.
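Since Solr is queried over HTTP, a search is just a URL. The sketch below builds one using Lucene query syntax (a fielded term plus a range query). The base URL, the field names `title` and `year`, and the helper `build_solr_select_url` are assumptions for illustration; the exact path depends on how your Solr instance and cores are set up.

```python
from urllib.parse import urlencode


def build_solr_select_url(base_url, query, sort=None, rows=10):
    """Return a URL for Solr's select handler, asking for JSON output."""
    params = {
        "q": query,    # Lucene query syntax, e.g. field:term
        "rows": rows,  # number of results to return
        "wt": "json",  # response writer; XML is the other common choice
    }
    if sort:
        params["sort"] = sort  # e.g. "score desc" or "<field> asc"
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical fields: fielded search combined with a range query.
url = build_solr_select_url(
    "http://localhost:8983/solr",
    "title:lyrics AND year:[2010 TO 2011]",
    sort="score desc",
    rows=5,
)
print(url)
```

Swapping `wt` to `xml` returns the same results as XML, and changing `sort` to a field name gives the "sorting by any field" behavior listed above.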
Benefits over Sphinx
Solr is easily embeddable in Java applications.
Solr can be integrated with Hadoop to build distributed applications
Solr can index proprietary formats like Microsoft Word, PDF, etc. Sphinx can’t.
Companies using Solr