It's time to stop relying on corporations which do not respect our privacy. Here are some search engines that, unlike Google, Bing and Yahoo, have a stronger focus on protecting your privacy.
Unlike meta search engines such as DuckDuckGo, Startpage, etc., which rely either partially or entirely upon third parties for their results (primarily Bing and Google), all search engines listed here maintain their own indexes, meaning they actively crawl the web in search of new and updated content to add to their catalogs. A few are hybrids, meaning they rely partially upon a 3rd party engine.
Although meta search engines are often referred to as "alternative" search engines, they are not true alternatives since they are subject to the same censorship/de-ranking practices of the companies upon which they rely. Such search engine companies are really proxies: they may provide a valuable service by insulating you from privacy-intrusive third party services, however this is not always the case. To gain some insight into the relationships between search engines, see the excellent infographic provided by The Search Engine Map website.
If you are going to use a meta search engine which relies upon a 3rd party, those which depend on Microsoft's Bing seem to return generally better results than those which rely upon Google, especially when searching for sensitive and censored information, though I don't expect this to last since Bing and DuckDuckGo are working together to censor Bing's results.
If you have any indexing search engines you would like to suggest, leave a comment (you need not be logged in). To install search engine plugins for Firefox, see Firefox Search Engine Cautions, Recommendations.
Legend:
Decentralized: (yes/no) whether the service depends upon centralized servers or is distributed among its users, as with YaCy
Type: (index/hybrid) indexing search engines crawl the web and index content without relying on a 3rd party, whereas hybrid search engines are a combination of both meta and index
Requires JS / Cookies: (yes/no) whether the website requires JavaScript and/or cookies (web storage) in order to function
Self Contained: (yes/no) whether or not the website uses 3rd party resources, such as Google fonts, etc.
Client Required: (yes/no) whether or not you have to install client software in order to use the service
License: (proprietary/<license type>) whether the source code is available and, if so, the license type
Brave Search is in the process of building its own index, however until that is complete it also pulls results from 3rd parties, primarily Bing and Google.
The search interface is attractive and intuitive. Unfortunately there are few options for tailoring the search results or the interface, though some of the more important ones are in place, including regional and date search options.
Gigablast is a free and open source search engine that maintains its own index.
The search interface offers some useful options, such as selecting the format of the output, several interesting sorting options, time span options, file type options and plenty of advanced syntax options.
I couldn't find a privacy policy, but decided to include it anyway since it is open source.
You can install and run Gigablast on your own server.
Good Gopher was apparently developed by Mike Adams, editor of the NaturalNews.com website, and appears to be unmaintained.
As stated in the Good Gopher privacy policy, their search results are censored in that they filter out what they and their users consider to be "corporate propaganda and government disinfo", while simultaneously promoting the undisputed heavyweight king of propaganda and disinformation, Alex "Bullhorn" Jones.
The core of their privacy policy consists of a few vague paragraphs, the bulk of which has nothing to do with user privacy.
Revenue is generated by displaying ads in the search results, though they state they are very particular about who may advertise on the platform.
LookSeek appears to be owned by Applied Theory Networks LLC and apparently has been around a while. The software seems to be proprietary, but they do have a decent, clear and brief privacy policy.
The search interface is rudimentary, to say the least, and there don't appear to be any configuration options.
LookSeek states they have "no political or social bias".
Marginalia Search is a very interesting, open source, niche search engine which describes itself as "an independent DIY search engine that focuses on non-commercial content, and attempts to show you sites you perhaps weren't aware of in favor of the sort of sites you probably already knew existed".
One very useful aspect of Marginalia Search is that it allows you to choose the search result ranking algorithm, which compiles the search results in different ways, such as by focusing on blogs and personal websites, academic sites, popular sites, etc.
Another potentially unique feature of Marginalia Search is that the results include some information about the website, such as how well the site fits with your search terms, what markup language it is written in and whether it uses JavaScript and/or cookies. Additional information is also provided regarding the content and dependencies for a given site, including whether it employs tracking /analytics, whether it contains media such as video or audio, and whether it contains affiliate links, such as Amazon links.
Mojeek is a UK-based company founded in 2004. The company operates its own crawler and promises to return unbiased results. I think Mojeek is currently the most usable and one of the most promising of all the search engines listed here. Mojeek is very open about how they operate, and development of the search engine and its algorithms is driven in part by soliciting input from users.
The search interface is clean and they offer quite a few options to customize how searching works and how the interface looks. Also available are advanced search options and another tool it calls 'Focus', which can direct search terms to specific domains. One can also configure how many search results per domain are returned; if more than that number are available, Mojeek adds a link under the result which will open a new page with those results when clicked. If you enter a domain as the search term, Mojeek offers the option to search within that domain. The engine also supports some search operators, including site: and since:, the latter of which is similar to the date: operator used by Google.
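The operators above are just text prepended or appended to the query, so you can compose searches programmatically. A minimal sketch, assuming Mojeek's standard search URL and "q" parameter (the mojeek_url helper is hypothetical, not a documented API):

```python
from urllib.parse import urlencode

def mojeek_url(query, site=None, since=None):
    """Compose a Mojeek search URL, optionally adding the site: and
    since: operators described above. Hypothetical helper for
    illustration only."""
    parts = [query]
    if site:
        parts.append(f"site:{site}")    # restrict results to one domain
    if since:
        parts.append(f"since:{since}")  # e.g. a date such as 20220101
    return "https://www.mojeek.com/search?" + urlencode({"q": " ".join(parts)})

print(mojeek_url("privacy search", site="example.com", since="20220101"))
```

Opening the printed URL in a browser performs the same search as typing the operators by hand.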
Mojeek has a simple, clear and solid privacy policy.
Private.sh uses the Gigablast engine and is therefore very similar in terms of search results. I felt it was worth its own entry because it offers an additional layer of privacy: your IP address is stripped and searches are encrypted on the client using JavaScript before they are sent to the server, thus even Private.sh apparently doesn't know what you're searching for. As with Gigablast, however, there is no privacy policy.
The search interface is bare and there are no options other than the ability to perform an advanced search. There are only two search scopes: web and news.
Right Dao searches seem to be fairly comprehensive and so this search engine is a solid choice when looking for politically sensitive information that Google and others censor. While the engine accepts phrase searches, that functionality seems to be very broken.
Wiby is an interesting, open-source search engine which is building an index of personal, rather than corporate, websites. The interface is very plain and there is only one option in the settings, however it is designed to work well with older hardware.
While YaCy doesn't produce a lot of search results since not enough people use it yet, I think it's one of the more interesting search engines listed here.
YaCy is a decentralized, distributed, censorship resistant search engine and index powered by free, open-source software. For those wanting to run their own instance of YaCy, see their home and GitHub pages. This article from Digital Ocean may also be helpful.
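For those who want to try running their own peer, YaCy publishes a Docker image, which makes setup fairly painless. The following is a minimal sketch of a docker-compose file; the image name, port and data path reflect YaCy's Docker Hub page at the time of writing and may change, so check their documentation first:

```yaml
# docker-compose.yml - minimal single-peer YaCy setup (illustrative sketch)
services:
  yacy:
    image: yacy/yacy_search_server:latest
    ports:
      - "8090:8090"   # web interface; YaCy listens on 8090 by default
    volumes:
      - yacy_data:/opt/yacy_search_server/DATA   # persist the index
volumes:
  yacy_data:
```

After `docker compose up -d`, the admin interface should be reachable at http://localhost:8090, where you can configure crawling and join the peer-to-peer network.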
Footnotes
While JavaScript is not strictly required, functionality may be reduced if it is disabled.
Refusing to accept cookies may result in settings not being saved.
Alexandria is a very new, open-source search engine with its own index, though it's currently built using a 3rd party. The first version of the source code appeared on GitHub in late 2021. The index is very small at the moment and therefore the service isn't really useful yet.
The interface is sparse and there are currently no options for customizing anything, however there are plans to improve the service.
There was no formal privacy policy at the time of writing, however the little information there is indicates a strong regard for privacy. By default they store IP addresses along with search queries in order to improve the service, however they promise to never share this information and there is an option to disable this behavior.
Alexandria is worth keeping an eye on.
I contacted Alexandria in April of 2022 with some questions. Following is our exchange:
Q: what are your values regarding user privacy? A: We care a lot about user privacy and plan to let users decide how much they want to share. We run Alexandria.org as a non-profit so we have no incentive to store any info other than to make the search results better.
Q: I see that you have a dependency on rsms.me - depending on 3rd parties is always a privacy and security concern and I think it is often unnecessary - it looks like it's only CSS that's being imported at the moment, but do you plan on adding any other 3rd party dependencies? A: Yes we use the Inter font which is open source, we just think it is a nice looking font. We generally have a high threshold for using 3rd party dependencies but I think it is impossible to build everything ourselves so if there are things other people are better at than us and it is not in our core mission to build it we will use third party solutions. For example we depend on Hetzner for servers, we depend on commoncrawl for high volume scraping. But it's quite likely that we remove that dependency when we redesign the website next time.
Q: what are the long-term goals for Alexandria? A: The long term goal is to make knowledge as accessible as possible to as many people as possible. We want to give the users of alexandria.org info that is in their best interest without having to think about advertisers or other third parties.
Q: will you offer unbiased results? A: Our bias should be to show the results that are likely to be the most useful for users, so that is what we are aiming for.
Q: do you respect robots.txt? personally I'm fine with it if you do not since it seems Big Tech is making it difficult for the little guy to compete in this market A: Our index is primarily built with data from Common Crawl. But when we do crawling ourselves we respect robots.txt. Our main problem with scraping is not robots.txt, but that many big/valuable sources of information are behind Cloudflare and similar services or otherwise closed to scraping.
Q: how do you plan to finance the project? A: In the long term we hope to be able to finance it with donations.
Q: what is the current size of your index roughly (pages) and at what rate is it growing? A: Right now we are just using a very small index while rebuilding big parts of the system. The current index is around 100 million urls. Pretty soon we plan to have 10 billion urls indexed.
Q: what search operators will you/do you support (site:, title:, date: etc.)? A: None right now. The first one we will implement is site: since it is quite simple.
Q: because the code is available, will anyone be able to run Alexandria on their own server and how will that work? will each instance be independent, or might the indexes be shared across all servers? A: Our index is not open source at the moment. So anyone who wants to create their own search engine will have to create their own index by crawling the web themselves or downloading data from Common Crawl or similar.
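The robots.txt handling mentioned in the exchange above is simple enough that Python ships a parser for it in the standard library. A minimal sketch of how a small crawler can honor the protocol (the robots.txt content below is made up for illustration):

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: everyone is barred from /private/, and one
# specific crawler ("BigBot") is barred from the whole site.
robots_txt = """\
User-agent: *
Disallow: /private/

User-agent: BigBot
Disallow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# A polite crawler checks can_fetch() before requesting each URL.
print(rp.can_fetch("SmallBot", "https://example.com/articles/1"))  # True
print(rp.can_fetch("SmallBot", "https://example.com/private/x"))   # False
print(rp.can_fetch("BigBot", "https://example.com/articles/1"))    # False
```

In a real crawler you would load each site's live robots.txt with rp.set_url(...) and rp.read() instead of parsing a hard-coded string.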
Presearch is (currently) yet another meta search engine which is ultimately powered by Big Tech in that it relies on multiple corporate giants for its search results.
Presearch appears to be largely centralized at the moment, though decentralization is a stated goal. In the future Presearch is to be powered largely or entirely by the community in that anyone can run a node and help build an index with content curated by users.
The interface is interesting in that you can select among many different search categories, however it unnecessarily requires JavaScript to be enabled before one can initiate a search and again to display any results.
Presearch uses code from several 3rd parties including bootstrapcdn.com, coinmarketcap.com, cloudfront.net and hcaptcha.com. Such dependencies are often unnecessary, resulting in bloated and potentially insecure platforms which may not be privacy friendly.
Presearch incorporates "PRE" tokens, yet another form of digital currency which is apparently used for a variety of purposes, including incentivizing people to use Presearch, financing the growth of infrastructure and ensuring the integrity of the platform. While people can apparently earn "PRE" when using the search engine, withdrawing their earnings appears to be a convoluted process which is not always successful (see here and here for example).
While Presearch may have potential, the realization of its goals of decentralization and the building of its own index need to be met before it becomes a viable service.
De-listed search engines
DuckDuckGo
DuckDuckGo has openly admitted to censoring and de-ranking search results as well as working with Microsoft's Bing in order to influence their results (DuckDuckGo relies heavily on Bing). In one instance they blacklisted voat.co, a former free speech social platform, and on March 10, 2022, DuckDuckGo's CEO, Gabriel Weinberg, tweeted the following:
Like so many others I am sickened by Russia’s invasion of Ukraine and the gigantic humanitarian crisis it continues to create. #StandWithUkraine️ At DuckDuckGo, we've been rolling out search updates that down-rank sites associated with Russian disinformation.
Weinberg apparently had no problem when the U.S. invaded Iraq, Syria, Libya, etc., nor any problem with Black Lives Matter and Antifa terrorists burning and looting cities throughout the U.S., but he suddenly developed a selective crisis of conscience when Russia invaded Ukraine, which happens to be full of U.S. supported terrorists.
DuckDuckGo also admitted to influencing Microsoft's Bing search results according to a New York Times article:
DuckDuckGo said it "regularly" flagged problematic search terms with Bing so they could be addressed.
DuckDuckGo continues its race to the bottom. From an April 15, 2022, TorrentFreak article:
Privacy-centered search engine DuckDuckGo has completely removed the search results for many popular pirate sites including The Pirate Bay, 1337x, and Fmovies. Several YouTube ripping services have disappeared, too, and even the homepage of the open-source software youtube-mp3 is unfindable.
On or around 25 May, 2022, it was discovered that DuckDuckGo was allowing tracking by Microsoft:
DuckDuckGo's founder Gabriel Weinberg has admitted to the company's agreement with Microsoft for allowing them to track the user's activity. He further stated that they are talking to Microsoft to change their agreement clause for users' confidentiality.
DDG's founder (Gabriel Weinberg) has a history of privacy abuse, starting with his founding of Names DB, a surveillance capitalist service designed to coerce naive users to submit sensitive information about their friends. (2006)
Qwant's privacy policy has apparently deteriorated. They collect quite a lot of data, some of which they share with 3rd parties. Most disturbing is that, like DuckDuckGo, they censor results. Someone from Qwant tweeted the following on March 1, 2022:
#UkraineRussiaWar In accordance with the EU sanctions, we have removed the Russian state media RT and Sputnik from our results today. The neutral web should not be used for war propaganda.
As of somewhere around 2018 or 2019, Startpage was partially bought out by Privacy One Group/System1 which appears to be a data collection/advertising company. Source: Software Removal | Startpage.com
Other search engines
The Search Engine Party website by Andreas is well worth visiting. He has done an excellent job of compiling a large list of search engines and accompanying data. Also see the 'A look at search engines with their own indexes' page by Rohan Kumar who did an excellent job of compiling a list of engines that maintain their own index, however do note that privacy was not considered.
Reader suggested search engines that didn't make the cut
The Cliqz search engine, which is an index and not a proxy, is largely owned by Hubert Burda Media. The company offers a "free" web browser built on Firefox.
It appears there are two primary privacy policies which apply to the search engine and both are a wall of text. As is often the case, they begin by telling readers how important their privacy is ("Protecting your privacy is part of our DNA") and then spend the next umpteen paragraphs enumerating all the allegedly non-personally identifying data they collect and the 3rd party services they use to process it, which in turn have their own privacy policies.
In 2017 the morons at Mozilla corporate made the mistake of partnering with Cliqz and suffered significant backlash when it was discovered that everything users typed in their address bar was being sent to Cliqz. You can read more about this on HN, as well as a reply from Cliqz, also on HN.
I was anxious to try this engine after seeing it listed in NordVPN's article, TOP: Best Private Search Engines in 2019!, and so I loaded the website and I liked what they had to say. Unfortunately, Gibiru not only depends on having JavaScript enabled, it depends on having it enabled for Google as well. Fail! It seems Gibiru is little more than a Google front-end, and a poor one at that.
I added Search Encrypt to the list and later removed it. The website uses cookies and JavaScript by default, their ToS is a wall of corporate gibberish and their privacy policy is weak.
Lastly, Search Encrypt doesn't seem to provide any information about how they obtain their search results, though both the results and interface reek of Google and reading between the lines clearly indicates it is a meta search engine.
Like Search Encrypt, Yippy, bought by DuckDuckGo, was another ethically challenged company with a poor privacy policy looking to attract investors. Yippy used cookies by default and wouldn't function without JavaScript. Yippy was also recommended by NordVPN.
Evaluating search engines
There are several tests that you can perform in order to determine the viability of a search engine. To get a sense of whether the results are biased, I often search for highly controversial subjects such as "holocaust revisionism". If you perform such a search using Google, Bing or DuckDuckGo, with or without quoting it, most or all of the first results link only to mainstream sources which attempt to debunk the subject rather than provide information regarding it. If you perform the same query using Mojeek, however, the difference is quite dramatic. Rohan Kumar also offers several great tips for evaluating search engines in his article, A look at search engines with their own indexes:
"vim", "emacs", "neovim", and "nvimrc": Search engines with relevant results for "nvimrc" typically have a big index. Finding relevant results for the text editors "vim" and "emacs" instead of other topics that share the name is a challenging task.
"vim cleaner": should return results related to a line of cleaning products rather than the Correct Text Editor.
"Seirdy": My site is relatively low-traffic, but my nickname is pretty unique and visible on several of the highest-traffic sites out there.
"Project London": a small movie made with volunteers and FLOSS without much advertising. If links related to the movie show up, the engine’s really good.
"oppenheimer": a name that could refer to many things. Without context, it should refer to the physicist who worked on the atomic bomb in Los Alamos. Other historical queries: "magna carta" (intermediate), "the prince" (very hard).
Lessons learned from the Findx shutdown
The founder of the Findx search engine, Brian Rasmusson, shut down operations and detailed the reasons for doing so in a post titled, Goodbye – Findx is shutting down. I think the post is of significant interest not only to the end user seeking alternatives to the ethically corrupt mega-giants like Google, Bing, Yahoo, etc., but also to developers who have an interest in creating a privacy-centric, censorship resistant search engine index from scratch. Following are some highlights from the post:
Many large websites like LinkedIn, Yelp, Quora, Github, Facebook and others only allow certain specific crawlers like Google and Bing to include their webpages in a search engine index (maybe something for European Commissioner for Competition Margrethe Vestager to look into?) Other sites put their content behind a paywall. [...]Most advertisers won’t work with you unless you either give them data about your users, so they can effectively target them, or unless you have a lot of users already. Being a new and independent search engine that was doing the time-consuming work of growing its index from scratch, and being unwilling to compromise on our user’s privacy, Findx was unable to attract such partners. [...]We could not retain users because our results were not good enough, and search feed providers that could improve our results refused to work with us before we had a large userbase … the chicken and the egg problem. [...]From forbidding crawlers to index popular and useful websites and refusing to enter into advertising partnerships without large user numbers, to stacking the requirements for search extension behaviour in browsers, the big players actively squash small and independent search providers out of their market.
I think the reasons for the Findx shutdown highlight the need for decentralized, peer-to-peer solutions like YaCy. If we consider the problems Findx faced with the data harvesting, social engineering giants like Google and Facebook and the various CDN networks like Cloudflare, I think they are the sort of problems that can be readily circumvented with crowdsourced solutions. Any website can block whatever search crawler it wants, and there can be good reasons for doing so, but as Brian points out, there are also stupid and unethical reasons. With a decentralized P2P solution anyone could run a crawler, which would mitigate a lot of these problems and force walled garden giants such as Facebook to have their content scraped.