There are a variety of ways to find what you’re looking for when you come across a broken link on the interwebs. Here are a few methods i like to use.
The first thing you should know is how to use a search engine. Various search engines will attach a special meaning to certain characters and these ‘search operators’ as they’re called can be really helpful. Here’s some handy examples that work for Google as well as some other search engines (and no, you shouldn’t be using Google directly):
OR: ‘OR’, or the pipe ( | ) character, tells the search engine you want to search for this OR that. For example cat|dog will return results containing ‘cat’ or ‘dog’, as will cat OR dog.
( ) : Putting words in a group separated by OR or | produces the same result as just described, however you can then add words outside of the group that you always want to see in the results. For example, (red|pink|orange) car will return results that have red, pink or orange cars.
" ": If you wrap a “word” in double quotes, you are telling the search engine that the word is really important. If you wrap multiple words in double quotes, you are telling the search engine to look for pages containing “that exact phrase.”
site:: If you want to search only a particular domain, such as 12bytes.org, append site:12bytes.org to your query, or don’t include any search terms if you want it to return a list of pages for the domain. You can do the same when performing an image search if you want to see all the images on a domain. You can also search a TLD (Top-Level Domain) using this operator. For example, to search the entire .gov TLD, just append site:.gov to your query.
-: If you prefix a word with a -hyphen, you are telling the search engine to omit results containing this word. You can do the same -“with a phrase” also.
cache:: Prefixing a domain with cache:, such as cache:12bytes.org, will return the most recent cached version of a page.
intitle: : If you prefix a word or phrase with intitle:, you are telling the search engine that the word or phrase must be contained in the titles of the results.
allintitle: : Prefixing your words with allintitle: tells the search engine that all words following this operator must be contained in the titles of the search results.
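If you build queries like these often, the operators above can be glued together with a few lines of code. The following is only a rough sketch: the helper names (any_of, phrase, exclude, build_query) are made up for illustration, and whether a given search engine honors each operator is something you'd have to check yourself.

```python
def any_of(*words):
    """Group alternatives with the pipe operator, e.g. (red|pink|orange)."""
    return "(" + "|".join(words) + ")"


def phrase(text):
    """Wrap text in double quotes to request an exact phrase match."""
    return '"' + text + '"'


def exclude(term):
    """Prefix a word or quoted phrase with a hyphen to omit it from results."""
    return "-" + term


def build_query(*parts, site=None):
    """Join the parts with spaces and optionally append a site: filter."""
    query = " ".join(parts)
    if site:
        query += " site:" + site
    return query


print(build_query(any_of("red", "pink", "orange"), "car"))
# (red|pink|orange) car
print(build_query(phrase("that exact phrase"), exclude(phrase("with a phrase")), site=".gov"))
# "that exact phrase" -"with a phrase" site:.gov
```

The output is just a string, so you can paste it into whatever search engine you prefer.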
One of the simplest methods of finding the original target of a busted link is to copy the link location (right click the link and select ‘Copy Link Location’) and plug that into one of the web archive services. The two most popular, general archives that i’m aware of are the Internet Archive and Archive.is. The Internet Archive provides options to filter your search results for particular types of content, such as web pages, videos, etc. In either case, just paste the copied link in the input field they provide and press your Enter key. If the link is ‘dirty’, cleaning it up may provide better results. For example, let’s say the link is something like:
The archive may not return any results for the URL, but it might if you clean it up by removing everything after ‘hunt’.
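Cleaning a link can also be done in code. This is a minimal sketch, assuming that ‘dirty’ means a trailing query string and fragment. The URL below is a made-up stand-in and not the original example, and the web.archive.org/web/*/ lookup address is the pattern the Wayback Machine commonly uses for its list of captures, so treat it as an assumption rather than a documented guarantee.

```python
from urllib.parse import urlsplit, urlunsplit


def clean_url(url):
    """Drop the query string and fragment, keeping scheme, host and path."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def wayback_lookup_url(url):
    """Build an address that asks the Wayback Machine for captures of url (assumed pattern)."""
    return "https://web.archive.org/web/*/" + url


# Hypothetical dirty link, invented for this example
dirty = "https://example.com/articles/treasure-hunt?utm_source=feed&ref=123#comments"
clean = clean_url(dirty)
print(clean)
# https://example.com/articles/treasure-hunt
print(wayback_lookup_url(clean))
# https://web.archive.org/web/*/https://example.com/articles/treasure-hunt
```

Be aware that some sites keep the actual page identifier in the query string, in which case stripping it would point at a different page, so try the full link first.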
There are also web browser extensions you can install to make accessing the archive services easier. For Firefox i like the View Page Archive & Cache add-on by ‘Armin Sebastian’. When you find a dead link, just right-click it and from the ‘View Page Archive’ context menu you can select to search all of the enabled archives or just a specific one. Even if the page isn’t dead you can right-click in the page and retrieve a cached copy or archive the page yourself. Another cool feature of this add-on is that it will place an icon in the address bar if you land on a dead page and you can just search for an archived version from the icon context menu.
Of these two services, the Internet Archive has a far more extensive library, but there’s a very annoying caveat with it that defeats the purpose of an archive which is why i much prefer Archive.is. The Internet Archive follows robots.txt directives. I won’t go into why i think this is stupid, suffice to say that content that is stored on the Internet Archive can be removed even if it does not break any of their rules.
Dead links and no clues
If all you have is a dead link with no title or description and you can’t find a cached copy in one of the archives, you may still be able to find a copy of the document somewhere. For example let’s say the link is https://example.com/pages/my-monkey-stole-my-car.html. The likely title of the document you’re looking for is right in the URL (my-monkey-stole-my-car) and you can plug that into a search engine just as it is, or remove the hyphens and wrap the title in double quotes to perform a phrase search. Also see some of the other examples here.
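Pulling the likely title out of a URL is easy to script if you have a pile of dead links to work through. Here is a small sketch using only the Python standard library. The function name title_guess is made up, and it assumes the title lives in the last path segment with hyphens or underscores between the words.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def title_guess(url):
    """Guess a page title from the last path segment of a URL."""
    path = unquote(urlsplit(url).path).rstrip("/")
    slug = posixpath.basename(path)
    stem, _extension = posixpath.splitext(slug)  # drop .html, .php, etc.
    words = stem.replace("-", " ").replace("_", " ").split()
    return " ".join(words)


url = "https://example.com/pages/my-monkey-stole-my-car.html"
print('"' + title_guess(url) + '"')
# "my monkey stole my car"
```

The quoted result can go straight into a search engine as a phrase search.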
Dead links with some clues
If you come across a dead link that has a title or description, but isn’t cached in an archive, you can use that to perform a search. Just select the title, or a short but unique phrase from the description (which preferably doesn’t contain any punctuation), then wrap it in double quotes and perform a phrase search.
Dead internal website links
If you encounter a website that contains a broken link to another page on the same site and you have some information about the document, like a title or excerpt, you can do a domain search to see if a search engine may link to a working copy. For example, let’s assume the title of the page we’re looking for is ‘Why does my kitten hate me?’ on the domain ‘example.com’. Copy the title, wrap it in double quotes and plug it into a search engine that supports phrase searches, add a space, then append site:example.com. This will tell the search engine to look for results only on example.com. Also see some of the other examples here.
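The same domain search can be turned into a link you can bookmark or script. This is just a sketch: the search address https://search.example/search and its q parameter are placeholders i made up, so swap in whatever your preferred search engine actually uses.

```python
from urllib.parse import urlencode


def site_phrase_query(title, domain):
    """Build a phrase search restricted to a single domain."""
    return '"' + title + '" site:' + domain


def search_url(query, base="https://search.example/search"):
    """Percent-encode the query for use in a URL (base and 'q' are placeholders)."""
    return base + "?" + urlencode({"q": query})


query = site_phrase_query("Why does my kitten hate me?", "example.com")
print(query)
# "Why does my kitten hate me?" site:example.com
print(search_url(query))
# https://search.example/search?q=%22Why+does+my+kitten+hate+me%3F%22+site%3Aexample.com
```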
YouTube videos you know exist but can’t find
Because there is a remarkable amount of censorship taking place at YouTube, they will sometimes hide sensitive videos from their search results when you use the search engine provided by YouTube. To get around this, use another search engine to perform a domain search as described in the ‘Dead internal website links‘ section.
In some cases, such as with a link that points to a removed YouTube video, you may not have any information other than the URL itself, not even a page title. Using the YouTube link as an example, https://www.youtube.com/watch?v=abc123xyz, copy youtube.com/watch?v=abc123xyz, wrap it in double quotes and plug that into your preferred search engine. You will often find a forum or blog post somewhere that will provide helpful clues, such as the video title or description which you can use to search for a working copy of the video. And the first place to look for deleted YouTube videos is YouTube! You can also search the Internet Archive as well as other video platforms that are more censorship resistant than YouTube, including Dailymotion, BitChute, DTube, LEEKWire and many others.
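If you have a lot of dead YouTube links to chase down, the quoted search string can be generated rather than typed. A minimal sketch, assuming the standard watch?v= link form. The function name is made up, and any extra parameters (timestamps, sharing tags and the like) are simply dropped.

```python
from urllib.parse import parse_qs, urlsplit


def youtube_search_phrase(url):
    """Turn a YouTube watch link into a quoted string for a phrase search."""
    parts = urlsplit(url)
    host = parts.netloc[4:] if parts.netloc.startswith("www.") else parts.netloc
    video_id = parse_qs(parts.query).get("v", [""])[0]
    return '"' + host + parts.path + "?v=" + video_id + '"'


print(youtube_search_phrase("https://www.youtube.com/watch?v=abc123xyz"))
# "youtube.com/watch?v=abc123xyz"
```

Leaving off the scheme and the www. prefix helps match pages that link to the video in slightly different ways.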
Broken links on your own website
I don’t know about you, but i have nearly 4,000 links on 12bytes.org as of this writing and many of them point to resources which TPTB (The Powers That [shouldn’t] Be) would rather you knew nothing about. As such, many of the resources i link to are taken down and so i have to deal with broken links constantly, many of them deleted YouTube videos. If you run WordPress (self-hosted – i don’t know about a wordpress.com site) you will find Broken Link Checker by ‘ManageWP’ in the WordPress plugin repository and its job is to constantly scan your site to look for broken links. While it is not a bug-free plugin (the developer is not at all responsive and doesn’t seem to fix anything in a timely manner), it is by far the most comprehensive tool of its type that i’m aware of. There are also many external services you could use whether you run WordPress or not.
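If you don’t run WordPress and would rather not rely on an external service, the basic idea behind a link checker is simple enough to sketch. To be clear, this is not how the Broken Link Checker plugin works internally (i have no idea), it’s just a bare-bones illustration using the Python standard library. It assumes absolute http(s) links in ordinary anchor tags, and the User-Agent string is made up.

```python
import urllib.request
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect absolute http(s) links from <a href="..."> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.startswith(("http://", "https://")):
                self.links.append(value)


def extract_links(html):
    """Return every absolute link found in a chunk of HTML."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


def is_broken(url, timeout=10):
    """Return True if a HEAD request fails or comes back with an error status."""
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "link-check-sketch"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status >= 400
    except OSError:  # covers HTTP errors, DNS failures, refused connections, timeouts
        return True


page = '<p><a href="https://example.com/a">one</a> <a href="/local">two</a></p>'
print(extract_links(page))
# ['https://example.com/a']
```

Some servers reject HEAD requests or block unknown clients, so a link flagged as broken by something like this should always be checked by hand before you remove it.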
It’s time to stop relying on corporations which do not respect our privacy. Here are some search engines that, unlike Google, Bing and Yahoo, have a stronger focus on protecting your privacy.
See the recent changes at the end.
Following are some search engines which are more privacy-centric than those offered by the privacy-hating mega-corporations like Google, Bing and Yahoo. Note that several of those listed here are partially or wholly ‘meta’ search engines, meaning they do not index the web themselves and instead rely either partially or entirely upon third parties for their search results, especially Google and Bing. Although these meta search engines are often referred to as “alternative search engines”, they are not true alternatives, however they do provide a valuable service in that they act as a proxy between you and third party services such as Google, thus they can insulate you from the privacy risks associated with using the big search companies directly.
If you have any search engines you would like to suggest, please leave a comment (you need not be logged in).
Decentralized: whether the service is controlled by a single entity, such as Google, or distributed among its users, such as YaCy
Type:
meta: uses 3rd party search indexes, such as Google, to deliver search results
index: crawls the web and indexes content without relying on 3rd party search engines
hybrid: a combination of both the above
Self Contained: whether the website uses 3rd party resources, such as Google fonts, etc.
Client Required: whether you have to install client software in order to use the service
If you want to save search settings without storing cookies, you’ll find their URL parameters here. You might want to use a browser extension that will redirect your searches and load the parameters automatically.
UPDATE: It looks like all, or at least many, Searx instances are not providing any meaningful results. This problem is apparently being worked on, so for now i’ll keep Searx in the list.
Notes: Searx is a free, open source meta search engine which i have found to be the best of its type because of its ability to pull results from a wide array of third party services and the comprehensive options it offers. The Searx interface is clean, highly customizable and intuitive. Anyone can run a Searx instance on their own server (see their GitHub page) if they wish, or use any of the existing Searx instances run by others.
Notes: While YaCy doesn’t produce a lot of search results since not enough people use it yet, i think it’s the most interesting search engine listed here. YaCy is a decentralized, distributed, censorship resistant search engine and index powered by free, open-source software. For those wanting to run their own instance of YaCy, see their home and GitHub pages. This article from Digital Ocean may also be helpful.
Upcoming search engines
Presearch: A decentralized search engine powered by the community.
Please leave a comment if you know of any others.
Search engines that didn’t make the cut
Cliqz – The Cliqz search engine, which is an index and not a proxy, is largely owned by Hubert Burda Media. The company offers a “free” web browser built on Firefox. It appears there are two primary privacy policies which apply to the search engine and both are a wall of text. As is often the case, they begin by telling readers how important your privacy is (“Protecting your privacy is part of our DNA”) and then spend the next umpteen paragraphs iterating all the allegedly non-personally identifying data they collect and the 3rd party services they use to process it, which have their own privacy policies. In 2017 Mozilla made the mistake of partnering with Cliqz and suffered significant backlash when it was discovered that everything users typed in their address bar, along with a lot of other data, was being sent to Cliqz, not that this behavior is entirely unique to Cliqz. You can read more about this on HN, as well as a reply from Cliqz, also on HN.
Lessons learned from the Findx shutdown
The founder of the Findx search engine, Brian Rasmusson, shut down operations and detailed the reasons for doing so in a post titled, Goodbye – Findx is shutting down. I think the post is of significant interest not only to the end user seeking alternatives to the ethically corrupt mega-giants like Google, Bing, Yahoo, etc., but also to developers who have an interest in creating a privacy-centric, censorship resistant search engine index from scratch. Following are some highlights from the post:
Many large websites like LinkedIn, Yelp, Quora, Github, Facebook and others only allow certain specific crawlers like Google and Bing to include their webpages in a search engine index (maybe something for European Commissioner for Competition Margrethe Vestager to look into?) Other sites put their content behind a paywall.
Most advertisers won’t work with you unless you either give them data about your users, so they can effectively target them, or unless you have a lot of users already.
Being a new and independent search engine that was doing the time-consuming work of growing its index from scratch, and being unwilling to compromise on our user’s privacy, Findx was unable to attract such partners.
We could not retain users because our results were not good enough, and search feed providers that could improve our results refused to work with us before we had a large userbase … the chicken and the egg problem.
From forbidding crawlers to index popular and useful websites and refusing to enter into advertising partnerships without large user numbers, to stacking the requirements for search extension behaviour in browsers, the big players actively squash small and independent search providers out of their market.
I think the reasons for the Findx shutdown highlight the need for decentralized, peer-to-peer solutions like YaCy. If we consider the problems Findx faced with the data harvesting, social engineering giants like Google, Facebook and the various CDN networks like Cloudflare, i think they are the sort of problems that can be easily circumvented with crowdsourced solutions. Any website can block whatever search crawler they want and there can be good reasons for doing so, but as Brian points out, there are also stupid and unethical reasons for doing so. With a decentralized P2P solution anyone could run a crawler and this could mitigate a lot of problems, plus force the walled garden giants such as Facebook to have their content scraped.
 Startpage uses 1×1 pixel transparent GIF images in the page that serves search results. I had assumed these were tracking pixels and originally stated so in the notes above, however a representative from Startpage contacted me and explained that i was incorrect. Following is a Q&A from a couple of emails i exchanged with them:
Me: regarding the 1×1 gif images, i don’t understand how an image can be used to prevent a 3rd party from setting a cookie – can you explain?
Me: why several 1×1 images are used – why not just 1?
Startpage: It is simpler to offer a different image for each different aggregate count we are keeping.
Me: why do the file names appear to contain a UIN that changes with every search apparently?
Startpage: There is no identifier. Rather, there is something called an “anticache” parameter that has a random number. This prevents the image from being “cached” by the browser – as browser caching would prevent the loading – hence would prevent the aggregate counts from being correct.
Me: why are these clear GIFs not loaded when 0 results are returned?
Startpage: A different part of the code is used when there are no results, so it might not include the same aggregate counts.
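For anyone wondering how an ‘anticache’ parameter like the one Startpage describes works in general, here is a rough sketch. It is not Startpage’s code. The parameter name is borrowed from their reply, and the image address and function name are invented for illustration.

```python
import random
from urllib.parse import urlencode


def with_anticache(image_url, rng=random):
    """Append a random number so the browser treats each request as a new URL."""
    token = rng.randrange(10**9)  # random value, not tied to any user
    separator = "&" if "?" in image_url else "?"
    return image_url + separator + urlencode({"anticache": token})


print(with_anticache("https://example.com/pixel.gif"))
print(with_anticache("https://example.com/pixel.gif"))
```

Because the address almost certainly changes on every call, the browser can’t reuse a cached copy and has to fetch the image again, which is what lets the server keep an accurate count. This also shows why a changing number in a file name doesn’t have to be an identifier, though from the outside you can’t tell the difference between a random number and a tracking ID.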
 Personal preferences/settings are not saved if cookies are disabled.
A new tutorial has been published titled Firefox Search Engine Cautions and Recommendations which covers the risks to your privacy when using any of the major search engines in general, but specifically when using the default search engine plugins that are packaged with the Firefox web browser, though this problem is certainly not limited to Firefox. I also cover how to circumvent the risks to your privacy when using the default Firefox search engine plugins, as well as make suggestions for alternative search engines.
I have to say that i’m becoming more and more disillusioned with the multi-million dollar Mozilla corporation and its flagship product, Firefox. Firefox was never a great web browser in my opinion, but it is/was appealing to many because of how completely customizable it is. In its earlier days it was just a little slow and buggy, but more recently Mozilla is making highly unethical choices with regard to the privacy-hating corporations they willingly partner with, and how these partnerships have manifested and have been monetized in Firefox is a result of utter stupidity and greed in my opinion. I stuck with Firefox all these years because it has always been one of the most hackable browsers out there, but these days i stick with it primarily because i’m not (yet) able to reproduce the functionality i have added to it via add-ons with any other browser, and Chrome is out of the question, much less Google’s spyware version of it.
It’s sad and frustrating that a company that produced a decent, super-highly customizable browser for a niche market has lost its way and turned its back on the very market it once served by deciding to become a Google Chrome clone in order to appeal to the masses.
Screw you Mozilla.
But let’s end on a lighter note, shall we? Here, have a look.