A friend of mine (who shall remain anonymous to protect her anonymous-ness) has asked me a question and I've decided to provide the answer publicly in case others are interested--and because my long-winded answer is hard to fit in a text message.
QUESTION: What's the difference between the surface web, deep web and dark web?
I've noticed a few TV shows lately talking about the dark web. CSI: Cyber, for example. They make it sound very sinister, illicit and dangerous. "Oh God, you don't want to accidentally stumble into the dark web or the hackers will get you."
Even though I've been involved with computers and the web for decades, I still didn't understand what they meant when talking about the dark web. Surely someone with my experience would know all about this dark web thingy they speak of. Why didn't I?
I educated myself and will now break it down for you.
Let's begin with some background. First, the Internet is different from the Web. The Internet grew out of ARPANET, a network funded by ARPA (now DARPA) in the late 1960s. Al Gore did not invent it, but he apparently did promote and support it. Back in college I sent email and went on message boards, all before the World Wide Web existed. Some of you will attest, though, that I was bad about replying to email.
The World Wide Web was invented at CERN, the particle physics research lab near Geneva, Switzerland, and opened to the public in 1991. I've been involved with the web in one way or another since its beginning 25 years ago.
The Web consists of HTML pages and the HTTP protocol that browsers use to talk to servers, and it runs on top of the Internet, kind of like a phone line. Your voice travels over a phone line, but you can also send data or other information over one. Remember the days of dial-up modems? Conversations happen on top of a phone line. The Web happens on top of the Internet.
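To make the "on top of" idea concrete, here is a minimal Python sketch (my own illustration, assuming Python 3.9+ and only the standard library). It starts a tiny web server on your own machine, then fetches a page by writing an HTTP request as plain text into an ordinary TCP connection. The socket is the Internet part; the text traveling through it is the Web part. The handler name `HelloHandler`, the helper `fetch_raw` and the page content are made up for the demo.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """A tiny web server: answers every GET with the same short HTML page."""

    def do_GET(self):
        body = b"<html><body>Hello, Web!</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def fetch_raw(host: str, port: int, path: str = "/") -> str:
    """Fetch a page by writing HTTP text straight into a TCP connection."""
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}:{port}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    # The Internet part: open a TCP connection to an IP address and port.
    with socket.create_connection((host, port)) as sock:
        # The Web part: the HTTP request is just text sent over that connection.
        sock.sendall(request.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection: response complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Port 0 asks the operating system for any free port.
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    response = fetch_raw("127.0.0.1", server.server_address[1])
    print(response.splitlines()[0])  # the status line, e.g. "HTTP/1.0 200 OK"
    server.shutdown()
    server.server_close()
```

Real browsers send many more headers and usually use HTTPS, which wraps this same conversation in encryption, but the layering is the same: Web text riding on an Internet connection. Note that the one script plays both roles, which is the point that any computer can be a web server and a web client.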
Anyway, the Internet is the backbone and the Web is a certain way of exchanging information over the Internet.
Because the web was originally designed for particle physicists and I was in grad school for particle physics at Duke in 1992, I was among the first in the U.S. to see it at work. Soon after that I found myself at CERN doing my thesis research, where I created websites, both personal (blogging before it was called blogging) and work-related, for my research and for calibrating the particle detectors. My work even got written up in CERN's international magazine, which is read all over the world. I ended up in a Canadian documentary for my website work and was offered a job in England. That felt good for a lowly graduate student. It gave me a grain of confidence that I might be able to support myself one day.
Any computer can be a web server. And any computer can be a web client. The computer you use to surf the web is a web client.
In the early days it wasn't easy to find other computers on the Internet. You had to know a computer's IP address to connect to it. The Domain Name System (DNS), which predates the web, eventually let us use memorable names like www.yahoo.com instead of IP addresses like 192.168.22.73. DNS servers translate the domain name into an IP address, and connecting to that address reaches the target computer, just like dialing a phone number connects you to a certain phone on the other end. Of course, the computer you connect to has to be running software that listens for connections, just like the phone you are trying to reach has to be plugged in or have a cell signal.
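Here is roughly what that phone-book lookup looks like from a program's point of view: a small Python sketch using the standard `socket.getaddrinfo` call, which asks the operating system's resolver (which normally relies on DNS) to turn a name into addresses. The helper name `resolve` is my own; looking up a public name like www.yahoo.com needs network access, and the answer can change over time.

```python
import socket


def resolve(hostname: str) -> list[str]:
    """Ask the operating system's resolver, which normally uses DNS, for a name's IPs."""
    results = socket.getaddrinfo(hostname, None, type=socket.SOCK_STREAM)
    addresses = []
    # Each result is (family, type, proto, canonname, sockaddr); the IP is sockaddr[0].
    for _family, _type, _proto, _canonname, sockaddr in results:
        ip = sockaddr[0]
        if ip not in addresses:  # drop duplicates, keep the resolver's order
            addresses.append(ip)
    return addresses


if __name__ == "__main__":
    print(resolve("localhost"))  # usually 127.0.0.1 and/or ::1
    # print(resolve("www.yahoo.com"))  # needs network access; answers change over time
```

Once a program has an address, it still needs something on the other end listening for connections, which is exactly the "phone has to be plugged in" part.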
As more people started creating websites, the need arose for a way to locate them. Yahoo was started by a couple of Stanford students who began cataloging websites and their domain names. Yahoo became like a phone book for the web.
Now that we have a little background, let me begin to address the main question. What is meant by the surface web, deep web and dark web?
To me these are just words. Blah, blah, blah. It's all just a bunch of interconnected computers. Period.
But even though surface, deep and dark are widely and wildly misused, I suppose they will have value once everyone agrees on what they mean. Of course, my description below is the correct one.
The surface web consists of web pages that search engines like Google can find and index. Basically, whatever you can find in Google or Bing or Yahoo is part of the surface web.
The deep web is content that YOU (or someone like you) could get to, but which a search bot could not.
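A toy model helps here. The Python sketch below treats the web as a hypothetical map from each page to the pages it links to, then crawls outward from a seed page the way a search bot follows links. Anything the crawl never reaches is, by the definition above, deep web. The page names are invented, and real crawlers also fetch pages, honor robots.txt and rank results, none of which is modeled here.

```python
from collections import deque

# A hypothetical miniature web: each page maps to the pages it links to.
LINKS = {
    "news.example/": ["news.example/sports", "blog.example/"],
    "news.example/sports": ["news.example/"],
    "blog.example/": ["blog.example/post-1"],
    "blog.example/post-1": [],
    # A home server that works fine, but no page anywhere links to it.
    "home-server.example/": ["home-server.example/photos"],
    "home-server.example/photos": [],
}


def crawl(seeds: list[str], links: dict[str, list[str]]) -> set[str]:
    """Breadth-first crawl: start from seed pages and follow every link once."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


if __name__ == "__main__":
    surface = crawl(["news.example/"], LINKS)
    deep = set(LINKS) - surface
    print("surface:", sorted(surface))
    print("deep:", sorted(deep))  # the home server's pages were never reached
```

You, knowing the home server's address, could still visit it directly; the crawler just has no path to it.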
I can set up a website at home and access it from my friend's house by typing in its IP address. But there are no links to it anywhere on the surface web, so a search engine will never find it. I've done this many times. Now, it's still "technically" possible for a search engine to find it by trying IP addresses at random and seeing if something responds, but as far as I know they don't generally do that.
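The "try addresses and see if something responds" idea boils down to knocking on an address and port and checking whether anything answers. Here is a minimal Python sketch of that check; the names `responds` and `probe` are hypothetical helpers of mine, and the usage below only touches your own machine.

```python
import socket


def responds(host: str, port: int = 80, timeout: float = 0.5) -> bool:
    """Return True if something at host:port accepts a TCP connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out: nobody answered
        return False


def probe(hosts: list[str], port: int = 80) -> list[str]:
    """Knock on each address in turn and keep the ones that answer."""
    return [host for host in hosts if responds(host, port)]


if __name__ == "__main__":
    # Prints [] unless something on this machine listens on port 8000.
    print(probe(["127.0.0.1"], 8000))
```

Scaling this up to billions of addresses is possible but slow and noisy, which is part of why search engines lean on links instead.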
There are other ways something could be part of the deep web as well. For example, the only way to find certain pages on a website might be to type a query into the site's own search box. Googlebot won't even try because it doesn't know what to type, although it certainly could. It wouldn't surprise me if someday Googlebot started typing things into search boxes to see whether it could find content it hadn't found in other ways.
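Here is a toy Python model of a site whose archive is reachable only through its search box. A link-following crawler sees just the linked pages; a hypothetical crawler that guesses search terms turns up whichever archive pages match its guesses. The site, page names and keywords are all made up for illustration.

```python
# A hypothetical site: two pages are linked from the home page, while the
# archive pages can only be found by typing into the site's search box.
LINKED_PAGES = ["index.html", "about.html"]
ARCHIVE = {
    "budget": "budget-2015.html",
    "minutes": "meeting-minutes-2014.html",
    "newsletter": "newsletter-archive.html",
}


def site_search(query: str) -> list[str]:
    """The site's search box: archive pages whose keyword contains the query."""
    q = query.strip().lower()
    if not q:
        return []  # an empty query finds nothing
    return [page for keyword, page in ARCHIVE.items() if q in keyword]


def guessing_crawler(guesses: list[str]) -> set[str]:
    """A hypothetical crawler that follows links AND tries guessed search terms."""
    found = set(LINKED_PAGES)  # a link-following crawler stops here
    for word in guesses:
        found.update(site_search(word))
    return found


if __name__ == "__main__":
    print(sorted(guessing_crawler([])))                     # only the linked pages
    print(sorted(guessing_crawler(["minutes", "banana"])))  # adds the minutes page
```

The pages a crawler never guesses stay in the deep web, even though a human who knows what to type can reach them in seconds.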
So, you see, the line between the surface web and deep web can be a little blurry. There's a lot of content that is not indexed in search engines because it takes a human with some knowledge to get to it, but technically it could be indexed in a search engine. Nonetheless, if it's not in a search engine it is, by definition, part of the deep web.
There's also another way to reach the deep web. Log on to your bank's website, or any other website that requires you to log in. Because credentials are required, a search engine can't access that content. Congratulations, you're on the deep web.
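In miniature, a login wall looks like the Python sketch below: a hypothetical page handler that returns HTTP status 401 (Unauthorized) unless it gets valid credentials. A search bot arrives with no credentials, so all it ever sees is the 401. The user table and account page are invented; real banks store hashed passwords, use session cookies and do much more.

```python
import hmac
from typing import Optional, Tuple

# Hypothetical stand-ins for a bank's user table and one protected page.
# (Real sites store salted password hashes, never plain passwords.)
USERS = {"alice": "correct horse battery staple"}
ACCOUNT_PAGE = "Checking balance: $1,234.56"


def get_account_page(username: Optional[str], password: Optional[str]) -> Tuple[int, str]:
    """Return (HTTP status, body): the page is served only for valid credentials."""
    expected = USERS.get(username or "")
    if expected is None or password is None:
        return 401, "Please log in."
    # compare_digest avoids leaking how many characters matched via timing.
    if not hmac.compare_digest(expected.encode(), password.encode()):
        return 401, "Please log in."
    return 200, ACCOUNT_PAGE


if __name__ == "__main__":
    print(get_account_page(None, None))  # what a search bot gets: (401, 'Please log in.')
    print(get_account_page("alice", "correct horse battery staple"))
```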
Let's recap. The surface web is anything a search engine can find and index. The deep web is anything a search engine cannot find and index.
Finally, this brings us to the dark web which, according to TV, is the dark and dirty back alley of the Internet for drugs, illegal weapons, terrorists, child porn and black-hat hackers.
60 Minutes has described the dark web as "a vast, secret, cyber underworld" that accounts for "90% of the Internet."
SIDENOTE: I have to point out again that the Internet and the World Wide Web are not the same thing. The Internet is made up of the network protocols, machines, switches, routers, fiber optic cables and junction boxes over which data is transferred. The Web is made up of the HTTP protocol and the web page code that allows people to share content and files over the Internet. There are many other ways besides the World Wide Web to transfer data over the Internet. The Web is just the one most of us are familiar with today.
So, what is the dark web? And is 60 Minutes correct?
The dark web does consist of servers which cannot be accessed by search engines. So in this sense the dark web is a subset of the deep web.
CAVEAT: Technically, Google could index the dark web the same way you can access it: by installing some special software. What they couldn't do is give you a direct link to a dark web website that works in a standard browser. I know this blurs the lines, which is why I find the media's coverage of surface, deep and dark so frustrating. So, to be more specific, the surface web is content that a search engine can access and link to directly, so that you can open it in a standard browser.
I know this wasn't clear. Keep reading to get a better understanding of the dark web.
When 60 Minutes suggested that the dark web comprised 90% of the Internet, were they correct? Well, first, they probably meant the World Wide Web, not the Internet. Second, they probably meant the deep web, not the dark web. Third, 90% is a total wild-ass guess they pulled out of their collective you-know-whats.
Obviously, the 90% number is supposed to refer to the deep web, not the dark web. Remember, the deep web is just what's not in search engines. The dark web, as a small subset of the deep web, is certainly much smaller than the surface web.
Some estimates for the relative size of the deep web are much less than 90% and some are much, much more! It doesn't help that deep and dark are routinely confused.
Clear as mud?! Great. Don't worry, you're doing better than the journalists.
The variation in size estimates partly comes down to how you define the deep web. For example, is the content on your own computer part of the deep web? Google can't access your computer, but you're still connected to the Internet. Is everything on your hard drive part of the deep web? Is all the data collected by the NSA and stored in the data silos in Utah part of the deep web? I don't know. It depends on who you ask. If you're confused, you have good reason to be. The terms are not well defined and are often used in confusing ways by the media.
Back to the dark web.
How exactly is the dark web different from the deep web? I implied that the data on your computer, the data on your bank website and other sites could be part of the deep web because search engines don't or can't index it. But what makes something dark rather than deep?
Well, the dark web is all about anonymity. When you use the surface web--say you use a browser to visit amazon.com--your computer and Amazon's servers talk to each other directly, and each side sees the other's IP address. If law enforcement wanted to, they could quickly pinpoint the location of Amazon's servers and, with a little help from your broadband provider, you.
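You can watch this address exchange happen on a single machine. In the Python sketch below (standard library only; the function name is my own), a server accepts one connection and records the address the client came from, while the client records the address it connected to. On the real Internet those would be public IP addresses (your router's, if you're behind one), and your broadband provider typically knows which customer was using that address at the time.

```python
import socket
import threading


def demo_peer_addresses() -> tuple[str, str]:
    """Connect a client to a server on this machine; report what each side sees."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: any free port
    server.listen(1)
    seen = {}

    def accept_one():
        conn, addr = server.accept()  # addr is the client's (IP, port)
        seen["server_sees_client"] = addr[0]
        conn.close()

    worker = threading.Thread(target=accept_one)
    worker.start()
    with socket.create_connection(server.getsockname()) as client:
        seen["client_sees_server"] = client.getpeername()[0]  # the server's IP
    worker.join()
    server.close()
    return seen["server_sees_client"], seen["client_sees_server"]


if __name__ == "__main__":
    print(demo_peer_addresses())  # ('127.0.0.1', '127.0.0.1') on a single machine
```

Nothing special is needed to learn the other side's address: it is simply how a direct connection works.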
Not so on the dark web. Instead of interacting directly you agree to meet in a neutral location.
Consider an analogy. Say you want to buy something online. Most of us go to amazon.com and check out using a valid credit card and shipping address. It's a lot like going to Walmart and writing a check. You know where the Walmart store is, and Walmart knows where you live, too, because your address is on the check.
However, the dark web is like two people calling each other from burner cells and agreeing to meet at a neutral location. You meet up on a random street corner wearing a wig and mustache. You exchange cash for a product and go your separate ways. You destroy your burner cells with hammers and ditch them in random trash cans. Neither of you knows where the other lives or what the other looks like. Because you were also disguised when you bought the burner and paid with cash, no one can trace you to that burner either. There is no way for anyone to track you down.
On the Web, this kind of anonymity is accomplished with special software. Specifically, people can use Tor Browser, Tails OS, I2P, Freenet, Subgraph OS and others. These tools route traffic through layers of encryption and intermediate computers so that the sites you visit never see your real IP address or other information that might reveal your location or identity.
But website providers can also be anonymous on the dark web. This is much more difficult because most websites have to register some kind of web address or at the very least reveal their IP address which could then result in law enforcement (or criminals) being able to determine a location. Not good.
So how can someone access your dark web content on your dark web website if they can't find you?
Here's how it works.
You may have heard of something called peer-to-peer networking. If not, I'll explain. Most web interactions are what's called client-server. My web browser is a client, and I directly contact a web server like amazon.com. But with special software you can connect to anyone else running that same software. File sharing systems like Napster, Gnutella and BitTorrent are peer-to-peer networks. Files and data can be shared with many different peers across a peer-to-peer network, so even though you are able to access content, you don't necessarily know where it originated.
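Here is a toy sketch of the peer-to-peer idea in Python. Each hypothetical peer knows only its neighbors, and a search floods outward hop by hop until some peer has the file, loosely in the spirit of Gnutella's query flooding. In this simplified model the file is relayed back along the same path, so the requester only ever talks to its direct neighbor. Real networks differ (in Gnutella and BitTorrent you usually download straight from the peers that hold the data), and the path is returned here only so we can inspect the route.

```python
from collections import deque

# A hypothetical peer-to-peer network: each peer knows only its neighbors...
NEIGHBORS = {
    "you": ["p1", "p2"],
    "p1": ["you", "p3"],
    "p2": ["you", "p4"],
    "p3": ["p1", "p5"],
    "p4": ["p2"],
    "p5": ["p3"],
}
# ...and shares whatever files it happens to have.
SHARED = {"p5": {"song.mp3": b"...audio bytes..."}}


def flood_search(start: str, filename: str, max_hops: int = 4):
    """Ask neighbors, who ask their neighbors, until some peer has the file.

    Returns (data, path). In this toy model the data is relayed back along the
    path, so the requester only ever talks to path[1].
    """
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        peer = path[-1]
        if filename in SHARED.get(peer, {}):
            return SHARED[peer][filename], path
        if len(path) - 1 >= max_hops:  # stop forwarding after max_hops (like a TTL)
            continue
        for neighbor in NEIGHBORS.get(peer, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None, []


if __name__ == "__main__":
    data, path = flood_search("you", "song.mp3")
    print(data, "via", path[1])  # you only dealt with your neighbor p1
```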
The Tor network works in a similar spirit, although strictly speaking it is an overlay network run by thousands of volunteer relays rather than a pure peer-to-peer network, and it adds some extra goodies to better ensure anonymity. When you set up a Tor site (an "onion service"), it picks several random relays to act as introduction points and announces them across the network, so visitors can reach the service without learning its actual location. When you decide to connect to the service, your Tor client picks a random relay as a rendezvous point, and your connection and the site's connection each reach it through their own chain of relays. Because each relay adds or removes only its own layer of encryption along the way, you can't trace the content back to its origin. Your computer only ever talks directly to the first relay in its own chain; the site's half of the path is hidden from you, just as yours is hidden from it.
Consider an analogy. Suppose I know a guy named Raul. Incidentally, that's not his real name, and he also wears a disguise and uses a fake accent. But the one thing I do know about Raul is that he hangs out on a certain street corner, so I have a way to locate him. I go to him and ask what sorts of things he can do for me. He knows the names of certain "businesses" and what they provide, but that's all. So I tell him what I want. He says I can get it from "Rummage.onion," which, of course, means nothing to me. The next day I get a box back with the information I requested. I don't know what Raul did with the box, and he doesn't know what his contact did with it. The box could have passed through half a dozen different people, each one knowing nothing except who the previous and next person in the chain was. And the chain of people keeps changing.
Back to the web, lots of computer protocols and services are used in the process to secure and randomize the communication without any one computer knowing more than it has to know.
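The "nobody knows more than they have to" property comes from layered encryption, which is where onion routing gets its name. The Python sketch below is a deliberately simplified toy, not Tor's actual protocol: the client wraps a message in one layer per relay, and each relay peels off only its own layer, learning nothing but the next hop. The relay names, the `rummage.onion` destination and the SHA-256-based XOR cipher are all stand-ins I made up. The cipher is NOT secure, and real Tor negotiates a separate key with each relay, uses proper ciphers and fixed-size cells, and joins two circuits at a rendezvous point for onion services.

```python
import hashlib
import json
import secrets


def toy_xor_cipher(key: bytes, data: bytes) -> bytes:
    """TOY stream cipher: XOR with SHA-256(key || block offset). NOT secure.

    XOR is its own inverse, so the same call both encrypts and decrypts.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)


def wrap(message: bytes, route: list[str], keys: dict[str, bytes], destination: str) -> bytes:
    """Client side: build the onion from the inside out, one layer per relay."""
    next_hops = route[1:] + [destination]  # who each relay should pass it to
    onion = message
    for relay, next_hop in reversed(list(zip(route, next_hops))):
        layer = json.dumps({"next": next_hop, "data": onion.hex()}).encode()
        onion = toy_xor_cipher(keys[relay], layer)
    return onion


def peel(relay_key: bytes, onion: bytes) -> tuple[str, bytes]:
    """Relay side: remove one layer, learning only the next hop."""
    layer = json.loads(toy_xor_cipher(relay_key, onion))
    return layer["next"], bytes.fromhex(layer["data"])


if __name__ == "__main__":
    route = ["relay-A", "relay-B", "relay-C"]           # hypothetical relays
    keys = {r: secrets.token_bytes(32) for r in route}  # real Tor negotiates these per hop
    data = wrap(b"GET /catalog", route, keys, "rummage.onion")
    for relay in route:
        next_hop, data = peel(keys[relay], data)
        print(f"{relay} peels its layer and passes it to {next_hop}")
    print(data)  # the plain request appears only at the end of the chain
```

Like the people passing Raul's box along, relay-A sees the onion arrive and knows to hand it to relay-B, but it cannot read what is inside or where it finally ends up.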
This sounds pretty sophisticated, but there are still vulnerabilities in these networks that can leak information, and smart hackers occasionally find ways to exploit them.
I know you're just curious, but if you do try to access the dark web with something like the free Tor Browser, understand that most of what you'll see are scams. There will probably be a lot of offensive content. And some of the services are very likely honeypots. A honeypot is a fake website set up by law enforcement (or criminals) designed to attract (or capture) people doing illegal (or stupid) things and thereby catch them in the act (or rob them blind). While the network will, in theory, make you anonymous, I wouldn't trust it myself. Law enforcement (or nefarious governments) may still have ways to track you down through the network.
However, I'm sure it would require serious sophistication, resources and connections to break the anonymity of the dark web. Perhaps the main vulnerability is installing questionable software downloaded from a dark web site. Such software could hide all sorts of Trojan horses, and once it runs on your computer, your computer could be compromised. But you have this same risk on the surface web, of course. Viruses, keystroke loggers and back doors get embedded in games and other programs you might download on the web. This is why you should always be careful about which websites you trust and should always use some sort of antivirus software on your computer.
If my description of the dark web has made any sense, then you'll see that you can't accidentally end up on the dark web in the same way you can't accidentally end up in the Burnside neighborhood of Chicago.
In the end, there's nothing all that special about surface web vs. deep web vs. dark web. It's all just a bunch of interconnected computers with different software programs running on them that do stuff. That stuff varies and some of it is designed to be secure, hidden or anonymous. Meanwhile, just as with going out shopping, some places are safer than others.