People should really read VG's special, which covers things like:

* How VG found the IP of the Tor Hidden Server, and discovered that it was being run by Australian police.

* Why the Australian police got to run the site despite having no direct connection to it: their laws allow police to act in ways that would be criminal elsewhere (basically digital black sites).

* How the forum had a policy that all posts from the admin had to include rape images, as an attempt to prevent police from honeypotting the site.

* The ethics of the police running a CP forum, including police posting CP themselves.

https://www.vg.no/spesial/2017/undercover-darkweb/

>He found one weakness: By asking the server the right question, it would reveal its own IP address.

>The question was asked and the server replied. It was located in Sydney, hosted by the provider Digital Pacific.

What the hell does that even mean? It almost sounds like they were doing packet sniffing and looking at the source IP addresses of incoming packets. Any IP that isn't a Tor exit node could be the server's real IP. A similar strategy was used against Silk Road.
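To illustrate that filtering idea (my own sketch, not anything from the article): collect the source IPs of incoming packets, subtract the known Tor exit addresses, and whatever remains is a candidate for the server's real IP. The addresses below are made-up documentation IPs; a real run would use the Tor Project's published exit-address list.

```python
# Hypothetical sketch: discard known Tor exit IPs from observed traffic.
def non_exit_candidates(observed_ips, tor_exit_ips):
    """Return observed source IPs that are not known Tor exit nodes."""
    return sorted(set(observed_ips) - set(tor_exit_ips))

observed = ["198.51.100.7", "203.0.113.9", "198.51.100.7", "192.0.2.44"]
exits = {"198.51.100.7", "192.0.2.44"}  # would come from the Tor exit list
print(non_exit_candidates(observed, exits))  # → ['203.0.113.9']
```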

They have a more technical description of the process further down in the article, in the grey box:

"IP addresses and physical server locations are inherently difficult to find on the Tor network. So how did VG’s computer expert get the forum to disclose this information?

1. Profile picture upload

The forum allowed users to upload a profile picture. This picture could also be fetched from a user-supplied URL.

2. The leak

This is where the information leak occurs. Configured for optimal security, the forum’s software and/or server would fetch the remote profile picture via Tor. Childs Play did not – all traffic to external sites originated from the server’s real IP.

3. The IP address is exposed

By telling the forum to fetch a picture from a server Stangvik controlled, he could see in his server logs that the originating IP was with a hosting provider in Sydney – Digital Pacific. Stangvik went on to confirm that outgoing DNS requests originated from the same provider, and that the forum’s software also loaded images included in forum post previews from the same IP.
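(An aside from me, not part of the article: the attacker's side of steps 1–3 needs nothing more than a web server that logs who connects to it. A minimal Python sketch, with hypothetical paths and payload:)

```python
# A minimal version of the attacker-controlled "image host": log the
# peer address of every incoming request.
from http.server import BaseHTTPRequestHandler, HTTPServer

class LoggingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # client_address[0] is the IP the connection actually came from.
        # If the forum fetched the profile picture directly (not via
        # Tor), this is the forum server's real IP.
        print(f"request for {self.path} from {self.client_address[0]}")
        self.send_response(200)
        self.send_header("Content-Type", "image/png")
        self.end_headers()
        self.wfile.write(b"\x89PNG...")  # dummy image payload

# To run it for real:
# HTTPServer(("0.0.0.0", 8080), LoggingHandler).serve_forever()
```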

4. A proxy, VPN or Tor Exit?

The next question was whether the IP belonged to a Tor Exit Node, a VPN or a proxy server. An IP can hide just about anything. How could he confirm that this was the forum’s location, rather than just a node in a chain of redirects? Stangvik applied three improvised techniques:

5. Timing between the servers

He rented a virtual server with Digital Pacific – the same place where the suspected IP was located. He then updated the profile picture URL to point to this server. Upon receiving an incoming profile picture request, Stangvik’s server would respond with a redirect to another URL on the same virtual server. By repeating this redirection process several times, Stangvik was able to isolate and measure the round-trip time between the two servers. The measurements yielded very low times, consistent with a forum server in close vicinity of his rented server.
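(My illustration of step 5, not from the article: each time the forum follows a redirect back to the attacker's server, the gap between consecutive request arrivals approximates one forum-to-attacker round trip. The timestamps below are invented.)

```python
# Toy sketch: round-trip estimates from redirect arrival timestamps.
def round_trip_estimates(arrival_times):
    """Gaps between consecutive redirect arrivals, in seconds."""
    return [b - a for a, b in zip(arrival_times, arrival_times[1:])]

# Illustrative arrival timestamps for a five-step redirect chain:
gaps = round_trip_estimates([0.000, 0.004, 0.007, 0.011, 0.015])
print(min(gaps))  # a few milliseconds suggests the servers are close
```

Taking the minimum over many repetitions filters out queueing jitter, which is why repeating the redirect chain matters.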

6. Measuring intermediate nodes

Stangvik also paid attention to the so-called «Time To Live» values on the incoming data packets. These provide some insight into how many intermediate parties are involved between the sender and the recipient. In this case, the values indicated that there was at most one intermediary – a typical result if the servers were located in the same room.
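(Again my sketch, not the article's: operating systems typically send packets with an initial TTL of 64, 128, or 255, and each router decrements it by one, so the received TTL gives a rough hop count.)

```python
# Rough version of the TTL reasoning in step 6.
def estimate_hops(observed_ttl, initial_ttls=(64, 128, 255)):
    """Guess the hop count from a received packet's TTL, assuming the
    sender started from one of the common initial values."""
    initial = min(t for t in initial_ttls if t >= observed_ttl)
    return initial - observed_ttl

print(estimate_hops(64))  # → 0, consistent with "same room"
print(estimate_hops(63))  # → 1, one intermediary
```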

7. Measuring packet size

The final test started to get advanced: Measuring MTU (Maximum Transmission Unit) and packet fragmentation. Each packet in a computer network has a maximum transmission size, based on which intermediates it passes through. Each encapsulating technology, such as VPNs, can result in the total packet size increasing beyond the maximum size, and local networks usually have larger maximum sizes than the “tubes” found on the internet. If the maximum size is surpassed, the packet will be broken into multiple fragments.

By crafting long profile picture URLs and setting specific packet flags in the redirects returned by his custom web server software, he could see that the MTU was consistent with that of high-speed local area network traffic, and he could also rule out VPN configurations."
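The MTU logic in step 7 boils down to arithmetic: every tunnel type adds headers, so a path through a VPN cannot carry a full 1500-byte Ethernet frame unfragmented. A sketch with illustrative overhead figures (rough, commonly cited values; the real overhead depends on the exact tunnel configuration):

```python
# Illustrative numbers for the MTU reasoning; not from the article.
ETHERNET_MTU = 1500
TUNNEL_OVERHEAD = {"wireguard": 60, "openvpn_udp": 69, "ipsec_esp": 73}

def effective_mtu(tunnel):
    """Payload MTU left after a tunnel adds its headers."""
    return ETHERNET_MTU - TUNNEL_OVERHEAD[tunnel]

def vpn_unlikely(observed_mtu):
    """A full 1500-byte path MTU leaves no room for tunnel headers."""
    return observed_mtu >= ETHERNET_MTU

print(effective_mtu("wireguard"))  # → 1440
print(vpn_unlikely(1500))          # → True
```

So observing unfragmented 1500-byte packets (or larger, on a jumbo-frame LAN) is evidence against any encapsulating hop on the path.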

This is why any server you want hidden should be behind something like Whonix, where no process running on the server can learn the real IP.
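The core of that gateway idea can be sketched in a few firewall rules (a heavy simplification — Whonix's real ruleset is far more thorough; the port numbers assume Tor's TransPort on 9040 and DNSPort on 5353, with the hidden server behind an internal interface eth1):

```shell
# Illustrative only, not Whonix's actual rules: the server behind eth1
# has no direct route out; its traffic is forced through Tor or dropped.
iptables -t nat -A PREROUTING -i eth1 -p udp --dport 53 -j REDIRECT --to-ports 5353
iptables -t nat -A PREROUTING -i eth1 -p tcp --syn -j REDIRECT --to-ports 9040
iptables -A FORWARD -i eth1 -j DROP   # no forwarding around Tor
```

With this split, even a fully compromised server process can only ever see the gateway's internal address, so the profile-picture trick above would have leaked nothing.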

Damn. If only those child pornographers knew this tip.

If I were going to do something so very illegal, I'd research the hell out of it. If I were going to run something like this, I'd pretty much want to be an expert on the architecture and software stack.

I'd be asking hundreds of questions, taking courses, and using as many layers as would be reasonable to hide even my efforts at learning. I'd probably be obsessed with security, more so than I am now. Much more so, in fact.

And then you'd likely be in a select group of people who could be investigated individually. OPSEC is hard in most circumstances, but it's very hard if you're trying to be an expert on one topic in a short period of time.

Yeah, I'd even have to make a point to hide my learning. Only a specific subset of people would be asking how best to allow uploads while ensuring the IP address was masked via Tor. Added with other questions, it'd put me into a pretty narrow group, so even gathering information would need to be masked.

Fortunately, I don't actually want to do anything illegal. That will make it easier. I do kind of want to learn how to set up a hidden service, but just to satisfy my curiosity.

The original Silk Road fell because of an OPSEC failure in a post on Stack Overflow...

A few weeks ago, prompted by an HN post, I considered writing about the possibilities of ML applied to large aggregate datasets, with criminal investigation as the motivation.

With all the public posts, writing-style analytics, and the use of a common moniker across services, it seems it may be possible to do just that on a large scale. It could become almost trivial to narrow down lists of suspects by crunching large data sets containing things like SO questions, AC posts on Slashdot, or responses on HN.

After all, how many people are actively seeking to secure a message board as a hidden service and doing so at that time? I sort of envision it as having some commonality with the timing attacks already in use to deanonymize Tor users.

Subject A asked about securing IP addresses for uploads, and Service A got this feature two weeks later. Subject C asked about this security aspect, and Service A has that concern. Subject Q asked about using this forum software and requested this modification, and Service A uses that software, etc...

So, maybe Subjects A, C, and Q are all the same person.

While it doesn't prove much, it does potentially aid in narrowing down the list of suspects. Coupled with other bits of information, it may narrow the list down significantly.
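The A/C/Q idea above can be sketched as a toy scoring function: count how often a subject's questions match a service's changes in topic and shortly precede them. All names, topics, and dates here are invented for illustration.

```python
# Toy sketch of cross-service correlation; all data is made up.
from datetime import date, timedelta

questions = {  # subject -> list of (topic, date asked)
    "A": [("tor upload ip masking", date(2016, 3, 1))],
    "C": [("forum software X mod", date(2016, 5, 2))],
    "Z": [("css centering", date(2016, 3, 1))],
}
service_changes = {  # service -> list of (topic, date deployed)
    "svcA": [("tor upload ip masking", date(2016, 3, 14)),
             ("forum software X mod", date(2016, 5, 20))],
}

def linkage_score(subject, service, window_days=30):
    """Count questions matching a service change in topic that
    precede the change within the window."""
    hits = 0
    for topic, asked in questions[subject]:
        for s_topic, deployed in service_changes[service]:
            delta = deployed - asked
            if topic == s_topic and timedelta(0) <= delta <= timedelta(days=window_days):
                hits += 1
    return hits

for subj in questions:
    print(subj, linkage_score(subj, "svcA"))
# Subjects A and C each score against svcA; Z does not.
```

A real system would obviously need fuzzy topic matching and statistical controls rather than exact string equality, but the shape of the inference is the same.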

That and there are huge sets of data out there. Processing that intelligently, and rapidly, could really change the way investigations are done.

That's what XKEYSCORE does, pretty much. The NSA has spent a lot of time working on these sorts of techniques.

No it doesn't - or at least there is no claim I've seen that this is part of XKEYSCORE. It's not mentioned on the Wikipedia page either[1].

However, this is an active area of research in both classified and (presumably) non-classified settings. See for example this search: https://scholar.google.com.au/scholar?q=related:KbJLbpaKfCkJ...

[1] https://en.wikipedia.org/wiki/XKeyscore

Right, the NSA has a bunch of anti-Tor tools that are usually called QUANTUM-something. However, correlating people across different networks is something that XKEYSCORE does. There are also the writing-deanonymisation tools you mention (but there's Anonymouth[1], which could help counter them).

My original point was that OPSEC is hard if you're trying to be a topic expert in a short period of time. You don't need NSA tools to attack someone in that situation.

[1]: https://github.com/psal/anonymouth