Using Python for Darknet Market Data Extraction

Crawlers written in Python are helping data scientists extract data from darknet marketplaces and build a clearer picture of how these spaces operate.

Gone are the days when the black market was confined to dimly lit back alleys or shabby, dilapidated buildings. Today, sophisticated international traders of illicit goods conduct their business on darknet markets.

Also referred to as cryptomarkets, these online black markets are used for trading illegal items ranging from drugs to weaponry to child pornography. At times, they are also used to provide hitmen and other similar services.

To maximise anonymity, these sites rely on onion routing through the Tor network, and visitors typically reach them with the Tor Browser. Moreover, financial transactions are carried out in decentralised cryptocurrencies, which remove the need for traceable intermediaries.

Because darknet traders employ these security measures and keep refining them, it’s very difficult to study or trace these forums and their members. Consequently, researchers have developed data extraction techniques in languages like Python to analyse how the darknet markets function.

The primary aim of this article is to look at how data can be extracted from darknet markets using crawlers written in Python. Before moving on to that, I would like to briefly outline how the darknet markets developed.

Darknet Markets – A Brief History

An understanding of how darknet markets developed is, I believe, essential to grasping the significance of data extraction in this regard.

The rise in the popularity of the darknet markets may well be seen as a backfire effect of the FBI’s shutdown of “Silk Road” in 2013. As expected, the FBI operation and the subsequent court proceedings against its operator, Ross Ulbricht, drew immense media attention.

Silk Road, launched in 2011, was a pioneer in this field and is widely regarded as the first modern darknet market. In fact, before the shutdown, many people did not know that illegal trading on the darknet was even possible.

Eventually, the news spread like wildfire, and once new darknet markets started popping up, there was no stopping them.

At times, some of these marketplaces have been taken down by law enforcement agencies. At other times, the operators have performed “exit scams”, shutting down their sites and stealing the cryptocurrency held in the sites’ virtual merchant wallets.

Yet, despite the rise and fall of individual sites, the popularity of the darknet markets has risen steadily.

In this context, it is imperative that data is extracted from these forums. Analysing this data is expected to help researchers and law enforcement agencies curb illegal darknet trading. It will also allow us to develop ways to mitigate the national and international threats posed by those active in the darknet markets.

‘Python’ in Data Extraction

Because these sites are hosted as onion services on the Tor network, they are not indexed by Google or other conventional search engines and cannot be opened in an ordinary browser. Rather than a registered domain name, each site is reached through a .onion address that conceals the IP address of its host.

To overcome these difficulties, researchers had to develop crawlers that could mine data from these forums. One of the most effective crawlers was written in Python by Toms Reksna of Leiden University in the Netherlands.

Alongside the shift of high-end criminal circles to darknet markets, there has also been a shift in their overall structure. The pyramid structure of earlier criminal organizations has given way to a more fragmented one. Such decentralization has led to the emergence of numerous nodal points, which often operate independently and interact with each other as the need arises.

Consequently, data extraction techniques have been designed to focus on user-to-user interactions on darknet platforms. The extracted data can then be represented and analysed as a graph. This allows patterns to be studied and enables the identification of core communities and key members of the darknet markets.
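To make the idea of "key members" and "core communities" concrete, here is a minimal sketch using only the Python standard library. It is not taken from Reksna's work: the user names are invented, degree (the number of distinct contacts) is just one possible measure of importance, and connected components are a crude stand-in for a proper community detection method.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Return an undirected adjacency map from (user_a, user_b) pairs."""
    adjacency = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore users replying to themselves
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)


def key_members(adjacency, top_n=3):
    """Rank users by degree, highest first; ties are broken by name."""
    ranked = sorted(adjacency, key=lambda user: (-len(adjacency[user]), user))
    return ranked[:top_n]


def communities(adjacency):
    """Group users into connected components with a depth-first search."""
    seen, groups = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            user = stack.pop()
            if user in group:
                continue
            group.add(user)
            stack.extend(adjacency[user] - group)
        seen |= group
        groups.append(group)
    return groups


# Hypothetical, made-up interaction pairs used purely for illustration.
edges = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("alice", "dave"),
    ("bob", "carol"),
    ("erin", "frank"),
]
adjacency = build_adjacency(edges)
top_two = key_members(adjacency, top_n=2)
groups = [sorted(group) for group in communities(adjacency)]
print(top_two)
print(groups)
```

In this toy data, "alice" ranks first because she talks to the most distinct users, and the users split into two groups that never interact with each other. On real data a researcher would more likely reach for a graph library and richer measures such as betweenness centrality.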

The Method of Data Extraction

Written in Python, the crawler analyses darknet market platforms page by page. On each page, the program identifies users, collects their activity data and scans for communications between them.
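The per-page step described above can be sketched roughly as follows. This is an illustration rather than Reksna's crawler: it works on HTML that has already been saved as text, and the markup it expects (a div with class "post" carrying data-author and data-reply-to attributes) is a hypothetical layout invented for the example. Real forums each need their own parsing rules.

```python
from collections import Counter
from html.parser import HTMLParser


class PostParser(HTMLParser):
    """Collect authors and reply relationships from a hypothetical forum layout."""

    def __init__(self):
        super().__init__()
        self.activity = Counter()   # number of posts per user
        self.interactions = []      # (author, replied_to) pairs

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if tag != "div" or "post" not in classes:
            return  # not a post container in our assumed markup
        author = attributes.get("data-author")
        if not author:
            return
        self.activity[author] += 1
        target = attributes.get("data-reply-to")
        if target:
            self.interactions.append((author, target))


def parse_pages(pages):
    """Process saved pages one by one and accumulate the results."""
    parser = PostParser()
    for html_text in pages:
        parser.feed(html_text)
    parser.close()
    return parser.activity, parser.interactions


# Two made-up pages standing in for saved forum threads.
SAMPLE_PAGES = [
    '<div class="post" data-author="alice">First post</div>'
    '<div class="post" data-author="bob" data-reply-to="alice">A reply</div>',
    '<div class="post" data-author="alice" data-reply-to="bob">Another reply</div>'
    '<div class="sidebar">Not a post</div>',
]

activity, interactions = parse_pages(SAMPLE_PAGES)
print(activity)
print(interactions)
```

The activity counter gives a simple per-user activity measure, while the list of pairs records who communicated with whom.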

The data extracted by the crawler is then represented as a graph in which the nodes represent users. Every time a communication occurs between two users, an edge is drawn between their nodes.
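A small sketch of this graph construction, under a few assumptions of my own: interactions arrive as (sender, receiver) pairs, the graph is treated as undirected, and repeated communications between the same two users are stored as a weight on a single edge rather than as parallel edges. The function name and the sample records are hypothetical.

```python
from collections import Counter


def build_graph(interactions):
    """Return (nodes, edge_weights) for an undirected interaction graph."""
    nodes = set()
    edge_weights = Counter()
    for sender, receiver in interactions:
        nodes.update((sender, receiver))
        if sender != receiver:
            # Sort the pair so (a, b) and (b, a) count as the same edge.
            edge_weights[tuple(sorted((sender, receiver)))] += 1
    return nodes, edge_weights


# Made-up interaction records for illustration only.
sample = [
    ("bob", "alice"),
    ("alice", "bob"),
    ("carol", "alice"),
    ("dave", "dave"),
]
nodes, edge_weights = build_graph(sample)
print(len(nodes), "nodes;", len(edge_weights), "distinct edges")
```

Counting the nodes of a graph built this way is what yields a figure for the number of active users.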

According to data extracted by Reksna’s crawler in 2017, the total number of nodes (active users) in the Silk Road graph was as high as 21,485. It is worth noting that Silk Road existed at a time when darknet markets weren’t popular at all.


Darknet markets play a significant role in the overall functioning of the dark web. Their steady rise in popularity and their association with activities like the trade in weapons and drugs have given rise to serious national and international threats. Meanwhile, cryptocurrencies are becoming more popular by the day, further fuelling the growth of the darknet markets.

That said, it’s imperative to find ways to penetrate the otherwise secret world of the darknet markets so as to develop ways of mitigating the issues raised by them. For these purposes, researchers have developed data extraction programs, designed in Python, which help in data mining from the darknet platforms. It is expected that the data thus retrieved and the analysis of the same will allow us to find effective ways of dealing with the menace of the darknet markets.

Nidhi Arora

Contributing Editor at Wimoxez. My passion and hobby lie within this very beautiful art. Yes! It’s surely an art. An art of moulding your words into magic.
