Transaction network analysis *


Wash trading is the deceptive practice of artificially inflating trading activity by conducting trades between parties that are controlled by the same entity or that collude with one another. In this scheme, the buyer and seller appear to be separate entities engaging in legitimate transactions, but in reality the trades are orchestrated by the same individual or group. The advent of decentralized markets and platforms has made wash trading harder to detect. Decentralized finance (DeFi) platforms operate without centralized intermediaries, and the openness of blockchains lets users generate an unlimited number of accounts, identifiable only by cryptographic addresses, with no reliable means of verifying whether multiple accounts are under single ownership.

There is evidence that wash trading is deeply rooted in the NFT market, affecting individual NFT listing prices as well as the market value of entire NFT collections (https://www.coindesk.com/learn/what-is-nft-wash-trading/). A promising approach for detecting NFT wash trades relies on linkability networks. A linkability network is a structure that reduces a larger network while keeping the transaction history of the parties of interest. The Ethereum Transaction Network (ETN) consists of the history of normal transactions of all externally owned accounts over the entirety of Ethereum's existence, and it can be represented as a directed graph without self-loops and without multiple edges. A linkability network is extracted from the ETN for the set of Ethereum addresses \(A\) that traded an NFT collection. For each address \(a \in A\), a graph traversal algorithm is run on the ETN, constructing the linkability network \(L=(V,E)\) such that an edge \((a,b) \in E\) is added if there is a path in the ETN from \(a \in A\) to \(b \in A\). The weight of an edge \((a,b) \in E\) is determined by the length of the path from \(a\) to \(b\) in the ETN. The traversal algorithm should operate up to a depth defined by the parameter \(d\); therefore, edge weights can be at most \(d\). The linkability network can be represented as a directed weighted graph with multiple edges and without self-loops.
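The extraction described above can be sketched as a depth-limited breadth-first search over an adjacency list. This is a minimal illustration, not a reference solution: the function names `build_adjacency` and `linkability_edges` and the toy addresses are hypothetical, the weight is assumed to equal the number of hops on the path, and only the shortest path per ordered pair is recorded, so this variant yields at most one edge per pair rather than a true multigraph.

```python
from collections import deque


def build_adjacency(transactions, excluded=frozenset()):
    """Build a directed adjacency list, dropping self-loops, duplicate
    edges, and any edge touching an excluded address (assumed input)."""
    adj = {}
    for src, dst in transactions:
        if src == dst or src in excluded or dst in excluded:
            continue
        adj.setdefault(src, set()).add(dst)
    return adj


def linkability_edges(adj, traders, d):
    """Run a BFS of depth at most d from every trader address and
    return (a, b, hops) triples for every other trader b that is reached."""
    traders = set(traders)
    edges = []
    for a in sorted(traders):
        seen = {a}
        frontier = deque([(a, 0)])
        while frontier:
            node, depth = frontier.popleft()
            if depth == d:  # depth limit reached, do not expand further
                continue
            for nxt in adj.get(node, ()):
                if nxt in seen:
                    continue
                seen.add(nxt)
                if nxt in traders:
                    edges.append((a, nxt, depth + 1))
                frontier.append((nxt, depth + 1))
    return edges


# Toy ETN: a -> v -> u -> b, a -> c, plus a self-loop that is discarded.
txs = [("a", "v"), ("v", "u"), ("u", "b"), ("a", "c"), ("c", "c")]
adj = build_adjacency(txs)
print(sorted(linkability_edges(adj, {"a", "b", "c"}, d=3)))
# [('a', 'b', 3), ('a', 'c', 1)]
```

With `d=2` the path from `a` to `b` is out of reach and only the edge `('a', 'c', 1)` remains. Recording every path instead of the shortest one would require a different traversal, such as a depth-limited DFS over simple paths.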

To summarize, an edge \((a,b)\) with \(weight=3\) in a linkability network shows that in the ETN there is a path of normal transactions of length 3 between nodes \(a\) and \(b\), e.g. (\(a \rightarrow v \rightarrow u \rightarrow b\)). A normal transaction, that is, a direct peer-to-peer transfer of the native blockchain currency, usually implies a relationship of trust or a shared agreement between the two accounts involved. In the absence of contractual stipulations or automated logic, as provided by smart contracts, both parties adhere to some shared agreement or understanding outside the purview of automated contract enforcement. This adherence to a mutual arrangement suggests a deeper trust relationship or shared control. As a result, there is a high probability that the two accounts participating in a normal transaction are owned or controlled by the same entity, or by entities with a strong mutual trust relationship.

Implementation guidelines

  1. Running the program:

    • The program can be run in different modes (sequential, parallel and distributed) by specifying a parameter.

    • The user can specify the parameter \(d\), the maximal depth of the ETN traversal.

    • The program measures the run time needed to compute the linkability network.

  2. Implementation details

    • Graph libraries are off-limits.

    • Use an adjacency list to represent the ETN and the linkability network.

    • The student will receive a chunk of the ETN, a list of centralized exchanges that must be removed from the ETN, and a dataset containing NFT transfers of ownership.

    • The program should output the linkability network as an edge list in a CSV file, and should print to standard output the count of links in the linkability network for each weight value.

    • The implementation must adapt automatically to the hardware it is being run on (physical CPUs, cores, memory, etc.).

    • Wherever randomness is used, a seed must be provided so that solutions can be compared.
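The guidelines above could be wired together roughly as follows. This is only a sketch under assumptions: the flag names (`--mode`, `--depth`, `--workers`, `--seed`, `--out`), the CSV column names, and the helper names are all hypothetical, and `stub` stands in for the actual linkability computation. Note that `os.cpu_count()` reports logical CPUs only; detecting physical cores or available memory needs other means that are not shown here.

```python
import argparse
import csv
import os
import random
import time
from collections import Counter


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Linkability network runner (sketch)")
    parser.add_argument("--mode", choices=["sequential", "parallel", "distributed"],
                        default="sequential")
    parser.add_argument("-d", "--depth", type=int, default=3,
                        help="maximal depth of the ETN traversal")
    # Default worker count adapts to the machine (logical CPUs).
    parser.add_argument("--workers", type=int, default=os.cpu_count() or 1)
    parser.add_argument("--seed", type=int, default=0)
    parser.add_argument("--out", default="linkability.csv")
    return parser.parse_args(argv)


def write_edge_list(edges, fh):
    """Write (source, target, weight) triples as CSV to an open text file."""
    writer = csv.writer(fh)
    writer.writerow(["source", "target", "weight"])
    writer.writerows(edges)


def weight_counts(edges):
    """Count links per weight value, ordered by weight."""
    return dict(sorted(Counter(w for _, _, w in edges).items()))


def run(argv, compute):
    args = parse_args(argv)
    random.seed(args.seed)  # fixed seed so that runs are comparable
    start = time.perf_counter()
    edges = compute(args)  # placeholder for the real traversal
    elapsed = time.perf_counter() - start
    with open(args.out, "w", newline="") as fh:
        write_edge_list(edges, fh)
    for weight, count in weight_counts(edges).items():
        print(f"weight {weight}: {count} links")
    print(f"mode={args.mode} workers={args.workers} time={elapsed:.3f}s")
    return edges


def stub(args):
    """Hypothetical stand-in that returns a fixed edge list."""
    return [("a", "c", 1), ("a", "b", 3), ("c", "b", 3)]


# Demo run; the CSV goes to os.devnull so that no file is created.
run(["--depth", "3", "--out", os.devnull], stub)
```

Timing only the `compute` call keeps file output out of the measurement; whether input loading should be included in the reported run time is a choice the guidelines leave open.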

Supporting material

The supporting data can be downloaded from HERE

The archive includes the following files: