Tutela - Anonymity Detection Tool - Final Update

As a reminder, Tutela is an Ethereum and Tornado Cash anonymity detection tool, built as a response to a Tornado Cash Community Grant. It allows Ethereum and Tornado Cash users to see when they potentially revealed themselves on-chain by:

  1. Linking their distinct Ethereum addresses
  2. Linking their Tornado Cash deposits and withdrawals

Since our last update, we’ve added three new features:

  1. Diff2Vec, a Machine Learning approach to clustering Ethereum addresses to find more potentially connected Ethereum addresses.
  2. A history of potentially revealing transactions for each address, so users can see when they may have compromised their privacy.
  3. A live data feed updating with new Ethereum and Tornado Cash transactions.

For the nitty-gritty details, you can check out the Tutela white paper.

Diff2Vec, a Machine Learning Algorithm

Initial users gave us feedback that often no Ethereum addresses were associated with their input address. This makes sense because we were only using the deposit address reuse (DAR) heuristic to link Ethereum addresses. DAR searches for very specific behavior (e.g. interacting with centralized exchanges), meaning that most Ethereum addresses show no results: it found roughly 2.5m Ethereum clusters from more than 180m Ethereum addresses.

To supplement DAR, we implemented Diff2Vec. It projects each Ethereum address to a point in a low-dimensional vector space based on who it transacts with. In this vector space, addresses belonging to the same entity should be close together in Euclidean distance. This allows Tutela to show the k nearest results for every input address. The potential downside is that it is highly unlikely that all k results are truly connected to the input address, so there is a tradeoff between quality and quantity.
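To make the nearest-neighbour lookup concrete, here is a minimal sketch of returning the k closest addresses in Euclidean distance. It is not Tutela's code: the function name `k_nearest_addresses`, the toy 2-D embeddings and the placeholder addresses are all assumptions made for illustration.

```python
import numpy as np

def k_nearest_addresses(embeddings, addresses, query, k=3):
    """Return the k addresses closest to `query` in Euclidean distance."""
    idx = addresses.index(query)
    # Distance from the query vector to every embedded address.
    dists = np.linalg.norm(embeddings - embeddings[idx], axis=1)
    # Exclude the query itself from its own result list.
    dists[idx] = np.inf
    nearest = np.argsort(dists)[:k]
    return [(addresses[i], float(dists[i])) for i in nearest]

# Toy embeddings (hypothetical data, one row per address).
addresses = ["0xA", "0xB", "0xC", "0xD"]
embeddings = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [0.0, 0.3]])
neighbours = k_nearest_addresses(embeddings, addresses, "0xA", k=2)
```

Note that a lookup like this always returns k results, even when the nearest points are far away (ask for k=3 here and "0xC" is returned despite being distant), which is the quality versus quantity tradeoff described above.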

Transaction Reveal Data

We’ve recently added a new piece of functionality - a history of potentially revealing transactions for each address. This shows when a user potentially made revealing transactions and the type of reveal. More on the different Ethereum reveals here and Tornado Cash reveals here.

After clicking the transactions tab on the landing page, a user can input an Ethereum address to see historical potentially revealing transactions over time.

Figure 1: Tutela transaction page when searching an Ethereum address. This address is also a Tornado Cash user.

Figure 1 shows the potential reveals by this address. On the left-hand side, the user is given raw statistics about potentially revealing behavior. On the right-hand side, the user is shown when the annotated reveals occurred.

Figure 2: History of Potentially Revealing Transactions

By scrolling down (Figure 2), the user can see the transaction hashes associated with these reveals and look them up on Etherscan.

Analysis - How Accurate is Tutela?

Ethereum Clustering

We can measure the quality of address clusters using a “test set” of known clustered addresses. It is hard to find “ground truth” data in this space, but we obtained a data set from Beres et al. (2021) in which 1,028 clusters of addresses (average size of 4.0 ± 3.6 Externally Owned Addresses (EOAs) per cluster) are derived from ENS names. Admittedly, this is not an unbiased test set, but it suffices as a reasonable measure of how well Tutela generalizes.

Figure 3: Plot of the recall of held-out address clusters through ENS reveals using deposit address reuse (DAR), diff2vec (NODE), and the combination of both (BOTH). A higher recall represents a better heuristic.

We are interested in “recall”, i.e. the ability of a heuristic or algorithm to recover the clusters in the test set. We deliberately do not consider precision, as it does not make sense to penalize a model for finding clusters outside of those in the test set.
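As a rough illustration of recall without precision, the sketch below scores a heuristic against held-out clusters. The exact definition used in the white paper may differ; this version assumes recall is averaged per query address, and `cluster_recall`, `predict` and the toy data are hypothetical names introduced for the example.

```python
def cluster_recall(test_clusters, predict):
    """Average fraction of held-out cluster members recovered per query.

    test_clusters: list of sets of addresses known to belong together.
    predict: function mapping an address to the set of addresses the
             heuristic links to it.
    """
    scores = []
    for cluster in test_clusters:
        for query in cluster:
            expected = cluster - {query}
            if not expected:
                continue
            # Only members of the held-out cluster count; extra
            # predictions are ignored rather than penalized.
            found = predict(query) & expected
            scores.append(len(found) / len(expected))
    return sum(scores) / len(scores) if scores else 0.0

# Toy data (hypothetical): two known clusters and a heuristic's links.
test_clusters = [{"a", "b", "c"}, {"d", "e"}]
links = {"a": {"b"}, "b": {"a"}, "d": {"e", "x"}, "e": {"d"}}
recall = cluster_recall(test_clusters, lambda addr: links.get(addr, set()))
```

In the toy data, the prediction "x" for address "d" lies outside the test set and does not lower the score, which mirrors the reasoning for ignoring precision.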

DAR alone has a recall of 39.4%, while NODE (Diff2Vec) alone reaches 37.8%, which is 1.6 percentage points lower than DAR. However, as shown above, when DAR and NODE are combined, total recall increases to 44.8%. This suggests that DAR and NODE find different, complementary types of clusters.

Tornado Cash Anonymity Set Auditor

In October 2021 there were 97.3k Tornado Cash equal user deposits. Tutela found that 42.8k were potentially compromised: 18.6k from the address match reveal, 102 from the unique gas price reveal, 18.9k from the linked ETH address reveal, 16.2k from the multi-denomination reveal, and 358 from the TORN mining reveal (with overlap between reveals).
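The per-reveal counts above sum to roughly 54.2k, more than the 42.8k total, because one deposit can be flagged by several reveals. A small sketch of why the total is a set union rather than a sum follows; the deposit IDs and dictionary layout are hypothetical, and only the final ratio uses the reported figures.

```python
# Hypothetical deposits flagged by each reveal; "dep2" is flagged twice.
reveals = {
    "address_match": {"dep1", "dep2"},
    "unique_gas_price": {"dep2"},
    "linked_address": {"dep3"},
}

# A deposit is compromised if any reveal flags it, so take the union.
compromised = set().union(*reveals.values())
summed = sum(len(flagged) for flagged in reveals.values())

total_deposits = 5  # toy pool size
share = len(compromised) / total_deposits

# The same ratio with the reported October 2021 figures (in thousands).
reported_share = 42.8 / 97.3
```

With the reported figures, about 44% of deposits were potentially compromised by at least one reveal.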

Figure 4: Plot of the percentage of compromised versus uncompromised (pink) deposits by pool.

By pool, we find the anonymity set is reduced by 37% (± 15%) on average. Figure 4 shows the uncompromised anonymity sets by pool. We find that some of the pools could be heavily compromised (such as the cDAI and WBTC pools), whereas other pools are less affected (e.g. USDC). In summary, while many of the Tornado Cash heuristics are simple, they are quite powerful. These findings could help Tornado Cash developers and users alike measure and understand the degree of privacy offered to users.

Limitations

Heuristics are not perfect measures. In return for simplicity, there will always be false positives in practice, e.g., addresses in a cluster that should not be there, or faithful Tornado Cash transactions labeled as compromised.

For the Ethereum heuristics, picking proper hyperparameters is the challenge. In DAR, the algorithm is very sensitive to the choice of two thresholds. If the thresholds are too small, no clusters will be found; if the thresholds are too big, clusters will be low quality, containing many addresses that should not be there. Currently, the best practice is to tune these by hand.
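To show how sensitive this kind of heuristic is to its thresholds, here is a simplified sketch of deposit address reuse. It is not Tutela's implementation: the text above only says there are two thresholds, so the maximum amount difference and maximum block gap used here, along with every name and the toy transfers, are assumptions.

```python
from collections import defaultdict

def find_deposit_links(transfers, exchanges, max_amount_diff, max_block_gap):
    """Map candidate deposit address -> set of user addresses that funded it.

    transfers: list of (sender, receiver, amount, block) tuples.
    exchanges: set of known exchange addresses.
    """
    # Transfers into known exchange addresses are candidate forwards.
    forwards = [t for t in transfers if t[1] in exchanges]
    links = defaultdict(set)
    for sender, receiver, amount, block in transfers:
        if receiver in exchanges or sender in exchanges:
            continue
        for f_sender, _, f_amount, f_block in forwards:
            same_hop = f_sender == receiver
            # The forward may be slightly smaller (fees) and slightly later.
            amount_ok = 0 <= amount - f_amount <= max_amount_diff
            timing_ok = 0 <= f_block - block <= max_block_gap
            if same_hop and amount_ok and timing_ok:
                links[receiver].add(sender)
    return dict(links)

# Toy transfers (hypothetical), amounts in integer units.
transfers = [
    ("user1", "dep", 100, 100), ("dep", "exch", 99, 105),
    ("user2", "dep", 200, 200), ("dep", "exch", 199, 204),
    ("user3", "other", 100, 300),
]
loose = find_deposit_links(transfers, {"exch"}, max_amount_diff=2, max_block_gap=10)
strict = find_deposit_links(transfers, {"exch"}, max_amount_diff=0, max_block_gap=0)
```

With the looser thresholds, `user1` and `user2` are linked through the shared deposit address `dep`; with both thresholds at zero, nothing is found, which is the "too small" failure mode described above.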

Similarly, in NODE (Diff2Vec), the choice of hyperparameters impacts performance. In particular, we must choose a “neighborhood” size to look at when summarizing the behavior of a single node/wallet. Too small and we may lose sight of the bigger picture, but too large and we may lose granularity on this wallet’s actions.

Running DAR or NODE on Ethereum is computationally expensive, requiring large amounts of both RAM and storage. In particular, NODE is very costly and difficult to parallelize, so we are unable to update it live given its resource requirements. As of now, the NODE algorithm is trained on data up to October 2021 and will remain static.

On the other hand, the Tornado Cash heuristics are much simpler than the Ethereum ones, and more deterministic. However, given that only a small subset of Ethereum addresses are Tornado Cash users, they have limited applicability to the majority of potential Tutela users.

What Next?

This is it for Tutela! What a journey it has been! We’ve learned a lot and had the opportunity to work with amazing people. Shoutout to the LambdaClass team for their critical contributions to the development of the Tornado Cash heuristics and the Tornado Cash community for making this possible.

Mike, Kaili and I will continue to maintain Tutela for as long as the Tornado Cash Community deems appropriate. Now we’re turning our attention to building the next thing! If you want to connect with us, feel free to hit us up on Twitter! @willmctighe @mike_h_wu @kaili_jenner

References

Ferenc Beres, Istvan A Seres, Andras A Benczur, and Mikerah Quintyne-Collins. 2021. Blockchain is watching you: Profiling and deanonymizing ethereum users. In 2021 IEEE International Conference on Decentralized Applications and Infrastructures (DAPPS), pages 69–78. IEEE.


First of all, I’d like to thank all contributors involved for taking on this challenge and for exceeding the expectations of the initial proposal, let alone accomplishing such a task in a short time frame. Over the past few days, I have been reviewing the codebase and the theory behind Tutela. This post is meant to give you a rough outline of my thoughts.

Synopsis

Tutela builds upon previous research into the stochastic deanonymisation of users through clustering Ethereum addresses, profiling transactional patterns to centralised exchanges and Ethereum Name Service (ENS) domains, notated as deposit address reuse (DAR). That research was co-authored by one of the contributors, Andrea. It was applied by abstracting upon the clustering algorithm etherclust and the 2-dimensional vectorisation library diff2Vec, allowing addresses to be plotted in a directed graph for computation.

The formal whitepaper presented is of an academic standard; no stone is left unturned in terms of formulaic expressions and theoretical reasoning. It even goes so far as to simplify the logic behind diff2Vec in comparison to its source material, in addition to defining an anonymity metric and an accuracy multiplier denoted as “confidence”.

The collection of clustered addresses is filtered to ignore a “blacklist” of smart contracts, exchanges, service providers, and relayers, in an attempt to only analyse externally owned accounts (EOAs). This provides a more accurate data source against which to query an individual address’s associations and benchmark its anonymity. While Tutela is optimised via external clustering data from ENS and exchanges, it does not depend on it, as the code to index, categorise and cluster data could be adapted to other data sources with tuning. Furthermore, it has functions to index the Google BigQuery Ethereum data sets in order to target and cluster Tornado deposits and withdrawals through the introduced heuristics.

Tutela by design is not a tool of validity but rather a gauge of one’s privacy through the probability of being associated with another address. To avoid excessive granularity and keep the span of data within realistic computational margins, heuristics are optimised by defining thresholds:

  • Multi-denomination reveal only concerns addresses that have deposited to or withdrawn from multiple anonymity sets 5 or more times within 24 hours of the latest transaction of that address.
  • Linked addresses reveal only concerns addresses that have interacted outside Tornado 3 or more times, but ignores pairs that have deposited to different anonymity sets.
  • DAR only benchmarks nodes with two or more connections to other clusters.

In addition, Tutela has live cronjobs that automatically update cluster sources weekly, relative to real-time Tornado transactions. The Python implementations of this, of DAR, and of the associated heuristics are clear, concise, and well documented through comments for function primers and specific steps in each process. The repository structure is formal, with each module in its own directory. While some code is repeated in places, this fosters accessibility rather than efficiency given the scope of the stack.

To complement all of this, a full-fledged web application, tutela.xyz, was designed and developed, allowing anyone to query an Ethereum address to benchmark its anonymity relative to Tornado and centralised exchange usage. Its design caters to Tornado’s brand and offers an easy user experience, whether on desktop or mobile. The application is built in React.js, and components are designed with Bootstrap. Like the Python code, it has clear comments, syntax, and structure.

Comments

  • DAR assumes a single deposit address for each user (as per old customs), when in fact many popular service providers today (e.g. Coinbase) instead generate multiple deposit addresses before forwarding to a hot wallet. Therefore it can be argued that this heuristic is not as effective at profiling clusters for the larger market share of exchange usage.
  • Is DAR up to date with the current registry of ENS domain ownership, or is it still based on data from the source implementation?
  • Why are relayer transactions ignored in the gas price heuristic? Depositors still set a preference for gas when withdrawing through a relayer.