Anonymity Research Tools
This is a proposal for the Funded Bounty for Anonymity Research Tools. The ultimate objective of the proposed research and tools is to help protect the privacy of Tornado Cash users.
Tools that protect anonymity must focus on the different kinds of reveals that adversaries exploit. To this end, our team proposes to research and identify common reveals using data science and data engineering techniques, in order to warn Tornado Cash users.
In the traditional financial system, bank accounts are linked to an identified natural person. This means that, given a bank account, one can directly reach its owner. However, banks keep the transactions on that account private.
In most blockchain systems, transaction history is completely public. This means that, given a wallet address, anyone can trace every transaction it has made, all the way back to the first one.
In principle, this is not a problem because no personal data is needed to create a wallet. Thus, having access to the transactions that an account has made does not compromise the user’s privacy.
Nevertheless, in some cases it can be easy to figure out who owns a wallet address. Once the owner is identified, all of their transactions can be easily traced, seriously compromising the privacy of their finances.
Tornado Cash is a technology conceived to handle this problem. In Tornado’s own words:
“Tornado Cash improves transaction privacy by breaking the on-chain link between source and destination addresses. It uses a smart contract that accepts ETH & other tokens deposits from one address and enables their withdrawal from a different address.”
But what happens if a user deposits from an account and withdraws using the same account? What if they withdraw using an account that has a long transaction history with the account that made the deposit? These are some of the most common reveals and privacy threats.
Today, Tornado users have no good tools to even approximately assess their anonymity and privacy gains. Such tools are essential for making informed decisions about which pools to deposit into, and when and how to withdraw their coins from those pools.
First, we need to precisely define the anonymity and privacy guarantees we want to achieve or enhance. Afterwards, our goal is to measure (or at least approximate) these privacy guarantees and display them to our users.
In the following, we assume that the adversary has access solely to on-chain data. We do not assume, and will not use, any off-chain data, even though a relayer or an Internet Service Provider (ISP) certainly has such data. For instance, a relayer could associate IP addresses with Tornado withdraw transactions. Since we do not have access to such data sources, we cannot incorporate these privacy leaks into our analysis. Therefore, the results of our tool serve only as an optimistic upper bound on the achieved privacy guarantees.
Withdrawal anonymity
Withdrawal anonymity is the most straightforward anonymity guarantee provided by a Tornado Cash smart contract. Ideally, a withdrawal transaction could have originated from any of the deposit transactions that occurred before it. We call this set of deposit transactions the anonymity set of the withdrawal transaction. However, several heuristics allow observers to exclude deposit transactions from the anonymity set because those deposits can be heuristically linked to other withdrawal transactions (see the heuristics considered below). After excluding all revealed deposit transactions, we can characterize this guarantee by the size of the remaining anonymity set or by its entropy. Note that links between withdrawal and deposit transactions may be probabilistic. Hence, we want to use the Shannon entropy of the anonymity set rather than just its size.
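To make the difference between size and entropy concrete, here is a minimal Python sketch of both metrics. It assumes the heuristics have already assigned each remaining candidate deposit a (possibly unnormalized) probability of having funded the withdrawal; the function name and input format are our own illustration, not a finished design.

```python
import math
from typing import Mapping


def anonymity_set_metrics(probs: Mapping[str, float]) -> tuple[int, float]:
    """Return (size, Shannon entropy in bits) of a withdrawal's anonymity set.

    `probs` maps each remaining candidate deposit (e.g. a tx hash) to the
    probability that it funded the withdrawal. Weights are normalized to
    sum to 1, and zero-probability deposits are dropped from the set.
    """
    total = sum(probs.values())
    if total <= 0:
        raise ValueError("anonymity set is empty")
    ps = [p / total for p in probs.values() if p > 0]
    # H = -sum(p_i * log2(p_i)); equals log2(size) only for a uniform set.
    entropy = -sum(p * math.log2(p) for p in ps)
    return len(ps), entropy
```

For example, eight equally likely deposits give 3 bits, while three deposits weighted 0.5, 0.25 and 0.25 give 1.5 bits, less than the roughly 1.58 bits of a uniform set of three. This is why probabilistic links are better captured by entropy than by a plain count.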
Withdrawal address unlinkability
Ideally, users withdraw their coins to fresh withdrawal addresses. Withdrawal address unlinkability ensures that, given two withdrawal addresses, one cannot determine whether they belong to the same entity. We do not consider the unlinkability of withdrawal transactions, since many users make multiple withdrawals to the same withdrawal address; in those cases, it is trivial to establish that the withdrawal transactions were issued by the same user. Therefore, we focus only on the unlinkability of withdrawal addresses. In the ideal case, the adversary cannot determine whether two withdrawal addresses belong to the same person with better accuracy than a random guess. We characterize this privacy guarantee by the adversary's advantage in breaking withdrawal address unlinkability, that is, how much better than random guessing the adversary can tell whether two withdrawal addresses belong to the same entity.
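A minimal sketch of how this advantage could be estimated empirically, assuming a labeled, balanced evaluation set of address pairs (equally many same-entity and different-entity pairs) for which ground truth is known. We use the common convention Adv = 2 · Pr[correct] − 1; the function name is hypothetical.

```python
from typing import Sequence


def linking_advantage(guesses: Sequence[bool], truth: Sequence[bool]) -> float:
    """Empirical advantage of an adversary over random guessing.

    Each position corresponds to one pair of withdrawal addresses:
    `truth[i]` is True if the pair really belongs to the same entity, and
    `guesses[i]` is the adversary's answer. With Adv = 2 * Pr[correct] - 1,
    random guessing on a balanced set scores about 0 and a perfect
    adversary scores 1.
    """
    if len(guesses) != len(truth) or not truth:
        raise ValueError("need equally long, non-empty sequences")
    correct = sum(g == t for g, t in zip(guesses, truth))
    return 2 * correct / len(truth) - 1
```

The balanced-set assumption matters: on an unbalanced set, an adversary that always guesses the majority label would already score above 0.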
Deposit address unlinkability
As in the previous case, the deposit addresses of the same user should also be unlinkable. We define this privacy goal analogously and measure it by the adversary's ability to determine whether a given pair of deposit addresses belongs to the same entity.
Heuristics to reduce users’ anonymity and privacy
Our goal is to develop and implement a set of heuristics (reveals) that help users detect when the pair of addresses they have chosen increases the odds that their privacy is compromised.
Possible heuristics to add to the list described in the Bounty summary include:
Heuristic 1 - Same deposit and withdraw address
If a deposit address matches a withdraw address, then it is trivial to link the two addresses. Therefore, the corresponding deposit transaction can be removed from the anonymity sets of all other withdraw transactions.
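A minimal Python sketch of Heuristic 1 and of the subsequent pruning step. It assumes transactions are given as (tx_hash, address) pairs; addresses are lowercased because Ethereum checksummed addresses differ only in letter case. The function names are hypothetical.

```python
from collections import defaultdict


def same_address_links(deposits, withdrawals):
    """Heuristic 1: link deposits and withdrawals that share an address.

    `deposits` and `withdrawals` are iterables of (tx_hash, address) pairs.
    Returns a dict mapping each linked deposit tx hash to the list of
    withdraw tx hashes sent to the same address.
    """
    withdrawals_by_addr = defaultdict(list)
    for tx_hash, addr in withdrawals:
        withdrawals_by_addr[addr.lower()].append(tx_hash)
    links = {}
    for tx_hash, addr in deposits:
        matches = withdrawals_by_addr.get(addr.lower())
        if matches:
            links[tx_hash] = list(matches)
    return links


def prune_anonymity_set(anonymity_set, withdraw_tx, links):
    """Drop deposits that are linked to some withdrawal other than `withdraw_tx`."""
    return {
        d for d in anonymity_set
        if d not in links or withdraw_tx in links[d]
    }
```

A deposit linked to the withdrawal under consideration stays in that withdrawal's set; it only disappears from the sets of the other withdrawals.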
Heuristic 2 - Unique gas prices
If a deposit and a withdraw transaction share an otherwise unique gas price (e.g., 3.1415926 Gwei), then we consider the deposit and the withdraw transactions linked. The corresponding deposit transaction can be removed from every other withdraw transaction's anonymity set.
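A minimal sketch of Heuristic 2, assuming gas prices are given in wei and that "unique" is measured within the supplied transactions (for example, all transactions of one pool). Both the uniqueness rule and the function name are our own assumptions.

```python
from collections import Counter


def unique_gas_price_links(deposits, withdrawals):
    """Heuristic 2: link a deposit and a withdrawal sharing a rare gas price.

    `deposits` and `withdrawals` are iterables of (tx_hash, gas_price_wei).
    A gas price counts as unique if it appears exactly once among the
    deposits and exactly once among the withdrawals considered.
    """
    deposits, withdrawals = list(deposits), list(withdrawals)
    dep_counts = Counter(price for _, price in deposits)
    wd_counts = Counter(price for _, price in withdrawals)
    wd_by_price = {price: tx for tx, price in withdrawals}
    return {
        tx: wd_by_price[price]
        for tx, price in deposits
        if dep_counts[price] == 1 and wd_counts[price] == 1
    }
```

On a small sample, even a common round price can appear unique, so in practice one would likely apply this only to a large enough window of transactions.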
Heuristic 3 - Transactions between deposit and withdraw addresses
If a transaction was issued from a deposit address to a withdraw address (or vice versa), then we consider the two addresses linked. The deposits made from that deposit address are removed from the anonymity sets of the other withdraw transactions.
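A minimal sketch of Heuristic 3, assuming the ordinary (non-Tornado) transactions of the relevant addresses are available as (from_addr, to_addr) pairs. The function name is hypothetical.

```python
def direct_transfer_links(transfers, deposit_addrs, withdraw_addrs):
    """Heuristic 3: link deposit and withdraw addresses that transacted directly.

    `transfers` is an iterable of (from_addr, to_addr) pairs from ordinary
    transactions. Returns a set of (deposit_addr, withdraw_addr) pairs,
    lowercased, that exchanged at least one transaction in either direction.
    """
    deposits = {a.lower() for a in deposit_addrs}
    withdraws = {a.lower() for a in withdraw_addrs}
    links = set()
    for src, dst in transfers:
        src, dst = src.lower(), dst.lower()
        if src in deposits and dst in withdraws:  # deposit -> withdraw
            links.add((src, dst))
        if dst in deposits and src in withdraws:  # withdraw -> deposit
            links.add((dst, src))
    return links
```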
Heuristic 4 - Linking multiple deposit addresses to multiple withdraw addresses
If a deposit address makes multiple (say, 12) deposits and later exactly 12 withdrawals go to a single withdraw address, then we can link all these deposit transactions to those withdraw transactions.
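A minimal sketch of Heuristic 4, assuming all transactions belong to the same pool (same denomination) and interpreting "later" conservatively as "every withdrawal happens after the last deposit". The `min_count` threshold, the uniqueness requirement and the function name are our own assumptions.

```python
from collections import defaultdict


def count_match_links(deposits, withdrawals, min_count=2):
    """Heuristic 4: match addresses with equal transaction counts.

    `deposits` and `withdrawals` are iterables of (address, timestamp).
    A deposit address with n >= min_count deposits is linked to a withdraw
    address with exactly n withdrawals, all made after the last deposit,
    provided that withdraw address is the only such candidate.
    """
    dep_times, wd_times = defaultdict(list), defaultdict(list)
    for addr, ts in deposits:
        dep_times[addr].append(ts)
    for addr, ts in withdrawals:
        wd_times[addr].append(ts)

    links = {}
    for d_addr, d_ts in dep_times.items():
        n = len(d_ts)
        if n < min_count:
            continue
        candidates = [
            w_addr for w_addr, w_ts in wd_times.items()
            if len(w_ts) == n and min(w_ts) > max(d_ts)
        ]
        if len(candidates) == 1:  # ambiguous matches are not linked
            links[d_addr] = candidates[0]
    return links
```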
Heuristic 5 - Careless usage of anonymity mining
Anonymity mining is a clever way to incentivize users to participate in mixing. However, if users carelessly claim their Anonymity Points (AP) or Tornado (TORN) tokens, they can shrink their anonymity set. For instance, if a user withdraws their earned AP tokens to a deposit address, we can estimate the maximum time the user left their funds in the mixing pool. This is because users can only claim AP and TORN tokens for deposits that have already been withdrawn.
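A minimal sketch of this bound, assuming the claim can be attributed to one specific deposit (for instance, because the deposit address made a single deposit to the pool) and using block numbers as a proxy for time. The function name and input format are hypothetical.

```python
def claim_time_bound(deposit_block, claim_block, candidate_withdrawals):
    """Heuristic 5: use an AP claim to bound when a deposit was withdrawn.

    AP can only be claimed for deposits that were already withdrawn, so the
    withdrawal matching the deposit must lie in (deposit_block, claim_block].
    `candidate_withdrawals` maps withdraw tx hashes to block numbers.
    Returns the maximum number of blocks the funds could have stayed in the
    pool and the set of withdrawals that remain plausible.
    """
    if claim_block <= deposit_block:
        raise ValueError("claim must come after the deposit")
    max_blocks_in_pool = claim_block - deposit_block
    survivors = {
        tx for tx, block in candidate_withdrawals.items()
        if deposit_block < block <= claim_block
    }
    return max_blocks_in_pool, survivors
```

Withdrawals made after the claim can no longer correspond to this deposit, so the deposit can be dropped from their anonymity sets.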
Heuristic 6 - Profiling Deposit and Withdraw addresses
All addresses that have interacted with any of the Tornado Cash pools will be collected and analyzed. We will profile each address based on its transaction history: we will inspect the timestamps of all its transactions, as well as all the services (e.g., Uniswap, Compound, MakerDAO) it has ever interacted with. This will allow us to identify likely matches between deposit and withdraw addresses.
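One possible way to compare two address profiles, sketched under our own assumptions: timing is summarized as a histogram of UTC transaction hours and compared with cosine similarity, service usage is compared with Jaccard similarity, and the two scores are blended with a hypothetical `weight`. These feature choices are illustrative, not the final profiling method.

```python
import math
from collections import Counter
from datetime import datetime, timezone


def hour_histogram(timestamps):
    """24-bin histogram of the UTC hours at which an address transacts."""
    hours = Counter(
        datetime.fromtimestamp(ts, tz=timezone.utc).hour for ts in timestamps
    )
    return [hours[h] for h in range(24)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 if both are empty)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def profile_similarity(ts_a, services_a, ts_b, services_b, weight=0.5):
    """Blend timing and service-overlap similarity into a score in [0, 1].

    `ts_*` are Unix timestamps of an address's transactions and
    `services_*` are the names of services it interacted with.
    """
    timing = cosine(hour_histogram(ts_a), hour_histogram(ts_b))
    overlap = jaccard(set(services_a), set(services_b))
    return weight * timing + (1 - weight) * overlap
```

A high score only suggests a candidate match; such scores could feed the probabilistic links used in the entropy computation rather than being treated as proof.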
Heuristic 7 - Wallet Fingerprints
Different wallets work in different ways. We have several ideas for distinguishing between them, which will allow us to further fragment the anonymity sets of withdraw transactions.
The full list of heuristics has yet to be determined, and we expect to expand on it. The more accurate and robust our heuristics are, the better the privacy and anonymity estimates we can give our users.
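We have not fixed which wallet features to use, so the sketch below treats a fingerprint as an opaque, hypothetical value (for example, a tuple of transaction features) and only shows how fingerprints would fragment an anonymity set. It also assumes a fingerprint can be associated with the withdrawal side, e.g. from the withdraw address's own transactions; the function name is hypothetical.

```python
def fragment_by_fingerprint(anonymity_set, fingerprint_of, withdraw_fp):
    """Heuristic 7 (sketch): keep deposits whose wallet fingerprint matches.

    `fingerprint_of` maps a deposit tx hash to a hashable fingerprint.
    Deposits with no known fingerprint are kept, since they cannot be
    excluded on this basis.
    """
    return {
        d for d in anonymity_set
        if fingerprint_of.get(d) in (None, withdraw_fp)
    }
```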
We plan to rely solely on publicly available data sources to obtain all the data needed for our analysis, such as Google BigQuery, Infura, or blockchain explorers. The exact details of how we will retrieve the necessary data from the blockchain are yet to be determined.
We will use Python, Julia, and Rust to tackle the problem.
Based on the above vulnerability analyses, we will develop an interface that gives users feedback by informing them of their privacy level, raising warnings when they take unsafe actions, and educating them on safe behavior.
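As an illustration of how the interface could turn heuristic results into warnings, here is a sketch that checks a user's planned deposit/withdraw address pair against Heuristics 1 and 3 and a minimum-entropy threshold. The function name, message texts and the 4-bit threshold are hypothetical.

```python
def pair_warnings(deposit_addr, withdraw_addr, transfers, entropy_bits,
                  min_entropy_bits=4.0):
    """Return human-readable warnings for a planned deposit/withdraw pair.

    `transfers` is an iterable of (from_addr, to_addr) pairs from ordinary
    transactions; `entropy_bits` is the anonymity-set entropy of the
    withdrawal. `min_entropy_bits` is a hypothetical threshold.
    """
    d, w = deposit_addr.lower(), withdraw_addr.lower()
    warnings = []
    if d == w:  # Heuristic 1
        warnings.append("Deposit and withdraw addresses are identical.")
    if any({s.lower(), t.lower()} == {d, w} for s, t in transfers):  # Heuristic 3
        warnings.append("These addresses have transacted with each other.")
    if entropy_bits < min_entropy_bits:
        warnings.append(f"Anonymity set entropy is only {entropy_bits:.2f} bits.")
    return warnings
```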
Stage 1: Research ~ 8 weeks
Research into the different tactics used to reduce the entropy of a user's anonymity set, including an analysis of every Tornado Cash transaction. This covers, for example, the possible heuristics for detecting two accounts linked to the same user.
Stage 2: MVP ~ 6 weeks
Development of a web app that, based on the information gathered in the first stage, warns Tornado users when their transactions make an unsafe or unwanted reveal. The app will also display the entropy of the user's anonymity set.
Stage 3: Full App ~ 8 weeks
Adding features and fixing bugs reported by the community and users.
Development of a UI to provide a more accessible platform with clear warnings.
Stage 4: Documentation ~ 4 weeks
Writing a paper that collects the research and analysis behind every reveal the application addresses, along with an explanation of each.
The project will be developed openly on GitHub using open source programming languages. We will invite users and the broader Tornado community to inspect and extend our tool. Every 4 weeks, a report will be submitted to the forum so that the community can inspect and comment on the work being done.
The team is composed of engineers, physicists, and mathematicians specialized in machine learning and data engineering, who constantly investigate and apply cutting-edge technologies to solve advanced technical challenges.
The team leads are:
István A. Seres, an applied mathematician who will be in charge of defining the heuristics and of the research part of the project.
Federico Carrone, tech lead, who will be in charge of a team of computer scientists, computer engineers, and data scientists (mathematicians, physicists, and industrial engineers) from his team at LambdaClass.