Funded Bounty - Anonymity Research Tools

Summary

The scope of this bounty is to develop a suite of anonymity research tools to help better protect Tornado.cash users’ privacy.

Description

Every transaction you make on-chain reveals more about your personal data. Individuals who care about protecting their privacy must be mindful of which transactions they make and how they make them. The more mindful a user is, the greater their level of entropy and, therefore, privacy.

The Tornado Cash Community Fund is seeking to fund an individual or team who is interested in developing a set of analytics to give users a better purview over their own revealed entropy as they use the Tornado.cash protocol.

Examples

There are various types of “reveals” you may accidentally commit with your wallet while using Tornado.cash, which will dramatically reduce the entropy between your “source” and “destination” addresses:

  • Synchronous Tx Reveal - If you consistently make txs with a “source” wallet and a “destination” wallet at roughly similar times, you may reveal the two wallets have the same owner
  • Rapid Mix Reveal - If you make a deposit and do not wait long enough to withdraw, you will reduce your anonymity set
  • Multi-Denomination Reveal - If your “source” wallet mixes a specific set of denominations and your “destination” wallet withdraws them all (example: if you mix 3x 10 ETH, 2x 1 ETH, 1x 0.1 ETH in order to get 32.1 ETH to begin staking on the beacon chain), then you could completely nullify your anonymity set within the Tornado protocol if no other wallet has mixed this exact denomination set
  • TORN Mining Reveal - If you liquidity mine anonymity points (AP) and claim them all for TORN in a single tx with either the “source” or “destination” wallet, you will reveal exactly how long you mixed your tokens. This will perfectly connect the two addresses
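
As an illustration of the Multi-Denomination Reveal above, here is a minimal Python sketch. It assumes deposits and withdrawals are already available as (address, denomination) pairs; the function names and the sample addresses are hypothetical, and a real tool would also need to account for timing and partial matches.

```python
from collections import Counter, defaultdict


def denomination_profile(events):
    """Map each address to a hashable multiset of the denominations it used."""
    counts = defaultdict(Counter)
    for address, denomination in events:
        counts[address][denomination] += 1
    return {addr: frozenset(c.items()) for addr, c in counts.items()}


def multi_denomination_links(deposits, withdrawals):
    """Pair a deposit address with a withdraw address when both used the
    same multi-note denomination set and no other address did."""
    dep_by_profile = defaultdict(list)
    for addr, profile in denomination_profile(deposits).items():
        dep_by_profile[profile].append(addr)
    wd_by_profile = defaultdict(list)
    for addr, profile in denomination_profile(withdrawals).items():
        wd_by_profile[profile].append(addr)

    links = []
    for profile, dep_addrs in dep_by_profile.items():
        wd_addrs = wd_by_profile.get(profile, [])
        note_count = sum(n for _, n in profile)
        # Single-note profiles are far too common to be a meaningful signal.
        if len(dep_addrs) == 1 and len(wd_addrs) == 1 and note_count > 1:
            links.append((dep_addrs[0], wd_addrs[0]))
    return links


# Denominations are kept as strings to avoid floating-point comparisons.
deposits = [("0xsource", "10")] * 3 + [("0xsource", "1")] * 2 + [("0xsource", "0.1")]
deposits += [("0xother", "10")]
withdrawals = [("0xdest", "10")] * 3 + [("0xdest", "1")] * 2 + [("0xdest", "0.1")]
withdrawals += [("0xelse", "10")]
print(multi_denomination_links(deposits, withdrawals))
```

In this toy data only the 32.1 ETH pattern is flagged, because it is the only multi-note profile that appears exactly once on each side.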

Scope

The preliminary stage of this project is focused on researching how to best measure the anonymity of a user’s wallet. The secondary stage is focused on developing an application that provides users with feedback on how they can improve the anonymity and entropy of their own transactions while using the protocol, using the research conceived in the first stage.

Stage 1: Research

  • Create a standard measurement of “address entropy” - this can be used to estimate the probability that any two addresses are owned by the same person
  • Produce various classes of “reveals” - expand upon the list of examples above and classify them
  • Calculate how much entropy is reduced per reveal - develop unique formulas for each type of reveal in order to make the appropriate address entropy estimation

Additional Research Scope

  • Anonymity Set Auditor - define the strength of each Tornado Cash pool in how well it actually provides anonymity. Tornado can often be used ineffectively by users. Even if 10k mixes have taken place, it is possible 3k of those mixes exposed their owner’s identities. Thus, the actual anonymity set may only be spread across 7k mixes. Developing a tool to quickly audit the anonymity set of each pool is another useful instrument.
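
The 10k/3k example above can be turned into a rough number. The sketch below assumes, for simplicity, that every non-exposed deposit is equally likely to be the source of a withdrawal; the function name is hypothetical and the uniform assumption is an idealisation, not a measured property of any pool.

```python
import math


def effective_anonymity_set(total_deposits, revealed_deposits):
    """Return (remaining set size, entropy in bits) after removing
    deposits that were linked to their owners."""
    remaining = total_deposits - revealed_deposits
    if remaining <= 0:
        return 0, 0.0
    # Under a uniform distribution, entropy is log2 of the set size.
    return remaining, math.log2(remaining)


nominal_size, nominal_bits = effective_anonymity_set(10_000, 0)
actual_size, actual_bits = effective_anonymity_set(10_000, 3_000)
print(f"nominal: {nominal_size} deposits, {nominal_bits:.2f} bits")
print(f"audited: {actual_size} deposits, {actual_bits:.2f} bits")
```

Expressed in bits, losing 3k of 10k mixes costs about half a bit of entropy under this simplified model.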

Stage 2: User Application

  • Input - Allow users to input “source” and “destination” addresses
  • Return - Return the “address entropy” between the two addresses
  • Analyze - Let users analyze how their entropy changed with every historical transaction
  • Preview - Enable users to preview how future transactions might further augment their address entropy

Additional Application Scope

  • Machine Learning - develop a machine learning model to cluster addresses with a high likelihood of being the same owner. Users can reference this visualizer to see how their on-chain activity is associating their wallet with others.

Budget

Up to 2,000 TORN is currently allocated to this bounty.

Note: it is okay if the research and application portions are delegated to two different proposal respondents. The full bounty amount is allocated for above-and-beyond quality work.


Special thanks to @bt11ba @Rezan @Justin_Bram & @xgozzy for feedback, review, and brainstorming.

6 Likes

Please feel free to reach out to any of us directly with any questions/comments/concerns!

1 Like

Good idea!

I think asking users to de-anonymize themselves by putting both deposit and withdrawal addresses into a tool is not such a good idea, though. The tool could be (or become) a de-anonymizing honeypot.

I think it would be better to have them enter one address then show what addresses are strongly linked (if any).

And if there is a very small number strongly linked, tell them what they did wrong and how to fix it.

2 Likes

I’ve been messing around with the Multi-Denomination Reveal part of the problem, using Google’s BigQuery to extract Tornado deposit/withdraw data from the crypto_ethereum public dataset they maintain. Still definitely a work in progress, but I can share code with anybody else interested in collaborating.

I put links to a couple of data files in the Tornado discord: the top 10,000 deposit and withdraw addresses:
https://storage.googleapis.com/tornado-privacy-checkup/Deposits_10000.json
https://storage.googleapis.com/tornado-privacy-checkup/Withdraws_10000.json

I’m working on “given an address, query the data and see which other addresses look like they’re strongly linked” and if I have time I might throw up an ugly web page prototype.

I don’t want any bounty; I started this a couple of weeks ago for my own curiosity.

2 Likes

Valid concern

The tool should absolutely be open source so that it can be properly audited for security and vulnerabilities. It potentially could also be run client-side as a downloaded application to further de-risk usage. The example model to follow here might be MyCrypto’s desktop client

This would be extremely helpful and would definitely fall into the Machine Learning portion of the “additional scope of work” in Phase 2

1 Like

This is awesome! Thank you for sharing. Ya, whoever takes on this project would absolutely benefit from your existing work.

Please keep us posted on how this continues to progress. Also if you ever change your mind about a bounty :slight_smile:

Really nice, challenging bounty and initiative. Tornado Cash should definitely only accept open-source and client-side tools or at least verifiable tools.

The developed tool, however, should not be considered a privacy panacea. First and foremost, users can easily be deanonymised already on Layer-0, i.e., on the network level. For instance, the relayer, Metamask, or Etherscan’s servers can easily map your IP addresses to your queried transactions or addresses. These leaks cannot be captured by the to-be-developed tool because, as described in the first post, the tool should rely solely on blockchain data. Second, due to the nature of Tornado Cash, there is no one on earth who really knows the ground truth, i.e., the true mapping between deposit and withdrawal transactions. Therefore, the developed tool can a) only give probabilistic matches, and b) only be as accurate as the heuristics it uses. The tool’s output should be taken only as an upper bound on the achieved entropy (degree of anonymity). With techniques (heuristics, wallet fingerprints) developed in the future, one may be able to further decrease the calculated entropies.

I would be interested to learn what you mean by this sentence:

“given an address, query the data and see which other addresses look like they’re strongly linked”

In your tool, how did you define address similarity? I would be interested to learn how your tool works and what your heuristics are! Feel free to shoot me a DM or a mail to [email protected]. I would be happy to contribute or help anyone (either group or individual) in making this tool a reality. I have some experience with privacy-enhancing tools and I already made a preliminary analysis of the Tornado contract’s anonymity guarantees available here. If you want some academic rigour in your team then feel free to reach out! Let’s team up! Thanks in advance!

3 Likes

Hey, really happy to see that Tornado is being proactive by offering these generous bounties.

I’ve been doing research on cryptocurrency deobfuscation for a while, including on Tornado.cash.

I am interested in pursuing portions of this bounty and have a very capable team. We’re publishing something exciting soon™ and this could be a good next project.

The ethics of publishing this type of work are nuanced, so I’ve only published on it a handful of times (most recently because I had to make a PoC available).
Happy to discuss this further and will reach out directly in the next few days.

3 Likes

Impressive work on Monero!

The goal of this study isn’t directly to pinpoint/doxx specific hackers but more to quantify the overall anonymity that Tornado Cash provides. Like how many people are doing it wrong and what’s the true size of the anonymity set once you remove these bad users.

If you do find out about specific hack cases, that’s awesome! But the best course is obviously to report directly to the involved parties or the police.

1 Like

Maybe I’m misunderstanding but it seems to me that “pinpointing/doxxing specific hackers” (and innocent users) is necessary for building a proper “Anonymity Set Auditor”.

Here’s how we’d probably build it:

First, develop methods for detecting each of the classes of reveals, and statistics to identify ineffective mixes.

Next, analyze all tornado.cash transactions and identify the ineffective mixes:

e.g.:

0x023a5ea899df16831d6f3101f4568fddb577c887 is [sub-optimally] farming TORN. They just deposit 100 ETH 20x, then withdraw 100 ETH 20x a few weeks later to collect TORN. Ultimately, this reduces the 100 ETH pool anonymity set by ~0.1%.

0xde3e412d4fe3c9d90ac74d0a9b064951b39eeae4 another example of 100 ETH pool farming

(These farmers obviously aren’t trying to preserve their anonymity, and the effect on the 100 ETH pool is negligible.)

…Anyways, go through the blockchain and identify all of the self-farming reveals. Then figure out how they impact the anonymity set over time.

We repeat this and identify statistically ineffective rapid mix reveals, MDRs, TORN mining reveals, etc and remove them from the anonymity set. This step will, by definition, be pinpointing and connecting accounts mixed ineffectively.

If you want the anonymity set auditor to be open source then I’m pretty sure this necessitates making all those connections public.

The topic becomes very nuanced. If we can build it then the other guys can too, and innocent users deserve a heads up that their privacy was compromised. On the other hand, by making that information public, you’re also tipping off scammers/hackers that they messed up and could make seeking compensation for victims more difficult.

Spoke to my team and they’re happy and excited to work on this and I see no ethical issues with most of it, but wanted to point this out.

2 Likes

I fully support this motion

Brilliant! This is outstanding! Would be excellent to have you as a contributor - especially in the research phase.

Wow! I was seriously not expecting such a qualified, organic response to this bounty so quickly. This is a very pleasant surprise. Very nice work!

Would love to receive a proposal from you and your team when you are ready. It sounds like you may have additional, talented community contributors available, as well, such as @Phread420 and @Istvan_Andras_Seres. Would be excellent to explore these possible collaborations further

And of course this is still open to any other individuals or teams who are interested. Thanks everyone so far for the overwhelmingly positive response

This is such an outstanding and underrated point

Perhaps there is room for a Phase 3 to exist to support layer-0 activity. I’ve previously published a post on building a Tornado.cash desktop client. Once Phase 1 & 2 are complete, a desktop client could be developed to:

  • locally integrate a wallet so as not to rely on metamask (and leak your IP or chrome extension ID)
  • integrate tor routing (to cycle through IP addresses)
  • schedule tx signatures (to better obfuscate time zones)

The Phase 1 and 2 tools could then be available in the desktop client to provide “entropy estimations” locally before each tx

1 Like

Exactly, as we discussed before, a desktop client would be an amazing solution. Even if it is the “old” way, it is more secure than using a web-app with Metamask. I would support this, and it would be one of the best tools we could give to the community imo.

3 Likes

My name is Federico Carrone (https://federicocarrone.com/). I am the founder of LambdaClass (https://lambdaclass.com/), a software company specialized in infosec, data engineering and data science. We have helped dozens of traditional finance, backbone internet telecommunication, energy, healthcare, marketing/real-time bidding and gaming development companies during the last decade.

Our team is composed of mathematicians, physicists, biologists and computer/electronic engineers. The programming languages we love the most are Rust, Julia and Erlang. We work on solving difficult problems in data engineering and data science with millions of requests per second, petabytes of data and thousands of nodes.

In the traditional finance world, we have worked on projects where we had to flag related transactions. We have been working on crypto projects for the last three years, so we are pretty new in this lovely place, but we are already working on two big crypto projects right now.

I have already sent an email to István. The paper he wrote is truly fascinating and I am sure we would make an impressive team.

2 Likes

I had to break the post since I am a new user and I can’t post more than 2 links at the same time.

You can check out the hands-on data science book we wrote with my team this year:

You can also check our conference:

And at last our blog notamonadtutorial [dot] com.

2 Likes

Sounds great, we are pretty much aligned on the approach!

My expectation would be:

  • Documenting and explaining each type of reveal
  • A set of opensource tools/scripts to extract each type of reveals (Maybe with a confidence score)
  • The actual results from running the above scripts
  • Discussion of the results and conclusions on how people are potentially misusing Tornado.
  • Bonus point, a simple UI or something to make it easy for anyone to query the “anonymity score” of withdrawal/deposit on an ongoing basis.
1 Like

Great, those expectations are a great summary of what we should do.

Tomorrow we will have a call with István to submit a proposal, and on Thursday I will have a meeting with my team to discuss how to move forward. I am eager to work with István on this.

2 Likes

100% this! In the case of a web app, there would be no convincing way to ensure and prove to the users that we don’t spy on them, i.e., we don’t log IP addresses and other data of theirs. A user-friendly open-source local tool is the way to go with a well-written user manual and documentation. Hopefully, in a few weeks/months, we can put such a tool on the table!

3 Likes

Anonymity Research Tools

Pre-reading

This is a proposal in response to the Funded Bounty for Anonymity Research Tools. The final objective of the projected research and proposed tools is to help protect the privacy of Tornado Cash users.

The development of tools to protect anonymity must be focused on the different kinds of reveals that privacy attackers exploit. For this, our team proposes to research and identify many common reveals using data science and data engineering techniques in order to warn the Tornado Cash user.

The Problem

In the traditional financial system, bank accounts are publicly linked to a natural person. This means that given a bank account, one can directly reach its owner. However, banks keep all transactions on that account private.

In most blockchain systems, transaction history is completely public. This means that, given a wallet address, you can trace all of its transactions back to the very first one.

In principle, this is not a problem because no personal data is needed to create a wallet. Thus, having access to the transactions that an account has made does not compromise the user’s privacy.

Nevertheless, it can be easy to figure out the owner of a wallet’s address in some cases. If the individual is identified, all transactions made by that person could be easily traced, seriously compromising the privacy of the user’s finances.

Tornado Cash is a technology conceived to handle this problem. In Tornado’s own words:

“Tornado Cash improves transaction privacy by breaking the on-chain link between source and destination addresses. It uses a smart contract that accepts ETH & other tokens deposits from one address and enables their withdrawal from a different address.”

But what happens if a user deposits from an account and withdraws using the same account? What if they withdraw using an account that has a long transaction history with the account that made the deposit? These are some of the most common reveals and privacy threats.

Solutions

Today, Tornado users have no good tools available to approximately assess their anonymity and privacy gains. This would be essential to allow them to make informed decisions about which pools they should deposit to or when and how they should withdraw their coins from the used pools.

First, we need to precisely define the anonymity and privacy guarantees we want to achieve or enhance. Afterwards, our goal is to measure (or at least approximate) these privacy guarantees and display them to our users.

Privacy guarantees

In the following, we assume that the adversary has access solely to on-chain data. We do not assume and will not use any off-chain data that a relayer or an Internet Service Provider (ISP) certainly has. For instance, a relayer could assign IP addresses to Tornado withdraw transactions. Since we do not have such data sources, we cannot incorporate such privacy leakages into our analysis. Therefore, the results of our tool serve only as an optimistic upper bound for the achieved privacy guarantees.

Withdraw anonymity

Withdraw anonymity is the most straightforward anonymity guarantee that is provided by a Tornado Cash smart contract. Ideally, a withdrawal transaction could correspond to any of the deposit transactions that occurred before it. We call this set of deposit transactions the anonymity set of a withdrawal transaction. However, several heuristics allow observers to exclude deposit transactions from the anonymity set since they are mapped heuristically to other withdraw transactions. See the considered heuristics below. After excluding all the revealed deposit transactions, we can characterize this anonymity guarantee by the size of the remaining anonymity set or the anonymity set’s entropy. Note that we might have probabilistic links between withdrawal and deposit transactions. Hence, we want to use the Shannon entropy of the anonymity sets instead of just considering the size of the anonymity set.
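
To make the difference between set size and Shannon entropy concrete, here is a minimal Python sketch. It assumes each candidate deposit in the anonymity set has been given a non-negative weight reflecting how likely it is to be the true source; how those weights are obtained is exactly the open research question, and the function name is hypothetical.

```python
import math


def shannon_entropy(weights):
    """Shannon entropy (in bits) of an anonymity set, given one
    non-negative likelihood weight per candidate deposit."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    entropy = 0.0
    for w in weights:
        if w > 0:
            p = w / total  # normalise weights into probabilities
            entropy -= p * math.log2(p)
    return entropy


# Eight equally likely deposits versus eight deposits where one is
# strongly suspected: same set size, very different entropy.
uniform = [1] * 8
skewed = [0.93] + [0.01] * 7
print(f"uniform: {shannon_entropy(uniform):.3f} bits")
print(f"skewed:  {shannon_entropy(skewed):.3f} bits")
```

Both sets contain eight deposits, but the skewed one offers far less uncertainty to an observer, which is why entropy is the more informative measure when links are probabilistic.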

Withdrawal address unlinkability

Ideally, users withdraw their coins into fresh withdraw addresses. Withdraw address unlinkability ensures that given two withdrawal addresses, one cannot determine whether the withdrawal addresses belong to the same entity or not. We do not consider withdrawal transactions to be unlinkable since many users withdraw multiple times to the same withdrawal address. In those cases, it is trivial to establish that the two withdraw transactions were issued by the same user. Therefore, we only focus on the unlinkability of withdrawal addresses. In a perfect case, the adversary cannot determine whether two withdrawal addresses belong to the same person with more accuracy than a random guess. We characterize this privacy guarantee by the adversary’s advantage in breaking the withdrawal address unlinkability (how much better than random guessing the adversary can tell whether two withdraw addresses belong to the same user).
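
One common way to express such an advantage is the distance between the adversary’s accuracy and the 50% achieved by random guessing. The Python sketch below uses that convention as an assumption; it also assumes a labelled set of address pairs (for example, synthetic or self-reported data), since real ground truth is not available on-chain. The function name is hypothetical.

```python
def linking_advantage(guesses, truths):
    """Adversary's advantage over random guessing on a set of address pairs.

    guesses[i] is True if the adversary claims pair i has the same owner;
    truths[i] is True if that is actually the case.
    """
    if not guesses or len(guesses) != len(truths):
        raise ValueError("guesses and truths must be non-empty and equal in length")
    correct = sum(g == t for g, t in zip(guesses, truths))
    accuracy = correct / len(guesses)
    # 0.0 means no better than a coin flip; 0.5 means always right.
    return abs(accuracy - 0.5)


print(linking_advantage([True, True, False, True], [True, False, False, True]))
```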

Deposit address unlinkability

Similar to the previous case, the deposit addresses of the same user should also be unlinkable. In the same fashion, we define the privacy goal, and we measure this privacy guarantee by the capability of an adversary in distinguishing a specific pair of deposit addresses.

Heuristics to reduce users’ anonymity and privacy

Our goal is to develop and implement a set of heuristics, or reveals, that help users detect when the two addresses they have chosen increase the odds that their privacy is compromised.

Some of the possible heuristics to add to the list described in the Bounty summary could be:

Heuristic 1 - Same deposit and withdraw address

If a deposit address matches a withdraw address, then it is trivial to link the two addresses. Therefore, the deposit address needs to be removed from all the other withdraw addresses’ anonymity set.
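
A minimal Python sketch of this heuristic, assuming the deposit and withdraw addresses of a pool are already available as lists of hex strings (the function name and sample addresses are hypothetical):

```python
def same_address_reveals(deposit_addresses, withdraw_addresses):
    """Return the addresses that appear on both the deposit and the
    withdraw side of a pool."""
    # Ethereum addresses are case-insensitive hex, so compare in lower case.
    deposits = {a.lower() for a in deposit_addresses}
    withdrawals = {a.lower() for a in withdraw_addresses}
    return deposits & withdrawals


revealed = same_address_reveals(
    ["0xAAA1", "0xbbb2", "0xccc3"],
    ["0xaaa1", "0xddd4"],
)
print(sorted(revealed))
```

Deposits made by the returned addresses would then be dropped from the anonymity sets of all other withdrawals.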

Heuristic 2 - Unique gas prices

If there is a deposit and a withdraw transaction with unique gas prices (e.g., 3.1415926 Gwei), then we consider the deposit and the withdraw transactions linked. The corresponding deposit transaction can be removed from any other withdraw transaction’s anonymity set.
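
A minimal Python sketch of this heuristic. It assumes each transaction is a dict with hypothetical keys `tx`, `gas_price` (in wei, as an integer) and `block`, and it treats a gas price as "unique" when it occurs exactly once among deposits and exactly once among withdrawals; the proposal does not fix a precise definition, so this is one possible reading.

```python
from collections import Counter


def unique_gas_price_links(deposits, withdrawals):
    """Link a deposit to a withdrawal when both share a gas price that is
    used by no other deposit or withdrawal, and the deposit came first."""
    dep_counts = Counter(d["gas_price"] for d in deposits)
    wd_counts = Counter(w["gas_price"] for w in withdrawals)
    unique_deposits = {
        d["gas_price"]: d for d in deposits if dep_counts[d["gas_price"]] == 1
    }

    links = []
    for w in withdrawals:
        d = unique_deposits.get(w["gas_price"])
        if d is not None and wd_counts[w["gas_price"]] == 1 and d["block"] < w["block"]:
            links.append((d["tx"], w["tx"]))
    return links


deposits = [
    {"tx": "d1", "gas_price": 3_141_592_600, "block": 100},   # 3.1415926 Gwei
    {"tx": "d2", "gas_price": 20_000_000_000, "block": 101},  # common 20 Gwei
    {"tx": "d3", "gas_price": 20_000_000_000, "block": 105},
]
withdrawals = [
    {"tx": "w1", "gas_price": 3_141_592_600, "block": 200},
    {"tx": "w2", "gas_price": 20_000_000_000, "block": 210},
]
print(unique_gas_price_links(deposits, withdrawals))
```

Keeping gas prices as integer wei avoids floating-point comparisons on values such as 3.1415926 Gwei.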

Heuristic 3 - Transactions between deposit and withdraw addresses

If there is a transaction issued from a deposit address to a withdraw address (or vice versa), then we consider them as linked. The deposit address is removed from the anonymity sets.
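
A minimal Python sketch of this heuristic, assuming plain transfers are available as (sender, receiver) pairs alongside the sets of known deposit and withdraw addresses. The function name and sample addresses are hypothetical.

```python
def direct_transfer_links(deposit_addresses, withdraw_addresses, transfers):
    """Return (deposit_address, withdraw_address) pairs that have
    transacted with each other directly, in either direction."""
    deposits = {a.lower() for a in deposit_addresses}
    withdrawals = {a.lower() for a in withdraw_addresses}

    links = set()
    for sender, receiver in transfers:
        s, r = sender.lower(), receiver.lower()
        if s == r:
            continue  # self-transfers are covered by the same-address heuristic
        if s in deposits and r in withdrawals:
            links.add((s, r))
        if r in deposits and s in withdrawals:
            links.add((r, s))
    return links


links = direct_transfer_links(
    deposit_addresses=["0xA", "0xB"],
    withdraw_addresses=["0xC", "0xD"],
    transfers=[("0xA", "0xC"), ("0xD", "0xB"), ("0xA", "0xE")],
)
print(sorted(links))
```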

Heuristic 4 - Linking multiple deposit addresses to multiple withdraw addresses

If there are multiple (say 12) deposit transactions coming from a deposit address and later there are 12 withdraw transactions to the same withdraw address, then we can link all these deposit transactions to the withdraw transactions.
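
A minimal Python sketch of this heuristic for a single pool. It assumes one list entry per deposit (the depositing address) and one per withdrawal (the receiving address), and it only links addresses whose transaction count is unique on both sides; the `min_count` threshold and the function name are hypothetical choices, and a real implementation would also check that the deposits precede the withdrawals.

```python
from collections import Counter


def matching_count_links(deposit_addresses, withdraw_addresses, min_count=2):
    """Link a deposit address to a withdraw address when both made the same
    number of transactions and nobody else on either side shares that count."""
    dep_counts = Counter(deposit_addresses)
    wd_counts = Counter(withdraw_addresses)
    # How many addresses share each transaction count, per side.
    dep_freq = Counter(dep_counts.values())
    wd_freq = Counter(wd_counts.values())
    wd_by_count = {n: addr for addr, n in wd_counts.items() if wd_freq[n] == 1}

    links = []
    for addr, n in dep_counts.items():
        if n >= min_count and dep_freq[n] == 1 and n in wd_by_count:
            links.append((addr, wd_by_count[n], n))
    return links


deposits = ["0xsrc"] * 12 + ["0xa", "0xb"]
withdrawals = ["0xdst"] * 12 + ["0xc"]
print(matching_count_links(deposits, withdrawals))
```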

Heuristic 5 - Careless usage of anonymity mining

Anonymity mining is a clever way to incentivize users to participate in mixing. However, if users carelessly claim their Anonymity Points (AP) or Tornado tokens, then they can reduce their anonymity set. For instance, if a user withdraws their earned AP tokens to a deposit address, then we can approximate the maximum time the user has left their funds in the mixing pool. This is because users can only claim AP and TORN tokens for deposits that have already been withdrawn.

Heuristic 6 - Profiling Deposit and Withdraw addresses

All addresses that have interacted with any of the Tornado Cash pools will be collected and analyzed. We will profile them based on their transaction history by inspecting the timestamps on all their transactions. We will also inspect all the services (e.g., Uniswap, Compound, MakerDAO, etc.) they have ever interacted with. This will allow us to make likely matches between deposit and withdraw addresses.

Heuristic 7 - Wallet Fingerprints

Different wallets work in different ways. We have several ideas on how we can distinguish between them. It will allow us to further fragment the anonymity sets of withdraw transactions.

The full list of heuristics is yet to be determined by our team, but we expect to expand on this list. The more accurate and robust our heuristics are, the better the privacy and anonymity estimates we can give to our users.

Data pipelines

We plan to rely solely on publicly available data sources to get all the necessary data for our analysis. We might want to use Google BigQuery, Infura, or other blockchain explorers. The exact details of how we will get the necessary data from the blockchain are yet to be determined.

The programming languages that will be used to tackle the problem will be Python, Julia and Rust.

User protection

Based on the above vulnerability analyses, an interface will be developed to provide feedback to users by informing them of their security level, raising warnings when unsafe actions are being committed, and educating them on safe behaviors.

Timeline

  • Stage 1: Research ~ 8 weeks
    Research into the different tactics used to reduce the entropy of a user’s wallet. This will include the analysis of every Tornado Cash transaction and, for example, the possible heuristics used to detect two accounts linked to the same user.

  • Stage 2: MVP ~ 6 weeks
    Development of a web app that warns Tornado’s users when they are making an unsafe or unwanted reveal in their transactions according to the information gathered in the first stage. A calculation of the user’s wallet entropy will be shown as well.

  • Stage 3: Full App ~ 8 weeks
    Adding features and fixing bugs detected by the community and the users.
    Development of a UI to provide a more accessible platform with clear warnings.

  • Stage 4: Documentation ~ 4 weeks
    Writing of a paper that collects the research and analysis of every reveal that the application solves alongside its explanation.

Methodology

The project will be developed openly on GitHub with open source programming languages. We will invite users and the broader Tornado community to inspect and extend our tool. Every 4 weeks a report will be submitted to the forum so that the community can inspect and comment on the work being done.

Team

The team is composed of engineers, physicists, and mathematicians specialized in machine learning and data engineering who are constantly investigating and applying cutting-edge technologies to solve the most advanced technical challenges.

The members in charge of the team:

  • István A. Seres, an applied mathematician who will be in charge of defining the heuristics and the research part of the project.
  • Federico Carrone, a tech lead who will be in charge of a team of computer scientists, computer engineers and data scientists (mathematicians, physicists, industrial engineers) from his team at LambdaClass


2 Likes