Funded Bounty - Anonymity Research Tools

Really nice, challenging bounty and initiative. Tornado Cash should definitely only accept open-source and client-side tools or at least verifiable tools.

The developed tool, however, should not be considered a privacy panacea. First and foremost, users can easily be deanonymised already on Layer-0, i.e., on the network level. For instance, the relayer or Metamask or Etherscan’s servers can easily map your IP addresses to your queried transactions or addresses. These leaks cannot be captured by the to-be-developed tool because, as described in the first post, the tool should rely solely on blockchain data. Second, due to the nature of Tornado Cash, no one knows the ground truth, i.e., the true mapping between deposit and withdrawal transactions. Therefore, the developed tool a) can only give probabilistic matches and b) will only be as accurate as the heuristics it uses. Its output should be taken only as an upper bound on the achieved entropy (degree of anonymity): techniques developed in the future (heuristics, wallet fingerprints) may further decrease the calculated entropies.

I would be interested to learn what you mean by this sentence:

“given an address, query the data and see which other addresses look like they’re strongly linked”

In your tool, how did you define address similarity? I would be interested to learn how your tool works and what your heuristics are! Feel free to shoot me a DM or a mail to [email protected]. I would be happy to contribute or help anyone (either group or individual) in making this tool a reality. I have some experience with privacy-enhancing tools and I already made a preliminary analysis on Tornado contract’s anonymity guarantees available here. If you want some academic rigour in your team then feel free to reach out! Let’s team up! Thanks in advance!

5 Likes

Hey, really happy to see that Tornado is being proactive by offering these generous bounties.

I’ve been doing research on cryptocurrency deobfuscation for a while, including on Tornado.cash.

I am interested in pursuing portions of this bounty and have a very capable team. We’re publishing something exciting soon™ and this could be a good next project.

The ethics of publishing this type of work is nuanced so I’ve only published on it a handful of times (most recently because I had to make a PoC available).
Happy to discuss this further and will reach out directly in the next few days.

6 Likes

Impressive work on Monero!

The goal of this study isn’t directly to pinpoint/doxx specific hackers but more to quantify the overall anonymity that Tornado Cash provides. Like how many people are doing it wrong and what’s the true size of the anonymity set once you remove these bad users.

If you do find about specific hack cases, that’s awesome! But the best is to obviously report directly to involved parties or the police.

3 Likes

Maybe I’m misunderstanding but it seems to me that “pinpointing/doxxing specific hackers” (and innocent users) is necessary for building a proper “Anonymity Set Auditor”.

Here’s how we’d probably build it:

First, develop methods for detecting each of the classes of reveals, and statistics to identify ineffective mixes.

Next, analyze all tornado.cash transactions and identify the ineffective mixes:

e.g.:

0x023a5ea899df16831d6f3101f4568fddb577c887 is [sub-optimally] farming TORN. They just deposit 100 ETH 20x, then withdraw 100 ETH 20x a few weeks later to collect TORN. Ultimately, this reduces the 100 ETH pool anonymity set by ~0.1%.

0xde3e412d4fe3c9d90ac74d0a9b064951b39eeae4 another example of 100 ETH pool farming

(These farmers obviously aren’t trying to stay anonymous, and the effect on the 100 ETH pool is negligible.)

…Anyways, go through the blockchain and identify all of the self-farming reveals. Then figure out how they impact the anonymity set over time.

We repeat this and identify statistically ineffective rapid mix reveals, MDRs, TORN mining reveals, etc and remove them from the anonymity set. This step will, by definition, be pinpointing and connecting accounts mixed ineffectively.

If you want the anonymity set auditor to be open source then I’m pretty sure this necessitates making all those connections public.

The topic becomes very nuanced. If we can build it then the other guys can too, and innocent users deserve a heads up that their privacy was compromised. On the other hand, by making that information public, you’re also tipping off scammers/hackers that they messed up and could make seeking compensation for victims more difficult.

Spoke to my team and they’re happy and excited to work on this and I see no ethical issues with most of it, but wanted to point this out.

5 Likes

I fully support this motion

Brilliant! This is outstanding! Would be excellent to have you as a contributor - especially in the research phase.

Wow! I was seriously not expecting such a qualified, organic response to this bounty so quickly. This is a very pleasant surprise. Very nice work!

Would love to receive a proposal from you and your team when you are ready. It sounds like you may have additional, talented community contributors available, as well, such as @Phread420 and @Istvan_Andras_Seres. Would be excellent to explore these possible collaborations further

And of course this is still open to any other individuals or teams who are interested. Thanks everyone so far for the overwhelmingly positive response

1 Like

This is such an outstanding and underrated point

Perhaps there is room for a Phase 3 to exist to support layer-0 activity. I’ve previously published a post on building a Tornado.cash desktop client. Once Phase 1 & 2 are complete, a desktop client could be developed to:

  • locally integrate a wallet so as not to rely on metamask (and leak your IP or chrome extension ID)
  • integrate tor routing (to cycle through IP addresses)
  • schedule tx signatures (to better obfuscate time zones)

The Phase 1 and 2 tools could then be available in the desktop client to provide “entropy estimations” locally before each tx

3 Likes

Exactly, as we discussed before, a desktop client would be an amazing solution. Even if it is the “old” way, it is more secure than using a web-app with Metamask. I would support this and this would be one of the best tools we could give to the community imo.

5 Likes

My name is Federico Carrone (https://federicocarrone.com/). I am the founder of LambdaClass (https://lambdaclass.com/), a software company specialized in infosec, data engineering and data science. We have helped dozens of traditional finance, backbone internet telecommunication, energy, healthcare, marketing/real-time bidding and gaming development companies during the last decade.

Our team is composed of mathematicians, physicists, biologists and computer/electronic engineers. The programming languages we love the most are Rust, Julia and Erlang. We work on difficult problems in data engineering and data science involving millions of requests per second, petabytes of data and thousands of nodes.

In the traditional finance world, we have worked on projects where we had to flag related transactions. We have been working on crypto projects for the last three years, so we are pretty new in this lovely place, but we are already working on two big crypto projects right now.

I have already sent an email to István. The paper he wrote is truly fascinating and I am sure we would make an impressive team.

4 Likes

I had to break the post since I am a new user and I can’t post more than 2 links at the same time.

You can check out the hands-on data science book we wrote with my team this year:

You can also check our conference:

And at last our blog notamonadtutorial [dot] com.

3 Likes

Sounds great, we are pretty much aligned on the approach!

My expectation would be:

  • Documenting and explaining each type of reveals
  • A set of opensource tools/scripts to extract each type of reveals (Maybe with a confidence score)
  • The actual results from running the above scripts
  • Discussion of the results and conclusions on how people are potentially misusing Tornado.
  • Bonus point, a simple UI or something to make it easy for anyone to query the “anonymity score” of withdrawal/deposit on an ongoing basis.
3 Likes

Great, those expectations are a great summary of what we should do.

Tomorrow we will have a call with István to submit a proposal and on Thursday I will meet with my team to discuss how to move forward. I am eager to work with István on this.

3 Likes

100% this! In the case of a web app, there would be no convincing way to ensure and prove to the users that we don’t spy on them, i.e., we don’t log IP addresses and other data of theirs. A user-friendly open-source local tool is the way to go with a well-written user manual and documentation. Hopefully, in a few weeks/months, we can put such a tool on the table!

5 Likes

Anonymity Research Tools

Pre-reading

This is a solicitation to the Funded Bounty for Anonymity Research Tools. The final objective of the projected research and proposed tools is to help protect the privacy of Tornado Cash users.

The development of tools to protect anonymity must be focused around the different kinds of reveals that privacy perpetrators exploit. For this, our team proposes to research and identify many common reveals using data science and data engineering techniques in order to warn the Tornado Cash user.

The Problem

In the traditional financial system, bank accounts are publicly linked to a natural person. This means that given a bank account, one can directly reach its owner. However, banks keep all transactions on that account private.

In most blockchain systems, transaction history is completely public. This means that you can track all the transactions made up to the beginning, given a wallet address.

In principle, this is not a problem because no personal data is needed to create a wallet. Thus, having access to the transactions that an account has made does not compromise the user’s privacy.

Nevertheless, it can be easy to figure out the owner of a wallet’s address in some cases. If the individual is identified, all transactions made by that person could be easily traced, seriously compromising the privacy of the user’s finances.

Tornado Cash is a technology conceived to handle this problem. In Tornado’s own words:

“Tornado Cash improves transaction privacy by breaking the on-chain link between source and destination addresses. It uses a smart contract that accepts ETH & other tokens deposits from one address and enables their withdrawal from a different address.”

But what happens if a user deposits from an account and withdraws using the same account? What if they withdraw using an account that has a long transaction history with the account that made the deposit? These are some of the most common reveals and privacy threats.

Solutions

Today, Tornado users have no good tools available to approximately assess their anonymity and privacy gains. This would be essential to allow them to make informed decisions about which pools they should deposit to or when and how they should withdraw their coins from the used pools.

First, we need to precisely define the anonymity and privacy guarantees we want to achieve or enhance. Afterwards, our goal is to measure (or at least approximate) these privacy guarantees and display them to our users.

Privacy guarantees

In the following, we assume that the adversary has access solely to on-chain data. We do not assume and will not use any off-chain data that a relayer or an Internet Service Provider (ISP) certainly has. For instance, a relayer could map IP addresses to Tornado withdraw transactions. Since we do not have such data sources, we cannot incorporate such privacy leakages into our analysis. Therefore, the results of our tool serve only as an optimistic upper bound for the achieved privacy guarantees.

Withdraw anonymity

Withdraw anonymity is the most straightforward anonymity guarantee that is provided by a Tornado Cash smart contract. Ideally, a withdrawal transaction might come from all the deposit transactions that occurred before the withdrawal transaction. We call this set of deposit transactions the anonymity set of a withdrawal transaction. However, several heuristics allow observers to exclude deposit transactions from the anonymity set since they are mapped heuristically to other withdraw transactions. See the considered heuristics below. After excluding all the revealed deposit transactions, we can characterize this anonymity guarantee by the size of the remaining anonymity set or the anonymity set’s entropy. Note, we might have probabilistic links between withdrawal and deposit transactions. Hence, we want to use the Shannon-entropy of the anonymity sets instead of just considering the size of the anonymity set.
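To make the entropy measure concrete, here is a minimal Python sketch. It assumes that each deposit remaining in the anonymity set has already been assigned a probability of being the true source of a given withdrawal. The function name `anonymity_entropy` and the example probabilities are illustrative and not part of the proposal.

```python
import math


def anonymity_entropy(probabilities):
    """Shannon entropy (in bits) of one withdrawal's anonymity set.

    `probabilities` holds, for each candidate deposit, the estimated
    probability that it funded the withdrawal. The values must sum to 1.
    """
    total = sum(probabilities)
    if not math.isclose(total, 1.0, rel_tol=1e-9, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    # Terms with p == 0 contribute nothing to the entropy.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Eight equally likely candidate deposits (hypothetical numbers).
uniform = [1 / 8] * 8
# Same set size, but one deposit is a much more likely match.
skewed = [0.65] + [0.05] * 7
```

With the uniform distribution the entropy equals log2 of the set size, while the skewed distribution yields a lower value even though the set size is unchanged. This is why the entropy is more informative than the raw size when links are probabilistic.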

Withdrawal address unlinkability

Ideally, users withdraw their coins into fresh withdraw addresses. Withdraw address unlinkability ensures that given two withdrawal addresses, one cannot determine whether they belong to the same entity. We do not consider withdrawal transactions to be unlinkable, since many users withdraw multiple times to the same withdrawal address; in those cases, it is trivial to establish that the withdraw transactions were issued by the same user. Therefore, we only focus on the unlinkability of withdrawal addresses. In a perfect case, the adversary cannot determine whether two withdrawal addresses belong to the same person with more accuracy than a random guess. We characterize this privacy guarantee by the adversary’s advantage in breaking withdrawal address unlinkability, i.e., how much better than random guessing the adversary can decide whether two withdraw addresses belong to the same user.
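As a rough illustration of the adversary’s advantage, the sketch below compares an adversary’s guesses against a labelled set of address pairs. It assumes a test set in which same-owner and different-owner pairs are equally frequent, so that random guessing is right half of the time; the function name and the inputs are hypothetical.

```python
def linking_advantage(guesses, truths):
    """Advantage over random guessing on a balanced set of address pairs.

    `guesses[i]` is True if the adversary claims pair i has one owner,
    `truths[i]` is True if that is actually the case.
    """
    if not truths or len(guesses) != len(truths):
        raise ValueError("need equally sized, non-empty lists")
    correct = sum(g == t for g, t in zip(guesses, truths))
    accuracy = correct / len(truths)
    # 0.5 is the accuracy of a coin flip on a balanced test set.
    return abs(accuracy - 0.5)
```

An advantage of 0 corresponds to the perfect case described above; an advantage of 0.5 means the adversary links every pair correctly.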

Deposit address unlinkability

Similar to the previous case, the deposit addresses of the same user should also be unlinkable. In the same fashion, we define the privacy goal, and we measure this privacy guarantee by the capability of an adversary in distinguishing a specific pair of deposit addresses.

Heuristics to reduce users’ anonymity and privacy

Our goal is to develop and implement a set of heuristics, or reveals, that help users detect when the addresses they have chosen increase the odds that their privacy has been compromised.

Some of the possible heuristics to add to the list described in the Bounty summary could be:

Heuristic 1 - Same deposit and withdraw address

If a deposit address matches a withdraw address, then it is trivial to link the two addresses. Therefore, the deposit address needs to be removed from all the other withdraw addresses’ anonymity set.
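A minimal sketch of this heuristic in Python follows. The input layout (lists of `(tx_hash, address)` tuples) and the function names are assumptions made for illustration, and the sketch ignores transaction ordering within a pool.

```python
def reused_addresses(deposits, withdrawals):
    """Addresses that appear both as depositor and as withdrawal recipient.

    `deposits` and `withdrawals` are lists of (tx_hash, address) tuples
    for a single pool (an assumed layout, not a real API).
    """
    deposit_addrs = {addr.lower() for _, addr in deposits}
    withdraw_addrs = {addr.lower() for _, addr in withdrawals}
    return deposit_addrs & withdraw_addrs


def prune_anonymity_set(deposits, revealed_addrs):
    """Drop deposits whose depositor address has been revealed."""
    return [(tx, addr) for tx, addr in deposits
            if addr.lower() not in revealed_addrs]
```

Addresses are lower-cased before comparison because the same Ethereum address can be written with different capitalisation.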

Heuristic 2 - Unique gas prices

If there is a deposit and a withdraw transaction with unique gas prices (e.g., 3.1415926 Gwei), then we consider the deposit and the withdraw transactions linked. The corresponding deposit transaction can be removed from any other withdraw transaction’s anonymity set.
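One possible way to implement this check is sketched below. It assumes gas prices are given in wei as integers and treats a price as “unique” when exactly one deposit and exactly one withdrawal in the pool use it; both the data layout and that definition of uniqueness are assumptions.

```python
from collections import Counter


def unique_gas_price_links(deposits, withdrawals):
    """Link a deposit and a withdrawal that share a one-off gas price.

    `deposits` and `withdrawals` are lists of (tx_hash, gas_price_wei).
    Returns a list of (deposit_tx, withdrawal_tx) pairs.
    """
    dep_counts = Counter(price for _, price in deposits)
    wd_counts = Counter(price for _, price in withdrawals)
    dep_by_price = {price: tx for tx, price in deposits}
    links = []
    for tx, price in withdrawals:
        # The price must be used exactly once on each side.
        if dep_counts[price] == 1 and wd_counts[price] == 1:
            links.append((dep_by_price[price], tx))
    return links
```

For example, a deposit and a withdrawal both paying 3141592600 wei (3.1415926 Gwei) would be linked, while a common price such as 20 Gwei used by several deposits would not.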

Heuristic 3 - Transactions between deposit and withdraw addresses

If there is a transaction issued from a deposit address to a withdraw address (or vice versa), then we consider them as linked. The deposit address is removed from the anonymity sets.
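This heuristic could be sketched as follows, assuming we already have the sets of deposit and withdraw addresses for a pool and a list of plain transfers as `(from_address, to_address)` tuples. All names are illustrative.

```python
def direct_transfer_links(deposit_addrs, withdraw_addrs, transfers):
    """Find (deposit_address, withdraw_address) pairs joined by a transfer.

    `transfers` is an iterable of (from_address, to_address) tuples taken
    from ordinary transactions outside the pool (an assumed layout).
    """
    deposit_addrs = set(deposit_addrs)
    withdraw_addrs = set(withdraw_addrs)
    links = set()
    for sender, receiver in transfers:
        if sender in deposit_addrs and receiver in withdraw_addrs:
            links.add((sender, receiver))
        elif sender in withdraw_addrs and receiver in deposit_addrs:
            # Transfer in the opposite direction, stored in the same order.
            links.add((receiver, sender))
    return links
```

The result is a set, so repeated transfers between the same two addresses produce a single link.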

Heuristic 4 - Linking multiple deposit addresses to multiple withdraw addresses

If there are multiple (say 12) deposit transactions coming from a deposit address and later there are 12 withdraw transactions to the same withdraw address, then we can link all these deposit transactions to the withdraw transactions.
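A simplified sketch of this idea is shown below. It links a deposit address to a withdraw address only when both have the same number of transactions in a pool and no other address on either side shares that count. It ignores timing (the “later” condition above) and the `min_count` threshold is an arbitrary assumption.

```python
from collections import Counter


def matching_count_links(deposits, withdrawals, min_count=2):
    """Link addresses with the same, otherwise unused, transaction count.

    `deposits` and `withdrawals` list one address per transaction in a
    single pool (an assumed layout).
    """
    dep_counts = Counter(deposits)
    wd_counts = Counter(withdrawals)
    # How many addresses on each side share a given transaction count.
    dep_count_freq = Counter(dep_counts.values())
    wd_count_freq = Counter(wd_counts.values())
    wd_by_count = {count: addr for addr, count in wd_counts.items()}
    links = []
    for addr, count in dep_counts.items():
        if (count >= min_count
                and dep_count_freq[count] == 1
                and wd_count_freq[count] == 1):
            links.append((addr, wd_by_count[count]))
    return links
```

Requiring the count to be unambiguous on both sides keeps common patterns, such as many users making two or three deposits, from producing false links.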

Heuristic 5 - Careless usage of anonymity mining

Anonymity mining is a clever way to incentivize users to participate in mixing. However, if users carelessly claim their Anonymity Points (AP) or Tornado tokens, then they can reduce their anonymity set. For instance, if a user withdraws their earned AP tokens to a deposit address, then we can approximate the maximum time the user left their funds in the mixing pool. This is because users can only claim AP and TORN tokens for deposits that have already been withdrawn.

Heuristic 6 - Profiling Deposit and Withdraw addresses

All addresses that have interacted with any of the Tornado Cash pools will be collected and analyzed. We will profile them based on their transaction history by inspecting the timestamps of all their transactions and all the services (e.g., Uniswap, Compound, MakerDAO, etc.) they have ever interacted with. This will allow us to make likely matches between deposit and withdraw addresses.
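One simple way to compare such profiles is the Jaccard similarity between the sets of services two addresses have used. The sketch below is only an illustration of that idea; the proposal does not specify a similarity measure, and the function names and data layout are assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_candidates(withdraw_profile, deposit_profiles):
    """Rank deposit addresses by service overlap with a withdraw address.

    `withdraw_profile` is the set of services used by the withdraw
    address; `deposit_profiles` maps deposit address -> set of services.
    Returns (address, score) pairs, best match first.
    """
    scored = [(jaccard(withdraw_profile, services), addr)
              for addr, services in deposit_profiles.items()]
    # Highest score first; ties broken by address for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(addr, score) for score, addr in scored]
```

A real profile would likely add timestamp features (for example, active hours) alongside the service sets, with the scores feeding into the probabilistic links discussed earlier.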

Heuristic 7 - Wallet Fingerprints

Different wallets work in different ways. We have several ideas on how we can distinguish between them. It will allow us to further fragment the anonymity sets of withdraw transactions.

The full list of heuristics is yet to be determined by our team, but we expect to expand on this list. The more accurate and strong our heuristics are the better privacy and anonymity estimates we can give to our users.

Data pipelines

We plan to rely solely on publicly available data sources to get all the necessary data for our analysis. We might want to use Google BigQuery, Infura, or other blockchain explorers. The exact details of how we will get the necessary data from the blockchain are yet to be determined.

The programming languages that will be used to tackle the problem will be Python, Julia and Rust.

User protection

Based on the above vulnerability analyses, an interface will be developed to provide feedback to users by informing them of their security level, raising warnings when unsafe actions are being committed and educating them on safe behaviors.

Timeline

  • Stage 1: Research ~ 8 weeks
    Research into the different tactics used to reduce the entropy of a user’s wallet, including an analysis of every Tornado Cash transaction. One example is the set of heuristics used to detect two accounts linked to the same user.

  • Stage 2: MVP ~ 6 weeks
    Development of a web app that warns Tornado’s users when they are making an unsafe or unwanted reveal in their transactions according to the information gathered in the first stage. A calculation of the user’s wallet entropy will be shown as well.

  • Stage 3: Full App ~ 8 weeks
    Adding of features and fixing bugs detected by the community and the users.
    Development of a UI to provide a more accessible platform with clear warnings.

  • Stage 4: Documentation ~ 4 weeks
    Writing of a paper that collects the research and analysis of every reveal that the application solves alongside its explanation.

Methodology

The project will be developed openly on GitHub with open source programming languages. We will invite users and the broader Tornado community to inspect and extend our tool. Every 4 weeks a report will be submitted to the forum so that the community can inspect and comment on the work being done.

Team

The team is composed of engineers, physicists, and mathematicians specialized in machine learning and data engineering who are constantly investigating and applying cutting-edge technologies to solve the most advanced technical challenges.

The members in charge of the team:

  • István A. Seres, an applied mathematician who will be in charge of defining the heuristics and the research part of the project.
  • Federico Carrone, the tech lead, who will be in charge of a team of computer scientists, computer engineers and data scientists (mathematicians, physicists, industrial engineers) from his team at LambdaClass.

4 Likes

@ethdev, @Justin_Bram or @Rezan could you check the proposal? :slight_smile:

This is a very exciting opportunity to build something for the community! @drghost, a team and I are in the process of writing a complementary proposal to that of @Istvan_Andras_Seres and unbalancedparen (sorry I couldn’t tag you!).

We will submit it tomorrow!

4 Likes

Sorry for the slow replies. It seems that we have 2 great teams interested in the project. I propose to create a shared Telegram group to discuss the terms of the Bounty. Feel free to reach out, my TG handle is @Rezan_vm

Overview

This is a solicitation for the Funded Bounty of Anonymity Tools. In the following proposal, we hope to complement more research-focused proposals with one more focused on the User Application Phase. Our objective is to build an open-source tool with a simple UI to empower any Tornado Cash user to protect their privacy more effectively.

The Tornado Cash User’s Dilemma

Tornado Cash users have multiple addresses and use Tornado Cash to obfuscate this fact. The most important need for this user base is to know whether their addresses are already compromised. In response, our initial MVP will focus on informing users which of their Ethereum addresses are “affiliated” (a non-blockchain analogy would be haveibeenpwned . com). Critically, we believe this is an important next step in building a community where privacy is a first-class citizen, and one where Tornado Cash users are aware of when and what reveals have been committed.

Our Approach

The most efficient approach to reach this outcome is to develop a clustering algorithm to correlate affiliated wallets. It would allow Tornado Cash users to see which addresses are closely associated with theirs, and further infer the different types of reveals they or others have committed. The clustering algorithm would leverage known heuristics. We believe that cluster or entity-level data is necessary to fully audit the anonymity set of Tornado.Cash pools and confidently identify reveals.

A hypothetical example that illustrates the necessity of clustering:

If a user deposits 1424.0 ETH from one account into Tornado.cash, and then a few minutes later withdraws 1424.0 ETH to a single account, then simple synchronous reveal and multi-denomination reveal (MDR) heuristics would likely identify it.

However, if the user had instead deposited 1424.0 ETH from one account, and then withdrawn it to several different accounts which they connected post-Tornado, then an adversary would need to apply the heuristics at the cluster/entity-level.

In short, clustering presents a more generalizable solution than a heuristic set. It is also more flexible and can be extended to be probabilistic (the more Tornado reveals that wallets make, the closer they will be clustered together) and dynamic (if new reveal heuristics surface, they can be easily incorporated to refine clusters). As a stretch goal, ML algorithms can be applied to automate clustering using blockchain and API data. With a powerful clustering algorithm, we can derive privacy scores based on the clustering “density” around any individual address.

Data Sources

We will focus on using on-chain data scraped from blockchain transactions. Tools we will leverage include but are not limited to Infura, Web3, Etherscan, and Google BigQuery [1]. For later phases of the proposal, we may consider off-chain data, such as application APIs like ENS, OpenSea or popular games built on blockchain.

Methodology

We will develop the backend in Python and the front end in React. Any ML components will use PyTorch. Code will be open-sourced on Github post-development. We intend to invite users and the broader Tornado community to inspect and extend our tool. We will provide an update report to the community at the end of each phase OR every 4 weeks until completion (whichever is more frequent).

Product Roadmap

We split the development into two phases, the first of which focuses on a minimally viable product, whereas the second adds additional features for a more complete product.

Phase 1

Front End

A React application with a primary page containing a search bar (think Etherscan).

  • Input - Allow users to input a wallet address.
  • Return - Return the addresses associated with that account along with (normalized) confidence scores. Optionally, the UI may also display summary scores describing the level of privacy or anonymity for an individual user.
  • Additional features (e.g. rate limiting, account creation, batch search, additional statistics) may be implemented upon request from the Tornado cash team.

Back End

Our initial objective is to develop clusters of addresses that we believe are owned by single entities using the Deposit Address Reuse heuristic [2]:

Most centralized exchanges have deposit addresses that are unique to each customer. To credit cryptoassets to the correct account, exchanges often create deposit addresses, which then forward received funds to a main address. As deposit addresses are created per customer, multiple addresses that send funds to the same deposit address are highly likely to be controlled by the same entity. This technique has shown promising results in an academic setting to cluster addresses effectively, although care must be taken with numerous edge cases. The primary challenge of this approach is identifying deposit addresses: naively collecting all addresses that forward tokens to exchanges would be fairly noisy and inaccurate, so a significant part of our project is developing a good algorithm for identifying these deposit addresses. The paper proposes several heuristics to filter candidate addresses (e.g. asserting maximum amount differences, or maximum time differences), which serve as a good starting point. Once deposit addresses are found, we can create a mapping between addresses, since both (1) multiple addresses that send to a single deposit address and (2) an address that sends to multiple deposit addresses are revealing. Given a graph between wallet and deposit addresses, we can cluster by computing weakly connected components using standard graph algorithms.
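The final clustering step can be sketched with a small union-find structure. The sketch assumes the hard part (identifying exchange deposit addresses) is already done and that we are given edges of the form `(wallet_address, exchange_deposit_address)`; the function name and data layout are illustrative.

```python
from collections import defaultdict


def cluster_addresses(edges):
    """Group addresses into weakly connected components.

    `edges` is an iterable of (wallet_address, exchange_deposit_address)
    pairs. Direction is ignored, which is what "weakly connected" means.
    Returns a list of sets, one per cluster.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for wallet, deposit in edges:
        union(wallet, deposit)

    clusters = defaultdict(set)
    for node in parent:
        clusters[find(node)].add(node)
    return list(clusters.values())
```

Each returned cluster contains both the wallets and the exchange deposit addresses that tie them together; the deposit addresses can be filtered out afterwards if only user wallets are of interest.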

Q: Why start with a heuristic that is broader than Tornado Cash use cases?

A: For two reasons: 1.) it provides the largest mapping of clustered addresses, which can form a basis of incremental heuristics and 2.) Tornado Cash users are not just using Tornado Cash - they are likely interacting with much of the Ethereum ecosystem - and their anonymity cannot be determined just by how they interact with Tornado Cash. Although there are many heuristics that we can use to do clustering, we see heuristic development as a complementary but distinct project. We focus on building the infrastructure so that we can quickly and easily incorporate new heuristics into cluster algorithms, and translate that into privacy scores.

Phase 2

Front End

Beyond an anonymity score, we propose to expand on Phase I to add transaction-by-transaction granularity. Specifically, we could show users the impact of individual transactions (made in the past) on their anonymity scores, as well as annotate relevant reveals for each transaction. This would add transparency and explainability to an otherwise “black-box” score. We also propose to add a plot of a user’s anonymity score over time, flagging anomalies (i.e., transactions that resulted in large changes in anonymity score). Users can view this plot to track the impact of recent transactions on their blockchain privacy.
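As a rough sketch of the anomaly flagging, the function below marks transactions after which the anonymity score moved by more than a fixed amount. The threshold value, the function name and the `(tx_hash, score)` layout are assumptions; a production version might use a statistical test instead of a fixed cut-off.

```python
def flag_score_anomalies(score_history, threshold=0.2):
    """Flag transactions that caused a large change in anonymity score.

    `score_history` is a chronologically ordered list of
    (tx_hash, score_after_tx) tuples. Returns (tx_hash, delta) pairs
    for every step whose absolute change is at least `threshold`.
    """
    flagged = []
    for (_, prev_score), (tx_hash, score) in zip(score_history,
                                                 score_history[1:]):
        delta = score - prev_score
        if abs(delta) >= threshold:
            flagged.append((tx_hash, delta))
    return flagged
```

The flagged transactions are the natural places to attach the per-transaction reveal annotations described above.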

Back End

After developing an initial basis of clustering, there are several avenues towards a more sophisticated algorithm. We consider four different directions, the last of which is optional:

  1. Building in Tornado Cash-specific reveals

    Once the infrastructure is built (phase I), it should be easy to incorporate new heuristics. In particular, we would build a subset of the Tornado Cash specific reveals (from the bounty description), such as the synchronous transactions, rapid mix, multi-denomination, TORN mining, and wallet fingerprinting reveals. Additionally, if other proposals or the community at large finds new interesting reveals, these can be incorporated as well.

  2. Incorporate Ethereum application data

    Phase I relied solely on blockchain transaction data to cluster addresses. We propose to add rich data from applications (e.g. ENS, OpenSea, blockchain games, NFTs) to further strengthen correlations between addresses, as well as from other blockchains where addresses are reused. We may also attempt wallet fingerprinting based on gas price or other idiosyncrasies. As proof of concept, we plan to limit to 2 or 3 integrations but we could build out the Ethereum equivalent of whatsmyname.app.

  3. Refining the anonymity score

    Ideally, the privacy score should be a proper probabilistic value e.g. a score of 0.87 would mean my address has a privacy rating higher than 87% of all addresses. Doing this requires “calibration” [3], a technique popular in ML research for re-scoring neural network predictions. We will investigate applying this and similar methods to our privacy scores.

  4. Advanced algorithms for clustering (Optional)

    Although composing heuristics together manually can build a strong clustering algorithm, the holy grail would be to automatically learn these heuristics from blockchain data, leveraging recent advances in deep learning. In particular, there is an opportunity to apply modern representation learning algorithms for graph structures. Here, a “representation” is a continuous (high-dimensional) vector. If time and energy allow, we will dedicate cycles to investigate this. Possible approaches include:

    1. Self-supervised (“contrastive”) objectives [4] where transactions are used as augmentations for nodes. Representations are invariant to these augmentations, essentially factoring who an address interacts with into its embedding.
    2. Leverage Node2Vec [5], which supposes dynamic network neighborhoods (which is certainly true in the blockchain setting). We can reframe transactions between nodes as a sort of “random walk” exploration that Node2Vec requires.
    3. Learn “useful” representations by predicting future txs. We can learn an embedding by collecting a historical dataset. Then, we can train a neural network to predict the T-th tx from the txs 1 to T-1, and use a hidden activation as the embedding.

    Given a vector representation, we can apply standard clustering algorithms e.g. kNN to cluster. Since these representations are “semantic”, clustering could be useful.
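To illustrate the percentile reading of the anonymity score from direction 3 above (a score of 0.87 meaning “more private than 87% of addresses”), here is a minimal sketch. It only converts a raw score into a percentile rank against a reference population; the calibration technique of [3] is a separate step and is not shown. The function name and inputs are assumptions.

```python
from bisect import bisect_left


def percentile_score(raw_score, all_raw_scores):
    """Fraction of reference addresses with a strictly lower raw score.

    `all_raw_scores` is the population of raw privacy scores to compare
    against (higher means more private).
    """
    ordered = sorted(all_raw_scores)
    if not ordered:
        raise ValueError("need at least one reference score")
    # bisect_left counts the entries strictly smaller than raw_score.
    return bisect_left(ordered, raw_score) / len(ordered)
```

In practice the sorted reference population would be computed once and cached rather than re-sorted for every query.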

Timeline (10 weeks total)

  • Phase 1: MVP - 3 weeks

    Develop a web app and initial clustering algorithm that allows Tornado cash users to input addresses and see what addresses they are affiliated with.

  • Phase 2: Full App - 5 weeks

    Fix bugs detected by the community and the users. Add transaction-by-transaction granularity, refine the anonymity scores, and test more advanced clustering algorithms. Expand UI to provide a more accessible platform with clear warnings.

  • Phase 3: Documentation - 2 weeks

    Write a paper detailing our research and analysis of reveals, leveraging Nick’s experience tracing multiple major hacks. Open-source code.

Team

The team is composed of a number of Stanford students and graduates with a complementary skill set for the proposed application, including:

  • Nick Bax, a Stanford PhD graduate who has traced funds related to several hacks and recently published on tracing the WannaCry 2.0 malware Monero transactions. Nick will lead the identification of heuristics.
  • Mike Wu, a Stanford PhD in Machine Learning, who will drive the clustering and ML analysis. Mike’s research is on unsupervised learning algorithms (e.g., clustering) and has been featured in the New York Times. He has published 25 papers at top AI conferences (e.g., NeurIPS, ICLR, AISTATS, etc.). Mike was previously a software engineer in Facebook’s applied machine learning group.
  • Will McTighe, a Stanford MBA, will PM the effort. Will was previously a growth-stage tech investor at Vitruvian Partners, and before that worked in tech investment banking at Goldman Sachs. Will graduated from University of Warwick with a BSc in Philosophy, Politics & Economics (Math Econ and Philosophy focus).
  • Kaili Wang, a 4th-year computer science major at Stanford, will lead front-end development. Kaili has most recently done full-stack development at Robinhood, where she built tools for the risk & fraud team to segment out crypto-related fraud. She has also worked at Amazon and PlusAI (both primarily front-end work).
  • The team is supported by Convex Labs, a crypto R&D company comprised of former members of the Stanford Blockchain Club with experience in blockchain analysis, building crypto-native protocols, and NFT projects.

Acknowledgements

We would like to thank F. Victor for insightful discussion.

P.S.

Thank you for your consideration - we are very excited about this project! We were restricted in the number of links we could post, so apologies for the referencing!

References

[1] Day, A., Medvedev, E. Ethereum in BigQuery: a Public Dataset for smart contract analytics. (2018)

[2] Victor, F. Address clustering heuristics for Ethereum. (2020)

[3] Guo, C. et al. On Calibration of Modern Neural Networks. (2017)

[4] Wu, L. et al. Self-supervised Learning on Graphs: Contrastive, Generative, or Predictive. (2021)

[5] Grover, A., Leskovec, J. node2vec: Scalable Feature Learning for Networks. (2016)

8 Likes

We had a call between the two teams and we are going to prepare a short joint proposal! We should have it ready in the next few days.

2 Likes

If anybody decides to use BigQuery for this work, I’ve written up some lessons learned and have some code that should save you some time:

6 Likes

Joint Proposal

The proposal below should be considered in combination with the ideas put forward in the two separate proposals posted by myself and @unbalancedparen. We welcome feedback from the community.

Our final deliverables will be twofold:

  1. An open-source web-based (.onion domain) tool with a simple UI to empower any Tornado Cash user to protect their privacy more effectively
  2. A report on our technical approach and the Ethereum & Tcash heuristics we assess, as well as a measure of the anonymity set for Tcash pools

Bounty Split

  • 50% to Team Stanford and 50% to Team Lambda, subject to all parties (Team Stanford, Team Lambda, TCash Community) being happy with the contributions of each party to the final output

Division of Work

We have split the work up into discrete pieces with a champion team, denoted in brackets below. The other team will be responsible for code-checking and giving feedback on features during the project:

  1. Front-end Website (Stanford)
  2. Back-end
    • Generalizable clustering algorithm (Stanford)
    • Coding and testing existing heuristics (both teams)
      1. Ethereum heuristics (e.g. deposit reuse) (Stanford)
      2. Tcash heuristics (e.g. synchronous, rapid mix, multi-denominational and TORN mining reveals) (Istvan / Lambda)
    • Coding and testing new Tcash heuristics (e.g. wallet fingerprinting) (Istvan / Lambda)
  3. Final Report (both teams)
    • We will jointly create a paper on our technical approach and the heuristics

Timeline & Deliverables

See below for a short-form timeline with deliverables. In the appendix, we include more detail on each phase. We will seek input and feedback from the Tcash team and community after each phase.

  1. Phase 1 - 3 weeks - 12th Nov
    • Deliverables: Website v1
  2. Phase 2 - 5 weeks - 17th Dec
    • Deliverables: Website v2 & Initial Report
  3. Phase 3 - 3 weeks - 7th Jan
    • Deliverables: Full Website & Expanded Report
  4. Phase 4 - 3 weeks - 28th Jan
    • Deliverables: Final Report

Communication

  • Two short team catch ups each week (e.g. Tuesday & Friday) to provide updates
  • Potentially mix teams to further increase collaboration and knowledge dissemination

Coding Norms

  • Python, Elixir and Julia for back-end, JavaScript for front-end
  • Development through pull requests, potentially with cross-team code reviews

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Appendix

Phase 1 - Website v1.0

Stanford

  • Front-end: initial web interface; and connection to backend
    • Input wallet addresses and return affiliated wallets with a clustering (“privacy”) score
  • Back-end: build the initial clustering algorithm based on the deposit reuse heuristic
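
For readers unfamiliar with the deposit reuse heuristic, the following is a minimal sketch of the general idea described in [2]: addresses that send funds to the same exchange deposit address are likely controlled by one entity. It assumes deposit addresses have already been identified, and the function name and input format are hypothetical; it is not the team's actual algorithm.

```python
from collections import defaultdict


def cluster_by_deposit_reuse(transfers):
    """transfers: iterable of (sender, deposit_address) pairs.
    Senders that funded the same deposit address end up in one cluster."""
    parent = {}

    def find(x):
        # Union-find lookup; registers unseen addresses on first use.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_sender = {}  # deposit address -> first sender seen for it
    for sender, deposit in transfers:
        find(sender)
        if deposit in first_sender:
            parent[find(sender)] = find(first_sender[deposit])
        else:
            first_sender[deposit] = sender

    clusters = defaultdict(set)
    for addr in list(parent):
        clusters[find(addr)].add(addr)
    return sorted(sorted(c) for c in clusters.values())


# Toy data: 0xA and 0xB share dep1, 0xB and 0xD share dep3.
transfers = [
    ("0xA", "dep1"),
    ("0xB", "dep1"),
    ("0xC", "dep2"),
    ("0xB", "dep3"),
    ("0xD", "dep3"),
]
print(cluster_by_deposit_reuse(transfers))
# [['0xA', '0xB', '0xD'], ['0xC']]
```

Because links are transitive, 0xA and 0xD are grouped even though they never used the same deposit address directly.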

Lambda

  • Back-end: identify, codify and test existing Tcash heuristics based on Tcash transaction data; develop thinking around other Ethereum heuristics (e.g. unique gas prices & temporal wallet usage)
  • Find and test new Tcash-specific heuristics
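
To make the unique gas price idea concrete, here is a small sketch under strong assumptions: a deposit and a later withdrawal are flagged as potentially linked when they share a gas price that occurs exactly once among deposits and once among withdrawals. The tuple layout and function name are hypothetical, and the sketch assumes both transactions were submitted from the user's own wallet rather than through a relayer.

```python
from collections import Counter


def link_by_unique_gas_price(deposits, withdrawals):
    """Each tx is a (tx_hash, gas_price_wei, block_number) tuple.
    Returns (deposit_hash, withdrawal_hash) pairs that share a gas
    price unique on both sides, with the deposit mined first."""
    dep_counts = Counter(g for _, g, _ in deposits)
    wd_counts = Counter(g for _, g, _ in withdrawals)
    dep_by_gas = {g: (h, b) for h, g, b in deposits if dep_counts[g] == 1}

    links = []
    for h, g, b in withdrawals:
        if wd_counts[g] == 1 and g in dep_by_gas:
            dep_hash, dep_block = dep_by_gas[g]
            if dep_block < b:
                links.append((dep_hash, h))
    return links


# Toy data: only the unusual price 12_345_678_901 wei is unique on both sides.
deposits = [
    ("d1", 12_345_678_901, 100),
    ("d2", 20_000_000_000, 101),
    ("d3", 20_000_000_000, 102),
]
withdrawals = [
    ("w1", 12_345_678_901, 150),
    ("w2", 20_000_000_000, 160),
]
print(link_by_unique_gas_price(deposits, withdrawals))
# [('d1', 'w1')]
```

A match like this is only probabilistic evidence and would need to be combined with other reveals before affecting an anonymity score.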

Phase 2 - Website v2.0 & Initial Report

Stanford

  • Front-end: transaction-by-transaction granularity; annotated relevant reveals for each transaction; and a plot of anonymity score over time
  • Back-end: integrate and refine Tcash-specific reveals; explore more advanced algorithms for clustering and anonymity scoring

Lambda

  • Back-end: test existing Tcash reveals; find and test new reveals; begin to integrate Tcash reveals into front-end
  • Report: write up Tcash-specific reveals with supporting data

Phase 3 - Final Website & Expanded Report

Both teams

  • Integrate work (finish adding Tcash reveals into front-end) and develop joint report with research and application findings

Phase 4 - Final Report

Both teams

  • Finalize joint report
5 Likes