Commit 182b29e

Initial Elliptic++ Release
0 parents  commit 182b29e

32 files changed, with 39,079 additions and 0 deletions

Diff for: .gitattributes

+1
@@ -0,0 +1 @@
1+
*.csv filter=lfs diff=lfs merge=lfs -text

Diff for: Actors Dataset/AddrAddr_edgelist.csv

+3
@@ -0,0 +1,3 @@
1+
version https://git-lfs.github.com/spec/v1
2+
oid sha256:ffba894458e262a691e5e4d006f5dc1d0e069fabfe828f443fd157bf7f8393f2
3+
size 200631481
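
The three added lines above are a Git LFS pointer rather than the CSV data itself: `.gitattributes` routes every `*.csv` through LFS, so the repository stores only a spec version, a SHA-256 object id, and a byte size for each file. A minimal sketch of reading such a pointer follows; the function name `parse_lfs_pointer` and the returned keys are illustrative choices, not part of Git LFS or of this repository.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the space-separated key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line is "<key> <value>"; split on the first space only.
        key, _, value = line.partition(" ")
        fields[key] = value

    # The oid value has the form "<hash algorithm>:<hex digest>".
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


# Pointer content copied from the diff for Actors Dataset/AddrAddr_edgelist.csv.
pointer_text = """\
version https://git-lfs.github.com/spec/v1
oid sha256:ffba894458e262a691e5e4d006f5dc1d0e069fabfe828f443fd157bf7f8393f2
size 200631481
"""

pointer = parse_lfs_pointer(pointer_text)
print(pointer["hash_algorithm"], pointer["size_bytes"])
```

If a checked-out CSV in a clone is only a few lines long and starts with `version https://git-lfs.github.com/spec/v1`, the real data has not been fetched yet and the file is still a pointer like this one.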

Diff for: Actors Dataset/AddrTx_edgelist.csv

+3
@@ -0,0 +1,3 @@
1+
version https://git-lfs.github.com/spec/v1
2+
oid sha256:f5f903f752387f66a1bccaeff54e293b2e8470fcddf5eb56b88aa06fd23a8f3b
3+
size 21248388

Diff for: Actors Dataset/Elliptic++_Actors_ActorInteraction_Graph_Viz.ipynb

+3,237
Large diffs are not rendered by default.

Diff for: Actors Dataset/Elliptic++_Actors_AddrTx_Graph_Viz.ipynb

+3,575
Large diffs are not rendered by default.

Diff for: Actors Dataset/Elliptic++_Actors_Classification.ipynb

+5,594
Large diffs are not rendered by default.

Diff for: Actors Dataset/Elliptic++_Actors_Dataset_Statistics.ipynb

+6,581
Large diffs are not rendered by default.

Diff for: Actors Dataset/Elliptic++_Actors_Feature_Analysis.ipynb

+4,685
Large diffs are not rendered by default.

Diff for: Actors Dataset/README.md

+81
@@ -0,0 +1,81 @@
1+
# Elliptic++ Actors (Wallet Addresses) Dataset: A Graph Network of Bitcoin Blockchain Wallet Addresses
2+
3+
The Elliptic++ Actors dataset consists of 822k wallet addresses and supports detecting illicit addresses (actors) in the Bitcoin network using graph data.
4+
5+
If you have any questions or create something with this dataset, please let us know by email: [[email protected]](mailto:[email protected]).
6+
7+
## Dataset Summary
8+
9+
| | |
10+
|---|---|
11+
| # Wallet addresses | 822,942 |
12+
| # Nodes (temporal interactions) | 1,268,260 |
13+
| # Edges (addr-addr) | 2,868,964 |
14+
| # Edges (addr-tx-addr) | 1,314,241 |
15+
| # Time steps | 49 |
16+
| # Illicit (class-1) | 14,266 |
17+
| # Licit (class-2) | 251,088 |
18+
| # Unknown (class-3) | 557,588 |
19+
| # Features | 56 |
20+
21+
## Dataset Tutorials
22+
23+
We are sharing tutorial notebooks for users and researchers to explore, study, and learn from. The tutorial notebooks cover dataset statistics, graph visualization, model training and classification, and feature refinement.
24+
25+
[`Actors dataset statistics`](Elliptic++_Actors_Dataset_Statistics.ipynb) : overall actors data statistics.
26+
<p align="center">
27+
<img src="../images/addrstats.jpg" width="650" alt="addrstats" /><br>
28+
</p>
29+
30+
[`Actors graph visualization (Actor Interaction)`](Elliptic++_Actors_ActorInteraction_Graph_Viz.ipynb) : visualizations of the Actor Interaction graph (addr-addr graph).
31+
<p align="center">
32+
<img src="../images/actorvizaddr.jpg" width="800" alt="actorvizaddr" /><br>
33+
</p>
34+
35+
[`Actors graph visualization (Address-Transaction)`](Elliptic++_Actors_AddrTx_Graph_Viz.ipynb) : visualizations of the Address-Transaction graph (addr-tx-addr graph).
36+
<p align="center">
37+
<img src="../images/actorvizaddrtx.jpg" width="550" alt="actorvizaddrtx" /><br>
38+
</p>
39+
40+
[`Actors classification`](Elliptic++_Actors_Classification.ipynb) : model training and classification on the actors data.
41+
<p align="center">
42+
<img src="../images/classification.jpg" width="550" alt="actorclassification" /><br>
43+
</p>
44+
45+
[`Actors feature analysis`](Elliptic++_Actors_Feature_Analysis.ipynb) : feature importance analysis of the actors data.
46+
<p align="center">
47+
<img src="../images/actorsfeatureanalysis.jpg" width="680" alt="actorsfeatureanalysis" /><br>
48+
</p>
49+
50+
51+
## Top-Level Directory Organization
52+
53+
.
54+
├── wallets_features.csv # Feature data for all actors
56+
├── wallets_classes.csv # Class data for all actors
57+
├── AddrAddr_edgelist.csv # Address-Address graph edgelist
58+
├── AddrTx_edgelist.csv # Address-Transaction graph edgelist
59+
├── TxAddr_edgelist.csv # Transaction-Address graph edgelist
60+
├── Elliptic++_Actors_Dataset_Statistics.ipynb # Tutorial notebook: dataset statistics
61+
├── Elliptic++_Actors_ActorInteraction_Graph_Viz.ipynb # Tutorial notebook: address-address graph visualization
62+
├── Elliptic++_Actors_AddrTx_Graph_Viz.ipynb # Tutorial notebook: address-transaction-address graph visualization
63+
├── Elliptic++_Actors_Classification.ipynb # Tutorial notebook: model training and classification
64+
├── Elliptic++_Actors_Feature_Analysis.ipynb # Tutorial notebook: feature importance analysis
65+
└── README.md
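
As a rough illustration of how the feature and class files listed above fit together, the sketch below streams feature rows and keeps only those whose address has a known label. It uses only the standard library because the real `wallets_features.csv` is about 600 MB. The column names (`address`, `class`, `Time step`, `total_txs`) and the integer encoding of classes are assumptions that should be checked against the CSV headers; the helper names are hypothetical.

```python
import csv
import io


def load_labels(lines, key="address", label="class"):
    """Map each address to its integer class (assumed: 1 illicit, 2 licit, 3 unknown)."""
    return {row[key]: int(row[label]) for row in csv.DictReader(lines)}


def labeled_rows(feature_lines, labels, key="address", keep=(1, 2)):
    """Stream feature rows whose address has a class listed in `keep`."""
    for row in csv.DictReader(feature_lines):
        cls = labels.get(row[key])
        if cls in keep:
            yield row, cls


# Toy stand-ins for wallets_classes.csv and wallets_features.csv.
# The column names used here are assumptions, not confirmed by the README.
classes_csv = io.StringIO("address,class\nA,1\nB,2\nC,3\n")
features_csv = io.StringIO(
    "address,Time step,total_txs\nA,1,4\nA,2,1\nB,1,7\nC,1,2\n"
)

labels = load_labels(classes_csv)
rows = list(labeled_rows(features_csv, labels))
print(len(rows), "labeled feature rows")
```

With the real files, the same functions can be given `open("wallets_classes.csv", newline="")` and `open("wallets_features.csv", newline="")`. The summary table counts more temporal nodes than wallet addresses, so an address may appear in several feature rows; joining on the address keeps all of them.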
66+
67+
68+
## Citation
69+
70+
If you use our dataset in your work, please cite [our paper]().
71+
72+
> Youssef Elmougy and Ling Liu. 2023. Demystifying Fraudulent Transactions and Illicit Nodes in the Bitcoin Network for Financial Forensics.
73+
74+
## Acknowledgement
75+
76+
Released by: [Youssef Elmougy](https://www.yelmougy.com), [Ling Liu](https://www.cc.gatech.edu/home/lingliu/)
77+
78+
School of Computer Science, Georgia Institute of Technology
79+
80+
81+
If you have any questions or create something with this dataset, please let us know by email: [[email protected]](mailto:[email protected]).

Diff for: Actors Dataset/TxAddr_edgelist.csv

+3
@@ -0,0 +1,3 @@
1+
version https://git-lfs.github.com/spec/v1
2+
oid sha256:9f5afbdde7bc3d91fb7a4655be55799d6504cd0063ae55a0753a5a41189932b8
3+
size 36702878

Diff for: Actors Dataset/wallets_classes.csv

+3
@@ -0,0 +1,3 @@
1+
version https://git-lfs.github.com/spec/v1
2+
oid sha256:4e5132c99f941666bf1fefd4100a1428d339c9252ec6987909e1adf8eac902f9
3+
size 30421134

Diff for: Actors Dataset/wallets_features.csv

+3
@@ -0,0 +1,3 @@
1+
version https://git-lfs.github.com/spec/v1
2+
oid sha256:317daca2810c355ddfdb8c0dab34cf11d1aa90567fe975090c7e5a901386eb77
3+
size 606463522

Diff for: Actors Dataset/wallets_features_classes_combined.csv

+3
@@ -0,0 +1,3 @@
1+
version https://git-lfs.github.com/spec/v1
2+
oid sha256:99bf27f7b76d6578ad59e0a61ec225ecd656fef6ae6958f29435ab286be2cc7d
3+
size 609000048

Diff for: README.md

+135
@@ -0,0 +1,135 @@
1+
# Elliptic++ Dataset: A Graph Network of Bitcoin Blockchain Transactions and Wallet Addresses
2+
3+
The Elliptic++ dataset consists of 203k Bitcoin transactions and 822k wallet addresses. It supports detecting both fraudulent transactions and illicit addresses (actors) in the Bitcoin network using graph data.
4+
5+
If you have any questions or create something with this dataset, please let us know by email: [[email protected]](mailto:[email protected]).
6+
7+
## Dataset Summary
8+
9+
The Elliptic++ dataset contains a transactions dataset and an actors (wallet addresses) dataset.
10+
11+
Elliptic++ Transactions Dataset:
12+
13+
| | |
14+
|---|---|
15+
| # Nodes (transactions) | 203,769 |
16+
| # Edges (money flow) | 234,355 |
17+
| # Time steps | 49 |
18+
| # Illicit (class-1) | 4,545 |
19+
| # Licit (class-2) | 42,019 |
20+
| # Unknown (class-3) | 157,205 |
21+
| # Features | 183 |
22+
23+
Elliptic++ Actors (Wallet Addresses) Dataset:
24+
25+
| | |
26+
|---|---|
27+
| # Wallet addresses | 822,942 |
28+
| # Nodes (temporal interactions) | 1,268,260 |
29+
| # Edges (addr-addr) | 2,868,964 |
30+
| # Edges (addr-tx-addr) | 1,314,241 |
31+
| # Time steps | 49 |
32+
| # Illicit (class-1) | 14,266 |
33+
| # Licit (class-2) | 251,088 |
34+
| # Unknown (class-3) | 557,588 |
35+
| # Features | 56 |
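
The two summary tables imply a strong class imbalance: the illicit class is roughly 2% of each dataset, and most nodes are unlabeled. The short sketch below recomputes the shares from the counts in the tables; the dictionary and function names are illustrative only.

```python
# Class counts copied from the two summary tables above.
TRANSACTIONS = {"illicit": 4_545, "licit": 42_019, "unknown": 157_205}
ACTORS = {"illicit": 14_266, "licit": 251_088, "unknown": 557_588}


def class_shares(counts):
    """Fraction of all nodes that falls in each class."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def illicit_share_of_labeled(counts):
    """Fraction of labeled (illicit or licit) nodes that are illicit."""
    labeled = counts["illicit"] + counts["licit"]
    return counts["illicit"] / labeled


for name, counts in (("transactions", TRANSACTIONS), ("actors", ACTORS)):
    shares = class_shares(counts)
    print(
        f"{name}: total={sum(counts.values()):,} "
        f"illicit={shares['illicit']:.2%} "
        f"unknown={shares['unknown']:.2%} "
        f"illicit among labeled={illicit_share_of_labeled(counts):.2%}"
    )
```

The per-class counts sum to the node totals in the tables (203,769 transactions and 822,942 wallet addresses), which is a quick consistency check on the summary. An imbalance of this size usually means that accuracy alone is a weak metric for the classification notebooks.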
36+
37+
## Dataset Tutorials
38+
39+
We are sharing tutorial notebooks for users and researchers to explore, study, and learn from. The tutorial notebooks are available for both datasets and cover dataset statistics, graph visualization, model training and classification, case analysis, and feature refinement.
40+
41+
[`Transactions dataset statistics`](Transactions%20Dataset/Elliptic++_Transactions_Dataset_Statistics.ipynb) : overall transactions data statistics.
42+
<p align="center">
43+
<img src="images/txsstats.jpg" width="680" alt="txsstats" /><br>
44+
</p>
45+
46+
[`Actors dataset statistics`](Actors%20Dataset/Elliptic++_Actors_Dataset_Statistics.ipynb) : overall actors data statistics.
47+
<p align="center">
48+
<img src="images/addrstats.jpg" width="650" alt="addrstats" /><br>
49+
</p>
50+
51+
[`Transactions graph visualization`](Transactions%20Dataset/Elliptic++_Transactions_Graph_Visualization.ipynb) : visualizations of the Money Flow Transaction graph (tx-tx graph).
52+
<p align="center">
53+
<img src="images/txsviz.jpg" width="800" alt="txsviz" /><br>
54+
</p>
55+
56+
[`Actors graph visualization (Actor Interaction)`](Actors%20Dataset/Elliptic++_Actors_ActorInteraction_Graph_Viz.ipynb) : visualizations of the Actor Interaction graph (addr-addr graph).
57+
<p align="center">
58+
<img src="images/actorvizaddr.jpg" width="800" alt="actorvizaddr" /><br>
59+
</p>
60+
61+
[`Actors graph visualization (Address-Transaction)`](Actors%20Dataset/Elliptic++_Actors_AddrTx_Graph_Viz.ipynb) : visualizations of the Address-Transaction graph (addr-tx-addr graph).
62+
<p align="center">
63+
<img src="images/actorvizaddrtx.jpg" width="550" alt="actorvizaddrtx" /><br>
64+
</p>
65+
66+
[`Transactions classification`](Transactions%20Dataset/Elliptic++_Transactions_Classification.ipynb) : model training and classification on the transactions data.
67+
<p align="center">
68+
<img src="images/classification.jpg" width="550" alt="txsclassification" /><br>
69+
</p>
70+
71+
[`Actors classification`](Actors%20Dataset/Elliptic++_Actors_Classification.ipynb) : model training and classification on the actors data.
72+
<p align="center">
73+
<img src="images/classification.jpg" width="550" alt="actorclassification" /><br>
74+
</p>
75+
76+
77+
[`Transactions case analysis`](Transactions%20Dataset/Elliptic++_Transactions_Case_Analysis.ipynb) : unique case (EASY, HARD, AVERAGE) analysis using the transactions data.
78+
<p align="center">
79+
<img src="images/txscaseanalysis.jpg" width="680" alt="txscaseanalysis" /><br>
80+
</p>
81+
82+
83+
[`Transactions feature analysis`](Transactions%20Dataset/Elliptic++_Transactions_Feature_Analysis.ipynb) : feature importance analysis of the transactions data.
84+
<p align="center">
85+
<img src="images/txsfeatureanalysis.jpg" width="680" alt="txsfeatureanalysis" /><br>
86+
</p>
87+
88+
[`Actors feature analysis`](Actors%20Dataset/Elliptic++_Actors_Feature_Analysis.ipynb) : feature importance analysis of the actors data.
89+
<p align="center">
90+
<img src="images/actorsfeatureanalysis.jpg" width="680" alt="actorsfeatureanalysis" /><br>
91+
</p>
92+
93+
94+
## Top-Level Directory Organization
95+
96+
The folder structure of this dataset repository is as follows:
97+
98+
.
99+
├── Transactions Dataset # Contains csv files and tutorial notebooks for the Elliptic++ Transactions Dataset
100+
│ ├── txs_features.csv # Feature data for all transactions
101+
│ ├── txs_classes.csv # Class data for all transactions
102+
│ ├── txs_edgelist.csv # Transaction-Transaction graph edgelist
103+
│ ├── Elliptic++_Transactions_Dataset_Statistics.ipynb # Tutorial notebook: dataset statistics
104+
│ ├── Elliptic++_Transactions_Graph_Visualization.ipynb # Tutorial notebook: transaction-transaction graph visualization
105+
│ ├── Elliptic++_Transactions_Classification.ipynb # Tutorial notebook: model training and classification
106+
│ ├── Elliptic++_Transactions_Case_Analysis.ipynb # Tutorial notebook: unique case (EASY, HARD, AVERAGE) analysis
107+
│ └── Elliptic++_Transactions_Feature_Analysis.ipynb # Tutorial notebook: feature importance analysis
108+
├── Actors Dataset # Contains csv files and tutorial notebooks for the Elliptic++ Actors Dataset
109+
│ ├── wallets_features.csv # Feature data for all actors
110+
│ ├── wallets_classes.csv # Class data for all actors
111+
│ ├── AddrAddr_edgelist.csv # Address-Address graph edgelist
112+
│ ├── AddrTx_edgelist.csv # Address-Transaction graph edgelist
113+
│ ├── TxAddr_edgelist.csv # Transaction-Address graph edgelist
114+
│ ├── Elliptic++_Actors_Dataset_Statistics.ipynb # Tutorial notebook: dataset statistics
115+
│ ├── Elliptic++_Actors_ActorInteraction_Graph_Viz.ipynb # Tutorial notebook: address-address graph visualization
116+
│ ├── Elliptic++_Actors_AddrTx_Graph_Viz.ipynb # Tutorial notebook: address-transaction-address graph visualization
117+
│ ├── Elliptic++_Actors_Classification.ipynb # Tutorial notebook: model training and classification
118+
│ └── Elliptic++_Actors_Feature_Analysis.ipynb # Tutorial notebook: feature importance analysis
119+
└── README.md
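
The tree above lists two bipartite edgelists for the actors data, `AddrTx_edgelist.csv` (address to transaction) and `TxAddr_edgelist.csv` (transaction to address). One plausible way to read them together as address-transaction-address paths is to join the two on the transaction id, as sketched below. The column names `input_address`, `txId`, and `output_address` are assumptions to verify against the CSV headers, and the README does not state that the dataset's own graphs were built this way.

```python
import csv
import io
from collections import defaultdict


def compose_addr_tx_addr(addr_tx_lines, tx_addr_lines,
                         src="input_address", tx="txId", dst="output_address"):
    """Yield (input address, transaction, output address) paths.

    Joins address->transaction edges with transaction->address edges on the
    transaction id. Column names are assumptions, passed in as parameters.
    """
    outputs_by_tx = defaultdict(list)
    for row in csv.DictReader(tx_addr_lines):
        outputs_by_tx[row[tx]].append(row[dst])

    for row in csv.DictReader(addr_tx_lines):
        # .get avoids creating empty entries for transactions with no outputs.
        for out in outputs_by_tx.get(row[tx], ()):
            yield row[src], row[tx], out


# Toy stand-ins for AddrTx_edgelist.csv and TxAddr_edgelist.csv.
addr_tx_csv = io.StringIO("input_address,txId\nA,t1\nB,t1\nC,t2\n")
tx_addr_csv = io.StringIO("txId,output_address\nt1,X\nt1,Y\nt3,Z\n")

paths = list(compose_addr_tx_addr(addr_tx_csv, tx_addr_csv))
for path in paths:
    print(" -> ".join(path))
```

The transaction-to-address side is held in memory while the address-to-transaction side is streamed, so the smaller of the two files is the better candidate for the in-memory index.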
120+
121+
122+
## Citation
123+
124+
If you use our dataset in your work, please cite [our paper]().
125+
126+
> Youssef Elmougy and Ling Liu. 2023. Demystifying Fraudulent Transactions and Illicit Nodes in the Bitcoin Network for Financial Forensics.
127+
128+
## Acknowledgement
129+
130+
Released by: [Youssef Elmougy](https://www.yelmougy.com), [Ling Liu](https://www.cc.gatech.edu/home/lingliu/)
131+
132+
School of Computer Science, Georgia Institute of Technology
133+
134+
135+
If you have any questions or create something with this dataset, please let us know by email: [[email protected]](mailto:[email protected]).
