This repository has been archived by the owner on Jan 8, 2024. It is now read-only.

GraphSense Transformation Output Tables

Rainer Stuetz edited this page Mar 19, 2019 · 11 revisions

The GraphSense transformation pipeline reads raw block data, which is ingested into Cassandra by the graphsense-blocksci component, and computes de-normalized views, which are again stored in Cassandra.

This page documents all generated output tables and their fields.

summary_statistics

Provides summary statistics for a computed currency-specific dataset.

  • no_blocks: number of processed blocks
  • no_address_relations: number of address relations
  • no_addresses: total number of distinct addresses
  • no_clusters: number of computed clusters (containing more than one address)
  • no_transactions: number of transactions
  • timestamp: timestamp of the most recent block considered in dataset

address

Provides summary statistics for a given cryptocurrency address.

  • address_prefix: first five characters of the address; used for internal dataset partitioning and lookup only (e.g., t1XBa)
  • address: cryptocurrency address (e.g., t1XBa17NfzHrCN8Kn6NtnVdXkGxpjoyZKPr)
  • first_tx: the first transaction the address has been involved in (e.g., "{height: 481827, tx_hash: 0xbae701eb8e4a552fff840ece1d84ffbcdfc6be650fcb351c921344c0041da4cd, timestamp: 1550216321}")
  • last_tx: the last transaction the address has been involved in (e.g., "{height: 481827, tx_hash: 0xbae701eb8e4a552fff840ece1d84ffbcdfc6be650fcb351c921344c0041da4cd, timestamp: 1550216321}")
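  • no_incoming_txs: the number of transactions using this address as output, i.e., transactions in which the address receives funds (e.g., 1)
  • no_outgoing_txs: the number of transactions using this address as input, i.e., transactions in which the address spends funds (e.g., 0)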
  • in_degree: the number of incoming address graph edges
  • out_degree: the number of outgoing address graph edges
  • total_received: total amount of currency units received (e.g., "{satoshi: 73450000, eur: 2341.41, usd: 2644.98}")
  • total_spent: total amount of currency units spent (e.g., "{satoshi: 0, eur: 0, usd: 0}")

Note: GraphSense stores amounts in cryptocurrency sub-units to maintain precision in computations. The field name "satoshi" is used for all currencies for legacy reasons.
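The prefix and value fields above can be illustrated with a minimal Python sketch. It is not taken from the GraphSense code base: the names `Value`, `address_prefix`, and `to_coins` are hypothetical, and the factor of 10^8 sub-units per coin is an assumption that holds for Bitcoin and Zcash but may differ for other currencies.

```python
from dataclasses import dataclass

PREFIX_LENGTH = 5                # "first five characters of the address"
SUBUNITS_PER_COIN = 100_000_000  # assumption: 10^8 sub-units per coin


@dataclass(frozen=True)
class Value:
    """Mirrors the {satoshi, eur, usd} structure shown in the examples."""
    satoshi: int  # amount in sub-units; the name is kept for legacy reasons
    eur: float
    usd: float


def address_prefix(address: str) -> str:
    """Return the partitioning prefix of an address."""
    return address[:PREFIX_LENGTH]


def to_coins(value: Value) -> float:
    """Convert an amount in sub-units to whole coins (for display only)."""
    return value.satoshi / SUBUNITS_PER_COIN


received = Value(satoshi=73450000, eur=2341.41, usd=2644.98)
print(address_prefix("t1XBa17NfzHrCN8Kn6NtnVdXkGxpjoyZKPr"))  # t1XBa
print(to_coins(received))                                     # 0.7345
```

Keeping the stored amount as an integer and converting only for display avoids accumulating floating-point error in sums.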

address_cluster

The assignment of an address to a cluster, computed via the multiple-input heuristic.

  • address_prefix: first five characters of the address; used for internal dataset partitioning and lookup only (e.g., t1XBa)
  • address: cryptocurrency address (e.g., t1XBa17NfzHrCN8Kn6NtnVdXkGxpjoyZKPr)
  • cluster: GraphSense-specific cluster id (e.g., 2993355)
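The multiple-input heuristic treats all addresses that appear together as inputs of one transaction as belonging to the same cluster. The following sketch shows the idea with a union-find structure; it is an illustration under that assumption, not the GraphSense implementation, and the function name and the integer ids it produces are hypothetical.

```python
from typing import Dict, Iterable, List


def cluster_addresses(tx_inputs: Iterable[List[str]]) -> Dict[str, int]:
    """Map each address to a cluster id; co-spent addresses share an id."""
    parent: Dict[str, str] = {}

    def find(addr: str) -> str:
        parent.setdefault(addr, addr)
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]  # path halving
            addr = parent[addr]
        return addr

    for inputs in tx_inputs:
        if not inputs:
            continue
        root = find(inputs[0])
        for addr in inputs[1:]:
            parent[find(addr)] = root  # merge into the first input's set

    # Assign illustrative, consecutive ids to the set representatives.
    ids: Dict[str, int] = {}
    result: Dict[str, int] = {}
    for addr in list(parent):
        root = find(addr)
        result[addr] = ids.setdefault(root, len(ids))
    return result


# Each inner list holds the input addresses of one transaction.
clusters = cluster_addresses([["A", "B"], ["B", "C"], ["D"]])
print(clusters["A"] == clusters["C"])  # True: linked through B
print(clusters["A"] == clusters["D"])  # False
```

Note that this sketch also assigns an id to the single-address set containing "D"; how such singletons are handled in the generated tables is not specified here.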

address_incoming_relations

The set of weighted, directed edges between two addresses in the address graph.

  • dst_address_prefix: first five characters of the destination address; used for internal dataset partitioning and lookup only (e.g., t1QZX)
  • dst_address: the destination node (address) of an edge (e.g., t1QZX18FLxsSTqzEuUNeApmcfrqVo3sBVjn)
  • estimated_value: the estimated flow of currency units from the source to the destination address (e.g., "{satoshi: 4137938, eur: 300.97, usd: 358.97}")
  • src_address: the source node (address) of an edge (e.g., t1L872tHAgBEzn4a26i6trKf5Dr3RyvBdBV)
  • no_transactions: the number of transactions from src_address to dst_address
  • src_properties: a selection of statistical properties of the source address (e.g., "{total_received: 191116171031804, total_spent: 182349215096398}")

address_outgoing_relations

Same as address_incoming_relations, but in the opposite direction (src_address and dst_address switched).
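The two relation tables can be read as the same edge set indexed from opposite ends. The sketch below makes that concrete with in-memory dictionaries; the `Edge` type, the `index_edges` helper, and the short address names are hypothetical, and reading in_degree and out_degree as the number of edges per key is an interpretation of the field descriptions rather than confirmed behavior.

```python
from collections import defaultdict
from typing import DefaultDict, List, NamedTuple, Tuple


class Edge(NamedTuple):
    src_address: str
    dst_address: str
    no_transactions: int


def index_edges(
    edges: List[Edge],
) -> Tuple[DefaultDict[str, List[Edge]], DefaultDict[str, List[Edge]]]:
    """Index one edge list by destination (incoming) and by source (outgoing)."""
    incoming: DefaultDict[str, List[Edge]] = defaultdict(list)
    outgoing: DefaultDict[str, List[Edge]] = defaultdict(list)
    for edge in edges:
        incoming[edge.dst_address].append(edge)
        outgoing[edge.src_address].append(edge)
    return incoming, outgoing


edges = [Edge("a1", "b1", 3), Edge("a2", "b1", 1), Edge("b1", "c1", 2)]
incoming, outgoing = index_edges(edges)
print(len(incoming["b1"]))  # 2 edges point to b1
print(len(outgoing["b1"]))  # 1 edge leaves b1
```

Storing both views duplicates the edges but lets either direction be looked up by a single key.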

address_tags

  • address: tagged cryptocurrency address (e.g., t1ZmpK4QFcvyQZ3ghTgSboBW8b4HgiZHQF9)
  • tag: the human-readable tag name (e.g., Internet Archive)
  • source: tag source (e.g., Internet Archive Web Site)
  • source_uri: tag source URI (e.g., https://archive.org/donate/cryptocurrency/)
  • actor_category: a field for categorizing the real-world actor behind an address (e.g., organization, exchange, miner, etc.)
  • description: a human-readable description (e.g., "Internet Archive Zcash address")
  • tag_uri: tag URI (e.g., https://archive.org/donate/cryptocurrency/)
  • timestamp: UNIX timestamp indicating when a tag has been created (e.g., 1552912648)
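As a small illustration, a tag record with the fields listed above could be modelled as follows. The `AddressTag` class and its `created_at` helper are hypothetical; only the field names and example values come from this page.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AddressTag:
    address: str
    tag: str
    source: str
    source_uri: str
    actor_category: str
    description: str
    tag_uri: str
    timestamp: int  # UNIX timestamp (seconds since 1970-01-01 UTC)

    def created_at(self) -> datetime:
        """Return the creation time as a timezone-aware UTC datetime."""
        return datetime.fromtimestamp(self.timestamp, tz=timezone.utc)


tag = AddressTag(
    address="t1ZmpK4QFcvyQZ3ghTgSboBW8b4HgiZHQF9",
    tag="Internet Archive",
    source="Internet Archive Web Site",
    source_uri="https://archive.org/donate/cryptocurrency/",
    actor_category="organization",
    description="Internet Archive Zcash address",
    tag_uri="https://archive.org/donate/cryptocurrency/",
    timestamp=1552912648,
)
print(tag.created_at().isoformat())  # 2019-03-18T12:37:28+00:00
```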

address_transactions

The transactions an address was involved in, either as input or as output.

  • address_prefix: first five characters of the address; used for internal dataset partitioning and lookup only (e.g., t1fB3)
  • address: the address (e.g., t1fB36H7W9f6aHqFwR2NdKYrzqo1dKK7GWf)
  • height: height of the block the transaction belongs to (e.g., 128680)
  • tx_hash: the transaction hash (e.g., 0x42229e12cdec6b13d704799533dc140784cf4d220780d32c93c2b302f24e7b1e)
  • timestamp: the transaction timestamp (e.g., 1496983505)
  • tx_index: GraphSense internal transaction index
  • value: value (in cryptocurrency sub-units) assigned to the address (negative if the address was used as input; positive if the address was used as output)
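The sign convention of the value field makes it straightforward to aggregate per-address totals. The sketch below assumes rows shaped like address_transactions entries; the `AddressTx` type, the `summarize` helper, the result keys, and the transaction hashes are hypothetical, and whether the pipeline derives its address statistics this way is not stated on this page.

```python
from typing import Dict, List, NamedTuple


class AddressTx(NamedTuple):
    tx_hash: str
    value: int  # sub-units; negative = used as input, positive = used as output


def summarize(rows: List[AddressTx]) -> Dict[str, int]:
    """Aggregate signed per-transaction values into totals and counts."""
    received = [r.value for r in rows if r.value > 0]
    spent = [-r.value for r in rows if r.value < 0]
    return {
        "total_received": sum(received),
        "total_spent": sum(spent),
        "txs_as_output": len(received),
        "txs_as_input": len(spent),
    }


rows = [
    AddressTx("0xaa", 5_000),
    AddressTx("0xbb", 2_500),
    AddressTx("0xcc", -6_000),
]
print(summarize(rows))
# {'total_received': 7500, 'total_spent': 6000, 'txs_as_output': 2, 'txs_as_input': 1}
```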

cluster

Provides summary statistics for a given cryptocurrency address cluster.

  • cluster: GraphSense internal cluster identifier
  • first_tx: the first transaction an address of this cluster has been involved in
  • last_tx: the most recent transaction an address of this cluster has been involved in
  • no_addresses: the number of addresses in this cluster
  • no_incoming_txs: the number of transactions using cluster addresses as output
  • no_outgoing_txs: the number of transactions using cluster addresses as input
  • in_degree: the number of incoming cluster graph edges
  • out_degree: the number of outgoing cluster graph edges
  • total_received: total amount of currency units received by the cluster
  • total_spent: total amount of currency units spent by the cluster

cluster_addresses

Statistical summary of addresses contained in a cluster.

  • cluster: GraphSense internal cluster identifier
  • address: cryptocurrency address contained in a cluster
  • first_tx: the first transaction the address has been involved in
  • last_tx: the most recent transaction the address has been involved in
  • no_incoming_txs: the number of transactions using this address as output
  • no_outgoing_txs: the number of transactions using this address as input
  • in_degree: the number of incoming address graph edges
  • out_degree: the number of outgoing address graph edges
  • total_received: total amount of currency units received by the address
  • total_spent: total amount of currency units spent by the address

cluster_incoming_relations

This table follows the same structure as address_incoming_relations with src and dst nodes being cluster nodes instead of addresses.

cluster_outgoing_relations

Same as cluster_incoming_relations, but in the opposite direction (src_cluster and dst_cluster switched).

cluster_tags

Same structure as address_tags, with additional cluster identifiers for the tagged addresses.
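Conceptually, such rows can be obtained by joining address_tags with address_cluster on the address. The sketch below shows that join on plain dictionaries; the `tags_with_clusters` helper, the column name `cluster` for the added identifier, and the choice to skip tagged addresses without a cluster are all assumptions, not documented behavior.

```python
from typing import Dict, List


def tags_with_clusters(
    address_tags: List[dict], address_cluster: Dict[str, int]
) -> List[dict]:
    """Attach a cluster id to every tag whose address has a cluster."""
    joined = []
    for tag in address_tags:
        cluster = address_cluster.get(tag["address"])
        if cluster is None:
            continue  # assumption: unclustered addresses yield no cluster tag
        joined.append({"cluster": cluster, **tag})
    return joined


address_tags = [
    {"address": "addr1", "tag": "Internet Archive"},
    {"address": "addr2", "tag": "Some Exchange"},
]
address_cluster = {"addr1": 2993355}

for row in tags_with_clusters(address_tags, address_cluster):
    print(row)
# {'cluster': 2993355, 'address': 'addr1', 'tag': 'Internet Archive'}
```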