Bitcoin Address Behavior Dataset (BABD) for Pattern Analysis: Key Insights and Methodology


Introduction

Bitcoin's growing popularity comes with inherent risks of misuse. While graph-based analysis offers potential for tracking cryptocurrency transactions, the lack of comprehensive datasets has been a significant barrier. This paper introduces the Bitcoin Address Behavior Dataset (BABD), along with a framework for constructing heterogeneous Bitcoin transaction graphs from which analytical features are extracted. As detailed in the sections below, the dataset covers 13 address behavior types and 148 features, drawn from 100,001 blocks of ledger data.


Key Challenges in Existing Research

Current Bitcoin transaction analysis methods face three critical limitations:

  1. Incomplete Address Typing: Most studies analyze ≤7 address types, insufficient for deep behavioral understanding.
  2. Unsystematic Metrics: Existing indicators lack categorization and omit crucial graph-derived features.
  3. Limited Reproducibility: Few studies disclose how transaction graphs are constructed.

Dataset Construction Methodology

1. Heterogeneous Graph Structure

Unlike simplified Bitcoin graphs that lose information, BABD uses a directed heterogeneous multigraph. This structure minimizes network information loss during pattern analysis.
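As a rough illustration of the idea, the sketch below keeps address nodes and transaction nodes in one directed multigraph and stores every edge separately, so repeated transfers between the same pair are not merged. The `HeteroMultiGraph` class, the two node types, and the edge attributes are assumptions made for this example, not BABD's actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class HeteroMultiGraph:
    """Toy directed heterogeneous multigraph (hypothetical, not BABD's code)."""
    node_type: dict = field(default_factory=dict)  # node id -> "address" or "transaction"
    edges: list = field(default_factory=list)      # (src, dst, attrs); parallel edges kept
    out_index: dict = field(default_factory=lambda: defaultdict(list))

    def add_node(self, node_id, kind):
        self.node_type[node_id] = kind

    def add_edge(self, src, dst, **attrs):
        # Each call appends a new edge, so parallel edges are never collapsed.
        self.out_index[src].append(len(self.edges))
        self.edges.append((src, dst, attrs))

    def out_edges(self, node_id):
        return [self.edges[i] for i in self.out_index.get(node_id, [])]


g = HeteroMultiGraph()
for addr in ("addr_A", "addr_B"):
    g.add_node(addr, "address")
for tx in ("tx_1", "tx_2"):
    g.add_node(tx, "transaction")

# addr_A funds two transactions; tx_1 pays addr_B through two separate outputs.
g.add_edge("addr_A", "tx_1", value=0.5, timestamp=1562889600)
g.add_edge("tx_1", "addr_B", value=0.3, timestamp=1562889600)
g.add_edge("tx_1", "addr_B", value=0.2, timestamp=1562889600)
g.add_edge("addr_A", "tx_2", value=1.0, timestamp=1562893200)

parallel = [e for e in g.out_edges("tx_1") if e[1] == "addr_B"]
print(len(parallel))  # 2: both outputs survive as distinct edges
```

A simplified address-to-address graph would typically merge the two `tx_1` outputs into one weighted edge and drop the transaction node, which is the kind of information loss the heterogeneous multigraph is meant to avoid.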

2. Address Behavior Classification

The dataset categorizes 13 distinct Bitcoin wallet behaviors:

| Behavior Type | Examples |
| --- | --- |
| Criminal Activities | Ransomware, Darknet markets |
| Financial Services | Exchanges, P2P lending |
| Anonymization | Mixers, Laundering |
| Infrastructure | Mining pools, Personal wallets |

3. Data Collection Pipeline

  1. Network Crawling: API-based scraping of Bitcoin ledger data (100,001 blocks)
  2. Label Verification: Manual validation of address tags
  3. Data Categorization: Separation into Strong Addresses (SA) and Weak Addresses (WA)

Feature Extraction Framework

Statistical Indicators (SI)

| Category | Features | Description |
| --- | --- | --- |
| PAI | Token count | Pure amount metrics |
| PDI | Purity ratios | Address correlation |
| PTI | Timestamps | Temporal patterns |
| CI | Combined features | Hybrid indicators |
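To make these categories more concrete, here is a minimal sketch that derives a few indicators of each flavor from toy transfer records for a single address. The record layout and every feature name are hypothetical; they only loosely mirror the categories and are not the dataset's actual 148 features.

```python
from statistics import mean

# Toy transfers touching one address:
# (direction, counterparty, amount_btc, unix_time). All values are made up.
transfers = [
    ("in", "addr_X", 0.50, 1_600_000_000),
    ("in", "addr_Y", 0.25, 1_600_003_600),
    ("out", "addr_X", 0.60, 1_600_007_200),
]


def statistical_indicators(records):
    """Return a small feature dict for one address (hypothetical feature names)."""
    if not records:
        return {}
    received = [amt for d, _, amt, _ in records if d == "in"]
    sent = [amt for d, _, amt, _ in records if d == "out"]
    times = [t for _, _, _, t in records]
    lifetime_s = max(times) - min(times)
    hours = lifetime_s / 3600
    return {
        # Amount-based features
        "total_received": sum(received),
        "total_sent": sum(sent),
        "mean_received": mean(received) if received else 0.0,
        # Relation to other addresses
        "distinct_counterparties": len({cp for _, cp, _, _ in records}),
        # Temporal feature
        "lifetime_s": lifetime_s,
        # Combined feature: transfers per hour of activity
        "transfers_per_hour": len(records) / hours if hours else 0.0,
    }


features = statistical_indicators(transfers)
print(features)
```

The combined feature divides a count by a time span, which is one simple way to mix two indicator families into a single value.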

Local Structure Indicators (LSI)

The 4-hop subgraph algorithm captures network topology by:

  1. Converting transaction graphs to undirected networks
  2. Extracting structural features:

    • Degree correlation
    • Betweenness centrality
    • PageRank values
    • Network density
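The two steps above can be sketched in plain Python as follows. This is a simplified illustration on a toy edge list, not the paper's algorithm: it drops edge direction, collects the nodes within four hops of a start node by breadth-first search, and computes the density of the induced subgraph. The node names and helper functions are hypothetical.

```python
from collections import deque

# Toy directed edges (source, target); node names are hypothetical.
directed_edges = [
    ("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f"), ("b", "a"),
]


def to_undirected(edges):
    """Build an undirected adjacency map, dropping direction, duplicates, and self-loops."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set())
        adj.setdefault(v, set())
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def k_hop_nodes(adj, start, k):
    """Breadth-first search that stops expanding once depth k is reached."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == k:
            continue
        for nbr in adj[node]:
            if nbr not in depth:
                depth[nbr] = depth[node] + 1
                queue.append(nbr)
    return set(depth)


def density(adj, nodes):
    """Undirected density of the induced subgraph: 2m / (n(n-1))."""
    n = len(nodes)
    if n < 2:
        return 0.0
    m = sum(1 for u in nodes for v in adj[u] if v in nodes) / 2
    return 2 * m / (n * (n - 1))


adj = to_undirected(directed_edges)
nodes = k_hop_nodes(adj, "a", k=4)
print(sorted(nodes), density(adj, nodes))
```

On this toy path graph, node `f` lies five hops from `a` and is excluded. The remaining features in the list are usually taken from a graph library rather than hand-written. In networkx, for instance, `betweenness_centrality`, `pagerank`, and `degree_assortativity_coefficient` (one common measure of degree correlation) cover them.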


Experimental Results

Performance Metrics

| Model | Accuracy | Precision | Recall | F1-Score |
| --- | --- | --- | --- | --- |
| XGBoost | 96.71% | 96.46% | 96.71% | 96.57% |
| Random Forest | 95.62% | 95.21% | 95.62% | 95.38% |
| SVM | 93.24% | 92.80% | 93.24% | 92.97% |
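For every model listed, recall equals accuracy. That is what one would expect if the multi-class scores are support-weighted averages, although the averaging scheme is not stated here, so treat this as an assumption. The sketch below computes weighted precision, recall, and F1 on made-up labels to show the identity; none of the data comes from BABD.

```python
from collections import Counter

# Made-up labels for three classes; not BABD data.
y_true = ["exchange", "exchange", "mixer", "mixer", "mining", "mining", "mining", "exchange"]
y_pred = ["exchange", "mixer", "mixer", "mixer", "mining", "exchange", "mining", "exchange"]


def weighted_scores(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1."""
    support = Counter(y_true)
    total = len(y_true)
    precision = recall = f1 = 0.0
    for cls, n in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        predicted = sum(p == cls for p in y_pred)
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / n
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        weight = n / total  # each class counts in proportion to its true frequency
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / total
    return accuracy, precision, recall, f1


accuracy, precision, recall, f1 = weighted_scores(y_true, y_pred)
print(f"accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

Weighted recall sums the per-class true positives and divides by the total number of samples, which is exactly the definition of accuracy, so the two columns agree by construction.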

The SI+LSI combined features achieved consistent performance across all 13 classification tasks, demonstrating the framework's robustness.

Key Takeaways

  1. Heterogeneous Graphs provide richer analytical context than simplified structures
  2. Subgraph Sampling enables scalable analysis of massive transaction networks
  3. Composite Features (SI+LSI) outperform single-indicator approaches

FAQ Section

Q1: How does BABD improve upon existing Bitcoin datasets?
A: BABD offers more complete address typing (13 vs. ≤7 types), systematic feature categorization, and reproducible graph construction methods.

Q2: Why use 4-hop subgraphs?
A: Testing showed that four hops best balances feature richness against computational limits: fewer hops lose context, while more hops become intractable.

Q3: What practical applications does this research enable?
A: The framework aids in detecting illicit activities (ransomware, mixing services), analyzing exchange behaviors, and improving wallet security analytics.

Q4: How were the 148 features selected?
A: Through iterative testing: starting with basic transaction metrics, then adding the combined and graph-derived features that improved model performance.

Q5: Can this methodology apply to other cryptocurrencies?
A: Yes, with adjustments for chain-specific characteristics (e.g., Ethereum's smart contracts would require additional feature types).