创建文章

We are looking for publications that demonstrate building dApps or smart contracts!
See the full list of Gitcoin bounties that are eligible for rewards.

Tutorial Thumbnail
中级 · 30分钟

用于区块链数据分析的索引器

本教程将介绍使用“索引器”计算 Algorand 区块链指标的一些示例,包括客户端实例化、数据提取、数据处理和指标计算。

需要的工具和环境

必需

  • Python >= 3.6
  • 装有 python IDE。对于此分步教程,推荐使用 Jupyter notebook/ipython
  • py-algorand-sdk
  • 索引器:
    索引器有两种使用方式:
    1. 基础设施服务(例如 purestake、algorand-explorer 等):这是一种简单的解决方案,提供使用 sdk 读取索引器的渠道。
    2. 安装索引器:本地安装索引器可以独立访问 Algorand 区块链数据,无需依赖外部供应商。由于需从创世区块赶上 Algorand 区块链当前进度,本地安装索引器可能需花费数天时间。

可选
- 虚拟机环境

背景

Algorand 区块链能以高吞吐量处理交易。通过解析区块来查找所需信息可能相当耗时。使用索引器可以方便快捷地访问 Algorand 区块链上记录的数据。

本教程中,我们将获取 2020 年 8 月第一周的所有交易数据,介绍数据提取过程,了解如何处理结果和应用某些指标。

步骤

1.环境设置

安装环境中的所有依赖项:

pip install pandas
pip install dateutil
pip install py-algorand-sdk

2.导入依赖与实例化

首先,我们导入本教程所需的全部依赖包。

import os
import json
from time import sleep
from datetime import timezone

from dateutil.rrule import HOURLY, rrule
from dateutil.parser import parse
import pandas as pd

from algosdk.v2client import indexer

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

现在,我们需要实例化索引器客户端。需提供 3 个字段:indexer_tokenindexer_addressheaders(可选)。

信息

如果使用基础设施服务,可在登录后看到此信息。例如,在 purestake.io 上:
EditorImages/2020/09/03 00:59/purestake.png

提示

将 API 密钥保存为环境变量可以方便访问

indexer_token = os.environ.get('TOKEN')
myindexer = indexer.IndexerClient(indexer_token=indexer_token, indexer_address='https://mainnet-algorand.api.purestake.io/idx2')

3.获取交易

下面的函数中,我们抽取 start_timeend_time 期间区块链上记录的所有交易。

def get_txn_response(start_time, end_time):
    """
    Returns all transactions added to the blockchain between 'start_time' and 'end_time'

    """
    # The indexer expacting time inputs to be in RFC 3339 format
    start_time = start_time.astimezone(timezone.utc).isoformat('T')
    end_time = end_time.astimezone(timezone.utc).isoformat('T')

    nexttoken = ""
    numtx = 1

    responses = []

    # Retrieve up-to 1000 transactions at each request.
    while numtx > 0:
        response = myindexer.search_transactions(start_time=start_time, end_time=end_time,
                                                 next_page=nexttoken, limit=1000)
        transactions = response['transactions']
        responses += transactions
        numtx = len(transactions)
        if numtx > 0:
            # pointer to the next chunk of requests
            nexttoken = response['next-token']
    return responses

注意

索引器供应商可能对请求的数量和持续时间设置一些限制。
本例中,我们将一周的数据划分成每 6 小时一组的数据块,避免超出请求处理时间限制。

尝试采用其他数据块大小

start = "20200801T000000"
end = "20200808T000000"

# get a list of dates between start date and end date with 6 hours intervals
dates = list(rrule(HOURLY,
                   dtstart=parse(start),
                   until=parse(end),
                   interval=6))

现在,我们可以迭代日期,组合所有数据了:

提示

如果内存有限,不妨单独保存每个数据块,使用并行计算库(如 spark、dask 等)进行计算。

responses = []
for start_time, end_time in zip(dates[:-1], dates[1:]):
    responses += get_txn_response(start_time, end_time)
df = pd.DataFrame(responses)
df.head() 

asset-transfer-transaction close-rewards closing-amount confirmed-round fee first-valid genesis-hash genesis-id id intra-round-offset last-valid note receiver-rewards round-time sender sender-rewards signature tx-type payment-transaction group lease keyreg-transaction asset-config-transaction created-asset-index
0 {‘amount’: 31000, ‘asset-id’: 2512768, ‘close-… 0 0 8195181 1000 8195123 wGHE2Pwdvd7S12BL5FaOP20EGYesN73ktiC1qzkkit8= mainnet-v1.0 RPQ7BVPQQENXXJDFLYCMSVFU7QV6Z44LDIIAVISBUHSXCO… 0 8196123 gqZwVHhuSWTZJGI0OGY1NTUwLTA4NGEtNDZkNS05OGUxLW… 0 1596254404 SN3FSQVLLNWSI3I7QCMQEQN6L42OW5RP4FNM6BNOZHBRKH… 0 {‘sig’: ‘X73FojlwcRlV8A8A9bWvWPKq/6i5qxBOb/xQZ… axfer NaN NaN NaN NaN NaN NaN
1 NaN 0 0 8195181 1000 8195127 wGHE2Pwdvd7S12BL5FaOP20EGYesN73ktiC1qzkkit8= mainnet-v1.0 BGGD5TM6IFZBTDCDCNGUJ4PBDQFAG5RZTATKB7T34LGVKN… 1 8196127 gqZwVHhuSWTZJDgzZmU0MmQ5LTg2MjYtNDI0NC1hZDczLT… 0 1596254404 62FMJKL2BDSYWSF3RYYZHAXA5HICQ57NFZCWWNM4ZJBYSV… 0 {‘sig’: ‘P0qA6tng2r6e1aSwoG2sNKQEXjoGWTua+YykQ… pay {‘amount’: 201000, ‘close-amount’: 0, ‘receive… NaN NaN NaN NaN NaN
2 {‘amount’: 100, ‘asset-id’: 2513338, ‘close-am… 0 0 8195181 1000 8195127 wGHE2Pwdvd7S12BL5FaOP20EGYesN73ktiC1qzkkit8= mainnet-v1.0 HHFNE2KELVHPL4TGI6B6PXDIZQ6ROGFH7NOUDCJKCRBHXB… 2 8196127 gqZwVHhuSWTZJDI4NDE1NmM2LTQxNzAtNDhmNy1hMmFkLT… 0 1596254404 62FMJKL2BDSYWSF3RYYZHAXA5HICQ57NFZCWWNM4ZJBYSV… 0 {‘sig’: ‘jNSYv/8dclBfXkJU2JEDJIVj8rlcnLZRLdd1R… axfer NaN NaN NaN NaN NaN NaN
3 {‘amount’: 0, ‘asset-id’: 2512768, ‘close-amou… 0 0 8195181 1000 8195134 wGHE2Pwdvd7S12BL5FaOP20EGYesN73ktiC1qzkkit8= mainnet-v1.0 NS4Z2S3NCKMVZSY6VX4CKE7QTN7N5JLT7AVYQFJSRVRLUS… 3 8196134 gqZwVHhuSWTZJDY0ZThhOGUyLTBhMTItNDdlNC04N2Q3LW… 0 1596254404 BXUZYY5XN2PFLBJE3BX2RDMQLQUW5ZUWX4ZCNDUL6QMXJZ… 0 {‘sig’: ‘FbhxqPDlZxDcaAELShez+rXNQuTR4rRVP4vpe… axfer NaN NaN NaN NaN NaN NaN
4 {‘amount’: 100, ‘asset-id’: 2513338, ‘close-am… 0 0 8195181 1000 8195123 wGHE2Pwdvd7S12BL5FaOP20EGYesN73ktiC1qzkkit8= mainnet-v1.0 2S7TV7FI4TMPO7GXZI55PPW3FQE2PJVQ7F34LCS3C6Z7NW… 4 8196123 gqZwVHhuSWTZJGJmYjlhNGY4LTNmODgtNDQ1My04YWY5LW… 0 1596254404 62FMJKL2BDSYWSF3RYYZHAXA5HICQ57NFZCWWNM4ZJBYSV… 0 {‘sig’: ‘3QqczHlaOjymMeoEZK/ULKSbXv8pYP59AJqt4… axfer NaN NaN NaN NaN NaN NaN

4.预处理数据集

注意,上表中某些列具有类型为“dict”的字段。这导致对嵌套字段的计算复杂且困难。本步骤中,我们将此类字段解析成新列。

df = pd.concat([df.drop(['asset-transfer-transaction'], axis=1),
                df['asset-transfer-transaction'].apply(pd.Series).add_suffix('-asset-txn').drop(['0-asset-txn'], axis=1, errors='ignore')], axis=1)

df = pd.concat([df.drop(['payment-transaction'], axis=1), 
                df['payment-transaction'].apply(pd.Series).add_suffix('-pay-txn').drop(['0-pay-txn'], axis=1, errors='ignore')], axis=1)

df = pd.concat([df.drop(['asset-config-transaction'], axis=1),
                df['asset-config-transaction'].apply(pd.Series).add_suffix('-asst-cnfg-txn').drop(['0-asst-cnfg-txn'], axis=1, errors='ignore')], axis=1)

df = pd.concat([df.drop(['signature'], axis=1), 
                df['signature'].apply(pd.Series).add_suffix('-sig').drop(['0-sig'], axis=1, errors='ignore')], axis=1)

# format the unix seconds to human readble date format
df['date'] = pd.to_datetime(df['round-time'], unit='s').dt.date

df.head()

close-rewards closing-amount confirmed-round fee first-valid genesis-hash genesis-id id intra-round-offset last-valid note receiver-rewards round-time sender sender-rewards tx-type group lease keyreg-transaction created-asset-index amount-asset-txn asset-id-asset-txn close-amount-asset-txn close-to-asset-txn receiver-asset-txn sender-asset-txn amount-pay-txn close-amount-pay-txn close-remainder-to-pay-txn receiver-pay-txn asset-id-asst-cnfg-txn params-asst-cnfg-txn sig-sig multisig-sig logicsig-sig date day
0 0 0 8195181 1000 8195123 wGHE2Pwdvd7S12BL5FaOP20EGYesN73ktiC1qzkkit8= mainnet-v1.0 RPQ7BVPQQENXXJDFLYCMSVFU7QV6Z44LDIIAVISBUHSXCO… 0 8196123 gqZwVHhuSWTZJGI0OGY1NTUwLTA4NGEtNDZkNS05OGUxLW… 0 1596254404 SN3FSQVLLNWSI3I7QCMQEQN6L42OW5RP4FNM6BNOZHBRKH… 0 axfer NaN NaN NaN NaN 31000.0 2512768.0 0.0 NaN IWPA7EBIWKXLROZRFTNQKYKYCAMDSHJQL6QKEMQAO6DKXT… NaN NaN NaN NaN NaN NaN NaN X73FojlwcRlV8A8A9bWvWPKq/6i5qxBOb/xQZlqXc+Zuba… NaN NaN 2020-08-01 Saturday
1 0 0 8195181 1000 8195127 wGHE2Pwdvd7S12BL5FaOP20EGYesN73ktiC1qzkkit8= mainnet-v1.0 BGGD5TM6IFZBTDCDCNGUJ4PBDQFAG5RZTATKB7T34LGVKN… 1 8196127 gqZwVHhuSWTZJDgzZmU0MmQ5LTg2MjYtNDI0NC1hZDczLT… 0 1596254404 62FMJKL2BDSYWSF3RYYZHAXA5HICQ57NFZCWWNM4ZJBYSV… 0 pay NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 201000.0 0.0 NaN MGY5TCZEUVOINTRXXPEHFQ3OTBVPOQGMMPANECG2WHASVY… NaN NaN P0qA6tng2r6e1aSwoG2sNKQEXjoGWTua+YykQtf8VVvtwl… NaN NaN 2020-08-01 Saturday
2 0 0 8195181 1000 8195127 wGHE2Pwdvd7S12BL5FaOP20EGYesN73ktiC1qzkkit8= mainnet-v1.0 HHFNE2KELVHPL4TGI6B6PXDIZQ6ROGFH7NOUDCJKCRBHXB… 2 8196127 gqZwVHhuSWTZJDI4NDE1NmM2LTQxNzAtNDhmNy1hMmFkLT… 0 1596254404 62FMJKL2BDSYWSF3RYYZHAXA5HICQ57NFZCWWNM4ZJBYSV… 0 axfer NaN NaN NaN NaN 100.0 2513338.0 0.0 NaN EBDDABI4F7IWCPLHMICNYKUCAEPCBFP5HXKUITS4SSSQKG… NaN NaN NaN NaN NaN NaN NaN jNSYv/8dclBfXkJU2JEDJIVj8rlcnLZRLdd1RYvJbfOWMQ… NaN NaN 2020-08-01 Saturday
3 0 0 8195181 1000 8195134 wGHE2Pwdvd7S12BL5FaOP20EGYesN73ktiC1qzkkit8= mainnet-v1.0 NS4Z2S3NCKMVZSY6VX4CKE7QTN7N5JLT7AVYQFJSRVRLUS… 3 8196134 gqZwVHhuSWTZJDY0ZThhOGUyLTBhMTItNDdlNC04N2Q3LW… 0 1596254404 BXUZYY5XN2PFLBJE3BX2RDMQLQUW5ZUWX4ZCNDUL6QMXJZ… 0 axfer NaN NaN NaN NaN 0.0 2512768.0 0.0 NaN BXUZYY5XN2PFLBJE3BX2RDMQLQUW5ZUWX4ZCNDUL6QMXJZ… NaN NaN NaN NaN NaN NaN NaN FbhxqPDlZxDcaAELShez+rXNQuTR4rRVP4vpeDassKN0pt… NaN NaN 2020-08-01 Saturday
4 0 0 8195181 1000 8195123 wGHE2Pwdvd7S12BL5FaOP20EGYesN73ktiC1qzkkit8= mainnet-v1.0 2S7TV7FI4TMPO7GXZI55PPW3FQE2PJVQ7F34LCS3C6Z7NW… 4 8196123 gqZwVHhuSWTZJGJmYjlhNGY4LTNmODgtNDQ1My04YWY5LW… 0 1596254404 62FMJKL2BDSYWSF3RYYZHAXA5HICQ57NFZCWWNM4ZJBYSV… 0 axfer NaN NaN NaN NaN 100.0 2513338.0 0.0 NaN M2UEUR6EP3JSI2BGGPXEXUIOTTJMNL7A6VSPE357N2CBFA… NaN NaN NaN NaN NaN NaN NaN 3QqczHlaOjymMeoEZK/ULKSbXv8pYP59AJqt4PpRrUFfnp… NaN NaN 2020-08-01 Saturday

5.计算指标

现在有了经过处理的表,我们可以计算一些指标,洞察 Algorand 区块链的相关信息。

交易总数

len(df)

4687303

交易类型

df['tx-type'].value_counts()

axfer 3599820 
pay 1087417 
acfg 57 
keyreg 9 
Name: tx-type, dtype: int64

其中 axfer 是资产转移,pay 是支付,acfg 是资产配置,keyreg 是密钥注册。
有关交易类型的更多信息,可于此处获取。

这周转移的 algo 币总量

df['amount-pay-txn'].sum()

251299842396859.0

一周内各天的交易分布

df['day'] = pd.to_datetime(df['date']).dt.day_name()
df['day'].value_counts().plot()

EditorImages/2020/09/03 01:42/days_plot.png

转移的去重资产

df['asset-id-asset-txn'].nunique()

64

df['asset-id-asset-txn'].value_counts()

2512768.0 1699636
2513338.0 1237993
2513746.0 661888
312769.0 87
163650.0 25
438840.0 22
1942776.0 14
604.0 14
438839.0 12
...
137489.0 1
Name: asset-id-asset-txn, dtype: int64

交易表提供资产 id,但不附带资产名称。在下一步里,我们抽取更多数据,获悉关于交易的更多信息。

6.抽取资产

类似交易抽取,我们创建一个函数来循环请求。

def get_txn_response():
    nexttoken = ""
    numtx = 1

    responses = []

    # Retrieve up-to 1000 assets at each request.
    while numtx > 0:
        response = myindexer.search_assets(next_page=nexttoken)
        assets = response['assets']
        responses += assets
        numtx = len(assets)
        if numtx > 0:
            # pointer to the next chunk of requests
            nexttoken = response['next-token']
    return responses

assets = pd.DataFrame(get_txn_response())
assets.head()

index params
0 5 {‘clawback’: ‘L226FSG3LTZR4V2MI5M4SDKJSF5HP2TQ…
1 6 {‘clawback’: ‘QYQYXRJ7FLQCRRGE3WH5CIAAL56LLMIN…
2 235 {‘clawback’: ‘QA75IQ76F6H2T55G65BY7BPLF5QNWSLT…
3 236 {‘clawback’: ‘QA75IQ76F6H2T55G65BY7BPLF5QNWSLT…
4 237 {‘clawback’: ‘QA75IQ76F6H2T55G65BY7BPLF5QNWSLT…

正如上文中预处理所做的那样,我们解析带有 dict 字段的列:

assets = pd.concat([assets.drop(['params'], axis=1),
                    assets['params'].apply(pd.Series)], axis=1)
assets.head()

现在,我们将上面的资产值计数与资产名称组合在一起。

assets_counts = pd.DataFrame(df['asset-id-asset-txn'].value_counts()).join(assets[['index', 'name', 'unit-name']].set_index('index'))

assets_counts.head()

asset-id-asset-txn name unit-name
2512768.0 1699636 YouNow Pending Props xPROPS
2513338.0 1237993 Camfrog Pending Props xPROPS
2513746.0 661888 Paltalk Pending Props xPROPS
312769.0 87 Tether USDt USDt
163650.0 25 Asia Reserve Currency Coin ARCC

pd.DataFrame(assets_counts.groupby('unit-name')['asset-id-asset-txn'].sum()).nlargest(10, 'asset-id-asset-txn').plot()

EditorImages/2020/09/03 02:00/assets_plot.png

总结

区块链上记录着大量信息,可从中找到很多极具吸引力的信息,而索引器是访问这些数据的利器。

深入了解如何使用索引器,积极分享您的见解