用于区块链数据分析的索引器
本教程将介绍使用“索引器”计算 Algorand 区块链指标的一些示例,包括客户端实例化、数据提取、数据处理和指标计算。
需要的工具和环境
必需
- Python >= 3.6
- 装有 python IDE。对于此分步教程,推荐使用 Jupyter notebook/ipython
- py-algorand-sdk
- 索引器:
索引器有两种使用方式:
1. 基础设施服务(例如 purestake、algorand-explorer 等):这是一种简单的解决方案,提供使用 sdk 读取索引器的渠道。
2. 安装索引器:本地安装索引器可以独立访问 Algorand 区块链数据,无需依赖外部供应商。由于需从创世区块赶上 Algorand 区块链当前进度,本地安装索引器可能需花费数天时间。
可选
- 虚拟机环境
背景
Algorand 区块链能以高吞吐量处理交易。通过解析区块来查找所需信息可能相当耗时。使用索引器可以方便快捷地访问 Algorand 区块链上记录的数据。
本教程中,我们将获取 2020 年 8 月第一周的所有交易数据,介绍数据提取过程,了解如何处理结果和应用某些指标。
步骤
1.环境设置
安装环境中的所有依赖项:
pip install pandas
pip install dateutil
pip install py-algorand-sdk
2.导入依赖与实例化
首先,我们导入本教程所需的全部依赖包。
import os
import json
from time import sleep
from datetime import timezone
from dateutil.rrule import HOURLY, rrule
from dateutil.parser import parse
import pandas as pd
from algosdk.v2client import indexer
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
现在,我们需要实例化索引器客户端。需提供 3 个字段:indexer_token、indexer_address 和 headers(可选)。
信息
如果使用基础设施服务,可在登录后看到此信息。例如,在 purestake.io 上:
提示
将 API 密钥保存为环境变量可以方便访问
indexer_token = os.environ.get('TOKEN')
myindexer = indexer.IndexerClient(indexer_token=indexer_token, indexer_address='https://mainnet-algorand.api.purestake.io/idx2')
3.获取交易
下面的函数中,我们抽取 start_time 到 end_time 期间区块链上记录的所有交易。
def get_txn_response(start_time, end_time):
"""
Returns all transactions added to the blockchain between 'start_time' and 'end_time'
"""
# The indexer expacting time inputs to be in RFC 3339 format
start_time = start_time.astimezone(timezone.utc).isoformat('T')
end_time = end_time.astimezone(timezone.utc).isoformat('T')
nexttoken = ""
numtx = 1
responses = []
# Retrieve up-to 1000 transactions at each request.
while numtx > 0:
response = myindexer.search_transactions(start_time=start_time, end_time=end_time,
next_page=nexttoken, limit=1000)
transactions = response['transactions']
responses += transactions
numtx = len(transactions)
if numtx > 0:
# pointer to the next chunk of requests
nexttoken = response['next-token']
return responses
注意
索引器供应商可能对请求的数量和持续时间设置一些限制。
本例中,我们将一周的数据划分成每 6 小时一组的数据块,避免超出请求处理时间限制。
尝试采用其他数据块大小
start = "20200801T000000"
end = "20200808T000000"
# get a list of dates between start date and end date with 6 hours intervals
dates = list(rrule(HOURLY,
dtstart=parse(start),
until=parse(end),
interval=6))
现在,我们可以迭代日期,组合所有数据了:
提示
如果内存有限,不妨单独保存每个数据块,使用并行计算库(如 spark、dask 等)进行计算。
responses = []
for start_time, end_time in zip(dates[:-1], dates[1:]):
responses += get_txn_response(start_time, end_time)
df = pd.DataFrame(responses)
df.head()
asset-transfer-transaction | close-rewards | closing-amount | confirmed-round | fee | first-valid | genesis-hash | genesis-id | id | intra-round-offset | last-valid | note | receiver-rewards | round-time | sender | sender-rewards | signature | tx-type | payment-transaction | group | lease | keyreg-transaction | asset-config-transaction | created-asset-index | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | {‘amount’: 31000, ‘asset-id’: 2512768, ‘close-… | 0 | 0 | 8195181 | 1000 | 8195123 | wGHE2Pwdvd7S12BL5FaOP20EGYesN73ktiC1qzkkit8= | mainnet-v1.0 | RPQ7BVPQQENXXJDFLYCMSVFU7QV6Z44LDIIAVISBUHSXCO… | 0 | 8196123 | gqZwVHhuSWTZJGI0OGY1NTUwLTA4NGEtNDZkNS05OGUxLW… | 0 | 1596254404 | SN3FSQVLLNWSI3I7QCMQEQN6L42OW5RP4FNM6BNOZHBRKH… | 0 | {‘sig’: ‘X73FojlwcRlV8A8A9bWvWPKq/6i5qxBOb/xQZ… | axfer | NaN | NaN | NaN | NaN | NaN | NaN |
1 | NaN | 0 | 0 | 8195181 | 1000 | 8195127 | wGHE2Pwdvd7S12BL5FaOP20EGYesN73ktiC1qzkkit8= | mainnet-v1.0 | BGGD5TM6IFZBTDCDCNGUJ4PBDQFAG5RZTATKB7T34LGVKN… | 1 | 8196127 | gqZwVHhuSWTZJDgzZmU0MmQ5LTg2MjYtNDI0NC1hZDczLT… | 0 | 1596254404 | 62FMJKL2BDSYWSF3RYYZHAXA5HICQ57NFZCWWNM4ZJBYSV… | 0 | {‘sig’: ‘P0qA6tng2r6e1aSwoG2sNKQEXjoGWTua+YykQ… | pay | {‘amount’: 201000, ‘close-amount’: 0, ‘receive… | NaN | NaN | NaN | NaN | NaN |
2 | {‘amount’: 100, ‘asset-id’: 2513338, ‘close-am… | 0 | 0 | 8195181 | 1000 | 8195127 | wGHE2Pwdvd7S12BL5FaOP20EGYesN73ktiC1qzkkit8= | mainnet-v1.0 | HHFNE2KELVHPL4TGI6B6PXDIZQ6ROGFH7NOUDCJKCRBHXB… | 2 | 8196127 | gqZwVHhuSWTZJDI4NDE1NmM2LTQxNzAtNDhmNy1hMmFkLT… | 0 | 1596254404 | 62FMJKL2BDSYWSF3RYYZHAXA5HICQ57NFZCWWNM4ZJBYSV… | 0 | {‘sig’: ‘jNSYv/8dclBfXkJU2JEDJIVj8rlcnLZRLdd1R… | axfer | NaN | NaN | NaN | NaN | NaN | NaN |
3 | {‘amount’: 0, ‘asset-id’: 2512768, ‘close-amou… | 0 | 0 | 8195181 | 1000 | 8195134 | wGHE2Pwdvd7S12BL5FaOP20EGYesN73ktiC1qzkkit8= | mainnet-v1.0 | NS4Z2S3NCKMVZSY6VX4CKE7QTN7N5JLT7AVYQFJSRVRLUS… | 3 | 8196134 | gqZwVHhuSWTZJDY0ZThhOGUyLTBhMTItNDdlNC04N2Q3LW… | 0 | 1596254404 | BXUZYY5XN2PFLBJE3BX2RDMQLQUW5ZUWX4ZCNDUL6QMXJZ… | 0 | {‘sig’: ‘FbhxqPDlZxDcaAELShez+rXNQuTR4rRVP4vpe… | axfer | NaN | NaN | NaN | NaN | NaN | NaN |
4 | {‘amount’: 100, ‘asset-id’: 2513338, ‘close-am… | 0 | 0 | 8195181 | 1000 | 8195123 | wGHE2Pwdvd7S12BL5FaOP20EGYesN73ktiC1qzkkit8= | mainnet-v1.0 | 2S7TV7FI4TMPO7GXZI55PPW3FQE2PJVQ7F34LCS3C6Z7NW… | 4 | 8196123 | gqZwVHhuSWTZJGJmYjlhNGY4LTNmODgtNDQ1My04YWY5LW… | 0 | 1596254404 | 62FMJKL2BDSYWSF3RYYZHAXA5HICQ57NFZCWWNM4ZJBYSV… | 0 | {‘sig’: ‘3QqczHlaOjymMeoEZK/ULKSbXv8pYP59AJqt4… | axfer | NaN | NaN | NaN | NaN | NaN | NaN |
4.预处理数据集
注意,上表中某些列具有类型为“dict”的字段。这导致对嵌套字段的计算复杂且困难。本步骤中,我们将此类字段解析成新列。
df = pd.concat([df.drop(['asset-transfer-transaction'], axis=1),
df['asset-transfer-transaction'].apply(pd.Series).add_suffix('-asset-txn').drop(['0-asset-txn'], axis=1, errors='ignore')], axis=1)
df = pd.concat([df.drop(['payment-transaction'], axis=1),
df['payment-transaction'].apply(pd.Series).add_suffix('-pay-txn').drop(['0-pay-txn'], axis=1, errors='ignore')], axis=1)
df = pd.concat([df.drop(['asset-config-transaction'], axis=1),
df['asset-config-transaction'].apply(pd.Series).add_suffix('-asst-cnfg-txn').drop(['0-asst-cnfg-txn'], axis=1, errors='ignore')], axis=1)
df = pd.concat([df.drop(['signature'], axis=1),
df['signature'].apply(pd.Series).add_suffix('-sig').drop(['0-sig'], axis=1, errors='ignore')], axis=1)
# format the unix seconds to human readble date format
df['date'] = pd.to_datetime(df['round-time'], unit='s').dt.date
df.head()
close-rewards | closing-amount | confirmed-round | fee | first-valid | genesis-hash | genesis-id | id | intra-round-offset | last-valid | note | receiver-rewards | round-time | sender | sender-rewards | tx-type | group | lease | keyreg-transaction | created-asset-index | amount-asset-txn | asset-id-asset-txn | close-amount-asset-txn | close-to-asset-txn | receiver-asset-txn | sender-asset-txn | amount-pay-txn | close-amount-pay-txn | close-remainder-to-pay-txn | receiver-pay-txn | asset-id-asst-cnfg-txn | params-asst-cnfg-txn | sig-sig | multisig-sig | logicsig-sig | date | day | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 0 | 8195181 | 1000 | 8195123 | wGHE2Pwdvd7S12BL5FaOP20EGYesN73ktiC1qzkkit8= | mainnet-v1.0 | RPQ7BVPQQENXXJDFLYCMSVFU7QV6Z44LDIIAVISBUHSXCO… | 0 | 8196123 | gqZwVHhuSWTZJGI0OGY1NTUwLTA4NGEtNDZkNS05OGUxLW… | 0 | 1596254404 | SN3FSQVLLNWSI3I7QCMQEQN6L42OW5RP4FNM6BNOZHBRKH… | 0 | axfer | NaN | NaN | NaN | NaN | 31000.0 | 2512768.0 | 0.0 | NaN | IWPA7EBIWKXLROZRFTNQKYKYCAMDSHJQL6QKEMQAO6DKXT… | NaN | NaN | NaN | NaN | NaN | NaN | NaN | X73FojlwcRlV8A8A9bWvWPKq/6i5qxBOb/xQZlqXc+Zuba… | NaN | NaN | 2020-08-01 | Saturday |
1 | 0 | 0 | 8195181 | 1000 | 8195127 | wGHE2Pwdvd7S12BL5FaOP20EGYesN73ktiC1qzkkit8= | mainnet-v1.0 | BGGD5TM6IFZBTDCDCNGUJ4PBDQFAG5RZTATKB7T34LGVKN… | 1 | 8196127 | gqZwVHhuSWTZJDgzZmU0MmQ5LTg2MjYtNDI0NC1hZDczLT… | 0 | 1596254404 | 62FMJKL2BDSYWSF3RYYZHAXA5HICQ57NFZCWWNM4ZJBYSV… | 0 | pay | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 201000.0 | 0.0 | NaN | MGY5TCZEUVOINTRXXPEHFQ3OTBVPOQGMMPANECG2WHASVY… | NaN | NaN | P0qA6tng2r6e1aSwoG2sNKQEXjoGWTua+YykQtf8VVvtwl… | NaN | NaN | 2020-08-01 | Saturday |
2 | 0 | 0 | 8195181 | 1000 | 8195127 | wGHE2Pwdvd7S12BL5FaOP20EGYesN73ktiC1qzkkit8= | mainnet-v1.0 | HHFNE2KELVHPL4TGI6B6PXDIZQ6ROGFH7NOUDCJKCRBHXB… | 2 | 8196127 | gqZwVHhuSWTZJDI4NDE1NmM2LTQxNzAtNDhmNy1hMmFkLT… | 0 | 1596254404 | 62FMJKL2BDSYWSF3RYYZHAXA5HICQ57NFZCWWNM4ZJBYSV… | 0 | axfer | NaN | NaN | NaN | NaN | 100.0 | 2513338.0 | 0.0 | NaN | EBDDABI4F7IWCPLHMICNYKUCAEPCBFP5HXKUITS4SSSQKG… | NaN | NaN | NaN | NaN | NaN | NaN | NaN | jNSYv/8dclBfXkJU2JEDJIVj8rlcnLZRLdd1RYvJbfOWMQ… | NaN | NaN | 2020-08-01 | Saturday |
3 | 0 | 0 | 8195181 | 1000 | 8195134 | wGHE2Pwdvd7S12BL5FaOP20EGYesN73ktiC1qzkkit8= | mainnet-v1.0 | NS4Z2S3NCKMVZSY6VX4CKE7QTN7N5JLT7AVYQFJSRVRLUS… | 3 | 8196134 | gqZwVHhuSWTZJDY0ZThhOGUyLTBhMTItNDdlNC04N2Q3LW… | 0 | 1596254404 | BXUZYY5XN2PFLBJE3BX2RDMQLQUW5ZUWX4ZCNDUL6QMXJZ… | 0 | axfer | NaN | NaN | NaN | NaN | 0.0 | 2512768.0 | 0.0 | NaN | BXUZYY5XN2PFLBJE3BX2RDMQLQUW5ZUWX4ZCNDUL6QMXJZ… | NaN | NaN | NaN | NaN | NaN | NaN | NaN | FbhxqPDlZxDcaAELShez+rXNQuTR4rRVP4vpeDassKN0pt… | NaN | NaN | 2020-08-01 | Saturday |
4 | 0 | 0 | 8195181 | 1000 | 8195123 | wGHE2Pwdvd7S12BL5FaOP20EGYesN73ktiC1qzkkit8= | mainnet-v1.0 | 2S7TV7FI4TMPO7GXZI55PPW3FQE2PJVQ7F34LCS3C6Z7NW… | 4 | 8196123 | gqZwVHhuSWTZJGJmYjlhNGY4LTNmODgtNDQ1My04YWY5LW… | 0 | 1596254404 | 62FMJKL2BDSYWSF3RYYZHAXA5HICQ57NFZCWWNM4ZJBYSV… | 0 | axfer | NaN | NaN | NaN | NaN | 100.0 | 2513338.0 | 0.0 | NaN | M2UEUR6EP3JSI2BGGPXEXUIOTTJMNL7A6VSPE357N2CBFA… | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3QqczHlaOjymMeoEZK/ULKSbXv8pYP59AJqt4PpRrUFfnp… | NaN | NaN | 2020-08-01 | Saturday |
5.计算指标
现在有了经过处理的表,我们可以计算一些指标,洞察 Algorand 区块链的相关信息。
交易总数
len(df)
4687303
交易类型
df['tx-type'].value_counts()
axfer 3599820 pay 1087417 acfg 57 keyreg 9 Name: tx-type, dtype: int64
其中 axfer 是资产转移,pay 是支付,acfg 是资产配置,keyreg 是密钥注册。
有关交易类型的更多信息,可于此处获取。
这周转移的 algo 币总量
df['amount-pay-txn'].sum()
251299842396859.0
一周内各天的交易分布
df['day'] = pd.to_datetime(df['date']).dt.day_name()
df['day'].value_counts().plot()
转移的去重资产
df['asset-id-asset-txn'].nunique()
64
df['asset-id-asset-txn'].value_counts()
2512768.0 1699636 2513338.0 1237993 2513746.0 661888 312769.0 87 163650.0 25 438840.0 22 1942776.0 14 604.0 14 438839.0 12 ... 137489.0 1 Name: asset-id-asset-txn, dtype: int64
交易表提供资产 id,但不附带资产名称。在下一步里,我们抽取更多数据,获悉关于交易的更多信息。
6.抽取资产
类似交易抽取,我们创建一个函数来循环请求。
def get_txn_response():
nexttoken = ""
numtx = 1
responses = []
# Retrieve up-to 1000 assets at each request.
while numtx > 0:
response = myindexer.search_assets(next_page=nexttoken)
assets = response['assets']
responses += assets
numtx = len(assets)
if numtx > 0:
# pointer to the next chunk of requests
nexttoken = response['next-token']
return responses
assets = pd.DataFrame(get_txn_response())
assets.head()
index | params | |
---|---|---|
0 | 5 | {‘clawback’: ‘L226FSG3LTZR4V2MI5M4SDKJSF5HP2TQ… |
1 | 6 | {‘clawback’: ‘QYQYXRJ7FLQCRRGE3WH5CIAAL56LLMIN… |
2 | 235 | {‘clawback’: ‘QA75IQ76F6H2T55G65BY7BPLF5QNWSLT… |
3 | 236 | {‘clawback’: ‘QA75IQ76F6H2T55G65BY7BPLF5QNWSLT… |
4 | 237 | {‘clawback’: ‘QA75IQ76F6H2T55G65BY7BPLF5QNWSLT… |
正如上文中预处理所做的那样,我们解析带有 dict 字段的列:
assets = pd.concat([assets.drop(['params'], axis=1),
assets['params'].apply(pd.Series)], axis=1)
assets.head()
现在,我们将上面的资产值计数与资产名称组合在一起。
assets_counts = pd.DataFrame(df['asset-id-asset-txn'].value_counts()).join(assets[['index', 'name', 'unit-name']].set_index('index'))
assets_counts.head()
asset-id-asset-txn | name | unit-name | |
---|---|---|---|
2512768.0 | 1699636 | YouNow Pending Props | xPROPS |
2513338.0 | 1237993 | Camfrog Pending Props | xPROPS |
2513746.0 | 661888 | Paltalk Pending Props | xPROPS |
312769.0 | 87 | Tether USDt | USDt |
163650.0 | 25 | Asia Reserve Currency Coin | ARCC |
pd.DataFrame(assets_counts.groupby('unit-name')['asset-id-asset-txn'].sum()).nlargest(10, 'asset-id-asset-txn').plot()
总结
区块链上记录着大量信息,可从中找到很多极具吸引力的信息,而索引器是访问这些数据的利器。
深入了解如何使用索引器,积极分享您的见解