TL;DR: Substreams and ClickHouse revolutionize blockchain data handling, as demonstrated by Pinax co-op students Maryam and Syed in their insightful bulk-loading project.
Handling and analyzing vast amounts of blockchain data is a challenge in a rapidly evolving digital landscape. Substreams and ClickHouse are two powerful technologies that promise to revolutionize the way we manage and analyze blockchain data.
In this post, we delve into an insightful video featuring Maryam and Syed, co-op students at Pinax, who walk us through their project of bulk-loading blockchain data into ClickHouse.
What is Substreams?
Substreams is a specialized indexing technology designed for blockchain data. It allows you to categorize and segment blockchain information, making it easier to handle and analyze.
With Substreams, developers can extract specific data types from blockchains, apply customized transformations, and send data to various destinations, known as sinks. This powerful tool is essential for effective blockchain data management and analysis.
Understanding ClickHouse
ClickHouse is an open-source columnar database management system optimized for real-time analytics. It’s renowned for its high-speed query performance, scalability, and robust support for handling large amounts of data. Its architecture, designed for analytical queries, makes it an ideal choice for processing and analyzing blockchain data.
To learn more about ClickHouse and its capabilities, refer to our Quickstart Guide: ClickHouse 101.
Syncing Substreams with ClickHouse
The Substreams SQL sink lets you stream outputs from Substreams modules into ClickHouse. It reduces manual configuration and setup time by specifying the database connection details and setting up the necessary tables within ClickHouse, allowing for more focus on data analysis.
For an in-depth look at how ClickHouse can simplify real-time blockchain analytics, check out Simplify Real-Time Blockchain Analytics with ClickHouse.
The data pipeline
The data pipeline consists of several stages:
- Extract: Firehose extracts blockchain data into flat files.
- Transform: Substreams modules transform Firehose blocks and cache data on a server.
- Load: Substreams SQL sink loads data from Substreams backend into ClickHouse.
- Query: Execute SQL queries on the ClickHouse database for analysis.
Project highlights
Maryam and Syed aimed to quickly insert large amounts of blockchain data into ClickHouse. After exploring various methods, they found the most efficient approach was to buffer data locally and send it as large inserts using the Substreams SQL sink. They achieved impressive throughput rates, demonstrating the scalability and efficiency of their approach.
Looking ahead
The team plans to optimize their methods further, focusing on chains like EOS, WAX, and Telos. They intend to expand to more chains and explore other blockchain tech stacks and APIs, such as ERC-20 tokens and inscriptions.
Getting started
For those interested in trying out these tools, the GitHub repository for the Substreams sink SQL tool provides an example setup. You’ll need an API key, but a free tier allows you to follow along with the example and run the Substreams sink SQL on your own data.
Bottom line
The combination of Substreams and ClickHouse offers a powerful solution for managing and analyzing blockchain data. Maryam and Syed’s project is proof of the potential of these technologies, and their work provides a valuable guide for others looking to leverage these tools. For more detailed insights, you can watch the full video above.
Stay tuned for more updates and advancements in the exciting field of blockchain data management!
💡 This article answers questions like:
- What is Substreams?
- What is ClickHouse used for?
- How can a developer use Substreams with ClickHouse?
- What is a Substreams sink?
No Comments