Substreams 101: A Novice’s Introduction

Substreams | Feb 28, 2024

Last Updated on March 6, 2024 by Pinax Team

Learn the fundamentals of Substreams, a robust blockchain data indexing solution.

TL;DR: For those new to Substreams, learn what it is and how it's making blockchain data more accessible for developers.

Are you looking for the best blockchain data indexing solution? Wondering how to efficiently extract and manage blockchain data?

Get started with Substreams!

This is the first in a series of articles that will take you from Substreams novice to master.

The problem with accessing blockchain data

Building data-centric applications is challenging, and blockchain data makes it harder. Extracting that data is complex, and extracting it quickly and reliably is harder still because of the linear and distributed nature of blockchains.

Substreams is the solution

Few solutions tackle this problem. StreamingFast, a team that specializes in tools for handling blockchain data, built Substreams to make processing and indexing blockchain data fast and reliable.

Let’s take a look at what Substreams is and how it’s making blockchain data more accessible.

What’s Substreams?

Substreams is a powerful blockchain data indexing technology developed for The Graph Network by StreamingFast. It lets developers extract data from the blockchain, apply custom transformations to meet their application’s needs, and send the processed data to a variety of destinations, such as PostgreSQL, ClickHouse, MongoDB, and many more.

How does Substreams work?

Substreams involves two main components: a Substreams provider and a Substreams package. Let’s take a closer look at each:

  • Substreams provider: The Substreams provider stores and delivers blockchain data. These providers (like Pinax) use Firehose, a blockchain-agnostic, high-performance data extraction engine developed by StreamingFast, to extract blockchain data efficiently.
  • Substreams package: A Substreams package is a small Rust program compiled into WebAssembly that defines the transformations the developer wants to apply to the data. The developer sends the Substreams package to the Substreams provider in a gRPC request; the provider then executes it and streams back the transformed data. Additionally, the developer can send the data to other destinations as needed.

At present, Substreams can only be built using Rust, but the StreamingFast team has plans to enable developers to build Substreams in Golang and TypeScript in the near future.
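
To make the idea of a transformation concrete, here is a minimal sketch in plain Rust. It does not use the real Substreams crate or its handler macros: the `Block` and `Transfer` structs, the `map_large_transfers` function, and the `run_over_blocks` helper are all hypothetical stand-ins. The sketch only illustrates the shape of the work, which is a function that receives one block and returns the subset of data the application needs, applied block after block.

```rust
// Hypothetical, simplified types: real Substreams modules receive
// chain-specific Protobuf blocks from the provider, not these structs.
#[derive(Debug, Clone, PartialEq)]
pub struct Transfer {
    pub from: String,
    pub to: String,
    pub amount: u64,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Block {
    pub number: u64,
    pub transfers: Vec<Transfer>,
}

/// A "map"-style transformation: takes one block and returns only the data
/// the application cares about (here, transfers at or above a threshold).
pub fn map_large_transfers(block: &Block, min_amount: u64) -> Vec<Transfer> {
    block
        .transfers
        .iter()
        .filter(|t| t.amount >= min_amount)
        .cloned()
        .collect()
}

/// Stand-in for the provider: runs the transformation over each block in
/// order and keeps the per-block output, skipping blocks with no matches.
pub fn run_over_blocks(blocks: &[Block], min_amount: u64) -> Vec<(u64, Vec<Transfer>)> {
    blocks
        .iter()
        .map(|b| (b.number, map_large_transfers(b, min_amount)))
        .filter(|(_, out)| !out.is_empty())
        .collect()
}
```

In an actual package, the transformation would be compiled to WebAssembly and executed by the provider, and its inputs and outputs would be Protobuf messages rather than these illustrative structs.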

Three ways to use Substreams

Developers have three options for working with Substreams: consume pre-built packages, build their own, or extend existing ones.

  1. Consuming Substreams: The easiest way to utilize Substreams is by using the pre-built Substreams packages available on the Substreams Registry, a one-stop destination for discovering and sharing Substreams packages. You can select the package that meets your needs and seamlessly stream data to your preferred destination.
  2. Building Substreams: If you can’t find a suitable Substreams package in the Substreams Registry, you can create your own. Once developed, you can publish these packages to the registry, making them available for others to use.
  3. Extending Substreams: You can also leverage existing Substreams modules from the registry and build new Substreams modules on top of them, producing entirely new datasets. This approach allows for the customization and expansion of Substreams functionality to suit specific requirements.

This collaborative approach fosters a vibrant ecosystem where developers can contribute their solutions and benefit from the collective knowledge and innovation within the community.
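
The third option, building on top of an existing module, can be sketched as ordinary function composition. The example below is plain Rust, not the real Substreams API: `upstream_map_transfers` stands in for a module someone else published, `map_totals_received` is a hypothetical new module that consumes its output, and the `Transfer` type is an assumption made for illustration.

```rust
use std::collections::BTreeMap;

// Hypothetical output type of an existing, published module.
#[derive(Debug, Clone, PartialEq)]
pub struct Transfer {
    pub to: String,
    pub amount: u64,
}

/// Stand-in for a module you did not write: it turns raw block data
/// (modeled here as `(recipient, amount)` pairs) into typed transfers.
pub fn upstream_map_transfers(raw: &[(&str, u64)]) -> Vec<Transfer> {
    raw.iter()
        .map(|(to, amount)| Transfer { to: to.to_string(), amount: *amount })
        .collect()
}

/// Your new module: consumes the upstream output rather than raw blocks
/// and derives a new dataset, the total received per account.
pub fn map_totals_received(transfers: &[Transfer]) -> BTreeMap<String, u64> {
    let mut totals = BTreeMap::new();
    for t in transfers {
        *totals.entry(t.to.clone()).or_insert(0) += t.amount;
    }
    totals
}
```

The new module never touches raw block data. It depends only on the upstream module's output type, which is what lets the upstream work be reused rather than rewritten.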

Benefits of using Substreams

Substreams offers developers many advantages when indexing and querying blockchain data. Here are a few of them:

  • Speed: Substreams prioritizes speed with a parallelized architecture and streaming-first design, ensuring efficient blockchain data indexing.
  • Composability: Substreams provides composability, enabling developers to easily use each other’s code or modules for creating complex indexing pipelines.
  • Reusability: Substreams emphasizes reusability, enabling you to use pre-built Substreams available on the Substreams Registry for your indexing tasks.
  • Custom sinks: Substreams supports custom sinks, allowing seamless integration with your preferred data storage or analytics solutions.
  • Shift blockchain data indexing to a provider: Substreams allows you to offload the heavy lifting of blockchain indexing to a service provider like Pinax. A provider can scale on request and sink the data into a variety of databases, relieving you of the need to run expensive indexing nodes yourself.
  • Strong community support: Despite being a new technology, Substreams has attracted significant attention from developers, with the numbers steadily increasing. At Pinax, we have a Discord Community, in addition to the StreamingFast Discord Community, to support you if you’re looking to work with Substreams.
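
As a rough illustration of the custom sinks point above, a sink can be thought of as anything that accepts streamed records and stores them somewhere. The sketch below is plain Rust with hypothetical names throughout (`Record`, `Sink`, `MemorySink`, `drain_into`); it is not the interface of any official Substreams sink.

```rust
// Hypothetical record type standing in for a module's output.
#[derive(Debug, Clone, PartialEq)]
pub struct Record {
    pub block_number: u64,
    pub payload: String,
}

/// Hypothetical interface: anything that can accept streamed records.
pub trait Sink {
    fn write(&mut self, record: &Record) -> Result<(), String>;
}

/// In-memory sink; a real one would write to a database, file, or queue.
#[derive(Default)]
pub struct MemorySink {
    pub rows: Vec<Record>,
}

impl Sink for MemorySink {
    fn write(&mut self, record: &Record) -> Result<(), String> {
        self.rows.push(record.clone());
        Ok(())
    }
}

/// Drains a stream into a sink and returns the last block number written,
/// which a consumer could persist in order to resume from that point later.
pub fn drain_into<S: Sink>(
    stream: impl IntoIterator<Item = Record>,
    sink: &mut S,
) -> Result<Option<u64>, String> {
    let mut last = None;
    for record in stream {
        sink.write(&record)?;
        last = Some(record.block_number);
    }
    Ok(last)
}
```

Keeping the storage target behind a small trait means the same streaming loop can feed different destinations by swapping the `Sink` implementation.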

Learn & explore more


If you’re keen on delving further into Substreams, we’ve curated some resources to help you:

💡 This article answers questions like:
- What is Substreams?
- How does Substreams make it easy to extract blockchain data?
- What are the benefits of using Substreams?
- Where can a developer learn more about Substreams?

Author

I am a seasoned technical writer with a passion for simplifying complex tech topics, especially tech around blockchain and web3. Beyond writing, I also love coding as a hobby, with Rust being a particular favorite. This dual passion enables me to bridge the gap between technical intricacies and accessible content.

2 Comments

  1. louis says:

    This tech is set to shake up the blockchain space, perfect timing with that little run we’re living now. Subgraphs were already a game-changer, revolutionizing data querying, but now, with Substreams-powered subgraphs, we’ve taken a giant leap forward. Great article by the way Ujjwal.

  2. Bina says:

    Thanks for sharing.
