Learn more about the new data services coming to the network.
TL;DR: Learn more about each of the new data services that The Graph plans to launch on the network, including advanced data streaming services like Firehose and Substreams, Large Language Models (LLMs), new query languages, and verifiable data.
One of the objectives of The Graph’s New Era roadmap is to introduce new data services on the network. But what are these new data services and what benefits will they bring for web3 developers and indexers?
Before we get into the details of each data service, let’s look at where things stand today. Subgraphs, served by indexers, are currently the only data service on the network. As more data services launch, they will create new opportunities for indexers and give data consumers more ways to access and analyze blockchain data.
Join us for a detailed look at the World of Data Services as we explain each service and the data needs it serves.
(Image at right from The Graph blog)
A rich market of data services
What is Firehose?
Firehose is a tool developed by StreamingFast that extracts data from Firehose-enabled blockchain nodes and stores it in flat files. Flat files have a simple structure that makes them easy to parse, store, and read quickly, which suits real-time data processing. The result is a data source built for speed, efficiency, and scale.
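To make the flat-file idea concrete, here is a minimal sketch in Python. It is not how Firehose itself stores data: the one-JSON-object-per-line layout, the field names, and the sample blocks are all assumptions chosen for illustration. It only shows why a simple, append-friendly file format is easy to write once and stream back in order.

```python
import json
import tempfile
from pathlib import Path
from typing import Iterator


def write_blocks(path: Path, blocks: list[dict]) -> None:
    """Write each block as one JSON line (a simple flat-file layout)."""
    with path.open("w", encoding="utf-8") as f:
        for block in blocks:
            f.write(json.dumps(block) + "\n")


def stream_blocks(path: Path, start: int = 0) -> Iterator[dict]:
    """Yield blocks in file order, skipping those numbered below `start`."""
    with path.open("r", encoding="utf-8") as f:
        for line in f:
            block = json.loads(line)
            if block["number"] >= start:
                yield block


# Hypothetical sample data: block numbers and transaction counts are made up.
SAMPLE_BLOCKS = [
    {"number": 100, "tx_count": 12},
    {"number": 101, "tx_count": 7},
    {"number": 102, "tx_count": 31},
]


def demo() -> list[int]:
    """Write the sample blocks to a temporary file and stream from block 101."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "blocks.jsonl"
        write_blocks(path, SAMPLE_BLOCKS)
        return [b["number"] for b in stream_blocks(path, start=101)]
```

Because each record sits on its own line, a reader can process the file sequentially without loading it all into memory, which is the property the paragraph above describes.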
Why Firehose?
Firehose serves the need for real-time data streaming and processing, offering a straightforward, scalable way to handle large volumes of blockchain data. It is designed for the heavy demands of modern applications that need a reliable, continuous flow of detailed blockchain data.
What’s an example?
Firehose can be used for projects like large-scale DeFi platforms, where efficient and rapid processing of vast amounts of blockchain transaction data is crucial for real-time analytics and decision-making.
For more detailed information on Firehose, check out our playlist, Deep Dives with Pinax’s Matthew Darwin.
What is Substreams?
Substreams (also developed by StreamingFast) is a data pipeline component designed to build on and extend the capabilities of Firehose. You can think of Substreams as the next layer, which comes into play after Firehose has collected the data.
Substreams transforms or filters the incoming data from Firehose. Once processed, the refined data is ready to be loaded into and queried from a subgraph, database, or data service.
Why Substreams?
Substreams is parallelizable and highly composable.
- Parallelization means large volumes of data can be processed at the same time, which significantly speeds up blockchain data processing.
- Composability means Substreams modules can be easily combined with one another, which helps create versatile data workflows.
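The two properties above can be sketched in plain Python. This is an illustration of the ideas only, not the Substreams API: the module names, the block fields, and the sample values are hypothetical, and a thread pool is used just to show that independent chunks can be handled separately.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Block = dict
Module = Callable[[list[Block]], list[Block]]


def only_large_transfers(blocks: list[Block]) -> list[Block]:
    """Filter module: keep blocks whose value is at least 100."""
    return [b for b in blocks if b["value"] >= 100]


def add_fee(blocks: list[Block]) -> list[Block]:
    """Transform module: attach a 1% fee field to every block."""
    return [{**b, "fee": b["value"] // 100} for b in blocks]


def compose(*modules: Module) -> Module:
    """Composability: chain modules so each one feeds the next."""
    def pipeline(blocks: list[Block]) -> list[Block]:
        for module in modules:
            blocks = module(blocks)
        return blocks
    return pipeline


def process_in_parallel(blocks: list[Block], pipeline: Module,
                        chunk_size: int = 2) -> list[Block]:
    """Parallelization: split the history into chunks and process each one."""
    chunks = [blocks[i:i + chunk_size] for i in range(0, len(blocks), chunk_size)]
    with ThreadPoolExecutor() as pool:
        # Executor.map returns results in the same order as the input chunks.
        results = list(pool.map(pipeline, chunks))
    return [b for chunk in results for b in chunk]


# Hypothetical sample data, invented for this sketch.
BLOCKS = [
    {"number": 1, "value": 50},
    {"number": 2, "value": 150},
    {"number": 3, "value": 200},
    {"number": 4, "value": 80},
    {"number": 5, "value": 1000},
]

result = process_in_parallel(BLOCKS, compose(only_large_transfers, add_fee))
```

The key design point is that each module works on its chunk without needing state from other chunks, so the same pipeline can run over many block ranges at once.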
You can leverage Substreams without a heavy infrastructure investment by using a provider like Pinax. Then you can have all the advantages of the technology without any of the hassle, as we maintain the infrastructure required to process blockchain history with hundreds of CPU cores in parallel.
What’s an example?
You can use Substreams to index your subgraph much faster, making it a Substreams-powered subgraph. The StreamingFast team announced they were able to extract the entirety of the Uniswap V3 data and send it to a subgraph in just 20 hours, a process that previously took two months.
For more, check out The Graph’s blog post, How The Graph Powers Dapps with Subgraphs, Firehose, & Substreams.
For practical examples of how Substreams can be leveraged, check out our Substreams playlist.
💡 If you’re following The Graph, you’ve probably already heard about Firehose and Substreams being key technologies. We’ve had Substreams-powered subgraphs on the network for some time. But what does it mean to bring Firehose and Substreams to the network as data services? Rather than just being something indexers use to make their subgraphs more efficient, data consumers will be able to interact with these directly as services.
What are new query languages?
A query language is a type of computer language used to make queries or requests to retrieve information from databases and information systems. It allows you to manipulate, store, and modify data stored in a database. The most common example of a query language is SQL (Structured Query Language).
How are they used?
The Graph uses another query language, GraphQL, that enables you to retrieve specific data from the blockchain. You can write precise queries to access indexed data from various blockchains, making it easier to run and build decentralized applications (dApps).
Adding SQL as a query language matters because many existing tools are already compatible with it, so data analysts can access and query the data they need without leaving their familiar workflows.
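As a rough illustration of the GraphQL side, the sketch below builds the body of a GraphQL request in Python. The entity name `swaps` and the field `amountUSD` are hypothetical, since every subgraph defines its own schema, and no request is actually sent.

```python
import json

# Hypothetical entity and field names; a real subgraph defines its own schema.
QUERY = """
query LargestSwaps($first: Int!) {
  swaps(first: $first, orderBy: amountUSD, orderDirection: desc) {
    id
    amountUSD
  }
}
"""


def build_request_body(query: str, variables: dict) -> str:
    """Serialize a GraphQL request as the JSON body of an HTTP POST."""
    return json.dumps({"query": query, "variables": variables})


# Ask for the five largest swaps; the endpoint URL is deliberately left out.
body = build_request_body(QUERY, {"first": 5})
```

The query names exactly the fields it wants back, which is what makes GraphQL queries precise: the response contains those fields and nothing else.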
What’s an example?
Semiotic Labs and StreamingFast are collaborating to create a SQL manifest for The Graph protocol. This manifest will guide indexers on synchronization and SQL service specifics, including hardware, software, and data transformation requirements. This initiative, including a standard API for queries, will enable developers to efficiently build applications using blockchain data directly on The Graph.
For more information on this SQL data service, refer to the Semiotic Labs update on The Graph Forum.
What are LLMs?
Large Language Models (LLMs) like ChatGPT are revolutionizing the way we access and analyze data. These deep learning models process vast amounts of data, including text from books and articles, to learn and effectively use human language for different tasks.
What do they do?
Large Language Models (LLMs) are not just about interpreting and generating human-like text; they are also powerful tools for data interaction. For instance, in the context of The Graph, LLMs could transform natural language queries into structured database queries. This means you could ask, “What were the top 5 tokens Vitalik sold in 2024?” and the LLM would translate this into a suitable SQL command.
Instead of users interfacing directly with The Graph, the LLM would serve as an intermediary that interprets user intent and formulates precise database queries to retrieve the exact information needed, complete with visual representations like bar charts.
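To show what the translated query might look like, here is a small sketch using Python’s built-in sqlite3 module. Everything in it is an assumption for illustration: the `token_sales` table, its columns, the sample rows, and the seller names are invented, and the SQL is simply the kind of statement an LLM could plausibly generate for a "top tokens sold in 2024" question.

```python
import sqlite3

# Hypothetical schema and rows, invented for illustration only.
SCHEMA = """
CREATE TABLE token_sales (
    seller TEXT, token TEXT, amount_usd REAL, sold_at TEXT
);
"""
ROWS = [
    ("alice.eth", "AAA", 500.0, "2024-01-10"),
    ("alice.eth", "BBB", 1200.0, "2024-03-02"),
    ("alice.eth", "AAA", 900.0, "2024-06-15"),
    ("alice.eth", "CCC", 300.0, "2023-12-31"),
    ("bob.eth", "BBB", 5000.0, "2024-02-01"),
]

# The kind of SQL an LLM might produce for
# "What were the top 5 tokens <seller> sold in 2024?"
GENERATED_SQL = """
SELECT token, SUM(amount_usd) AS total_usd
FROM token_sales
WHERE seller = ? AND sold_at >= '2024-01-01' AND sold_at < '2025-01-01'
GROUP BY token
ORDER BY total_usd DESC
LIMIT 5;
"""


def top_tokens(seller: str) -> list[tuple]:
    """Load the sample rows into an in-memory database and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        conn.executemany("INSERT INTO token_sales VALUES (?, ?, ?, ?)", ROWS)
        return conn.execute(GENERATED_SQL, (seller,)).fetchall()
    finally:
        conn.close()
```

Calling `top_tokens("alice.eth")` returns the tokens ranked by total 2024 sales. The seller is passed as a bound parameter rather than pasted into the SQL string, which is a sensible precaution whenever generated queries include user input.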
What’s an example?
Sam Green and his team at Semiotic Labs have been working on a project called Agentc, which brings SQL/analytics queries to The Graph. As shown in some of the demos, you can perform a DEX swap with Agentc just by typing in the chat. Enter the command, “Average price of ETH last week,” and Agentc runs the query behind the scenes and displays charts. This is similar to ChatGPT, but for blockchain data. You can sign up for the waiting list for Agentc.
What are files?
The Graph’s ecosystem also covers files through the File Hosting Service (FHS). FHS, a part of The Graph Network’s World of Data Services, provides decentralized, peer-to-peer file sharing with IPFS integration and the HTTP/2 protocol for secure, efficient transfers.
Why FHS?
FHS is designed for everyone from blockchain indexers to general users interested in decentralized file sharing. For more details on its features, scalability, and user-friendly interface, check out the GraphOps GitHub repo. This holistic approach ensures robust, scalable, and secure file services within our dynamic environment.
Check out our recap of Indexer Office Hours #140 for updates on FHS.
What’s an example?
Blockchain and Firehose operators often need to exchange terabytes of blockchain history files. Currently, this process depends on the goodwill of peers to share these files. Integrating FHS into The Graph introduces an incentive for peer-to-peer (P2P) file sharing, making files more accessible and streamlining the exchange process.
What is Verifiable Firehose?
Transparency and trust are the cornerstones of decentralized services. The Verifiable Firehose builds upon the standard Firehose by adding an extra layer of data verification, using Product Tree Queries and an Optimistically Verifiable Commitment Protocol.
What does it do?
The Verifiable Firehose strengthens Firehose by allowing users to independently verify the authenticity and integrity of the data they receive. This assurance layer reduces the need to rely on third-party validation of blockchain data accuracy and integrity.
What’s an example?
While working on The Graph protocol, the Semiotic Labs team unexpectedly found that their technology could also solve a big challenge posed by Ethereum’s EIP-4444. Read the full research post from Semiotic Labs on Ethereum’s forum. You can also check out Pinax’s blog post, Enhancing Ethereum: The Graph’s Answer to EIP-4444.
What are non-deterministic data sources?
Deterministic systems produce consistent, predictable outcomes from the same inputs, while non-deterministic systems can yield varied results even with identical starting conditions. Currently, The Graph uses mostly deterministic data sources so that users get predictable and repeatable query results.
How will they be used?
The Graph will expand its capabilities to include non-deterministic data sources. This opens up new avenues for analysis and insights by drawing on data from a variety of sources that are not bound by the constraints of blockchain technology.
What are the possible applications?
One example is Farcaster, a social network that operates independently of on-chain mechanisms.
The Graph’s reach will extend to data available through conventional web protocols, like HTTP. This means that any information accessible via standard web pages, regardless of its hosting network, can be integrated and analyzed. Unlike data on IPFS, which is identifiable by specific hashes, The Graph could interact with data stored in various formats across different networks, even if they don’t adhere to the hash-based identification system typical of blockchain technologies.
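The difference between hash-identified data and ordinary web data can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: a plain SHA-256 hex digest stands in for an IPFS content identifier (real CIDs use a different encoding), and `MutableEndpoint` is a hypothetical in-memory stand-in for a web page whose content can change.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Derive an identifier from the content itself (simplified stand-in for a CID)."""
    return hashlib.sha256(data).hexdigest()


class MutableEndpoint:
    """Hypothetical stand-in for a URL: the same address can serve new content."""

    def __init__(self, body: bytes) -> None:
        self.body = body

    def fetch(self) -> bytes:
        return self.body


# Deterministic: identical bytes always map to the identical identifier.
original = b"hello from block 100"
cid = content_id(original)
same_cid = content_id(b"hello from block 100")

# Non-deterministic: fetching the same endpoint twice can give different data.
endpoint = MutableEndpoint(b"page v1")
first = endpoint.fetch()
endpoint.body = b"page v2"  # the site owner updates the page
second = endpoint.fetch()
```

With a content hash, anyone can check that the bytes they received match the identifier they asked for. With a plain URL there is no such built-in check, which is why such sources are treated as non-deterministic.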
What are other data services?
The Graph’s expansion includes a World of Data Services feature, enabling developers to seamlessly add and integrate a variety of new data services into the network. This initiative paves the way for a more dynamic and versatile ecosystem, catering to the evolving needs of data consumption in web3.
In a recent Forum post, Pablo Vélez shared more in-depth details of Graph Horizon, outlining how it will enable the permissionless addition of new types of data services.
What are chain integrations?
The Chain Integration Process (CIP) is a community-driven method that simplifies and accelerates the addition of new blockchains to the network. The process allows anyone in the community to propose new chains to integrate.
What do they achieve?
The goal is to bring more chains to The Graph ecosystem. Integrating multiple blockchains is no small feat, but we believe the future is multi-chain. Chain integrations are crucial in creating a unified and interoperable ecosystem where data from disparate sources can be accessed harmoniously.
How do I add a chain?
You can add your chain by following the integration process. Pinax can help new chains integrate, as outlined in our article, Navigating the Chain Integration Process.
Learn more about the multi-chain expansion process on The Graph’s blog and visit the New Chain Integrations channel on The Graph Forum to view projects as they start the process.
What is multi-chain Firehose integration?
The culmination of these efforts is the multi-chain Firehose integration. By unifying data services across chains, The Graph is setting the stage for a future where developers are not limited by the boundaries of any single blockchain, but can instead draw from a diverse palette of data sources.
How do I add a chain?
Currently, Pinax offers Firehose endpoints for all chains listed on our website. To find out how to add your chain, contact us.
Stay tuned
The Graph is equipping developers and indexers with vital tools for next-gen decentralized apps. Each step forward shapes the industry’s future. Watch for updates on new data services on the New Era roadmap.
What data service excites you the most? Leave us a comment to let us know.
Together, we’ll forge a path into a future where data is not just available but is also meaningful, verifiable, and, most importantly, within your control.
💡 This article answers questions like:
- What new data services are coming to The Graph Network?
- What is Firehose?
- What is Substreams?
- What are Large Language Models (LLMs)?
- What is AI-assisted querying?