TL;DR: Key points introduced during the open discussion about GIP-0081: Indexing Payments:
- The motivation for this proposal is to design a developer-friendly way to get subgraphs indexed.
- Indexing agreements will stipulate the subgraph to be indexed and the price per unit of “work done” indexing.
- The proposal includes two clauses to protect the gateway: a maximum amount for initial syncing (higher cost) and another for ongoing indexing per epoch (lower cost).
- The system will be automated: indexers set minimum payment requirements for stored entities and blocks indexed per chain, and indexing agreements are automatically accepted/rejected based on these parameters.
- The Edge & Node team is actively seeking indexer input on operational costs to help determine fair pricing.
Opening remarks
Hello everyone, and welcome to episode 190 of Indexer Office Hours!
GRTiQ 203
Catch the GRTiQ Podcast with Justin Banon, founder of Boson Protocol. Boson Protocol is building decentralized commerce infrastructure that aims to replace traditional e-commerce intermediaries with minimally extractive protocols for web3.
Repo watch
The latest updates to important repositories
Execution Layer Clients
- sfeth/fireeth: New release v2.8.4:
- Date: 2025-01-10 15:47:36 UTC
- This release addresses critical issues affecting data handling and performance, including fixes for gzip compression detection and tier2 request generation that could impact job scheduling. A thread leak fix is also included, which improves connection reliability. Support for zstd encoding is added for improved data compression.
- Urgency indicator: Yellow
- Urgency reason: Important fixes improve system performance.
- arbitrum-nitro: New release v3.3.2:
- Date: 2025-01-08 04:29:35 UTC
- This release updates the Docker image to fix an issue with flatcalltracer. Users must adjust their entrypoint flags accordingly and use a specialized image if running a validator.
- Urgency indicator: Yellow
- Urgency reason: Fix for important functionality issue.
Protocol watch
The latest updates on important changes to the protocol
Forum Governance
- Request for information about disputes #GDR-24
- Request for information about disputes #GDR-25
- Request for information about disputes #GDR-22
Forum Research
Core dev updates:
- Semiotic January 2025 Update
- StreamingFast January 2025 Update
- Edge & Node’s December 2024 / January 2025 Update
- Pinax January 2025 Update
- Messari January 2025 Update
- Geo January 2025 Update
- GraphOps Update January 2025
Contracts Repository
- docs: remove e2e badge from README.md #1086 (merged)
- ci: fix e2e-contracts.yml #1067 (merged)
- fix: cleanup IPaymentCollector and TAPCollector docs #1083 (merged)
Open discussion [6:18]
Indexing Payments GIP-0081
Matias, Protocol Engineer at Edge & Node, is here to introduce GIP-0081: Indexing Payments.
Some of the following content is taken from the presentation slides, with additional details from Matias. Some of the comments have been lightly edited and condensed.
Matias | Edge & Node: GIP-0081: Indexing Payments is up in The Graph Forum. Please check it out if you’re interested.
What is GIP-0081?
GIP-0081 is a proposed mechanism by which a gateway can pay an indexer to serve a subgraph.
- A gateway pays the indexer in GRT.
- Payment in GRT was raised as a concern in earlier versions of this presentation, so I want to be upfront that GRT is still what is being proposed.
- Builds on top of Graph Horizon primitives.
- It requires Graph Horizon to be deployed. So, if the proposal were to be accepted, the delivery timeline for this proposal would be after Graph Horizon.
I mentioned serving a subgraph, and there’s a mechanism to coordinate that interaction. So, what is the current way of doing this from a subgraph developer’s perspective?
The Curation Way
Today, a subgraph developer needs to:
- Research how much signal is required to get the subgraph indexed.
- Source the GRT (from a grant, own capital, etc.).
- Curate the subgraph with that GRT.
- Monitor performance and adjust signal accordingly.
- There are dynamics that can mean that the same amount of signal will perform differently at different times, so it’s up to the subgraph developer to monitor this constantly and in perpetuity. This is a lot of complexity for them that we are trying to remove with this proposal.
Although the curation way that I’ve just described has some drawbacks, it is undeniably the underpinning of the whole indexing market as we know it today. So this is not going away; it is not being modified in any way, shape, or form by this proposal. GIP-0081 complements this in the following ways that I’ll explain now.
Motivation for GIP-0081
- To come up with a developer-friendly way to get subgraphs indexed.
- Characteristics of developer-friendly:
- Regular and predictable billing (OpEx instead of CapEx). For example, developers would have a monthly payment they can understand instead of having capital parked in curation.
- Tunable performance (consumer choice): a choice in how performant their subgraph needs to be. Performance can be measured in latency, error rate, availability zones, and so on. We would like the mechanism to take those into account.
- Developers pay a gateway in fiat. Mainly this is a convenience thing because developers are already paying the Edge & Node gateway in fiat, so it’s simple for them to understand.
- Fully managed: developers don’t have to set up their own monitoring and adjust their own curation or amount of money allocated to serving a subgraph. This is something the gateway can bring onto itself and hopefully simplify the process by absorbing that complexity.
- Complementary to curation + rewards: we would like to think of GIP-0081 as supplemental income for indexers and, at the same time, an alternative for subgraph developers.
Indexing vs. Query Time Concerns
- The work required to index a subgraph is unknown in advance (neither the subgraph developer nor the indexer knows this).
- Indexers can report the amount of “work done” after they’ve done the indexing.
- Query performance can only be assessed after the service has been rendered.
- Gateways can assess indexer performance at query time, so they are uniquely positioned to select indexers based on that.
Indexing Agreement (Simplified)
Underpinning this whole mechanism so that gateways can pay indexers to serve a subgraph is what we’re calling an indexing agreement. Its most simplified version has these items:
- Stipulates which subgraph has to be indexed.
- Sets a price per unit of “work done” indexing (proxy, compute unit, subgraph gas).
Clauses to protect the gateway:
- Maximum amount payable for the initial indexing (GRT).
- Maximum amount payable for the ongoing indexing per epoch (GRT).
- Since neither party knows at the time of the agreement how much indexing a subgraph will cost, we want to set an upper bound on both the initial indexing and the ongoing indexing per epoch. The indexer knows that if they cross the bound, they need to stop indexing, since they can no longer expect payment beyond that point. The gateway can limit their escrow balance, or the amount they're willing to pay to get a subgraph indexed, to a certain amount.
- If the indexer were to hit the maximum amount, the gateway would have to pay out that amount and wouldn't be able to run queries against that subgraph. But they are protected from indexing costs going to infinity.
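As a rough illustration of these two clauses, here is a minimal Python sketch of an agreement with its caps. Every field name and number below is invented for illustration; the GIP does not specify this schema.

```python
from dataclasses import dataclass

@dataclass
class IndexingAgreement:
    # Hypothetical fields mirroring the simplified agreement described above.
    subgraph_deployment: str          # which subgraph must be indexed
    price_per_unit_grt: float         # GRT per unit of "work done"
    max_initial_grt: float            # cap for the initial sync (higher)
    max_ongoing_grt_per_epoch: float  # cap for ongoing indexing per epoch (lower)

def payable(agreement: IndexingAgreement, work_units: float, initial_sync: bool) -> float:
    """Amount owed for reported work, bounded by the relevant protective clause."""
    cap = agreement.max_initial_grt if initial_sync else agreement.max_ongoing_grt_per_epoch
    return min(work_units * agreement.price_per_unit_grt, cap)

a = IndexingAgreement("Qm...", price_per_unit_grt=0.5,
                      max_initial_grt=5000.0, max_ongoing_grt_per_epoch=50.0)
print(payable(a, work_units=100_000, initial_sync=True))   # 5000.0 (50,000 owed, capped)
print(payable(a, work_units=40, initial_sync=False))       # 20.0 (under the 50 GRT cap)
```

The cap is why, as noted above, an indexer that hits the bound should stop indexing: additional work units no longer increase the payable amount.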
How It Works (😀 Path)
- Gateway escrows funds.
- Gateway offers indexing agreements to indexers based on a selection algorithm (off-chain).
- Indexer accepts indexing agreements (on-chain).
- Indexer posts Proof of Indexing (POI) + declared “work done” to collect payment from escrow.
- Gateway monitors the performance of each agreement and adjusts accordingly.
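The happy path above can be sketched as a toy escrow flow. This is purely illustrative Python under assumed names; the real mechanism is mediated by contracts and, in the MVP, query RAVs.

```python
class Escrow:
    """Toy escrow account: the gateway deposits, the indexer collects
    against reported work (all names here are hypothetical)."""
    def __init__(self) -> None:
        self.balance = 0.0

    def deposit(self, amount: float) -> None:
        self.balance += amount

    def collect(self, amount: float) -> float:
        if amount > self.balance:
            raise ValueError("insufficient escrow")  # gateway under-funded
        self.balance -= amount
        return amount

escrow = Escrow()
escrow.deposit(1000.0)        # 1. gateway escrows funds
# 2-3. agreement offered off-chain and accepted on-chain (not modeled here)
paid = escrow.collect(42.0)   # 4. indexer posts POI + "work done", collects payment
print(paid, escrow.balance)   # 42.0 958.0
```

Step 5, monitoring, happens continuously on the gateway side and feeds into the selection algorithm described below.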
How It Works (😭 Paths)
- Agreements can be rejected by indexers.
- POIs and “work done” can be disputed.
- Agreements can be canceled by either indexer or gateway.
- For example, an agreement may be canceled if the performance is not what is expected by the gateway.
Indexing Indexer Selection Algorithm (IISA)
This is meant to be an algorithm implemented by the gateway to select which indexers receive agreements for which subgraphs. There's a lot of power in this dynamic: the gateway has a lot of control over it. The proposal includes a canonical implementation, although, as an off-chain component, it's not strictly part of the GIP.
The IISA needs to:
- Balance decentralization with performance.
- Decentralization as a network concern and performance as a data consumer concern.
- We want to make sure that the highest performing indexers get the most agreements but we also want to make sure that new indexers get to participate in the network.
- Take into account:
- Quality of Service:
- Latency
- Uptime
- Error rate
- Etc.
- Available stake
- Other
The gateway, sitting in the middle of the relationship between data consumer and indexer, and aggregating a lot of the query volume, has a unique view into past performance by indexers and current performance in aggregate, and gateways can use those data points to best select indexers moving forward.
So far, I've been describing the on-chain part of the indexing payment mechanism. Because that's tied to Graph Horizon, which is not coming out in Q1, we're working towards a minimum viable product that doesn't require new contracts to be deployed or any new on-chain interaction, so we can test a lot of these ideas.
Minimum Viable Product (MVP)
- Requires update to the indexer stack, plus opt-in configuration (so the stack can accept indexing agreements).
- Requires upgrade to the Edge & Node gateway.
- Payment collection is done via query RAVs.
- “Work done” proxy based on:
- Entities stored
- Blocks indexed
- When reporting how much work an indexer did to index a subgraph, they will need to report how many entities that subgraph has in storage for that particular epoch and how many blocks they had to index within that epoch.
- We’re working towards setting a price for those two items.
- This proxy is only meant to be temporary.
- Trust implications: Being off-chain means that the indexer will have to trust that the gateway will pay for the indexing agreements, and the gateway will have to trust that the indexer reports the correct amount of work done.
- We will keep the MVP to a small set of users to test things out, and then slowly, as we gain confidence and secure it more, we’ll roll it out to a wider set of indexers. At the same time, we’ll work fast towards the on-chain indexing payments so that everyone can participate.
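To make the "work done" proxy concrete, here is a hedged sketch of how an epoch payment might be computed from the two reported quantities, with per-chain unit prices. All prices below are made-up placeholders; actual pricing is still being researched.

```python
# Hypothetical per-chain unit prices in GRT; real prices are still being researched.
PRICE_PER_ENTITY = {"mainnet": 0.0001, "arbitrum-one": 0.00008}
PRICE_PER_BLOCK = {"mainnet": 0.0005, "arbitrum-one": 0.00002}

def epoch_payment(chain: str, entities_stored: int, blocks_indexed: int) -> float:
    """'Work done' proxy for one epoch: entities currently in storage plus blocks
    indexed this epoch, each priced per chain (a block on Arbitrum is priced
    differently from a block on Ethereum)."""
    return (entities_stored * PRICE_PER_ENTITY[chain]
            + blocks_indexed * PRICE_PER_BLOCK[chain])

print(epoch_payment("mainnet", entities_stored=200_000, blocks_indexed=7_000))  # 23.5
```

Because the proxy is a linear formula rather than a measured compute unit, some agreements will over- or under-pay; the expectation discussed later is that a diverse bag of agreements is profitable in aggregate.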
Timeline
- MVP is on track to be released in late Q1/early Q2.
- On-chain indexing agreements will quickly follow after Horizon is released (later this year).
Price Discovery
- We are researching the price for the initial batch of indexing agreements.
- We are looking for data on the costs associated with indexing a subgraph or a set of subgraphs in aggregate.
- Please ping me if you’d be happy to have a brief conversation to help us make indexing payments a success.
- matias@edgeandnode.com
Q&A [33:29]
Josh Kauffman | StreamingFast.io: Will there be a place for two amounts per epoch: one for during the syncing phase and one for steady state? (since syncing and just querying will have different expected per-epoch costs)
Josh elaborated: My question is about the per epoch maximum cost that can be set. In my mind, there would be a very high cost on the indexer during the syncing phase and a much lower cost during the steady state after it’s been synced. So obviously, the per epoch maximum would have to be set by the consumer and the gateway so that it would be able to handle the syncing phase, but just to avoid any abuse, it makes sense in my mind to have a lower maximum during the steady state phase.
Matias: The indexing agreement has two separate parameters, two maximum payable amounts. One is for the initial sync (on the slide, I called it initial indexing), which covers all the work done to get from the start of the subgraph's history to chain head, and you're correct, that's going to be a lot higher than the ongoing maximum per epoch. This accounts for the fact that there's a lot more work required to do the initial sync than to keep up with every new epoch.
Maybe this is a good time to cover some of the challenges with this particular approach. For example, if an agreement gets canceled, a new one needs to be issued to another indexer, which means the gateway, and by extension the data consumer, will be liable to pay for another round of initial indexing. As you pointed out, that's probably going to be a lot more expensive than ongoing indexing per epoch. So we're looking at ways of evolving even this version of the proposal to minimize the reasons why agreements would be canceled, and if they are canceled, there are some ideas for minimizing that initial cost of indexing. We're aware of the shortcomings and are hoping we'll be able to evolve it fast enough to address those down the line.
stake-machine.eth: Is it kind of indexer tips that they get on top of current indexing rewards? Or complete replacement?
Matias: It’s definitely not a complete replacement, not at all. You could consider it extra supplemental income on top of indexing rewards if you’re already indexing that subgraph or if there’s already curation on that subgraph. But it might be the case where you start seeing some subgraphs that are not being curated, but instead, indexing agreements are being offered, and in that case, you would have no indexing rewards for that particular subgraph, but you would get income from indexing agreements if you were offered one and accepted. It really depends on whether the subgraph has been curated or not for you to collect income from the two sources or just from one.
Matthew Darwin | Pinax: I guess there is some sort of timeout as well? What if an indexer agrees to index a subgraph, then goes on vacation for 6 months and never finishes syncing.
Matias: Absolutely, Matthew. I’ve tried to keep it simple, and the full spec is not even part of the GIP as of today, but the idea is that we do cover those cases, and we have to allow for this to work for both gateways and indexers, so the indexer needs to be allowed enough time for the initial sync to happen and not be at risk of losing any payment for the work they’ve done up to that point. The gateway needs to have some certainty that the agreements that are open are actually being served; otherwise, it needs to go and shop for other agreements. I’d be hoping that a 6-month vacation wouldn’t happen, but the gateway can cancel the agreement if it’s not seeing the performance that it expects.
PaulieB: Will these performant data points be added to Graph Explorer for full transparency? It’s on GraphSeer, but unsure if consumers use that.
Matias: As far as I know, we've looked at the possibility of open-sourcing as much as possible of the data that the gateway uses to make its decisions, as well as the Indexing Indexer Selection Algorithm (IISA). I'm not sure if it will be added to Graph Explorer in particular or surfaced some other way. The short answer is, I would expect so, yes. If this is being driven by an algorithm, it should be transparent, at least for the Edge & Node gateway.
NSun | Graphtronauts: Would be good info for delegators to see as well. 💯 help them support quality indexers.
Matthew Darwin | Pinax: Presumably, “work performed” could be reported regularly during the initial indexing phase to prevent abuse.
Matias: I think you’re talking about the case where the initial indexing takes a long time, so there are some subgraphs that take weeks to get indexed. I don’t think the GIP makes any particular stipulation at this point for those cases, and the gateway will need to deal with them.
Pierre | Chain-Insights.eth: Then we have to do agreements per gateway per subgraph? That makes no sense to me. How many subgraphs do we have??? 10,400
We should have something in the protocol to calculate the CU (Compute Units) and adjust to the market (demand vs. providers (supply)).
Matias: Do we have to have an agreement per gateway per subgraph? Yes, and the hope is that this is automated. The indexer sets a minimum amount of GRT they need to be paid per stored entity and per block indexed per chain, so when the gateway offers an indexing agreement, it's automatically accepted or rejected. The hope is that, in aggregate, those agreements are profitable. We are aware that within the subset of agreements an indexer has been offered, some might lose money and some might make a lot more, and the idea is that, in aggregate, they turn out to be economical. The indexer selection algorithm prioritizes indexers that have picked up a diverse set of subgraphs because, without compute units, as you mentioned, whatever proxy we come up with is going to be insufficiently accurate to make every agreement profitable. We hope that if enough indexing agreements are offered per indexer, the aggregate will be economical.
In short, the acceptance of agreements should be automatic based on parameters set by the indexer, and then, in the aggregate, the bag of agreements that has been offered to an indexer should be profitable enough.
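A minimal sketch of that automatic accept/reject decision, assuming the indexer configures per-chain minimum unit prices (the logic, names, and numbers here are illustrative only, not the actual indexer stack behavior):

```python
# Indexer-configured minimum unit prices per chain (illustrative values).
MINIMUMS = {
    "mainnet": {"min_per_entity": 0.0001, "min_per_block": 0.0004},
}

def auto_accept(chain: str, offered_per_entity: float, offered_per_block: float) -> bool:
    """Accept an offered agreement only if both offered unit prices meet the
    indexer's minimums for that chain; reject chains the indexer hasn't opted into."""
    mins = MINIMUMS.get(chain)
    if mins is None:
        return False
    return (offered_per_entity >= mins["min_per_entity"]
            and offered_per_block >= mins["min_per_block"])

print(auto_accept("mainnet", 0.0002, 0.0005))       # True
print(auto_accept("arbitrum-one", 0.0002, 0.0005))  # False (not opted in)
```

Once the minimums are set, no per-agreement decision is required from the operator; the stack evaluates each offer against the configured parameters.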
Pierre | Chain-Insights.eth: COST MODEL that was tried before I think no one is using as the protocol is adjusting the price dynamically.
Matias: What I've been describing is the proxy for "work done," and you could view it as a cost model. A market is definitely better, and we are working towards that, but we're not there yet. The idea is for this to be a stepping stone to a market because, as you mentioned, if we centralize the setting of the proxy in the Edge & Node gateway and the proxy is not very accurate, then the price discovery, which is a function of it, will not be optimal. We'd rather move away from that as fast as we can, but right now it's the best we can do for the MVP. There's another stream of work on a subgraph gas experiment to see if we can deterministically understand how much work is actually required to index a subgraph and then build a market on top of that. But that is down the line and not part of this GIP.
Vince | Nodeify: Do we average these costs of indexing? The upper bound & bell curve will be extreme, i.e. Hetzner vs. AWS or Google Cloud, without putting self-hosted bare metal in the equation. There is a syncing / indexing cost, but there is also a requirement cost per network. (Ethereum 4 TB, Arbitrum One, 20 + TB)
Matias: Now we’re getting into the meat. This is interesting. We’d love to have this conversation with you or anyone that has this concern. We would love to make the initial pricing decision based on as much data as we can. We’re aware that there are big indexers that have a very different structure than tiny indexers and we want to make sure that the indexing agreement can support a wide range of indexers. That probably means that the indexing agreement will be more profitable for indexers that have economies of scale, but this is not about leaving the small indexers out. It is very much about including them. So the more data we have, the better we’ll be able to make that decision. It’s a balancing act, but we definitely want to make sure that the smaller indexers and the new up-and-coming indexers are able to participate in the system.
Matthew Darwin | Pinax: I think indexers should share (off-chain) “amount of space” each network takes and then the gateway can use that for pricing, different per chain?
Matias: There are RPC costs, and each subgraph also has a very different ratio of stored entities to gigabytes. There are a lot of moving pieces. It'd be great to get that data, for sure. It'd also be great to understand, beyond the amount of space each network takes, how much you're paying for that space, because running on bare metal is different from Google Cloud or whatever. We would love to take all that into account.
Matthew Darwin | Pinax: But also “blocks indexed” can probably cover the size…. the gateway knows the StartBlock vs. CurrentBlock so can include that in the price.
Matias: I’m not sure if you’re referring to the size… which size in particular?
Matthew: I just meant from my previous comment about indexers sharing size or amount of space a network takes, really if a subgraph has a start block and the current block, when the gateway sends the price, the amount of blocks that need to be indexed is a kind of indicator of how much space the indexer needs to run the RPC node. Because you know, if you asked to index 300 million blocks in Arbitrum, that’s a very different number than 18 million in Ethereum. I would expect the gateway to pay a lot more for the Ethereum example.
Matias: To accommodate for that, there’s a different price per block per chain. The proxy has taken that into account. Again, it won’t be perfect, but we hope to make it work.
Vince | Nodeify: I hope someday we can provide archive vs. full node. The number of indexers available would greatly increase, and then we can charge full vs. archive needed, which will also drop the price for customers, as many don’t need archive support.
Matias: I guess you're referring to subgraphs that require time traveling. I don't have a comment on that in relation to this proposal, but I think there is room for improvement there because there's really no way to tell whether consumers will use, or how many consumers are actually using, the time-traveling feature.
Comments from the chat:
Matthew Darwin | Pinax: I think it is good enough to start with. Vince | Nodeify, traces vs. not traces support.
Slimchance: Archive nodes are not used for time traveling but for Ethereum calls. Subgraphs doing eth_calls require an archive node even if there are no time traveling queries being made.
Vince | Nodeify: It’s a very small amount, most use recent data.
Slimchance: Time traveling is related to pruning – and a similar question could be asked about this.
Matthew Darwin | Pinax: The metadata for the subgraph should tell you upfront if you need archive node vs. full node.
Pierre | Chain-Insights.eth: Sorry, I’m against this proposal at the moment. It’s not profitable to index subgraphs, adding time and resources for the indexer to manage these agreements. It should be automated. It makes no sense for a 2 billion market cap crypto project and it’s not aligned with the Knowledge Graph proposal.
Matias: I welcome the feedback. I'm not sure if there's a specific concern I can address, but I'd be happy to have a chat.
Abel | GraphOps: Pierre, do you want to add some additional context either in the chat or by voice?
Pierre elaborated: I know you do your best to make a proposal, but I see many changes in software updates, and we don't even have the new Graph Node version debugged. Now, there's a proposal for The Graph knowledge that's not only Arbitrum or Ethereum; it's going to be a knowledge graph. There are a lot of proposals, but I have the sense that nobody's talking to each other to align their proposals with each other. I'm a new indexer, not an experienced one, but I see that we already have to look into which [subgraph] we're going to index, and it doesn't just take weeks; I have some subgraphs that I've been indexing for three months. It takes a long time and a lot of resources, and the rewards are not sustainable right now. Of course, if I have some delegators who put in a lot of money, then maybe it becomes somehow profitable. Even Yaniv said that the design of The Graph protocol has some gaps, but we are trying to build something on top of something that has gaps. Fix the gaps, then you can build. […] I think the alignment is not what I expect it to be.
I see you want to solve some problems and have many gateways, but then it becomes a full-time job. Is it going to be rewarding enough to be sustainable? I don’t see it because how much more GRT are we going to receive for doing all these agreements and management manually? The first phase of this proposal should put more work into automation. […]
Matias: I understand now that your concern is around automation. The expectation here is that the issuing and collection of agreements are automated, so the indexer just needs to understand and set a price for how much they're willing to be rewarded per entity stored and per block indexed on every chain. Then every agreement that comes across that matches those parameters will be automatically accepted by the indexer stack. So if we get that right, if the proxy is good enough, this should be opt-in and automatic from the indexer's perspective. The hope is that if we get it slightly wrong, it'll be, in aggregate, profitable for the indexer. This is meant to be automatic from day zero, and we can work towards making it more precise and fair in terms of profitability for everyone involved.
Pierre: Okay, thank you. The only thing I want you to be aware of is if you put a minimum, there’s always going to be an indexer who is going to charge less, and it’s going to win more because of many reasons, so it’s going to be a race to minimums and low price, and at the end, the big indexers will say I’m done with this, and I’m going somewhere else. […]
Matias: In what I've described as part of the MVP, and probably the first iteration of the on-chain version, the price is set by the gateway and is one price for every indexer. So it's really up to the indexer to accept or reject, but they can't lower the price. How the gateway sets the price, or exactly which price the gateway should set, is something we are working towards, and that's why I've asked for data and conversations. The idea is that the gateway sets the price at a point where even newcomers and small indexers are profitable, so we can encourage wide participation in this part of the protocol and make it as decentralized as it can be.
Rem | Edge & Node posted: We will be working towards automation, and it might be that not all indexers participate in the first MVP version. Longer term, our plans are to decrease the amount of effort indexers spend picking subgraphs, relative to what we have now.
Vince | Nodeify posted: If anything, it will be the big ones that put small ones out of business with predatory pricing because they can afford it until they are gone.
MoonBoi: 📢︱indexers-announcements
If any indexers would like to get in contact with us to provide us with more insight on their costs for their business operations, then we are happy to receive your comments. We are working to make sure this proposal is sustainable and part of that work is ensuring that the payment that indexers can receive reflects the cost of the work they have provided.
Linked above is our Discord announcement from last week.
Matias: If you have other concerns, you can always reach out to me later or leave a comment in the forum so that everyone can benefit from it. I would be very grateful if indexers reach out to us directly (matias@edgeandnode.com) to help us with pricing research, which is crucial to get right to ensure broad participation.