The Graph Indexer Office Hours #184

Events | Nov 22, 2024
TL;DR: The Graph's TAP (Timeline Aggregation Protocol) migration deadline is December 3, 2024, with about 34% of indexers already upgraded, representing 81.6% of query volume. The Q&A discussion focused on configuration settings for TAP, particularly around RAV (Receipt Aggregate Voucher) requests and managing unaggregated fees, with experts recommending starting with default values and adjusting based on query volume.

Opening remarks

Hello everyone, and welcome to episode 184 of Indexer Office Hours!

GRTiQ 195

Catch the GRTiQ Podcast with Jeroen Develter, Chief Operating Officer at Persistence Labs. Jeroen’s journey into web3 spans traditional finance, consulting, and an entrepreneurial leap into the world of crypto and blockchain.

Repo watch

The latest updates to important repositories

Execution Layer Clients

  • Geth: New release v1.14.12:
    • Date: 2024-11-19 13:53:28 UTC
    • This release removes the deprecated personal RPC namespace and the --unlock flag, indicating significant changes in key management. It also includes optimizations, database improvements, and various bug fixes, with breaking changes for tracer configurations.
    • Urgency indicator: Yellow
    • Urgency reason: Key management changes require adaptation by users.
  • Reth: New release v1.1.2:
    • Date: 2024-11-19 16:27:10 UTC
    • This release includes performance improvements and bug fixes, with a notable readiness for the Holocene hardfork. The changes specifically optimize payload job fetching and resolve pending transaction ordering, making it critical for OP-reth users to update promptly.
    • Urgency indicator: Red
    • Urgency reason: Hardfork readiness for OP-reth users.
  • sfeth/fireeth: New release v2.8.0 :
    • Date: 2024-11-14 14:55:01 UTC
    • A nil safety check has been integrated into the CombinedFilter, enhancing stability during transaction processing. Additionally, updates to substreams and dmetering include improvements to metering functionality.
    • Urgency indicator: Yellow
    • Urgency reason: Improves stability, not an immediate threat.
  • Avalanche: New release v1.11.13 :
    • Date: 2024-11-18 22:24:03 UTC
    • This release introduces new APIs for the platform, alongside significant fixes for RPCChainVM metrics initialization and compatibility improvements. Operators are encouraged to update to the latest plugin version for compatibility and stability enhancements.
    • Urgency indicator: Yellow
    • Urgency reason: Important fixes and new functions require timely attention.

Graph Stack

  • Indexer Service & Agent (TS): New release v0.21.8 :
    • Date: 2024-11-18 19:23:13 UTC
    • This release includes minor updates to improve the user interface by toning down subgraph height violation messages and a fix for the polling interval argument in the indexing agent. No critical changes affecting system performance or security have been noted.
    • Urgency indicator: Green
    • Urgency reason: Low-impact changes, no critical issues.
  • Indexer Service & Tap Agent (RS): New release indexer-tap-agent-v1.7.3:
    • Date: 2024-11-13 19:04:13 UTC
    • Version 1.7.3 addresses a bug by removing the escrow adapter check, which may affect transaction processing. All operators should review this change to ensure seamless functionality.
    • Urgency indicator: Yellow
    • Urgency reason: Important fix, not immediately critical, but recommended to update soon.

Protocol watch

The latest updates on important changes to the protocol

Forum Governance

Contracts Repository

Open discussion [9:02]

TAP Migration Q&A

Today’s discussion is a TAP migration Q&A with Ana from GraphOps and Gustavo from Semiotic Labs. TAP is the new micro-payments system for queries on The Graph.

🚨 Indexers need to migrate to TAP by December 3, 2024.

The session was guided by Ana’s IOH TAP Migration Q&A notes, which have been copied into this section along with some context from the discussion. Comments have been lightly edited and condensed.

Ana | GraphOps: Let’s start with what is TAP?

Introduction

Overview of TAP (Timeline Aggregation Protocol):

  • Replaces the older Scalar payment system.
  • Has key features like efficient micro-payments, reduced on-chain transactions, and indexer control over receipts.
  • Enables decentralized gateways by allowing indexers to accept queries from multiple gateways.

What do indexers need to do?

Pause for Questions [13:25]

Mickey | The Graph | E&N posted: What happens if an indexer misses the deadline? The gateway will not route any new queries to them until they upgrade or…?

  • Gustavo | Semiotic Labs: The gateway currently serves receipts for both Scalar and TAP. Once we reach the deadline, the gateway will only support TAP receipts. If you have a version less than 1.0, you won’t receive queries anymore.
  • Ana: Thank you, that makes a lot of sense. So, if you want to continue getting queries, you need to migrate.

Mickey | The Graph | E&N: Any way to tell what % of indexers have already moved to indexer-rs/tap?

  • Ana: One way to tell if an indexer has migrated to TAP is if they have a badge on their indexer profile in GraphSeer that says TAP READY.
  • Gustavo: At Semiotic, we are tracking this number closely, checking if indexers are running versions greater than 1.0 or not. Out of 88 indexers that we are tracking, 30 are running 1.0 or above, which is about 34% of indexers.
  • We’re also tracking the number of queries that each of them served in the last month, so we are currently at 81.6% of queries running through TAP.
  • Ana: I think what you’re saying is the indexers with the highest query volumes have already upgraded.
  • Gustavo: Yes. Of the top ten indexers, nine have upgraded. Actually, of the top 14 indexers, 13 have upgraded.

Mickey | The Graph | E&N: My impression is that a lot of indexers have had trouble upgrading – is that impression correct, and if so, is the Dec. 3 deadline realistic to get everyone upgraded by then? (no shade, just curious if this is stable enough atm)

  • Gustavo: I think that we are on track to have most of the queries [migrated]. Migrating 100% of indexers by the December 3 deadline is kind of difficult, but I think that we can easily get up to 95%. We are here to give you support so we can try to get to 100%.

Mickey | The Graph | E&N: Why 88 [indexers]? Are those just the ones that actually serve queries?

  • Gustavo: We are using the Quality of Service (QoS) subgraph to check for all active indexers, so basically, if they served one query in the last month, then they’re on that list of 88. We mostly care about queries, and if indexers who are serving queries are using the newest version, we are happy with that.

paka | E&N posted: I’m also confident on getting most query traffic to TAP.

Mickey | The Graph | E&N: Ok, got it! Good strategy.

Mickey | The Graph | E&N: Are you reaching out individually to the laggards who serve queries and have not upgraded?

  • Gustavo: We are reaching out to indexers we have contact with. We have a list, and we’re reaching out to each, starting with the biggest ones. Currently, the two biggest that haven’t upgraded yet are Upgrade Indexer [Edge & Node] and Pinax. We are working closely with Edge & Node so they can have support to upgrade to a version greater than 1.0. We’re going from the most served queries to the least served queries, one by one, as long as we have contact with that indexer.

Ana: Indexers are welcome to tag any indexer who has already migrated for help. If you have questions, don’t hesitate to reach out.

Gustavo | Semiotic Labs: If anyone has any questions, I'm answering them in The Graph Discord in ⁠📁︱indexers and ⁠📁︱indexer-software channels.

Common Issues and Troubleshooting [21:21]

  • Unaggregated Fees and RAV Requests:
    • Issue: Indexers might see their unaggregated fees plateau, especially if the rav_request_trigger_value is set too low.
    • Explanation: The rav_request_trigger_value determines when the indexer requests a Receipt Aggregate Voucher (RAV) from a sender (you can think of a sender as a gateway). If this value is too low, the trigger can fire while the receipts are all still inside the timestamp buffer, so the RAV request finds no receipts to include and fees accumulate without being aggregated.
    • Solution: Update your rav_request_trigger_value, which is calculated as max_amount_willing_to_lose_grt / trigger_value_divisor.

Ana: There is a dashboard; I have it up here. In the unaggregated panel, you can see the trigger value. In my case, the RAV trigger value is three. The way this works is that as new receipts are received, their values are added to the unaggregated fees and the system constantly checks if the total unaggregated fees exceed the trigger value.

When a trigger condition is met, a RAV request is created by checking the timestamp buffer to determine how far back in time to consider receipts, and it will limit the number of receipts based on a RAV request receipt limit.
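The flow described above can be sketched roughly as follows. This is a minimal illustration in Python, not the tap-agent's actual implementation: the `Receipt` type and both function names are hypothetical, and the parameters simply mirror the config keys discussed in this section.

```python
from dataclasses import dataclass


@dataclass
class Receipt:
    """Hypothetical receipt: a timestamp in seconds and a fee value in GRT."""
    timestamp: float
    value_grt: float


def should_request_rav(unaggregated_fees_grt: float, trigger_value_grt: float) -> bool:
    # Checked as receipts arrive: fire once the running total of
    # unaggregated fees exceeds the trigger value.
    return unaggregated_fees_grt > trigger_value_grt


def select_receipts(receipts, now, timestamp_buffer_secs=60, max_receipts_per_request=10_000):
    # Only receipts older than the timestamp buffer are considered,
    # and the request is capped at max_receipts_per_request.
    cutoff = now - timestamp_buffer_secs
    eligible = sorted(
        (r for r in receipts if r.timestamp <= cutoff),
        key=lambda r: r.timestamp,
    )
    return eligible[:max_receipts_per_request]


receipts = [Receipt(timestamp=t, value_grt=0.5) for t in (0, 10, 20, 95)]
total = sum(r.value_grt for r in receipts)  # 2.0 GRT unaggregated

if should_request_rav(total, trigger_value_grt=1.5):
    batch = select_receipts(receipts, now=100)
    # The receipt at t=95 is still inside the 60-second buffer, so it waits.
    print(len(batch))  # prints 3
```

In this sketch, if every receipt is newer than the buffer, the batch comes back empty even though the trigger fired, which is one way to picture a RAV request that finds no receipts to aggregate.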

If we quickly look at the configuration, there are a few things that are important here:

  • max_amount_willing_to_lose_grt
  • trigger_value_divisor

These two determine the trigger value.

  • timestamp_buffer_secs, which is how far back in time to consider receipts
  • max_receipts_per_request, which caps how many receipts a single RAV request can contain (10,000 by default)

If any receipts are invalid, they are stored separately in the scalar_tap_receipts_invalid table in the indexer metadata database. A receipt is invalid when its value is higher than max_receipt_value_grt.

Configuration: maximal-config-example.toml

Gustavo: I just want to talk about what’s going on here and why we have lots of configurations. For most indexers, these configurations won’t be a problem because they don’t have a lot of query volume.

If you have too many allocations or too many queries per second, it can get a bit difficult, and that’s why you need to update and configure those things. Usually I recommend you try the default values first and see. The current default values are:

  • max_amount_willing_to_lose_grt = 20
  • trigger_value_divisor = 10
    • This means that every time the total amount of receipts that you haven’t yet aggregated (a.k.a. unaggregated fees) gets to 2 GRT, we try to aggregate. But if that value is constantly above 2, then you need to change the values.
    • Try to keep max_amount_willing_to_lose_grt as low as possible, but not too low: if a RAV request fails and you receive a lot of receipts at that moment because you’re a big indexer, then you will block the sender really quickly.
  • timestamp_buffer_secs = 60
  • request_timeout_secs = 5
  • max_receipts_per_request = 10000

The top 15–20 indexers will need to update those values.

Pause for Questions [29:59]

Pierre | Chain-Insights.io: What would be my ideal values? Default is still good for me? I don’t seem to have RAV requests.

Gustavo: Let me check… trigger value total unaggregated fees. So, first of all, there have been a few updates. Which version are you using?

Pierre: The latest version.

Gustavo: Usually, when you reach the RAV trigger value, it should try to create a RAV request. If that’s not happening, we need to investigate.

Pierre | Chain-Insights.io: No for me:

tap:
  max_amount_willing_to_lose_grt: 20
tap.rav_request:
  trigger_value_divisor: 2
  timestamp_buffer_secs: 60
  request_timeout_secs: 5
  max_receipts_per_request: 10000

Gustavo: So your trigger value is about 10, is that right?

Pierre posted: Yes, 20 / 2

Gustavo: So you would only trigger a RAV request once you reach 10 GRT or 10,000 receipts for a given allocation. Because of your query volume, I’d suggest a trigger value of something like 0.1, for example:

  max_amount_willing_to_lose_grt: 1
  trigger_value_divisor: 10
  max_receipts_per_request: 1000

Pierre posted: Ok, thanks.

Gustavo: If you don’t have that many queries per second, you need to have more RAV requests or have a greater trigger value.

This all depends on the number of allocations that you have open and the number of queries per second that you receive, plus the number of queries per second per allocation and any bursts that you have.

In TAP, after you aggregate at a certain point, no receipt that contains a timestamp before that point will work. They are considered invalid, so that’s why we have this buffer. We accept receipts until 60 seconds later. This is because it’s difficult to synchronize clocks between different computers. The gateway sends you the receipts and it defines the timestamp, and it could be 5 seconds before or after your clock.
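To make the role of the buffer concrete, here is a small sketch of the timing argument. It is an illustration under assumptions, not the TAP implementation: both function names are hypothetical, and the 5-second clock skew is just the example figure mentioned above.

```python
def aggregation_cutoff(now: float, timestamp_buffer_secs: float) -> float:
    # A RAV request only covers receipts stamped at or before this point.
    return now - timestamp_buffer_secs


def is_valid_after_rav(receipt_timestamp: float, rav_cutoff: float) -> bool:
    # Receipts stamped at or before the last aggregation point no longer count.
    return receipt_timestamp > rav_cutoff


now = 1_000.0
# Stamped by a gateway whose clock runs 5 seconds behind the indexer's,
# and arriving just after the RAV request was made.
late_receipt_ts = now - 5

for buffer_secs in (0, 60):
    cutoff = aggregation_cutoff(now, buffer_secs)
    print(buffer_secs, is_valid_after_rav(late_receipt_ts, cutoff))
# prints:
# 0 False
# 60 True
```

With no buffer, the late receipt falls behind the aggregation point and is lost; with the 60-second default it is still ahead of the cutoff and is picked up by a later RAV request.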

We set the default values to be somewhere in the middle: at 1–2 queries per second, you get about one RAV request every 15 minutes. If you have less traffic than that, you probably need to lower max_amount_willing_to_lose_grt. If you have more, you probably need to raise it a bit.
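As a rough way to reason about this, the time between RAV requests can be estimated from the trigger value and the rate at which fees come in. The sketch below is back-of-the-envelope arithmetic only: the function name is hypothetical, and the average fee per query (0.001 GRT) is an assumed figure for illustration, not a number from the discussion. Per the discussion, these values apply per allocation.

```python
def seconds_between_rav_requests(
    trigger_value_grt: float,
    queries_per_second: float,
    avg_fee_per_query_grt: float,
) -> float:
    """Rough time for unaggregated fees to grow from zero to the trigger value."""
    if queries_per_second <= 0 or avg_fee_per_query_grt <= 0:
        raise ValueError("query rate and average fee must be positive")
    fees_per_second = queries_per_second * avg_fee_per_query_grt
    return trigger_value_grt / fees_per_second


# Hypothetical traffic: 2 queries/second at an assumed 0.001 GRT per query.
for trigger in (2.0, 0.1):
    secs = seconds_between_rav_requests(
        trigger, queries_per_second=2, avg_fee_per_query_grt=0.001
    )
    print(f"trigger {trigger} GRT: about {secs / 60:.1f} minutes between RAV requests")
```

Plugging in your own query rate and typical fee gives a quick sanity check on whether a given trigger value will produce RAV requests at a reasonable cadence.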

Gemma | LunaNova: Are these values per allocation or across all allocations?

calinah | GraphOps: Per allocation

Pierre | Chain-Insights.io: After updating to:

tap:
  max_amount_willing_to_lose_grt: 1
tap.rav_request:
  trigger_value_divisor: 10
  timestamp_buffer_secs: 60
  request_timeout_secs: 5
  max_receipts_per_request: 1000

calinah | GraphOps: Nice, your receipt rate increased.

Gustavo: Yes, that’s nice. What you want to see on your dashboard is the unaggregated fees staying around that 0.1: every time you have a RAV request, they go down a little bit, and then after 15 minutes, you have enough receipts to trigger another RAV request.


Josh Kauffman | StreamingFast.io: Can someone share what they have set for those values?

Marc-André | Ellipfra: For reference, I’m running with these values. I do not recommend them specifically, as they were set more or less randomly. max_willing_to_lose is very high to avoid denying the gateway too often, and I intend to lower it at some point:

max_amount_willing_to_lose_grt = "1000.0"

[tap.rav_request]
timestamp_buffer_secs = 30
trigger_value_divisor = 50
request_timeout_secs = 20
max_receipts_per_request = 2000
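For comparison, the trigger values implied by the configurations shared in this session can be worked out with the formula given earlier (max_amount_willing_to_lose_grt / trigger_value_divisor). The helper name below is hypothetical; the input numbers are the ones quoted in the discussion.

```python
def rav_trigger_value(max_amount_willing_to_lose_grt: float, trigger_value_divisor: float) -> float:
    # Trigger value as described in the session: max amount / divisor.
    return max_amount_willing_to_lose_grt / trigger_value_divisor


# (max_amount_willing_to_lose_grt, trigger_value_divisor) as quoted above.
configs = {
    "defaults": (20, 10),
    "Pierre (before)": (20, 2),
    "Pierre (after)": (1, 10),
    "Marc-André": (1000.0, 50),
}

for name, (max_lose, divisor) in configs.items():
    print(f"{name}: {rav_trigger_value(max_lose, divisor)} GRT")
# prints:
# defaults: 2.0 GRT
# Pierre (before): 10.0 GRT
# Pierre (after): 0.1 GRT
# Marc-André: 20.0 GRT
```

The spread, from 0.1 GRT to 20 GRT, reflects how much the right setting depends on each indexer's query volume.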


Common Issues and Troubleshooting (continued) [41:35]

  • Sender Denial Issues:
    • Issue: Indexers might encounter situations where senders are denied, leading to disruptions in query processing and revenue.
    • Reasons for Denial: Three primary reasons for sender denial:
      • Sender not listed in tap.sender_aggregator_endpoints.
      • Sender’s escrow balance is insufficient to cover outstanding fees.
      • Unaggregated fees exceed the amount you are willing to lose per sender (max_amount_willing_to_lose_grt).
    • Solutions: Specific solutions for each denial reason:
      • Verify the sender is correctly configured in the tap.sender_aggregator_endpoints section.
      • Investigate potential issues with escrow funding and ensure sufficient balance.
      • Review and potentially adjust the max_amount_willing_to_lose_grt setting if unaggregated fees frequently hit the limit.
  • Note: The indexer TAP agent will always log the following error at startup for any senders not configured in tap.sender_aggregator_endpoints or via env vars in the format INDEXER_TAP__SENDER_AGGREGATOR_ENDPOINTS__<address>:
2024-11-13T08:41:31.200776Z ERROR indexer_tap_agent::agent::sender_accounts_manager: There was an error while starting the sender 0xDD6a6f76eb36B873C1C184e8b9b9e762FE216490, denying it. Error: No sender_aggregator_endpoints found for sender 0xDD6a6f76eb36B873C1C184e8b9b9e762FE216490
at tap-agent/src/agent/sender_accounts_manager.rs:311
in ractor::actor::Actor with id: "0.0"

  • collect-receipts-endpoint error on indexer-agent:
{"level":40,"time":1731239677641,"pid":1,"hostname":"graph-network-indexer-agent-0","name":"IndexerAgent","msg":"The option '--collect-receipts-endpoint' is deprecated. Please use the option '--gateway-endpoint' to inform the Gateway base URL."}

Gustavo: The reason why you would want to deny a sender is either that the sender isn’t paying you, so they don’t have enough escrow, or that the amount they queried you for has reached the amount that you’re willing to lose. Invalid receipts count toward this as well: if the amount of invalid receipts that you have reaches the maximum amount you’re willing to lose, you will block the sender because of that.

A quick clarification on max_receipt_value_grt: this setting is just for the service. For TAP to be minimal trust, these must be micro-payments, so when a gateway sends you a receipt, this is the maximum receipt value that will be accepted, but a receipt above it is not considered an invalid receipt.

An invalid receipt is one that you received, you served the query, and later on, you found out that the receipt is invalid.

Ana: For the invalid receipts, if any of the conditions change, would they become valid, or is it that once a receipt has been marked as invalid, there’s no retry?

Gustavo: There’s no retry. That table is just used for logging so you can go to the database and see what’s going on and why your receipts are failing.

Ana: That makes sense. Also, on the TAP dashboard, you can see if a sender has been blocked.
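The denial reasons covered in this section can be summarized in a short sketch. This is an illustrative simplification, not the tap-agent's logic: `SenderState` and `denial_reason` are hypothetical names, the endpoint URL is a placeholder, "outstanding fees" is approximated by unaggregated fees, and adding invalid-receipt fees to unaggregated fees is an assumption based on Gustavo's comment that invalid receipts count toward the limit.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SenderState:
    """Hypothetical snapshot of what the indexer knows about one sender."""
    address: str
    escrow_balance_grt: float
    unaggregated_fees_grt: float
    invalid_receipt_fees_grt: float = 0.0


def denial_reason(
    sender: SenderState,
    sender_aggregator_endpoints: Dict[str, str],
    max_amount_willing_to_lose_grt: float,
) -> Optional[str]:
    """Return why a sender would be denied, or None if it would be allowed."""
    if sender.address not in sender_aggregator_endpoints:
        return "no sender_aggregator_endpoints entry"
    # Assumption: outstanding fees are approximated by unaggregated fees.
    if sender.escrow_balance_grt < sender.unaggregated_fees_grt:
        return "insufficient escrow balance"
    # Assumption: invalid-receipt fees are added to unaggregated fees.
    at_risk = sender.unaggregated_fees_grt + sender.invalid_receipt_fees_grt
    if at_risk >= max_amount_willing_to_lose_grt:
        return "fees at risk reached max_amount_willing_to_lose_grt"
    return None


ADDR = "0xDD6a6f76eb36B873C1C184e8b9b9e762FE216490"  # sender address from the log above
endpoints = {ADDR: "https://aggregator.example"}  # placeholder URL, not a real endpoint

sender = SenderState(address=ADDR, escrow_balance_grt=100.0, unaggregated_fees_grt=0.5)
print(denial_reason(sender, endpoints, max_amount_willing_to_lose_grt=20))  # prints None
print(denial_reason(sender, {}, max_amount_willing_to_lose_grt=20))
```

The second call shows the startup case from the note above: a sender with no configured aggregator endpoint is denied regardless of its balance.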

Common Issues and Troubleshooting (continued)

  • RAV Request Errors:
    • Issue: Errors during RAV requests can disrupt the aggregation process.
    • Error Example: It looks like there are no valid receipts for the RAV request… You can fix this by increasing the rav_request_trigger_value.
    • Explanation: This error suggests an insufficient rav_request_trigger_value is preventing receipts from being included in the RAV request.
    • Solution: A solution is provided in the error message – increase the rav_request_trigger_value.

Configuration and Setup

  • Software Versions: Use the required software versions for indexer-agent, indexer-service, and tap-agent.
  • Configuration File Walkthrough:
    • Shared Configuration: indexer-service and tap-agent share a common TOML configuration file.
  • Blockchain Addresses and Endpoints:
    • Important: Use the correct contract addresses, gateway endpoints, and chain IDs.

Monitoring and Debugging

  • Log Level Configuration: To set the RUST_LOG environment variable for appropriate logging, use RUST_LOG=indexer_tap_agent=debug,info.
  • Metrics and Grafana Dashboard:

Pause for Questions [50:40]

Pierre | Chain-Insights.io: Right now, until Dec. 3, only one sender is active?

  • Ana: That is correct as far as I know, so it’s only the Edge & Node sender that is active, that everyone is using. GraphOps is hoping for the end of December to early January to onboard some indexers to our gateway.

Pierre: Is this good or should I revert to activate collect-receipts-endpoint?

# collect-receipts-endpoint: "https://gateway-arbitrum.network.thegraph.com/collect-receipts" DEPRECATED
gateway-endpoint: "https://gateway-arbitrum.network.thegraph.com"

  • Ana: Using gateway-endpoint is fine. Basically, the only thing that you need to know is that it’s your choice which CLI flag you use, but you’ll see the same error regardless and it’s a confusing error, so you don’t have to do anything here.

Ana: Any comments from some of the indexers who have already migrated about things they found that they want to share?

  • Marc-André | Ellipfra: It’s crucial to have the dashboard setup and monitoring. It’s getting more stable with every release, but you never know.

Final Thoughts

Gustavo: If you have any feature requests or ideas on how to better configure this, feel free to create issues in the indexer-rs repo.

We are happy to look at every issue and feature request so we can make the system better and the tech more stable and hopefully get to a better point. Thank you for the time to talk more about TAP.

Author

We're a web3 service provider specializing in blockchain indexing operations. Our mission is to enable creators to achieve their true potential with web3 technology. We want to help developers reliably access blockchain data in a consistent format so you can create amazing experiences for your applications.
