The Graph Indexer Office Hours #188

Events · Dec 20, 2024
TL;DR: Ricky shared his progress on indexer operations automation tools, including an Indexer Management Interface and a Curation Signal Allocation Optimizer. He showed several demo videos, covering deleting subgraphs, quality of service metrics, and a real-time ISA score. The session outlined a seven-step roadmap, with the ultimate goal being the integration of knowledge/note-taking, AI agents, and a coding environment, all working together seamlessly and compatible with GRC-20.

Opening remarks

Hello everyone, and welcome to episode 188 of Indexer Office Hours!

GRTiQ 199

Catch the GRTiQ Podcast with Lark Davis, an influential crypto content creator and the driving force behind the widely followed YouTube channel and newsletter, Wealth Mastery.


Repo watch

The latest updates to important repositories

Execution Layer Clients

  • Reth: New releases:
    • v1.1.4:
      • Date: 2024-12-11 11:58:33 UTC
      • This release fixes the FromStr implementation for the miner_ variant, which is required for the op-batcher. Users of OP-Reth should prioritize this update.
      • Urgency indicator: Yellow
      • Urgency reason: Important fix for core OP-Reth functionality.
    • v1.1.3:
      • Date: 2024-12-11 11:58:33 UTC
      • This release introduces performance enhancements, bug fixes, and breaking API changes (including new NodePrimitives abstractions), with medium update priority for OP-Reth users and low priority for all others.
      • Urgency indicator: Yellow
      • Urgency reason: Medium update priority for core OP-Reth functionality.
  • sfeth/fireeth: New release v2.8.2:
    • Date: 2024-12-17 15:04:20 UTC
    • The release updates firehose-core to v1.6.8 and introduces improved Substreams performance through reduced memory allocation and the removal of unnecessary tracing. Additionally, new flags are added to enhance client connection management and block fetch duration control.
    • Urgency indicator: Yellow
    • Urgency reason: Performance improvements, not immediately critical.
  • Nethermind: New release v1.30.0:
    • Date: 2024-12-12 10:15:46 UTC
    • The release addresses a startup crash for Nethermind users and fixes JSON RPC module handling after the .NET 9 upgrade. Both updates are important for maintaining system stability and compatibility.
    • Urgency indicator: Yellow
    • Urgency reason: Fixes issues that may disrupt service.

Consensus Layer Clients

Information on the different clients

  • Prysm: New release v5.2.0:
    • Date: 2024-12-16 19:42:18 UTC
    • Release v5.2.0 includes mandatory updates for mev-boost users to prevent fallback to local blocks amid gas limit increases. It also introduces QUIC as the default protocol and various improvements and fixes critical to validator operation.
    • Urgency indicator: Red
    • Urgency reason: Mandatory update to prevent execution fallbacks.
  • Teku: New release 24.12.0:
    • Date: 2024-12-12 11:19:51 UTC
    • The release 24.12.0 includes critical bug fixes and performance improvements for block publishing, particularly beneficial for operators using locally produced blocks. No immediate breaking changes are introduced, but operators should prepare for metric name changes in the next release.
    • Urgency indicator: Yellow
    • Urgency reason: Important updates, but not immediately critical.
  • Nimbus: New release v24.12.0:
    • Date: 2024-12-13 08:13:54 UTC
    • Nimbus v24.12.0 introduces support for the bootstrap_nodes.yaml specification, improving node configuration. This is a low-urgency release, and operators may update at their convenience within two weeks.
    • Urgency indicator: Green
    • Urgency reason: Update at your convenience.
  • Lighthouse: New release v6.0.1:
    • Date: 2024-12-16 06:25:41 UTC
    • Lighthouse v6.0.1 addresses minor issues and optimizes features from v6.0.0, including improvements in state management and bug fixes. Users should ideally upgrade to v6.0.1 for better performance, but it is not critical due to the nature of the fixes. The updates are backward-compatible with v5.x.y.
    • Urgency indicator: Green
    • Urgency reason: Low priority, non-critical updates.

Graph Stack

  • Indexer Service & Agent (TS): New releases:
    • v0.21.11:
      • Date: 2024-12-12 19:02:30 UTC
      • This release includes an optimization for round trips in subgraph deployments, enhancing performance. Operators should consider updating to benefit from improved efficiency.
      • Urgency indicator: Yellow
      • Urgency reason: Performance improvement, not critical.
      • GraphOps is currently running this release on Arbitrum Sepolia with no issues.
    • v0.21.10:
      • Date: 2024-12-12 19:02:30 UTC
      • This release reduces subgraph pagination verbosity, removes the DAI injection feature, and adds support for Boba and Boba BNB, as well as a fix for Blast.
      • Urgency indicator: Yellow
      • Urgency reason: Performance improvement, not critical.

Protocol watch

The latest updates on important changes to the protocol

Forum Governance

Contracts Repository

Open discussion [12:54]

Automating Indexer Operations

Ricky, an indexer with expertise in data science and automation, shares what he’s been working on since the last time he was on Indexer Office Hours (#183). For another previous episode, visit The Graph Indexer Office Hours #179.

Ricky: In IOH 183 I showed a user interface that I’ve been building out to manage all the moving parts of my indexer automatically. We seem to have lost that recording, unfortunately, but I have some videos that I will show here to walk through as a start.

Early version video

  • This is an early version of the Indexer Management Interface. You’ll see that it’s very similar to Grafana; it’s essentially an alternative interface to Grafana that uses the exact same connections. So, the first chart just shows query volume by subgraph.
  • Here, I have it set up where I have an interface for each of the logs, and then for each of those logs, I can look at different error summaries, and then I have an initialized LLM chat that can help me troubleshoot issues with the different components.
  • Then I can do a number of different things. I can adjust my PostgreSQL database settings, which, I realized at the start, were a complete disaster. This interface allows you to easily make those adjustments on the fly, run different commands, and also simulate your allocation management, for example.
  • Basically, I’m building on the tutorial that I shared in the past, taking all that data and making it into a useful interface for an indexer. So you can close all your allocations and things like that, and decide which subgraphs to sync, which is what you’re seeing here [timestamp 14:46]. You have things like the total queries, the entity count for the subgraph, the signal, and the rewards proportion (see the sketch after this list), all in one place, and then directly from the interface, you can sync subgraphs.
  • This is just a first step and an early version.
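
As background for the rewards proportion figure: The Graph splits indexing rewards across subgraphs in proportion to their share of total curation signal, and within a subgraph in proportion to each indexer’s allocated stake. A minimal sketch of that arithmetic, with purely illustrative numbers (not Ricky’s code):

```python
# Sketch of the "rewards proportion" arithmetic, assuming the standard
# two-level split: signal share across subgraphs, then allocation share
# within a subgraph. All inputs below are illustrative.
def rewards_proportion(subgraph_signal: float, total_signal: float,
                       my_allocation: float, subgraph_allocations: float) -> float:
    """Fraction of total network issuance this allocation would earn."""
    signal_share = subgraph_signal / total_signal            # subgraph's slice
    allocation_share = my_allocation / subgraph_allocations  # my slice of it
    return signal_share * allocation_share

# Example: 2% of network signal, 10% of the subgraph's allocations -> 0.2%.
print(rewards_proportion(20_000, 1_000_000, 100_000, 1_000_000))
```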

Deleting subgraphs video

  • You can delete specific subgraphs based on their size and how many queries they get, and then here [15:37], you can set up rules around that and do widespread pausing and deleting of subgraphs all at once.
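
A minimal sketch of what such a rule might look like when driven through graphman; the container name, thresholds, and deployment data are assumptions, and the exact subcommand may differ across graph-node versions:

```python
# Rule-based bulk removal sketch. Assumes graphman is reachable inside the
# index-node container and that deployment stats were collected elsewhere.
import subprocess

MAX_SIZE_GB = 50          # hypothetical size threshold
MIN_DAILY_QUERIES = 10    # hypothetical query threshold

deployments = [
    # (deployment ID, on-disk size in GB, queries in last 24h) -- example data
    ("QmExampleDeploymentId1", 120.5, 2),
    ("QmExampleDeploymentId2", 3.2, 15_000),
]

for ipfs_hash, size_gb, daily_queries in deployments:
    if size_gb > MAX_SIZE_GB and daily_queries < MIN_DAILY_QUERIES:
        # graphman's drop command unassigns a deployment and removes its
        # data; verify the name and flags with `graphman --help` first.
        subprocess.run(
            ["docker", "exec", "index-node", "graphman", "drop", ipfs_hash],
            check=True,
        )
```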

Quality of Service metrics video

  • Here you can see, by subgraph and by chain, things like latency and blocks behind.
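
A blocks-behind figure like this can be derived from graph-node’s index-node status API. A minimal sketch, assuming a local index node serving the status endpoint on port 8030:

```python
# Compute "blocks behind" per subgraph and chain from the indexing status
# API. The endpoint URL is an assumption for a typical local deployment.
import requests

STATUS_URL = "http://localhost:8030/graphql"

QUERY = """
{
  indexingStatuses {
    subgraph
    health
    chains {
      network
      chainHeadBlock { number }
      latestBlock { number }
    }
  }
}
"""

resp = requests.post(STATUS_URL, json={"query": QUERY}, timeout=10)
resp.raise_for_status()

for status in resp.json()["data"]["indexingStatuses"]:
    for chain in status["chains"]:
        head, latest = chain["chainHeadBlock"], chain["latestBlock"]
        if head and latest:  # either can be null, e.g. while a deployment starts
            behind = int(head["number"]) - int(latest["number"])
            print(f'{status["subgraph"]} on {chain["network"]}: {behind} blocks behind')
```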

Questions from the chat:

Matthew Darwin | Pinax: Logs are going into Grafana? (or where are the logs being stored?) Indexer management interface is written in what tech stack?

Ricky: The way I set this up for my own indexer, I basically had each of the logs going into a text file and then read from that (for simplicity at the start), but I do want to set it up to use the traditional logging stack, so I’ll be looking into Loki and other sources. At the start, I was outputting all the Docker container logs into text files and reading from those, but in general, my goal is to make it compatible with the existing stack. I’ve been using Payne’s Docker container stack so far, and the way it would be set up on a new indexer is that it essentially reads from the variables that you traditionally have set up, and that’s pretty much all that’s required. With Payne, I’m working on a new setup, and we’re going to evaluate how to best set it up on that one, as well as for other indexers.
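
For illustration, the interim text-file approach could look like the sketch below: one `docker logs -f` stream per container, appended to a file the interface then reads. Container names are placeholders:

```python
# Stream each container's logs into a text file (the stopgap described
# above, before moving to Loki). Container names are assumptions.
import os
import subprocess

CONTAINERS = ["index-node", "query-node", "indexer-agent"]

os.makedirs("logs", exist_ok=True)
procs = []
for name in CONTAINERS:
    outfile = open(f"logs/{name}.log", "a")
    # `docker logs -f --tail 0` follows new log lines until interrupted.
    procs.append(subprocess.Popen(
        ["docker", "logs", "-f", "--tail", "0", name],
        stdout=outfile, stderr=subprocess.STDOUT,
    ))

for p in procs:
    p.wait()
```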

I haven’t been working on this in the last few weeks because I’ve needed to upgrade my indexer, and Payne has been super helpful in handling everything for my new indexer. Hopefully, this will be a really great collaboration and will result in us bringing valuable tools to the indexer community.

Once the indexer is fully ready, one of the things we’ll be looking at will be long-running queries and creating automation around indexes, among other things. If other people want to use these tools, it would probably start to have a pretty meaningful effect on cost savings. I want to make this setup as simple as possible. It just requires the .env variables and figures things out from there. This part, in particular, will be a continued work in progress. Big shout out to Payne for his help and for enabling me to pursue some of these things.

stake-machine.eth: Deleting subgraphs directly in a DB or via graphman / API? Vector.dev + Loki should be good.

Ricky: Essentially, the way it works is I have a shell CLI command that works very similarly to the traditional one, where when you run it, you can interface with the CLI. So I wrote a slightly different version that runs one command at a time and then exits, so with my automation, I can pipe commands through it directly. Some of this is not the best approach right off the bat, but it’s what’s been working for me.
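
The one-shot wrapper pattern he describes might look something like this in Python; the `graph indexer` subcommand shown is only an example, and the available commands depend on your indexer-cli version:

```python
# Run a single indexer CLI command and return its output, so automation
# can pipe commands through instead of driving an interactive shell.
import subprocess

def indexer_cli(*args: str) -> str:
    """Execute one `graph indexer` command and return its stdout."""
    result = subprocess.run(
        ["graph", "indexer", *args],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

# Example (subcommand assumed): list the current allocation rules.
print(indexer_cli("rules", "get", "all"))
```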

stake-machine.eth: We were asking for a graphman API for ages already. 🙂

Matthew Darwin | Pinax: Poor man’s API.

Ricky: Down the line, my goal is to start showing different functional applications of using AI agents in actual reasonable ways that hopefully work well. Hopefully, in the future, I’ll be able to share a more live and working version of this. Once we get the indexer setup finalized, we’re going to keep playing around with this for my own indexer, Payne’s indexer, and then share more of these tools with whoever wants to use them.

Matthew Darwin | Pinax: Indexer Management Interface is written in what tech stack?

Ricky: This is in Python, using a tool called Shiny Express for Python, which comes from Posit, the company that specializes in R programming tools. The original tutorial I shared was made in R. They created tooling within the Python ecosystem for building applications, and that’s what I’m using, but the tool itself is pretty interchangeable.
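
For a feel of the framework, here is a minimal Shiny Express for Python app; the query-volume data is a stand-in, not the actual interface code:

```python
# Minimal Shiny Express app rendering a table, as a sketch of the kind of
# component an interface like this could be built from.
import pandas as pd
from shiny.express import render, ui

ui.h2("Query volume by subgraph")

@render.table
def query_volume():
    # Placeholder data; the real interface reads the indexer's own metrics.
    return pd.DataFrame({
        "subgraph": ["QmExampleA", "QmExampleB"],
        "queries_24h": [15_000, 2_300],
    })
```

Launched with `shiny run app.py`, this serves the page locally.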

Matthew Darwin | Pinax: Would love to try it. Very cool stuff.

ISA Score video [23:45]

Ricky: This shows your ISA (Indexer Selection Algorithm) score in real time. It takes the public ISA repo and reverse-engineers a pretty close to real-time version of it. You can see a breakdown of what your score looks like across different subgraphs and chains and spot any issues. A lot of this is still a work in progress.

This is part of a broader seven-step project roadmap. For some of these, it’s good to get feedback from this group and see if there are any reservations about any of these things.

  1. Automate curation
  2. Automate delegation
  3. Automate tooling to minimize inaccurate data being served: Query Consistency
  4. Automate indexer operations:
    1. Indexer Management Interface
    2. Log controls: Demo
    3. Deleting subgraphs
    4. QoS metrics
    5. ISA score
  5. Automate subgraph development: Subgraph development demo
  6. Build out the ultimate coding workflow with GRC-20 compatibility. This is the main project and what I’m building up to. It seems obvious to me that the next evolution is the natural integration of knowledge/note-taking, AI agents, and a coding environment, all working in much more harmony than they do today. It also seems obvious that doing so [while] maintaining compatibility with both GRC-20 and Obsidian, and offering all of that for free, starts to become a lot more sustainable and higher quality than paying people to add information to Geo: Knowledge flow demo.
  7. Fully automated public good software AI agent contributors leveraging the tool I’m developing.

Curation Signal Tool [25:49]

Ricky: I have a curation tool where you can find different opportunities. Here you can see how many queries each subgraph has. I already showed this to some degree, but it had some issues in the past, and now it’s fixed.

Working off of this, I built an example of an agent.

You have a chat interface through which you can do curation. Here, I’m asking it for some good opportunities, and it points out two that are above 100% APR.

It will also visualize these so I can see them over time, and then, once I have this information, I can ask it to execute a certain kind of curation strategy.
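
An APR figure like the one in the demo could plausibly be estimated along these lines: annualize the curators’ share of a subgraph’s query fees and divide by the signal backing it. Both the fee share and the numbers below are assumptions, not values pulled from Ricky’s tool:

```python
# Back-of-the-envelope curation APR estimate. The 10% curator fee share is
# an assumption here; check current protocol parameters before relying on it.
CURATOR_FEE_SHARE = 0.10

def curation_apr(daily_query_fees_grt: float, total_signal_grt: float) -> float:
    """Rough annualized return for signal on one subgraph, in percent."""
    yearly_curator_fees = daily_query_fees_grt * 365 * CURATOR_FEE_SHARE
    return 100 * yearly_curator_fees / total_signal_grt

# Example: 50 GRT/day in query fees on 1,500 GRT of signal ≈ 122% APR.
print(f"{curation_apr(50, 1_500):.0f}%")
```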

I think this is a pretty good example, because some people think of an AI agent as something you can ask to do whatever. In this case, I’m thinking about AI agents as really small modular components, where the goal is to know which tools to interface with at the correct times, so when you ask it for a list of curation opportunities, it always gives you the same list.
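
That design reduces the agent to a router over deterministic tools, something like the sketch below (names and routing are illustrative):

```python
# The agent only picks which tool to run; each tool is deterministic code,
# so the same request always yields the same answer.
from typing import Callable

def list_curation_opportunities() -> list[dict]:
    # Same data source, same filter, same ordering on every call.
    return [{"subgraph": "QmExampleA", "apr": 121.7}]

def plot_apr_history(subgraph: str) -> str:
    return f"(chart for {subgraph})"

TOOLS: dict[str, Callable] = {
    "list_curation_opportunities": list_curation_opportunities,
    "plot_apr_history": plot_apr_history,
}

def run_agent(user_request: str):
    # In the real system an LLM maps the request to a tool name;
    # a stub stands in for that step here.
    tool_name = "list_curation_opportunities"
    return TOOLS[tool_name]()

print(run_agent("show me good curation opportunities"))
```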

Query Consistency

You can see this in another tool that I’ve been experimenting with. For any given subgraph, you can get a state from the status endpoints that tells you what query consistency is like on that subgraph. When it detects inconsistencies, it’s able to find the divergence block between different indexers; using that information, it can look up the entity changes for the indexers, compare them to the actual on-chain information, and get to the on-chain truth.
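
Finding the divergence block is essentially a binary search over proofs of indexing. A minimal sketch, assuming a hypothetical helper `poi_at(indexer, block)` that fetches an indexer’s POI for a block, and assuming that once two indexers diverge they stay diverged:

```python
def find_divergence_block(indexer_a: str, indexer_b: str, poi_at,
                          lo: int, hi: int) -> int:
    """Return the first block in [lo, hi] where the two indexers' POIs differ.

    `poi_at` is a hypothetical fetcher (e.g. against graph-node's status
    API); this assumes POIs agree before the divergence and differ after it.
    """
    while lo < hi:
        mid = (lo + hi) // 2
        if poi_at(indexer_a, mid) == poi_at(indexer_b, mid):
            lo = mid + 1   # still in agreement; divergence is later
        else:
            hi = mid       # already diverged; look earlier
    return lo
```

From that block, each indexer’s entity changes can then be compared against the on-chain data, as described above.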

In this context, you can see that all the AI agent is doing is selecting the appropriate tools for the job, versus an AI agent that gives you the response itself.

I’m doing a lot of experimentation right now with different AI agents and projects.

Everything I’m building is intended to improve the protocol in some way, and I think all of these things do. It’s helpful to be able to discuss these things with this group so we can identify any areas that we think are not beneficial to the protocol.

Moving away from curation, you could see a similar design for managing delegations, where you have this AI tooling but ultimately a human in the loop who confirms and makes the decisions.

Automating Subgraph Development

I’m going to start developing some more specialized agents for managing different aspects of my indexer. From there, one of the big items I’ve identified in my time working for Edge & Node is automating more of subgraph development. This could be a really nice value add to the ecosystem: having more data available and accessible via The Graph is a big one. All of the on-chain activity is visible, so I’d like to start chipping away at some of the higher-value contracts and projects that don’t currently exist on The Graph, especially across chains supported by The Graph, like Base, and maybe Solana down the line. There’s a lot of activity we can see that does not currently exist on The Graph, and making it easier for that to happen is a big one, I think.

Being able to query from The Graph shouldn’t necessarily require becoming a subgraph developer, because in my experience, that’s quite challenging. That’s one area where I’ve started making some progress.

[33:22]

You can see that you can provide a contract, and then it helps you through the process of developing the subgraph. You can generate a schema, and you’ll have a good starting point. It gives you AI-powered suggestions of different things to add. Then you move on to the mappings, and it’s a similar process.
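
For comparison, graph-cli can already scaffold a subgraph from a contract non-interactively; a sketch of invoking it from Python (flags assumed, check `graph init --help` for your version):

```python
# Scaffold a subgraph from a contract address with graph-cli; the address,
# name, and directory below are placeholders.
import subprocess

subprocess.run([
    "graph", "init",
    "--from-contract", "0x0000000000000000000000000000000000000000",
    "--network", "base",
    "my-subgraph",       # hypothetical subgraph name
    "my-subgraph-dir",   # output directory
], check=True)
```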

Matthew Darwin | Pinax: AI agent in the graph-cli?

Knowledge Flow

Ultimately, one of the main projects I’d like to work towards is creating a really good interface for content management and writing your own notes. You could think of this as Obsidian or Roam Research style, if you’re familiar with those, and get all your content information in one place that’s really easy to interface with for different kinds of coding projects.

If you start to integrate those pieces, this knowledge and content management system, together with your coding projects and the things that you care about, I think you start to have pretty powerful software. If you make that available to people in a way that is compatible with something like Geo, then it starts to open up some pretty interesting possibilities.

Storing all my notes in a format that’s a lot like Obsidian’s, you can then actually bring that information into more of a coding environment preloaded with an LLM, and then you can interface with something that’s already set up with all your content and all your files; it’s kind of like its own dev environment.

I see something like this being a good solution compared to paying people to fill out a knowledge graph. My take on the Geo track of work would be to create something that people would be willing to pay for but make it available for free and make it compatible with the GRC-20 standard.

I did manage to make something that is compatible with GRC-20 and writes data into my own database. I’m still tweaking a lot of different things.

These are all stepping stones in this direction.

Closing

[38:38]

Right now, everything I’m doing is geared towards making indexer operations streamlined, more efficient, and providing value to this community, so if there are any areas you struggle with as an indexer, I want to hear about them and see what I can do to help.

I’ll start sharing all my code, especially the Python stuff. Originally, my plan was to develop these tools and use them for my own work, but I’m starting to lean more towards making everything public from the get-go, so anyone who wants to contribute can.

I’ve made the query consistency tool available on my website, so the AI agent tool you saw is open source.

Matthew | Pinax: Ricky, your stuff looks great. The things that I heard you say that I’m most interested in are:

  • Indexer tuning, so the find long queries, create indexes thing, that’s super critical.
  • Resources for how to build subgraphs faster, better, easier, plus how do we unify the tool sets we have (Graph CLI, Substreams dev container). How to build a more holistic set of tools that don’t confuse users.
  • POI (proof of indexing) investigations: if we’re going to start collecting logs, it would help to make them available to POI investigation tools.

Would love to collaborate with you on these tools.

Ricky: Yeah, definitely. Those are great callouts. Thanks, Matthew. On the last point, for the query consistency tool that I showed, one of the things I’d like to work towards is this: I’m waiting for graphix to become more publicly available for me to use as a tool, because then you have a somewhat trusted source of truth when it comes to the actual data. Then you have an interface that’s easy for everyone to interact with and collect your own data, but you at least have graphix as a grounded source of truth, hopefully, over time. I would love to keep collaborating with you guys on all these things.

Marc-André | Ellipfra: Yup, yup, Ricky already came up with a lot of useful tools for POIs. Connecting to logs is an interesting idea, Matthew. Obviously there are several challenges to get this integrated.

Matthew Darwin | Pinax: Let’s chat how we can make this happen….

Vince | Nodeify: Ricky, I made a pipeline for graphix images.

Ricky: I’ll work towards making the tools that I have today publicly accessible and available if anyone wants to take a look or start thinking about how some of them could work well for their stack. I imagine that for a lot of indexers, what I’m using for my own stack won’t be a realistic approach, and I’m not too familiar with what other stacks may look like. I think exploring how it would fit with something like Launchpad and Kubernetes would also be helpful. I’ll start sharing, and then maybe we can iterate on some of these things as a group.

Matthew Darwin | Pinax: Yep, definitely interested in integrating with Launchpad.

Author

We're a web3 service provider specializing in blockchain indexing operations. Our mission is to enable creators to achieve their true potential with web3 technology. We want to help developers reliably access blockchain data in a consistent format so you can create amazing experiences for your applications.
