TL;DR: In this recap of IOH #164, Yaniv Tal, co-founder of The Graph and founder of Geo, joined to discuss Geo, knowledge graphs, and demo Geo Genesis. Learn how the Geo data service will open more opportunities for indexers.
Opening remarks
Hello everyone, and welcome to episode 164 of Indexer Office Hours!
GRTiQ 175
Catch the GRTiQ Podcast with Charlie Hu, co-founder at Bitlayer, a pioneering Bitcoin Layer 2 solution based on the BitVM paradigm. Before launching Bitlayer, Charlie made significant contributions to other notable projects, such as Polkadot and Polygon.
Repo watch
The latest updates to important repositories
Execution Layer Clients
- Erigon v2.0 New release v2.60.2 :
- Revert breaking change to eth_estimateGas introduced in v2.60.1 (#10904) as we added a better fix for gas fee calculation in debug calls.
- Fix potential p2p shutdown hangup.
- Downloader: fix staticpeers flag.
- rpc: Fix incorrect txfeecap.
- Geth New release v1.14.6 :
- Fix issue in which the beacon root contract balance would not be saved in developer mode, causing an error on restart.
- Fix shutdown crash when geth runs in blsync mode.
- Fix data races in snapshot access.
- Fix out of bounds access in json unmarshalling.
- Add missing lock in peer discovery.
- Arbitrum-nitro New release v3.0.3 :
- This release fixes batch posting for Anytrust DAS chains, and fixes the optional Stylus program lazy recreation feature.
Consensus Layer Clients
Information on the different clients
- Lighthouse: New release v5.2.1 :
- This medium-priority release contains bug fixes for issues identified in v5.2.0. Users who have experienced sync issues with v5.2.0 are recommended to upgrade.
- Prevention of forwards sync stalls after restarting, through more robust handling of peer disconnects and errors.
- Prevention of backfill sync stalls.
- Removal of unnecessary warnings for stale blobs propagated on gossip by other clients. The log message WARN Could not verify blob sidecar for gossip. Ignoring the blob sidecar should no longer be seen.
- Correct sizing of the slasher database for the growing validator sets on mainnet and Holesky.
Graph Stack
- Indexer Service & Agent: New release v0.21.3 :
- Add Optimism to support matrix
- Release graph-node v0.34.0 to mainnet
- Release graph-node v0.35.0-rc.0 to testnet
- Docs: updated feature support matrix with Graph Node v0.34.1
- docs: added base, bsc, scroll and linea to the feature support matrix
- chore: update network subgraph deployment hashes
- fix: arbitrum-sepolia.md signer address
- Update feature support (ens, arweave, 0.35.x)
- Update hosted service subgraph endpoints in /docs & /docs/config-examples
- Add missing addresses and resolve table formatting issues in network docs
- indexer-agent: Avoid race conditions when queuing actions
- indexer-agent: broken comparison of indexing decisions
- indexer-agent,indexer-common: Improve control of subgraph deployment health safety check
- indexer-agent,indexer-common: feat: allow early allocation closure
- indexer-agent,indexer-common: lower the default parameters for query fee claiming
- indexer-agent,indexer-common,indexer-cli: Allow cost models in single network mode
- indexer-agent,indexer-common: fix tsconfig, error-handling export
- indexer-agent,indexer-common,indexer-service: feat optionally override address book
- indexer-agent,indexer-common,indexer-service: use sepolia instead of goerli for tests
- indexer-common: Add support for BSC, Base, Linea, Scroll
- indexer-common: Upgrade @graphprotocol/cost-model
- indexer-common: Avoid using createdAt field for pagination
- indexer-common: Add more tests into tap.tests
- indexer-integration-tests: start integration test package
- indexer-service: Use consistent authorization header formats
- indexer-service: Support setting rate limits and body size limits at startup
- indexer-service: Increase size of attestation signer cache
- indexer-service: Server integration tests
- indexer-service: fix paginate allocation queries
SSH Vulnerability
Vince from Nodeify highlighted an SSH vulnerability in Ubuntu. Patch your nodes and your indexer as soon as possible.
Vincent from Data Nexus shared a script to check for the SSH vulnerability: CVE-2024-6387_Check.
Graph Orchestration Tooling
Join us every other Wednesday at 5 PM UTC for Launchpad Office Hours and get the latest updates on running Launchpad.
The next one is on July 17. Bring all your questions!
No updates to:
- Protocol watch
- Forum governance
- Contracts repository
- Project watch
Open discussion with Yaniv Tal
Yaniv, co-founder of The Graph and founder of Geo, joined to discuss knowledge graphs and Geo and give a demo.
Yaniv opened by emphasizing the importance of his recent post for the community: Knowledge graphs are web3. This post has a lot of new ideas to digest and understand, especially about knowledge graphs.
What’s next for The Graph ecosystem?
The Graph was built because the co-founders wanted to enable web3 and strive toward the idea of decentralized coordination, changing how we cooperate and organize and allowing us to coordinate through open systems, not centralized platforms.
If we look at where we are as an industry, there are a lot of interesting projects happening, but we haven’t really seen an explosion of web3 taking off and getting mainstream adoption. This is because there’s something missing.
The way that technology works is when you get it right, there’s exponential growth. So, the fact that we’re not seeing this growth means that there’s something missing in what we’ve built so far.
The structure of web3 is missing.
This is where knowledge graphs come in.
With web3, we’re trying to build open, verifiable information systems. The information’s structure must be composable so that many different applications can interoperate. Knowledge graphs are key to this solution.
Geo
We want to enable countless applications to be built on web3, on this interoperable graph of data, a global decentralized verifiable graph—The Graph. Knowledge graphs are key to enabling this because what we have today is not super composable.
Subgraphs are a great tool for ingesting data from blockchains, but our subgraphs, as they are today, are still fairly siloed. That is, the same kind of teams are building on top of them, and we don’t see much composition across these subgraphs.
In this next stage, we need to start organizing more knowledge graph data—in a structure that enables open, composable data. This will make the data that we’re indexing much richer and the work of indexing even more interesting.
Demo
For a demo of Geo, go to 19:45 of the recording.
Three components to building this global verifiable knowledge graph:
- Data structures and knowledge: how we represent knowledge.
- Governance: how we get groups of people to agree on information.
- User interface: how we make it as easy as possible for people to add data to this knowledge graph and build applications on top of it.
Geo Genesis will be launching publicly soon.
Right now, people don’t know which information they can trust. We need a process that verifies facts in real time as stories are released, so we can enable a fact database that’s anchored on chain. We’re building the tools so that communities can participate in organizing that information, and we can combat misinformation.
If you’re looking for information about ZK, for example, we want to make sure everyone is referring to the same ZK, so we would have editors or curators go through and clean up and label the data.
The goal is to expand beyond the crypto space and push web3 mainstream.
Role of the Indexer
A global interconnected graph of all the world’s public knowledge and information with public governance behind it will make indexing more interesting. There will be so much more data to organize and indexers will have more choice and creativity in what they want to index.
There will be different ways to aggregate data. Maybe you want to index everything for a particular city, state, or country. Maybe you want to focus on a single mission or industry, or whatever you’re interested in.
Questions
Note: Answers have been lightly edited and condensed.
AbelsAbstracts.eth | GraphOps: How does web3 ensure the governance of information is decentralized and transparent?
[Timestamp 30:31] Starting at a high level, governance will need to be customizable because communities are not one size fits all. We’re going to launch with a default governance structure and then make it customizable in the future.
Governance starts with the spaces, examples like philosophy, cryptography, AI, etc. (as shown in the screenshot above). Each space has its own knowledge graph and its own governance. The governance outlines the rules for how people propose changes and how they get accepted. A community comes to a consensus on the knowledge and information for their community. They may not agree on every single thing, but they agree on the state of the debate.
By describing ideas as claims, we can get to a consensus more easily; we can agree that the person proposed this claim, and there are disagreements about whether that claim is true or not. Agreeing on how to structure the information is a big part of allowing people to reach a consensus in the first place.
For more on space creation and subspaces, listen at 32:45.
AbelsAbstracts.eth | GraphOps: What role do token incentives play in the governance and quality control of web3 data?
[Timestamp 35:36] Incentives are important, but they’re a really bad idea for governance. Token voting doesn’t work when you’re trying to discover the truth and to have knowledge that is correct. Knowledge is not evenly distributed nor is it proportional to the amount of tokens you have. If you make a governance process based on token voting, you’re guaranteeing that you’re not going to find the truth. We must separate governance from incentives.
Governance to me is an exercise in figuring out who’s the most qualified, who has the most expertise, who’s the best at what they do, and you want to give power to those people. I believe in meritocracy. We have the tools to make work more open and transparent so we can identify who those people are. For example, you can look on GitHub and see who’s pushing the code.
Incentives are important because we want to pay people to do work and incentives drive behavior. For any protocol, you must understand the goal, the objective function, and what work needs to be done to achieve that goal. Then you design incentives to encourage those behaviors.
The ultimate strategy should be oriented around things that don’t change over time. For The Graph, the strategy is oriented around the fact that people will always want more knowledge and more information, and they will want it better organized.
For more on incentives in Geo and the future of the curator role, listen at 37:49.
Vince | Nodeify: How will governance be dealt with on top-level governance, e.g., the past Brazil and Twitter issue with the people’s version vs. the government’s version of events?
[Timestamp 40:29] We’ll need to think about different governments and cultures and how to balance ideas like freedom of speech, the right to self-determination and self-governance. It’s really complicated. We’re not going to get away from laws; hopefully more truth, knowledge, and information gets out and we give people more power, and if they’re democratic, that gives them ways to be more free over time.
I think what we’ll end up with is different spaces. So for Brazil, maybe you have the Brazilian government-approved space and you have the free-Brazil space and they can both exist.
There will be user interfaces that get customized for different geographies so that people don’t see topics or items that aren’t part of their culture or preferences (e.g. nudity, swearing and aggressive language). We will be able to label posts and have customized algorithms or user interfaces so we have control over what we see in our feed.
NSun | Graphtronauts: For highly debated topics, what is the vision for how information will be presented? Is it whatever view that is arrived to through governance? Is it a showing of views from both, or all sides, and just letting the reader decide what makes the most sense to them?
[Timestamp 44:30] I don’t think the role of governance is going to be to try to determine what is true. That’s too difficult and not everyone is going to agree. I think the role of governance should be to try to create the conditions for healthy conversation and to organize information so it’s accurate. For example, including claims with supporting and opposing arguments, so you can see the best versions of both sides.
For examples, listen at 45:25.
AbelsAbstracts.eth | GraphOps: How will the role of an indexer change in this future with Geo? How can indexers prepare and get involved?
[Timestamp 49:25] We’re starting work on a new data service. Hope from GraphOps has started on a knowledge graph data service. The current version of Geo is built using Postgres, but once we start getting into highly connected data, graph databases become really interesting and exciting.
In fact, knowledge graphs and graph databases are the most powerful way of providing context and input into LLMs. We’re starting to experiment with graph databases, and since a lot of you have experience with Postgres, I think it’d be awesome to have indexers provide input and experiment. Some examples of graph databases include Neo4j, Memgraph, NebulaGraph, and FalkorDB.
Indexers will have more choice around what data they want to index. Indexers may also want to get involved in organizing some of this knowledge and information.
I think we should start revisiting the curator role and figure out the right incentives for people. We have a team that is building tools to ingest data, clean it up, and index it, which is a very relevant skillset that indexers already have, so that’s another place you might want to get involved.
NSun | Graphtronauts: How would you explain the most important differences of Geo to someone who might say this is just another version of Wikipedia, so why bother having it?
[Timestamp 53:36] Wikipedia is one application and it’s separate from other applications, separate from YouTube, X, Reddit, Quora—all these separate apps living in their siloes.
Some of the knowledge on Geo is static, but a lot of it is going to be dynamic. We’re going to ingest data from subgraphs, Substreams, and apps built on the blockchain, so there’s a lot of dynamic data. Wikipedia is static; Geo is meant to be dynamic.
Geo is not meant to be a competitor to Wikipedia. It’s meant to be like the Semantic Web, a whole new web, where you can have dynamic applications, but all of the information is verifiable and there’s public governance over public data.
chris.eth | GraphOps: How do we manage the Sybil problem, particularly when using something like the number of votes when forming views of the information?
[Timestamp 55:55] We have a unique approach to identity. The general Sybil defense strategy is to bootstrap identity through a web of trust built on the information contained in the knowledge graph. So, foundational to how we structure things and how we think about things is what I said about public governance over public data and the role of social consensus.
Blockchains are best used for assets and enforcing property rights. It’s important for assets to have property rights, but when it comes to information, I’m less convinced that that property rights are the way to govern information.
For more on Geo’s approach to identity, listen at 56:49.
People for the longest time were calling us the Google of blockchain. I don’t think up until now we’ve actually built the Google of blockchain, but this starts to look like [it], right? I can do a global search and I can search The Graph and I can find things.
Yaniv Tal
AbelsAbstracts.eth | GraphOps: How will the role of the indexer change as machine learning proliferates?
[Timestamp 1:01:15] Semiotic has announced that they’re working on an AI data service, but I think the quality of the AI inference is going to be related to the data that goes into it for RAG (retrieval augmented generation). So that’s where at query time, you can run a query against the knowledge graph, get the results, and then have the LLM incorporate that into its answers.
Indexers may want to collocate data and answer queries for areas that they’re indexing. For example, if you chose the agriculture space and index everything related to agriculture, then if someone has a question about agriculture, you can answer that really quickly because you’ve got everything right there. It will help with performance to have the data collocated.
Nick: Hey, Yaniv! Can you explain the costs of using Geo? For example, does it cost users to post or submit information on Geo? When I query Geo (like you just did for you or Tegan), do I need to pay GRT?
[Timestamp 1:03:19] Right now, we’re kind of subsidizing. I still think a paid service is the right model. The long term solution for web3 is the consumer should pay for the services they consume and they have full freedom. The challenge is you have to balance that against the friction of making users understand why and what they’re paying for, and how to pay.
If we succeed in building a platform for web3 that has a really great user experience, and we help lead and bring web3 to the mainstream, I think that could put us in a position to help solve that problem. We need to have a great user experience with seamless onboarding and start to educate people on what it means to pay for the services they consume.
Ultimately I do think consumers should pay, but I think it’s going to take us time to really define exactly what that looks like.
Quick follow-up: When I type into Geo, again, as you did with the example showing a search for you and Tegan, am I querying The Graph (and an Indexer is delivering that query)?
Right now, no, we’re running our indexer for this. We’re working on getting this data service on the network. It’s one of our goals for launch because we’d love to have indexers able to index a subgraph and to have those query fees flowing through the network.
Yaro from Pinax has been helping us with this; he’s been doing really great work.
AbelsAbstracts.eth | GraphOps: How can we unite the Semantic Web community and the web3 community We have the same goals but operate in different worlds – how can we unite and join forces?
There were three main reasons the Semantic Web didn’t take off. The first was that it was built by academics and has cumbersome developer ergonomics. It also lacks governance and standards, and it does not provide incentives. The Semantic Web folks are purists, and they’re right, but they’re not ready to give up the academics-led standards they built.
Listen to Abel expand on his question and Yaniv’s response at 1:06:43.
Vince | Nodeify: Would a donation platform similar to Wikipedia be more applicable here? Although it would be extremely inexpensive, signing tx’s everytime you do something, would I think push people back to freemium services like wiki with no knowledge needed and very easy on mobile. Or how do you bridge that UX gap?
People have lots of opinions and it’s hard to tell who is right when discussing in the abstract. It’s better to experiment and see what works and what leads to user friction.
Listen to Yaniv’s response at 1:10:12.
chris.eth | GraphOps: Can you talk about (a) Geo as the first application built on the Knowledge Graph, enabling governance and focused on end users vs. (b) the Knowledge Graph Data Service on The Graph, focused on attracting third-party developers to bring their data to the KG and build on it, like Geo has?
Both are important, but ultimately it’s not just about one application. The knowledge graph is the interoperability standard that third-party apps can build on top of. The team is beginning work on a framework where people can build custom apps using the knowledge graph.
Listen to Yaniv’s full response at 1:11:04.
Yaniv’s final thoughts
Thanks everyone, I really appreciate the time and you making it here and really excited to work together on building all of this out and getting web3 out into the world. It’s going to be a really exciting year ahead and can’t wait to work with everyone.
No Comments