TL;DR: Vince introduces Talos Linux, a lightweight and secure Linux distribution designed for Kubernetes, demonstrating its installation, configuration, and management capabilities. He shows Talos Forge, a tool he created to convince another indexer to try Kubernetes, and KubeSpan, a feature for creating hybrid clusters. The Q&A addresses topics like control plane setup, bare metal deployment, and comparisons with other tools like Launchpad.
Opening remarks
Hello everyone, and welcome to episode 182 of Indexer Office Hours!
GRTiQ 193
Catch the GRTiQ Podcast with Julian Zawistowski, Director at Golem Foundation. Julian is a co-founder and a longtime leader at Golem, a project that has evolved and transformed since the early days. Julian will take us inside that transformation and what Golem is doing now.
⬇️ Skip straight to the open discussion ⬇️
Repo watch
The latest updates to important repositories
Execution Layer Clients
- Reth: New release v1.1.1:
- Date: 2024-11-05 22:50:45 UTC
- This release contains performance improvements and bug fixes, such as running the pruner after saving blocks. There are no breaking changes; however, users may experience a longer startup time on the first restart due to the pruner fix.
- Urgency indicator: Yellow
- Urgency reason: Important updates for performance stability.
Graph Stack
- Indexer Service & Tap Agent (RS): New releases:
- indexer-tap-agent-v1.6.0:
- Date: 2024-11-04 21:58:37 UTC
- Version 1.6.0 adds a feature to calculate unaggregated fees even if the RAV fails. No critical updates are present in this release.
- Urgency indicator: Yellow
- Urgency reason: Important for performance, not critical.
- indexer-tap-agent-v1.5.0:
- Date: 2024-11-01 21:58:37 UTC
- Version 1.5.0 adds a sender fee tracker metric and adds a fix to recalculate all aggregations. No critical updates are present in this release.
- Urgency indicator: Yellow
- Urgency reason: Important for performance, not critical.
- indexer-tap-agent-v1.4.1:
- Date: 2024-10-30 21:58:37 UTC
- Version 1.4.1 adds a bug fix to check subgraphs before closing allocations. No critical updates are present in this release.
- Urgency indicator: Yellow
- Urgency reason: Important for performance, not critical.
- indexer-tap-agent-v1.4.0:
- Date: 2024-10-30 21:58:37 UTC
- Version 1.4.0 adds bug fixes for initializing the allocations monitor and subgraph, and for refreshing the database before closing an allocation. It also adds the ability to create allocations in parallel. No critical updates are present in this release.
- Urgency indicator: Yellow
- Urgency reason: Important for performance, not critical.
- indexer-service-v1.2.0:
- Date: 2024-10-30 21:58:37 UTC
- Version 1.2.0 adds a check that the gateway is sending the value being requested; otherwise, the query is rejected. No critical updates are present in this release.
- Urgency indicator: Yellow
- Urgency reason: Important for performance, not critical.
- Also: Indexer Service 1.2.1 and 1.2.2 were released; both are very small releases.
From the chat:
- Matthew Darwin | Pinax: Need a countdown clock to when indexer-service-ts is no longer supported….
- Ana | GraphOps: We did add a banner last week on our GraphSeer explorer that says the date, which is Wednesday, December 4, at 9:00 AM Pacific Time (PT), but yeah, a countdown would be good.
Update later posted to Discord
Gustavo | Semiotic Labs — Nov. 6
Hey everyone, we just released an important fix for indexer-service-rs with version 1.3.1.
This fixes a bug where the attestation wasn’t signed correctly for queries that aren’t properly formatted (more spaces than needed).
I recommend everyone update to the latest set of versions for indexers:
- indexer-service-rs: 1.3.1
- indexer-tap-agent: 1.7.1
- indexer-agent: 0.21.6-2
Also, since version 1.2.2, indexer-service is enforcing cost models on gateway receipts. If you have cost models set to a higher value than the gateway budget, you might have some errors. If you do, I recommend lowering the value for cost models.
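Note: Cost models are written in Agora, and a minimal model is a single default rule; for example, default => 0.00001; prices every query at a flat rate (the value here is purely illustrative; choose one below the gateway budget to avoid the errors described above).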
Protocol watch
Forum Governance
- Request for information about disputes #GDR-18
- Update: Given the evidence provided by Inspector POI and confirmation of the data by the arbitrators, we resolve to slash both disputes.
Forum Research
Core dev updates:
- Semiotic November 2024 Update
- Edge & Node’s October/November 2024 Update
- Pinax November 2024 Update
- GraphOps Update November 2024
- Messari November 2024 Update
- Geo November 2024 Update
- StreamingFast November 2024 Update
Contracts Repository
- ci: fix endpoints for token distribution repo and split ci per package #1066 (merged)
- docs: fix links in README.md #1065 (merged)
Project watch
Reminder: Indexer Service TypeScript to be deprecated
The countdown has begun. Migrate to Indexer Service Rust and TAP as soon as possible.
On Wednesday, December 4, at 9:00 AM Pacific Time (PT), the gateway will start enforcing Indexer Service TypeScript deprecation. Indexers still running Indexer Service TypeScript will not be eligible to serve queries.
Open discussion [10:14]
Vince from Nodeify (and Pinax) introduces Talos Linux, a lightweight, secure, and predictable Linux distribution designed specifically for Kubernetes.
Vince: I’ll be showing you Talos and trying to convince Payne to use Kubernetes (or at least play with Kubernetes).
What is Talos Linux?
A lot of Kubernetes distributions are installed on top of general-purpose distributions like Debian or Ubuntu. Talos is a lightweight distribution: it’s just the image, everything Kubernetes needs, and nothing else, which makes installs predictable and very secure.
There’s no SSH. Everything is done through machine configs. We’ll talk more about that later on.
The Talos team ships everything in stable rollouts, so you don’t have to worry about Kubernetes getting out of whack with upgrades, or one component not supporting another.
Talos is minimal: you can PXE boot it, install it on a USB stick and boot from that, or run it in Docker, so there are lots of things you can do with it.
It’s ephemeral, which is fantastic: Talos runs in memory from a SquashFS and persists nothing, so it takes up no disk space and leaves the primary disk entirely to Kubernetes.
Quickstart
There’s a quickstart if you want to run it on your local machine, just on Docker desktop or something like that. It’s very quick and simple.
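If talosctl and Docker are already installed, the quickstart boils down to a single command that stands up a local Docker-based cluster:
- talosctl cluster create
Tear it down again with talosctl cluster destroy.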
Command Line Tool
If you’re at all familiar with kubectl (or “kube control,” whatever you want to call it), talosctl is the equivalent for Talos: it’s how you talk to the Talos cluster. There are lots of commands.
Installation
You can install Talos on various platforms: DigitalOcean, Google Cloud, bare metal, Proxmox, etc. Where I think Talos gets really cool is they have something called an Image Factory.
- Talos Linux Image Factory
- The Image Factory provides a way to download Talos Linux artifacts. Artifacts can be generated with customizations defined by a schematic. A schematic can be applied to any of the versions of Talos Linux offered by the Image Factory to produce a model (from the repo).
- Image Factory GitHub repo
With the Talos Image Factory, there are ISOs already built, but say you want to do more with that ISO: build your own with its own kernel arguments, or change its networking, or something else, before it even boots. This makes things predictable when you’re launching infrastructure.
An ISO is a bootable file that sets up the operating system.
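For a feel for what a schematic looks like, here’s a minimal sketch (the kernel argument and extension are illustrative, not from the demo):

customization:
  extraKernelArgs:
    - console=ttyS0
  systemExtensions:
    officialExtensions:
      - siderolabs/qemu-guest-agent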
Watch Vince demonstrate how to use the Image Factory at 15:57 of the recording.
After boot
Once you get your Talos machine booted (in the demo, Vince’s is on a server he has on DigitalOcean), it’s in maintenance mode. It’s not doing anything; it has just booted the Talos ISO.
Press F2 to get a dashboard, and if you’re on bare metal, you will also get a network tab.
Since it is in maintenance mode, I can talk to it and ask what disks are on this machine, and it will show me the disks on the machine, their names, how much space they have, the bus path, and a bunch of other stuff.
You do want to use the --insecure flag here because, in maintenance mode, the node just listens on port 50000, waiting to be told what to do.
Once you commission your cluster in Talos, it syncs up with certificates, and you can no longer use the insecure flag. The only way to connect is with certificates, the same as talking to a Kubernetes cluster: it’s all certificates, so it’s very secure.
The most someone can do is this: if you leave a node unprovisioned and they find it, they can provision it and tell it to do stuff, which you obviously don’t want. So if you’re planning on leaving a node waiting, I recommend limiting it to an internal network and not exposing it to the world.
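As a concrete example, asking a maintenance-mode node for its disks looks roughly like this (the node IP is a placeholder, and on newer Talos releases the subcommand is talosctl get disks):
- talosctl -n [node IP] disks --insecure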
Talos Forge tool [19:08]
Here’s a little tool I made to convince Payne. 😉
- Talos Forge
- This repository contains a makefile and associated scripts for managing Talos Kubernetes clusters. It provides a streamlined interface for common cluster operations including initialization, deployment, node management, and configuration.
It’s not exactly production-grade, as I did this in just a few days, but it’s something you can get up and play with. By all means, fork it, make improvements, open a PR, do whatever you want. I welcome any cool additions. I just wanted to cover the main stuff you need to do.
Everything is run with make commands. You can see them all by running make help or in the README on the repo.
make help
Usage:
make deps # Check system dependencies
make init # Interactive cluster initialization
make deploy <cluster-name> # Deploy Talos cluster
make kubeconfig <cluster-name> # Generate Kubeconfig
make reset-cluster <cluster-name> # Reset Talos cluster nodes
make add-node <cluster-name> <node-type> <node-ip> # Add a node to the cluster
make remove-node <cluster-name> <node-ip> # Remove a node from the cluster
make apply <cluster-name> <node-ip> <patch-file> # Apply patches to the specified cluster
Talos Patching
Talos has patching. Say I launched a machine with a specific machine config, but I want to change something about it, like which disk it installs to. You can apply patches to machines to adjust configuration options: maybe you have your base config, and then patches that change various things.
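A patch is just a YAML fragment that overlays part of the machine config. A minimal sketch for the disk example (the device path is a placeholder):

machine:
  install:
    disk: /dev/nvme0n1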
You’ll need curl, Git, and yq installed, plus talosctl and kubectl. Then you’re good to go.
Watch Vince demonstrate at 20:33 of the recording.
You’ll get two files: a worker YAML and a controller YAML.
Run the make init command and go through this yourself. There’s lots of stuff you can do.
Vince discusses:
- If you’re running bare metal, you’ll want to get the networking squared away.
- Instead of a load balancer, you can use a VIP: an unused IP address on the same network (see the sketch after this list).
- You can change your disk name or ask it to select a disk of a specific size.
Take a look at these and see all the stuff you can do. It’s a lot to go over, and there’s no possible way I can talk about all of it.
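As an example of the VIP option, a machine config sketch along these lines assigns a shared virtual IP to the control plane NIC (the interface name and address are placeholders for your network):

machine:
  network:
    interfaces:
      - interface: eth0
        dhcp: true
        vip:
          ip: 192.168.1.100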
Talos KubeSpan
KubeSpan is a feature of Talos that automates the setup and maintenance of a full mesh WireGuard network for your cluster, giving you the ability to operate hybrid Kubernetes clusters that can span the edge, data center, and cloud. Management of keys and discovery of peers can be completely automated, making it simple and easy to create hybrid clusters (from KubeSpan page).
You can talk to things on multiple different platforms. Maybe you have bare metal, stuff in Google Cloud, and stuff in DigitalOcean, and they’re not even on the same network. This actually creates a custom WireGuard network with just an enabled flag, and everything will talk to each other, so you don’t have to deal with any crazy networking or anything like that.
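It really is close to a single enabled flag in the machine config, plus the discovery service so peers can find each other; a minimal sketch:

machine:
  network:
    kubespan:
      enabled: true
cluster:
  discovery:
    enabled: true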
Deploying a cluster [25:28]
Now that we have our cluster, we want to deploy it using the make deploy command.
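In the demo, with the cluster named ioh, that’s:
- make deploy ioh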
It will start running, and you’ll see the control plane and all the other nodes start booting. Once all the services are up, some may show as unhealthy; just give it a minute, and they’ll all turn healthy. Once it says ready, it’ll also tell you your cluster is good to go.
Since we have a few minutes while we wait, does anyone have any questions?
Questions [26:58]
Payne: When you set up the IP for the control, the load balancer… what does the load balancer do?
Vince: So the load balancer is basically load balancing all the control planes, which serve the Kubernetes API, so you can talk to the cluster.
Payne: Do I need a load balancer?
Vince: No, you can use a VIP. You could even run HAProxy; you can run whatever you want at the front end. But if your main control plane goes down, you won’t be able to talk to your cluster, so you want something in front of it so you can always talk to it and manage it.
Payne: So I can access my cluster via the load balancer IP?
Vince: Correct. That way, it doesn’t matter if some of your control planes go down.
Payne: The control planes, are they separate servers?
Vince: Yes, or VMs. You can Proxmox them.
Payne: What if I have bare metal, and I want to throw containers at it?
Vince: If you have bare metal, you have a couple of options. I would recommend small bare metal servers; you can even use Raspberry Pis. Ideally, you want separate machines, but some people like to use a single smaller server and Proxmox it. If that machine breaks, though, all your control planes go down, so you definitely want different machines.
In the minimal ideal scenario, you would have six machines (three control planes plus three workers). But there’s also KubeSpan: if you wanted to put the control planes somewhere else, like DigitalOcean, that’s fine; you can have just your control planes in the cloud, and they’ll talk over WireGuard.
Control planes are very minimal, like 2 to 4 vCPU, very small servers.
Payne: What if I had all six servers act both as control planes and as workers?
Vince: Yeah, you can do that. By default, control planes don’t take on workloads, but you can turn that off.
Payne: So I can do what I’m doing now with Proxmox. I know it’s not recommended because I have the possibility of running it elsewhere but at the same time…
Vince: Yeah, you can definitely do that.
calinah | GraphOps posted: YOLO [you only live once]
Payne: Yeah, YOLO
My next question is, in the control plane YAML file, do you specify this per worker node… when you say you specify the disks and all that, is that in the control plane or in the worker YAML?
Vince: It’s in both.
[35:00] Control plane configs have different stuff, and Talos will even tell you if you try to do something that isn’t there. Mainly, there’s stuff under cluster in the control plane config that the worker config doesn’t have, so if you try to apply something from cluster on a worker node, it’ll say, “I don’t know what that is.”
Linked below is how to make your control planes also run workloads; it just allows scheduling on the control planes so you can run workloads there.
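For reference, on recent Talos versions that’s a single field in the control plane config:

cluster:
  allowSchedulingOnControlPlanes: true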
Modifying a cluster [35:38]
Vince: Now that it says it’s ready, I can do: make kubeconfig and the cluster name (ioh). Now we have our kubeconfig.
Copy the kubeconfig and paste it into Kubie to test the cluster.
Then pull up K9s, go to namespace all, and you can see we have our cluster and we can talk to it, and we can run workloads on it if we want.
Add and remove a node [36:48]
Let’s add another controller.
We do: make add-node, cluster name, controlplane, node IP
- make add-node ioh controlplane [IP]
It’s booted and joined the cluster.
Now, let’s say that was a mistake: make remove-node, cluster name, node IP
- make remove-node ioh [IP]
It will reset the node and remove it from the cluster.
Then it boots back up into maintenance mode.
Patching [39:50]
Let’s do a patch on worker two. I want to change a label, something simple.
Go into the patches folder and create labels.yaml
I want it to set the value env: “worker”.
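The file’s contents aren’t shown on screen, but a node-label patch along those lines would look something like this (using Talos’s machine.nodeLabels field):

machine:
  nodeLabels:
    env: worker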
Then do: make apply, cluster name, worker two IP, patch file
- make apply ioh [IP] labels.yaml
Then it applies the patch. You can also remove it.
You can patch anything in your machine config.
Talosctl [42:04]
Talosctl has a ton of commands and subcommands. Take a look at them.
Some examples:
- Validate your machine configs
- Back up etcd, which essentially is the cluster (it holds all the cluster state)
- List containers
- Add stuff
- Check the health
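For example, an etcd backup is a single command run against a control plane node (the snapshot filename is arbitrary):
- talosctl etcd snapshot db.snapshot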
Resetting a cluster [43:49]
From make help, the command is: make reset-cluster ioh
That will reset all the nodes defined in your cluster config, and they’ll go back into maintenance mode, ready for you to play with again.
In production, you’re going to want to really customize it to your machines because yours will be different.
Questions [45:35]
paka | E&N: Could I get a noob explanation for how this is different from something like Launchpad?
Vince: Launchpad comes in once you already have a cluster; Talos is about getting a Kubernetes cluster in the first place and managing it at the hardware level.
Payne: So you’re managing this with Talos at the hardware level, and then you deploy stuff on the cluster with the Launchpad, right?
Vince: Yeah, correct. Or whatever you want to use: bare manifest, Argo, Flux, Launchpad.
Pierre | Chain-Insights.io posted: Talos is a Linux OS with Kubernetes batteries included.
Vince: Exactly, Pierre.
paka | E&N: Are there white-labeled Talos packages for domain/app-specific use cases?
Vince: Yes, if you go on to the releases page, they have ones specifically for different platforms. They have releases for virtualized platforms and hosted platforms like Google Cloud, DigitalOcean, and a bunch of other major players. They have ones for Raspberry Pi.
What’s cool with Raspberry Pi is that with extensions you can do kernel arguments, but after 1.7, I think, they also added layers, so you can do stuff before the kernel even gets installed, which you want to do with Raspberry Pi because it has no BIOS.
Then the Talos Factory is super cool because you can create your own images, and if you don’t care about hosting your own PXE boot images, there are already PXE scripts for the exact thing you selected.
But yeah, Talos supports any cloud platform, virtualized Proxmox, bare metal. With bare metal, you’ll want to tune it more to whatever suits how you run stuff.
I’ll mention this, but it’s expensive: Getting Started with Omni
Talos has something called Omni. It’s kind of a hosted platform with support and other services. On the “hobby” tier, you can get up to 10 nodes, but if you plan on using more than that, you’re going to need to pay, and it’s like $1,500 for that many nodes, so very expensive, but a cool platform. Basically, you create a cluster, and you can download installation media; it’s much like the Talos Factory in that you can add kernel arguments before you download it.
What’s cool about this is you just download the installation media, then you get that image, and when you create a cluster, those machines will automatically pop up on your UI, and then you can do config patches and apply them.
From the repo:
Omni is available via a Business Source License which allows free installations in non-production environments. If you would like to deploy Omni for production use, please contact Sidero sales. If you would like to subscribe to the hosted version of Omni, please see the SaaS pricing.
If you would like to self-host Omni for non-production workloads, please follow the instructions in the documentation.
Pierre | Chain-Insights.io posted: Use Rancher with K3s and RKE2.
Mack: I did ask Vince about this a few weeks back. YEE HAW
John K. posted: You really don’t need the UI for Talos. Its philosophy is to be declarative. Check your machine config yamls into Git. You’ll really only need to change them to upgrade Talos or K8s.
Vince: Yes, this is true, and that’s the power of it: it’s declarative. Just some people like a good UI.
Talos is secure by default and declarative, so you’re deploying the way it’s meant to be deployed. You’re not going back in and wondering, “did I install that package or not?” What happened on the cluster is all in your GitHub.
Closing
Vince: Payne, are you going to use Kubernetes now?
Payne: As soon as I have a little more free time, I will start playing with it. 🎉