IEEE IC2E 2025 Best Industrial Paper Award
Meta-Orchestration for Compute Continuums

Orchestrate Anywhere. Execute Everywhere.

ColonyOS is a meta-orchestrator for compute continuums — from edge devices to supercomputers. Orchestrate distributed workloads, run AI executors, and build resilient infrastructure with Kubernetes-style reconciliation across heterogeneous systems.

ColonyOS is used and further developed in the COP-Pilot EU project.

Unified Process Management

Three powerful execution patterns through a single abstraction — from batch workloads to real-time AI interactions.

Task Execution

Function Spec → Server → Executor

Submit declarative function specifications. Executors automatically pick up and execute matching processes.

Use cases: Batch jobs, ML inference, scientific simulations, data processing
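The submit-and-match flow can be sketched in a few lines. This is an in-memory illustration only, not the ColonyOS API: the `FunctionSpec` fields, the `Server` class, and the executor type names are all hypothetical, and matching is assumed to be by executor type.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FunctionSpec:
    # Field names are illustrative assumptions, not the ColonyOS schema.
    funcname: str
    args: list
    executor_type: str


@dataclass
class Server:
    queue: deque = field(default_factory=deque)

    def submit(self, spec: FunctionSpec) -> None:
        """Append a spec to the waiting queue."""
        self.queue.append(spec)

    def assign(self, executor_type: str):
        """Return the oldest waiting spec that matches the type, or None."""
        for spec in self.queue:
            if spec.executor_type == executor_type:
                self.queue.remove(spec)
                return spec
        return None


def run(spec: FunctionSpec, functions: dict):
    """Execute the named function with the spec's arguments."""
    return functions[spec.funcname](*spec.args)


server = Server()
server.submit(FunctionSpec("hello", ["world"], executor_type="cli"))
server.submit(FunctionSpec("train", [3], executor_type="gpu"))

# A hypothetical "cli" executor that only knows the function "hello".
functions = {"hello": lambda name: f"hello {name}"}
assigned = server.assign("cli")
result = run(assigned, functions)
print(result)
```

The "gpu" spec stays queued until an executor of that type asks for work, which is the sense in which executors pick up only matching processes.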

Blueprint Reconciliation

Desired State ↔ Actual State

Define desired state via blueprints. ColonyOS continuously reconciles actual state against it across platforms: a Kubernetes-style control loop at continuum scale.

Use cases: Distributed IaaS, auto-scaling, self-healing infrastructure
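A reconciliation step, in general terms, compares desired state with actual state and emits the actions that close the gap. The sketch below assumes state is just a replica count per executor name; the names and the `reconcile`/`apply` helpers are hypothetical and do not reflect the ColonyOS blueprint format.

```python
def reconcile(desired: dict, actual: dict) -> list:
    """Return the actions needed to move actual replica counts to desired."""
    actions = []
    for name in sorted(set(desired) | set(actual)):
        want = desired.get(name, 0)
        have = actual.get(name, 0)
        if want > have:
            actions.append(("start", name, want - have))
        elif want < have:
            actions.append(("stop", name, have - want))
    return actions


def apply(actions: list, actual: dict) -> None:
    """Apply actions to the (simulated) actual state in place."""
    for verb, name, count in actions:
        delta = count if verb == "start" else -count
        actual[name] = actual.get(name, 0) + delta
        if actual[name] == 0:
            del actual[name]


# Hypothetical executor names and counts.
desired = {"edge-executor": 3, "gpu-executor": 1}
actual = {"edge-executor": 1, "legacy-executor": 2}

actions = reconcile(desired, actual)
apply(actions, actual)
print(actions)
print(reconcile(desired, actual))  # empty once the states agree
```

Running the step repeatedly is what makes it a control loop: once actual matches desired, the step produces no actions, and any later drift produces new ones.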

Interactive Channels

Client ↔ Executor

Bidirectional streaming channels for real-time interaction. Stream LLM tokens, tool calls, or any payload between clients and executors.

Use cases: LLM inference, AI agents, real-time sensor data, tool calling
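To make the streaming pattern concrete, here is a minimal sketch that models a channel as a pair of in-process `asyncio` queues, one per direction. It is an assumption-laden stand-in: real channels cross the network, and the `executor` and `client` coroutines and the `None` end-of-stream sentinel are invented for illustration.

```python
import asyncio


async def executor(inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """Read one prompt, then stream a reply back token by token."""
    prompt = await inbox.get()
    for token in f"echo: {prompt}".split():
        await outbox.put(token)
    await outbox.put(None)  # sentinel (assumption): end of stream


async def client(prompt: str) -> list:
    """Send a prompt and collect streamed tokens until the sentinel."""
    to_executor: asyncio.Queue = asyncio.Queue()
    to_client: asyncio.Queue = asyncio.Queue()
    worker = asyncio.create_task(executor(to_executor, to_client))
    await to_executor.put(prompt)
    tokens = []
    while (token := await to_client.get()) is not None:
        tokens.append(token)
    await worker
    return tokens


tokens = asyncio.run(client("hello from the edge"))
print(tokens)
```

The client can start consuming tokens before the executor has finished producing them, which is the property that matters for LLM token streaming.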

All patterns support process graphs (DAG-based workflows) and full auditability with cryptographic signatures.
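A DAG-based workflow implies an execution order in which a process runs only after everything it depends on has finished. The sketch below uses Python's standard `graphlib` to compute that order in parallelizable waves; the process names are made up and the dictionary is not the ColonyOS workflow format.

```python
from graphlib import TopologicalSorter

# Each key depends on the processes in its value set (hypothetical names).
workflow = {
    "collect": set(),
    "clean": {"collect"},
    "features": {"collect"},
    "train": {"clean", "features"},
}

sorter = TopologicalSorter(workflow)
sorter.prepare()

waves = []
while sorter.is_active():
    # Everything returned by get_ready() has all dependencies satisfied.
    ready = sorted(sorter.get_ready())
    waves.append(ready)
    sorter.done(*ready)

print(waves)
```

Processes in the same wave have no dependency on each other, so different executors could run them at the same time.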

A Cognitive Compute Continuum

Like an ant colony where workers coordinate without central command, ColonyOS creates a digital ecosystem where executors collaborate across computing boundaries — from IoT sensors processing seismic data underground to GPU clusters running LLM inference in the cloud.

Inspired by Kubernetes, ColonyOS uses a reconciliation control loop that works across platforms. Define your desired state, and the system continuously works to achieve it, whether by scaling executors, recovering from failures, or migrating workloads between edge and cloud. Designed to integrate seamlessly with Kubernetes, ColonyOS extends orchestration beyond the cluster to the entire compute continuum.

Task Brokering

Brokers tasks to the right executor

Cross-Platform

Reconciliation across edge, cloud, HPC

Real-Time Channels

Bidirectional streaming for LLM tokens

Edge-to-Cloud

Automatic failover and migration

Core Capabilities

Everything you need to build resilient, distributed systems across any infrastructure.

Reconciliation

Kubernetes-style control loop that works across platforms. Define desired state, and ColonyOS continuously reconciles across edge, cloud, and HPC.

Real-Time Channels

Bidirectional streaming channels for real-time interaction across the continuum. Stream LLM tokens, tool calls, or any payload between clients and executors.

Zero-Trust Security

Every operation cryptographically signed with ECDSA. No passwords, no tokens — just verifiable identity.

Full Auditability

Complete execution history with cryptographic proof. Every process, every state change, every result — fully traceable and verifiable.
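One way to picture a tamper-evident execution history is a hash chain, where each entry commits to the one before it. This sketch uses SHA-256 from the standard library as a stand-in. It is not the signature scheme described above and not ColonyOS's actual mechanism; the entry layout and state names are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first entry (assumption)


def append_entry(log: list, event: dict) -> None:
    """Append an event whose hash covers the previous entry's hash."""
    prev = log[-1]["hash"] if log else GENESIS
    body = json.dumps(event, sort_keys=True)
    digest = hashlib.sha256((prev + body).encode()).hexdigest()
    log.append({"event": event, "prev": prev, "hash": digest})


def verify(log: list) -> bool:
    """Recompute every hash and check that the chain links are unbroken."""
    prev = GENESIS
    for entry in log:
        body = json.dumps(entry["event"], sort_keys=True)
        expected = hashlib.sha256((prev + body).encode()).hexdigest()
        if entry["prev"] != prev or entry["hash"] != expected:
            return False
        prev = entry["hash"]
    return True


log = []
append_entry(log, {"process": "p1", "state": "waiting"})
append_entry(log, {"process": "p1", "state": "running"})
append_entry(log, {"process": "p1", "state": "successful"})
intact = verify(log)

log[1]["event"]["state"] = "failed"  # simulate tampering with history
tampered_ok = verify(log)
print(intact, tampered_ok)
```

A hash chain only detects modification. Proving who made each change would additionally require signing each entry with the author's private key.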

Pull-Based Architecture

Executors pull work from the server. Deploy anywhere — behind firewalls, NAT, 5G networks — no inbound ports needed.
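The pull model can be sketched as a loop in which the executor initiates every exchange. Here a thread-safe queue stands in for the server, and a blocking `get` with a timeout stands in for an outbound long-poll request; the function name, timeout value, and `None` shutdown sentinel are all assumptions for illustration.

```python
import queue
import threading


def executor_loop(server_queue: queue.Queue, results: list, poll_timeout: float) -> None:
    """Repeatedly ask the server for work; stop on a None sentinel."""
    while True:
        try:
            # Outbound request: waits up to poll_timeout, like a long poll.
            job = server_queue.get(timeout=poll_timeout)
        except queue.Empty:
            continue  # nothing to do yet, ask again
        if job is None:
            break
        func, arg = job
        results.append(func(arg))


server_queue: queue.Queue = queue.Queue()
results: list = []
worker = threading.Thread(target=executor_loop, args=(server_queue, results, 0.05))
worker.start()

server_queue.put((str.upper, "hello"))
server_queue.put((len, "world"))
server_queue.put(None)  # tell the executor to shut down
worker.join()
print(results)
```

Because the server never opens a connection to the executor, the executor's network only has to allow outbound traffic.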

Execution Resilience

Automatic failover across the compute continuum. Executors can be stopped at any time, and their work continues on other executors.
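A common way to get this kind of failover is a lease: an assigned process must be closed before a deadline, otherwise it returns to the waiting state for another executor. The sketch below assumes that mechanism; the `Broker` class, the `max_exec_time` field, and the executor names are hypothetical, and time is passed in explicitly to keep the example deterministic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Process:
    name: str
    max_exec_time: float  # lease length in seconds (assumed field)
    owner: Optional[str] = None
    deadline: Optional[float] = None
    done: bool = False


class Broker:
    def __init__(self) -> None:
        self.processes: list = []

    def submit(self, process: Process) -> None:
        self.processes.append(process)

    def assign(self, executor: str, now: float) -> Optional[Process]:
        """Give the first unowned, unfinished process to the executor."""
        for p in self.processes:
            if p.owner is None and not p.done:
                p.owner, p.deadline = executor, now + p.max_exec_time
                return p
        return None

    def close(self, process: Process) -> None:
        process.done = True

    def release_expired(self, now: float) -> None:
        """Return processes whose lease ran out to the waiting state."""
        for p in self.processes:
            if not p.done and p.owner is not None and now > p.deadline:
                p.owner, p.deadline = None, None


broker = Broker()
broker.submit(Process("analyze", max_exec_time=10.0))

first = broker.assign("edge-1", now=0.0)    # edge-1 takes it, then is stopped
broker.release_expired(now=11.0)            # lease expired: waiting again
second = broker.assign("cloud-1", now=11.0)
broker.close(second)
print(second.owner, second.done)
```

The stopped executor never has to report its own failure. The missed deadline is enough for the work to move elsewhere.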

Real-World Applications

From industrial seismic processing to AI orchestration, ColonyOS powers diverse distributed scenarios.

Seismic Processing (In Production)

RockSigma uses ColonyOS as the backbone for BEMIS™, processing seismic data in real time from underground sensors to cloud analysis.

  • Edge data collection
  • Cloud-based ML analysis
  • Cross-platform orchestration

Distributed AI

Run LLM inference on edge GPUs, training on HPC clusters, and serving in the cloud, all orchestrated as one system.

  • Multi-agent LLM systems
  • Token streaming via channels
  • Tool calling integration

HPC Integration

Submit jobs to supercomputers with modern APIs. Combine HPC power with cloud preprocessing and edge data collection.

  • Scientific simulations
  • Large-scale training
  • Slurm orchestration

Digital Sovereignty

Process sensitive data locally while coordinating globally. Data never leaves the location where compliance requires it to stay.

  • Local processing
  • Federated workflows
  • Full audit trails

Get Started

Two steps to distributed computing with ColonyOS.

1. Start a Colony

$ git clone https://github.com/colonyos/colonies
$ cd colonies/docker
$ docker-compose up -d

2. Submit a Process

$ colonies function submit \
    --func hello \
    --args "world"