6 changes: 6 additions & 0 deletions SUMMARY.md
Expand Up @@ -103,3 +103,9 @@
- [Exchanges: Coinbase](tools/exchanges-coinbase/README.md)
- [HTTP](tools/http/README.md)
- [Templating: Jinja](tools/templating-jinja/README.md)

## Looking for a home

* [nexus/adr/x3dh-removal-from-nexus.md](nexus/adr/x3dh-removal-from-nexus.md)
* [nexus/guides/tool-communication.md](nexus/guides/tool-communication.md)
* [nexus/packages/reference/nexus_workflow/network_auth.md](nexus/packages/reference/nexus_workflow/network_auth.md)
5 changes: 0 additions & 5 deletions nexus/TAP/default-tap.md
Expand Up @@ -9,12 +9,10 @@ The Default TAP is a useful helper component for Nexus agent developers. It serv
The Default TAP implements the [Nexus Interface V1][nexus-interface-v1] specification, which defines the required functionality for any Talus Agent Package to integrate with the Nexus workflow engine. Key interface requirements include:

1. **Version Management**

- Must declare and maintain interface version compatibility.
- Must support version checking for backward compatibility.

1. **Workflow Management**

- Must handle worksheet management and state tracking.
- Must support tool evaluation confirmation.

Expand Down Expand Up @@ -147,13 +145,11 @@ fun get_witness(self: &DefaultTAP): &DefaultTAPV1Witness {
The Default TAP works in conjunction with the Nexus workflow engine, which provides:

1. **DAG Implementation**

- Directed Acyclic Graph data structure for modeling complex workflows.
- Support for vertices, edges, and input/output ports.
- Entry group management for workflow initiation.

1. **Tool Registry**

- Registration and management of available tools.

1. **Tool Invocation**
Expand Down Expand Up @@ -297,7 +293,6 @@ public(package) fun new(ctx: &mut TxContext) {

/// Invokes the provided entry vertex on a DAG with the provided input data for
/// each input port.
#[allow(lint(share_owned))]
public fun begin_dag_execution(
self: &mut DefaultTAP,
dag: &DAG,
Expand Down
117 changes: 117 additions & 0 deletions nexus/adr/x3dh-removal-from-nexus.md
@@ -0,0 +1,117 @@
# ADR-1: Removal of X3DH+ Encryption from Nexus

**Status:** Proposed
**Date:** 2025-02-17
**Authors:** David ([@davidrotari19](https://github.com/davidrotari19))
**Deciders:** Pavel ([@kouks](https://github.com/kouks)), David ([@davidrotari19](https://github.com/davidrotari19)), Isaac, Augusto
**Consulted:** Stephen, Christos
**Informed:** Engineering Team

**Constraint Tags:** Technical | Architectural | Timeline

---

## Context

### What problem are we solving?

The current X3DH+ encryption implementation places decryption authority in the Leader. This design conflicts with our eventual goal of a permissionless leader network, where tools—not the Leader—should hold authority over secrets.

The encryption was intended to protect secrets (API keys) and sensitive data (prompts, chat completions) in Leader-Tool communication. However, the current design, in which the Leader manages the encryption context:

1. **Does not align with target architecture** - Authority should reside in tools, not Leader
1. **Requires complete redesign** - Stephen's intended design places encryption responsibility in tools, which is incompatible with current implementation
1. **Is not achievable within timeline** - Proper redesign cannot be completed before March mainnet

For the AvA Gaming use case specifically, the identified encryption needs are solvable without X3DH+:

- **API keys**: Solvable with "fat tools" that manage their own keys and charge per invocation
- **Prompts/completions**: Product decision—plaintext on-chain (via Walrus) is acceptable since all users have equal access to the same information, maintaining fair market dynamics

### Relevant Constraints

**Architectural:** Current encryption places authority in Leader; target architecture requires authority in tools. These are fundamentally incompatible approaches.

**Timeline:** March mainnet deadline does not permit the complete redesign required to implement encryption correctly.

**Product:** For AvA Gaming, on-chain visibility of game state data is acceptable—tech-savvy users having seconds of advantage is fair given equal access to information.

### Architectural Position

This decision affects the **Capability Layer** (Leader-Tool communication protocol). Rather than implement an ad-hoc solution that contradicts our architectural direction, we remove encryption entirely and solve the underlying needs through alternative mechanisms.

---

## Decision

### What are we doing?

We will remove X3DH+ encryption from Nexus entirely. The needs it was intended to address will be solved through alternative mechanisms:

- **API key protection**: Tools manage their own keys internally ("fat tools") and charge users per invocation
- **Sensitive data (prompts/completions)**: Stored in Walrus; plaintext is acceptable given equal access for all users

### Why this approach?

Implementing encryption properly requires authority to reside in tools, not Leader. The current implementation inverts this, and correcting it requires a complete redesign we cannot complete before March mainnet. Rather than ship an ad-hoc solution that contradicts our architectural direction, we remove encryption and solve the actual requirements (API key security, fair game dynamics) through simpler mechanisms that align with our target architecture.

---

## Alternatives Considered

### Alternative 1: Continue with current Leader-based encryption

**Description:** Complete David's in-progress changes to the existing encryption design.

**Why not:** The design fundamentally places authority in the wrong component (Leader instead of tools). Completing it would ship something incompatible with our eventual permissionless leader network goal.

### Alternative 2: Redesign encryption with tool-based authority

**Description:** Implement Stephen's design where tools manage their own encryption context.

**Why not:** Requires complete redesign incompatible with current implementation. Timeline to March mainnet does not permit this scope of work.

### Alternative 3: Do Nothing (keep partial implementation)

**Description:** Leave encryption code in place but incomplete.

**Why not:** Adds complexity without benefit. Encryption is a prerequisite for leader distribution, so an incomplete implementation blocks other work while providing no security value.

---

## Consequences

### Positive Consequences

- Development effort redirected to agents and other critical path items
- Avoids shipping architecture that contradicts long-term direction
- Simplifies Leader implementation ahead of distribution work
- Existing encryption code may be reusable in Nexus SDK for tool developers implementing DTP

### Negative Consequences

- Prompts and chat completions visible in plaintext on Walrus (mitigated: equal access maintains fairness)
- API keys must be managed within tools rather than passed through workflows
- Time invested in X3DH+ implementation is sunk cost

### Neutral Consequences

- Tools become "fatter" with more responsibility for managing their own authority and secrets
- Tool pricing must account for resource usage (e.g., OpenAI token costs)

### Reversibility Assessment

- **Reversibility:** Moderate
- **Reversal cost:** Future encryption implementation should follow tool-based authority design, not resurrect Leader-based approach
- **Point of no return:** None—this decision explicitly defers proper encryption to post-mainnet

---

## Context Evolution Tracking

### Review Schedule

- **Next review:** Post-mainnet, when designing tool-based encryption for permissionless leader network
- **Review criteria:** New use cases requiring encrypted data that cannot be solved with fat tools; move toward permissionless leader network

---
20 changes: 12 additions & 8 deletions nexus/crates/leader.md
Expand Up @@ -2,14 +2,14 @@

## Sequence diagram

High-level sequence diagram describing the duties of the Leader.
High level sequence diagram describing the duties of the Leader.

```mermaid
sequenceDiagram
participant RG as [On-chain] Registry
participant WF as [On-chain] Workflow
participant LD as [Off-chain] Leader
participant DB as [Off-chain] Indexer
participant RG as [On chain] Registry
participant WF as [On chain] Workflow
participant LD as [Off chain] Leader
participant DB as [Off chain] Indexer
participant TL as [Either] Tool

critical initialize service
Expand Down Expand Up @@ -77,6 +77,10 @@ There are multiple processes running in parallel in the leader node. These are a
- Verifies output data from Tools based on its output schemas.
- Finally, it halts until a channel is open to send a TX back to Workflow (this can happen at any point during the execution if it errors).

Tool invocation details (HTTPS/TLS, signed HTTP headers, key discovery/caching):

- [Tool communication (HTTPS + signed HTTP)](../guides/tool-communication.md)

1. **Workflow communication channel**

- N channels can run in parallel where N is the number of `Coin<SUI>` objects the Leader has available.
Expand All @@ -92,7 +96,7 @@ There are multiple processes running in parallel in the leader node. These are a

## Checkpoint clock (time sync)

A checkpoint-driven clock provides the leader with a conservative, monotonic view of on-chain time to gate time-sensitive work. It derives bounds from Sui checkpoints, caps drift by observed cadence/headroom, surfaces staleness, and refreshes via gRPC when stale.
A checkpoint-driven clock provides the leader with a conservative, monotonic view of on-chain time to gate time sensitive work. It derives bounds from Sui checkpoints, caps drift by observed cadence/headroom, surfaces staleness, and refreshes via gRPC when stale.

- [Checkpoint clock details](./leader-checkpoint-clock.md)

Expand All @@ -101,7 +105,7 @@ A checkpoint-driven clock provides the leader with a conservative, monotonic vie
Some parts of the Leader service use a custom channel implementation that handles indexing of messages sent over this channel, as well as retries and sweeps of stale messages. Notably, the event listener<>event executor and the event executor<>merchant processes communicate via this channel.

### Why queue discipline and resource gating
- Avoid head-of-line blocking when a queued item cannot run (shared-object locks, external rate limits, or missing resources).
- Avoid head of line blocking when a queued item cannot run (shared object locks, external rate limits, or missing resources).
- Let domains swap in their own ordering policy without touching channel internals.
- Prevent wasted retries by dispatching only when capacity for the payload exists.

Expand Down Expand Up @@ -142,7 +146,7 @@ flowchart LR

**Dispatcher + receiver**: a semaphore enforces the configured capacity. The dispatcher asks the queue discipline for the next ID, activates it, loads the payload, consults the resource pool, and delivers `(payload, handle, retries)` to consumers. The handle’s `ack`/`nack` remove or requeue the message (data, active, retries) and notify the queue discipline.
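
The dispatcher flow described above can be sketched roughly as follows. This is a minimal in-memory illustration, not the Leader's implementation: the class and method names (`InMemoryChannel`, `Handle`, `send`, `dispatch`) are hypothetical, plain dictionaries stand in for Redis, a FIFO deque stands in for the pluggable queue discipline, and the resource pool check is omitted.

```python
import threading
from collections import deque
from typing import Any, Deque, Dict, Optional, Set, Tuple


class Handle:
    """Hypothetical delivery handle: ack removes the message, nack requeues it."""

    def __init__(self, channel: "InMemoryChannel", msg_id: str) -> None:
        self._channel = channel
        self._msg_id = msg_id

    def ack(self) -> None:
        self._channel._finish(self._msg_id, requeue=False)

    def nack(self) -> None:
        self._channel._finish(self._msg_id, requeue=True)


class InMemoryChannel:
    """Assumed stand-in for the Redis-backed channel (queued, active, data, retries)."""

    def __init__(self, capacity: int) -> None:
        self._slots = threading.Semaphore(capacity)  # enforces configured capacity
        self._queue: Deque[str] = deque()            # FIFO queue discipline (assumption)
        self._active: Set[str] = set()
        self._data: Dict[str, Any] = {}
        self._retries: Dict[str, int] = {}

    def send(self, msg_id: str, payload: Any) -> None:
        self._data[msg_id] = payload
        self._retries[msg_id] = 0
        self._queue.append(msg_id)

    def dispatch(self) -> Optional[Tuple[Any, Handle, int]]:
        """Deliver (payload, handle, retries), or None if idle or at capacity."""
        if not self._queue:
            return None
        if not self._slots.acquire(blocking=False):
            return None  # every capacity slot is held by an in-flight message
        msg_id = self._queue.popleft()  # ask the queue discipline for the next ID
        self._active.add(msg_id)        # activate it
        return self._data[msg_id], Handle(self, msg_id), self._retries[msg_id]

    def _finish(self, msg_id: str, requeue: bool) -> None:
        if msg_id not in self._active:
            return  # ignore a repeated ack/nack on the same handle
        self._active.discard(msg_id)
        if requeue:
            self._retries[msg_id] += 1
            self._queue.append(msg_id)  # notify the queue discipline
        else:
            del self._data[msg_id]
            del self._retries[msg_id]
        self._slots.release()


channel = InMemoryChannel(capacity=1)
channel.send("a", {"n": 1})
channel.send("b", {"n": 2})

payload, handle, retries = channel.dispatch()  # delivers "a"
blocked = channel.dispatch()                   # None: "a" still holds the only slot
handle.nack()                                  # requeue "a" behind "b"

second_payload, second_handle, _ = channel.dispatch()  # delivers "b"
second_handle.ack()
retried_payload, retried_handle, retried_count = channel.dispatch()  # "a" again
```

With a capacity of one, the second `dispatch` call returns nothing until the first handle is acknowledged or rejected, and a `nack` sends the message to the back of the queue with its retry counter incremented.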

**Persistence surface**: Redis stores queued IDs, active IDs with timestamps, payload data, and retry counters. Metrics track delivery latency and in-flight counts.
**Persistence surface**: Redis stores queued IDs, active IDs with timestamps, payload data, and retry counters. Metrics track delivery latency and in flight counts.

{% hint style="info" %}
Note that this channel "assumes" it has a stable Redis connection. There are edge cases in which dropping events is very unlikely but possible. One such case is when sending a message over this channel fails because Redis is unavailable, yet the Sui event listener still saves the next page cursor to Redis. This could be improved in the future by handling Redis errors within the channel differently (for example, by halting).
Expand Down
18 changes: 14 additions & 4 deletions nexus/flow-controls/looping.md
Expand Up @@ -47,12 +47,10 @@ Think of `for_each` as a **map** operation, and `collect` as the **reduce back t
Suppose Tool `A` outputs an array `[1, 2, 3]`.

1. A `for_each` edge connects `A` → `B`.

- Tool `B` runs three times with inputs `1`, `2`, and `3`.
- Each run produces an incremented number, outputting `2`, `3`, and `4`.

1. A `collect` edge connects `B` → `C`.

- Tool `C` receives `[2, 3, 4]`.

This creates a parallel map-like computation:
Expand All @@ -78,7 +76,6 @@ The **`do_while`** / **`break`** edge pair enables conditional loops.
- Both `do_while` and `break` edges must originate from **two distinct output variants** of the **same vertex**.
- Both edges **must** be present for the loop to be valid.
- On each iteration:

1. The vertex produces an output.
1. If the **`do_while` edge** is taken, execution loops back, overwriting the input data with the new outputs.
1. If the **`break` edge** is taken, execution exits the loop and continues forward.
Expand All @@ -87,7 +84,6 @@ The **`do_while`** / **`break`** edge pair enables conditional loops.

1. Tool `A` adds `+1` to a number.
1. Tool `B` checks whether the number is `< 3`.

- If **true**, the `do_while` edge loops back to `A`.
- If **false**, the `break` edge continues the walk.
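
The example can be approximated in plain Python as shown below. This is only a sketch of the control flow: `tool_a`, `tool_b`, and `run_loop` are hypothetical names, and the output variants are modelled as the strings `"do_while"` and `"break"` purely for illustration.

```python
from typing import Tuple


def tool_a(number: int) -> int:
    """Stand-in for Tool A: adds 1 to the number."""
    return number + 1


def tool_b(number: int) -> Tuple[str, int]:
    """Stand-in for Tool B: picks the output variant based on `number < 3`."""
    if number < 3:
        return ("do_while", number)  # loop back to A
    return ("break", number)  # leave the loop


def run_loop(start: int) -> int:
    """Repeat A -> B until B takes the break variant."""
    number = start
    while True:
        number = tool_a(number)
        variant, number = tool_b(number)  # new outputs overwrite the loop input
        if variant == "break":
            return number


result = run_loop(0)
print(result)  # 3
```

Starting from `0`, the loop body runs three times: the values `1` and `2` take the `do_while` edge, and `3` takes the `break` edge.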

Expand All @@ -107,6 +103,20 @@ flowchart TD

---

## Static Edges

Static edges originate from outside a loop and provide data for vertices inside the loop. Normally, vertices inside a loop must wait for all their input ports to be filled with data for the _current iteration_. Static edges bypass this requirement: the same data is reused for every iteration of the loop.

{% hint style="info" %}
All entry port data is considered to be static.
{% endhint %}

{% hint style="warning" %}
Static edges **must be evaluated** before the loop begins. Otherwise the loop might not execute correctly.
{% endhint %}
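
To make the distinction concrete, the sketch below contrasts per-iteration data with a static input. It is an illustration under assumptions, not engine behavior: `run_loop_with_static_input` and its parameters are hypothetical, and the loop condition is simplified to a numeric limit.

```python
from typing import List


def run_loop_with_static_input(start: int, static_step: int, limit: int) -> List[int]:
    """Run a loop whose body reads one per-iteration value and one static value.

    `static_step` plays the role of data arriving over a static edge: it is
    evaluated once, before the loop begins, and reused unchanged on every
    iteration. `current` is per-iteration data that each pass overwrites.
    Assumes static_step > 0 so that the loop terminates.
    """
    history: List[int] = []
    current = start
    while current < limit:
        current = current + static_step  # static input reused as-is
        history.append(current)
    return history


values = run_loop_with_static_input(start=0, static_step=2, limit=6)
print(values)  # [2, 4, 6]
```
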

---

## Loop Execution Limits

- Loops are **bounded** with an iteration cap of `0xff` (255 iterations).
Expand Down
6 changes: 6 additions & 0 deletions nexus/glossary.md
@@ -1,6 +1,12 @@
# Glossary

- **`Tool Registry`** - Onchain object that holds Tool definitions so that the Leader knows where and how to invoke them.
- **`Network Auth`** - Onchain trusted binding registry that maps offchain identities (Tools and Leader nodes) to Ed25519 public keys for message signing and verification (supports rotation/revocation and active key discovery).
- **`Key Binding`** - Per-identity record in Network Auth that stores registered keys, the active key id, and rotation/revocation state.
- **`Key id (kid)`** - Monotonic identifier for a key within a Key Binding, used to support key rotation.
- **`Active key`** - The currently selected key id in a Key Binding; verifiers should accept signatures from this key only.
- **`Proof of identity`** - Onchain capability-based proof that a transaction is authorized to act for an identity (used to create/modify a Key Binding).
- **`Proof of possession (PoP)`** - Signature-based proof that the registrant controls the private key corresponding to a public key being registered (prevents registering keys you don’t control).
- **`Tool`** - HTTP service or a smart contract with a predefined interface, executing a specific task. It is a Vertex in the Nexus DAG.
- **`DAG`** - Directed acyclic graph that describes how outputs from Tools flow into inputs of other Tools. This is a static definition.
- **`JSON DAG`** - JSON representation of a DAG with a Nexus-provided schema.
Expand Down