agentic-govbot/ARCHITECTURE.md
Commit bda868cb45 by Nathan Schneider: Implement LLM-driven governance architecture with structured memory
This commit completes the transition to a pure LLM-driven agentic
governance system with no hard-coded governance logic.

Core Architecture Changes:
- Add structured memory system (memory.py) for tracking governance processes
- Add LLM tools (tools.py) for deterministic operations (math, dates, random)
- Add audit trail system (audit.py) for human-readable decision explanations
- Add LLM-driven agent (agent_refactored.py) that interprets constitution

Documentation:
- Add ARCHITECTURE.md describing process-centric design
- Add ARCHITECTURE_EXAMPLE.md with complete workflow walkthrough
- Update README.md to reflect current LLM-driven architecture
- Simplify constitution.md to benevolent dictator model for testing

Templates:
- Add 8 governance templates (petition, consensus, do-ocracy, jury, etc.)
- Add 8 dispute resolution templates
- All templates work with generic process-based architecture

Key Design Principles:
- "Process" is central abstraction (not "proposal")
- No hard-coded process types or thresholds
- LLM interprets constitution to understand governance rules
- Tools ensure correctness for calculations
- Complete auditability with reasoning and citations

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Committed: 2026-02-08 14:24:23 -07:00


Govbot Architecture

Current State: Pure LLM-driven governance with structured memory and strong auditability

Design Principles

  1. No Hard-Coded Governance Logic: Constitution defines ALL governance rules in natural language
  2. LLM as Interpreter: Agent interprets constitution and makes all governance decisions
  3. Structured Memory: Explicit memory system tracks state and enables LLM reasoning
  4. Tools for Correctness: LLM uses tools for calculations (not reasoning about math)
  5. Auditability First: Every decision logged with reasoning and constitutional citation
  6. Human-Readable: All state and decisions must be inspectable by humans

Central Concept: Process

The core abstraction in Govbot is the process - a generic container for any governance activity that unfolds over time.

Process Types (examples, not exhaustive):

  • Proposals: Seeking decisions on policy, rules, actions
  • Disputes: Conflict resolution, mediation, arbitration
  • Elections: Selecting people for roles or responsibilities
  • Discussions: Facilitated conversations without a specific decision goal
  • Do-ocracy: Tracking autonomous actions taken by members
  • Reviews: Evaluating past decisions, actions, or outcomes
  • Juries: Random selection and deliberation processes
  • Any activity defined in your constitution

Why "Process" not "Proposal"?

  • Generic - doesn't assume voting or decisions
  • Flexible - works for conversations, actions, selections, etc.
  • Temporal - captures activities that unfold over time
  • Minimalist - one concept covers all governance activities

The LLM interprets your constitution to understand what types of processes exist and how they work. No process types are hard-coded.

Architecture Overview

┌─────────────────────────────────────────────┐
│           Governance Request                │
│     (Natural Language from User)            │
└─────────────────┬───────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────┐
│         Governance Agent (LLM)              │
│  • Interprets request                       │
│  • Consults constitution (RAG)              │
│  • Queries memory for context               │
│  • Uses tools for calculations              │
│  • Makes governance decisions               │
│  • Updates memory with reasoning            │
└─────────────────┬───────────────────────────┘
                  │
        ┌─────────┼─────────┐
        │         │         │
        ▼         ▼         ▼
┌─────────┐  ┌────────┐  ┌──────────┐
│ Memory  │  │ Tools  │  │ Audit    │
│ System  │  │        │  │ Trail    │
│         │  │        │  │          │
│ Tracks  │  │ Math   │  │ Explains │
│ State   │  │ Dates  │  │ Decisions│
│ Context │  │ Random │  │ Cites    │
└─────────┘  └────────┘  └──────────┘

Core Components

1. Governance Agent (LLM-Driven)

Role: Interprets constitution and makes all governance decisions

Responsibilities:

  • Parse user requests to understand intent
  • Query constitution for relevant rules (using RAG)
  • Query memory for current state
  • Reason about what action to take
  • Use tools for deterministic operations
  • Update memory with decisions
  • Generate audit trail

Key Feature: Agent does NOT execute hard-coded logic. Instead:

  • Reads constitution to understand rules
  • Uses tools to calculate/verify
  • Decides based on interpretation

Implementation: src/govbot/agent_refactored.py
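
A minimal sketch of this loop, for illustration only. The helper names (retrieve_passages, memory.query, tools.run, audit.render) are assumptions, not the actual agent_refactored.py API:

def handle_request(request: str, llm, constitution, memory, tools, audit) -> str:
    # 1. Retrieve constitutional passages relevant to the request (RAG)
    passages = constitution.retrieve_passages(request)

    # 2. Pull the current state the LLM may need (active processes, precedent)
    context = memory.query(status="active")

    # 3. Ask the LLM to interpret the request against constitution + state.
    #    The LLM may request tool calls (e.g. date math) before it decides.
    plan = llm.decide(request=request, passages=passages, context=context,
                      available_tools=tools.describe())
    for call in plan.tool_calls:
        call.result = tools.run(call.name, **call.arguments)

    # 4. Persist the decision with reasoning and citations, then explain it
    decision = llm.finalize(plan)
    memory.record_decision(decision)
    return audit.render(decision)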

2. Structured Memory System

Role: Persistent state that LLM can query and update

What It Tracks:

  • Processes: Active governance processes (proposals, disputes, etc.)
  • Events: Timeline of all governance events
  • Decisions: Bot decisions with reasoning
  • Participants: Who's involved in what
  • Context: Historical information for precedent

Key Features:

  • Queryable: LLM can search memory by criteria
  • Structured: Not just raw text, but typed records
  • Temporal: Tracks history and changes over time
  • Human-Readable: Can be inspected and understood
  • Versioned: Changes tracked for audit

Memory Schema:

from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, List, Optional


@dataclass
class ProcessMemory:
    id: str
    type: str  # "proposal", "dispute", "election", etc.
    status: str  # "active", "completed", "cancelled"
    created_at: datetime
    created_by: str
    deadline: Optional[datetime]
    constitution_basis: List[str]  # Article/section citations
    state: Dict[str, Any]  # Flexible process-specific state
    events: List[Event] = field(default_factory=list)  # History of what happened
    decisions: List[Decision] = field(default_factory=list)  # Bot decisions about this process


@dataclass
class Event:
    timestamp: datetime
    actor: str
    event_type: str  # "vote_cast", "proposal_submitted", etc.
    data: Dict[str, Any]
    context: str  # Human description


@dataclass
class Decision:
    timestamp: datetime
    decision_type: str  # "threshold_met", "deadline_reached", etc.
    reasoning: str  # LLM's reasoning
    constitution_citations: List[str]
    calculation_used: Optional[str]  # If a tool was used
    result: Any
Important: The type field in ProcessMemory is completely flexible - it's not an enum or predefined list. The LLM reads your constitution to understand what process types exist and uses the same terminology. If your constitution mentions "lazy consensus", "sortition", or "restorative circle", those become valid process types. The system doesn't need to know about them in advance.

Implementation: src/govbot/memory.py
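
A small usage sketch, assuming an in-memory store built around the ProcessMemory and Event records from the schema above (the MemoryStore class and its methods are illustrative, not the memory.py API):

from typing import Any, Dict, List


class MemoryStore:
    """Illustrative in-memory index of ProcessMemory records (see schema above)."""

    def __init__(self) -> None:
        self._processes: Dict[str, ProcessMemory] = {}

    def add(self, process: ProcessMemory) -> None:
        self._processes[process.id] = process

    def query(self, **criteria: Any) -> List[ProcessMemory]:
        # Return processes whose attributes match every criterion,
        # e.g. query(status="active", type="proposal")
        return [p for p in self._processes.values()
                if all(getattr(p, k, None) == v for k, v in criteria.items())]

    def log_event(self, process_id: str, event: Event) -> None:
        self._processes[process_id].events.append(event)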

3. LLM Tools (Generic Primitives)

Role: Deterministic operations the LLM can use

Why Tools?

  • LLMs are unreliable at arithmetic
  • Need deterministic, verifiable results
  • Separates "what to do" (LLM) from "how to do it" (tool)

Available Tools:

  • calculate(expression, variables) - Evaluate mathematical expressions safely
  • get_datetime() - Current time
  • datetime_add(dt, days, hours) - Date calculations
  • is_past_deadline(deadline) - Check if deadline passed
  • random_select(items, count) - Random selection for juries
  • tally(items, key) - Count votes
  • filter_items(items, criteria) - Filter data
  • percentage(num, denom) - Calculate percentages

Before (hard-coded):

def check_threshold(counts, "simple_majority") -> bool
def check_threshold(counts, "3x_majority") -> bool

After (generic):

def calculate(expression: str, variables: Dict) -> Any
# LLM provides: "agree > disagree", {"agree": 10, "disagree": 3}

Implementation: src/govbot/tools.py
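
To illustrate, here is a hedged sketch of how such a calculate tool could be implemented with restricted AST evaluation; it shows the technique, not the actual tools.py code:

import ast
import operator
from typing import Any, Dict

# Whitelist of operators the illustrative evaluator will accept
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.USub: operator.neg,
    ast.Gt: operator.gt, ast.GtE: operator.ge,
    ast.Lt: operator.lt, ast.LtE: operator.le,
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
}


def calculate(expression: str, variables: Dict[str, Any]) -> Any:
    """Safely evaluate an arithmetic/comparison expression against variables."""

    def _eval(node: ast.AST) -> Any:
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.Name):
            return variables[node.id]
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.Compare) and len(node.ops) == 1 and type(node.ops[0]) in _OPS:
            return _OPS[type(node.ops[0])](_eval(node.left), _eval(node.comparators[0]))
        raise ValueError(f"Unsupported expression element: {ast.dump(node)}")

    return _eval(ast.parse(expression, mode="eval"))


# calculate("agree > disagree", {"agree": 10, "disagree": 3}) -> True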

4. Audit Trail System

Role: Human-readable explanation of all decisions

What It Captures:

  • Decision: What was decided
  • Reasoning: Why (in natural language)
  • Constitutional Basis: Which articles/sections
  • Calculations: What math was done
  • Precedent: Related past decisions
  • Participants: Who was involved
  • Timeline: When things happened

Audit Output Example:

# GOVERNANCE DECISION AUDIT TRAIL

**Decision**: Proposal #23 has passed
**Timestamp**: 2026-02-15 18:00:00 UTC
**Process**: standard_proposal (ID: prop_23)

## Constitutional Basis
- Article 3, Section 3.1: "Standard Proposals address routine governance matters"
- Article 3, Section 3.1: "Passage threshold: More Agree than Disagree votes"

## Calculation
Expression: agree > disagree
Variables: {"agree": 12, "disagree": 3, "abstain": 2, "block": 0}
Result: 12 > 3 = True

## Reasoning
The proposal reached its deadline of Feb 15, 2026 at 18:00 UTC.
According to the constitution, this is a standard proposal requiring
more agree votes than disagree votes. The vote tally shows 12 agree
and 3 disagree, which satisfies the threshold. Therefore, the proposal passes.

## Related Precedent
- Proposal #18 (passed with similar threshold)
- Proposal #21 (failed with 8 agree, 10 disagree)

## Next Actions
- Announce result to community
- Log outcome in governance record
- Execute authorized actions (if any)

Implementation: src/govbot/audit.py
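
A hedged sketch of how such a report could be assembled from the Decision and ProcessMemory records defined in the memory schema (the render_audit_trail helper is illustrative, not the audit.py API):

def render_audit_trail(process: ProcessMemory, decision: Decision) -> str:
    """Render one decision in the format shown above."""
    lines = [
        "# GOVERNANCE DECISION AUDIT TRAIL",
        "",
        f"**Decision**: {decision.result}",
        f"**Timestamp**: {decision.timestamp:%Y-%m-%d %H:%M:%S} UTC",
        f"**Process**: {process.type} (ID: {process.id})",
        "",
        "## Constitutional Basis",
        *[f"- {citation}" for citation in decision.constitution_citations],
        "",
        "## Reasoning",
        decision.reasoning,
    ]
    if decision.calculation_used:
        lines += ["", "## Calculation", decision.calculation_used]
    return "\n".join(lines)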

5. Constitutional Reasoning (RAG)

Role: Retrieval-augmented generation for querying the constitution

How It Works:

  • Constitution chunked into semantic sections
  • Vector embeddings enable similarity search
  • LLM retrieves relevant constitutional passages
  • Provides context for decision-making

Implementation: src/govbot/governance/constitution.py
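
A hedged sketch of the retrieval step, with embed() left as a parameter for whatever embedding model is used; the chunking rule and helper names are illustrative, not the constitution.py API:

import math
from typing import Callable, List, Tuple


def chunk_constitution(text: str) -> List[str]:
    """Split the constitution into sections at markdown headings."""
    chunks, current = [], []
    for line in text.splitlines():
        if line.startswith("#") and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return chunks


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: List[str],
             embed: Callable[[str], List[float]], k: int = 3) -> List[str]:
    """Return the k sections most similar to the query (a real system would
    precompute and store chunk embeddings instead of embedding per query)."""
    query_vec = embed(query)
    scored: List[Tuple[float, str]] = [(cosine(query_vec, embed(c)), c) for c in chunks]
    return [c for _, c in sorted(scored, key=lambda s: s[0], reverse=True)[:k]]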

Complete Workflow Example

See ARCHITECTURE_EXAMPLE.md for a detailed walkthrough of a complete process lifecycle.

High-Level Flow (Example: Proposal Process)

This example uses a proposal to illustrate the flow, but the same pattern works for any process type (disputes, elections, discussions, etc.):

1. User initiates process (in this case, submitting a proposal)
2. Agent queries constitution: "What rules apply to this process?"
   → Constitution: Standard proposals need 6 days, more agree than disagree
3. Agent queries memory: "What active processes exist?"
   → Memory: 2 active processes currently
4. Agent updates memory:
   - Create process record (type: "proposal")
   - Log "process_initiated" event
   - Calculate deadline using datetime tool
   - Store decision: "Created process based on Article 3.1"
5. Agent announces to user with reasoning
6. [Time passes, users interact with the process]
7. Agent checks deadlines (scheduled task)
8. Agent queries memory: "What processes have reached deadline?"
   → Memory: Process #23 deadline was Feb 15
9. Agent queries memory: "What interactions occurred in process #23?"
   → Memory: 12 agree, 3 disagree, 2 abstain
10. Agent queries constitution: "What's the completion criteria?"
    → Constitution: "More agree than disagree"
11. Agent uses calculate tool: "agree > disagree", {agree: 12, disagree: 3}
    → Tool: True
12. Agent updates memory:
    - Store decision with reasoning
    - Log "process_completed" event
    - Update process status to "completed"
13. Agent generates audit trail
14. Agent announces result with full explanation

Key Point: The same flow works for any process type. The constitution defines what "initiating", "interacting", and "completing" mean for each process type.
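
The deadline-check portion of this flow (steps 7-13) might look like the following sketch; every name here (memory.query, retrieve_passages, llm.evaluate, tools.calculate, audit.publish) is an illustrative assumption, not the actual API:

from datetime import datetime, timezone


def check_deadlines(llm, constitution, memory, tools, audit) -> None:
    now = datetime.now(timezone.utc)  # assumes timezone-aware deadlines
    for process in memory.query(status="active"):
        if process.deadline is None or process.deadline > now:
            continue  # not due yet

        # Ask the constitution what "completion" means for this process type,
        # then let the LLM propose the expression and variables to check.
        rules = constitution.retrieve_passages(f"completion criteria for {process.type}")
        evaluation = llm.evaluate(process=process, rules=rules)
        outcome = tools.calculate(evaluation.expression, evaluation.variables)

        # Record the decision (reasoning + citations) and publish the audit trail
        memory.record_decision(process.id, evaluation.reasoning,
                               evaluation.citations, outcome)
        audit.publish(process.id)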

Key Architecture Differences

| Aspect           | Traditional (Hard-Coded) | Current (Agentic)                   |
|------------------|--------------------------|-------------------------------------|
| Governance Rules | In Python code           | In constitution (natural language)  |
| Thresholds       | 4 fixed types            | Any expression interpretable by LLM |
| State Storage    | Database records         | Structured memory                   |
| Decision Making  | if/else logic            | LLM reasoning with tools            |
| Flexibility      | Requires code changes    | Constitution changes only           |
| Auditability     | Code + logs              | Natural language reasoning          |
| Process Types    | Pre-defined              | Any process type in constitution    |
| Calculations     | Python code              | LLM + calculator tool               |

Benefits

1. Flexibility

No code changes needed for new governance models. Just update the constitution in natural language to define new process types.

Example:

## Lazy Consensus Process
Proposals pass unless blocked by 2+ members within 7 days.

## Restorative Circle Process
When harm occurs, affected parties meet with a facilitator to discuss
impact and agree on repairs. Process completes when all parties signal readiness.

## Sortition Selection Process
For jury roles, randomly select 5-7 members from those who opt in.

The LLM interprets these rules and handles each process type correctly using the generic tools. No code changes needed.
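
As a concrete (and purely illustrative) example, the lazy consensus rule above reduces to calls against the generic tools; the import path and the shape of tally()'s return value are assumptions:

from govbot.tools import calculate, is_past_deadline, tally


def lazy_consensus_passed(process, votes) -> bool:
    # The LLM would choose these calls after reading the Lazy Consensus section;
    # nothing lazy-consensus-specific is hard-coded anywhere.
    block_count = tally(votes, key="position").get("block", 0)  # assumes dict of counts
    window_over = is_past_deadline(process.deadline)            # the 7-day window
    not_blocked = calculate("blocks < 2", {"blocks": block_count})
    return window_over and not_blocked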

2. Auditability

Every decision includes:

  • Natural language reasoning
  • Constitutional citations
  • Calculation details
  • Related precedent

Non-programmers can understand exactly what happened and why.

3. Template Support

Works with diverse governance templates:

  • Petition (simple voting)
  • Consensus (blocks, concern resolution)
  • Do-ocracy (authority through action)
  • Jury (random selection, deliberation)
  • Circles (lazy consensus, domains)
  • All 8 dispute resolution processes

The same code handles all templates by interpreting their constitutional rules.

4. Transparency

Community members can:

  • Read audit trails in plain language
  • Verify constitutional citations
  • Inspect calculation details
  • Review precedent
  • Understand bot reasoning

5. Community Governance

Communities can amend their governance processes through the governance process itself, without requiring developer involvement.

File Structure

src/govbot/
├── memory.py              # Structured memory system
├── tools.py               # LLM tools for calculations
├── audit.py               # Audit trail generation
├── agent_refactored.py    # LLM-driven agent (current)
├── governance/
│   ├── primitives.py      # Generic platform actions
│   └── constitution.py    # RAG system
└── db/
    ├── models.py          # Database models
    └── queries.py         # Database queries

Considerations

LLM Reliability

Challenge: LLMs can make mistakes or interpret rules inconsistently

Mitigations:

  • Use tools for all math (not LLM reasoning)
  • Require constitutional citations
  • Store decisions as precedent
  • Enable community review and appeals
  • Validate critical decisions

Cost and Latency

Challenge: More LLM calls than hard-coded approach

Mitigations:

  • Cache constitutional interpretations (see the sketch after this list)
  • Use faster models for routine tasks
  • Batch deadline checks
  • Optimize prompts
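
A tiny sketch of the caching mitigation, reusing the illustrative retrieve() helper from the RAG sketch above (CHUNKS, a mapping from constitution version to its chunks, and the embed function are assumptions):

from functools import lru_cache


@lru_cache(maxsize=256)
def cached_passages(query: str, constitution_version: str) -> tuple:
    # Keying on the constitution version invalidates cached answers on amendment
    return tuple(retrieve(query, CHUNKS[constitution_version], embed))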

Context Window Limits

Challenge: Memory can grow large over time

Mitigations:

  • Hierarchical memory (summary → detail)
  • Relevance filtering
  • Only include relevant precedent
  • Summarize completed processes

Testing

See implementation files for unit tests:

  • tests/test_memory.py - Memory operations
  • tests/test_tools.py - Tool correctness
  • tests/test_audit.py - Audit generation
  • tests/test_agent.py - End-to-end workflows
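
Illustrative examples of the kind of checks tests/test_tools.py might contain; the import path and the percentage() return convention are assumptions about the real API:

from govbot.tools import calculate, percentage


def test_calculate_threshold_comparison():
    assert calculate("agree > disagree", {"agree": 12, "disagree": 3})


def test_percentage():
    # assuming percentage(num, denom) returns a value out of 100
    assert percentage(3, 12) == 25.0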

Success Criteria

Zero hard-coded governance logic: No if/else for proposal types, thresholds, etc.

Constitution is source of truth: All rules come from constitution text

LLM makes all decisions: The agent interprets and decides rather than executing programmed routines

Memory is queryable: Can ask "what proposals are active?" and get an answer

Audit trail is complete: Every decision has reasoning + citations

Human-readable: Non-programmers can understand what happened and why

Handles diverse templates: Works with consensus, do-ocracy, jury, etc.

Next Steps

Current implementation status:

  • Memory system complete
  • Tools system complete
  • Audit system complete
  • 🚧 LLM agent integration in progress
  • Production deployment pending testing

For implementation details and complete examples, see: