Lily Miller 22ba493d9e Initial commit: Split Macha autonomous system into separate flake

Macha is now a standalone NixOS flake that can be imported into other
systems. This provides:

- Independent versioning
- Easier reusability
- Cleaner separation of concerns
- Better development workflow

Includes:
- Complete autonomous system code
- NixOS module with full configuration options
- Queue-based architecture with priority system
- Chunked map-reduce for large outputs
- ChromaDB knowledge base
- Tool calling system
- Multi-host SSH management
- Gotify notification integration

All capabilities from DESIGN.md are preserved.

2025-10-06 14:32:37 -06:00

Macha Autonomous System - Quick Start Guide

What is This?

Macha now has a self-maintenance system that uses local AI (via Ollama) to monitor, analyze, and maintain itself. Think of it as a 24/7 system administrator that watches over Macha.

How It Works

Monitor: Every 5 minutes, collects system health data (services, resources, logs, etc.)
Analyze: Uses llama3.1:70b to analyze the data and detect issues
Act: Depending on the autonomy level, only logs the findings, proposes fixes for approval, or executes them automatically
Learn: Logs all decisions and actions for auditing and improvement
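
The four steps above can be sketched as a single function. This is a minimal illustration, not Macha's actual orchestrator code: `run_cycle` and its four callables are hypothetical names, and the record layout written to the log is an assumption.

```python
import time
from typing import Callable


def run_cycle(
    collect: Callable[[], dict],
    analyze: Callable[[dict], list],
    act: Callable[[dict], str],
    log: Callable[[dict], None],
) -> int:
    """Run one monitor -> analyze -> act -> learn pass.

    All four callables are hypothetical stand-ins for the real
    components. Returns the number of proposals handled.
    """
    snapshot = collect()            # Monitor: gather system health data
    proposals = analyze(snapshot)   # Analyze: ask the model for issues and fixes
    for proposal in proposals:
        outcome = act(proposal)     # Act: log only, queue, or execute
        # Learn: keep an audit record of every decision and its outcome
        log({"time": time.time(), "proposal": proposal, "outcome": outcome})
    return len(proposals)


if __name__ == "__main__":
    audit = []
    handled = run_cycle(
        collect=lambda: {"failed_units": ["ollama.service"]},
        analyze=lambda snap: [
            {"action": "restart", "unit": unit} for unit in snap["failed_units"]
        ],
        act=lambda proposal: "queued",
        log=audit.append,
    )
    print(handled, audit[0]["outcome"])
```

Passing the components in as callables keeps the loop itself trivial to test; a scheduler would then call `run_cycle` once per check interval.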

Autonomy Levels

`observe` - Monitoring Only

Monitors system health
Logs everything
Takes NO actions
Good for: Testing, learning what the system sees

`suggest` - Approval Required (DEFAULT)

Monitors and analyzes
Proposes fixes
Requires manual approval before executing
Good for: Production use, when you want control

`auto-safe` - Limited Autonomy

Auto-executes "safe" actions:
- Restarting failed services
- Disk cleanup
- Log rotation
- Read-only diagnostics
Asks for approval before risky changes
Good for: Hands-off operation with safety net

`auto-full` - Full Autonomy

Auto-executes most actions
Still requires approval for HIGH RISK actions
Never touches protected services (SSH, networking, etc.)
Good for: Experimental setups, when you trust the system
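
One way to read the four levels is as a lookup from (autonomy level, risk) to a handling decision. The sketch below assumes three risk labels, `low`, `medium`, and `high`, and three outcomes, `log`, `queue`, and `execute`; these names are illustrative and may not match what Macha uses internally.

```python
def decide(autonomy_level: str, risk: str) -> str:
    """Map an autonomy level and a risk label to 'log', 'queue', or 'execute'.

    The risk labels ('low', 'medium', 'high') and the return values are
    assumptions made for this sketch.
    """
    if risk not in ("low", "medium", "high"):
        raise ValueError(f"unknown risk label: {risk!r}")
    if autonomy_level == "observe":
        return "log"        # monitoring only, never acts
    if autonomy_level == "suggest":
        return "queue"      # every action waits for manual approval
    if autonomy_level == "auto-safe":
        return "execute" if risk == "low" else "queue"
    if autonomy_level == "auto-full":
        return "queue" if risk == "high" else "execute"
    raise ValueError(f"unknown autonomy level: {autonomy_level!r}")


if __name__ == "__main__":
    for level in ("observe", "suggest", "auto-safe", "auto-full"):
        print(level, [decide(level, risk) for risk in ("low", "medium", "high")])
```

Note that protected services are a separate check: they are never touched regardless of what this mapping returns.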

Commands

Check the status

# View the service status
systemctl status macha-autonomous

# View live logs
macha-logs service

# View AI decision log
macha-logs decisions

# View action execution log
macha-logs actions

# View orchestrator log
macha-logs orchestrator

Run a manual check

# Run one maintenance cycle now
macha-check

Approval workflow (when autonomyLevel = "suggest")

# List pending actions awaiting approval
macha-approve list

# Approve action number 0
macha-approve approve 0

Change autonomy level

Edit /home/lily/Documents/nixos-servers/systems/macha.nix:

services.macha-autonomous = {
  enable = true;
  autonomyLevel = "auto-safe";  # Change this
  checkInterval = 300;  # seconds between checks (5 minutes)
  model = "llama3.1:70b";
};

Then rebuild:

sudo nixos-rebuild switch --flake .#macha

What Can It Do?

Automatically Detects

Failed systemd services
High resource usage (CPU, RAM, disk)
Recent errors in logs
Network connectivity issues
Disk space problems
Boot/uptime anomalies

Can Propose/Execute

Restart failed services
Clean up disk space (nix store, old logs)
Investigate issues (run diagnostics)
Propose configuration changes (for manual review)
Run NixOS rebuilds (with safety checks)

Safety Features

Protected services: Never touches SSH, networking, systemd core
Dry-run testing: Tests NixOS rebuilds before applying
Action logging: Every action is logged with context
Rollback capability: Can revert changes
Rate limiting: Caps how often actions run, so a recurring fault cannot trigger a flood of fixes
Human override: You can always disable or intervene
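
Two of these features, protected services and rate limiting, are easy to illustrate in isolation. The following is a sketch under stated assumptions: the prefix list in `PROTECTED_PREFIXES` and the sliding-window limiter are hypothetical choices, not taken from Macha's source.

```python
import time
from collections import deque
from typing import Optional

# Hypothetical list; the real set of protected units may differ.
PROTECTED_PREFIXES = ("sshd", "systemd-", "NetworkManager")


def is_protected(unit: str) -> bool:
    """Return True if the unit name starts with a protected prefix."""
    return unit.startswith(PROTECTED_PREFIXES)


class RateLimiter:
    """Allow at most max_actions within any window_seconds span."""

    def __init__(self, max_actions: int, window_seconds: float) -> None:
        self.max_actions = max_actions
        self.window_seconds = window_seconds
        self._stamps: deque = deque()

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Forget actions that have fallen out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_seconds:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_actions:
            return False
        self._stamps.append(now)
        return True


if __name__ == "__main__":
    limiter = RateLimiter(max_actions=2, window_seconds=60)
    for unit in ("ollama.service", "sshd.service", "ollama.service", "ollama.service"):
        if is_protected(unit):
            print(unit, "-> refused (protected)")
        elif limiter.allow():
            print(unit, "-> allowed")
        else:
            print(unit, "-> deferred (rate limit)")
```

Checking protection before consulting the limiter means a refused action does not consume part of the rate budget.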

Example Workflow

System detects failed service

Monitor: "ollama.service is failed"
AI Agent: "The ollama service crashed. Propose restarting it."

In suggest mode (default)

Executor: "Action queued for approval"
You: Run `macha-approve list`
You: Review the proposed action
You: Run `macha-approve approve 0`
Executor: Restarts the service

In auto-safe mode

Executor: "Low risk action, auto-executing"
Executor: Restarts the service automatically
You: Check logs later to see what happened

Monitoring the System

All data is stored in /var/lib/macha-autonomous/:

orchestrator.log - Main system log
decisions.jsonl - AI analysis decisions (JSON Lines format)
actions.jsonl - Executed actions log
snapshot_*.json - System state snapshots
approval_queue.json - Pending actions
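
Because decisions.jsonl and actions.jsonl are JSON Lines files, each line is one standalone JSON object, which makes them easy to inspect from a script. The helper below is a small sketch; `read_jsonl` is a hypothetical name, and it makes no assumption about which fields each record contains.

```python
import json
from typing import Optional


def read_jsonl(path: str, last: Optional[int] = None) -> list:
    """Parse a JSON Lines file into a list of records.

    If `last` is given, return only that many records from the end.
    Blank lines are skipped.
    """
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    if last is None:
        return records
    return records[-last:] if last > 0 else []


if __name__ == "__main__":
    # Path taken from the list above; adjust if your data directory differs.
    for record in read_jsonl("/var/lib/macha-autonomous/decisions.jsonl", last=5):
        print(json.dumps(record, indent=2))
```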

Tips

Start with suggest mode - Get comfortable with what it proposes
Review the logs - See what it's detecting and proposing
Graduate to auto-safe - Let it handle routine maintenance
Use observe for debugging - If something seems wrong
Check approval queue regularly - If using suggest mode

Troubleshooting

Service won't start

# Check for errors
journalctl -u macha-autonomous -n 50

# Verify Ollama is running
systemctl status ollama

# Test Ollama manually
curl http://localhost:11434/api/generate -d '{"model": "llama3.1:70b", "prompt": "test"}'
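
The same check can be scripted in Python using only the standard library. This sketch assumes Ollama's default address from the curl command above and sets `"stream": false` so the server returns one JSON object instead of a stream of chunks; `build_generate_request` and `generate` are hypothetical helper names.

```python
import json
import urllib.request


def build_generate_request(
    model: str, prompt: str, host: str = "http://localhost:11434"
) -> urllib.request.Request:
    """Build a POST request for Ollama's /api/generate endpoint."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]


if __name__ == "__main__":
    # Requires a running Ollama instance with the model already pulled.
    print(generate("llama3.1:70b", "test"))
```

A connection error here points at Ollama itself rather than at macha-autonomous; a large model may also need a longer timeout on first load.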

AI making bad decisions

Switch to observe mode to stop actions
Review decisions.jsonl to see reasoning
File an issue or adjust prompts in agent.py

Want to disable temporarily

sudo systemctl stop macha-autonomous

Want to disable permanently

Edit systems/macha.nix:

services.macha-autonomous.enable = false;

Then rebuild with sudo nixos-rebuild switch --flake .#macha.

Architecture

┌─────────────────────────────────────────────────────────┐
│                    Orchestrator                          │
│         (Main loop, runs every 5 minutes)                │
└────────────┬──────────────┬──────────────┬──────────────┘
             │              │              │
         ┌───▼────┐    ┌────▼────┐    ┌────▼─────┐
         │Monitor │    │ Agent   │    │ Executor │
         │        │───▶│  (AI)   │───▶│  (Safe)  │
         └────────┘    └─────────┘    └──────────┘
             │              │              │
         Collects        Analyzes       Executes
         System          Issues         Actions
         Health          w/ LLM         Safely

Future Enhancements

Potential future capabilities:

Integration with MCP servers (already installed!)
Predictive maintenance (learning from patterns)
Self-optimization (tuning configs based on usage)
Cluster management (if you add more systems)
Automated backups and disaster recovery
Security monitoring and hardening
Performance tuning recommendations

Philosophy

The goal is a system that maintains itself while being:

Safe - Never breaks critical functionality
Transparent - All decisions are logged and explainable
Conservative - When in doubt, ask for approval
Learning - Gets better over time
Human-friendly - Easy to understand and override

Macha is here to help you, not replace you!
