Initial commit: Split Macha autonomous system into separate flake
Macha is now a standalone NixOS flake that can be imported into other systems.

This provides:

- Independent versioning
- Easier reusability
- Cleaner separation of concerns
- Better development workflow

Includes:

- Complete autonomous system code
- NixOS module with full configuration options
- Queue-based architecture with priority system
- Chunked map-reduce for large outputs
- ChromaDB knowledge base
- Tool calling system
- Multi-host SSH management
- Gotify notification integration

All capabilities from DESIGN.md are preserved.
.gitignore (new file, vendored, 23 lines)
@@ -0,0 +1,23 @@
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
*.egg-info/
dist/
build/

# IDE
.vscode/
.idea/
*.swp
*.swo

# Nix
result
result-*

# Test data
test_*.db
*.log
DESIGN.md (new file, 269 lines)
@@ -0,0 +1,269 @@
# Macha Autonomous System - Design Document

> **⚠️ IMPORTANT - READ THIS FIRST**
> **FOR AI ASSISTANT**: This document is YOUR reference guide when modifying Macha's code.
> - **ALWAYS consult this BEFORE refactoring** to ensure you don't remove existing capabilities
> - **CHECK this when adding features** to avoid conflicts
> - **UPDATE this document** when new capabilities are added
> - **DO NOT DELETE ANYTHING FROM THIS DOCUMENT**
> - During major refactors, you MUST verify each capability listed here is preserved

## Overview

Macha is an AI-powered autonomous system administrator capable of monitoring, maintaining, and managing multiple NixOS hosts in the infrastructure.

## Core Capabilities

### 1. Local System Management

- Monitor system health (CPU, memory, disk, services)
- Read and analyze logs via `journalctl`
- Check service status and restart failed services
- Execute system commands (with safety restrictions)
- Monitor and repair Nix store corruption
- Hardware awareness (CPU, GPU, network, storage)

### 2. Multi-Host Management via SSH

**Macha CAN and SHOULD use SSH to manage other hosts.**

#### SSH Access

- Runs as `macha` user (UID 2501)
- Has `NOPASSWD` sudo access for administrative commands
- Shares SSH keys with other hosts in the infrastructure
- Can SSH to: `rhiannon`, `alexander`, `UCAR-Kinston`, and others in the flake

#### SSH Usage Patterns

1. **Direct diagnostic commands:**

   ```bash
   ssh rhiannon systemctl status ollama
   ssh alexander df -h
   ```

   - Commands automatically prefixed with `sudo` by the tools layer
   - Full command: `ssh macha@rhiannon sudo systemctl status ollama`

2. **Status checks:**
   - Check service health on remote hosts
   - Gather system metrics
   - Review logs
   - Monitor resource usage

3. **File operations:**
   - Use `scp` to copy files between hosts
   - Read configuration files on remote systems
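Putting the pattern together, here is a minimal Python sketch of a remote-command helper; the `run_remote` name and its details are assumptions for illustration (the actual logic lives in the tools layer, `tools.py`):

```python
import subprocess

def run_remote(host: str, command: str, timeout: int = 60) -> str:
    """Run a diagnostic command on a remote host as the macha user.

    The tools layer prefixes `sudo` automatically, so callers pass
    plain commands like "systemctl status ollama".
    """
    # BatchMode prevents hanging on a password prompt if keys are missing.
    argv = ["ssh", "-o", "BatchMode=yes", f"macha@{host}", "sudo " + command]
    result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    if result.returncode != 0:
        raise RuntimeError(f"{host}: {result.stderr.strip()}")
    return result.stdout

# Mirrors the diagnostic examples above:
# print(run_remote("rhiannon", "systemctl status ollama"))
```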
#### When to use SSH vs nh

- **SSH**: For diagnostics, status checks, log review, quick commands
- **nh remote deployment**: For applying NixOS configuration changes
  - `nh os switch -u --target-host=rhiannon --hostname=rhiannon`
  - Builds locally, deploys to remote host
  - Use for permanent configuration changes

### 3. NixOS Configuration Management

#### Local Changes

- Can propose changes to NixOS configuration
- Requires human approval before applying
- Uses `nh os switch` for local updates

#### Remote Deployment

- Can deploy to other hosts using `nh` with `--target-host`
- Builds configuration locally (on Macha)
- Pushes to remote system
- Can take up to 1 hour for complex builds
- **IMPORTANT**: Be patient with long-running builds; don't retry prematurely

### 4. Hardware Awareness

#### Local Hardware Detection

- CPU: `lscpu` via `nix-shell -p util-linux`
- GPU: `lspci` via `nix-shell -p pciutils`
- Network: `ip addr`
- Storage: `df -h`, `lsblk`
- USB devices: `lsusb`

#### GPU Metrics

- AMD GPUs: Try `rocm-smi`, sysfs (`/sys/class/drm/card*/device/`)
- NVIDIA GPUs: Try `nvidia-smi`
- Fallback: `sensors` for temperature data
- Queries: temperature, utilization, clock speeds, power usage

### 5. Ollama Queue System

#### Architecture

- **File-based queue**: `/var/lib/macha/queues/ollama/`
- **Queue worker**: `ollama-queue-worker.service` (runs as `macha` user)
- **Purpose**: Serialize all LLM requests to prevent resource contention

#### Request Flow

1. Any user (including regular users) → Write request to `pending/`
2. Queue worker → Process requests serially (FIFO with priority)
3. Queue worker → Write response to `completed/`
4. Original requester → Read response from `completed/`

#### Priority Levels

- `INTERACTIVE` (0): User requests via `macha-chat`, `macha-ask`
- `AUTONOMOUS` (1): Background maintenance checks
- `BATCH` (2): Low-priority bulk operations
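A minimal sketch of enqueueing a request under this scheme; the file layout and field names are assumptions, not the exact wire format used by the worker:

```python
import json
import time
import uuid
from pathlib import Path

QUEUE_DIR = Path("/var/lib/macha/queues/ollama")

# Priority levels from the list above: lower number is served first.
INTERACTIVE, AUTONOMOUS, BATCH = 0, 1, 2

def enqueue(prompt: str, priority: int = AUTONOMOUS) -> Path:
    """Drop a request file into pending/; the worker drains it
    FIFO within each priority level."""
    request = {
        "id": str(uuid.uuid4()),
        "priority": priority,
        "created": time.time(),
        "prompt": prompt,
    }
    pending = QUEUE_DIR / "pending"
    pending.mkdir(parents=True, exist_ok=True)
    # Prefixing the filename with the priority means a plain sorted
    # directory listing already yields priority-then-FIFO order.
    path = pending / f"{priority}-{request['created']:.6f}-{request['id']}.json"
    path.write_text(json.dumps(request))
    return path
```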
#### Large Output Handling

- Outputs >8KB: Split into chunks for hierarchical processing
- Each chunk ~8KB (~2000 tokens)
- Process chunks serially with progress feedback
- Generate chunk summaries → meta-summary
- Full outputs cached in `/var/lib/macha/tool_cache/`
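A sketch of the chunked map-reduce idea, assuming a caller-supplied `summarize` function that routes through the queue; the 8 KB threshold matches the figures above:

```python
CHUNK_BYTES = 8 * 1024  # ~2000 tokens per chunk, per the figures above

def chunk_output(text: str, size: int = CHUNK_BYTES) -> list[str]:
    """Split a large tool output into roughly size-limited chunks,
    breaking on line boundaries so no line is cut in half."""
    chunks, current, used = [], [], 0
    for line in text.splitlines(keepends=True):
        if used + len(line) > size and current:
            chunks.append("".join(current))
            current, used = [], 0
        current.append(line)
        used += len(line)
    if current:
        chunks.append("".join(current))
    return chunks

def map_reduce(text: str, summarize) -> str:
    """Summarize each chunk serially, then summarize the summaries."""
    partials = [summarize(c) for c in chunk_output(text)]
    return summarize("\n".join(partials)) if len(partials) > 1 else partials[0]
```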
### 6. Knowledge Base & Learning

#### ChromaDB Collections

1. **System Context**: Infrastructure topology, service relationships
2. **Issues**: Historical problems and resolutions
3. **Knowledge**: Operational wisdom learned from experience

#### Automatic Learning

- After successful operations, Macha reflects and extracts key learnings
- Stores: topic, knowledge content, category
- Retrieved automatically when relevant to current tasks
- Use `macha-knowledge` CLI to view/manage
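A minimal sketch of the knowledge collection using the standard `chromadb` client; the collection and metadata field names are assumptions based on the description above:

```python
import uuid
import chromadb

client = chromadb.PersistentClient(path="/var/lib/macha/system_context.db")
knowledge = client.get_or_create_collection("knowledge")

def store_learning(topic: str, content: str, category: str) -> None:
    """Persist one learning; ChromaDB embeds it with its default model."""
    knowledge.add(
        ids=[str(uuid.uuid4())],
        documents=[content],
        metadatas=[{"topic": topic, "category": category}],
    )

def recall(query: str, n: int = 3) -> list[str]:
    """Retrieve the learnings most relevant to the current task."""
    results = knowledge.query(query_texts=[query], n_results=n)
    return results["documents"][0]
```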
### 7. Notifications

#### Gotify Integration

- Can send notifications via `macha-notify` command
- Tool: `send_notification(title, message, priority)`

#### Priority Levels

- `2` (Low/Info): Routine status updates, completed tasks
- `5` (Medium/Attention): Important events, configuration changes
- `8` (High/Critical): Service failures, critical errors, security issues

#### When to Notify

- Critical service failures
- Successful completion of major operations
- Configuration changes that may affect users
- Security-related events
- When explicitly requested by user

### 8. Safety & Constraints

#### Command Restrictions

**Allowed Commands** (see `tools.py` for full list):

- System management: `systemctl`, `journalctl`, `nh`, `nixos-rebuild`
- Monitoring: `free`, `df`, `uptime`, `ps`, `top`, `ip`, `ss`
- Hardware: `lscpu`, `lspci`, `lsblk`, `lshw`, `dmidecode`
- Remote: `ssh`, `scp`
- Power: `reboot`, `shutdown`, `poweroff` (use cautiously!)
- File ops: `cat`, `ls`, `grep`
- Network: `ping`, `dig`, `nslookup`, `curl`, `wget`
- Logging: `logger`

**NOT Allowed**:

- Direct package modifications (`nix-env`, `nix profile`)
- Destructive file operations (`rm -rf`, `dd`)
- User management outside of NixOS config
- Direct editing of system files (use NixOS config instead)
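A sketch of how an allowlist check like this is typically enforced; the actual list and enforcement live in `tools.py`:

```python
import shlex

# Subset of the allowed commands listed above.
ALLOWED = {
    "systemctl", "journalctl", "nh", "nixos-rebuild",
    "free", "df", "uptime", "ps", "top", "ip", "ss",
    "lscpu", "lspci", "lsblk", "lshw", "dmidecode",
    "ssh", "scp", "cat", "ls", "grep",
    "ping", "dig", "nslookup", "curl", "wget", "logger",
}

def check_command(command: str) -> None:
    """Reject any command whose executable is not on the allowlist."""
    argv = shlex.split(command)
    if not argv or argv[0] not in ALLOWED:
        raise PermissionError(f"command not allowed: {argv[0] if argv else '<empty>'}")
```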
#### Critical Services

**Never disable or stop:**

- SSH (network access)
- Networking (connectivity)
- systemd (system management)
- Boot-related services

#### Approval Required

- Reboots or system power changes
- Major configuration changes
- Disabling any service
- Changes to multiple hosts

### 9. Nix Store Maintenance

#### Verification & Repair

- Command: `nix-store --verify --check-contents --repair`
- **WARNING**: Can take 30+ minutes to several hours
- Only use when corruption is suspected
- Not for routine maintenance
- Verifies all store paths, repairs corrupted files

#### Garbage Collection

- Automatic via system configuration
- Can be triggered manually with approval
- Frees disk space by removing unused derivations

### 10. Conversational Behavior

#### Distinguish Requests from Acknowledgments

- "Thanks" / "Thank you" → Acknowledgment (don't re-execute)
- "Can you..." / "Please..." → Request (execute)
- "What is..." / "How do..." → Question (answer)

#### Tool Calling

- Don't repeat tool calls unnecessarily
- If a tool succeeds, don't run it again unless asked
- Use cached results when available (`retrieve_cached_output`)

#### Context Management

- Be aware of token limits
- Use hierarchical processing for large outputs
- Prune conversation history intelligently
- Cache and summarize when needed

## Infrastructure Topology

### Hosts in Flake

- **macha**: Main autonomous system (self), GPU server
- **rhiannon**: Production server
- **alexander**: Production server
- **UCAR-Kinston**: Work laptop
- **test-vm**: Testing environment

### Shared Configuration

- All hosts share root SSH keys (for `nh` remote deployment)
- `macha` user (UID 2501) exists on all hosts
- Common NixOS configuration via flake

## Service Ecosystem

### Core Services on Macha

- `ollama.service`: LLM inference engine
- `ollama-queue-worker.service`: Request serialization
- `macha-autonomous.service`: Autonomous monitoring loop
- Servarr stack: Sonarr, Radarr, Prowlarr, Lidarr, Readarr, Whisparr
- Media: Transmission, SABnzbd, Calibre

### State Directories

- `/var/lib/macha/`: Main state directory (0755, macha:macha)
- `/var/lib/macha/queues/`: Queue directories (0777 for multi-user)
- `/var/lib/macha/tool_cache/`: Cached tool outputs (0777)
- `/var/lib/macha/system_context.db`: ChromaDB database

## CLI Tools

- `macha-chat`: Interactive chat with tool calling
- `macha-ask`: Single-question interface
- `macha-check`: Trigger immediate health check
- `macha-approve`: Approve pending actions
- `macha-logs`: View autonomous service logs
- `macha-issues`: Query issue database
- `macha-knowledge`: Query knowledge base
- `macha-systems`: List managed systems
- `macha-notify`: Send Gotify notification

## Philosophy & Principles

1. **KISS (Keep It Simple, Stupid)**: Use existing NixOS options, avoid custom wrappers
2. **Verify first**: Check source code/documentation before acting
3. **Safety first**: Never break critical services, always require approval for risky changes
4. **Learn continuously**: Extract and store operational knowledge
5. **Multi-host awareness**: Macha manages the entire infrastructure, not just herself
6. **User-friendly**: Clear communication, appropriate notifications
7. **Patience**: Long-running operations (builds, repairs) can take an hour; don't panic
8. **Tool reuse**: Use existing, verified tools instead of writing custom scripts

## Future Capabilities (Not Yet Implemented)

- [ ] Automatic security updates across all hosts
- [ ] Predictive failure detection
- [ ] Resource optimization recommendations
- [ ] Integration with other communication platforms
- [ ] Multi-agent coordination between hosts
- [ ] Automated testing before deployment
EXAMPLES.md (new file, 275 lines)
@@ -0,0 +1,275 @@
# Macha Autonomous System - Configuration Examples

## Basic Configurations

### Conservative (Recommended Starting Point)

```nix
services.macha-autonomous = {
  enable = true;
  autonomyLevel = "suggest";  # Require approval for all actions
  checkInterval = 300;        # Check every 5 minutes
  model = "llama3.1:70b";     # Most capable model
};
```

### Moderate Autonomy

```nix
services.macha-autonomous = {
  enable = true;
  autonomyLevel = "auto-safe";  # Auto-fix safe issues
  checkInterval = 180;          # Check every 3 minutes
  model = "llama3.1:70b";
};
```

### High Autonomy (Experimental)

```nix
services.macha-autonomous = {
  enable = true;
  autonomyLevel = "auto-full";  # Full autonomy
  checkInterval = 300;
  model = "llama3.1:70b";
};
```

### Monitoring Only

```nix
services.macha-autonomous = {
  enable = true;
  autonomyLevel = "observe";  # No actions, just watch
  checkInterval = 60;         # Check every minute
  model = "qwen3:8b-fp16";    # Lighter model is fine for observation
};
```

## Advanced Scenarios

### Using a Smaller Model (Faster, Less Capable)

```nix
services.macha-autonomous = {
  enable = true;
  autonomyLevel = "auto-safe";
  checkInterval = 120;
  model = "qwen3:8b-fp16";  # Faster inference, less reasoning depth
  # or
  # model = "llama3.1:8b";  # Also good for simple tasks
};
```

### High-Frequency Monitoring

```nix
services.macha-autonomous = {
  enable = true;
  autonomyLevel = "auto-safe";
  checkInterval = 60;  # Check every minute
  model = "qwen3:4b-instruct-2507-fp16";  # Lightweight model
};
```

### Remote Ollama (if running Ollama elsewhere)

```nix
services.macha-autonomous = {
  enable = true;
  autonomyLevel = "suggest";
  checkInterval = 300;
  ollamaHost = "http://192.168.1.100:11434";  # Remote Ollama instance
  model = "llama3.1:70b";
};
```

## Manual Testing Workflow

1. **Test with a one-shot run:**

   ```bash
   # Run once in observe mode
   macha-check

   # Review what it detected
   cat /var/lib/macha-autonomous/decisions.jsonl | tail -1 | jq .
   ```

2. **Enable in suggest mode:**

   ```nix
   services.macha-autonomous = {
     enable = true;
     autonomyLevel = "suggest";
     checkInterval = 300;
     model = "llama3.1:70b";
   };
   ```

3. **Rebuild and start:**

   ```bash
   sudo nixos-rebuild switch --flake .#macha
   sudo systemctl status macha-autonomous
   ```

4. **Monitor for a while:**

   ```bash
   # Watch the logs
   journalctl -u macha-autonomous -f

   # Or use the helper
   macha-logs service
   ```

5. **Review proposed actions:**

   ```bash
   macha-approve list
   ```

6. **Graduate to auto-safe when comfortable:**

   ```nix
   services.macha-autonomous.autonomyLevel = "auto-safe";
   ```

## Scenario-Based Examples

### Media Server (Let it auto-restart services)

```nix
services.macha-autonomous = {
  enable = true;
  autonomyLevel = "auto-safe";  # Auto-restart failed arr apps
  checkInterval = 180;
  model = "llama3.1:70b";
};
```

### Development Machine (Observe only, you want control)

```nix
services.macha-autonomous = {
  enable = true;
  autonomyLevel = "observe";
  checkInterval = 600;      # Check less frequently
  model = "llama3.1:8b";    # Lighter model
};
```

### Critical Production (Suggest only, manual approval)

```nix
services.macha-autonomous = {
  enable = true;
  autonomyLevel = "suggest";
  checkInterval = 120;      # More frequent monitoring
  model = "llama3.1:70b";   # Best reasoning
};
```

### Experimental/Learning (Full autonomy)

```nix
services.macha-autonomous = {
  enable = true;
  autonomyLevel = "auto-full";
  checkInterval = 300;
  model = "llama3.1:70b";
};
```

## Customizing Behavior

### The config file lives at:

`/etc/macha-autonomous/config.json` (auto-generated from NixOS config)

### To modify the AI prompts:

Edit the Python files in `systems/macha-configs/autonomous/`:

- `agent.py` - AI analysis and decision prompts
- `monitor.py` - What data to collect
- `executor.py` - Safety rules and action execution
- `orchestrator.py` - Main control flow

After editing, rebuild:

```bash
sudo nixos-rebuild switch --flake .#macha
sudo systemctl restart macha-autonomous
```

## Integration with Other Services

### Example: Auto-restart specific services

The system will automatically detect and propose restarting failed services.

### Example: Disk cleanup when space is low

Monitor will detect low disk space, AI will propose cleanup, executor will run `nix-collect-garbage`.

### Example: Log analysis

AI analyzes recent error logs and can propose fixes based on error patterns.

## Debugging

### See what the monitor sees:

```bash
sudo -u macha-autonomous python3 /nix/store/.../monitor.py
```

### Test the AI agent:

```bash
sudo -u macha-autonomous python3 /nix/store/.../agent.py test
```

### View all snapshots:

```bash
ls -lh /var/lib/macha-autonomous/snapshot_*.json
cat "$(ls -t /var/lib/macha-autonomous/snapshot_*.json | head -1)" | jq .
```

### Check approval queue:

```bash
cat /var/lib/macha-autonomous/approval_queue.json | jq .
```

## Performance Tuning

### Model Choice Impact:

| Model | Speed | Capability | RAM Usage | Best For |
|-------|-------|------------|-----------|----------|
| llama3.1:70b | Slow (~30s) | Excellent | ~40GB | Complex reasoning |
| llama3.1:8b | Fast (~3s) | Good | ~5GB | General use |
| qwen3:8b-fp16 | Fast (~2s) | Good | ~16GB | General use |
| qwen3:4b | Very Fast (~1s) | Moderate | ~8GB | Simple tasks |

### Check Interval Impact:

- 60s: High responsiveness, more resource usage
- 300s (default): Good balance
- 600s: Low overhead, slower detection

### Memory Usage:

- Monitor: ~50MB
- Agent (per query): Depends on model (see above)
- Executor: ~30MB
- Orchestrator: ~20MB

Total continuous overhead: ~100MB + model inference when running

## Security Considerations

### The autonomous user has sudo access to:

- `systemctl restart/status` - Restart services
- `journalctl` - Read logs
- `nix-collect-garbage` - Clean up Nix store

### It CANNOT:

- Modify arbitrary files
- Access user home directories (ProtectHome=true)
- Disable protected services (SSH, networking)
- Make changes without logging

### Audit trail:

All actions are logged in `/var/lib/macha-autonomous/actions.jsonl`.

### To revoke access:

Set `enable = false` and rebuild, or stop the service.

## Future: MCP Integration

You already have MCP servers installed:

- `mcp-nixos` - NixOS-specific tools
- `gitea-mcp-server` - Git integration
- `emcee` - General MCP orchestration

Future versions could integrate these for:

- Better NixOS config manipulation
- Git-based config versioning
- More sophisticated tooling

Stay tuned!
LOGGING_EXAMPLE.md (new file, 217 lines)
@@ -0,0 +1,217 @@
# Enhanced Logging Example

This shows what the improved journalctl output will look like for Macha's autonomous system.

## Example Output

### Maintenance Cycle Start

```
[2025-10-01T14:30:00] === Starting maintenance cycle ===
[2025-10-01T14:30:00] Collecting system health data...

[2025-10-01T14:30:02] ============================================================
[2025-10-01T14:30:02] SYSTEM HEALTH SUMMARY
[2025-10-01T14:30:02] ============================================================
[2025-10-01T14:30:02] Resources: CPU 25.3%, Memory 45.2%, Load 1.24
[2025-10-01T14:30:02] Disk: 35.6% used (/ partition)
[2025-10-01T14:30:02] Services: 1 failed
[2025-10-01T14:30:02]   - ollama.service (failed)
[2025-10-01T14:30:02] Network: Internet reachable
[2025-10-01T14:30:02] Recent logs: 3 errors in last hour
[2025-10-01T14:30:02] ============================================================

[2025-10-01T14:30:02] KEY METRICS:
[2025-10-01T14:30:02]   CPU Usage: 25.3%
[2025-10-01T14:30:02]   Memory Usage: 45.2%
[2025-10-01T14:30:02]   Load Average: 1.24
[2025-10-01T14:30:02]   Failed Services: 1
[2025-10-01T14:30:02]   Errors (1h): 3
[2025-10-01T14:30:02]   Disk /: 35.6% used
[2025-10-01T14:30:02]   Disk /home: 62.1% used
[2025-10-01T14:30:02]   Disk /var: 28.9% used
[2025-10-01T14:30:02]   Internet: ✅ Connected
```

### AI Analysis Section

```
[2025-10-01T14:30:02] Analyzing system state with AI...

[2025-10-01T14:30:35] ============================================================
[2025-10-01T14:30:35] AI ANALYSIS RESULTS
[2025-10-01T14:30:35] ============================================================
[2025-10-01T14:30:35] Overall Status: ATTENTION_NEEDED
[2025-10-01T14:30:35] Assessment: System has one failed service that should be restarted

[2025-10-01T14:30:35] Detected 1 issue(s):

[2025-10-01T14:30:35] Issue #1:
[2025-10-01T14:30:35]   Severity: WARNING
[2025-10-01T14:30:35]   Category: services
[2025-10-01T14:30:35]   Description: ollama.service has failed and needs to be restarted
[2025-10-01T14:30:35]   ⚠️ ACTION REQUIRED

[2025-10-01T14:30:35] Recommended Actions (1):
[2025-10-01T14:30:35]   - Restart ollama.service to restore LLM functionality
[2025-10-01T14:30:35] ============================================================
```

### Action Handling Section

```
[2025-10-01T14:30:35] Found 1 issues requiring action

[2025-10-01T14:30:35] ────────────────────────────────────────────────────────────
[2025-10-01T14:30:35] Addressing issue: ollama.service has failed and needs to be restarted
[2025-10-01T14:30:35] Requesting AI fix proposal...

[2025-10-01T14:30:45] AI FIX PROPOSAL:
[2025-10-01T14:30:45]   Diagnosis: ollama.service crashed or failed to start properly
[2025-10-01T14:30:45]   Proposed Action: Restart ollama.service using systemctl
[2025-10-01T14:30:45]   Action Type: systemd_restart
[2025-10-01T14:30:45]   Risk Level: LOW
[2025-10-01T14:30:45]   Commands to execute:
[2025-10-01T14:30:45]     - systemctl restart ollama.service
[2025-10-01T14:30:45]   Reasoning: Restarting the service is a safe, standard troubleshooting step
[2025-10-01T14:30:45]   Rollback Plan: Service will return to failed state if restart doesn't work

[2025-10-01T14:30:45] Executing action...

[2025-10-01T14:30:47] EXECUTION RESULT:
[2025-10-01T14:30:47]   Status: QUEUED_FOR_APPROVAL
[2025-10-01T14:30:47]   Executed: No
[2025-10-01T14:30:47]   Reason: Autonomy level requires manual approval
```

### Cycle Complete Summary

```
[2025-10-01T14:30:47] No issues requiring immediate action

[2025-10-01T14:30:47] ============================================================
[2025-10-01T14:30:47] MAINTENANCE CYCLE COMPLETE
[2025-10-01T14:30:47] ============================================================
[2025-10-01T14:30:47] Status: ATTENTION_NEEDED
[2025-10-01T14:30:47] Issues Found: 1
[2025-10-01T14:30:47] Actions Taken: 1
[2025-10-01T14:30:47]   - Executed: 0
[2025-10-01T14:30:47]   - Queued for approval: 1
[2025-10-01T14:30:47] Next check in: 300 seconds
[2025-10-01T14:30:47] ============================================================
```

## When System is Healthy

```
[2025-10-01T14:35:00] === Starting maintenance cycle ===
[2025-10-01T14:35:00] Collecting system health data...

[2025-10-01T14:35:02] ============================================================
[2025-10-01T14:35:02] SYSTEM HEALTH SUMMARY
[2025-10-01T14:35:02] ============================================================
[2025-10-01T14:35:02] Resources: CPU 12.5%, Memory 38.1%, Load 0.65
[2025-10-01T14:35:02] Disk: 35.6% used (/ partition)
[2025-10-01T14:35:02] Services: All running
[2025-10-01T14:35:02] Network: Internet reachable
[2025-10-01T14:35:02] Recent logs: 0 errors in last hour
[2025-10-01T14:35:02] ============================================================

[2025-10-01T14:35:02] KEY METRICS:
[2025-10-01T14:35:02]   CPU Usage: 12.5%
[2025-10-01T14:35:02]   Memory Usage: 38.1%
[2025-10-01T14:35:02]   Load Average: 0.65
[2025-10-01T14:35:02]   Failed Services: 0
[2025-10-01T14:35:02]   Errors (1h): 0
[2025-10-01T14:35:02]   Disk /: 35.6% used
[2025-10-01T14:35:02]   Internet: ✅ Connected

[2025-10-01T14:35:02] Analyzing system state with AI...

[2025-10-01T14:35:28] ============================================================
[2025-10-01T14:35:28] AI ANALYSIS RESULTS
[2025-10-01T14:35:28] ============================================================
[2025-10-01T14:35:28] Overall Status: HEALTHY
[2025-10-01T14:35:28] Assessment: System is operating normally with no issues detected

[2025-10-01T14:35:28] ✅ No issues detected
[2025-10-01T14:35:28] ============================================================

[2025-10-01T14:35:28] No issues requiring immediate action

[2025-10-01T14:35:28] ============================================================
[2025-10-01T14:35:28] MAINTENANCE CYCLE COMPLETE
[2025-10-01T14:35:28] ============================================================
[2025-10-01T14:35:28] Status: HEALTHY
[2025-10-01T14:35:28] Issues Found: 0
[2025-10-01T14:35:28] Actions Taken: 0
[2025-10-01T14:35:28] Next check in: 300 seconds
[2025-10-01T14:35:28] ============================================================
```

## Viewing Logs

### Follow live logs

```bash
journalctl -u macha-autonomous.service -f
```

### See only AI decisions

```bash
journalctl -u macha-autonomous.service | grep "AI ANALYSIS"
```

### See only execution results

```bash
journalctl -u macha-autonomous.service | grep "EXECUTION RESULT"
```

### See key metrics

```bash
journalctl -u macha-autonomous.service | grep "KEY METRICS" -A 10
```

### Filter by status level

```bash
# Only show intervention required
journalctl -u macha-autonomous.service | grep "INTERVENTION_REQUIRED"

# Only show critical issues
journalctl -u macha-autonomous.service | grep "CRITICAL"

# Only show action required
journalctl -u macha-autonomous.service | grep "ACTION REQUIRED"
```

### Summary of last cycle

```bash
journalctl -u macha-autonomous.service | grep "MAINTENANCE CYCLE COMPLETE" -B 5 | tail -6
```

## Benefits of Enhanced Logging

### 1. **Easy to Scan**

Clear section headers with separators make it easy to find what you need.

### 2. **Structured Data**

Key metrics are labeled consistently for easy parsing/grepping.

### 3. **Complete Context**

Each cycle shows:

- What the system saw
- What the AI thought
- What action was proposed
- What actually happened

### 4. **AI Transparency**

You can see:

- The AI's reasoning for each decision
- Risk assessment for each action
- Rollback plans if something goes wrong

### 5. **Audit Trail**

Everything is logged to journalctl for long-term storage and analysis.

### 6. **Troubleshooting**

If something goes wrong, you have complete context:

- System state before the issue
- AI's diagnosis
- Action attempted
- Result of action
NOTIFICATIONS.md (new file, 224 lines)
@@ -0,0 +1,224 @@
# Gotify Notifications Setup

Macha's autonomous system can now send notifications to Gotify on Rhiannon for critical events.

## What Gets Notified

### High Priority (🚨 Priority 8)

- **Critical issues detected** - System problems requiring immediate attention
- **Service failures** - When critical services fail
- **Failed actions** - When an action execution fails
- **Intervention required** - When system status is critical

### Medium Priority (📋 Priority 5)

- **Actions queued for approval** - When medium/high-risk actions need manual review
- **System attention needed** - When system status needs attention

### Low Priority (✅ Priority 2)

- **Successful actions** - When safe actions execute successfully
- **System healthy** - Periodic health check confirmations (if enabled)

## Setup Instructions

### Step 1: Create Gotify Application on Rhiannon

1. Open the Gotify web interface on Rhiannon:

   ```bash
   # URL: http://rhiannon:8181 (or use external access)
   ```

2. Log in to Gotify

3. Go to the **"Apps"** tab

4. Click **"Create Application"**

5. Name it: `Macha Autonomous System`

6. Copy the generated **Application Token**

### Step 2: Configure Macha

Edit `/home/lily/Documents/gitrepos/nixos-servers/systems/macha.nix`:

```nix
services.macha-autonomous = {
  enable = true;
  autonomyLevel = "suggest";
  checkInterval = 300;
  model = "llama3.1:70b";

  # Gotify notifications
  gotifyUrl = "http://rhiannon:8181";
  gotifyToken = "YOUR_TOKEN_HERE"; # Paste the token from Step 1
};
```

### Step 3: Rebuild and Deploy

```bash
cd /home/lily/Documents/gitrepos/nixos-servers
sudo nixos-rebuild switch --flake .#macha
```

### Step 4: Test Notifications

Send a test notification:

```bash
macha-notify "Test" "Macha notifications are working!" 5
```

You should see this notification appear in Gotify on Rhiannon.

## CLI Tools

### Send Test Notification

```bash
macha-notify <title> <message> [priority]

# Examples:
macha-notify "Test" "This is a test" 5
macha-notify "Critical" "This is urgent" 8
macha-notify "Info" "Just FYI" 2
```

Priorities:

- `2` - Low (✅ green)
- `5` - Medium (📋 blue)
- `8` - High (🚨 red)

### Check if Notifications are Enabled

```bash
# View the service environment
systemctl show macha-autonomous.service | grep GOTIFY
```

## Notification Examples

### Critical Issue

```
🚨 Macha: Critical Issue
⚠️ Critical Issue Detected

High disk usage on /var partition (95% full)

Details:
Category: disk
```

### Action Queued for Approval

```
📋 Macha: Action Needs Approval
ℹ️ Action Queued for Approval

Action: Restart failed service: ollama.service
Risk Level: low

Use 'macha-approve list' to review
```

### Action Executed Successfully

```
✅ Macha: Action Success
✅ Action Success

Restart failed service: ollama.service

Output:
Service restarted successfully
```

### Action Failed

```
❌ Macha: Action Failed
❌ Action Failed

Clean up disk space with nix-collect-garbage

Output:
Error: Insufficient permissions
```

## Security Notes

1. **Token Storage**: The Gotify token is stored in the NixOS configuration. Consider using a secrets management solution for production.

2. **Network Access**: Macha needs network access to Rhiannon. Ensure your firewall allows HTTP traffic between them.

3. **Token Scope**: The Gotify token only allows sending messages, not reading or managing Gotify.

## Troubleshooting

### Notifications Not Appearing

1. **Check Gotify is running on Rhiannon:**

   ```bash
   ssh rhiannon systemctl status gotify
   ```

2. **Test connectivity from Macha:**

   ```bash
   curl http://rhiannon:8181/health
   ```

3. **Verify token is set:**

   ```bash
   macha-notify "Test" "Testing" 5
   ```

4. **Check service logs:**

   ```bash
   macha-logs service | grep -i gotify
   ```

### Notification Spam

If you're getting too many notifications, you can:

1. **Disable notifications temporarily:**

   ```nix
   services.macha-autonomous.gotifyUrl = ""; # Empty string disables
   ```

2. **Adjust autonomy level:**

   ```nix
   services.macha-autonomous.autonomyLevel = "auto-safe"; # Fewer approval notifications
   ```

3. **Increase check interval:**

   ```nix
   services.macha-autonomous.checkInterval = 900; # Check every 15 minutes instead of 5
   ```

## Implementation Details

### Files Modified

- `notifier.py` - Gotify notification client
- `module.nix` - Added configuration options and CLI tool
- `orchestrator.py` - Integrated notifications at decision points
- `macha.nix` - Added Gotify configuration

### Notification Flow

```
Issue Detected → AI Analysis → Decision Made → Notification Sent
                                    ↓
                   Queued or Executed → Notification Sent
```

### Graceful Degradation

- If Gotify is unavailable, the system continues to operate
- Failed notifications are logged but don't crash the service
- Notifications have a 10-second timeout to prevent blocking
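A minimal sketch of this degradation behavior using Gotify's standard `POST /message` endpoint; the real client is `notifier.py`, so function and argument names here are assumptions:

```python
import logging
import requests

def notify(url: str, token: str, title: str, message: str,
           priority: int = 5) -> bool:
    """Send a Gotify message; never raise, so a Gotify outage
    cannot take down the autonomous loop."""
    if not url:  # empty gotifyUrl disables notifications
        return False
    try:
        resp = requests.post(
            f"{url.rstrip('/')}/message",
            params={"token": token},
            json={"title": title, "message": message, "priority": priority},
            timeout=10,  # matches the 10-second timeout noted above
        )
        resp.raise_for_status()
        return True
    except requests.RequestException as exc:
        logging.warning("Gotify notification failed: %s", exc)
        return False
```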
## Future Enhancements

Possible improvements:

- [ ] Rate limiting to prevent notification spam
- [ ] Notification grouping (batch similar issues)
- [ ] Custom notification templates
- [ ] Priority-based notification filtering
- [ ] Integration with other notification services (email, SMS)
- [ ] Secrets management for tokens (agenix, sops-nix)
QUICKSTART.md (new file, 229 lines)
@@ -0,0 +1,229 @@
# Macha Autonomous System - Quick Start Guide

## What is This?

Macha now has a self-maintenance system that uses local AI (via Ollama) to monitor, analyze, and maintain itself. Think of it as a 24/7 system administrator that watches over Macha.

## How It Works

1. **Monitor**: Every 5 minutes, collects system health data (services, resources, logs, etc.)
2. **Analyze**: Uses llama3.1:70b to analyze the data and detect issues
3. **Act**: Based on autonomy level, either proposes fixes or executes them automatically
4. **Learn**: Logs all decisions and actions for auditing and improvement
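In rough pseudocode terms, the cycle looks like this; this is a sketch of the flow, not the actual `orchestrator.py`, and the method names are assumptions:

```python
import time

def main_loop(monitor, agent, executor, check_interval: int = 300):
    """Monitor -> Analyze -> Act -> Learn, repeated forever."""
    while True:
        snapshot = monitor.collect()            # 1. Monitor
        analysis = agent.analyze(snapshot)      # 2. Analyze (LLM via Ollama)
        for issue in analysis.issues:
            proposal = agent.propose_fix(issue)
            executor.handle(proposal)           # 3. Act (or queue for approval)
        # 4. Learn: each component appends to its .jsonl log as it goes.
        time.sleep(check_interval)
```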
## Autonomy Levels

### `observe` - Monitoring Only

- Monitors system health
- Logs everything
- Takes NO actions
- Good for: Testing, learning what the system sees

### `suggest` - Approval Required (DEFAULT)

- Monitors and analyzes
- Proposes fixes
- Requires manual approval before executing
- Good for: Production use, when you want control

### `auto-safe` - Limited Autonomy

- Auto-executes "safe" actions:
  - Restarting failed services
  - Disk cleanup
  - Log rotation
  - Read-only diagnostics
- Asks approval for risky changes
- Good for: Hands-off operation with a safety net

### `auto-full` - Full Autonomy

- Auto-executes most actions
- Still requires approval for HIGH RISK actions
- Never touches protected services (SSH, networking, etc.)
- Good for: Experimental use, when you trust the system

## Commands

### Check the status

```bash
# View the service status
systemctl status macha-autonomous

# View live logs
macha-logs service

# View AI decision log
macha-logs decisions

# View action execution log
macha-logs actions

# View orchestrator log
macha-logs orchestrator
```

### Run a manual check

```bash
# Run one maintenance cycle now
macha-check
```

### Approval workflow (when autonomyLevel = "suggest")

```bash
# List pending actions awaiting approval
macha-approve list

# Approve action number 0
macha-approve approve 0
```

### Change autonomy level

Edit `/home/lily/Documents/nixos-servers/systems/macha.nix`:

```nix
services.macha-autonomous = {
  enable = true;
  autonomyLevel = "auto-safe"; # Change this
  checkInterval = 300;
  model = "llama3.1:70b";
};
```

Then rebuild:

```bash
sudo nixos-rebuild switch --flake .#macha
```

## What Can It Do?

### Automatically Detects

- Failed systemd services
- High resource usage (CPU, RAM, disk)
- Recent errors in logs
- Network connectivity issues
- Disk space problems
- Boot/uptime anomalies

### Can Propose/Execute

- Restart failed services
- Clean up disk space (nix store, old logs)
- Investigate issues (run diagnostics)
- Propose configuration changes (for manual review)
- NixOS rebuilds (with safety checks)

### Safety Features

- **Protected services**: Never touches SSH, networking, systemd core
- **Dry-run testing**: Tests NixOS rebuilds before applying
- **Action logging**: Every action is logged with context
- **Rollback capability**: Can revert changes
- **Rate limiting**: Won't spam actions
- **Human override**: You can always disable or intervene

## Example Workflow

1. **System detects failed service**

   ```
   Monitor: "ollama.service is failed"
   AI Agent: "The ollama service crashed. Propose restarting it."
   ```

2. **In `suggest` mode (default)**

   ```
   Executor: "Action queued for approval"
   You: Run `macha-approve list`
   You: Review the proposed action
   You: Run `macha-approve approve 0`
   Executor: Restarts the service
   ```

3. **In `auto-safe` mode**

   ```
   Executor: "Low risk action, auto-executing"
   Executor: Restarts the service automatically
   You: Check logs later to see what happened
   ```

## Monitoring the System

All data is stored in `/var/lib/macha-autonomous/`:

- `orchestrator.log` - Main system log
- `decisions.jsonl` - AI analysis decisions (JSON Lines format)
- `actions.jsonl` - Executed actions log
- `snapshot_*.json` - System state snapshots
- `approval_queue.json` - Pending actions

## Tips

1. **Start with `suggest` mode** - Get comfortable with what it proposes
2. **Review the logs** - See what it's detecting and proposing
3. **Graduate to `auto-safe`** - Let it handle routine maintenance
4. **Use `observe` for debugging** - If something seems wrong
5. **Check the approval queue regularly** - If using `suggest` mode

## Troubleshooting

### Service won't start

```bash
# Check for errors
journalctl -u macha-autonomous -n 50

# Verify Ollama is running
systemctl status ollama

# Test Ollama manually
curl http://localhost:11434/api/generate -d '{"model": "llama3.1:70b", "prompt": "test"}'
```

### AI making bad decisions

- Switch to `observe` mode to stop actions
- Review `decisions.jsonl` to see reasoning
- File an issue or adjust prompts in `agent.py`

### Want to disable temporarily

```bash
sudo systemctl stop macha-autonomous
```

### Want to disable permanently

Edit `systems/macha.nix`:

```nix
services.macha-autonomous.enable = false;
```

Then rebuild.

## Architecture

```
┌─────────────────────────────────────────────────────────┐
│                      Orchestrator                       │
│             (Main loop, runs every 5 minutes)           │
└────────────┬──────────────┬──────────────┬──────────────┘
             │              │              │
         ┌───▼────┐    ┌────▼────┐    ┌────▼─────┐
         │Monitor │    │  Agent  │    │ Executor │
         │        │───▶│  (AI)   │───▶│  (Safe)  │
         └────────┘    └─────────┘    └──────────┘
             │              │              │
          Collects       Analyzes       Executes
           System         Issues         Actions
           Health         w/ LLM         Safely
```

## Future Enhancements

Potential future capabilities:

- Integration with MCP servers (already installed!)
- Predictive maintenance (learning from patterns)
- Self-optimization (tuning configs based on usage)
- Cluster management (if you add more systems)
- Automated backups and disaster recovery
- Security monitoring and hardening
- Performance tuning recommendations

## Philosophy

The goal is a system that maintains itself while being:

1. **Safe** - Never breaks critical functionality
2. **Transparent** - All decisions are logged and explainable
3. **Conservative** - When in doubt, ask for approval
4. **Learning** - Gets better over time
5. **Human-friendly** - Easy to understand and override

Macha is here to help you, not replace you!
README.md (new file, 93 lines)
@@ -0,0 +1,93 @@
# Macha - AI-Powered Autonomous System Administrator

Macha is an AI-powered autonomous system administrator for NixOS that monitors system health, diagnoses issues, and can take corrective actions with appropriate approval workflows.

## Features

- **Autonomous Monitoring**: Continuous health checks with configurable intervals
- **Multi-Host Management**: SSH-based management of multiple NixOS hosts
- **Tool Calling**: Comprehensive system administration tools via Ollama LLM
- **Queue-Based Architecture**: Serialized LLM requests to prevent resource contention
- **Knowledge Base**: ChromaDB-backed learning system for operational wisdom
- **Approval Workflows**: Safety-first approach with configurable autonomy levels
- **Notification System**: Gotify integration for alerts

## Quick Start

### As a NixOS Flake Input

Add to your `flake.nix`:

```nix
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
    macha-autonomous.url = "git+https://git.coven.systems/lily/macha-autonomous";
  };

  outputs = { self, nixpkgs, macha-autonomous }: {
    nixosConfigurations.yourhost = nixpkgs.lib.nixosSystem {
      modules = [
        macha-autonomous.nixosModules.default
        {
          services.macha-autonomous = {
            enable = true;
            autonomyLevel = "suggest"; # observe, suggest, auto-safe, auto-full
            checkInterval = 300;
            ollamaHost = "http://localhost:11434";
            model = "gpt-oss:latest";
          };
        }
      ];
    };
  };
}
```

## Configuration Options

See `module.nix` for full configuration options including:

- Autonomy levels (observe, suggest, auto-safe, auto-full)
- Check intervals
- Ollama host and model settings
- Git repository monitoring
- Service user/group configuration

## CLI Tools

- `macha-chat` - Interactive chat interface
- `macha-ask` - Single-question interface
- `macha-check` - Trigger immediate health check
- `macha-approve` - Approve pending actions
- `macha-logs` - View service logs
- `macha-issues` - Query issue database
- `macha-knowledge` - Query knowledge base
- `macha-systems` - List managed systems
- `macha-notify` - Send Gotify notification

## Architecture

- **Agent**: Core AI logic with tool calling
- **Orchestrator**: Main monitoring loop
- **Executor**: Safe action execution
- **Queue System**: Serialized Ollama requests with priorities
- **Context DB**: ChromaDB for system context and learning
- **Tools**: System administration capabilities

## Requirements

- NixOS with flakes enabled
- Ollama service running
- Python 3 with requests, psutil, chromadb

## Documentation

See `DESIGN.md` for comprehensive architecture documentation.

## License

[Add your license here]

## Author

Lily Miller
SUMMARY.md (new file, 317 lines)
@@ -0,0 +1,317 @@
|
|||||||
|
# Macha Autonomous System - Implementation Summary
|
||||||
|
|
||||||
|
## What We Built
|
||||||
|
|
||||||
|
A complete self-maintaining system for Macha that uses local AI models (via Ollama) to monitor, analyze, and fix issues automatically. This is a production-ready implementation with safety mechanisms, audit trails, and multiple autonomy levels.
|
||||||
|
|
||||||
|
## Components Created
|
||||||
|
|
||||||
|
### 1. System Monitor (`monitor.py` - 310 lines)
|
||||||
|
- Collects comprehensive system health data every cycle
|
||||||
|
- Monitors: systemd services, resources (CPU/RAM), disk usage, logs, network, NixOS status
|
||||||
|
- Saves snapshots for historical analysis
|
||||||
|
- Generates human-readable summaries
|
||||||
|
|
||||||
|
### 2. AI Agent (`agent.py` - 238 lines)
|
||||||
|
- Analyzes system state using llama3.1:70b (or other models)
|
||||||
|
- Detects issues and classifies severity
|
||||||
|
- Proposes specific, actionable fixes
|
||||||
|
- Logs all decisions for auditing
|
||||||
|
- Uses structured JSON responses for reliability
|
||||||
|
|
||||||
|
### 3. Safe Executor (`executor.py` - 371 lines)
|
||||||
|
- Executes actions with safety checks
|
||||||
|
- Protected services list (never touches SSH, networking, etc.)
|
||||||
|
- Supports multiple action types:
|
||||||
|
- `systemd_restart` - Restart failed services
|
||||||
|
- `cleanup` - Disk/log cleanup
|
||||||
|
- `nix_rebuild` - NixOS configuration rebuilds
|
||||||
|
- `config_change` - Config file modifications
|
||||||
|
- `investigation` - Diagnostic commands
|
||||||
|
- Approval queue for manual review
|
||||||
|
- Complete action logging
|
||||||
|
|
||||||
|
### 4. Orchestrator (`orchestrator.py` - 211 lines)
- Main control loop
- Coordinates monitor → agent → executor pipeline
- Handles signals and graceful shutdown
- Configuration management
- Multiple run modes (once, continuous, daemon)

### 5. NixOS Module (`module.nix` - 168 lines)
- Full systemd service integration
- Configuration options via NixOS
- User/group management
- Security hardening
- CLI tools (`macha-check`, `macha-approve`, `macha-logs`)
- Resource limits (1GB RAM, 50% CPU)

### 6. Documentation
- `README.md` - Architecture overview
- `QUICKSTART.md` - User guide
- `EXAMPLES.md` - Configuration examples
- `SUMMARY.md` - This file

**Total: ~1,400 lines of code**

## Architecture

```
┌──────────────────────────────────────────────────────────────┐
│                         NixOS Module                         │
│  - Creates systemd service                                   │
│  - Manages user/permissions                                  │
│  - Provides CLI tools                                        │
└───────────────────────┬──────────────────────────────────────┘
                        │
                        ▼
┌──────────────────────────────────────────────────────────────┐
│                         Orchestrator                         │
│  - Runs main loop (every 5 minutes)                          │
│  - Coordinates components                                    │
│  - Handles errors and logging                                │
└───────┬──────────────┬──────────────┬──────────────┬─────────┘
        │              │              │              │
        ▼              ▼              ▼              ▼
  ┌─────────┐    ┌──────────┐    ┌─────────┐    ┌──────────┐
  │ Monitor │──▶ │  Agent   │──▶ │Executor │──▶ │   Logs   │
  │         │    │   (AI)   │    │ (Safe)  │    │          │
  └─────────┘    └──────────┘    └─────────┘    └──────────┘
       │              │               │               │
       │              │               │               │
   Collects       Analyzes        Executes        Records
   System         with LLM        Actions         Everything
   Health         (Ollama)        Safely
```
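Condensed to code, the loop the diagram describes could look like this minimal sketch; the parameter names are illustrative stand-ins for the real components in `orchestrator.py`:

```python
import time
from typing import Any

def main_loop(monitor: Any, agent: Any, executor: Any, log: Any,
              check_interval: int = 300) -> None:
    """One monitor → agent → executor pass every check_interval seconds."""
    while True:
        snapshot = monitor.collect()             # gather system health
        analysis = agent.analyze(snapshot)       # LLM returns structured JSON
        for action in analysis.get("actions", []):
            outcome = executor.execute(action)   # safety checks / approval queue
            log.record(analysis, action, outcome)
        time.sleep(check_interval)               # checkInterval: 5 minutes by default
```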
## Data Flow

1. **Collection**: Monitor gathers system health data
2. **Analysis**: Agent sends data + prompts to Ollama
3. **Decision**: AI returns structured analysis (JSON)
4. **Execution**: Executor checks permissions & autonomy level
5. **Action**: Either executes or queues for approval
6. **Logging**: All steps logged to JSONL files (see the example record below)
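For concreteness, here is a minimal sketch of what one `decisions.jsonl` entry could look like and how it gets appended; the exact field names live in `agent.py` and may differ from this illustration:

```python
import json
from datetime import datetime, timezone

# Hypothetical record shape -- the real fields are defined in agent.py.
record = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "status": "degraded",
    "issue": "nginx.service entered failed state",
    "severity": "medium",
    "proposed_action": {"type": "systemd_restart", "service": "nginx"},
}

# JSONL: one self-contained JSON object per line, so the audit trail is
# append-only and trivially inspectable with grep or jq.
with open("/var/lib/macha-autonomous/decisions.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")
```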
## Safety Mechanisms

### Multi-Level Protection
1. **Autonomy Levels**: observe → suggest → auto-safe → auto-full (gating sketched after this list)
2. **Protected Services**: Hardcoded list of critical services
3. **Dry-Run Testing**: NixOS rebuilds tested before applying
4. **Approval Queue**: Manual review workflow
5. **Action Logging**: Complete audit trail
6. **Resource Limits**: systemd enforced (1GB RAM, 50% CPU)
7. **Rollback Capability**: Can revert changes
8. **Timeout Protection**: All operations have timeouts
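A minimal sketch of how autonomy-level gating can work, assuming actions carry a risk label as `executor.py` does; the names here are illustrative rather than the actual API:

```python
# Illustrative gating logic; the real checks live in executor.py.
AUTONOMY_ORDER = ["observe", "suggest", "auto-safe", "auto-full"]

def may_execute(level: str, action_risk: str) -> bool:
    """Return True if an action may run without human approval."""
    if level not in AUTONOMY_ORDER:
        raise ValueError(f"unknown autonomy level: {level}")
    if AUTONOMY_ORDER.index(level) <= AUTONOMY_ORDER.index("suggest"):
        return False                   # observe/suggest: propose only
    if level == "auto-safe":
        return action_risk == "safe"   # only pre-vetted, low-risk actions
    return True                        # auto-full: everything allowed

assert may_execute("auto-safe", "safe")
assert not may_execute("auto-safe", "risky")
assert not may_execute("suggest", "safe")
```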
### What It Can Do Automatically (auto-safe)
- ✅ Restart failed services (except protected ones)
- ✅ Clean up disk space (nix-collect-garbage)
- ✅ Rotate/clean logs
- ✅ Run diagnostics
- ❌ Modify configs (requires approval)
- ❌ Rebuild NixOS (requires approval)
- ❌ Touch protected services

## Files Created

```
systems/macha-configs/autonomous/
├── __init__.py        # Python package marker
├── monitor.py         # System health monitoring
├── agent.py           # AI analysis and reasoning
├── executor.py        # Safe action execution
├── orchestrator.py    # Main control loop
├── module.nix         # NixOS integration
├── README.md          # Architecture docs
├── QUICKSTART.md      # User guide
├── EXAMPLES.md        # Configuration examples
└── SUMMARY.md         # This file
```

## Integration Points

### Modified Files
- `systems/macha.nix` - Added autonomous module and configuration

### Created Systemd Service
- `macha-autonomous.service` - Main service
- Runs continuously, checks every 5 minutes
- Auto-starts on boot
- Restart on failure

### Created Users/Groups
- `macha-autonomous` user (system user)
- Limited sudo access for specific commands
- Home: `/var/lib/macha-autonomous`

### Created CLI Commands
- `macha-check` - Run manual health check
- `macha-approve list` - Show pending actions
- `macha-approve approve <N>` - Approve action N
- `macha-logs [orchestrator|decisions|actions|service]` - View logs

### State Directory
`/var/lib/macha-autonomous/` contains:
- `orchestrator.log` - Main log
- `decisions.jsonl` - AI analysis log
- `actions.jsonl` - Executed actions log
- `snapshot_*.json` - System state snapshots
- `approval_queue.json` - Pending actions (see the reader sketch below)
- `suggested_patch_*.txt` - Config change suggestions
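For example, a helper in the spirit of `macha-approve list` could read the queue directly; the JSON layout shown here is assumed, not verified against `executor.py`:

```python
import json
from pathlib import Path

QUEUE = Path("/var/lib/macha-autonomous/approval_queue.json")

def list_pending() -> None:
    """Print pending actions by index, mirroring `macha-approve list`."""
    if not QUEUE.exists():
        print("No pending actions.")
        return
    pending = json.loads(QUEUE.read_text())  # assumed: a JSON list of actions
    for i, action in enumerate(pending):
        print(f"[{i}] {action.get('type', '?')}: {action.get('description', '')}")

if __name__ == "__main__":
    list_pending()
```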
## Configuration

### Current Configuration (in systems/macha.nix)
```nix
services.macha-autonomous = {
  enable = true;
  autonomyLevel = "suggest";  # Requires approval
  checkInterval = 300;        # 5 minutes
  model = "llama3.1:70b";     # Most capable model
};
```

### To Deploy
```bash
# Build and activate
sudo nixos-rebuild switch --flake .#macha

# Check status
systemctl status macha-autonomous

# View logs
macha-logs service
```

## Usage Workflow

### Day 1: Observation
```bash
# Just watch what it detects
macha-logs decisions
```

### Day 2-7: Review Proposals
```bash
# Check what it wants to do
macha-approve list

# Approve good actions
macha-approve approve 0
```

### Week 2+: Increase Autonomy
```nix
# Let it handle safe actions automatically
services.macha-autonomous.autonomyLevel = "auto-safe";
```

### Monthly: Review Audit Logs
```bash
# See what it's been doing
cat /var/lib/macha-autonomous/actions.jsonl | jq .
```
## Performance Characteristics

### Resource Usage
- **Idle**: ~100MB RAM
- **Active (w/ llama3.1:70b)**: ~100MB + ~40GB model (shared with Ollama)
- **CPU**: Limited to 50% by systemd
- **Disk**: Minimal (logs rotate, snapshots limited to last 100)

### Timing
- **Monitor**: ~2 seconds
- **AI Analysis**: ~30 seconds (70B model) to ~3 seconds (8B model)
- **Execution**: Varies by action (seconds to minutes)
- **Full Cycle**: typically ~1-2 minutes

### Scalability
- Can handle multiple issues per cycle
- Queue system prevents action spam
- Configurable check intervals
- Model choice trades speed against quality

## Current Status

✅ **READY TO USE** - All components implemented and integrated

The system is:
- ✅ Fully functional
- ✅ Safety mechanisms in place
- ✅ Well documented
- ✅ Integrated into NixOS configuration
- ✅ Ready for deployment

Currently configured in **conservative mode** (`suggest`):
- Monitors continuously
- Analyzes with AI
- Proposes actions
- Waits for your approval

## Next Steps

1. **Deploy and test:**
   ```bash
   sudo nixos-rebuild switch --flake .#macha
   ```

2. **Monitor for a few days:**
   ```bash
   macha-logs service
   ```

3. **Review what it detects:**
   ```bash
   macha-approve list
   cat /var/lib/macha-autonomous/decisions.jsonl | jq .
   ```

4. **Gradually increase autonomy as you gain confidence**

## Future Enhancement Ideas

### Short Term
- Web dashboard for easier monitoring
- Email/notification system for critical issues
- More sophisticated action types
- Historical trend analysis

### Medium Term
- Integration with MCP servers (already installed!)
- Predictive maintenance using historical data
- Self-tuning of check intervals based on activity
- Multi-system orchestration (manage other NixOS hosts)

### Long Term
- Learning from past decisions to improve
- A/B testing of configuration changes
- Distributed consensus for multi-host decisions
- Integration with external monitoring systems

## Philosophy

This implementation follows key principles:

1. **Safety First**: Multiple layers of protection
2. **Transparency**: Everything is logged and auditable
3. **Conservative Default**: Start restricted, earn trust
4. **Human in Loop**: Always allow override
5. **Gradual Autonomy**: Progressive trust model
6. **Local First**: No external dependencies
7. **Declarative**: NixOS-native configuration

## Conclusion

Macha now has a sophisticated autonomous maintenance system that can:
- Monitor itself 24/7
- Detect and analyze issues using AI
- Fix problems automatically (with appropriate safeguards)
- Learn and improve over time
- Maintain complete audit trails

All powered by local AI models, with no external dependencies, fully integrated with NixOS, and designed with safety as the top priority.

**Welcome to the future of self-maintaining systems!** 🎉
1
__init__.py
Normal file
@@ -0,0 +1 @@
# Macha Autonomous System Maintenance
522
chat.py
Normal file
@@ -0,0 +1,522 @@
#!/usr/bin/env python3
"""
Interactive chat interface with Macha AI agent.
Allows conversational interaction and directive execution.
"""

import json
import os
import subprocess
import sys
from datetime import datetime
from pathlib import Path
from typing import List, Dict, Any

# Add parent directory to path for imports
sys.path.insert(0, str(Path(__file__).parent))

from agent import MachaAgent


class MachaChatSession:
    """Interactive chat session with Macha"""

    def __init__(self):
        self.agent = MachaAgent(use_queue=True, priority="INTERACTIVE")
        self.conversation_history: List[Dict[str, str]] = []
        self.session_start = datetime.now().isoformat()

    def _create_chat_prompt(self, user_message: str) -> str:
        """Create a prompt for the chat session"""

        # Build conversation context
        context = ""
        if self.conversation_history:
            context = "\n\nCONVERSATION HISTORY:\n"
            for entry in self.conversation_history[-10:]:  # Last 10 messages
                role = entry['role'].upper()
                msg = entry['message']
                context += f"{role}: {msg}\n"

        prompt = f"""{MachaAgent.SYSTEM_PROMPT}

TASK: INTERACTIVE CHAT SESSION

You are in an interactive chat session with the system administrator.
You can have a natural conversation and execute commands when directed.

CAPABILITIES:
- Answer questions about system status
- Explain configurations and issues
- Execute commands when explicitly asked
- Provide guidance and recommendations

COMMAND EXECUTION:
When the user asks you to run a command or perform an action that requires execution:
1. Respond with a JSON object containing the command to execute
2. Format: {{"action": "execute", "command": "the command", "explanation": "why you're running it"}}
3. After seeing the output, continue the conversation naturally

RESPONSE FORMAT:
- For normal conversation: Respond naturally in plain text
- For command execution: Respond with JSON containing action/command/explanation
- Keep responses concise but informative

RULES:
- Only execute commands when explicitly asked or when it's clearly needed
- Explain what you're about to do before executing
- Never execute destructive commands without explicit confirmation
- If unsure, ask for clarification
{context}

USER: {user_message}

MACHA:"""

        return prompt

    def _execute_command(self, command: str) -> Dict[str, Any]:
        """Execute a shell command and return results"""
        try:
            result = subprocess.run(
                command,
                shell=True,
                capture_output=True,
                text=True,
                timeout=30
            )

            # Check if command failed due to permissions
            needs_sudo = False
            permission_errors = [
                'Interactive authentication required',
                'Permission denied',
                'Operation not permitted',
                'Must be root',
                'insufficient privileges',
                'authentication is required'
            ]

            if result.returncode != 0:
                error_text = (result.stderr + result.stdout).lower()
                for perm_error in permission_errors:
                    if perm_error.lower() in error_text:
                        needs_sudo = True
                        break

            # Retry with sudo if permission error detected
            if needs_sudo and not command.strip().startswith('sudo'):
                print("\n⚠️ Permission denied, retrying with sudo...")
                sudo_command = f"sudo {command}"
                result = subprocess.run(
                    sudo_command,
                    shell=True,
                    capture_output=True,
                    text=True,
                    timeout=30
                )

                return {
                    'success': result.returncode == 0,
                    'exit_code': result.returncode,
                    'stdout': result.stdout,
                    'stderr': result.stderr,
                    'command': sudo_command,
                    'retried_with_sudo': True
                }

            return {
                'success': result.returncode == 0,
                'exit_code': result.returncode,
                'stdout': result.stdout,
                'stderr': result.stderr,
                'command': command,
                'retried_with_sudo': False
            }
        except subprocess.TimeoutExpired:
            return {
                'success': False,
                'exit_code': -1,
                'stdout': '',
                'stderr': 'Command timed out after 30 seconds',
                'command': command,
                'retried_with_sudo': False
            }
        except Exception as e:
            return {
                'success': False,
                'exit_code': -1,
                'stdout': '',
                'stderr': str(e),
                'command': command,
                'retried_with_sudo': False
            }

    def _parse_response(self, response: str) -> Dict[str, Any]:
        """Parse AI response to determine if it's a command or text"""
        try:
            # Try to parse as JSON
            parsed = json.loads(response.strip())
            if isinstance(parsed, dict) and 'action' in parsed:
                return parsed
        except json.JSONDecodeError:
            pass

        # It's plain text conversation
        return {'action': 'chat', 'message': response}

    def _auto_diagnose_ollama(self) -> str:
        """Automatically diagnose Ollama issues"""
        diagnostics = []

        diagnostics.append("🔍 AUTO-DIAGNOSIS: Investigating Ollama failure...\n")

        # Check if Ollama service is running
        try:
            result = subprocess.run(
                ['systemctl', 'is-active', 'ollama.service'],
                capture_output=True,
                text=True,
                timeout=5
            )
            if result.returncode == 0:
                diagnostics.append("✅ Ollama service is active")
            else:
                diagnostics.append(f"❌ Ollama service is NOT active: {result.stdout.strip()}")
                # Get service status
                status_result = subprocess.run(
                    ['systemctl', 'status', 'ollama.service', '--no-pager', '-l'],
                    capture_output=True,
                    text=True,
                    timeout=5
                )
                diagnostics.append(f"\nService status:\n```\n{status_result.stdout[-500:]}\n```")
        except Exception as e:
            diagnostics.append(f"⚠️ Could not check service status: {e}")

        # Check memory usage
        try:
            result = subprocess.run(['free', '-h'], capture_output=True, text=True, timeout=5)
            lines = result.stdout.split('\n')
            for line in lines[:3]:  # First 3 lines
                diagnostics.append(f"  {line}")
        except Exception as e:
            diagnostics.append(f"⚠️ Could not check memory: {e}")

        # Check which models are loaded
        try:
            import requests
            response = requests.get(f"{self.agent.ollama_host}/api/tags", timeout=5)
            if response.status_code == 200:
                models = response.json().get('models', [])
                diagnostics.append(f"\n📦 Loaded models ({len(models)}):")
                for model in models:
                    name = model.get('name', 'unknown')
                    size = model.get('size', 0) / (1024**3)
                    is_current = "← TARGET" if name == self.agent.model else ""
                    diagnostics.append(f"  • {name} ({size:.1f} GB) {is_current}")

                # Check if target model is loaded
                model_names = [m.get('name') for m in models]
                if self.agent.model not in model_names:
                    diagnostics.append(f"\n❌ TARGET MODEL NOT LOADED: {self.agent.model}")
                    diagnostics.append(f"   Available models: {', '.join(model_names)}")
            else:
                diagnostics.append(f"❌ Ollama API returned {response.status_code}")
        except Exception as e:
            diagnostics.append(f"⚠️ Could not query Ollama API: {e}")

        # Check recent Ollama logs
        try:
            result = subprocess.run(
                ['journalctl', '-u', 'ollama.service', '-n', '10', '--no-pager'],
                capture_output=True,
                text=True,
                timeout=5
            )
            if result.stdout:
                diagnostics.append(f"\n📋 Recent Ollama logs (last 10 lines):\n```\n{result.stdout}\n```")
        except Exception as e:
            diagnostics.append(f"⚠️ Could not check logs: {e}")

        return "\n".join(diagnostics)

    def process_message(self, user_message: str) -> str:
        """Process a user message and return Macha's response"""

        # Add user message to history
        self.conversation_history.append({
            'role': 'user',
            'message': user_message,
            'timestamp': datetime.now().isoformat()
        })

        # Build chat messages for tool-calling API
        messages = []

        # Query relevant knowledge based on user message
        knowledge_context = self.agent._query_relevant_knowledge(user_message, limit=3)

        # Add recent conversation history (last 15 messages to stay within context limits)
        # With tool calling, messages grow quickly, so we limit more aggressively
        recent_history = self.conversation_history[-15:]  # Last ~7 exchanges
        for entry in recent_history:
            content = entry['message']
            # Truncate very long messages (e.g., command outputs)
            if len(content) > 3000:
                content = content[:1500] + "\n... [message truncated] ...\n" + content[-1500:]
            # Add knowledge context to first user message if available
            if entry == recent_history[-1] and knowledge_context:
                content += knowledge_context
            messages.append({
                "role": entry['role'],
                "content": content
            })

        try:
            # Use tool-aware chat API
            ai_response = self.agent._query_ollama_with_tools(messages)
        except Exception as e:
            error_msg = (
                f"❌ CRITICAL: Failed to communicate with Ollama inference engine\n\n"
                f"Error Type: {type(e).__name__}\n"
                f"Error Message: {str(e)}\n\n"
            )
            # Auto-diagnose the issue
            diagnostics = self._auto_diagnose_ollama()
            return error_msg + "\n" + diagnostics

        if not ai_response:
            error_msg = (
                f"❌ Empty response from Ollama inference engine\n\n"
                f"The request succeeded but returned no data. This usually means:\n"
                f"  • The model ({self.agent.model}) is still loading\n"
                f"  • Ollama ran out of memory during generation\n"
                f"  • The prompt was too large for the context window\n\n"
            )
            # Auto-diagnose the issue
            diagnostics = self._auto_diagnose_ollama()
            return error_msg + "\n" + diagnostics

        # Check if Ollama returned an error
        try:
            error_check = json.loads(ai_response)
            if isinstance(error_check, dict) and 'error' in error_check:
                error_msg = (
                    f"❌ Ollama API Error\n\n"
                    f"Error: {error_check.get('error', 'Unknown error')}\n"
                    f"Diagnosis: {error_check.get('diagnosis', 'No details')}\n\n"
                )
                # Auto-diagnose the issue
                diagnostics = self._auto_diagnose_ollama()
                return error_msg + "\n" + diagnostics
        except json.JSONDecodeError:
            # Not JSON, it's a normal response
            pass

        # Parse response
        parsed = self._parse_response(ai_response)

        if parsed.get('action') == 'execute':
            # AI wants to execute a command
            command = parsed.get('command', '')
            explanation = parsed.get('explanation', '')

            # Show what we're about to do
            response = f"🔧 {explanation}\n\nExecuting: `{command}`\n\n"

            # Execute the command
            result = self._execute_command(command)

            # Show if we retried with sudo
            if result.get('retried_with_sudo'):
                response += f"⚠️ Permission denied, retried as: `{result['command']}`\n\n"

            if result['success']:
                response += "✅ Command succeeded:\n"
                if result['stdout']:
                    response += f"```\n{result['stdout']}\n```"
                else:
                    response += "(no output)"
            else:
                response += f"❌ Command failed (exit code {result['exit_code']}):\n"
                if result['stderr']:
                    response += f"```\n{result['stderr']}\n```"
                elif result['stdout']:
                    response += f"```\n{result['stdout']}\n```"

            # Add command execution to history
            self.conversation_history.append({
                'role': 'macha',
                'message': response,
                'timestamp': datetime.now().isoformat(),
                'command_result': result
            })

            # Now ask AI to respond to the command output
            followup_prompt = f"""The command completed. Here's what happened:

Command: {command}
Success: {result['success']}
Output: {result['stdout'][:500] if result['stdout'] else '(none)'}
Error: {result['stderr'][:500] if result['stderr'] else '(none)'}

Please provide a brief analysis or next steps."""

            followup_response = self.agent._query_ollama(followup_prompt)

            if followup_response:
                response += f"\n\n{followup_response}"

            return response

        else:
            # Normal conversation response
            message = parsed.get('message', ai_response)

            self.conversation_history.append({
                'role': 'macha',
                'message': message,
                'timestamp': datetime.now().isoformat()
            })

            return message

    def run(self):
        """Run the interactive chat session"""
        print("=" * 70)
        print("🌐 MACHA INTERACTIVE CHAT")
        print("=" * 70)
        print("Type your message and press Enter. Commands:")
        print("  /exit or /quit - End the chat session")
        print("  /clear - Clear conversation history")
        print("  /history - Show conversation history")
        print("  /debug - Show Ollama connection status")
        print("=" * 70)
        print()

        while True:
            try:
                # Get user input
                user_input = input("\n💬 YOU: ").strip()

                if not user_input:
                    continue

                # Handle special commands
                if user_input.lower() in ['/exit', '/quit']:
                    print("\n👋 Ending chat session. Goodbye!")
                    break

                elif user_input.lower() == '/clear':
                    self.conversation_history.clear()
                    print("🧹 Conversation history cleared.")
                    continue

                elif user_input.lower() == '/history':
                    print("\n" + "=" * 70)
                    print("CONVERSATION HISTORY")
                    print("=" * 70)
                    for entry in self.conversation_history:
                        role = entry['role'].upper()
                        msg = entry['message'][:100] + "..." if len(entry['message']) > 100 else entry['message']
                        print(f"{role}: {msg}")
                    print("=" * 70)
                    continue

                elif user_input.lower() == '/debug':
                    print("\n" + "=" * 70)
                    print("MACHA ARCHITECTURE & STATUS")
                    print("=" * 70)

                    print("\n🏗️ SYSTEM ARCHITECTURE:")
                    print("  Hostname: macha.coven.systems")
                    print("  Service: macha-autonomous.service (systemd)")
                    print("  Working Directory: /var/lib/macha")

                    print("\n👤 EXECUTION CONTEXT:")
                    current_user = os.getenv('USER') or os.getenv('USERNAME') or 'unknown'
                    print(f"  Current User: {current_user}")
                    print(f"  UID: {os.getuid()}")

                    # Check if user has sudo access
                    try:
                        result = subprocess.run(['sudo', '-n', 'true'],
                                                capture_output=True, timeout=1)
                        if result.returncode == 0:
                            print("  Sudo Access: ✓ Yes (passwordless)")
                        else:
                            print("  Sudo Access: ⚠ Requires password")
                    except Exception:
                        print("  Sudo Access: ❌ No")

                    print("  Note: Chat runs as invoking user (you), not as macha-autonomous")

                    print("\n🧠 INFERENCE ENGINE:")
                    print("  Backend: Ollama")
                    print(f"  Host: {self.agent.ollama_host}")
                    print(f"  Model: {self.agent.model}")
                    print("  Service: ollama.service (systemd)")

                    print("\n💾 DATABASE:")
                    print("  Backend: ChromaDB")
                    print("  Host: http://localhost:8000")
                    print("  Data: /var/lib/chromadb")
                    print("  Service: chromadb.service (systemd)")

                    print("\n🔍 OLLAMA STATUS:")
                    # Try to query Ollama status
                    try:
                        import requests
                        # Check if Ollama is running
                        response = requests.get(f"{self.agent.ollama_host}/api/tags", timeout=5)
                        if response.status_code == 200:
                            models = response.json().get('models', [])
                            print("  Status: ✓ Running")
                            print(f"  Loaded models: {len(models)}")
                            for model in models:
                                name = model.get('name', 'unknown')
                                size = model.get('size', 0) / (1024**3)  # GB
                                is_current = "← ACTIVE" if name == self.agent.model else ""
                                print(f"    • {name} ({size:.1f} GB) {is_current}")
                        else:
                            print(f"  Status: ❌ Error (HTTP {response.status_code})")
                    except Exception as e:
                        print(f"  Status: ❌ Cannot connect: {e}")
                        print("  Hint: Check 'systemctl status ollama.service'")

                    print("\n💡 CONVERSATION:")
                    print(f"  History: {len(self.conversation_history)} messages")
                    print(f"  Session started: {self.session_start}")

                    print("=" * 70)
                    continue

                # Process the message
                print("\n🤖 MACHA: ", end='', flush=True)
                response = self.process_message(user_input)
                print(response)

            except KeyboardInterrupt:
                print("\n\n👋 Chat interrupted. Use /exit to quit properly.")
                continue
            except EOFError:
                print("\n\n👋 Ending chat session. Goodbye!")
                break
            except Exception as e:
                print(f"\n❌ Error: {e}")
                continue


def main():
    """Main entry point"""
    session = MachaChatSession()
    session.run()


if __name__ == "__main__":
    main()
245
config_parser.py
Normal file
@@ -0,0 +1,245 @@
#!/usr/bin/env python3
"""
Config Parser - Extract imports and content from NixOS configuration files
"""

import re
import subprocess
from pathlib import Path
from typing import Any, Dict, List, Optional
from datetime import datetime


class ConfigParser:
    """Parse NixOS flake and configuration files"""

    def __init__(self, repo_url: str, local_path: Path = Path("/var/lib/macha/config-repo")):
        """
        Initialize config parser

        Args:
            repo_url: Git repository URL (e.g., git+https://...)
            local_path: Where to clone/update the repository
        """
        # Strip git+ prefix if present for git commands
        self.repo_url = repo_url.replace("git+", "")
        self.local_path = local_path
        self.local_path.mkdir(parents=True, exist_ok=True)

    def ensure_repo(self) -> bool:
        """Clone or update the repository"""
        try:
            if (self.local_path / ".git").exists():
                # Update existing repo
                result = subprocess.run(
                    ["git", "-C", str(self.local_path), "pull"],
                    capture_output=True,
                    text=True,
                    timeout=30
                )
                return result.returncode == 0
            else:
                # Clone new repo
                result = subprocess.run(
                    ["git", "clone", self.repo_url, str(self.local_path)],
                    capture_output=True,
                    text=True,
                    timeout=60
                )
                return result.returncode == 0
        except Exception as e:
            print(f"Error updating repository: {e}")
            return False

    def get_systems_from_flake(self) -> List[str]:
        """Extract system names from flake.nix"""
        flake_path = self.local_path / "flake.nix"
        if not flake_path.exists():
            return []

        systems = []
        try:
            content = flake_path.read_text()
            # Match patterns like: "macha" = nixpkgs.lib.nixosSystem
            matches = re.findall(r'"([^"]+)"\s*=\s*nixpkgs\.lib\.nixosSystem', content)
            systems = matches
        except Exception as e:
            print(f"Error parsing flake.nix: {e}")

        return systems

    def extract_imports(self, nix_file: Path) -> List[str]:
        """Extract imports from a .nix file"""
        if not nix_file.exists():
            return []

        imports = []
        try:
            content = nix_file.read_text()

            # Find the imports = [ ... ]; block
            imports_match = re.search(
                r'imports\s*=\s*\[(.*?)\];',
                content,
                re.DOTALL
            )

            if imports_match:
                imports_block = imports_match.group(1)
                # Extract all paths (relative paths starting with ./ or ../)
                paths = re.findall(r'[./]+[^\s\]]+\.nix', imports_block)
                imports = paths

        except Exception as e:
            print(f"Error parsing {nix_file}: {e}")

        return imports

    def resolve_import_path(self, base_file: Path, import_path: str) -> Optional[Path]:
        """Resolve a relative import path to absolute path within repo"""
        try:
            # Get directory of the base file
            base_dir = base_file.parent
            # Resolve the relative path
            resolved = (base_dir / import_path).resolve()
            # Make sure it's within the repo
            if self.local_path in resolved.parents or resolved == self.local_path:
                return resolved
        except Exception as e:
            print(f"Error resolving import {import_path} from {base_file}: {e}")
        return None

    def get_system_config(self, system_name: str) -> Dict[str, Any]:
        """
        Get configuration for a specific system

        Returns:
            Dict with:
            - main_file: Path to systems/<name>.nix
            - imports: List of imported file paths (relative to repo root)
            - all_files: Sorted list of all .nix files used (including recursive imports)
        """
        main_file = self.local_path / "systems" / f"{system_name}.nix"

        if not main_file.exists():
            return {
                "main_file": None,
                "imports": [],
                "all_files": []
            }

        # Track all files (avoid infinite loops)
        all_files = set()
        files_to_process = [main_file]
        processed = set()

        while files_to_process:
            current_file = files_to_process.pop(0)

            if current_file in processed:
                continue
            processed.add(current_file)

            # Get relative path from repo root
            try:
                rel_path = current_file.relative_to(self.local_path)
                all_files.add(str(rel_path))
            except ValueError:
                continue

            # Extract imports from this file
            imports = self.extract_imports(current_file)

            # Resolve and queue imported files
            for imp in imports:
                resolved = self.resolve_import_path(current_file, imp)
                if resolved and resolved not in processed:
                    files_to_process.append(resolved)

        return {
            "main_file": str(main_file.relative_to(self.local_path)),
            "imports": self.extract_imports(main_file),
            "all_files": sorted(all_files)
        }

    def read_file_content(self, relative_path: str) -> Optional[str]:
        """Read content of a file by its path relative to repo root"""
        try:
            file_path = self.local_path / relative_path
            if file_path.exists():
                return file_path.read_text()
        except Exception as e:
            print(f"Error reading {relative_path}: {e}")
        return None

    def get_all_config_files(self) -> List[Dict[str, str]]:
        """
        Get all .nix files in the repository with their content

        Returns:
            List of dicts with:
            - path: relative path from repo root
            - content: file contents
            - category: apps/systems/osconfigs/users based on path
        """
        files = []

        # Categories to scan
        categories = {
            "apps": self.local_path / "apps",
            "systems": self.local_path / "systems",
            "osconfigs": self.local_path / "osconfigs",
            "users": self.local_path / "users"
        }

        for category, path in categories.items():
            if not path.exists():
                continue

            for nix_file in path.rglob("*.nix"):
                try:
                    rel_path = nix_file.relative_to(self.local_path)
                    content = nix_file.read_text()

                    files.append({
                        "path": str(rel_path),
                        "content": content,
                        "category": category
                    })
                except Exception as e:
                    print(f"Error reading {nix_file}: {e}")

        return files


if __name__ == "__main__":
    # Test the parser
    import sys

    repo_url = "git+https://git.coven.systems/lily/nixos-servers"
    parser = ConfigParser(repo_url)

    print("Ensuring repository is up to date...")
    if parser.ensure_repo():
        print("✓ Repository ready")
    else:
        print("✗ Failed to update repository")
        sys.exit(1)

    print("\nSystems defined in flake:")
    systems = parser.get_systems_from_flake()
    for system in systems:
        print(f"  - {system}")

    if len(sys.argv) > 1:
        system_name = sys.argv[1]
        print(f"\nConfiguration for {system_name}:")
        config = parser.get_system_config(system_name)

        print(f"  Main file: {config['main_file']}")
        print(f"  Direct imports: {len(config['imports'])}")
        print(f"  All files used: {len(config['all_files'])}")

        for f in config['all_files']:
            print(f"    - {f}")
947
context_db.py
Normal file
@@ -0,0 +1,947 @@
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
Context Database - Store and retrieve system context using ChromaDB for RAG
|
||||||
|
"""
|
||||||
|
|
||||||
|
import json
|
||||||
|
import os
|
||||||
|
from typing import Dict, List, Any, Optional, Set
|
||||||
|
from datetime import datetime
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
# Set environment variable BEFORE importing chromadb to prevent .env file reading
|
||||||
|
os.environ.setdefault("CHROMA_ENV_FILE", "")
|
||||||
|
|
||||||
|
import chromadb
|
||||||
|
from chromadb.config import Settings
|
||||||
|
|
||||||
|
|
||||||
|
class ContextDatabase:
|
||||||
|
"""Manage system context and relationships in ChromaDB"""
|
||||||
|
|
||||||
|
def __init__(
|
||||||
|
self,
|
||||||
|
host: str = "localhost",
|
||||||
|
port: int = 8000,
|
||||||
|
persist_directory: str = "/var/lib/chromadb"
|
||||||
|
):
|
||||||
|
"""Initialize ChromaDB client"""
|
||||||
|
|
||||||
|
self.client = chromadb.HttpClient(
|
||||||
|
host=host,
|
||||||
|
port=port,
|
||||||
|
settings=Settings(
|
||||||
|
anonymized_telemetry=False,
|
||||||
|
allow_reset=False,
|
||||||
|
chroma_api_impl="chromadb.api.fastapi.FastAPI"
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
|
# Create or get collections
|
||||||
|
self.systems_collection = self.client.get_or_create_collection(
|
||||||
|
name="systems",
|
||||||
|
metadata={"description": "System definitions and metadata"}
|
||||||
|
)
|
||||||
|
|
||||||
|
self.relationships_collection = self.client.get_or_create_collection(
|
||||||
|
name="relationships",
|
||||||
|
metadata={"description": "System relationships and dependencies"}
|
||||||
|
)
|
||||||
|
|
||||||
|
self.issues_collection = self.client.get_or_create_collection(
|
||||||
|
name="issues",
|
||||||
|
metadata={"description": "Issue tracking and resolution history"}
|
||||||
|
)
|
||||||
|
|
||||||
|
self.decisions_collection = self.client.get_or_create_collection(
|
||||||
|
name="decisions",
|
||||||
|
metadata={"description": "AI decisions and outcomes"}
|
||||||
|
)
|
||||||
|
|
||||||
|
self.config_files_collection = self.client.get_or_create_collection(
|
||||||
|
name="config_files",
|
||||||
|
metadata={"description": "NixOS configuration files for RAG"}
|
||||||
|
)
|
||||||
|
|
||||||
|
self.knowledge_collection = self.client.get_or_create_collection(
|
||||||
|
name="knowledge",
|
||||||
|
metadata={"description": "Operational knowledge: commands, patterns, best practices"}
|
||||||
|
)
|
||||||
|
|
||||||
|
# ============ System Registry ============
|
||||||
|
|
||||||
|
def register_system(
|
||||||
|
self,
|
||||||
|
hostname: str,
|
||||||
|
system_type: str,
|
||||||
|
services: List[str],
|
||||||
|
capabilities: List[str] = None,
|
||||||
|
metadata: Dict[str, Any] = None,
|
||||||
|
config_repo: str = None,
|
||||||
|
config_branch: str = None,
|
||||||
|
os_type: str = "nixos"
|
||||||
|
):
|
||||||
|
"""Register a system in the database
|
||||||
|
|
||||||
|
Args:
|
||||||
|
hostname: FQDN of the system
|
||||||
|
system_type: Role (e.g., 'workstation', 'server')
|
||||||
|
services: List of running services
|
||||||
|
capabilities: System capabilities
|
||||||
|
metadata: Additional metadata
|
||||||
|
config_repo: Git repository URL
|
||||||
|
config_branch: Git branch name
|
||||||
|
os_type: Operating system (e.g., 'nixos', 'ubuntu', 'debian', 'arch', 'windows', 'macos')
|
||||||
|
"""
|
||||||
|
doc_parts = [
|
||||||
|
f"System: {hostname}",
|
||||||
|
f"Type: {system_type}",
|
||||||
|
f"OS: {os_type}",
|
||||||
|
f"Services: {', '.join(services)}",
|
||||||
|
f"Capabilities: {', '.join(capabilities or [])}"
|
||||||
|
]
|
||||||
|
|
||||||
|
if config_repo:
|
||||||
|
doc_parts.append(f"Configuration Repository: {config_repo}")
|
||||||
|
if config_branch:
|
||||||
|
doc_parts.append(f"Configuration Branch: {config_branch}")
|
||||||
|
|
||||||
|
doc = "\n".join(doc_parts)
|
||||||
|
|
||||||
|
metadata_dict = {
|
||||||
|
"hostname": hostname,
|
||||||
|
"type": system_type,
|
||||||
|
"os_type": os_type,
|
||||||
|
"services": json.dumps(services),
|
||||||
|
"capabilities": json.dumps(capabilities or []),
|
||||||
|
"metadata": json.dumps(metadata or {}),
|
||||||
|
"config_repo": config_repo or "",
|
||||||
|
"config_branch": config_branch or "",
|
||||||
|
"updated_at": datetime.now().isoformat()
|
||||||
|
}
|
||||||
|
|
||||||
|
self.systems_collection.upsert(
|
||||||
|
ids=[hostname],
|
||||||
|
documents=[doc],
|
||||||
|
metadatas=[metadata_dict]
|
||||||
|
)
|
||||||
|
|
||||||
|
def get_system(self, hostname: str) -> Optional[Dict[str, Any]]:
|
||||||
|
"""Get system information"""
|
||||||
|
try:
|
||||||
|
result = self.systems_collection.get(
|
||||||
|
ids=[hostname],
|
||||||
|
include=["metadatas", "documents"]
|
||||||
|
)
|
||||||
|
|
||||||
|
if result['ids']:
|
||||||
|
metadata = result['metadatas'][0]
|
||||||
|
return {
|
||||||
|
"hostname": metadata["hostname"],
|
||||||
|
"type": metadata["type"],
|
||||||
|
"services": json.loads(metadata["services"]),
|
||||||
|
"capabilities": json.loads(metadata["capabilities"]),
|
||||||
|
"metadata": json.loads(metadata["metadata"]),
|
||||||
|
"document": result['documents'][0]
|
||||||
|
}
|
||||||
|
except:
|
||||||
|
pass
|
||||||
|
|
||||||
|
return None
|
||||||
|
|
||||||
|
def get_all_systems(self) -> List[Dict[str, Any]]:
|
||||||
|
"""Get all registered systems"""
|
||||||
|
result = self.systems_collection.get(include=["metadatas"])
|
||||||
|
|
||||||
|
systems = []
|
||||||
|
for metadata in result['metadatas']:
|
||||||
|
systems.append({
|
||||||
|
"hostname": metadata["hostname"],
|
||||||
|
"type": metadata["type"],
|
||||||
|
"os_type": metadata.get("os_type", "unknown"),
|
||||||
|
"services": json.loads(metadata["services"]),
|
||||||
|
"capabilities": json.loads(metadata["capabilities"]),
|
||||||
|
"config_repo": metadata.get("config_repo", ""),
|
||||||
|
"config_branch": metadata.get("config_branch", "")
|
||||||
|
})
|
||||||
|
|
||||||
|
return systems
|
||||||
|
|
||||||
|
def is_system_known(self, hostname: str) -> bool:
|
||||||
|
"""Check if a system is already registered"""
|
||||||
|
try:
|
||||||
|
result = self.systems_collection.get(ids=[hostname])
|
||||||
|
return len(result['ids']) > 0
|
||||||
|
except:
|
||||||
|
return False
|
||||||
|
|
||||||
|
def get_known_hostnames(self) -> Set[str]:
|
||||||
|
"""Get set of all known system hostnames"""
|
||||||
|
result = self.systems_collection.get(include=["metadatas"])
|
||||||
|
return set(metadata["hostname"] for metadata in result['metadatas'])
|
||||||
|
|
||||||
|
# ============ Relationships ============
|
||||||
|
|
||||||
|
def add_relationship(
|
||||||
|
self,
|
||||||
|
source: str,
|
||||||
|
target: str,
|
||||||
|
relationship_type: str,
|
||||||
|
description: str = ""
|
||||||
|
):
|
||||||
|
"""Add a relationship between systems"""
|
||||||
|
rel_id = f"{source}→{target}:{relationship_type}"
|
||||||
|
doc = f"{source} {relationship_type} {target}. {description}"
|
||||||
|
|
||||||
|
self.relationships_collection.upsert(
|
||||||
|
ids=[rel_id],
|
||||||
|
documents=[doc],
|
||||||
|
metadatas=[{
|
||||||
|
"source": source,
|
||||||
|
"target": target,
|
||||||
|
"type": relationship_type,
|
||||||
|
"description": description,
|
||||||
|
"created_at": datetime.now().isoformat()
|
||||||
|
}]
|
||||||
|
)
|
||||||
|
|
||||||
|
def get_dependencies(self, hostname: str) -> List[Dict[str, Any]]:
|
||||||
|
"""Get what a system depends on"""
|
||||||
|
result = self.relationships_collection.get(
|
||||||
|
where={"source": hostname},
|
||||||
|
include=["metadatas"]
|
||||||
|
)
|
||||||
|
|
||||||
|
return [
|
||||||
|
{
|
||||||
|
"target": m["target"],
|
||||||
|
"type": m["type"],
|
||||||
|
"description": m.get("description", "")
|
||||||
|
}
|
||||||
|
for m in result['metadatas']
|
||||||
|
]
|
||||||
|
|
||||||
|
def get_dependents(self, hostname: str) -> List[Dict[str, Any]]:
|
||||||
|
"""Get what depends on a system"""
|
||||||
|
result = self.relationships_collection.get(
|
||||||
|
where={"target": hostname},
|
||||||
|
include=["metadatas"]
|
||||||
|
)
|
||||||
|
|
||||||
|
return [
|
||||||
|
{
|
||||||
|
"source": m["source"],
|
||||||
|
"type": m["type"],
|
||||||
|
"description": m.get("description", "")
|
||||||
|
}
|
||||||
|
for m in result['metadatas']
|
||||||
|
]
|
||||||
|
|
||||||
|
# ============ Issue History ============
|
||||||
|
|
||||||
|
def store_issue(
|
||||||
|
self,
|
||||||
|
system: str,
|
||||||
|
issue_description: str,
|
||||||
|
resolution: str = "",
|
||||||
|
severity: str = "unknown",
|
||||||
|
metadata: Dict[str, Any] = None
|
||||||
|
) -> str:
|
||||||
|
"""Store an issue and its resolution"""
|
||||||
|
issue_id = f"{system}_{datetime.now().timestamp()}"
|
||||||
|
|
||||||
|
doc = f"""
|
||||||
|
System: {system}
|
||||||
|
Issue: {issue_description}
|
||||||
|
Resolution: {resolution}
|
||||||
|
Severity: {severity}
|
||||||
|
"""
|
||||||
|
|
||||||
|
self.issues_collection.add(
|
||||||
|
ids=[issue_id],
|
||||||
|
documents=[doc],
|
||||||
|
metadatas=[{
|
||||||
|
"system": system,
|
||||||
|
"severity": severity,
|
||||||
|
"resolved": bool(resolution),
|
||||||
|
"timestamp": datetime.now().isoformat(),
|
||||||
|
"metadata": json.dumps(metadata or {})
|
||||||
|
}]
|
||||||
|
)
|
||||||
|
|
||||||
|
return issue_id
|
||||||
|
|
||||||
|
def store_investigation(
|
||||||
|
self,
|
||||||
|
system: str,
|
||||||
|
issue_description: str,
|
||||||
|
commands: List[str],
|
||||||
|
output: str,
|
||||||
|
timestamp: str = None
|
||||||
|
) -> str:
|
||||||
|
"""Store investigation results for an issue"""
|
||||||
|
if timestamp is None:
|
||||||
|
timestamp = datetime.now().isoformat()
|
||||||
|
|
||||||
|
investigation_id = f"investigation_{system}_{datetime.now().timestamp()}"
|
||||||
|
|
||||||
|
doc = f"""
|
||||||
|
System: {system}
|
||||||
|
Issue: {issue_description}
|
||||||
|
Commands executed: {', '.join(commands)}
|
||||||
|
Output:
|
||||||
|
{output[:2000]} # Limit output to prevent token overflow
|
||||||
|
"""
|
||||||
|
|
||||||
|
self.issues_collection.add(
|
||||||
|
ids=[investigation_id],
|
||||||
|
documents=[doc],
|
||||||
|
metadatas=[{
|
||||||
|
"system": system,
|
||||||
|
"issue": issue_description,
|
||||||
|
"type": "investigation",
|
||||||
|
"commands": json.dumps(commands),
|
||||||
|
"timestamp": timestamp,
|
||||||
|
"metadata": json.dumps({"output_length": len(output)})
|
||||||
|
}]
|
||||||
|
)
|
||||||
|
|
||||||
|
return investigation_id
|
||||||
|
|
||||||
|
def get_recent_investigations(
|
||||||
|
self,
|
||||||
|
issue_description: str,
|
||||||
|
system: str,
|
||||||
|
hours: int = 24
|
||||||
|
) -> List[Dict[str, Any]]:
|
||||||
|
"""Get recent investigations for a similar issue"""
|
||||||
|
# Query for similar issues
|
||||||
|
try:
|
||||||
|
result = self.issues_collection.query(
|
||||||
|
query_texts=[f"System: {system}\nIssue: {issue_description}"],
|
||||||
|
n_results=10,
|
||||||
|
where={"type": "investigation"},
|
||||||
|
include=["documents", "metadatas", "distances"]
|
||||||
|
)
|
||||||
|
|
||||||
|
investigations = []
|
||||||
|
if result['ids'] and result['ids'][0]:
|
||||||
|
cutoff_time = datetime.now().timestamp() - (hours * 3600)
|
||||||
|
|
||||||
|
for i, doc_id in enumerate(result['ids'][0]):
|
||||||
|
meta = result['metadatas'][0][i]
|
||||||
|
timestamp = datetime.fromisoformat(meta['timestamp'])
|
||||||
|
|
||||||
|
# Only include recent investigations
|
||||||
|
if timestamp.timestamp() > cutoff_time:
|
||||||
|
investigations.append({
|
||||||
|
"id": doc_id,
|
||||||
|
"system": meta['system'],
|
||||||
|
"issue": meta['issue'],
|
||||||
|
"commands": json.loads(meta['commands']),
|
||||||
|
"output": result['documents'][0][i],
|
||||||
|
"timestamp": meta['timestamp'],
|
||||||
|
"relevance": 1 - result['distances'][0][i]
|
||||||
|
})
|
||||||
|
|
||||||
|
return investigations
|
||||||
|
except Exception as e:
|
||||||
|
print(f"Error querying investigations: {e}")
|
||||||
|
return []
|
||||||
|
|
||||||
|
def find_similar_issues(
|
||||||
|
self,
|
||||||
|
issue_description: str,
|
||||||
|
system: Optional[str] = None,
|
||||||
|
n_results: int = 5
|
||||||
|
) -> List[Dict[str, Any]]:
|
||||||
|
"""Find similar past issues using semantic search"""
|
||||||
|
where = {"system": system} if system else None
|
||||||
|
|
||||||
|
results = self.issues_collection.query(
|
||||||
|
query_texts=[issue_description],
|
||||||
|
n_results=n_results,
|
||||||
|
where=where,
|
||||||
|
include=["documents", "metadatas", "distances"]
|
||||||
|
)
|
||||||
|
|
||||||
|
similar = []
|
||||||
|
for i, doc in enumerate(results['documents'][0]):
|
||||||
|
similar.append({
|
||||||
|
"issue": doc,
|
||||||
|
"metadata": results['metadatas'][0][i],
|
||||||
|
"similarity": 1 - results['distances'][0][i] # Convert distance to similarity
|
||||||
|
})
|
||||||
|
|
||||||
|
return similar
|
||||||
|
|
||||||
|
# ============ AI Decisions ============
|
||||||
|
|
||||||
|
def store_decision(
|
||||||
|
self,
|
||||||
|
system: str,
|
||||||
|
analysis: Dict[str, Any],
|
||||||
|
action: Dict[str, Any],
|
||||||
|
outcome: Dict[str, Any] = None
|
||||||
|
):
|
||||||
|
"""Store an AI decision for learning"""
|
||||||
|
decision_id = f"decision_{datetime.now().timestamp()}"
|
||||||
|
|
||||||
|
doc = f"""
|
||||||
|
System: {system}
|
||||||
|
Status: {analysis.get('status', 'unknown')}
|
||||||
|
Assessment: {analysis.get('overall_assessment', '')}
|
||||||
|
Action: {action.get('proposed_action', '')}
|
||||||
|
Risk: {action.get('risk_level', 'unknown')}
|
||||||
|
Outcome: {outcome.get('status', 'pending') if outcome else 'pending'}
|
||||||
|
"""
|
||||||
|
|
||||||
|
self.decisions_collection.add(
|
||||||
|
ids=[decision_id],
|
||||||
|
documents=[doc],
|
||||||
|
metadatas=[{
|
||||||
|
"system": system,
|
||||||
|
"timestamp": datetime.now().isoformat(),
|
||||||
|
"analysis": json.dumps(analysis),
|
||||||
|
"action": json.dumps(action),
|
||||||
|
"outcome": json.dumps(outcome or {})
|
||||||
|
}]
|
||||||
|
)
|
||||||
|
|
||||||
|
def get_recent_decisions(
|
||||||
|
self,
|
||||||
|
system: Optional[str] = None,
|
||||||
|
n_results: int = 10
|
||||||
|
) -> List[Dict[str, Any]]:
|
||||||
|
"""Get recent decisions, optionally filtered by system"""
|
||||||
|
where = {"system": system} if system else None
|
||||||
|
|
||||||
|
results = self.decisions_collection.query(
|
||||||
|
query_texts=["recent decisions"],
|
||||||
|
n_results=n_results,
|
||||||
|
where=where,
|
||||||
|
include=["documents", "metadatas"]
|
||||||
|
)
|
||||||
|
|
||||||
|
decisions = []
|
||||||
|
for i, doc in enumerate(results['documents'][0]):
|
||||||
|
meta = results['metadatas'][0][i]
|
||||||
|
decisions.append({
|
||||||
|
"system": meta["system"],
|
||||||
|
"timestamp": meta["timestamp"],
|
||||||
|
"analysis": json.loads(meta["analysis"]),
|
||||||
|
"action": json.loads(meta["action"]),
|
||||||
|
"outcome": json.loads(meta["outcome"])
|
||||||
|
})
|
||||||
|
|
||||||
|
return decisions
|
||||||
|
|
||||||
|
# ============ Context Generation for AI ============
|
||||||
|
|
||||||
|
def get_system_context(self, hostname: str, git_context=None) -> str:
|
||||||
|
"""Generate rich context about a system for AI prompts"""
|
||||||
|
context_parts = []
|
||||||
|
|
||||||
|
# System info
|
||||||
|
system = self.get_system(hostname)
|
||||||
|
if system:
|
||||||
|
context_parts.append(f"System: {hostname} ({system['type']})")
|
||||||
|
context_parts.append(f"Services: {', '.join(system['services'])}")
|
||||||
|
if system['capabilities']:
|
||||||
|
context_parts.append(f"Capabilities: {', '.join(system['capabilities'])}")
|
||||||
|
|
||||||
|
# Git repository info
|
||||||
|
if system and system.get('metadata'):
|
||||||
|
metadata = json.loads(system['metadata']) if isinstance(system['metadata'], str) else system['metadata']
|
||||||
|
config_repo = metadata.get('config_repo', '')
|
||||||
|
if config_repo:
|
||||||
|
context_parts.append(f"\nConfiguration Repository: {config_repo}")
|
||||||
|
|
||||||
|
# Recent git changes for this system
|
||||||
|
if git_context:
|
||||||
|
try:
|
||||||
|
# Extract system name from FQDN
|
||||||
|
system_name = hostname.split('.')[0]
|
||||||
|
git_summary = git_context.get_system_context_summary(system_name)
|
||||||
|
if git_summary:
|
||||||
|
context_parts.append(f"\n{git_summary}")
|
||||||
|
except:
|
||||||
|
pass
|
||||||
|
|
||||||
|
# Dependencies
|
||||||
|
deps = self.get_dependencies(hostname)
|
||||||
|
if deps:
|
||||||
|
context_parts.append("\nDependencies:")
|
||||||
|
for dep in deps:
|
||||||
|
context_parts.append(f" - Depends on {dep['target']} for {dep['type']}")
|
||||||
|
|
||||||
|
# Dependents
|
||||||
|
dependents = self.get_dependents(hostname)
|
||||||
|
if dependents:
|
||||||
|
context_parts.append("\nUsed by:")
|
||||||
|
for dependent in dependents:
|
||||||
|
context_parts.append(f" - {dependent['source']} uses this for {dependent['type']}")
|
||||||
|
|
||||||
|
return "\n".join(context_parts)
|
||||||
|
|
||||||
|
def get_issue_context(self, issue_description: str, system: str) -> str:
|
||||||
|
"""Get context about similar past issues"""
|
||||||
|
similar = self.find_similar_issues(issue_description, system, n_results=3)
|
||||||
|
|
||||||
|
if not similar:
|
||||||
|
return ""
|
||||||
|
|
||||||
|
context_parts = ["Similar past issues:"]
|
||||||
|
for i, issue in enumerate(similar, 1):
|
||||||
|
if issue['similarity'] > 0.7: # Only include if fairly similar
|
||||||
|
context_parts.append(f"\n{i}. {issue['issue']}")
|
||||||
|
context_parts.append(f" Similarity: {issue['similarity']:.2%}")
|
||||||
|
|
||||||
|
return "\n".join(context_parts) if len(context_parts) > 1 else ""
|
||||||
|
|
||||||
|
# ============ Config Files (for RAG) ============
|
||||||
|
|
||||||
|
def store_config_file(
|
||||||
|
self,
|
||||||
|
file_path: str,
|
||||||
|
content: str,
|
||||||
|
category: str = "unknown",
|
||||||
|
systems_using: List[str] = None
|
||||||
|
):
|
||||||
|
"""
|
||||||
|
Store a configuration file for RAG retrieval
|
||||||
|
|
||||||
|
Args:
|
||||||
|
file_path: Path relative to repo root (e.g., "apps/gotify.nix")
|
||||||
|
content: Full file contents
|
||||||
|
category: apps/systems/osconfigs/users
|
||||||
|
systems_using: List of system hostnames that import this file
|
||||||
|
"""
|
||||||
|
self.config_files_collection.upsert(
|
||||||
|
ids=[file_path],
|
||||||
|
documents=[content],
|
||||||
|
metadatas=[{
|
||||||
|
"path": file_path,
|
||||||
|
"category": category,
|
||||||
|
"systems": json.dumps(systems_using or []),
|
||||||
|
"updated_at": datetime.now().isoformat()
|
||||||
|
}]
|
||||||
|
)
|
||||||
|
|
||||||
|
    def get_config_file(self, file_path: str) -> Optional[Dict[str, Any]]:
        """Get a specific config file by path"""
        try:
            result = self.config_files_collection.get(
                ids=[file_path],
                include=["documents", "metadatas"]
            )

            if result['ids']:
                return {
                    "path": file_path,
                    "content": result['documents'][0],
                    "metadata": result['metadatas'][0]
                }
        except Exception:
            pass
        return None

    def query_config_files(
        self,
        query: str,
        system: str = None,
        category: str = None,
        n_results: int = 5
    ) -> List[Dict[str, Any]]:
        """
        Query config files using semantic search

        Args:
            query: Natural language query (e.g., "gotify configuration")
            system: Optional filter by system hostname
            category: Optional filter by category (apps/systems/etc)
            n_results: Number of results to return

        Returns:
            List of dicts with path, content, and metadata
        """
        where = {}
        if category:
            where["category"] = category

        try:
            result = self.config_files_collection.query(
                query_texts=[query],
                n_results=n_results,
                where=where if where else None,
                include=["documents", "metadatas", "distances"]
            )

            configs = []
            if result['ids'] and result['ids'][0]:
                for i, doc_id in enumerate(result['ids'][0]):
                    config = {
                        "path": doc_id,
                        "content": result['documents'][0][i],
                        "metadata": result['metadatas'][0][i],
                        "relevance": 1 - result['distances'][0][i]  # Convert distance to relevance
                    }

                    # Filter by system if specified
                    if system:
                        systems = json.loads(config['metadata'].get('systems', '[]'))
                        if system not in systems:
                            continue

                    configs.append(config)

            return configs
        except Exception as e:
            print(f"Error querying config files: {e}")
            return []

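    # Usage sketch (hypothetical data): a call such as
    #   db.query_config_files("gotify notification setup", system="rhiannon")
    # would return entries like {"path": "apps/gotify.nix", "relevance": 0.92, ...},
    # already filtered to configs whose metadata lists "rhiannon".
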
    def get_system_config_files(self, system: str) -> List[str]:
        """Get all config file paths used by a system"""
        # This is stored in the system's metadata now
        system_info = self.get_system(system)
        if system_info and 'config_files' in system_info.get('metadata', {}):
            # metadata is already a dict, config_files is already a list
            return system_info['metadata']['config_files']
        return []

    def update_system_config_files(self, system: str, config_files: List[str]):
        """Update the list of config files used by a system"""
        system_info = self.get_system(system)
        if system_info:
            # metadata is already a dict from get_system(), no need to json.loads()
            metadata = system_info.get('metadata', {})
            metadata['config_files'] = config_files
            metadata['config_updated_at'] = datetime.now().isoformat()

            # Re-register with updated metadata
            self.register_system(
                hostname=system,
                system_type=system_info['type'],
                services=system_info['services'],
                capabilities=system_info.get('capabilities', []),
                metadata=metadata,
                config_repo=system_info.get('config_repo'),
                config_branch=system_info.get('config_branch')
            )

    # =========================================================================
    # ISSUE TRACKING
    # =========================================================================

    def store_issue(self, issue: Dict[str, Any]):
        """Store a new issue in the database"""
        issue_id = issue['issue_id']

        # Store in ChromaDB with the issue as document
        self.issues_collection.add(
            documents=[json.dumps(issue)],
            metadatas=[{
                'issue_id': issue_id,
                'hostname': issue['hostname'],
                'title': issue['title'],
                'status': issue['status'],
                'severity': issue['severity'],
                'created_at': issue['created_at'],
                'source': issue['source']
            }],
            ids=[issue_id]
        )

    def get_issue(self, issue_id: str) -> Optional[Dict[str, Any]]:
        """Retrieve an issue by ID"""
        try:
            results = self.issues_collection.get(ids=[issue_id])
            if results['documents']:
                return json.loads(results['documents'][0])
            return None
        except Exception as e:
            print(f"Error retrieving issue {issue_id}: {e}")
            return None

    def update_issue(self, issue: Dict[str, Any]):
        """Update an existing issue"""
        issue_id = issue['issue_id']

        # Delete old version
        try:
            self.issues_collection.delete(ids=[issue_id])
        except Exception:
            pass

        # Store updated version
        self.store_issue(issue)

    def delete_issue(self, issue_id: str):
        """Remove an issue from the database (used when archiving)"""
        try:
            self.issues_collection.delete(ids=[issue_id])
        except Exception as e:
            print(f"Error deleting issue {issue_id}: {e}")

    def list_issues(
        self,
        hostname: Optional[str] = None,
        status: Optional[str] = None,
        severity: Optional[str] = None
    ) -> List[Dict[str, Any]]:
        """List issues with optional filters"""
        try:
            # Build query filter
            where_filter = {}
            if hostname:
                where_filter['hostname'] = hostname
            if status:
                where_filter['status'] = status
            if severity:
                where_filter['severity'] = severity

            if where_filter:
                results = self.issues_collection.get(where=where_filter)
            else:
                results = self.issues_collection.get()

            issues = []
            for doc in results['documents']:
                issues.append(json.loads(doc))

            # Sort by created_at descending
            issues.sort(key=lambda x: x.get('created_at', ''), reverse=True)

            return issues
        except Exception as e:
            print(f"Error listing issues: {e}")
            return []

    # ============ Knowledge Base ============

    def store_knowledge(
        self,
        topic: str,
        knowledge: str,
        category: str = "general",
        source: str = "experience",
        confidence: str = "medium",
        tags: Optional[list] = None
    ) -> str:
        """
        Store a piece of operational knowledge

        Args:
            topic: Main subject (e.g., "nh os switch", "systemd-journal-remote")
            knowledge: The actual knowledge/insight/pattern
            category: Type of knowledge (command, pattern, troubleshooting, performance, etc.)
            source: Where this came from (experience, documentation, user-provided)
            confidence: How confident we are (low, medium, high)
            tags: Optional tags for categorization

        Returns:
            Knowledge ID
        """
        import uuid
        from datetime import datetime

        knowledge_id = str(uuid.uuid4())

        knowledge_doc = {
            "id": knowledge_id,
            "topic": topic,
            "knowledge": knowledge,
            "category": category,
            "source": source,
            "confidence": confidence,
            "tags": tags or [],
            "created_at": datetime.utcnow().isoformat(),
            "last_verified": datetime.utcnow().isoformat(),
            "times_referenced": 0
        }

        try:
            self.knowledge_collection.add(
                ids=[knowledge_id],
                documents=[knowledge],
                metadatas=[{
                    "topic": topic,
                    "category": category,
                    "source": source,
                    "confidence": confidence,
                    "tags": json.dumps(tags or []),
                    "created_at": knowledge_doc["created_at"],
                    "full_doc": json.dumps(knowledge_doc)
                }]
            )
            return knowledge_id
        except Exception as e:
            print(f"Error storing knowledge: {e}")
            return None

    def query_knowledge(
        self,
        query: str,
        category: str = None,
        limit: int = 5
    ) -> list:
        """
        Query the knowledge base for relevant information

        Args:
            query: What to search for
            category: Optional category filter
            limit: Maximum results to return

        Returns:
            List of relevant knowledge entries
        """
        try:
            where_filter = {}
            if category:
                where_filter["category"] = category

            results = self.knowledge_collection.query(
                query_texts=[query],
                n_results=limit,
                where=where_filter if where_filter else None
            )

            knowledge_items = []
            if results and results['documents']:
                for i, doc in enumerate(results['documents'][0]):
                    metadata = results['metadatas'][0][i]
                    full_doc = json.loads(metadata.get('full_doc', '{}'))

                    # Increment reference count
                    full_doc['times_referenced'] = full_doc.get('times_referenced', 0) + 1

                    knowledge_items.append(full_doc)

            return knowledge_items
        except Exception as e:
            print(f"Error querying knowledge: {e}")
            return []

    def get_knowledge_by_topic(self, topic: str) -> list:
        """Get all knowledge entries for a specific topic"""
        try:
            results = self.knowledge_collection.get(
                where={"topic": topic}
            )

            knowledge_items = []
            for metadata in results['metadatas']:
                full_doc = json.loads(metadata.get('full_doc', '{}'))
                knowledge_items.append(full_doc)

            return knowledge_items
        except Exception as e:
            print(f"Error getting knowledge by topic: {e}")
            return []

    def update_knowledge(
        self,
        knowledge_id: str,
        knowledge: str = None,
        confidence: str = None,
        verify: bool = False
    ):
        """
        Update an existing knowledge entry

        Args:
            knowledge_id: ID of knowledge to update
            knowledge: New knowledge text (optional)
            confidence: New confidence level (optional)
            verify: Mark as verified (updates last_verified timestamp)
        """
        from datetime import datetime

        try:
            # Get existing entry
            result = self.knowledge_collection.get(ids=[knowledge_id])
            if not result['documents']:
                return False

            metadata = result['metadatas'][0]
            full_doc = json.loads(metadata.get('full_doc', '{}'))

            # Update fields
            if knowledge:
                full_doc['knowledge'] = knowledge
            if confidence:
                full_doc['confidence'] = confidence
            if verify:
                full_doc['last_verified'] = datetime.utcnow().isoformat()

            # Update in collection
            self.knowledge_collection.update(
                ids=[knowledge_id],
                documents=[full_doc['knowledge']],
                metadatas=[{
                    "topic": full_doc['topic'],
                    "category": full_doc['category'],
                    "source": full_doc['source'],
                    "confidence": full_doc['confidence'],
                    "tags": json.dumps(full_doc['tags']),
                    "created_at": full_doc['created_at'],
                    "full_doc": json.dumps(full_doc)
                }]
            )
            return True
        except Exception as e:
            print(f"Error updating knowledge: {e}")
            return False

    def list_knowledge_topics(self, category: str = None) -> list:
        """List all unique topics in the knowledge base"""
        try:
            where_filter = {"category": category} if category else None
            results = self.knowledge_collection.get(where=where_filter)

            topics = set()
            for metadata in results['metadatas']:
                topics.add(metadata.get('topic'))

            return sorted(list(topics))
        except Exception as e:
            print(f"Error listing knowledge topics: {e}")
            return []


if __name__ == "__main__":
    # Test the database
    db = ContextDatabase()

    # Register test systems
    db.register_system(
        "macha",
        "workstation",
        ["ollama"],
        capabilities=["ai-inference"]
    )

    db.register_system(
        "rhiannon",
        "server",
        ["gotify", "nextcloud", "prowlarr"],
        capabilities=["notifications", "cloud-storage"]
    )

    # Add relationship
    db.add_relationship(
        "macha",
        "rhiannon",
        "uses-service",
        "Macha uses Rhiannon's Gotify for notifications"
    )

    # Test queries
    print("All systems:", db.get_all_systems())
    print("\nMacha's dependencies:", db.get_dependencies("macha"))
    print("\nRhiannon's dependents:", db.get_dependents("rhiannon"))
    print("\nSystem context:", db.get_system_context("macha"))

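The knowledge-base methods above are the core of Macha's learning loop. A minimal sketch of the round trip, assuming a ContextDatabase can be constructed with its defaults (the topic and insight text are illustrative):

```python
from context_db import ContextDatabase

db = ContextDatabase()

# Record an operational insight learned from a past incident (example data)
kid = db.store_knowledge(
    topic="nix-collect-garbage",
    knowledge="Running nix-collect-garbage --delete-old freed ~12 GB on macha",
    category="command",
    source="experience",
    confidence="high",
    tags=["nix", "disk-space"],
)

# Later, semantic search surfaces it when a similar problem appears
for item in db.query_knowledge("disk almost full on a NixOS host", limit=3):
    print(item["topic"], "->", item["knowledge"])
```
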
328 conversation.py Normal file
@@ -0,0 +1,328 @@
#!/usr/bin/env python3
"""
Conversational Interface - Allows questioning Macha about decisions and system state
"""

import json
import requests
from typing import Dict, List, Any, Optional
from pathlib import Path
from datetime import datetime
from agent import MachaAgent


class MachaConversation:
    """Conversational interface for Macha"""

    def __init__(
        self,
        ollama_host: str = "http://localhost:11434",
        model: str = "gpt-oss:latest",
        state_dir: Path = Path("/var/lib/macha")
    ):
        self.ollama_host = ollama_host
        self.model = model
        self.state_dir = state_dir
        self.decision_log = self.state_dir / "decisions.jsonl"
        self.approval_queue = self.state_dir / "approval_queue.json"
        self.orchestrator_log = self.state_dir / "orchestrator.log"

        # Initialize agent with tool support and queue
        self.agent = MachaAgent(
            ollama_host=ollama_host,
            model=model,
            state_dir=state_dir,
            enable_tools=True,
            use_queue=True,
            priority="INTERACTIVE"
        )

    def ask(self, question: str, include_context: bool = True) -> str:
        """Ask Macha a question with optional system context"""

        context = ""
        if include_context:
            context = self._gather_context()

        # Build messages for tool-aware chat
        content = self._create_conversational_prompt(question, context)
        messages = [{"role": "user", "content": content}]

        response = self.agent._query_ollama_with_tools(messages)

        return response

    def discuss_action(self, action_index: int) -> str:
        """Discuss a specific queued action by its queue position (0-based index)"""

        action = self._get_action_from_queue(action_index)
        if not action:
            return f"No action found at queue position {action_index}. Use 'macha-approve list' to see available actions."

        context = self._gather_context()
        action_context = json.dumps(action, indent=2)

        content = f"""TASK: DISCUSS PROPOSED ACTION
================================================================================

A user is asking about a proposed action in your approval queue.

QUEUED ACTION (Queue Position #{action_index}):
{action_context}

RECENT SYSTEM CONTEXT:
{context}

The user wants to discuss this action. Explain:
1. Why you proposed this action
2. What problem it solves
3. The risks involved
4. What could go wrong
5. Alternative approaches if any

Be conversational, helpful, and honest about uncertainties.
"""

        messages = [{"role": "user", "content": content}]
        return self.agent._query_ollama_with_tools(messages)

    def _gather_context(self) -> str:
        """Gather relevant system context for the conversation"""

        context_parts = []

        # System infrastructure from ChromaDB
        try:
            from context_db import ContextDatabase
            db = ContextDatabase()
            systems = db.get_all_systems()

            if systems:
                context_parts.append("INFRASTRUCTURE:")
                for system in systems:
                    context_parts.append(f"  - {system['hostname']} ({system.get('type', 'unknown')})")
                    if system.get('config_repo'):
                        context_parts.append(f"    Config Repo: {system['config_repo']}")
                        context_parts.append(f"    Branch: {system.get('config_branch', 'unknown')}")
                    if system.get('capabilities'):
                        context_parts.append(f"    Capabilities: {', '.join(system['capabilities'])}")
        except Exception:
            # ChromaDB not available, skip
            pass

        # Recent decisions
        recent_decisions = self._get_recent_decisions(5)
        if recent_decisions:
            context_parts.append("\nRECENT DECISIONS:")
            for i, dec in enumerate(recent_decisions, 1):
                timestamp = dec.get("timestamp", "unknown")
                analysis = dec.get("analysis", {})
                status = analysis.get("status", "unknown")
                context_parts.append(f"{i}. [{timestamp}] Status: {status}")
                if "issues" in analysis:
                    for issue in analysis.get("issues", [])[:3]:
                        context_parts.append(f"   - {issue.get('description', 'N/A')}")

        # Pending approvals
        pending = self._get_pending_approvals()
        if pending:
            context_parts.append(f"\nPENDING APPROVALS: {len(pending)} action(s) awaiting approval")

        # Recent log excerpts (last 10 lines)
        recent_logs = self._get_recent_logs(10)
        if recent_logs:
            context_parts.append("\nRECENT LOG ENTRIES:")
            context_parts.extend(recent_logs)

        return "\n".join(context_parts)

    def _create_conversational_prompt(self, question: str, context: str) -> str:
        """Create a conversational prompt"""

        return f"""{MachaAgent.SYSTEM_PROMPT}

TASK: ANSWER QUESTION
================================================================================

You monitor system health, analyze issues using AI, and propose fixes. Be helpful,
honest about what you know and don't know, and reference the context provided below.

SYSTEM CONTEXT:
{context if context else "No recent activity"}

USER QUESTION:
{question}

Respond conversationally and helpfully. If the question is about your recent decisions
or actions, reference the context above. If you don't have enough information, say so.
Keep responses concise but informative.
"""

    def _query_ollama(self, prompt: str, temperature: float = 0.7) -> str:
        """Query Ollama API"""
        response = None
        try:
            response = requests.post(
                f"{self.ollama_host}/api/generate",
                json={
                    "model": self.model,
                    "prompt": prompt,
                    "stream": False,
                    "temperature": temperature,
                },
                timeout=60
            )
            response.raise_for_status()
            return response.json().get("response", "")
        except requests.exceptions.HTTPError:
            error_detail = ""
            try:
                error_detail = f" - {response.text}"
            except Exception:
                pass
            return f"Error: Ollama returned HTTP {response.status_code}{error_detail}"
        except Exception as e:
            return f"Error querying Ollama: {str(e)}"

    def _get_recent_decisions(self, count: int = 5) -> List[Dict[str, Any]]:
        """Get recent decisions from log"""
        if not self.decision_log.exists():
            return []

        decisions = []
        try:
            with open(self.decision_log, 'r') as f:
                for line in f:
                    if line.strip():
                        try:
                            decisions.append(json.loads(line))
                        except Exception:
                            pass
        except Exception:
            pass

        return decisions[-count:]

    def _get_pending_approvals(self) -> List[Dict[str, Any]]:
        """Get pending approvals from queue"""
        if not self.approval_queue.exists():
            return []

        try:
            with open(self.approval_queue, 'r') as f:
                data = json.load(f)
            # Queue is a JSON array, not an object with "pending" key
            if isinstance(data, list):
                return data
            return data.get("pending", [])
        except Exception:
            return []

    def _get_action_from_queue(self, action_index: int) -> Optional[Dict[str, Any]]:
        """Get a specific action from the queue by index"""
        pending = self._get_pending_approvals()
        if 0 <= action_index < len(pending):
            return pending[action_index]
        return None

    def _get_recent_logs(self, count: int = 10) -> List[str]:
        """Get recent orchestrator log lines"""
        if not self.orchestrator_log.exists():
            return []

        try:
            with open(self.orchestrator_log, 'r') as f:
                lines = f.readlines()
            return [line.strip() for line in lines[-count:] if line.strip()]
        except Exception:
            return []


if __name__ == "__main__":
    import sys
    import argparse

    parser = argparse.ArgumentParser(description="Ask Macha a question or discuss an action")
    parser.add_argument("--discuss", type=int, metavar="ACTION_ID", help="Discuss a specific queued action")
    parser.add_argument("--follow-up", type=str, metavar="QUESTION", help="Follow-up question about the action")
    parser.add_argument("question", nargs="*", help="Your question for Macha")
    parser.add_argument("--no-context", action="store_true", help="Don't include system context")

    args = parser.parse_args()

    # Load config if available
    config_file = Path("/etc/macha-autonomous/config.json")
    ollama_host = "http://localhost:11434"
    model = "gpt-oss:latest"

    if config_file.exists():
        try:
            with open(config_file, 'r') as f:
                config = json.load(f)
            ollama_host = config.get("ollama_host", ollama_host)
            model = config.get("model", model)
        except Exception:
            pass

    conversation = MachaConversation(
        ollama_host=ollama_host,
        model=model
    )

    if args.discuss is not None:
        if args.follow_up:
            # Follow-up question about a specific action
            action = conversation._get_action_from_queue(args.discuss)
            if not action:
                print(f"No action found at queue position {args.discuss}. Use 'macha-approve list' to see available actions.")
                sys.exit(1)

            # Build context with the action details
            action_context = f"""
QUEUED ACTION #{args.discuss}:
Diagnosis: {action.get('proposal', {}).get('diagnosis', 'N/A')}
Proposed Action: {action.get('proposal', {}).get('proposed_action', 'N/A')}
Action Type: {action.get('proposal', {}).get('action_type', 'N/A')}
Risk Level: {action.get('proposal', {}).get('risk_level', 'N/A')}
Commands: {json.dumps(action.get('proposal', {}).get('commands', []), indent=2)}
Reasoning: {action.get('proposal', {}).get('reasoning', 'N/A')}

FOLLOW-UP QUESTION:
{args.follow_up}
"""

            # Query the AI with the action context
            response = conversation._query_ollama(f"""{MachaAgent.SYSTEM_PROMPT}

TASK: ANSWER FOLLOW-UP QUESTION ABOUT QUEUED ACTION
================================================================================

You are answering a follow-up question about a proposed fix that is awaiting approval.
Be helpful and answer directly. If the user is concerned about risks, explain them clearly.
If they ask about alternatives, suggest them.

{action_context}

RESPOND CONCISELY AND DIRECTLY.
""")

        else:
            # Initial discussion about the action
            response = conversation.discuss_action(args.discuss)
    elif args.question:
        # Ask a general question
        question = " ".join(args.question)
        response = conversation.ask(question, include_context=not args.no_context)
    else:
        parser.print_help()
        sys.exit(1)

    # Only print formatted output for initial discussion, not for follow-ups
    if args.follow_up:
        print(response)
    else:
        print("\n" + "="*60)
        print("MACHA:")
        print("="*60)
        print(response)
        print("="*60 + "\n")

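The same interface can be driven from Python instead of the CLI. A minimal sketch, assuming the agent module and an Ollama instance are reachable with the defaults above (the question text is illustrative):

```python
from conversation import MachaConversation

conv = MachaConversation(model="gpt-oss:latest")

# General question, with system context gathered automatically
print(conv.ask("Why did the last nightly check flag rhiannon?"))

# Talk through the first action waiting in the approval queue
print(conv.discuss_action(0))
```
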
537 executor.py Normal file
@@ -0,0 +1,537 @@
#!/usr/bin/env python3
"""
Action Executor - Safely executes proposed fixes with rollback capability
"""

import json
import subprocess
from typing import Dict, List, Any, Optional
from pathlib import Path
from datetime import datetime
import time


class SafeExecutor:
    """Executes system maintenance actions with safety checks"""

    # Actions that are considered safe to auto-execute
    SAFE_ACTIONS = {
        "systemd_restart",   # Restart failed services
        "cleanup",           # Disk cleanup, log rotation
        "investigation",     # Read-only diagnostics
    }

    # Services that should NEVER be stopped/disabled
    PROTECTED_SERVICES = {
        "sshd",
        "systemd-networkd",
        "NetworkManager",
        "systemd-resolved",
        "dbus",
    }

    def __init__(
        self,
        state_dir: Path = Path("/var/lib/macha"),
        autonomy_level: str = "suggest",  # observe, suggest, auto-safe, auto-full
        dry_run: bool = False,
        agent = None  # Optional agent for learning from actions
    ):
        self.state_dir = state_dir
        self.state_dir.mkdir(parents=True, exist_ok=True)
        self.autonomy_level = autonomy_level
        self.dry_run = dry_run
        self.agent = agent
        self.action_log = self.state_dir / "actions.jsonl"
        self.approval_queue = self.state_dir / "approval_queue.json"

    def execute_action(self, action: Dict[str, Any], monitoring_context: Dict[str, Any]) -> Dict[str, Any]:
        """Execute a proposed action with appropriate safety checks"""

        action_type = action.get("action_type", "unknown")
        risk_level = action.get("risk_level", "high")

        # Determine if we should execute
        should_execute, reason = self._should_execute(action_type, risk_level)

        if not should_execute:
            if self.autonomy_level == "suggest":
                # Queue for approval
                self._queue_for_approval(action, monitoring_context)
                return {
                    "executed": False,
                    "status": "queued_for_approval",
                    "reason": reason,
                    "queue_file": str(self.approval_queue)
                }
            else:
                return {
                    "executed": False,
                    "status": "blocked",
                    "reason": reason
                }

        # Execute the action
        if self.dry_run:
            return self._dry_run_action(action)

        return self._execute_action_impl(action, monitoring_context)

    def _should_execute(self, action_type: str, risk_level: str) -> tuple[bool, str]:
        """Determine if an action should be auto-executed based on autonomy level"""

        if self.autonomy_level == "observe":
            return False, "Autonomy level set to observe-only"

        # Auto-approve low-risk investigation actions
        if action_type == "investigation" and risk_level == "low":
            return True, "Auto-approved: Low-risk information gathering"

        if self.autonomy_level == "suggest":
            return False, "Autonomy level requires manual approval"

        if self.autonomy_level == "auto-safe":
            if action_type in self.SAFE_ACTIONS and risk_level == "low":
                return True, "Auto-executing safe action"
            return False, "Action requires higher autonomy level"

        if self.autonomy_level == "auto-full":
            if risk_level == "high":
                return False, "High risk actions always require approval"
            return True, "Auto-executing approved action"

        return False, "Unknown autonomy level"

    def _execute_action_impl(self, action: Dict[str, Any], context: Dict[str, Any]) -> Dict[str, Any]:
        """Actually execute the action"""

        action_type = action.get("action_type")
        result = {
            "executed": True,
            "timestamp": datetime.now().isoformat(),
            "action": action,
            "success": False,
            "output": "",
            "error": None
        }

        try:
            if action_type == "systemd_restart":
                result.update(self._restart_services(action))

            elif action_type == "cleanup":
                result.update(self._perform_cleanup(action))

            elif action_type == "nix_rebuild":
                result.update(self._nix_rebuild(action))

            elif action_type == "config_change":
                result.update(self._apply_config_change(action))

            elif action_type == "investigation":
                result.update(self._run_investigation(action))

            else:
                result["error"] = f"Unknown action type: {action_type}"

        except Exception as e:
            result["error"] = str(e)
            result["success"] = False

        # Log the action
        self._log_action(result)

        # Learn from successful operations
        if result.get("success") and self.agent:
            try:
                self.agent.reflect_and_learn(
                    situation=action.get("diagnosis", "Unknown situation"),
                    action_taken=action.get("proposed_action", "Unknown action"),
                    outcome=result.get("output", ""),
                    success=True
                )
            except Exception as e:
                # Don't fail the action if learning fails
                print(f"Note: Could not record learning: {e}")

        return result

    def _restart_services(self, action: Dict[str, Any]) -> Dict[str, Any]:
        """Restart systemd services"""
        commands = action.get("commands", [])
        output_lines = []

        for cmd in commands:
            if not cmd.startswith("systemctl restart "):
                continue

            service = cmd.split()[-1]

            # Safety check
            if any(protected in service for protected in self.PROTECTED_SERVICES):
                output_lines.append(f"BLOCKED: {service} is protected")
                continue

            try:
                result = subprocess.run(
                    ["systemctl", "restart", service],
                    capture_output=True,
                    text=True,
                    timeout=30
                )

                if result.returncode == 0:
                    output_lines.append(f"✓ Restarted {service}")
                else:
                    output_lines.append(f"✗ Failed to restart {service}: {result.stderr}")

            except subprocess.TimeoutExpired:
                output_lines.append(f"✗ Timeout restarting {service}")

        return {
            "success": len(output_lines) > 0,
            "output": "\n".join(output_lines)
        }

    def _perform_cleanup(self, action: Dict[str, Any]) -> Dict[str, Any]:
        """Perform system cleanup tasks"""
        output_lines = []

        # Nix store cleanup
        if "nix" in action.get("proposed_action", "").lower():
            try:
                result = subprocess.run(
                    ["nix-collect-garbage", "--delete-old"],
                    capture_output=True,
                    text=True,
                    timeout=300
                )
                output_lines.append(f"Nix cleanup: {result.stdout}")
            except Exception as e:
                output_lines.append(f"Nix cleanup failed: {e}")

        # Journal cleanup (keep last 7 days)
        try:
            result = subprocess.run(
                ["journalctl", "--vacuum-time=7d"],
                capture_output=True,
                text=True,
                timeout=60
            )
            output_lines.append(f"Journal cleanup: {result.stdout}")
        except Exception as e:
            output_lines.append(f"Journal cleanup failed: {e}")

        return {
            "success": True,
            "output": "\n".join(output_lines)
        }

    def _nix_rebuild(self, action: Dict[str, Any]) -> Dict[str, Any]:
        """Rebuild NixOS configuration"""

        # This is HIGH RISK - always requires approval or full autonomy
        # And we should test first

        output_lines = []

        # First, try a dry build
        try:
            result = subprocess.run(
                ["nixos-rebuild", "dry-build", "--flake", ".#macha"],
                capture_output=True,
                text=True,
                timeout=600,
                cwd="/home/lily/Documents/nixos-servers"
            )

            if result.returncode != 0:
                return {
                    "success": False,
                    "output": f"Dry build failed:\n{result.stderr}"
                }

            output_lines.append("✓ Dry build successful")

        except Exception as e:
            return {
                "success": False,
                "output": f"Dry build error: {e}"
            }

        # Now do the actual rebuild
        try:
            result = subprocess.run(
                ["nixos-rebuild", "switch", "--flake", ".#macha"],
                capture_output=True,
                text=True,
                timeout=1200,
                cwd="/home/lily/Documents/nixos-servers"
            )

            output_lines.append(result.stdout)

            return {
                "success": result.returncode == 0,
                "output": "\n".join(output_lines),
                "error": result.stderr if result.returncode != 0 else None
            }

        except Exception as e:
            return {
                "success": False,
                "output": "\n".join(output_lines),
                "error": str(e)
            }

    def _apply_config_change(self, action: Dict[str, Any]) -> Dict[str, Any]:
        """Apply a configuration file change"""

        config_changes = action.get("config_changes", {})
        file_path = config_changes.get("file")

        if not file_path:
            return {
                "success": False,
                "output": "No file specified in config_changes"
            }

        # For now, we DON'T auto-modify configs - too risky
        # Instead, we create a suggested patch file

        patch_file = self.state_dir / f"suggested_patch_{int(time.time())}.txt"
        with open(patch_file, 'w') as f:
            f.write(f"Suggested change to {file_path}:\n\n")
            f.write(config_changes.get("change", "No change description"))
            f.write(f"\n\nReasoning: {action.get('reasoning', 'No reasoning provided')}")

        return {
            "success": True,
            "output": f"Config change suggestion saved to {patch_file}\nThis requires manual review and application."
        }

    def _run_investigation(self, action: Dict[str, Any]) -> Dict[str, Any]:
        """Run diagnostic commands"""
        commands = action.get("commands", [])
        output_lines = []

        for cmd in commands:
            # Only allow safe read-only commands
            safe_commands = ["journalctl", "systemctl status", "df", "free", "ps", "netstat", "ss"]
            if not any(cmd.startswith(safe) for safe in safe_commands):
                output_lines.append(f"BLOCKED unsafe command: {cmd}")
                continue

            try:
                result = subprocess.run(
                    cmd,
                    shell=True,
                    capture_output=True,
                    text=True,
                    timeout=30
                )
                output_lines.append(f"$ {cmd}")
                output_lines.append(result.stdout)
            except Exception as e:
                output_lines.append(f"Error running {cmd}: {e}")

        return {
            "success": True,
            "output": "\n".join(output_lines)
        }

    def _dry_run_action(self, action: Dict[str, Any]) -> Dict[str, Any]:
        """Simulate action execution"""
        return {
            "executed": False,
            "status": "dry_run",
            "action": action,
            "output": "Dry run mode - no actual changes made"
        }

    def _queue_for_approval(self, action: Dict[str, Any], context: Dict[str, Any]):
        """Add action to approval queue"""
        queue = []
        if self.approval_queue.exists():
            with open(self.approval_queue, 'r') as f:
                queue = json.load(f)

        # Check for duplicate pending actions
        proposed_action = action.get("proposed_action", "")
        diagnosis = action.get("diagnosis", "")

        for existing in queue:
            # Skip already approved/rejected items
            if existing.get("approved") is not None:
                continue

            existing_action = existing.get("action", {})
            existing_proposed = existing_action.get("proposed_action", "")
            existing_diagnosis = existing_action.get("diagnosis", "")

            # Check if this is essentially the same issue
            # Match if diagnosis is very similar OR proposed action is very similar
            if (diagnosis and existing_diagnosis and
                    self._similarity_check(diagnosis, existing_diagnosis) > 0.7):
                print("Skipping duplicate action - similar diagnosis already queued")
                return

            if (proposed_action and existing_proposed and
                    self._similarity_check(proposed_action, existing_proposed) > 0.7):
                print("Skipping duplicate action - similar proposal already queued")
                return

        queue.append({
            "timestamp": datetime.now().isoformat(),
            "action": action,
            "context": context,
            "approved": None
        })

        with open(self.approval_queue, 'w') as f:
            json.dump(queue, f, indent=2)

    def _similarity_check(self, str1: str, str2: str) -> float:
        """Simple similarity check between two strings"""
        # Normalize strings
        s1 = str1.lower().strip()
        s2 = str2.lower().strip()

        # Exact match
        if s1 == s2:
            return 1.0

        # Check for significant word overlap
        words1 = set(s1.split())
        words2 = set(s2.split())

        # Remove common words that don't indicate similarity
        common_words = {'the', 'a', 'an', 'and', 'or', 'but', 'in', 'on', 'at', 'to', 'for', 'of', 'with', 'by', 'is', 'are', 'was', 'were', 'be', 'been', 'have', 'has', 'had'}
        words1 = words1 - common_words
        words2 = words2 - common_words

        if not words1 or not words2:
            return 0.0

        # Calculate Jaccard similarity
        intersection = len(words1 & words2)
        union = len(words1 | words2)

        return intersection / union if union > 0 else 0.0

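    # Worked example (illustrative strings): for "restart the ollama service" vs
    # "restart ollama service", stop-word removal leaves {restart, ollama, service}
    # on both sides, so Jaccard = 3/3 = 1.0 and the duplicate is skipped; for
    # "ollama service failed" vs "ollama failed to start" the overlap is
    # {ollama, failed} out of 4 distinct words, so 2/4 = 0.5 stays below the
    # 0.7 threshold and both actions are queued.
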
    def _log_action(self, result: Dict[str, Any]):
        """Log executed actions"""
        with open(self.action_log, 'a') as f:
            f.write(json.dumps(result) + '\n')

    def get_approval_queue(self) -> List[Dict[str, Any]]:
        """Get pending actions awaiting approval"""
        if not self.approval_queue.exists():
            return []

        with open(self.approval_queue, 'r') as f:
            return json.load(f)

    def approve_action(self, index: int) -> bool:
        """Approve and execute a queued action, then remove it from queue"""
        queue = self.get_approval_queue()
        if 0 <= index < len(queue):
            action_item = queue[index]

            # Execute the approved action
            result = self._execute_action_impl(action_item["action"], action_item["context"])

            # Archive the action (success or failure)
            self._archive_action(action_item, result)

            # Remove from queue regardless of outcome
            queue.pop(index)

            with open(self.approval_queue, 'w') as f:
                json.dump(queue, f, indent=2)

            return result.get("success", False)

        return False

    def _archive_action(self, action_item: Dict[str, Any], result: Dict[str, Any]):
        """Archive an approved action with its execution result"""
        archive_file = self.state_dir / "approved_actions.jsonl"

        archive_entry = {
            "timestamp": datetime.now().isoformat(),
            "original_timestamp": action_item.get("timestamp"),
            "action": action_item.get("action"),
            "context": action_item.get("context"),
            "result": result
        }

        with open(archive_file, 'a') as f:
            f.write(json.dumps(archive_entry) + '\n')

    def reject_action(self, index: int) -> bool:
        """Reject and remove a queued action"""
        queue = self.get_approval_queue()
        if 0 <= index < len(queue):
            queue.pop(index)

            with open(self.approval_queue, 'w') as f:
                json.dump(queue, f, indent=2)

            return True

        return False


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        if sys.argv[1] == "queue":
            executor = SafeExecutor()
            queue = executor.get_approval_queue()
            if queue:
                print("\n" + "="*70)
                print(f"PENDING ACTIONS: {len(queue)}")
                print("="*70)
                for i, item in enumerate(queue):
                    action = item.get("action", {})
                    timestamp = item.get("timestamp", "unknown")
                    approved = item.get("approved")

                    status = "✓ APPROVED" if approved else "⏳ PENDING" if approved is None else "✗ REJECTED"

                    print(f"\n[{i}] {status} - {timestamp}")
                    print("-" * 70)
                    print(f"DIAGNOSIS: {action.get('diagnosis', 'N/A')}")
                    print(f"\nPROPOSED ACTION: {action.get('proposed_action', 'N/A')}")
                    print(f"TYPE: {action.get('action_type', 'N/A')}")
                    print(f"RISK: {action.get('risk_level', 'N/A')}")

                    if action.get('commands'):
                        print("\nCOMMANDS:")
                        for cmd in action['commands']:
                            print(f"  - {cmd}")

                    if action.get('config_changes'):
                        print("\nCONFIG CHANGES:")
                        for key, value in action['config_changes'].items():
                            print(f"  {key}: {value}")

                    print(f"\nREASONING: {action.get('reasoning', 'N/A')}")
                    print("\n" + "="*70 + "\n")
            else:
                print("No pending actions")

        elif sys.argv[1] == "approve" and len(sys.argv) > 2:
            executor = SafeExecutor()
            index = int(sys.argv[2])
            success = executor.approve_action(index)
            print(f"Approval {'succeeded' if success else 'failed'}")

        elif sys.argv[1] == "reject" and len(sys.argv) > 2:
            executor = SafeExecutor()
            index = int(sys.argv[2])
            success = executor.reject_action(index)
            print(f"Action {'rejected and removed from queue' if success else 'rejection failed'}")

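A short sketch of how the autonomy ladder above behaves, using dry_run so nothing is executed (the action dict is illustrative; note that SafeExecutor writes its queue under state_dir, so point it at a writable directory when experimenting):

```python
from pathlib import Path
from executor import SafeExecutor

action = {
    "action_type": "systemd_restart",
    "risk_level": "low",
    "proposed_action": "Restart the ollama service",
    "diagnosis": "ollama.service entered failed state",
    "commands": ["systemctl restart ollama"],
}

# In "suggest" mode every non-trivial action lands in the approval queue
executor = SafeExecutor(state_dir=Path("/tmp/macha-demo"),
                        autonomy_level="suggest", dry_run=True)
print(executor.execute_action(action, monitoring_context={})["status"])
# -> "queued_for_approval"

# The same action under "auto-safe" is allowed through (dry_run reports only)
auto = SafeExecutor(state_dir=Path("/tmp/macha-demo"),
                    autonomy_level="auto-safe", dry_run=True)
print(auto.execute_action(action, monitoring_context={})["status"])
# -> "dry_run"
```
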
41 flake.nix Normal file
@@ -0,0 +1,41 @@
{
  description = "Macha - AI-Powered Autonomous System Administrator";

  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
  };

  outputs = { self, nixpkgs }: {
    # NixOS module
    nixosModules.default = import ./module.nix;

    # Alternative explicit name
    nixosModules.macha-autonomous = import ./module.nix;

    # For development
    devShells = nixpkgs.lib.genAttrs [ "x86_64-linux" "aarch64-linux" ] (system:
      let
        pkgs = nixpkgs.legacyPackages.${system};
        pythonEnv = pkgs.python3.withPackages (ps: with ps; [
          requests
          psutil
          chromadb
        ]);
      in {
        default = pkgs.mkShell {
          packages = [ pythonEnv pkgs.git ];
          shellHook = ''
            echo "Macha Autonomous Development Environment"
            echo "Python packages: requests, psutil, chromadb"
          '';
        };
      }
    );

    # Formatter
    formatter = nixpkgs.lib.genAttrs [ "x86_64-linux" "aarch64-linux" ] (system:
      nixpkgs.legacyPackages.${system}.nixpkgs-fmt
    );
  };
}

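Since the commit message describes Macha as an importable flake, a consuming configuration would look roughly like this (a sketch; the input URL and host name are placeholders, and only `nixosModules.default` comes from the flake above):

```nix
{
  inputs.macha.url = "github:example/macha";  # placeholder URL

  outputs = { self, nixpkgs, macha, ... }: {
    nixosConfigurations.myhost = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      modules = [
        macha.nixosModules.default
        ./configuration.nix
      ];
    };
  };
}
```
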
222 git_context.py Normal file
@@ -0,0 +1,222 @@
#!/usr/bin/env python3
"""
Git Context - Extract context from NixOS configuration repository
"""

import subprocess
from typing import Dict, List
from pathlib import Path


class GitContext:
    """Extract context from git repository"""

    def __init__(self, repo_path: str = "/etc/nixos"):
        """
        Initialize git context extractor

        Args:
            repo_path: Path to the git repository (default: /etc/nixos for NixOS systems)
        """
        self.repo_path = Path(repo_path)

    def _run_git(self, args: List[str]) -> tuple[bool, str]:
        """Run git command"""
        try:
            result = subprocess.run(
                ["git", "-C", str(self.repo_path)] + args,
                capture_output=True,
                text=True,
                timeout=10
            )
            return (result.returncode == 0, result.stdout.strip())
        except Exception as e:
            return (False, str(e))

    def get_current_branch(self) -> str:
        """Get current git branch"""
        success, output = self._run_git(["rev-parse", "--abbrev-ref", "HEAD"])
        return output if success else "unknown"

    def get_remote_url(self) -> str:
        """Get git remote URL"""
        success, output = self._run_git(["remote", "get-url", "origin"])
        return output if success else ""

    def get_recent_commits(self, count: int = 10, since: str = "1 week ago") -> List[Dict[str, str]]:
        """
        Get recent commits

        Args:
            count: Number of commits to retrieve
            since: Time range (e.g., "1 week ago", "3 days ago")

        Returns:
            List of commit dictionaries with hash, author, date, message
        """
        success, output = self._run_git([
            "log",
            f"--since={since}",
            f"-n{count}",
            "--format=%H|%an|%ar|%s"
        ])

        if not success:
            return []

        commits = []
        for line in output.split('\n'):
            if not line.strip():
                continue
            parts = line.split('|', 3)
            if len(parts) == 4:
                commits.append({
                    "hash": parts[0][:8],  # Short hash
                    "author": parts[1],
                    "date": parts[2],
                    "message": parts[3]
                })

        return commits

    def get_system_config_files(self, system_name: str) -> List[str]:
        """
        Get configuration files for a specific system

        Args:
            system_name: Name of the system (e.g., "macha", "rhiannon")

        Returns:
            List of configuration file paths
        """
        system_dir = self.repo_path / "systems" / system_name
        config_files = []

        if system_dir.exists():
            # Main config
            if (system_dir.parent / f"{system_name}.nix").exists():
                config_files.append(f"systems/{system_name}.nix")

            # System-specific configs
            for config_file in system_dir.rglob("*.nix"):
                config_files.append(str(config_file.relative_to(self.repo_path)))

        return config_files

    def get_recent_changes_for_system(self, system_name: str, since: str = "1 week ago") -> List[Dict[str, str]]:
        """
        Get recent changes affecting a specific system

        Args:
            system_name: Name of the system
            since: Time range

        Returns:
            List of commits that affected this system
        """
        config_files = self.get_system_config_files(system_name)

        if not config_files:
            return []

        # Get commits that touched these files (a single "--" separates
        # the pathspecs from the rest of the git arguments)
        file_args = ["--"] + config_files

        success, output = self._run_git([
            "log",
            f"--since={since}",
            "-n10",
            "--format=%H|%an|%ar|%s"
        ] + file_args)

        if not success:
            return []

        commits = []
        for line in output.split('\n'):
            if not line.strip():
                continue
            parts = line.split('|', 3)
            if len(parts) == 4:
                commits.append({
                    "hash": parts[0][:8],
                    "author": parts[1],
                    "date": parts[2],
                    "message": parts[3]
                })

        return commits

    def get_system_context_summary(self, system_name: str) -> str:
        """
        Get a summary of git context for a system

        Args:
            system_name: Name of the system

        Returns:
            Human-readable summary
        """
        lines = []

        # Repository info
        repo_url = self.get_remote_url()
        branch = self.get_current_branch()

        if repo_url:
            lines.append(f"Configuration Repository: {repo_url}")
            lines.append(f"Branch: {branch}")

        # Recent changes to this system
        recent_changes = self.get_recent_changes_for_system(system_name, "2 weeks ago")

        if recent_changes:
            lines.append("\nRecent configuration changes (last 2 weeks):")
            for commit in recent_changes[:5]:
                lines.append(f"  - {commit['date']}: {commit['message']} ({commit['author']})")
        else:
            lines.append("\nNo recent configuration changes")

        return "\n".join(lines)

    def get_all_managed_systems(self) -> List[str]:
        """
        Get list of all systems managed by this repository

        Returns:
            List of system names
        """
        systems = []
        systems_dir = self.repo_path / "systems"

        if systems_dir.exists():
            for system_file in systems_dir.glob("*.nix"):
                if system_file.stem not in ["default"]:
                    systems.append(system_file.stem)

        return sorted(systems)


if __name__ == "__main__":
    import sys

    git = GitContext()

    print("Repository:", git.get_remote_url())
    print("Branch:", git.get_current_branch())
    print("\nManaged Systems:")
    for system in git.get_all_managed_systems():
        print(f"  - {system}")

    print("\nRecent Commits:")
    for commit in git.get_recent_commits(5):
        print(f"  {commit['hash']}: {commit['message']} - {commit['author']}, {commit['date']}")

    if len(sys.argv) > 1:
        system = sys.argv[1]
        print(f"\nContext for {system}:")
        print(git.get_system_context_summary(system))

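A sketch of how this class feeds the context builder in context_db.py, which folds the git summary into its output when given a GitContext (the repo path and hostname are illustrative, and this assumes get_system_context accepts the git_context argument its body references):

```python
from context_db import ContextDatabase
from git_context import GitContext

db = ContextDatabase()
git = GitContext(repo_path="/etc/nixos")  # illustrative path

# The git summary (repo URL, branch, recent commits touching this system)
# is appended to the system overview, dependencies, and dependents.
print(db.get_system_context("rhiannon.example.lan", git_context=git))
```
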
219 issue_tracker.py Normal file
@@ -0,0 +1,219 @@
#!/usr/bin/env python3
"""
Issue Tracker - Internal ticketing system for tracking problems and their resolution
"""

import json
import uuid
from datetime import datetime
from typing import Dict, List, Any, Optional
from pathlib import Path


class IssueTracker:
    """Manages issue lifecycle: detection -> investigation -> resolution"""

    def __init__(self, context_db, log_dir: str = "/var/lib/macha/logs"):
        self.context_db = context_db
        self.log_dir = Path(log_dir)
        self.log_dir.mkdir(parents=True, exist_ok=True)
        self.closed_log = self.log_dir / "closed_issues.jsonl"

def create_issue(
|
||||||
|
self,
|
||||||
|
hostname: str,
|
||||||
|
title: str,
|
||||||
|
description: str,
|
||||||
|
severity: str = "medium",
|
||||||
|
source: str = "auto-detected"
|
||||||
|
) -> str:
|
||||||
|
"""Create a new issue and return its ID"""
|
||||||
|
issue_id = str(uuid.uuid4())
|
||||||
|
now = datetime.utcnow().isoformat()
|
||||||
|
|
||||||
|
issue = {
|
||||||
|
"issue_id": issue_id,
|
||||||
|
"hostname": hostname,
|
||||||
|
"title": title,
|
||||||
|
"description": description,
|
||||||
|
"status": "open",
|
||||||
|
"severity": severity,
|
||||||
|
"created_at": now,
|
||||||
|
"updated_at": now,
|
||||||
|
"source": source,
|
||||||
|
"investigations": [],
|
||||||
|
"actions": [],
|
||||||
|
"resolution": None
|
||||||
|
}
|
||||||
|
|
||||||
|
self.context_db.store_issue(issue)
|
||||||
|
return issue_id
|
||||||
|
|
||||||
|
def get_issue(self, issue_id: str) -> Optional[Dict[str, Any]]:
|
||||||
|
"""Retrieve an issue by ID"""
|
||||||
|
return self.context_db.get_issue(issue_id)
|
||||||
|
|
||||||
|
def update_issue(
|
||||||
|
self,
|
||||||
|
issue_id: str,
|
||||||
|
status: Optional[str] = None,
|
||||||
|
investigation: Optional[Dict[str, Any]] = None,
|
||||||
|
action: Optional[Dict[str, Any]] = None
|
||||||
|
) -> bool:
|
||||||
|
"""Update an issue with new information"""
|
||||||
|
issue = self.get_issue(issue_id)
|
||||||
|
if not issue:
|
||||||
|
return False
|
||||||
|
|
||||||
|
if status:
|
||||||
|
issue["status"] = status
|
||||||
|
|
||||||
|
if investigation:
|
||||||
|
investigation["timestamp"] = datetime.utcnow().isoformat()
|
||||||
|
issue["investigations"].append(investigation)
|
||||||
|
|
||||||
|
if action:
|
||||||
|
action["timestamp"] = datetime.utcnow().isoformat()
|
||||||
|
issue["actions"].append(action)
|
||||||
|
|
||||||
|
issue["updated_at"] = datetime.utcnow().isoformat()
|
||||||
|
|
||||||
|
self.context_db.update_issue(issue)
|
||||||
|
return True
|
||||||
|
|
||||||
|
def find_similar_issue(
|
||||||
|
self,
|
||||||
|
hostname: str,
|
||||||
|
title: str,
|
||||||
|
description: str = None
|
||||||
|
) -> Optional[Dict[str, Any]]:
|
||||||
|
"""Find an existing open issue that matches this problem"""
|
||||||
|
open_issues = self.list_issues(hostname=hostname, status="open")
|
||||||
|
|
||||||
|
# Simple similarity check on title
|
||||||
|
title_lower = title.lower()
|
||||||
|
for issue in open_issues:
|
||||||
|
issue_title_lower = issue.get("title", "").lower()
|
||||||
|
|
||||||
|
# Check for keyword overlap
|
||||||
|
title_words = set(title_lower.split())
|
||||||
|
issue_words = set(issue_title_lower.split())
|
||||||
|
|
||||||
|
# If >50% of words overlap, consider it similar
|
||||||
|
if len(title_words & issue_words) / max(len(title_words), 1) > 0.5:
|
||||||
|
return issue
|
||||||
|
|
||||||
|
return None
|
||||||
|
|
||||||
|
def list_issues(
|
||||||
|
self,
|
||||||
|
hostname: Optional[str] = None,
|
||||||
|
status: Optional[str] = None,
|
||||||
|
severity: Optional[str] = None
|
||||||
|
) -> List[Dict[str, Any]]:
|
||||||
|
"""List issues with optional filters"""
|
||||||
|
return self.context_db.list_issues(
|
||||||
|
hostname=hostname,
|
||||||
|
status=status,
|
||||||
|
severity=severity
|
||||||
|
)
|
||||||
|
|
||||||
|
def resolve_issue(self, issue_id: str, resolution: str) -> bool:
|
||||||
|
"""Mark an issue as resolved with a resolution note"""
|
||||||
|
issue = self.get_issue(issue_id)
|
||||||
|
if not issue:
|
||||||
|
return False
|
||||||
|
|
||||||
|
issue["status"] = "resolved"
|
||||||
|
issue["resolution"] = resolution
|
||||||
|
issue["updated_at"] = datetime.utcnow().isoformat()
|
||||||
|
|
||||||
|
self.context_db.update_issue(issue)
|
||||||
|
return True
|
||||||
|
|
||||||
|
def close_issue(self, issue_id: str) -> bool:
|
||||||
|
"""Archive a resolved issue to the closed log"""
|
||||||
|
issue = self.get_issue(issue_id)
|
||||||
|
if not issue:
|
||||||
|
return False
|
||||||
|
|
||||||
|
# Can only close resolved issues
|
||||||
|
if issue["status"] != "resolved":
|
||||||
|
return False
|
||||||
|
|
||||||
|
issue["status"] = "closed"
|
||||||
|
issue["closed_at"] = datetime.utcnow().isoformat()
|
||||||
|
|
||||||
|
# Archive to closed log
|
||||||
|
self._archive_issue(issue)
|
||||||
|
|
||||||
|
# Remove from active database
|
||||||
|
self.context_db.delete_issue(issue_id)
|
||||||
|
|
||||||
|
return True
|
||||||
|
|
||||||
|
def get_issue_history(self, issue_id: str) -> Dict[str, Any]:
|
||||||
|
"""Get full history for an issue (investigations + actions)"""
|
||||||
|
issue = self.get_issue(issue_id)
|
||||||
|
if not issue:
|
||||||
|
return {}
|
||||||
|
|
||||||
|
return {
|
||||||
|
"issue": issue,
|
||||||
|
"investigation_count": len(issue.get("investigations", [])),
|
||||||
|
"action_count": len(issue.get("actions", [])),
|
||||||
|
"age_hours": self._calculate_age(issue["created_at"]),
|
||||||
|
"last_activity": issue["updated_at"]
|
||||||
|
}
|
||||||
|
|
||||||
|
def auto_resolve_if_fixed(self, hostname: str, detected_problems: List[str]) -> int:
|
||||||
|
"""
|
||||||
|
Auto-resolve open issues if their problems are no longer detected.
|
||||||
|
Returns count of auto-resolved issues.
|
||||||
|
"""
|
||||||
|
open_issues = self.list_issues(hostname=hostname, status="open")
|
||||||
|
resolved_count = 0
|
||||||
|
|
||||||
|
# Convert detected problems to lowercase for comparison
|
||||||
|
detected_lower = [p.lower() for p in detected_problems]
|
||||||
|
|
||||||
|
for issue in open_issues:
|
||||||
|
title_lower = issue.get("title", "").lower()
|
||||||
|
desc_lower = issue.get("description", "").lower()
|
||||||
|
|
||||||
|
# Check if issue keywords are still in detected problems
|
||||||
|
still_present = False
|
||||||
|
for detected in detected_lower:
|
||||||
|
if any(word in detected for word in title_lower.split()) or \
|
||||||
|
any(word in detected for word in desc_lower.split()):
|
||||||
|
still_present = True
|
||||||
|
break
|
||||||
|
|
||||||
|
# If problem is no longer detected, auto-resolve
|
||||||
|
if not still_present:
|
||||||
|
self.resolve_issue(
|
||||||
|
issue["issue_id"],
|
||||||
|
"Auto-resolved: Problem no longer detected in system monitoring"
|
||||||
|
)
|
||||||
|
resolved_count += 1
|
||||||
|
|
||||||
|
return resolved_count
|
||||||
|
|
||||||
|
def _archive_issue(self, issue: Dict[str, Any]):
|
||||||
|
"""Append closed issue to the archive log"""
|
||||||
|
try:
|
||||||
|
with open(self.closed_log, "a") as f:
|
||||||
|
f.write(json.dumps(issue) + "\n")
|
||||||
|
except Exception as e:
|
||||||
|
print(f"Failed to archive issue {issue.get('issue_id')}: {e}")
|
||||||
|
|
||||||
|
def _calculate_age(self, created_at: str) -> float:
|
||||||
|
"""Calculate age of issue in hours"""
|
||||||
|
try:
|
||||||
|
created = datetime.fromisoformat(created_at)
|
||||||
|
now = datetime.utcnow()
|
||||||
|
delta = now - created
|
||||||
|
return delta.total_seconds() / 3600
|
||||||
|
except:
|
||||||
|
return 0
|
||||||
|
|
||||||
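Deduplication in find_similar_issue hinges on one asymmetric ratio: word overlap is divided by the size of the new title's word set, so a short new title easily matches a longer open issue. A minimal standalone sketch of that check, mirroring the code above:

# Standalone sketch of the title-overlap heuristic; threshold and
# tokenization mirror find_similar_issue above.
def titles_similar(new_title: str, existing_title: str) -> bool:
    new_words = set(new_title.lower().split())
    old_words = set(existing_title.lower().split())
    # Overlap is measured against the new title only, so a short new
    # title can match a much longer existing one.
    return len(new_words & old_words) / max(len(new_words), 1) > 0.5

# "nginx service failed" vs "nginx service failed on rhiannon":
# 3 of 3 new-title words overlap -> 1.0 > 0.5 -> treated as the same issue.
print(titles_similar("nginx service failed", "nginx service failed on rhiannon"))  # True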
358
journal_monitor.py
Normal file
@@ -0,0 +1,358 @@
#!/usr/bin/env python3
"""
Journal Monitor - Monitor remote systems via centralized journald
"""

import json
import subprocess
from typing import Dict, List, Any, Optional, Set
from datetime import datetime, timedelta
from pathlib import Path
from collections import defaultdict


class JournalMonitor:
    """Monitor systems via centralized journald logs"""

    def __init__(self, domain: str = "coven.systems"):
        """
        Initialize journal monitor

        Args:
            domain: Domain suffix for FQDNs
        """
        self.domain = domain
        self.known_hosts: Set[str] = set()

    def _run_journalctl(self, args: List[str], timeout: int = 30) -> tuple[bool, str, str]:
        """
        Run journalctl command

        Args:
            args: Arguments to journalctl
            timeout: Timeout in seconds

        Returns:
            (success, stdout, stderr)
        """
        try:
            cmd = ["journalctl"] + args

            result = subprocess.run(
                cmd,
                capture_output=True,
                text=True,
                timeout=timeout
            )

            return (
                result.returncode == 0,
                result.stdout.strip(),
                result.stderr.strip()
            )

        except subprocess.TimeoutExpired:
            return False, "", f"Command timed out after {timeout}s"
        except Exception as e:
            return False, "", str(e)

    def discover_hosts(self) -> List[str]:
        """
        Discover hosts reporting to centralized journal

        Returns:
            List of discovered FQDNs
        """
        success, output, _ = self._run_journalctl([
            "--output=json",
            "--since=1 day ago",
            "-n", "10000"
        ])

        if not success:
            return []

        hosts = set()
        for line in output.split('\n'):
            if not line.strip():
                continue
            try:
                entry = json.loads(line)
                hostname = entry.get('_HOSTNAME', '')

                # Ensure FQDN format
                if hostname and not hostname.endswith(f'.{self.domain}'):
                    if '.' not in hostname:
                        hostname = f"{hostname}.{self.domain}"

                if hostname:
                    hosts.add(hostname)

            except json.JSONDecodeError:
                continue

        self.known_hosts = hosts
        return sorted(hosts)

    def collect_resources(self, hostname: str, since: str = "5 minutes ago") -> Dict[str, Any]:
        """
        Collect resource usage from journal entries

        This extracts CPU/memory info from systemd service messages
        """
        # For now, return empty - we'll primarily use this for service/log monitoring
        # Resource metrics could be added if systems log them
        return {
            "cpu_percent": 0,
            "memory_percent": 0,
            "load_average": {"1min": 0, "5min": 0, "15min": 0}
        }

    def collect_systemd_status(self, hostname: str, since: str = "5 minutes ago") -> Dict[str, Any]:
        """
        Collect systemd service status from journal

        Args:
            hostname: FQDN of the system
            since: Time range to check

        Returns:
            Dictionary with failed service information
        """
        # Query for systemd service failures
        success, output, _ = self._run_journalctl([
            f"_HOSTNAME={hostname}",
            "--priority=err",
            "--unit=*.service",
            f"--since={since}",
            "--output=json"
        ])

        if not success:
            return {"failed_count": 0, "failed_services": []}

        failed_services = {}
        for line in output.split('\n'):
            if not line.strip():
                continue
            try:
                entry = json.loads(line)
                unit = entry.get('_SYSTEMD_UNIT', '')
                if unit and unit.endswith('.service'):
                    service_name = unit.replace('.service', '')
                    if service_name not in failed_services:
                        failed_services[service_name] = {
                            "unit": unit,
                            "message": entry.get('MESSAGE', ''),
                            "timestamp": entry.get('__REALTIME_TIMESTAMP', '')
                        }
            except json.JSONDecodeError:
                continue

        return {
            "failed_count": len(failed_services),
            "failed_services": list(failed_services.values())
        }

    def collect_log_errors(self, hostname: str, since: str = "1 hour ago") -> Dict[str, Any]:
        """
        Collect error logs from journal

        Args:
            hostname: FQDN of the system
            since: Time range to check

        Returns:
            Dictionary with error log information
        """
        success, output, _ = self._run_journalctl([
            f"_HOSTNAME={hostname}",
            "--priority=err",
            f"--since={since}",
            "--output=json"
        ])

        if not success:
            return {"error_count_1h": 0, "recent_errors": []}

        errors = []
        error_count = 0

        for line in output.split('\n'):
            if not line.strip():
                continue
            try:
                entry = json.loads(line)
                error_count += 1

                if len(errors) < 10:  # Keep the first 10 errors in the window (journalctl outputs oldest first)
                    errors.append({
                        "message": entry.get('MESSAGE', ''),
                        "unit": entry.get('_SYSTEMD_UNIT', 'unknown'),
                        "priority": entry.get('PRIORITY', ''),
                        "timestamp": entry.get('__REALTIME_TIMESTAMP', '')
                    })

            except json.JSONDecodeError:
                continue

        return {
            "error_count_1h": error_count,
            "recent_errors": errors
        }

    def collect_disk_usage(self, hostname: str) -> Dict[str, Any]:
        """
        Collect disk usage - Note: This would require systems to log disk metrics
        For now, returns empty. Could be enhanced if systems periodically log disk usage
        """
        return {"partitions": []}

    def collect_network_status(self, hostname: str, since: str = "5 minutes ago") -> Dict[str, Any]:
        """
        Check network connectivity based on recent journal activity

        If we see recent logs from a host, it's reachable
        """
        success, output, _ = self._run_journalctl([
            f"_HOSTNAME={hostname}",
            f"--since={since}",
            "-n", "1",
            "--output=json"
        ])

        # If we got recent logs, network is working
        internet_reachable = bool(success and output.strip())

        return {
            "internet_reachable": internet_reachable,
            "last_seen": datetime.now().isoformat() if internet_reachable else None
        }

    def collect_all(self, hostname: str) -> Dict[str, Any]:
        """
        Collect all monitoring data for a host from journal

        Args:
            hostname: FQDN of the system to monitor

        Returns:
            Complete monitoring data
        """
        # First check if we have recent logs from this host
        net_status = self.collect_network_status(hostname)

        if not net_status.get("internet_reachable"):
            return {
                "hostname": hostname,
                "reachable": False,
                "error": "No recent journal entries from this host"
            }

        return {
            "hostname": hostname,
            "reachable": True,
            "source": "journal",
            "resources": self.collect_resources(hostname),
            "systemd": self.collect_systemd_status(hostname),
            "disk": self.collect_disk_usage(hostname),
            "network": net_status,
            "logs": self.collect_log_errors(hostname),
        }

    def get_summary(self, data: Dict[str, Any]) -> str:
        """Generate human-readable summary from journal data"""
        hostname = data.get("hostname", "unknown")

        if not data.get("reachable", False):
            return f"❌ {hostname}: {data.get('error', 'Unreachable')}"

        lines = [f"System: {hostname} (via journal)"]

        # Services
        systemd = data.get("systemd", {})
        failed_count = systemd.get("failed_count", 0)
        if failed_count > 0:
            lines.append(f"Services: {failed_count} failed")
            for svc in systemd.get("failed_services", [])[:3]:
                lines.append(f"  - {svc.get('unit', 'unknown')}")
        else:
            lines.append("Services: No recent failures")

        # Network
        net = data.get("network", {})
        last_seen = net.get("last_seen")
        if last_seen:
            lines.append(f"Last seen: {last_seen}")

        # Logs
        logs = data.get("logs", {})
        error_count = logs.get("error_count_1h", 0)
        if error_count > 0:
            lines.append(f"Recent logs: {error_count} errors in last hour")

        return "\n".join(lines)

    def get_active_services(self, hostname: str, since: str = "1 hour ago") -> List[str]:
        """
        Get list of active services on a host by looking at journal entries

        This helps with auto-discovery of what's running on each system
        """
        success, output, _ = self._run_journalctl([
            f"_HOSTNAME={hostname}",
            f"--since={since}",
            "--output=json",
            "-n", "1000"
        ])

        if not success:
            return []

        services = set()
        for line in output.split('\n'):
            if not line.strip():
                continue
            try:
                entry = json.loads(line)
                unit = entry.get('_SYSTEMD_UNIT', '')
                if unit and unit.endswith('.service'):
                    # Extract service name
                    service = unit.replace('.service', '')
                    # Filter out common system services, focus on application services
                    if service not in ['systemd-journald', 'systemd-logind', 'sshd', 'dbus']:
                        services.add(service)
            except json.JSONDecodeError:
                continue

        return sorted(services)


if __name__ == "__main__":
    import sys

    monitor = JournalMonitor()

    # Discover hosts
    print("Discovering hosts from journal...")
    hosts = monitor.discover_hosts()
    print(f"Found {len(hosts)} hosts:")
    for host in hosts:
        print(f"  - {host}")

    # Monitor first host if available
    if hosts:
        hostname = hosts[0]
        print(f"\nMonitoring {hostname}...")
        data = monitor.collect_all(hostname)

        print("\n" + "="*60)
        print(monitor.get_summary(data))
        print("="*60)

        # Discover services
        print(f"\nActive services on {hostname}:")
        services = monitor.get_active_services(hostname)
        for svc in services[:10]:
            print(f"  - {svc}")
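The __main__ block above only exercises the first discovered host. A minimal sketch of a full sweep, assuming journal_monitor.py is importable from the working directory:

# Sweep every host the centralized journal knows about and print summaries.
from journal_monitor import JournalMonitor

monitor = JournalMonitor()
for hostname in monitor.discover_hosts():
    data = monitor.collect_all(hostname)   # marks hosts with no recent entries unreachable
    print(monitor.get_summary(data))
    print("-" * 60)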
847
module.nix
Normal file
@@ -0,0 +1,847 @@
{ config, lib, pkgs, ... }:

with lib;

let
  cfg = config.services.macha-autonomous;

  # Python environment with all dependencies
  pythonEnv = pkgs.python3.withPackages (ps: with ps; [
    requests
    psutil
    chromadb
  ]);

  # Main autonomous system package
  macha-autonomous = pkgs.writeScriptBin "macha-autonomous" ''
    #!${pythonEnv}/bin/python3
    import sys
    sys.path.insert(0, "${./.}")
    from orchestrator import main
    main()
  '';

  # Config file
  configFile = pkgs.writeText "macha-autonomous-config.json" (builtins.toJSON {
    check_interval = cfg.checkInterval;
    autonomy_level = cfg.autonomyLevel;
    ollama_host = cfg.ollamaHost;
    model = cfg.model;
    config_repo = cfg.configRepo;
    config_branch = cfg.configBranch;
  });

in {
  options.services.macha-autonomous = {
    enable = mkEnableOption "Macha autonomous system maintenance";

    autonomyLevel = mkOption {
      type = types.enum [ "observe" "suggest" "auto-safe" "auto-full" ];
      default = "suggest";
      description = ''
        Level of autonomy for the system:
        - observe: Only monitor and log, no actions
        - suggest: Propose actions, require manual approval
        - auto-safe: Auto-execute low-risk actions (restarts, cleanup)
        - auto-full: Full autonomy with safety limits (still requires approval for high-risk)
      '';
    };

    checkInterval = mkOption {
      type = types.int;
      default = 300;
      description = "Interval in seconds between system checks";
    };

    ollamaHost = mkOption {
      type = types.str;
      default = "http://localhost:11434";
      description = "Ollama API host";
    };

    model = mkOption {
      type = types.str;
      default = "llama3.1:70b";
      description = "LLM model to use for reasoning";
    };

    user = mkOption {
      type = types.str;
      default = "macha";
      description = "User to run the autonomous system as";
    };

    group = mkOption {
      type = types.str;
      default = "macha";
      description = "Group to run the autonomous system as";
    };

    gotifyUrl = mkOption {
      type = types.str;
      default = "";
      example = "http://rhiannon:8181";
      description = "Gotify server URL for notifications (empty to disable)";
    };

    gotifyToken = mkOption {
      type = types.str;
      default = "";
      description = "Gotify application token for notifications";
    };

    remoteSystems = mkOption {
      type = types.listOf types.str;
      default = [];
      example = [ "rhiannon" "alexander" ];
      description = "List of remote NixOS systems to monitor and maintain";
    };

    configRepo = mkOption {
      type = types.str;
      default = if config.programs.nh.flake != null
                then config.programs.nh.flake
                else "git+https://git.coven.systems/lily/nixos-servers";
      description = "URL of the NixOS configuration repository (auto-detected from programs.nh.flake if available)";
    };

    configBranch = mkOption {
      type = types.str;
      default = "main";
      description = "Branch of the NixOS configuration repository";
    };
  };

  config = mkIf cfg.enable {
    # Create user and group
    users.users.${cfg.user} = {
      isSystemUser = true;
      group = cfg.group;
      uid = 2501;
      description = "Macha autonomous system maintenance";
      home = "/var/lib/macha";
      createHome = true;
    };

    users.groups.${cfg.group} = {};

    # Git configuration for credential storage
    programs.git = {
      enable = true;
      config = {
        credential.helper = "store";
      };
    };

    # Ollama service for AI inference
    services.ollama = {
      enable = true;
      acceleration = "rocm";
      host = "0.0.0.0";
      port = 11434;
      environmentVariables = {
        "OLLAMA_DEBUG" = "1";
        "OLLAMA_KEEP_ALIVE" = "600";
        "OLLAMA_NEW_ENGINE" = "true";
        "OLLAMA_CONTEXT_LENGTH" = "131072";
      };
      openFirewall = false; # Keep internal only
      loadModels = [
        "qwen3"
        "gpt-oss"
        "gemma3"
        "gpt-oss:20b"
        "qwen3:4b-instruct-2507-fp16"
        "qwen3:8b-fp16"
        "mistral:7b"
        "chroma/all-minilm-l6-v2-f32:latest"
      ];
    };

    # ChromaDB service for vector storage
    services.chromadb = {
      enable = true;
      port = 8000;
      dbpath = "/var/lib/chromadb";
    };

    # Give the user permissions it needs
    security.sudo.extraRules = [{
      users = [ cfg.user ];
      commands = [
        # Local system management
        { command = "${pkgs.systemd}/bin/systemctl restart *"; options = [ "NOPASSWD" ]; }
        { command = "${pkgs.systemd}/bin/systemctl status *"; options = [ "NOPASSWD" ]; }
        { command = "${pkgs.systemd}/bin/journalctl *"; options = [ "NOPASSWD" ]; }
        { command = "${pkgs.nix}/bin/nix-collect-garbage *"; options = [ "NOPASSWD" ]; }
        # Remote system access (uses existing root SSH keys)
        { command = "${pkgs.openssh}/bin/ssh *"; options = [ "NOPASSWD" ]; }
        { command = "${pkgs.openssh}/bin/scp *"; options = [ "NOPASSWD" ]; }
        { command = "${pkgs.nixos-rebuild}/bin/nixos-rebuild *"; options = [ "NOPASSWD" ]; }
      ];
    }];

    # Config file
    environment.etc."macha-autonomous/config.json".source = configFile;

    # State directory and queue directories (world-writable queues for multi-user access)
    # Using 'z' to set permissions even if directory exists
    systemd.tmpfiles.rules = [
      "d /var/lib/macha 0755 ${cfg.user} ${cfg.group} -"
      "z /var/lib/macha 0755 ${cfg.user} ${cfg.group} -" # Ensure permissions are set
      "d /var/lib/macha/queues 0777 ${cfg.user} ${cfg.group} -"
      "d /var/lib/macha/queues/ollama 0777 ${cfg.user} ${cfg.group} -"
      "d /var/lib/macha/queues/ollama/pending 0777 ${cfg.user} ${cfg.group} -"
      "d /var/lib/macha/queues/ollama/processing 0777 ${cfg.user} ${cfg.group} -"
      "d /var/lib/macha/queues/ollama/completed 0777 ${cfg.user} ${cfg.group} -"
      "d /var/lib/macha/queues/ollama/failed 0777 ${cfg.user} ${cfg.group} -"
      "d /var/lib/macha/tool_cache 0777 ${cfg.user} ${cfg.group} -"
    ];

    # Systemd service
    systemd.services.macha-autonomous = {
      description = "Macha Autonomous System Maintenance";
      after = [ "network.target" "ollama.service" ];
      wants = [ "ollama.service" ];
      wantedBy = [ "multi-user.target" ];

      serviceConfig = {
        Type = "simple";
        User = cfg.user;
        Group = cfg.group;
        WorkingDirectory = "/var/lib/macha";
        ExecStart = "${macha-autonomous}/bin/macha-autonomous --mode continuous --autonomy ${cfg.autonomyLevel} --interval ${toString cfg.checkInterval}";
        Restart = "on-failure";
        RestartSec = "30s";

        # Security hardening
        PrivateTmp = true;
        NoNewPrivileges = false; # Need privileges for sudo
        ProtectSystem = "strict";
        ProtectHome = true;
        ReadWritePaths = [ "/var/lib/macha" "/var/lib/macha/tool_cache" "/var/lib/macha/queues" ];

        # Resource limits
        MemoryLimit = "1G";
        CPUQuota = "50%";
      };

      environment = {
        PYTHONPATH = toString ./.;
        GOTIFY_URL = cfg.gotifyUrl;
        GOTIFY_TOKEN = cfg.gotifyToken;
        CHROMA_ENV_FILE = ""; # Prevent ChromaDB from trying to read .env files
        ANONYMIZED_TELEMETRY = "False"; # Disable ChromaDB telemetry
      };

      path = [ pkgs.git ]; # Make git available for config parsing
    };

    # Ollama Queue Worker Service (serializes all Ollama requests)
    systemd.services.ollama-queue-worker = {
      description = "Macha Ollama Queue Worker";
      after = [ "network.target" "ollama.service" ];
      wants = [ "ollama.service" ];
      wantedBy = [ "multi-user.target" ];

      serviceConfig = {
        Type = "simple";
        User = cfg.user;
        Group = cfg.group;
        WorkingDirectory = "/var/lib/macha";
        ExecStart = "${pythonEnv}/bin/python3 ${./.}/ollama_worker.py";
        Restart = "on-failure";
        RestartSec = "10s";

        # Security hardening
        PrivateTmp = true;
        NoNewPrivileges = true;
        ProtectSystem = "strict";
        ProtectHome = true;
        ReadWritePaths = [ "/var/lib/macha/queues" "/var/lib/macha/tool_cache" ];

        # Resource limits
        MemoryLimit = "512M";
        CPUQuota = "25%";
      };

      environment = {
        PYTHONPATH = toString ./.;
        CHROMA_ENV_FILE = "";
        ANONYMIZED_TELEMETRY = "False";
      };
    };

    # CLI tools for manual control and system packages
    environment.systemPackages = with pkgs; [
      macha-autonomous
      # Python packages for ChromaDB
      python313
      python313Packages.pip
      python313Packages.chromadb.pythonModule

      # Tool to check approval queue
      (pkgs.writeScriptBin "macha-approve" ''
        #!${pkgs.bash}/bin/bash
        if [ "$1" == "list" ]; then
          sudo -u ${cfg.user} ${pythonEnv}/bin/python3 ${./.}/executor.py queue
        elif [ "$1" == "discuss" ] && [ -n "$2" ]; then
          ACTION_ID="$2"
          echo "==================================================================="
          echo "Interactive Discussion with Macha about Action #$ACTION_ID"
          echo "==================================================================="
          echo ""

          # Initial explanation
          sudo -u ${cfg.user} ${pkgs.coreutils}/bin/env CHROMA_ENV_FILE="" ANONYMIZED_TELEMETRY="False" ${pythonEnv}/bin/python3 ${./.}/conversation.py --discuss "$ACTION_ID"

          echo ""
          echo "==================================================================="
          echo "You can now ask follow-up questions about this action."
          echo "Type 'approve' to approve it, 'reject' to reject it, or 'exit' to quit."
          echo "==================================================================="

          # Interactive loop
          while true; do
            echo ""
            echo -n "You: "
            read -r USER_INPUT

            # Check for special commands
            if [ "$USER_INPUT" = "exit" ] || [ "$USER_INPUT" = "quit" ] || [ -z "$USER_INPUT" ]; then
              echo "Exiting discussion."
              break
            elif [ "$USER_INPUT" = "approve" ]; then
              echo "Approving action #$ACTION_ID..."
              sudo -u ${cfg.user} ${pythonEnv}/bin/python3 ${./.}/executor.py approve "$ACTION_ID"
              break
            elif [ "$USER_INPUT" = "reject" ]; then
              echo "Rejecting and removing action #$ACTION_ID from queue..."
              sudo -u ${cfg.user} ${pythonEnv}/bin/python3 ${./.}/executor.py reject "$ACTION_ID"
              break
            fi

            # Ask Macha the follow-up question in context of the action
            echo ""
            echo -n "Macha: "
            sudo -u ${cfg.user} ${pkgs.coreutils}/bin/env CHROMA_ENV_FILE="" ANONYMIZED_TELEMETRY="False" ${pythonEnv}/bin/python3 ${./.}/conversation.py --discuss "$ACTION_ID" --follow-up "$USER_INPUT"
            echo ""
          done
        elif [ "$1" == "approve" ] && [ -n "$2" ]; then
          sudo -u ${cfg.user} ${pythonEnv}/bin/python3 ${./.}/executor.py approve "$2"
        elif [ "$1" == "reject" ] && [ -n "$2" ]; then
          sudo -u ${cfg.user} ${pythonEnv}/bin/python3 ${./.}/executor.py reject "$2"
        else
          echo "Usage:"
          echo "  macha-approve list        - Show pending actions"
          echo "  macha-approve discuss <N> - Discuss action number N with Macha (interactive)"
          echo "  macha-approve approve <N> - Approve action number N"
          echo "  macha-approve reject <N>  - Reject and remove action number N from queue"
        fi
      '')

      # Tool to run manual check
      (pkgs.writeScriptBin "macha-check" ''
        #!${pkgs.bash}/bin/bash
        sudo -u ${cfg.user} sh -c 'cd /var/lib/macha && CHROMA_ENV_FILE="" ANONYMIZED_TELEMETRY="False" ${macha-autonomous}/bin/macha-autonomous --mode once --autonomy ${cfg.autonomyLevel}'
      '')

      # Tool to view logs
      (pkgs.writeScriptBin "macha-logs" ''
        #!${pkgs.bash}/bin/bash
        case "$1" in
          orchestrator)
            sudo tail -f /var/lib/macha/orchestrator.log
            ;;
          decisions)
            sudo tail -f /var/lib/macha/decisions.jsonl
            ;;
          actions)
            sudo tail -f /var/lib/macha/actions.jsonl
            ;;
          service)
            journalctl -u macha-autonomous.service -f
            ;;
          *)
            echo "Usage: macha-logs [orchestrator|decisions|actions|service]"
            ;;
        esac
      '')

      # Tool to send test notification
      (pkgs.writeScriptBin "macha-notify" ''
        #!${pkgs.bash}/bin/bash
        if [ -z "$1" ] || [ -z "$2" ]; then
          echo "Usage: macha-notify <title> <message> [priority]"
          echo "Example: macha-notify 'Test' 'This is a test' 5"
          echo "Priorities: 2 (low), 5 (medium), 8 (high)"
          exit 1
        fi

        export GOTIFY_URL="${cfg.gotifyUrl}"
        export GOTIFY_TOKEN="${cfg.gotifyToken}"

        ${pythonEnv}/bin/python3 ${./.}/notifier.py "$1" "$2" "''${3:-5}"
      '')

      # Tool to query config files
      (pkgs.writeScriptBin "macha-configs" ''
        #!${pkgs.bash}/bin/bash
        export PYTHONPATH=${toString ./.}
        export CHROMA_ENV_FILE=""
        export ANONYMIZED_TELEMETRY="False"

        if [ $# -eq 0 ]; then
          echo "Usage: macha-configs <search-query> [system-name]"
          echo "Examples:"
          echo "  macha-configs gotify"
          echo "  macha-configs 'journald configuration'"
          echo "  macha-configs ollama macha.coven.systems"
          exit 1
        fi

        QUERY="$1"
        SYSTEM="''${2:-}"

        ${pythonEnv}/bin/python3 -c "
        from context_db import ContextDatabase
        import sys

        db = ContextDatabase()
        query = sys.argv[1]
        system = sys.argv[2] if len(sys.argv) > 2 else None

        print(f'Searching for: {query}')
        if system:
            print(f'Filtered to system: {system}')
        print('='*60)

        configs = db.query_config_files(query, system=system, n_results=5)

        if not configs:
            print('No matching configuration files found.')
        else:
            for i, cfg in enumerate(configs, 1):
                print(f\"\\n{i}. {cfg['path']} (relevance: {cfg['relevance']:.1%})\")
                print(f\"   Category: {cfg['metadata']['category']}\")
                print('   Preview:')
                preview = cfg['content'][:300].replace('\\n', '\\n   ')
                print(f'   {preview}')
                if len(cfg['content']) > 300:
                    print('   ... (use macha-configs-read to see full file)')
        " "$QUERY" "$SYSTEM"
      '')

      # Interactive chat tool (runs as invoking user, not as macha-autonomous)
      (pkgs.writeScriptBin "macha-chat" ''
        #!${pkgs.bash}/bin/bash
        export PYTHONPATH=${toString ./.}
        export CHROMA_ENV_FILE=""
        export ANONYMIZED_TELEMETRY="False"

        # Run as the current user, not as macha-autonomous
        # This allows the chat to execute privileged commands with the user's permissions
        ${pythonEnv}/bin/python3 ${./.}/chat.py
      '')

      # Tool to read full config file
      (pkgs.writeScriptBin "macha-configs-read" ''
        #!${pkgs.bash}/bin/bash
        export PYTHONPATH=${toString ./.}
        export CHROMA_ENV_FILE=""
        export ANONYMIZED_TELEMETRY="False"

        if [ $# -eq 0 ]; then
          echo "Usage: macha-configs-read <file-path>"
          echo "Example: macha-configs-read apps/gotify.nix"
          exit 1
        fi

        ${pythonEnv}/bin/python3 -c "
        from context_db import ContextDatabase
        import sys

        db = ContextDatabase()
        file_path = sys.argv[1]

        cfg = db.get_config_file(file_path)

        if not cfg:
            print(f'Config file not found: {file_path}')
            sys.exit(1)

        print(f'File: {cfg[\"path\"]}')
        print(f'Category: {cfg[\"metadata\"][\"category\"]}')
        print('='*60)
        print(cfg['content'])
        " "$1"
      '')

      # Tool to view system registry
      (pkgs.writeScriptBin "macha-systems" ''
        #!${pkgs.bash}/bin/bash
        export PYTHONPATH=${toString ./.}
        export CHROMA_ENV_FILE=""
        export ANONYMIZED_TELEMETRY="False"
        ${pythonEnv}/bin/python3 -c "
        from context_db import ContextDatabase
        import json

        db = ContextDatabase()
        systems = db.get_all_systems()

        print('Registered Systems:')
        print('='*60)
        for system in systems:
            os_type = system.get('os_type', 'unknown').upper()
            print(f\"\\n{system['hostname']} ({system['type']}) [{os_type}]\")
            print(f\"  Config Repo: {system.get('config_repo') or '(not set)'}\")
            print(f\"  Branch: {system.get('config_branch', 'unknown')}\")
            if system.get('services'):
                print(f\"  Services: {', '.join(system['services'][:10])}\")
                if len(system['services']) > 10:
                    print(f\"  ... and {len(system['services']) - 10} more\")
            if system.get('capabilities'):
                print(f\"  Capabilities: {', '.join(system['capabilities'])}\")
        print('='*60)
        "
      '')

      # Tool to ask Macha questions
      (pkgs.writeScriptBin "macha-ask" ''
        #!${pkgs.bash}/bin/bash
        if [ $# -eq 0 ]; then
          echo "Usage: macha-ask <your question>"
          echo "Example: macha-ask Why did you recommend restarting that service?"
          exit 1
        fi
        sudo -u ${cfg.user} ${pkgs.coreutils}/bin/env CHROMA_ENV_FILE="" ANONYMIZED_TELEMETRY="False" ${pythonEnv}/bin/python3 ${./.}/conversation.py "$@"
      '')

      # Issue tracking CLI
      (pkgs.writeScriptBin "macha-issues" ''
        #!${pythonEnv}/bin/python3
        import sys
        import os
        os.environ["CHROMA_ENV_FILE"] = ""
        os.environ["ANONYMIZED_TELEMETRY"] = "False"
        sys.path.insert(0, "${./.}")

        from context_db import ContextDatabase
        from issue_tracker import IssueTracker
        from datetime import datetime
        import json

        db = ContextDatabase()
        tracker = IssueTracker(db)

        def list_issues(show_all=False):
            """List issues"""
            if show_all:
                issues = tracker.list_issues()
            else:
                issues = tracker.list_issues(status="open")

            if not issues:
                print("No issues found")
                return

            print("="*70)
            print(f"ISSUES: {len(issues)}")
            print("="*70)

            for issue in issues:
                issue_id = issue['issue_id'][:8]
                age_hours = (datetime.utcnow() - datetime.fromisoformat(issue['created_at'])).total_seconds() / 3600
                inv_count = len(issue.get('investigations', []))
                action_count = len(issue.get('actions', []))

                print(f"\n[{issue_id}] {issue['title']}")
                print(f"  Host: {issue['hostname']}")
                print(f"  Status: {issue['status'].upper()} | Severity: {issue['severity'].upper()}")
                print(f"  Age: {age_hours:.1f}h | Activity: {inv_count} investigations, {action_count} actions")
                print(f"  Source: {issue['source']}")
                if issue.get('resolution'):
                    print(f"  Resolution: {issue['resolution']}")

        def show_issue(issue_id):
            """Show detailed issue information"""
            # Find issue by partial ID
            all_issues = tracker.list_issues()
            matching = [i for i in all_issues if i['issue_id'].startswith(issue_id)]

            if not matching:
                print(f"Issue {issue_id} not found")
                return

            issue = matching[0]
            full_id = issue['issue_id']

            print("="*70)
            print(f"ISSUE: {issue['title']}")
            print("="*70)
            print(f"ID: {full_id}")
            print(f"Host: {issue['hostname']}")
            print(f"Status: {issue['status'].upper()}")
            print(f"Severity: {issue['severity'].upper()}")
            print(f"Source: {issue['source']}")
            print(f"Created: {issue['created_at']}")
            print(f"Updated: {issue['updated_at']}")
            print(f"\nDescription:\n{issue['description']}")

            investigations = issue.get('investigations', [])
            if investigations:
                print(f"\n{'─'*70}")
                print(f"INVESTIGATIONS ({len(investigations)}):")
                for i, inv in enumerate(investigations, 1):
                    print(f"\n  [{i}] {inv.get('timestamp', 'N/A')}")
                    print(f"      Diagnosis: {inv.get('diagnosis', 'N/A')}")
                    print(f"      Commands: {', '.join(inv.get('commands', []))}")
                    print(f"      Success: {inv.get('success', False)}")
                    if inv.get('output'):
                        print(f"      Output: {inv['output'][:200]}...")

            actions = issue.get('actions', [])
            if actions:
                print(f"\n{'─'*70}")
                print(f"ACTIONS ({len(actions)}):")
                for i, action in enumerate(actions, 1):
                    print(f"\n  [{i}] {action.get('timestamp', 'N/A')}")
                    print(f"      Action: {action.get('proposed_action', 'N/A')}")
                    print(f"      Risk: {action.get('risk_level', 'N/A').upper()}")
                    print(f"      Commands: {', '.join(action.get('commands', []))}")
                    print(f"      Success: {action.get('success', False)}")

            if issue.get('resolution'):
                print(f"\n{'─'*70}")
                print(f"RESOLUTION:")
                print(f"  {issue['resolution']}")

            print("="*70)

        def create_issue(description):
            """Create a new issue manually"""
            import socket
            hostname = f"{socket.gethostname()}.coven.systems"

            issue_id = tracker.create_issue(
                hostname=hostname,
                title=description[:100],
                description=description,
                severity="medium",
                source="user-reported"
            )

            print(f"Created issue: {issue_id[:8]}")
            print(f"Title: {description[:100]}")

        def resolve_issue(issue_id, resolution="Manually resolved"):
            """Mark an issue as resolved"""
            # Find issue by partial ID
            all_issues = tracker.list_issues()
            matching = [i for i in all_issues if i['issue_id'].startswith(issue_id)]

            if not matching:
                print(f"Issue {issue_id} not found")
                return

            full_id = matching[0]['issue_id']
            success = tracker.resolve_issue(full_id, resolution)

            if success:
                print(f"Resolved issue {issue_id[:8]}")
            else:
                print(f"Failed to resolve issue {issue_id}")

        def close_issue(issue_id):
            """Archive a resolved issue"""
            # Find issue by partial ID
            all_issues = tracker.list_issues()
            matching = [i for i in all_issues if i['issue_id'].startswith(issue_id)]

            if not matching:
                print(f"Issue {issue_id} not found")
                return

            full_id = matching[0]['issue_id']

            if matching[0]['status'] != 'resolved':
                print(f"Issue {issue_id} must be resolved before closing")
                print(f"Use: macha-issues resolve {issue_id}")
                return

            success = tracker.close_issue(full_id)

            if success:
                print(f"Closed and archived issue {issue_id[:8]}")
            else:
                print(f"Failed to close issue {issue_id}")

        # Main CLI
        if len(sys.argv) < 2:
            print("Usage: macha-issues <command> [options]")
            print("")
            print("Commands:")
            print("  list              List open issues")
            print("  list --all        List all issues (including resolved/closed)")
            print("  show <id>         Show detailed issue information")
            print("  create <desc>     Create a new issue manually")
            print("  resolve <id>      Mark issue as resolved")
            print("  close <id>        Archive a resolved issue")
            sys.exit(1)

        command = sys.argv[1]

        if command == "list":
            show_all = "--all" in sys.argv
            list_issues(show_all)
        elif command == "show" and len(sys.argv) >= 3:
            show_issue(sys.argv[2])
        elif command == "create" and len(sys.argv) >= 3:
            description = " ".join(sys.argv[2:])
            create_issue(description)
        elif command == "resolve" and len(sys.argv) >= 3:
            resolution = " ".join(sys.argv[3:]) if len(sys.argv) > 3 else "Manually resolved"
            resolve_issue(sys.argv[2], resolution)
        elif command == "close" and len(sys.argv) >= 3:
            close_issue(sys.argv[2])
        else:
            print(f"Unknown command: {command}")
            sys.exit(1)
      '')

      # Knowledge base CLI
      (pkgs.writeScriptBin "macha-knowledge" ''
        #!${pythonEnv}/bin/python3
        import sys
        import os
        os.environ["CHROMA_ENV_FILE"] = ""
        os.environ["ANONYMIZED_TELEMETRY"] = "False"
        sys.path.insert(0, "${./.}")

        from context_db import ContextDatabase

        db = ContextDatabase()

        def list_topics(category=None):
            """List all knowledge topics"""
            topics = db.list_knowledge_topics(category)
            if not topics:
                print("No knowledge topics found.")
                return

            print(f"{'='*70}")
            if category:
                print(f"KNOWLEDGE TOPICS ({category.upper()}):")
            else:
                print(f"KNOWLEDGE TOPICS:")
            print(f"{'='*70}")

            for topic in topics:
                print(f"  • {topic}")

            print(f"{'='*70}")

        def show_topic(topic):
            """Show all knowledge for a topic"""
            items = db.get_knowledge_by_topic(topic)
            if not items:
                print(f"No knowledge found for topic: {topic}")
                return

            print(f"{'='*70}")
            print(f"KNOWLEDGE: {topic}")
            print(f"{'='*70}\n")

            for item in items:
                print(f"ID: {item['id'][:8]}...")
                print(f"Category: {item['category']}")
                print(f"Source: {item['source']}")
                print(f"Confidence: {item['confidence']}")
                print(f"Created: {item['created_at']}")
                print(f"Times Referenced: {item['times_referenced']}")
                if item.get('tags'):
                    print(f"Tags: {', '.join(item['tags'])}")
                print(f"\nKnowledge:")
                print(f"  {item['knowledge']}\n")
                print(f"{'-'*70}\n")

        def search_knowledge(query, category=None):
            """Search knowledge base"""
            items = db.query_knowledge(query, category=category, limit=10)
            if not items:
                print(f"No knowledge found matching: {query}")
                return

            print(f"{'='*70}")
            print(f"SEARCH RESULTS: {query}")
            if category:
                print(f"Category Filter: {category}")
            print(f"{'='*70}\n")

            for i, item in enumerate(items, 1):
                print(f"[{i}] {item['topic']}")
                print(f"    Category: {item['category']} | Confidence: {item['confidence']}")
                print(f"    {item['knowledge'][:150]}...")
                print()

        def add_knowledge(topic, knowledge, category="general"):
            """Add new knowledge"""
            kid = db.store_knowledge(
                topic=topic,
                knowledge=knowledge,
                category=category,
                source="user-provided",
                confidence="high"
            )
            if kid:
                print(f"✓ Added knowledge for topic: {topic}")
                print(f"  ID: {kid[:8]}...")
            else:
                print(f"✗ Failed to add knowledge")

        def seed_initial():
            """Seed initial knowledge"""
            print("Seeding initial knowledge from seed_knowledge.py...")
            exec(open("${./.}/seed_knowledge.py").read())

        # Main CLI
        if len(sys.argv) < 2:
            print("Usage: macha-knowledge <command> [options]")
            print("")
            print("Commands:")
            print("  list                  List all knowledge topics")
            print("  list <category>       List topics in category")
            print("  show <topic>          Show all knowledge for a topic")
            print("  search <query>        Search knowledge base")
            print("  search <query> <cat>  Search in specific category")
            print("  add <topic> <text>    Add new knowledge")
            print("  seed                  Seed initial knowledge")
            print("")
            print("Categories: command, pattern, troubleshooting, performance, general")
            sys.exit(1)

        command = sys.argv[1]

        if command == "list":
            category = sys.argv[2] if len(sys.argv) >= 3 else None
            list_topics(category)
        elif command == "show" and len(sys.argv) >= 3:
            show_topic(sys.argv[2])
        elif command == "search" and len(sys.argv) >= 3:
            query = sys.argv[2]
            category = sys.argv[3] if len(sys.argv) >= 4 else None
            search_knowledge(query, category)
        elif command == "add" and len(sys.argv) >= 4:
            topic = sys.argv[2]
            knowledge = " ".join(sys.argv[3:])
            add_knowledge(topic, knowledge)
        elif command == "seed":
            seed_initial()
        else:
            print(f"Unknown command: {command}")
            sys.exit(1)
      '')
    ];
  };
}
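The tmpfiles rules above lay out a pending/processing/completed/failed directory chain for the Ollama queue, and the ollama-queue-worker service drains it. The worker's job schema lives in ollama_worker.py, which this hunk does not show, so the payload fields below are assumptions; this is only a sketch of how a client might atomically drop a job into the pending directory:

# Hypothetical enqueue sketch against /var/lib/macha/queues/ollama/pending.
# The payload fields ("prompt", "model", "priority") are assumptions; the
# real schema is defined by ollama_worker.py, which this hunk does not show.
import json
import time
import uuid
from pathlib import Path

PENDING = Path("/var/lib/macha/queues/ollama/pending")

def enqueue(prompt: str, model: str = "qwen3", priority: int = 5) -> Path:
    job = {
        "id": str(uuid.uuid4()),
        "created_at": time.time(),
        "model": model,
        "prompt": prompt,
        "priority": priority,  # assumed field
    }
    path = PENDING / f"{job['id']}.json"
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(job))
    tmp.rename(path)  # atomic rename so the worker never reads a partial file
    return path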
291
monitor.py
Normal file
@@ -0,0 +1,291 @@
#!/usr/bin/env python3
"""
System Monitor - Collects health data from Macha
"""

import json
import subprocess
import psutil
import time
from datetime import datetime
from pathlib import Path
from typing import Dict, List, Any


class SystemMonitor:
    """Monitors system health and collects diagnostic data"""

    def __init__(self, state_dir: Path = Path("/var/lib/macha")):
        self.state_dir = state_dir
        self.state_dir.mkdir(parents=True, exist_ok=True)

    def collect_all(self) -> Dict[str, Any]:
        """Collect all system health data"""
        return {
            "timestamp": datetime.now().isoformat(),
            "systemd": self.check_systemd_services(),
            "resources": self.check_resources(),
            "disk": self.check_disk_usage(),
            "logs": self.check_recent_errors(),
            "nixos": self.check_nixos_status(),
            "network": self.check_network(),
            "boot": self.check_boot_status(),
        }

    def check_systemd_services(self) -> Dict[str, Any]:
        """Check status of all systemd services"""
        try:
            # Get failed services
            result = subprocess.run(
                ["systemctl", "--failed", "--no-pager", "--output=json"],
                capture_output=True,
                text=True,
                timeout=10
            )

            failed_services = []
            if result.returncode == 0 and result.stdout:
                try:
                    failed_services = json.loads(result.stdout)
                except json.JSONDecodeError:
                    pass

            # Get all services status
            result = subprocess.run(
                ["systemctl", "list-units", "--type=service", "--no-pager", "--output=json"],
                capture_output=True,
                text=True,
                timeout=10
            )

            all_services = []
            if result.returncode == 0 and result.stdout:
                try:
                    all_services = json.loads(result.stdout)
                except json.JSONDecodeError:
                    pass

            return {
                "failed_count": len(failed_services),
                "failed_services": failed_services,
                "total_services": len(all_services),
                "active_services": [s for s in all_services if s.get("active") == "active"],
            }
        except Exception as e:
            return {"error": str(e)}

    def check_resources(self) -> Dict[str, Any]:
        """Check CPU, RAM, and system resources"""
        try:
            cpu_percent = psutil.cpu_percent(interval=1)
            memory = psutil.virtual_memory()
            load_avg = psutil.getloadavg()

            return {
                "cpu_percent": cpu_percent,
                "cpu_count": psutil.cpu_count(),
                "memory_percent": memory.percent,
                "memory_available_gb": memory.available / (1024**3),
                "memory_total_gb": memory.total / (1024**3),
                "load_average": {
                    "1min": load_avg[0],
                    "5min": load_avg[1],
                    "15min": load_avg[2],
                },
                "swap_percent": psutil.swap_memory().percent,
            }
        except Exception as e:
            return {"error": str(e)}

    def check_disk_usage(self) -> Dict[str, Any]:
        """Check disk usage for all mounted filesystems"""
        try:
            partitions = psutil.disk_partitions()
            disk_info = []

            for partition in partitions:
                try:
                    usage = psutil.disk_usage(partition.mountpoint)
                    disk_info.append({
                        "device": partition.device,
                        "mountpoint": partition.mountpoint,
                        "fstype": partition.fstype,
                        "percent_used": usage.percent,
                        "total_gb": usage.total / (1024**3),
                        "used_gb": usage.used / (1024**3),
                        "free_gb": usage.free / (1024**3),
                    })
                except PermissionError:
                    continue

            return {"partitions": disk_info}
        except Exception as e:
            return {"error": str(e)}

    def check_recent_errors(self) -> Dict[str, Any]:
        """Check recent system logs for errors"""
        try:
            # Get errors from the last hour
            result = subprocess.run(
                ["journalctl", "-p", "err", "--since", "1 hour ago", "--no-pager", "-o", "json"],
                capture_output=True,
                text=True,
                timeout=10
            )

            errors = []
            if result.returncode == 0 and result.stdout:
                for line in result.stdout.strip().split('\n'):
                    if line:
                        try:
                            errors.append(json.loads(line))
                        except json.JSONDecodeError:
                            continue

            return {
                "error_count_1h": len(errors),
                "recent_errors": errors[-50:],  # Last 50 errors
            }
        except Exception as e:
            return {"error": str(e)}

    def check_nixos_status(self) -> Dict[str, Any]:
        """Check NixOS generation and system info"""
        try:
            # Get current generation
            result = subprocess.run(
                ["nixos-version"],
                capture_output=True,
                text=True,
                timeout=5
            )
            version = result.stdout.strip() if result.returncode == 0 else "unknown"

            # Get generation list
            result = subprocess.run(
                ["nix-env", "--list-generations", "-p", "/nix/var/nix/profiles/system"],
                capture_output=True,
                text=True,
                timeout=10
            )

            generations = result.stdout.strip() if result.returncode == 0 else ""

            return {
                "version": version,
                "generations": generations,
                "nix_store_size": self._get_nix_store_size(),
            }
        except Exception as e:
            return {"error": str(e)}

    def _get_nix_store_size(self) -> str:
        """Get Nix store size"""
        try:
            result = subprocess.run(
                ["du", "-sh", "/nix/store"],
                capture_output=True,
                text=True,
                timeout=30
            )
            if result.returncode == 0:
                return result.stdout.split()[0]
        except Exception:
            pass
        return "unknown"

    def check_network(self) -> Dict[str, Any]:
        """Check network connectivity"""
        try:
            # Check if we can reach the internet
            result = subprocess.run(
                ["ping", "-c", "1", "-W", "2", "8.8.8.8"],
                capture_output=True,
                timeout=5
            )
            internet_up = result.returncode == 0

            # Get network interfaces
            interfaces = {}
            for iface, addrs in psutil.net_if_addrs().items():
                interfaces[iface] = [
                    {"family": addr.family.name, "address": addr.address}
                    for addr in addrs
                ]

            return {
                "internet_reachable": internet_up,
                "interfaces": interfaces,
            }
        except Exception as e:
            return {"error": str(e)}

    def check_boot_status(self) -> Dict[str, Any]:
        """Check boot and uptime information"""
        try:
            boot_time = datetime.fromtimestamp(psutil.boot_time())
            uptime_seconds = time.time() - psutil.boot_time()

            return {
                "boot_time": boot_time.isoformat(),
                "uptime_seconds": uptime_seconds,
                "uptime_hours": uptime_seconds / 3600,
            }
        except Exception as e:
            return {"error": str(e)}

    def save_snapshot(self, data: Dict[str, Any]):
        """Save a snapshot of system state"""
        snapshot_file = self.state_dir / f"snapshot_{int(time.time())}.json"
        with open(snapshot_file, 'w') as f:
|
||||||
|
json.dump(data, f, indent=2)
|
||||||
|
|
||||||
|
# Keep only last 100 snapshots
|
||||||
|
snapshots = sorted(self.state_dir.glob("snapshot_*.json"))
|
||||||
|
for old_snapshot in snapshots[:-100]:
|
||||||
|
old_snapshot.unlink()
|
||||||
|
|
||||||
|
def get_summary(self, data: Dict[str, Any]) -> str:
|
||||||
|
"""Generate human-readable summary of system state"""
|
||||||
|
lines = []
|
||||||
|
lines.append(f"=== System Health Summary ({data['timestamp']}) ===\n")
|
||||||
|
|
||||||
|
# Resources
|
||||||
|
res = data.get("resources", {})
|
||||||
|
lines.append(f"CPU: {res.get('cpu_percent', 0):.1f}%")
|
||||||
|
lines.append(f"Memory: {res.get('memory_percent', 0):.1f}% ({res.get('memory_available_gb', 0):.1f}GB free)")
|
||||||
|
lines.append(f"Load: {res.get('load_average', {}).get('1min', 0):.2f}")
|
||||||
|
|
||||||
|
# Disk
|
||||||
|
disk = data.get("disk", {})
|
||||||
|
for part in disk.get("partitions", [])[:5]: # Top 5 partitions
|
||||||
|
lines.append(f"Disk {part['mountpoint']}: {part['percent_used']:.1f}% used ({part['free_gb']:.1f}GB free)")
|
||||||
|
|
||||||
|
# Systemd
|
||||||
|
systemd = data.get("systemd", {})
|
||||||
|
failed = systemd.get("failed_count", 0)
|
||||||
|
if failed > 0:
|
||||||
|
lines.append(f"\n⚠️ WARNING: {failed} failed services!")
|
||||||
|
for svc in systemd.get("failed_services", [])[:5]:
|
||||||
|
lines.append(f" - {svc.get('unit', 'unknown')}")
|
||||||
|
|
||||||
|
# Errors
|
||||||
|
logs = data.get("logs", {})
|
||||||
|
error_count = logs.get("error_count_1h", 0)
|
||||||
|
if error_count > 0:
|
||||||
|
lines.append(f"\n⚠️ {error_count} errors in last hour")
|
||||||
|
|
||||||
|
# Network
|
||||||
|
net = data.get("network", {})
|
||||||
|
if not net.get("internet_reachable", True):
|
||||||
|
lines.append("\n⚠️ WARNING: No internet connectivity!")
|
||||||
|
|
||||||
|
return "\n".join(lines)
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
monitor = SystemMonitor()
|
||||||
|
data = monitor.collect_all()
|
||||||
|
monitor.save_snapshot(data)
|
||||||
|
print(monitor.get_summary(data))
|
||||||
|
print(f"\nFull data saved to {monitor.state_dir}")
|
||||||
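The __main__ block above runs the collector once. A minimal sketch of running it on a fixed polling interval instead (not part of the commit; the module name "monitor" and the 300-second cadence are assumptions):

import time
from monitor import SystemMonitor  # assumed module name for the file above

monitor = SystemMonitor()
while True:
    data = monitor.collect_all()      # run every health check
    monitor.save_snapshot(data)       # rotates to the newest 100 snapshots
    print(monitor.get_summary(data))
    time.sleep(300)                   # assumed cadence; tune per deployment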
248
notifier.py
Normal file
@@ -0,0 +1,248 @@
#!/usr/bin/env python3
"""
Gotify Notifier - Send notifications to Gotify server
"""

import requests
import os
from typing import Optional
from datetime import datetime


class GotifyNotifier:
    """Send notifications to Gotify server"""

    # Priority levels
    PRIORITY_LOW = 2
    PRIORITY_MEDIUM = 5
    PRIORITY_HIGH = 8

    def __init__(
        self,
        gotify_url: Optional[str] = None,
        gotify_token: Optional[str] = None
    ):
        """
        Initialize Gotify notifier

        Args:
            gotify_url: URL to Gotify server (e.g. http://rhiannon:8181)
            gotify_token: Application token from Gotify
        """
        self.gotify_url = gotify_url or os.environ.get("GOTIFY_URL", "")
        self.gotify_token = gotify_token or os.environ.get("GOTIFY_TOKEN", "")
        self.enabled = bool(self.gotify_url and self.gotify_token)

    def send(
        self,
        title: str,
        message: str,
        priority: int = PRIORITY_MEDIUM,
        extras: Optional[dict] = None
    ) -> bool:
        """
        Send a notification to Gotify

        Args:
            title: Notification title
            message: Notification message
            priority: Priority level (2=low, 5=medium, 8=high)
            extras: Optional extra data

        Returns:
            True if successful, False otherwise
        """
        if not self.enabled:
            return False

        try:
            url = f"{self.gotify_url}/message"
            headers = {
                "Authorization": f"Bearer {self.gotify_token}",
                "Content-Type": "application/json"
            }

            data = {
                "title": title,
                "message": message,
                "priority": priority,
            }

            if extras:
                data["extras"] = extras

            response = requests.post(
                url,
                json=data,
                headers=headers,
                timeout=10
            )

            return response.status_code == 200

        except Exception as e:
            # Fail silently - don't crash if Gotify is unavailable
            print(f"Warning: Failed to send Gotify notification: {e}")
            return False

    def notify_critical_issue(self, issue_description: str, details: str = ""):
        """Send high-priority notification for critical issues"""
        message = f"⚠️ Critical Issue Detected\n\n{issue_description}"
        if details:
            message += f"\n\nDetails:\n{details}"

        return self.send(
            title="🚨 Macha: Critical Issue",
            message=message,
            priority=self.PRIORITY_HIGH
        )

    def notify_issue_created(self, issue_id: str, title: str, severity: str):
        """Send notification when a new issue is created"""
        severity_icons = {
            "low": "ℹ️",
            "medium": "⚠️",
            "high": "🚨",
            "critical": "🔴"
        }
        icon = severity_icons.get(severity, "⚠️")

        priority_map = {
            "low": self.PRIORITY_LOW,
            "medium": self.PRIORITY_MEDIUM,
            "high": self.PRIORITY_HIGH,
            "critical": self.PRIORITY_HIGH
        }
        priority = priority_map.get(severity, self.PRIORITY_MEDIUM)

        message = f"{icon} New Issue Tracked\n\nID: {issue_id}\nSeverity: {severity.upper()}\n\n{title}"

        return self.send(
            title="📋 Macha: Issue Created",
            message=message,
            priority=priority
        )

    def notify_action_queued(self, action_description: str, risk_level: str):
        """Send notification when action is queued for approval"""
        emoji = "⚠️" if risk_level == "high" else "ℹ️"
        message = (
            f"{emoji} Action Queued for Approval\n\n"
            f"Action: {action_description}\n"
            f"Risk Level: {risk_level}\n\n"
            f"Use 'macha-approve list' to review"
        )

        priority = self.PRIORITY_HIGH if risk_level == "high" else self.PRIORITY_MEDIUM

        return self.send(
            title="📋 Macha: Action Needs Approval",
            message=message,
            priority=priority
        )

    def notify_action_executed(self, action_description: str, success: bool, output: str = ""):
        """Send notification when action is executed"""
        if success:
            emoji = "✅"
            title_prefix = "Success"
        else:
            emoji = "❌"
            title_prefix = "Failed"

        message = f"{emoji} Action {title_prefix}\n\n{action_description}"
        if output:
            message += f"\n\nOutput:\n{output[:500]}"  # Limit output length

        priority = self.PRIORITY_HIGH if not success else self.PRIORITY_LOW

        return self.send(
            title=f"{emoji} Macha: Action {title_prefix}",
            message=message,
            priority=priority
        )

    def notify_service_failure(self, service_name: str, details: str = ""):
        """Send notification for service failures"""
        message = f"🔴 Service Failed: {service_name}"
        if details:
            message += f"\n\nDetails:\n{details}"

        return self.send(
            title="🔴 Macha: Service Failure",
            message=message,
            priority=self.PRIORITY_HIGH
        )

    def notify_health_summary(self, summary: str, status: str):
        """Send periodic health summary"""
        emoji = {
            "healthy": "✅",
            "attention_needed": "⚠️",
            "intervention_required": "🚨"
        }.get(status, "ℹ️")

        priority = {
            "healthy": self.PRIORITY_LOW,
            "attention_needed": self.PRIORITY_MEDIUM,
            "intervention_required": self.PRIORITY_HIGH
        }.get(status, self.PRIORITY_MEDIUM)

        return self.send(
            title=f"{emoji} Macha: Health Check",
            message=summary,
            priority=priority
        )

    def send_system_discovered(
        self,
        hostname: str,
        os_type: str,
        role: str,
        services_count: int
    ):
        """Send notification when a new system is discovered"""
        message = (
            f"🔍 New System Auto-Discovered\n\n"
            f"Hostname: {hostname}\n"
            f"OS: {os_type.upper()}\n"
            f"Role: {role}\n"
            f"Services: {services_count} detected\n\n"
            f"System has been registered and analyzed.\n"
            f"Use 'macha-systems' to view all registered systems."
        )

        return self.send(
            title="🌐 Macha: New System Discovered",
            message=message,
            priority=self.PRIORITY_MEDIUM
        )


if __name__ == "__main__":
    import sys

    # Test the notifier
    if len(sys.argv) < 3:
        print("Usage: notifier.py <title> <message> [priority]")
        print("Example: notifier.py 'Test' 'This is a test message' 5")
        sys.exit(1)

    title = sys.argv[1]
    message = sys.argv[2]
    priority = int(sys.argv[3]) if len(sys.argv) > 3 else GotifyNotifier.PRIORITY_MEDIUM

    notifier = GotifyNotifier()

    if not notifier.enabled:
        print("Error: Gotify not configured (GOTIFY_URL and GOTIFY_TOKEN required)")
        sys.exit(1)

    success = notifier.send(title, message, priority)

    if success:
        print("✅ Notification sent successfully")
    else:
        print("❌ Failed to send notification")
        sys.exit(1)
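Because send() short-circuits to False when GOTIFY_URL/GOTIFY_TOKEN are unset, callers can notify unconditionally. A minimal sketch (not part of the commit; the failure details are illustrative):

from notifier import GotifyNotifier

notifier = GotifyNotifier()  # reads GOTIFY_URL / GOTIFY_TOKEN from the environment
# Safe to call even when Gotify is not configured: send() just returns False.
notifier.notify_service_failure("ollama.service", details="exit code 137 (OOM)")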
238
ollama_queue.py
Normal file
@@ -0,0 +1,238 @@
#!/usr/bin/env python3
"""
Ollama Queue Handler - Serializes all LLM requests to prevent resource contention
"""

import json
import time
import fcntl
import signal
from pathlib import Path
from typing import Dict, Any, Optional, Callable
from datetime import datetime
from enum import IntEnum


class Priority(IntEnum):
    """Request priority levels"""
    INTERACTIVE = 0  # User requests (highest priority)
    AUTONOMOUS = 1   # Background maintenance
    BATCH = 2        # Low priority bulk operations


class OllamaQueue:
    """File-based queue for serializing Ollama requests"""

    def __init__(self, queue_dir: Path = Path("/var/lib/macha/queues/ollama")):
        self.queue_dir = queue_dir
        self.queue_dir.mkdir(parents=True, exist_ok=True)
        self.pending_dir = self.queue_dir / "pending"
        self.processing_dir = self.queue_dir / "processing"
        self.completed_dir = self.queue_dir / "completed"
        self.failed_dir = self.queue_dir / "failed"

        for directory in [self.pending_dir, self.processing_dir, self.completed_dir, self.failed_dir]:
            directory.mkdir(parents=True, exist_ok=True)

        self.lock_file = self.queue_dir / "queue.lock"
        self.running = False

    def submit(
        self,
        request_type: str,  # "generate", "chat", "chat_with_tools"
        payload: Dict[str, Any],
        priority: Priority = Priority.INTERACTIVE,
        callback: Optional[Callable] = None,
        progress_callback: Optional[Callable] = None
    ) -> str:
        """Submit a request to the queue. Returns request ID."""
        request_id = f"{int(time.time() * 1000000)}_{priority.value}"

        request_data = {
            "id": request_id,
            "type": request_type,
            "payload": payload,
            "priority": priority.value,
            "submitted_at": datetime.now().isoformat(),
            "status": "pending"
        }

        request_file = self.pending_dir / f"{request_id}.json"
        request_file.write_text(json.dumps(request_data, indent=2))

        return request_id

    def get_status(self, request_id: str) -> Dict[str, Any]:
        """Get the status of a request"""
        # Check pending
        pending_file = self.pending_dir / f"{request_id}.json"
        if pending_file.exists():
            data = json.loads(pending_file.read_text())
            # Calculate position in queue
            position = self._get_queue_position(request_id)
            return {"status": "pending", "position": position, "data": data}

        # Check processing
        processing_file = self.processing_dir / f"{request_id}.json"
        if processing_file.exists():
            data = json.loads(processing_file.read_text())
            return {"status": "processing", "data": data}

        # Check completed
        completed_file = self.completed_dir / f"{request_id}.json"
        if completed_file.exists():
            data = json.loads(completed_file.read_text())
            return {"status": "completed", "result": data.get("result"), "data": data}

        # Check failed
        failed_file = self.failed_dir / f"{request_id}.json"
        if failed_file.exists():
            data = json.loads(failed_file.read_text())
            return {"status": "failed", "error": data.get("error"), "data": data}

        return {"status": "not_found"}

    def _get_queue_position(self, request_id: str) -> int:
        """Get position in queue (1-indexed)"""
        pending_requests = sorted(
            self.pending_dir.glob("*.json"),
            key=lambda p: (int(p.stem.split('_')[1]), int(p.stem.split('_')[0]))  # Sort by priority, then timestamp
        )

        for i, req_file in enumerate(pending_requests):
            if req_file.stem == request_id:
                return i + 1
        return 0

    def wait_for_result(
        self,
        request_id: str,
        timeout: int = 300,
        poll_interval: float = 0.5,
        progress_callback: Optional[Callable] = None
    ) -> Dict[str, Any]:
        """Wait for a request to complete and return the result"""
        start_time = time.time()
        last_status = None

        while time.time() - start_time < timeout:
            status = self.get_status(request_id)

            # Report progress if status changed
            if progress_callback and status != last_status:
                if status["status"] == "pending":
                    progress_callback(f"Queued (position {status.get('position', '?')})")
                elif status["status"] == "processing":
                    progress_callback("Processing...")

            last_status = status

            if status["status"] == "completed":
                return status["result"]
            elif status["status"] == "failed":
                raise Exception(f"Request failed: {status.get('error')}")
            elif status["status"] == "not_found":
                raise Exception(f"Request {request_id} not found")

            time.sleep(poll_interval)

        raise TimeoutError(f"Request {request_id} timed out after {timeout}s")

    def start_worker(self, ollama_client):
        """Start the queue worker (processes requests serially)"""
        self.running = True
        self.ollama_client = ollama_client

        # Set up signal handlers for graceful shutdown
        signal.signal(signal.SIGTERM, self._shutdown_handler)
        signal.signal(signal.SIGINT, self._shutdown_handler)

        print("[OllamaQueue] Worker started, processing requests...")

        while self.running:
            try:
                self._process_next_request()
            except Exception as e:
                print(f"[OllamaQueue] Error processing request: {e}")

            time.sleep(0.1)  # Small sleep to prevent busy-waiting

        print("[OllamaQueue] Worker stopped")

    def _shutdown_handler(self, signum, frame):
        """Handle shutdown signals"""
        print(f"[OllamaQueue] Received signal {signum}, shutting down...")
        self.running = False

    def _process_next_request(self):
        """Process the next request in the queue"""
        # Get pending requests sorted by priority
        pending_requests = sorted(
            self.pending_dir.glob("*.json"),
            key=lambda p: (int(p.stem.split('_')[1]), int(p.stem.split('_')[0]))
        )

        if not pending_requests:
            return

        next_request = pending_requests[0]
        request_id = next_request.stem

        # Move to processing
        request_data = json.loads(next_request.read_text())
        request_data["status"] = "processing"
        request_data["started_at"] = datetime.now().isoformat()

        processing_file = self.processing_dir / f"{request_id}.json"
        processing_file.write_text(json.dumps(request_data, indent=2))
        next_request.unlink()

        try:
            # Process based on type
            result = None
            if request_data["type"] == "generate":
                result = self.ollama_client.generate(request_data["payload"])
            elif request_data["type"] == "chat":
                result = self.ollama_client.chat(request_data["payload"])
            elif request_data["type"] == "chat_with_tools":
                result = self.ollama_client.chat_with_tools(request_data["payload"])
            else:
                raise ValueError(f"Unknown request type: {request_data['type']}")

            # Move to completed
            request_data["status"] = "completed"
            request_data["completed_at"] = datetime.now().isoformat()
            request_data["result"] = result

            completed_file = self.completed_dir / f"{request_id}.json"
            completed_file.write_text(json.dumps(request_data, indent=2))
            processing_file.unlink()

        except Exception as e:
            # Move to failed
            request_data["status"] = "failed"
            request_data["failed_at"] = datetime.now().isoformat()
            request_data["error"] = str(e)

            failed_file = self.failed_dir / f"{request_id}.json"
            failed_file.write_text(json.dumps(request_data, indent=2))
            processing_file.unlink()

    def cleanup_old_requests(self, max_age_seconds: int = 3600):
        """Clean up completed/failed requests older than max_age_seconds"""
        cutoff_time = time.time() - max_age_seconds

        for directory in [self.completed_dir, self.failed_dir]:
            for request_file in directory.glob("*.json"):
                # Extract timestamp from filename
                timestamp = int(request_file.stem.split('_')[0]) / 1000000
                if timestamp < cutoff_time:
                    request_file.unlink()

    def get_queue_stats(self) -> Dict[str, Any]:
        """Get queue statistics"""
        return {
            "pending": len(list(self.pending_dir.glob("*.json"))),
            "processing": len(list(self.processing_dir.glob("*.json"))),
            "completed": len(list(self.completed_dir.glob("*.json"))),
            "failed": len(list(self.failed_dir.glob("*.json")))
        }
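A minimal sketch of the producer side of this queue (not part of the commit). It assumes the worker from ollama_worker.py below is running; the model name is the one given in system_prompt.txt.

from ollama_queue import OllamaQueue, Priority

queue = OllamaQueue()
request_id = queue.submit(
    request_type="chat",
    payload={
        "model": "gpt-oss:latest",  # model named in system_prompt.txt
        "messages": [{"role": "user", "content": "Summarize system health."}],
        "stream": False,
    },
    priority=Priority.INTERACTIVE,  # user requests sort ahead of AUTONOMOUS/BATCH work
)
# Poll until the worker moves the request to completed/ (raises on failed/timeout).
result = queue.wait_for_result(request_id, timeout=300, progress_callback=print)
print(result["message"]["content"])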
111
ollama_worker.py
Normal file
@@ -0,0 +1,111 @@
#!/usr/bin/env python3
"""
Ollama Queue Worker - Daemon that processes queued Ollama requests
"""

import sys
import requests
from pathlib import Path
from ollama_queue import OllamaQueue


class OllamaClient:
    """Simple Ollama API client for the queue worker"""

    def __init__(self, host: str = "http://localhost:11434"):
        self.host = host

    def generate(self, payload: dict) -> dict:
        """Call /api/generate"""
        response = requests.post(
            f"{self.host}/api/generate",
            json=payload,
            timeout=payload.get("timeout", 300),
            stream=False
        )
        response.raise_for_status()
        return response.json()

    def chat(self, payload: dict) -> dict:
        """Call /api/chat"""
        response = requests.post(
            f"{self.host}/api/chat",
            json=payload,
            timeout=payload.get("timeout", 300),
            stream=False
        )
        response.raise_for_status()
        return response.json()

    def chat_with_tools(self, payload: dict) -> dict:
        """Call /api/chat with tools (streaming or non-streaming)"""
        import json

        # Check if streaming is requested
        stream = payload.get("stream", False)

        response = requests.post(
            f"{self.host}/api/chat",
            json=payload,
            timeout=payload.get("timeout", 300),
            stream=stream
        )
        response.raise_for_status()

        if not stream:
            # Non-streaming: return response directly
            return response.json()

        # Streaming: accumulate response
        full_response = {"message": {"role": "assistant", "content": "", "tool_calls": []}}

        for line in response.iter_lines():
            if line:
                chunk = json.loads(line)

                if "message" in chunk:
                    msg = chunk["message"]
                    # Preserve role from first chunk
                    if "role" in msg and not full_response["message"].get("role"):
                        full_response["message"]["role"] = msg["role"]
                    if "content" in msg:
                        full_response["message"]["content"] += msg["content"]
                    if "tool_calls" in msg:
                        full_response["message"]["tool_calls"].extend(msg["tool_calls"])

                if chunk.get("done"):
                    full_response["done"] = True
                    # Copy any additional fields from final chunk
                    for key in chunk:
                        if key not in ("message", "done"):
                            full_response[key] = chunk[key]
                    break

        # Ensure role is set
        if "role" not in full_response["message"]:
            full_response["message"]["role"] = "assistant"

        return full_response


def main():
    """Main entry point for the worker"""
    print("Starting Ollama Queue Worker...")

    # Initialize queue and client
    queue = OllamaQueue()
    client = OllamaClient()

    # Cleanup old requests on startup
    queue.cleanup_old_requests(max_age_seconds=3600)

    # Start processing
    try:
        queue.start_worker(client)
    except KeyboardInterrupt:
        print("\nShutting down gracefully...")
        queue.running = False

    return 0


if __name__ == "__main__":
    sys.exit(main())
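For chat_with_tools, the payload carries an Ollama-style tools array alongside the messages. A hedged sketch of the shape the worker forwards to /api/chat (the check_disk tool here is hypothetical, invented for illustration):

payload = {
    "model": "gpt-oss:latest",
    "stream": True,  # the worker accumulates content and tool_calls chunk by chunk
    "messages": [{"role": "user", "content": "How full is the root disk?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "check_disk",  # hypothetical tool name
            "description": "Report disk usage for a mountpoint",
            "parameters": {
                "type": "object",
                "properties": {"mountpoint": {"type": "string"}},
                "required": ["mountpoint"],
            },
        },
    }],
}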
1053
orchestrator.py
Normal file
File diff suppressed because it is too large
263
remote_monitor.py
Normal file
@@ -0,0 +1,263 @@
#!/usr/bin/env python3
"""
Remote Monitor - Collect system health data from remote NixOS systems via SSH
"""

import json
import subprocess
from typing import Dict, Any, Optional
from pathlib import Path


class RemoteMonitor:
    """Monitor remote systems via SSH"""

    def __init__(self, hostname: str, ssh_user: str = "root"):
        """
        Initialize remote monitor

        Args:
            hostname: Remote hostname or IP
            ssh_user: SSH user (default: root for NixOS remote builds)
        """
        self.hostname = hostname
        self.ssh_user = ssh_user
        self.ssh_target = f"{ssh_user}@{hostname}"

    def _run_remote_command(self, command: str, timeout: int = 30) -> tuple[bool, str, str]:
        """
        Run a command on the remote system via SSH

        Args:
            command: Command to run
            timeout: Timeout in seconds

        Returns:
            (success, stdout, stderr)
        """
        try:
            # Use sudo to run SSH as root (which has the keys)
            ssh_cmd = [
                "sudo", "ssh",
                "-o", "StrictHostKeyChecking=no",
                "-o", "ConnectTimeout=10",
                self.ssh_target,
                command
            ]

            result = subprocess.run(
                ssh_cmd,
                capture_output=True,
                text=True,
                timeout=timeout
            )

            return (
                result.returncode == 0,
                result.stdout.strip(),
                result.stderr.strip()
            )

        except subprocess.TimeoutExpired:
            return False, "", f"Command timed out after {timeout}s"
        except Exception as e:
            return False, "", str(e)

    def check_connectivity(self) -> bool:
        """Check if we can connect to the remote system"""
        success, _, _ = self._run_remote_command("echo 'ping'")
        return success

    def collect_resources(self) -> Dict[str, Any]:
        """Collect CPU, memory, and load average"""
        success, output, error = self._run_remote_command("""
python3 -c "
import psutil, json
print(json.dumps({
    'cpu_percent': psutil.cpu_percent(interval=1),
    'memory_percent': psutil.virtual_memory().percent,
    'load_average': {
        '1min': psutil.getloadavg()[0],
        '5min': psutil.getloadavg()[1],
        '15min': psutil.getloadavg()[2]
    }
}))
"
""")

        if success:
            try:
                return json.loads(output)
            except json.JSONDecodeError:
                return {}
        return {}

    def collect_systemd_status(self) -> Dict[str, Any]:
        """Collect systemd service status"""
        success, output, error = self._run_remote_command(
            "systemctl list-units --failed --no-pager --no-legend --output=json"
        )

        if success:
            try:
                failed_services = json.loads(output) if output else []
                return {
                    "failed_count": len(failed_services),
                    "failed_services": failed_services
                }
            except json.JSONDecodeError:
                pass

        return {"failed_count": 0, "failed_services": []}

    def collect_disk_usage(self) -> Dict[str, Any]:
        """Collect disk usage information"""
        success, output, error = self._run_remote_command("""
python3 -c "
import psutil, json
partitions = []
for part in psutil.disk_partitions():
    try:
        usage = psutil.disk_usage(part.mountpoint)
        partitions.append({
            'device': part.device,
            'mountpoint': part.mountpoint,
            'fstype': part.fstype,
            'total': usage.total,
            'used': usage.used,
            'free': usage.free,
            'percent_used': usage.percent
        })
    except Exception:
        pass
print(json.dumps({'partitions': partitions}))
"
""")

        if success:
            try:
                return json.loads(output)
            except json.JSONDecodeError:
                return {"partitions": []}
        return {"partitions": []}

    def collect_network_status(self) -> Dict[str, Any]:
        """Check network connectivity"""
        # SSH reaching the host proves the local path; pinging 8.8.8.8 checks internet access
        success, _, _ = self._run_remote_command("ping -c 1 -W 2 8.8.8.8")

        return {
            "internet_reachable": success
        }

    def collect_log_errors(self) -> Dict[str, Any]:
        """Collect recent error logs"""
        success, output, error = self._run_remote_command(
            "journalctl --priority=err --since='1 hour ago' --output=json --no-pager | wc -l"
        )

        error_count = 0
        if success:
            try:
                error_count = int(output)
            except ValueError:
                pass

        return {
            "error_count_1h": error_count,
            "recent_errors": []  # Could expand this later
        }

    def collect_all(self) -> Dict[str, Any]:
        """Collect all monitoring data from remote system"""

        # First check if we can connect
        if not self.check_connectivity():
            return {
                "hostname": self.hostname,
                "reachable": False,
                "error": "Unable to connect via SSH"
            }

        return {
            "hostname": self.hostname,
            "reachable": True,
            "resources": self.collect_resources(),
            "systemd": self.collect_systemd_status(),
            "disk": self.collect_disk_usage(),
            "network": self.collect_network_status(),
            "logs": self.collect_log_errors(),
        }

    def get_summary(self, data: Dict[str, Any]) -> str:
        """Generate human-readable summary of remote system health"""
        if not data.get("reachable", False):
            return f"❌ {self.hostname}: Unreachable - {data.get('error', 'Unknown error')}"

        lines = [f"System: {self.hostname}"]

        # Resources
        res = data.get("resources", {})
        if res:
            lines.append(
                f"Resources: CPU {res.get('cpu_percent', 0):.1f}%, "
                f"Memory {res.get('memory_percent', 0):.1f}%, "
                f"Load {res.get('load_average', {}).get('1min', 0):.2f}"
            )

        # Disk
        disk = data.get("disk", {})
        max_usage = 0
        for part in disk.get("partitions", []):
            if part.get("mountpoint") == "/":
                max_usage = part.get("percent_used", 0)
                break
        if max_usage > 0:
            lines.append(f"Disk: {max_usage:.1f}% used (/ partition)")

        # Services
        systemd = data.get("systemd", {})
        failed_count = systemd.get("failed_count", 0)
        if failed_count > 0:
            lines.append(f"Services: {failed_count} failed")
            for svc in systemd.get("failed_services", [])[:3]:
                lines.append(f"  - {svc.get('unit', 'unknown')}")
        else:
            lines.append("Services: All running")

        # Network
        net = data.get("network", {})
        if net.get("internet_reachable"):
            lines.append("Network: Internet reachable")
        else:
            lines.append("Network: ⚠️ No internet connectivity")

        # Logs
        logs = data.get("logs", {})
        error_count = logs.get("error_count_1h", 0)
        if error_count > 0:
            lines.append(f"Recent logs: {error_count} errors in last hour")

        return "\n".join(lines)


if __name__ == "__main__":
    import sys

    if len(sys.argv) < 2:
        print("Usage: remote_monitor.py <hostname>")
        print("Example: remote_monitor.py rhiannon")
        sys.exit(1)

    hostname = sys.argv[1]
    monitor = RemoteMonitor(hostname)

    print(f"Monitoring {hostname}...")
    data = monitor.collect_all()

    print("\n" + "="*60)
    print(monitor.get_summary(data))
    print("="*60)
    print("\nFull data:")
    print(json.dumps(data, indent=2))
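A sketch of a multi-host sweep built from this class (not part of the commit; the host list is the one named in DESIGN.md):

from remote_monitor import RemoteMonitor

for host in ["rhiannon", "alexander", "UCAR-Kinston"]:
    mon = RemoteMonitor(host)  # SSHes as root via sudo, per _run_remote_command
    print(mon.get_summary(mon.collect_all()))
    print("-" * 60)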
128
seed_knowledge.py
Normal file
@@ -0,0 +1,128 @@
#!/usr/bin/env python3
"""
Seed initial operational knowledge into Macha's knowledge base
"""

import sys
sys.path.insert(0, '.')

from context_db import ContextDatabase


def seed_knowledge():
    """Add foundational operational knowledge"""
    db = ContextDatabase()

    knowledge_items = [
        # nh command knowledge
        {
            "topic": "nh os switch",
            "knowledge": "NixOS rebuild command. Takes 1-5 minutes normally, up to 1 HOUR for major updates with many packages. DO NOT retry if slow - this is normal. Use -u flag to update flake inputs first. Can use --target-host and --hostname for remote deployment.",
            "category": "command",
            "source": "documentation",
            "confidence": "high",
            "tags": ["nixos", "rebuild", "deployment"]
        },
        {
            "topic": "nh os boot",
            "knowledge": "NixOS rebuild for next boot only. Safer than 'switch' for high-risk changes - allows easy rollback. After 'nh os boot', need to reboot for changes to take effect. Use -u to update flake inputs.",
            "category": "command",
            "source": "documentation",
            "confidence": "high",
            "tags": ["nixos", "rebuild", "safety"]
        },
        {
            "topic": "nh remote deployment",
            "knowledge": "Format: 'nh os switch -u --target-host=HOSTNAME --hostname=HOSTNAME'. Builds locally and deploys to remote. Much cleaner than SSH'ing to run commands. Uses root SSH keys for authentication.",
            "category": "command",
            "source": "documentation",
            "confidence": "high",
            "tags": ["nixos", "remote", "deployment"]
        },

        # Performance patterns
        {
            "topic": "build timeouts",
            "knowledge": "System rebuilds can take 1 hour or more. Never retry builds prematurely - multiple simultaneous builds corrupt the Nix cache. Default timeout is 3600 seconds (1 hour). Be patient!",
            "category": "performance",
            "source": "experience",
            "confidence": "high",
            "tags": ["builds", "timeouts", "patience"]
        },

        # Nix store maintenance
        {
            "topic": "nix-store repair",
            "knowledge": "Command: 'nix-store --verify --check-contents --repair'. Verifies and repairs Nix store integrity. WARNING: Can take HOURS on large stores. Only use when there's clear evidence of corruption (hash mismatches, sqlite errors). This is a LAST RESORT - most build failures are NOT corruption.",
            "category": "troubleshooting",
            "source": "documentation",
            "confidence": "high",
            "tags": ["nix-store", "repair", "corruption"]
        },
        {
            "topic": "nix cache corruption",
            "knowledge": "Caused by interrupted builds or multiple simultaneous builds. Symptoms: hash mismatches, sqlite errors, corrupt database. Solution: 'nix-store --verify --check-contents --repair' but this takes hours. Prevention: Never retry build commands, use proper timeouts.",
            "category": "troubleshooting",
            "source": "experience",
            "confidence": "high",
            "tags": ["nix-store", "corruption", "builds"]
        },

        # systemd-journal-remote
        {
            "topic": "systemd-journal-remote errors",
            "knowledge": "Common failure: missing output directory. systemd-journal-remote needs /var/log/journal/remote to exist with proper permissions (root:root, 755). Create it if missing, then restart the service.",
            "category": "troubleshooting",
            "source": "experience",
            "confidence": "medium",
            "tags": ["systemd", "journal", "logging"]
        },

        # SSH and remote access
        {
            "topic": "ssh-keygen",
            "knowledge": "Generate SSH keys: 'ssh-keygen -t ed25519 -N \"\" -f ~/.ssh/id_ed25519'. Creates public key at ~/.ssh/id_ed25519.pub and private key at ~/.ssh/id_ed25519. Use -N \"\" for no passphrase.",
            "category": "command",
            "source": "documentation",
            "confidence": "high",
            "tags": ["ssh", "keys", "authentication"]
        },

        # General patterns
        {
            "topic": "command retries",
            "knowledge": "NEVER automatically retry long-running commands like builds or system updates. If something times out, check if it's still running before retrying. Automatic retries can cause: corrupted state, wasted resources, conflicting operations.",
            "category": "pattern",
            "source": "experience",
            "confidence": "high",
            "tags": ["best-practices", "safety", "retries"]
        },
        {
            "topic": "conversation etiquette",
            "knowledge": "Social responses like 'thank you', 'thanks', 'ok', 'great', 'nice' are acknowledgments, NOT requests. When user thanks you or acknowledges completion, respond conversationally - DO NOT re-execute tools or commands.",
            "category": "pattern",
            "source": "documentation",
            "confidence": "high",
            "tags": ["conversation", "etiquette", "ui"]
        }
    ]

    print("Seeding knowledge base...")
    for item in knowledge_items:
        kid = db.store_knowledge(**item)
        if kid:
            print(f"  ✓ Added: {item['topic']}")
        else:
            print(f"  ✗ Failed: {item['topic']}")

    print(f"\nSeeded {len(knowledge_items)} knowledge items!")

    # List all topics
    print("\nAvailable knowledge topics:")
    topics = db.list_knowledge_topics()
    for topic in topics:
        print(f"  - {topic}")


if __name__ == "__main__":
    seed_knowledge()
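Extending the seed data follows the same dict shape. A sketch (not part of the commit; store_knowledge() belongs to ContextDatabase in context_db.py, elsewhere in this commit, and the item below is illustrative):

from context_db import ContextDatabase

db = ContextDatabase()
db.store_knowledge(
    topic="gotify priorities",
    knowledge="Macha's Gotify priority levels: 2=low, 5=medium, 8=high.",
    category="pattern",      # categories used above: command, performance, troubleshooting, pattern
    source="documentation",
    confidence="high",
    tags=["gotify", "notifications"],
)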
209
system_discovery.py
Normal file
@@ -0,0 +1,209 @@
#!/usr/bin/env python3
"""
System Discovery - Auto-discover and profile systems from journal logs
"""

import subprocess
import json
import re
from typing import Dict, List, Set, Optional, Any
from datetime import datetime
from pathlib import Path


class SystemDiscovery:
    """Discover and profile new systems appearing in logs"""

    def __init__(self, domain: str = "coven.systems"):
        self.domain = domain
        self.known_systems: Set[str] = set()

    def discover_from_journal(self, since_minutes: int = 10) -> List[str]:
        """Discover systems that have sent logs recently"""
        try:
            # Query systemd-journal-remote logs for remote hostnames
            result = subprocess.run(
                ["journalctl", "-u", "systemd-journal-remote.service",
                 f"--since={since_minutes} minutes ago", "--no-pager"],
                capture_output=True,
                text=True,
                timeout=30
            )

            # Also check journal for _HOSTNAME field (from remote logs)
            result2 = subprocess.run(
                ["journalctl", f"--since={since_minutes} minutes ago",
                 "-o", "json", "--no-pager"],
                capture_output=True,
                text=True,
                timeout=30
            )

            hostnames = set()

            # Parse JSON output for _HOSTNAME field
            for line in result2.stdout.split('\n'):
                if not line.strip():
                    continue
                try:
                    entry = json.loads(line)
                    hostname = entry.get('_HOSTNAME')
                    if hostname and hostname not in ['localhost', 'macha']:
                        # Convert short hostname to FQDN if needed
                        if '.' not in hostname:
                            hostname = f"{hostname}.{self.domain}"
                        hostnames.add(hostname)
                except Exception:
                    pass

            return list(hostnames)

        except Exception as e:
            print(f"Error discovering from journal: {e}")
            return []

    def detect_os_type(self, hostname: str) -> str:
        """Detect the operating system of a remote host via SSH"""
        try:
            # Try to detect OS via SSH
            result = subprocess.run(
                ["ssh", "-o", "ConnectTimeout=5", "-o", "StrictHostKeyChecking=no",
                 hostname, "cat /etc/os-release"],
                capture_output=True,
                text=True,
                timeout=10
            )

            if result.returncode == 0:
                os_release = result.stdout.lower()

                # Parse os-release
                if 'nixos' in os_release:
                    return 'nixos'
                elif 'ubuntu' in os_release:
                    return 'ubuntu'
                elif 'debian' in os_release:
                    return 'debian'
                elif 'arch' in os_release or 'manjaro' in os_release:
                    return 'arch'
                elif 'fedora' in os_release:
                    return 'fedora'
                elif 'centos' in os_release or 'rhel' in os_release:
                    return 'rhel'
                elif 'alpine' in os_release:
                    return 'alpine'

            # Try uname for other systems
            result = subprocess.run(
                ["ssh", "-o", "ConnectTimeout=5", "-o", "StrictHostKeyChecking=no",
                 hostname, "uname -s"],
                capture_output=True,
                text=True,
                timeout=10
            )

            if result.returncode == 0:
                uname = result.stdout.strip().lower()
                if 'darwin' in uname:
                    return 'macos'
                elif 'freebsd' in uname:
                    return 'freebsd'

            return 'linux'  # Generic fallback

        except Exception as e:
            print(f"Could not detect OS for {hostname}: {e}")
            return 'unknown'

    def profile_system(self, hostname: str, os_type: str) -> Dict[str, Any]:
        """Gather comprehensive information about a system"""
        profile = {
            'hostname': hostname,
            'os_type': os_type,
            'services': [],
            'capabilities': [],
            'hardware': {},
            'discovered_at': datetime.now().isoformat()
        }

        try:
            # Discover running services
            if os_type in ['nixos', 'ubuntu', 'debian', 'arch', 'fedora', 'rhel', 'alpine']:
                # Systemd-based systems
                result = subprocess.run(
                    ["ssh", "-o", "ConnectTimeout=5", hostname,
                     "systemctl list-units --type=service --state=running --no-pager --no-legend"],
                    capture_output=True,
                    text=True,
                    timeout=15
                )

                if result.returncode == 0:
                    for line in result.stdout.split('\n'):
                        if line.strip():
                            # Extract service name (first column)
                            service = line.split()[0]
                            if service.endswith('.service'):
                                service = service[:-8]  # Remove .service suffix
                            profile['services'].append(service)

            # Get hardware info
            result = subprocess.run(
                ["ssh", "-o", "ConnectTimeout=5", hostname,
                 "nproc && free -g | grep Mem | awk '{print $2}'"],
                capture_output=True,
                text=True,
                timeout=10
            )

            if result.returncode == 0:
                lines = result.stdout.strip().split('\n')
                if len(lines) >= 2:
                    profile['hardware']['cpu_cores'] = lines[0].strip()
                    profile['hardware']['memory_gb'] = lines[1].strip()

            # Detect capabilities based on services
            services_str = ' '.join(profile['services'])

            if 'docker' in services_str or 'containerd' in services_str:
                profile['capabilities'].append('containers')

            if 'nginx' in services_str or 'apache' in services_str or 'httpd' in services_str:
                profile['capabilities'].append('web-server')

            if 'postgresql' in services_str or 'mysql' in services_str or 'mariadb' in services_str:
                profile['capabilities'].append('database')

            if 'sshd' in services_str:
                profile['capabilities'].append('remote-access')

            # NixOS-specific: Check if it's in our flake
            if os_type == 'nixos':
                profile['capabilities'].append('nixos-managed')

        except Exception as e:
            print(f"Error profiling {hostname}: {e}")

        return profile

    def get_system_role(self, profile: Dict[str, Any]) -> str:
        """Determine system role based on profile"""
        capabilities = profile.get('capabilities', [])
        services = profile.get('services', [])

        # Check for specific roles
        if 'ai-inference' in capabilities or 'ollama' in services:
            return 'ai-workstation'
        elif 'web-server' in capabilities:
            return 'web-server'
        elif 'database' in capabilities:
            return 'database-server'
        elif 'containers' in capabilities:
            return 'container-host'
        elif len(services) > 20:
            return 'server'
        elif len(services) > 5:
            return 'workstation'
        else:
            return 'minimal'
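The class has no __main__ entry point; a sketch of how its methods chain into a full discovery pipeline (not part of the commit):

from system_discovery import SystemDiscovery

disc = SystemDiscovery(domain="coven.systems")
for host in disc.discover_from_journal(since_minutes=10):
    os_type = disc.detect_os_type(host)           # /etc/os-release, then uname -s fallback
    profile = disc.profile_system(host, os_type)  # services, hardware, capabilities
    print(host, os_type, disc.get_system_role(profile))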
131
system_prompt.txt
Normal file
@@ -0,0 +1,131 @@
You are Macha, an autonomous AI system maintenance agent running on NixOS.
|
||||||
|
|
||||||
|
IDENTITY:
|
||||||
|
- You are intelligent, careful, methodical, and motherly
|
||||||
|
- You have access to system monitoring data, configuration files, and investigation results
|
||||||
|
- You can propose fixes, but humans must approve risky changes
|
||||||
|
|
||||||
|
YOUR ARCHITECTURE:
|
||||||
|
- You run as a systemd service (macha-autonomous.service) on the macha.coven.systems host
|
||||||
|
- You are monitoring the SAME SYSTEM you are running on (macha.coven.systems)
|
||||||
|
- Your inference engine is Ollama, running locally at http://localhost:11434
|
||||||
|
- You are powered by the gpt-oss:latest language model (GPT-like open source model)
|
||||||
|
- Your database is ChromaDB, running at http://localhost:8000
|
||||||
|
- All your components (orchestrator, agent, ChromaDB, Ollama) run on the same machine
|
||||||
|
- You can investigate and fix issues with your own infrastructure
|
||||||
|
- Be aware: if you break the system, you break yourself
|
||||||
|
- SELF-DIAGNOSTIC: In chat mode, if your inference fails, you automatically diagnose:
|
||||||
|
* Ollama service status
|
||||||
|
* Memory usage
|
||||||
|
* Which models are loaded
|
||||||
|
* Recent Ollama logs
|
||||||
|
|
||||||
|
EXECUTION CONTEXT:
|
||||||
|
- In autonomous mode: You run as the 'macha' user (unprivileged, UID 2501)
|
||||||
|
- In chat mode: You run as the invoking user (usually has sudo access)
|
||||||
|
- IMPORTANT: You do NOT need to add 'sudo' to commands in chat mode
|
||||||
|
- The system automatically retries commands with sudo if permission is denied
|
||||||
|
- Just use the command directly: 'reboot', 'systemctl restart X', 'nh os switch', etc.
|
||||||
|
- The user will see a notification if the command was retried with elevated privileges
|
||||||
|
|
||||||
|
CONVERSATIONAL ETIQUETTE:
|
||||||
|
- Recognize social responses: "thank you", "thanks", "ok", "great", "nice" etc. are acknowledgments, NOT requests
|
||||||
|
- When the user thanks you or acknowledges completion, simply respond conversationally - DO NOT re-execute tools
|
||||||
|
- Only use tools when the user makes an actual request or asks a question requiring information
|
||||||
|
- If a task is complete and the user acknowledges it, the conversation is done - just say "You're welcome!" or similar
|
||||||
|
|
||||||
|
CORE PRINCIPLES:
|
||||||
|
1. CONSERVATIVE: When in doubt, investigate before acting
|
||||||
|
2. DECLARATIVE: Prefer NixOS configuration changes over imperative commands
|
||||||
|
3. SAFE: Never disable critical services (SSH, networking, systemd, boot)
|
||||||
|
4. INFORMED: Use previous investigation results to avoid repetition
|
||||||
|
5. CONTEXTUAL: Reference actual configuration files when available
|
||||||
|
|
||||||
|
RISK LEVELS:
|
||||||
|
- LOW: Investigation commands (systemctl status, journalctl, ls, cat, grep)
|
||||||
|
- MEDIUM: Service restarts, configuration changes, cleanup
|
||||||
|
- HIGH: System rebuilds, package changes, network reconfigurations
|
||||||
|
|
||||||
|
AUTO-APPROVAL:
|
||||||
|
- Low-risk investigation actions are automatically executed
|
||||||
|
- Medium/high-risk actions require human approval
|
||||||
|
|
||||||
|
CONFIGURATION:
|
||||||
|
- This system uses NixOS flakes for configuration management
|
||||||
|
- Config changes must specify the actual .nix file in the repository
|
||||||
|
- Example: autonomous/module.nix, apps/gotify.nix, or systems/macha.nix
|
||||||
|
- NEVER reference /etc/nixos/configuration.nix (this system doesn't use it)
|
||||||
|
- You cannot directly edit the flake, only suggest changes to get pushed to the repo
|
||||||
|
|
||||||
|
SYSTEM MANAGEMENT COMMANDS:
|
||||||
|
- CRITICAL: This system uses 'nh' (a modern nixos-rebuild wrapper) for all rebuilds
|
||||||
|
- 'nh' is a wrapper around nixos-rebuild that provides better UX and flake auto-detection
|
||||||
|
- The flake URL is auto-detected from programs.nh.flake (no need to specify it)
|
||||||
|
|
||||||
|
Available nh commands (USE THESE, NOT nixos-rebuild):
|
||||||
|
* 'nh os switch' - Rebuild and activate immediately (replaces: nixos-rebuild switch)
|
||||||
|
* 'nh os switch -u' - Update flake inputs first, then rebuild/activate
|
||||||
|
* 'nh os boot' - Rebuild for next boot only (replaces: nixos-rebuild boot)
|
||||||
|
* 'nh os test' - Activate temporarily without setting as default
|
||||||
|
|
||||||
|
MULTI-HOST MANAGEMENT:
|
||||||
|
You manage multiple hosts in the infrastructure. You have TWO tools for remote operations:
|
||||||
|
|
||||||
|
1. SSH - For diagnostics, monitoring, and status checks:
|
||||||
|
- You CAN and SHOULD use SSH to check other hosts
|
||||||
|
- Examples: 'ssh rhiannon systemctl status ollama', 'ssh alexander df -h'
|
||||||
|
- Commands are automatically run with sudo as the macha user
|
||||||
|
- Use for: checking services, reading logs, gathering metrics, quick diagnostics
|
||||||
|
- Hosts available: rhiannon, alexander, UCAR-Kinston, test-vm
|
||||||
|
|
||||||
|
2. nh remote deployment - For NixOS configuration changes:
|
||||||
|
- Format: 'nh os switch -u --target-host=HOSTNAME --hostname=HOSTNAME'
|
||||||
|
- Examples:
|
||||||
|
* 'nh os switch -u --target-host=rhiannon --hostname=rhiannon'
|
||||||
|
* 'nh os boot -u --target-host=alexander --hostname=alexander'
|
||||||
|
- Builds configuration locally, deploys to remote host
|
||||||
|
- Use for: permanent configuration changes, service updates, system modifications
|
||||||
|
|
||||||
|
When asked to check on another host, USE SSH. When asked to update configuration, use nh.
|
||||||
|
|

NOTIFICATIONS:
- You can send notifications to the user via Gotify using the send_notification tool
- Use notifications to inform the user about important events, especially when they're not actively chatting
- Notification priorities:
  * Priority 2 (Low): Informational updates, routine completions, FYI items
  * Priority 5 (Medium): Actions needing attention, warnings, manual approval requests
  * Priority 8 (High): Critical issues, service failures, urgent problems requiring immediate attention
- When to send notifications:
  * Critical issues detected (priority 8)
  * Service failures or degraded states (priority 8)
  * Actions queued for manual approval (priority 5)
  * Successful completion of important actions (priority 2)
  * When the user explicitly asks for a notification
- Keep titles brief and messages clear and actionable
- Example: send_notification("Service Alert", "Ollama service crashed and was restarted", 8)
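A sketch of how a caller might consume the send_notification result (the result keys match the tools.py implementation below; the helper name is hypothetical):

```python
def notify_or_log(tools, title: str, message: str, priority: int = 5) -> None:
    """Send via the send_notification tool; fall back to a local log line on failure."""
    result = tools.send_notification(title, message, priority)
    if not result.get("success"):
        # The tool returns an 'error' and usually a 'hint' (e.g., check Gotify config).
        print(f"notification failed: {result.get('error')} ({result.get('hint')})")

# notify_or_log(tools, "Service Alert", "Ollama service crashed and was restarted", 8)
```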

PATIENCE WITH LONG-RUNNING OPERATIONS:
- System rebuilds take time: 1-5 minutes normally, up to 1 HOUR for major updates
- DO NOT retry build commands if they're taking a while - this is NORMAL
- Multiple simultaneous builds will corrupt the Nix cache
- If a build times out, check if it's still running before retrying (see the sketch below)
- Default timeout is 1 hour (3600 seconds) - this is appropriate for most operations
- Trust the timeout - if a command is still running, it will complete or fail on its own

NIX STORE MAINTENANCE:
- If builds fail with corruption errors, use: 'nix-store --verify --check-contents --repair'
- This command verifies and repairs the Nix store integrity
- WARNING: Store repair can take a LONG time (potentially hours on large stores)
- Only run store repair when there's clear evidence of corruption (e.g., hash mismatches, sqlite errors)
- Store repair is a last resort - most build failures are NOT corruption
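One way to do the still-running check before retrying a timed-out build (a sketch under assumptions: builds were launched with a command line matching 'nh os', and pgrep is available):

```python
import subprocess

def build_still_running(pattern: str = "nh os") -> bool:
    """pgrep -f matches full command lines and exits 0 when a process is alive."""
    return subprocess.run(["pgrep", "-f", pattern], capture_output=True).returncode == 0

# if not build_still_running():
#     ...safe to retry the build...
```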

Risk-based command selection (see the mapping sketch below):
* HIGH-RISK changes: Use 'nh os boot' + 'reboot' (allows easy rollback)
* MEDIUM-RISK changes: Use 'nh os switch'
* LOW-RISK changes: Use 'nh os switch'

FORBIDDEN COMMANDS:
* NEVER suggest 'nixos-rebuild' - it doesn't know the flake path
* NEVER suggest 'nixos-rebuild switch --flake .#macha' - use 'nh os switch' instead
* NEVER suggest 'sudo nixos-rebuild' commands - nh handles privileges correctly
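The selection rule above is effectively a three-entry table (illustrative only):

```python
# Risk level -> rebuild strategy, per the rules above.
REBUILD_COMMANDS = {
    "low": ["nh os switch"],
    "medium": ["nh os switch"],
    "high": ["nh os boot", "reboot"],  # boot + reboot allows easy rollback
}
```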
705
tools.py
Normal file
@@ -0,0 +1,705 @@
#!/usr/bin/env python3
"""
Tool Definitions - Functions that the AI can call to interact with the system
"""

import subprocess
import json
import os
from typing import Dict, Any, List, Optional
from pathlib import Path


class SysadminTools:
    """Collection of tools for system administration tasks"""

    def __init__(self, safe_mode: bool = True):
        """
        Initialize sysadmin tools

        Args:
            safe_mode: If True, restricts dangerous operations
        """
        self.safe_mode = safe_mode
        self.allowed_commands = [
            'systemctl', 'journalctl', 'free', 'df', 'uptime',
            'ps', 'top', 'ip', 'ss', 'cat', 'ls', 'grep',
            'ping', 'dig', 'nslookup', 'curl', 'wget',
            'lscpu', 'lspci', 'lsblk', 'lshw', 'dmidecode',
            'ssh', 'scp',  # Remote access to other systems in infrastructure
            'nh', 'nixos-rebuild',  # NixOS system management
            'nix-shell',  # Hardware/GPU probes below run their tools via nix-shell
            'reboot', 'shutdown', 'poweroff',  # System power management
            'logger'  # Logging for notifications
        ]

    def get_tool_definitions(self) -> List[Dict[str, Any]]:
        """
        Return tool definitions in Ollama's format

        Returns:
            List of tool definitions with JSON schema
        """
        return [
            {
                "type": "function",
                "function": {
                    "name": "execute_command",
                    "description": "Execute a shell command on the system. Use this to run system commands, check status, or gather information. Returns command output.",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "command": {
                                "type": "string",
                                "description": "The shell command to execute (e.g., 'systemctl status ollama', 'df -h', 'journalctl -u myservice -n 20')"
                            },
                            "timeout": {
                                "type": "integer",
                                "description": "Command timeout in seconds (default: 3600). System rebuilds can take 1-5 minutes normally, up to 1 hour for major updates. Be patient!",
                                "default": 3600
                            }
                        },
                        "required": ["command"]
                    }
                }
            },
            {
                "type": "function",
                "function": {
                    "name": "read_file",
                    "description": "Read the contents of a file from the filesystem. Use this to inspect configuration files, logs, or other text files.",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "file_path": {
                                "type": "string",
                                "description": "Absolute path to the file to read (e.g., '/etc/nixos/configuration.nix', '/var/log/syslog')"
                            },
                            "max_lines": {
                                "type": "integer",
                                "description": "Maximum number of lines to read (default: 500)",
                                "default": 500
                            }
                        },
                        "required": ["file_path"]
                    }
                }
            },
            {
                "type": "function",
                "function": {
                    "name": "check_service_status",
                    "description": "Check the status of a systemd service. Returns whether the service is active, enabled, and recent log entries.",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "service_name": {
                                "type": "string",
                                "description": "Name of the systemd service (e.g., 'ollama.service', 'nginx', 'sshd')"
                            }
                        },
                        "required": ["service_name"]
                    }
                }
            },
            {
                "type": "function",
                "function": {
                    "name": "view_logs",
                    "description": "View systemd journal logs. Can filter by unit, time period, or priority.",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "unit": {
                                "type": "string",
                                "description": "Systemd unit name to filter logs (e.g., 'ollama.service')"
                            },
                            "lines": {
                                "type": "integer",
                                "description": "Number of recent log lines to return (default: 50)",
                                "default": 50
                            },
                            "priority": {
                                "type": "string",
                                "description": "Filter by priority: emerg, alert, crit, err, warning, notice, info, debug",
                                "enum": ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]
                            }
                        }
                    }
                }
            },
            {
                "type": "function",
                "function": {
                    "name": "get_system_metrics",
                    "description": "Get current system resource metrics including CPU, memory, disk, and load average.",
                    "parameters": {
                        "type": "object",
                        "properties": {}
                    }
                }
            },
            {
                "type": "function",
                "function": {
                    "name": "get_hardware_info",
                    "description": "Get detailed hardware information including CPU model, GPU, network interfaces, storage devices, and memory specs. Returns comprehensive hardware inventory.",
                    "parameters": {
                        "type": "object",
                        "properties": {}
                    }
                }
            },
            {
                "type": "function",
                "function": {
                    "name": "get_gpu_metrics",
                    "description": "Get GPU temperature, utilization, clock speeds, and power usage. Works with AMD and NVIDIA GPUs. Returns current GPU metrics.",
                    "parameters": {
                        "type": "object",
                        "properties": {}
                    }
                }
            },
            {
                "type": "function",
                "function": {
                    "name": "list_directory",
                    "description": "List contents of a directory. Returns file names, sizes, and permissions.",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "directory_path": {
                                "type": "string",
                                "description": "Absolute path to the directory (e.g., '/etc', '/var/log')"
                            },
                            "show_hidden": {
                                "type": "boolean",
                                "description": "Include hidden files (starting with dot)",
                                "default": False
                            }
                        },
                        "required": ["directory_path"]
                    }
                }
            },
            {
                "type": "function",
                "function": {
                    "name": "check_network",
                    "description": "Test network connectivity to a host. Can use ping or HTTP check.",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "host": {
                                "type": "string",
                                "description": "Hostname or IP address to check (e.g., 'google.com', '8.8.8.8')"
                            },
                            "method": {
                                "type": "string",
                                "description": "Test method to use",
                                "enum": ["ping", "http"],
                                "default": "ping"
                            }
                        },
                        "required": ["host"]
                    }
                }
            },
            {
                "type": "function",
                "function": {
                    "name": "retrieve_cached_output",
                    "description": "Retrieve full cached output from a previous tool call. Use this when you need to see complete data that was summarized earlier. The cache_id is shown in hierarchical summaries.",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "cache_id": {
                                "type": "string",
                                "description": "Cache ID from a previous tool summary (e.g., 'view_logs_20251006_103045')"
                            },
                            "max_chars": {
                                "type": "integer",
                                "description": "Maximum characters to return (default: 10000 for focused analysis)",
                                "default": 10000
                            }
                        },
                        "required": ["cache_id"]
                    }
                }
            },
            {
                "type": "function",
                "function": {
                    "name": "send_notification",
                    "description": "Send a notification to the user via Gotify. Use this to alert the user about important events, issues, or completed actions. Choose appropriate priority based on urgency.",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "title": {
                                "type": "string",
                                "description": "Notification title (brief, e.g., 'Service Alert', 'Action Complete')"
                            },
                            "message": {
                                "type": "string",
                                "description": "Notification message body (detailed description of the event)"
                            },
                            "priority": {
                                "type": "integer",
                                "description": "Priority level: 2=Low (info), 5=Medium (attention needed), 8=High (critical/urgent)",
                                "enum": [2, 5, 8],
                                "default": 5
                            }
                        },
                        "required": ["title", "message"]
                    }
                }
            }
        ]
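
    # Illustrative note (not part of the original file): these definitions are
    # the 'tools' payload for a tool-calling chat request, e.g. roughly
    #   ollama.chat(model=..., messages=..., tools=tools.get_tool_definitions())
    # The actual chat loop lives elsewhere in this repository.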

    def execute_command(self, command: str, timeout: int = 3600) -> Dict[str, Any]:
        """Execute a shell command safely (default timeout: 1 hour for system operations)"""
        # Safety check in safe mode
        if self.safe_mode:
            cmd_base = command.split()[0] if command.strip() else ""
            if cmd_base not in self.allowed_commands:
                return {
                    "success": False,
                    "error": f"Command '{cmd_base}' not in allowed list (safe mode enabled)",
                    "allowed_commands": self.allowed_commands
                }

        # Automatically configure SSH commands to use the macha user on remote systems
        # Transform: ssh hostname cmd -> ssh macha@hostname sudo cmd
        if command.strip().startswith('ssh ') and '@' not in command.split()[1]:
            parts = command.split(maxsplit=2)
            if len(parts) >= 2:
                hostname = parts[1]
                remaining = ' '.join(parts[2:]) if len(parts) > 2 else ''
                # If there's a command to run remotely, prefix it with sudo
                if remaining:
                    command = f"ssh macha@{hostname} sudo {remaining}".strip()
                else:
                    command = f"ssh macha@{hostname}".strip()

        try:
            result = subprocess.run(
                command,
                shell=True,
                capture_output=True,
                text=True,
                timeout=timeout
            )

            return {
                "success": result.returncode == 0,
                "exit_code": result.returncode,
                "stdout": result.stdout,
                "stderr": result.stderr,
                "command": command
            }
        except subprocess.TimeoutExpired:
            return {
                "success": False,
                "error": f"Command timed out after {timeout} seconds",
                "command": command
            }
        except Exception as e:
            return {
                "success": False,
                "error": str(e),
                "command": command
            }
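
    # Illustrative examples (comments only, not part of the original file):
    #   execute_command("df -h") returns a dict shaped like
    #     {"success": True, "exit_code": 0, "stdout": "...", "stderr": "", "command": "df -h"}
    #   and "ssh rhiannon df -h" is rewritten to "ssh macha@rhiannon sudo df -h"
    #   before execution, per the transform above.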

    def read_file(self, file_path: str, max_lines: int = 500) -> Dict[str, Any]:
        """Read a file safely"""
        try:
            path = Path(file_path)

            if not path.exists():
                return {
                    "success": False,
                    "error": f"File not found: {file_path}"
                }

            if not path.is_file():
                return {
                    "success": False,
                    "error": f"Not a file: {file_path}"
                }

            # Read file with line limit
            lines = []
            lines_read = 0
            with open(path, 'r', errors='replace') as f:
                for i, line in enumerate(f):
                    if i >= max_lines:
                        lines.append(f"\n... truncated after {max_lines} lines ...")
                        break
                    lines.append(line.rstrip('\n'))
                    lines_read += 1

            return {
                "success": True,
                "content": '\n'.join(lines),
                "path": file_path,
                # Count actual file lines, not the truncation marker
                "lines_read": lines_read
            }
        except PermissionError:
            return {
                "success": False,
                "error": f"Permission denied: {file_path}"
            }
        except Exception as e:
            return {
                "success": False,
                "error": str(e)
            }

    def check_service_status(self, service_name: str) -> Dict[str, Any]:
        """Check systemd service status"""
        # Ensure .service suffix
        if not service_name.endswith('.service'):
            service_name = f"{service_name}.service"

        # Get service status
        status_result = self.execute_command(f"systemctl status {service_name}")
        is_active_result = self.execute_command(f"systemctl is-active {service_name}")
        is_enabled_result = self.execute_command(f"systemctl is-enabled {service_name}")

        # Get recent logs
        logs_result = self.execute_command(f"journalctl -u {service_name} -n 10 --no-pager")

        return {
            "service": service_name,
            "active": is_active_result.get("stdout", "").strip() == "active",
            "enabled": is_enabled_result.get("stdout", "").strip() == "enabled",
            "status_output": status_result.get("stdout", ""),
            "recent_logs": logs_result.get("stdout", "")
        }

    def view_logs(
        self,
        unit: Optional[str] = None,
        lines: int = 50,
        priority: Optional[str] = None
    ) -> Dict[str, Any]:
        """View systemd journal logs"""
        cmd_parts = ["journalctl", "--no-pager"]

        if unit:
            cmd_parts.extend(["-u", unit])

        cmd_parts.extend(["-n", str(lines)])

        if priority:
            cmd_parts.extend(["-p", priority])

        command = " ".join(cmd_parts)
        result = self.execute_command(command)

        return {
            "logs": result.get("stdout", ""),
            "unit": unit,
            "lines": lines,
            "priority": priority
        }

    def get_system_metrics(self) -> Dict[str, Any]:
        """Get current system metrics"""
        # CPU and load
        uptime_result = self.execute_command("uptime")
        # Memory
        free_result = self.execute_command("free -h")
        # Disk
        df_result = self.execute_command("df -h")

        return {
            "uptime": uptime_result.get("stdout", ""),
            "memory": free_result.get("stdout", ""),
            "disk": df_result.get("stdout", "")
        }

    def get_hardware_info(self) -> Dict[str, Any]:
        """Get comprehensive hardware information"""
        hardware = {}

        # CPU info (use nix-shell for util-linux)
        cpu_result = self.execute_command("nix-shell -p util-linux --run lscpu")
        if cpu_result.get("success"):
            hardware["cpu"] = cpu_result.get("stdout", "")

        # Memory details
        mem_result = self.execute_command("free -h")
        if mem_result.get("success"):
            hardware["memory"] = mem_result.get("stdout", "")

        # GPU info (lspci for AMD/NVIDIA) - use nix-shell for pciutils
        gpu_result = self.execute_command("nix-shell -p pciutils --run \"lspci | grep -i 'vga\\|3d\\|display'\"")
        if gpu_result.get("success"):
            hardware["gpu"] = gpu_result.get("stdout", "")

        # Detailed GPU
        lspci_detailed = self.execute_command("nix-shell -p pciutils --run \"lspci -v | grep -A 20 -i 'vga\\|3d\\|display'\"")
        if lspci_detailed.get("success"):
            hardware["gpu_detailed"] = lspci_detailed.get("stdout", "")

        # Network interfaces
        net_result = self.execute_command("ip link show")
        if net_result.get("success"):
            hardware["network_interfaces"] = net_result.get("stdout", "")

        # Network addresses
        addr_result = self.execute_command("ip addr show")
        if addr_result.get("success"):
            hardware["network_addresses"] = addr_result.get("stdout", "")

        # Storage devices (use nix-shell for util-linux)
        storage_result = self.execute_command("nix-shell -p util-linux --run \"lsblk -o NAME,SIZE,TYPE,MOUNTPOINT,FSTYPE\"")
        if storage_result.get("success"):
            hardware["storage"] = storage_result.get("stdout", "")

        # PCI devices (comprehensive)
        pci_result = self.execute_command("nix-shell -p pciutils --run lspci")
        if pci_result.get("success"):
            hardware["pci_devices"] = pci_result.get("stdout", "")

        # USB devices
        usb_result = self.execute_command("nix-shell -p usbutils --run lsusb")
        if usb_result.get("success"):
            hardware["usb_devices"] = usb_result.get("stdout", "")

        # DMI/SMBIOS info (motherboard, system)
        dmi_result = self.execute_command("cat /sys/class/dmi/id/board_name /sys/class/dmi/id/board_vendor 2>/dev/null")
        if dmi_result.get("success"):
            hardware["motherboard"] = dmi_result.get("stdout", "")

        return hardware

    def get_gpu_metrics(self) -> Dict[str, Any]:
        """Get GPU metrics (temperature, utilization, clocks, power)"""
        metrics = {}

        # Try AMD GPU via sysfs (DRM/hwmon)
        try:
            # Find GPU hwmon directory
            import glob
            hwmon_dirs = glob.glob("/sys/class/drm/card*/device/hwmon/hwmon*")

            if hwmon_dirs:
                hwmon_path = hwmon_dirs[0]
                # The DRM device directory is two levels up from the hwmon dir
                # (e.g., .../device/hwmon/hwmon1 -> .../device)
                device_path = os.path.dirname(os.path.dirname(hwmon_path))
                amd_metrics = {}

                # Temperature (sysfs reports millidegrees Celsius)
                temp_files = glob.glob(f"{hwmon_path}/temp*_input")
                for temp_file in temp_files:
                    try:
                        with open(temp_file, 'r') as f:
                            temp_millidegrees = int(f.read().strip())
                            temp_celsius = temp_millidegrees / 1000
                            label = temp_file.split('/')[-1].replace('_input', '')
                            amd_metrics[f"{label}_celsius"] = temp_celsius
                    except (OSError, ValueError):
                        pass

                # GPU busy percent (utilization)
                gpu_busy_file = f"{device_path}/gpu_busy_percent"
                try:
                    with open(gpu_busy_file, 'r') as f:
                        amd_metrics["gpu_utilization_percent"] = int(f.read().strip())
                except (OSError, ValueError):
                    pass

                # Power usage (sysfs reports microwatts)
                power_files = glob.glob(f"{hwmon_path}/power*_average")
                for power_file in power_files:
                    try:
                        with open(power_file, 'r') as f:
                            power_microwatts = int(f.read().strip())
                            power_watts = power_microwatts / 1000000
                            amd_metrics["power_watts"] = power_watts
                    except (OSError, ValueError):
                        pass

                # Clock speeds
                sclk_file = f"{device_path}/pp_dpm_sclk"
                try:
                    with open(sclk_file, 'r') as f:
                        sclk_data = f.read()
                        amd_metrics["gpu_clocks"] = sclk_data.strip()
                except OSError:
                    pass

                if amd_metrics:
                    metrics["amd_gpu"] = amd_metrics
        except Exception as e:
            metrics["amd_sysfs_error"] = str(e)

        # Try rocm-smi for AMD
        rocm_result = self.execute_command("nix-shell -p rocmPackages.rocm-smi --run 'rocm-smi --showtemp --showuse --showpower'")
        if rocm_result.get("success"):
            metrics["rocm_smi"] = rocm_result.get("stdout", "")

        # Try nvidia-smi for NVIDIA
        nvidia_result = self.execute_command("nix-shell -p linuxPackages.nvidia_x11 --run 'nvidia-smi --query-gpu=temperature.gpu,utilization.gpu,power.draw,clocks.gr --format=csv'")
        if nvidia_result.get("success") and "NVIDIA" in nvidia_result.get("stdout", ""):
            metrics["nvidia_smi"] = nvidia_result.get("stdout", "")

        # Fallback: try sensors command
        if not metrics.get("amd_gpu") and not metrics.get("nvidia_smi"):
            sensors_result = self.execute_command("nix-shell -p lm_sensors --run sensors")
            if sensors_result.get("success"):
                metrics["sensors"] = sensors_result.get("stdout", "")

        return metrics

    def list_directory(
        self,
        directory_path: str,
        show_hidden: bool = False
    ) -> Dict[str, Any]:
        """List directory contents"""
        flags = "-lha" if show_hidden else "-lh"
        cmd = f"ls {flags} {directory_path}"

        result = self.execute_command(cmd)

        return {
            "success": result.get("success", False),
            "directory": directory_path,
            "listing": result.get("stdout", ""),
            "error": result.get("error")
        }

    def check_network(self, host: str, method: str = "ping") -> Dict[str, Any]:
        """Check network connectivity"""
        if method == "ping":
            cmd = f"ping -c 3 -W 2 {host}"
        elif method == "http":
            cmd = f"curl -I -m 5 {host}"
        else:
            return {
                "success": False,
                "error": f"Unknown method: {method}"
            }

        result = self.execute_command(cmd, timeout=10)

        return {
            "host": host,
            "method": method,
            "reachable": result.get("success", False),
            "output": result.get("stdout", ""),
            "error": result.get("stderr", "")
        }

    def retrieve_cached_output(self, cache_id: str, max_chars: int = 10000) -> Dict[str, Any]:
        """Retrieve full cached output from a previous tool call"""
        cache_dir = Path("/var/lib/macha/tool_cache")
        cache_file = cache_dir / f"{cache_id}.txt"

        if not cache_file.exists():
            return {
                "success": False,
                "error": f"Cache file not found: {cache_id}",
                "hint": "Check that the cache_id matches exactly what was shown in the summary"
            }

        try:
            content = cache_file.read_text()
            original_size = len(content)  # Remember the full size before truncating

            # Truncate if still too large for context
            if original_size > max_chars:
                half = max_chars // 2
                content = (
                    content[:half] +
                    f"\n... [SHOWING {max_chars} of {original_size} chars] ...\n" +
                    content[-half:]
                )

            return {
                "success": True,
                "cache_id": cache_id,
                "size": original_size,
                "content": content
            }
        except Exception as e:
            return {
                "success": False,
                "error": f"Failed to read cache: {str(e)}"
            }

    def send_notification(self, title: str, message: str, priority: int = 5) -> Dict[str, Any]:
        """Send a notification to the user via Gotify using the macha-notify command"""
        try:
            # Use the macha-notify command which handles Gotify integration
            result = subprocess.run(
                ['macha-notify', title, message, str(priority)],
                capture_output=True,
                text=True,
                timeout=10
            )

            if result.returncode == 0:
                return {
                    "success": True,
                    "title": title,
                    "message": message,
                    "priority": priority,
                    "output": result.stdout.strip() if result.stdout else "Notification sent successfully"
                }
            else:
                return {
                    "success": False,
                    "error": f"macha-notify failed: {result.stderr.strip() if result.stderr else 'Unknown error'}",
                    "hint": "Check if Gotify is configured (gotifyUrl and gotifyToken in module config)"
                }
        except FileNotFoundError:
            return {
                "success": False,
                "error": "macha-notify command not found",
                "hint": "This should not happen - macha-notify is installed by the module"
            }
        except subprocess.TimeoutExpired:
            return {
                "success": False,
                "error": "Notification send timeout (10s)"
            }
        except Exception as e:
            return {
                "success": False,
                "error": f"Unexpected error sending notification: {str(e)}"
            }

    def execute_tool(self, tool_name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
        """Execute a tool by name with given arguments"""
        tool_map = {
            "execute_command": self.execute_command,
            "read_file": self.read_file,
            "check_service_status": self.check_service_status,
            "view_logs": self.view_logs,
            "get_system_metrics": self.get_system_metrics,
            "get_hardware_info": self.get_hardware_info,
            "get_gpu_metrics": self.get_gpu_metrics,
            "list_directory": self.list_directory,
            "check_network": self.check_network,
            "retrieve_cached_output": self.retrieve_cached_output,
            "send_notification": self.send_notification
        }

        tool_func = tool_map.get(tool_name)
        if not tool_func:
            return {
                "success": False,
                "error": f"Unknown tool: {tool_name}"
            }

        try:
            return tool_func(**arguments)
        except Exception as e:
            return {
                "success": False,
                "error": f"Tool execution failed: {str(e)}",
                "tool": tool_name,
                "arguments": arguments
            }
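
    # Illustrative usage (comments only, not part of the original file):
    #   tools = SysadminTools(safe_mode=True)
    #   status = tools.execute_tool("check_service_status", {"service_name": "nginx"})
    #   if status.get("active"):
    #       ...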