Version: v3.3
Core Philosophy: "Verify First, Commit Later."
Cerebro is a state-monitoring backup engine for Linux environments. Unlike simple file replication scripts or standard tar cron jobs, Cerebro operates on a verification pipeline: it stages files, inspects content changes against the latest valid archive across multiple destinations, and only commits a new backup if actual content changes are detected.
- Zero External Dependencies (Offline-First): Cerebro operates entirely offline. It does not require an internet connection, Git authentication, cloud APIs, or third-party webhooks to track state changes, compute text diffs, maintain metadata, or create backups.
- Portable & Self-Contained: Because the engine runs entirely out of its own directory, it does not modify the global user environment, install system libraries, or rely on shared global state.
- Multi-Instance Isolation: You can run multiple instances of Cerebro in parallel by placing them in separate directories (e.g., configuring /opt/cerebro-configs/ and /opt/cerebro-db/ independently). By simply copying the folder structure and naming the script files uniquely, non-technical users can maintain isolated, specialized backup profiles without managing complex environment variables or shared system locks.
- Scalable Monitoring Overhead: The engine scales dynamically based on configuration. It can run as a zero-overhead, completely hands-off local backup cron job, or scale up to a high-frequency, multi-destination configuration auditor with pre- and post-run service execution hooks.
- Target Scenarios: Designed for environments (such as self-hosted containers, databases, and configuration directories) where you need to distinguish between critical configuration changes and routine runtime file updates.
Cerebro addresses standard backup limitations through three primary mechanisms:
Instead of relying solely on modification timestamps—which often trigger redundant backups for unchanged files (e.g. rotated log files or database touches)—Cerebro parses content. By combining [NOLOG] and [TARDISCARD] directives, Cerebro can detect that only temporary files or logs have changed, record the event in the log, and discard the redundant archive, reducing backup storage footprints.
Cerebro can log text changes directly to its execution log. By configuring [LOGDIFF] rules for text files (such as scripts, .env files, or configurations), Cerebro computes unified diffs between the previous backup and the active environment, writing the specific line modifications into cerebro.log. This provides a continuous audit history of configurations without requiring manual extraction of archives.
Cerebro supports writing backups to multiple storage locations ([DESTINATIONS]). Before a backup run, it scans all configured destinations to locate the most recent valid backup. After creating a new backup, it replicates the archive to all active destinations, maintaining target parity.
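The discovery step described above can be sketched roughly as follows. This is a minimal illustration, not Cerebro's actual code: the function name find_latest_valid_backup, the backup_*.tar.gz glob, and the 60-second limit are assumptions, and unlike Cerebro this sketch skips corrupted archives instead of deleting them.

```bash
# Hypothetical helper: print the newest intact backup found in the given directories.
find_latest_valid_backup() {
    local latest="" latest_name="" dir archive name
    for dir in "$@"; do
        [ -d "$dir" ] || continue              # offline or unmounted destination
        for archive in "$dir"/backup_*.tar.gz; do
            [ -f "$archive" ] || continue      # glob matched nothing
            # Integrity check: listing the archive must succeed within 60 seconds.
            timeout 60 tar -tzf "$archive" >/dev/null 2>&1 || continue
            name=${archive##*/}
            # Timestamped names sort chronologically, so string order is enough.
            if [[ -z "$latest_name" || "$name" > "$latest_name" ]]; then
                latest=$archive
                latest_name=$name
            fi
        done
    done
    [ -n "$latest" ] && printf '%s\n' "$latest"
}

# Usage (paths are examples only):
# find_latest_valid_backup /media/G-Drive/backup/cerebro/raspberrypi /mnt/nas/backup/cerebro/raspberrypi
```

Comparing file names rather than modification times keeps the result stable even when a copy on one destination was written later than the original.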
Cerebro follows a strict execution pipeline to ensure data integrity:
- Startup Configuration Guard: Validates configuration rules and verifies that all absolute paths defined in [CRUCIAL] are covered by a path in [INCLUDE]. If any crucial path is uncovered, aborts immediately.
- Staging: Creates a temporary, isolated environment in /tmp for comparison work.
- Latest Backup Discovery: Scans ALL [DESTINATIONS] to find the most recent valid backup (by timestamp), verifying backup integrity using timeout tar -tzf and automatically deleting corrupted archives.
- Comparison: Extracts the previous backup and performs a file-by-file diff against live data.
- Decision Matrix:
  - Crucial Deletion: If a file in the previous backup matches a pattern in [CRUCIAL] and is missing in the new backup, abort execution immediately with code 1, deleting the new tarball (post-backup hooks are executed).
  - No Change: Abort, delete staged tar, log "No changes detected."
  - Meaningful Change (RETAIN): If changes are detected in files not covered by [NOLOG], or matching [LOGDIFF] (where unified text diffs are written to cerebro.log), the backup is tagged as RETAIN.
  - Routine Change (DISCARD): If changes match ONLY [NOLOG] patterns, the backup is tagged as DISCARD:LATEST.
- Tagging & Metadata: Create entry in .tar_meta_data.txt with the backup tag (RETAIN or DISCARD:LATEST).
- Creation: Build the final tar.gz archive with all excludes applied.
- Verification: Run tar -tzf to ensure the archive is not corrupted.
- Distribution: Use rsync with timeout to copy the valid archive to all reachable [DESTINATIONS], falling back to direct copying via cp -ru if all rsync retry attempts fail.
- Cleanup: If [TARDISCARD] DISCARD=1, prune old DISCARD archives from the destinations. Prune log file if [LOGPRUNE] enabled.
- Self-Maintenance: Update cron job, remove temp files, release lock.
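The tagging part of the decision matrix can be illustrated with a small sketch. It is an assumption-laden simplification: classify_changes and the NO_CHANGE label are hypothetical names, the list of changed paths is taken as given, and the crucial-deletion check is left out.

```bash
# Hypothetical helper: classify a run from the list of changed paths.
# Usage: classify_changes "<newline-separated NOLOG globs>" changed_path...
classify_changes() {
    local nolog_patterns=$1 path pattern routine
    shift
    if [ "$#" -eq 0 ]; then
        echo "NO_CHANGE"
        return
    fi
    for path in "$@"; do
        routine=0
        while IFS= read -r pattern; do
            [ -n "$pattern" ] || continue
            # The unquoted right-hand side is treated as a glob pattern.
            if [[ "$path" == $pattern ]]; then
                routine=1
                break
            fi
        done <<< "$nolog_patterns"
        # A single change outside [NOLOG] makes the whole backup meaningful.
        if [ "$routine" -eq 0 ]; then
            echo "RETAIN"
            return
        fi
    done
    echo "DISCARD:LATEST"
}

nolog=$'/home/pi/Docker/pi-hole/data/*\n*.sqlite'
classify_changes "$nolog" /srv/app.sqlite /home/pi/Docker/docker-compose.yml
```

The example call prints RETAIN, because docker-compose.yml matches no [NOLOG] pattern.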
Cerebro is controlled entirely via cerebro.cfg. This file defines the "Intelligence" of the system.
Standard path definitions. Supports wildcards (*).
How it works:
- [INCLUDE]: Absolute paths to files or directories you want backed up.
- [EXCLUDE]: Patterns or paths to skip (logs, temp files, cache directories).
Built-in Protection:
- Automatic Backup Directory Exclusion: Cerebro automatically excludes its own backups/ folder to prevent recursive backup loops. You do NOT need to manually add /path/to/cerebro/backups to [EXCLUDE]; the script handles this internally regardless of where you install it.
Example:
[INCLUDE]
/home/pi/Docker
/home/pi/.config/important-app
/etc/nginx/nginx.conf
[EXCLUDE]
*.log
*.tmp
/home/pi/Docker/container-logs
Files matching [LOGDIFF] patterns get their CONTENT read and diffed.
- Use Case: Configuration files, scripts, docker-compose files - anything where you need to see WHAT changed.
- Behavior: When a file matching this pattern changes, Cerebro extracts the previous version, runs diff, and writes the added/removed lines directly into cerebro.log.
- Benefit: Your log becomes a version control system. You can see exactly when docker-compose.yml changed and what lines were modified without extracting any tar files.
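As a rough sketch of this behavior, the snippet below diffs a previous and a current copy of a file and prints only the changed lines. The function name log_text_diff and its output layout are assumptions for illustration; Cerebro's real log format is the one shown in this document.

```bash
# Hypothetical helper: print changed lines between two versions of a text file.
log_text_diff() {
    local old=$1 new=$2 label=$3
    # Identical files produce no log output at all.
    if diff -q "$old" "$new" >/dev/null; then
        return 0
    fi
    printf 'FILE: %s\n' "$label"
    # Skip the two "---"/"+++" header lines, then keep only added/removed lines.
    diff -u "$old" "$new" | tail -n +3 | grep -E '^[+-]'
}

# Usage (paths are examples only):
# log_text_diff /tmp/previous/docker-compose.yml /home/pi/Docker/docker-compose.yml \
#     /home/pi/Docker/docker-compose.yml >> cerebro.log
```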
What gets logged:
FILE: /home/pi/Docker/docker-compose.yml
DIFFERENCE: Content changed between old and new backup
< OLD LINE: image: nginx:1.20
> NEW LINE: image: nginx:1.21
< OLD LINE: - "8080:80"
> NEW LINE: - "8081:80"
Patterns:
[LOGDIFF]
*.yml
*.sh
*.conf
*.env
*.py
*.json
Files matching [NOLOG] patterns trigger backups but DON'T log content differences.
- Use Case: Binary files, databases, frequently changing data where the diff output would be useless noise.
- Behavior: Cerebro detects the file changed (via hash/size comparison) and triggers a backup, but it does NOT write the diff to the log file. Instead it just notes: "File X changed."
- Why? A 500MB .db file changing would create a 500MB diff in your log. This keeps logs readable while still capturing state changes.
Common Use Cases:
- SQLite databases (*.db)
- Application state files
- Binary configuration blobs
- Docker volume data that changes frequently but you don't need content-level visibility
Example:
[NOLOG]
/home/pi/Docker/pi-hole/data/*
/home/pi/Apps/app-state.db
*.sqlite
[TARDISCARD] prunes archives containing only noise-level changes (triggered by [NOLOG] updates).
- DISCARD=0: Disabled. Retains all generated backup archives.
- DISCARD=1: Enabled. Activates retention optimization logic.
- Meaningful Changes (RETAIN): If a backup contains modifications that do not match [NOLOG] patterns, or that match [LOGDIFF] patterns (unified diffs captured), it is preserved and tagged as RETAIN.
- Routine Changes Only (DISCARD): If a backup contains modifications matching only [NOLOG] rules, it is tagged as DISCARD:LATEST.
- Pruning Cycle: Upon a successful subsequent run, historical DISCARD archives are pruned, leaving only the single most recent archive tagged as DISCARD:LATEST alongside the preserved RETAIN archives. This ensures that long-term backup history is preserved for code and configuration changes while preventing database updates from consuming excessive storage.
Define multiple backup storage locations.
Cerebro treats this as an array of equals - no "primary" or "secondary". Before every run, it scans ALL destinations to find the latest backup across all of them.
Features:
- Automatic hostname appending: If you define /mnt/nas/backup/cerebro, Cerebro will actually write to /mnt/nas/backup/cerebro/hostname. This allows multiple machines to use the same NAS share without conflicts.
- Resilience: If one destination is offline (unmounted NAS), Cerebro continues with the available ones, using a non-blocking double-forked background monitor to check connection responsiveness.
- Sync logic: After creating a new backup, Cerebro attempts to copy it to ALL reachable destinations using rsync with timeout protection, falling back to direct copying via cp -ru if rsync fails.
Example:
[DESTINATIONS]
/media/G-Drive/backup/cerebro
/mnt/nas/backup/cerebro
/mnt/cloud-mount/backups
What happens:
- Machine hostname is raspberrypi
- Cerebro will write to:
  - /media/G-Drive/backup/cerebro/raspberrypi/backup_20240215_040000-1.tar.gz
  - /mnt/nas/backup/cerebro/raspberrypi/backup_20240215_040000-1.tar.gz
  - /mnt/cloud-mount/backups/raspberrypi/backup_20240215_040000-1.tar.gz
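The hostname appending and the "skip offline destinations" behavior can be approximated as below. Both function names are hypothetical, and the reachability test is a deliberately simple stand-in (a time-limited ls) for the double-forked monitor that Cerebro is described as using.

```bash
# Hypothetical helper: append the hostname to a configured destination path.
resolve_destination() {
    local base=${1%/} host=${2:-$(hostname)}
    printf '%s/%s\n' "$base" "$host"
}

# Hypothetical helper: succeed only if the directory answers within N seconds
# (default 5), so a hung network or FUSE mount cannot block the run.
destination_reachable() {
    timeout "${2:-5}" ls -d "$1" >/dev/null 2>&1
}

resolve_destination /mnt/nas/backup/cerebro raspberrypi
```

The example call prints /mnt/nas/backup/cerebro/raspberrypi.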
Control when Cerebro runs automatically.
[SCHEDULE]
cron=1
schedule=00 04 * * 1
- cron=0: Disable automatic scheduling. Run manually only.
- cron=1: Enable automatic scheduling.
- schedule=: Standard cron syntax. Use https://crontab.guru/ to generate.
Common schedules:
- 00 04 * * * - Daily at 4:00 AM
- 00 04 * * 1 - Weekly on Monday at 4:00 AM
- 00 */6 * * * - Every 6 hours
- */30 * * * * - Every 30 minutes (aggressive monitoring)
Important: Cron uses the system's local timezone. If your system timezone is UTC but you want backups at 4 AM local time, you need to convert:
# Check your timezone
timedatectl
# Or
date +%Z
How it works:
When you run ./cerebro.sh for the first time (or any time config changes), it automatically installs/updates the cron job. You never need to manually edit crontab.
Cron vs Manual backups:
- Cron backups are named with time-based suffixes: backup_20240215_040000-1.tar.gz (1 = 00:00-05:59, 2 = 06:00-11:59, etc.)
- Manual backups are named with letter suffixes: backup_20240215_153000-a.tar.gz, backup_20240215_154500-b.tar.gz (cycling a-z)
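The two suffix schemes reduce to a little arithmetic, sketched here under the assumption of four six-hour windows and a 26-letter cycle. The function names cron_suffix and next_manual_suffix are illustrative and not taken from cerebro.sh.

```bash
# Hypothetical helper: map an hour (00-23) to the cron suffix 1-4.
cron_suffix() {
    local hour=$1
    # 10# forces base 10 so "08" and "09" are not rejected as invalid octal.
    echo $(( 10#$hour / 6 + 1 ))
}

# Hypothetical helper: return the letter that follows the given one, wrapping z to a.
next_manual_suffix() {
    local letters=abcdefghijklmnopqrstuvwxyz current=$1 prefix idx
    prefix=${letters%%"$current"*}          # letters before the current one
    idx=$(( (${#prefix} + 1) % 26 ))
    echo "${letters:idx:1}"
}

cron_suffix 15
next_manual_suffix z
```

The two example calls print 3 and a.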
Configure copy/rsync retry attempts to destinations.
[TRANSFER]
RETRIES=1
TIMEOUT=300
RSYNC_TIMEOUT=30
- RETRIES: Sets the number of rsync attempts to each backup destination (defaults to 3 attempts if omitted). If all attempts fail, Cerebro automatically falls back to copying files directly using cp -ru.
- TIMEOUT: Sets the shell-level execution limit (in seconds) for each transfer command before it is force-killed. If omitted, defaults to 300 seconds (5 minutes).
- RSYNC_TIMEOUT: Sets the connection/data inactivity timeout (in seconds) for the internal rsync command. If omitted, defaults to 30 seconds.
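A minimal sketch of how these three settings could interact is shown below. The function name transfer_backup and the exact rsync options are assumptions; only the retry count, the outer timeout, rsync's own --timeout, and the cp -ru fallback come from the description above.

```bash
# Defaults mirror the documented fallbacks when a key is omitted.
RETRIES=${RETRIES:-3}
TIMEOUT=${TIMEOUT:-300}
RSYNC_TIMEOUT=${RSYNC_TIMEOUT:-30}

# Hypothetical helper: copy one archive into a destination directory.
transfer_backup() {
    local src=$1 dest_dir=$2 attempt
    for (( attempt = 1; attempt <= RETRIES; attempt++ )); do
        # Outer limit kills a hung transfer; --timeout covers I/O inactivity.
        if timeout "$TIMEOUT" rsync -a --timeout="$RSYNC_TIMEOUT" "$src" "$dest_dir/"; then
            echo "copied via rsync (attempt $attempt)"
            return 0
        fi
    done
    # Fallback once every rsync attempt has failed (or rsync is not installed).
    if timeout "$TIMEOUT" cp -ru "$src" "$dest_dir/"; then
        echo "copied via cp"
        return 0
    fi
    echo "transfer failed" >&2
    return 1
}

# Usage (paths are examples only):
# transfer_backup backups/backup_20240215_040000-1.tar.gz /mnt/nas/backup/cerebro/raspberrypi
```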
Control what gets written to cerebro.log.
[LOGTYPE]
DEBUG=0
INFO=1
NOFIRSTRUN=0
- DEBUG=1: Ultra-verbose. Logs every file being processed, timing info, decision trees. Use for troubleshooting.
- DEBUG=0: Production mode. Only logs significant events.
- INFO=1: Log normal operational messages (backup created, files transferred, etc.)
- INFO=0: Silent mode. Only log errors and file differences.
- NOFIRSTRUN=1: Suppress the "First run, no previous backup to compare" message. Useful if you're running Cerebro on a new system and don't want log noise.
Recommended settings:
- Development/Testing: DEBUG=1, INFO=1
- Production: DEBUG=0, INFO=1
- High-frequency monitoring: DEBUG=0, INFO=0 (only log real changes)
Automatically clean old entries from cerebro.log without losing configuration audit history.
[LOGPRUNE]
ENABLED=1
DISCARD_MAX_AGE_DAYS=1
- ENABLED=1: Active. Cerebro will prune log entries.
- DISCARD_MAX_AGE_DAYS=1: Defines the age threshold (in days) for pruning routine logs.
Smart Retention Logic: Unlike standard log rotation tools that truncate the entire file or delete all historical records, Cerebro uses an intelligent parser:
- Pruned: Only standard run blocks, runs with no changes, and runs containing ONLY DISCARD-tagged changes (such as routine database or volume updates) are pruned once they exceed the maximum age.
- Preserved: All run blocks containing unified text diffs and configuration changes (tagged as RETAIN) are preserved forever. This ensures that your system config audit history is never lost while still keeping disk usage under control.
Why this exists: If you run Cerebro frequently with verbose settings, the log file will grow rapidly. This feature prunes the routine change noise while maintaining a permanent version history of configurations.
Recommendation:
- High-frequency backups (< 1 hour): DISCARD_MAX_AGE_DAYS=1
- Daily backups: DISCARD_MAX_AGE_DAYS=30
- Weekly backups: DISCARD_MAX_AGE_DAYS=365
Execute custom scripts or commands before and after the backup process.
This section allows you to orchestrate system state changes around your backup execution. It is particularly powerful for ensuring application consistency (e.g. freezing databases or stopping containers) before archiving files.
[HOOKS]
# Command to run before backup creation
PRE_BACKUP_CMD=docker-compose -f /home/pi/Docker/docker-compose.yml down
# Command to run after backup completes (or on script termination)
POST_BACKUP_CMD=docker-compose -f /home/pi/Docker/docker-compose.yml up -d
- PRE_BACKUP_CMD (Pre-Backup Hook):
  - Runs immediately before the backup creation starts.
  - Fail-Fast Design: If the command fails (returns a non-zero exit status), Cerebro logs the failure and aborts immediately without modifying your existing backups or staging files.
- POST_BACKUP_CMD (Post-Backup Hook):
  - Runs when the script exits.
  - Guaranteed Execution: This command is bound to the script's EXIT trap. Even if the backup fails, the connection check times out, or the script is terminated by a catchable signal, the post-backup hook still runs. This ensures services (like Docker stacks or databases) are not left stopped or frozen. The one exception is a termination that no trap can intercept, such as SIGKILL or power loss.
- Database Consistent Backups: Lock database tables or dump a transaction log before archiving, then unlock/resume the database.
- Docker Container Backups: Stop active containers (docker stop or docker-compose down) to release file locks on volumes, then restart them (docker start or docker-compose up -d).
- Notifications: Send a web-hook or email alert on backup completion.
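The hook pattern itself is plain Bash and can be sketched as follows. This is an illustration under assumptions: run_with_hooks is a hypothetical name, the work runs in a subshell so the trap stays local, and the trap is armed before the pre-hook so the post-hook also runs when the pre-hook fails (the ordering inside cerebro.sh may differ).

```bash
# Hypothetical helper: run a backup command between a pre-hook and a post-hook.
run_with_hooks() {
    local pre_cmd=$1 post_cmd=$2 backup_cmd=$3
    (
        # The EXIT trap fires however this subshell ends, including on errors.
        trap 'eval "$post_cmd"' EXIT
        # Fail fast: a failing pre-hook aborts before any backup work starts.
        if ! eval "$pre_cmd"; then
            echo "pre-backup hook failed, aborting" >&2
            exit 1
        fi
        eval "$backup_cmd"
    )
}

run_with_hooks 'echo "stopping services"' 'echo "starting services"' 'echo "creating backup"'
```

The example prints the three messages in the order stop, backup, start. Replacing the backup command with one that fails still prints the start message, which is the property the post-hook relies on.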
Define files and folders that are absolutely essential and must never be lost.
This section acts as a fail-fast safety check to protect your backup timeline from being corrupted if critical system or application configurations are accidentally deleted on the live system.
[CRUCIAL]
# Exact files that must not be deleted
/home/user/.ssh/id_rsa
/etc/nginx/nginx.conf
/home/user/.config/app/config.ini
# Directory contents (ensures the directory is not empty)
/home/user/projects/critical-app/*
- Startup Configuration Guard: Before running the backup, Cerebro verifies that every path defined under [CRUCIAL] is covered by a path in [INCLUDE]. If a crucial path is not included in the backup, the run halts immediately with a config error.
- Deletion Abort Check: During the comparison phase (compare_tars), Cerebro compares the previous backup with the newly created backup. If any file present in the previous backup matches a pattern in [CRUCIAL] but is missing in the new backup, the run is aborted:
  - The newly created, incomplete backup tarball is deleted.
  - The backup run exits with code 1.
  - The global EXIT trap executes the POST_BACKUP_CMD hook to ensure your system services are restarted.
- Intentional Deletions: If you intentionally delete a crucial file, the backup will fail until you remove its pattern or path from the [CRUCIAL] section in cerebro.cfg.
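The startup guard amounts to a prefix check, sketched below. The function name crucial_is_covered is hypothetical, and the sketch assumes [INCLUDE] entries are plain absolute paths; wildcard entries in [INCLUDE] are not handled.

```bash
# Hypothetical helper: succeed if a [CRUCIAL] path lies inside any [INCLUDE] path.
# Usage: crucial_is_covered <crucial_path> <include_path>...
crucial_is_covered() {
    local crucial=${1%/\*} include      # "/dir/*" is checked as "/dir"
    shift
    for include in "$@"; do
        include=${include%/}
        # Covered when it is the include path itself or sits below it.
        if [[ "$crucial" == "$include" || "$crucial" == "$include"/* ]]; then
            return 0
        fi
    done
    return 1
}

if ! crucial_is_covered /home/user/.ssh/id_rsa /etc /home/pi; then
    echo "config error: /home/user/.ssh/id_rsa is not covered by [INCLUDE]"
fi
```

Matching on "$include"/* rather than a bare prefix avoids treating /etcetera as covered by /etc.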
Cerebro maintains a hidden metadata file at $SCRIPT_DIR/assets/.tar_meta_data.txt. This file tracks every backup and its classification.
Format:
backup_20260215_040000-1.tar.gz:RETAIN:FIRST
backup_20260215_100000-2.tar.gz:RETAIN
backup_20260215_160000-3.tar.gz:DISCARD
backup_20260215_220000-4.tar.gz:DISCARD:LATEST
Tags:
- RETAIN: This backup contains meaningful changes (e.g. file content differences matching [LOGDIFF], new files, or non-excluded deleted files). Always kept.
- RETAIN:FIRST: The first backup created on a system (which has no previous backup to compare against). Always kept.
- DISCARD: This backup contains ONLY changes that matched [NOLOG] patterns. Eligible for deletion.
- DISCARD:LATEST: The most recent DISCARD backup. Protected until a newer backup is created.
TARDISCARD Logic:
Current state:
backup_001.tar.gz:RETAIN
backup_002.tar.gz:DISCARD
backup_003.tar.gz:DISCARD
backup_004.tar.gz:DISCARD:LATEST
New backup created (backup_005.tar.gz):
- If it's RETAIN → Keep it, convert current DISCARD:LATEST to DISCARD, and clean up older DISCARD backups.
- If it's DISCARD → Promote backup_005 to DISCARD:LATEST, and delete backup_002 and backup_003.
Result (if backup_005 is DISCARD):
backup_001.tar.gz:RETAIN
backup_004.tar.gz:DISCARD
backup_005.tar.gz:DISCARD:LATEST
This ensures you always have:
- All meaningful configuration and code history (RETAIN files)
- The two most recent backup states (current + previous)
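The metadata side of this rotation can be reproduced with a few lines of awk. This is a sketch under assumptions: update_metadata is a hypothetical name, it only rewrites the metadata file (deleting the pruned archives from the destinations is not shown), and it models the DISCARD=1 behavior described above.

```bash
# Hypothetical helper: record a new backup and rotate the DISCARD tags.
# Usage: update_metadata <meta_file> <backup_name> <RETAIN|DISCARD>
update_metadata() {
    local meta_file=$1 new_name=$2 new_tag=$3 tmp
    tmp=$(mktemp)
    awk -F: '
        # Older DISCARD entries are pruned.
        $2 == "DISCARD" && $3 == ""       { next }
        # The previous DISCARD:LATEST is demoted to plain DISCARD.
        $2 == "DISCARD" && $3 == "LATEST" { print $1 ":DISCARD"; next }
        # RETAIN and RETAIN:FIRST entries pass through unchanged.
        { print }
    ' "$meta_file" > "$tmp"
    if [ "$new_tag" = "DISCARD" ]; then
        new_tag="DISCARD:LATEST"
    fi
    printf '%s:%s\n' "$new_name" "$new_tag" >> "$tmp"
    mv "$tmp" "$meta_file"
}

# Usage (file name is an example only):
# update_metadata assets/.tar_meta_data.txt backup_005.tar.gz DISCARD
```

Run against the "Current state" listing above with backup_005 as DISCARD, the file ends up with exactly the three entries shown in the result.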
Setup:
[INCLUDE]
/home/pi/Docker
[EXCLUDE]
*.log
*.tmp
[LOGDIFF]
*.yml
*.sh
*.conf
[NOLOG]
/home/pi/Docker/*/data/*
[SCHEDULE]
cron=1
schedule=00 04 * * *
[TARDISCARD]
DISCARD=1
Result:
- If no changes are detected, the backup is skipped.
- If a configuration file changes, a backup is created, tagged as RETAIN, textual differences are logged, and the archive is synced to the NAS.
- If only database files or volume data change, a backup is created but tagged as DISCARD:LATEST, which will be cleaned up on the subsequent run to minimize storage usage.
Setup:
[INCLUDE]
/home/user/scripts
/home/user/projects
[LOGDIFF]
*.sh
*.py
*.js
*.json
*.md
[SCHEDULE]
cron=1
schedule=*/30 * * * *
[TARDISCARD]
DISCARD=0
Result: Cerebro runs every 30 minutes.
- Creates a backup only when changes are detected in scripts or codebase.
- Writes unified code differences directly to cerebro.log.
- Retains all historical backups for granular version tracking.
Setup:
[INCLUDE]
/etc
/var/www
/opt/production-app
[LOGDIFF]
*.conf
*.ini
*.yml
[DESTINATIONS]
/mnt/local-raid/backup
/mnt/nas/backup
/mnt/cloud-sync/backup
[SCHEDULE]
cron=1
schedule=00 */4 * * *
[TARDISCARD]
DISCARD=1
[LOGPRUNE]
ENABLED=1
DISCARD_MAX_AGE_DAYS=7
Result: Cerebro runs every 4 hours.
- Replicates backup archives across all configured locations (RAID, NAS, Cloud).
- If one destination goes offline, Cerebro copies to the remaining active ones.
- Log pruning automatically trims cerebro.log to keep its size managed.
Setup:
[INCLUDE]
/home/pi/.ssh
/etc/passwd
/etc/shadow
/etc/sudoers
/var/log/auth.log
[LOGDIFF]
*
[SCHEDULE]
cron=1
schedule=*/5 * * * *
[LOGTYPE]
DEBUG=1
INFO=1
[TARDISCARD]
DISCARD=0
Result: Cerebro runs every 5 minutes.
- Monitors critical system configuration directories.
- Logs immediate warnings and line-level diffs on configuration changes.
- Retains all archives to maintain a complete history of system changes.
Example cerebro.log output after a run where docker-compose.yml changed:
2026-02-15 04:00:01 - [INFO] ========== BACKUP RUN STARTED ==========
2026-02-15 04:00:01 - [INFO] Run Type: cron
2026-02-15 04:00:01 - [INFO] Version: v3.3
2026-02-15 04:00:01 - [INFO] Backup Name: backup_20260215_040001-1.tar.gz
2026-02-15 04:00:01 - [Sync Destinations] Synced backups from /media/G-Drive/backup/cerebro/raspberrypi to all destinations.
2026-02-15 04:00:02 - [Backup Creation] Starting backup creation...
2026-02-15 04:00:05 - [Backup Creation] Tar file created: /home/pi/Apps/cerebro/backups/backup_20260215_040001-1.tar.gz
2026-02-15 04:00:10 - [Comparison] [MOD] /home/pi/Docker/docker-compose.yml
2026-02-15 04:00:10 - [Comparison] [-] 'image: nginx:1.20'
2026-02-15 04:00:10 - [Comparison] [+] 'image: nginx:1.21'
2026-02-15 04:00:10 - [Comparison] [-] '- "8080:80"'
2026-02-15 04:00:10 - [Comparison] [+] '- "8081:80"'
2026-02-15 04:00:11 - [Comparison] [META] Backup tagged as: RETAIN
2026-02-15 04:00:11 - [Comparison] [META] Changes detected between:
2026-02-15 04:00:11 - [Comparison] [META] New: /home/pi/Apps/cerebro/backups/backup_20260215_040001-1.tar.gz
2026-02-15 04:00:11 - [Comparison] [META] Previous: /mnt/nas/backup/cerebro/raspberrypi/backup_20260214_040000-1.tar.gz
2026-02-15 04:00:12 - [Verification] Tar file verified successfully.
2026-02-15 04:00:12 - [Verification] Backup created: /home/pi/Apps/cerebro/backups/backup_20260215_040001-1.tar.gz
2026-02-15 04:00:15 - [Transfer] Backup copied to /media/G-Drive/backup/cerebro/raspberrypi/backup_20260215_040001-1.tar.gz via rsync.
2026-02-15 04:00:20 - [Transfer] Backup copied to /mnt/nas/backup/cerebro/raspberrypi/backup_20260215_040001-1.tar.gz via rsync.
2026-02-15 04:00:21 - [Cleanup] Backup removed from /home/pi/Apps/cerebro/backups/backup_20260215_040001-1.tar.gz
2026-02-15 04:00:21 - [Tar Removal] Removed backup_20260214_100000-2.tar.gz from /media/G-Drive/backup/cerebro/raspberrypi.
2026-02-15 04:00:21 - [Tar Removal] Removed backup_20260214_100000-2.tar.gz from /mnt/nas/backup/cerebro/raspberrypi.
2026-02-15 04:00:21 - [INFO] ========== BACKUP RUN ENDED ==========
Key takeaways:
- You can see EXACTLY what changed (nginx version and port mapping)
- You know which backup contains the change
- You can see the transfer was successful to both destinations
- You can see old DISCARD backups were cleaned up
Cerebro creates standard .tar.gz files. Restoring is straightforward:
# Find the backup you want
ls /mnt/nas/backup/cerebro/raspberrypi/
# Extract everything
cd /
sudo tar -xzf /mnt/nas/backup/cerebro/raspberrypi/backup_20240215_040000-1.tar.gz
# This restores all files to their original locations
# List contents
tar -tzf backup_20240215_040000-1.tar.gz | grep docker-compose
# Extract specific file
tar -xzf backup_20240215_040000-1.tar.gz home/pi/Docker/docker-compose.yml
# File is now in ./home/pi/Docker/docker-compose.yml (relative path)
# Copy it to the actual location
sudo cp home/pi/Docker/docker-compose.yml /home/pi/Docker/docker-compose.yml
# Extract to a temp directory
mkdir /tmp/restore-check
tar -xzf backup_20240215_040000-1.tar.gz -C /tmp/restore-check
# Now you can browse /tmp/restore-check to see the backed up state
# without overwriting live data
Use the log file:
# Search for when a specific file changed
grep "docker-compose.yml" cerebro.log
# Look for the backup name associated with that change
# Then extract that specific backup
Recall is a companion recovery utility for Cerebro. All recovery operations in Section 8 can be performed manually. Recall automates search, path construction, and extraction.
recall.sh is a CLI extraction utility that reads cerebro.cfg to search and extract target files or directories from the latest backup across all configured destinations.
Key Features:
- Automatic Configuration Reading: Resolves backup destinations and targets the latest valid archive automatically.
- Fuzzy Search with Disambiguation: Searches for files using partial names. If multiple matches exist, it presents an interactive selector.
- Safe Extraction: To prevent accidental overwrites of active files, extracted files are placed at .bak paths (e.g., smb.conf.bak).
- Directory Extraction: Appending a trailing slash to the search term extracts the entire directory tree (renamed with a _bak suffix).
- Multi-term Queries: Restores multiple files or folders in a single command execution.
- Multi-Destination Scanning: Scans all active [DESTINATIONS] to locate the most recent valid backup.
- Targeted Restores: Bypasses auto-discovery when the -b/--backup option is specified, extracting files from a targeted archive.
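The search and safe-path ideas can be illustrated with standard tools. The sketch below is not recall.sh: search_archive and bak_path are hypothetical names, the match is a plain substring test, and the interactive selector is omitted.

```bash
# Hypothetical helper: list archive members whose path contains the search term.
search_archive() {
    local archive=$1 term=$2
    # -F treats the term as a literal string; directory entries (trailing slash) are dropped.
    tar -tzf "$archive" | grep -F -- "$term" | grep -v '/$'
}

# Hypothetical helper: turn an archive member path into its .bak restore target.
bak_path() {
    printf '/%s.bak\n' "${1#/}"
}

bak_path etc/samba/smb.conf
```

The example call prints /etc/samba/smb.conf.bak, so a restored copy never lands on top of the live file.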
recall.sh lives alongside cerebro.sh in the same directory. No additional dependencies beyond what Cerebro already requires.
chmod +x recall.sh
./recall.sh [OPTIONS] <search_term_1> [search_term_2] ...
Options:
-h, --help      Show help message and exit
-b, --backup    Specify a custom backup tarball archive as the source
Restore a single config file:
./recall.sh smb.conf
# Output: [SUCCESS] Saved to -> /etc/samba/smb.conf.bak
# Review it, then: sudo mv /etc/samba/smb.conf.bak /etc/samba/smb.conf
Restore your rclone auth:
./recall.sh rclone.conf
# Output: [SUCCESS] Saved to -> /home/pi/.config/rclone/rclone.conf.bak
Restore multiple files in one call:
./recall.sh smb.conf rclone.conf
# Both extracted and placed as .bak files simultaneously
Restore from a specific backup archive:
./recall.sh -b /mnt/nas/backup/cerebro/raspberrypi/backup_20240215_040000-1.tar.gz etc/samba/smb.conf
Restore an entire folder:
./recall.sh Apps/tutor/
# Output: [SUCCESS] Saved to -> /home/pi/Apps/tutor_bak/
Ambiguous search (interactive picker):
./recall.sh compose
# Found 4 matching files for 'compose':
# 1) home/pi/Docker/stack-a/docker-compose.yml
# 2) home/pi/Docker/stack-b/docker-compose.yml
# ...
# Select the file to extract for 'compose' (or Cancel):
Because extracted files are saved with a .bak suffix, you can inspect differences before applying the restored version:
# Extract
./recall.sh smb.conf
# Review what you are getting back
diff /etc/samba/smb.conf /etc/samba/smb.conf.bak
# If satisfied, apply it
sudo mv /etc/samba/smb.conf.bak /etc/samba/smb.conf
sudo systemctl restart smbd
Recall writes its own log to recall.log in the same directory as cerebro.sh. Each session is delimited with RECALL SESSION STARTED / ENDED markers.
When installed, Cerebro creates the following structure:
/opt/cerebro/ # Installation directory (example)
├── cerebro.sh # Main script
├── cerebro.cfg # Configuration file
├── cerebro.log # Log file (all events)
├── backups/ # Temporary staging area
│ └── (empty - backups are moved to destinations immediately)
└── assets/ # Metadata directory
├── .tar_meta_data.txt # Backup classification tracking
└── .manual_backup_counter # Cycles through a-z for manual backups
Destination directories (configured in [DESTINATIONS]):
/mnt/nas/backup/cerebro/
└── hostname/ # Auto-appended based on system hostname
├── backup_20240215_040000-1.tar.gz
├── backup_20240215_100000-2.tar.gz
└── backup_20240216_040000-1.tar.gz
Key points:
- The backups/ directory is always empty after successful runs (files are moved to destinations)
- If all destinations fail, backups remain in backups/ until the next successful run
- The assets/ directory is critical: losing it breaks TARDISCARD logic (but doesn't affect backup data)
- Each machine backing up to the same NAS gets a separate subdirectory based on hostname
- Linux environment (Bash 4.0+)
- Standard GNU tools (Cerebro self-checks and can install if missing):
  - tar - Archive creation
  - rsync - File transfer
  - diff - Content comparison
  - gawk (GNU Awk) - CRITICAL: Required for log pruning. Standard awk implementations won't work.
  - grep, sed - Text processing
  - find, wc - File operations
  - timeout - Process management
You can install Cerebro using curl or wget. The installer will automatically detect your environment, download the script and a template configuration, and set up the directory structure in ~/cerebro.
Option A: Using curl
mkdir -p ~/cerebro && curl -sL https://raw.githubusercontent.com/Arelius-D/Cerebro/main/install.sh | bash
Option B: Using wget
mkdir -p ~/cerebro && wget -qO - https://raw.githubusercontent.com/Arelius-D/Cerebro/main/install.sh | bash
- Place the script:
mkdir -p /opt/cerebro
cd /opt/cerebro
# Copy cerebro.sh and cerebro.cfg here
chmod +x cerebro.sh
- Edit the config:
nano cerebro.cfg
# Set your [INCLUDE] paths
# Set your [DESTINATIONS]
# Configure [SCHEDULE]
- First run (manual):
./cerebro.sh
Important
Must be Run Manually First: The first run of Cerebro on a new system must be executed manually from a terminal without any flags (specifically no --update flag). This manual execution is required because:
- It performs the interactive dependency check and offers to install missing tools (under headless/cron runs, this interactive prompting fails).
- It initializes the local folder structure (assets/ and backups/).
- It generates the initial RETAIN:FIRST backup reference to start tracking file states.
- It installs the automated cron job on the system.
What happens on first run:
- Cerebro checks for required tools, offers to install if missing
- Reads cerebro.cfg
- Creates the backups/ directory
- Creates the assets/ directory for metadata
- Since there's no previous backup, it creates the first one and tags it as RETAIN:FIRST
- Installs the cron job (if cron=1 in config)
- Logs to cerebro.log
Suppressing "first run" messages:
Set NOFIRSTRUN=1 in [LOGTYPE] if you don't want the "No previous backup found" message in the log.
- Understanding Manual vs. Cron Execution:
When you run Cerebro, it operates in one of two modes:
Manual Mode (default):
./cerebro.sh
- Uses letter suffixes (a-z, cycling): backup_20240215_153022-a.tar.gz
- Each manual run increments the letter (a→b→c...→z, then wraps back to a)
- Useful for ad-hoc backups before making risky changes
Cron Mode (--update flag):
./cerebro.sh --update
- Uses number suffixes (1-4, based on time of day):
  - 1 = 00:00-05:59
  - 2 = 06:00-11:59
  - 3 = 12:00-17:59
  - 4 = 18:00-23:59
- Prevents creating dozens of backups per day if cron runs frequently
- The cron job automatically uses this flag (it's appended in the crontab entry)
Example: If your cron runs every hour, you'll get at most 4 backups per day (one per time window), not 24.
- Verify cron installation:
crontab -l | grep cerebro
You should see your schedule.
- Test a cron run manually:
./cerebro.sh --update
This simulates a cron-triggered run.
- Compression: tar -czf uses gzip. On modern systems, negligible impact.
- Diffing: Only happens when changes are detected. Text files are fast, large binaries in [NOLOG] are skipped.
- Typical homelab load: < 5% CPU for 30 seconds during backup creation.
- Staging: Uses /tmp for extraction and comparison. Ensure /tmp has space for 2x your largest backup.
- Typical usage: Extracts old tar, compares with live data, creates new tar. Peak memory = size of largest single file being processed.
- Local staging: $SCRIPT_DIR/backups/ is temporary. Backups are moved to [DESTINATIONS] immediately after creation.
- Destination storage: Depends on your data size and retention policy.
- With TARDISCARD: Expect 10-20% of "all backups ever created" due to smart pruning.
- Without TARDISCARD: All backups are kept. Plan accordingly.
Example:
- 10GB of Docker data
- Daily backups
- 1 real config change per week
- 6 log file changes per day (NOLOG events)
Without TARDISCARD: 7 backups/week × 10GB = 70GB/week = 3.6TB/year
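With TARDISCARD: 1 RETAIN backup/week × 10GB = 10GB/week ≈ 520GB/year, plus at most two DISCARD archives (about 20GB) held at any one time, so roughly 540GB/year. DISCARD archives are pruned rather than accumulated, which is why the total is about 15% of the 3.6TB figure.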
- rsync with timeout and failover cp: Cerebro uses rsync with a timeout limit. If the rsync transfer fails or hangs, it terminates the transfer and automatically falls back to copying files directly via cp -ru.
- Compression: Backups are gzipped, reducing network transfer size.
- Cloud storage and Mount checks: To prevent execution blocks or hangs on slow or unresponsive cloud/FUSE mounts (like rclone), Cerebro runs a connection check using a double-forked background process before performing read or write operations.
A: Check your [EXCLUDE] patterns. You might be excluding the files that changed. Enable DEBUG=1 to see which files are being processed.
A:
- Enable TARDISCARD DISCARD=1
- Move frequently-changing data (logs, temp files, cache) to [NOLOG]
- Add patterns to [EXCLUDE] for unnecessary data
A: Enable [LOGPRUNE] and set DISCARD_MAX_AGE_DAYS=1 (or your preferred retention).
A:
# Check if cron job exists
crontab -l | grep cerebro
# Check cron service
systemctl status cron
# Check permissions
ls -la /path/to/cerebro.sh
# Check the log file for errors
tail -f cerebro.log
A:
- Copy the entire Cerebro directory (script, config, assets folder)
- Update paths in cerebro.cfg to match the new system
- Run ./cerebro.sh once to install the cron job
- Your existing backups in [DESTINATIONS] will be discovered automatically
A: Cerebro will keep the backup in $SCRIPT_DIR/backups/ and log an error. The backup is not lost, just not transferred. On the next successful run, it will be synced.
A: Yes. Cerebro automatically appends the hostname to the destination path, so each machine gets its own subdirectory:
/mnt/nas/backup/cerebro/
├── machine1/
├── machine2/
└── machine3/
A: Set cron=0 in [SCHEDULE], then run ./cerebro.sh once to remove the cron job. Cerebro is now dormant, but all settings are preserved.
A: If you have large binary files in [INCLUDE], move them to [NOLOG]. Cerebro will still back them up but won't try to diff them.
A: Cerebro will rebuild it on the next run. You'll lose the RETAIN/DISCARD tags, so TARDISCARD won't work correctly until new backups are created. Your actual backup data is unaffected.
A: This is a warning, not an error. It happens when a file is being written to while tar is reading it (common with active log files or databases).
- Solution 1: Add actively-written files to [EXCLUDE] if they're not important
- Solution 2: Run Cerebro during low-activity periods
- Solution 3: Use pre-backup scripts to stop services temporarily (see Advanced Use Cases)
- The backup will still be created; only the actively-written file may be incomplete
A: Not directly. Cerebro operates on local filesystems. However:
- Option 1: Mount the remote filesystem (NFS, SMBFS, SSHFS) and add it to [INCLUDE]
- Option 2: Use rsync to sync remote data locally first, then back up the local copy
- Option 3: Run Cerebro on the remote server and use shared storage for [DESTINATIONS]
A: Check three things:
- Exit code: ./cerebro.sh; echo $? (0 = success)
- Log file: tail cerebro.log (should end with "BACKUP RUN ENDED")
- Destination: ls -lh /path/to/destination/hostname/ (newest file should match timestamp)
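The three checks above can be combined into one small helper. This is a sketch under stated assumptions: verify_backup is a hypothetical function, not part of Cerebro. It looks for the end marker in the last five log lines in case trailing lines follow it, and it assumes archives use the .tar.gz extension.

```shell
#!/bin/bash
# verify_backup <exit_code> <log_file> <dest_dir>
# Hypothetical helper that mirrors the three manual checks.
verify_backup() {
    local exit_code="$1" log_file="$2" dest_dir="$3" newest

    # 1. Exit code: 0 means the run reported success.
    if [ "$exit_code" -ne 0 ]; then
        echo "FAIL: exit code $exit_code"
        return 1
    fi

    # 2. Log file: the end-of-run marker should appear near the end of the log.
    if ! tail -n 5 "$log_file" | grep -q "BACKUP RUN ENDED"; then
        echo "FAIL: log does not end with BACKUP RUN ENDED"
        return 1
    fi

    # 3. Destination: there should be at least one archive; report the newest.
    newest=$(ls -t "$dest_dir"/*.tar.gz 2>/dev/null | head -n 1)
    if [ -z "$newest" ]; then
        echo "FAIL: no archives found in $dest_dir"
        return 1
    fi

    echo "OK: newest archive is $newest"
}
```

A typical call would run the backup first and pass its status along, for example: ./cerebro.sh; verify_backup $? cerebro.log /path/to/destination/hostname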
A: The second instance will detect the lock file (/tmp/$SCRIPT_NAME.lock) and exit immediately to prevent data corruption.
Cerebro handles this with two robust mechanisms:
- Self-Healing Stale Lock Recovery: If a crash or reboot occurs, the lock file remains but the PID inside it will be inactive. Cerebro automatically checks this at startup using ps -p "$pid". If the PID is dead, it logs a warning, cleans up the stale lock file, and runs the backup.
- Multi-Instance Isolation via Renaming: The lock file name is derived dynamically from the script file name (/tmp/$SCRIPT_NAME.lock). To run multiple instances in parallel on the same system, simply rename the main script file to match its task (e.g. rename it to cerebro-docker.sh in one directory and cerebro-dev.sh in another). Their lock files will be isolated automatically to /tmp/cerebro-docker.lock and /tmp/cerebro-dev.lock without any code modifications.
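The two mechanisms above can be illustrated with a short sketch. It is not taken from Cerebro's source: acquire_lock is a hypothetical name, and deriving SCRIPT_NAME with basename is an assumption about how the name is computed. The ps -p check and the /tmp/$SCRIPT_NAME.lock path follow the description above.

```shell
#!/bin/bash
# Lock name derived from the script file name,
# e.g. cerebro-docker.sh -> /tmp/cerebro-docker.lock (derivation is an assumption).
SCRIPT_NAME=$(basename -- "$0" .sh)
LOCK_FILE="/tmp/$SCRIPT_NAME.lock"

# acquire_lock <lock_file>
# Returns 0 if the lock was taken, 1 if a live instance already holds it.
acquire_lock() {
    local lock_file="$1" pid
    if [ -f "$lock_file" ]; then
        pid=$(cat "$lock_file" 2>/dev/null)
        # A live PID means another instance is still running.
        if [ -n "$pid" ] && ps -p "$pid" >/dev/null 2>&1; then
            echo "Another instance is running (PID $pid); exiting." >&2
            return 1
        fi
        # Dead or unreadable PID: the lock is stale, so clean it up.
        echo "WARNING: stale lock from PID ${pid:-unknown}; removing." >&2
        rm -f "$lock_file"
    fi
    # Record our own PID as the new lock owner.
    echo "$$" > "$lock_file"
}
```

A script would typically call acquire_lock "$LOCK_FILE" || exit 1 at startup and remove the file on exit, for example with trap 'rm -f "$LOCK_FILE"' EXIT. Note that this check-then-write sequence is not atomic, so it is a simplification.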
A:
- Stop any running cron jobs: crontab -e and comment out the Cerebro line
- Move the entire directory: mv /old/path/cerebro /new/path/cerebro, then cd /new/path/cerebro
- Run ./cerebro.sh once - this updates the cron job with the new path
- Verify: crontab -l | grep cerebro should show the new path
Your backups in [DESTINATIONS] are unaffected; Cerebro will find them automatically.
Pipe Cerebro's output to a mail command in your cron job:
00 04 * * * /opt/cerebro/cerebro.sh --update 2>&1 | mail -s "Cerebro Backup Report" admin@example.com
Parse cerebro.log for the string "Backup tagged as: RETAIN" to trigger alerts:
if grep -q "Backup tagged as: RETAIN" cerebro.log; then
    curl -X POST https://monitoring.example.com/webhook -d "Critical config changed"
fi
Add a script that runs before Cerebro:
#!/bin/bash
# pre-cerebro.sh
docker-compose -f /home/pi/Docker/docker-compose.yml down
/opt/cerebro/cerebro.sh --update
docker-compose -f /home/pi/Docker/docker-compose.yml up -d
This ensures you're backing up a consistent state.
Encrypt backups before sending to cloud:
# In your cron job
/opt/cerebro/cerebro.sh --update
# --multifile lets gpg encrypt every archive matched by the glob (writes *.tar.gz.gpg)
gpg --multifile --encrypt --recipient admin@example.com /mnt/nas/backup/cerebro/hostname/*.tar.gz
# Upload only the encrypted copies
rclone sync /mnt/nas/backup/cerebro/hostname/ remote:encrypted-backup/ --include "*.gpg"
- cerebro.sh: Should be 700 (only owner can read/write/execute)
- cerebro.cfg: Should be 600 (contains paths, might contain sensitive info)
- Backups: Inherit permissions from the destination directory
- If backing up /etc/shadow, /home/user/.ssh, or other sensitive files, ensure:
  - Destination directories are encrypted or access-controlled
  - Backups are not world-readable
  - Log file (cerebro.log) doesn't expose sensitive content (use [NOLOG] for these files)
- Cerebro can run as a regular user if it has read access to all [INCLUDE] paths
- If backing up system files (/etc, /var), run as root or use sudo
- Cron jobs inherit the user context - ensure the user running Cerebro has appropriate permissions
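The file modes recommended in this section (700 for cerebro.sh, 600 for cerebro.cfg) can be applied and checked in one step. The helper below is a sketch: harden_cerebro is a hypothetical name, and stat -c %a assumes GNU coreutils, as found on most Linux systems.

```shell
#!/bin/bash
# harden_cerebro <install_dir>
# Hypothetical helper: applies the recommended modes and prints the result.
harden_cerebro() {
    local dir="$1"
    chmod 700 "$dir/cerebro.sh"  || return 1   # owner may read/write/execute
    chmod 600 "$dir/cerebro.cfg" || return 1   # owner may read/write only

    # stat -c %a prints the octal permission bits (GNU coreutils).
    echo "cerebro.sh:  $(stat -c %a "$dir/cerebro.sh")"
    echo "cerebro.cfg: $(stat -c %a "$dir/cerebro.cfg")"
}
```

For an installation in /opt/cerebro, the call would be harden_cerebro /opt/cerebro, run as the user who owns the files.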
| Feature | Cerebro | rsync | Duplicity | Borg | tar+cron |
|---|---|---|---|---|---|
| Smart Change Detection | ✅ Content-aware | ❌ Timestamp-based | ✅ Block-level | ✅ Block-level | ❌ Time-based |
| Diff Logging | ✅ Built-in | ❌ Manual | ❌ No | ❌ No | ❌ Manual |
| Multi-Destination Sync | ✅ Automatic | ❌ Manual | ❌ Single | ❌ Single | ❌ Manual |
| Noise Filtering | ✅ TARDISCARD | ❌ No | ❌ No | ❌ No | ❌ No |
| Configuration Format | ✅ Simple INI | ❌ CLI args | ❌ Complex | ❌ Complex | ❌ Scripts |
| Human-Readable Backups | ✅ tar.gz | ✅ Files | ❌ Encrypted | ❌ Repo format | ✅ tar.gz |
| Setup Complexity | ✅ Simple | ✅ Simple | ❌ Moderate | ❌ Moderate | ✅ Simple |
| Incremental | ❌ No | ✅ Yes | ✅ Yes | ✅ Yes | ❌ No |
| Deduplication | ❌ No | ❌ No | ✅ Yes | ✅ Yes | ❌ No |
When to use Cerebro:
- You need visibility into WHAT changed, not just THAT something changed
- You want simple, readable backups (tar.gz) you can extract with standard tools
- You need multi-destination redundancy without complex scripts
- You want to filter out noise (log rotations, temp files) from your backup history
When NOT to use Cerebro:
- You need incremental/differential backups (use Borg or Duplicity)
- You have terabytes of data with minor changes (use rsync or Borg)
- You need encryption at rest (use Duplicity or add GPG to Cerebro workflow)
- You need deduplication (use Borg)
Final Note: Cerebro is built on the premise that data is useless without context. By providing deep visibility into what changed and why a backup occurred, it transforms backups from a "storage chore" into a "system administration asset."
Philosophy: A backup system should answer three questions:
- What do I have? (Latest state)
- What changed? (Diffs and logs)
- When did it change? (Timestamped history)
Cerebro answers all three without requiring you to extract archives or maintain external version control.
Questions? Issues? Contributions? https://github.com/Arelius-D/Cerebro
License: MIT License