The Invisible Infrastructure: Building a Self-Healing Minecraft Backup System
At 2:47 AM on a Tuesday in November, my phone buzzed with a Discord notification:
🚨 Minecraft backup ran, but the server FAILED to restart after 3 attempts. Check logs on the Mac mini.
I was awake instantly. Not because the server was down, but because the system had caught it.
This is a story about that system. But really, it's about the slow, unglamorous work of building infrastructure that disappears.
How It Actually Started
"Dad, can we play Minecraft together?"
That's how all great infrastructure projects begin, right? With a simple request from your kids?
I didn't want to put them on a public server. Too many variables I couldn't control. Too many strangers. Too much risk for something that should just be… fun.
So I did what any reasonable parent would do: I spun up a PaperMC server on a Mac mini in my office.
Simple. Private. Safe.
Initial requirements:
- Don't lose the kids' builds
- Don't let the server eat itself
- Don't make this a second job
That last one turned out to be the hardest.
When "Just Works" Becomes "Just Enough"
For the first few weeks, everything was fine.
Then someone built a castle that took three days. Then someone else built a hidden base in the nether. Then there were shared projects, inside jokes, landmarks that became part of family lore.
The world was no longer replaceable.
And that's when I realized: backups aren't optional anymore.
At first, I did it manually. Stop the server, copy the files, zip them up, restart everything. It worked. Until it didn't.
Because "manual" means:
- Remembering to do it (I forgot)
- Having time to do it (I didn't)
- Not making mistakes (I did)
One night after the third time I forgot, I thought: I already know how to solve this.
The Realization That Changed Everything
I had been thinking about this wrong.
I didn't need to figure out how to back up a running server safely. That's hard. File locks, data corruption, race conditions: all the things that make live backups terrifying.
But I was already running nightly restarts to clear memory and apply updates.
The backups didn't need to be live. They needed to be part of the restart.
That single realization turned an impossible problem into a solvable one.
What the System Actually Does
The script runs every night at 2 AM. Here's what happens in those quiet hours:
1. Graceful World Preservation
# Tell the server to flush everything to disk
screen -S "$SESSION" -p 0 -X stuff "save-all$(printf \\r)"
sleep 2
# Then backup all three dimensions
zip -r "$BACKUP_ZIP" \
  "$SERVER_DIR/world" \
  "$SERVER_DIR/world_nether" \
  "$SERVER_DIR/world_the_end"
The server's still running. Players could still be online (they're not at 2 AM, but the system doesn't assume that). The save-all command tells Minecraft to write everything to disk first. We wait two seconds: long enough for it to finish, short enough that it doesn't matter.
Then we zip. The files are consistent because Minecraft just told us they are.
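For anyone who wants a stricter guarantee than a fixed two-second pause, here is a minimal Python sketch of the same step. It goes a little beyond the snippet above and rests on assumptions: it pauses autosave with `save-off`, requests `save-all flush`, zips the three world folders, re-enables autosave with `save-on`, and then checks the archive. The helper names (`send_console`, `snapshot_worlds`) and the injectable `run` parameter are hypothetical.

```python
import subprocess
import time
import zipfile
from pathlib import Path

WORLDS = ("world", "world_nether", "world_the_end")


def send_console(session, command, run=subprocess.run):
    """Type a command into the server console running inside a screen session."""
    # "\r" plays the role of pressing Enter, like $(printf \\r) in the shell version.
    run(["screen", "-S", session, "-p", "0", "-X", "stuff", command + "\r"], check=True)


def snapshot_worlds(session, server_dir, backup_zip, run=subprocess.run, settle=2.0):
    """Pause autosave, flush to disk, zip the world folders, then resume autosave."""
    server_dir = Path(server_dir)
    send_console(session, "save-off", run)        # stop background saves while we copy
    send_console(session, "save-all flush", run)  # ask for an immediate full save
    time.sleep(settle)                            # the console is async, so still wait briefly
    try:
        with zipfile.ZipFile(backup_zip, "w", zipfile.ZIP_DEFLATED) as zf:
            for world in WORLDS:
                for path in sorted((server_dir / world).rglob("*")):
                    if path.is_file():
                        zf.write(path, path.relative_to(server_dir).as_posix())
    finally:
        send_console(session, "save-on", run)     # always re-enable autosave
    # testzip() returns None when every member passes its CRC check
    with zipfile.ZipFile(backup_zip) as zf:
        return zf.testzip() is None
```

The `finally` block matters more than the zip itself: if archiving fails halfway, autosave still gets switched back on.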
2. The Stop-Clear-Start Dance
# Stop the server
screen -S "$SESSION" -p 0 -X stuff "stop$(printf \\r)"
sleep 10
# If it didn't stop, force it
if is_running; then
  pkill -f "java.*${JAR_FILE}"
fi
# Clear stale locks that cause corruption
rm -f "$SERVER_DIR/world/session.lock" \
  "$SERVER_DIR/world_nether/session.lock" \
  "$SERVER_DIR/world_the_end/session.lock"
This is the part that took the longest to get right.
Minecraft doesn't always shut down cleanly. Sometimes it hangs. Sometimes it leaves lock files. Sometimes those lock files prevent restarts.
The script handles all of it. Graceful shutdown first. Force kill if needed. Clean up the locks. Every time.
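The graceful-then-forceful order can be sketched in Python with polling in place of a fixed sleep. This is an illustration under assumptions, not the script itself: `request_stop`, `force_kill`, and `is_running` are hypothetical callables you would wire to `screen`, `pkill`, and a process check, and the clock and sleep functions are injectable so the logic can be tested without waiting.

```python
import time


def wait_until(condition, timeout, interval=1.0, clock=time.monotonic, sleep=time.sleep):
    """Poll condition() until it returns True or timeout seconds pass."""
    deadline = clock() + timeout
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def stop_server(request_stop, force_kill, is_running, grace=30.0, **poll):
    """Ask nicely first; force-kill only if the process outlives the grace period.

    Returns "clean", "forced", or "stuck".
    """
    request_stop()
    if wait_until(lambda: not is_running(), grace, **poll):
        return "clean"
    force_kill()
    if wait_until(lambda: not is_running(), 10.0, **poll):
        return "forced"
    return "stuck"
```

Returning a label instead of a boolean lets the caller log whether last night's shutdown was clean, which is handy when chasing lock-file problems later.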
3. Self-Healing Restart with Retries
for attempt in $(seq 1 "$MAX_RETRIES"); do
  screen -dmS "$SESSION" "$JAVA_PATH" -Xmx3G -Xms1G -jar "$JAR_FILE" nogui
  sleep 5
  if is_running; then
    log "Server started successfully on attempt $attempt."
    break
  fi
done
if ! is_running; then
  notify_discord "🚨 Server FAILED to restart after ${MAX_RETRIES} attempts"
  exit 1
fi
This is what caught that 2:47 AM failure.
Most backup scripts stop the server, make a copy, start it back up, and hope for the best. This one checks. If the server doesn't come back up, it tries again. Three times.
If all three fail, it screams for help.
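The same start-verify-retry loop, sketched in Python. It is a minimal illustration, assuming hypothetical `start` and `is_running` callables supplied by the caller; the function name and its return convention are not from the script.

```python
import time


def start_with_retries(start, is_running, max_retries=3, settle=5.0, sleep=time.sleep):
    """Launch the server, verify it stayed up, and retry a bounded number of times.

    Returns the attempt number that succeeded, or None if every attempt failed.
    """
    for attempt in range(1, max_retries + 1):
        start()
        sleep(settle)          # give the JVM a moment to either boot or crash
        if is_running():
            return attempt
    return None
```

A `None` result is the cue to send the "FAILED to restart" alert; any integer is worth logging, since a server that needs two attempts every night is telling you something.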
4. The Intelligence Layer
The backup is just the foundation. The system also:
Tracks who tried to join:
import re
# Scan all logs for connection attempts by non-whitelisted players
PAT = re.compile(r"""(?P<name>[A-Za-z0-9_]{3,16})\s+
    (?:lost\s+connection|disconnect(ed)?):\s+
    .*not\s+whitelisted.*""", re.VERBOSE)  # VERBOSE is required for a multi-line pattern
I wanted to know if someone was trying to get in. Not for security theater, but for actual security. If a friend's kid wants to join, I want to see that request, not have it disappear into log files.
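Here is a small sketch of how a pattern like that can be turned into a per-player count. It assumes the pattern is compiled with `re.VERBOSE` (needed because it spans several lines). The sample log lines are illustrative only; real server output may be worded differently, and `rejected_players` is a hypothetical helper name.

```python
import re
from collections import Counter

# Same shape as the pattern above; re.VERBOSE lets it span several lines.
PAT = re.compile(
    r"""(?P<name>[A-Za-z0-9_]{3,16})\s+
        (?:lost\s+connection|disconnect(?:ed)?):\s+
        .*not\s+whitelisted.*""",
    re.VERBOSE,
)


def rejected_players(lines):
    """Count connection attempts per player name that were refused by the whitelist."""
    counts = Counter()
    for line in lines:
        match = PAT.search(line)
        if match:
            counts[match.group("name")] += 1
    return counts


# Illustrative lines only; real server output may be worded differently.
SAMPLE = [
    "[02:13:44] [Server thread/INFO]: RandomGriefer123 lost connection: You are not whitelisted on this server!",
    "[02:13:50] [Server thread/INFO]: RandomGriefer123 lost connection: You are not whitelisted on this server!",
    "[09:01:02] [Server thread/INFO]: Player1 joined the game",
]
```

Calling `rejected_players(SAMPLE)` counts two attempts for `RandomGriefer123` and ignores the ordinary join line.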
Builds a player roster with last positions:
# Parse Essentials data to track where everyone logged out
# Useful for: "Where did I leave my stuff?"
# Useful for: "Did the kids actually go to bed?"
logout_x, logout_y, logout_z = extract_position(essentials_yaml)
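What a helper like `extract_position` might look like, as a rough sketch. It assumes the player file contains a top-level `logoutlocation` mapping with `x`, `y`, and `z` keys; that layout and the sample data are assumptions, so check them against your own userdata files. The parser is hand-rolled to stay dependency-free and only understands this simple indented layout.

```python
# Hypothetical sample of a per-player data file; the key names are an assumption.
SAMPLE = """\
money: '0'
logoutlocation:
  world: world
  x: 123.5
  y: 64.0
  z: -45.25
  yaw: 90.0
  pitch: 0.0
timestamps:
  logout: 1768096800000
"""


def extract_position(yaml_text, section="logoutlocation"):
    """Return (x, y, z) floats from an indented mapping, or None if absent.

    Hand-rolled on purpose to avoid a YAML dependency; it only understands
    the simple "key: value" layout shown in SAMPLE.
    """
    coords = {}
    inside = False
    for raw in yaml_text.splitlines():
        if not raw.strip():
            continue
        indented = raw[0] in " \t"
        key, _, value = raw.strip().partition(":")
        if not indented:
            inside = (key == section)   # a top-level key opens or closes the section
        elif inside and key in ("x", "y", "z"):
            coords[key] = float(value)
    if set(coords) == {"x", "y", "z"}:
        return coords["x"], coords["y"], coords["z"]
    return None
```

For anything more complex than this flat layout, a real YAML parser is the safer choice.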
Generates daily activity summaries:
Date: 2026-01-11
Window: last 24h
✅ Backup: SUCCESS
• File: world_backup_2026-01-11_02-00.zip
• When: 2026-01-11 02:00:34
• Size: 847.23 MB
🗺 Web maps: success
• https://maps.maker404.com/
👥 Joined (3): Player1, Player2, Player3
🚪 Left (3): Player1, Player2, Player3
🚫 Not whitelisted (1): RandomGriefer123
Every morning, this appears in our family Discord. Nobody asked for it. But now people notice when it doesn't show up.
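A minimal sketch of how a report like this could be assembled and posted, assuming a standard Discord webhook that accepts a JSON body with a `content` field. The function names and the plain-text layout (no emoji) are hypothetical simplifications, and the custom User-Agent is a precaution rather than a documented requirement.

```python
import json
import urllib.request


def build_summary(date, backup_file, size_mb, joined, left, rejected):
    """Render the daily report as plain text, one line per fact."""
    def roster(label, names):
        return f"{label} ({len(names)}): {', '.join(names) if names else 'none'}"

    return "\n".join([
        f"Date: {date}",
        "Window: last 24h",
        "Backup: SUCCESS" if backup_file else "Backup: FAILED",
        f"File: {backup_file or 'n/a'}",
        f"Size: {size_mb:.2f} MB",
        roster("Joined", joined),
        roster("Left", left),
        roster("Not whitelisted", rejected),
    ])


def post_to_discord(webhook_url, text, timeout=10):
    """POST the text to a Discord webhook; message content is capped at 2000 characters."""
    body = json.dumps({"content": text[:2000]}).encode("utf-8")
    req = urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json", "User-Agent": "mc-backup-report"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Keeping the formatting separate from the network call means the report can be tested, or printed to the log, even when Discord is unreachable.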
5. Automated Map Generation
This deserves its own section because it's the part that makes people ask "wait, you built what?"
After the backup completes, the script calls a second script that:
- Extracts the latest backup
- Runs uNmINeD to render interactive maps
- Deploys them to Cloudflare Pages
- Updates the live site at maps.maker404.com
Every morning, the kids can see what got built the night before. From space. With zoom. On any device.
I didn't plan for this to be the part everyone cares about most. But it is.
What Actually Took the Time
Looking back through my commit history, this took four months to stabilize.
Not four months of work. Four months of:
- 20 minutes here
- An hour there
- "Why did it fail last night?"
- "Oh right, lock files"
- "Wait, why is the timestamp wrong?"
- "I should write that down"
- "Where did I write that down?"
The Hard Parts
Understanding uNmINeD's expectations: It wants the world root, not the worlds themselves. It cares about directory structure. It silently fails if you get it wrong.
Cloudflare Pages upload limits: 25MB per file, 20,000 files total. My first map render was 47,000 files at varying sizes. I spent two weeks optimizing tile generation settings and implementing progressive rendering before I figured out the right balance.
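A pre-flight check along these lines would have flagged the problem before the upload did. This is a sketch using the two limits quoted above; treating "25MB" as 25 MiB is an assumption, and `check_pages_limits` is a hypothetical helper name.

```python
import os

MAX_FILES = 20_000                 # limits quoted in the text above
MAX_FILE_BYTES = 25 * 1024 * 1024  # assumption: 25MB is read as 25 MiB


def check_pages_limits(root, max_files=MAX_FILES, max_bytes=MAX_FILE_BYTES):
    """Walk a render directory and report anything that would block an upload."""
    count = 0
    oversized = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            count += 1
            if os.path.getsize(path) > max_bytes:
                oversized.append(os.path.relpath(path, root))
    return {
        "file_count": count,
        "too_many_files": count > max_files,
        "oversized": sorted(oversized),
    }
```

Running it right after the render and before the deploy turns a failed upload into a clear log message.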
Log rotation timing: Minecraft compresses yesterday's logs at startup. My script runs at 2 AM. Startup is at 2:01 AM. I was parsing logs that didn't exist yet. Took me three days to notice.
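One way to sidestep the timing problem is to read both the compressed archives and the live file, so it stops mattering which side of the rotation the parser lands on. A minimal sketch, assuming the usual `logs/` layout of `latest.log` plus dated `*.log.gz` archives; `iter_log_lines` is a hypothetical helper name.

```python
import gzip
from pathlib import Path


def iter_log_lines(log_dir):
    """Yield lines from rotated *.log.gz archives first, then from latest.log."""
    log_dir = Path(log_dir)
    # Names like 2026-01-10-1.log.gz sort chronologically as plain strings
    # (until a single day produces ten or more archives).
    for archive in sorted(log_dir.glob("*.log.gz")):
        with gzip.open(archive, "rt", encoding="utf-8", errors="replace") as fh:
            for line in fh:
                yield line.rstrip("\n")
    latest = log_dir / "latest.log"
    if latest.exists():
        with open(latest, encoding="utf-8", errors="replace") as fh:
            for line in fh:
                yield line.rstrip("\n")
```

Because it is a generator, it can feed a line-by-line parser without loading months of logs into memory.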
Session lock corruption: Sometimes the server crashes mid-save. The lock files stay behind. Next restart fails mysteriously. Took me a week to connect those dots.
The playit.gg tunnel: We use playit.gg so the kids' friends can connect from outside our network. It's great. Except it doesn't always reconnect after a server restart. Now the script handles that too.
The Invisible Details
# Wait for network before trying to send Discord notifications
NET_OK=0
for _ in 1 2 3 4 5 6 7 8 9 10; do
  /sbin/ping -c1 -t1 1.1.1.1 >/dev/null 2>&1 && { NET_OK=1; break; }
  /bin/sleep 6
done
This 60-second retry loop exists because sometimes macOS hasn't fully initialized networking when the script runs. Without it, Discord notifications silently fail. Took me two weeks to realize they weren't sending.
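The same wait-for-network idea in Python, as a sketch. Note the swap: it probes with a TCP connection to port 443 instead of an ICMP ping, which is an assumption about what counts as "network is up". The function names are hypothetical, and the probe and sleep are injectable so the loop can be tested offline.

```python
import socket
import time


def tcp_probe(host="1.1.1.1", port=443, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_network(probe=tcp_probe, attempts=10, delay=6.0, sleep=time.sleep):
    """Try the probe up to `attempts` times, pausing `delay` seconds between tries."""
    for attempt in range(1, attempts + 1):
        if probe():
            return True
        if attempt < attempts:
            sleep(delay)
    return False
```

If this returns `False`, the honest move is to write that fact to the local log, since the notification channel is exactly what is unavailable.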
# Keep the 7 newest backups (one per night, so about a week); delete older ones
ls -1t "$BACKUP_DIR"/world_backup_*.zip | tail -n +8 | xargs -r rm -f
This single line prevents the disk from filling up. I found out I needed it when I woke up to a full disk and no backups from the previous three nights.
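For comparison, a Python sketch of the same retention rule that avoids parsing `ls` output. It keeps the newest archives by modification time, which is an assumption about what "newest" should mean, and `prune_backups` is a hypothetical helper name.

```python
from pathlib import Path


def prune_backups(backup_dir, keep=7, pattern="world_backup_*.zip"):
    """Delete all but the `keep` newest archives (by modification time)."""
    archives = sorted(
        Path(backup_dir).glob(pattern),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    removed = []
    for old in archives[keep:]:
        old.unlink()
        removed.append(old.name)
    return removed
```

Returning the deleted names makes it easy to log exactly what was pruned each night.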
Why This Matters
This system exists so my family can enjoy something together without friction.
It means:
- The world feels permanent: Nobody worries about losing their builds
- Exploration is documented: The maps show everywhere we've been
- Someone's always watching: If it breaks, I know immediately
- I don't have to remember: It just runs
But more than that, it means I get to be part of something without being the bottleneck.
The kids don't know this script exists. They don't need to. It's infrastructure.
Good infrastructure disappears.
The Technical Stack
For anyone building something similar:
Core Components:
- PaperMC: The Minecraft server (Paper is a performance-focused fork of Spigot)
- Essentials: Player data and tracking
- uNmINeD: Map rendering (the free version is shockingly good)
- Cloudflare Pages: Static site hosting (free tier, perfect for map tiles)
- playit.gg: Tunneling service for external access
- Discord webhooks: Notifications and daily summaries
- Bash + Python: The glue holding it all together
Development Environment:
- Mac mini (2018, i7, 32GB RAM)
- macOS launch daemon for scheduling
- GNU screen for process management
- Git for version control (because of course)
Key Dependencies:
- zip: World archival
- screen: Server management
- curl: Discord API calls
- python3: Data processing and log parsing
- find/grep/awk: Log analysis
What Iâd Do Differently
If I started over today:
- I'd use Docker: Not because it's better, but because it's more portable. Right now this script is deeply tied to macOS paths and behaviors.
- I'd extract the Python scripts: They're embedded in heredocs. They work, but they're a pain to test and modify.
- I'd add metrics: Backup size over time, player activity patterns, server performance. I have the data, I'm just not visualizing it.
- I'd document more: I have inline comments, but I don't have a "why I made this decision" journal. Future me would appreciate that.
- I wouldn't change the core approach: Building around backups instead of live state was the right call. Everything else flows from that.
The Part I'm Most Proud Of
Not the map rendering (though that's cool).
Not the self-healing restart logic (though that saved me).
Not even the Discord notifications (though my kids love them).
It's the logging.
log() {
  local msg="[$(date '+%Y-%m-%d %H:%M:%S')] $*"
  echo "$msg" | tee -a "$DAILY_LOG_FILE" >> "$ROLLING_LOG_FILE"
}
Every action gets logged twice: to a daily file and to a rolling file.
When something goes wrong at 2 AM, I don't have to guess. I have a timestamped record of exactly what happened.
When someone asks "when did PlayerX last join?" I can tell them.
When I'm debugging a weird issue three weeks later, the logs are still there.
Good logging is a gift to your future self.
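Since the Python helpers need the same habit, here is a minimal sketch of a dual-target logger that mirrors the shell function's timestamp format. The file paths are supplied by the caller, and the `now` parameter is a hypothetical addition that makes the output reproducible in tests.

```python
from datetime import datetime


def log(message, daily_log, rolling_log, now=None):
    """Append one timestamped line to both the daily and the rolling log file."""
    stamp = (now or datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
    line = f"[{stamp}] {message}\n"
    for target in (daily_log, rolling_log):
        with open(target, "a", encoding="utf-8") as fh:
            fh.write(line)
    return line
```

Opening in append mode on every call is slower than holding a file handle, but it means a crash mid-run never loses lines already written.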
What This Actually Taught Me
Building this system taught me more about infrastructure design than any tutorial could.
Systems thinking over features:
- I didn't build a backup script. I built a system that happens to include backups.
Reliability through simplicity:
- Every piece can fail. The system handles it.
Observability as a feature:
- If I can't see it, I can't fix it. Logs and notifications aren't nice-to-have.
Automation as respect:
- Not just for my time, but for everyone's time. Nobody should have to think about this.
Personal infrastructure is real infrastructure:
- Just because it's not running a company doesn't mean it's not important.
The Quiet Success
This system has run every single night for the last four months.
It's survived:
- Server crashes
- Network outages
- Disk space warnings
- macOS updates
- Power failures (thanks, UPS)
- My own coding mistakes
And nobody in my family knows.
They just know the server works. The maps update. Their builds are safe.
That's what success looks like for infrastructure: invisibility.
For Anyone Building Something Similar
If you're running a home server, Minecraft or otherwise, here's what I'd tell you:
- Start with backups. Everything else can be rebuilt. Data can't.
- Make it automatic. If it requires you to remember, it will fail.
- Add notifications. You won't check logs. You will read Discord.
- Handle failure gracefully. Assume everything will break. Code for it.
- Log everything. Future you is debugging at 2 AM with one eye open. Help them.
- Iterate slowly. This isn't a weekend project. It's a system that grows with use.
- Let it disappear. When it stops needing attention, you've won.
The Source
The full script is on my GitHub (or will be soon).
It's not beautiful. It's not clever. But it works every night without complaint.
And sometimes, that's the highest compliment you can give code.
Final Thought
Someone asked me recently: "Isn't this overkill for a kids' Minecraft server?"
Maybe.
But here's what I know:
My kids have a place they built together. That place is permanent. That place has maps. That place never loses progress.
And I don't have to think about it.
That's not overkill.
That's exactly enough.
This is what Maker404 is really about: building systems that support the people you care about, then letting those systems fade into the background so you can focus on what matters.
The infrastructure isn't the point. What it enables is.