TrueNAS Recovery
I have several TrueNAS systems that make up my storage solution.
- Primary NAS - File shares (TrueNAS CORE)
- Secondary NAS - Bulk data and archives that need to be (or simply are) stored but are not accessed on a regular basis (TrueNAS CORE)
- Backup NAS - Offsite backup target (TrueNAS CORE)
- Virtual NASes - Virtualized storage for VMware workload volumes (TrueNAS SCALE)
I use TrueCommand to manage these systems. Fortunately (at this time), TrueCommand is free for up to 50 drives and provides a variety of features in a single UI. I primarily use TrueCommand to check data transfer rates and pool capacity, and to view any errors that occur. One other neat feature is that TrueCommand backs up the server configuration of each connected system daily. This is GREAT, as I frequently forget to back up configurations manually.
Recently, I had a boot pool failure on my Backup NAS. I am currently using a USB thumb drive for the boot device (bad, I know), and it started to encounter uncorrectable errors. Since the system was still up and still backing up my data, I left it running and hoped I would visit my offsite location before the drive actually failed. Unfortunately, the drive failed first, and the system would not reboot cleanly. After a few weeks, I gave up on finding the time to visit the offsite location, shipped a new USB drive to the site, and walked the non-technical site point-of-contact through plugging it in. Once it was plugged in, I was able to access the Dell R310's iDRAC interface, set the new USB device as the boot device, and install a new version of TrueNAS. My plan was to install the new version of TrueNAS, give the system an IP address, and upload the backup config. Ultimately, I was able to restore the backup configuration and get back online.
I do have a few lessons learned:
- My latest backup from TrueCommand contained a corrupted database (sqlite?). While I didn't expect this, it's not overly surprising given that the boot disk was failing. Fortunately, I don't make config changes often, so I was able to use the previous day's backup config.
- When I tried to "Upload Config" within the TrueNAS CORE UI, TrueNAS complained of an incompatible file format. While I originally panicked, I then assumed that TrueCommand used a different backup format than TrueNAS CORE itself. This turned out to be correct. I re-added the "unconfigured" server to TrueCommand in the NAS's "spot" (same IP). Once it was connected, I then clicked "restore" next to the backup that I wanted on the system. After a few reboots of the target server, the backup was successfully restored and functional.
- I also utilize snapshot replication between my primary NAS and my backup NAS. After the configuration was restored to the backup NAS, I was unable to get my replication jobs to succeed. They all failed with an "authentication failed" error. I ended up doing a few things:
  - I copied the SSH keypair (private and public keys) from the backup NAS to the primary NAS (for some reason, they were different)
  - I created a NEW SSH connection using the updated SSH keypair - this connection was added successfully
  - I took the host key from the new SSH connection and copied/pasted it into the old SSH connection.
  - At this point, the snapshot replication tasks started to succeed and I was able to remove the "NEW SSH connection"
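Since the root of the replication problem was a keypair that differed between the two systems, a quick way to confirm that kind of mismatch is to compare key fingerprints rather than eyeballing long base64 strings. The following is a minimal sketch, not something TrueNAS provides: the function names `ssh_fingerprint` and `keys_match` are made up for illustration, and it assumes you have pasted each system's public key as a standard OpenSSH one-line string ("type base64-blob comment"). It computes the same style of SHA256 fingerprint that `ssh-keygen -lf` prints.

```python
import base64
import hashlib


def ssh_fingerprint(public_key_line: str) -> str:
    """Return the SHA256 fingerprint of a one-line OpenSSH public key.

    The result is "SHA256:" followed by the unpadded base64 encoding of the
    SHA-256 digest of the decoded key blob.
    """
    fields = public_key_line.split()
    if len(fields) < 2:
        raise ValueError("expected '<type> <base64-blob> [comment]'")
    # The second field is the base64-encoded key blob; the comment is ignored.
    blob = base64.b64decode(fields[1], validate=True)
    digest = hashlib.sha256(blob).digest()
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


def keys_match(key_a: str, key_b: str) -> bool:
    """Return True when both public key lines carry the same key material."""
    return ssh_fingerprint(key_a) == ssh_fingerprint(key_b)
```

Passing the public key shown on the primary NAS and the one shown on the backup NAS to `keys_match` should return `False` in a situation like mine, which would have pointed at the keypair much sooner than the generic "authentication failed" error did. Only the key blob is compared, so a differing trailing comment does not cause a false mismatch.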
Note: I will be going more in depth about my storage architecture and backup strategy in future post(s).