Signal has already added quantum-safe encryption to its messaging platform

Signal recently announced that they have added quantum-safe public-key cryptography to their messaging platform. This is a great step forward for post-quantum cryptography and a good first proof of concept for the CRYSTALS-Kyber encryption protocol, currently a finalist in the NIST post-quantum cryptography project.

As soon as a winner in the NIST competition has been announced, you can expect Underscore Backup to be upgraded to support it as an encryption algorithm for its backups. We are currently holding off because, until the review is complete, it is possible that a compromise might be found (another finalist was recently compromised by academics). However, rest assured that Underscore Backup is closely monitoring this space with the goal of keeping our customers as safe as possible with currently available technology, and updates will be made available for free as always.

Photo by Fractal Hassan on Unsplash

What should you keep backups of?

Deciding what to back up is a tricky thing and can differ a lot depending on the user. Most personal backup solutions are designed for people who have most of their work in medium-sized files stored in a handful of folders on their computers such as office documents, some photos, etc.

Underscore Backup is designed for tinkerers who tend to fiddle with their computers all over the place, and for whom it would be too time-consuming and error-prone to keep track ahead of time of what changed, so it aims to back up everything, big or small. It also gives you full control over the retention of historical changes and deleted files.

This might sound strange, but I can personally remember several times when, a few months after upgrading my computer, I realized I wished I had remembered to copy that one file off the old computer where I had tweaked an obscure setting to make it do something I wanted. To solve this, the entire storage methodology of the application is designed around multiple storage techniques, chosen automatically depending on the data, to support both millions of very small files and a few very large files (several TB are no problem; once you start getting to PB it is not entirely efficient, but the design could easily be extended if the need arises). It can also handle any combination of the two; for instance, I have a single server where I store many large media files as well as several hundred thousand files of router packet logs.
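As a rough illustration of the idea (not Underscore Backup's actual code, and the 8 MB cutoff is an assumption made for this example), a backup engine can pick its storage technique per file based on size:

```python
# Illustrative sketch only -- not the application's real implementation.
# The 8 MB threshold is an assumption for the sake of the example.
SMALL_FILE_LIMIT = 8 * 1024 * 1024

def choose_strategy(size_bytes: int) -> str:
    """Batch many small files into shared storage blocks, but split
    large files into independent chunks that can be uploaded and
    deduplicated separately."""
    if size_bytes < SMALL_FILE_LIMIT:
        return "batch-into-shared-block"
    return "split-into-chunks"
```

The point of the split is that per-object overhead dominates for millions of tiny files, while chunking is what keeps multi-TB files manageable.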

Because of this, the default settings for Underscore Backup will back up everything in your home directory, with a few explicit exclusions that are known to be very unlikely to contain useful data (such as browser cache folders). Due to its efficient file handling, it can do this without causing your backup size to balloon excessively, while still allowing you to efficiently browse, search, and restore any data you want.

Picture by Elisa Ventur on Unsplash

Version 2.3 with new MacOS app as well as many new features, stability, and security enhancements

The new 2.3 release is available to download now from the download page.

Major new features.

  • Completely new MacOS UI implementation that is now just a regular application, not a package.
  • Added the ability to run on Linux and Windows platforms as a service or as root while still controlling it through a non-privileged application.
  • Encrypt all API communication between UI and service.
  • Implement a new secure custom authentication mechanism based on knowledge of backup password instead of separate credentials.
  • Detect corrupt local repository and add a method to repair it.

Minor improvements.

  • Sort schedule sets in order of the next scheduled run.
  • Improved performance for Linux metadata storage.
  • Option to disable backups when CPU load is high.
  • Increased performance for uploading metadata logs.
  • Several tweaks to button labels and status messages in UI.
  • Improved performance for large directories and files on MacOS.

Notable bug fixes.

  • Fixes to continuous backups for Linux and OSX.
  • Fixed issue with eventual consistency that could sometimes cause a single log file to be left from old logs after performing a log optimization.
  • Detect and remove orphaned backup files that are not referenced by any directory entry.
  • Fixed issue with symbolic links (Junctions) on Windows.
  • Fixed an issue where very long directory names could confuse the root path.

Also contains numerous minor bug fixes and tweaks.

Major performance improvements coming in version 2.3 soon

The next release will feature new multithreaded handling of log uploads. On a test system with a backup of around 5 million files and 50TB of data, a log optimization operation went from taking 90 minutes to 17 minutes, finishing almost 6 times faster than in the previous release.
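The speedup comes from overlapping the time spent waiting on the network. A minimal sketch of the pattern, with hypothetical function names standing in for the application's real upload calls:

```python
from concurrent.futures import ThreadPoolExecutor

def upload_log(path: str) -> str:
    # Stand-in for the real network upload; while one thread waits on
    # I/O, the others keep uploading, which is where the speedup comes from.
    return f"uploaded {path}"

def upload_all(paths: list, workers: int = 8) -> list:
    # Upload several metadata logs concurrently instead of one at a time.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(upload_log, paths))
```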

Other improvements in the new release include better support for running as a service on Windows and Linux, better handling of continuous backups, and detection and correction of corrupted local metadata repositories. This is on top of a myriad of other minor improvements and stability enhancements.

The new release is almost ready, and you can try it out now with a release candidate from the download page.

A new version is coming soon with significant new functionality!

The new release includes a ton of new functionality such as the ability to run as a service on Windows and Linux as well as a completely new and better-integrated MacOS installer. All API traffic between UI and local service is encrypted. Detection and repair of local repository metadata have been added. You can try out the newest release candidate right now from the downloads page.

Why Underscore Backup was created

I started running a server for storing all my projects as well as various multimedia artifacts in 1999, with a small desktop computer and a 20GB HDD. As the size and personal importance of this server grew, within a few years I started running RAID5 and then RAID6 to make sure data was not lost from single-drive failures. Despite this, in 2006 the then-current incarnation of this server encountered a catastrophic 3-drive failure, which I only managed to recover from after a tremendous amount of work and a fair amount of luck, including, among other things, manually patching the Linux RAID kernel code to remove certain fail-safes as I pulled data off the partially assembled RAID.


"The Server" in its current iteration.

This episode led me to look for ways to safeguard against this ever happening again. Looking through the options available to me, I found Crashplan, which addressed all my needs at a reasonable price. My initial backup to Crashplan took several years to complete over my 20mbit/s broadband uplink, as my server had by this point grown to several TB.

A few years after I started using Crashplan they stopped offering consumer backups and the only way to keep using them was to migrate to their business plan which I did. However, Crashplan only allowed you to migrate a few TB per computer at the time which meant that I had to re-upload most of my backup again. Fortunately, at this point, I had gotten a fiber internet connection with a reasonable uplink that allowed me to re-upload this data in less than a year. As my backup of this server grew Crashplan also started showing its flaws where it required several GB of memory to be able to back up my server, but it did work and allowed me a reasonable peace of mind for the contents of my server.

This went on for a few years after which I was contacted by Crashplan (Now called Code42) and told that unless I reduced the size of my backup to under 10 TB, they would terminate my account since they considered me violating their terms of service by keeping too large a backup.

From: Support Ops (Code42 Small Business Support) 
Date: Feb 6 2020, 10:38 AM CST 

Hello Administrator,

Thank you for being a CrashPlan® for Small Business subscriber. We appreciate the trust that you have placed in CrashPlan - that relationship is important to us. Unfortunately, we write to you today to notify you that your account has accumulated excessive storage, which will result in degraded performance. You have one of the largest archives in the history of CrashPlan. It is so large, we cannot guarantee the performance of our service. Due to the size of your archive, full restores of your backup archive, and even selectively restoring specific files, may not be possible.

As a result, we are notifying you, per our Master Service Agreement and Documentation, to reduce your storage utilization for each device to less than 10TB by June 1, 2020. Note that we have extended your subscription to June 1, 2020 to give you ample time to make changes. If you do not do so by June 1, 2020, your subscription will not be renewed, and your account will be closed at the end of your current subscription term.

…

Thank you, 
Eric Wansong, Chief Customer Officer, Code42

The server I was using was Linux-based, and as far as I could tell Crashplan was the only provider on the market offering cloud-based backup solutions for that OS. This was when I decided to start working on Underscore Backup as a means to continue making backups of my server, as I couldn’t find any existing alternatives that fulfilled my needs. The first version was command line only and very primitive, even though it did support point-in-time recovery and backup sets, as well as efficiently handling my very large backup. Another feature built in from the beginning was a strong focus on encrypting everything as much as possible, so that any medium could be used for backups even if it was not properly secured from prying eyes. Creating the initial backup of my server using Underscore Backup sustained more or less 600mbit/s (compared with the, at the time impressive, 60mbit/s I experienced using Crashplan on the same connection).

At the same time, I also started using the iDrive service for backing up my laptops and various other smaller Windows and MacOS based machines. I did this because the CLI-only (command-line interface) implementation of Underscore Backup was not convenient enough to use on these machines. This situation continued for a few years, with the CLI-only version of Underscore Backup backing up my server data to cloud block storage and my other machines backed up by the iDrive service. This all came crashing down when my main development laptop of several years had a catastrophic SSD failure and I had to restore my data from iDrive. I found out two things about how the iDrive service works.

The first is that even though iDrive keeps track of versions of your files, it does not keep track of directory contents and file deletions. This is critical to any developer: I restored a large development repository containing files I had been working on the whole time iDrive was running in the background. For those of you who are not developers, we rename files a lot. When I did a full restore of the contents of my laptop’s hard drive, every old name of every renamed file was restored alongside the new one. That meant that any repository of code I had worked on since I started using iDrive was no longer in a buildable state without a considerable amount of work.
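The distinction can be made concrete with a small sketch: a backup that records full directory listings per snapshot can detect deletions (and the old halves of renames) by diffing listings, while per-file versioning alone cannot. This is illustrative, not any vendor's actual code:

```python
def deleted_since(old_listing: set, new_listing: set) -> set:
    # Files present in the old snapshot but absent from the new one were
    # deleted or renamed. A tool that only versions individual files has
    # no record of this and will restore the stale names as live files.
    return old_listing - new_listing
```

A restore that consults this diff skips `old_name.py` instead of resurrecting it next to its renamed successor.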

The second surprise was that even though the iDrive backup of my laptop was relatively small, only around 50GB in size, it took almost 2 weeks to restore. Granted, it contained a large number of files (around 3 million, mostly small, files), but I was shocked at the slowness. I also opened several support cases with iDrive about this, but there was nothing they could do to help me. For comparison, on the same network with a backup of roughly the same size in both files and total storage, Underscore Backup would complete a similar restore in about 5 minutes (and it would do it properly, keeping track of deleted files).

At this point, I evaluated other available solutions but could not find any that suited my needs. Carbonite does not allow you to specify which files should be backed up but instead, in the interest of simplicity, tries to be smart about it; when I tried it on my development files it decided to back up almost none of them, even though I specifically told it to include the directory. Backblaze is a very solid solution but, like iDrive, does not keep track of deleted files for a true point-in-time recovery. In the end, I decided to put in the effort needed to create an easy-to-use user interface for Underscore Backup so that it would be suitable for use on things other than servers. The result of these efforts was the first stable release of Underscore Backup in the summer of 2022, which at that point graduated to being the only backup solution I used on all my computers.

The problem at this point was that even though I had a backup solution that fulfilled all my needs, it was still very tricky for most users to set up, since you generally had to supply your own cloud storage such as Amazon S3. It was also quite tricky to access data from other sources you had backed up, since every source had to be set up individually on each client you wanted to restore it on. The sharing functionality, though present, was so complicated that I am relatively certain nobody except myself ever managed to set it up. To solve all of these problems I decided to leave behind the service-less nature of the software I had followed up until that point and create a service, both to remove the need to provide separate cloud storage and to help manage multiple sources and set up shares. This was a relatively large undertaking, but it eventually led to the launch of Underscore Backup 2.0 in the first half of 2023.

The current release as of this writing is the 2.2 release, which has made it very easy to set up backups of multiple computers of any size while staying true to the original guiding principles of security, durability, efficiency, and flexibility.

Threat model

Underscore Backup is built to rely as little as possible on the security and integrity of any given backup destination. It is assumed that wherever you decide to store your backups is not necessarily a trusted environment and could be accessed and potentially tampered with by untrusted parties such as system administrators or service providers.

Information herein relates to backups taken by version 2.0.4 or later of the client software.

General assumptions

  • The system where the backup is running is trusted. Even though the private key for the backup is never stored on permanent media, this system will contain an unencrypted set of metadata for the backup.
  • The user will always use a trusted and official version of the software.
  • The private key password is kept secret from any adversary.
  • Advances in cryptography or computing power do not compromise any of the encryption primitives used such as Argon2, AES-256-GCM, SHA-256, and SHA3.

Security guarantees

  • Encrypted contents of a backup cannot be accessed without the private key password. This includes file contents, filenames, and directory contents.
  • All data is authenticated with the private key password. Any tampering with the backup from outside the system where the backup is running will be detectable.
  • Any data that has been tampered with will not be decrypted.
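The tamper-detection guarantee comes from authenticated encryption. The sketch below uses stdlib HMAC-SHA-256 as a stand-in for the authentication tag that AES-256-GCM produces alongside each ciphertext; it is illustrative, not the application's actual code:

```python
import hashlib
import hmac

def auth_tag(key: bytes, ciphertext: bytes) -> bytes:
    # AES-256-GCM emits an authentication tag with each ciphertext;
    # HMAC-SHA-256 plays that role here since the stdlib has no AES.
    return hmac.new(key, ciphertext, hashlib.sha256).digest()

def verify(key: bytes, ciphertext: bytes, expected_tag: bytes) -> bool:
    # Any modification of the ciphertext changes the tag, so tampered
    # data is detected (and refused) before any decryption happens.
    return hmac.compare_digest(auth_tag(key, ciphertext), expected_tag)
```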

Potential attacks

An adversary with access to your backup destination could:

  • Attempt to brute force your private key from your public key and Argon2 hash. Make sure you use a strong password. As a mitigation, the Argon2 hashing algorithm is specifically designed to be hard to brute force even with specialized hardware.
  • Estimate the size of the backup from the size of the stored data in the backup destination.
  • Delete data. As a mitigation, either back up to multiple destinations or use a destination that supports versioning or does not allow deletion (such as S3).
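The reason a memory-hard password hash slows brute forcing can be sketched as follows. Argon2 is not in the Python standard library, so scrypt, which is also memory-hard, stands in for it here; the parameters are illustrative:

```python
import hashlib

def derive_key(password: str, salt: bytes) -> bytes:
    # Each guess must redo a computation that costs both CPU time and
    # (with these parameters) 16 MB of memory, which is what makes
    # large-scale guessing on GPUs or ASICs expensive. scrypt stands in
    # for Argon2, which is not in the standard library.
    return hashlib.scrypt(password.encode(), salt=salt,
                          n=2**14, r=8, p=1, dklen=32)
```

The salt ensures precomputed tables are useless; the work factor ensures each individual guess stays expensive.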

An adversary with network access could:

  • Attempt a DoS attack by preventing traffic between the client and service.
  • Infer backup size by observing network traffic.
  • Determine where you are storing your backups.
  • Determine from where you are creating backups.
  • By default, the application will have deduplication across sources enabled. This means that an attacker with the same data can infer if you have that data in your backup for large files (Roughly more than 8 MB). This behavior can be disabled by setting the global property crossSourceDedupe to false in the Settings page of the application.
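Why deduplication leaks this information: if chunk identifiers are derived deterministically from content, identical data produces identical identifiers across sources. A minimal sketch of the principle (not the actual chunk-ID scheme used by the application):

```python
import hashlib

def chunk_id(chunk: bytes) -> str:
    # Content-derived IDs are what make deduplication possible: the same
    # bytes always map to the same stored object. The flip side is that
    # anyone holding the same bytes can compute the ID and probe whether
    # it already exists in storage.
    return hashlib.sha256(chunk).hexdigest()
```

Disabling cross-source deduplication trades this inference channel away at the cost of storing duplicate data.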

An adversary with access to the system where the backup is running would be able to:

  • Delete data from the destination. A possible mitigation is to use a destination with versioning or deletion protection.
  • Write corrupted backup information.
  • Compromise any part of the backup system by modifying binaries and intercepting passwords and private keys. However, an attacker would not be able to access decrypted backup contents from the destination unless the user was made to enter the private key password for the backup, since the private key generally does not exist on the host except when needed.

An adversary with access to your backup password could:

  • Access any data in your backup. You can, however, re-encrypt using a new master private key with the command line underscorebackup change-password --force. Just changing your password, either through the UI or without the --force flag, will only change the password and not update the private key.

UI access considerations

The UI is by necessity served over an unencrypted connection to the browser, since it is not possible to generate certificates for localhost connections. However, all communication with the service through the API is both authenticated and encrypted using PFS-based AES-256 encryption, which should stop any passive monitoring of data. This is not secure against man-in-the-middle attacks; however, given that the communication stays on your local host, the risk only exists if your local system is already compromised. Finally, UI authentication is based on knowledge of the public key as derived from the private key password. The private key is not stored in memory, nor is it transferred during the call as part of the authentication process.
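The forward secrecy (PFS) property comes from both sides using fresh ephemeral secrets for every session, so recording traffic today does not help decrypt it later. A toy Diffie-Hellman sketch of the idea, with deliberately undersized parameters and a hashed shared secret standing in for the real AES-256 session key:

```python
import hashlib
import secrets

# Toy parameters for illustration only -- real deployments use
# standardized 2048-bit groups or elliptic curves.
P = 2**127 - 1  # a prime, far too small for real security
G = 3

def session_key() -> bytes:
    a = secrets.randbelow(P - 2) + 1   # client ephemeral secret
    b = secrets.randbelow(P - 2) + 1   # service ephemeral secret
    A = pow(G, a, P)                   # public values exchanged on the wire
    B = pow(G, b, P)
    shared_client = pow(B, a, P)       # both sides compute the same secret
    shared_service = pow(A, b, P)
    assert shared_client == shared_service
    # Hash the shared secret down to a 32-byte symmetric key
    # (AES-256 in the real protocol); a and b are then discarded,
    # which is what makes the session forward-secret.
    return hashlib.sha256(shared_client.to_bytes(16, "big")).digest()
```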

Service-specific considerations

This section details any specific threats that come from using the companion service.

Using service accounts

Using service accounts does present an additional attack surface for an adversary.

An adversary that gains access to your service account could:

  • See the service storage usage of all the backup sources tied to your account.
  • See the location of any backup sources that bind to any interface (Not the default).
  • See if the service is used for storing at least part of the backup.
  • If private key recovery is enabled for a source an adversary would be able to access the backed-up data. If private key recovery is not enabled this would not be possible without also having access to the source private key password.

An adversary that gains access to the service could:

  • See any sources and accounts in the service, including their storage usage and shares.
  • Access any backup data where private key recovery is enabled and the adversary manages to guess the email of the account. An attacker that also manages to break into the billing system of the service will be able to access any backup data that has private key recovery enabled and billing emails enabled.
  • Delete customer data from the back end. For the paranoid, the mitigation is to use multiple different destinations for your backup data.
  • Corrupt customer backup data (It would be detectable, but potentially unrecoverable).

There are many levels of protection in place to ensure an adversary will not achieve this, and in the unlikely event that they did, the impact would be limited to a small blast radius. However, due to security concerns, these protections will not be detailed here. The list above describes the absolute worst-case scenario.

Private key recovery

Private key recovery is a convenience that comes at the cost of some security. For additional security when key recovery is enabled, disable billing emails. In this case, the email for the account is not stored anywhere in the service, and it is required to access the private key recovery (the private key is encrypted with the email as the password in this case).