What is stored in a backup destination?

Underscore Backup was designed from the ground up to be agnostic about which destination it uses to save the backup. This means it needs to efficiently store the contents of both extremely large and very small files, something not every backup destination handles well on its own. Because of this, Underscore Backup stores data in two different ways: the first is optimized for very large files and the second for small files. The last thing stored is the backup manifest, which is covered separately below.

Large files

A large file is generally anything over 8 MB of data. These files are split into 8 MB chunks, and each chunk is stored separately in the destination as a block. Each block is addressed by a hash of its unencrypted contents. The contents are then compressed and finally encrypted (assuming you have encryption enabled).

The resulting data is then uploaded to the destination, and a reference to the block is recorded in the backup manifest. The beauty of addressing blocks by a hash of their unencrypted contents is that if the same contents appear in multiple places in your backup, each unique block is only stored once.
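Here is a minimal Java sketch of the chunk, hash and compress steps above, for illustration only. It assumes SHA-256 as the content hash (the text only says "a hash"). It uses an in-memory map as a hypothetical stand-in for the destination and leaves encryption out. Because the address is computed from the unencrypted chunk, repeated chunks collapse into a single stored block.

```java
import java.io.ByteArrayOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.Deflater;

public class LargeFileChunker {
    // 8 MB chunks, read here as 8 * 1024 * 1024 bytes.
    public static final int CHUNK_SIZE = 8 * 1024 * 1024;

    // Hypothetical in-memory stand-in for the destination: block hash -> stored bytes.
    public final Map<String, byte[]> store = new LinkedHashMap<>();

    /** Splits data into chunks and returns the ordered block hashes needed to rebuild it. */
    public List<String> backup(byte[] data, int chunkSize) {
        List<String> blockHashes = new ArrayList<>();
        for (int offset = 0; offset < data.length; offset += chunkSize) {
            byte[] chunk = Arrays.copyOfRange(data, offset, Math.min(offset + chunkSize, data.length));
            // Address = hash of the *unencrypted* chunk (SHA-256 is an assumption here).
            String hash = sha256Hex(chunk);
            // Same contents, same address: a repeated chunk is only stored once.
            // In the real pipeline the compressed bytes would also be encrypted; omitted here.
            store.computeIfAbsent(hash, h -> compress(chunk));
            blockHashes.add(hash);
        }
        return blockHashes;
    }

    static String sha256Hex(byte[] data) {
        try {
            StringBuilder hex = new StringBuilder();
            for (byte b : MessageDigest.getInstance("SHA-256").digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        LargeFileChunker chunker = new LargeFileChunker();
        byte[] file = new byte[3 * CHUNK_SIZE]; // three identical, all-zero chunks
        List<String> hashes = chunker.backup(file, CHUNK_SIZE);
        System.out.println(hashes.size() + " chunks, " + chunker.store.size() + " stored block(s)");
        // prints "3 chunks, 1 stored block(s)"
    }
}
```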

A backup file then contains a list of the blocks needed to reconstruct its entire contents. If a file contains more than 1,000 blocks, a special "superblock" is created and stored. A superblock holds no file data itself, only a list of other blocks. This ensures that each block, including these lists of references, stays a reasonable size.

If you store a 1 TB file, it is made up of 131,072 individual 8 MB blocks. These blocks are split across roughly 128 superblocks, and the backup file itself only references those superblocks, which is enough to restore the entire contents of the file.
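The counts above follow from ceiling division. This hypothetical helper (not Underscore Backup code) reads 1 TB as 2^40 bytes and 8 MB as 2^23 bytes. The figure of 128 superblocks works out exactly when each superblock lists 1,024 blocks. With a cap of exactly 1,000, the same file would need 132.

```java
public class SuperblockMath {
    // 8 MB blocks, read as 8 * 1024 * 1024 bytes.
    public static final long BLOCK_SIZE = 8L * 1024 * 1024;

    /** Blocks needed for a file of the given size; the last block may be partial. */
    public static long blockCount(long fileSize) {
        return (fileSize + BLOCK_SIZE - 1) / BLOCK_SIZE;
    }

    /** Superblocks needed when each superblock lists at most maxRefs blocks. */
    public static long superblockCount(long blocks, long maxRefs) {
        return (blocks + maxRefs - 1) / maxRefs;
    }

    public static void main(String[] args) {
        long oneTb = 1L << 40;                              // 1 TB read as 2^40 bytes
        long blocks = blockCount(oneTb);
        System.out.println(blocks);                         // 131072
        System.out.println(superblockCount(blocks, 1024));  // 128
        System.out.println(superblockCount(blocks, 1000));  // 132
    }
}
```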

Small files

Some backup destinations have strict limits on transactions per second (TPS), which means you cannot store each tiny file as its own object in the destination. To solve this, several small files are combined into a single block. Each small file is first compressed and then encrypted individually, using the SHA-256 hash of its contents as the AES-256 encryption key. It is then written to the block as a 4-byte header holding its length, followed by the encrypted blob of its contents. This repeats for as many files as fit until the block reaches 8 MB. For each file, the system records its index within the block and the SHA-256 hash of its contents.
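The packing format described above can be sketched in Java as follows. This is an illustrative sketch, not the code from EncryptedSmallBlockAssignment.java. Several details are assumptions: AES-GCM with a fixed all-zero IV, big-endian byte order for the length header, and the class and method names. The compression step and the outer whole-block encryption are omitted. The content hash serves both as the deduplication key and as the per-file encryption key.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.List;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class SmallFileBlock {
    public static final int MAX_BLOCK_SIZE = 8 * 1024 * 1024;

    /** What is recorded per file: its index in the block and the SHA-256 of its contents. */
    public static final class Entry {
        public final int index;
        public final byte[] contentHash;
        Entry(int index, byte[] contentHash) { this.index = index; this.contentHash = contentHash; }
    }

    private final ByteArrayOutputStream block = new ByteArrayOutputStream();
    public final List<Entry> entries = new ArrayList<>();

    /** Appends one file; returns null if it would not fit and a new block should be started. */
    public Entry add(byte[] contents) {
        byte[] hash = sha256(contents);
        // The content hash doubles as the AES-256 key (compression omitted in this sketch).
        byte[] blob = crypt(Cipher.ENCRYPT_MODE, contents, hash);
        if (block.size() + 4 + blob.length > MAX_BLOCK_SIZE) {
            return null;
        }
        block.write(blob.length >>> 24); // 4-byte big-endian length header
        block.write(blob.length >>> 16);
        block.write(blob.length >>> 8);
        block.write(blob.length);
        block.write(blob, 0, blob.length);
        Entry entry = new Entry(entries.size(), hash);
        entries.add(entry);
        return entry;
    }

    /** Walks the length headers to the requested entry and decrypts it with that entry's key. */
    public byte[] read(int index, byte[] contentHash) {
        ByteBuffer buf = ByteBuffer.wrap(block.toByteArray());
        for (int i = 0; i < index; i++) {
            int skip = buf.getInt();
            buf.position(buf.position() + skip);
        }
        byte[] blob = new byte[buf.getInt()];
        buf.get(blob);
        return crypt(Cipher.DECRYPT_MODE, blob, contentHash);
    }

    public int size() { return block.size(); }

    static byte[] sha256(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // AES-256-GCM with an all-zero IV. A fixed IV is tolerable only because each key is
    // derived from, and therefore only ever used with, one specific plaintext.
    static byte[] crypt(int mode, byte[] input, byte[] key) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(mode, new SecretKeySpec(key, "AES"), new GCMParameterSpec(128, new byte[12]));
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Each encrypted blob is its plaintext length plus a 16-byte GCM tag, so the 4-byte header is what lets a reader skip straight to entry *n* without decrypting the entries before it.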

When a block is complete, the entire block is encrypted again before being uploaded to the destination.

The double encryption allows the system to safely share individual files from a small file block without handing over the decryption key for all the data in the block. The inner decryption key (the content hash) is only provided to the recipient for the individual files being shared.

The system also keeps track of the content hashes of all small files, so if you have a large number of files with the same contents, only a single copy of those contents is stored in the backup (the same as for large files).

Storing blocks

Both large and small blocks are stored in the destination under a folder called blocks, named by the hash value of the block. This is where all the contents of your backup, except for the manifest, are stored. If you do not have error correction enabled, each block is stored as a single file. If you have enabled error correction, each block is split into a predefined number of pieces containing both data and parity (by default, error correction uses Reed-Solomon with 17 data pieces and 3 parity pieces).
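With the default layout of 17 data and 3 parity pieces, the storage cost of error correction is easy to estimate. The hypothetical helper below assumes a block is cut into 17 equal, zero-padded pieces, which is a common way Reed-Solomon libraries shard data. It does not compute the parity itself. Because Reed-Solomon can rebuild the data from any 17 of the 20 pieces, up to 3 pieces can be lost, at a cost of roughly 3/17 (about 17.6%) extra storage.

```java
import java.util.Locale;

public class ErasureCodingOverhead {
    // Default Reed-Solomon layout described above: 17 data pieces + 3 parity pieces.
    public static final int DATA_PIECES = 17;
    public static final int PARITY_PIECES = 3;

    /** Size of each piece when the block is cut into DATA_PIECES equal, zero-padded pieces. */
    public static int pieceSize(int blockSize) {
        return (blockSize + DATA_PIECES - 1) / DATA_PIECES;
    }

    /** Total bytes written for one block, parity pieces included. */
    public static long storedBytes(int blockSize) {
        return (long) pieceSize(blockSize) * (DATA_PIECES + PARITY_PIECES);
    }

    public static void main(String[] args) {
        int block = 8 * 1024 * 1024;
        long stored = storedBytes(block);
        System.out.println("piece size: " + pieceSize(block) + " bytes"); // piece size: 493448 bytes
        System.out.println("stored: " + stored + " bytes");               // stored: 9868960 bytes
        System.out.printf(Locale.ROOT, "overhead: %.1f%%%n", 100.0 * (stored - block) / block); // overhead: 17.6%
    }
}
```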

Manifest data

On top of your actual backup data, the destination also stores your manifest information. The manifest contains three important files plus your change logs. The first of these is the file called identity, which is a unique identifier representing a single installation of the backup application. Whenever the software modifies your backup, it first checks that this file exists and that it matches what the software has stored locally. This prevents mistakes where two instances of Underscore Backup back up data to the exact same location.

The second important file is publickey.json, which contains the hash of the public key used to encrypt your backup. It also contains the salt used to derive your private encryption key from your encryption password. This file is extremely important: if you lose it, there is no way to recover your backup. Note also that this file does not contain your private encryption key. This file and the identity file are the only two files in your backup that are not encrypted.

The third file, configuration.json, is something of a misnomer. It does contain your application configuration expressed in JSON, but it is encrypted, so you cannot read it directly from your backup (a good thing, since it can contain things like destination credentials).

Finally, you will have one or more files in a directory called logs, under subdirectories named with timestamps from when the logs were uploaded. These files contain a step-by-step list of all the changes that have been made to your backup manifest, such as adding a block, a file, or the contents of a directory. As you keep backing up, the application uploads more and more of these logs. When you do a complete restore from a backup, the first thing that happens (which can take some time) is that all of these logs are processed to recreate your backup manifest.
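Rebuilding the manifest amounts to replaying every change in upload order. The sketch below is hypothetical. The operation names, the Change and Manifest types, and the assumption that timestamp directory names sort chronologically (as ISO-8601 strings do) are illustrations, not the real log format.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class ManifestReplay {
    /** Hypothetical change types; the real log format is richer than this. */
    public enum Op { ADD_BLOCK, ADD_FILE, DELETE_FILE }

    public static final class Change {
        public final Op op;
        public final String key;   // block hash or file path
        public final String value; // for ADD_FILE: the block hash the file refers to

        public Change(Op op, String key, String value) {
            this.op = op;
            this.key = key;
            this.value = value;
        }
    }

    /** The rebuilt manifest: known blocks and file path -> block reference. */
    public static final class Manifest {
        public final Set<String> blocks = new HashSet<>();
        public final Map<String, String> files = new HashMap<>();
    }

    /** Applies every change, oldest log directory first, to an empty manifest. */
    public static Manifest replay(Map<String, List<Change>> logsByTimestamp) {
        Manifest manifest = new Manifest();
        // TreeMap iterates keys in sorted order; assumes directory names sort chronologically.
        for (List<Change> batch : new TreeMap<>(logsByTimestamp).values()) {
            for (Change change : batch) {
                switch (change.op) {
                    case ADD_BLOCK:
                        manifest.blocks.add(change.key);
                        break;
                    case ADD_FILE:
                        manifest.files.put(change.key, change.value);
                        break;
                    case DELETE_FILE:
                        manifest.files.remove(change.key);
                        break;
                }
            }
        }
        return manifest;
    }
}
```

Order matters: replaying a deletion before the matching addition would leave a file in the manifest that should be gone. This is also why a full restore has to process every log before anything else can happen.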

Conclusion

With this system, a backup destination will generally only contain files of roughly the same size (a few MB by default), apart from a few small, fixed files in the manifest. This lets you store your backup data efficiently on pretty much any medium with high throughput, regardless of its underlying limitations.

Finally, all the code showing how this works is available on GitHub. If you are interested, EncryptedSmallBlockAssignment.java is a good place to start.

Second release candidate version 2.0.0rc2 is now available

This release marks a milestone: it currently has no known bugs, in either the service or the application. There are no new features in this release, but plenty of fixes. There has also been a considerable amount of work to increase automated test coverage for the functionality added in the 2.0 release, in both the application and the service.

You can get the latest version from the downloads page now.

First release candidate of Underscore Backup 2.0

The 2.0.0 release is now feature complete. It just needs a few weeks of additional testing to make sure everything is rock solid before it is declared stable.

This release comes with integrations with this service and improved security through better encryption and hashing algorithms. The last major feature addition is continuous backup support, on top of the regularly scheduled backups that were already supported.

Get it now from the downloads page.

How is the private key protected when key recovery is enabled?

If key recovery is enabled, the private key of your backup source is encrypted using your account email as the password, with the same Argon2 algorithm used for other passwords. This happens in the client, so the private key is never sent to the service unencrypted.
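Here is a sketch of the client-side step, assuming AES-GCM with a random salt and IV. Underscore Backup uses Argon2 for this derivation. Because the JDK does not include Argon2, this sketch swaps in PBKDF2 (PBKDF2WithHmacSHA256) purely so it runs without extra libraries. The byte layout, iteration count and class name are illustrative assumptions. The later KMS encryption done by the service is not shown.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;

public class KeyRecoveryWrap {
    private static final SecureRandom RANDOM = new SecureRandom();

    // Stand-in KDF: the real client uses Argon2, which the JDK does not ship.
    // PBKDF2 is swapped in here only so the sketch runs with the standard library.
    static SecretKeySpec deriveKey(String email, byte[] salt) throws GeneralSecurityException {
        PBEKeySpec spec = new PBEKeySpec(email.toCharArray(), salt, 100_000, 256);
        byte[] key = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256").generateSecret(spec).getEncoded();
        return new SecretKeySpec(key, "AES");
    }

    /** Client side: returns salt(16) || iv(12) || AES-GCM ciphertext, safe to hand to the service. */
    public static byte[] wrap(byte[] privateKey, String email) {
        try {
            byte[] salt = new byte[16];
            byte[] iv = new byte[12];
            RANDOM.nextBytes(salt);
            RANDOM.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, deriveKey(email, salt), new GCMParameterSpec(128, iv));
            byte[] ct = cipher.doFinal(privateKey);
            return ByteBuffer.allocate(salt.length + iv.length + ct.length).put(salt).put(iv).put(ct).array();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Recovery: reverses wrap(); fails with a GCM tag mismatch if the email is wrong. */
    public static byte[] unwrap(byte[] wrapped, String email) {
        try {
            byte[] salt = Arrays.copyOfRange(wrapped, 0, 16);
            byte[] iv = Arrays.copyOfRange(wrapped, 16, 28);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, deriveKey(email, salt), new GCMParameterSpec(128, iv));
            return cipher.doFinal(wrapped, 28, wrapped.length - 28);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not decrypt private key", e);
        }
    }
}
```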

The resulting data is then encrypted again using a KMS key before being stored in the service. You can also choose which region this data is stored in, giving you full control over the data sovereignty of this very critical piece of information.

To further protect your data, the email address of your account is not stored anywhere in the system except in the billing system, and only if you have email billing enabled. In all other cases, only a hash of the email is stored or transmitted to the service (with a few notable exceptions, described below). In the unlikely event of a system compromise, both the backup storage service and the external billing system (Stripe) would need to be compromised for customer data to be at risk. The only times the email is transmitted to the service in clear text (but not stored) are when you sign up, reset your password, or change your account email. You can change your email billing setting on your account settings page.

You always have the option to disable the private key recovery feature if this risk is unacceptable.

How does private key recovery work?

Private key recovery can only be started during initial application setup when adopting an existing source. At that point, choose the "Private Key Recovery" option on the password page of the setup wizard. You will be prompted for a new password to apply once the private key has been recovered. You will then be redirected to the Underscore Backup service, where you will be prompted for your credentials. The stored encrypted private key is then returned to the application, where it is decrypted using your account email address.

How is this handled when changing account email?

As described above, the email is the key with which the private key is encrypted, which causes a problem when you change your account email. In this case, both the old and the new email addresses are kept in the browser while the email change is verified. The encrypted source private keys are then downloaded to your browser, decrypted with the old email, re-encrypted with the new email, and uploaded back to the service. Only after all private keys have been stored encrypted with the new email is the account email actually changed. It is important to note that at no point during this operation does the service handle the old email, the new email, or any of the private keys in clear text.
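The important property of this flow is ordering: every key must be re-encrypted successfully before the email is switched. The hypothetical helper below captures just that step. The decryption and encryption functions are passed in, standing in for the email-derived key operations, so nothing here reflects the actual browser code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

public class EmailChangeRewrap {
    /**
     * Re-encrypts every wrapped private key with the new email. If any key fails to
     * decrypt with the old email, the exception propagates and no result is returned,
     * so a caller never switches the account email with a partially migrated set.
     */
    public static List<byte[]> rewrapAll(List<byte[]> wrappedKeys,
                                         UnaryOperator<byte[]> decryptWithOldEmail,
                                         UnaryOperator<byte[]> encryptWithNewEmail) {
        List<byte[]> result = new ArrayList<>(wrappedKeys.size());
        for (byte[] wrapped : wrappedKeys) {
            byte[] privateKey = decryptWithOldEmail.apply(wrapped);
            result.add(encryptWithNewEmail.apply(privateKey));
        }
        // Only after this returns would the caller upload the results and then
        // ask the service to change the account email.
        return result;
    }
}
```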

The only exception is the initiation of the email change, when the new email is sent to the service. Even then, it is never stored anywhere and is only used to send the validation email.

What is asymmetric encryption and why should I want it for my backups?

Asymmetric encryption, or public key cryptography, is a class of encryption where you have a public key that is derived from a private key. For asymmetric cryptography to be secure, it must be very hard to derive the private key from the public key. This kind of encryption is used in many places, such as PGP (encrypted messages) and TLS (encrypted internet connections). The first widely adopted asymmetric algorithm was RSA, which is still in wide use even though it is starting to show its age. More modern algorithms include a variety of elliptic curve cryptography algorithms.

Underscore Backup uses X25519 elliptic curve cryptography to encrypt all its data. The private key is derived from the password used to encrypt the backup, and only the public key is ever stored on disk or in the backup destinations. A neat feature of public key cryptography is that you only need the public key to encrypt data, but you need the private key to decrypt it. As a result, when Underscore Backup is running in the background on your computer, it does not even have the private key required to read your backup; it can only write backups.

This means that if your computer is compromised, the contents of your backups are not compromised along with it. The application only holds the private key in memory when you enter your password to do a restore (the key is never written to disk). It makes sure to forget the key as soon as the restore is complete.

I know all about encryption, tell me the details please

The private key for encryption is derived from your password using the Argon2 algorithm. Once this is created during setup, the public key and the password salt are stored in your manifest directory and uploaded to your manifest backup destination. For every backup block stored in the backup destination (a block is around 8 MB of backup data), a new X25519 private and public key pair is created. The block public key is stored in a header in the block, and the block private key is combined with the backup public key using a Diffie-Hellman key exchange. The resulting 256-bit key is then used to encrypt the rest of the backup block with the symmetric AES-256 encryption scheme.
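The per-block scheme above maps closely onto the JDK's X25519 support (KeyPairGenerator "X25519" and KeyAgreement "XDH", available since Java 11). This is a sketch under several assumptions. It uses AES-GCM as the AES-256 mode and a random 12-byte IV. It uses the raw 32-byte shared secret directly as the AES key, as the text describes, though many designs would first pass it through a key derivation function. The backup key pair is generated randomly here rather than derived from a password with Argon2.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyAgreement;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class BlockEncryption {
    /** What gets written: the block public key (the header) plus the encrypted payload. */
    public static final class EncryptedBlock {
        public final PublicKey blockPublicKey;
        public final byte[] iv;
        public final byte[] ciphertext;

        EncryptedBlock(PublicKey blockPublicKey, byte[] iv, byte[] ciphertext) {
            this.blockPublicKey = blockPublicKey;
            this.iv = iv;
            this.ciphertext = ciphertext;
        }
    }

    public static KeyPair newX25519KeyPair() {
        try {
            return KeyPairGenerator.getInstance("X25519").generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // X25519 Diffie-Hellman; the JDK returns a 32-byte (256-bit) shared secret.
    static byte[] agree(PrivateKey privateKey, PublicKey publicKey) throws GeneralSecurityException {
        KeyAgreement ka = KeyAgreement.getInstance("XDH");
        ka.init(privateKey);
        ka.doPhase(publicKey, true);
        return ka.generateSecret();
    }

    /** Writing side: needs only the backup PUBLIC key. */
    public static EncryptedBlock encrypt(byte[] data, PublicKey backupPublicKey) {
        try {
            KeyPair blockKeys = newX25519KeyPair(); // fresh key pair for every block
            byte[] aesKey = agree(blockKeys.getPrivate(), backupPublicKey);
            byte[] iv = new byte[12];
            new SecureRandom().nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(aesKey, "AES"), new GCMParameterSpec(128, iv));
            // blockKeys.getPrivate() is never returned or stored, so it is discarded here.
            return new EncryptedBlock(blockKeys.getPublic(), iv, cipher.doFinal(data));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Restore side: backup PRIVATE key + block public key recreate the same AES key. */
    public static byte[] decrypt(EncryptedBlock block, PrivateKey backupPrivateKey) {
        try {
            byte[] aesKey = agree(backupPrivateKey, block.blockPublicKey);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(aesKey, "AES"), new GCMParameterSpec(128, block.iv));
            return cipher.doFinal(block.ciphertext);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Cannot decrypt block", e);
        }
    }
}
```

The encrypt method never touches a private key that outlives the call. This mirrors the point made earlier that a background backup process can write blocks it cannot read back.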

A Diffie-Hellman key exchange has the following property: given two public/private key pairs, combining the public key of one pair with the private key of the other gives the same result as doing it the other way around. In the operation above, the block private key is discarded after the block has been written (and, as noted, the block public key is written with the block itself). Once the block has been written, the symmetric key needed to read it can only be recreated by combining the backup private key with the block public key.
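The symmetry property can be checked directly with the JDK's X25519 APIs. This illustrative sketch combines the block private key with the backup public key (the writing side) and the backup private key with the block public key (the restoring side). Both combinations produce the same 32-byte secret.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.util.Arrays;
import javax.crypto.KeyAgreement;

public class DiffieHellmanSymmetry {
    static byte[] agree(PrivateKey privateKey, PublicKey publicKey) throws GeneralSecurityException {
        KeyAgreement ka = KeyAgreement.getInstance("XDH");
        ka.init(privateKey);
        ka.doPhase(publicKey, true);
        return ka.generateSecret();
    }

    /** Generates a "backup" pair and a "block" pair and checks both combinations agree. */
    public static boolean secretsMatch() {
        try {
            KeyPairGenerator kpg = KeyPairGenerator.getInstance("X25519");
            KeyPair backup = kpg.generateKeyPair(); // long-lived backup key pair
            KeyPair block = kpg.generateKeyPair();  // per-block key pair
            byte[] whenWriting = agree(block.getPrivate(), backup.getPublic());
            byte[] whenRestoring = agree(backup.getPrivate(), block.getPublic());
            return Arrays.equals(whenWriting, whenRestoring);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(secretsMatch()); // true
    }
}
```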

Beta for version 2.0 released and registration opened for the service

The first release of the 2.0 beta is now available for download.

The main new feature, though, is the introduction of this companion service, which helps with many aspects of running Underscore Backup, such as:

  • Keep all your sources organized in one place to easily restore from any of your backups to any other backup.
  • Help facilitate sharing of backup data with other users.
  • Optionally allow private key password recovery.
  • Easily access the application UI even when running in a context where a desktop is unavailable, such as root on Linux.
  • Use it as a backup destination. Storing backup data is the only feature that requires a paid subscription, which gives you 512 GB of backup storage per $5 per month.
  • Support for multiple storage regions, including Oregon, Frankfurt, and Singapore, to satisfy latency and data governance requirements.

With this release, the registration for accounts on this service has also been opened.

On top of the companion service changes, the following features and improvements have been implemented:

  • Switched from PBKDF2 to Argon2 for the private key hashing function.
  • Introduced log rotation for the application log.
  • Moved the schedule jitter onto a dedicated setting instead of a custom property, with a default of 1 hour.
  • Changed all references to passphrase to password.
  • Introduced a password strength meter, which requires a score of at least "ok" during setup.
  • Added detection of new versions, which can easily be downloaded and installed from inside the application.

The one major feature still planned before 2.0 is complete is continuous backup functionality.

Changing memory configuration

In some cases, especially if you want to increase the number of parallel uploads and downloads, you might need to increase the maximum heap memory available to the application.

You need to find a file in the application's installation called underscorebackup.cfg. The location of this file depends on your OS. On Windows, it is located at C:\Program Files\Underscore Backup\app\underscorebackup.cfg, and there is also a second file used by the GUI application at C:\Program Files\Underscore Backup\app\underscorebackup-gui.cfg. On Linux, it is located at /opt/underscorebackup/lib/app/underscorebackup.cfg. On macOS, you need to open the Underscore Backup app bundle (right-click it and choose Show Package Contents) and then find the file lib/app/underscorebackup.cfg.

Once you have found this file (or files, on Windows), edit the following line:

[JavaOptions]
java-options=-Xmx256m

To increase the limit, change -Xmx256m to your new value. For instance, if you want to allow 1 GB of memory, change it to -Xmx1024m.

On Windows and Linux, this edit is preserved across application upgrades as of version 2.0.0pre2. On macOS, unfortunately, the change needs to be made again every time the application is upgraded.