Threat model

Underscore Backup is designed to rely as little as possible on the security and integrity of any given backup destination. It is assumed that wherever you decide to store your backups is not necessarily a trusted environment and could be accessed, and potentially tampered with, by untrusted parties such as system administrators or service providers.

Information herein relates to backups taken by version 2.0.4 or later of the client software.

General assumptions

  • The system where the backup is running is trusted. Even though the private key for the backup is never stored on permanent media, this system will contain an unencrypted set of metadata for the backup.
  • The user always runs a trusted and official version of the software.
  • The private key password is kept secret from any adversary.
  • Advances in cryptography or computing power do not compromise any of the encryption primitives used, such as Argon2, AES-256-GCM, SHA-256, and SHA-3.

Security guarantees

  • Encrypted contents of a backup cannot be accessed without the private key password. This includes file contents, filenames, and directory contents.
  • All data is authenticated with the private key password. Any tampering with the backup from outside the system where the backup is running will be detectable.
  • Any data that has been tampered with will not be decrypted.

Potential attacks

An adversary with access to your backup destination could:

  • Attempt to brute-force your private key from your public key and Argon2 hash. Make sure you use a strong password. As a mitigation, the Argon2 hashing algorithm is specifically designed to be hard to brute-force even with specialized hardware.
  • Estimate the size of the backup from the size of the stored data in the backup destination.
  • Delete data. As a mitigation, either back up to multiple destinations or use a destination that does versioning or does not allow deletion (such as S3, for instance).
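The memory-hardness argument can be illustrated with a short sketch. Argon2 itself is not in Python's standard library, so this uses hashlib.scrypt, another memory-hard KDF, purely to show the principle; the parameters below are illustrative assumptions, not the values the application actually uses.

```python
import hashlib
import os

def derive_key(password: str, salt: bytes) -> bytes:
    # Memory-hard KDF: each guess must touch ~16 MB of RAM (128 * n * r bytes),
    # which is what makes large-scale brute force on GPUs or ASICs expensive.
    return hashlib.scrypt(password.encode(), salt=salt,
                          n=2**14, r=8, p=1, dklen=32)

salt = os.urandom(16)
key = derive_key("a strong passphrase", salt)
assert derive_key("a strong passphrase", salt) == key   # deterministic per salt
assert derive_key("wrong guess", salt) != key           # wrong password fails
```

The cost is per-guess: an attacker testing billions of candidate passwords has to pay the full memory and CPU price for every one, which is why a strong password combined with a memory-hard KDF is the stated mitigation.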

An adversary with network access could:

  • Attempt a denial-of-service attack by preventing traffic between the client and service.
  • Infer backup size by observing network traffic.
  • Determine where you are storing your backups.
  • Determine from where you are creating backups.
  • By default, the application has deduplication across sources enabled. This means that an attacker who has the same data can infer whether you have that data in your backup, for large files (roughly more than 8 MB). This behavior can be disabled by setting the global property crossSourceDedupe to false in the Settings page of the application.
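The inference attack can be sketched as follows. This is a simplification with hypothetical names; the real client presumably derives block identifiers differently, but the principle is that with cross-source deduplication, identical plaintext produces an identical stored block:

```python
import hashlib

def block_id(chunk: bytes) -> str:
    # With cross-source dedup, identical plaintext maps to the same stored
    # block id regardless of which source uploaded it.
    return hashlib.sha256(chunk).hexdigest()

# Ids of blocks visible at the destination.
known_file = b"contents of a large, widely distributed file" * 1000
stored_blocks = {block_id(known_file)}

# An adversary holding the same plaintext can test for its presence:
assert block_id(known_file) in stored_blocks
assert block_id(b"different data") not in stored_blocks
```

Disabling crossSourceDedupe presumably makes block identifiers source-specific, which removes this membership test at the cost of storing duplicate data.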

An adversary with access to the system where the backup is running would be able to:

  • Delete data from the destination. A possible mitigation is to use a destination with versioning or deletion protection.
  • Write corrupted backup information.
  • Compromise any part of the backup system by modifying binaries and intercepting passwords and private keys. However, an attacker would not be able to access decrypted backup contents from the destination unless the user was made to enter the private key password, since the private key of the backup generally does not exist on the host except when needed.

An adversary with access to your backup password could:

  • Access any data in your backup. You can, however, re-encrypt using a new master private key with the command line underscorebackup change-password --force. Changing your password through the UI, or without the --force flag, will only change the password and not update the private key.

UI access considerations

The UI is by necessity served over an unencrypted connection to the browser, since it is not possible to obtain trusted certificates for localhost connections. However, all communication with the service through the API is both authenticated and encrypted using PFS-based AES-256 encryption, which should stop any passive monitoring of data. This is not secure against man-in-the-middle attacks; however, given that the communication stays on your local host, that risk only exists if your local system is already compromised. Finally, UI authentication is based on knowledge of the public key as derived from the private key password. The private key is not stored in memory, nor is it transferred during the call as part of the authentication process.

Service-specific considerations

This section details any specific threats that come from using the companion service.

Using service accounts

Using service accounts does present an additional attack surface for an adversary.

An adversary that gains access to your service account could:

  • See the service storage usage of all the backup sources tied to your account.
  • See the location of any backup sources that bind to any interface (not the default).
  • See if the service is used for storing at least part of the backup.
  • If private key recovery is enabled for a source an adversary would be able to access the backed-up data. If private key recovery is not enabled this would not be possible without also having access to the source private key password.

An adversary that gains access to the service could:

  • See any sources and accounts in the service, including their storage usage and shares.
  • Access any backup data where private key recovery is enabled and the adversary manages to guess the email of the account. An attacker that also manages to break into the billing system of the service will be able to access any backup data that has private key recovery enabled and billing emails enabled.
  • Delete customer data from the back end. For the paranoid, the mitigation is to use multiple different destinations for your backup data.
  • Corrupt customer backup data (it would be detectable, but potentially unrecoverable).

There are many levels of protection in place to ensure an adversary does not achieve this, and in the unlikely event that one did, the impact would be limited to a smaller blast radius. However, due to security concerns, these protections will not be detailed here. The list above describes the absolute worst-case scenario.

Private key recovery

Private key recovery is a convenience that comes at the cost of some security. For additional security when key recovery is enabled, disable billing emails. In this case, the email address for the account is not stored anywhere in the service, yet it is required to access the private key recovery (the private key is encrypted with the email address as the password in this case).

Announcing Underscore Backup 2.0 and service general availability

The first stable version of Underscore Backup with support for the companion service is now available. At the same time, the service itself is now generally available.

Even with the new service, the main focus of the application is privacy, resiliency, and efficiency. The new service does, however, significantly simplify setting up cloud backups and sharing.

The main new feature in version 2.0 is the introduction of a companion service that will help with many aspects of running Underscore Backup, such as:

  • Keep all your sources organized in one place to easily restore from any of your backups to any other backup.
  • Help facilitate sharing of backup data with other users.
  • Optionally allow private key password recovery.
  • Easily access application UI even if running in a context where a desktop is unavailable, such as root on Linux.
  • Use as a backup destination. Storing backup data is the only feature that requires a paid subscription, which gives you 512 GB of backup storage for $5 per month.
  • Support multiple data storage regions, North America (Oregon), EU (Frankfurt), and Southeast Asia (Singapore), to satisfy latency and data governance requirements.

On top of the companion service changes, the following features and improvements have also been implemented.

  • Added support for continuous backups by monitoring the filesystem for changes.
  • Introduced a password strength meter, which requires a score of at least “ok” when setting a new password.
  • Switched the private key hashing function from PBKDF2 to Argon2.

On top of these, there are tons of other minor improvements and stability enhancements.

Get started by downloading the client today.

Terminology

Adopting

The task of taking an existing backup Source and using it from a new installation of the client. Involves doing a Rebuild operation from the remote Manifest.

Block

A single piece of data stored in a destination. A block is usually a few MB in size and can contain anything from a small part of a very large file to many small files.

When working with a backup, the system will always upload and download entire blocks from a backup destination.
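As a rough sketch, splitting a stream into blocks might look like this. The 4 MB size and SHA-256 identifier are illustrative assumptions, not the client's actual parameters:

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # illustrative; real block sizes vary

def to_blocks(data: bytes):
    """Split data into fixed-size blocks, each identified by its hash."""
    return [(hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest(),
             data[i:i + BLOCK_SIZE])
            for i in range(0, len(data), BLOCK_SIZE)]

blocks = to_blocks(b"x" * (BLOCK_SIZE + 100))
assert len(blocks) == 2          # one full block plus a 100-byte tail
assert len(blocks[1][1]) == 100
```

Because whole blocks are the unit of transfer, restoring even a single small file means downloading the complete block that contains it.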

Destination

A location where backup data is stored. A destination is usually defined by a type, a location, and a set of credentials.

Source

A source refers to a backup of a single system. Think of it as a specific installation of the Underscore Backup client running on a specific system. You usually start a restore by selecting which source you want to restore from.

Share

A share is a subset of a source that is shared with another Underscore Backup user. Shares use a different Private Key for encryption which means that a recipient of a share can only read the files in the shares that have been explicitly shared and not the entire source.

File

Refers to the collection of all the versions of files from a specific location on your system that have been stored so far in your backup.

File version

A specific version of a file representing the contents of the file at a given time.

Manifest 

Refers both to the local metadata stored for a backup where the client is running and to its representation in a Destination, which is used to recreate the local metadata from a backup. The local representation contains your configuration, your public encryption key, and a database of your entire Source contents. The equivalent representation in the Destination contains the configuration and key but, instead of a database, contains a log of all the changes that were made to the local manifest database to bring it to its current state.

Operations

Sometimes when long-running tasks are being performed, progress is reported as Operations. Each operation represents a single atomic step that needs to be performed before the entire task is completed. Operations can mean many different things and should only be used to gauge when the overall task will be done.

Optimize log

The remote representation of a backup is a continuous log of all the changes made in a Source. If a backup has been running for a long time, this log can become inefficient since it contains information about everything that has ever existed, including things that have since been deleted or changed. To solve this problem, the system by default runs a scheduled maintenance task once a month to optimize the log: it writes a log of exactly what the backup contains right now, which replaces the entire log of the backup up until that moment.
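The idea behind log optimization is classic log compaction, which can be sketched as follows. The put/delete log format here is hypothetical, not the client's actual manifest format:

```python
def replay(log):
    """Rebuild the current state by replaying a change log in order."""
    state = {}
    for op, path, payload in log:
        if op == "put":
            state[path] = payload
        elif op == "delete":
            state.pop(path, None)
    return state

def optimize(log):
    """Replace the full history with a log describing only current state."""
    return [("put", path, payload)
            for path, payload in sorted(replay(log).items())]

log = [("put", "a.txt", 1), ("put", "b.txt", 2),
       ("delete", "a.txt", None), ("put", "b.txt", 3)]
assert replay(optimize(log)) == replay(log) == {"b.txt": 3}
assert len(optimize(log)) == 1   # four entries collapsed into one
```

Replaying the optimized log produces the same state as replaying the full history, which is why the old log can be safely discarded after the maintenance task completes.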

Private key

The key that is required to restore data from your backup. It is derived from your backup password. The public key is derived mathematically from this key.

Public key

The key required to write backup data. It is mathematically derived from the private key in a way where it is easy to derive the public key from the private key, but impossible to infer the private key from the public key.
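A toy illustration of such a one-way relationship (NOT the scheme the application actually uses) is modular exponentiation: computing the public value from a private exponent is one fast operation, while recovering the exponent from the result (the discrete logarithm) has no known efficient algorithm at cryptographic sizes:

```python
# Toy parameters for illustration only; real systems use vetted groups/curves.
P = 2**127 - 1   # a Mersenne prime modulus
G = 5            # an illustrative base

def public_from_private(private_key: int) -> int:
    # Easy direction: a single modular exponentiation.
    return pow(G, private_key, P)

pub = public_from_private(123456789)
# The reverse direction, finding 123456789 given only pub, G, and P,
# is the discrete logarithm problem.
assert 0 < pub < P
```

This asymmetry is what allows the client to write backup data with only the public key while keeping restores gated on the private key.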

Rebuild

The task of rebuilding the local manifest database from the log stored in the backup Destination for the manifest.

Retention

The settings for a backup set that define how many file versions should be retained and when old versions or files should be deleted from the backup.

Schedule

How often a certain operation should happen. Usually this refers to how often a backup set should be scanned for updated files.

Set

A collection of settings that define a group of files, with a schedule, retention, and destinations.

Trimming

The task of trimming refers to applying retention settings to your repository. It usually runs after each backup set completes.

Retention explained

Retention in Underscore Backup is very flexible but can be somewhat tricky to understand. For each backup set, you can specify different retention. There is also a global retention setting used for all files that are not contained in any current set. A file can end up existing in your backup without being in any set after you change your definition of a set.

Retention is defined in four parts. First is the default initial retention, which is defined as the maximum number of versions of a file kept during any given time period. For instance, if your default retention is set to 15 minutes, then even if a file changes every 5 minutes the backup will only contain one copy per 15 minutes. You can then progressively move to less and less frequent copies as versions grow older. There is also a setting for how long files should be kept after they have been deleted. Finally, there is a catch-all setting for the maximum number of versions of a file to keep.

To explain, consider the following example configuration.

In this example, the application keeps at most one version of the file per 15 minutes. Once a version is older than 1 month, only a single version per day is kept and the other versions are discarded. After a version becomes 6 months old, another culling takes place and only a single copy per month is retained. Finally, the file is kept for up to 1 month after it has been deleted from your system.

There is no maximum number of versions kept in this example. If that setting were enabled at 10, it would be applied on top of the previous options, so there would never be more than 10 versions of any single file.
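The tiered thinning described above can be sketched as follows, using minutes as integer timestamps and hypothetical tier definitions; the application's actual implementation is not specified here:

```python
def thin_versions(version_times, now, tiers):
    """Keep at most one version per period, where the period grows with age.

    tiers: (min_age, period) pairs sorted by ascending min_age.
    """
    def period_for(t):
        period = tiers[0][1]
        for min_age, p in tiers:
            if now - t >= min_age:
                period = p        # the oldest matching tier wins
        return period

    kept = []
    for t in sorted(version_times, reverse=True):   # newest first
        # Keep a version only if it is at least one period older
        # than the most recently kept version.
        if not kept or kept[-1] - t >= period_for(t):
            kept.append(t)
    return sorted(kept)

# One version per 15 minutes, versions taken at minutes 0, 5, 10, 20, 40:
assert thin_versions([0, 5, 10, 20, 40], now=40, tiers=[(0, 15)]) == [5, 20, 40]
```

Adding further tiers such as (one month, one day) and (six months, one month) would reproduce the progressive culling from the example above.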

Retention in combination with continuous backups

Continuous backups complicate retention settings because, for a file that changes very often, it would be very inefficient to replace the most recent version every time it changes. To solve this, when continuous backup is enabled the system will only save a new version of a file once per retention period.

As an example, say a file updates once every minute for 20 minutes while its retention keeps a version every 15 minutes. The first version is saved immediately after the first change. The next version is not stored until 15 minutes later. The system then keeps track that the file has changed for another 5 minutes until the changes stop, but the latest version of the file is not saved until 30 minutes after the beginning of this cycle (one full retention period after the previously saved version).
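The scheduling in the example above can be sketched as follows, with times as integer minutes. This is a simplification of whatever the client actually does internally:

```python
def scheduled_saves(change_times, period):
    """Return the times at which versions are actually stored."""
    saves = []
    last_buffered = None
    for t in sorted(change_times):
        if not saves:
            saves.append(t)           # first change is saved immediately
        elif t >= saves[-1] + period:
            saves.append(t)           # a full period has elapsed: save now
        else:
            last_buffered = t         # change buffered until the period ends
    if last_buffered is not None and last_buffered > saves[-1]:
        # Pending changes are flushed one full period after the last save.
        saves.append(saves[-1] + period)
    return saves

# Changes every minute for 20 minutes, retention period of 15 minutes:
assert scheduled_saves(range(20), 15) == [0, 15, 30]
```

This matches the walkthrough: saves at minutes 0 and 15, with the final state captured at minute 30 even though the file stopped changing at minute 19.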

How retention is applied

Retention is applied at the end of each completed set. If a backup is interrupted while still in progress, for instance to do a restore, that operation could see slightly more versions than retention specifies, since files have been stored but any culling triggered by adding them has not yet happened. If you are looking at the UI while a backup runs, you will see a status of Trimming while retention is being applied.