====== Data Security Principles and Guidelines ======

===== Backups =====

It's often said that "RAID is not a backup", but without further discussion it's not entirely clear why. The real reason is that the two technologies are designed to address different kinds of disasters and, of the two, backup is generally the more important.

The purpose of **RAID** is //to reduce the possibility of **downtime** caused by storage hardware failure//. This is a very precise and narrow definition, which is good. RAID originated at a time when hard disks were not very reliable, and the question of failure was not "if" but "when". The natural desire was to build some redundancy into the system //a priori// so that it would survive (not fail, not even stop serving) the failure of one or a few hard disks. Replacing the failed disks can then be postponed to a convenient time, or even performed on-line.

The purpose of a **backup** is //to reduce the possibility of **damage to data** (often referred to as "data loss") caused by hardware malfunction, software bugs, human error or malicious actions, and natural catastrophic events//. This far exceeds the limited data protection offered by RAID, but using a backup requires downtime to retrieve the data from the backup media.

As should now be clear, a properly designed system must make use of both RAID and backups.

To help survive natural catastrophes such as floods and volcanic eruptions, and malicious actions such as theft and war, backups should be stored off-site, preferably far away from the original system. This principle is called "geographic redundancy".

A few more, less obvious design principles follow.

==== The management of the backup system must be done independently from the management of the target system ====

Since backups are meant to protect against the human factor, no single person (or department) should control both the system and its backups.
If the backup administrator is malicious, we still have the original system. If the system administrator is malicious, we can at least rely on the backups. This is very often overlooked, which results in situations like "a disgruntled sysadmin destroyed the system and deleted all the backups before leaving". This is exactly the kind of situation we want to avoid.

In small organizations this rule is very hard to maintain. Therefore, small organizations that rely on a hired or outsourced system administrator should outsource their backups elsewhere and audit them regularly, while the in-house admin should not be allowed to control them: that admin must only be able to create backups and restore from them.

==== The client system which is backed up should not be allowed to delete its own backups ====

Backups exist to make it possible to recover data even if the target system is compromised. But if the system is compromised //and// can itself remove its backups, that defeats the purpose of having external backups at all. We consider backup a way to protect against any human factor, even a malicious one.

The pruning of expired backups should be performed by the backup system itself, and set up and controlled independently from the client system.

==== Backup usability should not depend on the survival of the system that was backed up ====

If a backup client is lost completely, the backup is the only means of recovery. Often this means that we need to back up some application configuration along with the data, or take a system image at some longer interval.

==== Backup usability should not depend on the survival of the backup system, only on its datastore / media set ====

In other words, having only the datastore should be enough to extract backups.
The live backup system often uses a database to track everything and also has local configuration, but that database should be considered supplementary rather than essential, and it should be possible to recreate the configuration from scratch.

Some backup systems (such as Bacula/BareOS) implement this requirement with the following strategy: the database and configuration of the backup system itself can be backed up into its own datastore, and tools are provided to scan the datastore and to identify and extract arbitrary jobs from it. The recovery procedure for a lost backup system then becomes the following: first install a fresh backup system, then find and extract the job in which the latest configuration and database were saved. With that, it's possible to reinstate the backup system to the state it had when that backup was made. After that, all other restores can be done using the running backup system, as usual.

Others (Proxmox Backup) simply store all the essential state information inside the datastore. To recover, we just point at the existing directory when adding the datastore, and the system finds all the information there.
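
The "datastore alone is enough" principle can be illustrated with a minimal Python sketch. It is not how Bacula/BareOS or Proxmox Backup actually store their data. It assumes a hypothetical layout in which every job directory carries a small ''manifest.json'' describing the job, so that a fresh installation can rebuild its catalog purely by scanning the datastore. The names ''write_job'', ''rebuild_catalog'', ''latest_job'' and the manifest fields are all invented for this illustration.

```python
import json
from pathlib import Path

# Hypothetical manifest file name; real backup systems use their own formats.
MANIFEST_NAME = "manifest.json"


def write_job(datastore, job_id, client, finished_at, files):
    """Store a job's files together with a manifest that describes the job.

    `files` maps flat file names to their byte content. `finished_at` is an
    ISO-8601 UTC timestamp string, so plain string sorting is chronological.
    """
    job_dir = Path(datastore) / client / job_id
    job_dir.mkdir(parents=True, exist_ok=False)  # never overwrite an existing job
    for name, payload in files.items():
        (job_dir / name).write_bytes(payload)
    manifest = {
        "job_id": job_id,
        "client": client,
        "finished_at": finished_at,
        "files": sorted(files),
    }
    (job_dir / MANIFEST_NAME).write_text(json.dumps(manifest))
    return job_dir


def rebuild_catalog(datastore):
    """Recreate the catalog from the datastore alone, without any database."""
    catalog = []
    for manifest_path in Path(datastore).rglob(MANIFEST_NAME):
        entry = json.loads(manifest_path.read_text())
        entry["path"] = str(manifest_path.parent)  # where the job's data lives
        catalog.append(entry)
    catalog.sort(key=lambda e: e["finished_at"])  # oldest first
    return catalog


def latest_job(catalog, client):
    """Return the most recent job for a client, or None if there is none."""
    jobs = [e for e in catalog if e["client"] == client]
    return jobs[-1] if jobs else None
```

With such a layout, the live database becomes only a cache of what the datastore already says. After losing the backup server, one would call ''rebuild_catalog'' on the surviving datastore and then, for example, use ''latest_job'' to locate the most recent saved copy of the backup system's own configuration.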