1. The paradox
A worrying trend has been growing exponentially over the last two decades, worrying because it can give its practitioners a completely false sense of security. This trend goes by several names, such as digitization or dematerialization.
All the knowledge that our ancestors have archived on traditional media (stone, paper) since the invention of writing, approximately 5,300 years ago, is being migrated onto this new, so-called "digital" medium, which is fundamentally different. Functionally different, because it gives data a new dimension, definable as the intersection of ubiquity and instantaneity, which literally revolutionizes our ability to share it. But also fundamentally different in essence: the data becomes virtual, its physical representations and locations are obscure to most of humanity, and knowledge of the machinery behind this new technology is shared by only a relatively small community of skilled technicians. Another characteristic of this new medium is its dependence on energy. Depending on the storage medium, the absence of electrical power means, at best, the impossibility of reading the data and, at worst, its total loss. We know that energy management is going to be a key challenge of this millennium. What would the consequences be if energy resources one day became insufficient to keep alive this gigantic amount of virtual data, which is growing at such incredible speed?
Data destruction also takes on a new dimension with this technological revolution. History has been marked by several acts of intentional or accidental data destruction (acts often materialized as book burnings), and such regrettable events continue, and will surely continue, to happen in this new digital era. The two main differences in these new destruction events are their magnitude and their instantaneity. Who has not experienced data loss or corruption caused by a virus (a criminal event) or a hard disk failure (an accidental event)? How much data was on that disk that just failed? Today, 1-terabyte disks are commonplace, and each can easily hold the equivalent of a million books! The next generations of storage units will offer ever larger capacities, probably on the order of petabytes by 2025 if Moore's law also applies to hard disks. Risks definitely take on a new magnitude. The same reasoning applies at the level of an organization: the consequences for an organization that lost its data after an electromagnetic attack would be disastrous. Electromagnetic weapons do exist and have already been successfully tested on real companies! It is no accident that companies totally dependent on their information systems choose, if they can afford it, to bury their electronic equipment in bunkers!
I feel that I am observing a paradox in my daily life. On the one hand, we are (or can be, if we so decide) aware of the scale of these new risks; on the other hand, we blindly archive every single event of our daily lives in this new format! The rest of this article tries to define the security measures that should be taken to limit the risk of personal data loss and proposes a practical technical implementation.
2. Risk analysis
The causes of data loss are either accidental or criminal.
Accidental causes include events such as:
- hardware failures (electronic or mechanical components)
- human error
- software bugs
Criminal causes include events such as:
- hardware theft
- hacking (viruses, system compromise, etc.)
- electromagnetic attacks
3. Requirements for a safe backup policy
Backup (or, more generally, redundancy) is certainly the best and only defense against data loss, but it has to be done carefully, as it can give a genuinely false sense of security if not done properly. Indeed, defining and implementing a backup policy is quite challenging, because all the different risks must be taken into consideration. We will focus here on the use case of individuals who want to deploy a practical solution to secure their personal data. Data backup management within an organization is more complex to address, because of real-time security requirements, the sheer volume of the data, and the fact that it is usually spread across many physical locations; the general principles presented here nevertheless remain valid.
A crucial parameter to take into consideration in the backup policy is that the damage caused by any of the causes above is not necessarily immediately visible: part of the data may be in an incoherent state while the rest is not. If the incoherent files are diagnosed as corrupted too late, restoring the backup may not help, because that backup may have been made after the compromise. This is why it is fundamental to use a backup technology that makes it possible to restore the state of a file at different points in time. The further back in time a restoration can reach, the safer the backup.
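As an illustration of this principle, here is a minimal point-in-time snapshot sketch built on rsync's --link-dest option. It is not part of delta-backup, and the directory layout (a snapshot root holding dated subdirectories and a "latest" symlink) is an assumption made for the example. Each run produces a complete, browsable snapshot, but unchanged files are hard-linked to the previous snapshot rather than copied again, so older states remain restorable at little extra cost.

    #!/usr/bin/perl
    # Point-in-time snapshots with rsync --link-dest (illustrative sketch only).
    use strict;
    use warnings;
    use Cwd qw(abs_path);
    use POSIX qw(strftime);

    my ($src, $root) = @ARGV;          # source directory, snapshot root
    my $stamp = strftime("%Y-%m-%d_%H%M%S", localtime);

    # Hard-link unchanged files to the previous snapshot, if one exists.
    my @link = -d "$root/latest"
             ? ("--link-dest=" . abs_path("$root/latest"))
             : ();

    system("rsync", "-a", @link, "$src/", "$root/$stamp/") == 0
        or die "rsync failed: $?";

    # Repoint the 'latest' symlink to the snapshot just created.
    unlink "$root/latest";
    symlink $stamp, "$root/latest" or die "symlink failed: $!";

Restoring a file as it existed at a given date then simply amounts to copying it back from the corresponding dated directory.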
Another crucial principle is that the backup should be written to a non-rewritable medium. It is indeed very easy to imagine a nasty virus capable of deleting, within seconds, all the data present on a rewritable backup medium, such as an external hard drive, as soon as it is plugged into the system.
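As a sketch of how such a write-once copy might be produced, the following fragment masters an ISO 9660 image from a backup directory and burns it to a DVD with the standard genisoimage and growisofs tools. The device path (/dev/dvd) and the temporary image location are assumptions made for the example.

    #!/usr/bin/perl
    # Write a backup directory to write-once optical media (illustrative sketch).
    use strict;
    use warnings;

    my ($dir) = @ARGV;                 # directory to burn
    my $iso = "/tmp/backup.iso";       # assumed temporary image location

    # Build an ISO image (Rock Ridge + Joliet) from the backup directory.
    system("genisoimage", "-quiet", "-R", "-J", "-o", $iso, $dir) == 0
        or die "genisoimage failed: $?";

    # Burn the pre-mastered image to the DVD writer (device path assumed).
    system("growisofs", "-dvd-compat", "-Z", "/dev/dvd=$iso") == 0
        or die "growisofs failed: $?";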
Delta-backup is a simple and efficient tool for cross-system backup with differential archiving. I designed it with the two principles listed above in mind. It is a simple Perl script invoking the powerful rsync utility, the reference tool for file-system synchronization in the UNIX/Linux world.
The principle of delta-backup is the following. When run for the first time, it performs a full backup of a reference directory (the "source directory") into a backup directory (the "destination directory"); the two directories can be located on different hosts. After this initial full backup, which can be quite time-consuming depending on the size of the source directory, each subsequent execution of the script synchronizes the destination directory with the source directory. This synchronization step is much faster than the initial full backup, as only the differential (the files created or modified since the last execution) is copied from the source to the destination. Delta-backup also duplicates the differential by archiving it (in tar format) in a third directory, the differential directory (the "delta directory"), together with the list of deleted files.

It is highly encouraged to regularly save the content of the delta directory onto a non-rewritable medium (typically an optical disc). In case of a major incident (loss of both the source and destination directories), it is then possible to restore an "acceptable" state of the data by successively restoring the archived differentials (and deleting the files listed in the deleted-files lists). The further back the archived differentials reach, the more acceptable the restored state will be. The script provides two optional features for the archiving of the differential: compression and strong encryption. It is therefore conceivable to safely ask someone to keep the differentials in a physical location different from that of the source or destination directories. Delta-backup can be downloaded here: https://sourceforge.net/projects/delta-backup/.
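To make the workflow concrete, here is a heavily simplified sketch of the synchronization and differential-archiving steps. It is a reconstruction for illustration, not the actual delta-backup source: the parsing of rsync's --itemize-changes output and the file names used in the delta directory are assumptions made for this example, compression is shown, and encryption is left out.

    #!/usr/bin/perl
    # Simplified delta-backup-style workflow: synchronize, then archive the
    # differential (illustrative sketch only).
    use strict;
    use warnings;
    use POSIX qw(strftime);

    my ($src, $dst, $delta) = @ARGV;   # source, destination, delta directories
    my $stamp = strftime("%Y%m%d-%H%M%S", localtime);

    # Synchronize the destination with the source; --itemize-changes prints
    # one line per changed entry, --delete propagates deletions.
    open my $rsync, "-|", "rsync", "-a", "--delete", "--itemize-changes",
         "$src/", "$dst/" or die "cannot run rsync: $!";

    my (@changed, @deleted);
    while (my $line = <$rsync>) {
        chomp $line;
        if ($line =~ /^\*deleting\s+(.+)$/) {        # file removed from source
            push @deleted, $1;
        } elsif ($line =~ /^[<>c]f\S+ (.+)$/) {      # file created or modified
            push @changed, $1;
        }
    }
    close $rsync;

    # Archive the differential (new and modified files) from the source tree.
    if (@changed) {
        open my $fh, ">", "$delta/changed-$stamp.list" or die $!;
        print {$fh} "$_\n" for @changed;
        close $fh;
        system("tar", "-C", $src, "-czf", "$delta/delta-$stamp.tar.gz",
               "--files-from", "$delta/changed-$stamp.list") == 0
            or die "tar failed: $?";
    }

    # Record deletions so they can be replayed when restoring differentials.
    if (@deleted) {
        open my $fh, ">", "$delta/deleted-$stamp.list" or die $!;
        print {$fh} "$_\n" for @deleted;
        close $fh;
    }

Restoring after a major incident then consists of extracting the delta archives in chronological order and applying each deletion list after the corresponding archive.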