How Drobo is organized
Drobo is a series of multi-disk storage devices, available as either DAS or NAS.
Typically, a Drobo has 5 or 8 drive bays, accepts disks in any combination of capacities, and
combines them into the largest possible single storage pool for the chosen fault tolerance.
Fault tolerance may be single-failure or dual-failure.
This storage is, in turn, either divided into LUNs (virtual physical drives)
and presented to the host OS as USB drives (Drobo DAS configurations),
or presented as a 16TB virtual physical drive to the Linux OS running on the Drobo NAS.
Drobo BeyondRAID technology
To combine several physical disks into a single storage pool, Drobo uses its own technology called BeyondRAID,
which does not correspond to any standard RAID level. BeyondRAID uses a combination of RAID1 and RAID5
for single redundancy, and a 3-way mirror along with RAID6 for double redundancy.
Drobo uses different RAID levels for metadata and user data. Moreover,
different block sizes can be used even within the same Drobo disk pack and LUN.
Thin provisioning in Drobo
With a Drobo, it is possible to create LUNs up to the size limit (16TB) regardless of the physical disk sizes in the pack.
This requires some form of thin provisioning, and Drobo does use it.
In practice, thin provisioning is Drobo's weak spot.
The same is true of other similar schemes, for example Microsoft Storage Spaces,
but Drobo's implementation is rather extreme.
Drobo implements thin provisioning in the following way: all LUN space is divided into blocks,
which can be as small as 4 KB. Inside Drobo,
each of these blocks is mapped onto one of several internal RAID arrays of undocumented level.
The RAID arrays, in turn, are mapped onto the physical disks.
Map entries exist only for data blocks that are in use.
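The two-level translation described above (LUN block to RAID array, RAID array to disk) can be sketched as a toy model in Python. Everything here is an assumption for illustration: the names `resolve`, `lun_map` and `raid_layout` are hypothetical, Drobo's real on-disk structures are undocumented, and each RAID-array block is placed on a single disk, ignoring striping and parity.

```python
from typing import Dict, Optional, Tuple

BLOCK_SIZE = 4096  # smallest LUN block size (4 KB), per the text

# Hypothetical structures, not Drobo's real format:
# LUN block -> (RAID array id, block within that array)
LunMap = Dict[int, Tuple[int, int]]
# (RAID array id, block within array) -> (disk id, byte offset on disk);
# simplified to one disk per block, ignoring striping and parity
RaidLayout = Dict[Tuple[int, int], Tuple[int, int]]


def resolve(lun_block: int, lun_map: LunMap, raid_layout: RaidLayout) -> Optional[Tuple[int, int]]:
    """Translate a LUN block number to (disk, offset), or None if it is unmapped."""
    entry = lun_map.get(lun_block)
    if entry is None:
        # Unused blocks have no map entry at all, so there is nothing to translate.
        return None
    return raid_layout[entry]


# Only two blocks of the LUN are in use, so the map holds just two entries.
lun_map: LunMap = {0: (1, 7), 1_000_000: (2, 3)}
raid_layout: RaidLayout = {(1, 7): (0, 7 * BLOCK_SIZE), (2, 3): (3, 3 * BLOCK_SIZE)}

print(resolve(0, lun_map, raid_layout))          # (0, 28672)
print(resolve(1_000_000, lun_map, raid_layout))  # (3, 12288)
print(resolve(5, lun_map, raid_layout))          # None
```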
The map, which can grow to hold up to about 4 billion entries (16 TB / 4 KB ≈ 4×10^9) and occupies quite a lot of disk space,
is crucial for Drobo data recovery. If the map is damaged or lost,
there is no way to reassemble data from 4 KB pieces scattered randomly across the disks.
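The entry count follows directly from the LUN limit and the smallest block size. The short calculation below uses binary units (16 × 2^40 and 4 × 2^10 bytes); the per-entry sizes are pure assumptions, used only to show the order of magnitude of a single map copy.

```python
def map_entries(lun_bytes: int, block_bytes: int) -> int:
    """Upper bound on block-map entries: one per block of a fully used LUN."""
    return lun_bytes // block_bytes


LUN_LIMIT = 16 * 2**40  # 16 TB LUN limit, binary units assumed
MIN_BLOCK = 4 * 2**10   # 4 KB smallest block

entries = map_entries(LUN_LIMIT, MIN_BLOCK)
print(entries)  # 4294967296, about 4 * 10**9

# Hypothetical entry sizes: the real on-disk entry format is not documented.
for entry_bytes in (8, 16):
    gib = entries * entry_bytes / 2**30
    print(f"{entry_bytes} bytes/entry -> {gib:.0f} GiB per map copy")
# 8 bytes/entry -> 32 GiB per map copy
# 16 bytes/entry -> 64 GiB per map copy
```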
Whether Drobo data can be recovered depends directly on whether you have the map.
Physically, the map is stored in two or three copies to satisfy the redundancy requirements of the pack,
so a physical failure does not immediately cost you the map.
Logically, however, all the copies are updated simultaneously, so if something goes wrong with the map,
the bad change affects all the physical copies at once,
and there are no historical records of any kind from which to recover past states.
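The difference between physical and logical damage to the map can be illustrated with a toy model. `ReplicatedMap` is a hypothetical class, not Drobo's implementation: it simply writes each update to all surviving copies at once and keeps no history, which is the behavior described above.

```python
from typing import Dict, List, Optional, Tuple

Location = Tuple[int, int]  # (RAID array id, block within array), hypothetical


class ReplicatedMap:
    """Toy block map stored in N physical copies, with no history kept."""

    def __init__(self, copies: int) -> None:
        self.copies: List[Optional[Dict[int, Location]]] = [{} for _ in range(copies)]

    def update(self, block: int, location: Optional[Location]) -> None:
        # Every logical change goes to all surviving copies at the same time.
        for c in self.copies:
            if c is None:
                continue
            if location is None:
                c.pop(block, None)
            else:
                c[block] = location

    def lose_copy(self, index: int) -> None:
        # A physical failure destroys one copy; the others are untouched.
        self.copies[index] = None

    def lookup(self, block: int) -> Optional[Location]:
        for c in self.copies:
            if c is not None and block in c:
                return c[block]
        return None


# Physical failure: redundancy helps.
m = ReplicatedMap(copies=2)
m.update(10, (1, 5))
m.lose_copy(0)
print(m.lookup(10))  # (1, 5)

# Logical failure: a bad update reaches every copy, and no older state exists.
m = ReplicatedMap(copies=3)
m.update(10, (1, 5))
m.update(10, None)   # e.g. an erroneous unmap
print(m.lookup(10))  # None
```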
There are three main reasons your Drobo device can fail: box failure, disk failure, and filesystem failure.
Drobo box failure
This covers cases where the Drobo disks are healthy, but something is evidently wrong with the box itself. In these cases, recovery is quite simple: take the disks out and place them in another compatible Drobo box, preferably one of the same model, such as a replacement received under warranty.
Drobo disk failure
This covers cases where you have lost more disks than your Drobo's fault tolerance allows. The recovery result depends on:
- whether the failed disks can be repaired, and
- how the data is located on the physical disks.
Although Drobo does rebalance data between disks,
it does not ensure that the data is distributed evenly across them.
BeyondRAID allows asymmetric data distribution, so a failed disk may contain significantly less data, or no data at all;
for example, if it was the last disk added to the Drobo disk pack.
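Whether a multi-disk failure destroys data depends on which disks each block's stripe actually lives on. The sketch below assumes a hypothetical `placement` table mapping each LUN block to the set of disks holding its data and parity; Drobo's real placement logic is not public. In the example, two failed disks exceed single-failure tolerance in both cases, yet the amount of lost data differs depending on which disks fail.

```python
from typing import Dict, FrozenSet, Set


def lost_blocks(placement: Dict[int, FrozenSet[int]], failed: Set[int], tolerance: int) -> Set[int]:
    """Return the LUN blocks whose stripe lost more disks than it can tolerate.

    placement: hypothetical map from LUN block to the disks holding its data/parity.
    tolerance: number of failed disks each stripe survives (1 or 2).
    """
    return {blk for blk, disks in placement.items() if len(disks & failed) > tolerance}


# Four disks; disk 3 was added last and so far holds only block 4.
placement = {
    0: frozenset({0, 1, 2}),
    1: frozenset({0, 1, 2}),
    2: frozenset({0, 1}),
    3: frozenset({1, 2}),
    4: frozenset({2, 3}),
}
print(sorted(lost_blocks(placement, failed={2, 3}, tolerance=1)))  # [4]
print(sorted(lost_blocks(placement, failed={0, 1}, tolerance=1)))  # [0, 1, 2]
```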
Drobo filesystem failure
The third type is filesystem failure, including failure of the operating system on the host PC in the case of Drobo DAS, where the host operating system manages the Drobo filesystem. File deletion and formatting of a Drobo volume also belong to this type of failure.
Drobo data recovery in case of such a failure is very complicated.
The difficulties are associated with thin provisioning in general and how it is implemented in Drobo in particular.
Thin provisioning in Drobo requires three components:
- The map, which records where each block is physically located. We have already discussed this.
- The mechanism of disk space allocation. This one is simple.
When a write request arrives for a block that has no entry in the map,
BeyondRAID allocates a new physical block on the device, writes the data to it, and updates the map.
- The mechanism of freeing disk space. The whole point of thin provisioning is to release unused blocks and return them to the pool
of free blocks, which can then be reused for other data (another filesystem cluster or, theoretically,
another LUN) or even removed physically, for example when you pull a disk out of the Drobo.
This is why you can replace Drobo disks with smaller ones, decreasing the physical size of the disk pack,
provided that the data fits on the smaller disks. No traditional NAS or RAID can do this.
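The allocate-on-write and free-to-pool mechanisms listed above can be modeled in a few lines. `ThinLun` is a hypothetical illustration, not Drobo code. It assumes that reads of unmapped blocks return zeros and that freeing only removes the map entry: the old bytes remain on disk, but nothing points to them any more.

```python
from typing import Dict, List

BLOCK_SIZE = 4096  # assumed block size


class ThinLun:
    """Toy thin-provisioned LUN sharing a pool of physical blocks (not Drobo's code)."""

    def __init__(self, free_pool: List[int]) -> None:
        self.free_pool = free_pool           # physical blocks currently unused
        self.block_map: Dict[int, int] = {}  # LUN block -> physical block
        self.disk: Dict[int, bytes] = {}     # physical block -> stored bytes

    def write(self, lun_block: int, data: bytes) -> None:
        phys = self.block_map.get(lun_block)
        if phys is None:
            # Allocation: no map entry yet, so take a free block and record it.
            # Raises IndexError if the shared pool is exhausted.
            phys = self.free_pool.pop()
            self.block_map[lun_block] = phys
        self.disk[phys] = data

    def read(self, lun_block: int) -> bytes:
        phys = self.block_map.get(lun_block)
        if phys is None:
            return bytes(BLOCK_SIZE)  # assumption: unmapped blocks read as zeros
        return self.disk[phys]

    def free(self, lun_block: int) -> None:
        # Freeing: forget the map entry and return the block to the pool.
        # The old bytes stay in self.disk, but nothing points to them any more.
        phys = self.block_map.pop(lun_block, None)
        if phys is not None:
            self.free_pool.append(phys)


pool = list(range(8))  # eight physical blocks shared by all LUNs
lun = ThinLun(pool)
lun.write(1000, b"A" * BLOCK_SIZE)
print(len(lun.block_map), len(pool))  # 1 7
lun.free(1000)
print(len(lun.block_map), len(pool))  # 0 8
print(lun.read(1000) == bytes(BLOCK_SIZE))  # True
```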
Background processes in Drobo
Modern thin provisioning implementations use TRIM to detect which data blocks are no longer used by a filesystem.
All TRIM does is inform a device which data blocks it can forget, according to the current filesystem state.
Although TRIM was initially developed for SSDs, which benefit greatly from it performance-wise,
it is not limited to SSDs and can be used with any device. Notably, Microsoft Storage Spaces relies on TRIM for its thin provisioning.
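At the device level, handling a TRIM (discard) request amounts to dropping map entries for the affected range. The function below is a hypothetical sketch, not any particular device's logic: it assumes a byte-range request and, as a conservative policy, unmaps only blocks that the range covers completely.

```python
from typing import Dict

BLOCK_SIZE = 4096  # assumed block size


def handle_trim(block_map: Dict[int, int], offset: int, length: int) -> int:
    """Unmap every LUN block fully covered by a discard of [offset, offset + length).

    Partially covered blocks are kept, since they may still hold live data.
    Returns the number of map entries removed.
    """
    first = -(-offset // BLOCK_SIZE)        # round the start up to a block boundary
    last = (offset + length) // BLOCK_SIZE  # exclusive end, rounded down
    removed = 0
    for blk in range(first, last):
        if block_map.pop(blk, None) is not None:
            removed += 1
    return removed


block_map = {0: 10, 1: 11, 2: 12, 3: 13}
# Discard 12 KB starting 2 KB into the LUN: blocks 1 and 2 are fully covered,
# blocks 0 and 3 only partly, so they stay mapped.
print(handle_trim(block_map, 2048, 3 * BLOCK_SIZE))  # 2
print(block_map)  # {0: 10, 3: 13}
```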
Drobo, however, predates both TRIM and filesystem support for it.
That is why Drobo developers had to invent another mechanism for determining which blocks are used and which are free.
Their solution was to make Drobo aware of the filesystem (or several filesystems) stored on the disk pack.
Drobo can peek into the filesystem metadata to get the list of occupied and free blocks, and then discard the unused blocks from the map.
This runs as a periodic garbage collection.
It is this process that makes it impossible to recover deleted files or formatted volumes: once it completes,
the map for a formatted volume becomes empty, and deleted files have their blocks unmapped.
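Drobo's garbage collection can be approximated as a set difference between mapped blocks and blocks the filesystem reports as used. In this sketch, `garbage_collect` is a hypothetical name and `fs_used_blocks` stands in for whatever Drobo extracts from filesystem metadata; how it actually parses each filesystem is not public. The example shows why timing matters: before the pass, a deleted file's blocks are still mapped; after it, they are gone.

```python
from typing import Dict, Set


def garbage_collect(block_map: Dict[int, int], fs_used_blocks: Set[int]) -> Set[int]:
    """Drop map entries for LUN blocks that the filesystem reports as free.

    fs_used_blocks is a stand-in for what is read from filesystem metadata
    (for example, an allocation bitmap). Returns the set of unmapped blocks.
    """
    stale = set(block_map) - fs_used_blocks
    for blk in stale:
        del block_map[blk]
    return stale


# LUN block -> physical block; a file occupied LUN blocks 5 and 6.
block_map = {1: 100, 5: 105, 6: 106}

# The file is deleted: the filesystem marks blocks 5 and 6 free, but until the
# collector runs, the map still points at the old data.
fs_used = {1}
print(sorted(block_map))  # [1, 5, 6]

print(sorted(garbage_collect(block_map, fs_used)))  # [5, 6]
print(block_map)  # {1: 100}
```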
However, it is difficult to predict when the garbage collector (sometimes also called the scavenger) runs: the schedule depends on the I/O load,
the ratio between reads and writes, the amount of free space, and the calendar.
If you have deleted some data or formatted a Drobo volume,
it is risky to attempt recovery without taking the disks out of the Drobo unit, or at least without cloning them,
because you cannot control when the garbage collector starts erasing the map entries that point to unused data,
which makes the data loss permanent.
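Cloning should read each disk once and never write to it. The function below is a minimal Python sketch of such a sector-by-sector copy; `clone_disk` and the device path in the comment are hypothetical. For disks with real media errors, a dedicated tool such as GNU ddrescue, which retries and tracks bad areas, is a better choice.

```python
import os
from typing import List


def clone_disk(src_path: str, dst_path: str, chunk: int = 1 << 20) -> List[int]:
    """Copy src_path to an image file, opening the source read-only.

    Chunks that fail to read (or come back short) are written as zeros so that
    all later offsets stay aligned; their offsets are returned for a retry pass.
    """
    bad: List[int] = []
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        size = src.seek(0, os.SEEK_END)  # total size of the file or block device
        offset = 0
        while offset < size:
            n = min(chunk, size - offset)
            try:
                src.seek(offset)
                data = src.read(n)
            except OSError:
                data = b""
            if len(data) != n:
                bad.append(offset)
                data = bytes(n)  # zero placeholder keeps the image aligned
            dst.write(data)
            offset += n
    return bad


# Example (hypothetical device path; run as a user allowed to read the disk):
# bad_offsets = clone_disk("/dev/sdX", "drobo_bay1.img")
```

Recovery attempts can then work on the image files, leaving the original disks, and the map stored on them, untouched.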
Conclusion
- Drobo is complex.
- The success of recovery depends on the block map: whether you have it or not.
- The drives should be removed from the Drobo and cloned before any attempt is made to read them with the same or another Drobo.
Otherwise you risk irreversibly losing the map.