CloudFounders: Cloud Storage Router

The latency of the DSS on top of slow SATA disks is of course pretty bad. CloudFounders solves this with an intelligent flash cache that caches both reads and writes. The so-called “Storage Accelerator” is part of the Cloud Storage Router and runs on top of the DSS backend.

First of all, the typical 4KB writes are all stored on the SSD. These 4KB blocks are aggregated into a Storage Container Object (SCO), typically 16MB in size. As a result, the flash cache is used for what it does best (working with 4KB blocks) and the DSS SATA backend only sees sequential writes of 16MB. The large SATA magnetic disks perform a lot better with large sequential writes than with small random ones.
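As a rough illustration of this write path, the sketch below stages 4KB writes on flash and flushes one 16MB Storage Container Object to the backend once the buffer is full. The `ScoBuffer` class and the `put_object` call are illustrative assumptions, not CloudFounders' actual code.

```python
# Minimal sketch: aggregate 4KB random writes into 16MB Storage Container
# Objects (SCOs) so the SATA backend only sees large sequential writes.
# All names here are illustrative assumptions, not CloudFounders' code.

BLOCK_SIZE = 4 * 1024            # incoming write granularity (4KB)
SCO_SIZE = 16 * 1024 * 1024      # container size flushed to the backend (16MB)
BLOCKS_PER_SCO = SCO_SIZE // BLOCK_SIZE


class ScoBuffer:
    def __init__(self, backend):
        self.backend = backend   # anything that accepts one large object per call
        self.blocks = []         # 4KB blocks staged on the flash cache

    def write_block(self, data: bytes) -> None:
        assert len(data) == BLOCK_SIZE
        self.blocks.append(data)             # the random 4KB write lands on SSD
        if len(self.blocks) == BLOCKS_PER_SCO:
            self.flush()

    def flush(self) -> None:
        if not self.blocks:
            return
        sco = b"".join(self.blocks)          # one contiguous 16MB object
        self.backend.put_object(sco)         # backend sees only a sequential write
        self.blocks = []
```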

A server with virtual machines can connect to several Cloud Storage Routers, and the blocks of a virtual disk can be spread over several flash caches. The result is that performance scales with the number of nodes on which Cloud Storage Routers are active. The magic that makes this work is that the metadata is distributed among all Cloud Storage Routers (using a Paxos-based distributed database), so “hot blocks” can be transferred from the flash caches of several nodes at the same time. The metadata also contains a hash of each 4KB block. Because the hashes of the 4KB blocks in the cache are compared, the cache is “deduped”: each unique 4KB block is written only once.
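The deduplication step can be pictured with a small sketch: hash every 4KB block and store a block only if its hash has not been seen before. The `DedupCache` class and the choice of SHA-256 are assumptions for illustration; the article does not say which hash CloudFounders uses.

```python
# Illustrative content-addressed cache: each unique 4KB block is stored once.
# The class name and the SHA-256 hash are assumptions, not CloudFounders' design.
import hashlib

BLOCK_SIZE = 4 * 1024


class DedupCache:
    def __init__(self):
        self.store = {}   # hash -> block data (stands in for the flash cache)
        self.index = {}   # logical block address -> hash (stands in for the metadata)

    def write(self, lba: int, block: bytes) -> None:
        assert len(block) == BLOCK_SIZE
        digest = hashlib.sha256(block).hexdigest()   # hash kept in the metadata
        if digest not in self.store:                 # only previously unseen content
            self.store[digest] = block               # actually consumes cache space
        self.index[lba] = digest

    def read(self, lba: int) -> bytes:
        return self.store[self.index[lba]]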

As the blocks of a virtual disk are distributed over the DSS, the failure of compute or storage nodes does not need to be disruptive. Compute node failures can be handled by enabling VMware HA; storage node failures can be handled by configuring the DSS durability policy. Since the metadata is distributed, the remaining Cloud Storage Routers will still be able to find the blocks on the DSS that belong to a given virtual disk.

Last but not least, the Cloud Storage Routers can also distribute the check blocks over cloud storage such as Amazon S3 or an OpenStack Swift implementation. The data remains secure, as the blocks are encoded and spread over many volumes; only the Cloud Storage Routers know how to reassemble the data.
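A simplified picture of that spreading step is sketched below: encoded fragments are written to several object stores, and only the placement map held by the router says where each piece lives. The round-robin placement and the `ObjectStore` interface are assumptions for illustration; the article does not describe the actual placement policy.

```python
# Sketch: spread encoded check blocks over several object stores (e.g. S3 buckets
# or Swift containers) and keep the placement map in the router's metadata.
# The round-robin policy and the ObjectStore interface are illustrative assumptions.
from typing import Protocol


class ObjectStore(Protocol):
    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


def spread(fragments, backends, sco_id):
    """Write each encoded fragment to one backend; return the ordered placement map."""
    placement = []                               # ordered (key, backend index) pairs
    for i, fragment in enumerate(fragments):
        idx = i % len(backends)                  # round-robin over the stores
        key = f"{sco_id}/frag-{i}"
        backends[idx].put(key, fragment)
        placement.append((key, idx))
    return placement


def reassemble(placement, backends):
    """Only a router holding the placement map can fetch and reorder the fragments."""
    return [backends[idx].get(key) for key, idx in placement]
```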

Highly scalable, very reliable, and very flexible (e.g. when used with Amazon S3): the CloudFounders Storage Router sounds almost too good to be true. We have set up a Storage Router in our lab and can confirm that it can do some amazing things, like replicating an ESXi VM across the globe and booting it as a Hyper-V VM. We are currently designing benchmarks to compare it with traditional storage systems, so we hope to report back with some solid tests. But it is safe to say that the combination of Bitspread and the Cloud Storage Router is very different from the traditional RAID-enabled SAN and storage gateway.

Comments

  • jhh - Wednesday, August 7, 2013 - link

    And Advanced/SDDC/Chipkill ECC, not the old-fashioned single-bit correct/multiple bit detect. The RAM on the disk controller might be small enough for this not to matter, but not on the system RAM.
  • tuxRoller - Monday, August 5, 2013 - link

    Amplidata's DSS seems like a better, more forward-looking alternative.
  • Sertis - Monday, August 5, 2013 - link

    The Amplistore design seems a bit better than ZFS. ZFS has a hash to detect bit rot within a block, while Amplistore stores FEC coding that can potentially recover the data within that block without recalculating it from parity on the other drives in the stripe (and without the I/O that involves). It also seems to be a bit smarter about how it distributes data, allowing you to cross between storage devices to provide recovery at the node level, while ZFS is really just limited to the current pool. It has various out-of-band data rebalancing features which aren't really present in ZFS. For example, add a second vdev to a zpool when it's 90% full and there really isn't a process to automatically rebalance the data across the two vdevs as you add more data: the original data stays on the first vdev, and new data basically sits in the second vdev. It seems very interesting, but I certainly can't afford it, so I'll stick with raidz2 for my puny little server until something open source comes out with a similar feature set.
  • Seemone - Tuesday, August 6, 2013 - link

    Are you aware that with ZFS you can specify the number of replicas each data block should have on a per-filesystem basis? ZFS is indeed not very flexible about pool layout and does not rebalance data (as of now), but there's nothing in the on-disk data structures that prevents this. That means it can be implemented and would be applicable to old pools in a non-disruptive way. ZFS, also, is open source; its license is simply not compatible with the GPLv2, hence the separate ZFS on Linux distribution.
  • Brutalizer - Tuesday, August 6, 2013 - link

    If you want to rebalance ZFS, you just copy the data back and forth and the rebalancing is done. Assume you have data on some ZFS disks in a ZFS raid, and then you add new empty disks, so all the data sits on the old disks. To spread the data evenly across all disks, you need to rebalance it. Two ways:
    1) Move all the data to another server, then move it back to your ZFS raid. Now all the data is rebalanced. This requires another server, which is a pain. Instead, do this:
    2) Create a new ZFS filesystem on your raid. This filesystem is spread out over all the disks. Move the data to the new ZFS filesystem. Done.
  • Sertis - Thursday, August 8, 2013 - link

    I'm definitely looking forward to these improvements, if they eventually arrive. I'm aware of the multiple-copy solution, but if you read the Intel and Amplistore whitepapers, you will see they have very good arguments that their model of spreading FEC blocks across nodes works better than creating additional copies. I have used ZFS for years, and while you can work around the issues, it's very clear that it's no longer evolving at the same rate since Oracle took over Sun. Products like this keep things interesting.
  • Brutalizer - Tuesday, August 6, 2013 - link

    Theory is one thing, real life another. There are many bold claims and wonderful theoretical constructs from companies, but do they hold up to scrutiny? Researchers injected artificially constructed errors into different filesystems (NTFS, ext3, etc.), and only ZFS detected all the errors. Researchers have verified that ZFS seems to combat data corruption. Is there any research on Amplistore's ability to combat data corruption? Or do they only have bold claims? Until I see research from an independent third party, I will continue with the free, open source ZFS. CERN is now switching to ZFS for tier-1 and tier-2 long-term storage, because vast amounts of data _will_ have data corruption, CERN says. Here are research papers on data corruption on NTFS, hardware RAID, ZFS, NetApp, CERN, etc.:
    http://en.wikipedia.org/wiki/ZFS#Data_integrity
    For instance, Tegile, Coraid, GreenByte, etc. are all storage vendors that offer petabyte enterprise servers using ZFS.
  • JohanAnandtech - Tuesday, August 6, 2013 - link

    Thanks, very helpful feedback. I will check the paper out.
  • mikato - Thursday, August 8, 2013 - link

    And Isilon OneFS? Care to review one? :)
  • bitpushr - Friday, August 9, 2013 - link

    That's because ZFS has had a minimal impact on the professional storage market.
