Storage@desk Overview
Storage@desk (SD for short) uses excess desktop disk capacity within an organization to create large virtual storage volumes that meet target QoS and security goals and that give a large number of clients access through standard interfaces.
Storage@desk has three objectives:
1. As a software solution, Storage@desk harvests the underutilized storage resources in the existing infrastructure.
2. Storage@desk provides a block data service via the Internet Small Computer Systems Interface (iSCSI) protocol.
3. Each Storage@desk volume can be configured to meet various levels of QoS, allowing users to choose the level of service they need and to trade off cost, efficiency, performance, and quality. Storage@desk evaluates and enforces these QoS levels.
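To make the idea of a configurable QoS level concrete, here is a minimal sketch of how an availability target could translate into a storage cost. It is not Storage@desk's actual mechanism: the `QosLevel` class, the `replicas_needed` function, the use of plain replication, and the assumption that desktops become unavailable independently of one another are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QosLevel:
    """Hypothetical QoS description for one volume (not the real schema)."""
    name: str
    target_availability: float  # required fraction of time the data is reachable


def replicas_needed(qos: QosLevel, desktop_availability: float) -> int:
    """Return the smallest replica count n with 1 - (1 - p)^n >= target.

    Assumes each desktop is reachable with probability p, independently
    of the others. Real desktops (shared power, shared network) are
    correlated, so this is an optimistic estimate.
    """
    if not 0.0 < desktop_availability < 1.0:
        raise ValueError("desktop_availability must be strictly between 0 and 1")
    if not 0.0 <= qos.target_availability < 1.0:
        raise ValueError("target_availability must be in [0, 1)")

    p_down = 1.0 - desktop_availability
    n = 1
    # Add replicas until the chance that all of them are down is small enough.
    while 1.0 - p_down ** n < qos.target_availability:
        n += 1
    return n


if __name__ == "__main__":
    for level in (QosLevel("scratch", 0.5), QosLevel("persistent", 0.999)):
        print(level.name, replicas_needed(level, desktop_availability=0.7))
```

Under these assumptions, a stricter level consumes more harvested disk space for the same logical volume size, which is the cost side of the trade-off described above.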
Using Storage@desk, large organizations (both commercial and academic) can harness vast quantities of underutilized storage resources at low marginal cost to themselves, increasing the value of the IT infrastructure that they already have in place. Four scenarios illustrate how we imagine Storage@desk being used in large organizations: provisioning persistent user storage, serving as a dynamic storage pool, supporting HPC clusters, and providing off-site storage. One can easily imagine other uses as well.
- Provisioning persistent user storage: Universities and other large enterprises often give users individual storage allocations that must be accessible from anywhere in the organization and that are backed up on a regular basis. Users expect their data to be highly persistent. Storage@desk can provide transparent access to user data from anywhere in the organization, and with properly configured QoS requirements, that data can be highly available and persistent.
- Dynamic storage pool: We have observed that storage requirements vary by discipline, both in amount (terabytes) and in quality (temporal and reliability). High-energy physics projects, for example, often use huge volumes of data for short periods. When they need the storage, they need it! When they do not, it often sits relatively idle. Storage@desk can be used as a storage buffer (pool) to accommodate very large, temporary storage demands.
- HPC clusters: It is now common for organizations to have several clusters in house. For any given cluster, the total amount of spinning storage available (either on the nodes or on a file server) may be less than the combined storage needs of all the projects that use the cluster. Using Storage@desk, administrators can allocate storage volumes to each project, with the actual data residing on desktops and other machines elsewhere within the organization. QoS hints allow the data to migrate closer to the cluster (perhaps onto the cluster) before run time, and to migrate off the cluster when the jobs that use the data complete. Similarly, if there are multiple clusters and a meta-scheduler that places jobs on different clusters, Storage@desk can migrate the data close to the computation. Note that even if the data is not migrated, the application can still access it transparently (albeit possibly at a lower bandwidth). Further, in the case of read-only access, each node in the cluster can mount a volume individually, thus bypassing a central file system bottleneck.
- Off-site storage: The Storage@desk architecture does not require that the storage servers be local to an organization. Administrators can configure a Storage@desk system to use storage from other organizations, such as other institutions or service providers connected by a metropolitan area network or wide area network. The ability to move volumes that are infrequently accessed or have weak QoS requirements off-site allows Storage@desk to deal effectively with temporary surges in demand.
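The off-site scenario implies a selection policy: when local demand surges, choose which volumes to move away. The sketch below shows one possible policy under stated assumptions. The `Volume` fields, the `pick_offsite_candidates` function, and the access threshold are hypothetical and are not taken from the Storage@desk prototype.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Volume:
    """Hypothetical per-volume bookkeeping used only for this sketch."""
    name: str
    size_gb: int
    accesses_per_day: float
    strict_qos: bool  # True if the volume has strong QoS requirements


def pick_offsite_candidates(
    volumes: List[Volume],
    needed_gb: int,
    max_accesses_per_day: float = 1.0,
) -> List[Volume]:
    """Choose volumes to move off-site until needed_gb of local space is freed.

    A volume is eligible if it is infrequently accessed OR has weak QoS
    requirements. The least-accessed eligible volumes are moved first.
    """
    eligible = [
        v for v in volumes
        if v.accesses_per_day <= max_accesses_per_day or not v.strict_qos
    ]
    eligible.sort(key=lambda v: v.accesses_per_day)

    chosen: List[Volume] = []
    freed_gb = 0
    for v in eligible:
        if freed_gb >= needed_gb:
            break
        chosen.append(v)
        freed_gb += v.size_gb
    return chosen


if __name__ == "__main__":
    pool = [
        Volume("archive", 100, 0.1, True),
        Volume("scratch", 200, 50.0, False),
        Volume("home", 300, 40.0, True),
        Volume("logs", 50, 0.5, False),
    ]
    print([v.name for v in pick_offsite_candidates(pool, needed_gb=120)])
```

Sorting by access rate is only one reasonable choice; a real policy would also need to weigh the bandwidth of the metropolitan or wide area link and the cost of bringing a volume back.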
We are currently in the final stage of prototype development. We will release the prototype to the general public once development and testing are complete.