Storing and sharing large volumes of data across geographically separated locations is a difficult problem that several communities face today. The primary challenge is to cut across organizations, private firms, hospitals, and universities in a transparent, distributed, and secure fashion, using the collective bandwidth of the network efficiently while still providing easy access. The requirements vary widely: hospitals deal with large amounts of sensitive patient data (security), some groups need billions of small files (metadata), and high energy physics data sets reach petabytes (volume). Each case calls for a different set of storage and access mechanisms.
To address this problem, the Advanced Computing Center for Research and Education (ACCRE) at Vanderbilt is developing a flexible storage framework called Logistical Storage (L-Store). L-Store is conceptually designed as a complete virtual file system. Designing a file system distributed over the WAN requires rethinking the traditional file system components to account for the more varied quality of service (QoS) issues that arise in a distributed environment. Several of the most prominent issues are listed below; how L-Store addresses them is the focus of the rest of this section.
Availability. Is the service available for use? If a portion of the network goes down, it should not take the whole file system with it.
Data and metadata integrity. Guarantees need to be in place ensuring that the data sent from the client and the data that is stored are the same. End-to-end conditioning is required.
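One common way to check that what the client sent matches what is later read back is to compare a checksum computed on the client at both ends. The following is a minimal sketch of that idea only; the function names, the dictionary standing in for remote storage, and the choice of SHA-256 are assumptions made for illustration and are not taken from L-Store.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return the SHA-256 digest of a byte string as a hex string."""
    return hashlib.sha256(data).hexdigest()


def store(storage: dict, key: str, data: bytes) -> str:
    """Upload: compute the digest on the client before sending.

    `storage` is a hypothetical stand-in for a remote storage service.
    The digest is returned so the client can keep it separately.
    """
    digest = checksum(data)
    storage[key] = data
    return digest


def retrieve(storage: dict, key: str, expected_digest: str) -> bytes:
    """Download: recompute the digest and compare with the upload-time value."""
    data = storage[key]
    if checksum(data) != expected_digest:
        raise ValueError(f"integrity check failed for {key!r}")
    return data
```

Because the digest is computed before the data leaves the client and verified after it returns, corruption anywhere along the path (network, storage device, or intermediate software) is detected rather than only corruption on a single hop.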
Performance in both metadata (transactions/s) and data transfer (MB/s). Each community has a different blend of these performance measurements. In the High Energy Physics community a task typically works with a few large (100 MB to 1 GB) files, but in the Proteomics world the picture is vastly different: a typical data set consists of tens of thousands of very small (less than 1 KB) files. One should be able to tune the L-Store system to the community's needs.
Security of both metadata and raw data. In addition to normal role-based authentication and authorization, some of the metadata and data stored may need to be encrypted, depending on the community. LN supports transfer over an SSL-encrypted socket as well as AES encryption of the actual data before sending.
Fault tolerance of both metadata and data. One can use replication to provide redundancy against metadata loss, but simple replication is inefficient and cost-prohibitive for the data. L-Store breaks a file up into multiple blocks that are scattered across the various storage devices on the WAN. As a result, L-Store must be able to handle not just a single drive failure but also the failure of an entire storage appliance. We go one step further and support multiple appliance failures.
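The idea of splitting a file into blocks and surviving the loss of one of them can be illustrated with a single XOR parity block. This is a deliberately simplified stand-in, not L-Store's actual scheme: it tolerates only one lost block, whereas surviving multiple appliance failures requires a stronger erasure code. All function names here are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def split_with_parity(data: bytes, k: int):
    """Split data into k equal-size blocks (zero-padded) plus one parity block."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    blocks = [padded[i * size:(i + 1) * size] for i in range(k)]
    return blocks + [xor_blocks(blocks)]


def recover(blocks, original_length: int) -> bytes:
    """Rebuild the data when at most one block (marked None) is missing."""
    missing = [i for i, b in enumerate(blocks) if b is None]
    if len(missing) > 1:
        raise ValueError("single parity cannot recover more than one lost block")
    blocks = list(blocks)
    if missing:
        # The XOR of all k+1 blocks is zero, so the missing block
        # equals the XOR of the survivors.
        survivors = [b for b in blocks if b is not None]
        blocks[missing[0]] = xor_blocks(survivors)
    # Drop the parity block and the zero padding.
    return b"".join(blocks[:-1])[:original_length]
```

With k = 4, this layout stores about 1.25 times the original data while tolerating one lost block, compared with 2 times for a full replica, which is the kind of saving that makes block-level redundancy attractive over simple replication for bulk data.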