Due to scheduling conflicts I will miss the forthcoming workshop on Cloud Computing and Its Applications (CCA08), which kicks off in a couple of days in Chicago. Erik Meijer will be there to present our LINQ-to-Datacenter paper (thanks, Erik!). Here’s the abstract:
A plethora of Cloud/fabric frameworks/substrates have emerged within the industry: S3/EC2, Bigtable/Sawzall, Hadoop/PigLatin. Typically these substrates have low-level, idiosyncratic interfaces, with data and query models heavily influenced by legacy SQL.
The many choices translate into high pain of adoption by developers because of the high risk of making the wrong bet. The SQL-like query model translates into high pain of adoption because it doesn’t appeal to developers who embraced object-oriented languages like C# or Java. The SQL-like data model is suboptimal for MapReduce computations because it is not fully compositional. This conservative approach is puzzling because recent language and tool innovations such as Language Integrated Query (LINQ) address precisely the problem of compositional programming with data in modern object-oriented languages.
The proponents of the current substrates have no incentive to come up with a general and developer-friendly abstraction that hides the idiosyncrasies of their proprietary solutions and graduates from the SQL model to a modern, object-oriented and compositional style.
We propose extending the LINQ programming model to massively parallel, data-driven computations. LINQ provides a seamless transition path from computing on top of traditional stores like relational databases or XML to computing on the Cloud. It offers an object-oriented, compositional model that hides the idiosyncrasies of the underlying substrates. We anticipate that just as the community has already built custom LINQ providers for sources such as Amazon, Flickr, or SharePoint, this model will trigger a similar convergence in the space of Cloud-based storage and computation substrates.
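To make "compositional" a bit more concrete, here is a minimal sketch of the idea in Python rather than C#. It is not the API from the paper: the `Query` class and its `select`, `select_many`, `where`, and `group_by` methods are hypothetical stand-ins that mimic the shape of LINQ's standard query operators over an in-memory list. The point is only that a MapReduce-style word count can be written as a chain of small operators, each returning another query, so the same chain could in principle be handed to a different backend.

```python
from collections import defaultdict


class Query:
    """Hypothetical LINQ-like wrapper around any iterable (illustration only)."""

    def __init__(self, source):
        self._source = source

    def __iter__(self):
        return iter(self._source)

    def select(self, f):
        # Project each element (the "map" step).
        return Query(f(x) for x in self._source)

    def select_many(self, f):
        # Project each element to a sequence and flatten the results.
        return Query(y for x in self._source for y in f(x))

    def where(self, pred):
        # Keep only the elements that satisfy the predicate.
        return Query(x for x in self._source if pred(x))

    def group_by(self, key):
        # Bucket elements by key (the "shuffle" step); runs lazily on iteration.
        def run():
            buckets = defaultdict(list)
            for x in self._source:
                buckets[key(x)].append(x)
            yield from buckets.items()

        return Query(run())


# Made-up sample input; a real substrate would read from distributed storage.
documents = ["the cloud", "the query", "cloud cloud"]

counts = (
    Query(documents)
    .select_many(str.split)                      # split documents into words
    .where(lambda w: w != "the")                 # drop a stop word
    .group_by(lambda w: w)                       # group identical words
    .select(lambda kv: (kv[0], len(kv[1])))      # count each group (the "reduce" step)
)

result = dict(counts)
print(result)
```

Every operator takes a query and returns a query, so stages can be added, removed, or reordered without touching the rest of the chain. Here everything runs locally over a list; swapping in a source that runs on a cluster is the part this sketch leaves out.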