Cloud Computing and LINQ

Due to scheduling conflicts I will miss the workshop on Cloud Computing and Its Applications (CCA08), which kicks off in a couple of days in Chicago. Erik Meijer will be there to present our LINQ-to-Datacenter paper (thanks, Erik!). Here's the abstract:

A plethora of Cloud/fabric frameworks/substrates has emerged within the industry: S3/EC2, Bigtable/Sawzall, Hadoop/PigLatin. Typically these substrates have low-level, idiosyncratic interfaces, with data and query models heavily influenced by legacy SQL.

The many choices translate into high pain of adoption for developers because of the high risk of making the wrong bet. The SQL-like query model compounds that pain because it doesn't appeal to developers who have embraced object-oriented languages like C# or Java. The SQL-like data model is suboptimal for MapReduce computations because it is not fully compositional. This conservative approach is puzzling because recent language and tool innovations such as Language Integrated Query (LINQ) address precisely the problem of compositional programming with data in modern object-oriented languages.

The proponents of the current substrates have no incentive to come up with a general and developer-friendly abstraction that hides the idiosyncrasies of their proprietary solutions and graduates from the SQL model to a modern, object-oriented and compositional style.

We propose extending the LINQ programming model to massively parallel, data-driven computations. LINQ provides a seamless transition path from computing on top of traditional stores like relational databases or XML to computing on the Cloud. It offers an object-oriented, compositional model that hides the idiosyncrasies of the underlying substrates. We anticipate that, just as the community has already built custom LINQ providers for sources such as Amazon, Flickr, or SharePoint, this model will trigger a similar convergence in the space of Cloud-based storage and computation substrates.
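
To make the compositional style concrete, here is a minimal sketch of the kind of query the abstract has in mind. Everything in it is illustrative: the in-memory array stands in for a table hosted on a Cloud substrate, and only the standard LINQ operators are real.

    // A minimal sketch, not an actual Cloud provider: the array below
    // stands in for a Cloud-hosted table. Because the query uses only
    // standard LINQ operators, the same expression could be handed to
    // a provider that translates it into a MapReduce-style job.
    using System;
    using System.Linq;

    class SearchHit
    {
        public string Url { get; set; }
        public int Clicks { get; set; }
    }

    class Program
    {
        static void Main()
        {
            var hits = new[]
            {
                new SearchHit { Url = "http://example.org/a", Clicks = 12 },
                new SearchHit { Url = "http://example.org/b", Clicks = 7 },
                new SearchHit { Url = "http://example.org/a", Clicks = 3 },
            };

            // Group-and-aggregate, the classic MapReduce shape, written
            // compositionally: the query is an ordinary value that can
            // be composed further before it ever executes.
            var popular =
                from h in hits
                group h by h.Url into g
                select new { Url = g.Key, Total = g.Sum(x => x.Clicks) };

            var top = popular.OrderByDescending(p => p.Total).Take(10);

            foreach (var p in top)
                Console.WriteLine("{0}: {1}", p.Url, p.Total);
        }
    }

The point is that popular is a composable expression rather than a SQL string; a substrate-specific provider can inspect and translate it without the developer restructuring the program.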

Building Distributed Applications with Recompilers

(Cross-posted from my work blog)

My article Volta: Developing Distributed Applications by Recompiling (co-authored with Brian Beckman and Benjamin Livshits) is now available in the Software Development Tools issue of IEEE Software (September/October 2008).

Here’s the abstract:

Mainstream languages and tools are tailored for sequential, non-distributed applications, with support for distributed computing provided only in library APIs. Such programming environments force developers to make decisions about “where-code-runs” early in the application lifecycle, structuring the entire application around partitioning decisions. Performance measurement may reveal that the original partitioning was wrong, but redistributing the application is expensive because redistributing is restructuring. We built a new kind of tool suite that recompiles executables into distributed form based on declarative user annotations, inserting most of the necessary remoting and synchronization boilerplate code, and facilitating post-hoc instrumentation to drive quantitative redistribution. Since the tools operate on the intermediate language CIL, they are compatible with a wide variety of .NET programming languages and eventual execution environments, even those that do not support .NET CIL directly, such as JavaScript.
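
For flavor, here is a hedged sketch of the annotation style the abstract describes. Volta's published examples use a RunAtOrigin attribute to pin a class to the server tier; the attribute declaration, the OrderService class, and the placeholder logic below are my own stand-ins so the snippet compiles on its own, not the tool's exact API.

    using System;

    // Stand-in declaration: in Volta the attribute is consumed by the
    // recompiler, which rewrites the CIL; it is declared locally here
    // only so the sketch is self-contained.
    [AttributeUsage(AttributeTargets.Class)]
    class RunAtOriginAttribute : Attribute { }

    // The annotation is the only distribution decision in the source.
    // The recompiler inserts the remoting and synchronization
    // boilerplate; unannotated code stays on the client tier.
    [RunAtOrigin]
    class OrderService
    {
        public decimal Total(int orderId)
        {
            return orderId * 9.99m; // placeholder for server-side logic
        }
    }

    class Client
    {
        static void Main()
        {
            // Written as an ordinary local call. If profiling later
            // shows the split is wrong, moving the attribute
            // redistributes the application without restructuring
            // this call site.
            var service = new OrderService();
            Console.WriteLine(service.Total(3));
        }
    }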

Enjoy!

Web 2.0 and Generativity

Earlier today on the Microsoft Campus in Redmond I attended a talk by Jonathan Zittrain. It was a timely talk: about a month ago I finished reading his book The Future of the Internet (and How to Stop It), first mentioned in my previous blog post Services Without Borders. Today's talk, focused on Civic Technologies, was interesting and engaging (e.g., referring to email as "the shared hallucination application" and bringing up Sturgeon's law). He covered some of the topics discussed in the book, including generativity in the context of Web 2.0.

The book identifies Web 2.0 as one of the trends limiting generativity. I didn't agree with this perspective while reading the book, and when it came up today, what better way to revisit it than by talking to the author? As Jonathan graciously signed a sticker that's now glued to my copy of the book, I asked for a few examples of how Web 2.0 hampers generativity. He offered a couple, explaining how forcing developers to use a unique key in their programs does just that.

Jonathan makes a good point: many of the sites that provide snippets to be mashed up elsewhere do require unique developer keys. However, with the Web 2.0 workshop I recently organized still echoing in my head, I noticed that his explanation rests on an implicit assumption about the Web 2.0 platform. What is this platform?

  • If you assume that the platform resides on top of providers that require unique developer keys, the mechanism for controlling (and even disabling) generativity is indeed embedded within the Web 2.0 fabric.
  • If you assume that the Web 2.0 platform resides on top of DHTML + HTTP, then access through developer keys resides on a different, higher layer that could be bypassed and thus does not hamper generativity (see the sketch below).
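
To make the layering point concrete, here is a minimal sketch. The provider, both URLs, and the api_key parameter are hypothetical, invented for the example; the only claim is structural: the key is a parameter at the provider layer, while plain HTTP access sits a layer below it.

    using System;
    using System.Net;

    class Layers
    {
        static void Main()
        {
            var web = new WebClient();

            // Provider layer: access gated by a revocable developer key.
            // This is where the control over generativity lives.
            // (Hypothetical URL and parameter, for illustration only.)
            string viaApi = web.DownloadString(
                "http://api.example.org/photos?api_key=HYPOTHETICAL_KEY");

            // Substrate layer: plain HTTP + DHTML, the same layer a
            // page-scraping mashup would use. No key in sight.
            string viaWeb = web.DownloadString("http://example.org/photos");

            Console.WriteLine("{0} vs {1} bytes", viaApi.Length, viaWeb.Length);
        }
    }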

Now I understand what frame of reference Jonathan used when he pointed his finger at Web 2.0, and thus the reason for my disagreement.

Viral Spread and Scalability

Several threads that we put on the table at last week's workshop in Zurich did not get enough traction for us to tackle them with the workshop's participants. However, that doesn't mean they're not worthy of pattern mining; on the contrary.

Consider, for example, scalability and viral spread. There are well-known techniques and patterns for designing high-capacity Internet-based systems.

However, in a Web 2.0 world, particularly when there is potential for network effects, a site may become popular very quickly. How do you design a system that may need to scale to 1 million users in less than 2 months? I offered iLike as an example, but that was as concrete as we got at the workshop.

[Photo: crowd at Euro 2008]

Back in Redmond my colleague Greg Linden pointed me to a Q&A with Ali Partovi of iLike.com. While the conversation doesn’t go into the technical details, the numbers speak for themselves:

In terms of daily signups, iLike on Facebook trounces anything else we do… iLike on Facebook has been signing up roughly 200,000 new members a day.

Viral spread made possible by social networking adds an interesting twist to scalability. I hope that an increasing number of designers will start talking about their solutions. If you're interested in sharing data points or articulating them as Web 2.0 patterns, head over to our pattern Wiki. As the iPhone 3G launch demonstrated just a few days ago, miscalculations have the potential to inflict major frustration on customers, as well as make the headlines.

Services Without Borders

It’s been a few years since my last vacation overseas. Since then I’ve acquired several e-dependencies on services such as Pandora (see my older post on feature extraction) and Hulu, a service I learned about from my colleague Adam Sheppard (you may have read about Adam on Live Labs’ web site). I discovered that these services don’t work from outside the US:

[Screenshots: Pandora's and Hulu's access-denied messages]

This is surprising because in both instances the providers know my permanent location from the ZIP code I provided when setting up the accounts.

Luckily, Pandora and Hulu are not my only options. I am also a Sirius Satellite Radio subscriber, and unlike the previous accounts, that one is a paid subscription. Their service did not complain about my accessing it from outside the US. While this finding hasn't sunk in completely, it resonates with what I'm reading in Jonathan Zittrain's The Future of the Internet (and How to Stop It) about reducing generativity.

Web 2.0: The Next Generation

In his 2003 OOPSLA keynote The Internet Paradigm Shift Tim O’Reilly summarized some of the common traits of the successful applications of the Internet era. They included software built for use in delivering services, dynamic data and languages, architecture of participation, low barriers to experimentation, interoperability, and a few others. Here are a couple of snapshots from his keynote (October 30, 2003).

[Slides 13 and 14 from the OOPSLA 2003 keynote]

A few years later he expanded on and explained the traits as design patterns. He then used the design patterns extracted from web applications such as eBay, Craigslist, Wikipedia, del.icio.us and a few others to define Web 2.0. While today there’s no general consensus on what Web 2.0 really is, many new systems exhibiting Web 2.0 traits have emerged since Tim’s paper–Pownce, SlideShare, friendfeed, reddit, and so forth.

Now why am I telling you this? If you've built Web 2.0 applications, then you too could leave your fingerprints on the next generation of Web 2.0 design patterns. We aim to extract new patterns from this post-eBay/Craigslist/Wikipedia crop at the Web 2.0 Pattern Mining Workshop at the TOOLS Europe conference. During the two-day workshop (June 30-July 1), Web 2.0 and pattern experts will crack open several Web 2.0-ish systems, identify the recurring problems and common solutions, and extract new patterns.

Workshop participation is open to anybody who can contribute. If you're interested, check out the Call for Participation and send your proposal by the May 5 deadline. Feel free to contact me with questions or requests for clarification.

Update:

With the workshop a few days away, here are some updates:

  • Details about the pre-workshop preparation work are available from the workshop’s web site.
  • As part of their pre-workshop preparations, participants are posting selected Web 2.0 sites and questions to the social networking site. (As workshop participation is open to anybody attending TOOLS, you may want to track these posts if you’re planning on joining us.)
  • The TOOLS Europe Social Networking Site provides opportunities to bootstrap networking at the conference, as well as offers a glimpse into one of the topics we’ll look into at the workshop.

Understanding slashdot

I’ve been a slashdot reader since the end of 1997, when I discovered it over the dial-up connection I had at the University of Illinois. While back then I visited /. almost daily, nowadays my visits are much less frequent. During this time the slashdot community expanded and changed (if nothing else we’re all 10 years older). Consequently I no longer have a good grip on how objective and well-researched the typical slashdot post is.

This changed last night, when the slashdot story Microsoft Developing News Sorting Based On Political Bias covered one of the projects I'm involved with (namely, Blews). The coverage provided some interesting insight into /.

First, in spite of the "news for nerds" tag line, slashdot stories are not necessarily new. Over a week before the /. story ran, Matt Hurst blogged about the mainstream media picking up Blews in their TechFest coverage; I had a similar post. So if you're looking for fresh nerdy news, you'd be better off going elsewhere.

Second, the /. comments cover a wide spectrum: some are objective, others are amusing, and others make me wonder whether a sequel to Mel Gibson's 1997 Conspiracy Theory is in the works. Yet they are far from evenly distributed; on the contrary. So if you're after a reasonable S/N, you'd also be better off seeking that elsewhere. (BTW, if Blews resonates with you, consider attending ICWSM 2008; several folks from the Blews team, as well as myself, will be there.)

So, with old news and a poor S/N, what are those coming to /. after?

With Miguel de Icaza on Open Source, Mono, and Moonlight

A few weeks ago I attended the Lang.NET Symposium. Charles Torre asked me to participate in a conversation with Miguel de Icaza, who was among the attendees. (While nowadays most people associate Miguel with Mono, our paths crossed, virtually, many years ago, when Tudor Hulubei and Andrei Pitis were working on GIT.) With Charles as our host, we talked about open source, Mono, Moonlight, and various other bits. Our session is now available as a Channel 9 video. (Note: cross-posted from my work blog.)