Monthly Archives: July 2006

Feature Extraction Revisited

A visit to prompted me to revisit the topic of feature extraction. Tim Westergren’s Music Genome Project is probably one of the coolest ways of exploring feature extraction and relevance feedback:

  • The feature extraction part extracts the “phenotypes” from a piece of music you like and uses those features to find similar tunes.
  • The relevance feedback part uses your input (thumbs up/down) to refine the search.
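To make the two steps concrete, here is a minimal sketch in Python. It is not how the Music Genome Project works internally (I don't know its internals). It assumes each tune has already been reduced to a small feature vector, ranks tunes by cosine similarity, and applies a Rocchio-style update for the thumbs up/down part. The catalog, the tune names, the three features, and the weights `alpha`, `beta`, and `gamma` are all made up for illustration.

```python
import math

# Hypothetical feature vectors: (tempo, organ presence, swing), each in [0, 1].
CATALOG = {
    "tune_a": (0.9, 0.8, 0.7),
    "tune_b": (0.2, 0.1, 0.3),
    "tune_c": (0.8, 0.9, 0.6),
    "tune_d": (0.1, 0.9, 0.1),
}


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank(query, catalog, exclude=()):
    """Return tune names ordered from most to least similar to the query."""
    candidates = [name for name in catalog if name not in exclude]
    return sorted(candidates, key=lambda n: cosine(query, catalog[n]), reverse=True)


def centroid(vectors, size):
    """Component-wise mean of the vectors, or a zero vector if there are none."""
    if not vectors:
        return (0.0,) * size
    return tuple(sum(column) / len(vectors) for column in zip(*vectors))


def feedback(query, liked, disliked, alpha=1.0, beta=0.75, gamma=0.25):
    """Rocchio-style update: pull the query toward liked tunes, away from disliked ones."""
    pos = centroid(liked, len(query))
    neg = centroid(disliked, len(query))
    return tuple(
        max(0.0, alpha * q + beta * p - gamma * n)
        for q, p, n in zip(query, pos, neg)
    )


# Start from a seed tune, then apply one round of thumbs up/down.
seed = CATALOG["tune_a"]
before = rank(seed, CATALOG, exclude=("tune_a",))
refined = feedback(seed, liked=[CATALOG["tune_d"]], disliked=[CATALOG["tune_b"]])
after = rank(refined, CATALOG, exclude=("tune_a",))
```

With this toy data, a thumbs up on `tune_d` and a thumbs down on `tune_b` is enough to swap their positions in the ranking, which is the "course correction" effect in miniature.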

So starting from “Jimmy Smith”, and after a few course corrections, the suggestions (e.g., Lou Donaldson’s Funky Mama) started to sound like what I was after. It’s great to see feature extraction and relevance feedback demonstrated in such an intuitive way. It’s also great to see that the Music Genome Project got it right. Others still have trouble applying these technologies well. For example, Amazon’s recommendations insist on drawing on items that I bought, but not for myself. I bet they’d get more mileage (read: sales) if their recommendation algorithms discriminated between an item’s intended recipient and the person buying it. Amazon, are you listening?

Read this book: Linked

Albert-László Barabási’s book Linked: The New Science of Networks is probably one of the books that stand out from the ones I read in 2004. The author does a marvelous job of pointing out that many networks we know of (including social networks such as St. Paul’s) follow a power law. While reading about DoS attacks, six degrees of separation, Pareto’s law, Google, the Faloutsos brothers, and other interesting stories, keep in mind that this work comes from a group of statistical physicists rather than computer scientists (though the references to Bose-Einstein condensation would give that away). Good to see that they’re at it again!

Distributed Systems are Hard

Working with distributed systems is hard. Many programmers who otherwise do a very good job writing application code do not fully grasp the challenges of distributed computing. One of these challenges stems from having to deal with errors that you’d not encounter when writing application code. For example, Apple’s weather dashboard widget provides a quick view of the 7-day forecast. The forecast data comes from some system(s) across the Internet. In other words, the widget is in fact a small distributed application. However, sometimes pulling the forecast data catches it with its pants down, and you’ll end up seeing seven consecutive “Not a Number” values.
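As a rough illustration of the kind of defensive code this calls for, here is a minimal Python sketch. It has nothing to do with how Apple's widget is actually written. The `fetch` callable, the list-of-seven payload shape, and the `flaky_fetch` stub are assumptions made up for the example. The idea is simply to treat the remote side as something that can fail or return junk, and to show "no data" instead of letting a NaN reach the screen.

```python
import math


def parse_temperature(raw):
    """Return a finite float, or None if the remote value is missing or unusable."""
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return None
    return value if math.isfinite(value) else None


def load_forecast(fetch, days=7):
    """Call fetch() (a hypothetical remote call) and validate what comes back."""
    try:
        payload = fetch()
    except OSError:
        # Covers connection failures and timeouts (TimeoutError is an OSError).
        return None
    if not isinstance(payload, list) or len(payload) != days:
        return None
    return [parse_temperature(item) for item in payload]


def render(forecast):
    """Turn a validated forecast into display text without ever printing NaN."""
    if forecast is None:
        return "Forecast unavailable"
    return " ".join("--" if t is None else f"{t:.0f}" for t in forecast)


def flaky_fetch():
    """Stand-in for a remote call that fails; simulates an outage."""
    raise TimeoutError("simulated outage")
```

Calling `render(load_forecast(flaky_fetch))` yields a plain "Forecast unavailable" message, and a payload with a few bad entries renders placeholders only for those days.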