Working with distributed systems is hard. Many programmers who otherwise do a very good job writing application code do not fully grasp the challenges of distributed computing. One of these challenges stems from having to deal with errors that you’d not encounter when writing application code. For example, Apple’s weather dashboard widget provides a quick view of the 7-day forecast. The forecast data comes from some system(s) across the Internet. In other words, the widget is in fact a small distributed application. However, sometimes the widget gets caught with its pants down while pulling the forecast data, and you’ll end up seeing 7 consecutive Not a Number: Continue reading →
Distributed Systems are Hard
July 9th, 2006 — Uncategorized
CIO Got It Right…
June 15th, 2006 — Uncategorized
My co-author Boris Lublinsky sent a link to an article titled The truth about SOA. It reminded me of several evaluation projects where I recommended against continuing down the SOA path because those involved didn’t need it (or were not ready for it). Glad to see some of my rationale and recommendations reinforced by others!
Sound Engineering and Premature Extrapolation
May 30th, 2006 — Uncategorized
In a recent post Patrick Logan has a few pointers and quotes about Premature Extrapolation. Henry Petroski’s Design Paradigms: Case Histories of Error and Judgment in Engineering provides a similar perspective, albeit from an angle that has nothing to do with computers.
One of the key messages that Petroski’s book tries to get across revolves around what Patrick calls premature extrapolation. Paraphrasing Petroski, mediocre engineers scale up designs that worked in the past, hoping that they will also work at a larger scale. Good engineers don’t blindly scale; they start from the other end, analyzing potential failures and then designing to prevent them. One of the many examples he uses to illustrate this point contrasts John A. Roebling’s Brooklyn Bridge (which belongs in the latter category) with Clark Eldridge’s Tacoma Narrows Bridge (which belongs in the former).
I’ve been a fan of Alan Kay’s “Good ideas do not always scale” for many years. I’ve even used it in Chapter 18 of PLoPD4. After reading Petroski’s book I thought that any computer scientist going through it ought to be able to come up with that line.
Silicon Valley’s Secret Sauce
May 27th, 2006 — Uncategorized
In his latest essay Paul Graham dissects what makes Silicon Valley “the” Silicon Valley. As someone who lived and worked in two (self-proclaimed) Silicon Valley-like technology parks (the Silicon Alps and the Silicon Prairie), I found Graham’s discussion of the key ingredients interesting. (BTW, Wired sheds additional light on the Silicon envy.) While Grenoble has great schools and location (with ski slopes a 45-minute bus ride from the campus), and Chambana great schools and cornfields (with tall, thick corn next to the movie theater’s parking lot), neither achieved the Silicon Valley critical mass while I lived there. I also have the benefit of having read an excellent book on the topic, and am waiting for a second one to be published:
- The Man Behind the Microchip: Robert Noyce and the Invention of Silicon Valley, a great book in at least 3 ways: the story of Robert Noyce; the history of semiconductors (from Ge transistors to Intel’s 4004); and the history of the Valley.
- Broken Genius: The Rise and Fall of William Shockley, Creator of the Electronic Age is not out yet, but I’ve added it to my wishlist after hearing an NPR interview with the author.
Pattern Languages of Program Design
May 13th, 2006 — work
After a long gestation the fifth volume of Pattern Languages of Program Design (PLoPD) has been published.
Markus, James, and I selected among the patterns workshopped at PLoP conferences from 1998 through 2004. We structured the book in six parts. Part I focuses on design and contains patterns aimed at people designing object systems. As the Internet and embedded systems continue to expand their reach, they bring with them concurrency and resource management problems; Part II contains patterns on these topics. Part III continues the shift from one to many applications and contains patterns for distributed systems. The domain-specific patterns from Part IV focus on mobile telephony and Web-based applications. Part V shifts gears to architecture and comprises patterns that tackle composition, extensibility, and reuse. Finally, Part VI offers a smorgasbord of meta-patterns for improving the quality of pattern papers and helping their authors.
Here’s what you can find in each chapter:
- The Dynamic Object Model pattern combines ideas from class-based (like Java) and prototype-based (like Self) languages to address elaborate flexibility requirements.
- The Domain Object Manager allows application code to handle transient and persistent domain objects while supporting multiple data stores or application servers, keeping the domain objects independent of the persistence or middleware APIs.
- Encapsulate Context provides value to developers who are seeking ways to lower the ripple effects of code changes regardless of programming language, allowing them to manage an increasing number of call parameters without introducing global variables.
- A Pattern Language for Efficient, Predictable, Scalable, and Flexible Dispatching Components addresses the challenges associated with developing dispatching components, providing one of the building blocks of a handbook for distributed real-time and embedded middleware.
- Writing software for real-time processing is hard and poses unique challenges. Triple-T covers five patterns harvested from time-triggered bus architectures for safety-critical real-time systems, such as the ones employed by Airbus airplanes or BMW and DaimlerChrysler automobiles.
- Real Time and Resource Overload Language presents a pattern language for designing reactive systems that gracefully accommodate load bursts, applicable to any system that processes incoming requests, such as web servers, middleware, OLTP, and so on.
- Drawing upon examples from several systems that deal with distribution, De-Centralized Locking covers a pattern for managing locks in the context of distributed systems.
- The Comparand Pattern focuses on dealing with identity when working with objects from different hosts or processes within a distributed system.
- Service Discovery tackles a common problem in practical ways, distilling solutions from well-established examples such as SLP, JXTA, UPnP, LDAP, DNS, and the now ubiquitous IEEE 802.11.
- MoRaR is a pattern language focused on mobility and radio resource management.
- Content Conversion and Generation on the Web is a pattern language aimed at people building applications that deal with dynamic HTML generation.
- Plug-ins represent a popular technique for extending applications, allowing users to late-bind new functionality. The Patterns for Plug-Ins pattern language covers techniques mined from a wide body of software, including operating systems, web browsers, graphics programs, and development environments.
- The Grid Middleware Architectural Pattern covers the architectural elements of grid middleware as well as guidelines for implementing and deploying them.
- Targeting application developers integrating components written in different languages or built with heterogeneous component concepts, the Patterns of Component and Language Integration distill insight from systems that include Apache Axis and the Simplified Wrapper Interface Generator (SWIG).
- Patterns for Successful Framework Development covers a set of patterns for mitigating the mismatch between the recommended practice of building frameworks and the reality of many software development projects.
- Distilling from 10 years of writing and reviewing patterns, Advanced Pattern Writing provides advice for improving pattern writing.
- A Language Designer’s Pattern Language explains how to generate a pattern language from a system of forces.
- The Language of Shepherding focuses on analyzing and providing feedback about patterns.
- Patterns of the Prairie Houses uses the design themes of Frank Lloyd Wright’s prairie houses as an exploratory vehicle for showing how people should approach pattern mining and writing.
Feature Extraction: From Web Searches to Genomics
January 11th, 2006 — Uncategorized
On Monday 1/9 I heard on NPR’s Motley Fool Show that Google has started working with J. Craig Venter on a Personal Genome project (more about Craig and Celera Genomics in The Gene Wars).
Where’s the connection, and what does Web searching have in common with Genomics? They both employ Feature Extraction.
Feature Extraction, a technique from the field of Information Retrieval, provides the bedrock of Web searching. Feature extraction maps a query from the original search space into a feature space. This mapping ensures that:
- The feature space is much smaller than the search space.
- The search operation in the feature space is implemented in an efficient manner (i.e., fast response times).
By “compressing” the space and simplifying the search Feature Extraction reduces the search time.
In the context of Web searching, a search engine first indexes Web documents, mapping each document into a point in the k-dimensional keyword (or feature) space (k is the number of keywords). Typically this automatic indexing first removes common words like “and,” “at,” “the.” Then it reduces the remaining words to their normalized form; for example, both “computer” and “computation” would be reduced to “comput.” Next, a dictionary of synonyms helps to assign each normalized form to a “concept class.” Finally, each document is represented as a vector in keyword space.
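The indexing steps above can be sketched in a few lines of Python. This is only a toy illustration: the stop-word list, the suffix-stripping rules (a crude stand-in for a real stemmer), the synonym dictionary, and the function names are all assumptions made up for the example, not what a production search engine uses.

```python
import re
from collections import Counter

# Toy assumptions for illustration only.
STOP_WORDS = {"and", "at", "the", "a", "of", "in", "is", "to"}
SUFFIXES = ("ation", "ing", "er", "s")   # crude stand-in for a real stemmer
SYNONYMS = {"machine": "comput"}         # normalized form -> concept class


def normalize(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def concepts(text):
    """Map raw text to a list of concept classes."""
    words = re.findall(r"[a-z]+", text.lower())
    kept = [w for w in words if w not in STOP_WORDS]   # remove common words
    stems = [normalize(w) for w in kept]               # normalized forms
    return [SYNONYMS.get(s, s) for s in stems]         # concept classes


def to_vector(text, keywords):
    """Represent a document as a vector of counts in keyword space."""
    counts = Counter(concepts(text))
    return [counts[k] for k in keywords]


if __name__ == "__main__":
    keywords = ["comput", "network"]   # k = 2 in this toy keyword space
    print(to_vector("The computer at the machines", keywords))
```

With the two-keyword space above, both “computer” and “machines” end up in the same concept class, so the document collapses to a single non-zero coordinate.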
Once indexing completes, the search engine is ready to answer queries. To do so the engine maps the query into the keyword space, and then uses a similarity measure to find the relevant documents. Good similarity measures take little time to evaluate.
Without Feature Extraction, searching a collection of Web documents requires many string matching operations in the search space. In keyword space, though, documents correspond to multi-dimensional vectors, and using something like the cosine function (see my paper for details) is much faster than string matching.
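To make the cosine measure concrete, here is a minimal Python sketch. It assumes that the query and the documents have already been mapped to equal-length vectors in keyword space; the function names and the sample vectors are hypothetical, and a real engine would use sparse vectors and an index rather than a linear scan.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0   # an all-zero vector shares no keywords with anything
    return dot / (norm_u * norm_v)


def rank(query_vector, document_vectors):
    """Return document ids ordered from most to least similar to the query."""
    scored = [
        (cosine_similarity(query_vector, vec), doc_id)
        for doc_id, vec in document_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored]


if __name__ == "__main__":
    # Hypothetical documents in a 2-dimensional keyword space.
    docs = {"a": [0, 1], "b": [2, 0], "c": [1, 1]}
    print(rank([1, 0], docs))
```

Each comparison costs a handful of multiplications and additions per keyword, which is why it beats scanning the original text for matching strings.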
Web searching is just one of the many areas that employ Feature Extraction. Virtually any domain that deals with large volumes of data and where queries don’t return exact answers can use this technique. This includes time-series databases such as DNA data, which explains the similarity (pun intended) between Web searching and Genomics.
John Vlissides
December 4th, 2005 — Uncategorized
John Vlissides, one of the Gang of Four, passed away at the end of November. John had made significant contributions to the software patterns community as an author, series editor, speaker, and so on.
I first met John at OOPSLA 1998 in Denver. I already knew a few things about his involvement in the GoF book, but nothing compares to meeting someone in person. During one of the OOPSLA receptions we talked for about an hour. The conversation started with my telling John about my research on lightweight workflow architectures at the University of Illinois. We covered much more though, well beyond patterns. I remember John’s advice about the pursuit of an academic career, and his take on the childcare system in the US.
Following 1998 I ran into John at OOPSLA conferences, and sometimes at the annual post-OOPSLA Hillside event. In 2003 in Anaheim, CA, John asked me whether I’d be interested in editing the PLoPD5 volume in his Patterns Series. I don’t remember his exact words, but he said something along the lines of his being impressed with my completing and defending a dissertation in time.
I last saw John at OOPSLA 2004 in Vancouver, BC. Although he was very busy running the conference we had a good chat.
Besides his family John will also be missed by his patterns friends.
[Photo: Listening to Hillside fellows in Tampa, FL (2001)]
[Photo: Drumming at the Experience Music Project in Seattle, WA (2002)]
Google Era Interviewing
November 29th, 2005 — Uncategorized
At the end of the 1990s companies like monster.com reshaped the hiring process–at least the first half of it. They changed the way people found job opportunities. They also changed the way companies found the people they were looking for.
2005 marks 10 years since Amazon.com, Google, Yahoo!, and eBay appeared on the Internet. During these 10 years they expanded tremendously; so did their databases. The data they collected captures a great deal of the Internet’s history. It also captures a great deal of information about the people who may seek employment opportunities with them. Without your realising it, the people interviewing you may know details about you that you’d have a hard time remembering. These details could reveal a lot more about you than your cover letter and your carefully crafted resume.
While browsing a large web retailer’s web site I had an opportunity to review my order history. I was curious to see what it looked like and went ahead to explore it. I was reminded that:
- In 1997 I bought Concurrent Programming in Java: Design Principles and Patterns
- In 1998 I bought A Polite and Commercial People: England 1727-1783
- In 1999 I bought Tis: A Memoir
- In 2000 I bought Database Nation: The Death of Privacy in the 21st Century
- In 2001 I bought The Deadline: A Novel About Project Management
- In 2002 I bought the Nikon WC-E63 Wide-Angle Converter Lens
- In 2003 I bought Documenting Software Architectures: Views and Beyond
- In 2004 I bought Loosely Coupled: The Missing Pieces of Web Services, and
- In 2005 I bought A Mathematician’s Apology
I remember very well about some of these things: I see
Would a potential employer be interested in this information? Should I apply for a job with this retailer, will they fish it out of their database and put it next to my resume? What does the list tell them about me?
The above questions apply to all of the above web sites, as well as many others: Google has the history of most things I googled for; Yahoo! has the history of many news articles I read; eBay has the records for many of the items I bought or sold at auction. This information could provide valuable business intelligence when it comes to interviewing a person. The interviewers could infer quite a bit about me and my professional and personal interests. They could even gauge whether I would fit with their culture. In other words, interviewing no longer has to revolve around someone providing credentials and the interested parties probing around them. Instead, they could supplement the old-fashioned techniques with a bit of data mining and inference, leveraging the gigabytes of data they’re sitting on. Welcome to the Google era of interviewing!
Ultimate Pair Programming
November 22nd, 2005 — Uncategorized
Pair programming represents one of the practices of agile software development. Traditionally the two developers work on the same physical workstation. This requires that they’re at the same location.
During the last few months I paired with my colleague Grzegorz Wdowiak while we were about 1000 miles apart. How was this possible? Technology to the rescue…
We shared the computer through RealVNC. Greg had the server running on his laptop. The viewer allowed me to view his screen, type through my keyboard, and use the mouse. However, in spite of the large number of bits pushed across (particularly when rendering bitmaps like XML Spy’s splash image), this is the “low bandwidth” part of the deal.
The side of pair programming that brings the most value is human communication, or more precisely, verbal communication. Instant messaging won’t do. You have to be quick: “no, here, this is the method we should refactor.” Traditionally people far apart used the telephone to talk to each other. However, since pair programming sessions last a few hours, the telephone doesn’t provide a cost-effective solution unless someone else is paying for it. Therefore we used Skype: the quality is amazing (VoIP tends to sound better than many mobile phones), and you can’t beat the price.
In addition, most of today’s (decent) laptops have built-in speakers and a microphone, which means that you don’t need anything else.
Oh and by the way, both of us were untethered thanks to WiFi. Telecommuting got a bit closer, and it doesn’t look like the telecoms will make fortunes off it. It would be interesting if IDEs like Eclipse and Visual Studio .NET supported remote pair programming out of the box.
OOPSLA 2005
October 21st, 2005 — Uncategorized
This year marked my 6th consecutive OOPSLA. Here are some of the things that stood out:
- Michael Jones demonstrated Google Earth in the Croquet workshop. He also gave a quick tour of gigapixel photography. I found the latter very interesting.
- Robert Hass talked about creativity: “There are the deluded people who think that they are the most creative in the world; and there are the annoying people who are the most creative.”
- Jimmy Wales talked about Wikipedia and its struggles to free the culture. With only 2 employees, Wikipedia has a broader reach than NYT, LA Times, WSJ, MSNBC.com and Chicago Tribune, all combined.
- Grady Booch gave a talk on creating a handbook of software architecture. My favorite quotes: “the best way I often learn is to go into some area I don’t know about” and “there are so many ways of describing architecture yet there’s no language for it.”