If the open source model has a sweet spot, it’s in programming tools. Linus Torvalds’s fabled “world domination” on the desktops of clerks or CEOs may never arrive, but it’s already here on the computers of programmers everywhere. Even in the deepest corners of proprietary stacks, open source tools can be found, often dominating.
The reason is clear: Open source licenses are designed to allow users to revise, fix, and extend their code. The barber or cop may not be familiar enough with code to contribute, but programmers sure know how to fiddle with their tools.
The result is a fertile ecology of ideas and source code, fed by the enthusiasm of application developers who know how to “scratch an itch.” Programmers are a knowledgeable and opinionated bunch; open source lets them share their knowledge and implement what they want.
Here is a very unscientific survey of worthwhile open source tools that have caught our eye. Some are entirely new projects; others are old favorites that continue to generate new ways to surprise us as they morph to support the latest programming trends.
This is the beauty of open source. Tweak and recompile, and your old programming tool can be new again.
Rhomobile Rhodes is an open source platform for bundling up Ruby websites and stuffing them into an iPhone app. You can even use jQuery Mobile to handle the layout if you wish. It’s like building a Web app, but you have to remember that the user has big fat fingers instead of a much more precise mouse pointer.
Git makes practically every copy its own central repository and offers sophisticated tools for merging the resulting proliferation of repositories. With SVN or CVS, users check out just a copy, a subordinate version of the code that must eventually rejoin the center. Git users, on the other hand, create stand-alone repositories with all the rights and privileges of the center. With Git, you can create four or five repositories on your development box and eventually merge them all. To use an analogy, Git is like democracy, while CVS represents the old feudal world.
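To make the contrast concrete, here is a toy sketch of the idea that every clone carries the full history and can later be merged with any other. It is not how Git stores or merges data; the class name ToyRepository and its methods are invented for illustration, and real Git tracks a graph of commits rather than a flat list.

```python
class ToyRepository:
    """Hypothetical stand-in for a repository: just a name and a list of commits."""

    def __init__(self, name, history=None):
        self.name = name
        # Copy the history so each repository owns its own record.
        self.history = list(history or [])

    def commit(self, message):
        # Record a change locally; no central server is involved.
        self.history.append(f"{self.name}: {message}")

    def clone(self, name):
        # A clone receives the complete history, not a subordinate checkout.
        return ToyRepository(name, self.history)

    def merge(self, other):
        # Bring in any commits from the other repository that are missing here.
        for entry in other.history:
            if entry not in self.history:
                self.history.append(entry)


origin = ToyRepository("origin")
origin.commit("initial import")

laptop = origin.clone("laptop")
desktop = origin.clone("desktop")
laptop.commit("add feature")
desktop.commit("fix bug")

# Any repository can absorb another; neither one is "the center".
laptop.merge(desktop)
```

After the merge, laptop holds all three commits while origin still holds only the first, which is the sense in which each copy stands alone until someone chooses to merge.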
Of course, not everyone welcomes the flexibility Git provides. Some see this freedom as enabling confusion. Proponents counter that you’re not required to use all of Git’s power, but it’s there to help out when the project requires more than a central government. Some developers have created Repo to combat the complexity of Git. A tool for pushing changes through multiple repositories, Repo is, in a way, the re-emergence of central control for the Git ecosystem.
Meant to work closely with Git and Repo, Gerrit allows code validators to send comments to the central Git repository, creating an extensive meta layer of discussion on top of the code itself. In the old days, discussions took place in header comments, but by separating comments to a dedicated layer, Gerrit allows for a more sophisticated discussion that doesn’t force future readers to wade through old change discussions before getting to the code.
Hadoop is a general tool kit for splitting work into pieces that can be computed on separate servers, then joined together into a final product. Google pioneered the idea when it needed to choreograph a vast army of servers to crawl the Web, and now Hadoop offers a general framework that’s being used again and again in similar situations.
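The split-compute-join pattern can be sketched in a few lines of plain Python. This is only an in-process illustration of the idea, not Hadoop's API: the function names are invented here, and where the sketch loops over chunks, a real cluster would hand each chunk to a separate server.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    # Emit a (word, 1) pair for every word in this chunk of lines.
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(pairs):
    # Group the emitted values by key so each key can be reduced on its own.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    # Join the partial results into the final product: one total per word.
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines, n_chunks=2):
    # Split the input into roughly equal pieces, one per (imaginary) server.
    chunks = [lines[i::n_chunks] for i in range(n_chunks)]
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    return reduce_phase(shuffle(mapped))


counts = word_count(["GET /index GET", "POST /index"], n_chunks=2)
```

Because each chunk is mapped independently, adding servers only changes how the chunks are distributed; the grouping and summing steps stay the same.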
Hadoop’s original simple core may be several years old now, but there’s a great deal of interest in spinoffs that bundle Hadoop with code for tackling specific problems. Mahout, for one, is a scalable machine-learning framework that analyzes large data sets for patterns that might emerge. Hive offers a data warehouse that can be queried with parallel search using HiveQL. This method is fast becoming a popular approach for dealing with massive quantities of Web logs.
These plug-ins are usually pretty easy to string together and glue into a coherent display. There are even some bigger collections of plug-ins that harmonize the widgets. jQuery Mobile, for instance, is dedicated to producing applications that run well on the small screens of smartphones.
While it’s probably not fair to call emacs “new” or “rising,” the platform isn’t dropping off anyone’s radar. GitHub ranks “emacs lisp” as the 13th most popular language based on projects and interest. By comparison, C# is 12th. Most of the code is built by programmers and for programmers only. One project, Rinari, for instance, turns emacs into a Ruby IDE. Another, MozRepl, allows Mozilla users to monkey around with the guts of Firefox using emacs.
Almost as important as the plug-ins are the sophisticated ecologies that support them, many of which are open source. The Eclipse Marketplace is one such site devoted to helping users discover the tools they need. The site includes a social networking layer, showing who likes a particular plug-in and which plug-ins offer similar or competing solutions, thereby opening your search beyond simple lists of the most popular or the most downloaded.
The Firebug ecology is so fertile that it has spawned a subcategory of plug-ins that extend Firebug itself, often in surprising ways. FirePython, for example, doesn’t actually live on the browser; it gets inserted into the server, where it delivers debugging information to the browser.
Thanks in some measure to Firebug’s popularity among developers, all the major browsers now offer detailed information about the images, scraps of code, and whatnot that make up the page on view — an approach that will only become more common as more software is written to take advantage of increasingly robust browsers.
CoffeeScript seems like a precompiler for JavaScript, but it’s really a full compiler built like all compilers. The creator said, “Underneath all of those embarrassing braces and semicolons, JavaScript has always had a gorgeous object model at its heart. CoffeeScript is an attempt to expose the good parts of JavaScript in a simple way.” In essence, CoffeeScript makes writing JavaScript more like writing Python, because indentation does the work that the curly brackets and a few of the other punctuation marks used to.
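For readers who haven't seen the style the comparison points to, here is what indentation-delimited code looks like in Python itself. This is Python, not CoffeeScript, and the function is an invented example; the point is only that the loop body and both branches are marked by indentation, with no braces or semicolons.

```python
def describe(numbers):
    # Each level of nesting is expressed by indentation alone.
    labels = []
    for n in numbers:
        if n % 2 == 0:
            labels.append(f"{n} is even")
        else:
            labels.append(f"{n} is odd")
    return labels


labels = describe([1, 2, 3])
```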
The wealth of rising open source projects in this area indicates that programmers still haven’t found the optimal mix of features. Cruise Control is the original open source build tool, and it is well integrated with most repositories and bug databases. Apache’s Continuum is highly integrated with Maven, and users of Continuum like to say that all you need to do is “point the pom.xml file at the repository.” Another popular project once known only as Hudson is more open to using build scripts written for Ant or a few others. In late 2010, the team broke in two and the group dominated by Oracle’s paid developers kept the name “Hudson,” while the others are creating a new open source build management tool called Jenkins.
Many users stress that constantly building the software and often deploying it almost immediately afterward increases the harmony of the team and prevents programmers from drifting down different paths that require too much time to harmonize. By continuously rebuilding the software and applying unit tests, the team is more likely to converge.
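The rebuild-and-test loop these tools automate boils down to running a fixed series of steps on every change and stopping at the first failure. The sketch below shows that control flow only; it is not the interface of Cruise Control, Continuum, Hudson, or Jenkins, and the step names and functions are made up for the example.

```python
def run_pipeline(steps):
    # Run (name, callable) steps in order and stop at the first failure,
    # so a broken build never reaches the later stages.
    results = []
    for name, step in steps:
        try:
            step()
        except Exception as exc:
            results.append((name, f"failed: {exc}"))
            break
        results.append((name, "ok"))
    return results


def build():
    # Placeholder for compiling or packaging the project.
    pass


def unit_tests():
    # Placeholder for a test run that finds a problem.
    raise RuntimeError("2 tests failed")


def deploy():
    # Placeholder for deployment; never reached when the tests fail.
    pass


report = run_pipeline([("build", build), ("unit tests", unit_tests), ("deploy", deploy)])
```

Here the report records a successful build and a failed test run, and the deploy step is skipped, which is the early warning that keeps a team from drifting apart.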
The OpenVidia repository is filled with projects that perform image recognition, searching, and more. It makes a perfect excuse for every programmer to ask their boss for an expensive graphics card with the potential to generate a very high frame rate — er, I mean a very high rate of curing cancer in simulations.
The latest tools make it easier to deploy NoSQL into clouds, many of which are now sold directly to the IT department. Amazon’s SimpleDB can be paid for by the byte, and many other teams are offering additional NoSQL tools as services. Cassandra, for example, is supported by DataStax. MongoDB has inspired more than a handful of cloud hosts. The tools continue to proliferate, and there are now almost too many to list. Thank goodness someone is maintaining a list of all the NoSQL databases.
Drupal websites, for instance, often blend traditional modules with additional code inserted to make decisions about data selection and formatting. Although much of this occurs on the back end, Drupal can be configured to allow users to include PHP code in particular data fields. As a result, programmers aren’t pushing compile and run any longer; instead, they’re updating a bit of running code on the fly. They’re usually smart enough to do this on a test version, but sometimes they even update hot, running code because it’s not that hard. What could possibly go wrong?
This is the ultimate end of open source, where anything can be altered on the fly.