Welcome to Muck and Brass, the Snowtide blog site    

Prelude

We're building a web service for which we aim to charge money. Further, the data being pushed around may be confidential or otherwise of a sensitive nature. We have good reasons to do everything we can to ensure that the service is secured "properly":

  • We don't want to have customers charged for work that is requested by a bad actor exploiting a security hole (of course, we'd issue a refund and an apology in such a case, but the impact to our business through unnecessary processing could be sizable).
  • We don't want our customers' data exposed; common vectors for this include sniffing, replay attacks, or simply the use of compromised credentials.

Of course, the impact on our relationship with our customers due to any security breach could be significant and devastating – to our business, our reputation, and potentially even to our customers' affairs completely outside of their use of our web service. So again, we have a lot of reasons to be highly-motivated when it comes to security.

By way of context, let's set the stage with regard to the moving pieces. The web service in question:

  • is built on a JVM stack (with the application itself built with Clojure, of course, using the Compojure framework)
  • has a user-facing, HTML browser interface as well as a "RESTful" API surface ("RESTful", as in, pretty darn close to ROA "style", so the set of URIs involved in delivering the user-facing interface vs. those delivering the REST API are nearly identical).
    • the user-facing interface offers standard form-based authentication, as well as OpenID authentication (which will be recommended only for more casual users and usage).
  • will always, always be delivered over SSL. We assume that every bit of data transferred is confidential, so cleartext is an absolute no-no.

OK, let's go find an expert

It is with this mindset that I've been digging into how to approach web service security. Note that I'm no specialist or expert in this area – I'm merely a practitioner who is usually focused on things far, far away from anything security-related. (It may not surprise you that I'm coming to appreciate that fact more and more as I learn about the "state of the art" in web service security.)

Given this, I set out a few weeks ago to see where things stand on the web service security front. Of course, that realm is just as full of cliques and posturing and strawmen and ad hominem attacks as the broader software development world is, so finding a clear path forward is not easy. First, a bit of literature review, as it were, drawn in particular from a flurry of web service security chatter a few years ago (emphasis here and there is mine; I wish I had noticed and grokked the indicated bits earlier, as I'll explain below):

  • I started by finding Gunnar Peterson's pair of posts where he compares "REST security" with WS-Security stuffs, where the former (especially approaches like HTTP Basic authentication over SSL) come out sounding like a pretty bad choice:

    people who say REST is simpler than SOAP with WS-Security conveniently ignore things like, oh message level security

    Now if you are at all serious about putting some security mechanisms in to your REST there are some good examples [such as Amazon's implementation of an HMAC authentication scheme].

    Some people in the REST community are able to see the need for message level security so this is heartening somewhat. If the data is distributed and the security model is point to point (at best), we have a problem.

  • Pete Lacey lays out the counterpoint, saying that SSL works just fine for tons and tons of use cases, thankyouverymuch.

    In summary, RESTful security, that is SSL and HTTP Basic/Digest, provides a stable and mature solution that addresses transport level credential passing, encryption, and integrity. It is ubiquitous, simple, and interoperable. It requires no out-of-band contract negotiation or a priori knowledge of how the resource (okay, service) is secured. It leverages your existing security infrastructure and expertise. And it addresses 99% of the use cases you are likely to encounter. SSL does not support message level security, and if that’s a requirement, then leveraging SOAP and WSS makes sense.

  • Unsurprisingly, Sam Ruby backs up Pete Lacey, but the comments on that post are interesting:
    • From Gunnar Peterson:

      I am no way suggesting there is only way to do this or that WS-Security came down on stone tablets. I am also not suggesting that a NSA level of security is appropriate for Google Maps. There are many shades of gray. “good enough” security is a big challenge, and it isnt about black and white security models, it is about risk management

    • From Bill de hÓra:

      I think this is where quantative analysis comes in and a measured assessement of the risk is taken. What has to be protected and what’s the worthwhile cost of doing so? Being software people, that’s beyond the general state of the art. We do gut feelings, flames and opinions.

  • There's a variety of "REST security 'best practices'" posts out there, but a question from StackOverflow links to a variety of additional discussions that are as good an indication as any that the accepted way of securing REST web services is Basic auth over SSL.

And now for a bit of hyperbole

Before moving on, I just want to point out that Bill de hÓra's comment above is sadly representative of so many corners of software development. Let's ponder that for a moment, while realizing that modern society and its continuation absolutely depend upon the software we build (I'm talking collectively, here).

Take a deep breath

Of course, the above is not an exhaustive survey, just the best tidbits I found over the course of a lot of browsing and searching. Here's the upshot, as I see it:

  1. WS-Security et al. ostensibly provide message-level security that ensures that your service can be passed along by untrusted intermediaries.
  2. Standard HTTP authentication (generally Basic) over SSL transport is the de facto standard for securing REST services, but it does nothing for you if message security is important.
  3. More sophisticated authentication mechanisms are available – in particular HMAC, as exemplified by Amazon's web services – which allow services to ensure that a message's author has not been impersonated. This would resolve the potential holes of .
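
To make the HMAC option concrete, here is a minimal Java sketch of signing a request with a shared secret and verifying it on the server. The class name, the canonical string layout (method, path, timestamp, and body joined by newlines), and the sample values are all assumptions made for illustration; this is not Amazon's actual signing format, and a real scheme would also need replay protection such as rejecting stale timestamps.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch of HMAC request signing; the canonical string layout
// is an assumption for illustration, not any particular service's format.
final class HmacSigningSketch {

    // Joins the parts of the request that the signature should cover.
    static String canonicalize(String method, String path, String timestamp, String body) {
        return String.join("\n", method, path, timestamp, body);
    }

    // Client side: sign the canonical request with the shared secret.
    static String sign(String secret, String canonicalRequest) {
        try {
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA1"));
            byte[] digest = mac.doFinal(canonicalRequest.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA1 unavailable or key rejected", e);
        }
    }

    // Server side: recompute the signature and compare in constant time.
    static boolean verify(String secret, String canonicalRequest, String signature) {
        byte[] expected = sign(secret, canonicalRequest).getBytes(StandardCharsets.UTF_8);
        return MessageDigest.isEqual(expected, signature.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String request = canonicalize("POST", "/jobs", "2010-02-19T14:07:00Z", "payload");
        String signature = sign("shared-secret", request);
        System.out.println(verify("shared-secret", request, signature));       // true
        System.out.println(verify("shared-secret", request + "x", signature)); // false
    }
}
```

The point of the scheme is that the secret itself never travels with the request; only a digest derived from it does, so a party without the secret cannot produce a valid signature for an altered message.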

Unfortunately, I didn't grok the whole message vs. transport security issue as quickly as I should have, where SSL provides the latter but the former would only be satisfied by something like WS-Security (again, ostensibly, I certainly can't vouch for it) or HMAC-SHA1 if one were working in a REST environment. If I had come to grips with that point of tension earlier, I would have arrived at my two conclusions much faster:

  1. In our situation, message security is simply not relevant. As Peterson wrote (and I quoted above) "If the data is distributed and the security model is point to point (at best), [REST has] a problem." Well, in our case, data is not distributed, it is transmitted point-to-point (between our customers and us, a third-party external web service), so transport security provided by SSL should be sufficient.
  2. Here's the biggie: assuming we support form-based authentication (of course, over SSL) for browser-based UI interaction, supporting anything more sophisticated than HTTP Basic authentication over SSL for our REST API interactions would be a waste of resources. We could go full-tilt and require HMAC-SHA1 for the REST API or provide only a SOAP API that used WS-Security (and whatever else goes into that), but that would mean nothing if an attacker has the "REST API" provided for browser use available to him. Given this, transport security provided by SSL, and that alone, is simply all we can do.  Put another way: when browser-level security mechanisms improve, then so will our APIs'.
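
To see why SSL is doing all of the protective work in that arrangement, it helps to look at what HTTP Basic authentication actually sends. The Java sketch below (class and method names are mine, purely illustrative) builds the Authorization header value and then decodes it again; the credentials are the well-known example pair from the HTTP authentication RFCs.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustrative sketch: HTTP Basic credentials are base64-encoded, not encrypted.
final class BasicAuthSketch {

    // Builds the value of the Authorization header for HTTP Basic authentication.
    static String headerValue(String user, String password) {
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    // Anyone who can read the header can recover the credentials.
    static String decode(String headerValue) {
        String encoded = headerValue.substring("Basic ".length());
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String header = headerValue("Aladdin", "open sesame");
        System.out.println(header);         // Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
        System.out.println(decode(header)); // Aladdin:open sesame
    }
}
```

Since the encoding is trivially reversible, the confidentiality of Basic credentials rests entirely on the transport, which is the same footing a form-based login over SSL stands on.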

An alternative path would be to host a parallel service, available via a REST API secured via HMAC-SHA1 or a WS-Security-enabled SOAP API, that did not provide any kind of browser-capable entry point. Customers could opt into this if they thought the tradeoff was important. Doing this would be technically trivial (or, perhaps only moderately difficult w.r.t. the SOAP option), but I've no idea whether the additional degree of security provided by such a parallel service would be of any interest to anyone.

By the way, if I'm totally blowing this, and my conclusions are completely broken, do speak up.

Coming soon: Part II of my investigation/thinking on the subject of web service security, related to OpenID and the management of credentials in general...which should give me all sorts of new opportunities to say foolish things!

Authored by Chas Emerick on Feb 19, 2010 02:07 PM
I find myself slipping back into web development in the new year. I've known this was coming for some time, so I've had a fair chance to carefully choose my weapons:

What has really tied this all together is Maven (and a couple of plugins for it), which has enabled me to fill in a couple of gaps in what is otherwise the most pleasant web development environment I've ever used (where Pylons was the prior champ, FWIW).

The biggest gap is in automatic application reloading/redeployment – in concrete terms, when I save a Clojure source file, my application should be reloaded nearly immediately, thereby avoiding any code-build-deploy cycle. To be precise, this capability is built into Jetty (as it is in many other Java-based app servers). The question is, how to most readily utilize it.

I came across this post by Jim Downing, which describes how to set up a Maven project for a Compojure application, enabling development-mode app reloading using the maven-jetty-plugin (the formatting on that post appears to have degraded since it was published; you can check out the project described in the post here). This certainly appears to fit the bill; unfortunately, the setup that Jim describes there doesn't quite work for me – when I save a source file, the application is automatically redeployed, but no changes are picked up.

Thankfully, the fix is easy. Below is the relevant section of my pom.xml, configuring maven-jetty-plugin to add my Clojure source root as an extra classpath element. This allows Clojure, running in the jetty application server, to find and load any Clojure source files that are newer than their AOT-compiled counterparts in the usual target/classes directory (note the webAppConfig/extraClasspath elements):

<plugin>
    <groupId>org.mortbay.jetty</groupId>
    <artifactId>maven-jetty-plugin</artifactId>
    <version>6.1.15</version>
    <configuration>
        <contextPath>/</contextPath>
        <webAppConfig>
            <extraClasspath>src/main/clojure</extraClasspath>
        </webAppConfig>
        <scanIntervalSeconds>5</scanIntervalSeconds>
        <connectors>
            <connector implementation="org.mortbay.jetty.nio.SelectChannelConnector">
                <port>8080</port>
                <maxIdleTime>60000</maxIdleTime>
            </connector>
        </connectors>
        <scanTargetPatterns>
            <scanTargetPattern>
                <directory>src/main/clojure</directory>
                <includes>
                    <include>**/*.clj</include>
                </includes>
            </scanTargetPattern>
        </scanTargetPatterns>
    </configuration>
</plugin>

With that, I'm just a mvn jetty:run away (or, really, a single click away in NetBeans) from having a development process identical to paster serve --reload, with the added benefit of Clojurey goodness.

♫The more you know...♬♪

(Apologies to those who aren't familiar with American pop culture.)

If you want to compile Clojure code (and really, if you're involved in a project of any size or importance, you should be, if only to avoid forcing Clojure to generate bytecode at runtime, which will slow down the sort of rapid development enabled by automatic app redeployment as described above), do me a favor and use clojure-maven-plugin. (The post I reference above manually invokes the Clojure compiler using ant's exec task, but that was what you had to do back in July 2009.) It's a great piece of kit, and additionally serves as a perfect gateway drug to Maven – which, despite the controversy, and my own quibbles with various aspects of it, will eventually save your bacon in any larger project.

Authored by Chas Emerick on Jan 08, 2010 08:12 AM
Of course, I'm not so daft as to say that, but:

If you use an imperative programming language that provides for mutable state, that's what you are saying.

For some background, I read this article yesterday, which contains this choice passage (emphasis mine):

Imagine you've implemented a large program in a purely functional way. All the data is properly threaded in and out of functions, and there are no truly destructive updates to speak of. Now pick the two lowest-level and most isolated functions in the entire codebase. They're used all over the place, but are never called from the same modules. Now make these dependent on each other: function A behaves differently depending on the number of times function B has been called and vice-versa.

In C, this is easy! It can be done quickly and cleanly by adding some global variables. In purely functional code, this is somewhere between a major rearchitecting of the data flow and hopeless.

A comment on proggit very concisely summed up just how crazy the above passage is:

Considering that one of the majors reasons to use FP is so that you don't have such inter-dependencies, it's odd to point that out as an issue.

The whole problem with imperative programming is that state gets threaded everywhere, and you can't look at any function individually and know how it will behave. I won't even go into problems associated with concurrency, where state becomes incredibly difficult to reason about if you allow that sort of thing.

I really appreciated the notion of imperative programming "threading state everywhere". Let's drive the point home, though.

Hey, I'm just the messenger

Consider a method you might see in any Java application (I oh-so-love the jvm, so I get to pick on Java), but the same sort of thing applies in C, C++, C#, python, ruby, perl, et al.:

public void doSomething (String arg1, int arg2, FooBar arg3) throws IOException;

Simple enough, right? Hey, we're programming, life is good. But, what if you saw a signature like this:

public void doSomething (String arg1, int arg2, FooBar arg3, .....,
                         String arg316) throws IOException;

316 arguments to a method (which I don't think is actually possible in the jvm, but bear with me)? "That's absurd!", you'd say. The problem, of course, is that the 3-arg doSomething actually has far more arguments than its signature implies:

The behaviour of every function in a mutable, imperative environment is dependent upon the state of all of the other (variables|attributes|bindings|whatever) in your program at the time the function is invoked.

So, if you have 313 other variables in your program, that 3-arg doSomething is functionally (ha!) operating over 316 arguments.
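
A tiny Java sketch of the same idea, echoing the "function A depends on how often function B was called" scenario quoted earlier. All names here are hypothetical: `impureA` advertises one argument but silently reads a mutable global, while `pureA` takes that dependency as an explicit parameter.

```java
// Illustrative sketch: mutable global state acts as an undeclared argument.
final class HiddenArguments {

    // Mutable global state, visible to (and changeable by) any method.
    static int callsToB = 0;

    static void b() {
        callsToB++;
    }

    // The signature claims one argument, but the result also depends on callsToB.
    static int impureA(int x) {
        return x + callsToB;
    }

    // The same computation with the dependency made explicit.
    static int pureA(int x, int callsToB) {
        return x + callsToB;
    }

    public static void main(String[] args) {
        System.out.println(impureA(1));  // 1
        b();
        System.out.println(impureA(1));  // 2: same argument, different answer
        System.out.println(pureA(1, 0)); // 1, no matter what ran before
    }
}
```

Calling `impureA(1)` twice can give two different answers, and nothing at the call site tells you why; with `pureA`, everything the result depends on is in the argument list.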

Would you ever intentionally write a method signature that takes 316 arguments? Would you use any library that contained such a function signature? No? Then why are you using tools that force such craziness upon you?

Postscript

Of course, there is a place for mutable, imperative programming. The fellow who wrote the blog post to which I linked above appears to work on games, one of the few places where one could unapologetically use an imperative programming language with mutable state. Update: Looks like the state-of-the-art in game programming is heading towards FP languages more than I thought. Thanks to this comment, here's a LtU thread, with slides, about the guys who wrote Gears of War and the Unreal engine recommending FP as the future of game development.

However, we need to collectively get past encouraging other software developers – the vast majority of whom do not have the particular requirements of game, systems, or embedded development – to inflict the pain of imperative languages and mutable state upon themselves, especially given the concurrency challenges that lie ahead (never mind the general problems such environments present, as I argue above). The languages are ready, the runtimes are widespread...let's stop doing it wrong.

Authored by Chas Emerick on Dec 30, 2009 07:00 AM
Over the past month, I've been gradually porting all of our projects' builds from Ant to Maven. Everything's gone swimmingly, especially given the excellent clojure-maven-plugin, which allowed me to cleave off all of our comparatively complicated ant scripts for building and testing Clojure code. One part that did require some work was the porting of the builds associated with our NetBeans Platform-based applications – so, I thought I'd post a couple of hints to help others over the rough spots.

A plug for NetBeans

We've had a good deal of success in using the NetBeans Platform recently (often referred to as the NB RCP). It provides a metric ton of fairly high-quality plumbing for thick-client applications, and definitely saved our asses in a couple of key areas insofar as we've been able to reuse large pieces of the Platform, essentially unchanged, to meet critical new requirements. Of course, that's why we chose to use it in the first place.

Extemporaneous and Lengthy Background

To be clear, the rough spots in question aren't associated with the actual Mavenization of the NetBeans Platform-based projects – that's a relatively straightforward affair, with archetypes available in the NetBeans IDE to get one started, and very well-documented goals available, all provided by the NBM Maven Plugin. Given an existing ant-based build process, I found the actual porting of the build fairly straightforward.

The dicey part had to do with having a set of Platform artifacts available to build against. Under the ant-based build regime, it was common for those building on top of the NB RCP to keep a set of RCP artifacts available in every build environment. This was always a pain (for potentially-obvious reasons that I don't really want to get into now), and the general non-composability of the ant-based build process drove NB RCP users (and the Platform developers themselves) to extreme lengths of hacking to get stuff working properly. (BTW, just so everyone knows, I'm not picking on Fabrizio here – he's just the one who appears to have pushed the envelope more than anyone else vis-à-vis improving the composability of the ant-based RCP build process.)

One great thing about the NBM Maven Plugin is that it cuts this knot quite elegantly, making it possible to treat NetBeans Modules (NBMs) as first-class citizens within the maven world. So, if you have a maven repository that contains NBMs (like this one hosted by the NetBeans folks themselves), you can readily add NBM dependencies just like you would jar dependencies from maven central:

<dependency>
   <groupId>org.netbeans.api</groupId>
   <artifactId>org-openide-nodes</artifactId>
   <version>${netbeans.version}</version>
</dependency>

...and the NBM plugin will take care of using those NBM dependencies as appropriate:

  • injecting the NBMs' associated jars into the project's compile classpath
  • adding the NBMs as runtime dependencies of whatever NBM(s) your project/application produces
  • adding the NBMs to the (optional) "update site" associated with your NB RCP application (making remote updating of that application in the field trivial)

And, to complete the cycle, the nbm-maven-plugin provides an nbm packaging type, so that you can build NBMs independently, deploy them as you'd expect, and then compose them without any ceremony into however many NB RCP applications you'd like. No suite-chaining, no special platform or cluster artifacts in every build environment, nothing at all different from what one is used to in any other jvm/maven environment.

The Rough Spot

All of the above works flawlessly (at least it has for me in my ~month of usage). The key prerequisite though, is having access to a repository that contains the Platform NBMs that you'd like to use. The repository that I linked to above does not track NetBeans releases in lockstep (e.g. at the time of this posting, the http://bits.netbeans.org/maven2 repo has NBMs from NetBeans v6.5 and v6.7, but not v6.7.1, or the recently-released v6.8). The solution is to populate your own maven repository with those NBM artifacts.

Deploying NetBeans Platform artifacts to your own repository

This might have been a tedious process, were it not for another handy goal from the NBM Maven Plugin, populate-repository, which will push all of the artifacts produced by a NetBeans Platform build (the NBMs themselves, their sources, javadoc, and appropriate non-NetBeans dependency metadata) into your own maven repository.

There's a fair bit of configuration and setup that goes into this though. A HOWTO is provided by the nbm-maven-plugin project, but there are a number of things that it leaves unspoken. So, here's a dump of what I did to successfully populate a Nexus maven repo with a full set of NetBeans Platform artifacts:

  1. Pull the NetBeans Platform sources from the associated hg repo (I used the release68 repo, as we're targeting v6.8 of the NB RCP now). It appears that populating your repo with NB RCP artifacts from a binary download is possible, but then you'll not have the associated javadoc, source artifacts, etc.
  2. Build the entire project – I'm sure it's possible to restrict the build to certain clusters, but I don't see any reason to optimize this process since doing so only saves a little bit of disk.
    1. You must set your JAVA_HOME environment variable to point to a Sun JDK, especially in linux environments that often come with non-Sun JDKs (I'm looking at you, Ubuntu, with your cute gcj JDK). Not doing this will result in very strange compilation errors.
    2. You must set your ANT_OPTS environment variable to specify a higher-than-default maximum heap (export ANT_OPTS=-Xmx1024m worked for me).
    3. Within the top-level of your NetBeans Platform source checkout, run ant; ant nbms build-source-zips build-javadoc – this will build everything you care about in order to populate your maven repo.
  3. You want the NBMs in your repository to have appropriate dependency relationships established with third-party artifacts, right? Achieving this is easy if you have Nexus:
    1. unzip sonatype-work/nexus/storage/central/.index/nexus-maven-repository-index.zip somewhere (I used /tmp/nexus-index).
    2. set the nexusIndexDirectory property in the last step to the path where you unzipped central's index; the nbm-maven-plugin will search that Lucene index to find dependencies referred to within the Platform's NBMs
  4. set MAVEN_OPTS to specify a higher-than-default maximum heap (export MAVEN_OPTS=-Xmx512m worked for me). I'm not sure why this would be required, but I got OutOfMemoryErrors with max heap set to anything less than 512MB. Perhaps searching the maven central repo index is what pushed allocation so high.
  5. Make sure you don't have a pom.xml in your current directory. Bad things will happen.
  6. Decide on a version number for the deployed artifacts, and use it as the value of the forcedVersion property. I used RELEASE68 to go along with the pattern established at http://bits.netbeans.org/maven2; 6.8 makes more sense to me, but if/when the NetBeans maven repo comes up to date with the NetBeans release schedule, sticking with their convention will allow us to use that authoritative repository with no changes to our projects.
  7. Assuming you're deploying to a release repository, make absolutely sure that you've (temporarily) enabled redeployment for that repository! nbm-maven-plugin deploys some NBMs multiple times (presumably while traversing various dependency graphs), and not enabling redeployment will result in errors (400 errors from Nexus, specifically – I can't say what might happen with different repository managers).
  8. Now for the big finish:
    mvn org.codehaus.mojo:nbm-maven-plugin:3.1:populate-repository -DforcedVersion=RELEASE68 -DnetbeansInstallDirectory=nbbuild/netbeans -DnetbeansSourcesDirectory=nbbuild/build/source-zips -DnexusIndexDirectory=/tmp/nexus-index -DnetbeansJavadocDirectory=nbbuild/build/javadoc -DnetbeansNbmDirectory=nbbuild/nbms -DdeployUrl=<nexus_repo_url> -DskipLocalInstall=true

Whew! Let that sucker run for a while, and you should be left with a maven repository fully populated with NetBeans Platform artifacts.

Authored by Chas Emerick on Dec 28, 2009 02:35 PM
It's strange how some days or weeks have running themes. One theme for me this week programming-wise has been string interpolation:
  • I mentioned it in the #clojure channel on freenode earlier this week (sounds like Rich Hickey isn't a fan of the concept in general, yet),
  • Miles and I talked about it some in connection with the Clojure templating system he's been working on (plug: after recording another episode of the Strictly Professional podcast),
  • and just this morning, I noticed a post by Vassil Dichev about how one might implement string interpolation in Scala

I've become weary of format of late, and all of the other formats out there aren't any more pleasant – variadic (and even keyword or named-argument) string replacement is just a dull tool compared to real interpolation.

The Scala implementation post was the last straw for me, especially because (with all due respect to Vassil, as he's doing very well with the materials he has at his disposal) it showcases so many of the aspects of Scala that I came to dislike in the course of using it for a year or so: the tortured syntax; the rope, nay, the barbed wire that is implicit conversions; the bear trap of traits.

A Clojure Implementation

OK, enough flame-bait. What I'm really here to do is show how easy it is to add string interpolation to Clojure, and how simple its implementation is:

(ns commons.clojure.strint
 (:use [clojure.contrib.duck-streams :only (slurp*)]))

(defn- silent-read
  [s]
  (try
    (let [r (-> s java.io.StringReader. java.io.PushbackReader.)]
      [(read r) (slurp* r)])
    (catch Exception e))) ; this indicates an invalid form -- s is just string data

(defn- interpolate
  ([s atom?]
    (lazy-seq
      (if-let [[form rest] (silent-read (subs s (if atom? 2 1)))]
        (cons form (interpolate (if atom? (subs rest 1) rest)))
        (cons (subs s 0 2) (interpolate (subs s 2))))))
  ([#^String s]
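    ; use the earliest marker, so strings containing both ~{ and ~( are handled
    (let [idxs  (remove neg? [(.indexOf s "~{") (.indexOf s "~(")])
          start (if (seq idxs) (apply min idxs) -1)]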
      (if (== start -1)
        [s]
        (lazy-seq (cons
                    (subs s 0 start)
                    (interpolate (subs s start) (= \{ (.charAt s (inc start))))))))))

(defmacro <<
  [string]
  `(str ~@(interpolate string)))

Don't mind the namespace – that's just where we put extensions to Clojure-the-language. The public macro << (named as an homage to heredocs) takes a single string argument, and emits a str invocation that concatenates the string data and evaluated expressions contained within that argument.

Example Usage

First, let's get a value we can refer to:

commons.clojure.strint=> (def n 99)

You can do simple value replacement:

commons.clojure.strint=> (<< "There's ~{n} bottles of beer on the wall...")
"There's 99 bottles of beer on the wall..."

And evaluate arbitrary code:

commons.clojure.strint=> (<< "There's ~(dec n) bottles of beer on the wall...")
"There's 98 bottles of beer on the wall..."
commons.clojure.strint=> (<< "There's ~(seq (range n 90 -1))
                              bottles of beer on the wall...")
"There's (99 98 97 96 95 94 93 92 91) bottles of beer on the wall..."

You can use any functions or macros you have available in your Clojure environment:

commons.clojure.strint=> (defn- some-function [] {:name "Chas" :zip-code 01060})
#'commons.clojure.strint/some-function
commons.clojure.strint=> (<< "My name is ~(:name (some-function)), it's nice to meet you.")
"My name is Chas, it's nice to meet you."

...including interop with Java methods:

commons.clojure.strint=> (<< "You have approximately ~(.intValue 5.5) minutes left.")
"You have approximately 5 minutes left."

Caveats

First, let's say what's wrong with this implementation compared to, say, Ruby's string interpolation (I may be missing other points, I'm no Ruby hacker):

  1. Strings cannot be used within interpolated expressions; e.g. this will cause a straightforward parse exception:
    commons.clojure.strint=> (<< "~(str n "another string")")
    #<CompilerException java.lang.IllegalArgumentException:
         Wrong number of args passed to: strint$-LT--LT-
    

    The Clojure reader sees this as providing three arguments to the << macro. Being able to use strings within interpolated expressions would require a "native" Clojure reader macro for interpolated strings, or the ability to define reader macros in "userspace" (Clojure's read table cannot be modified in Clojure code right now; this is an intentional design decision).

    Update: pmjordan mentioned on hackernews that you can get around this by escaping the nested strings, like so:

    commons.clojure.strint=> (<< "~(str n \" another string\")")
    "99 another string"
    

    Very true, and very useful in a pinch, but I would definitely consider it to be a wart (and an issue that is insurmountable from Clojure userland right now).

  2. Heredocs aren't available. That's a far more general shortcoming compared to other languages, but is still related to string interpolation. This is significantly mitigated by the fact that Clojure strings are multiline already, but it would be nice in some circumstances to be able to specify a block of text using different delimiters for one-off templating, etc.
  3. Lazy sequences need to be made strict in order for them to print as they do at a REPL (thus the additional seq invocation around (range n 90 -1) in the example above).

Advantages

I'm sure a lot of people will look at this implementation and say, "so what?". Well, it's got a lot going for it:

  1. Simple implementation. Unless you've got a Pavlovian aversion to parentheses (but are somehow immune to piles of braces?), it's very comprehensible.
  2. It's user-land code. Many languages would require a compiler extension or modifications to the language core to pull this off.
  3. The interpolation happens at compile-time! The only processing that occurs at runtime is the concatenation of the chunks of each string, but all of the string and expression parsing happens before your code using the << macro would hit a customer's server or desktop. This is decidedly in contrast with the Scala interpolation implementation, where all of the string parsing is done at runtime; to my knowledge, doing anything else would require a compiler plugin there.
  4. It's fully composable with all other Clojure code. There's no restriction on where you can use the << macro, and no restriction on what Clojure (or Java!) code you can include in interpolation expressions.
  5. There's no magic. Many languages make it very easy to inject magical – as in, opaque – behaviour into your code. The Scala interpolation implementation is no different – to get that special behaviour out of a String, one must call a magical method i in order to rope in the machinery around the InterpolatedString implicit conversion. On the other hand, all of the effects and actors involved in the << macro are local, and its semantics and calling conventions are exactly the same as any other Clojure macro.

Exhale...

So, hopefully that puts string interpolation behind me. I'd love to see something like this become a reader macro in Clojure someday (maybe in conjunction with heredoc support), but in the meantime, this will make a lot of one-off templating jobs a whole lot easier in Clojure compared to using the usual variadic string replacement methods that are otherwise available.

Authored by Chas Emerick on Dec 04, 2009 01:19 PM
Clojure's binding form is amazingly useful, but as with any very long length of rope, you can hang yourself in a cinch with it. So, let's review a couple of traps that I've personally fallen into while using binding of which you should be aware.

Binding is thread-local

This is super-simple, and it's the first thing that one learns upon encountering binding for the first time, but you can get bitten by sloppily thinking that an established binding will migrate to another thread, or by not understanding the concurrency semantics of a function you're calling within your binding form. Consider:

user=> (def *foo* 5)
#'user/*foo*
user=> (defn adder
         [param]
         (+ *foo* param))
#'user/adder
user=> (binding [*foo* 10]
         (doseq [v (pmap adder (repeat 3 5))]
           (println v)))
10
10
10
nil

So, we have a var *foo* holding a default value, and a function adder that just adds its argument to the current thread-local value of *foo*, returning the result. This is obviously just illustrative; you can assume that adder is a function call into an opaque library you're using that takes some arguments and perhaps pulls some configuration or other data from the values bound into some var it specifies as being part of its API.

The problem here is that adder is being invoked in threads other than the thread that is establishing the binding on *foo*; therefore, the value of *foo* within adder is always the default, 5.

The lesson? Bindings do not migrate across thread boundaries. One of the great things about Clojure is you can "do concurrency" using a variety of easy-to-use primitives (e.g. pmap is absolutely the cat's nuts, in that it's a dead-simple way to almost-transparently parallelize computation over a dataset). The ironic downside to that is that whereas thread boundaries are painfully obvious in other languages because of all the ceremony one needs to go through to get results, things like pmap have so little ceremony that it's easy to forget the basics.
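
A minimal sketch of the same lesson, assuming a more recent Clojure release than the one used for the transcript above. In those releases vars must be declared ^:dynamic to be rebindable, and pmap and future carry the caller's bindings along with them, so this sketch crosses the thread boundary by hand with a plain java.lang.Thread. run-in-new-thread is a hypothetical helper written for this example; bound-fn* is part of clojure.core:

```clojure
(def ^:dynamic *foo* 5)

(defn run-in-new-thread
  "Hypothetical helper: calls f on a brand-new thread and returns its result."
  [f]
  (let [result (promise)
        t (Thread. (fn [] (deliver result (f))))]
    (.start t)
    @result))

(binding [*foo* 10]
  [;; a plain fn sees only the root value on the other thread
   (run-in-new-thread (fn [] *foo*))
   ;; bound-fn* captures the current bindings and reinstates them
   (run-in-new-thread (bound-fn* (fn [] *foo*)))])
;; => [5 10]
```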

One solution to the problem illustrated above would be to change the implementation of adder so that it's explicitly capturing the bound value of *foo*, and returning a new function that does the adding using that binding:

user=> (defn make-adder
         []
         (let [foo-value *foo*]
           #(+ foo-value %)))
#'user/make-adder
user=> (binding [*foo* 10]
         (doseq [v (pmap (make-adder) (repeat 3 5))]
           (println v)))
15
15
15
nil

Parenthetically, it's very much worth noting that all of the wonderful ref/transaction machinery in Clojure is implemented using thread-local bindings. That means that if you try to pmap a function across some set of refs in the course of a transaction (or otherwise attempt to poke at refs in a concurrent environment), things will go very wrong for you. There are ways around this, but they (last I checked) involve manually copying the thread-local bindings associated with any running transaction across thread boundaries – in general, it's not worth the hassle.

Lazy seqs often escape the scope of binding forms, so capture the value of any bound vars you care about explicitly

As wonderful as lazy sequences are, how and when they dereference bound vars isn't always obvious, and is entirely dependent upon how and when those lazy sequences are used/materialized. Consider, assuming *foo* is bound to 5 by default as in our first example:

user=> (defn some-fn
         []
         (lazy-seq [*foo*]))
#'user/some-fn
user=> (binding [*foo* 10]
         (some-fn))
(5)

What's going on here? The lazy-seq macro returns a lazy sequence, which will evaluate the sequence-producing form provided to it on demand – in this case, after the binding form has returned, therefore ensuring that *foo* has reverted to its default value.

This may become clearer with this example:

user=> (binding [*foo* 10]
         (doall (some-fn)))
(10)

doall forces the full evaluation of a lazy sequence – and in this case, because that evaluation is being performed within the binding form, *foo* (and therefore the returned sequence) has the value we expect.

These are obviously simplistic examples; the real-world scenario that this applies to is where you might be writing a library, and part of that library's public API are some number of bindable vars that callers can use to configure the behaviour of the library's functions, etc. This is super-useful, especially for libraries where there are a ton of knobs and levers: rather than forcing callers to provide a configuration object on every function call (and therefore forcing you to thread that configuration through all helper functions, etc), using bindings for such things allows callers to only change the defaults they care about, and allows you to code the implementation of the library in a straightforward way.

The lesson? If you are going to use bound values of vars, you need to make sure you capture those bindings before returning any lazy seqs that use those bound values. Aside from using doall as shown above (which defeats the point of using lazy seqs), the solution looks a lot like the make-adder function from the first section (notice a trend?):

user=> (defn some-fn
         []
         (let [foo-val *foo*]
           (lazy-seq [foo-val])))
#'user/some-fn
user=> (binding [*foo* 10]
         (some-fn))
(10)

Notice that some-fn is now explicitly capturing the bound value of the *foo* var; this ensures that, regardless of when and where or on which thread the lazy seq is materialized, the values it contains are what were bound by the caller of some-fn. This is almost always what you want to have happen.
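
The same capture pattern applies to the lazy sequences returned by map, filter, and friends, which is where this tends to bite in practice. A minimal sketch, assuming a recent Clojure release (hence the ^:dynamic declaration); *scale*, scale-late, and scale-captured are hypothetical names standing in for a library's configuration var and functions:

```clojure
(def ^:dynamic *scale* 1)

(defn scale-late
  "Dereferences *scale* whenever the elements happen to be realized."
  [xs]
  (map #(* *scale* %) xs))

(defn scale-captured
  "Reads *scale* once, while the caller's binding is still in effect."
  [xs]
  (let [scale *scale*]
    (map #(* scale %) xs)))

(def late     (binding [*scale* 10] (scale-late [1 2 3])))
(def captured (binding [*scale* 10] (scale-captured [1 2 3])))

;; Both seqs are realized here, after the binding forms have returned:
(vec late)     ;; => [1 2 3]
(vec captured) ;; => [10 20 30]
```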

Too many do not fully realize the degree of flexibility that vars and binding provide to the capable programmer. As is often the case though, power comes with responsibility, and whether one is writing libraries, using them, or casually using binding in localized ways in application code, it needs to be handled with care.

Authored by Chas Emerick on Nov 03, 2009 12:48 PM
Talk to anyone outside of the software world, and you'll quickly realize that one of the most gut-wrenching, anxiety-inducing acts is buying software. Even if one has evaluated the product in question top to bottom, past experience of bugs, botched updates, missing features, and outright failures and crashes has tempered any enthusiasm or confidence that might be felt when the time comes to pull out the credit card or write the purchase order.

Of course, the blame for this lies squarely with the software industry itself – the failures in software quality are well known, both as discrete instances and in aggregate. Those of us whose business and livelihood are tied to the sale of software (whether sent out the door or delivered as a service) must do whatever we can to reverse this zeitgeist.

Given that, we've decided to adopt a very simple, no-nonsense "Satisfaction Guaranteed" policy for PDFTextStream. Hopefully this will help take the anxiety out of someone's day, somewhere.

This isn't a new idea, of course. Lots of software companies have had guarantees of some sort or another for ages, but I think my first encounter with the concept as a business owner was Joel Spolsky's post from a couple of years ago:

I think that our customers are nice because they’re not worried. They’re not worried because we have a ridiculously liberal return policy: “We don’t want your money if you’re not amazingly happy.”

Joel raised the issue again on a recent StackOverflow podcast, which prompted me to think about our own approach...

What do we do about unhappy customers?

To be honest, our customers are pretty happy. Of course, we occasionally receive a bug report, but we generally knock out patches within a couple of days, and sometimes faster. In the 5 years we've been selling PDFTextStream, we've never had a single request for a refund. Part of that is offering up a very liberal evaluation version, but I'd like to think it's because what we sell does the job it's meant to do very well.

Given that, I've never thought to make a big stink about a refund policy – it just never came up. But hearing Joel and Jeff talk about the ire that they felt towards various companies that refused to issue refunds when they weren't happy with something motivated me to make our de facto policy explicit. Thus, the new "Satisfaction Guaranteed" statement.

Part II: the Open Source Influence

An elephant in the room is the influence of open source software on customers' attitudes towards buying software, and the assessment of risk that goes along with it. As more and more users of technology (just to spread the net as widely as possible) are exposed and become accustomed to the value associated with open source software (which, in simple terms, is generally high because of its zero or near-zero price), it increases pressure on commercial vendors (like us) to up our game along the same vector.

But, the impact of open source software on pricing is a pretty stale story. The real impact is derivative, in that a zero or near-zero price means that the apparent risk associated with using open source software is zero or near-zero. The promise of proprietary, commercial software is that, if it does what the vendor claims (whatever that is), then that software will deliver benefits far in excess of its cost and far in excess of the aggregate benefit provided by the open source alternatives, even given the price differential.

The problem is that a lot of people only turn towards commercial options as a last resort because of the aforementioned historical failures of the software industry vis-à-vis quality: the apparent risk of commercial options is higher than that associated with open source options, simply because the latter's super-low price is a psychological antidote to any anxiety about quality issues. So, there's flight towards low-priced options, rather than a thorough search for optimal solutions. Injecting an explicit guarantee of performance and reliability (like our new "Satisfaction Guarantee") might be enough to tip the relative apparent risk in favor of the commercial option – or, at the very least, minimize the imbalance so that it's more likely that price won't dominate other factors (which are potentially more relevant to overall benefits).

Of course, this can only work if one's product is actually better than the open source alternatives, and by a good stretch to boot so as to compensate for the price differential. In any case, it's a win-win for the formerly-anxious software user and buyer: they should feel like they have more choice overall, and therefore have a better chance of discovering and adopting the best solution for any given problem, regardless of software licenses and distribution models.

Authored by Chas Emerick on Oct 23, 2009 12:09 PM
Anyone who is accountable for any sufficiently-complex objective is constantly having their focus pulled away from that larger goal by a thousand different fiddly tasks. Christened as yak shaving some time ago by a fellow at the MIT Media Lab, the concept has become a favorite shorthand in various programming and software development circles. I only heard of it this year, but it's helped to coalesce my thinking about focused work and the relationship between activity and progress.

In particular, I think it's helpful to occasionally check one's activity using what I'd call "root objective analysis".

Many people in technical fields are familiar with root cause analysis, where a problem or failure is analyzed in such a way as to determine its root cause. There are lots of flavors of root cause analysis, with Five Whys being popular among programmers due to the Joel Effect and probably some loose association between Five Whys and the lean development/startup methodologies that are all the rage these days.

In contrast, root objective analysis runs in the "opposite direction", so to speak: for any given activity, you trace the likely causal link between that activity you're engaged in, and the progress you want to make. In short: "Is what you're doing right now getting you closer to your end goal?" 1 If you do this right, or at all, you'll go down fewer dead-ends, waste less time, and prioritize the yaks you do shave so that you get to your desired end state sooner rather than later.

There's obviously a lot of fuzziness in any kind of speculative analysis like this; if there weren't, then project management would always bring jobs in on time and within budget. However, if your work often leads you far afield of your "main line" of focus, then asking yourself the question above from time to time may help you to ensure that every yak shaving you engage in is necessary, as opposed to a distraction caused by confusing activity for progress.

And Now for Something Completely Different

A yak shaving that is near and dear to my heart is the fable of the software developer and the PDF documents (not surprising, since we talk to a lot of developers who have worked with lots of PDF documents). There are many variations, but the most extreme goes something like this:

  1. Joe the developer needs to get some chunk of data into his company's database (maybe it's financial data, maybe he's working with excerpts of academic journal articles – such details are mostly irrelevant)
  2. The data is only available in PDF documents, and there are a lot of them. Thousands, perhaps millions of chunks of data in as many different PDF documents.
  3. Joe's first thought is that he needs to build a function to extract text from these PDFs so that he can get at the data he needs.  But, after...
    • reading the 1,000+ page PDF specification,
    • adding support for the 8 different versions of the spec,
    • adding support for a half-dozen encryption protocols, and
    • adding support for extracting Chinese (or Japanese, or Korean, or Icelandic with its lovely ð ("eth") character) along with the embedded fonts that go along with it
  4. ...Joe now has spent nearly a year building a one-off PDF text extraction library that (again, depending on the version of the fable) fails on 24% of the documents his company needs to access, and still doesn't run fast enough to finish in the batch window he has to work with.

Seriously, scout's honor, I've heard this story at least 5 times...and each time right before or right after the developer/company in question purchased PDFTextStream to replace their homebrew PDF library. That, my friends, is activity without progress, yak shaving at its most epic.

Authored by Chas Emerick on Oct 08, 2009 11:11 AM
A favorite hobby-horse among various programming-related communities is to talk about why "Java is dead", and further, that programmers working in the Java ecosystem should really look for greener pastures elsewhere.  You see these sorts of posts pop up on proggit, for example, often enough for it to get old.  That's a lot of hot air, with plenty blowing in the other direction from various folks that have been pushing hard for significant improvements and changes to Java. Both sides are wrong, though, because as a result of its success and a series of historical accidents:
Java-the-language is dead.
Get over it, and realize that because of that fact, you'll probably come to depend upon Java more than you ever thought possible.

The JVM is probably one of the most vibrant platforms for developing new programming languages there is, in part because of the status of Java-the-language.

First, let's settle the premise. In comments on one of his recent blog posts, Joe Darcy, one of the fellows who heads up Sun's management of the JVM and JDK (I'm not sure of his exact title and portfolio), said a couple of key things about the never-ending saga regarding closures in Java:

There are millions upon millions of Java developers who would have to learn about closures if they were added in the platform.

...there is far from unanimity in the Java community on the underlying choice of whether or not closures would be an appropriate language change for Java at this time.

OK, there it is, closures are never going to be added to the Java language.  Done, and done.  And if closures aren't going in, then you can surely bet that other things aren't going to make it, either.  To further make the point, Joe commented on an earlier blog post of his here 2 , saying in reference to a question about why the Java standard libraries don't slough off deprecated APIs:

To date, we have valued continued binary compatibility with code calling the deprecated elements more than cleaning up the API.

This sort of stuff pisses a lot of people off, and leads others to propose mildly absurd things IMO, like forking the Java language into "stable" and "experimental" versions. This is a lot of wasted effort.


It seems that Sun decided long ago, through pressure from its customers and developers, that compatibility is more important than innovating at the language level. With that, managing Java and the JDK became more an exercise in stewardship than anything else. The quotes above from an authoritative source are proof-positive that this is the case.

That may make the Java language dead with regard to features, but it's hardly useless – it's simply transitioned to be the stable "systems language" for the JVM that a large swath of programmers (who Sun likely correctly identifies as being uninterested in things like closures, syntactic improvements, etc. etc.) happen to use for applications as well.

Trading off "progress" for stability bestows upon Java at least two characteristics that are shared by other systems languages:

  • screaming into the void about how improvements and changes should be made yesterday is generally pointless and irrelevant
  • knowing that the language is essentially fixed for years to come means that it fades into the background as a very useful artifact for those that want to build on top of a system with well-known characteristics

A side effect of this is that the JVM is a very fertile spot for new(er) languages, where language implementers don't have to worry about their building blocks being taken away or changed radically from year to year 3 . At the same time, the JVM itself has been getting tweaked and tuned heavily under the covers to support non-Java languages, not the least of which is Sun's JavaFX, their entry into the post-Java JVM language fray 4 . So, you want your fork of Java that pushes boundaries? They are many and plentiful, so go choose one, already.

The upshot of all this is that it's more likely than not that over the course of the coming years, your life (and quite likely your professional life as well, if you're involved in software) will come to rely upon Java, the JVM behind it, and many different other language stacks built on one or both of those technologies.

Of course, interop between these languages is a concern: only APIs matching Java's binary signatures are accessible by all languages, there's no standard interface for closures, there's no standard (sane) numeric tower, etc. etc. These things are frustrating if one happens to be working in a polyglot environment, but I've no doubt that necessity will draw the larger players in the JVM language space together to establish certain baselines to ensure interoperability.

In the end, we might have all been better off if the current state of affairs had arrived years ago. A steady drip, drip, drip of Java language improvements serves only to keep developers tied around what is functionally a frozen language, and away from superior alternatives (on the same JVM platform!) if they're so inclined to look up from their work. Since the state of play vis-à-vis Java-the-language is clear, maybe those that care so deeply about programming language productivity, innovation, and progress can set about enjoying the advantages of the future that Java has ensured for us all.

Authored by Chas Emerick on Oct 01, 2009 10:22 AM

Founder, Snowtide Informatics

About Me

I'm the founder of Snowtide Informatics; we make PDFTextStream, a PDF text extraction library for Java and .NET that a lot of people like and use. I do a lot of programming in Clojure and just a little in Java, trying to make it easier for people to access data from unstructured content.

    Footnotes
    Reference Notes
    1 Careful and clueful readers will recognize this as little more than a distilled version of OODA, the granddaddy of all decision-making formalisms.
    2 I don't mean to pick on Joe, BTW. He just happens to have been relatively visible of late, in conjunction with his appearance on the Java Posse podcast, as well as in various chatter around the recent JVM Language Summit.
    3 The reality is that if you're a language implementer (or an aspiring one), you have two platforms to choose from, the JVM or the CLR, and it's worth noting that the former appears to be outpacing the latter in terms of attracting innovation in language design. There's a lot one can attribute that to, but having an essentially fixed baseline language (e.g. not what C# is at all) might be a minor contributing factor.
    4 Worth noting is the fact that JavaFX has oodles of features that people have been banging on about for Java to get for years and years. This is further verification that Sun's reluctance to produce feature-rich languages has nothing to do with their technical capabilities or general motivations, but with decisions made about Java's status driven by business considerations.