Kristian Lyngstøl's Blog

Fun with Gawk

Posted on 2012-09-01

A few years ago I saw some AWK code a colleague had written. Up until that point I'd only really used awk for foo | awk '{print $2}' type stuff. I decided to take a closer look at AWK, and liked what I found.

Today I frequently use AWK for rapid prototyping or just massaging some input data beyond what's suitable with sed and cut. There are several reasons for this: mainly that it's quite efficient in a prototyping phase, but also that I find it a very fun and natural language to work with.

With GNU AWK (or just GAWK or gawk), you can even get fairly straightforward networking. It's limited of course, but it works well within those limits.
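
To give an idea of what that networking looks like: gawk treats file names of the form /inet/tcp/local-port/remote-host/remote-port as sockets, and you talk to them with the |& operator. Here is a minimal sketch. The helper names and the target variable are made up for illustration, and example.com is only a placeholder host.

```awk
#!/usr/bin/gawk -f
# Sketch: gawk's TCP special file names. Helper names are hypothetical.

# Listen on a local port; remote host 0 and remote port 0 accept anyone.
function tcp_listen(port) {
        return "/inet/tcp/" port "/0/0"
}

# Connect to host:port; local port 0 lets the system pick one.
function tcp_connect(host, port) {
        return "/inet/tcp/0/" host "/" port
}

# Fetch only the status line of "/" from host (needs network access).
function status_line(host,    sock, line) {
        sock = tcp_connect(host, 80)
        printf "GET / HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n", host |& sock
        if ((sock |& getline line) <= 0)
                line = "no response"
        close(sock)
        return line
}

BEGIN {
        print tcp_listen(8080)                  # /inet/tcp/8080/0/0
        print tcp_connect("example.com", 80)    # /inet/tcp/0/example.com/80
        # Only touch the network when asked: gawk -v target=example.com -f sketch.awk
        if (target != "")
                print status_line(target)
}
```

Run without arguments, it only prints the two file names. With -v target=some.host it also opens a connection, so it needs network access.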

I've already written a munin node in gawk (see github), but today I got a challenge from a friend (well, more like a ruse?):

<Napta> have you not tried to write a modest caching server in gawk yet ? :D
<Kristian> that's fairly easy?
<Napta> so do it!
<Kristian> ......
<Kristian> I hate you
<Kristian> because now I have to

And 26 minutes later it was working quite well.

#!/usr/bin/gawk -f

function say(content) {
        printf "%s", content |& Service
}

function synthetic(status, response, msg) {
        say("HTTP/1.1 " status " " response "\n");
        say("Connection: close\n");
        say("\n");
        say(msg);
}

function reply(url) {
        say("HTTP/1.1 200 OK\n");
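        # No newline after this header: cache[url] starts with "\n",
        # followed by the backend's own headers, a blank line and the body.
        say("Connection: close");
        say(cache[url] "\n");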
}

function get(url) {
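        # printf rather than print: print would append an extra newline and
        # end the request headers right after the request line.
        printf "GET %s HTTP/1.1\r\n", url |& Backend
        printf "Host: %s\r\n", backend |& Backend
        printf "Connection: close\r\n\r\n" |& Backend
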
        Backend |& getline
        if ($2 != "200") {
                synthetic($2, "Bad backend", "Bad backend? Got: " $0)
        } else {
                cache[url] = ""
                while ((Backend |& getline c)>0)
                        cache[url] = cache[url] "\n" c
                reply(url)
        }
}

function handle_request() {
        Service |& getline
        url=$2
        request=$1
        if (request != "GET") {
                synthetic(405, "Only support GET", "We only like GET");
                return;
        }
        if (url in cache) {
                reply(url);
                print "Cache hit: " url "\n";
        } else {
                print "Cache miss: " url "\n";
                get(url);
        }
}
        
BEGIN {
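        LINT = 1        # enable gawk's lint warnings at runtime
        port = "8080"
        backend = "kly.no"
        # Special file names: /inet/tcp/<local port>/<remote host>/<remote port>
        # Listen on the local port, accepting any remote host and port.
        Service = "/inet/tcp/" port "/0/0"
        # Connect to the backend on port 80 from any free local port.
        Backend = "/inet/tcp/0/" backend "/80"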
        do {
                handle_request()
                close(Service)
                close(Backend)
        } while (1)
}

Or download it from /code/script/gawk_cacher
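
The hit/miss logic can be tried on its own, without any networking. This is a minimal sketch, assuming a hypothetical fake_fetch() that stands in for the real backend request. It uses the in operator, which tests for a key without creating an empty array element the way a plain cache[url] reference does.

```awk
#!/usr/bin/gawk -f
# Sketch of the cache lookup only. fake_fetch() is a hypothetical stand-in
# for the real backend request.

function fake_fetch(url) {
        fetches++
        return "body of " url
}

# Return "hit" or "miss", filling the cache on a miss.
function lookup(url) {
        if (url in cache)
                return "hit"
        cache[url] = fake_fetch(url)
        return "miss"
}

BEGIN {
        print lookup("/a")                      # miss
        print lookup("/b")                      # miss
        print lookup("/a")                      # hit
        print "backend fetches: " fetches       # backend fetches: 2
}
```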