I used to keep track of what I wanted to read on a piece of paper. What started out as a list that could fit on a Post-it grew rapidly into a few pages. At one point, I typed everything up on the computer. From a text file to other solutions, I finally ended up writing a web application: Booklife.

At the end of 2008, I had just refactored Booklife and I added events. The main purpose of events was to generate an ATOM feed. However, as 2010 rolled around, I realized that all my reading habits were in the database.

Here’s what I came up with:

January

February

March

April

May

June

July

August

September

October

November

December

Notes

All in all, I read 34 books in 2009. At first, I was surprised by how high the number was. Then, I was surprised that I had not read even more. I guess, over time, I’ll have a better idea of how many books I’m going through in a given period of time.

Of those 34 books, 17 are audio books. That was a surprise; I would have thought it would be fewer than that. Also: the fact that it’s exactly 50% is a coincidence. I had argued in the past that audio books were increasing my “book throughput”. Sure, if I had not read these books in audio, I might have been able to squeeze in more real books. At the same time, I am not convinced. I fit audio books in contexts where real books are inconvenient: when I’m cleaning, doing the dishes, in transit.

For some reason, I’ve been in a dotfiles refactoring frenzy.

Though I’ve posted about my rake script to do completion in bash before, that was a while back and I’ve improved things since.

capistrano: on github

export COMP_WORDBREAKS=${COMP_WORDBREAKS/\:/}

_check_capfile() {
  if [ ! -e Capfile ]; then
    return
  fi

  local cache_file=".cache_cap_t"

  if [ ! -e "$cache_file" ]; then
    cap -T | awk '/^cap / {print $2}' > "$cache_file"
  fi

  local tasks=$(cat "$cache_file")
  COMPREPLY=( $(compgen -W "${tasks}" -- "$2") )
}
complete -F _check_capfile -o default cap

rake: on github

export COMP_WORDBREAKS=${COMP_WORDBREAKS/\:/}

_check_rakefile() {
  if [ ! -e Rakefile ]; then
    return
  fi

  local cache_file=".cache_rake_t"

  if [ ! -e "$cache_file" ]; then
    rake -T | awk '/^rake / {print $2}' > "$cache_file"
  fi

  local tasks=$(cat "$cache_file")
  COMPREPLY=( $(compgen -W "${tasks}" -- "$2") )
}
complete -F _check_rakefile -o default rake

There is very little difference between those 2 scripts. In fact, if it wasn’t bash, I would probably refactor this further …

  1. check that the (Cap|Rake)file exists
  2. generate the cache file if it doesn’t exist
  3. use the cache file to do the completion
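Since the three steps are the same for both tools, one possible refactoring is a shared helper that prints the task list. This is only a sketch under assumptions: `_cached_tasks` is a hypothetical name that is not in my dotfiles, and it expects the `-T` output to look like `rake task_name  # description`.

```shell
# Hypothetical helper (not from the original scripts): print the cached task
# names for a tool, generating the cache on first use.
#   $1 = command (rake or cap)
#   $2 = file that marks a project (Rakefile or Capfile)
#   $3 = cache file
_cached_tasks() {
  # 1. check that the (Cap|Rake)file exists
  [ -e "$2" ] || return 1

  # 2. generate the cache file if it doesn't exist
  if [ ! -e "$3" ]; then
    "$1" -T | awk -v tool="$1" '$1 == tool {print $2}' > "$3"
  fi

  # 3. print the cached task names for the completion to use
  cat "$3"
}

# Each completion function then shrinks to a single line, for example:
#   COMPREPLY=( $(compgen -W "$(_cached_tasks rake Rakefile .cache_rake_t)" -- "$2") )
```

The helper itself avoids bash-only syntax; only the `COMPREPLY` line in the comment needs bash.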

“rake -T” or “cap -T” will NOT run again until you delete the cache files:

rm_caches() {
  rm -v .cache_* 2>/dev/null
}

James Golick just released friendly!

Friendly is heavily inspired by a certain Friendfeed blog post on how they use MySQL but still store schema-less data.

So, do you want:

  1. a familiar ActiveRecord-like interface?
  2. to change your schema painlessly?
  3. the stability, support, and tools of MySQL?
  4. memcached-backed high performance?

Give it a go:

  1. friendorm.com
  2. on github: friendly

My dotfiles moved

My config files have been on github for quite a while now. However, I named the project etc_config because I store those files in $HOME/etc. I renamed the project to dotfiles. This is both closer to my intent and more in line with what other people are doing.

I changed all the URLs I owned to point to the new project … but I don’t know what else on the github side is broken (watchers, feeds?)

Update: new rake and capistrano bash completion

Rake task completion is awesome. A quick google search will hook you up … (here’s one)

Rake completion must invoke rake and that can be SLOW. I use completion to save time … What’s the point of having bash freeze on rake -T for longer than it would take me to type the whole thing?

I took it upon myself to cache the output of rake -T and to use that if it’s available.

export COMP_WORDBREAKS=${COMP_WORDBREAKS/\:/}

function _check_rakefile() {
  if [ ! -e Rakefile ]; then
    return
  fi

  if [ -e ".rake_t_cache" ]; then
    local tasks=$(awk '{print $2}' .rake_t_cache)
  else
    local tasks=$(rake --silent -T | awk '{print $2}')
  fi

  COMPREPLY=( $(compgen -W "${tasks}" -- "$2") )
}
complete -F _check_rakefile -o default rake

The magic is in the .rake_t_cache file. If it’s there, use it, if not … well … we haven’t gained/lost anything, the regular completion behavior is still in place.

Where does .rake_t_cache come from?


function rake_cache() {
  rake -T > .rake_t_cache
}

function rake_cache_clear() {
  if [ -e ".rake_t_cache" ]; then
    rm .rake_t_cache
  fi
}

I have been thinking about a way to detect when the cache file needs to be regenerated. However, I leave it in the capable hands of the user. All the time saved completing tasks with the cache will compensate for the inevitable time when the cache will be stale and swear words will be required.
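For the record, the simplest detection I can think of compares modification times. This is a rough sketch, not something the functions above do: `rake_cache_is_stale` and `rake_cache_refresh` are hypothetical names, and the check only looks at the Rakefile, so tasks defined in other files would go unnoticed.

```shell
# Hypothetical check (an assumption, not part of the original functions):
# succeed when the cache is missing or older than the Rakefile.
rake_cache_is_stale() {
  [ ! -e .rake_t_cache ] || [ Rakefile -nt .rake_t_cache ]
}

# Regenerate the cache only when the check above says it is needed.
rake_cache_refresh() {
  if rake_cache_is_stale; then
    rake -T > .rake_t_cache
  fi
}
```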

I used to work on a Java application that ran 24/7 and logged to a file on the system. The log file was rotated every week and it usually stood around 4GB.

When the shit hit the fan, I checked the log and tried to reverse-engineer how things got so bad. This is similar to what investigators do with a black box after a plane crash. How do you inspect a 4GB file with a text editor? You might be surprised to know how far Vim can take you in that direction. I have opened gigabyte-sized files before, and it worked … for some value of “worked”.

Luckily, the log4j format the application used contained the timestamp in ISO 8601 format. It looked something like YYYY-MM-DD hh:mm:ss. Thankfully, this is trivial to parse and guarantees that alphanumeric sorting (read: plain old sorting) will keep the dates in chronological order.

grep and sed

I’ll cover a simpler example and come back to dates later.

(seq might be called gseq on your system)

seq 10000 > 10000.txt

This created a 10000-line file with one number, from 1 to 10000, per line. I used this contrived example instead of ISO 8601 formatted dates because it was simple to generate and the relationship between the line number and the line content is obvious.

The next piece of the puzzle is grep. Grep has the -n/--line-number flag to “prefix each line of output with the line number”.

We’re going to extract from the line containing 444 to the line containing 2000. Of course, we know what those line numbers are because of how we generated this file. This is usually not the case.

too much

Right, we want the first match… Part of the solution is to use a tighter regular expression. The other part, and the reason I showed this, is that grep keeps parsing the file after the first match is found. On huge files, waiting for grep to finish is both time-consuming and unnecessary.

The -m NUM/--max-count=NUM flag will “stop reading a file after NUM matching lines.”

444 - just right

2000 - just right

Combining the line numbers, we can slice the log with sed:

sed -n '444,2000p' 10000.txt
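Put together, the whole sequence can be scripted. A minimal sketch: piping through `cut` to drop everything after the line number is my addition here, and `slice.txt` is just an illustrative output name.

```shell
seq 10000 > 10000.txt

# grep -n prints "LINE:CONTENT"; keep only the line number.
# -m 1 stops reading the file at the first match.
start=$(grep -n -m 1 '^444$' 10000.txt | cut -d: -f1)
end=$(grep -n -m 1 '^2000$' 10000.txt | cut -d: -f1)

# Slice the file between the two line numbers.
sed -n "${start},${end}p" 10000.txt > slice.txt
```

When doing this by hand, I would still look at the grep output before trusting the numbers.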

Discussion

Why not skip grep and just RTF sed manual?

sed -n '/^444$/,/^2000$/p' 10000.txt

My reason: I want to visually confirm that my regular expressions matched the right lines. The time saved by bypassing grep would be wasted the first time I opened a file that didn’t contain what I really wanted.

Why not just grep for timestamp and use that?

That’s a subtle point. The log files contained YYYY-MM-DD hh:mm:ss at the beginning of almost every line.

Initially, I tried:

grep '^2009-06-28 04:' log.file

To get the log lines between 4am and 5am on a specific date.

This was simple to understand and explain, and it worked beautifully until we realized that it was almost every line … it was missing the stack traces. It was also missing, although rare in that application, other multi-line log messages.

So, I used:

grep -m 1 -n '^2009-06-28 04:' log.file
grep -m 1 -n '^2009-06-28 05:' log.file

and used sed to extract the lines in-between.
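Here is a small, self-contained sketch of that last step. The log lines below are invented for illustration (the real file was far bigger), and `sample.log` and `hour.log` are hypothetical names. The slice stops one line before the first 05: entry so the stack trace is kept but the next hour is not.

```shell
# A made-up log: one entry carries a stack trace without timestamps.
cat > sample.log <<'EOF'
2009-06-28 03:59:58 INFO  starting job
2009-06-28 04:00:01 ERROR job failed
java.lang.IllegalStateException: boom
    at com.example.Job.run(Job.java:42)
2009-06-28 04:30:00 INFO  retrying
2009-06-28 05:00:00 INFO  done
EOF

# Line number of the first entry of each hour.
first=$(grep -n -m 1 '^2009-06-28 04:' sample.log | cut -d: -f1)
next=$(grep -n -m 1 '^2009-06-28 05:' sample.log | cut -d: -f1)

# Everything from the first 04: line up to the line before the first 05: line.
sed -n "${first},$((next - 1))p" sample.log > hour.log
```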

I’ve talked about SpiderMonkey before. Being able to instantly evaluate JavaScript code is great but you can use Firebug for that. I argued that the main reason to use SpiderMonkey is to script the command-line. Integrating with JSLint is an example of using your tools intelligently.

JSLint

JSLint is a tool that “looks for problems in JavaScript programs”. Here’s a list of things JSLint looks for (full list):

  • missing semicolons
  • missing curly braces ({})
  • the use of with
  • the unfiltered use of for in
  • the use of eval
  • the implicit use of global variables
  • missing break statements
  • double var definitions
  • the appropriate use of = and == and ===
  • unreachable code

JSLint is just a way to check your code. However, JSLint, as it stands, is a textarea on the web:

textarea

Every time you edit your code, do you want to go to jslint.com, paste only the relevant portion of your code, fix the errors locally, rinse and repeat? Anything that causes friction won’t get done. Let’s minimize that.

With SpiderMonkey

As it turns out, JSLint is a JavaScript program that parses JavaScript (!). Knowing that you can evaluate JavaScript on the command-line with SpiderMonkey, you have all the ingredients you need to automate the process.

Most of the solution is here. Look for this button: (direct link)

button

I renamed the file to jslint.js and put it in ~/etc/bin. I then created a shell script to script the process:

#!/bin/sh

filename=${1:?"jslint filename"}

js -f ~/etc/bin/jslint.js < "$filename"

I named it ~/etc/bin/jslint (chmod +x).

Stupid Example

Let’s take a stupid example: (stupid.js)

function stupid(x) {
  y = x

  if(y == 0)
    return 5
}

What’s wrong with this? Plenty:

output of jslint

After the fixes:

function stupid(x) {
  y = x;

  if(y === 0) {
    return 5;
  }
}

JSLint is just one tool. Let’s run this code through SpiderMonkey’s strict mode:

output of SpiderMonkey's strict mode

And this is for a trivial 7-line function. Running JSLint on hundreds of lines of code can be a sobering experience. There’s even a warning on the site:

jslint warning

Seriously. As you become better with JavaScript and regularly check your code against JSLint (as a formality, of course), there seems to be no end to how nitpicky JSLint can be.

Finally, you can configure what JSLint will complain about by putting a specially-formatted comment in your file:

/*jslint bitwise: true, undef: true */

function stupid(x) {
  y = x;

  if(y === 0) {
    return 5;
  }
}

The undef option will make JSLint complain about using the global variable y which definitely looks like a mistake.

The full list of options is available here.

Netcat tricks

Web development means working at a very high level of abstraction. For the magic to work, a multitude of technologies must also work: networks, sockets, HTTP. Like all leaky abstractions, however, we can sidestep a lot of the complexity until things stop working.

Netcat does exactly what its name says: it cats stuff over a network. It can send or receive bytes over the network.

File transfer

client:

nc send

server:

nc receive

This is probably one of the fastest and most casual ways to transfer a file between two systems. Nothing (besides nc) needs to be installed, and no authentication or encryption is performed.

I would reserve this use of netcat for post-apocalyptic server crashes where you need to transfer files but nothing is installed and zombies are about to come crashing in.

More realistically …

HTTP tricks

Let’s spy on safari:

safari request

This will happen after you try to open localhost:9999 in a browser. You can see all the headers in their raw form. This is one level above using a packet sniffer like Wireshark.

Let’s save the request: (you’ll have to open localhost:9999 again)

safari request file

Feed the request to google.com (or your own web server)

google response file

Have a look at the raw response: (vim response)

google response details

Yeah, it’s gzip compressed. More interestingly, we can use this to mock google:

mocking google

Open localhost:9999 in a browser, get served, byte-for-byte, what google would have served you.

You can use the -k flag to keep the connection open after the file is served:

mocking google keep

This will show you the subsequent requests (images, javascript, css).

Finally, you might want to serve the file more than once:

mocking google repeatedly
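For reference, the steps in this section can be sketched as shell functions. Treat this as an approximation under assumptions: the function names are hypothetical, the syntax is the BSD-style netcat one (traditional netcat wants `nc -l -p PORT`), and port 80 is assumed for the real server.

```shell
# Assumed BSD-style netcat syntax; traditional netcat wants "nc -l -p PORT".

# Listen once on a port and save whatever the browser sends.
capture_request() {   # $1 = port, $2 = output file
  nc -l "$1" > "$2"
}

# Replay a saved request against a real server and save the raw response.
replay_request() {    # $1 = host, $2 = request file, $3 = output file
  nc "$1" 80 < "$2" > "$3"
}

# Serve a saved response, byte for byte, to the next client that connects.
serve_once() {        # $1 = port, $2 = response file
  nc -l "$1" < "$2"
}

# Serve the same response again and again until interrupted (Ctrl-C).
serve_forever() {     # $1 = port, $2 = response file
  while true; do
    nc -l "$1" < "$2"
  done
}
```

A typical session would be `capture_request 9999 request`, then `replay_request google.com request response`, then `serve_once 9999 response`.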

Discussion

I’ve used netcat in the past to debug gzip compression on nginx and lighttpd. With browsers and curl/wget all doing-the-right-thing with/without gzip compression, how can you really tell if it’s enabled or not?
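One way to answer that question is to hand-write the request so nothing decompresses the answer behind your back. A minimal sketch: `raw_request` is a hypothetical helper and `example.com` is a placeholder host.

```shell
# Build a minimal raw HTTP/1.1 request that asks for gzip.
#   $1 = host, $2 = path (defaults to /)
raw_request() {
  printf 'GET %s HTTP/1.1\r\nHost: %s\r\nAccept-Encoding: gzip\r\nConnection: close\r\n\r\n' "${2:-/}" "$1"
}

# Usage (needs network access and nc):
#   raw_request example.com | nc example.com 80 | head -n 20
# A "Content-Encoding: gzip" header in the response means compression is on.
```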

You can also use it to spoof requests. Either a request or a response file can be trivially edited, whereas the same effect might otherwise require significant and/or time-consuming configuration changes to your setup.

Of course, this is not limited to HTTP. Extend the ideas here to fit your life.

Netcat is not the ultimate-solution™. It is called the hacker’s swiss army knife. It’s the kind of program you don’t need until you really do.

Lately, I’ve been working on JavaScript stuff at work. When it comes to JavaScript debugging, Firebug can take you a long way. Firebug is not, however, a very scriptable environment.

SpiderMonkey is “Gecko’s JavaScript engine written in C”. For our purposes, SpiderMonkey is a command-line tool to execute JavaScript.

You can build it from source or look for it in your system’s package manager.

spidermonkey in port

Before I continue, SpiderMonkey defines a few functions which are useful for non-browser environments. In the following examples, I’ll be using the print function which outputs a string to STDOUT.

Interactive mode:

spidermonkey, interactive mode

Execute (-e) mode:

spidermonkey, execute mode

File (-f) mode:

spidermonkey, file mode

File (-f) mode (multiple files):

spidermonkey, file mode, multiple files

Combined mode:

spidermonkey, file and interactive mode

Apart from “print”, there are other useful functions:

Command        Usage                  Description
=======        =====                  ===========
load           load(['foo.js' ...])   Load files named by string arguments
readline       readline()             Read a single line from stdin
print          print([exp ...])       Evaluate and print expressions
help           help([name ...])       Display usage and help messages
quit           quit()                 Quit the shell
clear          clear([obj])           Clear properties of object

This is an abridged version of the available functions, here’s the full one.

The ability to import other files (load), read from STDIN (readline), output messages (print), and quit are exactly what’s needed for scripting. In fact, while I was googling for SpiderMonkey, I found this post which used SpiderMonkey as a primitive CGI script.

More realistically, because SpiderMonkey is MUCH faster to start up than Rhino, you can include it in a command-line workflow to unit test or lint your JavaScript.

I think I already established that Vim makes an excellent pager. Let me take it one step further: Vim is a customizable, programmable pager. (!)

There are plenty of cases where you want to pick one (1) thing out of a list. Vim can easily be made into a list picker.

A few examples

  • pick a deep directory under the current directory (pick from find . -type d)
  • pick a GNU screen session out of many (pick from screen -ls)
  • pick a process to kill (pick from ps)

There are basically 2 components to these examples:

  • what command will generate the list
  • what command to run on the selection

The Code

Here’s the PickerMode plugin (put in ~/.vim/plugin/picker.vim)


function PickerMode()
  set cursorline
  nmap <buffer> <CR>    V:w! ~/.picked<CR>:qa!<CR>
endfunction
command -nargs=0 PickerMode :call PickerMode()

Comments:

  • cursorline highlights the line the cursor is on
  • return saves the current line to ~/.picked

Here’s the bash code to invoke vim and execute a command on the selection:


# start vim in PAGER mode, with PickerMode plugin
function vim_picker() {
  vim -c "PickerMode" -R -
}

# 1st parameter is command to generate a list
# 2nd parameter is command to run on selection
# 3rd (optional) parameter is DIRECT selection, bypassing VIM
function pick_with_vim() {
  if [ -e ~/.picked ]; then
    rm ~/.picked
  fi

  if [ -n "$3" ]; then
    eval "$1" | sed -n "${3}p" > ~/.picked
  else
    eval "$1" | vim_picker
  fi

  if [ -e ~/.picked ]; then
    $2 "$(cat ~/.picked)"
  fi
}

Comments:

  • the selection is written to a file called ~/.picked
  • the existence of the file ~/.picked proves that you selected something
  • functional programming in bash (!)

Using pick_with_vim

pick a deep directory under the current directory:


# pick from a list of directories (recursive) and cd into it
function c() {
  pick_with_vim "find . -type d" "cd"
}

how to pick from screen:


function screen_r_x() {
  screen -r "$1" || screen -x "$1"
}

function sc() {
  pick_with_vim "screen -ls | awk '/^\t/ {print \$1}'" "screen_r_x"
}
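
The third example from the list, picking a process to kill, could look like the sketch below. The names `pid_of_line`, `kill_picked` and `pk` are hypothetical, and `pk` relies on pick_with_vim as defined above.

```shell
# Hypothetical helper: take the first field of a "PID COMMAND" line.
pid_of_line() {
  echo "$1" | awk '{print $1}'
}

# Command to run on the selection: kill the process whose line was picked.
kill_picked() {
  kill "$(pid_of_line "$1")"
}

# Pick a process to kill. Relies on pick_with_vim, defined earlier.
# "pid=" and "comm=" ask ps for those columns without a header line.
pk() {
  pick_with_vim "ps -e -o pid= -o comm=" "kill_picked"
}
```

The selected line is passed whole to the second command, so the PID has to be extracted before calling kill.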

Here’s a much simpler rewrite of “go” (my directory bookmark miniapp)


# pick from directories in $HOME/.gorc and cd into it
function go() {
  if [ ! -f $HOME/.gorc ]; then
    echo "$HOME/.gorc does not exist…"
    return 1
  fi

  pick_with_vim "cat $HOME/.gorc" "cd" $1
}

What now?

This code is available as part of my dotfiles on github. (though it is mixed with the rest)
