James Golick just released friendly!

Friendly is heavily inspired by a certain Friendfeed blog post on how they use MySQL but still store schema-less data.

So, do you want:

  1. a familiar ActiveRecord-like interface?
  2. to change your schema painlessly?
  3. the stability, support, and tools of MySQL?
  4. memcached-backed high performance?

Give it a go:

  1. friendorm.com
  2. on github: friendly

My dotfiles moved

My config files have been on github for quite a while now. However, I named the project etc_config because I store those files in $HOME/etc. I renamed the project to dotfiles. This is both closer to my intent and more in line with what other people are doing.

I changed all the URLs I owned to point to the new project … but I don’t know what else on the github side is broken (watchers, feeds?)

Rake task completion is awesome. A quick google search will hook you up … (here’s one)

Rake completion must invoke rake, and that can be SLOW. I use completion to save time … What’s the point of having bash freeze on rake -T for longer than it would take me to type the whole thing?

I took it upon myself to cache the output of rake -T and to use that if it’s available.

export COMP_WORDBREAKS=${COMP_WORDBREAKS/\:/}

function _check_rakefile() {
  if [ ! -e Rakefile ]; then
    return
  fi

  if [ -e ".rake_t_cache" ]; then
    local tasks=`cat .rake_t_cache | awk '{print $2}'`
  else
    local tasks=`rake --silent -T | awk '{print $2}'`
  fi

  COMPREPLY=( $(compgen -W "${tasks}" -- "$2") )
}
complete -F _check_rakefile -o default rake

The magic is in the .rake_t_cache file. If it’s there, use it, if not … well … we haven’t gained/lost anything, the regular completion behavior is still in place.

Where does .rake_t_cache come from?


function rake_cache() {
  rake -T > .rake_t_cache
}

function rake_cache_clear() {
  if [ -e ".rake_t_cache" ]; then
    rm .rake_t_cache
  fi
}

I have been thinking about a way to detect when the cache file needs to be regenerated. However, I leave it in the capable hands of the user. All the time saved completing tasks with the cache will compensate for the inevitable time when the cache will be stale and swear words will be required.
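
One simple heuristic, offered as a sketch rather than a full solution: treat the cache as stale whenever the Rakefile is newer than the cache file. The helper name rake_cache_is_fresh is hypothetical, and the check assumes every task is defined in the Rakefile itself; changes to tasks loaded from other files would go unnoticed.

```shell
# Hypothetical helper: succeeds only if the cache exists and the
# Rakefile has not been modified since the cache was written.
rake_cache_is_fresh() {
  [ -e .rake_t_cache ] && [ ! Rakefile -nt .rake_t_cache ]
}
```

In _check_rakefile, the test for the cache file could then become `if rake_cache_is_fresh; then`, so a stale cache falls back to calling rake directly.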

I used to work on a Java application that ran 24/7 and logged to a file on the system. The log file was rotated every week and it usually stood around 4GB.

When the shit hit the fan, I checked the log and tried to reverse-engineer how things got so bad. This is similar to what investigators do with a black box after a plane crash. How do you inspect a 4GB file with a text editor? You might be surprised to know how far Vim can take you in that direction. I have opened gigabyte-sized files before, and it worked … for some value of “worked”.

Luckily, the log4j format the application used contained the timestamp in ISO 8601 format. It looked something like YYYY-MM-DD hh:mm:ss. Thankfully, this is trivial to parse and guarantees that alphanumeric sorting (read: plain old sorting) will keep the dates in chronological order.

grep and sed

I’ll cover a simpler example and come back to dates later.

(seq might be called gseq on your system)

seq 10000 > 10000.txt

This created a 10000-line file with one number, from 1 to 10000, per line. I used this contrived example instead of ISO 8601 formatted dates because it was simple to generate and the relationship between the line number and the line content is obvious.

The next piece of the puzzle is grep. Grep has the -n/--line-number flag to “prefix each line of output with the line number”.

We’re going to extract from the line containing 444 to the line containing 2000. Of course, we know what those line numbers are because of how we generated this file. This is usually not the case.

too much

Right, we want only the first match… Part of the solution is a tighter regular expression. The other part, and the reason I showed this, is that grep keeps reading the file after the first match is found. On huge files, waiting for grep to finish is both time-consuming and unnecessary.

The -m NUM/--max-count=NUM flag will “stop reading a file after NUM matching lines.”

444 - just right

2000 - just right
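
As a sketch of this step, assuming the anchored patterns ^444$ and ^2000$ (the same ones used with sed in the discussion below), each command stops at its first match and prints it as line-number:content.

```shell
# Recreate the sample file, then find the first line matching each pattern.
seq 10000 > 10000.txt
grep -n -m 1 '^444$' 10000.txt    # prints 444:444
grep -n -m 1 '^2000$' 10000.txt   # prints 2000:2000
```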

Combining the line numbers, we can slice the log with sed:

sed -n '444,2000p' 10000.txt

Discussion

Why not skip grep and just RTF sed manual?

sed -n '/^444$/,/^2000$/p' 10000.txt

My reason: I want to visually confirm that my regular expressions matched the right lines. The time I would have saved by bypassing grep would be wasted the first time I opened a file that didn’t contain what I really wanted.

Why not just grep for timestamp and use that?

That’s a subtle point. The log files contained YYYY-MM-DD hh:mm:ss at the beginning of almost every line.

Initially, I tried:

grep '^2009-06-28 04:' log.file

To get the log lines between 4am and 5am on a specific date.

This was simple to understand and explain, and it worked beautifully until we realized that it was almost every line … it was missing the stack traces. It was also missing, although rare in that application, other multi-line log messages.

So, I used:

grep -m 1 -n '^2009-06-28 04:' log.file
grep -m 1 -n '^2009-06-28 05:' log.file

and used sed to extract the lines in-between.
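
The two greps and the sed can be wrapped into one function. This is only a sketch: the name log_slice is made up, and it assumes both patterns occur in the file, with the start before the end. It stops one line short of the end match, so the 5am line itself is left out.

```shell
# Hypothetical helper: print the lines of FILE from the first line matching
# START_RE up to, but not including, the first line matching END_RE.
log_slice() {
  local start_re=$1 end_re=$2 file=$3
  local first last
  first=$(grep -m 1 -n -- "$start_re" "$file" | cut -d: -f1)
  last=$(grep -m 1 -n -- "$end_re" "$file" | cut -d: -f1)
  # Give up if either boundary is missing from the file.
  [ -n "$first" ] && [ -n "$last" ] || return 1
  sed -n "${first},$((last - 1))p" "$file"
}
```

For example, `log_slice '^2009-06-28 04:' '^2009-06-28 05:' log.file | vim -R -` pages through that hour. Because the slice is made by line number, stack traces and other lines without a timestamp come along.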

I’ve talked about SpiderMonkey before. Being able to instantly evaluate JavaScript code is great, but you can use Firebug for that. I argued that the main reason to use SpiderMonkey is to script the command-line. Integrating with JSLint is an example of using your tools intelligently.

JSLint

JSLint is a tool that “looks for problems in JavaScript programs”. Here’s a list of things JSLint looks for (full list):

  • missing semicolons
  • missing curly braces ({})
  • the use of with
  • the unfiltered use of for in
  • the use of eval
  • the implicit use of global variables
  • missing break statements
  • double var definitions
  • the appropriate use of = and == and ===
  • unreachable code

JSLint is just a way to check your code. However, JSLint, as it stands, is a textarea on the web:

textarea

Every time you edit your code, do you want to go to jslint.com, paste only the relevant portion of your code, fix the errors locally, rinse and repeat? Anything that causes friction won’t get done. Let’s minimize that.

With SpiderMonkey

As it turns out, JSLint is a JavaScript program that parses JavaScript (!). Knowing that you can evaluate JavaScript on the command-line with SpiderMonkey, you have all the ingredients you need to automate the process.

Most of the solution is here. Look for this button: (direct link)

button

I renamed the file to jslint.js and put it in ~/etc/bin. I then created a shell script to script the process:

#!/bin/sh

filename=${1:?"jslint filename"}

js -f ~/etc/bin/jslint.js < "$filename"

I named it ~/etc/bin/jslint (chmod +x).

Stupid Example

Let’s take a stupid example: (stupid.js)

function stupid(x) {
  y = x

  if(y == 0)
    return 5
}

What’s wrong with this? Plenty:

output of jslint

After the fixes:

function stupid(x) {
  y = x;

  if(y === 0) {
    return 5;
  }
}

JSLint is just one tool. Let’s run this code through SpiderMonkey’s strict mode:

output of SpiderMonkey's strict mode

And this is for a trivial 7-line function. Running JSLint on hundreds of lines of code can be a sobering experience. There’s even a warning on the site:

jslint warning

Seriously. As you become better with JavaScript and regularly check your code against JSLint (as a formality, of course), there seems to be no end to how nitpicky JSLint can be.

Finally, you can configure what JSLint will complain about by putting a specially-formatted comment in your file:

/*jslint bitwise: true, undef: true */

function stupid(x) {
  y = x;

  if(y === 0) {
    return 5;
  }
}

The undef option will make JSLint complain about using the global variable y which definitely looks like a mistake.

The full list of options is available here.

Netcat tricks

Web development means working at a very high level of abstraction. For the magic to work, a multitude of technologies must also work: networks, sockets, HTTP. As with all leaky abstractions, we can sidestep a lot of that complexity until things stop working.

Netcat does exactly what its name says: it cats stuff over a network. It can send or receive bytes over the network.

File transfer

client:

nc send

server:

nc receive

This is probably one of the fastest and most casual ways to transfer a file between two systems. Nothing (besides nc) needs to be installed, and no authentication or encryption is performed.

I would reserve this use of netcat for post-apocalyptic server crashes where you need to transfer files but nothing is installed and zombies are about to come crashing in.

More realistically …

HTTP tricks

Let’s spy on safari:

safari request

This will happen after you try to open localhost:9999 in a browser. You can see all the headers in their raw form. This is one level above using a packet sniffer like Wireshark.

Let’s save the request: (you’ll have to open localhost:9999 again)

safari request file

Feed the request to google.com (or your own web server):

google response file

Have a look at the raw response: (vim response)

google response details

Yeah, it’s gzip compressed. More interestingly, we can use this to mock google:

mocking google

Open localhost:9999 in a browser, get served, byte-for-byte, what google would have served you.

You can use the -k flag to keep the connection open after the file is served:

mocking google keep

This will show you the subsequent requests (images, javascript, css)

Finally, you might want to serve the file more than once:

mocking google repeatedly

Discussion

I’ve used netcat in the past to debug gzip compression on nginx and lighttpd. With browsers and curl/wget all doing-the-right-thing with/without gzip compression, how can you really tell if it’s enabled or not?

You can also use it to spoof requests. Either a request or a response file can be trivially changed, whereas getting the same effect any other way can mean significant or time-consuming configuration changes to your setup.

Of course, this is not limited to HTTP. Extend the ideas here to fit your life.

Netcat is not the ultimate-solution™. It is called the hacker’s swiss army knife. It’s the kind of program you don’t need until you really do.

Lately, I’ve been working on JavaScript stuff at work. When it comes to JavaScript debugging, Firebug can take you a long way. Firebug is not, however, a very scriptable environment.

SpiderMonkey is “Gecko’s JavaScript engine written in C”. For our purposes, SpiderMonkey is a command-line tool to execute JavaScript.

You can build it from source or look for it in your system’s package manager.

spidermonkey in port

Before I continue, SpiderMonkey defines a few functions which are useful for non-browser environments. In the following examples, I’ll be using the print function which outputs a string to STDOUT.

Interactive mode:

spidermonkey, interactive mode

Execute (-e) mode:

spidermonkey, execute mode

File (-f) mode:

spidermonkey, file mode

File (-f) mode (multiple files):

spidermonkey, file mode, multiple files

Combined mode:

spidermonkey, file and interactive mode

Apart from “print”, there are other useful functions:

Command        Usage                  Description
=======        =====                  ===========
load           load(['foo.js' ...])   Load files named by string arguments
readline       readline()             Read a single line from stdin
print          print([exp ...])       Evaluate and print expressions
help           help([name ...])       Display usage and help messages
quit           quit()                 Quit the shell
clear          clear([obj])           Clear properties of object

This is an abridged version of the available functions, here’s the full one.

The ability to import other files (load), read from STDIN (readline), output messages (print), and quit are exactly what’s needed for scripting. In fact, while I was googling for SpiderMonkey, I found this post which used SpiderMonkey as a primitive CGI script.

More realistically, because SpiderMonkey is MUCH faster to start up than Rhino, you can include it in a command-line workflow to unit test or lint your JavaScript.

I think I already established that Vim makes an excellent pager. Let me take it one step further: Vim is a customizable, programmable pager. (!)

There are plenty of cases where you want to pick one (1) thing out of a list. Vim can easily be made into a list picker.

A few examples

  • pick a deep directory under the current directory (pick from find . -type d)
  • pick a GNU screen session out of many (pick from screen -ls)
  • pick a process to kill (pick from ps)

There are basically 2 components to these examples:

  • what command will generate the list
  • what command to run on the selection

The Code

Here’s the PickerMode plugin (put in ~/.vim/plugin/picker.vim)


function PickerMode()
  set cursorline
  nmap <buffer> <CR>    V:w! ~/.picked<CR>:qa!<CR>
endfunction
command -nargs=0 PickerMode :call PickerMode()

Comments:

  • cursorline highlights the line the cursor is on
  • return saves the current line to ~/.picked

Here’s the bash code to invoke vim and execute a command on the selection:


# start vim in PAGER mode, with PickerMode plugin
function vim_picker() {
  vim -c "PickerMode" -R -
}

# 1st parameter is command to generate a list
# 2nd parameter is command to run on selection
# 3rd (optional) parameter is DIRECT selection, bypassing VIM
function pick_with_vim() {
  if [ -e ~/.picked ]; then
    rm ~/.picked
  fi

  if [ -n "$3" ]; then
    eval "$1" | sed -n "${3}p" > ~/.picked
  else
    eval "$1" | vim_picker
  fi

  if [ -e ~/.picked ]; then
    $2 "`cat ~/.picked`"
  fi
}

Comments:

  • the selection is written to a file called ~/.picked
  • the existence of the file ~/.picked proves that you selected something
  • functional programming in bash (!)

Using pick_with_vim

pick a deep directory under the current directory:


# pick from a list of directories (recursive) and cd into it
function c() {
  pick_with_vim "find . -type d" "cd"
}

how to pick from screen:


function screen_r_x() {
  screen -r "$1" || screen -x "$1"
}

function sc() {
  pick_with_vim "screen -ls | awk '/^\t/ {print \$1}'" "screen_r_x"
}

Here’s a much simpler rewrite of “go” (my directory bookmark miniapp)


# pick from directories in $HOME/.gorc and cd into it
function go() {
  if [ ! -f "$HOME/.gorc" ]; then
    echo "$HOME/.gorc does not exist…"
    return 1
  fi

  pick_with_vim "cat $HOME/.gorc" "cd" $1
}
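
The optional third parameter is what lets a call such as `go 2` jump straight to the second bookmark without opening Vim. In isolation, the selection step amounts to the following (the bookmark paths are made up for illustration):

```shell
# sed -n "Np" prints only line N of its input, which is how the
# third parameter of pick_with_vim selects an entry directly.
printf '%s\n' /var/log /etc/nginx /usr/local/src | sed -n "2p"   # prints /etc/nginx
```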

What now?

This code is available as part of my dotfiles on github. (though it is mixed with the rest)

I’ve talked casually about using Vim as a pager before. However, I’m still surprised to see how many people use Vim regularly and don’t know about this feature.

Here’s a quote straight from vim --help

vim [arguments] -               read text from stdin

Admittedly, it’s easy to overlook the hyphen in the explanation.

vim hyphen

Why Vim as a Pager?

If you’re using Vim already, there’s nothing else to install.

If you’re using Vim already, it’s already configured the way you like it.

More importantly, Vim detects the kind of file it is being piped and turns on the appropriate syntax highlighting. Why page in black and white? In this case, “less” is definitely less!

Improving the experience

As a pager, you want to use Vim in read-only mode.

some command | vim -R -

What’s the difference? Vim doesn’t ask you to save the file if you try to quit. Of course, you can still modify and write the file … the -R flag is just a more reasonable pager default.

PAGER variable and ANSI Escape Sequences

You probably don’t want to set the PAGER variable. Vim doesn’t understand ANSI escape sequences. As such, a command like “man vim | vim -R -” won’t show colors; it will show escape sequences.

vim and ansi

I haven’t found any quick and simple solution to make Vim show ANSI escape sequences, but it’s pretty easy to strip them out before passing the file to Vim:

man vim | col -b | vim -R -

I use less as PAGER. I use vim in explicit cases.

View

The view command gets installed at the same time as vim. It’s just a symlink to vim. Using view is exactly like typing vim -R.

There’s a certain aesthetic in:

some command | view -

But I find that typing vim -R - is easier on my fingers’ muscle memory.

What happens when you type git diff? As with all interesting questions, the answer is “it depends…”

Here’s one thing you want git to do:

Vimdiff!

Step 1: add this to your .gitconfig


[diff]
  external = git_diff_wrapper
[pager]
  diff =

Step 2: create a file named git_diff_wrapper, put it somewhere in your $PATH


#!/bin/sh

vimdiff "$2" "$5"
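
For context, git calls an external diff program with seven arguments: path, old-file, old-hex, old-mode, new-file, new-hex, new-mode. That is why the wrapper picks $2 and $5. A sketch of a debugging stand-in (the function name show_diff_args is made up) that prints those two arguments instead of opening vimdiff:

```shell
# Hypothetical stand-in for git_diff_wrapper: print the two file
# arguments that would otherwise be handed to vimdiff.
show_diff_args() {
  echo "old: $2"
  echo "new: $5"
}
```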

I still have access to the default git diff behavior with the --no-ext-diff flag. Here’s a function I put in my bash configuration files:


function git_diff() {
  git diff --no-ext-diff -w "$@" | vim -R -
}

  • --no-ext-diff : to prevent using vimdiff
  • -w : to ignore whitespace
  • -R : to start vim in read-only mode
  • - : to make vim act as a pager

When it comes to vimdiff, you can get started with this tutorial.
