Archive for June, 2009

I used to work on a Java application that ran 24/7 and logged to a file on the system. The log file was rotated every week and usually stood at around 4GB.

When the shit hit the fan, I checked the log and tried to reverse-engineer how things got so bad. This is similar to what investigators do with a black box after a plane crash. How do you inspect a 4GB file with a text editor? You might be surprised to know how far Vim can take you in that direction. I have opened gigabyte-sized files before, and it worked … for some value of “worked”.

Luckily, the log4j format the application used contained the timestamp in ISO 8601 format. It looked something like YYYY-MM-DD hh:mm:ss. Thankfully, this is trivial to parse and guarantees that alphanumeric sorting (read: plain old sorting) will keep the dates in chronological order.

grep and sed

I’ll cover a simpler example and come back to dates later.

(seq might be called gseq on your system)

seq 10000 > 10000.txt

This created a 10000-line file with one number, from 1 to 10000, per line. I used this contrived example instead of ISO 8601 formatted dates because it was simple to generate and the relationship between the line number and the line content is obvious.

The next piece of the puzzle is grep. Grep has the -n/--line-number flag to “prefix each line of output with the line number”.

We’re going to extract from the line containing 444 to the line containing 2000. Of course, we know what those line numbers are because of how we generated this file. This is usually not the case.

too much
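The screenshot is not reproduced here. A first attempt along these lines, assuming a plain substring pattern, shows the problem: every line that merely contains 444 matches, and grep reads the whole file to find them all.

```shell
seq 10000 > 10000.txt

# -n prefixes each match with its line number, but the pattern is too loose:
# 1444, 2444, 4440 ... all contain "444".
grep -n 444 10000.txt | head -n 3
# 444:444
# 1444:1444
# 2444:2444

# Count the matching lines.
grep -c 444 10000.txt
# 19
```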

Right, we want only the first match. Part of the solution is to use a tighter regular expression. The other part, and the reason I ran this, is to notice that grep keeps reading the file after the first match is found. On huge files, waiting for grep to finish is both time-consuming and unnecessary.

The -m NUM/--max-count=NUM flag will “stop reading a file after NUM matching lines.”

444 - just right

2000 - just right
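Standing in for the missing screenshots, commands along these lines give the “just right” results. Anchoring the pattern with ^ and $ keeps 1444 and friends out, and -m 1 makes grep stop at the first hit.

```shell
seq 10000 > 10000.txt

# One match each, and grep stops reading as soon as it finds it.
grep -m 1 -n '^444$' 10000.txt
# 444:444

grep -m 1 -n '^2000$' 10000.txt
# 2000:2000
```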

Combining the line numbers, we can slice the log with sed:

sed -n '444,2000p' 10000.txt
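A quick sanity check on the slice, plus one optional tweak that is not part of the original command: telling sed to quit at the last wanted line, so it does not keep reading the rest of a huge file (the same concern as with grep).

```shell
seq 10000 > 10000.txt

# Lines 444 through 2000 inclusive: 2000 - 444 + 1 = 1557 lines.
sed -n '444,2000p' 10000.txt | wc -l

# Same slice, but sed exits right after printing line 2000.
sed -n '444,2000p;2000q' 10000.txt | tail -n 1
# 2000
```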

Discussion

Why not skip grep and just read the sed manual?

sed -n '/^444$/,/^2000$/p' 10000.txt

My reason: I want to visually confirm that my regular expressions matched the right lines. The time I would have saved by bypassing grep would be wasted the first time I opened a file that didn’t contain what I really wanted.

Why not just grep for timestamp and use that?

That’s a subtle point. The log files contained YYYY-MM-DD hh:mm:ss at the beginning of almost every line.

Initially, I tried:

grep '^2009-06-28 04:' log.file

To get the log lines between 4am and 5am on a specific date.

This was simple to understand and explain, and it worked beautifully until we realized what “almost every line” left out: the stack traces. It also missed other multi-line log messages, although those were rare in that application.

So, I used:

grep -m 1 -n '^2009-06-28 04:' log.file
grep -m 1 -n '^2009-06-28 05:' log.file

and used sed to extract the lines in-between.
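Put together, the whole procedure might look like the sketch below. The sample log is invented for the example (the real format had more fields), and stopping one line before the first 05: entry is my choice, not something the original commands spell out.

```shell
# Hypothetical sample log: timestamped lines plus an untimestamped stack trace.
cat > sample.log <<'EOF'
2009-06-28 03:59:58 INFO  heartbeat
2009-06-28 04:00:01 ERROR request failed
java.lang.IllegalStateException: boom
    at com.example.Handler.run(Handler.java:42)
2009-06-28 04:30:00 INFO  recovered
2009-06-28 05:00:02 INFO  heartbeat
EOF

# grep -n prints "LINE:content"; keep only the line number.
start=$(grep -m 1 -n '^2009-06-28 04:' sample.log | cut -d: -f1)
end=$(grep -m 1 -n '^2009-06-28 05:' sample.log | cut -d: -f1)

# Stop one line before the first 05: entry, and quit there
# instead of reading the rest of the file.
last=$((end - 1))
slice=$(sed -n "${start},${last}p;${last}q" sample.log)
printf '%s\n' "$slice"
```

Unlike the plain grep for the 04: prefix, the slice keeps the stack trace, because sed selects by line number and does not care what the lines in between look like.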

JSLint with SpiderMonkey

I’ve talked about SpiderMonkey before. Being able to instantly evaluate JavaScript code is great, but you can use Firebug for that. I argued that the main reason to use SpiderMonkey is to script it from the command line. Integrating it with JSLint is an example of using your tools intelligently.

JSLint

JSLint is a tool that “looks for problems in JavaScript programs”. Here’s a list of things JSLint looks for (full list):

  • missing semicolons
  • missing curly braces ({})
  • the use of with
  • the unfiltered use of for in
  • the use of eval
  • the implicit use of global variables
  • missing break statements
  • double var definitions
  • the appropriate use of = and == and ===
  • unreachable code

JSLint is just a way to check your code. However, JSLint, as it stands, is a textarea on the web:

textarea

Every time you edit your code, do you want to go to jslint.com, paste only the relevant portion of your code, fix the errors locally, then rinse and repeat? Anything that causes friction won’t get done. Let’s minimize that.

With SpiderMonkey

As it turns out, JSLint is a JavaScript program that parses JavaScript (!). Knowing that you can evaluate JavaScript on the command-line with SpiderMonkey, you have all the ingredients you need to automate the process.

Most of the solution is here. Look for this button: (direct link)

button

I renamed the file to jslint.js and put it in ~/etc/bin. I then created a shell script to script the process:

#!/bin/sh

filename=${1:?"jslint filename"}

js -f ~/etc/bin/jslint.js < "$filename"

I named it ~/etc/bin/jslint (chmod +x).
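To cut the friction further, the wrapper can be run over several files in one go. The helper below is a hypothetical addition, not part of the original setup; it takes the linter command as its first argument, and it assumes the linter exits with a non-zero status on failure, which may or may not hold for your copy of jslint.js.

```shell
# lint_all LINTER FILE...
# Runs LINTER on each FILE and reports failure if any run failed.
lint_all() {
    linter=$1
    shift
    status=0
    for file in "$@"; do
        echo "== $file"
        "$linter" "$file" || status=1
    done
    return "$status"
}

# Example (assumes the wrapper above is on your PATH as "jslint"):
#   lint_all jslint src/*.js
```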

Stupid Example

Let’s take a stupid example: (stupid.js)

function stupid(x) {
  y = x

  if(y == 0)
    return 5
}

What’s wrong with this? Plenty:

output of jslint

After the fixes:

function stupid(x) {
  y = x;

  if(y === 0) {
    return 5;
  }
}

JSLint is just one tool. Let’s run this code through SpiderMonkey’s strict mode:

output of SpiderMonkey's strict mode

And this is for a trivial 7-line function. Running JSLint on hundreds of lines of code can be a sobering experience. There’s even a warning on the site:

jslint warning

Seriously. As you become better with JavaScript and regularly check your code against JSLint (as a formality, of course), there seems to be no end to how nitpicky JSLint can be.

Finally, you can configure what JSLint will complain about by putting a specially-formatted comment in your file:

/*jslint bitwise: true, undef: true */

function stupid(x) {
  y = x;

  if(y === 0) {
    return 5;
  }
}

The undef option will make JSLint complain about using the global variable y, which definitely looks like a mistake.

The full list of options is available here.

Netcat tricks

Web development means working at a very high level of abstraction. For the magic to work, a multitude of technologies must also work: networks, sockets, HTTP. As with all leaky abstractions, however, we can sidestep a lot of the complexity only until things stop working.

Netcat does exactly what its name says: it cats stuff over a network. It can send or receive bytes over the network.

File transfer

client:

nc send

server:

nc receive
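The screenshots are missing, so here is a sketch of what the two sides typically look like. The function names are made up, and the listen syntax is an assumption: BSD-style netcat accepts nc -l PORT, while traditional netcat wants nc -l -p PORT.

```shell
# Receiving side: listen on PORT and write whatever arrives to OUTFILE.
receive_file() {   # usage: receive_file PORT OUTFILE
    nc -l "$1" > "$2"
}

# Sending side: connect to HOST:PORT and stream INFILE to it.
send_file() {      # usage: send_file HOST PORT INFILE
    nc "$1" "$2" < "$3"
}

# Nothing verifies the transfer, so compare checksums on both ends afterwards.
same_checksum() {  # usage: same_checksum FILE_A FILE_B
    [ "$(cksum < "$1")" = "$(cksum < "$2")" ]
}
```

Start receive_file on the receiving machine first, then run send_file on the other one, pointing at the receiver’s host name and port.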

This is probably one of the fastest and most casual ways to transfer a file between two systems. Nothing (besides nc) needs to be installed, and no authentication or encryption is performed.

I would reserve this use of netcat for post-apocalyptic server crashes where you need to transfer files but nothing is installed and zombies are about to come crashing in.

More realistically …

HTTP tricks

Let’s spy on Safari:

safari request

This will happen after you try to open localhost:9999 in a browser. You can see all the headers in their raw form. This is one level above using a packet sniffer like Wireshark.

Let’s save the request: (you’ll have to open localhost:9999 again)

safari request file

Feed the request to google.com (or your own web server):

google response file

Have a look at the raw response: (vim response)

google response details

Yeah, it’s gzip compressed. More interestingly, we can use this to mock Google:

mocking google

Open localhost:9999 in a browser and get served, byte-for-byte, what Google would have served you.

You can use the -k flag to keep the connection open after the file is served:

mocking google keep

This will show you the subsequent requests (images, JavaScript, CSS).

Finally, you might want to serve the file more than once:

mocking google repeatedly
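Since the screenshots did not survive, the sequence in this section can be sketched as follows. These are reconstructions, not the original commands: the function names and the request/response file names are placeholders, and nc -l 9999 assumes a BSD-style netcat (traditional netcat needs -l -p 9999).

```shell
# Print whatever the browser sends to localhost:9999.
spy()            { nc -l 9999; }

# Same, but keep the raw request in a file.
save_request()   { nc -l 9999 > request; }

# Replay the saved request against a real server and keep the raw response.
fetch_response() { nc google.com 80 < request > response; }

# Serve the saved response, byte for byte, to the next client.
mock_once()      { nc -l 9999 < response; }

# -k keeps nc listening after the first connection ends.
mock_keep()      { nc -k -l 9999 < response; }

# Serve the file again for every new connection, until interrupted.
mock_forever()   { while true; do nc -l 9999 < response; done; }
```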

Discussion

I’ve used netcat in the past to debug gzip compression on nginx and lighttpd. With browsers and curl/wget all doing-the-right-thing with/without gzip compression, how can you really tell if it’s enabled or not?
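One way to answer that question from a saved response file is to look at the raw headers yourself. A minimal sketch: the response here is fabricated with printf and gzip so the example is self-contained; in practice it would be the file captured with nc.

```shell
# Hypothetical saved response: CRLF-terminated headers, a blank line, a gzip body.
{
    printf 'HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-Encoding: gzip\r\n\r\n'
    printf '<html>hello</html>\n' | gzip -c
} > response

# Headers end at the first empty line; strip the CRs and stop reading there.
headers=$(LC_ALL=C tr -d '\r' < response | LC_ALL=C sed '/^$/q')

if printf '%s\n' "$headers" | grep -qi '^content-encoding:.*gzip'; then
    echo "gzip is enabled"
else
    echo "gzip is NOT enabled"
fi
```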

You can also use it to spoof requests. Either a request or a response file can be trivially edited, whereas getting the same effect any other way might require significant or time-consuming configuration changes to your setup.
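For instance, a request can be written by hand instead of captured. A minimal sketch, with a made-up host and header set; HTTP wants CRLF line endings and an empty line to end the headers.

```shell
# Hand-written request: ask for gzip explicitly, then close the connection.
printf 'GET / HTTP/1.1\r\nHost: localhost\r\nAccept-Encoding: gzip\r\nConnection: close\r\n\r\n' > request

# Send it to a server you control and keep the raw answer:
#   nc localhost 80 < request > response

# Show the request with the CRs stripped, for inspection.
tr -d '\r' < request
```

Dropping or changing the Accept-Encoding line and sending the file again is then a one-line edit.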

Of course, this is not limited to HTTP. Extend the ideas here to fit your life.

Netcat is not the ultimate-solution™. It is called the hacker’s swiss army knife. It’s the kind of program you don’t need until you really do.

Lately, I’ve been working on JavaScript stuff at work. When it comes to JavaScript debugging, Firebug can take you a long way. Firebug is not, however, a very scriptable environment.

SpiderMonkey is “Gecko’s JavaScript engine written in C”. For our purposes, SpiderMonkey is a command-line tool to execute JavaScript.

You can build it from source or look for it in your system’s package manager.

spidermonkey in port

Before I continue, SpiderMonkey defines a few functions which are useful for non-browser environments. In the following examples, I’ll be using the print function which outputs a string to STDOUT.

Interactive mode:

spidermonkey, interactive mode

Execute (-e) mode:

spidermonkey, execute mode

File (-f) mode:

spidermonkey, file mode

File (-f) mode (multiple files):

spidermonkey, file mode, multiple files

Combined mode:

spidermonkey, file and interactive mode
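The screenshots for these modes are missing, so here is a rough reconstruction. The file names are invented, the commands assume the shell is installed as js, and the -i flag mentioned in the last comment is an assumption about the classic shell rather than something shown in this post.

```shell
# Two tiny scripts (file names are made up for this example).
echo 'var greeting = "hello";' > greeting.js
echo 'print(greeting + ", world");' > main.js

# Only run if "js" exists and really is SpiderMonkey (it defines print).
if command -v js >/dev/null 2>&1 && js -e 'print(1)' >/dev/null 2>&1; then
    # Execute (-e) mode: evaluate a snippet given on the command line.
    js -e 'print(1 + 1)'

    # File (-f) mode: run a script file.
    js -f greeting.js

    # Multiple files run in order and share globals, so main.js sees greeting.
    js -f greeting.js -f main.js
fi

# Interactive mode is plain "js". For the combined mode, something like
# "js -f greeting.js -i" (assuming -i) loads the file and then gives a prompt.
```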

Apart from “print”, there are other useful functions:

Command        Usage                  Description
=======        =====                  ===========
load           load(['foo.js' ...])   Load files named by string arguments
readline       readline()             Read a single line from stdin
print          print([exp ...])       Evaluate and print expressions
help           help([name ...])       Display usage and help messages
quit           quit()                 Quit the shell
clear          clear([obj])           Clear properties of object

This is an abridged version of the available functions; here’s the full one.

The ability to import other files (load), read from STDIN (readline), output messages (print), and quit are exactly what’s needed for scripting. In fact, while I was googling for SpiderMonkey, I found this post which used SpiderMonkey as a primitive CGI script.

More realistically, because SpiderMonkey is MUCH faster to start up than Rhino, you can include it in a command-line workflow to unit test or lint your JavaScript.
