
Archive for November, 2010

I was curious to know how many books, on average, I read a month. I don’t expose this information directly on bookpiles either, though you could extract it from the RSS feed. It’s one of the features I would like to add once I understand what information I want to present and how it is best presented.

In the meantime, I ran a query in the database and came up with this:

2009-01 3
2009-02 2
2009-03 4
2009-04 3
2009-05 3
2009-06 3
2009-07 2
2009-08 3
2009-09 3
2009-10 3
2009-11 4
2009-12 1
2010-01 0
2010-02 1
2010-03 1
2010-04 6
2010-05 2
2010-06 5
2010-07 7
2010-08 2
2010-09 7
2010-10 4
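The database query itself isn’t shown. As a sketch in the same filter-minded spirit, the per-month counts could also be produced with awk, assuming one finish date per line in YYYY-MM-DD form (the input format is my assumption; the real query ran against the bookpiles database):

```shell
# Hypothetical reconstruction of the aggregation (the post's actual
# database query is not shown): group finish dates, one YYYY-MM-DD per
# line, by their YYYY-MM prefix and count books per month.
months=$(printf '2009-01-05\n2009-01-20\n2009-02-11\n' |
  awk '{count[substr($1, 1, 7)]++}
       END {for (m in count) print m, count[m]}' |
  sort)
echo "$months"
```

The trailing `sort` is there because awk’s `for (m in count)` iterates in unspecified order; on the three sample dates this prints `2009-01 2` and `2009-02 1`.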

I felt 80% done. Then I realized I didn’t quite know how to extract, from the command line, the sum, mean, standard deviation, minimum and maximum values. Of course, I could run it through R. Or Excel… But the question wasn’t how to do statistics in general; it was how to do it as a filter … easily … right now.

A little research didn’t turn up any obvious answer. (Please correct me if I missed an obvious solution.)

So I wrote my own in awk. (awk is present on ALL the machines I use.)

min == "" {min=max=$1}
$1 < min  {min = $1}
$1 > max  {max = $1}
          {sum+=$1; sumsq+=$1*$1}
END {
  print "lines: ", NR;
  print "min:   ", min;
  print "max:   ", max;
  print "sum:   ", sum;
  print "mean:  ", sum/NR;
  print "stddev:", sqrt(sumsq/NR - (sum/NR)^2)
}

Here’s what the output looks like:
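The same output can be regenerated from the monthly table above. This is a sketch of the run; saving the awk code as stats.awk and pre-extracting the counts column are my assumptions about the bootstrap step, which the post doesn’t show:

```shell
# Save the stats program from the post as stats.awk (the name is a guess).
cat > stats.awk <<'EOF'
min == "" {min=max=$1}
$1 < min  {min = $1}
$1 > max  {max = $1}
          {sum+=$1; sumsq+=$1*$1}
END {
  print "lines: ", NR;
  print "min:   ", min;
  print "max:   ", max;
  print "sum:   ", sum;
  print "mean:  ", sum/NR;
  print "stddev:", sqrt(sumsq/NR - (sum/NR)^2)
}
EOF

# Feed the monthly counts (the second column of the table) through the filter.
stats=$(printf '%s\n' 3 2 4 3 3 3 2 3 3 3 4 1 0 1 1 6 2 5 7 2 7 4 |
  awk -f stats.awk)
echo "$stats"
```

On the 22 months above, this reports a minimum of 0, a maximum of 7, a sum of 69, a mean of about 3.14 books a month, and a standard deviation of about 1.8.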

I included it in my dotfiles: the awk code and a bootstrap shell script (used above).
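The bootstrap script itself isn’t reproduced in the post. A minimal sketch of what such a wrapper could look like, assuming the awk code lives next to it in the dotfiles (the layout and the name `stats` are my guesses; a temp directory stands in for the dotfiles directory here):

```shell
# Hypothetical dotfiles layout; a temp directory stands in for it here.
dots=$(mktemp -d)

# The awk program from the post.
cat > "$dots/stats.awk" <<'EOF'
min == "" {min=max=$1}
$1 < min  {min = $1}
$1 > max  {max = $1}
          {sum+=$1; sumsq+=$1*$1}
END {
  print "lines: ", NR;
  print "min:   ", min;
  print "max:   ", max;
  print "sum:   ", sum;
  print "mean:  ", sum/NR;
  print "stddev:", sqrt(sumsq/NR - (sum/NR)^2)
}
EOF

# A one-line bootstrap wrapper: run the program on stdin or file arguments.
cat > "$dots/stats" <<EOF
#!/bin/sh
exec awk -f "$dots/stats.awk" "\$@"
EOF
chmod +x "$dots/stats"

# Use it as a filter.
out=$("$dots/stats" <<'DATA'
1
2
3
DATA
)
echo "$out"
```

With the wrapper on the PATH, any column of numbers can then be piped through `stats` anywhere.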
