If I told you that you had to rename 1,000 files, say to change their extensions or
to replace hyphens with underscores:
- how long would it take you?
- what tools would you use?
- what would you do?
- how much would that answer change for 10,000/100,000/1,000,000 files?
Take a moment to think, please, before you keep reading.
This was a situation I was faced with this week, and it reminded me of Steve Yegge’s phone interview blog post. You should read it for yourself, but here’s the problem statement:
Last year my team had to remove all the phone numbers from 50,000 Amazon web page templates, since many of the numbers were no longer in service, and we also wanted to route all customer contacts through a single page.
Let’s say you’re on my team, and we have to identify the pages having probable U.S. phone numbers in them. To simplify the problem slightly, assume we have 50,000 HTML files in a Unix directory tree, under a directory called “/website”. We have 2 days to get a list of file paths to the editorial staff. You need to give me a list of the .html files in this directory tree that appear to contain phone numbers in the following two formats: (xxx) xxx-xxxx and xxx-xxx-xxxx.
How would you solve this problem? Keep in mind our team is on a short (2-day) timeline.
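One way to approach the quoted problem, as a rough sketch and not necessarily the answer Yegge's post arrives at, is to let find walk the tree and grep list the matching files. The helper name list_phone_pages is made up for illustration, and the pattern only flags probable matches for the two formats given; it will also catch longer digit runs that happen to contain them.

```shell
#!/bin/sh
# Hypothetical helper: print the .html files under a directory that appear
# to contain a phone number like (xxx) xxx-xxxx or xxx-xxx-xxxx.
# The directory defaults to /website, as in the quoted problem statement.
list_phone_pages() {
    # -l prints only the names of matching files; -E enables extended regexes.
    find "${1:-/website}" -type f -name '*.html' \
        -exec grep -lE '(\([0-9]{3}\) |[0-9]{3}-)[0-9]{3}-[0-9]{4}' {} +
}
```

Something like `list_phone_pages /website > pages.txt` would then produce the list for the editorial staff (pages.txt is just an example name).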
These are not “never-gonna-happen” situations. Your set of skills should include both “enterprise” problem solving and “low-level” scripting.
For the curious, here’s how I solved the renaming problem:
    find . -name '*.TXT' > src
    cp src dest
    vim dest
    paste src dest > todo
    vim todo
    source todo
A good old “find”, some vim regular expression magic (to turn the copied list in “dest” into the new names), “paste”, and more vim magic (to add “mv” to every line). An advantage of this technique is that you can “preview” the changes before you source the file.
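If you would rather not do the editing by hand, the same find, edit, preview flow might be sketched like this. It swaps the vim steps for a single sed substitution. The helper name make_todo and the lowercase .txt target are assumptions (the post does not say what the new names were), and the quoting assumes file names without single quotes or newlines.

```shell
#!/bin/sh
# Hypothetical helper: print one "mv" command per *.TXT file found under
# the given directory (default: current directory), renaming .TXT to .txt.
# Nothing is renamed here; the output is meant to be reviewed first.
make_todo() {
    find "${1:-.}" -type f -name '*.TXT' |
        # Capture everything before the extension, then emit a quoted mv line.
        sed "s/^\(.*\)\.TXT\$/mv -- '\1.TXT' '\1.txt'/"
}
```

As with the original approach, you would write the output to a file with `make_todo . > todo`, read it over, and only then run it with `sh todo`.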