The Not-So-Rapid Blog

Selling yourself...

PHILIP STORRY - MAY 26, 2009 (01:06:20 PM)

I'm not a salesman.

Selling things isn't my strong point.  Selling myself is more of a weak point.  This makes things like CVs and review cycles difficult for me.

Reviews are a pain, because "it's all in a day's work".  I do something, I'm momentarily proud of it, but unless I write it down I'll have completely forgotten it as an accomplishment by my next review meeting.

 

But I've always thought this is a field where it's difficult to crow about your achievements.  Often, the only people who will understand what you did are other geeks.  You end up distilling your achievements down to one-liners that mask the effort and ingenuity involved.

 

For instance, I recently did an analysis of some webservers.  They have lots of databases on them, and we needed to know which were and weren't being served.

One month's logs were selected (February, as it's short) to make sure we got a representative slice of data, one that would also capture any less frequent access that might be taking place.  There were four webservers, across two clusters, and all of them needed to be analysed.

As a yardstick, one server picked at random shows 7.2GB of text logs containing over 25 million hits. (It turned out that the load was distributed pretty evenly, as it happens.)

 

And all I have to do is figure out which databases those hits were served from.

 

Easy.

I could use a tool like Analog.  It's free, it's reputedly fast - but it requires setting up, and it's overkill.  I just need database names, and don't care about individual documents.  Analog will build a huge database, spend time maintaining that database, and will probably require considerable tweaking to get it to stop thinking about pages (documents, views, agents etc) and start thinking about databases.

Nope, a traditional web analytics system doesn't look like the solution.  On Rocky's Solution Correctness graph, web analytics falls somewhere between "Little overboard" and "God Help Me".

 

So I had a think, and realised that AWK was perfect for the job. It's sometimes thought of as a simple scripting language which processes text line-by-line, but it's actually more powerful than that: it has associative arrays, which make counting things by name almost trivial.

A short while later, I had a script which reads the log files, determines the database filename, checks in an array to see if the filename has been seen before, and then either increments the filename's counter in that array or adds the database filename to the array.  Then at the end, it prints the array out.
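
For illustration, here is a minimal sketch of that approach. It is not the actual script, and it leans on several assumptions: that the logs are in Common Log Format with no spaces in the user field (so the requested path is field 7), that database filenames end in ".nsf", and that names should be compared case-insensitively. The function name count_db_hits is made up for the example.

```shell
# count_db_hits: hypothetical helper name. Reads access-log lines from the
# named files (or from stdin) and prints "hits<TAB>database" per database.
count_db_hits() {
  awk '
    {
      # Assumption: Common Log Format, so field 7 is the requested path.
      url = tolower($7)
      # Assumption: database filenames end in ".nsf".
      pos = index(url, ".nsf")
      if (pos == 0) next            # not a database request, skip the line
      db = substr(url, 1, pos + 3)  # keep the path up to and including ".nsf"
      hits[db]++                    # creates the entry on first sight, else increments
    }
    END {
      # Print the summary; the order of "for (db in hits)" is unspecified.
      for (db in hits) printf "%d\t%s\n", hits[db], db
    }
  ' "$@"
}
```

Something like `count_db_hits access-*.log | sort -rn` (the log file names are placeholders) would then list the busiest databases first. Note that AWK's `hits[db]++` covers both the "seen before" and "new filename" cases in one step, so the explicit check described above collapses into a single line.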

 

Hey presto, one neat list of which databases were served, and how many hits each one got.  No details, just the summary I wanted.  Total time to develop?  An hour or two.  Time to run on 7.2GB/25 million lines?  28 minutes.

As you can imagine, I was chuffed with that result.  An hour and a half later I had the stats I wanted for all four servers, and could start putting them into Excel and Word to make pretty graphs.

And by the way, if you think that the AWK script sounded like a waste of time, let me tell you that I spent more time trying to get Excel to produce readable charts than I did on developing the AWK script.  You'd think handling labels on a pie chart would be simple, but oh no, Excel begged to differ!

 

So what I have here are two achievements:

  • I wrote a nifty AWK script, which produced the data I wanted quickly and easily.
  • I bludgeoned Excel into submission and managed to get a halfway attractive chart.

 

But of course, in my review, these things will become just:

  • Analysed web server usage

Which is dull, and hides all the skill and effort that it took.

 

And that assumes that I remember that I did this at my next review.  It's all in a day's work, after all...

 
