Thursday, February 19, 2009

No more garbage collection pauses!!

I was listening to a back episode of the Java Posse the other night from the '07 roundup "Whither Java?" session (around 63:10), and heard someone mention the "-Xincgc" option for the Sun JVM, which switches from the default collector, pauses and all, to an incremental collector.

This changes the behavior from big, ugly, noticeable pauses during full garbage collection sweeps to an incremental model where the pauses aren't noticeable, with the trade-off that it uses more CPU overall. So for batch-type, long-running, CPU-intensive operations the default collector will marginally out-perform the incremental collector, but for user-visible operations the big noticeable pauses go away.

Technically, this forces the JVM to use the Concurrent Low Pause Collector, as documented in Tuning Garbage Collection with the 5.0 Java Virtual Machine. Interesting reading if you have the time.
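If you want to see the difference for yourself, a throwaway allocator along these lines (class name and the numbers are made up for illustration) run with -verbose:gc, once with and once without -Xincgc, makes the collection behaviour visible in the GC log output:

```java
// Allocation churn to give the collector something to do. Run it twice
// and compare the pause times reported by -verbose:gc:
//
//   java -verbose:gc GcChurn
//   java -verbose:gc -Xincgc GcChurn
public class GcChurn {
    // Keep a rolling window of objects alive so full collections have real work.
    static byte[][] live = new byte[64][];

    static long churn(int iterations) {
        long allocated = 0;
        for (int i = 0; i < iterations; i++) {
            byte[] chunk = new byte[1024];   // mostly short-lived garbage
            live[i % live.length] = chunk;   // a fraction survives longer
            allocated += chunk.length;
        }
        return allocated;
    }

    public static void main(String[] args) {
        System.out.println("allocated " + churn(1_000_000) + " bytes");
    }
}
```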

Script it Script it Script it!

This has been said many times before, but I'm gonna say it again: anything worth doing twice is worth scripting!

In our environment the development databases are refreshed from production and sanitised weekly. This is scripted; it happens on the dot, every week, without fail (excepting catastrophes!).

All of our local builds are automated, type ant or mvn compile and magic happens.

All of our server builds are automated, svn ci causes a build to kick off automatically, and notifies you if something goes wrong.

The project I'm currently working on requires some manual configuration and then proceeds to turn the database inside out, and is not easily backed out (I've tried, and it was more pain than it was worth!). So for every full integration test there are about 8 separate configuration steps before it can be run. They've all been scripted (SQL in this case). Now I know I can refresh my database from production with sanitised data, run one script to reload the config, and run a full 4-hour integration test, all with complete confidence that I'm running off a fully reproducible slate.

The first time you do something, fair enough, do it without regard to rigorous scripting (but save any commands you run). When you come to do it a second time, you have a hard decision to make. If there's even the remotest chance that you'll be called on to do this again, then script it. In fact, even if you think you probably won't have to, script it anyway. I've lost count of the number of times I've had to repeat the thing that "is surely just this once-off and never again!" and later wished I'd scripted it to start with.

I've found in my experience that the time it takes to script something is, most of the time, only slightly longer than the time it takes to do it ad hoc. When you script something you exercise the muscle between your ears, which is worth it if nothing else. Then as soon as you have to do it a second time, it pays for itself, and every single time after that.

I once had to step in for a client and run some web statistics for them as the usual staffer who performed the role was away. This normally took him the best part of a day, between copying log files from the server, setting up the config, waiting for the tool to process the logs, then copying the results to the web directory. The first time I did it, it took me about two days, maybe three, which included figuring out the process as well as scripting it while waiting. The second time it took me the five minutes needed to update the links on the stats page to point to the new stats, as everything had run automatically before I got to work. Time well spent.

I'm sure I'm preaching to the choir here, and everyone reading this blog will be nodding their heads as if I'm ranting about how rocks fall down instead of up, but it's something we all too easily forget in the everyday rush: taking the time to keep our minds and our tools sharp.

Got a large log file to parse (1GB+)? Instead of wrestling with some editor, try writing a Perl/Python script to parse it for you, or even a shell script with a grep pipeline.
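And if neither Perl nor a shell happens to be handy, even plain Java does the job in a handful of lines. A rough sketch (class and argument names are mine, purely for illustration) that streams the file one line at a time, so memory stays flat no matter how big the log is:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintStream;

// Minimal grep-alike: print every line containing the search term,
// return the number of matches.
public class LogGrep {
    static int grep(BufferedReader in, String needle, PrintStream out) throws IOException {
        int hits = 0;
        String line;
        while ((line = in.readLine()) != null) {
            if (line.contains(needle)) {
                out.println(line);
                hits++;
            }
        }
        return hits;
    }

    // Usage: java LogGrep big.log ERROR
    public static void main(String[] args) throws IOException {
        try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
            System.out.println(grep(in, args[1], System.out) + " matching lines");
        }
    }
}
```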

Let the machine work hard while you work smart. You'll save time, be more relaxed for it, and who knows, you might even enjoy it!

Monday, February 9, 2009

Two Random things I like about Java...

1) the fact that each and every class file needs to have 0xCAFEBABE in the first 4 byte positions
This actually identifies the file as a valid Java class file, in addition to the file extension; no non-class file is likely to have this number in that location.
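You can check this yourself in a few lines. A quick sketch (assuming a file name is passed on the command line) that reads the first four bytes as a big-endian int and compares against the magic number:

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

// The first 4 bytes of a valid .class file, read big-endian, are 0xCAFEBABE.
public class MagicCheck {
    static boolean isClassFile(DataInputStream in) throws IOException {
        return in.readInt() == 0xCAFEBABE;  // readInt() is big-endian by definition
    }

    // Usage: java MagicCheck Foo.class
    public static void main(String[] args) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(args[0]))) {
            System.out.println(args[0] + (isClassFile(in) ? " is" : " is not") + " a class file");
        }
    }
}
```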

2) the fact that there is a class called java.text.DontCareFieldPosition!!
The name actually makes sense: it's a null implementation of FieldPosition used by DecimalFormat, and it's package-local, so clients of the API should never be aware of it. Kudos to the developer that argued that name past the API police!!

Sunday, February 8, 2009

The importance of trim()... or how much I hate whitespace!!

So... One of my colleagues went on leave last Friday, and Murphy's law dictates that that is the perfect time for the application he maintains to suddenly and precipitously cease working in production. Of course this problem fell into my lap to resolve, so I set to it around about lunchtime...

The situation is that a third party regularly sends us a variety of data in plain text or spreadsheet format; in this particular case it's a spreadsheet. Once we get it, we read it off the filesystem, look up data from the spreadsheet, compare it to data in our systems and perform various actions based on it. This has been in production for about 8 months and has worked perfectly the whole time. Until the maintainer went on leave... :-)

Digging in, the first problem was that it wasn't reading dates properly. Hmmm, ok, so we grab the last spreadsheet that worked and compare with that. Ah, they'd changed the date format from YYYY-MM-DD to DD/MM/YYYY for no reason at all. Ok, not a biggie; one Excel macro later it looks fine. Reload, but still no workie... what tha?

After more digging it turned out we were being bitten by Excel's date handling. A cell may look like a date, but if the cell is formatted as a date then internally it's actually just a number, which needs to be converted. In my checked-out version of the code this was handled transparently, but in prod and QUAL it broke as numeric dates were being missed, resulting in nulls. Ok, another Excel macro and we have columns of strings that look like dates instead of numbers that look like dates. Now loading this file into QUAL the dates are picked up fine, at which point I noticed that a lookup between the file and our system wasn't working. WTF mate? Each row has several cells containing codes that need to be converted into rich data structures via a lookup, those data structures then being used for calculations later.
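For the record, the numeric-date conversion itself is simple once you know Excel's epoch. A rough sketch of the arithmetic (a library like Apache POI does this properly, including Excel's fake 1900 leap-day quirk for very old serials; the class and constant names here are mine):

```java
import java.util.Date;

// Excel (1900 date system) stores dates as a serial number of days,
// where serial 25569 corresponds to 1970-01-01, the Unix epoch.
public class ExcelDates {
    static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;
    static final int DAYS_TO_UNIX_EPOCH = 25569;

    static Date fromExcelSerial(double serial) {
        // Fractional part of the serial is the time of day.
        return new Date(Math.round((serial - DAYS_TO_UNIX_EPOCH) * MILLIS_PER_DAY));
    }

    public static void main(String[] args) {
        // Serial 25569 is the Unix epoch; printed in the local time zone.
        System.out.println(fromExcelSerial(25569));
    }
}
```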

The lookup was broken, why? Well, after several hours of checking differences in environment, checking datasources, checking versions of code and finally in desperation stepping through the lookup code, it dawned on me... The spreadsheet had whitespace in it.

Normally when I read tabular data from disk, or really any untrusted data at all, trim()'ing and validation are the order of the day. If the field is an enumeration, then validate it. If it's a number, check for alphas, dollar signs, commas and so on. It's the sort of paranoia that maintenance programming gives you. I had ASSumed that my colleague would have done the same by default; oops, no. Which is fine, we all make mistakes.
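That kind of paranoia looks something like this in practice. A sketch, with the field formats invented for the example: trim first, then validate, and fail loudly rather than limping on with bad data:

```java
import java.math.BigDecimal;

// Defensive parsing of untrusted fields: whitespace is stripped,
// anything unexpected throws immediately.
public class FieldCleaner {
    static String cleanCode(String raw) {
        String code = raw.trim();
        if (!code.matches("[A-Z0-9]+")) {
            throw new IllegalArgumentException("Bad code: '" + raw + "'");
        }
        return code;
    }

    static BigDecimal cleanAmount(String raw) {
        String s = raw.trim().replace("$", "").replace(",", "");
        return new BigDecimal(s);  // throws loudly on anything non-numeric
    }

    public static void main(String[] args) {
        System.out.println(cleanCode("  ABC123\t"));
        System.out.println(cleanAmount(" $1,234.50 "));
    }
}
```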

So, a quick google, this handy KB article and some VBA later, and the spreadsheet was all clean again. Re-running the load it came up roses, which is to say it started working like billy-o where it had previously just cruised through, and all was well with the world.

Moral of the story is, Excel is a pain to work with. Not because it's the spawn of pure evil, but rather because although it *looks* like a datasource with strong data rules, it isn't, and you can never, ever assume that anything inside it is what you think it is until you've checked and made sure. Columns will move around, dates will randomly be strings or numbers, any and every field will randomly be space/tab/zero padded. You simply have to treat it as a hostile datasource.

And never, ever fail silently when something is wrong with the data!! Die loudly and with great gusto!! Please for my sanity!!!

// No record exists may be something wrong with **** in the input file