Monday, March 12, 2007

It's the End of the World! Oh noes!

In the run-up to Y2K there was no shortage of doomsayers. Systems would shut down. Banking software would go tits up and fortunes, if not lost, would be reduced to their 1900 or 1970 values. Planes would drop out of the sky. Software everywhere would crash.

And then came January 1, 2000. The world didn't end. It wasn't even thrown back into the stone age. Planes didn't fall from the sky. Systems didn't shut down. Banking software kept working. The worst problems most of us saw were nuisances, dates like "January 4, 19100" and old calendar apps mistakenly adding a February 29th to the year. The doomsayers disappeared and the IT industry was mocked by a flood of fuckwits chiding us for having made such a big deal about Y2K.

But for the alarms three years prior, it could've gone very differently.

Until the late 1990s, most of the public didn't realise that an awful lot of infrastructure software was as old as dirt, much of it written in ancient (to the point of obscure) languages, some of it loaded on old IBM 3430 tape drives to this day. Up until 1993, the FAA's TRACON was controlled by ARTS-III software written in Ultra for a 1960s UNIVAC processor. These days it's running on PowerPCs and the language is C, but the conversion from Ultra/Univac to C on LynxOS/Motorola 68000 was long and difficult, and with only nine months to go before Y2K, the GAO issued a rather harsh report on the FAA's status.

If you knew COBOL, 1977-1999 were very lucrative. The now-defunct AccuCobol made a fortune twice over: once selling their software to programmers trying to refresh their COBOL skills, and again selling companies the generated binaries and libraries for converted or modernised COBOL systems which were, truth be told, less than efficient. Thousands of programmers pored through tens of millions of lines of code, reaping as much as $0.50 per line for their efforts.

Banks and insurance companies were the first to realise the problem since they pay attention to what might happen with rates, values and tables seven to ten years in advance. The rest of us rarely look at things more than a few years ahead, and so we were unprepared. Microsoft didn't have Y2K updates for Windows 95 and NT until October, 1999. Their office applications weren't completely fixed for months after that.

UNIX was a little better since the OS keeps track of the number of seconds since 1970, but in the midst of all the finger-pointing at Microsoft, UNIX people realised that they had their own NTP time bomb set to blow in the year 2036. There were other bugs, like RFC 822 which used 2-digit years for mail headers.
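For the curious, the 2036 date falls straight out of the arithmetic. The sketch below assumes the classic NTP timestamp layout, an unsigned 32-bit count of seconds starting at January 1, 1900 UTC; the function names are mine, purely for illustration.

```python
from datetime import datetime, timedelta, timezone

# NTP counts seconds from 1900; UNIX counts them from 1970.
NTP_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ntp_rollover():
    """First instant that no longer fits in a 32-bit NTP seconds field."""
    return NTP_EPOCH + timedelta(seconds=2**32)


def ntp_to_unix_offset():
    """Seconds to subtract from an NTP timestamp to get UNIX time."""
    return int((UNIX_EPOCH - NTP_EPOCH).total_seconds())


if __name__ == "__main__":
    print(ntp_rollover().isoformat())  # 2036-02-07T06:28:16+00:00
    print(ntp_to_unix_offset())        # 2208988800
```

Same class of bug as Y2K: a fixed-width field that looked enormous when it was designed.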

The programmers worked furiously. And, as always happens, in the rush to repair the known problem, new bugs were introduced. In the rush to release the fixes not all of these new bugs were caught.

And we learned nothing. Here we are seven years later and we're going through the same shit.

On August 8, 2005, US Public Law 109–58 -- the Energy Policy Act of 2005 -- went into effect. Section 110a designated new, longer Daylight Savings Time (DST) periods. Worse, Section 110d allowed a right to reversion in as little as nine months. And no one did anything.

Granted, the problems caused by the one-hour change were relatively minor in comparison to some of the mess that could've happened after Y2K, and the Daily Show rightfully took the piss.



On the surface, yeah, it's a minor inconvenience. For many businesses, however, it's a bit more expensive. Many industries are required by law to keep extremely accurate audit trails. An hour makes a world of difference: either you're compliant or you're not. Scheduling, conferences, payments... businesses face a lot of trouble if this isn't resolved. So while being late for a doctor's appointment isn't terribly drastic, $LargeTelco's inadvertently not running an entire day's reports is.

It wasn't until shortly before DST reverted to Standard Time in 2006 that anyone really started mentioning the upcoming changes. With them came the realisation that the complexity of shifting DST rules, more or less ignored when Israel and Australia changed theirs in years past, could no longer be ignored. DST problems were coming home to America.

Time zone information had normally been stored as a flat look-up: check the zone, check the date, look at the GMT offset and apply. No longer. Change a zone to the new rules and all your historical data will be off: programs will apply the current DST info to earlier dates, treating March 12, 2006 as DST when it was actually Standard Time.

There's a full order of magnitude of complexity introduced in dealing with the problem. A flat file can't be used; a 3D table is necessary to account for years. Microsoft didn't have a fix until December, 2006. Companies including $MegaCorp came up with stopgap measures and temporary fixes, themselves buggy or quirky. And we'll ride out the storm for the three weeks between the old dates and new, hoping that most customers don't notice their jobs running an hour late or appearing to be scheduled two hours early.
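To make the flat-versus-yearly difference concrete, here is a rough sketch. It is not how $MegaCorp or anyone else actually fixed it. It assumes the US rules as I understand them (first Sunday in April to last Sunday in October through 2006, second Sunday in March to first Sunday in November from 2007), works at whole-day granularity, ignores the 2 a.m. switch, and only makes sense for 1987 onward. All function names are made up for the example.

```python
from datetime import date, timedelta


def nth_sunday(year, month, n):
    """Date of the n-th Sunday of a month (n=1 is the first)."""
    first = date(year, month, 1)
    offset = (6 - first.weekday()) % 7  # weekday(): Monday=0 ... Sunday=6
    return first + timedelta(days=offset + 7 * (n - 1))


def last_sunday(year, month):
    """Date of the last Sunday of a month."""
    next_month = date(year + (month == 12), month % 12 + 1, 1)
    last_day = next_month - timedelta(days=1)
    return last_day - timedelta(days=(last_day.weekday() + 1) % 7)


def us_dst_window(year):
    """Year-aware rule: the DST start and end depend on which law applied."""
    if year >= 2007:
        return nth_sunday(year, 3, 2), nth_sunday(year, 11, 1)
    return nth_sunday(year, 4, 1), last_sunday(year, 10)


def is_dst_flat(d):
    """Flat look-up: blindly applies the 2007 rule to every year."""
    return nth_sunday(d.year, 3, 2) <= d < nth_sunday(d.year, 11, 1)


def is_dst_by_year(d):
    """Look-up that first picks the rule in force for the date's year."""
    start, end = us_dst_window(d.year)
    return start <= d < end


if __name__ == "__main__":
    d = date(2006, 3, 12)
    print(is_dst_flat(d), is_dst_by_year(d))  # True False
```

The flat version is what you get when someone simply overwrites the zone entry with the new dates; the year-keyed version is the extra dimension in the table.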

A lot of customers tested. Those who tested complained. Those who complained kept me very busy. Every department involved, taking its cue from upper management's decision to release the latest version with this time bomb bug in place, adopted a laissez-faire attitude and little was done. Documentation was delayed, conferences ignored for "more important" tasks, and the few robe-wearing, long-bearded hippies carrying their signs reading "The End of the World is Nigh" were mostly ignored. That didn't stop us from trying.

We hounded our third-party suppliers for info and fixes. We tested as much as we could. I wrote documents. These documents had to be continually updated as we discovered new information, such as the fact that all version 3 systems were affected and not just 3c and 3g. Oh, and version 4b would be hit, too. A week before the expected changes it turned out that Sun's JREs, which also had to be fixed, contained a major bug (ID 6530336) which broke the Eastern Time Zone functionality. The only fix: manual changes. That'll be fun for 20,000-seat call center admins.

Final documents with even more information weren't made public in a timely manner and required a lot of hounding and escalation to initiate their release to our Knowledge Base. Customers had been given incorrect information and had to be informed of our new discoveries. I escalated and called and screamed and bitched until the idea was finally accepted: every monkey had to go through every one of his own tickets to find those which had asked about DST, then send updated information. The cost of that? About 30-60 minutes per monkey on average.

Some of us did special weekend duty, ready for the onslaught of tickets from fuckwits who didn't read our Urgent Notices, pay attention to our direct mails, or who didn't follow our instructions.

It never came. Only a couple tickets related to the subject showed up and these dealt with unexpected problems which we hadn't had the time to test.

My manager Vera walked by this morning and asked how bad it had been. "Only two tickets for us and five for the US." "Well, you see? It wasn't a problem. You were so worried about this for nothing."

They said the same thing in January, 2000 as well. We have the testbeds. We proved that without our work things would've gone very differently, but no one wants to hear that. Shit's working now and that's all they care about. And because I didn't spend all day on the phone Sunday, instead of a few hundred in cash for being available Sunday, I got a T-shirt which is three sizes too small.

You're welcome, you fuckwits.

x-posted from HuSi, where there's a Daylight Savings Time poll.


2 Comments:

Anonymous pulled out a crayon and scribbled:

What's the lesson? Send one prescient email foretelling of doom to Vera and her boss and then stay silent until you can say told ya so?

Or they need to judge success differently...

13 March, 2007 16:40  
Anonymous pulled out a crayon and scribbled:

Hmm... I bet it says: "I busted my balls for a horde of fuckwits & all I got was this lousy t-shirt"

I had to custom order mine...
"If assholes could fly, this place would be an airport"

14 March, 2007 20:34  


In compliance with $MegaCorp's general policies as well as my desire to
continue living under a roof and not the sky or a bus shelter, I add this:

DISCLAIMER:
The views expressed on this blog are my own and
do not necessarily reflect the views of $MegaCorp, even if every
single one of my cow-orkers who has discovered this blog agrees with me
and would also like to see the implementation of Root Cause: 17-Fuckwit.