Managing 'Effective Dates'
Have you ever thought it would be a good idea to add columns such as “EFFECTIVE_START” and “EFFECTIVE_END” to a table and thought “that’ll do the job”? If so, have you ever stayed with the project and witnessed the havoc it wreaks?
Many data-centric systems require the ability to back-date and/or future-date changes. That is to say, they need to record the “real world” history of data items. They also usually demand the capability to report “what was known, when it was known” – thus combining the “real world” history with a log or journal of changes as and when they were applied to the database.
If implemented well, such capabilities allow for forensic interrogation of the state of a database at any past, present or future date in real-world time, and at every stage of the writing of the records.
This can be vital to a business – especially where data in a system bears directly on safety or financial matters. If we update the system to reflect what we are planning to do, we must still be able to see the state of our data “now”. And if there were to be a problem – an accident, perhaps – it will be very important to determine the state of the data at the time various decisions were made, i.e. to navigate to the states of the data before and after corrections were applied.
It comes as quite a surprise to most systems architects, then, to realise that this is still a very difficult domain.
The obvious answer that occurs to most people is to apply varying combinations of DATE or TIMESTAMP columns to tables; EFFECTIVE_START, EFFECTIVE_END, DATE_MODIFIED and so on. This is not unreasonable; indeed it is the only mechanism available. However, the manner in which this is done has a huge impact on both project functionality and costs. These costs are often overlooked in the estimating process, because the complexity introduced is not appreciated until the hapless developer starts trying to write the SELECT, INSERT, UPDATE and DELETE statements.
The tendency in the early stages of a project is, if not to be entirely dismissive of the issue, then to stick the timestamp columns on a data model and consign it to the “we’ll think about it later” list, based on the assumption “how difficult can it be?”
For generic academic discussions of the subject, two good places to start are:
Temporal Data and the Relational Model, by Chris Date, Hugh Darwen and Nikos Lorentzos
Note that the rubric on the referenced site says, quote: “Current DBMSs provide essentially no temporal features at all” and further “temporal databases are virtually certain to become important sooner rather than later, in the commercial world as well as in academia”. This book was published in 2002, and the astonishing fact is that these two assertions are still pretty much true in 2012. However, some moves are being made – Oracle’s flashback query and flashback data archive, IBM’s temporal tables, are nods in the general direction, but as yet still do not cut the mustard. Why? Well because it really CAN be THAT difficult. These technologies only address the “log of times that changes were applied to the database” dimension – the dimension most commonly referred to as “Transaction Time”. They do not manage to combine this with “Valid Time” – the “real world” history of data items.
Developing Time-Oriented Database Applications in SQL by Richard T Snodgrass
All of the authors and contributors to these two books are high ranking academics who, without doubt, speak authoritatively on the subject. A quick skim of either publication will hopefully firm up the impression that – Yes, it’s difficult. A more thorough reading will do the job for certain.
Assuming the reader to be a busy IT architect, without the luxury of time to delve into the minutiae of bitemporal data (so called because it models the combination of “system logged time” and “real world time”) and the difficulties it presents, a brief explanation of the problem follows.
A very simple schema is the start point. Note the absence of DATE or TIMESTAMP columns.
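The original diagram is not reproduced here, so what follows is only a minimal sketch of the kind of schema intended, with table and column names assumed (Oracle-flavoured DDL):

```sql
-- A minimal "current view" asset register (names assumed).
-- Note: no DATE or TIMESTAMP columns anywhere.
CREATE TABLE asset_status (
  code         VARCHAR2(10)   NOT NULL PRIMARY KEY,
  description  VARCHAR2(100)  NOT NULL
);

CREATE TABLE asset (
  asset_id     NUMBER         NOT NULL PRIMARY KEY,
  name         VARCHAR2(100)  NOT NULL,
  status_code  VARCHAR2(10)   NOT NULL REFERENCES asset_status (code)
);
```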
This is a “current view model”. The database represents an asset register as we understand it “now” – i.e. without reference to “when”. Queries are simple. Inserts, updates and deletes are simple. Foreign and unique keys are enforced by the RDBMS based on whether values exist. Not when they exist, but whether they exist – that is to say: now.
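For illustration, against the assumed schema above, the statements really are trivial:

```sql
-- "What is the current status of this asset?"
SELECT a.name, s.description
  FROM asset a
  JOIN asset_status s ON s.code = a.status_code
 WHERE a.asset_id = 42;

-- A correction simply overwrites the old value; no history is kept.
UPDATE asset_status
   SET description = 'A Changed Description'
 WHERE code = '01';
```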
As time rolls on, changes are made to the data, but there is no support in such a model for answering questions such as:
How many corrections have been made to an asset record?
What were those corrections?
What is the life history of an asset as we understand it now?
Are any changes to an asset planned for the future?
For answers to such questions, we need a so-called bitemporal data store. Of course, if the RDBMS were to handle time inherently, this would be the end of the modelling process. The user would merely state which time or times the query or DML is to apply, and then issue the simple statement in exactly the same way as for the current view model.
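Purely as an illustration of what such “inherent” handling might look like at the statement level – this uses DB2-flavoured, SQL:2011-style period clauses against the assumed table, and is not a claim that any current product delivers the full bitemporal combination – the caller would simply state the time dimension of interest:

```sql
-- "Which description did we believe applied on 3rd Jan 2000?"  (valid time)
SELECT description
  FROM asset_status FOR BUSINESS_TIME AS OF '2000-01-03'
 WHERE code = '01';

-- "What did the database itself contain at this time yesterday?"  (transaction time)
SELECT description
  FROM asset_status FOR SYSTEM_TIME AS OF CURRENT TIMESTAMP - 1 DAY
 WHERE code = '01';
```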
On the assumption that there is no such facility, the sudden and violent explosion of complexity can be explored:
To quote the Snodgrass book referenced above (p278):
“Although bitemporal tables can be challenging to implement, their support for both valid time and transaction time permits a sophisticated analysis of the evolution of the table, with all the data directly at hand. The alternatives of going back through paper records to reconstruct the sequence of changes that were made, or attempting to extract that sequence from backup tapes or other secondary data sources, are simply not practical”
He defines three types of time, and his definitions are used here for the sake of standard terminology:
User-Defined time: an uninterpreted time value [i.e. just a descriptive timestamp attribute – it has no “meaning” in the sense of existence or currency of the row]
Valid time: when a fact was true in the modeled reality
Transaction time: when a fact was stored in the database
He adds:
“These kinds of time are orthogonal: a table can be associated with none, one, two, or even all three kinds of time. Understanding each kind of time and determining which is present in an application is absolutely critical. We will characterize each in detail. SQL-92 has rudimentary support only for user-defined time; the language provides no help whatsoever with the other two types of time. That is left for you to manage, manually, in your application.”
This little paragraph is a more powerful statement than is often immediately realised. It is a reminder that DATE and TIMESTAMP columns hold no meaning for the RDBMS software – the RDBMS cannot impute the meaning of row existence from the presence of an EFFECTIVE_DATE column. That meaning has to be implemented elsewhere, with the knock-on effect of demoting the RDBMS’s existing mechanisms for enforcing entity and referential integrity to a lesser level of usefulness.
Managing transaction and valid time in the application requires the addition of timestamps, so here is a common first step on the first table:
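The accompanying figure is not reproduced, so here is an assumed sketch of that first step, using the column-naming convention described in the next paragraph:

```sql
-- Valid-time (VT) and transaction-time (TT) columns bolted on to the table.
ALTER TABLE asset_status ADD (
  vt_begin  DATE,        -- start of real-world validity
  vt_end    DATE,        -- end of real-world validity (NULL = until further notice)
  tt_begin  TIMESTAMP,   -- when this row version was recorded in the database
  tt_end    TIMESTAMP    -- when this row version was superseded (NULL = current record)
);
```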
There are, of course, different possible arrangements for timestamps. These are arranged such that the VT pair denotes the period of validity in real-world time (VT = Valid Time), and the TT pair denotes the period of record in the database (TT = Transaction Time).
The addition of the timestamps gives rise to the first challenge: How to refactor the primary key? Leaving it as it is (on CODE) clearly defeats the object of adding the timestamps in the first place as only one record could exist with a given CODE. Adding various combinations of the new timestamps into the key will also be found to be much less helpful than it would first appear:
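The original worked example is not reproduced; reconstructed from the discussion that follows, the first attempt amounts to keying the row versions by transaction time alone:

```sql
-- First attempt (assumed reconstruction): key on CODE plus transaction-time start.
-- CASCADE is needed because the foreign key from ASSET depends on the old key.
ALTER TABLE asset_status DROP PRIMARY KEY CASCADE;
ALTER TABLE asset_status ADD CONSTRAINT asset_status_pk
  PRIMARY KEY (code, tt_begin);
```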
This case makes it impossible to build a series of valid-time “entity life history” entries within the same transaction timestamp value.
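The second attempt, again reconstructed from the text, brings the valid-time start into the key as well:

```sql
-- Second attempt (assumed reconstruction): include the valid-time start too.
ALTER TABLE asset_status DROP CONSTRAINT asset_status_pk;
ALTER TABLE asset_status ADD CONSTRAINT asset_status_pk
  PRIMARY KEY (code, vt_begin, tt_begin);
```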
This is starting to look possible. Looks can be deceiving. Since transaction times should always be written as “system time”, adding TT_BEGIN to the primary key is sufficient for transaction time. This is because TT_END is really a denormalisation: TT_END is in fact the TT_BEGIN of the next “transaction”, or version, of the row.
The complexity of catering for daylight saving is overlooked for now, but of course the clocks going back and running the same hour again could cause the system to attempt to introduce duplicate transaction times.
However, there is still a problem with valid time.
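The illustration is not reproduced; rows along the following lines (values assumed, consistent with the discussion below) expose the flaw:

```sql
-- Two versions of Asset Status '01' with overlapping valid-time periods.
INSERT INTO asset_status (code, description, vt_begin, vt_end, tt_begin, tt_end)
VALUES ('01', 'A Description',
        DATE '2000-01-01', DATE '2000-01-10',
        TIMESTAMP '2000-01-01 09:00:00', NULL);

INSERT INTO asset_status (code, description, vt_begin, vt_end, tt_begin, tt_end)
VALUES ('01', 'A Changed Description',
        DATE '2000-01-02', DATE '2000-01-10',
        TIMESTAMP '2000-01-01 09:00:00', NULL);

-- The key (CODE, VT_BEGIN, TT_BEGIN) is satisfied, yet both rows claim to be
-- valid on 03-JAN-00.
```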
These two rows are valid as far as a primary key on (CODE, VT_BEGIN, TT_BEGIN) goes. But on valid time 03-JAN-00 there are two current rows – “A Description” and “A Changed Description”. The question “Which description of Asset Status 01 was valid on 3rd Jan 2000?” cannot be answered. Adding the only remaining column (VT_END) to the key still does not solve the problem: a primary key can enforce uniqueness of the values, but it cannot prevent overlapping periods.
The inescapable conclusion is that the RDBMS’s primary key mechanism is simply not going to work. If primary keys cannot be made to work, then neither can foreign keys. This, at a stroke, strips the database of much of its primary function. It is a serious and shocking realisation that the addition of a few timestamps breaks an expensive RDBMS.
So even for this simplest table, code must be added somewhere to cope with the inadequacy of the primary key mechanism for temporal tables. This table has no foreign keys, but for those that do, still more code will be required for their enforcement. The logic on which the code is built is not really “business logic”, so it is probable that it is not familiar territory for the local IT community – architects, designers and developers.
At this point it is important to read Chapter 10 of Developing Time-Oriented Database Applications in SQL. It gives as good a description as any of the extra complexity incurred. If time does not permit reading the whole chapter, skip straight to the summary (Section 10.8 on p339 – PDF thumbnail page 363).
Note that much mention is made of creating ASSERTIONS, but these are not available in most commercial databases. For example, check constraints in Oracle must refer only to the row in question – there can be no subqueries. This leaves us with triggers, and/or the production of an API or Data Access layer. Regardless of where such code is placed, it is fraught with complication and therefore careful design, build and testing phases must be estimated for.
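To make the point concrete, here is a sketch (table, column and error details all assumed) of the kind of trigger an application is forced to write in place of an ASSERTION, rejecting overlapping valid-time periods for the same CODE:

```sql
-- Statement-level trigger: refuse overlapping valid-time periods among the
-- currently recorded (TT_END IS NULL) versions of the same CODE.
CREATE OR REPLACE TRIGGER asset_status_vt_no_overlap
AFTER INSERT OR UPDATE ON asset_status
DECLARE
  l_overlaps INTEGER;
BEGIN
  SELECT COUNT(*)
    INTO l_overlaps
    FROM asset_status a
    JOIN asset_status b
      ON  a.code = b.code
      AND a.rowid <> b.rowid
      AND a.vt_begin < NVL(b.vt_end, DATE '9999-12-31')   -- periods overlap when each
      AND b.vt_begin < NVL(a.vt_end, DATE '9999-12-31')   -- starts before the other ends
   WHERE a.tt_end IS NULL
     AND b.tt_end IS NULL;

  IF l_overlaps > 0 THEN
    RAISE_APPLICATION_ERROR(-20001,
      'Overlapping valid-time periods for the same CODE');
  END IF;
END;
/
```

Even this simple check says nothing about gaps, about foreign keys that must hold across time, or about valid-time periods being split by an update – each of those needs yet more code.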
Efficient software is layered. That is to say functions are specified, written and tested once, then reused as required. The resulting code is efficient because it significantly reduces the trio of specification, coding and testing requirements. Performance efficiency is orthogonal to this, and while crucial to the success of a project, it has not been in the scope of this discussion.
It is interesting to note that most database-centric projects do not attempt to subsume (reuse) any established mechanism for handling time, preferring instead to re-invent the wheel. There is, of course, good reason for this: there are no well-known, ergonomic “wheels” available. However, such mechanisms do exist and can be used. For a good critique of the costs and benefits of using one such, refer to Gartner ID Number G00159368 (The Temporal Database: An Idea About Time Whose Time Has Come, by Frank W. Schlier, 2008). In this document Schlier recommends that “Database vendors should include TDM [Temporal Data Management] in their core databases or provide a temporal database layer to move TDM from the application domain to infrastructure.”
He also describes how Oracle Consulting in the U.K. has shown Gartner a promising transparent layer for the Oracle RDBMS that provides what appears to be seamless embedding of TDM into the database and the SQL that supports it. This approach integrates TDM into standard SQL in a straightforward way: the layer provides the application developer with an extended SQL that includes the commands necessary to manage temporal data.
The layer that was shown to Gartner is an autonomous Time Travel capability available through this website.
Traditional effective date management traps developers in a maze of temporal logic nightmares. You're forced to wrestle with:
Overlapping date ranges that break referential integrity
Gap detection algorithms across multiple related tables
Complex WHERE clauses just to find what was true when (see the sketch after this list)
Manual constraint checking for valid-from/valid-to relationships
Cascade update logic when parent effective periods change
History table synchronization across entity relationships
Temporal foreign key validation that standard SQL can't handle
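As a hedged illustration of the “complex WHERE clauses” point above (table and column names assumed, following the earlier examples), the predicate needed just to ask “what was valid at a given date, as the database recorded it at a given time” looks like this:

```sql
-- "Which descriptions were valid at :valid_date, according to what the
--  database had recorded as at :txn_time?"
SELECT code, description
  FROM asset_status
 WHERE vt_begin <= :valid_date
   AND (vt_end  IS NULL OR vt_end  > :valid_date)
   AND tt_begin <= :txn_time
   AND (tt_end  IS NULL OR tt_end  > :txn_time);
```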
The Ld8a framework completely insulates you from this complexity. No more temporal logic in your application code. No more debugging the subtle coding issues that effective-date constraints cause. No more explaining to stakeholders why "simple" date changes take weeks to implement safely.
The database infrastructure handles all temporal complexity autonomously—overlaps, gaps, cascading changes, and referential integrity across time dimensions—while you focus on business logic that delivers value.
Work where you belong: in the business present.
Every developer intuitively thinks in terms of "current state"—what's true right now. But traditional effective date systems force you to constantly context-switch between:
Current data vs. historical versions
Transaction time vs. business time
What was recorded when vs. what was valid when
Point-in-time queries vs. current state queries
With Ld8a technology, you design and code in the Current View—exactly where your mind naturally works. Simply declare your business entities and relationships as they exist today. The framework automatically:
Captures all temporal semantics behind the scenes
Maintains perfect audit trails without your intervention
Preserves referential integrity across all time dimensions
Enables instant time travel with a simple setting