ComputerWorld.in

Design Challenges in Global Data Warehousing

By David Cox, Head of Delivery, Saksoft Europe on Mar 04, 2010

The data warehouse is now a familiar feature of the technical landscape in many organisations. As data warehousing as a discipline and the technology to support it have matured, ever more ambitious requirements have been tackled.

In recent years there has been increasing demand for data warehouses capable of operating on a global scale. Such implementations collect and consolidate data from multiple geographies and provide reports and analyses back to information consumers across the world.

Although there are several architectural approaches to these global systems, all of them have to cope with design issues which tend not to crop up in most local data warehouses. This article explores these issues and suggests ways in which they may be resolved. In particular, it focuses on handling multiple time zones, currencies and languages, as well as possible approaches to maintaining a continuous 24/7 update and publishing strategy.

Multiple Time Zones

The vast majority of data warehouses operate on an "overnight update" basis. Data is collected at some point after the end of the working day and processed ready for reporting and analysis the following morning. This tends to work well because the data warehouse is either in "update" mode or "query" mode but never both at the same time, so update performance is not affected by query processing and vice versa. However, for an organisation operating in multiple time zones, the notion of "overnight" may not be so straightforward.

Consider a company with operations in eastern Australia, the UK and the west coast of the USA. Figure 1 shows when data is collected, processed and made available for each time zone.

Figure 1

Day 1 starts first in Australian Eastern Time. Data is generated (via normal operational activity) during the working day and processed overnight, ready for reporting at around 07:00 AET on Day 2.

For the UK and the USA, the same pattern occurs but Day 1 starts later in each case. It is not until around 07:00 of Day 2 in Pacific Time that all the data for Day 1 operations becomes available for reporting. By this time a whole further working day has taken place in eastern Australia. From a global perspective some of the data in the warehouse will be nearly 2 days old...so much for overnight updates!
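The arithmetic behind that "nearly 2 days" figure can be checked with a short script. What follows is only a rough sketch under stated assumptions: the fixed UTC offsets (standard time, daylight saving ignored), the 09:00 start of the working day and the calendar date chosen for Day 1 are all illustrative and are not taken from Figure 1. Only the 07:00 availability time comes from the description above.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed UTC offsets (standard time; daylight saving is ignored).
REGIONS = {
    "Eastern Australia": timezone(timedelta(hours=10)),
    "UK": timezone(timedelta(hours=0)),
    "US West Coast": timezone(timedelta(hours=-8)),
}

DAY_1 = datetime(2010, 3, 1)  # arbitrary illustrative date for "Day 1"
WORK_START_HOUR = 9           # assumption: working day starts at 09:00 local
AVAILABLE_HOUR = 7            # data ready at around 07:00 local on Day 2


def schedule(tz):
    """Return (start of Day 1 work, Day 1 data available), both in UTC."""
    start = DAY_1.replace(hour=WORK_START_HOUR, tzinfo=tz)
    available = (DAY_1 + timedelta(days=1)).replace(hour=AVAILABLE_HOUR, tzinfo=tz)
    return start.astimezone(timezone.utc), available.astimezone(timezone.utc)


schedules = {name: schedule(tz) for name, tz in REGIONS.items()}

# The global Day 1 picture is complete only when the last region has published.
complete_at = max(available for _, available in schedules.values())
# The oldest Day 1 record is the first one generated anywhere in the world.
oldest_record = min(start for start, _ in schedules.values())
max_age_hours = (complete_at - oldest_record).total_seconds() / 3600

for name, (start, available) in schedules.items():
    print(f"{name}: Day 1 starts {start:%d %b %H:%M} UTC, "
          f"available {available:%d %b %H:%M} UTC")
print(f"Oldest Day 1 data is {max_age_hours:.0f} hours old "
      f"when the global picture is complete")
```

Under these assumptions the oldest Day 1 data is 40 hours old by the time the Pacific Time load completes. A production system would normally take its offsets from a time zone database rather than hard-coding them, since daylight saving shifts the gaps between regions during the year.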
