Better Data Management Helps the Toronto Globe and Mail Reach Its Customers
Have you ever received a new subscription offer from a newspaper or a magazine to which you already subscribed? In addition to being an annoyance, sending a superfluous offer to customers increases
marketing costs. So why does this happen? The answer is probably
poor data management. The newspaper most likely was unable to
match its existing subscriber list, which it maintained in one place, with
another file containing its list of marketing prospects.
The Globe and Mail, based in Toronto, Canada, was one of those publications that had these problems. In print for 167 years, it is Canada’s largest
newspaper, with a cumulative six-day readership of nearly 3.3 million. The
paper has a very ambitious marketing program, viewing every Canadian
household that does not already subscribe as a prospect. But it has had trouble
housing and managing the data on these prospects.
Running a major newspaper requires managing huge amounts of data,
including circulation data, advertising revenue data, marketing prospect and
“do not contact” data, and data on logistics and deliveries. Add to that the data
required to run any business, including finance and human resources data.
For many years The Globe and Mail housed much of its data in a mainframe
system where the data were not easy to access and analyze. If users needed
any information, they had to extract the data from the mainframe and bring it
to one of a number of local databases for analysis, including those maintained
in Microsoft Access, Foxbase Pro, and Microsoft Excel. This practice created
numerous pockets of data maintained in isolated databases for specific
purposes but no central repository where the most up-to-date data could be
accessed from a single place. With data scattered in so many different systems
throughout the company, it was very difficult to cross-reference subscribers
with prospects when developing the mailing list for a marketing campaign.
There were also security issues: The Globe and Mail collects and stores customer
payment information, and housing this confidential data in multiple places
makes it more difficult to ensure that proper data security controls are in place.
In 2002, the newspaper began addressing these problems by implementing an
SAP enterprise system with an SAP NetWeaver BW data warehouse that would
contain all of the company’s data from its various data sources in a single
location where the data could be easily accessed and analyzed by business users.
The first data to populate the data warehouse was advertising sales data,
which is a major source of revenue. In 2007, The Globe and Mail added circulation data to the warehouse, including delivery data details such as how much
time is left on a customer’s subscription and data on marketing prospects from
third-party sources. Data on prospects were added to the warehouse as well.
With all these data in a single place, the paper can easily match prospect and
customer data to avoid targeting existing customers with subscription promotions. It can also match the data to “do not contact” and delivery area data to
determine if a newspaper can be delivered or whether a customer should be
targeted with a promotion for a digital subscription.
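The matching step described above can be sketched in a few lines of Python. This is a minimal illustration, not the newspaper's actual process: the addresses, the `normalize` helper, and the `build_mailing_list` function are all hypothetical, and real record matching would need far more careful address handling.

```python
def normalize(address: str) -> str:
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(address.lower().split())


def build_mailing_list(prospects, subscribers, do_not_contact):
    """Keep only prospects who are not subscribers and not on the do-not-contact list."""
    excluded = {normalize(a) for a in subscribers} | {normalize(a) for a in do_not_contact}
    return [p for p in prospects if normalize(p) not in excluded]


# Hypothetical sample data; with everything in one place the check is a simple lookup.
prospects = ["12 King St W", "88 Queen St E", "5 Bloor St"]
subscribers = ["12 king st  w"]   # same household, entered differently
do_not_contact = ["5 Bloor St"]

mailing_list = build_mailing_list(prospects, subscribers, do_not_contact)
print(mailing_list)
```

When the three lists live in separate, isolated systems, even this simple comparison requires someone to extract and reconcile the files first.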
Despite the obvious benefits of the new data warehouse, not all of The Globe
and Mail’s business users immediately came on board. People who were used to
extracting data from the mainframe system and manipulating it in their own
local databases or files continued to do the same thing after the data warehouse
went live. They did not understand the concept of a data warehouse or the need
to work towards enterprise-wide data management. The Globe and Mail’s management decided to tackle this new problem by educating its users, especially its
marketing professionals, about the value of having all the organization’s data in a
data warehouse and the tools available for accessing and analyzing these data.
The Globe and Mail’s new data analysis capabilities produced savings from
efficiencies and streamlined processes that paid for the investment in one year.
Marketing campaigns that previously took two weeks to complete now only
take one day. The newspaper can determine its saturation rates in a given area
to guide its marketing plans. And there are fewer complaints from subscribers
and potential subscribers about being contacted unnecessarily.
To capitalize further on data management and analytics, The Globe and Mail
turned to the cloud. A key business goal for the company was to beef up online
content and increase the paper’s digital subscriber base. The Globe and Mail
devoted more resources to digital online content, with different subscription rates
for online-only customers and print customers. To aggressively court digital subscribers, The Globe and Mail had to mine its clickstream data logging user actions
on the Web to target potential digital subscribers based not only on their specific
interests but also their interests on a particular day. The volume of data was too
large to be handled by the company’s conventional Oracle database. The solution
was to use SAP HANA ONE in-memory computing software running on the
Amazon Web Services cloud computing platform, which accelerates data analysis
and processing by storing data in the computer’s main memory (RAM) rather
than on external storage devices. This cloud solution lets The Globe and Mail pay
for only what capabilities it uses on an hourly basis.
Sources: www.theglobeandmail.com, accessed March 1, 2014; “The Globe and Mail Uses SAP
HANA in the Cloud to Grow Its Digital Audience,” SAP Insider Profiles, April 1, 2013; and
David Hannon, “Spread the News,” SAP Insider Profiles, October-December 2012.
216 part two Information Technology Infrastructure
The experience of The Globe and Mail illustrates the importance of data management. Business performance depends on what a firm can or
cannot do with its data. The Globe and Mail was a large and thriving business,
but both operational efficiency and management decision making were
hampered by fragmented data stored in multiple systems that were difficult to
access. How businesses store, organize, and manage their data has an enormous
impact on organizational effectiveness.
The chapter-opening diagram calls attention to important points raised by
this case and this chapter. The Globe and Mail’s business users were maintaining their own local databases because the company’s data were so difficult to
access in the newspaper’s traditional mainframe system. Marketing campaigns
took much longer than necessary because the required data took so long to
assemble. The solution was to consolidate organizational data in an enterprise-wide data warehouse that provided a single source of data for reporting and
analysis. The newspaper had to reorganize its data into a standard company-wide format, establish rules, responsibilities, and procedures for accessing and
using the data, provide tools for making the data accessible to users for querying and reporting, and educate its users about the benefits of the warehouse.
The data warehouse boosted efficiency by making the Globe’s data easier to
locate and assemble for reporting. The data warehouse integrated company
data from all of its disparate sources into a single comprehensive database that
could be queried directly. The data were reconciled to prevent errors such as
contacting existing subscribers with subscription offers. The solution improved
customer service while reducing costs. The Globe and Mail increased its ability
to quickly analyze vast quantities of data by using SAP HANA running on
Amazon’s cloud service.
Here are some questions to think about: What was the business impact of
The Globe and Mail’s data management problems? What work had to be done
by both business and technical staff to make sure that the data warehouse
produced the results envisioned by management?
Chapter 6 Foundations of Business Intelligence: Databases and Information Management 217
6.1 What Are the Problems of Managing Data Resources in a Traditional File Environment?
An effective information system provides users with accurate, timely, and relevant information. Accurate information is free of errors. Information is timely when it is available to decision makers when it
is needed. Information is relevant when it is useful and appropriate
for the types of work and decisions that require it.
You might be surprised to learn that many businesses don’t have timely,
accurate, or relevant information because the data in their information systems
have been poorly organized and maintained. That’s why data management is
so essential. To understand the problem, let’s look at how information systems
arrange data in computer files and traditional methods of file management.
File Organization Terms and Concepts
A computer system organizes data in a hierarchy that starts with bits and
bytes and progresses to fields, records, files, and databases (see Figure 6.1).
A bit represents the smallest unit of data a computer can handle. A group
of bits, called a byte, represents a single character, which can be a letter, a
number, or another symbol. A grouping of characters into a word, a group
of words, or a complete number (such as a person’s name or age) is called a
field. A group of related fields, such as the student’s name, the course taken,
the date, and the grade, comprises a record; a group of records of the same
type is called a file.
For example, the records in Figure 6.1 could constitute a student course file.
A group of related files makes up a database. The student course file illustrated
in Figure 6.1 could be grouped with files on students’ personal histories and
financial backgrounds to create a student database.
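The hierarchy can be made concrete with a short Python sketch. The sample values and the names `CourseRecord`, `course_file`, and `database` are hypothetical, chosen only to mirror the terms above, and are not taken from Figure 6.1.

```python
from dataclasses import dataclass

# Bit and byte: one character is stored as one byte, which is eight bits.
letter = "A"
byte_value = letter.encode("ascii")[0]   # the byte as an integer
bits = format(byte_value, "08b")         # the same byte written as eight bits
print(letter, byte_value, bits)


# Record: a group of related fields describing one course enrollment.
@dataclass
class CourseRecord:
    student_id: int   # field
    course: str       # field
    date: str         # field
    grade: str        # field


# File: a group of records of the same type.
course_file = [
    CourseRecord(1001, "IS 101", "F16", "B+"),
    CourseRecord(1002, "IS 101", "F16", "A"),
]

# Database: a group of related files (the other two are left empty here).
database = {
    "course": course_file,
    "personal_history": [],
    "financial": [],
}
print(len(database["course"]), "records in the course file")
```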
A record describes an entity. An entity is a person, place, thing, or event on
which we store and maintain information. Each characteristic or quality describing a particular entity is called an attribute. For example, Student_ID, Course,
Date, and Grade are attributes of the entity COURSE. The specific values that
these attributes can have are found in the fields of the record describing the
entity COURSE.
Problems with the Traditional File Environment
In most organizations, systems tended to grow independently without a
company-wide plan. Accounting, finance, manufacturing, human resources,
and sales and marketing all developed their own systems and data files.
Figure 6.2 illustrates the traditional approach to information processing.
Each application, of course, required its own files and its own computer
program to operate. For example, the human resources functional area might
have a personnel master file, a payroll file, a medical insurance file, a pension
file, a mailing list file, and so forth until tens, perhaps hundreds, of files and
programs existed. In the company as a whole, this process led to multiple
master files created, maintained, and operated by separate divisions or departments. As this process goes on for 5 or 10 years, the organization is saddled
with hundreds of programs and applications that are very difficult to maintain
and manage. The resulting problems are data redundancy and inconsistency,
program-data dependence, inflexibility, poor data security, and an inability to
share data among applications.
Data Redundancy and Inconsistency
Data redundancy is the presence of duplicate data in multiple data files so
that the same data are stored in more than one place or location. Data redundancy occurs when different groups in an organization independently collect
the same piece of data and store it independently of each other. Data redundancy wastes storage resources and also leads to data inconsistency, where
the same attribute may have different values. For example, in instances of
the entity COURSE illustrated in Figure 6.1, the Date may be updated in
some systems but not in others. The same attribute, Student_ID, may also
have different names in different systems throughout the organization. Some
systems might use Student_ID and others might use ID, for example.
Additional confusion might result from using different coding systems
to represent values for an attribute. For instance, the sales, inventory, and
manufacturing systems of a clothing retailer might use different codes to
represent clothing size. One system might represent clothing size as “extra
large,” whereas another might use the code “XL” for the same purpose. The
resulting confusion would make it difficult for companies to create customer
relationship management, supply chain management, or enterprise systems
that integrate data from different sources.
Figure 6.1 The Data Hierarchy
A computer system organizes data in a hierarchy that starts with the bit, which represents either a 0
or a 1. Bits can be grouped to form a byte to represent one character, number, or symbol. Bytes can be
grouped to form a field, and related fields can be grouped to form a record. Related records can be
collected to form a file, and related files can be organized into a database.
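A small Python sketch can illustrate both kinds of inconsistency. It assumes two hypothetical departmental files (`registrar_file` and `billing_file`) that name the student identifier differently and disagree on a date, plus a hypothetical `SIZE_CODES` table for the clothing-size example. None of these names come from the text.

```python
# Two departments keep their own copy of the same enrollment (hypothetical data).
registrar_file = [{"Student_ID": 1001, "Course": "IS 101", "Date": "F16"}]
billing_file = [{"ID": 1001, "Course": "IS 101", "Date": "S17"}]  # different field name, stale date


def find_date_conflicts(registrar, billing):
    """Report enrollments whose Date differs between the two files."""
    billing_dates = {(r["ID"], r["Course"]): r["Date"] for r in billing}
    conflicts = []
    for rec in registrar:
        key = (rec["Student_ID"], rec["Course"])
        other = billing_dates.get(key)
        if other is not None and other != rec["Date"]:
            conflicts.append((key, rec["Date"], other))
    return conflicts


# Different coding systems for one attribute: map every variant to one code.
SIZE_CODES = {"extra large": "XL", "xl": "XL"}


def canonical_size(value: str) -> str:
    """Translate a size label to a single code; unknown labels pass through unchanged."""
    return SIZE_CODES.get(value.strip().lower(), value)


conflicts = find_date_conflicts(registrar_file, billing_file)
print(conflicts)
print(canonical_size("Extra Large"), canonical_size("XL"))
```

Every pair of systems that must be integrated needs this kind of reconciliation logic, which is part of why redundant files are costly.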
Program-Data Dependence
Program-data dependence refers to the coupling of data stored in files and the
specific programs required to update and maintain those files such that changes
in programs require changes to the data. Every traditional computer program
has to describe the location and nature of the data with which it works. In a
traditional file environment, any change in a software program could require a
change in the data accessed by that program. One program might be modified
from a five-digit to a nine-digit zip code. If the original data file were changed
from five-digit to nine-digit zip codes, then other programs that required the
five-digit zip code would no longer work properly. Such changes could cost
millions of dollars to implement properly.
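The zip code scenario can be sketched in Python under one assumption: the data file uses a fixed-width layout that each program hard-codes. The layout (10-character name, zip code, 6-digit balance) and the function names are hypothetical, chosen only to show how a format change silently breaks a program that was not updated.

```python
def make_record(name: str, zip_code: str, balance: int) -> str:
    """Write one fixed-width record: name padded to 10 characters, zip code, 6-digit balance."""
    return name.ljust(10) + zip_code + f"{balance:06d}"


def read_balance(line: str) -> int:
    """A program that assumes a five-digit zip code, so the balance sits in columns 15-20."""
    return int(line[15:21])


old_record = make_record("Ann Lee", "10001", 250)       # five-digit zip code
new_record = make_record("Ann Lee", "100012345", 250)   # file changed to nine-digit zip codes

old_balance = read_balance(old_record)   # reads the balance correctly
new_balance = read_balance(new_record)   # reads part of the zip code as if it were the balance
print(old_balance, new_balance)
```

The unchanged program does not crash. It returns a wrong number, which is why every program that touches the file has to be located and revised when the file format changes.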
Lack of Flexibility
A traditional file system can deliver routine scheduled reports after extensive
programming efforts, but it cannot deliver ad hoc reports or respond to unanticipated information requirements in a timely fashion. The information required
by ad hoc requests is somewhere in the system but may be too expensive to
retrieve. Several programmers might have to work for weeks to put together the
required data items in a new file.
Figure 6.2 Traditional File Processing
The use of a traditional approach to file processing encourages each functional area in a corporation
to develop specialized applications. Each application requires a unique data file that is likely to be a
subset of the master file. These subsets of the master file lead to data redundancy and inconsistency,
processing inflexibility, and wasted storage resources.
Poor Security
Because there is little control or management of data, access to and dissemination of information may be out of control. Management may have no way of
knowing who is accessing or even making changes to the organization’s data.
Lack of Data Sharing and Availability
Because pieces of information in different files and different parts of the
organization cannot be related to one another, it is virtually impossible for
information to be shared or accessed in a timely manner. Information cannot
flow freely across different functional areas or different parts of the organization. If users find different values of the same piece of information in two
different systems, they may not want to use these systems because they cannot
trust the accuracy of their data.
6.2 What Are the Major Capabilities of Database Management Systems (DBMS) and Why Is a Relational DBMS So Powerful?
Database technology cuts through many of the problems of traditional file
organization. A more rigorous definition of a database is a collection of data
organized to serve many applications efficiently by centralizing the data and
controlling redundant data. Rather than storing data in separate files for each
application, data appear to users as being stored in only one location. A single
database services multiple applications. For example, instead of a corporation
storing employee data in separate information systems and separate files for
personnel, payroll, and benefits, the corporation could create a single common
human resources database.
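As a rough sketch of the single human resources database idea, the following Python example uses the standard library `sqlite3` module with an in-memory database. The `employee` table, its columns, and the two "application" functions are hypothetical and only illustrate how several applications can draw on one stored copy of the data.

```python
import sqlite3

# One shared database instead of separate personnel, payroll, and benefits files.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "employee_id INTEGER PRIMARY KEY, name TEXT, gross_pay REAL, benefits_plan TEXT)"
)
conn.execute("INSERT INTO employee VALUES (1, 'Ann Lee', 5200.0, 'Plan A')")


def payroll_view(db):
    """Payroll application: needs only names and gross pay."""
    return db.execute("SELECT name, gross_pay FROM employee").fetchall()


def benefits_view(db):
    """Benefits application: needs only names and benefits plans."""
    return db.execute("SELECT name, benefits_plan FROM employee").fetchall()


# A name change is recorded once and is immediately visible to both applications.
conn.execute("UPDATE employee SET name = 'Ann Lee-Smith' WHERE employee_id = 1")

payroll_rows = payroll_view(conn)
benefits_rows = benefits_view(conn)
print(payroll_rows, benefits_rows)
```

Because the name is stored once, the two applications cannot disagree about it, which is the redundancy control the definition above refers to.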
Database Management Systems
A database management system (DBMS) is software that permits an
organization to centralize data, manage them efficiently, and provide access
to the stored data by application programs. The DBMS acts as an interface
between application programs and the physical data files. When the application program calls for a data item, such as gross pay, the DBMS finds this item
in the database and presents it to the application program. Using traditional
data files, the programmer would have to specify the size and format of each
data element used in the program and then tell the computer where they