Data Management, Analytics, and Business Intelligence (Chapter 3) — one page

 
 


Consider your organization or another organization with which you are familiar. Briefly describe the organization, and then answer the following questions:

  • What do you think is one of the most interesting uses of Business Intelligence (BI) technology by this organization? Why?
  • Who benefits from this use and in what ways?
  • Are there any downsides?
  • Are there any additional BI applications or uses that you think this organization should consider? Explain.
  • As a business manager, how can you effectively utilize BI to better do your job?

Embed course material concepts, principles, and theories, which require supporting citations, along with at least two scholarly peer-reviewed references supporting your answer. Keep in mind that these scholarly references can be found in the Saudi Digital Library by conducting an advanced search specific to scholarly references.

Use standards and APA style guidelines.

You are required to reply to at least two peer discussion question post answers to this weekly discussion question and/or your instructor's response to your posting. These post replies need to be substantial and constructive in nature. They should add to the content of the post and evaluate/analyze that post answer. Normal course dialogue does not fulfill these two peer replies but is expected throughout the course. Answering all course questions is also required.


one page 

 

Required

  • Chapter 3 in Information Technology for Management: On-Demand Strategies for Performance, Growth, and Sustainability
  • Eggert, M., & Alberts, J. (2020). Frontiers of business intelligence and analytics 3.0: a taxonomy-based literature review and research agenda. Business Research, 13(2), 685–739. https://doi.org/10.1007/s40685-020-00108-y
  • Al-Barnawi, A., He, Y., Maglaras, L. A., & Janicke, H. (2019). Electronic medical records and risk management in hospitals of Saudi Arabia. Informatics for Health & Social Care, 44(2), 189–203. https://doi.org/10.1080/17538157.2018.1434181

IT for Management: On-Demand Strategies for Performance, Growth, and Sustainability

Eleventh Edition

Turban, Pollard, Wood

Chapter 3

Data Management, Business Intelligence, and Data Analytics

Learning Objectives (1 of 5)
2
Copyright ©2018 John Wiley & Sons, Inc.

Data Management
Data management is the practice of securely, efficiently, and cost-effectively:
Collecting
Keeping
Using data
The goal of data management is
To help people, organizations, and connected things
Optimize the use of data
Within the bounds of policy and regulation
So that they can make decisions and take actions that maximize the benefit to the organization
3
Copyright ©2018 John Wiley & Sons, Inc.

Data Management
The work of data management has a wide scope, covering factors such as how to:
Create, access, and update data across a diverse data tier
Store data across multiple clouds and on premises
Provide high availability and disaster recovery
Use data in a growing variety of apps, analytics, and algorithms
Ensure data privacy and security
Archive and destroy data in accordance with retention schedules and compliance requirements
4
Copyright ©2018 John Wiley & Sons, Inc.

Data Management
A formal data management strategy addresses
The activity of users and administrators
The capabilities of data management technologies
The demands of regulatory requirements
The needs of the organization to obtain value from its data
5
Copyright ©2018 John Wiley & Sons, Inc.

Data Management
Today data is a kind of capital.
It is an economic factor of production in digital goods and services
Just as an automaker can’t manufacture a new model if it lacks the necessary financial capital
It can’t make its cars autonomous if it lacks the data to feed the onboard algorithms
This new role for data has implications for competitive strategy as well as for the future of computing
Strong management practices and a robust management system are essential for every organization
Regardless of size or type
6
Copyright ©2018 John Wiley & Sons, Inc.

Data Management
What is Data?
The quantities, characters, or symbols on which operations are performed by a computer
Which may be stored and transmitted in the form of electrical signals and recorded on magnetic, optical, or mechanical recording media
What is Big Data?
Big Data is a collection of data that is huge in volume
And growing exponentially with time
It is data of such large size and complexity that no traditional data management tool can store or process it efficiently
13
Copyright ©2018 John Wiley & Sons, Inc.

Data Management
14
Copyright ©2018 John Wiley & Sons, Inc.
Video: Big Data in 5 Minutes | What Is Big Data? | Introduction to Big Data | Big Data Explained

Data Management
Byte (8 bits)
0.1 bytes: A binary decision
1 byte: A single character
10 bytes: A single word
100 bytes: A telegram OR A punched card
15
Copyright ©2018 John Wiley & Sons, Inc.

Data Management
Kilobyte (1000 Bytes)
1 Kilobyte: A very short story
2 Kilobytes: A Typewritten page
10 Kilobytes: An encyclopedic page OR A deck of punched cards
50 Kilobytes: A compressed document image page
100 Kilobytes: A low-resolution photograph
200 Kilobytes: A box of punched cards
500 Kilobytes: A very heavy box of punched cards
16
Copyright ©2018 John Wiley & Sons, Inc.

Data Management
Megabyte (1 000 000 Bytes)
1 Megabyte: A small novel OR A 3.5 inch floppy disk
2 Megabytes: A high resolution photograph
5 Megabytes: The complete works of Shakespeare OR 30 seconds of TV-quality video
10 Megabytes: A minute of high-fidelity sound OR A digital chest X-ray
20 Megabytes: A box of floppy disks
50 Megabytes: A digital mammogram
100 Megabytes: 1 meter of shelved books OR A two-volume encyclopedic book
200 Megabytes: A reel of 9-track tape OR An IBM 3480 cartridge tape
500 Megabytes: A CD-ROM OR The hard disk of a PC
17
Copyright ©2018 John Wiley & Sons, Inc.

Data Management
Gigabyte (1 000 000 000 Bytes)
1 Gigabyte: A pickup truck filled with paper OR A symphony in high-fidelity sound OR A movie at TV quality
2 Gigabytes: 20 meters of shelved books OR A stack of 9-track tapes
5 Gigabytes: An 8mm Exabyte tape
10 Gigabytes:
20 Gigabytes: A good collection of the works of Beethoven OR 5 Exabyte tapes OR A VHS tape used for digital data
50 Gigabytes: A floor of books OR Hundreds of 9-track tapes
100 Gigabytes: A floor of academic journals OR A large ID-1 digital tape
200 Gigabytes: 50 Exabyte tapes
18
Copyright ©2018 John Wiley & Sons, Inc.

Data Management
Terabyte (1 000 000 000 000 Bytes)
1 Terabyte: An automated tape robot OR All the X-ray films in a large technological hospital OR 50000 trees made into paper and printed OR Daily rate of EOS data (1998)
2 Terabytes: An academic research library OR A cabinet full of Exabyte tapes
10 Terabytes: The printed collection of the US Library of Congress
50 Terabytes: The contents of a large Mass Storage System
19
Copyright ©2018 John Wiley & Sons, Inc.

Data Management
Petabyte (1 000 000 000 000 000 Bytes)
1 Petabyte: 5 years of EOS data (at 46 mbps)
2 Petabytes: All US academic research libraries
20 Petabytes: Production of hard-disk drives in 1995
200 Petabytes: All printed material OR
Production of digital magnetic tape in 1995
20
Copyright ©2018 John Wiley & Sons, Inc.

Data Management
Exabyte (1 000 000 000 000 000 000 Bytes)
5 Exabytes: All words ever spoken by human beings
Zettabyte (1 000 000 000 000 000 000 000 Bytes)
Yottabyte (1 000 000 000 000 000 000 000 000 Bytes)
Xenottabyte (1 000 000 000 000 000 000 000 000 000 Bytes)
Shilentnobyte (1 000 000 000 000 000 000 000 000 000 000 Bytes)
Domegemegrottebyte (1 000 000 000 000 000 000 000 000 000 000 000 Bytes)
(The last three are informal names sometimes used online, not official SI prefixes.)
21
Copyright ©2018 John Wiley & Sons, Inc.

Data Management
Examples Of Big Data
Following are some of the Big Data examples-
The New York Stock Exchange generates about one terabyte of new trade data per day.
22
Copyright ©2018 John Wiley & Sons, Inc.

Data Management
Social Media
Statistics show that more than 500 terabytes of new data are ingested into the databases of the social media site Facebook every day
This data is mainly generated through photo and video uploads, message exchanges, comments, etc.
23
Copyright ©2018 John Wiley & Sons, Inc.

Data Management
A single jet engine can generate more than 10 terabytes of data in 30 minutes of flight time. With many thousands of flights per day, data generation reaches many petabytes.
24
Copyright ©2018 John Wiley & Sons, Inc.

Data Management
Data Growth over the years
25
Copyright ©2018 John Wiley & Sons, Inc.

Data Management
There are three types of data
Structured
Any data that can be stored, accessed, and processed in a fixed format is termed 'structured' data
Computer science has achieved great success in developing techniques for working with this kind of data and deriving value from it
However, there are problems when the size of such data grows to a huge extent
Typical sizes are in the range of multiple zettabytes
Data stored in a relational database management system is one example of structured data
26
Copyright ©2018 John Wiley & Sons, Inc.

Data Management
Unstructured
Any data whose form or structure is unknown is classified as unstructured data
In addition to being huge in size
Unstructured data poses multiple challenges when it comes to processing it to derive value
A typical example of unstructured data is a heterogeneous data source containing a combination of simple text files, images, videos, etc.
Organizations have a wealth of data available
But they do not know how to derive value from it
Because this data is in its raw, unstructured form
27
Copyright ©2018 John Wiley & Sons, Inc.

28
Copyright ©2018 John Wiley & Sons, Inc.
Examples Of Un-structured Data
Data Management

Data Management
Semi-structured
Semi-structured data can contain both forms of data. It appears structured in form, but it is not actually defined by, for example, a table definition in a relational DBMS
A typical example of semi-structured data is data represented in an XML file.
Examples Of Semi-structured Data
29
Copyright ©2018 John Wiley & Sons, Inc.
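Not part of the original slides: a minimal Python sketch, using a made-up XML fragment, that illustrates how semi-structured data carries structure in tags rather than in a fixed relational table definition.

import xml.etree.ElementTree as ET

# Hypothetical XML fragment: tagged, but not bound to a table definition
xml_data = """
<customers>
  <customer id="1"><name>Ahmed</name><city>Riyadh</city></customer>
  <customer id="2"><name>Sara</name></customer>
</customers>
"""

root = ET.fromstring(xml_data)
for cust in root.findall("customer"):
    name = cust.findtext("name")
    city = cust.findtext("city", default="unknown")  # a missing element is tolerated
    print(cust.get("id"), name, city)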

Database Technologies: Databases
Collections of data sets or records stored in a systematic way
Store data generated by business apps, sensors, operations, and transaction-processing systems (TPS)
The data in databases are extremely volatile
Medium and large enterprises typically have many databases of various types

30
Copyright ©2018 John Wiley & Sons, Inc.

Database Technologies: Databases
Data in databases are volatile because they can be updated millions of times every second
Especially if they are transaction processing systems (TPS)
The data is constantly changing over time
Transactions, updates, deletions, and database maintenance all contribute to data volatility
31
Copyright ©2018 John Wiley & Sons, Inc.

Database Technologies: Databases
32
Copyright ©2018 John Wiley & Sons, Inc.
Lessons Basic Data Models of Database

Database Technologies: Data Warehouses
Integrate data from multiple databases and data silos,
Organize them for complex analysis, knowledge discovery, and to support decision making
May require formatting, processing, and/or standardization
Loaded at specific times, making them non-volatile and ready for analysis

33
Copyright ©2018 John Wiley & Sons, Inc.

Database Technologies: Data Marts
Small-scale data warehouses that support a single function or one department
Enterprises that cannot afford to invest in data warehousing may start with one or more data marts

34
Copyright ©2018 John Wiley & Sons, Inc.

Database Technologies: BI
Business Intelligence (BI)
Tools and techniques that process data and conduct statistical analysis for insight and discovery
Used to discover meaningful relationships in the data, stay informed in real time, detect trends, and identify opportunities and risks
35
Copyright ©2018 John Wiley & Sons, Inc.

Database Technologies: BI
36
Copyright ©2018 John Wiley & Sons, Inc.
What is Business Intelligence

Database Technologies: BI
Business Intelligence (BI)
Business intelligence is the process by which enterprises use strategies and technologies for analyzing current and historical data
With the objective of improving strategic decision-making and providing a competitive advantage
37
Copyright ©2018 John Wiley & Sons, Inc.

Database Technologies: BI
Business Intelligence (BI)
Business intelligence systems combine
Data gathering
Data storage
Knowledge management
Data analysis
To evaluate and transform complex data into meaningful, actionable information
Which can be used to support more effective strategic, tactical, and operational insights and decision-making
38
Copyright ©2018 John Wiley & Sons, Inc.

Database Technologies: BI
Business Intelligence (BI)
Business intelligence environments consist of a variety of
Technologies
Applications
Processes
Strategies
Products
Technical architectures
To enable the collection, analysis, presentation, and dissemination of internal and external business information
39
Copyright ©2018 John Wiley & Sons, Inc.

Database Technologies: BI
Business intelligence technologies use
Advanced statistics and predictive analytics
To help businesses draw conclusions from data analysis
Discover patterns
Forecast future events in business operations
Business intelligence reporting is not a linear practice
It is a continuous, multifaceted cycle of data access, exploration, and information sharing
Common business intelligence functions include:
40
Copyright ©2018 John Wiley & Sons, Inc.

Database Technologies: BI
Data mining:
Sorting through large datasets using databases, statistics, and machine learning to identify trends and establish relationships
Querying:
A request for specific data or information from a database
Data preparation:
The process of combining and structuring data in order to prepare it for analysis
Reporting:
Sharing operating and financial data analysis with decision-makers so they can draw conclusions and make decisions
41
Copyright ©2018 John Wiley & Sons, Inc.

Database Technologies: BI
Benchmarking:
Comparing current business processes and performance metrics to historical data to track performance against industry bests
Descriptive analytics:
The interpretation of historical data to draw comparisons and better understand changes that have occurred in a business
Statistical analysis:
Collecting the results from descriptive analytics and applying statistics in order to identify trends
Data visualization:
Provides visual representations such as charts and graphs for easy data analysis
42
Copyright ©2018 John Wiley & Sons, Inc.
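As an illustration of descriptive analytics and benchmarking from the functions listed above (a sketch with made-up sales figures, not an example from the textbook), a few lines of Python with pandas can summarize historical data and compare each group against an overall benchmark.

import pandas as pd

# Hypothetical monthly sales records (illustrative data only)
sales = pd.DataFrame({
    "region":  ["East", "East", "West", "West"],
    "month":   ["Jan", "Feb", "Jan", "Feb"],
    "revenue": [120000, 135000, 90000, 88000],
})

# Descriptive analytics: summarize historical performance by region
summary = sales.groupby("region")["revenue"].agg(["sum", "mean"])

# Benchmarking: compare each region's mean revenue to the overall mean
summary["vs_overall_mean"] = summary["mean"] - sales["revenue"].mean()
print(summary)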

Database Management Systems (DBMS)
43
Copyright ©2018 John Wiley & Sons, Inc.
Lessons Basic Data Models of Database

Database Management Systems (DBMS)
Integrate with data collection systems such as TPS and business applications
Transaction Processing Systems (TPS) process the company’s business transactions and thus support the operations of an enterprise
A TPS records a non-inquiry transaction itself, as well as all of its effects, in the database and produces documents relating to the transaction
Organized way to store, access, and manage data
44
Copyright ©2018 John Wiley & Sons, Inc.

Database Management Systems (DBMS)
Stores data in tables consisting of columns and rows, similar to the format of a spreadsheet
Standard database model adopted by most enterprises
DBMS Functions provide an accurate and consistent view of data throughout the enterprise
It enables the organization to make informed, actionable decisions that support the business strategy
45
Copyright ©2018 John Wiley & Sons, Inc.

Database Management Systems (DBMS)
Functions performed by a DBMS to help create such a view are:
Data filtering and profiling:
Process and store data efficiently
Inspect the data for errors, inconsistencies, redundancies, and incomplete information
Data integrity and maintenance:
Correct, standardize, and verify the consistency and integrity of the data.
Data synchronization:
Integrate, match, or link data from disparate sources
46
Copyright ©2018 John Wiley & Sons, Inc.

Database Management Systems (DBMS)
Functions performed by a DBMS to help create such a view are:
Data security:
Check and control data integrity over time.
Data access:
Provide authorized access to data in both planned and ad hoc ways within acceptable time
Today’s computing hardware is capable of crunching through huge datasets that were impossible to manage a few years back
Making them available on-demand via wired or wireless networks
47
Copyright ©2018 John Wiley & Sons, Inc.

Database Technologies: SQL
Relational Database Management Systems (DBMS)
Provides access to data using a declarative language
Declarative language
Simplifies data access by requiring that users specify only what data they want to access, without defining how it will be retrieved
Structured Query Language (SQL) is an example of declarative language:
SELECT column_name(s)
FROM table_name
WHERE condition
48
Copyright ©2018 John Wiley & Sons, Inc.
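To make the declarative idea concrete, here is a minimal sketch (assumed table and column names, not from the slides) that runs the same kind of SELECT through Python's built-in sqlite3 module:

import sqlite3

# In-memory database with a hypothetical "customers" table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(1, "Ahmed", "Riyadh"), (2, "Sara", "Jeddah")])

# Declarative access: the query states WHAT is wanted, not HOW to retrieve it
for row in conn.execute("SELECT name, city FROM customers WHERE city = 'Riyadh'"):
    print(row)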

Relational Database
A relational database is a type of database that stores and provides access to data points that are related to one another
Relational databases are based on the relational model
An intuitive, straightforward way of representing data in tables
In a relational database, each row in the table is a record with a unique ID called the key
The columns of the table hold attributes of the data
Each record usually has a value for each attribute
Making it easy to establish the relationships among data points
49
Copyright ©2018 John Wiley & Sons, Inc.

Relational Database
50
Copyright ©2018 John Wiley & Sons, Inc.
Relational Database Concepts

Relational Database
51
Copyright ©2018 John Wiley & Sons, Inc.

OLTP and OLAP Systems
Online Transaction Processing and Online Analytics Processing
Online Transaction Processing (OLTP)
Designed to manage transaction data, which are volatile
Break down complex information into simpler data tables
Strike a balance between transaction-processing efficiency and query efficiency
Cannot be optimized for data mining
52
Copyright ©2018 John Wiley & Sons, Inc.

OLTP and OLAP Systems
When most business transactions occur
For example when an item is sold or returned
An order is sent or cancelled
A payment or deposit is made
Changes are made immediately to the database
These online changes are additions, updates, or deletions
The database management systems (DBMSs) record and process such transactions in the database
Support queries and reporting
53
Copyright ©2018 John Wiley & Sons, Inc.

OLTP and OLAP Systems
DBMSs are referred to as online transaction-processing (OLTP) systems
OLTP is a database design that breaks down complex information into simpler data tables
It strikes a balance between transaction-processing efficiency and query efficiency
OLTP databases process millions of transactions per second
54
Copyright ©2018 John Wiley & Sons, Inc.

OLTP and OLAP Systems
Online Transaction Processing and Online Analytics Processing
Online Analytics Processing (OLAP)
A means of organizing large business databases
Divided into one or more cubes that fit the way business is conducted
Databases cannot be optimized for
Data mining
Complex online analytics-processing (OLAP) systems
Decision support
These limitations led to the introduction of data warehouse technology
55
Copyright ©2018 John Wiley & Sons, Inc.

OLTP and OLAP Systems
Data warehouses and data marts are optimized for
OLAP
Data mining
BI
Decision support
OLAP is a term used to describe the analysis of complex data from the data warehouse
Databases are optimized for extremely fast transaction processing and query processing
Data warehouses are optimized for analysis
56
Copyright ©2018 John Wiley & Sons, Inc.
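A small sketch (made-up sales table, Python with sqlite3) contrasts the two workloads: OLTP records individual transactions as they happen, while OLAP-style queries aggregate the accumulated history for analysis. This is an illustration only, not a production design.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, store TEXT, amount REAL)")

# OLTP-style work: record each transaction immediately as it occurs
conn.execute("INSERT INTO sales (store, amount) VALUES (?, ?)", ("Store A", 19.99))
conn.execute("INSERT INTO sales (store, amount) VALUES (?, ?)", ("Store B", 5.50))
conn.execute("INSERT INTO sales (store, amount) VALUES (?, ?)", ("Store A", 7.25))

# OLAP-style work: analytical aggregation over the accumulated history
for row in conn.execute("SELECT store, COUNT(*), SUM(amount) FROM sales GROUP BY store"):
    print(row)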

OLTP and OLAP Systems
57
Copyright ©2018 John Wiley & Sons, Inc.
SQL Tutorial: OLTP and OLAP

Database Technologies: NOSQL
Trend toward NoSQL Systems
Higher performance
Easy distribution of data on different nodes
Enables scalability and fault tolerance
Greater flexibility
Simpler administration
58
Copyright ©2018 John Wiley & Sons, Inc.

Popular DBMS
DBMSs (mid-2016)
Oracle Database 12c
Microsoft SQL Server
IBM DB2
SAP Sybase ASE
PostgreSQL
59
Copyright ©2018 John Wiley & Sons, Inc.

Learning Objectives (2 of 5)
60
Copyright ©2018 John Wiley & Sons, Inc.

Centralized and Distributed Database Architecture
Centralized Database Architecture
Better control of data quality
Better IT security
Distributed Database Architecture
Allow both local and remote access
Use client/server architecture to process requests

61
Copyright ©2018 John Wiley & Sons, Inc.

Centralized and Distributed Database Architecture
A centralized database stores all data in a single central computer such as a mainframe or server
A distributed database stores portions of the database on multiple computers within a network.
For decades the main database platform consisted of centralized database files on massive mainframe computers
62
Copyright ©2018 John Wiley & Sons, Inc.

Database Management Systems (DBMS)
63
Copyright ©2018 John Wiley & Sons, Inc.

Centralized and Distributed Database Architecture
Benefits of centralized database configurations include:
Better control of data quality
Data consistency is easier when data are kept in one physical location
Data additions, updates, and deletions can be made in a supervised and orderly fashion
Better IT security
Data are accessed via the centralized host computer
They can be protected more easily from unauthorized access or modification
64
Copyright ©2018 John Wiley & Sons, Inc.

Centralized and Distributed Database Architecture
Disadvantage of centralized databases
Like all centralized systems
Transmission delay occurs when users are geographically dispersed
More powerful hardware and networks can compensate for this disadvantage
65
Copyright ©2018 John Wiley & Sons, Inc.

Centralized and Distributed Database Architecture
Distributed databases use client/server architecture to process information requests
The databases are stored on servers that reside
In the company’s data centers
A private cloud
A public cloud
66
Copyright ©2018 John Wiley & Sons, Inc.

Database Management Systems (DBMS)
Advantages of a distributed database include
Reliability
If one site crashes, the system keeps running
Speed
It is faster to search part of a database than the whole
Disadvantages of a distributed database include
If there’s a problem with the network that the distributed database is using
It can cause availability issues
Appropriate hardware and software can be expensive to purchase
67
Copyright ©2018 John Wiley & Sons, Inc.

Dirty Data
Garbage In, Garbage Out
Dirty Data
Lacks integrity/validation and reduces user trust
Incomplete, out of context, outdated, inaccurate, inaccessible, or overwhelming

68
Copyright ©2018 John Wiley & Sons, Inc.

Characteristics of Poor Quality or Dirty Data
Characteristic: Description
Incomplete: Missing data
Outdated or invalid: Too old to be valid or useful
Incorrect: Too many errors
Duplicated or in conflict: Too many copies or versions of the same data, and the versions are inconsistent or in conflict with each other
Non-standardized: Data are stored in incompatible formats and cannot be compared or summarized
Unusable: Data are not in context to be understood or interpreted correctly at the time of access

69
Copyright ©2018 John Wiley & Sons, Inc.
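The characteristics above can be detected and partly corrected programmatically. The following sketch (made-up records, Python with pandas) profiles a small dirty dataset for missing values, removes duplicates, and standardizes a non-standardized date column; it is illustrative, not a complete data quality process.

import pandas as pd

# Hypothetical customer records showing several dirty data characteristics
raw = pd.DataFrame({
    "customer": ["Acme", "Acme", "Globex", None],                          # duplicate and missing
    "signup":   ["2021-01-05", "2021-01-05", "05/03/2021", "2020-12-31"],  # mixed formats
    "revenue":  [1000, 1000, None, 250],                                   # incomplete values
})

profile = raw.isna().sum()                      # profiling: count missing values per column
clean = raw.drop_duplicates()                   # remove duplicated records
clean = clean.dropna(subset=["customer"])       # drop rows missing the key field
# Standardize dates; values in non-standard formats become NaT and need follow-up
clean = clean.assign(signup=pd.to_datetime(clean["signup"], errors="coerce"))
print(profile, clean, sep="\n\n")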


The consequences of not cleaning ‘dirty data’
Each dirty data point, or record, has a financial impact if not resolved
The costs of poor-quality data can spread throughout a company
It can affect systems from shipping and receiving to accounting and customer service
Data errors typically arise from the functions or departments that generate or create the data, not within the IT department
When all costs are considered
The value of finding and fixing the causes of data errors becomes clear
75
Copyright ©2018 John Wiley & Sons, Inc.

The consequences of not cleaning ‘dirty data’
In a time of decreased budgets, some organizations may not have the resources for such projects
They may not even be aware of the problem
Others may be spending most of their time fixing problems
Leaving them with no time to work on preventing them
The benefits of acting preventatively against dirty data are considerable
While the short-run cost of cleaning and preventing dirty data may seem prohibitive for some companies
The long-term cost of leaving it unaddressed is far higher
76
Copyright ©2018 John Wiley & Sons, Inc.

The consequences of not cleaning ‘dirty data’
Bad data are costing U.S. businesses hundreds of billions of dollars a year
They also impair a corporation's ability to ride out tough economic climates
Incorrect and outdated values
Missing data
Inconsistent data formats
Can all cause
Lost customers
Lost sales
Reduced revenue
Misallocation of resources
Flawed pricing strategies
77
Copyright ©2018 John Wiley & Sons, Inc.

Data Life Cycle and Data Principles (1 of 2)
Principle of Diminishing Data Value
The value of data diminishes as they age
Blind spots (lack of data availability) of 30 days or longer inhibit peak performance
Global financial services institutions rely on near-real-time data for peak performance
Principle of 90/90 Data Use
Most stored data, as much as 90 percent, are seldom accessed after 90 days (except for auditing purposes)
Roughly 90 percent of data lose most of their value after three months

78
Copyright ©2018 John Wiley & Sons, Inc.
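A tiny sketch of how the 90/90 principle might be applied in practice (assumed dataset names and dates, not from the text): records whose last access is older than 90 days are flagged as candidates for archiving.

from datetime import date, timedelta

# Hypothetical last-access dates for stored datasets (illustrative values only)
last_accessed = {"sales_archive": date(2024, 10, 1), "current_orders": date(2025, 1, 20)}
as_of = date(2025, 1, 31)              # assumed reporting date
cutoff = as_of - timedelta(days=90)    # the 90-day threshold from the principle

for name, accessed in last_accessed.items():
    status = "archive candidate (rarely accessed)" if accessed < cutoff else "keep online"
    print(name, status)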

Data Life Cycle and Data Principles (2 of 2)
Principle of data in context
The capability to capture, process, format, and distribute data in near real time or faster
Requires a huge investment in data architecture
The investment can be justified on the principle that data must be integrated, processed, analyzed, and formatted in “actionable information”

79
Copyright ©2018 John Wiley & Sons, Inc.

Figure 3.11 Data life cycle
80
Copyright ©2018 John Wiley & Sons, Inc.

Master Data Management (MDM)
There are a number of definitions of exactly what MDM means
There are however some constant themes
MDM is focused on Master or Reference Data
Certain dimensions of data quality are critical to enabling effective MDM
Timeliness, accuracy, completeness, meaning
Continuous data improvement and a well-managed data quality strategy are essential
81
Copyright ©2018 John Wiley & Sons, Inc.

Suggested Answers:
1. The data life cycle is a model that illustrates the way data travel through an organization. The data life cycle begins with storage in a database, to being loaded into a data warehouse for analysis, then reported to knowledge workers or used in business apps.
2. Master data management (MDM) is a process whereby companies integrate data from various sources or enterprise applications to provide a more complete or unified view of an entity (customer, product, etc.) Although vendors may claim that their MDM solution creates “a single version of the truth,” this claim probably is not true. In reality, MDM cannot create a single unified version of the data because constructing a completely unified view of all master data simply is not possible. Realistically, MDM consolidates data from various data sources into a master reference file, which then feeds data back to the applications, thereby creating accurate and consistent data across the enterprise.
 
3. Bad data are costing U.S. businesses hundreds of billions of dollars a year and affecting their ability to ride out the tough economic climate. Incorrect and outdated values, missing data, and inconsistent data formats can cause lost customers, sales, and revenue; misallocation of resources; and flawed pricing strategies.
4. A centralized database stores all data in a single central computer such as a mainframe or server.
A distributed database stores portions of the database on multiple computers within a network.
5. Data ownership problems exist when there are no policies defining responsibility and accountability for managing data. Inconsistent data formats of
various departments create an additional set of problems as organizations try to combine individual applications into integrated enterprise systems.
The tendency to delegate data-quality
81

Master Data Management (MDM)
Every organization typically has data on
Customers
Products
Employees
Physical assets
These data items are seldom held in one location
They are often physically scattered around the business
In various applications
Spreadsheets
Physical media such as paper and reports

82
Copyright ©2018 John Wiley & Sons, Inc.


Master Data Management (MDM)
Different parts of the business will have different concepts and definitions for the same business entity and relationship
By way of example
An employee may be recorded in a payroll, HR, training and expense management system of an employer
But back in the real world they are the same person.
Typical examples of master data include
Customers, Employees, Vendors, Suppliers, Parts, Products, Locations, Contact Mechanisms, Profiles, Accounting Items, Contracts, Policies.
83
Copyright ©2018 John Wiley & Sons, Inc.


Master Data Management (MDM)
External data is a typical form of reference data
Whereas standard business objects such as customer, employee, parts and so on are classed as master data
When building MDM strategies external data becomes incredibly important for creating a surrogate source of “truth“
84
Copyright ©2018 John Wiley & Sons, Inc.


Master Data Management (MDM)
Master data management (MDM) processes
Integrate data from various sources or enterprise applications
To create a more complete (unified) view of a customer, product, or other entity
MDM consolidates data from various data sources into a master reference file
This feeds data back to the applications
Creating accurate and consistent data across the enterprise
85
Copyright ©2018 John Wiley & Sons, Inc.
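A minimal sketch of the consolidation idea (made-up CRM and billing records, hypothetical field names): records for the same entity from different systems are merged field by field into a master reference record. Real MDM tools add matching, cleansing, and survivorship rules; this only shows the core merge step.

# Hypothetical records for the same customer held in two separate systems
crm = {"cust_100": {"name": "Acme Corp", "email": "sales@acme.example", "phone": None}}
billing = {"cust_100": {"name": "ACME Corporation", "email": None, "phone": "+966-11-555-0100"}}

def consolidate(*sources):
    """Merge records field by field, preferring the first non-empty value seen."""
    master = {}
    for source in sources:
        for key, record in source.items():
            merged = master.setdefault(key, {})
            for field, value in record.items():
                if merged.get(field) in (None, "") and value not in (None, ""):
                    merged[field] = value
    return master

print(consolidate(crm, billing))
# The master record combines the CRM name and email with the billing phone number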


Master Data Management (MDM)
Some vendors claim their MDM solution creates
“a single version of the truth,”
Sadly this is probably not true
In reality, MDM cannot create a single unified version of the data
Constructing a completely unified view of all master data is simply not possible
86
Copyright ©2018 John Wiley & Sons, Inc.


Master Data Management (MDM)
An MDM solution includes tools for cleaning and auditing the master data elements
Plus tools for integrating and synchronizing data to make them more accessible
MDM offers a solution for managers who are frustrated with how fragmented and dispersed their data sources are
87
Copyright ©2018 John Wiley & Sons, Inc.


Figure 3.12 An enterprise has transactional, master, and analytical data.
88
Copyright ©2018 John Wiley & Sons, Inc.

Learning Objectives (3 of 5)
89
Copyright ©2018 John Wiley & Sons, Inc.

Databases & Data Warehouses
Databases typically store data to support
Different functions
Information systems
Reporting
Analysis
Each information system in an organization may have multiple databases that are accumulating data
They have specific uses and requirements
90
Copyright ©2018 John Wiley & Sons, Inc.

Databases & Data Warehouses
A data warehouse is a decision support system which stores historical data from across the organization
Processes it
Enabling the data to be used for critical business analysis, reports and dashboards
A data warehouse system stores data from numerous sources
These are typically structured
Online Transaction Processing (OLTP) data such as invoices and financial transactions
Enterprise Resource Planning (ERP) data
Customer Relationship Management (CRM) data
The data warehouse focuses on data relevant for business analysis, organizes and optimizes it to enable efficient analysis
91
Copyright ©2018 John Wiley & Sons, Inc.

Databases & Data Warehouses
Data warehouses that pull together data from disparate sources and databases across an entire enterprise
Are called enterprise data warehouses (EDW)
Data warehouses store data from various source systems and databases across an enterprise
To run analytical queries against huge datasets collected over long time periods
92
Copyright ©2018 John Wiley & Sons, Inc.

Databases & Data Warehouses
93
Copyright ©2018 John Wiley & Sons, Inc.
Database VS Data Warehouse

Data Warehouses: Enterprise data warehouses (EDW)
Data warehouses that pull together data from disparate sources and databases across an entire enterprise are called enterprise data warehouses (EDW)
EDWs are the primary source of cleansed data for analysis, reporting, and Business Intelligence (BI)
Their high costs can be subsidized by using data marts
Often the data are summarized in ways that enable quick responses to queries
For instance, query results can reveal changes in customer behavior and drive the decision to redevelop the advertising strategy

94
Copyright ©2018 John Wiley & Sons, Inc.

Data Warehouses: Enterprise data warehouses (EDW)
Data marts are lower-cost, scaled-down versions of a data warehouse
They can be implemented in a much shorter time
Data marts serve a specific department or function
Such as finance, marketing, or operations
Since they store smaller amounts of data
They are faster, easier to use, and navigate.
The high cost of data warehouses can make them too expensive for a company to implement
Making data marts a better option.

95
Copyright ©2018 John Wiley & Sons, Inc.

Data Warehouses & Data Lakes
Data lakes and data warehouses are both widely used for storing big data
However they are not interchangeable terms
A data lake is a vast pool of raw data
The purpose of which is not yet defined
A data warehouse is a repository for structured, filtered data that has already been processed for a specific purpose.
96
Copyright ©2018 John Wiley & Sons, Inc.

Data Warehouses & Data Lakes
The two types of data storage are often confused
But the only real similarity between them is their high-level purpose of storing data.
The distinction is important because they serve different purposes
While a data lake works for one company
A data warehouse will be a better fit for another

97
Copyright ©2018 John Wiley & Sons, Inc.

Data Warehouses & Data Lakes
98
Copyright ©2018 John Wiley & Sons, Inc.
Data Lake VS Data Warehouse

Data Warehouses & Data Lakes
Four key differences between a data lake and a data warehouse
Data structure, ideal users, processing methods, and the overall purpose of the data are the key differentiators
99
Copyright ©2018 John Wiley & Sons, Inc.


Data Warehouses & Data Lakes
Data structure: raw vs. processed
Raw data is data that has not yet been processed for a purpose
Data lakes primarily store raw, unprocessed data, while data warehouses store processed and refined data.
Because of this, data lakes typically require much larger storage capacity than data warehouses
104
Copyright ©2018 John Wiley & Sons, Inc.

Data Warehouses & Data Lakes
Additionally, raw, unprocessed data is malleable
Can be quickly analyzed for any purpose, and is ideal for machine learning
The risk of all that raw data, however, is that
data lakes sometimes become data swamps 
Without appropriate data quality and data governance measures in place
By storing only processed data, data warehouses
Save on pricey storage space
Do not maintain data that may never be used
Additionally, processed data can be easily understood by a larger audience
105
Copyright ©2018 John Wiley & Sons, Inc.

Data Warehouses & Data Lakes
Purpose: undetermined vs in-use
The purpose of individual data pieces in a data lake is not fixed
Raw data flows into a data lake
Sometimes with a specific future use in mind
Sometimes just to have on hand
This means that data lakes have less organization and less filtration of data than their counterpart.
106
Copyright ©2018 John Wiley & Sons, Inc.

Data Warehouses & Data Lakes
Processed data is raw data that has been put to a specific use
Since data warehouses only house processed data
All of the data in a data warehouse has been used for a specific purpose within the organization
This means that storage space is not wasted on data that may never be used
107
Copyright ©2018 John Wiley & Sons, Inc.

Data Warehouses & Data Lakes
Users: data scientists vs business professionals
Data lakes are often difficult to navigate by those unfamiliar with unprocessed data
Raw, unstructured data usually requires a data scientist and specialized tools to understand and translate it for any specific business use
However, there is growing momentum behind data preparation tools
That create self-service access to the information stored in data lakes
108
Copyright ©2018 John Wiley & Sons, Inc.

Data Warehouses & Data Lakes
Processed data is used in charts, spreadsheets, tables, etc.
Most, if not all, of the employees at a company can read it
Processed data stored in data warehouses
Only requires that the user be familiar with the topic represented
109
Copyright ©2018 John Wiley & Sons, Inc.

Data Warehouses & Data Lakes
Accessibility: flexible vs secure
Accessibility and ease of use refer to the data repository as a whole
Not the data within them
Data lake architecture has no structure and is therefore easy to access and easy to change
Any changes that are made to the data can be done quickly since data lakes have very few limitations.
110
Copyright ©2018 John Wiley & Sons, Inc.

Data Warehouses & Data Lakes
Data warehouses are, by design, more structured
A benefit of data warehouse architecture is that the processing and structure of data
Makes the data itself easier to decipher
However, the limitations of structure make data warehouses difficult and costly to manipulate
111
Copyright ©2018 John Wiley & Sons, Inc.

Data Preparation: Procedures to Prepare EDW Data for Analytics
Extract from designated databases
Transform by standardizing formats, cleaning the data, and integrating them
Load into a data warehouse

112
Copyright ©2018 John Wiley & Sons, Inc.

Figure 3.15 Database, data warehouses and marts, and BI architecture.
113
Copyright ©2018 John Wiley & Sons, Inc.

Data Preparation: Procedures to Prepare EDW Data for Analytics
Extract, transform and load (ETL) is a type of data integration
It refers to a three-step process of extracting data from databases
Converting it into a format that can be analyzed
Then storing it in a data warehouse
The three procedures of ETL are:
1. Extract from designated databases.
2. Transform by standardizing formats, cleaning the data, and integrating them.
3. Load into a data warehouse.
114
Copyright ©2018 John Wiley & Sons, Inc.
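The three ETL steps can be sketched in a few lines of Python (assumed source rows and table names; sqlite3 stands in for the warehouse): data are extracted from the sources, transformed into a standard format, and loaded into a warehouse table.

import sqlite3

# Hypothetical rows extracted from two operational source systems
source_a = [("ahmed ali", " Riyadh ", "1,200.50")]
source_b = [("SARA NASSER", "Jeddah", "980.00")]

def transform(rows):
    # Standardize formats: trim whitespace, normalize case, convert amounts to numbers
    for name, city, amount in rows:
        yield (name.strip().title(), city.strip(), float(amount.replace(",", "")))

warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales_fact (customer TEXT, city TEXT, amount REAL)")

# Extract -> Transform -> Load into the warehouse table
for source in (source_a, source_b):
    warehouse.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", transform(source))

print(list(warehouse.execute("SELECT * FROM sales_fact")))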

Data Preparation: Procedures to Prepare EDW Data for Analytics
Change data capture (CDC) captures the changes made at data sources
Then applies those changes throughout enterprise data stores to keep data synchronized
CDC minimizes the resources required for ETL processes by dealing only with data changes
115
Copyright ©2018 John Wiley & Sons, Inc.
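A sketch of the CDC idea (made-up snapshots keyed by record ID): instead of reloading everything, only the inserts, updates, and deletes between two snapshots are captured and applied downstream.

# Hypothetical snapshots of a source table, keyed by record ID
previous = {1: {"name": "Acme", "city": "Riyadh"}, 2: {"name": "Globex", "city": "Jeddah"}}
current  = {1: {"name": "Acme", "city": "Dammam"}, 3: {"name": "Initech", "city": "Riyadh"}}

def capture_changes(old, new):
    """Return only the changes, rather than the full dataset."""
    inserts = {k: v for k, v in new.items() if k not in old}
    updates = {k: v for k, v in new.items() if k in old and old[k] != v}
    deletes = [k for k in old if k not in new]
    return {"insert": inserts, "update": updates, "delete": deletes}

print(capture_changes(previous, current))
# Record 3 is an insert, record 1 is an update (new city), record 2 is a delete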

Data Warehouses: ADW
Active Data Warehouse (ADW)
Real-time data warehousing and analytics
Not for executive strategic decision making, but rather to support operations
They provide
Interaction with a customer to provide superior customer service
Response to business events in near real time
Sharing of up-to-date status data among merchants, vendors, and associates
116
Copyright ©2018 John Wiley & Sons, Inc.

Data Warehouse Processing: Hadoop and MapReduce
Hadoop is an Apache processing platform that places no conditions on the structure of the data it processes
MapReduce provides a reliable, fault-tolerant software framework to easily write applications that process vast amounts of data (multi-terabyte datasets) in parallel on large clusters (thousands of nodes) of commodity hardware
Map stage: breaks up huge data into subsets
Reduce stage: recombines partial results
117
Copyright ©2018 John Wiley & Sons, Inc.
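A toy illustration of the two MapReduce stages in plain Python (made-up documents; real Hadoop distributes the work across a cluster): the map stage produces partial word counts per subset, and the reduce stage recombines them.

from collections import Counter
from functools import reduce

documents = ["big data needs hadoop", "hadoop splits big data"]  # assumed tiny dataset

# Map stage: break the dataset into subsets and compute partial counts for each
partial_counts = [Counter(doc.split()) for doc in documents]

# Reduce stage: recombine the partial results into the final answer
totals = reduce(lambda a, b: a + b, partial_counts, Counter())
print(totals)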

Learning Objectives (4 of 5)
118
Copyright ©2018 John Wiley & Sons, Inc.

Data Analytics and Data Discovery Defined
Data Analytics
A technique of qualitatively or quantitatively analyzing a data set
Designed to reveal patterns, trends, and associations that often relate to human behavior and interaction
In order to enhance productivity and business gains
Big data
An extremely large data set that is too large or complex to be analyzed using traditional data processing techniques
119
Copyright ©2018 John Wiley & Sons, Inc.

Data Management
Today's organizations need a data management solution
That provides an efficient way to manage data across a diverse but unified data tier
Data management systems are built on data management platforms and can include
Databases
Data lakes and warehouses
Big data management systems
Data analytics, etc.
120
Copyright ©2018 John Wiley & Sons, Inc.

Data Management
All these components work together as a "data utility"
To deliver the data management capabilities for apps
And for the analytics and algorithms that use the data originated by those apps
Although current tools help database administrators (DBAs) automate many of the traditional management tasks
Manual intervention is still often required because of the size and complexity of most database deployments
121
Copyright ©2018 John Wiley & Sons, Inc.

Data Management
Whenever manual intervention is required
The chance for errors increases
Reducing the need for manual data management is a key objective of a new data management technology,
The autonomous database
122
Copyright ©2018 John Wiley & Sons, Inc.

Four V's of Data Analytics
Variety: The analytic environment has expanded from pulling data from enterprise systems to include big data and unstructured sources.
Volume: Large volumes of structured and unstructured data are analyzed.
Velocity: Speed of access to reports that are drawn from data defines the difference between effective and ineffective analytics.
Veracity: Validating data and extracting insights that managers and workers can trust are key factors in successful analytics. Trust in analytics has grown more difficult with the explosion of data sources.
123
Copyright ©2018 John Wiley & Sons, Inc.

Suggested Answers
1. Databases are:
Designed and optimized to ensure that every transaction gets recorded and stored immediately.
Volatile because data are constantly being updated, added, or edited.
OLTP systems.
Medium and large enterprises typically have many databases of various types.
Data warehouses are:
Designed and optimized for analysis and quick response to queries.
Nonvolatile. This stability is important to being able to analyze the data and make comparisons. When data are stored, they might never be changed or deleted in order to do trend analysis or make comparisons with newer data.
OLAP systems.
Subject-oriented, which means that the data captured are organized to have similar data linked together.
Data warehouses integrate data collected over long time periods from various source systems, including multiple databases and data silos.
2. Data marts are lower-cost, scaled-down versions of a data warehouse that can be implemented in a much shorter time, for example, in less than 90 days. Data marts serve a specific department or function, such as finance, marketing, or operations. Since they store smaller amounts of data, they are faster and easier to use and navigate.
3. ETL refers to three procedures – Extract, Transform, and Load – used in moving data from databases to a data warehouse. Data are extracted from designated databases, transformed by standardizing formats, cleaning the data, and integrating them, and then loaded into a data warehouse.
4. CDC, the acronym for Change Data Capture, refers to processes which capture the changes made at data sources and then apply those changes throughout enterprise data stores to keep data synchronized. CDC minimizes the resources required for ETL processes by only dealing with data changes.
5. An ADW provides real-time data warehousing and analytics, not for executive strategic decision making, but rather to support operations. Some advantages for a company of using an ADW might be interacting with a customer to provide superior customer service, responding to business events in near real time, or sharing up-to-date status data among merchants, vendors, customers, and associates.
6. The high cost of data warehouses can make them too expensive for a company to implement. Data marts are lower-cost, scaled-down versions that can be implemented in a much shorter time, for example, in less than 90 days. Data marts serve a specific department or function, such as finance, marketing, or operations. Since they store smaller amounts of data, they are faster and easier to use and navigate.
7. Machine-generated sensor data are becoming a larger proportion of big data (Figure 3.16). Analyzing them can lead to optimizing cost savings and productivity gains. Manufacturers can track the condition of operating machinery and predict the probability of failure, as well as track wear and determine when preventive maintenance is needed. Federal health reform efforts have pushed health-care organizations toward big data and analytics. These organizations are planning to use big data analytics to support revenue cycle management, resource utilization, fraud prevention, health management, and quality improvement, in addition to reducing operational expenses.
8. Apache Hadoop is a widely used processing platform which places no conditions on the structure of the data it can process. Hadoop implements MapReduce in two stages:
Map stage: MapReduce breaks up the huge dataset into smaller subsets, then distributes the subsets among multiple servers where they are partially processed.
Reduce stage: The partial results from the map stage are then recombined and made available for analytic tools.

124Copyright ©2018 John Wiley & Sons, Inc.

Data Management
Characteristics of Big Data
Volume
The name Big Data itself refers to a size that is enormous
The size of the data plays a crucial role in determining its value
Whether a particular data set can actually be considered Big Data also depends on its volume
Hence, volume is one characteristic that must be considered when dealing with Big Data
125Copyright ©2018 John Wiley & Sons, Inc.

Data Management
Characteristics of Big Data
Variety
Variety refers to heterogeneous sources and the nature of the data, both structured and unstructured
In earlier days, spreadsheets and databases were the only sources of data considered by most applications
Nowadays, data in the form of emails, photos, videos, monitoring devices, PDFs, audio, etc. are also considered in analysis applications
This variety of unstructured data poses issues for storing, mining, and analyzing data
126Copyright ©2018 John Wiley & Sons, Inc.

Data Management
Characteristics of Big Data
Velocity
The term velocity refers to the speed at which data are generated
How fast the data are generated and processed to meet demand determines their real potential
Big Data velocity deals with the speed at which data flow in from sources such as business processes, application logs, networks, social media sites, sensors, mobile devices, etc.
The flow of data is massive and continuous
127Copyright ©2018 John Wiley & Sons, Inc.

Data Management
Characteristics of Big Data
Veracity
Veracity refers to inconsistencies and uncertainty in data
Available data can be messy, and quality and accuracy are difficult to control
Big Data is also variable because of the multitude of data dimensions resulting from multiple disparate data types and sources
Example: data in bulk can create confusion, whereas too little data can convey only partial or incomplete information
128Copyright ©2018 John Wiley & Sons, Inc.

Data Management
Characteristics of Big Data
Value
Bulk data that have no value are of no use to the company unless they are turned into something useful
Data in themselves are of no use or importance; they need to be converted into something valuable in order to extract information
With the help of advanced data analytics, useful insights can be derived from the collected data
These insights, in turn, are what add value to decision making
129Copyright ©2018 John Wiley & Sons, Inc.

130Copyright ©2018 John Wiley & Sons, Inc.

Data Analytics: Human Expertise is Needed
To interpret the output of analytics, big data specialists and business intelligence analysts perform many tasks
Data preparation for analysis through data cleansing techniques, to eliminate duplicates or incomplete data
Dirty data degrade the value of analytics
Data must be put into meaningful context
A minimal data-cleansing sketch appears after this slide
131Copyright ©2018 John Wiley & Sons, Inc.
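As a hedged illustration of what such cleansing can look like in practice (the field names and records below are hypothetical, not from the textbook), this sketch removes exact duplicates and incomplete rows before analysis:

records = [
    {"id": 1, "customer": "Amal", "amount": 120.0},
    {"id": 1, "customer": "Amal", "amount": 120.0},   # exact duplicate
    {"id": 2, "customer": None,   "amount": 75.0},    # incomplete record
    {"id": 3, "customer": "Badr", "amount": 40.0},
]

def cleanse(rows):
    """Eliminate duplicates and incomplete rows so dirty data do not degrade the analysis."""
    seen, clean = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            continue                                   # drop exact duplicates
        if any(value is None for value in row.values()):
            continue                                   # drop incomplete records
        seen.add(key)
        clean.append(row)
    return clean

print(cleanse(records))   # only the two unique, complete records remain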

Data Discovery: Data and Text Mining
Creating Business Value
Data mining: software that enables users to analyze data from various dimensions or angles
Categorize them, and find correlative patterns among fields in the data warehouse
Text mining: a broad category involving the interpretation of words and concepts in context
Sentiment analysis: trying to understand consumer intent
A minimal sentiment-scoring sketch appears after this slide
132Copyright ©2018 John Wiley & Sons, Inc.
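To show the flavor of sentiment analysis (the word lists and sample posts below are hypothetical and far simpler than commercial text-mining tools), a tiny lexicon-based scorer might look like this:

POSITIVE = {"love", "great", "fast", "recommend"}
NEGATIVE = {"hate", "slow", "broken", "refund"}

def sentiment(post):
    """Score a social media post by counting positive and negative words."""
    words = post.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

posts = [
    "I love this store great service and fast delivery",
    "The app is slow and my order arrived broken",
]
for p in posts:
    print(sentiment(p), "-", p)

Production systems use trained models and handle negation and context, but the goal is the same: turning free-text opinions into a signal of consumer intent.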

Data Analytics and Data Discovery
Why are human expertise and judgment important to data analytics?
Human expertise and judgment are needed to interpret the output of analytics
Data are worthless if you cannot analyze, interpret, understand, and apply the results in context
133Copyright ©2018 John Wiley & Sons, Inc.

Suggested Answers:
1. Human expertise and judgment are needed to interpret the output of analytics because it takes expertise to properly prepare the data for analysis.
2. Validating data and extracting insights that managers and workers can trust are key factors of successful analytics. Data quality is the key to meaningful analytics.
3. If the wrong analysis or datasets are used, the output would be nonsense, as in the example of the Super Bowl winners and stock market performance
4. Federal health reform efforts have pushed health-care organizations toward big data and analytics. These organizations are planning to use big data analytics to support revenue cycle management, resource utilization, fraud prevention, health management, and quality improvement.
5. Data mining is used to discover knowledge that you did not know existed in the databases. Answers may vary. A data mining example: The mega-retailer Walmart wanted its online shoppers to find what they were looking for faster. Walmart analyzed clickstream data from its 45 million monthly online shoppers, then combined that data with product and category related popularity scores which were generated by text mining the retailer’s social media streams. Lessons learned from the analysis were integrated into the Polaris search engine used by customers on the company’s website. Polaris has yielded a 10 to 15 percent increase in online shoppers completing a purchase, which equals roughly $1 billion in incremental online sales.
6. Up to 75 percent of an organization’s data are non-structured: word processing documents, social media, text messages, audio, video, images and diagrams, fax and memos, call center or claims notes, and so on. Text mining is a broad category that involves interpreting words and concepts in context. Then the text is organized, explored, and analyzed to provide actionable insights for managers. With text analytics, information is extracted out of large quantities of various types of textual information. It can be combined with structured data within an automated process. Innovative companies know they could be more successful in meeting their customers’ needs if they just understood them better. Text analytics is proving to be an invaluable tool in doing this.
7. The basic steps involved in text mining/analytics include:
Exploration. First, documents are explored. This might be in the form of simple word counts in a document collection, or manually creating topic areas to categorize documents by reading a sample of them. For example, what are the major types of issues (brake or engine failure) that have been identified in recent automobile warranty claims? A challenge of the exploration effort is misspelled or abbreviated words, acronyms, or slang.
Preprocessing. Before analysis or the automated categorization of the content, the text may need to be preprocessed to standardize it to the extent possible. As in traditional analysis, up to 80 percent of the time can be spent preparing and standardizing the data. Misspelled words, abbreviations, and slang may need to be transformed into a consistent term. For instance, BTW would be standardized to “by the way” and “left voice message” could be tagged as “lvm.”
Categorizing and Modeling. Content is then ready to be categorized. Categorizing messages or documents from information contained within them can be achieved using statistical models and business rules. As with traditional model development, sample documents are examined to train the models. Additional documents are then processed to validate the accuracy and precision of the model, and finally new documents are evaluated using the final model (scored). Models then can be put into production for automated processing of new documents as they arrive.

Data Analytics and Data Discovery
Human expertise is necessary because analytics alone cannot
Explain the reasons for trends or relationships
Know what action to take
Provide sufficient context to determine what the numbers represent and how to interpret them
Data analytics, human expertise, and high-quality data are all needed to obtain actionable information
134Copyright ©2018 John Wiley & Sons, Inc.

Data Analytics and Data Discovery
What is the relationship between data quality and the value of analytics?
Dirty data degrade the value of analytics
The “cleanliness” of data is very important to data mining and analysis projects
Analysts complain that they spend much of their time on manual, error-prone processes to clean the data
Large data volumes and variety mean more data that are dirty and harder to handle
135Copyright ©2018 John Wiley & Sons, Inc.

Data Analytics and Data Discovery
Why do data need to be put into a meaningful context?
Data must be put into meaningful context
If the wrong analysis or datasets are used, the output would be nonsense
Managers need context in order to understand how to interpret traditional and big data
136Copyright ©2018 John Wiley & Sons, Inc.

Data Analytics and Data Discovery
How can manufacturers and health care benefit from data analytics?
The health-care industry is using big data to identify biological targets for drugs
And to eliminate failures before they reach the human testing stage
Big data and the analytics that go with it could be a key element in further increasing the success rates in pharmaceutical R&D
137Copyright ©2018 John Wiley & Sons, Inc.

Data Analytics and Data Discovery
In the health-care and pharmaceutical industries
Data growth is generated from several sources
Including the R&D process itself, retailers, patients, and caregivers
Effectively utilizing these data will help pharmaceutical companies
Better identify new potential drug candidates
Develop such candidates into effective, approved products
Get reimbursed more quickly
138Copyright ©2018 John Wiley & Sons, Inc.

Data Analytics and Data Discovery
How does data mining provide value?
Data mining software enables users to analyze data from various dimensions or angles
Categorize them
Find correlations or patterns among fields in the data warehouse
Up to 75 percent of an organization’s data are nonstructured:
Word-processing documents
Social media
Text messages
Audio
Video
Images and diagrams
Faxes and memos
Call center or claims notes, etc.
139Copyright ©2018 John Wiley & Sons, Inc.

Data Analytics and Data Discovery 140Copyright ©2018 John Wiley & Sons, Inc.

What is Data Mining | Introduction to Data Mining
https://www.youtube.com/watch?v=grRwJ5jZBog

Data Analytics and Data Discovery
What is text mining?
Text mining is a broad category that involves interpreting words and concepts in context
Any customer becomes a brand advocate or adversary
By freely expressing opinions and attitudes that reach millions of other current or prospective customers on social media
Text mining helps companies tap into the explosion of customer opinions expressed online
Social commentary and social media are being mined for sentiment analysis or to understand consumer intent
141Copyright ©2018 John Wiley & Sons, Inc.

Data Analytics and Data Discovery
Innovative companies know they could be more successful in meeting their customers’ needs
If they just understood their customers better
Tools and techniques for analyzing text, documents, and other nonstructured content are available from several vendors
Combining data and text mining can create even greater value
Mining text or nonstructured data enables organizations to forecast the future
Instead of merely reporting the past
Forecasting methods using existing structured data and nonstructured text
From both internal and external sources
Provide the best view of what lies ahead
142Copyright ©2018 John Wiley & Sons, Inc.

Data Analytics and Data Discovery 143Copyright ©2018 John Wiley & Sons, Inc.

What is text mining?

Data Analytics and Data Discovery
What are the basic steps involved in text analytics?
With text analytics, information is extracted from large quantities of various types of textual information
The basic steps involved in text analytics include:
Exploring. First, documents are explored
This might occur in the form of simple word counts in a document collection
Or by manually creating topic areas to categorize documents after reading a sample of them
A challenge of the exploration effort is misspelled or abbreviated words, acronyms, or slang
A minimal word-count exploration sketch appears after this slide
144Copyright ©2018 John Wiley & Sons, Inc.
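As a hedged illustration of the exploring step (the sample warranty claims below are hypothetical), a simple word-count pass over a small document collection might look like this:

from collections import Counter

warranty_claims = [
    "brake noise when stopping, brake pads replaced",
    "engine failure at low mileage",
    "brake warning light stays on",
]

# Explore: simple word counts across the collection to surface candidate topic areas
word_counts = Counter(
    word
    for claim in warranty_claims
    for word in claim.lower().replace(",", "").split()
)
print(word_counts.most_common(5))   # e.g. "brake" surfaces as a major issue type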

Suggested Answers:Human expertise and judgment are needed to interpret the output of analytics because it takes expertise to properly prepare the data for analysis.
2. Validating data and extracting insights that managers and workers can trust are key factors of successful analytics. Data quality is the key to meaningful analytics.
3. If the wrong analysis or datasets are used, the output would be nonsense, as in the example of the Super Bowl winners and stock market performance
4. Federal health reform efforts have pushed health-care organizations toward big data and analytics. These organizations are planning to use big data analytics to support revenue cycle management, resource utilization, fraud prevention, health management, and quality improvement.
 5. Data mining is used to discover knowledge that you did not know existed in the databases. Answers may vary. A data mining example: The mega-retailer Walmart wanted its online shoppers to find what they were looking for faster. Walmart analyzed clickstream data from its 45 million monthly online shoppers then combined that data with product and category related popularity scores which were generated by text mining the retailer’s social media streams. Lessons learned from the analysis were integrated into the Polaris search engine used by customers on the company’s website. Polaris has yielded a 10 to 15 percent increase in online shoppers completing a purchase, which equals roughly $1 billion in incremental online sales. 6. Up to 75 percent of an organization’s data are non-structured word processing documents, social media, text messages, audio, video, images and diagrams, fax and memos, call center or claims notes, and so on. Text mining is a broad category that involves interpreting words and concepts in context. Then the text is organized, explored, and analyzed to provide actionable insights for managers. With text analytics, information is extracted out of large quantities of various types of textual information. It can be combined with structured data within an automated process. Innovative companies know they could be more successful in meeting their customers’ needs if they just understood them better. Text analytics is proving to be an invaluable tool in doing this.  7. The basic steps involved in text mining/analytics include:Exploration. First, documents are explored. This might be in the form of simple word counts in a document collection, or manually creating topic areas to categorize documents by reading a sample of them. For example, what are the major types of issues (brake or engine failure) that have been identified in recent automobile warranty claims? A challenge of the exploration effort is misspelled or abbreviated words, acronyms, or slang.Preprocessing. Before analysis or the automated categorization of the content, the text may need to be preprocessed to standardize it to the extent possible. As in traditional analysis, up to 80 percent of the time can be spent preparing and standardizing the data. Misspelled words, abbreviations, and slang may need to be transformed into a consistent term. For instance, BTW would be standardized to “by the way” and “left voice message” could be tagged as “lvm.”Categorizing and Modeling. Content is then ready to be categorized. Categorizing messages or documents from information contained within them can be achieved using statistical models and business rules. As with traditional model development, sample documents are examined to train the models. Additional documents are then processed to validate the accuracy and precision of the model, and finally new documents are evaluated using the final model (scored). Models then can be put into production for automated processing of new documents as they arrive.

Data Analytics and Data Discovery
Preprocessing
Before analysis or the automated categorization of content, the text may need to be preprocessed to standardize it to the extent possible
As in traditional analysis, up to 80 percent of preprocessing time can be spent preparing and standardizing the data
Misspelled words, abbreviations, and slang may need to be transformed into consistent terms
For instance, BTW would be standardized to "by the way" and "left voice message" could be tagged as "lvm"
145
Copyright ©2018 John Wiley & Sons, Inc.
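A minimal sketch of the standardization idea on this slide, in plain Python; the abbreviation map and the sample message are assumptions made for illustration, not the textbook's tooling.

```python
import re

# Hypothetical abbreviation/slang map used to standardize call-center text.
ABBREVIATIONS = {
    "btw": "by the way",
    "left voice message": "lvm",   # tag a long phrase with a short code
    "plz": "please",
}

def standardize(text: str) -> str:
    """Lowercase the text and replace known abbreviations or phrases."""
    cleaned = text.lower()
    for raw, standard in ABBREVIATIONS.items():
        cleaned = re.sub(r"\b" + re.escape(raw) + r"\b", standard, cleaned)
    return cleaned

print(standardize("BTW customer called, left voice message about brakes"))
# -> "by the way customer called, lvm about brakes"
```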


Data Analytics and Data Discovery
Categorizing and Modeling
Categorizing messages or documents from the information contained within them can be achieved using statistical models and business rules
As with traditional model development
Sample documents are examined to train the models
Additional documents are then processed to validate the accuracy and precision of the model
Finally, new documents are evaluated using the final model (scored)
Models can then be put into production for the automated processing of new documents as they arrive
146
Copyright ©2018 John Wiley & Sons, Inc.
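The train, validate, and score workflow on this slide can be illustrated with a small scikit-learn pipeline. The labeled sample documents and the two categories are invented for the example, and scikit-learn is simply one common choice, not something the textbook prescribes.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hypothetical labeled sample documents used to train the model.
docs = [
    "brake pedal soft and squealing noise",
    "brake fluid leak near front wheel",
    "engine stalls at idle and misfires",
    "check engine light with rough engine start",
] * 10  # repeated so the toy train/test split has enough rows
labels = ["brake", "brake", "engine", "engine"] * 10

# Hold out documents to validate accuracy, as the slide describes.
X_train, X_test, y_train, y_test = train_test_split(
    docs, labels, test_size=0.25, random_state=0
)

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)                       # train on sample documents
print("validation accuracy:", accuracy_score(y_test, model.predict(X_test)))

# "Scoring" a new document as it arrives.
print(model.predict(["customer reports grinding brakes"]))
```

TF-IDF plus logistic regression is only a baseline; business rules can be layered alongside the statistical model, as the slide notes.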



Learning Objectives (5 of 5)
148
Copyright ©2018 John Wiley & Sons, Inc.

Business Intelligence: Key to Competitive Advantage
Across industries and in enterprises of all sizes
Used in operational management, business processes, and decision making
Provides the moment of value to decision makers
Unites data, technology, analytics, and human knowledge to optimize decisions
BI "unites data, technology, analytics, and human knowledge to optimize business decisions and ultimately drive an enterprise's success" (The Data Warehousing Institute)
149
Copyright ©2018 John Wiley & Sons, Inc.

Business Intelligence Challenges
Challenges
Data selection and quality
Alignment of BI strategy with business strategy
Alignment
Clearly articulates business strategy
Deconstructs business strategy into targets
Identifies KPIs
Prioritizes KPIs
Creates a plan based on priorities
Transforms based on strategic results and changes
150
Copyright ©2018 John Wiley & Sons, Inc.
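As a hedged illustration of deconstructing strategy into prioritized KPIs, the following Python sketch defines a small KPI structure and sorts it by priority before planning. The KPI names, targets, and priority values are invented for the example and do not come from the textbook.

```python
from dataclasses import dataclass

@dataclass
class KPI:
    """A key performance indicator tied to a strategic target."""
    name: str
    target: str
    priority: int  # 1 = highest priority

# Hypothetical KPIs deconstructed from a business strategy.
kpis = [
    KPI("Online conversion rate", ">= 4% of sessions end in purchase", 1),
    KPI("Order fulfillment time", "<= 2 days from order to shipment", 2),
    KPI("Customer churn rate", "<= 5% per quarter", 1),
]

# Prioritize the list of KPIs before building the BI plan around them.
for kpi in sorted(kpis, key=lambda k: k.priority):
    print(f"P{kpi.priority}: {kpi.name} - target {kpi.target}")
```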

Figure 3.17: Business Intelligence Factors: Four factors contributing to increased use of BI
151
Copyright ©2018 John Wiley & Sons, Inc.


Business Intelligence Architecture
Advances in response to big data and end-user performance demands
Hosted on public or private clouds
Limits IT staff needs and controls costs
May slow response time and add security and backup risks
155
Copyright ©2018 John Wiley & Sons, Inc.

Electronic Records Management
Business Records
Documentation of a business event, action, decision, or transaction
Electronic Records Management (ERM)
Workflow software, authoring tools, scanners, and databases that manage and archive electronic documents and image paper documents
Index and store documents according to company policy or legal compliance
Success depends on a partnership of key players
156
Copyright ©2018 John Wiley & Sons, Inc.
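To make "index and store documents according to company policy" concrete, here is a minimal, assumption-laden Python sketch of an indexed record with retention metadata. The record types, retention periods, and helper names are illustrative only and do not come from the textbook or any particular ERM product.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention schedule (in days) by record type.
RETENTION_DAYS = {
    "invoice": 7 * 365,      # e.g., keep financial records roughly 7 years
    "contract": 10 * 365,
    "memo": 2 * 365,
}

@dataclass
class RecordEntry:
    """An indexed electronic record with metadata for policy-based retention."""
    doc_id: str
    record_type: str
    created: date
    keywords: list

    def destroy_after(self) -> date:
        """Date after which the retention schedule allows destruction."""
        return self.created + timedelta(days=RETENTION_DAYS[self.record_type])

    def is_expired(self, today: date) -> bool:
        return today > self.destroy_after()

entry = RecordEntry("INV-0042", "invoice", date(2015, 3, 1), ["supplier", "Q1"])
print(entry.destroy_after(), entry.is_expired(date(2023, 6, 1)))
```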

ERM Practices and Standards
Best Practices
Effective systems capture all business data
Input from online forms, bar codes, sensors, websites, social sites, copiers, emails, and more
Industry Standards
Association for Information and Image Management (AIIM; www.aiim.org)
National Archives and Records Administration (NARA; www.archives.gov)
ARMA International (formerly the Association of Records Managers and Administrators; www.arma.org)
157
Copyright ©2018 John Wiley & Sons, Inc.

ERM Benefits: an ERM can help a business
Access and use the content contained in documents
Cut labor costs by automating business processes
Reduce the time and effort required to locate information for decision making
Improve content security, thereby reducing intellectual property theft risks
Minimize content printing, storing, and searching costs
158
Copyright ©2018 John Wiley & Sons, Inc.

ERM: Disaster Recovery, Business Continuity, and Compliance
Does the software meet the organization's needs? For example, can the DMS be installed on the existing network? Can it be purchased as a service?
Is the software easy to use and accessible from Web browsers, office applications, and email applications? If not, people will not use it.
Does the software have lightweight, modern Web and graphical user interfaces that effectively support remote users?
Before selecting a vendor, it is important to examine workflows and how data, documents, and communications flow throughout the company.
159
Copyright ©2018 John Wiley & Sons, Inc.


Business Intelligence and Electronic Records Management
What are the business benefits of BI?
What are two data-related challenges that must be resolved for BI to produce meaningful insight?
What are the steps in a BI governance program?
What does it mean to drill down into data, and why is it important?
What four factors are contributing to increased use of BI?
Why is ERM a strategic issue rather than simply an IT issue?
Why might a company have a legal duty to retain records? Give an example.
Why is creating backups an insufficient way to manage an organization's documents?
160
Copyright ©2018 John Wiley & Sons, Inc.

Suggested Answers:
1. BI provides data at the moment of value to a decision maker, enabling the organization to extract crucial facts from enterprise data in real time or near real time. BI solutions help an organization know what questions to ask and find answers to those questions. BI tools integrate and consolidate data from various internal and external sources and then process them into information used to make smart decisions. According to The Data Warehousing Institute (TDWI), BI "unites data, technology, analytics, and human knowledge to optimize business decisions and ultimately drive an enterprise's success. BI programs… transform data into usable, actionable business information" (TDWI, 2012). Managers use business analytics to make better-informed decisions and, ideally, gain a competitive advantage. BI is used to analyze past performance and identify opportunities to improve future performance.
2. Data selection and data quality. Information overload is a major problem for executives and employees. Another common challenge is data quality, particularly with regard to online information, because the source and accuracy might not be verifiable.
3. The mission of a BI governance program is to achieve the following:
Clearly articulate business strategies.
Deconstruct the business strategies into a set of specific goals and objectives (the targets).
Identify the key performance indicators (KPIs) that will be used to measure progress toward each target.
Prioritize the list of KPIs.
Create a plan to achieve goals and objectives based on the priorities.
Estimate the costs needed to implement the BI plan.
Assess and update the priorities based on business results and changes in business strategy.
4. Drilling down into the data means going from highly consolidated or summarized figures to the detail numbers from which they were derived. Sometimes a summarized view of the data is all that is needed; however, drilling down into the data from which the summary came provides the ability to do more in-depth analyses (a minimal drill-down sketch follows these answers).
5. Smart devices everywhere are creating demand for effortless 24/7 access to insights. Data is big business when it provides insight that supports decisions and action. Advanced BI and analytics help to ask questions that were previously unknown and unanswerable. Cloud-enabled BI and analytics are providing low-cost and flexible solutions.
6. Because senior management must ensure that their companies comply with legal and regulatory duties, managing electronic records (e-records) is a strategic issue for organizations in both the public and private sectors. The success of ERM depends greatly on a partnership of many key players, namely senior management, users, records managers, archivists, administrators, and, most importantly, IT personnel. Properly managed, records are strategic assets. Improperly managed or destroyed, they become liabilities.
7. Companies need to be prepared to respond to an audit, federal investigation, lawsuit, or any other legal action against them. Types of lawsuits against companies include patent violations, product safety negligence, theft of intellectual property, breach of contract, wrongful termination, harassment, discrimination, and many more.
8. Simply creating backups of records is not sufficient because the content would not be organized and indexed so that records could be retrieved accurately and easily. The requirement to manage records, regardless of whether they are physical or digital, is not new. ERM systems consist of hardware and software that manage and archive electronic documents and image paper documents, then index and store them according to company policy. Properly managed, records are strategic assets. Improperly managed or destroyed, they become liabilities.
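A minimal pandas sketch of the drill-down idea in answer 4, assuming a hypothetical sales table; the column names and figures are invented for illustration.

```python
import pandas as pd

# Hypothetical sales detail records.
sales = pd.DataFrame({
    "region":  ["East", "East", "West", "West", "West"],
    "product": ["A", "B", "A", "A", "B"],
    "revenue": [120.0, 80.0, 200.0, 150.0, 90.0],
})

# Summarized view: consolidated revenue per region.
summary = sales.groupby("region")["revenue"].sum()
print(summary)

# Drill down: the detail numbers behind the "West" total, broken out by product.
west_detail = sales[sales["region"] == "West"].groupby("product")["revenue"].sum()
print(west_detail)
```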

Copyright
Copyright © 2018 John Wiley & Sons, Inc.
All rights reserved. Reproduction or translation of this work beyond that permitted in Section 117 of the 1976 United States Copyright Act without the express written permission of the copyright owner is unlawful. Requests for further information should be addressed to the Permissions Department, John Wiley & Sons, Inc. The purchaser may make back-up copies for his/her own use only and not for distribution or resale. The Publisher assumes no responsibility for errors, omissions, or damages caused by the use of these programs or from the use of the information contained herein.
161
Copyright ©2018 John Wiley & Sons, Inc.

