Send the link below via email or IMCopy
Present to your audienceStart remote presentation
- Invited audience members will follow you as you navigate and present
- People invited to a presentation do not need a Prezi account
- This link expires 10 minutes after you close the presentation
- A maximum of 30 users can follow your presentation
- Learn more about this feature in our knowledge base article
Do you really want to delete this prezi?
Neither you, nor the coeditors you shared it with will be able to recover it again.
Make your likes visible on Facebook?
Connect your Facebook account to Prezi and let your likes appear on your timeline.
You can change this under Settings & Account at any time.
Transcript of Business Intelligence
is a decision support system, whose aim is to help make business decisions, strategic as well as operational/tactical.
Although data warehouses provide data to business intelligence applications, all BI applications are not dependent on data warehouses to provide them with the data that they need.
Is critical for constructing, managing, and using the data warehouse. It provides useful information for locating data stored in a data warehouse.
4.8% in 2009
13% in 2010
17% in 2011
7% in 2012
Top 5 BI Vendors Worldwide
Top 10 CIO Business and Technology Priorities 2012
Can work, and be integrated, with other applications such as enterprise resource planning (ERP) and customer relationship management (CRM).
The use of unstructured data is limited or nonexistent.
Types of data warehouses
Enterprise data warehouse (EDW)
Operational data store (ODS) *
An operational data store (ODS) is a subject-oriented database that contains structured data extracted directly from operational transaction system data sources. Usually, it will contain very little summarized and historical data. An ODS is stored independently of the production system database. It contains current—or near current—data, and its objective is to meet the ad hoc query, tactical day-to-day needs of operational users. An ODS is also used as a staging area before data is imported into a data warehouse. In contrast to a data warehouse, which contains static data, an ODS can be frequently updated from operational systems—even in real-time.
Database tables are grouped together by subject areas
Data is structured into "facts" and "dimensions
Denormalization creates redundant data, in very large tables, that reduces or eliminates the requirement to join multiple tables when a query is run.
A star schema, which is the most-widely used method for designing data warehouse databases
A snowflake schema, which is derived from the star schema, adds a hierarchical structure to the dimension tables. The snowflake schema can be used when the dimension table has subcategories, or more than one level of dimension tables, and more efficient access is required.
intermediate database, an ODS, into which source data is loaded and staged before being sent to its final destination—the data warehouse
A data warehouse system can contain many ODSs.
It is estimated that 50%-75% of the time required to build a data warehouse is spent on ETL, which includes the following tasks:
Analyzing the data
Extracting data from the sources
Conditioning and transforming source data for meeting the requirements (technical and business)
Loading the data
Master Data Management (MDM)
Set of processes, governance, policies, standards and tools that consistently defines and manages the master data (i.e. non-transactional data entities) of an organization
An MDM tool can be used to support Master Data Management
If the data transformation is extremely complex, for example, it might be better to develop in-house custom programs for the ETL process.
The following is a list of well-known ETL tool vendors:
IBM (InfoSphere Information Server, DataStage)
SAP Business Objects (Data Integrator/Data Services)
Microsoft (SQL Server Integration Services)
Oracle Data Integrator (ODI)
Oracle Warehouse Builder (OWB)
SAP (Sybase® ETL)
SAS (SAP Enterprise Data Integration Server)
Adeptia® (Adeptia ETL Suite)
DB Software Laboratory® (Advanced ETL Processor)
Others include Syncsort®, ETI®, iWay Software®, and Open Text®
The following vendors provide open-source ETL tools:
Pentaho® (Pentaho Data Integration)
Jaspersoft® (Jaspersoft ETL)
Talend® (Data Integration)
The following companies provide the leading relational database products:
There are three leading open-source products:
A key data warehouse design decision concerns the database type to be selected: relational database or multidimensional (OLAP) database. The selection of the database type influences the choice of the data access tool, which can be a simple relational query tool, an OLAP tool that provides a multidimensional view of the data, or some other type of specialized decision support tool.
For specialized or complex analysis requirements, multidimensional databases are often preferred.
Spreadsheets such as Excel
Tools that can access multidimensional databases
DSS tools that can perform multidimensional analysis against relational databases
Statistical-analysis tools, which can perform complex statistical analyses
Artificial intelligence and advanced analysis tools
Data visualization tool that consolidates and presents, on a single screen, different items that can be used for reporting, analysis, monitoring, and control
Must answer a business question or resolve a business problem.
Tool that is linked to an organization's business strategy and performance goals
In contrast to a dashboard, which provides the status as of a given moment, a scorecard indicates the progress, or lack thereof, toward specific goals. A scorecard can be displayed on a dashboard.
SAP (Business Objects and Crystal Reports)
IBM (Cognos and SPSS)
Others include LogiXML, Actuate®, and Clarity®
The following are open-source vendors:
Leading vendors of front-end access tools
Technique that analyzes large volumes of data for determining patterns and relationships, using advanced statistical analysis and modeling techniques.
Generate a hypothesis and correlate factors that can be used to analyze and improve the business
The probability of finding patterns is increased as the number of algorithms used is increased.
Aims to discover patterns in the data.
The following characteristics define data marts:
Do not have to be based on the enterprise data model
Are typically limited to one or a few subject areas (such as sales and finance)
Each contain only a small part of an organization's data; enterprise can have many
Can be implemented as a small project and within months
Are easy to design, build, and test; are less expensive to implement and maintain
Have far fewer users than a data warehouse
Can have one or multiple sources—a data warehouse, another data mart, or an OLTP database
Have databases that are far smaller—typically only a few gigabytes
Require simpler hardware and supporting technical infrastructure
Can be implemented by staff with less experience and fewer technical skills
Are typically built by different project teams without a common design, process, tools, hardware, or software
Can be built independently, in a staggered manner, as needs evolve
Can make possible future integration difficult to achieve because of independent construction
Ways to load the data mart
Data is fed from an EDW to the data mart(s); any changes to the EDW are propagated to all associated data marts receiving its feeds.
Data is fed to the data mart(s) by direct extract(s) from the source system(s).
Common data warehouse/data mart implementation approaches
There are two widely used approaches for building data marts:
Top-down: Build an EDW and then construct dependent data marts, which are its highly summarized subsets.
Bottom-up: Build independent data marts, whose foundation is the enterprise data model, which can then be used to construct an EDW.
Is a business intelligence tool that addresses the need to perform multidimensional analysis. It is based on an analytical technique that combines data access tools with an analytical database engine. In contrast to the rows and columns structure of relational databases, OLAP uses a multidimensional view of data
Ability to perform "what if" analysis, a powerful tool that can simulate the effect of decisions.
MOLAP - Multidimensional OLAP
ROLAP - Relational OLAP
HOLAP - Hybrid OLAP
Is a proprietary database that stores data in an array format, in cubes, rather than the normal tables used by relational databases.
Multidimensional data structures can be visualized as cubes of data
Vendors who provide OLAP servers
Oracle (Hyperion and OBIEE)
Microsoft (Microsoft Analysis Services)
MicroStrategy (MicroStrategy OLAP Services)
Jedox AG (Palo OLAP Server)
SAS Institute (SAS OLAP Server)
Information Builders (WebFOCUS)
OLAP tools from vendors
The largest vendors in the OLAP market are Microsoft, Oracle, SAP, IBM, and MicroStrategy. There are many smaller upstart vendors who are introducing products, with some supporting SaaS, including Birst™, Oco, and GoodData®.
The worldwide business intelligence analytics and performance management revenue growth
is a large analytical database, which is populated from a variety of source systems that typically run the business
It contains transaction as well as non-transaction data and is designed for querying, reporting, and analysis.
Expensive to implement and maintain
It acts as the corporate-centralized repository where consistent, detailed, summarized, current, as well as historical data is stored.
Implementing quick changes is not easy
The data is organized according to subjects such as customer, vendor, bookings, orders, and products. This contrasts with classical applications that are organized by business functions such as loans, finance, inventory, and the like.
The major subject areas are physically implemented as a series of related database tables.
Designing datawarehouse databases
These are databases where data is staged, integrated, and stored, prior to being made available through data warehouses and data marts.
Cubes can dramatically simplify access to the data in the warehouse, as well as provide excellent performance.
* Is a somewhat integrated copy of operational data that is frequently updated. Also provides a platform for you to feed operational data into the data warehouse.