SHOP
Shop!
DISCOVER
Library
DISCUSS
Forums
 

2009/08/27

Building an Oracle Data Vault – Part 3: Loading HUBS AND LINKS

Filed under: DV Model & Methodology — Bruce McCartney @ 08:18

Here we will load the HUBS and LINKS using simple Oracle PL/SQL.  Loading HUBS is pretty straight forward.  Following is a flow diagram.  Using PL/SQL, I will use the Oracle MERGE command to perform a set-based UPSERT.  This technique scales well.

Here is a code snippet in (PL/SQL) , note that we only have one STG source for this HUB.  If there were more, I simply UNION all the sources together in the SELECT.

MERGE INTO HUB_GLOBAL_NAME hub
USING
(
SELECT DISTINCT
GLOBAL_NAME AS GLOBAL_NAME,
STG_DATA_SOURCE, STG_LAST_UPDATE_DTTM
FROM STG_ALL_TABLES
) stg
ON (hub.GLOBAL_NAME = stg.GLOBAL_NAME)
WHEN MATCHED THEN
UPDATE
SET
hub.HUB_LAST_SEEN_DTTM = hub.HUB_LOAD_DTTM
where (HUB_LAST_SEEN_DTTM != HUB_LOAD_DTTM)
WHEN NOT MATCHED THEN
INSERT
(
HUB_GLOBAL_NAME_SQN, GLOBAL_NAME, HUB_LOAD_DTTM,
HUB_DATA_SOURCE,HUB_LAST_SEEN_DTTM
)
VALUES (
HUB_GLOBAL_NAME_SQN.NEXTVAL,
stg.GLOBAL_NAME,
stg.STG_LAST_UPDATE_DTTM,
stg.STG_DATA_SOURCE,
stg.STG_LAST_UPDATE_DTTM
)
;

I will comment on DATEs used later, suffice to say the LOAD_DTTM comes from the STAGE table and the LAST_SEEN_DTTM is an indicator that show the date of the last point in time the Business Key(s) were not found in ANY feed.  Loading LINKS is very similar to HUBs; the source for the mapping is a list of distinct combinations of keys.  Again, the Oracle MERGE can do the trick.

MERGE INTO LNK_ALL_TABLES lnk
USING
(
SELECT /*+ RULE */  DISTINCT
HUB_GLOBAL_NAME.HUB_GLOBAL_NAME_SQN,
HUB_OWNER.HUB_OWNER_SQN,
HUB_TABLE_NAME.HUB_TABLE_NAME_SQN, STG_ALL_TABLES.STG_LAST_UPDATE_DTTM, STG_ALL_TABLES.STG_DATA_SOURCE
FROM STG_ALL_TABLES,
HUB_GLOBAL_NAME,
HUB_OWNER,
HUB_TABLE_NAME
WHERE
STG_ALL_TABLES.GLOBAL_NAME = HUB_GLOBAL_NAME.GLOBAL_NAME
AND HUB_GLOBAL_NAME.HUB_LOAD_DTTM = HUB_GLOBAL_NAME.HUB_LAST_SEEN_DTTM
AND STG_ALL_TABLES.OWNER = HUB_OWNER.OWNER
AND HUB_OWNER.HUB_LOAD_DTTM = HUB_OWNER.HUB_LAST_SEEN_DTTM
AND STG_ALL_TABLES.TABLE_NAME = HUB_TABLE_NAME.TABLE_NAME
AND HUB_TABLE_NAME.HUB_LOAD_DTTM = HUB_TABLE_NAME.HUB_LAST_SEEN_DTTM
) stg
ON (
stg.HUB_GLOBAL_NAME_SQN = lnk.HUB_GLOBAL_NAME_SQN
and stg.HUB_OWNER_SQN = lnk.HUB_OWNER_SQN
and stg.HUB_TABLE_NAME_SQN = lnk.HUB_TABLE_NAME_SQN
)
WHEN MATCHED THEN
UPDATE
SET
lnk.LNK_LAST_SEEN_DTTM    = lnk.LNK_LOAD_DTTM
WHERE   (LNK_LAST_SEEN_DTTM != LNK_LOAD_DTTM)
WHEN NOT MATCHED THEN
INSERT
(
LNK_ALL_TABLES_SQN,HUB_GLOBAL_NAME_SQN,HUB_OWNER_SQN,HUB_TABLE_NAME_SQN,LNK_LAST_SEEN_DTTM,LNK_LOAD_DTTM,LNK_DATA_SOURCE)
VALUES (
LNK_ALL_TABLES_SQN.NEXTVAL,
stg.HUB_GLOBAL_NAME_SQN,
stg.HUB_OWNER_SQN,
stg.HUB_TABLE_NAME_SQN,
stg.STG_LAST_UPDATE_DTTM, stg.STG_LAST_UPDATE_DTTM, stg.STG_DATA_SOURCE
);

This approach scales very nicely too.  Next time I will load the SATELLITE srtucture.

2009/08/09

Building an Oracle Data Vault – Part 2: the data vault model

Filed under: DV Model & Methodology — Bruce McCartney @ 21:59

In Part One, we defined our STAGE environment.  Now we need to develop a Data Vault Model.

Following the methodology, we end up with the following HUBs representing 3 “business keys” – OWNER, TABLE_NAME and GLOBAL_NAME.:

CREATE TABLE HUB_TABLE_NAME (

“HUB_TABLE_NAME_SQN” number(38) NOT NULL ,

“TABLE_NAME” varchar2(30) NOT NULL ,

“HUB_LAST_SEEN_DTTM” date NOT NULL , “HUB_LOAD_DTTM” date NOT NULL ,

“HUB_DATA_SOURCE” varchar2(60) NOT NULL ,

CONSTRAINT PH_TABLE_NAME

PRIMARY KEY (HUB_TABLE_NAME_SQN)

)

CREATE TABLE HUB_GLOBAL_NAME (

“HUB_GLOBAL_NAME_SQN” number(38) NOT NULL ,

“GLOBAL_NAME” varchar2(30) NOT NULL ,

“HUB_LOAD_DTTM” date NOT NULL , “HUB_LAST_SEEN_DTTM” date NOT NULL ,

“HUB_DATA_SOURCE” varchar2(60) NOT NULL ,

CONSTRAINT PH_GLOBAL_NAME

PRIMARY KEY (HUB_GLOBAL_NAME_SQN)

)

CREATE TABLE HUB_OWNER (

“HUB_OWNER_SQN” number(38) NOT NULL ,

“OWNER” varchar2(30) NOT NULL ,

“HUB_LAST_SEEN_DTTM” date NOT NULL , “HUB_LOAD_DTTM” date NOT NULL ,

“HUB_DATA_SOURCE” varchar2(60) NOT NULL ,

CONSTRAINT PH_OWNER

PRIMARY KEY (HUB_OWNER_SQN)

)

/

Next we have a LINK table associating these 3 HUBS:

CREATE TABLE LNK_ALL_TABLES (

“LNK_ALL_TABLES_SQN” number(38) NOT NULL ,

“HUB_GLOBAL_NAME_SQN” number(38) NOT NULL ,

“HUB_OWNER_SQN” number(38) NOT NULL ,

“HUB_TABLE_NAME_SQN” number(38) NOT NULL ,

“LNK_LAST_SEEN_DTTM” date NOT NULL ,

“LNK_LOAD_DTTM” date NOT NULL ,

“LNK_DATA_SOURCE” varchar2(60) NOT NULL ,

CONSTRAINT PL_ALL_TABLES

PRIMARY KEY (LNK_ALL_TABLES_SQN),

CONSTRAINT FL_1_ALL_TABLES

FOREIGN KEY (HUB_GLOBAL_NAME_SQN)

REFERENCES HUB_GLOBAL_NAME

(HUB_GLOBAL_NAME_SQN),

CONSTRAINT FL_2_ALL_TABLES

FOREIGN KEY (HUB_OWNER_SQN)

REFERENCES HUB_OWNER

(HUB_OWNER_SQN),

CONSTRAINT FL_3_ALL_TABLES

FOREIGN KEY (HUB_TABLE_NAME_SQN)

REFERENCES HUB_TABLE_NAME

(HUB_TABLE_NAME_SQN)

)

And finally a SATELLITE table for the LINK containing descriptive columns:

CREATE TABLE SAT_ALL_TABLES (

“LNK_ALL_TABLES_SQN” number(38) NOT NULL ,

“SAT_LOAD_DTTM” date NOT NULL ,

“SAT_LOAD_END_DTTM” date NULL ,

“TABLESPACE_NAME” varchar2(30) NULL ,

“NUM_ROWS” number NULL ,

“LAST_ANALYZED” date NULL ,

“SAT_DATA_SOURCE” varchar2(60) NOT NULL ,

CONSTRAINT PS_ALL_TABLES

PRIMARY KEY (LNK_ALL_TABLES_SQN,SAT_LOAD_DTTM),

CONSTRAINT FS_ALL_TABLES

FOREIGN KEY (LNK_ALL_TABLES_SQN)

REFERENCES LNK_ALL_TABLES

(LNK_ALL_TABLES_SQN)

)

Next time, we will start loading the data from STAGE.  I will be demonstrating PL/SQL based loading routines – the principles are not database specific.

2009/08/05

Building an Oracle Data Vault – Part 1: the data source

Filed under: DV Model & Methodology — Tags: — Bruce McCartney @ 19:51
At a customer site recently, I was asked to begin building a System Metrics Data Vault. This is a data vault containing metrics about the system, and a good example to get you started as the ’source’ for this data vault is the Oracle data dictionary (dba_tables system view). The objective is to maintain the table row count over time, so here are the steps I went through for this exercise.
Step 1 – Source system analysis
I want to create a very simple data vault to track all tables in all databases. The source model would then be:

CREATE TABLE all_tables
(
global_name VARCHAR2(30 BYTE) NOT NULL,
OWNER VARCHAR2(30 BYTE) NOT NULL,
TABLE_NAME VARCHAR2(30 BYTE) NOT NULL,
TABLESPACE_NAME VARCHAR2(30 BYTE),
NUM_ROWS NUMBER,
LAST_ANALYZED DATE
)
;
ALTER TABLE all_tables ADD (
PRIMARY KEY
(global_name,OWNER, TABLE_NAME)
);
Note that I have extended the DBA_TABLES view with the column GLOBAL_NAME. This allows me to build a database independent vault for all databases in my enterprise.
Step 2 – Stage Area
The data vault architecture requires a STAGE area. In this case, I have used a simple view to create this stage area – this approach can then be used in conjunction with database links to pull data from multiple databases. Note that special access must be granted to create views of data dictionary tables, and to select accross database links.
CREATE OR REPLACE VIEW STG_ALL_TABLES
(GLOBAL_NAME, OWNER, TABLE_NAME, TABLESPACE_NAME, NUM_ROWS,
LAST_ANALYZED, STG_LAST_UPDATE_DTTM, STG_DATA_SOURCE)
AS
select global_name,
“OWNER”,
“TABLE_NAME”,
“TABLESPACE_NAME” ,
“NUM_ROWS” ,
“LAST_ANALYZED” ,
sysdate as “STG_LAST_UPDATE_DTTM”,
‘ALL_TABLES’ as “STG_DATA_SOURCE”
from all_tables, global_name
/
Our stage area is now ready to be used as a source for the data vault.  In Part 2, I’ll put a small DV model around this.

2009/08/04

Data Vault Experiences

Filed under: DV Model & Methodology — Bruce McCartney @ 09:40

I am finally going to get this done.  A blog on Data Vault Experiences.  I hope to publish various ideas and personal experience using the Data Vault here.   Please feel free to comment, and thanks to Dan for getting the Data Vault Institute going.

My specialization is with Oracle database.  I have been working with Oracle for 20 years, formally with Oracle Corporation.  Much of my experiences are with Oracle data warehouses, and I have used PL/SQL to build most of them.  I have also used Informatica and Oracle Warehouse Builder as well, but if you are an Oracle shop – PL/SQL works just fine.

From a Data Vault point of view, I am a certified modeller (from Dan’s company RapidACE) and have been building Data Vaults for almost 4 years.  I also am a certified partner with RapidACE in using their tools.
My plan is to start with a 5 series of posts walking through a simple example of a data vault I build and the process used.

Powered by WordPress