DataStage tip: using job parameters without losing your mind

2008-09-19 17:18
A job parameter in the ETL environment is much like a parameter in other products: it lets you change the way a job behaves at run time by tweaking parameter values rather than the job itself. This constant changing can either ease your support burden or drive your support staff mad.

In this blog we will be looking at project-specific environment variables and how they become happy job parameters. This is also my first blog post with diagrams to show the steps.

This
is a happy parameter method because an administrator or support person
can modify the value of a parameter, such as a database password, and
every job in production will run with the new value. Yippee!

If
you have trouble with the pictures below you can read the alternative
text (which isn't nearly as good) or go straight to the photo album.

Objectives
A job parameter is a way to change a property within a job without having to alter and recompile it. The trick to job parameters is to manage them so the values are easy to maintain and there are not so many manual maintenance steps that an incorrect value can slip into a job run.
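As a quick illustration of the "without recompiling" part, the same compiled job can be run with different values from the dsjob command line client (the project, job and parameter names here are made up for the example):

dsjob -run -param DB_USER=etl_dev -param DB_TXSIZE=2000 -jobstatus dwproject LoadCustomerDim

The -param options override the defaults stored in the job design, so nothing needs to be changed in Designer to run against a different database.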

Creating Project Specific Environment Variables
Finding the place to set these variables is a bit of a Minotaur's maze; we can only hope that under DataStage Hawk we get a shortcut to this screen for the benefit of admin staff. Here is the sequence of buttons:
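Roughly, and the exact labels may vary by release:
* Open the DataStage Administrator and pick the project on the Projects tab.
* Click Properties, then on the General tab click the Environment... button.
* Add your entries under the User Defined branch of the dialog that opens.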



Type in all the required job parameters that are going to be shared between jobs. Note that the passwords are given the Encrypted type and will only show up as asterisks here and in any job logs:
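A typical set looks something like the list below; apart from DW_DB_INST these names and values are just an illustration, not the exact entries from the project:

DW_DB_INST      String      DB2DEV
DW_DB_USER      String      etl_user
DW_DB_PWD       Encrypted   ********
DW_DB_TXSIZE    String      2000
SRC_FILE_DIR    String      /data/dev/source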



* Database login values are job parameters as they change when you move from development through to production, and database passwords should be modified on a regular timetable (though some sites never change them!).
* File directories can change when you move from development to production and they can change if new disk partitions are added.
* DB2 settings such as transaction size can be changed by support to test different performance settings or to match modifications to the database setup.

The problem with these settings is that they all sit in the one folder, which makes them a bit cumbersome to administer and to find from within a job. Folder names can be added by hacking the DSParams file found in the project home directory:



This looks a lot neater. Note how only the value field is now modifiable: moving these values into folders prevents the entries from accidentally being deleted by administrators, as they become protected entries. This is a risky modification if you attempt it directly in production, as you may corrupt your DSParams file, so always take a backup. The modification is done by adding a folder name into the file entry:
[EnvVarDefns]
DW_DB_INST/User Defined/-1/String//0/Project/DW DB Client Instance

Changed to:
[EnvVarDefns]
DW_DB_INST/User Defined/DB DW/-1/String//0/Project/DW DB Client Instance

The hack is the extra "DB DW" in the second field of that line, which adds the sub-directory "DB DW" under the folder called "User Defined". I do not think IBM support will be too thrilled with this hack; it would really help if the next version had folder support built in.
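If you do hack it, a minimal sketch of a safe approach on the server (the project path here is hypothetical, adjust it for your install):

cd /opt/Ascential/DataStage/Projects/dwproject   # hypothetical project home directory
cp DSParams DSParams.backup                      # take the backup first
vi DSParams                                      # then add the folder name to the [EnvVarDefns] lines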

Adding to a job
Now that we have our variables for the project we can add them to our jobs. They are added via the standard job parameters form and the verbose but slightly misleading "Add Environment Variable..." button, which doesn't add an environment variable but rather brings an existing environment variable into your job as a job parameter:



The job parameters need to be set to the magic word "$PROJDEF" to ensure they are dynamically set each time the job is run. Think of it as the DataStage equivalent of abracadabra. When the job runs and finds a job parameter value of $PROJDEF it goes looking for the value within the DSParams file and sets it. Even the password is set to $PROJDEF; use cut and paste into the encrypted password entry field instead of trying to type it, to avoid mistakes:
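The end result on the parameters grid looks something like this sketch (using the example names from earlier; environment variable parameters pick up a leading $ in the job):

$DW_DB_INST     String      $PROJDEF
$DW_DB_USER     String      $PROJDEF
$DW_DB_PWD      Encrypted   $PROJDEF
$DW_DB_TXSIZE   String      $PROJDEF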



These job properties can now be used in a DB2 database stage:
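Inside the stage they are referenced with the usual # notation, keeping the leading $ because they are environment variable parameters. The property names below are approximate, but the pattern is the point:

Client Instance Name  =  #$DW_DB_INST#
User                  =  #$DW_DB_USER#
Password              =  #$DW_DB_PWD#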



Low Administration
So
now the job will dynamically pick up job parameter values using
whatever has been entered by the Administrator for that parameter. The
parameter values on the individual jobs never need to be changed.

The View Data Drawback
In the current version you cannot easily use the database or sequential file "View Data" option, as it fails with the magic word $PROJDEF: it treats it as a normal word and does not do the magic substitution. It becomes a pain as you find yourself repeatedly swapping the magic word for real values manually each time you want to View Data. Since I use View Data all the time in development (it is a fast way to make sure a stage is set up correctly), I use normal job parameters at the parallel/server job level and environment variable job parameters at the sequence job level.

The sequence job maps the project-specific environment variables into normal job parameters:
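A sketch of the mapping in the Job Activity stage that calls the parallel job (the parallel job parameter names are illustrative):

DB_INST   =  $DW_DB_INST
DB_USER   =  $DW_DB_USER
DB_PWD    =  $DW_DB_PWD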



In this example the database login values required for View Data to work have been turned into normal job parameters within the parallel job. The two DB2 properties are still $ parameters within the job because they do not affect View Data. PROCESS_ID and PROCESS_DATE are both fully dynamic parameters set by some BASIC code within the sequence job's User Variables stage. They are set to a counter ID and the current processing date.
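The expressions behind those two are plain DataStage BASIC; a minimal sketch, where Oconv and Date are real BASIC functions but NextBatchId stands in for whatever routine hands out the counter:

PROCESS_DATE  =  Oconv(Date(), "D-YMD[4,2,2]")
PROCESS_ID    =  NextBatchId()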

The restriction on this design is that users should only run jobs via a sequence job; if they try to run a job directly, the normal job parameters may have incorrect default values.

Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.