DataStage Parallel routines made really easy
2008-09-19 17:11
DataStage is a powerful ETL tool with many built-in stages and routines
that together cover most of the functionality you will need. For the things
DataStage EE cannot do, you can write parallel routines in C++.
This primer shows you how to create a parallel routine in a few minutes,
whether or not you are a C/C++ programmer. To write really good code,
though, you will probably need to learn some C++ programming.
"Starting C programming with Linux" is a good resource to start with.
Before we begin, a few points to note:
Parallel routines are C++ components built and compiled outside
DataStage. Note: they must be compiled as C++ components, not C.
The C++ program must not contain main(). Compile it with the compiler
options specified in "APT_COMPILEOPT" (which can be found in the
parameter settings in DataStage Administrator) to create an object (*.o)
file. The result is runtime library code: compiled code without main(),
i.e. not a self-contained executable.
The compiler and compiler options can be found in:
DataStage --> Administrator --> Properties --> Environment --> Parallel --> Compiler
Ex: compiler = g++
compiler options = -O -fPIC -Wno-deprecated -c
Compile command syntax:
{compiler} {compiler options} {filename with extension}
Ex: g++ -O -fPIC -Wno-deprecated -c {filename with extension}
Here's the typical sequence of steps for creating a DataStage parallel routine:
Create --> Compile --> Link --> Execute
1) Create
Create a C++ program with main()
Test it and, if it works, remove main()
2) Compile
Compile using the compiler options specified in "APT_COMPILEOPT" to
create an object (*.o) file, then put the object file in a directory of
your choice (e.g. a directory on the library path); you will need this
path in the Link step. Note: the compiler and compiler options can be
found in "DataStage --> Administrator --> Properties --> Environment
--> Parallel --> Compiler".
3) Link
Link the object (*.o) file to a DataStage parallel routine by making the relevant entries in the General tab:
Routine Name: {Parallel Routine Name}
Type: External Function
Object Type: Object / Library
External subroutine name: {function name defined in your C++ program}
Library Path: {directory from step 2) Compile + object (*.o) file name}
Also specify the Return Type, and if the routine takes input parameters, define them in the Arguments tab.
4) Execute
Your parallel routine is now available to your jobs. Call it in a job, then compile and run the job.
Step-by-step example:
Creating an object file
1) Create a C++ program with main()
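Create a text file with a .cpp extension (Ex: OBJTEST.cpp).
Ex:
    #include <stdio.h>

    int main()
    {
        // const: string literals are read-only in C++
        const char* OutStr = "Hello World - Object Testing";
        // Print via "%s" instead of using the text as the format string
        printf("%s\n", OutStr);
        return 0;
    }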
Test this program.
Copy the compiler name from
"DataStage --> Administrator --> Properties --> Environment --> Parallel --> Compiler"
and compile the C++ program:
Syntax: g++ program.cpp -o program
Ex: g++ OBJTEST.cpp -o OBJTEST
Run it with the following command:
Syntax: ./program
Ex: ./OBJTEST
Output --> Hello World - Object Testing
If you get this output, the program ran successfully.
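Rewrite the program without main():
Ex:
    // Replaces main(); DataStage calls this function by name.
    char* ObjTestOne()
    {
        // Static array: the pointer stays valid after the function returns,
        // and it is modifiable, so it matches the char* return type.
        static char OutStr[] = "Hello World - Object Testing";
        return OutStr;
    }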
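Rather than deleting main() by hand once the test passes, you could keep the test driver in the same file behind a preprocessor guard. This is a variation on the steps above, not part of the original procedure, and the macro name OBJTEST_STANDALONE is made up for illustration. Compiling with g++ -DOBJTEST_STANDALONE OBJTEST.cpp -o OBJTEST should build the test executable, while the APT_COMPILEOPT command from the Compile step (which does not define the macro) leaves main() out of the object file.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// The routine DataStage calls by its external subroutine name.
char* ObjTestOne()
{
    // Static array: the pointer stays valid after the function returns.
    static char OutStr[] = "Hello World - Object Testing";
    return OutStr;
}

#ifdef OBJTEST_STANDALONE  // hypothetical macro, defined with -DOBJTEST_STANDALONE
// Test driver: compiled only into the standalone test build,
// so the object file built for DataStage has no main().
int main()
{
    const char* result = ObjTestOne();
    assert(std::strcmp(result, "Hello World - Object Testing") == 0);
    std::printf("%s\n", result);
    return 0;
}
#endif
```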
2) Compile the program
Get the compiler and compiler options from:
DataStage --> Administrator --> Properties --> Environment --> Parallel --> Compiler
Ex: compiler = g++
compiler options = -O -fPIC -Wno-deprecated -c
Compile command syntax:
{compiler} {compiler options} {filename with extension}
Ex: g++ -O -fPIC -Wno-deprecated -c {filename with extension}
Run the following command:
g++ -O -fPIC -Wno-deprecated -c OBJTEST.cpp
This creates an object file with a .o extension --> Ex: OBJTEST.o
Move the object file to a library directory of your choice:
Ex: /datastage/Ascential/DataStage/PXEngine/lib
I usually put it in the "lib" directory. You can find your "lib" directory among the paths listed in LD_LIBRARY_PATH.
3) Link
Link the object (*.o) file to a DataStage parallel routine.
In the repository, right-click and choose "New parallel routine", then make these entries in the General tab:
Routine Name: {Parallel Routine Name} Ex: OBJECTTEST
Type: External Function
Object Type: Object
External subroutine name: {Function Name specified inside your C++ program}
Ex: ObjTestOne (this is the function that replaced main(), i.e. char * ObjTestOne())
Library Path: {directory from the Compile step + object (*.o) file name}
Ex: /datastage/Ascential/DataStage/PXEngine/lib/OBJTEST.o
Return Type: char*
Note: As the routine takes no input parameters, we make no entries in the Arguments tab.
Now save and close the window.
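The example routine takes no input. If yours does, the C++ function's parameter list has to line up with what you enter in the Arguments tab. Below is a minimal sketch of a routine with one integer input that returns char*; the name ObjTestRowLabel and its behaviour are hypothetical. In the routine definition you would set Return Type to char* and add one integer argument (use whichever integer type name your DataStage version offers). The point of the sketch is where the returned text lives: a local array is destroyed when the function returns, so the result is written into a static buffer instead. That buffer is overwritten on every call, so this assumes the returned text is copied (for example into the output column) before the routine is called again.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical routine: one int input, char* result ("Row <n>").
char* ObjTestRowLabel(int rowNum)
{
    // Static buffer: outlives the call, unlike a local array.
    // Each call overwrites the previous text.
    static char buffer[32];
    int written = std::snprintf(buffer, sizeof buffer, "Row %d", rowNum);
    // "Row " plus a typical 32-bit int fits easily; check for truncation anyway.
    assert(written >= 0 && written < static_cast<int>(sizeof buffer));
    (void)written; // avoids an unused-variable warning if NDEBUG disables assert
    return buffer;
}
```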
4) Execute
Create a test job that calls this parallel routine.
Ex: Row Generator --> Transformer --> Sequential File
In the Transformer, call this routine in an output column's derivation. Then compile and run the job.
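Before wiring the routine into a job, you could run a quick local check that mimics what the test job does: evaluate the derivation once per generated row and copy the returned string into an output column. This is a rough stand-in written for illustration (RunDerivation is a hypothetical helper), not a model of how the DataStage engine actually loads or calls the routine.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// A version of the example routine ObjTestOne.
char* ObjTestOne()
{
    static char OutStr[] = "Hello World - Object Testing";
    return OutStr;
}

// Hypothetical helper: rough local stand-in for Row Generator --> Transformer,
// evaluating the output column derivation once per generated row.
std::vector<std::string> RunDerivation(std::size_t rowCount)
{
    std::vector<std::string> outputColumn;
    outputColumn.reserve(rowCount);
    for (std::size_t row = 0; row < rowCount; ++row) {
        const char* value = ObjTestOne();
        // Building a std::string from a null pointer is undefined behaviour.
        assert(value != nullptr);
        outputColumn.push_back(value); // copy the returned C string
    }
    return outputColumn;
}
```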