Saturday, December 6, 2014

Data Pump - Exporting and Importing Directly Across the Network

Sometimes you need to create a testing database and load it with production data. In scenarios like this, the production box is usually located remotely from the development box. Data Pump provides you the ability to take an export and directly import it into your target database without creating any intermediate dump files. This is a fast and efficient way to create new environments from existing environments.

An example will help illustrate how this works. For this example, the production database users are STAR2, CIA_APP, and CIA_SEL. You want to move these users into a testing database and name them STAR_JUL, CIA_APP_JUL, and CIA_SEL_JUL.
This task requires the following high-level steps:

1. Create the users in the destination database that you’ll import into. Here’s a sample script that creates the users in the testing database:

define star_user=star_jul
define star_user_pwd=star_jul_pwd
define cia_app_user=cia_app_jul
define cia_app_user_pwd=cia_app_jul_pwd
define cia_sel_user=cia_sel_jul
define cia_sel_user_pwd=cia_sel_jul_pwd
--
create user &&star_user identified by &&star_user_pwd;
grant connect,resource to &&star_user;
alter user &&star_user default tablespace dim_data;
--
create user &&cia_app_user identified by &&cia_app_user_pwd;
grant connect,resource to &&cia_app_user;
alter user &&cia_app_user default tablespace cia_data;
--
create user &&cia_sel_user identified by &&cia_sel_user_pwd;
grant connect,resource to &&cia_sel_user;
alter user &&cia_sel_user default tablespace cia_data;


2. Create a database link in your testing database that points to your production database. The remote user referenced in the CREATE DATABASE LINK statement must have the DBA role granted to it in the production database. Here’s a sample CREATE DATABASE LINK script:

create database link dk connect to darl identified by foobar using 'dwdb1:1522/dwrep1';
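
After creating the link, it’s worth verifying connectivity before starting the import. A minimal sanity check, assuming the link name dk from the previous statement:

SQL> select sysdate from dual@dk;

If this returns a date, the testing database can reach production over the link.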

3. Create a directory that points to the location where you want your log file to go:

SQL> create or replace directory engdev as '/orahome/oracle/ddl/engdev';

4. Run the import command on the testing box. This command references the remote database via the NETWORK_LINK parameter. It also instructs Data Pump to map the production database user names to the newly created users in the testing database.

$ impdp darl/engdev directory=engdev network_link=dk schemas='STAR2,CIA_APP,CIA_SEL' remap_schema=STAR2:STAR_JUL,CIA_APP:CIA_APP_JUL,CIA_SEL:CIA_SEL_JUL parallel=4

This technique allows you to move large amounts of data between disparate databases without having to create or copy any dump files or datafiles. This is a very powerful Data Pump feature that lets you quickly and efficiently transfer data.

If you don’t have Oracle Net connectivity between the two databases, then the steps to accomplish the same task are as follows (sketched in the commands after this list):

1. Export the production database.
2. Copy the dump file to the testing database.
3. Import the dump file into the testing database.
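
Here’s a minimal sketch of those three steps. It assumes a directory object named dp exists in both databases and maps to /oradump, and that the testing host is named testbox; all of these names are hypothetical:

$ expdp darl/foo directory=dp schemas=STAR2,CIA_APP,CIA_SEL dumpfile=prod.dmp logfile=prod.log
$ scp /oradump/prod.dmp testbox:/oradump
$ impdp darl/foo directory=dp dumpfile=prod.dmp remap_schema=STAR2:STAR_JUL,CIA_APP:CIA_APP_JUL,CIA_SEL:CIA_SEL_JUL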

Data Pump - Listing the Contents of Dump Files - SQLFILE option

Data Pump has a very robust method of creating a file that contains all the SQL that’s executed when an import job runs. Data Pump uses the DBMS_METADATA package to create the DDL that you can use to recreate objects in the Data Pump dump file.

Use the SQLFILE option of Data Pump import to list the contents of a Data Pump export file. This example creates a file named expfull.sql that contains the SQL statements that the import process calls (the file is placed in the directory defined by the DPUMP_DIR2 directory object):

$ impdp hr/hr DIRECTORY=dpump_dir1 DUMPFILE=expfull.dmp SQLFILE=dpump_dir2:expfull.sql

In the previous command, if you don’t specify a separate directory (like dpump_dir2 in this example), then the SQL file is written to the location specified in the DIRECTORY option.

Tip You must run the previous command as a user with DBA privileges or as the schema that performed the Data Pump export. Otherwise, you get an empty SQL file without the expected SQL statements in it.

When you use the SQLFILE option with an import, the impdp process doesn’t import any data. It only creates a file that contains the SQL commands that would be run by the import process. It’s sometimes handy to generate a SQL file for the following reasons:

• To preview and verify the SQL statements before running the import
• To manually run the SQL to pre-create database objects
• To create a DDL file that you can later inspect to see if there are differences in users from exports at different points in time
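
You can also combine SQLFILE with an object filter to extract only part of the DDL. A short sketch, assuming the same dump file and directories as above (the output file name tables_ddl.sql is made up), that writes only the CREATE TABLE statements:

$ impdp hr/hr DIRECTORY=dpump_dir1 DUMPFILE=expfull.dmp INCLUDE=TABLE SQLFILE=dpump_dir2:tables_ddl.sql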

Data Pump - Estimating the Size of Export Jobs

If you’re about to export a large amount of data, you can estimate the size of the file that Data Pump creates before you run the export. You may want to do this because you’re concerned about the amount of space an export job needs.

To estimate the size, use the ESTIMATE_ONLY parameter. This example estimates the size of the export file for an entire database:

$ expdp dbauser/foo estimate_only=y full=y nologfile=y

Here’s a snippet of the output:

Estimate in progress using BLOCKS method...
Processing object type DATABASE_EXPORT/SCHEMA/TABLE/TABLE_DATA
. estimated "REP_MV"."REG_QUEUE_REP" 9.606 GB
. estimated "REP_MV"."CWP_USER_PROFILE" 2.589 GB
. estimated "REP_MV"."PRODUCT_INSTANCES_MV" 1.620 GB
. estimated "REP_MV"."ROLES_MV" 1.550 GB
. estimated "STAR2"."F_DOWNLOADS":"DOWN_P_5" 1.450 GB
. estimated "STAR2"."F_DOWNLOADS":"DOWN_P_11" 1.414 GB


Similarly, you can specify a schema name to get an estimate of the size required to export a user:

$ expdp dbauser/foo estimate_only=y schemas=star2 nologfile=y

Here’s an example of estimating the size required for two tables:

$ expdp dbauser/foo estimate_only=y tables=star2.f_downloads,star2.f_installations nologfile=y
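
As the output above indicates, Data Pump calculates estimates using the BLOCKS method by default. If your dictionary statistics are current, you can request an estimate based on them instead via the ESTIMATE parameter; a sketch using the same hypothetical credentials:

$ expdp dbauser/foo estimate_only=y estimate=statistics schemas=star2 nologfile=y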

Data Pump - Use a Parameter File

Instead of typing commands on the command line, in many situations it’s preferable to store the commands in a file and then reference the file when executing Data Pump export or import. Using parameter files makes tasks more repeatable and less prone to errors. You can place the commands in a file once and then reference that file multiple times.

Additionally, some of the Data Pump commands (like FLASHBACK_TIME) require the use of quotation marks; in these situations, it’s sometimes hard to predict how the operating system will interpret the quotation marks. Whenever a command requires quotation marks, it’s highly preferable to use a parameter file.

To use a parameter file, first create an operating-system text file that contains the commands you want to use to control the behavior of your job. This example uses the Linux/Unix vi command to create a text file named imp.par:

$ vi imp.par

Now, place the following commands in the imp.par file:

userid=darl/foo
directory=dp
dumpfile=invp.dmp
logfile=invp.log
tables=f_sales


Next, reference the parameter file via the PARFILE command-line option:

$ impdp parfile=imp.par

Data Pump import processes the parameters in the file as if they were typed on the command line. If you find yourself repeatedly typing the same commands, then consider using a parameter file to increase your efficiency.
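
As an example of the quoting scenario mentioned earlier, here’s a hedged sketch of an export parameter file that uses FLASHBACK_TIME (the file names and timestamp are made up); placing the parameter in the file sidesteps operating-system quote interpretation entirely:

userid=darl/foo
directory=dp
dumpfile=inv_flash.dmp
logfile=inv_flash.log
tables=f_sales
flashback_time="to_timestamp('06-12-2014 08:00:00','dd-mm-yyyy hh24:mi:ss')"

Running expdp parfile=exp.par then exports the table as of the specified point in time.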

Thursday, December 4, 2014

Data Pump - Exporting Data

A small amount of setup is required when you run a Data Pump export job. Here are the steps:

1. Create a database-directory object that points to an operating-system directory that you want Data Pump to write files to (expdp) or read files from (impdp).
2. Grant read, write on the directory to the database user running the export.
3. From the operating-system prompt, run the expdp utility.

Creating a Database Directory
Before you run a Data Pump job, you must create a database-directory object that corresponds to a physical location on disk, specifying where the dump files and log files are created. Use the CREATE DIRECTORY command to accomplish this. The following command creates a directory named dp_dir and maps it to the /oradump physical location on disk:

SQL> create directory dp_dir as '/oradump';

To view the details of the newly created directory, issue this query:

SQL> select owner, directory_name, directory_path from dba_directories;

When you install Oracle, a default directory object named DATA_PUMP_DIR is created. If you don’t specify the DIRECTORY parameter when exporting or importing, Oracle attempts to use this default database-directory object. The location associated with DATA_PUMP_DIR varies depending on the version of Oracle. On some systems, it may be ORACLE_HOME/rdbms/log; on others, it may point to ORACLE_BASE/admin/ORACLE_SID/dpdump. Inspect DBA_DIRECTORIES to verify the default location on your system.
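
A quick way to check where DATA_PUMP_DIR points on your system:

SQL> select directory_path from dba_directories where directory_name = 'DATA_PUMP_DIR';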

Granting Access to the Directory
You need to grant permissions on the database-directory object to a user that wants to use Data Pump. Use the GRANT statement to allocate the appropriate privileges. If you want a user to be able to read from and write to the directory, you must grant security access as follows:

SQL> grant read, write on directory dp_dir to darl;

All directory objects are owned by the SYS user. If you’re using a user account that has the DBA role granted to it, then you have the read and write privileges on the directory object.

Taking an Export
When the directory object and grants are in place, you can use Data Pump to export information from a database. DBAs typically use exports for point-in-time backups of data and metadata. You can use these exports to either restore database objects or move data to different database environments. Suppose you recently created a table and populated it with data:

SQL> create table inv(inv_id number);
SQL> insert into inv values (123);

Now, you want to export the table. This example uses the previously created directory named DP_DIR. Data Pump uses the directory path specified by the directory object as the location on disk to write the dump file and log file:

$ expdp darl/foo directory=dp_dir tables=inv dumpfile=exp.dmp logfile=exp.log

The expdp job creates a file named exp.dmp in the /oradump directory that contains the information required to re-create the INV table and populate it with data as of the time the export was taken. In addition, a log file named exp.log is created in the /oradump directory that contains all the logging information associated with this export job.

If you don’t specify a dump-file name, Data Pump creates a file named expdat.dmp. If a file named expdat.dmp already exists in the directory, then Data Pump throws an error. If you don’t specify a log-file name, then Data Pump creates one named export.log. If a file named export.log already exists, then Data Pump overwrites it.
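
If you’d rather overwrite an existing dump file than receive an error, Oracle Database 11g and later offer the REUSE_DUMPFILES parameter; a short sketch based on the earlier export:

$ expdp darl/foo directory=dp_dir tables=inv dumpfile=exp.dmp reuse_dumpfiles=y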

Data Pump Architecture

Data Pump consists of the following components:

• expdp (Data Pump export utility)
• impdp (Data Pump import utility)
• DBMS_DATAPUMP PL/SQL package (Data Pump API)
• DBMS_METADATA PL/SQL package (Data Pump Metadata API)

The expdp and impdp utilities use the DBMS_DATAPUMP and DBMS_METADATA built-in PL/SQL packages when exporting and importing data and metadata. The DBMS_DATAPUMP package moves entire databases or subsets of data between database environments. The DBMS_METADATA package exports and imports information about database objects.

Note You can call the DBMS_DATAPUMP and DBMS_METADATA packages independently (outside of expdp and impdp) from SQL*Plus. I rarely call these packages directly from SQL*Plus; but you may have a specific scenario where it’s desirable to interact directly with them.
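
For example, you can call DBMS_METADATA directly from SQL*Plus to generate the DDL for an object. A minimal sketch, assuming the INV table created earlier exists in your schema:

SQL> select dbms_metadata.get_ddl('TABLE','INV') from dual;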

When you start a Data Pump export or import job, a master operating-system process is initiated on the database server. This master process name has the format ora_dmNN_<SID>. On Linux/Unix systems, you can view this process from the operating-system prompt using the ps command:

$ ps -ef | grep ora_dm
oracle 14950 717 0 10:59:06 ? 0:10 ora_dm00_STAGE


Depending on the degree of parallelism and the work specified, a number of worker processes are also started. The master process coordinates the work between master and worker processes. The worker process names have the format ora_dwNN_<SID>.

Also, when a user starts an export or import job, a database status table is created (owned by the user who starts the job). This table exists for the duration of the Data Pump job. The name of the status table is dependent on what type of job you’re running. The table is named with the format SYS_<OPERATION>_<JOB_MODE>_NN, where OPERATION is either EXPORT or IMPORT. JOB_MODE can be FULL, SCHEMA, TABLE, TABLESPACE, and so on.

For example, if you’re exporting a schema, a table is created in your account with the name SYS_EXPORT_SCHEMA_NN, where NN is a number that makes the table name unique in the user’s schema. This status table (also referred to as the master table) contains information such as the objects exported/imported, start time, elapsed time, rows, error count, and so on. The status table has over 80 columns.

The status table is dropped by Data Pump upon successful completion of an export or import job. If you use the KILL_JOB interactive command, the status table is also dropped. If you stop a job with the STOP_JOB interactive command, the table isn’t removed and is used in the event you restart the job. If your job terminates abnormally, the status table is retained. You can delete the status table if you don’t plan to restart the job.
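
While a job is running (or stopped but restartable), you can monitor it through the DBA_DATAPUMP_JOBS data dictionary view:

SQL> select owner_name, job_name, operation, job_mode, state from dba_datapump_jobs;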

When Data Pump runs, it uses a database-directory object to determine where to write and read dump files and log files. Usually, you specify which directory object you want Data Pump to use. If you don’t specify a directory object, a default is used. A Data Pump export creates an export file and a log file. The export file contains the objects being exported. The log file contains a record of the job activities.

Below shows the architectural components related to a Data Pump export job. Here’s how you initiate the job from the command line:

$ expdp user/pwd dumpfile=exp.dmp logfile=exp.log directory=dp_dir

In this example, a database-directory object named DP_DIR is defined to reference the /oradump operating-system directory. The output files are defined via the command line to be exp.dmp and exp.log.

[Figure: architectural components of a Data Pump export job]

Below displays the architectural components of a Data Pump import job. Here’s how you initiate the job from the command line:

$ impdp user/pwd dumpfile=exp.dmp logfile=imp.log directory=dp_dir

In the above command, the Data Pump import reads from a dump file named exp.dmp, which is located in the operating-system directory referenced by the database-directory object named DP_DIR. The import job reads the dump file and populates database objects.

[Figure: architectural components of a Data Pump import job]

For each Data Pump job, you must ensure that you have access to a directory object.

Data Pump

Data Pump was introduced in Oracle Database 10g. It replaces the older exp/imp utilities. Data Pump enables you to quickly move data and/or metadata from one environment to another. You can use Data Pump in a variety of ways:

• Point-in-time logical backups of the entire database or subsets of data
• Replicating entire databases or subsets of data for testing or development
• Quickly generating Data Definition Language (DDL) required to re-create objects

Sometimes DBAs hold on to the old exp/imp utilities because they’re familiar with the syntax and these utilities get the job done quickly. Even if those legacy utilities are easy to use, you should consider using Data Pump going forward. Data Pump offers substantial functionality beyond the old exp/imp utilities:

• Performance with large data sets, allowing you to efficiently export and import gigabytes of data
• Interactive command-line utility, which gives you the ability to disconnect and then later re-attach to active Data Pump jobs (see the sketch after this list)
• Ability to export and import large amounts of data from a remote database directly into a local database without creating a dump file
• Ability to make on-the-fly changes to schemas, tablespaces, datafiles, and storage settings from export to import
• Sophisticated filtering of objects and data
• Security controlled via database-directory objects
• Advanced features such as compression and encryption
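
As an illustration of the re-attach capability noted in the list, you can connect to a running or stopped job by name. A minimal sketch, assuming a schema-level export job with the default-style name SYS_EXPORT_SCHEMA_01 (the job name here is hypothetical):

$ expdp darl/foo attach=SYS_EXPORT_SCHEMA_01

Once attached, you can issue interactive commands such as STATUS, STOP_JOB, or KILL_JOB.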