Wednesday, December 31, 2014

ASM disk group administration-Creating a disk group

In general, once an ASM instance completes the ASM disk discovery operation, you can use the ASM disks either to build a new disk group or to add them to a pre-existing disk group. You need at least one eligible ASM disk (with a HEADER_STATUS of CANDIDATE, FORMER, or PROVISIONED) to build a disk group or to add to a pre-existing disk group. There are various methods to create a disk group, such as DBCA (pre 11g R2), ASMCA (with 11g R2), Grid Control, and the CREATE DISKGROUP SQL statement at the SQL*Plus prompt. Although the easiest and most convenient way to build and manage a disk group is to use a GUI tool such as DBCA, ASMCA, or Grid Control, we are going to demonstrate how to create and manage a disk group using a set of SQL statements.

In order to create a disk group on the local ASM instance, you first need to identify the eligible ASM disks discovered by the instance. The following commands list the eligible ASM disks:

export ORACLE_SID=+ASM1
export ORACLE_HOME=$ASM_HOME (in 11gR1)
export ORACLE_HOME=$ORA_GRID_HOME (in 11gR2)

SQL> SELECT path,header_status,mode_status,total_mb
FROM v$asm_disk WHERE header_status IN ('CANDIDATE','FORMER','PROVISIONED');

The query lists each eligible disk along with its path, header status, and size. Once you have listed the disks, you can use the following SQL statements to create a new disk group with different mirroring (redundancy) levels.

The following SQL statement creates a new disk group named DATA with one ASM disk, sde1, located under /dev/. The EXTERNAL REDUNDANCY clause in the SQL statement indicates that you are relying on storage-level (RAID) mirroring for protection rather than Oracle-provided mirroring:

CREATE DISKGROUP data EXTERNAL REDUNDANCY DISK '/dev/sde1';

The following SQL statement creates a new disk group named DATA that consists of two failure groups, with one disk in each failure group, using the Oracle-provided NORMAL redundancy (two-way mirroring) level:

CREATE DISKGROUP data NORMAL REDUNDANCY FAILGROUP fgp1 DISK '/dev/sde1', FAILGROUP fgp2 DISK '/dev/sdf1';

The following SQL statement creates a new disk group named DATA with three failure groups, with one disk in each failure group, using the highest Oracle-provided redundancy level, HIGH (three-way mirroring):

CREATE DISKGROUP data HIGH REDUNDANCY
FAILGROUP fgp1 DISK '/dev/sde1', FAILGROUP fgp2 DISK '/dev/sdf1', FAILGROUP fgp3 DISK '/dev/sdg1';

Note: Failure groups are where ASM stores the redundant copies of extents. When the redundancy clause is omitted, ASM applies NORMAL REDUNDANCY (two-way mirroring) by default.

After a disk group is successfully created, it is automatically mounted on the local instance. If a server parameter file (SPFILE) is being used, the name of the disk group is added to the ASM_DISKGROUPS initialization parameter so that the disk group is mounted automatically on ASM instance restarts. To make the disk group available on the other ASM instances in a cluster, you simply need to mount it on each of those instances by running the ALTER DISKGROUP data MOUNT statement (ensure the same set of disks is accessible on the other nodes).
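As a minimal sketch (assuming a second instance named +ASM2 on another node), mounting and verifying the disk group on that node would look like this:

export ORACLE_SID=+ASM2
sqlplus / as sysasm
SQL> ALTER DISKGROUP data MOUNT;
SQL> SELECT name, state FROM v$asm_diskgroup;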

Tuesday, December 30, 2014

ASM instance startup/shutdown

Managing an ASM instance is no different from managing a typical RDBMS instance. An ASM instance can be managed using either a set of SQL*Plus commands or the cluster-aware SRVCTL utility. Nevertheless, it is strongly recommended that you use the SRVCTL utility to manage (start/stop) the ASM instance in a RAC environment.

The ASM instance can be started in NOMOUNT, MOUNT, or RESTRICTED mode with the STARTUP command at the SQL*Plus prompt. When you have planned maintenance on an ASM instance, you can open it in RESTRICTED mode to prevent any connections from the database instances. When the ASM instance is started normally, it first discovers the disks and then mounts all the existing disk groups on the local instance.

To shut down the local ASM instance, you can use the various options available with the SHUTDOWN command at the SQL*Plus prompt. The supported options are NORMAL, IMMEDIATE, TRANSACTIONAL, and ABORT. As mentioned previously, you can also use the SRVCTL utility to bring down the ASM instance. The following examples demonstrate how to start up and shut down an ASM instance using the SQL*Plus and SRVCTL utilities (the colon-separated lists indicate that you pick one of the options):

srvctl stop asm -n raclinux1 -o normal:immediate:transactional:abort

export ORACLE_SID=+ASM1
export ORACLE_HOME=$ASM_HOME (in 11gR1)
export ORACLE_HOME=$ORA_GRID_HOME (in 11gR2)
sqlplus / as sysasm
SQL> SHUTDOWN NORMAL:IMMEDIATE:TRANSACTIONAL:ABORT

This example stops the ASM instance on the raclinux1 node; you can use any one of the listed shutdown options.

srvctl start asm -n raclinux1 -o nomount:mount:restrict

export ORACLE_SID=+ASM1
export ORACLE_HOME=$ASM_HOME (in 11gR1)
export ORACLE_HOME=$ORA_GRID_HOME (in 11gR2)
sqlplus / as sysasm
SQL> STARTUP NOMOUNT:MOUNT:RESTRICT


This STARTUP command starts the ASM instance on the raclinux1 node. When you start the instance in NOMOUNT mode, the ASM instance is started without mounting the existing disk groups. When the instance is started in RESTRICTED mode, no database instance can attach to the ASM instance, and all the disk groups are opened in restricted mode as well.

Ensure that the database instances currently associated with the local ASM instance are stopped prior to shutting down the local ASM instance; otherwise the shutdown fails with an error and the ASM instance remains open. Alternatively, you can use the ABORT option to forcefully shut down the ASM instance, in which case the connected database instances are aborted as well. It is also strongly recommended to dismount any existing Oracle ASM Cluster File System (ACFS) filesystems to avoid application I/O errors.
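For example, a hedged sequence on the raclinux1 node (racdb and racdb1 are hypothetical database and instance names) stops the dependent database instance first and then the ASM instance:

srvctl stop instance -d racdb -i racdb1 -o immediate
srvctl stop asm -n raclinux1 -o immediate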

Note: If the voting disk and OCR files are placed in a disk group, you will not be able to stop the ASM instance.
In order to stop the ASM instance, you need to stop the cluster.

Friday, December 26, 2014

ASM dynamic views

To manage and monitor the ASM instance and its primary components (disks and disk groups) effectively, you need to know the ASM-specific dynamic views and how to use them to extract useful information about disks, disk groups, and so on. This section will help you understand the ASM-specific dynamic views available in ASM.

There are about 19 dynamic views available as of 11g R2, and each dynamic view provides different helpful information. All the ASM dynamic views are prefixed with V$ASM_. In the following section, we are going to focus on a handful of these views.

V$ASM_DISK
When an ASM instance completes the disk discovery operation by reading the disk header information, all disks (used and usable) are listed in the V$ASM_DISK view. Each individual disk has a row in the V$ASM_DISK dynamic view that contains very useful information. The PATH column specifies the disk's name and location. The HEADER_STATUS column, in most cases, contains one of the following three values:

• CANDIDATE: Indicates that the disk is unused and ready for use.
• FORMER: Indicates that the disk was formerly part of a disk group and is now ready for use.
• MEMBER: Indicates that the disk is currently part of an active disk group.

Apart from the preceding values, the view also contains other useful information about the disks, such as total size, free size, physical reads, redundancy level, and so on. On the flip side, every time you run a query against this view, ASM initiates a disk discovery operation for new disks, reading all the disk header information. Querying this view can therefore be an expensive operation at times and could impact performance.

The following SQL statement demonstrates a useful query against the view. However, prior to running it, ensure you are connected to an ASM instance through SQL*Plus with the SYSASM privilege:

SELECT path,header_status,total_mb FROM v$asm_disk WHERE header_status in ('CANDIDATE','FORMER','PROVISIONED');

The preceding command displays information about the disks that are eligible for use.

V$ASM_DISKGROUP
After a disk group is successfully created in the local ASM instance, the disk group summary is visible in the V$ASM_DISKGROUP view. Each disk group maintains a row in the view along with important information, such as disk group number and name, total disk group size, used space, free space, redundancy type, compatibility, mount state, and so on. Every query against this view is likely to have a similar impact to querying the V$ASM_DISK view. The following SQL command extracts each mounted disk group's name, state, total size, and usable free space:

SELECT name,state,total_mb,usable_file_mb FROM v$asm_diskgroup;

V$ASM_OPERATION
V$ASM_OPERATION is a useful view that displays a row for each long-running operation in the ASM instance. For example, when a disk is being dropped from or added to an existing disk group, ASM initiates and completes a rebalance operation before the change is finalized. The view presents useful details, such as the amount of work already completed and the estimated time (in minutes) required to complete the operation, which helps you understand how long the operation will take.
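For example, the following query, run while a rebalance is in progress, shows the work completed so far and the estimated time remaining:

SELECT group_number, operation, state, power, sofar, est_work, est_minutes FROM v$asm_operation;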

V$ASM_DISK_STAT
Although the V$ASM_DISK and V$ASM_DISK_STAT views display nearly identical information, querying the V$ASM_DISK_STAT view is a less expensive operation than querying V$ASM_DISK. On the flip side, this view doesn't display details about new disks on the system that the ASM instance has yet to discover. Because querying the view is cheaper, it is strongly recommended that you use this view to display information and read/write statistics about the existing disks.
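For example:

SELECT path, reads, writes, read_errs, write_errs, bytes_read, bytes_written FROM v$asm_disk_stat;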

V$ASM_DISKGROUP_STAT
The V$ASM_DISKGROUP_STAT view displays statistical information about the disk groups mounted in the ASM instance. Unlike the V$ASM_DISKGROUP view, a query against this view doesn't trigger new disk discovery operations and is therefore less expensive in terms of performance. It is recommended to use this view to display existing disk group information and statistics about the disk groups.
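For example, the following query returns the disk group summary without triggering a disk discovery operation:

SELECT name, state, total_mb, free_mb FROM v$asm_diskgroup_stat;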

V$ASM_CLIENT
When the V$ASM_CLIENT view is queried in the ASM instance, it displays the information about the database instances that are using the disk groups mounted and managed by the ASM instance.
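For example, the following query lists the database instances currently served by the local ASM instance:

SELECT db_name, instance_name, status FROM v$asm_client;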

ASM background processes

In addition to the typical set of instance background processes, an ASM instance comes with a few additional background processes that assist it in performing its work. Understanding the role played by each of these ASM-specific background processes, and how they help the ASM instance manage and carry out its functionality, is helpful to DBAs. In this section, we cover the most useful ASM-specific background processes and the part each one plays:

image
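As a quick check (a hedged example), you can list the ASM-specific background processes on a node from the OS; on Linux they carry the asm_ prefix, for example RBAL (coordinates rebalance activity), ARBn (performs the extent relocations), and GMON (maintains disk group membership):

$ ps -ef | grep asm_ | grep -v grep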

ASM instance configuration and management

Starting with 10g R1, Oracle supports two types of instances: RDBMS and ASM. An ASM instance is a special kind of instance with only a System Global Area (SGA), which typically consists of memory components such as the Shared Pool, Large Pool, Free Memory, and ASM Cache, and a set of default background processes along with some additional ASM-specific background processes.
The ASM instance doesn't hold any physical database structures, and the name of the instance typically starts with +ASM. The instance is generally named +ASMn (where n represents the instance number) in a cluster environment. The INSTANCE_TYPE initialization parameter controls the role played by these two instance types. The ASM instance configuration requires only a handful of initialization parameters. These parameters are discussed in the following section.

ASM instances efficiently support both non-RAC and RAC databases. Only one ASM instance is required per node, irrespective of the number of databases/instances running on the node. In a RAC environment, you configure one ASM instance per node, and these ASM instances communicate with each other across the cluster over the interconnect. The database instances interact with the ASM instance to manage the database datafiles. ASM instances register with the listener, which typically listens on port 1521 by default.
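As a quick sanity check (a hedged sketch; +ASM1 is the instance name used in the earlier examples), you can confirm that you are connected to an ASM instance and inspect its type:

export ORACLE_SID=+ASM1
sqlplus / as sysasm
SQL> SELECT instance_name, status FROM v$instance;
SQL> SHOW PARAMETER instance_type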

ASM initialization parameters

In this section, we are going to summarize the essential ASM instance initialization parameters, their usage, and recommended values:

image
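As a minimal sketch, a typical ASM instance parameter file might look like the following; the values shown are illustrative examples, not recommendations:

*.instance_type='asm'
*.asm_diskstring='/dev/sd*'
*.asm_diskgroups='DATA','FRA'
*.asm_power_limit=1
*.memory_target=1G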

ASM disk group

A disk group is a logical container for one or more ASM disks and is the highest level of data structure in ASM. When a database is configured to use a disk group, the disk group becomes the default location for its datafiles. A disk group can be used to store various database file types, such as datafiles, online redo logs, archived logs, RMAN backupsets, OCR and voting files (in 11g R2), and more. ASM also provides the flexibility for a single disk group to be used by multiple ASM instances and databases across a cluster.

After a disk group is successfully created and mounted for the first time in the ASM instance, the name of the disk group is automatically added to the ASM_DISKGROUPS initialization parameter so that the disk group is mounted automatically on ASM instance restarts.

In general, when a datafile is created in a disk group, the datafile extents are striped (distributed) evenly across the available disks of the disk group. Optionally, you can also set one of the following mirroring levels on the disk group to protect data integrity by storing redundant copies of the data (extents) in separate failure groups, so that the disk group can survive a disk outage:

• External redundancy: Relies on the STORAGE (RAID)-level mirroring redundancy option to protect the data
• Normal redundancy: Provides a default two-way mirroring option
• High redundancy: Provides a three-way mirroring redundancy option of ASM files

As of 11g R2, the following limits have been imposed on the ASM instance:
• A maximum of 63 disk groups in a storage system
• 1 million files per disk group


The following diagram illustrates the structure of a disk group with three disks assigned to it:
image

ASM disk

A disk is the primary element of an ASM instance. A disk can be built either from a Logical Unit Number (LUN) presented by a storage array, from a disk partition or an entire disk, from a logical volume, or from a Network File System (NFS) file. An ASM instance discovers the disks within the paths specified by the ASM_DISKSTRING initialization parameter. Whenever a new disk is discovered by the ASM instance, the disk's header status is set to CANDIDATE and the disk becomes available in the instance.
After the disks are successfully discovered by the local ASM instance, they appear in the V$ASM_DISK dynamic view of the local ASM instance and are ready to be used to build a new ASM disk group or to be added to any pre-existing ASM disk group. However, when no paths are specified with the ASM_DISKSTRING initialization parameter, ASM by default looks in the following OS-specific paths to discover the disks:

image
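For example (a hedged sketch; the path pattern is an assumption and depends on how your devices are named, and SCOPE=BOTH assumes the instance uses an SPFILE), you can set the discovery string and verify it:

SQL> ALTER SYSTEM SET asm_diskstring = '/dev/sd*' SCOPE=BOTH;
SQL> SHOW PARAMETER asm_diskstring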

In a cluster environment, ensure that all ASM disks are visible across all nodes and that each node has the same permissions (660) and ownership (oracle:dba) on the devices to avoid running into problems. Oracle also strongly advises using disks of the same size in a disk group to maintain disk group balance. On the other hand, you have the flexibility to use a different device naming convention for a disk across the nodes in a cluster.
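A hedged example of setting the expected ownership and permissions on the candidate devices (the device names follow the earlier examples; on Linux these settings are usually made persistent through udev rules or ASMLib):

# chown oracle:dba /dev/sde1 /dev/sdf1
# chmod 660 /dev/sde1 /dev/sdf1
# ls -l /dev/sde1 /dev/sdf1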

The following limits, as of 11g R2, have been imposed on ASM disks:
• A maximum of 10,000 ASM disks
• A maximum of 2 TB of storage per ASM disk (4 PB per ASM disk on Exadata)

Wednesday, December 24, 2014

Filesystem versus ASM storage architecture

The following image shows the comparison between a typical filesystem and ASM storage involved when a database is created:

image

Diagram A: In this diagram, an application connects to a database running on Node 1, and the database datafiles are configured on typical filesystem storage. As summarized earlier, the filesystem is built on a Logical Volume Group (LVG) and the LVG is prepared on the shared storage. Therefore, to prepare the storage for the database (filesystem and volume groups), you generally need a third-party tool, such as a volume manager.

Diagram B: In this diagram, an application connects to a database running on Node 1, and the database datafiles are configured on ASM storage. ASM provides the capabilities of the filesystem and the volume manager and manages the storage itself. In contrast to a filesystem, ASM storage does not require any third-party tools to manage the storage for the database and eliminates the need to build a volume group and create a filesystem. This bypasses the layers involved between the database and the storage, thus improving read performance.

Overview of Automatic Storage Management (ASM)

Automatic Storage Management (ASM) is an option introduced in Oracle 10g that simplifies storage management for all Oracle database file types. It brings the capabilities of a volume manager and a filesystem together into the Oracle database kernel. It inherits the Stripe And Mirror Everything (SAME) methodology: it stripes the data (extents) evenly across the ASM disks of a disk group by default and provides the mirroring functionality as an option. The management and administration of ASM is made easy through a well-known set of SQL statements, such as CREATE, ALTER, and DROP, and through GUI tools.

While ASM was initially intended for managing and maintaining only the Oracle database files and other related files, its functionality has been significantly improved in the 11g R1 and R2 releases to manage all types of data. The following are some of the key features and benefits of ASM:

• It simplifies the storage configuration management for Oracle datafiles and other files.
• It eliminates the need for third-party software (for example, a volume manager) to manage the storage for the databases.
• When a datafile is created, it is divided into equally sized (1, 2, 4, 8, 16, 32, or 64 MB) extents that are scattered evenly across the disks of a disk group to provide balanced I/O, improve performance, and prevent hot spots (a disk group attribute example follows this list).
• It is built on the Stripe and Mirror Everything (SAME) functionality.
• It supports both non-RAC and RAC databases efficiently.
• It has the ability to add and remove ASM disks online without actually disturbing the ongoing operations.
• It can be managed and administered using a set of well-known SQL statements.
• It performs automatic online redistribution for the data whenever a disk is being added or dropped.
• With ASM 11g R2, in addition to all database file types, it can also be used to store non-database file types such as binaries, images, and so on (through the ASM Cluster File System, ACFS). It also provides preferred-read functionality when ASM mirroring is enabled.
• It supports multiversioning of databases.
• It supports a multipathing feature to prevent outages from disk path failures.
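As referenced in the extent-size bullet above, here is a hedged sketch of setting a non-default allocation unit size when creating a disk group (the disk path and size are illustrative; the au_size attribute can only be set at disk group creation time):

CREATE DISKGROUP data EXTERNAL REDUNDANCY DISK '/dev/sde1' ATTRIBUTE 'au_size' = '4M';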

Saturday, December 13, 2014

Cassandra - Installing Cassandra

1. First, download the Cassandra installable tarball from the official Cassandra website (http://cassandra.apache.org/download/) to the location on your machine where you want to install it. (I am going to install it on a Red Hat Linux machine.)

image

2. Cassandra is written in Java and requires Java 7 or above to be installed before you install Cassandra. Ensure that Java 7 or above is installed on your machine (Java 7 version strings look like 1.7.x.x).
To check which version of Java is installed, simply type the following in the terminal: java -version, as shown below

image

3. Cassandra installation is very simple: untarring the tarball installs Cassandra (that is, all the executables and tools become available to get started with Cassandra), and the installable is only around 22 MB in size.

Once the tarball is downloaded on the machine where you want to install it, make a directory named cassandra in your home directory and place the tarball in it.

To untar the tarball, use the following command:

$ tar -xvzf apache-cassandra-2.1.2-bin.tar.gz

[rvalusa@ol6-11gGG ~]$ cd cassandra/
[rvalusa@ol6-11gGG cassandra]$ ls
apache-cassandra-2.1.2-bin.tar.gz
[rvalusa@ol6-11gGG cassandra]$ tar -xvzf apache-cassandra-2.1.2-bin.tar.gz
apache-cassandra-2.1.2/bin/
apache-cassandra-2.1.2/conf/
apache-cassandra-2.1.2/conf/triggers/
apache-cassandra-2.1.2/interface/
apache-cassandra-2.1.2/javadoc/
apache-cassandra-2.1.2/javadoc/org/
apache-cassandra-2.1.2/javadoc/org/apache/
apache-cassandra-2.1.2/javadoc/org/apache/cassandra/
apache-cassandra-2.1.2/javadoc/org/apache/cassandra/auth/
apache-cassandra-2.1.2/javadoc/org/apache/cassandra/auth/class-use/
apache-cassandra-2.1.2/javadoc/org/apache/cassandra/cache/
apache-cassandra-2.1.2/javadoc/org/apache/cassandra/cache/class-use/
apache-cassandra-2.1.2/javadoc/org/apache/cassandra/cli/
apache-cassandra-2.1.2/javadoc/org/apache/cassandra/cli/class-use/
..........
..........
..........
..........

apache-cassandra-2.1.2/tools/bin/cassandra-stress.bat
apache-cassandra-2.1.2/tools/bin/cassandra-stressd
apache-cassandra-2.1.2/tools/bin/cassandra.in.bat
apache-cassandra-2.1.2/tools/bin/cassandra.in.sh
apache-cassandra-2.1.2/tools/bin/json2sstable
apache-cassandra-2.1.2/tools/bin/json2sstable.bat
apache-cassandra-2.1.2/tools/bin/sstable2json
apache-cassandra-2.1.2/tools/bin/sstable2json.bat
apache-cassandra-2.1.2/tools/bin/sstablelevelreset
apache-cassandra-2.1.2/tools/bin/sstablemetadata
apache-cassandra-2.1.2/tools/bin/sstablemetadata.bat
apache-cassandra-2.1.2/tools/bin/sstablerepairedset
apache-cassandra-2.1.2/tools/bin/sstablesplit
apache-cassandra-2.1.2/tools/bin/sstablesplit.bat
apache-cassandra-2.1.2/tools/bin/token-generator

[rvalusa@ol6-11gGG cassandra]$ ls
apache-cassandra-2.1.2  apache-cassandra-2.1.2-bin.tar.gz
[rvalusa@ol6-11gGG cassandra]$ cd apache-cassandra-2.1.2
[rvalusa@ol6-11gGG apache-cassandra-2.1.2]$ ls
bin          conf       javadoc  LICENSE.txt  NOTICE.txt  tools
CHANGES.txt  interface  lib      NEWS.txt     pylib

[rvalusa@ol6-11gGG apache-cassandra-2.1.2]$


image

Viewing The Main Configuration File:
Cassandra's main configuration file, cassandra.yaml, is located in the conf directory.
Details on YAML are available at http://en.wikipedia.org/wiki/YAML. It is a recursive acronym: YAML Ain't Markup Language.

Below are some of the parameter properties set in the cassandra.yaml config file

[rvalusa@ol6-11gGG conf]$ pwd
/home/rvalusa/cassandra/apache-cassandra-2.1.2/conf
[rvalusa@ol6-11gGG conf]$ cat cassandra.yaml

cluster_name: 'Test Cluster'
num_tokens: 256
partitioner: org.apache.cassandra.dht.Murmur3Partitioner
endpoint_snitch: SimpleSnitch

Providing Permissions to Cassandra directories:
[rvalusa@ol6-11gGG lib]$ su -
Password:
[root@ol6-11gGG ~]# mkdir /var/lib/cassandra <== In here cassandra data files will be residing
[root@ol6-11gGG ~]# mkdir /var/log/cassandra <== In here cassandra system logs will be residing
[root@ol6-11gGG ~]# chown -R rvalusa /var/lib/cassandra
[root@ol6-11gGG ~]# chown -R rvalusa /var/log/cassandra

Cassandra - Getting Started With The Architecture

Understanding That Cassandra Is A Distributed Database
Cassandra is a distributed database: all nodes in a cluster have the same functionality when compared with each other. There are no master or slave nodes, which eliminates any single point of failure. Data is replicated across the nodes for high availability.

In Cassandra, a cluster can easily be spread across more than one data center, allowing for high availability even if one data center goes down completely.

image

Cassandra Documentation is available at - http://www.datastax.com/docs

Snitch: A snitch is how the nodes in a cluster know about the topology of the cluster.
Ways to define a snitch (a configuration sketch follows this list):
- Dynamic Snitching: Monitors the performance of reads from the various replicas and chooses the best replica based on this history
- SimpleSnitch: For single-data center deployments only.
- RackInferringSnitch: Determines the location of nodes by rack and data center corresponding to the IP addresses.
- PropertyFileSnitch: Determines the location of nodes by rack and data center.
- GossipingPropertyFileSnitch: Automatically updates all nodes using gossip when adding new nodes.
- EC2Snitch: Use with Amazon EC2 in a single region.
- EC2MultiRegionSnitch: Use with Amazon EC2 in multiple regions.
- GoogleCloudSnitch
- CloudstackSnitch
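As referenced above, a hedged configuration sketch for GossipingPropertyFileSnitch (the data center and rack names are hypothetical examples):

# In conf/cassandra.yaml
endpoint_snitch: GossipingPropertyFileSnitch

# In conf/cassandra-rackdc.properties (read by GossipingPropertyFileSnitch)
dc=DC1
rack=RAC1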


Gossip: Gossip is how the nodes in a cluster communicate with each other.
Every ONE second, each node communicates with up to three other nodes, exchanging information about itself and all the other nodes that it has information about.
Gossip is the internal communication method for nodes in a cluster to talk to each other.

For external communication, such as from an application to a Cassandra database, CQL(Cassandra Query Language) or Thrift are used.

How is data distribution done across the nodes in Cassandra?
Data distribution is done through a consistent hashing algorithm, which strives for an even distribution of data across the nodes in a cluster.
Rather than all of the rows of a table existing on only one node, the rows are distributed across the nodes in the cluster, in an attempt to evenly spread out the load of the table's data.
For example, notice the following rows of data to be inserted into a table within a Cassandra database. (The data will be spread across the nodes based on the hash algorithm used, which is illustrated below.)

image

To distribute the rows across the nodes, a partitioner is used.
The partitioner uses an algorithm to determine which node a given row of data will go to.
The default partitioner in Cassandra is Murmur3 (Murmur3Partitioner).

Murmur3 takes the value in the first column* (depending upon the table definition, more than one column can be used by the Murmur3 partitioner) of the row to generate a unique number between -2^63 and 2^63.

So, based on the hashing algorithm, the Home_ID column values in the preceding table turn into the tokens below:
H01033638 –>  -7456322128261119046
H01545551 –>  -2487391024765843411
H00999943 –>  6394005945182357732
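As a hedged illustration (the keyspace and table names below are hypothetical; only the Home_ID column comes from the example data above), CQL's token() function lets you see the token that a partition key value hashes to:

cqlsh> SELECT token(home_id), home_id FROM home_security.home;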


Similarly, each node in a cluster has an endpoint value assigned to it (manually, in the traditional setup), which determines which rows of data get distributed to which node. Each node is responsible for the token values between its endpoint and the endpoint of the previous node.

image
Therefore, the -7456322128261119046 data is owned by the -4611686018427387904 node

image

Node token ranges are calculated using the below formula or Murmur3 calculator - http://www.geroba.com/cassandra/cassandra-token-calculator/

image
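As a sketch of that calculation (assuming N nodes and the default Murmur3 token range), the initial token for node i is:

initial_token(i) = i * (2^64 / N) - 2^63,  for i = 0, 1, ..., N-1

For a four-node cluster this yields -9223372036854775808, -4611686018427387904, 0, and 4611686018427387904, which matches the -4611686018427387904 endpoint shown above.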

Replication: A replication factor must be specified whenever a keyspace (database) is defined.
The replication factor specifies how many copies of the data there will be within a given keyspace.
Although 1 can be specified, it is common to specify 2, 3, or more, so that if a node goes down there is at least one other replica of the data, and the data is not lost with the down node (see the example below).
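For example (a hedged sketch; the keyspace name is hypothetical), the replication factor is set when the keyspace is defined:

cqlsh> CREATE KEYSPACE home_security
   ... WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};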
 

Virtual Nodes: Virtual nodes are an alternative way to assign token ranges to nodes, and are now the default in Cassandra.
With virtual nodes, instead of a node being responsible for just one token range, it is instead responsible for many small token ranges (by default, 256 of them)
Virtual nodes allow for assigning a higher number of ranges to a more powerful computer (e.g., 512) and a lower number of ranges (e.g., 128) to a less powerful computer.
Virtual nodes (aka vnodes) were created to make it easier to add new nodes to a cluster while keeping the cluster balanced.
When a new node is added, it receives many small token range slices from the existing nodes, to maintain a balanced cluster.
With the old way of static token ranges, it was common to double the number of nodes, so that each new node's endpoint could be placed halfway between two existing endpoints.

Cassandra–Introduction to Cassandra

Understanding What Cassandra Is:
Cassandra is:

- Open Source
- A NoSQL (Not Only SQL) database Technology
- A distributed database technology
- A big data technology which provides massive scalability
- Commonly used to create a database that is spread across nodes in more than one data center, for high availability
- Based on Amazon Dynamo and Google Big Table
- Fault Tolerant
- Highly Performant
- Decentralized: no single point of failure
- Durable: data is not lost even if one of the data centers goes down
- Elastic: read/write throughput increases linearly as new nodes are added to the cluster

What Cassandra Is Being Used For:

Use cases of Cassandra are listed at planetcassandra.org/apache-cassandra-use-cases/

Companies running their applications on Apache Cassandra have realized benefits that have directly improved their business. Cassandra is capable of handling all of the big data challenges that might arise: massive scalability, an always-on architecture, high performance, strong security, and ease of management, to name a few.

- Product Catalog/Playlist – Coursera, Comcast, Netflix, Hulu, Sky, Soundcloud etc.,
- Recommendation/Personalization - Bazaarvoice, Outbrain, eBay etc.,
- Fraud Detection - Barracuda Networks, F-Secure etc.,
- Messaging -  Accenture, CallFire, eBuddy, The New York Times etc.,
- IOT/Sensor Data - NASA, AppDynamics, Lucid, Aeris etc.,

Saturday, December 6, 2014

Data Pump - Exporting and Importing Directly Across the Network

Sometimes you need to create a testing database and load it with production data. In scenarios like this, the production box is usually located remotely from the development box. Data Pump provides you the ability to take an export and directly import it into your target database without creating any intermediate dump files. This is a fast and efficient way to create new environments from existing environments.

An example will help illustrate how this works. For this example, the production database users are STAR2, CIA_APP, and CIA_SEL. You want to move these users into a testing database and name them STAR_JUL, CIA_APP_JUL, and CIA_SEL_JUL.
This task requires the following high-level steps:

1. Create the users in the destination database that the data will be imported into. Here's a sample script that creates the users in the testing database:

define star_user=star_jul
define star_user_pwd=star_jul_pwd
define cia_app_user=cia_app_jul
define cia_app_user_pwd=cia_app_jul_pwd
define cia_sel_user=cia_sel_jul
define cia_sel_user_pwd=cia_sel_jul_pwd
--
create user &&star_user identified by &&star_user_pwd;
grant connect,resource to &&star_user;
alter user &&star_user default tablespace dim_data;
--
create user &&cia_app_user identified by &&cia_app_user_pwd;
grant connect,resource to &&cia_app_user;
alter user &&cia_app_user default tablespace cia_data;
--
create user &&cia_sel_user identified by &&cia_sel_user_pwd;
grant connect,resource to &&cia_sel_user;
alter user &&cia_sel_user default tablespace cia_data;


2. Create a database link in your testing database that points to your production database. The remote user referenced in the CREATE DATABASE LINK statement has the DBA role granted to it in the production database. Here’s a sample CREATE DATABASE LINK script:

create database link dk connect to darl identified by foobar using 'dwdb1:1522/dwrep1';

3. Create a directory that points to the location where you want your log file to go:

SQL> create or replace directory engdev as '/orahome/oracle/ddl/engdev';

4. Run the import command on the testing box. This command references the remote database via the NETWORK_LINK parameter. This command also instructs Data Pump to map the production database user names to the newly created
users in the testing database.

$ impdp darl/engdev directory=engdev network_link=dk schemas='STAR2,CIA_APP,CIA_SEL' remap_schema=STAR2:STAR_JUL,CIA_APP:CIA_APP_JUL,CIA_SEL:CIA_SEL_JUL parallel=4

This technique allows you to move large amounts of data between disparate databases without having to create or copy any dump files or datafiles. This is a very powerful Data Pump feature that lets you quickly and efficiently transfer data.

If you don't have Oracle Net connectivity between the two databases, then the steps to accomplish the same task are as follows (see the example commands after these steps):

1. Export the production database.
2. Copy the dump file to the testing database.
3. Import the dump file into the testing database.
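A hedged sketch of those three steps (the directory object, dump file name, and host name are hypothetical; the credentials and schema names follow the earlier example):

$ expdp darl/foo directory=dp_dir schemas=STAR2,CIA_APP,CIA_SEL dumpfile=prod.dmp logfile=exp_prod.log
$ scp /oradump/prod.dmp testbox:/oradump/
$ impdp darl/foo directory=dp_dir dumpfile=prod.dmp logfile=imp_prod.log remap_schema=STAR2:STAR_JUL,CIA_APP:CIA_APP_JUL,CIA_SEL:CIA_SEL_JUL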

Data Pump - Listing the Contents of Dump Files - SQLFILE option

Data Pump has a very robust method of creating a file that contains all the SQL that’s executed when an import job runs. Data Pump uses the DBMS_METADATA package to create the DDL that you can use to recreate objects in the Data Pump dump file.

Use the SQLFILE option of Data Pump import to list the contents of a Data Pump export file. This example creates a file named expfull.sql that contains the SQL statements that the import process calls (the file is placed in the directory defined by the DPUMP_DIR2 directory object):

$ impdp hr/hr DIRECTORY=dpump_dir1 DUMPFILE=expfull.dmp SQLFILE=dpump_dir2:expfull.sql

In the previous command, if you don’t specify a separate directory (like dpump_dir2 in this example), then the SQL file is written to the location specified in the DIRECTORY option.

Tip You must run the previous command as a user with DBA privileges or the schema that performed the Data Pump export. Otherwise, you get an empty SQL file without the expected SQL statements in it.

When you use the SQLFILE option with an import, the impdp process doesn’t import any data. It only creates a file that contains the SQL commands that would be run by the import process. It’s sometimes handy to generate a SQL file for the following reasons:

• To preview and verify the SQL statements before running the import
• To manually run the SQL to pre-create database objects
• To create a DDL file that you can later inspect to see if there are differences in users from exports at different points in time

Data Pump - Estimating the Size of Export Jobs

If you’re about to export a large amount of data, you can estimate the size of the file that Data Pump creates before you run the export. You may want to do this because you’re concerned about the amount of space an export job needs.

To estimate the size, use the ESTIMATE_ONLY parameter. This example estimates the size of the export file for an entire database:

$ expdp dbauser/foo estimate_only=y full=y nologfile=y

Here’s a snippet of the output:

Estimate in progress using BLOCKS method...
Processing object type DATABASE_EXPORT/SCHEMA/TABLE/TABLE_DATA
. estimated "REP_MV"."REG_QUEUE_REP" 9.606 GB
. estimated "REP_MV"."CWP_USER_PROFILE" 2.589 GB
. estimated "REP_MV"."PRODUCT_INSTANCES_MV" 1.620 GB
. estimated "REP_MV"."ROLES_MV" 1.550 GB
. estimated "STAR2"."F_DOWNLOADS":"DOWN_P_5" 1.450 GB
. estimated "STAR2"."F_DOWNLOADS":"DOWN_P_11" 1.414 GB


Similarly, you can specify a schema name to get an estimate of the size required to export a user:

$ expdp dbauser/foo estimate_only=y schemas=star2 nologfile=y

Here’s an example of estimating the size required for two tables:

$ expdp dbauser/foo estimate_only=y tables=star2.f_downloads,star2.f_installations nologfile=y

Data Pump - Use a Parameter File

Instead of typing commands on the command line, in many situations it’s preferable to store the commands in a file and then reference the file when executing Data Pump export or import. Using parameter files makes tasks more repeatable and less prone to errors. You can place the commands in a file once and then reference that file multiple times.

Additionally, some of the Data Pump commands (like FLASHBACK_TIME) require the use of quotation marks; in these situations, it’s sometimes hard to predict how the operating system will interpret the quotation marks. Whenever a command requires quotation marks, it’s highly preferable to use a parameter file.

To use a parameter file, first create an operating-system text file that contains the commands you want to use to control the behavior of your job. This example uses the Linux/Unix vi command to create a text file named imp.par:

$ vi imp.par

Now, place the following commands in the imp.par file:

userid=darl/foo
directory=dp
dumpfile=invp.dmp
logfile=invp.log
tables=f_sales


Next, reference the parameter file via the PARFILE command-line option:

$ impdp parfile=imp.par

Data Pump import processes the parameters in the file as if they were typed on the command line. If you find yourself repeatedly typing the same commands, then consider using a parameter file to increase your efficiency.

Thursday, December 4, 2014

Data Pump - Exporting Data

A small amount of setup is required when you run a Data Pump export job. Here are the steps:

1. Create a database-directory object that points to the operating-system directory that you want Data Pump to write files to (expdp) or read files from (impdp).
2. Grant read, write on the directory to the database user running the export.
3. From the operating-system prompt, run the expdp utility.

Creating a Database Directory
Before you run a Data Pump job, you must create a database-directory object that corresponds to a physical location on disk, which specifies where the dump files and log files are created. Use the CREATE DIRECTORY command to accomplish this. The following statement creates a directory named dp_dir and specifies that it maps to the /oradump physical location on disk:

SQL> create directory dp_dir as '/oradump';

To view the details of the newly created directory, issue this query:

SQL> select owner, directory_name, directory_path from dba_directories;

By default, when you install Oracle, one default directory object is created named DATA_PUMP_DIR. If you don’t specify the DIRECTORY parameter when exporting or importing, Oracle by default attempts to use the default database-directory object. The default directory associated with DATA_PUMP_DIR can vary depending on the version of Oracle. On some systems, it may be ORACLE_HOME/rdbms/log; on other systems, it may point to ORACLE_BASE/admin/ORACLE_SID/dpdump. You have to inspect DBA_DIRECTORIES to verify the default location for your system.

Granting Access to the Directory
You need to grant permissions on the database-directory object to a user that wants to use Data Pump. Use the GRANT statement to allocate the appropriate privileges. If you want a user to be able to read from and write to the directory, you must grant security access as follows:

SQL> grant read, write on directory dp_dir to darl;

All directory objects are owned by the SYS user. If you’re using a user account that has the DBA role granted to it, then you have the read and write privileges on the directory object.

Taking an Export
When the directory object and grants are in place, you can use Data Pump to export information from a database. DBAs typically use exports for point-in-time backups of data and metadata. You can use these exports to either restore database objects or move data to different database environments. Suppose you recently created a table and populated it with data:

SQL> create table inv(inv_id number);
SQL> insert into inv values (123);

Now, you want to export the table. This example uses the previously created directory named DP_DIR. Data Pump uses the directory path specified by the directory object as the location on disk to write the dump file and log file:

$ expdp darl/foo directory=dp_dir tables=inv dumpfile=exp.dmp logfile=exp.log

The expdp job creates a file named exp.dmp in the /oradump directory that contains the information required to re-create the INV table and populate it with data as of the time the export was taken. In addition, a log file named exp.log is created in the /oradump directory that contains all the logging information associated with this export job.

If you don’t specify a dump-file name, Data Pump creates a file named expdat.dmp. If a file named expdat.dmp already exists in the directory, then Data Pump throws an error. If you don’t specify a log-file name, then Data Pump creates one named export.log. If a file already exists (named export.log), then Data Pump overwrites it.

Data Pump Architecture

Data Pump consists of the following components:

• expdp (Data Pump export utility)
• impdp (Data Pump import utility)
• DBMS_DATAPUMP PL/SQL package (Data Pump API)
• DBMS_METADATA PL/SQL package (Data Pump Metadata API)

The expdp and impdp utilities use the DBMS_DATAPUMP and DBMS_METADATA built-in PL/SQL packages when exporting and importing data and metadata. The DBMS_DATAPUMP package moves entire databases or subsets of data between database environments. The DBMS_METADATA package exports and imports information about database objects.

Note You can call the DBMS_DATAPUMP and DBMS_METADATA packages independently (outside of expdp and impdp) from SQL*Plus. I rarely call these packages directly from SQL*Plus, but you may have a specific scenario where it's desirable to interact directly with them.
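For example (a hedged sketch, assuming a table named INV exists in your schema), DBMS_METADATA can generate DDL directly from SQL*Plus:

SQL> SET LONG 100000
SQL> SELECT dbms_metadata.get_ddl('TABLE','INV') FROM dual;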

When you start a Data Pump export or import job, a master operating-system process is initiated on the database server. This master process name has the format ora_dmNN_<SID>. On Linux/Unix systems, you can view this process from the operating-system prompt using the ps command:

$ ps -ef | grep ora_dm
oracle 14950 717 0 10:59:06 ? 0:10 ora_dm00_STAGE


Depending on the degree of parallelism and the work specified, a number of worker processes are also started. The master process coordinates the work between master and worker processes. The worker process names have the format ora_dwNN_<SID>.

Also, when a user starts an export or import job, a database status table is created (owned by the user who starts the job). This table exists for the duration of the Data Pump job. The name of the status table is dependent on what type of job you’re running. The table is named with the format SYS_<OPERATION>_<JOB_MODE>_NN, where OPERATION is either EXPORT or IMPORT. JOB_MODE can be FULL, SCHEMA, TABLE, TABLESPACE, and so on.

For example, if you’re exporting a schema, a table is created in your account with the name SYS_EXPORT_SCHEMA_NN, where NN is a number that makes the table name unique in the user’s schema. This status table contains information such as the objects exported/imported, start time, elapsed time, rows, error count, and so on. The status table has over 80 columns.
The status table is dropped by Data Pump upon successful completion of an export or import job. If you use the KILL_JOB interactive command, the status table is also dropped. If you stop a job with the STOP_JOB interactive command, the table isn't removed and is used in the event that you restart the job. If your job terminates abnormally, the status table is retained. You can delete the status table if you don't plan to restart the job.
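As a hedged sketch (the job name shown is hypothetical, but it follows the naming format described above), you can monitor running Data Pump jobs and re-attach to one:

SQL> SELECT owner_name, job_name, operation, job_mode, state FROM dba_datapump_jobs;

$ expdp darl/foo attach=SYS_EXPORT_SCHEMA_01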

When Data Pump runs, it uses a database-directory object to determine where to write and read dump files and log files. Usually, you specify which directory object you want Data Pump to use. If you don’t specify a directory object, a default is used. A Data Pump export creates an export file and a log file. The export file contains the objects being exported. The log file contains a record of the job activities.

Below shows the architectural components related to a Data Pump export job. Here’s how you initiate the job from the command line:

$ expdp user/pwd dumpfile=exp.dmp logfile=exp.log directory=dp_dir

In this example, a database-directory object named DP_DIR is defined to reference the /oradump operating-system directory. The output files are defined via the command line to be exp.dmp and exp.log.

image

Below displays the architectural components of a Data Pump import job. Here’s how you initiate the job from the command line:

$ impdp user/pwd dumpfile=exp.dmp logfile=imp.log directory=dp_dir

In the above command, the Data Pump import reads from a dump file named exp.dmp, which is located in the operating-system directory referenced by the database-directory object named DP_DIR. The import job reads the dump file and populates database objects.

image

For each Data Pump job, you must ensure that you have access to a directory object.

Data Pump

Data Pump was introduced in Oracle Database 10g. It replaces the older exp/imp utilities. Data Pump enables you to quickly move data and/or metadata from one environment to another. You can use Data Pump in a variety of ways:

• Point-in-time logical backups of the entire database or subsets of data
• Replicating entire databases or subsets of data for testing or development
• Quickly generating Data Definition Language (DDL) required to re-create objects

Sometimes DBAs hold on to the old exp/imp utilities because they're familiar with the syntax and these utilities get the job done quickly. Even if those legacy utilities are easy to use, you should consider using Data Pump going forward. Data Pump offers substantially more functionality than the old exp/imp utilities:

• Performance with large data sets, allowing you to efficiently export and import gigabytes of data
• Interactive command-line utility, which gives you the ability to disconnect and then later re-attach to active Data Pump jobs
• Ability to export and import large amounts of data from a remote database directly into a local database without creating a dump file
• Ability to make on-the-fly changes to schemas, tablespaces, datafiles, and storage settings from export to import
• Sophisticated filtering of objects and data
• Security controlled via database-directory objects
• Advanced features such as compression and encryption