Replicating Data to Impala Clusters
Replicating Impala Metadata
Note: This feature is not available if the source and destination clusters run CDH 5.12 or higher.
Impala metadata replication is performed as a part of Hive replication. Impala replication is only supported between two CDH clusters. The Impala and Hive services must be running on both clusters.
- Schedule Hive replication as described in Configuring Replication of Hive/Impala Data.
- Confirm that the Replicate Impala Metadata option is set to Yes on the Advanced tab in the Create Hive Replication dialog.
Note: To run queries or execute DDL statements on tables that have been replicated to a destination cluster, you must run the Impala INVALIDATE METADATA statement on the destination cluster to prevent queries from failing. See INVALIDATE METADATA Statement.
Invalidating Impala Metadata
For Impala clusters that do not use LDAP authentication, you can configure Hive/Impala replication jobs to automatically invalidate Impala metadata after replication completes.
With this configuration, the Hive/Impala replication job runs the Impala INVALIDATE METADATA statement on the destination cluster after the replication completes. The statement purges the metadata that Impala on the destination cluster holds for the replicated tables and views, so that other Impala clients at the destination can query these tables successfully and get accurate results. However, this operation is potentially unsafe if DDL operations are performed on any of the replicated tables or views while the replication is running. In general, directly modifying replicated data or metadata on the destination is not recommended; doing so can lead to unexpected or incorrect behavior of applications and queries that use these tables or views.
- Schedule a Hive/Impala replication as described in Configuring Replication of Hive/Impala Data.
- On the Advanced tab, select the Invalidate Impala Metadata on Destination option.
Alternatively, you can run the INVALIDATE METADATA statement manually for replicated tables. For more information about the statement, see INVALIDATE METADATA Statement.
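As a rough illustration of the manual route, the sketch below builds the statements you might run on the destination cluster. The helper name `invalidate_statements` and the `sales.orders` and `sales.customers` table names are hypothetical, and the identifier check is a deliberate simplification rather than Impala's full naming rules. Passing no table list produces the form with no table name, which applies to all tables.

```python
import re

# Simplifying assumption for this sketch: accept only plain, unquoted
# identifiers so that nothing unexpected ends up inside a statement.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def invalidate_statements(tables=None):
    """Build INVALIDATE METADATA statements for the destination cluster.

    tables: iterable of (database, table) pairs. None means "no table
    specified", which yields the single statement covering all tables.
    An empty iterable yields no statements at all.
    """
    if tables is None:
        return ["INVALIDATE METADATA;"]

    statements = []
    for database, table in tables:
        for name in (database, table):
            if not _IDENTIFIER.fullmatch(name):
                raise ValueError(f"unsupported identifier: {name!r}")
        statements.append(f"INVALIDATE METADATA {database}.{table};")
    return statements


if __name__ == "__main__":
    # Hypothetical replicated tables, used only for illustration.
    for statement in invalidate_statements([("sales", "orders"),
                                            ("sales", "customers")]):
        print(statement)
```

Each generated statement can then be run against the destination cluster with any Impala client, for example by passing it to `impala-shell` with the `-q` option.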
Note: If the source contains Hive UDFs, you must run the INVALIDATE METADATA statement manually and without any tables specified, even if you configure the automatic invalidation.
©2016 Cloudera, Inc. All rights reserved