[HDFS] distcp fails with "Checksum mismatch"

Date: 2022-07-22

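I originally planned to write a Spark job to move the data. Time was short, though, so to get the data from HDFS cluster A to cluster B quickly, I used the hadoop distcp command instead. The command was: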

hadoop distcp hdfs://clusterA/xxx hdfs://clusterB/xxx

Unexpectedly, it failed with an error.

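The error is easy to interpret: distcp's checksum check failed, meaning the checksums of the source and target files did not match. Let's look at the help output.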

# bin/hadoop distcp
usage: distcp OPTIONS [source_path...] <target_path>
              OPTIONS
 -async                 Should distcp execution be blocking
 -atomic                Commit all changes or none
 -bandwidth <arg>       Specify bandwidth per map in MB
 -delete                Delete from target, files missing in source
 -f <arg>               List of files that need to be copied
 -filelimit <arg>       (Deprecated!) Limit number of files copied to <= n
 -i                     Ignore failures during copy
 -log <arg>             Folder on DFS where distcp execution logs are
                        saved
 -m <arg>               Max number of concurrent maps to use for copy
 -mapredSslConf <arg>   Configuration for ssl config file, to use with
                        hftps://
 -overwrite             Choose to overwrite target files unconditionally,
                        even if they exist.
 -p <arg>               preserve status (rbugp)(replication, block-size,
                        user, group, permission)
 -sizelimit <arg>       (Deprecated!) Limit number of files copied to <= n
                        bytes
 -skipcrccheck          Whether to skip CRC checks between source and
                        target paths.
 -strategy <arg>        Copy strategy to use. Default is dividing work
                        based on file sizes
 -tmp <arg>             Intermediate work path to be used for atomic
                        commit
 -update                Update target, copying only missingfiles or
                        directories

Note that the -skipcrccheck and -update options must be used together. With both set, distcp no longer verifies checksums after copying the data files.
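Putting the two options together, the copy from the beginning of this post becomes the sketch below. It assumes a bash shell with the hadoop client on PATH; the function name distcp_skip_crc is hypothetical, and hdfs://clusterA/xxx and hdfs://clusterB/xxx are the same placeholder paths used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy SRC to DST with distcp while skipping the
# post-copy checksum (CRC) comparison. -skipcrccheck has to be combined
# with -update, so both flags are always passed.
distcp_skip_crc() {
  if [ "$#" -ne 2 ]; then
    echo "usage: distcp_skip_crc <source_path> <target_path>" >&2
    return 2
  fi
  local src="$1" dst="$2"
  # -update:       copy only files missing at the target or differing from the source
  # -skipcrccheck: do not compare source/target checksums after copying
  hadoop distcp -update -skipcrccheck "$src" "$dst"
}

# Example call with the placeholder paths from this post:
# distcp_skip_crc hdfs://clusterA/xxx hdfs://clusterB/xxx
```

Because the checksum comparison is skipped, distcp will not catch a file whose contents differ after the copy, so a quick check of file counts and sizes on cluster B afterwards is a reasonable precaution. Also keep in mind that with -update, distcp copies the contents of the source directory into the target directory rather than creating the source directory underneath it.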

Reference

  1. https://hadoop.apache.org/docs/current/hadoop-distcp/DistCp.html