How to Use Sqoop2
Fayson's GitHub: https://github.com/fayson/cdhproject
1. Purpose of This Document
Like Sqoop1, Sqoop2 is a tool for transferring data between Hadoop and relational databases. The difference is that Sqoop2 introduces a Sqoop server that centrally manages connectors, whereas Sqoop1 is purely a client-side tool.
The following diagram shows the Sqoop1 architecture:
Version: the latest Sqoop1 community release is currently 1.4.7
Architecture: jobs are submitted directly by the Sqoop client
Access: the sqoop1 command is run directly from a terminal
Security: the database, user name and password are specified directly in the command or script (a quick sketch of this is shown below)
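For example, a typical Sqoop1 invocation embeds the credentials right on the command line. The sketch below is illustrative only; the host, database, table and target directory reuse names that appear later in this article rather than coming from an actual Sqoop1 run.
# Sqoop1: credentials passed directly on the command line (illustrative sketch)
sqoop import \
  --connect jdbc:mysql://ip-172-31-16-68.ap-southeast-1.compute.internal/sqoop2 \
  --username sqoop2 \
  --password password \
  --table widgets \
  --target-dir /fayson/widgets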
The following diagram shows the Sqoop2 architecture:
Version: the latest Sqoop2 community release is currently 1.99.7
Architecture: a Sqoop server is introduced, and connectors are managed centrally on the server
Access: command line or scripts, RESTful API, Java API, and a Web UI (a sample REST call is sketched after this list)
Security: a role-based access control mechanism is introduced
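As a quick illustration of the RESTful access path, the Sqoop2 server exposes an HTTP endpoint on port 12000 (the same port the shell connects to later in this article). The /version path below follows the Apache Sqoop 1.99.x REST API documentation and is shown only as a sketch; verify the exact path against the docs for your version.
# Query the Sqoop2 server over REST (sketch; path per the 1.99.x REST API docs)
curl http://ip-172-31-16-68.ap-southeast-1.compute.internal:12000/sqoop/version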
Let's look at the functional differences between Sqoop1 and Sqoop2:
Neither Cloudera nor the Apache project recommends putting Sqoop2 straight into production, because its functionality is still incomplete.
In this article, Fayson walks through Sqoop2 hands-on, from installation to usage.
- Content overview
1. Install Sqoop2
2. Create a database and table in MySQL and insert data
3. Import the MySQL data into HDFS with Sqoop2
4. Common issues
- Test environment
1. Operating system: Red Hat 7.3
2. CM and CDH version: 5.13.1
3. Kerberos is enabled on the cluster
2. Installing Sqoop2
1. From the CM home page, click "Add Service"
2. Select Sqoop2
3. Select a node on which to install the Sqoop2 server
4. The configuration does not need to be changed; continue to the next step. Note, however, that the Cloudera documentation recommends PostgreSQL for the Sqoop2 database.
5. Wait for the server to start successfully.
Installation is complete.
3. Using Sqoop2
3.1 Create the database and table in MySQL and insert data
create database sqoop2 default character set utf8;
CREATE USER 'sqoop2'@'%' IDENTIFIED BY 'password';
GRANT ALL PRIVILEGES ON sqoop2.* TO 'sqoop2'@'%';
FLUSH PRIVILEGES;
use sqoop2;
create table widgets (id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
widget_name VARCHAR(64) NOT NULL,
price DECIMAL(10,2),
design_date DATE,
version INT,
design_comment VARCHAR(100));
insert into widgets values (null,'sprocket',0.25,'2010-02-10',1,'connects two gizmos');
insert into widgets values (null,'gizmo',4.00,'2009-11-30',4,null);
insert into widgets values (null,'gadget',99.99,'1983-08-13',13,'our flagship product');
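To confirm the test data is in place before moving on, a quick check such as the following can be run in the same MySQL session (a trivial verification step, not part of the original walkthrough):
select count(*) from widgets;   -- expect 3
select * from widgets;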
3.2 Create the MySQL and HDFS links
1. In the Sqoop2 shell, point the client at the Sqoop2 server
[root@ip-172-31-16-68 ~]# sqoop2
Sqoop home directory: /opt/cloudera/parcels/CDH-5.13.1-1.cdh5.13.1.p0.2/lib/sqoop2
Feb 28, 2018 9:47:31 PM java.util.prefs.FileSystemPreferences$1 run
INFO: Created user preferences directory.
Sqoop Shell: Type 'help' or 'h' for help.
sqoop:000> set server --host ip-172-31-16-68.ap-southeast-1.compute.internal --port 12000
Server is set successfully
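Optionally, the connection to the server can be verified right away. show version --all is a standard Sqoop2 shell command that prints both the client and server versions; if the server cannot be reached, it fails here rather than later when creating links.
sqoop:000> show version --all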
2. List the available connectors
sqoop:000> show connector
0 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
+----+------------------------+------------------+------------------------------------------------------+----------------------+
| Id | Name | Version | Class | Supported Directions |
+----+------------------------+------------------+------------------------------------------------------+----------------------+
| 1 | kafka-connector | 1.99.5-cdh5.13.1 | org.apache.sqoop.connector.kafka.KafkaConnector | TO |
| 2 | kite-connector | 1.99.5-cdh5.13.1 | org.apache.sqoop.connector.kite.KiteConnector | FROM/TO |
| 3 | hdfs-connector | 1.99.5-cdh5.13.1 | org.apache.sqoop.connector.hdfs.HdfsConnector | FROM/TO |
| 4 | generic-jdbc-connector | 1.99.5-cdh5.13.1 | org.apache.sqoop.connector.jdbc.GenericJdbcConnector | FROM/TO |
+----+------------------------+------------------+------------------------------------------------------+----------------------+
3. Create the MySQL link
sqoop:000> create link -cid 4
Creating link for connector with id 4
Please fill following values to create new link object
Name: mysql link
Link configuration
JDBC Driver Class: com.mysql.jdbc.Driver
JDBC Connection String: jdbc:mysql://ip-172-31-16-68.ap-southeast-1.compute.internal/sqoop2
Username: sqoop2
Password: ********
JDBC Connection Properties:
There are currently 0 values in the map:
entry#
New link was successfully created with validation status OK and persistent id 2
4. Create the HDFS link
sqoop:000> create link --cid 3
Creating link for connector with id 3
Please fill following values to create new link object
Name: hdfs link
Link configuration
HDFS URI: hdfs://ip-172-31-16-68.ap-southeast-1.compute.internal:8020/
New link was successfully created with validation status OK and persistent id 1
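At this point both links exist. They can be listed with show link, which should report the hdfs link (persistent id 1) and the mysql link (persistent id 2) created above:
sqoop:000> show link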
3.3 Create a Sqoop2 job
Note that in the transcript below, the first round of input fails validation because the output directory was left empty, so the shell prompts for all values a second time; the second round selects TEXT_FILE output and supplies a valid HDFS output directory.
sqoop:000> create job -f 2 -t 1
Creating job for links with from id 2 and to id 1
Please fill following values to create new job object
Name: sqoop2
From database configuration
Schema name: sqoop2
Table name: widgets
Table SQL statement:
Table column names:
Partition column name:
Null value allowed for the partition column:
Boundary query:
ToJob configuration
Override null value:
Null value:
Output format:
0 : TEXT_FILE
1 : SEQUENCE_FILE
Choose: 1
Compression format:
0 : NONE
1 : DEFAULT
2 : DEFLATE
3 : GZIP
4 : BZIP2
5 : LZO
6 : LZ4
7 : SNAPPY
8 : CUSTOM
Choose: 0
Custom compression format:
Output directory:
Throttling resources
Extractors:
Loaders:
There are issues with entered data, please revise your input:
Name: sqoop2
From database configuration
Schema name: sqoop2
Table name: widgets
Table SQL statement:
Table column names:
Partition column name:
Null value allowed for the partition column:
Boundary query:
ToJob configuration
Override null value:
Null value:
Output format:
0 : TEXT_FILE
1 : SEQUENCE_FILE
Choose: 0
Compression format:
0 : NONE
1 : DEFAULT
2 : DEFLATE
3 : GZIP
4 : BZIP2
5 : LZO
6 : LZ4
7 : SNAPPY
8 : CUSTOM
Choose: 0
Custom compression format:
Error message: Can't be null nor empty
Output directory: hdfs://ip-172-31-16-68.ap-southeast-1.compute.internal:8020/fayson
Throttling resources
Extractors:
Loaders:
New job was successfully created with validation status OK and persistent id 1
sqoop:000> show job
+----+--------+----------------+--------------+---------+
| Id | Name | From Connector | To Connector | Enabled |
+----+--------+----------------+--------------+---------+
| 1 | sqoop2 | 4 | 3 | true |
+----+--------+----------------+--------------+---------+
3.4 Start the Sqoop2 job
1. Start the Sqoop2 job directly from the shell
sqoop:000> start job -j 1 -s
Submission details
Job ID: 1
Server URL: http://ip-172-31-16-68.ap-southeast-1.compute.internal:12000/sqoop/
Created by: fayson
Creation date: 2018-02-28 22:33:00 EST
Lastly updated by: fayson
External ID: job_1519874824956_0002
http://ip-172-31-16-68.ap-southeast-1.compute.internal:8088/proxy/application_1519874824956_0002/
2018-02-28 22:33:00 EST: BOOTING - Progress is not available
2018-02-28 22:33:11 EST: RUNNING - 0.00 %
2018-02-28 22:33:21 EST: SUCCEEDED
Counters:
org.apache.hadoop.mapreduce.JobCounter
SLOTS_MILLIS_MAPS: 8800
MB_MILLIS_MAPS: 9011200
TOTAL_LAUNCHED_MAPS: 2
MILLIS_MAPS: 8800
VCORES_MILLIS_MAPS: 8800
SLOTS_MILLIS_REDUCES: 0
OTHER_LOCAL_MAPS: 2
org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
BYTES_READ: 0
org.apache.hadoop.mapreduce.lib.output.FileOutputFormatCounter
BYTES_WRITTEN: 0
org.apache.hadoop.mapreduce.TaskCounter
MAP_INPUT_RECORDS: 0
MERGED_MAP_OUTPUTS: 0
PHYSICAL_MEMORY_BYTES: 406069248
SPILLED_RECORDS: 0
FAILED_SHUFFLE: 0
CPU_MILLISECONDS: 1890
COMMITTED_HEAP_BYTES: 619708416
VIRTUAL_MEMORY_BYTES: 3162779648
MAP_OUTPUT_RECORDS: 3
SPLIT_RAW_BYTES: 219
GC_TIME_MILLIS: 75
org.apache.hadoop.mapreduce.FileSystemCounter
FILE_READ_OPS: 0
FILE_WRITE_OPS: 0
FILE_BYTES_READ: 0
FILE_LARGE_READ_OPS: 0
HDFS_BYTES_READ: 219
FILE_BYTES_WRITTEN: 336174
HDFS_LARGE_READ_OPS: 0
HDFS_BYTES_WRITTEN: 146
HDFS_READ_OPS: 2
HDFS_WRITE_OPS: 2
org.apache.sqoop.submission.counter.SqoopCounters
ROWS_WRITTEN: 3
ROWS_READ: 3
Job executed successfully
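Because the job above was started with -s, the shell waits for it synchronously. If a job is started without -s, its progress can be polled afterwards; assuming status job accepts the same -j flag as start job, the check would look like this:
sqoop:000> status job -j 1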
2. Monitor the job on the YARN web UI (port 8088)
3. View the files imported into HDFS
[root@ip-172-31-16-68 ~]# hadoop fs -ls /fayson
Found 4 items
-rw-r--r-- 3 sqoop2 supergroup 55 2018-02-28 22:33 /fayson/9c013cbb-3f05-49cb-a3c1-fe70ac47241c.txt
-rw-r--r-- 3 sqoop2 supergroup 91 2018-02-28 22:33 /fayson/f5d66a1b-4c7f-4dde-8549-7139e2dc7543.txt
drwxr-xr-x - fayson supergroup 0 2018-02-22 21:14 /fayson/jars
-rw-r--r-- 3 fayson supergroup 674 2018-02-04 09:35 /fayson/krb5.conf
[root@ip-172-31-16-68 ~]# hadoop fs -cat /fayson/9c013cbb-3f05-49cb-a3c1-fe70ac47241c.txt /fayson/f5d66a1b-4c7f-4dde-8549-7139e2dc7543.txt
1,'sprocket',0.25,'2010-02-10',1,'connects two gizmos'
2,'gizmo',4.00,'2009-11-30',4,NULL
3,'gadget',99.99,'1983-08-13',13,'our flagship product'
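A quick way to confirm that all three rows landed in HDFS is to count the lines of the generated text files, which should print 3 (a simple sanity check, not part of the original post):
[root@ip-172-31-16-68 ~]# hadoop fs -cat /fayson/*.txt | wc -l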
At this point, the installation of Sqoop2 and the import of data into HDFS are complete.
4. Common Issues
1. Because Fayson's cluster has Kerberos enabled, Sqoop2 must be configured for Kerberos authentication; otherwise the following error is reported:
sqoop:000> create link --cid 3
Creating link for connector with id 3
0 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception has occurred during processing command
Exception: org.apache.sqoop.common.SqoopException Message: CLIENT_0004:Unable to find valid Kerberos ticket cache (kinit)
Add org.apache.sqoop.security.authentication.type=KERBEROS to /etc/sqoop2/conf/sqoop.properties,
sync the file to all nodes,
and restart the Sqoop2 service (a shell sketch of these steps follows).
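A minimal shell sketch of the fix described above, assuming the file path from this article; the principal name fayson is only an example:
# On the Sqoop2 server node: enable Kerberos authentication (then copy the file to all nodes)
echo "org.apache.sqoop.security.authentication.type=KERBEROS" >> /etc/sqoop2/conf/sqoop.properties
# Restart the Sqoop2 service from Cloudera Manager, then obtain a ticket on the client
# before launching the sqoop2 shell (the principal "fayson" is only an example)
kinit fayson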
2. Two jar files need to be linked into the Hadoop client directory on the Sqoop2 server node, otherwise the job will fail with an error.
Create symlinks for the following two jars:
[root@ip-172-31-16-68 lib]# cd /opt/cloudera/parcels/CDH/lib/hadoop/client
[root@ip-172-31-16-68 client]# ln -s ../../hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar .
[root@ip-172-31-16-68 client]# ln -s ../../hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar .
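To confirm the symlinks were created correctly, an optional check:
[root@ip-172-31-16-68 client]# ls -l /opt/cloudera/parcels/CDH/lib/hadoop/client/jackson-*-asl*.jar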
After all of this, the key takeaway is that Sqoop2 will be dropped outright in CDH 6, so you are better off not using it.
References:
https://www.cloudera.com/documentation/enterprise/latest/topics/cdh_ig_sqoop2_installation.html
https://blogs.apache.org/sqoop/entry/apache_sqoop_highlights_of_sqoop