Sqoop 1.4.7: Deployment and Common Use Cases

Date: 2022-07-26

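This article builds on the environments set up in the earlier articles "Hadoop2.7.6_01_部署" (Hadoop 2.7.6 deployment) and "Hive-1.2.1_01_安装部署" (Hive 1.2.1 installation and deployment).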

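1. Introduction

In a complete big data processing system, hdfs + mapreduce + hive form the core of the analysis system, but auxiliary systems for data collection, result export, and job scheduling are just as indispensable. The Hadoop ecosystem offers convenient open-source frameworks for each of these tasks.

[Figure: auxiliary tools in the Hadoop ecosystem]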

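1.1. Overview

Sqoop is an Apache tool for "transferring data between Hadoop and relational database servers".

  Import: bring data from MySQL or Oracle into Hadoop storage systems such as HDFS, Hive, and HBase;

  Export: move data from the Hadoop file system into a relational database.

1.2. How it works

Sqoop translates each import or export command into a MapReduce program.

The generated MapReduce job mainly customizes the InputFormat and OutputFormat.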
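
To make the mechanism a little more concrete, the following is a rough sketch of how a parallel import can divide the value range of a split column among map tasks. It is a simplification under stated assumptions (an integer column and evenly divided ranges); Sqoop's own splitter differs in its details, and `split_ranges` is a hypothetical helper, not part of Sqoop.

```shell
# Hypothetical helper (not part of Sqoop): print one WHERE range per mapper
# for an integer split column, given its minimum and maximum value.
split_ranges() {
  col=$1; min=$2; max=$3; n=$4
  span=$(( max - min + 1 ))
  i=0
  while [ "$i" -lt "$n" ]; do
    lo=$(( min + span * i / n ))
    hi=$(( min + span * (i + 1) / n ))
    if [ "$i" -eq $(( n - 1 )) ]; then
      # The last range is closed so that the maximum value is included
      echo "mapper $i: $col >= $lo AND $col <= $max"
    else
      echo "mapper $i: $col >= $lo AND $col < $hi"
    fi
    i=$(( i + 1 ))
  done
}

# Ids 1201..1205 split across two mappers
split_ranges id 1201 1205 2
```

With a single mapper there is only one range, which is why the examples in this article that use one mapper do not need to think about how the table is split.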

2. Installing and deploying Sqoop

2.1. Software deployment

[yun@mini01 software]$ pwd
/app/software
[yun@mini01 software]$ tar xf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
[yun@mini01 software]$ mv sqoop-1.4.7.bin__hadoop-2.6.0 /app/sqoop-1.4.7
[yun@mini01 software]$ cd /app/
[yun@mini01 ~]$ ln -s sqoop-1.4.7/ sqoop
[yun@mini01 ~]$ ll
total 28
…………
lrwxrwxrwx  1 yun yun    12 Aug  2 11:40 sqoop -> sqoop-1.4.7/
drwxr-xr-x  9 yun yun   318 Dec 19  2017 sqoop-1.4.7

2.2. Configuration changes

[yun@mini01 conf]$ pwd
/app/sqoop/conf
[yun@mini01 conf]$ ll
total 32
-rw-rw-r-- 1 yun yun 3895 Dec 19  2017 oraoop-site-template.xml
-rw-rw-r-- 1 yun yun 1404 Dec 19  2017 sqoop-env-template.cmd
-rwxr-xr-x 1 yun yun 1345 Dec 19  2017 sqoop-env-template.sh
-rw-rw-r-- 1 yun yun 6044 Dec 19  2017 sqoop-site-template.xml
-rw-rw-r-- 1 yun yun 6044 Dec 19  2017 sqoop-site.xml
[yun@mini01 conf]$ cp -a sqoop-env-template.sh sqoop-env.sh
[yun@mini01 conf]$ cat sqoop-env.sh
# Licensed to the Apache Software Foundation (ASF) under one or more
………………

# Set Hadoop-specific environment variables here.

#Set path to where bin/hadoop is available  # modified setting
export HADOOP_COMMON_HOME=${HADOOP_HOME}

#Set path to where hadoop-*-core.jar is available  # modified setting
export HADOOP_MAPRED_HOME=${HADOOP_HOME}

#set the path to where bin/hbase is available
#export HBASE_HOME=

#Set the path to where bin/hive is available  # modified setting
export HIVE_HOME=${HIVE_HOME}

#Set the path for where zookeper config dir is
#export ZOOCFGDIR=
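
The settings above only forward `HADOOP_HOME` and `HIVE_HOME`, so both variables have to be set in the shell that launches Sqoop. A minimal pre-flight check might look like the sketch below; `check_sqoop_env` is a hypothetical helper written for this article, not something Sqoop ships.

```shell
# Hypothetical pre-flight check (not part of Sqoop): verify that the variables
# referenced by sqoop-env.sh are set before running any sqoop command.
check_sqoop_env() {
  missing=0
  for name in HADOOP_HOME HIVE_HOME; do
    # Look up the value of the variable whose name is stored in $name
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "unset: $name"
      missing=1
    else
      echo "ok: $name=$value"
    fi
  done
  return "$missing"
}

check_sqoop_env || echo "set the missing variables before running sqoop"
```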

2.3. Add the MySQL JDBC driver

[yun@mini01 software]$ pwd
/app/software
[yun@mini01 software]$ cp -a mysql-connector-java-5.1.46.jar /app/sqoop/lib

2.4. Add the Hive execution jar

[yun@mini01 lib]$ pwd
/app/hive/lib
[yun@mini01 lib]$ cp -a hive-exec-1.2.1.jar /app/sqoop/lib/

This avoids errors in the Hive import steps later on.
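
Because the later steps depend on both jars being in place, a quick check of the lib directory can catch a missed copy early. The sketch below assumes the `/app/sqoop/lib` path used in this article; `check_jar` and the `SQOOP_LIB` variable are hypothetical names introduced only for illustration.

```shell
# Hypothetical helper: report whether a jar matching a pattern exists in a directory.
check_jar() {
  # $1 = library directory, $2 = file name pattern (left unquoted so the shell expands it)
  for jar in "$1"/$2; do
    if [ -e "$jar" ]; then
      echo "found: $jar"
      return 0
    fi
  done
  echo "missing: $1/$2"
  return 1
}

SQOOP_LIB=${SQOOP_LIB:-/app/sqoop/lib}
check_jar "$SQOOP_LIB" 'mysql-connector-java-*.jar' || true
check_jar "$SQOOP_LIB" 'hive-exec-*.jar' || true
```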

2.5. Verify the installation

[yun@mini01 bin]$ pwd
/app/sqoop/bin
[yun@mini01 bin]$ ./sqoop-version
18/08/02 14:56:54 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Sqoop 1.4.7
git commit id 2328971411f57f0cb683dfb79d19d4d19d185dd8
Compiled by maugli on Thu Dec 21 15:59:58 STD 2017
[yun@mini01 bin]$ ./sqoop help  # show help
18/08/02 15:30:00 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
usage: sqoop COMMAND [ARGS]

Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  import-mainframe   Import datasets from a mainframe server to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information

See 'sqoop help COMMAND' for information on a specific command.

3. Database information

# Create the database
CREATE DATABASE sqoop_test DEFAULT CHARACTER SET utf8 ;
# Create the account (the database runs on host mini03)
grant all on sqoop_test.* to sqoop_test@'%' identified by 'sqoop_test';
grant all on sqoop_test.* to sqoop_test@'mini03' identified by 'sqoop_test';
# Reload the privileges
flush privileges;

3.1. Table information

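Table emp:

+------+----------+--------------+--------+------+
| id   | name     | deg          | salary | dept |
+------+----------+--------------+--------+------+
| 1201 | gopal    | manager      |  50000 | TP   |
| 1202 | manisha  | Proof reader |  50000 | TP   |
| 1203 | khalil   | php dev      |  30000 | AC   |
| 1204 | prasanth | php dev      |  30000 | AC   |
| 1205 | kranthi  | admin        |  20000 | TP   |
+------+----------+--------------+--------+------+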
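
The article does not show how the `emp` table was created and filled. The sketch below writes a possible DDL and the five sample rows to a file that could then be loaded with the mysql client. The column types are assumptions: they are modeled on the `employee` table defined in section 5.1, except that `salary` is declared DECIMAL because the Hive import log in section 4.3 warns about a less precise cast for that column. The file name `emp.sql` and the `SQL_FILE` variable are hypothetical.

```shell
# Write DDL and sample rows for emp to a file.
# Column types are assumptions; the article does not state them.
SQL_FILE=${SQL_FILE:-emp.sql}

cat > "$SQL_FILE" <<'EOF'
USE sqoop_test;
CREATE TABLE emp (
  id     INT NOT NULL PRIMARY KEY,
  name   VARCHAR(20),
  deg    VARCHAR(20),
  salary DECIMAL(10,0),
  dept   VARCHAR(10)
);
INSERT INTO emp (id, name, deg, salary, dept) VALUES
  (1201, 'gopal',    'manager',      50000, 'TP'),
  (1202, 'manisha',  'Proof reader', 50000, 'TP'),
  (1203, 'khalil',   'php dev',      30000, 'AC'),
  (1204, 'prasanth', 'php dev',      30000, 'AC'),
  (1205, 'kranthi',  'admin',        20000, 'TP');
EOF

echo "wrote $(grep -c '^  (12' "$SQL_FILE") rows to $SQL_FILE"
```

The file could then be loaded with something like `mysql -h mini03 -u sqoop_test -p sqoop_test < emp.sql`, which prompts for the password.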

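Table emp_add:

+------+------+----------+---------+
| id   | hno  | street   | city    |
+------+------+----------+---------+
| 1201 | 288A | vgiri    | jublee  |
| 1202 | 108I | aoc      | sec-bad |
| 1203 | 144Z | pgutta   | hyd     |
| 1204 | 78B  | old city | sec-bad |
| 1205 | 720X | hitec    | sec-bad |
+------+------+----------+---------+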

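Table emp_conn:

+------+---------+-----------------+
| id   | phno    | email           |
+------+---------+-----------------+
| 1201 | 2356742 | gopal@tp.com    |
| 1202 | 1661663 | manisha@tp.com  |
| 1203 | 8887776 | khalil@ac.com   |
| 1204 | 9988774 | prasanth@ac.com |
| 1205 | 1231231 | kranthi@tp.com  |
+------+---------+-----------------+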

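4. Importing data with Sqoop

The import tool imports a single table from an RDBMS into HDFS. Each row of the table becomes one record in HDFS. Records are stored either as text data in text files or as binary data in Avro or SequenceFile format.

$ sqoop import (generic-args) (import-args)
$ sqoop-import (generic-args) (import-args)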

4.1. Import table data into HDFS

[yun@mini01 sqoop]$ pwd
/app/sqoop
[yun@mini01 sqoop]$ bin/sqoop import \
  --connect jdbc:mysql://mini03:3306/sqoop_test \
  --username sqoop_test \
  --password sqoop_test \
  --table emp \
  --m 1
18/08/02 15:57:55 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
18/08/02 15:57:55 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
………………
18/08/02 15:58:47 INFO mapreduce.Job:  map 100% reduce 0%
18/08/02 15:58:47 INFO mapreduce.Job: Job job_1533196573365_0001 completed successfully
………………
18/08/02 15:58:47 INFO mapreduce.ImportJobBase: Transferred 151 bytes in 47.2865 seconds (3.1933 bytes/sec)
18/08/02 15:58:47 INFO mapreduce.ImportJobBase: Retrieved 5 records.
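
The log above warns that passing the password on the command line is insecure. One alternative is Sqoop's `--password-file` option; the sketch below shows the idea under some assumptions: the password file location (`PW_FILE`) is hypothetical, the `file://` prefix is used to point at the local file system, and the `run` wrapper only prints the command unless `DRY_RUN=0` is set, so the snippet can be tried without a cluster.

```shell
# run: print the command when DRY_RUN=1 (the default here), execute it otherwise
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s ' "$@"
    printf '\n'
  else
    "$@"
  fi
}

PW_FILE=${PW_FILE:-$HOME/.sqoop_test.pw}   # hypothetical location

# Store the password without a trailing newline, readable only by its owner
rm -f "$PW_FILE"
( umask 077; printf '%s' 'sqoop_test' > "$PW_FILE" )
chmod 400 "$PW_FILE"

import_emp() {
  run bin/sqoop import \
    --connect jdbc:mysql://mini03:3306/sqoop_test \
    --username sqoop_test \
    --password-file "file://$PW_FILE" \
    --table emp \
    --num-mappers 1
}

import_emp
```

For interactive use, the `-P` option suggested by the warning prompts for the password instead.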

Check the imported data

[yun@mini02 ~]$ hadoop fs -ls /user/yun/emp
Found 2 items
-rw-r--r--   2 yun supergroup          0 2018-08-02 15:58 /user/yun/emp/_SUCCESS
-rw-r--r--   2 yun supergroup        151 2018-08-02 15:58 /user/yun/emp/part-m-00000
[yun@mini02 ~]$ hadoop fs -cat /user/yun/emp/part-m-00000
1201,gopal,manager,50000,TP
1202,manisha,Proof reader,50000,TP
1203,khalil,php dev,30000,AC
1204,prasanth,php dev,30000,AC
1205,kranthi,admin,20000,TP

4.2. Import a table into a specified HDFS directory

[yun@mini01 sqoop]$ pwd
/app/sqoop
[yun@mini01 sqoop]$ bin/sqoop import --connect jdbc:mysql://mini03:3306/sqoop_test \
  --username sqoop_test --password sqoop_test \
  --target-dir /sqoop_test/table_emp/queryresult \
  --table emp --num-mappers 1

  Note: if the target directory does not exist, it is created.

Check the imported data

[yun@mini02 ~]$ hadoop fs -ls /sqoop_test/table_emp/queryresult
Found 2 items
-rw-r--r--   2 yun supergroup          0 2018-08-02 17:57 /sqoop_test/table_emp/queryresult/_SUCCESS
-rw-r--r--   2 yun supergroup        151 2018-08-02 17:57 /sqoop_test/table_emp/queryresult/part-m-00000
[yun@mini02 ~]$ hadoop fs -cat /sqoop_test/table_emp/queryresult/part-m-00000
1201,gopal,manager,50000,TP
1202,manisha,Proof reader,50000,TP
1203,khalil,php dev,30000,AC
1204,prasanth,php dev,30000,AC
1205,kranthi,admin,20000,TP

4.3. Import a relational table into Hive

[yun@mini01 sqoop]$ pwd
/app/sqoop
[yun@mini01 sqoop]$ bin/sqoop import --connect jdbc:mysql://mini03:3306/sqoop_test \
  --username sqoop_test --password sqoop_test \
  --table emp --hive-import \
  --num-mappers 1
18/08/02 17:40:45 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
18/08/02 17:40:45 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
………………
18/08/02 17:41:09 INFO mapreduce.ImportJobBase: Transferred 151 bytes in 20.6744 seconds (7.3037 bytes/sec)
18/08/02 17:41:09 INFO mapreduce.ImportJobBase: Retrieved 5 records.
18/08/02 17:41:09 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table emp
18/08/02 17:41:10 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
18/08/02 17:41:10 WARN hive.TableDefWriter: Column salary had to be cast to a less precise type in Hive
18/08/02 17:41:10 INFO hive.HiveImport: Loading uploaded data into Hive
18/08/02 17:41:12 INFO hive.HiveImport: 
18/08/02 17:41:12 INFO hive.HiveImport: Logging initialized using configuration in jar:file:/app/sqoop-1.4.7/lib/hive-exec-1.2.1.jar!/hive-log4j.properties
18/08/02 17:41:20 INFO hive.HiveImport: OK
18/08/02 17:41:20 INFO hive.HiveImport: Time taken: 1.677 seconds
18/08/02 17:41:20 INFO hive.HiveImport: Loading data to table default.emp
18/08/02 17:41:21 INFO hive.HiveImport: Table default.emp stats: [numFiles=1, totalSize=151]
18/08/02 17:41:21 INFO hive.HiveImport: OK
18/08/02 17:41:21 INFO hive.HiveImport: Time taken: 0.629 seconds
18/08/02 17:41:21 INFO hive.HiveImport: Hive import complete.
18/08/02 17:41:21 INFO hive.HiveImport: Export directory is contains the _SUCCESS file only, removing the directory.

Check the imported data

hive (default)> show tables;
OK
emp
Time taken: 0.031 seconds, Fetched: 1 row(s)
hive (default)> select * from emp;
OK
1201    gopal    manager    50000.0    TP
1202    manisha    Proof reader    50000.0    TP
1203    khalil    php dev    30000.0    AC
1204    prasanth    php dev    30000.0    AC
1205    kranthi    admin    20000.0    TP
Time taken: 0.489 seconds, Fetched: 5 row(s)

4.4. Import a table into a specified Hive database and table

[yun@mini01 sqoop]$ pwd
/app/sqoop
[yun@mini01 sqoop]$ bin/sqoop import --connect jdbc:mysql://mini03:3306/sqoop_test \
  --username sqoop_test --password sqoop_test \
  --table emp \
  --delete-target-dir \
  --fields-terminated-by '\t' \
  --hive-import \
  --hive-database sqoop_test \
  --hive-table hive_emp \
  --num-mappers 1

Note: the Hive database sqoop_test must be created beforehand. Otherwise the import fails with: FAILED: SemanticException [Error 10072]: Database does not exist: sqoop_test
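
Creating the Hive database ahead of the import could look like the sketch below. It assumes the `hive` command-line client is on the PATH; the `run` wrapper is a hypothetical helper that only prints the command unless `DRY_RUN=0` is set, and its printed form omits the shell quoting.

```shell
# run: print the command when DRY_RUN=1 (the default here), execute it otherwise
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s ' "$@"
    printf '\n'
  else
    "$@"
  fi
}

# Create the target database only if it is not there yet, so the step can be repeated safely
create_hive_db() {
  run hive -e 'CREATE DATABASE IF NOT EXISTS sqoop_test;'
}

create_hive_db
```

The same statement can also be issued from a beeline session like the one used elsewhere in this article.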

Check the imported data

0: jdbc:hive2://mini01:10000> use sqoop_test;
No rows affected (0.049 seconds)
0: jdbc:hive2://mini01:10000> show tables;
+-----------+--+
| tab_name  |
+-----------+--+
| hive_emp  |
+-----------+--+
1 row selected (0.076 seconds)
0: jdbc:hive2://mini01:10000> select * from hive_emp;
+--------------+----------------+---------------+------------------+----------------+--+
| hive_emp.id  | hive_emp.name  | hive_emp.deg  | hive_emp.salary  | hive_emp.dept  |
+--------------+----------------+---------------+------------------+----------------+--+
| 1201         | gopal          | manager       | 50000.0          | TP             |
| 1202         | manisha        | Proof reader  | 50000.0          | TP             |
| 1203         | khalil         | php dev       | 30000.0          | AC             |
| 1204         | prasanth       | php dev       | 30000.0          | AC             |
| 1205         | kranthi        | admin         | 20000.0          | TP             |
+--------------+----------------+---------------+------------------+----------------+--+
5 rows selected (0.162 seconds)

4.5. Import a subset of table data

4.5.1. Import with a where clause

[yun@mini01 sqoop]$ pwd
/app/sqoop
[yun@mini01 sqoop]$ bin/sqoop import --connect jdbc:mysql://mini03:3306/sqoop_test \
  --username sqoop_test --password sqoop_test \
  --table emp_add \
  --where "city ='sec-bad'" \
  --target-dir /sqoop_test/table_emp/queryresult2 \
  --num-mappers 1

Check the imported data

[yun@mini02 ~]$ hadoop fs -ls /sqoop_test/table_emp/queryresult2
Found 2 items
-rw-r--r--   2 yun supergroup          0 2018-08-02 19:33 /sqoop_test/table_emp/queryresult2/_SUCCESS
-rw-r--r--   2 yun supergroup         72 2018-08-02 19:33 /sqoop_test/table_emp/queryresult2/part-m-00000
[yun@mini02 ~]$ hadoop fs -cat /sqoop_test/table_emp/queryresult2/part-m-00000
1202,108I,aoc,sec-bad
1204,78B,old city,sec-bad
1205,720X,hitec,sec-bad

4.5.2. Import on demand with a query

[yun@mini01 sqoop]$ pwd
/app/sqoop
[yun@mini01 sqoop]$ bin/sqoop import --connect jdbc:mysql://mini03:3306/sqoop_test \
  --username sqoop_test --password sqoop_test \
  --query 'select id,name,deg from emp WHERE  id>1203 and $CONDITIONS' \
  --split-by id \
  --fields-terminated-by '\t' \
  --target-dir /sqoop_test/table_emp/queryresult4 \
  --num-mappers 1

Check the imported data

[yun@mini02 ~]$ hadoop fs -cat /sqoop_test/table_emp/queryresult4/part-m-00000
1204    prasanth    php dev
1205    kranthi    admin
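
The `$CONDITIONS` token in the query is a placeholder that Sqoop replaces with its own predicate, which is why the query is wrapped in single quotes: the shell must not expand it. The sketch below only imitates that substitution with sed to show what the final statements could look like. The exact predicate text Sqoop generates is an assumption here, and `with_conditions` is a hypothetical helper.

```shell
# Single quotes keep the shell from expanding $CONDITIONS
query='select id,name,deg from emp WHERE  id>1203 and $CONDITIONS'

# Hypothetical helper: show the statement after the placeholder is replaced.
# The replacement text must not contain "/", "&" or "\" (sed would treat them specially).
with_conditions() {
  printf '%s\n' "$query" | sed 's/\$CONDITIONS/'"$1"'/'
}

# A predicate that matches no rows, as a tool might use to read only the column metadata
with_conditions '(1 = 0)'

# A range predicate of the kind a single mapper could receive
with_conditions '( id >= 1204 ) AND ( id <= 1205 )'
```

If the query is written in double quotes instead, the placeholder has to be escaped as `\$CONDITIONS`.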

4.6. Incremental import

Required parameters

--check-column (col)
--incremental (mode)        # mode is either append or lastmodified
--last-value (value)

Example

[yun@mini01 sqoop]$ pwd
/app/sqoop
[yun@mini01 sqoop]$ bin/sqoop import --connect jdbc:mysql://mini03:3306/sqoop_test \
  --username sqoop_test --password sqoop_test \
  --table emp \
  --incremental append \
  --check-column id \
  --last-value 1202 \
  --fields-terminated-by '\t' \
  --target-dir /sqoop_test/table_emp/queryresult4 \
  --num-mappers 1

Check the imported data

[yun@mini02 ~]$ hadoop fs -ls /sqoop_test/table_emp/queryresult4/
Found 3 items
-rw-r--r--   2 yun supergroup          0 2018-08-02 19:47 /sqoop_test/table_emp/queryresult4/_SUCCESS
-rw-r--r--   2 yun supergroup         41 2018-08-02 19:47 /sqoop_test/table_emp/queryresult4/part-m-00000
-rw-r--r--   2 yun supergroup         88 2018-08-02 20:16 /sqoop_test/table_emp/queryresult4/part-m-00001
[yun@mini02 ~]$ 
[yun@mini02 ~]$ hadoop fs -cat /sqoop_test/table_emp/queryresult4/part-m-00000
1204    prasanth    php dev
1205    kranthi    admin
[yun@mini02 ~]$ hadoop fs -cat /sqoop_test/table_emp/queryresult4/part-m-00001
1203    khalil    php dev    30000    AC
1204    prasanth    php dev    30000    AC
1205    kranthi    admin    20000    TP
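
The new file part-m-00001 holds exactly the rows whose id is greater than the last value 1202. The selection rule of append mode can be imitated locally with awk on the sample data, as in the sketch below. It is only an illustration of the rule, not of how Sqoop implements it; `emp_rows`, `incremental_append`, and `next_last_value` are hypothetical helpers.

```shell
# Sample rows of the emp table, comma-separated
emp_rows() {
  cat <<'EOF'
1201,gopal,manager,50000,TP
1202,manisha,Proof reader,50000,TP
1203,khalil,php dev,30000,AC
1204,prasanth,php dev,30000,AC
1205,kranthi,admin,20000,TP
EOF
}

# Keep rows whose check column (field 1) is greater than the last value
# and print them tab-separated, like the imported file
incremental_append() {
  awk -F',' -v OFS='\t' -v last="$1" '$1 + 0 > last + 0 { $1 = $1; print }'
}

# The highest value of the check column becomes the last value for the next run
next_last_value() {
  awk -F',' '$1 + 0 > max { max = $1 + 0 } END { print max }'
}

emp_rows | incremental_append 1202
echo "next --last-value: $(emp_rows | next_last_value)"
```

A later incremental run would then pass 1205 as `--last-value`, so that only rows added after this import are fetched.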

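5. Exporting data with Sqoop

Export moves data from HDFS into an RDBMS database.

The target table must already exist in the target database before the export.

  • By default, Sqoop inserts the data from the files into the table with INSERT statements.
  • In update mode, Sqoop generates UPDATE statements to update existing rows.

$ sqoop export (generic-args) (export-args)
$ sqoop-export (generic-args) (export-args)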

5.1. Example

Data

[yun@mini02 ~]$ hadoop fs -ls /sqoop_test/table_emp/queryresult
Found 2 items
-rw-r--r--   2 yun supergroup          0 2018-08-02 17:57 /sqoop_test/table_emp/queryresult/_SUCCESS
-rw-r--r--   2 yun supergroup        151 2018-08-02 17:57 /sqoop_test/table_emp/queryresult/part-m-00000
[yun@mini02 ~]$ hadoop fs -cat /sqoop_test/table_emp/queryresult/part-m-00000
1201,gopal,manager,50000,TP
1202,manisha,Proof reader,50000,TP
1203,khalil,php dev,30000,AC
1204,prasanth,php dev,30000,AC
1205,kranthi,admin,20000,TP

1. First, create the target table in MySQL manually

MariaDB [(none)]> use sqoop_test;
Database changed

MariaDB [sqoop_test]> CREATE TABLE employee (
          id INT NOT NULL PRIMARY KEY,
          name VARCHAR(20),
          deg VARCHAR(20),
          salary INT,
          dept VARCHAR(10));
Query OK, 0 rows affected (0.00 sec)

MariaDB [sqoop_test]> show tables;
+----------------------+
| Tables_in_sqoop_test |
+----------------------+
| emp                  |
| emp_add              |
| emp_conn             |
| employee             |
+----------------------+
4 rows in set (0.00 sec)

2. Then run the export command

[yun@mini01 sqoop]$ pwd
/app/sqoop
[yun@mini01 sqoop]$ bin/sqoop export \
  --connect jdbc:mysql://mini03:3306/sqoop_test \
  --username sqoop_test --password sqoop_test \
  --table employee \
  --export-dir /sqoop_test/table_emp/queryresult/

3. Verify the table from the MySQL command line

MariaDB [sqoop_test]> select * from employee;
+------+----------+--------------+--------+------+
| id   | name     | deg          | salary | dept |
+------+----------+--------------+--------+------+
| 1201 | gopal    | manager      |  50000 | TP   |
| 1202 | manisha  | Proof reader |  50000 | TP   |
| 1203 | khalil   | php dev      |  30000 | AC   |
| 1204 | prasanth | php dev      |  30000 | AC   |
| 1205 | kranthi  | admin        |  20000 | TP   |
+------+----------+--------------+--------+------+
5 rows in set (0.00 sec)
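
Because `employee` has a primary key on `id`, running the same export a second time in the default INSERT mode would be expected to fail on the rows that already exist. Update mode, mentioned at the start of this section, is selected with `--update-key`. The sketch below is a hedged variant of the export command: whether `--update-mode allowinsert` is supported depends on the database connector, so treat that flag as an assumption to verify. The `run` wrapper is a hypothetical helper that only prints the command unless `DRY_RUN=0` is set.

```shell
# run: print the command when DRY_RUN=1 (the default here), execute it otherwise
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s ' "$@"
    printf '\n'
  else
    "$@"
  fi
}

# Export in update mode: rows are matched on id and updated;
# allowinsert additionally inserts rows that do not exist yet (connector-dependent).
export_employee_update() {
  run bin/sqoop export \
    --connect jdbc:mysql://mini03:3306/sqoop_test \
    --username sqoop_test -P \
    --table employee \
    --export-dir /sqoop_test/table_emp/queryresult/ \
    --update-key id \
    --update-mode allowinsert
}

export_employee_update
```

Here `-P` makes Sqoop prompt for the password instead of taking it from the command line.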