Differences Between Managed (Internal) and External Tables in Hive

Date: 2019-03-14
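Managed and External Tables
Hive has two types of tables: managed tables (the default, also called internal tables) and external tables (created with the EXTERNAL keyword). The main difference shows up when a table is dropped: dropping a managed table deletes both its data (stored on HDFS) and its metadata (stored in the metastore database, MySQL in this setup), while dropping an external table deletes only the metadata and leaves the data on HDFS.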
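The rule above can be sketched as a toy model in Python. This is only an illustration of the semantics, not how Hive is implemented: the `metastore` dict, the `hdfs_dirs` set, and the `create_table` / `drop_table` helpers are hypothetical names that stand in for the real metastore and HDFS.

```python
# Toy model only: plain Python containers stand in for the metastore and HDFS.
WAREHOUSE = "/user/hive/warehouse"

metastore = {}     # table name -> table type (the "metadata")
hdfs_dirs = set()  # warehouse directories that currently exist (the "data")


def create_table(name, external=False):
    """Register the table and create its warehouse directory."""
    metastore[name] = "EXTERNAL_TABLE" if external else "MANAGED_TABLE"
    hdfs_dirs.add(f"{WAREHOUSE}/{name}")


def drop_table(name):
    """Always remove the metadata; remove the data only for managed tables."""
    table_type = metastore.pop(name)
    if table_type == "MANAGED_TABLE":
        hdfs_dirs.discard(f"{WAREHOUSE}/{name}")


create_table("managed_table")
create_table("external_table", external=True)
drop_table("managed_table")
drop_table("external_table")

print(metastore)          # {}
print(sorted(hdfs_dirs))  # ['/user/hive/warehouse/external_table']
```

After both drops the model holds no metadata at all, but the directory of the external table is still present, which is the behavior the session below demonstrates on a real cluster.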
hive> use default;
OK
Time taken: 1.054 seconds
hive> show tables;
OK
Time taken: 0.559 seconds
# Create a managed table and an external table
hive> create table managed_table(
    > id int,
    > name string 
    > );
OK
Time taken: 0.677 seconds
hive> create external table external_table(
    > id int,
    > name string 
    > );
OK
Time taken: 0.146 seconds
hive> show tables;
OK
external_table
managed_table
Time taken: 0.05 seconds, Fetched: 2 row(s)
# Check in HDFS
[hadoop@hadoop000 ~]$ hadoop fs -ls /user/hive/warehouse
Found 4 items
drwxr-xr-x   - hadoop supergroup          0 2018-06-16 16:40 /user/hive/warehouse/external_table
drwxr-xr-x   - hadoop supergroup          0 2018-06-16 15:26 /user/hive/warehouse/hive1.db
drwxr-xr-x   - hadoop supergroup          0 2018-06-16 15:28 /user/hive/warehouse/hive2.db
drwxr-xr-x   - hadoop supergroup          0 2018-06-16 16:39 /user/hive/warehouse/managed_table
# Check in MySQL (\G already terminates the statement, so no trailing semicolon is needed)
mysql> select * from hive_meta.tbls\G
*************************** 1. row ***************************
            TBL_ID: 11
       CREATE_TIME: 1529138399
             DB_ID: 1
  LAST_ACCESS_TIME: 0
             OWNER: hadoop
         RETENTION: 0
             SD_ID: 11
          TBL_NAME: managed_table
          TBL_TYPE: MANAGED_TABLE
VIEW_EXPANDED_TEXT: NULL
VIEW_ORIGINAL_TEXT: NULL
*************************** 2. row ***************************
            TBL_ID: 12
       CREATE_TIME: 1529138409
             DB_ID: 1
  LAST_ACCESS_TIME: 0
             OWNER: hadoop
         RETENTION: 0
             SD_ID: 12
          TBL_NAME: external_table
          TBL_TYPE: EXTERNAL_TABLE
VIEW_EXPANDED_TEXT: NULL
VIEW_ORIGINAL_TEXT: NULL
2 rows in set (0.00 sec)
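The CREATE_TIME values in the TBLS rows are Unix timestamps in seconds. A minimal sketch for decoding them follows; the `decode_create_time` helper is a hypothetical name, and the UTC+8 offset is an assumption, chosen because it makes the results line up with the directory times in the HDFS listing.

```python
from datetime import datetime, timedelta, timezone


def decode_create_time(epoch_seconds, utc_offset_hours=8):
    """Format a metastore CREATE_TIME (epoch seconds) in a fixed UTC offset.

    The default offset of +8 hours is an assumption about the cluster's
    local time zone, not something stated in the metastore.
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(epoch_seconds, tz).strftime("%Y-%m-%d %H:%M:%S")


print(decode_create_time(1529138399))  # managed_table  -> 2018-06-16 16:39:59
print(decode_create_time(1529138409))  # external_table -> 2018-06-16 16:40:09
```

Both results agree with the 16:39 and 16:40 modification times shown for the two warehouse directories.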

# Drop the managed table and the external table
hive> drop table managed_table;
OK
Time taken: 1.143 seconds
hive> drop table external_table;
OK
Time taken: 0.265 seconds
# Check again: the managed_table directory is gone, the external_table directory remains
[hadoop@hadoop000 ~]$ hadoop fs -ls /user/hive/warehouse
Found 3 items
drwxr-xr-x   - hadoop supergroup          0 2018-06-16 16:40 /user/hive/warehouse/external_table
drwxr-xr-x   - hadoop supergroup          0 2018-06-16 15:26 /user/hive/warehouse/hive1.db
drwxr-xr-x   - hadoop supergroup          0 2018-06-16 15:28 /user/hive/warehouse/hive2.db
mysql> select * from hive_meta.tbls\G
Empty set (0.00 sec)

To check whether a table is managed or external, switch to the Hive database that contains it and run desc formatted followed by the table name:

hive (d6_hive)> desc formatted emp;

The output includes a Table Type field, whose value is either MANAGED_TABLE or EXTERNAL_TABLE.
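If the check needs to be scripted, the Table Type field can be pulled out of the captured desc formatted text. The sketch below is a minimal example under stated assumptions: `table_type` is a hypothetical helper, and `sample` is a hand-written, abridged stand-in for the command output, not text captured from a real session.

```python
def table_type(desc_formatted_output):
    """Return the value of the 'Table Type' field, or None if it is absent."""
    for line in desc_formatted_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "table type":
            return value.strip()
    return None


# Hand-written, abridged sample (an assumption about the general layout).
sample = "Owner:              \thadoop\nTable Type:         \tMANAGED_TABLE       \t\n"

print(table_type(sample))  # MANAGED_TABLE
```

Matching on the field name rather than on a fixed column position keeps the helper tolerant of the padding and tab characters that surround the value.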