Hive第二天学习内容总结Hive 第三天DDL特别注意一下,没事别删除数据DML
Hive 第三天
[toc]
第二天内容回顾
Hive帮助文档的地址
https://cwiki.apache.org/confluence/display/Hive/Home
Hive SQL Language Manual: Commands, CLIs, Data Types,
DDL (create/drop/alter/truncate/show/describe), Statistics (analyze), Indexes, Archiving,
DML (load/insert/update/delete/merge, import/export, explain plan),
Queries (select), Operators and UDFs, Locks, Authorization
File Formats and Compression: RCFile, Avro, ORC, Parquet; Compression, LZO
Procedural Language: Hive HPL/SQL
Hive Configuration Properties
Hive Clients
Hive Client (JDBC, ODBC, Thrift)
HiveServer2: Overview, HiveServer2 Client and Beeline, Hive Metrics
DDL
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
database
- create
- drop
- alter
- Use
Table
Create
CREATE [TEMPORARY] [EXTERNAL] TABLE
Create Table As Select (CTAS)
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name LIKE existing_table_or_view_name [LOCATION hdfs_path];
三种类型表
临时表:TEMPORARY
跟Hive的Session生命周期一致,Hive Client 关闭|退出 表也一起删除了 临时表的优先级比其他表高:当临时表与其他表名一致时,我们操作的是临时表 直到我们把临时表Drop掉,或者Alter掉,我们才可以操作其他表
外部表:EXTERNAL
只管理元数据,Drop表的时候,只删除原数据,HDFS上的数据,不会被删除 需要指定Location
内部表:没有修饰词
全部管理,元数据和HDFS上的数据,删除就都没了
特别注意一下,没事别删除数据
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name -- (Note: TEMPORARY available in Hive 0.14.0 and later)
[(col_name data_type [COMMENT col_comment], ... [constraint_specification])]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[SKEWED BY (col_name, col_name, ...) -- (Note: Available in Hive 0.10.0 and later)]
ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
[STORED AS DIRECTORIES]
[
[ROW FORMAT row_format]
[STORED AS file_format]
| STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] -- (Note: Available in Hive 0.6.0 and later)
]
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, ...)] -- (Note: Available in Hive 0.6.0 and later)
[AS select_statement]; -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)
ROW FORMAT
原始数据,用什么样的格式,加载到我们Hive表 加载到我们表里的数据,原始数据不会改变
PARTITIONED BY
对我们数据进行分区
STORED AS
数据存储的文件格式
LOCATION
存放在HDFS 上目录的位置
Drop Truncate
DML
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML
LOAD
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
LOCAL本地
LOCAL和inpath组合,决定是从hdfs上读取数据,还是从客户端位置读取数据
我们加载数据的时候,实际是把一个数据文件,移动到Hive warehouse目录下面,表名的这个目录
HDFS 上 直接就挪过去了
LOCAL 是上传到临时目录,然后在移动到相应的位置
OVERWRITE
是否覆盖原有数据
如果不覆盖原有数据的话,把原来的数据,复制到hive数据目录下,就会重复了xxx_copy
PARTITION
分区,根据PARTITION (gender='male',age='35')
INSERT
into Hive tables from queries
Standard syntax:
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1 FROM from_statement;
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement;
Hive extension (multiple inserts):
FROM from_statement
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2]
[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2] ...;
FROM from_statement
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1
[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2]
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2] ...;
Hive extension (dynamic partition inserts):
INSERT OVERWRITE TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;
INSERT INTO TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;
Example
FROM page_view_stg pvs
INSERT OVERWRITE TABLE page_view PARTITION(dt='2008-06-08', country)
SELECT pvs.viewTime, pvs.userid, pvs.page_url, pvs.referrer_url, null, null, pvs.ip, pvs.cnt
into Hive tables from SQL
Standard Syntax:
INSERT INTO TABLE tablename [PARTITION (partcol1[=val1], partcol2[=val2] ...)] VALUES values_row [, values_row ...]
Where values_row is:
( value [, value ...] )
where a value is either null or any valid SQL literal
案例
CREATE TABLE students (name VARCHAR(64), age INT, gpa DECIMAL(3, 2))
CLUSTERED BY (age) INTO 2 BUCKETS STORED AS ORC;
INSERT INTO TABLE students
VALUES ('fred flintstone', 35, 1.28), ('barney rubble', 32, 2.32);
CREATE TABLE pageviews (userid VARCHAR(64), link STRING, came_from STRING)
PARTITIONED BY (datestamp STRING) CLUSTERED BY (userid) INTO 256 BUCKETS STORED AS ORC;
INSERT INTO TABLE pageviews PARTITION (datestamp = '2014-09-23')
VALUES ('jsmith', 'mail.com', 'sports.com'), ('jdoe', 'mail.com', null);
INSERT INTO TABLE pageviews PARTITION (datestamp)
VALUES ('tjohnson', 'sports.com', 'finance.com', '2014-09-23'), ('tlee', 'finance.com', null, '2014-09-21');
QQ截图20171103203710.png
- 如何使用AndroidStudio将开源项目library发布到jcenter
- Android Studio 使用Gradle多渠道打包
- 某些浏览器中因cookie设置HttpOnly标志引起的安全问题
- 偷懒新姿势,打造属于RecyclerView的万能适配器Adapter和ViewHolder
- 科普哈希长度扩展攻击(Hash Length Extension Attacks)
- 分析 WordPress 3.8.2 修復的cookie偽造漏洞
- 技术宅打造全能美剧播放器
- 判断是否支持Heartbeat的NSE脚本
- [原创]Fluent NHibernate之旅二--Entity Mapping
- [原创]Fluent NHibernate之旅(三)-- 继承
- Web应用手工渗透测试——用SQLMap进行SQL盲注测试
- IIS4\IIS5 CGI环境块伪造0day漏洞
- [原创]Fluent NHibernate之旅(四)-- 关系(上)
- 基于流量的OpenSSL漏洞利用检测方法
- JavaScript 教程
- JavaScript 编辑工具
- JavaScript 与HTML
- JavaScript 与Java
- JavaScript 数据结构
- JavaScript 基本数据类型
- JavaScript 特殊数据类型
- JavaScript 运算符
- JavaScript typeof 运算符
- JavaScript 表达式
- JavaScript 类型转换
- JavaScript 基本语法
- JavaScript 注释
- Javascript 基本处理流程
- Javascript 选择结构
- Javascript if 语句
- Javascript if 语句的嵌套
- Javascript switch 语句
- Javascript 循环结构
- Javascript 循环结构实例
- Javascript 跳转语句
- Javascript 控制语句总结
- Javascript 函数介绍
- Javascript 函数的定义
- Javascript 函数调用
- Javascript 几种特殊的函数
- JavaScript 内置函数简介
- Javascript eval() 函数
- Javascript isFinite() 函数
- Javascript isNaN() 函数
- parseInt() 与 parseFloat()
- escape() 与 unescape()
- Javascript 字符串介绍
- Javascript length属性
- javascript 字符串函数
- Javascript 日期对象简介
- Javascript 日期对象用途
- Date 对象属性和方法
- Javascript 数组是什么
- Javascript 创建数组
- Javascript 数组赋值与取值
- Javascript 数组属性和方法
- opencv 图像腐蚀和图像膨胀的实现
- PHP实现微信退款的方法示例
- 基于Python和C++实现删除链表的节点
- python让函数不返回结果的方法
- PHP微商城开源代码实例
- PHP小程序支付功能完整版【基于thinkPHP】
- CodeIgniter框架实现的整合Smarty引擎DEMO示例
- PHP微信支付功能示例
- PHP中ltrim()函数的用法与实例讲解
- Laravel 中创建 Zip 压缩文件并提供下载的实现方法
- pytorch随机采样操作SubsetRandomSampler()
- Pytorch上下采样函数–interpolate用法
- scrapy框架携带cookie访问淘宝购物车功能的实现代码
- 浅析Python __name__ 是什么
- PHP判断访客是否手机端(移动端浏览器)访问的方法总结【4种方法】