ClickHouse入门实例-样例数据(ontime)
时间:2022-07-25
本文章向大家介绍ClickHouse入门实例-样例数据(ontime),主要内容包括其使用实例、应用技巧、基本知识点总结和需要注意事项,具有一定的参考价值,需要的朋友可以参考一下。
参考官方文档https://clickhouse.tech/docs/en/getting-started/example-datasets/ontime/
1、下载数据
https://yadi.sk/d/pOZxpa42sDdgm
2、加压缩
[root@elastic1 data]# ll
total 15439336
-rw-r--r-- 1 root root 3532492212 Sep 14 15:38 ontime.csv.xz
...
[root@elastic1 data]# xz -dk ontime.csv.xz
[root@elastic1 data]#
[root@elastic1 data]# ls -lh /tpdata/data/
total 75G
-rw-r--r-- 1 root root 62G Sep 14 15:38 ontime.csv
-rw-r--r-- 1 root root 3.3G Sep 14 15:38 ontime.csv.xz
....
[root@elastic1 data]#
3、新建表
CREATE TABLE `ontime` (
`Year` UInt16,
`Quarter` UInt8,
`Month` UInt8,
`DayofMonth` UInt8,
`DayOfWeek` UInt8,
`FlightDate` Date,
`UniqueCarrier` FixedString(7),
`AirlineID` Int32,
`Carrier` FixedString(2),
`TailNum` String,
`FlightNum` String,
`OriginAirportID` Int32,
`OriginAirportSeqID` Int32,
`OriginCityMarketID` Int32,
`Origin` FixedString(5),
`OriginCityName` String,
`OriginState` FixedString(2),
`OriginStateFips` String,
`OriginStateName` String,
`OriginWac` Int32,
`DestAirportID` Int32,
`DestAirportSeqID` Int32,
`DestCityMarketID` Int32,
`Dest` FixedString(5),
`DestCityName` String,
`DestState` FixedString(2),
`DestStateFips` String,
`DestStateName` String,
`DestWac` Int32,
`CRSDepTime` Int32,
`DepTime` Int32,
`DepDelay` Int32,
`DepDelayMinutes` Int32,
`DepDel15` Int32,
`DepartureDelayGroups` String,
`DepTimeBlk` String,
`TaxiOut` Int32,
`WheelsOff` Int32,
`WheelsOn` Int32,
`TaxiIn` Int32,
`CRSArrTime` Int32,
`ArrTime` Int32,
`ArrDelay` Int32,
`ArrDelayMinutes` Int32,
`ArrDel15` Int32,
`ArrivalDelayGroups` Int32,
`ArrTimeBlk` String,
`Cancelled` UInt8,
`CancellationCode` FixedString(1),
`Diverted` UInt8,
`CRSElapsedTime` Int32,
`ActualElapsedTime` Int32,
`AirTime` Int32,
`Flights` Int32,
`Distance` Int32,
`DistanceGroup` UInt8,
`CarrierDelay` Int32,
`WeatherDelay` Int32,
`NASDelay` Int32,
`SecurityDelay` Int32,
`LateAircraftDelay` Int32,
`FirstDepTime` String,
`TotalAddGTime` String,
`LongestAddGTime` String,
`DivAirportLandings` String,
`DivReachedDest` String,
`DivActualElapsedTime` String,
`DivArrDelay` String,
`DivDistance` String,
`Div1Airport` String,
`Div1AirportID` Int32,
`Div1AirportSeqID` Int32,
`Div1WheelsOn` String,
`Div1TotalGTime` String,
`Div1LongestGTime` String,
`Div1WheelsOff` String,
`Div1TailNum` String,
`Div2Airport` String,
`Div2AirportID` Int32,
`Div2AirportSeqID` Int32,
`Div2WheelsOn` String,
`Div2TotalGTime` String,
`Div2LongestGTime` String,
`Div2WheelsOff` String,
`Div2TailNum` String,
`Div3Airport` String,
`Div3AirportID` Int32,
`Div3AirportSeqID` Int32,
`Div3WheelsOn` String,
`Div3TotalGTime` String,
`Div3LongestGTime` String,
`Div3WheelsOff` String,
`Div3TailNum` String,
`Div4Airport` String,
`Div4AirportID` Int32,
`Div4AirportSeqID` Int32,
`Div4WheelsOn` String,
`Div4TotalGTime` String,
`Div4LongestGTime` String,
`Div4WheelsOff` String,
`Div4TailNum` String,
`Div5Airport` String,
`Div5AirportID` Int32,
`Div5AirportSeqID` Int32,
`Div5WheelsOn` String,
`Div5TotalGTime` String,
`Div5LongestGTime` String,
`Div5WheelsOff` String,
`Div5TailNum` String
) ENGINE = MergeTree
PARTITION BY Year
ORDER BY (Carrier, FlightDate)
SETTINGS index_granularity = 8192;
4、导入数据
[root@client tpdata]# cat ontime.csv | clickhouse-client --password --query "INSERT INTO ontime FORMAT CSV" --max_insert_block_size=100000
Password for user (default):
[root@client tpdata]#
报错超时
[root@client tpdata]# clickhouse-client --password --query "OPTIMIZE TABLE ontime FINAL"
Password for user (default):
Timeout exceeded while receiving data from server. Waited for 300 seconds, timeout is 300 seconds.
[root@client tpdata]#
5、查询实例
(1)查询总记录数
[root@client tpdata]# clickhouse-client --password --query "select count(*) from ontime"
Password for user (default):
166628027
[root@client tpdata]#
(2)没有平均班次
SELECT avg(c1)
FROM
(
SELECT Year, Month, count(*) AS c1
FROM ontime
GROUP BY Year, Month
);
(3)查询从2000年到2008年每天的航班数
SELECT DayOfWeek, count(*) AS c
FROM ontime
WHERE Year>=2000 AND Year<=2008
GROUP BY DayOfWeek
ORDER BY c DESC;
(4)查询从2000年到2008年每周延误超过10分钟的航班数。
SELECT DayOfWeek, count(*) AS c
FROM ontime
WHERE DepDelay>10 AND Year>=2000 AND Year<=2008
GROUP BY DayOfWeek
ORDER BY c DESC;
(5)查询2000年到2008年每个机场延误超过10分钟以上的次数
SELECT Origin, count(*) AS c
FROM ontime
WHERE DepDelay>10 AND Year>=2000 AND Year<=2008
GROUP BY Origin
ORDER BY c DESC
LIMIT 10;
(6)查询2007年各航空公司延误超过10分钟以上的次数
SELECT Carrier, count(*)
FROM ontime
WHERE DepDelay>10 AND Year=2007
GROUP BY Carrier
ORDER BY count(*) DESC;
(7)查询2007年各航空公司延误超过10分钟以上的百分比
SELECT Carrier, avg(DepDelay>10)*100 AS c3
FROM ontime
WHERE Year=2007
GROUP BY Carrier
ORDER BY c3 DESC
(8)同上一个查询一致,只是查询范围扩大到2000年到2008年
SELECT Carrier, avg(DepDelay>10)*100 AS c3
FROM ontime
WHERE Year>=2000 AND Year<=2008
GROUP BY Carrier
ORDER BY c3 DESC;
(9)每年航班延误超过10分钟的百分比
SELECT Year, avg(DepDelay>10)*100
FROM ontime
GROUP BY Year
ORDER BY Year;
(10)每年更受人们喜爱的目的地
SELECT DestCityName, uniqExact(OriginCityName) AS u
FROM ontime
WHERE Year >= 2000 and Year <= 2010
GROUP BY DestCityName
ORDER BY u DESC LIMIT 10;
(11)按年份统计
SELECT Year, count(*) AS c1
FROM ontime
GROUP BY Year;
(12)查询2010年最受欢迎的目的地
SELECT
OriginCityName,
DestCityName,
count(*) AS flights,
bar(flights, 0, 20000, 40)
FROM ontime
WHERE Year = 2010
GROUP BY
OriginCityName,
DestCityName
ORDER BY flights DESC
LIMIT 20
(13)最长飞行时间
SELECT
OriginCityName,
DestCityName,
count(*) AS flights,
avg(AirTime) AS duration
FROM ontime
GROUP BY
OriginCityName,
DestCityName
ORDER BY duration DESC
- 添加php的memcached扩展模块
- Android TextView中显示图片
- Nginx配置中的log_format用法梳理(设置详细的日志格式)
- 分享一个刷网页PV的python小脚本
- mysql完整备份时过滤掉某些库
- Jquery 结合Json控制Select下拉框
- ExtJs学习笔记(23)-ScriptTagProxy+XTemplate+WCF跨域取数据
- Centos7.2下Jumpserver V4.0环境安装部署记录
- 利用JQuery实现更简单的Ajax跨域请求
- 运维工作中sed常规操作命令梳理
- linux下安装php的imagick扩展模块(附php升级脚本)
- 用JS + WCF打造轻量级WebPart
- Android之assets资源
- ExtJs学习笔记(24)-Drag/Drop拖动功能
- JavaScript 教程
- JavaScript 编辑工具
- JavaScript 与HTML
- JavaScript 与Java
- JavaScript 数据结构
- JavaScript 基本数据类型
- JavaScript 特殊数据类型
- JavaScript 运算符
- JavaScript typeof 运算符
- JavaScript 表达式
- JavaScript 类型转换
- JavaScript 基本语法
- JavaScript 注释
- Javascript 基本处理流程
- Javascript 选择结构
- Javascript if 语句
- Javascript if 语句的嵌套
- Javascript switch 语句
- Javascript 循环结构
- Javascript 循环结构实例
- Javascript 跳转语句
- Javascript 控制语句总结
- Javascript 函数介绍
- Javascript 函数的定义
- Javascript 函数调用
- Javascript 几种特殊的函数
- JavaScript 内置函数简介
- Javascript eval() 函数
- Javascript isFinite() 函数
- Javascript isNaN() 函数
- parseInt() 与 parseFloat()
- escape() 与 unescape()
- Javascript 字符串介绍
- Javascript length属性
- javascript 字符串函数
- Javascript 日期对象简介
- Javascript 日期对象用途
- Date 对象属性和方法
- Javascript 数组是什么
- Javascript 创建数组
- Javascript 数组赋值与取值
- Javascript 数组属性和方法
- 详解React Native监听Android回退按键与程序化退出应用
- 详解android webView独立进程通讯方式
- Android编程中File文件常见存储与读取操作demo示例
- Android读取properties配置文件的实例详解
- Android开发实现popupWindow弹出窗口自定义布局与位置控制方法
- Android编程实现使用handler在子线程中更新UI示例
- Android编程实现图片放大缩小功能ZoomControls控件用法实例
- 详解Android MacAddress 适配心得
- Android编程使用GestureDetector实现简单手势监听与处理的方法
- 【MySQL】通过SQL_Thread快速恢复binlog
- 渗透系列之flask框架开启debug模式漏洞分析
- Android之ImageSwitcher的实例详解
- Android中HTTP请求中文乱码解决办法
- Android编程之播放器MediaPlayer实现均衡器效果示例
- Android studio点击跳转WebView详解