hbase集群region数量和大小的影响
1、Region数量的影响
2、region大小的影响
169.2.2. Number of regions per RS - upper bound
In production scenarios, where you have a lot of data, you are normally concerned with the maximum number of regions you can have per server. too many regions has technical discussion on the subject. Basically, the maximum number of regions is mostly determined by memstore memory usage. Each region has its own memstores; these grow up to a configurable size; usually in 128-256 MB range, see hbase.hregion.memstore.flush.size. One memstore exists per column family (so there’s only one per region if there’s one CF in the table). The RS dedicates some fraction of total memory to its memstores (see hbase.regionserver.global.memstore.size). If this memory is exceeded (too much memstore usage), it can cause undesirable consequences such as unresponsive server or compaction storms. A good starting point for the number of regions per RS (assuming one table) is:
((RS memory) * (total memstore fraction)) / ((memstore size)*(# column families))
This formula is pseudo-code. Here are two formulas using the actual tunable parameters, first for HBase 0.98+ and second for HBase 0.94.x.
- HBase 0.98.x
((RS Xmx) * hbase.regionserver.global.memstore.size) / (hbase.hregion.memstore.flush.size * (# column families))
- HBase 0.94.x
((RS Xmx) * hbase.regionserver.global.memstore.upperLimit) / (hbase.hregion.memstore.flush.size * (# column families))+
If a given RegionServer has 16 GB of RAM, with default settings, the formula works out to 16384*0.4/128 ~ 51 regions per RS is a starting point. The formula can be extended to multiple tables; if they all have the same configuration, just use the total number of families.
This number can be adjusted; the formula above assumes all your regions are filled at approximately the same rate. If only a fraction of your regions are going to be actively written to, you can divide the result by that fraction to get a larger region count. Then, even if all regions are written to, all region memstores are not filled evenly, and eventually jitter appears even if they are (due to limited number of concurrent flushes). Thus, one can have as many as 2-3 times more regions than the starting point; however, increased numbers carry increased risk.
For write-heavy workload, memstore fraction can be increased in configuration at the expense of block cache; this will also allow one to have more regions.
169.2.3. Number of regions per RS - lower bound
HBase scales by having regions across many servers. Thus if you have 2 regions for 16GB data, on a 20 node machine your data will be concentrated on just a few machines - nearly the entire cluster will be idle. This really can’t be stressed enough, since a common problem is loading 200MB data into HBase and then wondering why your awesome 10 node cluster isn’t doing anything.
On the other hand, if you have a very large amount of data, you may also want to go for a larger number of regions to avoid having regions that are too large.
169.2.4. Maximum region size
For large tables in production scenarios, maximum region size is mostly limited by compactions - very large compactions, esp. major, can degrade cluster performance. Currently, the recommended maximum region size is 10-20Gb, and 5-10Gb is optimal. For older 0.90.x codebase, the upper-bound of regionsize is about 4Gb, with a default of 256Mb.
The size at which the region is split into two is generally configured via hbase.hregion.max.filesize; for details, see arch.region.splits.
If you cannot estimate the size of your tables well, when starting off, it’s probably best to stick to the default region size, perhaps going smaller for hot tables (or manually split hot regions to spread the load over the cluster), or go with larger region sizes if your cell sizes tend to be largish (100k and up).
In HBase 0.98, experimental stripe compactions feature was added that would allow for larger regions, especially for log data. See ops.stripe.
169.2.5. Total data size per region server
According to above numbers for region size and number of regions per region server, in an optimistic estimate 10 GB x 100 regions per RS will give up to 1TB served per region server, which is in line with some of the reported multi-PB use cases. However, it is important to think about the data vs cache size ratio at the RS level. With 1TB of data per server and 10 GB block cache, only 1% of the data will be cached, which may barely cover all block indices.
原文地址:https://www.cnblogs.com/dtmobile-ksw/p/11374058.html
- 用谷歌浏览器来当手机模拟器
- ASP.NET MVC5+EF6+EasyUI 后台管理系统(27)-权限管理系统-分配用户给角色
- ASP.NET MVC5+EF6+EasyUI 后台管理系统-分配角色给用户
- 体验vs11 Beta
- 构建ASP.NET MVC4+EF5+EasyUI+Unity2.x注入的后台管理系统(25)-权限管理系统-系统管理员(附生成器)
- 构建ASP.NET MVC4+EF5+EasyUI+Unity2.x注入的后台管理系统(24)-权限管理系统-将权限授权给角色
- 构建ASP.NET MVC4+EF5+EasyUI+Unity2.x注入的后台管理系统(23)-权限管理系统-角色组模块
- jQuery Gallery Plugin在Asp.Net中使用
- ASP.NET MVC5+EF6+EasyUI 后台管理系统(22)-权限管理系统-模块导航制作
- ASP.NET MVC5+EF6+EasyUI 后台管理系统(20)-权限管理系统-根据权限获取菜单
- ASP.NET MVC5+EF6+EasyUI 后台管理系统(19)-权限管理系统-用户登录
- ASP.NET MVC5+EF6+EasyUI 后台管理系统(21)-权限管理系统-跑通整个系统
- ASP.NET MVC5+EF6+EasyUI 后台管理系统(18)-权限管理系统-表数据
- ASP.NET MVC5+EF6+EasyUI 后台管理系统(17)-LinQ动态排序
- JavaScript 教程
- JavaScript 编辑工具
- JavaScript 与HTML
- JavaScript 与Java
- JavaScript 数据结构
- JavaScript 基本数据类型
- JavaScript 特殊数据类型
- JavaScript 运算符
- JavaScript typeof 运算符
- JavaScript 表达式
- JavaScript 类型转换
- JavaScript 基本语法
- JavaScript 注释
- Javascript 基本处理流程
- Javascript 选择结构
- Javascript if 语句
- Javascript if 语句的嵌套
- Javascript switch 语句
- Javascript 循环结构
- Javascript 循环结构实例
- Javascript 跳转语句
- Javascript 控制语句总结
- Javascript 函数介绍
- Javascript 函数的定义
- Javascript 函数调用
- Javascript 几种特殊的函数
- JavaScript 内置函数简介
- Javascript eval() 函数
- Javascript isFinite() 函数
- Javascript isNaN() 函数
- parseInt() 与 parseFloat()
- escape() 与 unescape()
- Javascript 字符串介绍
- Javascript length属性
- javascript 字符串函数
- Javascript 日期对象简介
- Javascript 日期对象用途
- Date 对象属性和方法
- Javascript 数组是什么
- Javascript 创建数组
- Javascript 数组赋值与取值
- Javascript 数组属性和方法
- keras的backend 设置 tensorflow,theano操作
- spring-boot-route(三)实现多文件上传
- PHP attributes()函数讲解
- PHP children()函数讲解
- spring-boot-route(四)全局异常处理
- PHP registerXPathNamespace()函数讲解
- Python闭包装饰器使用方法汇总
- spring-boot-route(五)整合Swaager2生成接口文档
- spring-boot-route(六)整合JApiDocs生成接口文档
- Python unittest基本使用方法代码实例
- spring-boot-route(七)整合jdbcTemplate操作数据库
- Pytorch 卷积中的 Input Shape用法
- 解决TensorFlow程序无限制占用GPU的方法
- 基于Python的自媒体小助手—登录页面的实现代码
- PHP addAttribute()函数讲解