[TBase Open-Source Edition Review] Join Queries on Replicated Tables in a Distributed Database

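In a project, queries sometimes require data to be exchanged between data nodes, which can incur significant network overhead and hurt performance. Replicated tables are one way to address this, so in this post we try out TBase's join queries on replicated tables. Our project has a dictionary table that stores constant definitions used throughout the project, such as gender, notification type, message type, order type, and payment type. This data frequently takes part in join operations, the table is small, and once defined it rarely changes while the project is running.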

I. Walkthrough

1. In the TBase cluster, create the ha_dict table as a replicated table by adding the distribute by replication clause.

create table ha_dict(id int,parent_id int,code text,dict_key int,dict_value text,sort int,remark text,is_deleted int) distribute by replication;

For comparison, we also created an ha_dict_old table with the same columns but without the distribute by replication clause.

create table ha_dict_old(id int,parent_id int,code text,dict_key int,dict_value text,sort int,remark text,is_deleted int);

Import the ha_dict data from the original project database into both tables.
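The post does not say how the data was imported. One possible approach, sketched here under assumptions, is a JDBC batch insert that is run once per target table. The class name DictLoader, its method names, and the row representation (an Object[] with eight values in column order) are hypothetical and exist only for this illustration; the table name is concatenated into the SQL, so it must come from trusted code.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Hypothetical helper, not part of TBase or the original project.
class DictLoader {
    // Builds the parameterized insert for either ha_dict or ha_dict_old.
    static String insertSql(String table) {
        return "insert into " + table
            + "(id,parent_id,code,dict_key,dict_value,sort,remark,is_deleted)"
            + " values (?,?,?,?,?,?,?,?)";
    }

    // Sends all rows in one batch and returns how many statements were batched.
    // Each row is assumed to hold eight values in the column order above.
    static int load(Connection c, String table, List<Object[]> rows) throws SQLException {
        try (PreparedStatement ps = c.prepareStatement(insertSql(table))) {
            for (Object[] row : rows) {
                for (int i = 0; i < row.length; i++) {
                    ps.setObject(i + 1, row[i]); // JDBC parameters are 1-based
                }
                ps.addBatch();
            }
            return ps.executeBatch().length;
        }
    }
}
```

Calling load once with "ha_dict" and once with "ha_dict_old" on the same row list would give both tables identical contents, which keeps the later comparison fair.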

2. Create a user table whose sex column has to be resolved by joining against the dictionary table.

create table ha_user(id int,username text,sex int) distribute by shard(id);

Import data into the user table.

3. Run explain on a join query against each of the two dictionary tables and compare the plans:

explain select * from ha_user as u join ha_dict as d on u.sex=d.dict_key where u.id=6 and d.code='sex';

explain select * from ha_user as u join ha_dict_old as d on u.sex=d.dict_key where u.id=6 and d.code='sex';

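As the execution plans show, once ha_dict is created with the distribute by replication clause, the join on a non-shard-key column is pushed down to dn001 and executed there. A replicated table keeps a full copy on every data node, so the node that holds the matching ha_user row can complete the join locally, which can improve query performance in many scenarios.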
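The pushdown can be pictured with a toy model. The sketch below is not TBase's planner or its hash function: it assumes two data nodes, a made-up placement rule (key modulo node count), and two invented dictionary rows. It only checks whether the node that stores a given user row also stores the dictionary row needed for the join.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only; names and the placement rule are assumptions for illustration.
class ReplicatedJoinSketch {
    static final int NODES = 2; // assumed number of data nodes

    // Hypothetical placement rule: a row lives on node (shard key mod NODES).
    static int nodeOf(int shardKey) {
        return Math.floorMod(shardKey, NODES);
    }

    // Places dictionary rows ({id, dict_key}) on nodes.
    // Replicated: every node gets every row. Otherwise: sharded by id.
    static List<Map<Integer, Integer>> placeDict(int[][] dictRows, boolean replicated) {
        List<Map<Integer, Integer>> perNode = new ArrayList<>();
        for (int n = 0; n < NODES; n++) {
            perNode.add(new HashMap<>());
        }
        for (int[] row : dictRows) {
            int id = row[0];
            int dictKey = row[1];
            if (replicated) {
                for (Map<Integer, Integer> node : perNode) {
                    node.put(dictKey, id);
                }
            } else {
                perNode.get(nodeOf(id)).put(dictKey, id);
            }
        }
        return perNode;
    }

    // True when the node holding the user row also holds the matching dict_key.
    static boolean joinIsLocal(int userId, int sex, List<Map<Integer, Integer>> dictPerNode) {
        return dictPerNode.get(nodeOf(userId)).containsKey(sex);
    }

    public static void main(String[] args) {
        int[][] dictRows = {{1, 1}, {2, 2}}; // made-up rows: {id, dict_key}
        System.out.println("replicated: " + joinIsLocal(6, 1, placeDict(dictRows, true)));
        System.out.println("sharded:    " + joinIsLocal(6, 1, placeDict(dictRows, false)));
    }
}
```

In this model the user with id 6 lands on node 0. With a replicated dictionary the matching row is always present on that node, while with a dictionary sharded by id the row for dict_key 1 sits on the other node, so the join would need data from a second node.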

We also compared the performance of the two queries from Java, using the code from the official application access guide with timing added:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class TbaseApplication {
    public static void main(String[] args) {
        String url = "jdbc:postgresql://152.136.155.36:30004/postgres?currentSchema=public&binaryTransfer=false";
        // Query against the non-replicated comparison table.
        String sql = "select * from ha_user as u join ha_dict_old as d on u.sex=d.dict_key where u.id=6 and d.code='sex';";
        // Query against the replicated table; swap it in to compare.
        // String sql = "select * from ha_user as u join ha_dict as d on u.sex=d.dict_key where u.id=6 and d.code='sex';";
        try {
            Class.forName("org.postgresql.Driver");
            // try-with-resources closes the connection and statement even if the query fails.
            try (Connection c = DriverManager.getConnection(url, "tbase", "tbase");
                 Statement stmt = c.createStatement()) {
                System.out.println("Opened database successfully");
                long beginTime = System.currentTimeMillis();
                System.out.println(beginTime);
                try (ResultSet result = stmt.executeQuery(sql)) {
                    while (result.next()) {
                        // Read every row so the timing includes fetching the result.
                    }
                }
                long endTime = System.currentTimeMillis();
                System.out.println(endTime);
                System.out.println("Execution time (ms): " + (endTime - beginTime));
            }
        } catch (Exception e) {
            System.err.println(e.getClass().getName() + ": " + e.getMessage());
            System.exit(1); // non-zero exit code signals the failure
        }
    }
}

The output when the query joins against ha_dict_old is shown in the figure below:

The output when the query joins against ha_dict is shown in the figure below:
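A single timed run is sensitive to connection setup, caching, and JVM warm-up, so a comparison like this is usually more stable when each query is run several times. The following is a minimal sketch of such a harness, not the code used for the measurements above. QueryTimer and its methods are hypothetical names, and the workload in main is a stand-in loop; in a real test the Runnable would execute one of the two join queries over JDBC and wrap the checked SQLException.

```java
import java.util.Arrays;

// Hypothetical timing helper for repeated measurements.
class QueryTimer {
    // Runs the task `warmup` times untimed, then `runs` times timed,
    // and returns the median duration in milliseconds.
    static double medianMillis(Runnable task, int warmup, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        for (int i = 0; i < warmup; i++) {
            task.run();
        }
        long[] nanos = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            nanos[i] = System.nanoTime() - start;
        }
        return median(nanos) / 1_000_000.0;
    }

    // Median of the values; the input array is left unmodified.
    static double median(long[] values) {
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
            ? sorted[mid]
            : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    public static void main(String[] args) {
        // Stand-in workload so the sketch runs without a database.
        Runnable fakeQuery = () -> {
            long sum = 0;
            for (int i = 0; i < 100_000; i++) {
                sum += i;
            }
        };
        System.out.println("median ms: " + medianMillis(fakeQuery, 3, 11));
    }
}
```

The median is used rather than the mean so that one slow outlier, for example a run that hits a network hiccup, does not dominate the reported figure.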

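II. Summary

This hands-on session gave us a much clearer understanding of how TBase handles join queries on replicated tables. Because our test data set is small, we cannot tell whether the timings we measured are representative. For future projects that need better performance or have stricter performance requirements, we plan to do more experiments and testing in this area.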