Introduction | People often ask how MySQL partitioned tables should be used, point out that MySQL's partitioning falls far short of Oracle's, and wonder whether MySQL partitioned tables are worth using at all.
As for whether you should use them, I can't really give a definitive recommendation. As always, whatever fits your own situation is best: if you're confident you can manage partitioned tables, then use them.
Perhaps I'm being overly cautious, but to use partitioned tables well you should have the following in place:

- Be prepared to train your developers, so that they know how to use partitioned tables properly.
- Ideally, have an automated job that performs partition maintenance on a schedule.
- Remember that DELETE does not release disk space, while DROP PARTITION does. Dropping partitions saves space and also avoids the performance problems caused by a large number of delete-marked rows.
Creating the MySQL partition data
```sql
DROP TABLE ord_order;

-- create the order partition table
CREATE TABLE ord_order (
    order_id    BIGINT   NOT NULL AUTO_INCREMENT COMMENT 'order ID',
    user_id     INT      NOT NULL COMMENT 'user ID',
    goods_id    INT      NOT NULL COMMENT 'goods ID',
    order_price INT      NOT NULL DEFAULT 0 COMMENT 'actual order price (in cents)',
    create_time DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'creation time',
    PRIMARY KEY (order_id, create_time)
)
PARTITION BY LIST (YEAR(create_time)*100 + MONTH(create_time)) (
    PARTITION p201601 VALUES IN (201601),
    PARTITION p201602 VALUES IN (201602),
    PARTITION p201603 VALUES IN (201603),
    PARTITION p201604 VALUES IN (201604),
    PARTITION p201605 VALUES IN (201605),
    PARTITION p201606 VALUES IN (201606),
    PARTITION p201607 VALUES IN (201607),
    PARTITION p201608 VALUES IN (201608),
    PARTITION p201609 VALUES IN (201609),
    PARTITION p201610 VALUES IN (201610),
    PARTITION p201611 VALUES IN (201611),
    PARTITION p201612 VALUES IN (201612)
);

-- insert sample data
INSERT INTO ord_order VALUES
    (NULL, 10000001, 11111111, 1000, '2016-01-13 01:00:10'),
    (NULL, 10000001, 11111112, 2000, '2016-01-13 02:00:20'),
    (NULL, 10000001, 11111113, 3000, '2016-01-13 03:00:30'),
    (NULL, 10000001, 11111114, 4000, '2016-01-13 04:00:40'),
    (NULL, 10000001, 11111115, 5000, '2016-01-13 05:00:50'),
    (NULL, 10000001, 11111111, 1000, '2016-02-13 01:00:10'),
    (NULL, 10000001, 11111112, 2000, '2016-02-13 02:00:20'),
    (NULL, 10000001, 11111113, 3000, '2016-02-13 03:00:30'),
    (NULL, 10000001, 11111114, 4000, '2016-02-13 04:00:40'),
    (NULL, 10000001, 11111115, 5000, '2016-02-13 05:00:50'),
    (NULL, 10000001, 11111111, 1000, '2016-03-13 01:00:10'),
    (NULL, 10000001, 11111112, 2000, '2016-03-13 02:00:20'),
    (NULL, 10000001, 11111113, 3000, '2016-03-13 03:00:30'),
    (NULL, 10000001, 11111114, 4000, '2016-03-13 04:00:40'),
    (NULL, 10000001, 11111115, 5000, '2016-03-13 05:00:50'),
    (NULL, 10000001, 11111111, 1000, '2016-04-13 01:00:10'),
    (NULL, 10000001, 11111112, 2000, '2016-04-13 02:00:20'),
    (NULL, 10000001, 11111113, 3000, '2016-04-13 03:00:30'),
    (NULL, 10000001, 11111114, 4000, '2016-04-13 04:00:40'),
    (NULL, 10000001, 11111115, 5000, '2016-04-13 05:00:50'),
    (NULL, 10000001, 11111111, 1000, '2016-05-13 01:00:10'),
    (NULL, 10000001, 11111112, 2000, '2016-05-13 02:00:20'),
    (NULL, 10000001, 11111113, 3000, '2016-05-13 03:00:30'),
    (NULL, 10000001, 11111114, 4000, '2016-05-13 04:00:40'),
    (NULL, 10000001, 11111115, 5000, '2016-05-13 05:00:50'),
    (NULL, 10000001, 11111111, 1000, '2016-06-13 01:00:10'),
    (NULL, 10000001, 11111112, 2000, '2016-06-13 02:00:20'),
    (NULL, 10000001, 11111113, 3000, '2016-06-13 03:00:30'),
    (NULL, 10000001, 11111114, 4000, '2016-06-13 04:00:40'),
    (NULL, 10000001, 11111115, 5000, '2016-06-13 05:00:50'),
    (NULL, 10000001, 11111111, 1000, '2016-07-13 01:00:10'),
    (NULL, 10000001, 11111112, 2000, '2016-07-13 02:00:20'),
    (NULL, 10000001, 11111113, 3000, '2016-07-13 03:00:30'),
    (NULL, 10000001, 11111114, 4000, '2016-07-13 04:00:40'),
    (NULL, 10000001, 11111115, 5000, '2016-07-13 05:00:50'),
    (NULL, 10000001, 11111111, 1000, '2016-08-13 01:00:10'),
    (NULL, 10000001, 11111112, 2000, '2016-08-13 02:00:20'),
    (NULL, 10000001, 11111113, 3000, '2016-08-13 03:00:30'),
    (NULL, 10000001, 11111114, 4000, '2016-08-13 04:00:40'),
    (NULL, 10000001, 11111115, 5000, '2016-08-13 05:00:50'),
    (NULL, 10000001, 11111111, 1000, '2016-09-13 01:00:10'),
    (NULL, 10000001, 11111112, 2000, '2016-09-13 02:00:20'),
    (NULL, 10000001, 11111113, 3000, '2016-09-13 03:00:30'),
    (NULL, 10000001, 11111114, 4000, '2016-09-13 04:00:40'),
    (NULL, 10000001, 11111115, 5000, '2016-09-13 05:00:50'),
    (NULL, 10000001, 11111111, 1000, '2016-10-13 01:00:10'),
    (NULL, 10000001, 11111112, 2000, '2016-10-13 02:00:20'),
    (NULL, 10000001, 11111113, 3000, '2016-10-13 03:00:30'),
    (NULL, 10000001, 11111114, 4000, '2016-10-13 04:00:40'),
    (NULL, 10000001, 11111115, 5000, '2016-10-13 05:00:50'),
    (NULL, 10000001, 11111111, 1000, '2016-11-13 01:00:10'),
    (NULL, 10000001, 11111112, 2000, '2016-11-13 02:00:20'),
    (NULL, 10000001, 11111113, 3000, '2016-11-13 03:00:30'),
    (NULL, 10000001, 11111114, 4000, '2016-11-13 04:00:40'),
    (NULL, 10000001, 11111115, 5000, '2016-11-13 05:00:50'),
    (NULL, 10000001, 11111111, 1000, '2016-12-13 01:00:10'),
    (NULL, 10000001, 11111112, 2000, '2016-12-13 02:00:20'),
    (NULL, 10000001, 11111113, 3000, '2016-12-13 03:00:30'),
    (NULL, 10000001, 11111114, 4000, '2016-12-13 04:00:40'),
    (NULL, 10000001, 11111115, 5000, '2016-12-13 05:00:50');
```
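The LIST partition expression above, `YEAR(create_time)*100 + MONTH(create_time)`, maps every row of a calendar month to one integer key such as 201601. A minimal Python sketch of the same arithmetic (just an illustration, not part of the original article) shows where a given row lands:

```python
from datetime import datetime

def partition_key(create_time):
    # Mirrors the SQL expression YEAR(create_time)*100 + MONTH(create_time):
    # every row of one month gets the same key, hence the same partition.
    return create_time.year * 100 + create_time.month

print(partition_key(datetime(2016, 1, 13, 1, 0, 10)))   # 201601 -> partition p201601
print(partition_key(datetime(2016, 12, 13, 5, 0, 50)))  # 201612 -> partition p201612
```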
```sql
-- view the data in partition p201601
SELECT * FROM ord_order PARTITION (p201601);

-- the composed rowkey
SELECT CONCAT(user_id, 10000000000 - UNIX_TIMESTAMP(create_time), goods_id)
FROM ord_order PARTITION (p201601);
```

Bringing HBase into the picture
Creating the HBase table ord_order
Due to version-compatibility issues, I had to create the corresponding HBase table in advance; otherwise the import fails with an error saying the column family cannot be created automatically.
Create the ord_order table in the HBase shell:
```
hbase(main):033:0> create 'ord_order', {NAME => 'cf1'}
```
Use Sqoop to import the data of the p201601 partition of the MySQL ord_order table into the HBase table:
```shell
/usr/local/sqoop/bin/sqoop import \
  --connect jdbc:mysql://192.168.137.11:3306/test \
  --username HH \
  --password oracle \
  --query 'SELECT CONCAT(user_id, 10000000000-UNIX_TIMESTAMP(create_time), goods_id) AS order_id, order_price, create_time FROM ord_order PARTITION (p201601) WHERE $CONDITIONS' \
  --hbase-table ord_order \
  --hbase-create-table \
  --hbase-row-key order_id \
  --split-by order_id \
  --column-family cf1 \
  -m 1
```
Once the import has succeeded, the corresponding partition can be dropped on the MySQL side, and the partitions needed next can be created:
```sql
ALTER TABLE ord_order ADD PARTITION (PARTITION p201701 VALUES IN (201701));
ALTER TABLE ord_order DROP PARTITION p201601;
```
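These two ALTER statements are exactly what a scheduled rotation job would run each month. As a sketch of the "automate partition maintenance" advice from the introduction, here is a hypothetical Python helper (not from the original article) that generates them; wiring it up to cron and a MySQL client is left out:

```python
from datetime import date

def rotation_sql(today, months_kept=12):
    # Partition to add: next calendar month.
    nxt_y, nxt_m = (today.year + 1, 1) if today.month == 12 else (today.year, today.month + 1)
    # Partition to drop: months_kept months before the one being added
    # (its data is assumed to be archived to HBase already).
    total = nxt_y * 12 + (nxt_m - 1) - months_kept
    old_y, old_m = divmod(total, 12)
    old_m += 1
    return [
        f"ALTER TABLE ord_order ADD PARTITION (PARTITION p{nxt_y:04d}{nxt_m:02d} VALUES IN ({nxt_y:04d}{nxt_m:02d}));",
        f"ALTER TABLE ord_order DROP PARTITION p{old_y:04d}{old_m:02d};",
    ]

for stmt in rotation_sql(date(2016, 12, 15)):
    print(stmt)
# ALTER TABLE ord_order ADD PARTITION (PARTITION p201701 VALUES IN (201701));
# ALTER TABLE ord_order DROP PARTITION p201601;
```

Run around mid-December 2016, it reproduces the two statements shown above.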
Viewing the imported data in HBase
```
hbase(main):001:0> scan 'ord_order'
ROW                            COLUMN+CELL
 10000001854736755011111115    column=cf1:create_time, timestamp=1479224942888, value=2016-01-13 05:00:50.0
 10000001854736755011111115    column=cf1:order_price, timestamp=1479224942888, value=5000
 10000001854737116011111114    column=cf1:create_time, timestamp=1479224942888, value=2016-01-13 04:00:40.0
 10000001854737116011111114    column=cf1:order_price, timestamp=1479224942888, value=4000
 10000001854737477011111113    column=cf1:create_time, timestamp=1479224942888, value=2016-01-13 03:00:30.0
 10000001854737477011111113    column=cf1:order_price, timestamp=1479224942888, value=3000
 10000001854737838011111112    column=cf1:create_time, timestamp=1479224942888, value=2016-01-13 02:00:20.0
 10000001854737838011111112    column=cf1:order_price, timestamp=1479224942888, value=2000
 10000001854738199011111111    column=cf1:create_time, timestamp=1479224942888, value=2016-01-13 01:00:10.0
 10000001854738199011111111    column=cf1:order_price, timestamp=1479224942888, value=1000
5 row(s) in 0.5390 seconds
```

ROW KEY design in detail
The HBase row key is composed of three fields: user_id, 10000000000-UNIX_TIMESTAMP(create_time), and goods_id.
The part worth noting is 10000000000-UNIX_TIMESTAMP(create_time). It is designed this way so that orders sort in reverse chronological order, which matches the requirement that the newest data be displayed first.
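To see the trick concretely, here is a small Python reproduction of the rowkey expression. make_rowkey is a hypothetical helper, and it assumes the MySQL server's time zone is UTC+8, which is what the keys in the scan output above imply:

```python
from datetime import datetime, timezone, timedelta

# Assumed server time zone (UTC+8); the keys only match under this assumption.
CST = timezone(timedelta(hours=8))

def make_rowkey(user_id, create_time, goods_id):
    # Mirrors CONCAT(user_id, 10000000000 - UNIX_TIMESTAMP(create_time), goods_id).
    ts = int(create_time.replace(tzinfo=CST).timestamp())
    return f"{user_id}{10000000000 - ts}{goods_id}"

k_old = make_rowkey(10000001, datetime(2016, 1, 13, 1, 0, 10), 11111111)
k_new = make_rowkey(10000001, datetime(2016, 1, 13, 5, 0, 50), 11111115)

print(k_old)          # 10000001854738199011111111
print(k_new)          # 10000001854736755011111115
print(k_new < k_old)  # True: the newer order sorts first in a forward scan
```

Because the subtraction makes newer timestamps produce smaller numbers, the newest order gets the lexicographically smallest key and comes back first.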
For example, suppose we need to paginate the orders of user 10000001, two per page, in reverse chronological order (newest order first):
```
hbase(main):003:0> scan 'ord_order', {COLUMNS => ['cf1:order_price'], ROWPREFIXFILTER => '10000001', LIMIT => 2}
ROW                            COLUMN+CELL
 10000001854736755011111115    column=cf1:order_price, timestamp=1479224942888, value=5000
 10000001854737116011111114    column=cf1:order_price, timestamp=1479224942888, value=4000
```
Fetching the next page:
```
hbase(main):004:0> scan 'ord_order', {COLUMNS => ['cf1:order_price'], LIMIT => 3, STARTROW => '10000001854737116011111114'}
ROW                            COLUMN+CELL
 10000001854737116011111114    column=cf1:order_price, timestamp=1479224942888, value=4000
 10000001854737477011111113    column=cf1:order_price, timestamp=1479224942888, value=3000
 10000001854737838011111112    column=cf1:order_price, timestamp=1479224942888, value=2000
3 row(s) in 0.0260 seconds
```

The scan returns three rows; when rendering, simply discard the first one. What is actually displayed:

```
 10000001854737477011111113    column=cf1:order_price, timestamp=1479224942888, value=3000
 10000001854737838011111112    column=cf1:order_price, timestamp=1479224942888, value=2000
```
Fetching the previous page:
```
hbase(main):008:0> scan 'ord_order', {COLUMNS => ['cf1:order_price'], LIMIT => 3, STARTROW => '10000001854737477011111113', REVERSED => true}
ROW                            COLUMN+CELL
 10000001854737477011111113    column=cf1:order_price, timestamp=1479224942888, value=3000
 10000001854737116011111114    column=cf1:order_price, timestamp=1479224942888, value=4000
 10000001854736755011111115    column=cf1:order_price, timestamp=1479224942888, value=5000
3 row(s) in 0.0640 seconds
```

Again three rows come back. Dropping the first one leaves the fetched result set:

```
 10000001854737116011111114    column=cf1:order_price, timestamp=1479224942888, value=4000
 10000001854736755011111115    column=cf1:order_price, timestamp=1479224942888, value=5000
```

Traversing that result set in reverse gives what is finally displayed:

```
 10000001854736755011111115    column=cf1:order_price, timestamp=1479224942888, value=5000
 10000001854737116011111114    column=cf1:order_price, timestamp=1479224942888, value=4000
```
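The next-page/previous-page dance described in this section can be sketched in a few lines of Python. A plain sorted list stands in for the HBase table, and next_page/prev_page are hypothetical helpers, not a real HBase client API:

```python
# (rowkey, order_price) pairs in HBase (ascending rowkey) order, taken
# from the scan output earlier in this article.
ROWS = [
    ("10000001854736755011111115", 5000),
    ("10000001854737116011111114", 4000),
    ("10000001854737477011111113", 3000),
    ("10000001854737838011111112", 2000),
    ("10000001854738199011111111", 1000),
]

def next_page(start_row, page_size):
    # Forward scan from start_row (inclusive), fetch page_size + 1 rows,
    # drop the first -- exactly what the shell example does.
    hits = [r for r in ROWS if r[0] >= start_row][: page_size + 1]
    return hits[1:]

def prev_page(start_row, page_size):
    # Reversed scan from start_row, drop the first row, then reverse the
    # remainder back into display (newest-first) order.
    hits = [r for r in reversed(ROWS) if r[0] <= start_row][: page_size + 1]
    return list(reversed(hits[1:]))

print(next_page("10000001854737116011111114", 2))  # prices 3000 then 2000
print(prev_page("10000001854737477011111113", 2))  # prices 5000 then 4000
```

Against a real table, the two list comprehensions would be forward and REVERSED scans with the appropriate STARTROW.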
Summary

Designing and using HBase well is a discipline in its own right. In practice, you usually need to sit down with the development and product teams to work out a good row key design. Reading up on other people's experience also goes a long way, because the problems you run into have often been hit by someone else before.
Original article from: