RCFile, SequenceFile, and Avro Comparison Test
2013-05-17 10:19
The original Hive file is 1421 MB. Results after Snappy compression:
[Figure: Hadoop Cluster Network Usage]
[Figure: Hadoop Cluster CPU Usage]
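Conclusions:
RCFile reads fastest of the three binary formats: count(*) took 34 s, against 42.6 s for SequenceFile and 75.8 s for Avro. The plain text table was faster still at 29.8 s.
Avro files use the most CPU.
For the same input, RCFile has the lowest HDFS read relative to its size: 754.3 MB read from an 826.8 MB file.
Avro-backed Hive tables do not support adding columns (example: alter table test_avro add columns(x int)); the other formats all support it.
Avro compresses best (590.9 MB on disk), and compression also uses the most CPU.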
Queries timed: count(*) is select count(*) from table; count(key) is select count(*) from (select key from table where key='') a;
file type | table name | row number | map tasks | file size (MB) | count(*) (s) | count(key) (s) | HDFS read (MB) |
text | test_text2 | 58336344 | 7 | 852.2 | 29.8 | 29.9 | 852.2 |
sequence | test_sequence | 58336344 | 4 | 906.1 | 42.6 | 41.9 | 916.4 |
rcfile | test_rc | 58336344 | 4 | 826.8 | 34 | 34.4 | 754.3 |
avro | test_avro2 | 58336344 | 3 | 590.9 | 75.8 | 90.7 | 591 |
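
The figures in the table can be turned into comparable ratios. The following is a minimal Python sketch, not part of the original test: it assumes all four tables hold the same 1421 MB input, and the names `RESULTS`, `summarize`, and `SUMMARY` are hypothetical. The MB/s value divides HDFS read by wall-clock query time, so it includes job startup overhead and is only a rough indicator.

```python
# Numbers copied from the results table; 1421 MB is the original Hive file size.
ORIGINAL_MB = 1421.0

# format -> (file size in MB, count(*) time in seconds, HDFS read in MB)
RESULTS = {
    "text": (852.2, 29.8, 852.2),
    "sequence": (906.1, 42.6, 916.4),
    "rcfile": (826.8, 34.0, 754.3),
    "avro": (590.9, 75.8, 591.0),
}


def summarize(results, original_mb):
    """Derive size ratio, read fraction, and rough scan rate per format."""
    rows = {}
    for fmt, (size_mb, count_s, read_mb) in results.items():
        rows[fmt] = {
            # On-disk size relative to the original (assumed identical) input.
            "size_ratio": size_mb / original_mb,
            # Share of the stored file actually read from HDFS by count(*).
            "read_fraction": read_mb / size_mb,
            # Rough rate: MB read per second of wall-clock query time.
            "mb_per_s": read_mb / count_s,
        }
    return rows


SUMMARY = summarize(RESULTS, ORIGINAL_MB)

if __name__ == "__main__":
    for fmt, row in SUMMARY.items():
        print(
            f"{fmt:9s} size={row['size_ratio']:.1%} of original  "
            f"read={row['read_fraction']:.1%} of file  "
            f"{row['mb_per_s']:.1f} MB/s"
        )
```

On these numbers Avro has the smallest size ratio, and RCFile is the only format whose read fraction falls below 1, which matches the conclusions above.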