Performance testing HBase using YCSB
2011-01-17 19:53
There are many new serving databases available, including:
PNUTS
BigTable
HBase
Hypertable
Azure
Cassandra
CouchDB
Voldemort
MongoDB
Dynomite
…and many others
It is difficult to decide which system is right for your
application, partially because the features differ between systems, and
partially because there is not an easy way to compare the performance of
one system versus another.
The goal of the YCSB project is to develop a
framework and common set of workloads for evaluating the performance of
different “key-value” and “cloud” serving stores. The project comprises
two things:
The YCSB Client, an extensible workload generator
The Core workloads, a set of workload scenarios to be executed by the generator
Although the core workloads provide a well-rounded picture of a
system’s performance, the Client is extensible so that you can define
new and different workloads to examine system aspects, or application
scenarios, not adequately covered by the core workload. Similarly, the
Client is extensible to support benchmarking different databases.
Although we include sample code for benchmarking HBase, Cassandra and
MongoDB, it is straightforward to write a new interface layer to
benchmark your favorite database.
A common use of the tool is to benchmark multiple systems and compare
them. For example, you can install multiple systems on the same hardware
configuration, and run the same workloads against each system. Then you
can plot the performance of each system (for example, as latency versus
throughput curves) to see when one system does better than another.
Source: http://blog.lars-francke.de/2010/08/16/performance-testing-hbase-using-ycsb/
I assume most of you know what HBase is, but just in case, here is a snippet from Wikipedia:
HBase is an open source, non-relational, distributed database modeled after Google’s BigTable and is written in Java.
Yahoo has published a paper, Benchmarking Cloud Serving Systems with YCSB, together with the accompanying tool (YCSB).
At the moment I am not interested in comparing different database
systems against each other; I only want to benchmark HBase. This is
useful for measuring the performance impact of custom patches or of
different configuration options.
No matter which kind of workload you choose, however, keep in mind that
this is an artificial benchmark and it can’t replace a test with your
real data and load.
In this short blog post I’m going to outline how to get YCSB running
against a current version of HBase. I’m going to show this on a single
machine. In a real test setup you should of course be running YCSB on a
different machine (or multiple machines) than your HBase cluster.
A YCSB benchmark consists of two phases: a load phase and a transaction
phase. The load phase imports data into the database and records
statistics while doing so. The transaction phase then runs the
workload’s operations against that data and measures them. There are
multiple predefined workloads that mimic typical database usage
scenarios, and you can also define your own.
Requirements/Setup
I am using a clean Ubuntu 10.04 installation, but this should work on other distributions just as well. While you’ll probably run it against an already set up cluster, I will
be using HBase in standalone mode here, in its second development
release of 0.89.
For YCSB I’ve used the latest version checked out from GitHub, but the latest released version (0.1.2 at the time of this writing) should work equally well. So do this:
$ sudo apt-get -y install ant openjdk-6-jdk git-core
$ export JAVA_HOME=/usr/lib/jvm/java-6-openjdk/
$ wget http://apache.easy-webs.de/hbase/hbase-0.89.20100726/hbase-0.89.20100726-bin.tar.gz
$ tar xvzf hbase-0.89.20100726-bin.tar.gz
$ hbase-0.89.20100726/bin/start-hbase.sh
$ hbase-0.89.20100726/bin/hbase shell
create 'usertable', 'family'
exit
$ git clone http://github.com/brianfrankcooper/YCSB.git
$ cp hbase-0.89.20100726/lib/* YCSB/db/hbase/lib
$ cd YCSB
$ ant
$ ant dbcompile-hbase
As you can see, YCSB requires a table called usertable in HBase, and it has to contain one column family with an arbitrary name (family in my case). YCSB also needs all the libraries (jars) that the HBase client needs to run. The easiest way is to copy everything from HBase’s lib directory to the appropriate directory in YCSB.
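Before moving on, it can help to confirm that the build and the jar copy actually produced what YCSB expects. The following is a minimal sketch, assuming it is run from the YCSB checkout and that the build leaves its jar at build/ycsb.jar (the path used in the classpath later in this post). The function name check_ycsb_setup is made up for this example.

```shell
#!/bin/sh
# Preflight check, meant to be run from the YCSB checkout directory.
# Paths follow the layout used in this post; adjust them if yours differs.

check_ycsb_setup() {
    missing=0
    # The client jar that the ant build is assumed to produce.
    if [ ! -f build/ycsb.jar ]; then
        echo "missing: build/ycsb.jar (run ant)"
        missing=1
    fi
    # The HBase client jars copied from HBase's lib directory.
    if ! ls db/hbase/lib/*.jar >/dev/null 2>&1; then
        echo "missing: jars in db/hbase/lib (copy them from HBase's lib directory)"
        missing=1
    fi
    if [ "$missing" -eq 0 ]; then
        echo "setup looks complete"
    fi
    # Exit status 0 means everything was found, 1 means something is missing.
    return "$missing"
}

check_ycsb_setup || echo "fix the items above before loading data"
```

The check only looks at files on disk. It does not verify that HBase is running or that the usertable table exists.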
Running YCSB
At this point we should have HBase running somewhere and YCSB and its HBase driver compiled. Time to load some data into HBase.
A few things to note here:
The load in this post inserts only 1000 records into HBase. You will want to increase the number to 100 million or more in a real test.
The documentation is pretty good, so make sure to read it should you have problems.
The documentation suggests not specifying properties (like recordcount) on the command line but in a property file instead. You’ll find instructions on how to do this on the aforementioned page.
The -s parameter causes YCSB to print status messages to System.err every ten seconds; remove it if you don’t want them.
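The load command itself is missing from this copy of the post. As a rough sketch of the property-file approach, the block below writes the two settings into a file and prints a load invocation that reads it. The file name hbase-load.properties is made up for illustration. The class names com.yahoo.ycsb.Client and com.yahoo.ycsb.db.HBaseClient are an assumption based on YCSB releases of that era, so check them against your checkout.

```shell
#!/bin/sh
# Hypothetical file name; any path works as long as it is passed with -P.
PROPS=hbase-load.properties

# Settings that would otherwise be given with -p on the command line.
cat > "$PROPS" <<'EOF'
# name of the column family created in the HBase shell
columnfamily=family
# number of records inserted during the load phase
recordcount=1000
EOF

# Load phase: -load inserts the records, -s prints status to stderr,
# and the measurements written to stdout are redirected to load.dat.
# The command is printed rather than executed so the sketch is safe to run anywhere.
printf '%s\n' "java -cp \"build/ycsb.jar:db/hbase/lib/*\" com.yahoo.ycsb.Client -load -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloada -P $PROPS -s > load.dat"
```

Properties from the second -P file are meant to override those in workloads/workloada, which keeps the predefined workload file untouched.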
After the load operation has finished you can find statistics in the load.dat file.
Now we’ll run the transactions part of the workload (again, for explanations see the documentation of YCSB):
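The commands for this step are also missing from this copy of the post. Here is a sketch of what two such invocations could look like, assuming the same class names as for the load phase. The operationcount, -threads and -target values are illustrative and not taken from the original commands.

```shell
#!/bin/sh
# Part shared by both invocations; -t selects the transaction phase.
# Class names are assumed from YCSB 0.1.x; verify them against your checkout.
BASE='java -cp "build/ycsb.jar:db/hbase/lib/*" com.yahoo.ycsb.Client -t -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloada -p columnfamily=family'

# Variant 1: default single client thread, no throttling.
RUN_SIMPLE="$BASE -p operationcount=1000 -s"

# Variant 2: 10 client threads, throttled to a target of 100 operations per second.
RUN_THROTTLED="$BASE -p operationcount=1000 -s -threads 10 -target 100"

# Printed rather than executed; the results would go to transactions.dat.
printf '%s\n' "$RUN_SIMPLE > transactions.dat"
printf '%s\n' "$RUN_THROTTLED > transactions.dat"
```

Running with a fixed -target at several different values is how you would collect the points for a latency versus throughput curve.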
After each run you should inspect the transactions.dat file. For explanations I’ll once again refer to the documentation. We’ve used workloada in these examples, but there are in fact multiple predefined workloads (which are listed and explained in the documentation).
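For a quick look at the headline numbers, the summary lines can be pulled out of both result files. This is a small sketch, assuming the result files contain lines starting with [OVERALL] (run time and throughput), which is how YCSB labels its summary. The helper name summarize is made up for this example.

```shell
#!/bin/sh
# Print the [OVERALL] summary lines of a YCSB result file.
summarize() {
    if [ ! -f "$1" ]; then
        echo "$1: not found, skipping"
        return 0
    fi
    echo "== $1 =="
    # Keep only the lines whose first field is [OVERALL].
    grep '^\[OVERALL\]' "$1"
}

summarize load.dat
summarize transactions.dat
```

The per-operation sections of the file (latencies per operation type) are left out here on purpose; the documentation explains how to read them.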
That’s it. As you can see YCSB is pretty easy to set up. I still hope
this guide was helpful in getting started with it. Let me know if you
have any questions.
So you have an HBase cluster running somewhere and now you’re trying
to run YCSB from another machine, but it doesn’t work because it can’t
connect to ZooKeeper?
If so, try putting the hbase-site.xml config from your cluster on the classpath of YCSB and try again.
Copy your hbase-site.xml with all the configuration options to the db/hbase/conf directory and add it to your classpath like this:
java -cp "build/ycsb.jar:db/hbase/lib/*:db/hbase/conf/" ...
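If you cannot copy the file from the cluster right away, a minimal hbase-site.xml that only names the ZooKeeper quorum is usually what the client is missing. The sketch below writes such a file, assuming the standard hbase.zookeeper.quorum property. The host names are placeholders, not values from this post.

```shell
#!/bin/sh
CONF_DIR=db/hbase/conf
mkdir -p "$CONF_DIR"

# Do not overwrite a config that was already copied from the cluster.
if [ ! -e "$CONF_DIR/hbase-site.xml" ]; then
cat > "$CONF_DIR/hbase-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- Placeholder hosts: replace with your cluster's ZooKeeper quorum. -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk1.example.com,zk2.example.com,zk3.example.com</value>
  </property>
</configuration>
EOF
fi

# Confirm the quorum setting is present before starting YCSB.
if grep -q 'hbase.zookeeper.quorum' "$CONF_DIR/hbase-site.xml"; then
    echo "ZooKeeper quorum is configured in $CONF_DIR/hbase-site.xml"
fi
```

Copying the full file from the cluster remains the safer option, since it carries every setting the cluster was configured with.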
Further information:
Getting Started
https://github.com/brianfrankcooper/YCSB/wiki/Getting-Started
Running a Workload
https://github.com/brianfrankcooper/YCSB/wiki/Running-a-Workload