
HBase distributed database Getting Started

HBase is a scalable, distributed database built on Hadoop Core.

Requirements

  • Java 1.5.x, preferably from Sun.
  • ssh must be installed and sshd must be running to use Hadoop's scripts to manage remote Hadoop daemons.
  • HBase currently is a file handle hog. The usual default of 1024 on *nix systems is insufficient if you are loading any significant amount of data into regionservers. See the FAQ: Why do I see "java.io.IOException...(Too many open files)" in my logs? for how to up the limit.
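To check and raise the limit from a shell before starting regionservers (the 32768 below is only an example target, not an HBase-mandated value):

```shell
# Show the current soft limit on open file descriptors for this shell
# (often 1024 on stock *nix installs).
ulimit -n

# Try to raise the soft limit for this shell and its children. Raising it
# past the hard limit requires root, or a limits.conf entry such as:
#   hbase  -  nofile  32768
ulimit -S -n 32768 2>/dev/null || echo "could not raise the soft limit past the hard limit"
```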

Getting Started

What follows presumes you have obtained a copy of HBase and are installing for the first time. If upgrading your HBase instance, see Upgrading.

  • ${HBASE_HOME}: Set HBASE_HOME to the location of the HBase root, e.g. /usr/local/hbase.

Edit ${HBASE_HOME}/conf/hbase-env.sh. In this file you can set the heapsize for HBase, etc. At a minimum, set JAVA_HOME to point at the root of your Java installation.
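For illustration, the minimal edits to hbase-env.sh might look like the following (the JAVA_HOME path and heap size are placeholders for your own values):

```shell
# The java implementation to use. Required.
export JAVA_HOME=/usr/lib/jvm/java-1.5.0-sun

# The maximum amount of heap to use, in MB.
export HBASE_HEAPSIZE=1000
```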

If you are running a standalone operation, there should be nothing further to configure; proceed to Running and Confirming Your Installation. If you are running a distributed operation, continue reading.

Distributed Operation

Distributed mode requires an instance of the Hadoop Distributed File System (DFS). See the Hadoop requirements and instructions for how to set up a DFS.

Once you have confirmed your DFS setup, configuring HBase requires modification of the following two files: ${HBASE_HOME}/conf/hbase-site.xml and ${HBASE_HOME}/conf/regionservers. The former needs to be pointed at the running Hadoop DFS instance. The latter file lists all the members of the HBase cluster.

Use hbase-site.xml to override the properties defined in ${HBASE_HOME}/conf/hbase-default.xml (hbase-default.xml itself should never be modified). At a minimum the hbase.master and the hbase.rootdir properties should be redefined in hbase-site.xml to configure the host:port pair on which the HMaster runs (read about the HBase master, regionservers, etc) and to point HBase at the Hadoop filesystem to use. For example, adding the below to your hbase-site.xml says the master is up on port 60000 on the host example.org and that HBase should use the /hbase directory in the HDFS whose namenode is at port 9000, again on example.org:

<configuration>

<property>
<name>hbase.master</name>
<value>example.org:60000</value>
<description>The host and port that the HBase master runs at.
</description>
</property>

<property>
<name>hbase.rootdir</name>
<value>hdfs://example.org:9000/hbase</value>
<description>The directory shared by region servers.
</description>
</property>

</configuration>

The regionservers file lists all the hosts running HRegionServers, one host per line (this file is the HBase equivalent of the Hadoop slaves file at ${HADOOP_HOME}/conf/slaves).
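For example, a regionservers file for a small three-node cluster might read (host names are placeholders):

```
rs1.example.org
rs2.example.org
rs3.example.org
```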

Of note, if you have made HDFS client configuration changes on your Hadoop cluster, HBase will not see them unless you do one of the following:

  • Add a pointer to your HADOOP_CONF_DIR to CLASSPATH in hbase-env.sh
  • Add a copy of hadoop-site.xml to ${HBASE_HOME}/conf, or
  • If only a small set of HDFS client configurations is needed, add them to hbase-site.xml

An example of such an HDFS client configuration is dfs.replication. If, for example, you want to run with a replication factor of 5, HBase will create its files with the default replication factor of 3 unless you make the configuration available to HBase in one of the ways above.
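Taking the last option with the dfs.replication example, the extra property goes inside the same <configuration> element as the properties shown earlier:

```xml
<property>
<name>dfs.replication</name>
<value>5</value>
<description>Replication factor used for files written to HDFS. Must be
visible to the HBase client for HBase-written files to pick it up.
</description>
</property>
```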

Running and Confirming Your Installation

If you are running in standalone, non-distributed mode, HBase by default uses the local filesystem.

If you are running a distributed cluster you will need to start the Hadoop DFS daemons before starting HBase, and stop them after HBase has shut down. Start the Hadoop DFS daemons by running ${HADOOP_HOME}/bin/start-dfs.sh and stop them with ${HADOOP_HOME}/bin/stop-dfs.sh. Ensure the DFS started properly by testing the put and get of files into the Hadoop filesystem. HBase does not normally use the MapReduce daemons; these do not need to be started.
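A quick way to test the put and get of files is a round trip with the hadoop dfs commands of this era (the paths below are arbitrary examples; a running DFS is required):

```
${HADOOP_HOME}/bin/hadoop dfs -put /etc/hosts /hbase-smoke-test
${HADOOP_HOME}/bin/hadoop dfs -ls /
${HADOOP_HOME}/bin/hadoop dfs -get /hbase-smoke-test /tmp/hbase-smoke-test
${HADOOP_HOME}/bin/hadoop dfs -rm /hbase-smoke-test
```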

Start HBase with the following command:

${HBASE_HOME}/bin/start-hbase.sh

Once HBase has started, enter ${HBASE_HOME}/bin/hbase shell to obtain a shell against HBase from which you can execute HQL commands (HQL is a small subset of SQL). In the HBase shell, type help; to see a list of supported HQL commands. Note that all commands in the HBase shell must end with ;. Test your installation by creating, viewing, and dropping a table, as per the help instructions. Be patient with the create and drop operations, as each may take ten seconds or more. To stop HBase, exit the HBase shell and enter:

${HBASE_HOME}/bin/stop-hbase.sh

If you are running a distributed operation, be sure to wait until HBase has shut down completely before stopping the Hadoop daemons.
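The create, view, and drop test suggested above might look like the following HQL session ("myTable" and "myColumnFamily" are placeholder names; confirm the exact syntax with help; in your shell version):

```
hbase> create table myTable (myColumnFamily);
hbase> show tables;
hbase> select * from myTable;
hbase> drop table myTable;
```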

The default location for logs is ${HBASE_HOME}/logs.

HBase also puts up a UI listing vital attributes. By default it is deployed on the master host at port 60010 (HBase regionservers listen on port 60020 by default and put up an informational HTTP server at port 60030).

Upgrading

After installing a new HBase on top of data written by a previous HBase version, and before starting your cluster, run the ${HBASE_HOME}/bin/hbase migrate migration script. It will make any adjustments to the filesystem data under hbase.rootdir necessary to run the new HBase version. It does not change your install unless you explicitly ask it to.

Example API Usage

Once you have a running HBase, you probably want a way to hook your application up to it. If your application is in Java, then you should use the Java API. Here's an example of what a simple client might look like. This example assumes that you've created a table called "myTable" with a column family called "myColumnFamily".

import org.apache.hadoop.hbase.HTable;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HStoreKey;
import org.apache.hadoop.hbase.HScannerInterface;
import org.apache.hadoop.io.Text;
import java.io.IOException;
import java.util.SortedMap;
import java.util.TreeMap;

public class MyClient {

public static void main(String args[]) throws IOException {
// You need a configuration object to tell the client where to connect.
// But don't worry, the defaults are pulled from the local config file.
HBaseConfiguration config = new HBaseConfiguration();

// This instantiates an HTable object that connects you to the "myTable"
// table.
HTable table = new HTable(config, new Text("myTable"));

// Tell the table that you'll be updating row "myRow". The lockId you get
// back uniquely identifies your batch of updates. (Note, however, that
// only one update can be in progress at a time. This is fixed in HBase
// version 0.2.0.)
long lockId = table.startUpdate(new Text("myRow"));

// The HTable#put method takes the lockId you got from startUpdate, a Text
// that describes what cell you want to put a value into, and a byte array
// that is the value you want to store. Note that if you want to store
// strings, you have to getBytes() from the string for HBase to understand
// how to store it. (The same goes for primitives like ints and longs and
// user-defined classes - you must find a way to reduce it to bytes.)
table.put(lockId, new Text("myColumnFamily:columnQualifier1"),
"columnQualifier1 value!".getBytes());

// Deletes are batch operations in HBase as well.
table.delete(lockId, new Text("myColumnFamily:cellIWantDeleted"));

// Once you've done all the puts you want, you need to commit the results.
// The HTable#commit method takes the lockId that you got from startUpdate
// and pushes the batch of changes you made into HBase.
table.commit(lockId);

// Alternately, if you decide that you don't want the changes you've been
// accumulating anymore, you can use the HTable#abort method.
// table.abort(lockId);

// Now, to retrieve the data we just wrote. Just like when we store them,
// the values that come back are byte arrays. If you happen to know that
// the value contained is a string and want an actual string, then you
// must convert it yourself.
byte[] valueBytes = table.get(new Text("myRow"),
new Text("myColumnFamily:columnQualifier1"));
String valueStr = new String(valueBytes);

// Sometimes, you won't know the row you're looking for. In this case, you
// use a Scanner. This will give you cursor-like interface to the contents
// of the table.
HStoreKey row = new HStoreKey();
SortedMap<Text, byte[]> columns = new TreeMap<Text, byte[]>();

// The first argument to obtainScanner is the set of columns (or column
// families) to return; the second is the row to start scanning from (an
// empty Text means start at the first row).
HScannerInterface scanner = table.obtainScanner(
new Text[] { new Text("myColumnFamily:columnQualifier1") }, new Text(""));
while (scanner.next(row, columns)) {
// Print the row we found and the value of the column we asked for.
System.out.println("Found row: " + row.getRow() + " with value: " +
new String(columns.get(new Text("myColumnFamily:columnQualifier1"))));
}

// Close the scanner once you are done with it.
scanner.close();
}
}

There are many other methods for putting data into and getting data out of HBase, but these examples should get you started. See the HTable javadoc for more methods. Additionally, there are methods for managing tables in the HBaseAdmin class.
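As a sketch of that table management, creating the "myTable" table used above with HBaseAdmin could look roughly like this; the method names follow the API of this HBase generation, so verify them against the HBaseAdmin javadoc before relying on them:

```java
import org.apache.hadoop.hbase.HBaseAdmin;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import java.io.IOException;

public class MyTableAdmin {

public static void main(String args[]) throws IOException {
// Same defaults-from-local-config behavior as in the client example.
HBaseConfiguration config = new HBaseConfiguration();
HBaseAdmin admin = new HBaseAdmin(config);

// Describe the table and its single column family. Family names end
// with ':' in this version of the API.
HTableDescriptor desc = new HTableDescriptor("myTable");
desc.addFamily(new HColumnDescriptor("myColumnFamily:"));
admin.createTable(desc);
}
}
```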

If your client is NOT Java, then you should consider the Thrift or REST libraries.
