public final class HBaseClient extends Object
Unlike the traditional HBase client (HTable
), this client should be
instantiated only once. You can use it with any number of tables at the
same time. The only case where you should have multiple instances is when
you want to use multiple different clusters at the same time.
If you play by the rules, this client is (in theory :D
) completely
thread-safe. Read the documentation carefully to know what the requirements
are for this guarantee to apply.
This client is fully non-blocking, any blocking operation will return a
Deferred
instance to which you can attach a Callback
chain
that will execute when the asynchronous operation completes.
HBaseRpc
instances passed to this classHBaseRpc
passed to a method of this class should not be
changed or re-used until the Deferred
returned by that method
calls you back. Changing or re-using any HBaseRpc
for
an RPC in flight will lead to unpredictable results and voids
your warranty.
durable
argument. When an edit
requests to be durable, the success of the RPC guarantees that the edit is
safely and durably stored by HBase and won't be lost. In case of server
failures, the edit won't be lost although it may become momentarily
unavailable. Setting the durable
argument to false
makes
the operation complete faster (and puts a lot less strain on HBase), but
removes this durability guarantee. In case of a server failure, the edit
may (or may not) be lost forever. When in doubt, leave it to true
(or use the corresponding method that doesn't accept a durable
argument as it will default to true
). Setting it to false
is useful in cases where data-loss is acceptable, e.g. during batch imports
(where you can re-run the whole import in case of a failure), or when you
intend to do statistical analysis on the data (in which case some missing
data won't affect the results as long as the data loss caused by machine
failures preserves the distribution of your data, which depends on how
you're building your row keys and how you're using HBase, so be careful).
Bear in mind that this durability guarantee holds only once the RPC has
completed successfully. Any edit temporarily buffered on the client side
or in-flight will be lost if the client itself crashes. You can control
how much buffering is done by the client by using setFlushInterval(short)
and you can force-flush the buffered edits by calling flush()
. When
you're done using HBase, you must not just give up your
reference to your HBaseClient
, you must shut it down gracefully by
calling shutdown()
. If you fail to do this, then all edits still
buffered by the client will be lost.
NOTE: This entire section assumes that you use a distributed file system that provides HBase with the required durability semantics. If you use HDFS, make sure you have a version of HDFS that provides HBase the necessary API and semantics to durability store its data.
throws
clausesDeferred
object they return to you can carry an
exception that you should handle (using "errbacks", see the javadoc of
Deferred
). In order to be able to do proper asynchronous error
handling, you need to know what types of exceptions you're expected to face
in your errbacks. In order to document that, the methods of this API use
javadoc's @throws
to spell out the exception types you should
handle in your errback. Asynchronous exceptions will be indicated as such
in the javadoc with "(deferred)".
For instance, if a method foo
pretends to throw an
UnknownScannerException
and returns a Deferred<Whatever>
,
then you should use the method like so:
HBaseClient client = ...;This code callsDeferred
<Whatever>
d = client.foo(); d.addCallbacks(newCallback
<Whatever, SomethingElse>
() { SomethingElse call(Whatever arg) { LOG.info("Yay, RPC completed successfully!"); return new SomethingElse(arg.getWhateverResult()); } String toString() { return "handle foo response"; } }, newCallback
<Exception, Object>
() { Object call(Exception arg) { if (arg instanceofUnknownScannerException
) { LOG.error("Oops, we used the wrong scanner?", arg); return otherAsyncOperation(); // returns aDeferred<Blah>
} LOG.error("Sigh, the RPC failed and we don't know what to do", arg); return arg; // Pass on the error to the next errback (if any); } String toString() { return "foo errback"; } });
foo
, and upon successful completion transforms the
result from a Whatever
to a SomethingElse
(which will then
be given to the next callback in the chain, if any). When there's a
failure, the errback is called instead and it attempts to handle a
particular type of exception by retrying the operation differently.Modifier and Type | Field and Description |
---|---|
static byte[] |
EMPTY_ARRAY
An empty byte array you can use.
|
Constructor and Description |
---|
HBaseClient(String quorum_spec)
Constructor.
|
HBaseClient(String quorum_spec,
String base_path)
Constructor.
|
HBaseClient(String quorum_spec,
String base_path,
ClientSocketChannelFactory channel_factory)
Constructor for advanced users with special needs.
|
HBaseClient(String quorum_spec,
String base_path,
Executor executor)
Constructor for advanced users with special needs.
|
Modifier and Type | Method and Description |
---|---|
Deferred<Boolean> |
atomicCreate(PutRequest edit)
Atomically insert a new cell in HBase.
|
Deferred<Long> |
atomicIncrement(AtomicIncrementRequest request)
Atomically and durably increments a value in HBase.
|
Deferred<Long> |
atomicIncrement(AtomicIncrementRequest request,
boolean durable)
Atomically increments a value in HBase.
|
Deferred<Long> |
bufferAtomicIncrement(AtomicIncrementRequest request)
Buffers a durable atomic increment for coalescing.
|
Deferred<Boolean> |
compareAndSet(PutRequest edit,
byte[] expected)
Atomic Compare-And-Set (CAS) on a single cell.
|
Deferred<Boolean> |
compareAndSet(PutRequest edit,
String expected)
Atomic Compare-And-Set (CAS) on a single cell.
|
long |
contendedMetaLookupCount()
Deprecated.
|
Deferred<Object> |
delete(DeleteRequest request)
Deletes data from HBase.
|
Deferred<Object> |
ensureTableExists(byte[] table)
Ensures that a given table really exists.
|
Deferred<Object> |
ensureTableExists(String table)
Ensures that a given table really exists.
|
Deferred<Object> |
ensureTableFamilyExists(byte[] table,
byte[] family)
Ensures that a given table/family pair really exists.
|
Deferred<Object> |
ensureTableFamilyExists(String table,
String family)
Ensures that a given table/family pair really exists.
|
Deferred<Object> |
flush()
Flushes to HBase any buffered client-side write operation.
|
Deferred<ArrayList<KeyValue>> |
get(GetRequest request)
Retrieves data from HBase.
|
short |
getFlushInterval()
Returns the maximum time (in milliseconds) for which edits can be buffered.
|
int |
getIncrementBufferSize()
Returns the capacity of the increment buffer.
|
Timer |
getTimer()
Returns the timer used by this client.
|
Deferred<RowLock> |
lockRow(RowLockRequest request)
Acquires an explicit row lock.
|
Scanner |
newScanner(byte[] table)
Creates a new
Scanner for a particular table. |
Scanner |
newScanner(String table)
Creates a new
Scanner for a particular table. |
Deferred<Object> |
prefetchMeta(byte[] table)
Eagerly prefetches and caches a table's region metadata from HBase.
|
Deferred<Object> |
prefetchMeta(byte[] table,
byte[] start,
byte[] stop)
Eagerly prefetches and caches part of a table's region metadata from HBase.
|
Deferred<Object> |
prefetchMeta(String table)
Eagerly prefetches and caches a table's region metadata from HBase.
|
Deferred<Object> |
prefetchMeta(String table,
String start,
String stop)
Eagerly prefetches and caches part of a table's region metadata from HBase.
|
Deferred<Object> |
put(PutRequest request)
Stores data in HBase.
|
long |
rootLookupCount()
Deprecated.
|
short |
setFlushInterval(short flush_interval)
Sets the maximum time (in milliseconds) for which edits can be buffered.
|
int |
setIncrementBufferSize(int increment_buffer_size)
Changes the size of the increment buffer.
|
Deferred<Object> |
shutdown()
Performs a graceful shutdown of this instance.
|
ClientStats |
stats()
Returns a snapshot of usage statistics for this client.
|
long |
uncontendedMetaLookupCount()
Deprecated.
|
Deferred<Object> |
unlockRow(RowLock lock)
Releases an explicit row lock.
|
public static final byte[] EMPTY_ARRAY
Scanner.setStartKey(byte[])
and Scanner.setStopKey(byte[])
.public HBaseClient(String quorum_spec)
quorum_spec
- The specification of the quorum, e.g.
"host1,host2,host3"
.public HBaseClient(String quorum_spec, String base_path)
quorum_spec
- The specification of the quorum, e.g.
"host1,host2,host3"
.base_path
- The base path under which is the znode for the
-ROOT- region.public HBaseClient(String quorum_spec, String base_path, Executor executor)
NOTE: Only advanced users who really know what they're
doing should use this constructor. Passing an inappropriate thread
pool, or blocking its threads will prevent this HBaseClient
from working properly or lead to poor performance.
quorum_spec
- The specification of the quorum, e.g.
"host1,host2,host3"
.base_path
- The base path under which is the znode for the
-ROOT- region.executor
- The executor from which to obtain threads for NIO
operations. It is strongly encouraged to use a
Executors.newCachedThreadPool()
or something equivalent unless
you're sure to understand how Netty creates and uses threads.
Using a fixed-size thread pool will not work the way you expect.
Note that calling shutdown()
on this client will NOT
shut down the executor.
NioClientSocketChannelFactory
public HBaseClient(String quorum_spec, String base_path, ClientSocketChannelFactory channel_factory)
Most users don't need to use this constructor.
quorum_spec
- The specification of the quorum, e.g.
"host1,host2,host3"
.base_path
- The base path under which is the znode for the
-ROOT- region.channel_factory
- A custom factory to use to create sockets.
Note that calling shutdown()
on this client will also cause the
shutdown and release of the factory and its underlying thread pool.
public ClientStats stats()
public Deferred<Object> flush()
Deferred
, whose callback chain will be invoked when
everything that was buffered at the time of the call has been flushed.
Note that this doesn't guarantee that ALL outstanding RPCs have completed. This doesn't introduce any sort of global sync point. All it does really is it sends any buffered RPCs to HBase.
public short setFlushInterval(short flush_interval)
This interval will be honored on a "best-effort" basis. Edits can be buffered for longer than that due to GC pauses, the resolution of the underlying timer, thread scheduling at the OS level (particularly if the OS is overloaded with concurrent requests for CPU time), any low-level buffering in the TCP/IP stack of the OS, etc.
Setting a longer interval allows the code to batch requests more efficiently but puts you at risk of greater data loss if the JVM or machine was to fail. It also entails that some edits will not reach HBase until a longer period of time, which can be troublesome if you have other applications that need to read the "latest" changes.
Setting this interval to 0 disables this feature.
The change is guaranteed to take effect at most after a full interval has elapsed, using the previous interval (which is returned).
flush_interval
- A positive time interval in milliseconds.IllegalArgumentException
- if flush_interval < 0
.public int setIncrementBufferSize(int increment_buffer_size)
NOTE: because there is no way to resize the existing buffer, this method will flush the existing buffer and create a new one. This side effect might be unexpected but is unfortunately required.
This determines the maximum number of counters this client will keep
in-memory to allow increment coalescing through
bufferAtomicIncrement(org.hbase.async.AtomicIncrementRequest)
.
The greater this number, the more memory will be used to buffer increments, and the more efficient increment coalescing can be if you have a high-throughput application with a large working set of counters.
If your application has excessively large keys or qualifiers, you might consider using a lower number in order to reduce memory usage.
increment_buffer_size
- The new size of the buffer.IllegalArgumentException
- if increment_buffer_size < 0
.public Timer getTimer()
All timeouts, retries and other things that need to "sleep asynchronously" use this timer. This method is provided so that you can also schedule your own timeouts using this timer, if you wish to share this client's timer instead of creating your own.
The precision of this timer is implementation-defined but is guaranteed to be no greater than 20ms.
public short getFlushInterval()
The default value is an unspecified and implementation dependant, but is guaranteed to be non-zero.
A return value of 0 indicates that edits are sent directly to HBase without being buffered.
setFlushInterval(short)
public int getIncrementBufferSize()
Note this returns the capacity of the buffer, not the number of items currently in it. There is currently no API to get the current number of items in it.
public Deferred<Object> shutdown()
Flushes
all buffered edits.Deferred
, whose callback chain will be invoked once all
of the above have been done. If this callback chain doesn't fail, then
the clean shutdown will be successful, and all the data will be safe on
the HBase side (provided that you use durable
edits). In case of a failure (the "errback" is invoked) you may want to
retry the shutdown to avoid losing data, depending on the nature of the
failure. TODO(tsuna): Document possible / common failure scenarios.public Deferred<Object> ensureTableFamilyExists(String table, String family)
It's recommended to call this method in the startup code of your application if you know ahead of time which tables / families you're going to need, because it'll allow you to "fail fast" if they're missing.
Both strings are assumed to use the platform's default charset.
table
- The name of the table you intend to use.family
- The column family you intend to use in that table.Object
has not special meaning and can be null
(think of it as Deferred<Void>
). But you probably want to attach
at least an errback to this Deferred
to handle failures.TableNotFoundException
- (deferred) if the table doesn't exist.NoSuchColumnFamilyException
- (deferred) if the family doesn't exist.public Deferred<Object> ensureTableFamilyExists(byte[] table, byte[] family)
It's recommended to call this method in the startup code of your application if you know ahead of time which tables / families you're going to need, because it'll allow you to "fail fast" if they're missing.
table
- The name of the table you intend to use.family
- The column family you intend to use in that table.Object
has not special meaning and can be null
(think of it as Deferred<Void>
). But you probably want to attach
at least an errback to this Deferred
to handle failures.TableNotFoundException
- (deferred) if the table doesn't exist.NoSuchColumnFamilyException
- (deferred) if the family doesn't exist.public Deferred<Object> ensureTableExists(String table)
It's recommended to call this method in the startup code of your application if you know ahead of time which tables / families you're going to need, because it'll allow you to "fail fast" if they're missing.
table
- The name of the table you intend to use.
The string is assumed to use the platform's default charset.Object
has not special meaning and can be null
(think of it as Deferred<Void>
). But you probably want to attach
at least an errback to this Deferred
to handle failures.TableNotFoundException
- (deferred) if the table doesn't exist.public Deferred<Object> ensureTableExists(byte[] table)
It's recommended to call this method in the startup code of your application if you know ahead of time which tables / families you're going to need, because it'll allow you to "fail fast" if they're missing.
table
- The name of the table you intend to use.Object
has not special meaning and can be null
(think of it as Deferred<Void>
). But you probably want to attach
at least an errback to this Deferred
to handle failures.TableNotFoundException
- (deferred) if the table doesn't exist.public Deferred<ArrayList<KeyValue>> get(GetRequest request)
request
- The get
request.public Scanner newScanner(byte[] table)
Scanner
for a particular table.table
- The name of the table you intend to scan.public Scanner newScanner(String table)
Scanner
for a particular table.table
- The name of the table you intend to scan.
The string is assumed to use the platform's default charset.public Deferred<Long> atomicIncrement(AtomicIncrementRequest request)
This is equivalent to
atomicIncrement
(request, true)
request
- The increment request.long
value that results from the increment.public Deferred<Long> bufferAtomicIncrement(AtomicIncrementRequest request)
This increment will be held in memory up to the amount of time allowed
by getFlushInterval()
in order to allow the client to coalesce
increments.
Increment coalescing can dramatically reduce the number of RPCs and write load on HBase if you tend to increment multiple times the same working set of counters. This is very common in user-facing serving systems that use HBase counters to keep track of user actions.
If client-side buffering is disabled (getFlushInterval()
returns
0) then this function has the same effect as calling
atomicIncrement(AtomicIncrementRequest)
directly.
request
- The increment request.long
value that results from the increment.public Deferred<Long> atomicIncrement(AtomicIncrementRequest request, boolean durable)
request
- The increment request.durable
- If true
, the success of this RPC guarantees that
HBase has stored the edit in a durable fashion.
When in doubt, use atomicIncrement(AtomicIncrementRequest)
.long
value that results from the increment.public Deferred<Object> put(PutRequest request)
Note that this provides no guarantee as to the order in which subsequent
put
requests are going to be applied to the backend. If you need
ordering, you must enforce it manually yourself by starting the next
put
once the Deferred
of this one completes successfully.
request
- The put
request.Object
has not special meaning and can be null
(think of it as Deferred<Void>
). But you probably want to attach
at least an errback to this Deferred
to handle failures.
TODO(tsuna): Document failures clients are expected to handle themselves.public Deferred<Boolean> compareAndSet(PutRequest edit, byte[] expected)
Note that edits sent through this method cannot be batched, and
won't be subject to the flush interval
. This
entails that write throughput will be lower with this method as edits
have to be sent out to the wire one by one.
This request enables you to atomically update the value of an existing
cell in HBase using a CAS operation. It's like a PutRequest
except that you also pass an expected value. If the last version of the
cell identified by your PutRequest
matches the expected value,
HBase will atomically update it to the new value.
If the expected value is the empty byte array, HBase will atomically create the cell provided that it doesn't exist already. This can be used to ensure that your RPC doesn't overwrite an existing value. Note however that this trick cannot be used the other way around to delete an expected value atomically.
edit
- The new value to write.expected
- The expected value of the cell to compare against.
This byte array will NOT be copied.true
the CAS succeeded, otherwise
the CAS failed because the value in HBase didn't match the expected value
of the CAS request.public Deferred<Boolean> compareAndSet(PutRequest edit, String expected)
Note that edits sent through this method cannot be batched.
edit
- The new value to write.expected
- The expected value of the cell to compare against.
This string is assumed to use the platform's default charset.true
the CAS succeeded, otherwise
the CAS failed because the value in HBase didn't match the expected value
of the CAS request.compareAndSet(PutRequest, byte[])
public Deferred<Boolean> atomicCreate(PutRequest edit)
Note that edits sent through this method cannot be batched.
This is equivalent to calling
compareAndSet
(edit,
EMPTY_ARRAY)
edit
- The new value to insert.true
if the edit got atomically
inserted in HBase, false
if there was already a value in the
given cell.compareAndSet(PutRequest, byte[])
public Deferred<RowLock> lockRow(RowLockRequest request)
For a description of what row locks are, see RowLock
.
request
- The request specify which row to lock.RowLock
.unlockRow(org.hbase.async.RowLock)
public Deferred<Object> unlockRow(RowLock lock)
For a description of what row locks are, see RowLock
.
lock
- The lock to release.Object
has not special meaning and can be null
(think of it as Deferred<Void>
).public Deferred<Object> delete(DeleteRequest request)
request
- The delete
request.Object
has not special meaning and can be null
(think of it as Deferred<Void>
). But you probably want to attach
at least an errback to this Deferred
to handle failures.public Deferred<Object> prefetchMeta(String table)
table
- The name of the table whose metadata you intend to prefetch.Object
has no special meaning and can be null
(think of it as Deferred<Void>
). But you probably want to attach
at least an errback to this Deferred
to handle failures.public Deferred<Object> prefetchMeta(String table, String start, String stop)
The part to prefetch is identified by a row key range, given by
start
and stop
.
table
- The name of the table whose metadata you intend to prefetch.start
- The start of the row key range to prefetch metadata for.stop
- The end of the row key range to prefetch metadata for.Object
has no special meaning and can be null
(think of it as Deferred<Void>
). But you probably want to attach
at least an errback to this Deferred
to handle failures.public Deferred<Object> prefetchMeta(byte[] table)
table
- The name of the table whose metadata you intend to prefetch.Object
has no special meaning and can be null
(think of it as Deferred<Void>
). But you probably want to attach
at least an errback to this Deferred
to handle failures.public Deferred<Object> prefetchMeta(byte[] table, byte[] start, byte[] stop)
The part to prefetch is identified by a row key range, given by
start
and stop
.
table
- The name of the table whose metadata you intend to prefetch.start
- The start of the row key range to prefetch metadata for.stop
- The end of the row key range to prefetch metadata for.Object
has no special meaning and can be null
(think of it as Deferred<Void>
). But you probably want to attach
at least an errback to this Deferred
to handle failures.@Deprecated public long rootLookupCount()
-ROOT-
were performed.
This number should remain low. It will be 1 after the first access to
HBase, and will increase by 1 each time the .META.
region moves
to another server, which should seldom happen.
This isn't to be confused with the number of times we looked up where
the -ROOT-
region itself is located. This happens even more
rarely and a message is logged at the INFO whenever it does.
@Deprecated public long uncontendedMetaLookupCount()
stats()
.
uncontendedMetaLookups()
instead..META.
were performed (uncontended).
This number indicates how many times we had to lookup in .META.
where a key was located. This only counts "uncontended" lookups, where
the thread was able to acquire a "permit" to do a .META.
lookup.
The majority of the .META.
lookups should fall in this category.
@Deprecated public long contendedMetaLookupCount()
.META.
were performed (contended).
This number indicates how many times we had to lookup in .META.
where a key was located. This only counts "contended" lookups, where the
thread was unable to acquire a "permit" to do a .META.
lookup,
because there were already too many .META.
lookups in flight.
In this case, the thread was delayed a bit in order to apply a bit of
back-pressure on the caller, to avoid creating .META.
storms.
The minority of the .META.
lookups should fall in this category.