public final class Scanner extends Object
This class is not synchronized as it's expected to be used from a single thread at a time. It's rarely (if ever?) useful to scan concurrently from a shared scanner using multiple threads. If you want to optimize large table scans using extra parallelism, create a few scanners and give each of them a partition of the table to scan. Or use MapReduce.
Unlike HBase's traditional client, there's no method in this class to explicitly open the scanner. It will open itself automatically when you start scanning by calling nextRows(). Also, the scanner will automatically call close() when it reaches the end key. If, however, you would like to stop scanning before reaching the end key, you must call close() before disposing of the scanner. Note that it's always safe to call close() on a scanner.
If you keep your scanner open and idle for too long, the RegionServer will close the scanner automatically for you after a timeout configured on the server side. When this happens, you'll get an UnknownScannerException when you attempt to use the scanner again.
Also, if you scan too slowly (e.g. you take a long time between each call to nextRows()), you may prevent HBase from splitting the region if the region is also actively being written to while you scan. For heavy processing you should consider using MapReduce.
A Scanner is not re-usable. Should you want to scan the same rows or the same table again, you must create a new one.
byte arrays in argument
No method that receives a byte[] in argument will copy it. For more info, please refer to the documentation of HBaseRpc.
Strings in argument
Modifier and Type | Field and Description |
---|---|
static int | DEFAULT_MAX_NUM_KVS - The default maximum number of KeyValues the server is allowed to return in a single RPC response to a Scanner. |
static int | DEFAULT_MAX_NUM_ROWS - The default maximum number of rows to scan per RPC. |
Modifier and Type | Method and Description |
---|---|
void | clearFilter() - Clears any filter that was previously set on this scanner. |
Deferred<Object> | close() - Closes this scanner (don't forget to call this when you're done with it!). |
byte[] | getCurrentKey() - Returns the row key this scanner is currently at. |
ScanFilter | getFilter() - Returns the possibly-null filter applied to this scanner. |
long | getMaxNumBytes() - Returns the maximum number of bytes returned at once by the scanner. |
int | getMaxNumKeyValues() - Maximum number of KeyValues the server is allowed to return. |
long | getMaxTimestamp() - Returns the maximum timestamp to scan (exclusive). |
int | getMaxVersions() - Returns the maximum number of versions to return for each cell scanned. |
long | getMinTimestamp() - Returns the minimum timestamp to scan (inclusive). |
Deferred<ArrayList<ArrayList<KeyValue>>> | nextRows() - Scans a number of rows. |
Deferred<ArrayList<ArrayList<KeyValue>>> | nextRows(int nrows) - Scans a number of rows. |
void | setFamilies(byte[][] families, byte[][][] qualifiers) - Specifies multiple column families to scan. |
void | setFamilies(String... families) - Specifies multiple column families to scan. |
void | setFamily(byte[] family) - Specifies a particular column family to scan. |
void | setFamily(String family) - Specifies a particular column family to scan. |
void | setFilter(ScanFilter filter) - Specifies the filter to apply to this scanner. |
void | setKeyRegexp(String regexp) - Sets a regular expression to filter results based on the row key. |
void | setKeyRegexp(String regexp, Charset charset) - Sets a regular expression to filter results based on the row key. |
void | setMaxNumBytes(long max_num_bytes) - Sets the maximum number of bytes returned at once by the scanner. |
void | setMaxNumKeyValues(int max_num_kvs) - Sets the maximum number of KeyValues the server is allowed to return in a single RPC response. |
void | setMaxNumRows(int max_num_rows) - Sets the maximum number of rows to scan per RPC (for better performance). |
void | setMaxTimestamp(long timestamp) - Sets the maximum timestamp to scan (exclusive). |
void | setMaxVersions(int versions) - Sets the maximum number of versions to return for each cell scanned. |
void | setMinTimestamp(long timestamp) - Sets the minimum timestamp to scan (inclusive). |
void | setQualifier(byte[] qualifier) - Specifies a particular column qualifier to scan. |
void | setQualifier(String qualifier) - Specifies a particular column qualifier to scan. |
void | setQualifiers(byte[][] qualifiers) - Specifies one or more column qualifiers to scan. |
void | setServerBlockCache(boolean populate_blockcache) - Sets whether or not the server should populate its block cache. |
void | setStartKey(byte[] start_key) - Specifies from which row key to start scanning (inclusive). |
void | setStartKey(String start_key) - Specifies from which row key to start scanning (inclusive). |
void | setStopKey(byte[] stop_key) - Specifies up to which row key to scan (exclusive). |
void | setStopKey(String stop_key) - Specifies up to which row key to scan (exclusive). |
void | setTimeRange(long min_timestamp, long max_timestamp) - Sets the time range to scan. |
String | toString() |
public static final int DEFAULT_MAX_NUM_KVS
The default maximum number of KeyValues the server is allowed to return in a single RPC response to a Scanner.
This default value is exposed only as a hint but the value itself is not part of the API and is subject to change without notice.
See Also:
setMaxNumKeyValues(int), Constant Field Values
public static final int DEFAULT_MAX_NUM_ROWS
The default maximum number of rows to scan per RPC.
This default value is exposed only as a hint but the value itself is not part of the API and is subject to change without notice.
See Also:
setMaxNumRows(int), Constant Field Values
public byte[] getCurrentKey()
Returns the row key this scanner is currently at.
public void setStartKey(byte[] start_key)
Specifies from which row key to start scanning (inclusive).
Parameters:
start_key - The row key to start scanning from. If you don't invoke this method, scanning will begin from the first row key in the table. This byte array will NOT be copied.
Throws:
IllegalStateException - if scanning already started.
public void setStartKey(String start_key)
Specifies from which row key to start scanning (inclusive).
Throws:
IllegalStateException - if scanning already started.
See Also:
setStartKey(byte[])
public void setStopKey(byte[] stop_key)
Specifies up to which row key to scan (exclusive).
Parameters:
stop_key - The row key to scan up to. If you don't invoke this method, or if the array is empty (stop_key.length == 0), every row up to and including the last one will be scanned. This byte array will NOT be copied.
Throws:
IllegalStateException - if scanning already started.
public void setStopKey(String stop_key)
Specifies up to which row key to scan (exclusive).
Throws:
IllegalStateException - if scanning already started.
See Also:
setStopKey(byte[])
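Because the stop key is exclusive, scanning every row that shares a key prefix needs a stop key that sorts just after the last key carrying that prefix. The sketch below computes one with plain JDK code. The class and method names (PrefixStopKey, stopKeyForPrefix) are hypothetical helpers and not part of this API. It assumes row keys are ordered as unsigned bytes, and it returns an empty array (which, per the parameter description above, means "scan to the last row") when the prefix is empty or made only of 0xFF bytes.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of Scanner: derives an exclusive stop key from a row-key prefix. */
final class PrefixStopKey {

  /**
   * Returns the smallest key that sorts after every key starting with {@code prefix},
   * assuming unsigned lexicographic byte ordering.
   */
  static byte[] stopKeyForPrefix(final byte[] prefix) {
    // Walk back over trailing 0xFF bytes: they cannot be incremented.
    int end = prefix.length;
    while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
      end--;
    }
    if (end == 0) {
      // Empty prefix or all 0xFF: no upper bound, so scan to the end of the table.
      return new byte[0];
    }
    // Copy so the caller's array is left untouched, then bump the last remaining byte.
    final byte[] stop = Arrays.copyOf(prefix, end);
    stop[end - 1]++;
    return stop;
  }

  public static void main(final String[] args) {
    final byte[] stop = stopKeyForPrefix(new byte[] { 'u', 's', 'e', 'r' });
    System.out.println(Arrays.toString(stop));
  }
}
```

The prefix itself would then serve as the start key and the computed array as the stop key. Since neither setter copies its argument, the helper deliberately returns a fresh array.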
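public void setFamily(byte[] family)
Specifies a particular column family to scan.
Parameters:
family - The column family. This byte array will NOT be copied.
Throws:
IllegalStateException - if scanning already started.
public void setFamily(String family)
Specifies a particular column family to scan.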
public void setFamilies(byte[][] families, byte[][][] qualifiers)
Specifies multiple column families to scan.
If qualifiers is not null, then qualifiers[i] is assumed to be the list of qualifiers to scan in the family families[i]. If qualifiers[i] is null, then all the columns in the family families[i] will be scanned.
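The pairing between families[i] and qualifiers[i] can be illustrated without a cluster. The helper below is hypothetical and not part of Scanner: it only lists which columns a given pair of arrays would select under the rules above. The family and qualifier names are made up, and the sketch assumes that passing null for the whole qualifiers array selects every column of every listed family.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper illustrating how families[i] pairs with qualifiers[i]. */
final class FamilySelection {

  /** Lists the columns selected, writing "family:*" when a whole family is scanned. */
  static List<String> describe(final byte[][] families, final byte[][][] qualifiers) {
    final List<String> selected = new ArrayList<>();
    for (int i = 0; i < families.length; i++) {
      final String family = new String(families[i], StandardCharsets.ISO_8859_1);
      if (qualifiers == null || qualifiers[i] == null) {
        // No qualifier list for this family: every column in it is scanned.
        selected.add(family + ":*");
        continue;
      }
      for (final byte[] qualifier : qualifiers[i]) {
        selected.add(family + ":" + new String(qualifier, StandardCharsets.ISO_8859_1));
      }
    }
    return selected;
  }

  /** Encodes an illustrative name as bytes. */
  static byte[] bytes(final String s) {
    return s.getBytes(StandardCharsets.ISO_8859_1);
  }

  public static void main(final String[] args) {
    final byte[][] families = { bytes("info"), bytes("stats") };
    // qualifiers[0] restricts "info" to two columns; qualifiers[1] is null, so all of "stats".
    final byte[][][] qualifiers = { { bytes("name"), bytes("email") }, null };
    System.out.println(describe(families, qualifiers));
  }
}
```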
Parameters:
families - Array of column family names.
qualifiers - Array of column qualifiers. Can be null. This array of byte arrays will NOT be copied.
Throws:
IllegalStateException - if scanning already started.
public void setFamilies(String... families)
Specifies multiple column families to scan.
public void setQualifier(byte[] qualifier)
Specifies a particular column qualifier to scan.
Note that specifying a qualifier without a family has no effect. You need to call setFamily(byte[]) too.
Parameters:
qualifier - The column qualifier. This byte array will NOT be copied.
Throws:
IllegalStateException - if scanning already started.
public void setQualifier(String qualifier)
Specifies a particular column qualifier to scan.
public void setQualifiers(byte[][] qualifiers)
Specifies one or more column qualifiers to scan.
Note that specifying qualifiers without a family has no effect. You need to call setFamily(byte[]) too.
Parameters:
qualifiers - The column qualifiers. These byte arrays will NOT be copied.
Throws:
IllegalStateException - if scanning already started.
public void setFilter(ScanFilter filter)
Specifies the filter to apply to this scanner.
Parameters:
filter - The filter. If null, then no filter will be used.
public ScanFilter getFilter()
Returns the possibly-null filter applied to this scanner.
public void clearFilter()
Clears any filter that was previously set on this scanner.
This is a shortcut for setFilter(null)
public void setKeyRegexp(String regexp)
Sets a regular expression to filter results based on the row key.
This is equivalent to calling setFilter(new KeyRegexpFilter(regexp))
Parameters:
regexp - The regular expression with which to filter the row keys.
public void setKeyRegexp(String regexp, Charset charset)
Sets a regular expression to filter results based on the row key.
This is equivalent to calling setFilter(new KeyRegexpFilter(regexp, charset))
Parameters:
regexp - The regular expression with which to filter the row keys.
charset - The charset used to decode the bytes of the row key into a string. The RegionServer must support this charset, otherwise it will unexpectedly close the connection the first time you attempt to use this scanner.
public void setServerBlockCache(boolean populate_blockcache)
Sets whether or not the server should populate its block cache.
Parameters:
populate_blockcache - if false, the block cache of the server will not be populated as the rows are being scanned. If true (the default), the blocks loaded by the server in order to feed the scanner may be added to the block cache, which will make subsequent read accesses to the same rows and other neighbouring rows faster. Whether or not blocks will be added to the cache depends on the table's configuration.
If you scan a sequence of keys that is unlikely to be accessed again in the near future, you can help the server improve its cache efficiency by setting this to false.
Throws:
IllegalStateException - if scanning already started.
public void setMaxNumRows(int max_num_rows)
Sets the maximum number of rows to scan per RPC (for better performance).
Every time nextRows() is invoked, up to this number of rows may be returned. The default value is DEFAULT_MAX_NUM_ROWS.
This knob has a high performance impact. If it's too low, you'll do too many network round-trips; if it's too high, you'll spend too much time and memory handling large amounts of data. The right value depends on the size of the rows you're retrieving.
If you know you're going to be scanning lots of small rows (few cells, and each cell doesn't store a lot of data), you can get better performance by scanning more rows per RPC. You probably always want to retrieve at least a few dozen kilobytes per call.
If you want to err on the safe side, it's better to use a value that's a bit too high rather than a bit too low. Avoid extreme values (such as 1 or 1024) unless you know what you're doing.
Note that unlike many other methods, it's fine to change this value while scanning. Changing it will take effect on all the subsequent RPCs issued. This can be useful if you want to dynamically adjust how much data you receive at once (provided that you can estimate the size of your rows).
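One way to pick a value is to work backwards from a target payload per RPC. The sketch below is only a heuristic under stated assumptions: the class name, the 64 KB target used in main, and the clamping bounds of 2 and 512 are illustrative choices and not values taken from this API. It also presumes you can estimate the average row size yourself.

```java
/** Hypothetical heuristic for choosing an argument to setMaxNumRows(int). */
final class RowsPerRpc {

  /** Illustrative bounds that steer clear of extreme values. */
  static final int MIN_ROWS = 2;
  static final int MAX_ROWS = 512;

  /**
   * Estimates how many rows to request so that one RPC carries roughly
   * {@code targetBytesPerRpc} bytes, given an average row size in bytes.
   */
  static int estimate(final long targetBytesPerRpc, final long avgRowBytes) {
    if (targetBytesPerRpc <= 0 || avgRowBytes <= 0) {
      throw new IllegalArgumentException("sizes must be strictly positive");
    }
    // Round up so that the estimate errs on the high side rather than the low side.
    final long rows = targetBytesPerRpc / avgRowBytes
        + (targetBytesPerRpc % avgRowBytes == 0 ? 0 : 1);
    return (int) Math.max(MIN_ROWS, Math.min(MAX_ROWS, rows));
  }

  public static void main(final String[] args) {
    // Small rows (about 200 bytes each) with a 64 KB target per call.
    System.out.println(estimate(64 * 1024, 200));
    // Very large rows (about 100 KB each) are clamped to the lower bound.
    System.out.println(estimate(64 * 1024, 100 * 1024));
  }
}
```

Because the value may be changed while scanning, such an estimate could be recomputed between calls to nextRows() as actual row sizes become known.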
Parameters:
max_num_rows - A strictly positive integer.
Throws:
IllegalArgumentException - if the argument is zero or negative.
public void setMaxNumKeyValues(int max_num_kvs)
Sets the maximum number of KeyValues the server is allowed to return in a single RPC response.
If you're dealing with wide rows, in which you have many cells, you may want to limit the number of cells (KeyValues) that the server returns in a single RPC response.
The default is DEFAULT_MAX_NUM_KVS, unlike in HBase's client where the default is -1. If you set this to a negative value, the server will always return full rows, no matter how wide they are. If you request really wide rows, this may cause increased memory consumption on the server side as the server has to build a large RPC response, even if it tries to avoid copying data. On the client side, the consequences on memory usage are worse due to the lack of framing in RPC responses. The client will have to buffer a large RPC response and will have to do several memory copies to dynamically grow the size of the buffer as more and more data comes in.
Parameters:
max_num_kvs - A non-zero value.
Throws:
IllegalArgumentException - if the argument is zero.
IllegalStateException - if scanning already started.
public int getMaxNumKeyValues()
Maximum number of KeyValues the server is allowed to return.
See Also:
setMaxNumKeyValues(int)
public void setMaxVersions(int versions)
Sets the maximum number of versions to return for each cell scanned.
By default a scanner will only return the most recent version of each cell. If you want to get all possible versions available, pass Integer.MAX_VALUE in argument.
Parameters:
versions - A strictly positive number of versions to return.
Throws:
IllegalStateException - if scanning already started.
IllegalArgumentException - if versions <= 0
public int getMaxVersions()
Returns the maximum number of versions to return for each cell scanned.
public void setMaxNumBytes(long max_num_bytes)
Sets the maximum number of bytes returned at once by the scanner.
HBase may actually return more than this many bytes because it will not truncate a row in the middle.
This value is only used when communicating with HBase 0.95 and newer. For older versions of HBase this value is silently ignored.
Parameters:
max_num_bytes - A strictly positive number of bytes.
Throws:
IllegalStateException - if scanning already started.
IllegalArgumentException - if max_num_bytes <= 0
public long getMaxNumBytes()
Returns the maximum number of bytes returned at once by the scanner.
See Also:
setMaxNumBytes(long)
public void setMinTimestamp(long timestamp)
Sets the minimum timestamp to scan (inclusive).
KeyValues that have a timestamp strictly less than this one will not be returned by the scanner. In some cases, HBase's internal optimizations avoid loading the filtered-out data into memory.
Parameters:
timestamp - The minimum timestamp to scan (inclusive).
Throws:
IllegalArgumentException - if timestamp < 0.
IllegalArgumentException - if timestamp > getMaxTimestamp().
See Also:
setTimeRange(long, long)
public long getMinTimestamp()
Returns the minimum timestamp to scan (inclusive).
public void setMaxTimestamp(long timestamp)
Sets the maximum timestamp to scan (exclusive).
KeyValues that have a timestamp greater than or equal to this one will not be returned by the scanner. In some cases, HBase's internal optimizations avoid loading the filtered-out data into memory.
Parameters:
timestamp - The maximum timestamp to scan (exclusive).
Throws:
IllegalArgumentException - if timestamp < 0.
IllegalArgumentException - if timestamp < getMinTimestamp().
See Also:
setTimeRange(long, long)
public long getMaxTimestamp()
Returns the maximum timestamp to scan (exclusive).
public void setTimeRange(long min_timestamp, long max_timestamp)
Sets the time range to scan.
KeyValues whose timestamp does not fall in the half-open range [min_timestamp, max_timestamp) will not be returned by the scanner. In some cases, HBase's internal optimizations avoid loading the filtered-out data into memory.
Parameters:
min_timestamp - The minimum timestamp to scan (inclusive).
max_timestamp - The maximum timestamp to scan (exclusive).
Throws:
IllegalArgumentException - if min_timestamp < 0
IllegalArgumentException - if max_timestamp < 0
IllegalArgumentException - if min_timestamp > max_timestamp
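The time range has an inclusive minimum and an exclusive maximum. The sketch below restates that rule and the argument checks listed above in plain Java. TimeRangeCheck and its methods are hypothetical names used for illustration only; this is not the library's implementation, and the real filtering happens on the server.

```java
/** Hypothetical illustration of the inclusive-minimum, exclusive-maximum time range rule. */
final class TimeRangeCheck {

  /** Mirrors the documented argument checks for a time range. */
  static void validate(final long minTimestamp, final long maxTimestamp) {
    if (minTimestamp < 0) {
      throw new IllegalArgumentException("Negative minimum timestamp: " + minTimestamp);
    }
    if (maxTimestamp < 0) {
      throw new IllegalArgumentException("Negative maximum timestamp: " + maxTimestamp);
    }
    if (minTimestamp > maxTimestamp) {
      throw new IllegalArgumentException(
          "Minimum " + minTimestamp + " exceeds maximum " + maxTimestamp);
    }
  }

  /** Returns true if a cell with this timestamp falls inside the range. */
  static boolean inRange(final long timestamp, final long minTimestamp, final long maxTimestamp) {
    validate(minTimestamp, maxTimestamp);
    return timestamp >= minTimestamp && timestamp < maxTimestamp;
  }

  public static void main(final String[] args) {
    System.out.println(inRange(1000L, 1000L, 2000L)); // the minimum is inclusive
    System.out.println(inRange(2000L, 1000L, 2000L)); // the maximum is exclusive
  }
}
```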
public Deferred<ArrayList<ArrayList<KeyValue>>> nextRows(int nrows)
Scans a number of rows. Calling this method is equivalent to:
this.setMaxNumRows(nrows); this.nextRows();
Parameters:
nrows - The maximum number of rows to retrieve.
See Also:
setMaxNumRows(int), nextRows()
public Deferred<ArrayList<ArrayList<KeyValue>>> nextRows()
Scans a number of rows.
The last row returned may be partial if it's very wide and setMaxNumKeyValues(int) wasn't called with a negative value in argument.
Once this method has returned null (which indicates that this Scanner is done scanning), calling it again leads to undefined behavior.
Returns:
A deferred list of rows. Each row is a list of KeyValue and each element in the list returned represents a different row. Rows are returned in sequential order. null is returned if there are no more rows to scan. Otherwise its size is guaranteed to be less than or equal to the value last given to setMaxNumRows(int).
See Also:
setMaxNumRows(int), setMaxNumKeyValues(int)
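The "keep calling until null, then never call again" contract can be sketched as an asynchronous loop. To stay runnable without a cluster, the example below swaps in the JDK's CompletableFuture for Deferred and a Supplier for the scanner; ScanLoop, drain, and the fake batches are all hypothetical and only mimic the shape of nextRows(). With the real class, the same chaining would be done on the Deferred that nextRows() returns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

/** Hypothetical stand-in showing the "call until null, then stop" contract. */
final class ScanLoop {

  /**
   * Keeps asking {@code nextRows} for batches until it yields null,
   * accumulating every row, and never asks again after the null.
   */
  static <R> CompletableFuture<List<R>> drain(final Supplier<CompletableFuture<List<R>>> nextRows) {
    return drain(nextRows, new ArrayList<R>());
  }

  private static <R> CompletableFuture<List<R>> drain(
      final Supplier<CompletableFuture<List<R>>> nextRows, final List<R> all) {
    return nextRows.get().thenCompose(batch -> {
      if (batch == null) {
        // End of scan: hand back what was collected and do not call nextRows again.
        return CompletableFuture.completedFuture(all);
      }
      all.addAll(batch);
      return drain(nextRows, all);
    });
  }

  public static void main(final String[] args) {
    // Fake scanner: two batches of rows, then null to signal the end.
    final Iterator<List<String>> batches = Arrays.asList(
        Arrays.asList("row1", "row2"), Arrays.asList("row3"), (List<String>) null).iterator();
    final List<String> rows =
        ScanLoop.<String>drain(() -> CompletableFuture.completedFuture(batches.next())).join();
    System.out.println(rows);
  }
}
```

Collecting everything into one list is done here only to keep the sketch short; for a large table, each batch would normally be processed as it arrives. A loop that stops before the null result would also need to call close() on the scanner.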
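public Deferred<Object> close()
Closes this scanner (don't forget to call this when you're done with it!).
Closing a scanner that is already closed has no effect: the Deferred returned will be called back immediately.
Returns:
A Deferred whose Object has no special meaning and can be null.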