Class RopeByteString
- All Implemented Interfaces:
Serializable
,Iterable<Byte>
ByteStrings
formed by concatenation of other ByteStrings, without
copying the data in the pieces. The concatenation is represented as a tree whose leaf nodes are
each a ByteString.LeafByteString
.
Most of the operation here is inspired by the now-famous paper BAP95 Ropes: an Alternative to Strings hans-j. boehm, russ atkinson and michael plass
The algorithms described in the paper have been implemented for character strings in
com.google.common.string.Rope
and in the c++ class cord.cc
.
Fundamentally the Rope algorithm represents the collection of pieces as a binary tree. BAP95 uses a Fibonacci bound relating depth to a minimum sequence length, sequences that are too short relative to their depth cause a tree rebalance. More precisely, a tree of depth d is "balanced" in the terminology of BAP95 if its length is at least F(d+2), where F(n) is the n-th Fibonacci number. Thus for depths 0, 1, 2, 3, 4, 5,... we have minimum lengths 1, 2, 3, 5, 8, 13,...
-
Nested Class Summary
Nested ClassesModifier and TypeClassDescriptionprivate static class
This class implements the balancing algorithm of BAP95.private static final class
This class is a continuable tree traversal, which keeps the state information which would exist on the stack in a recursive traversal instead on a stack of "Bread Crumbs".private class
This class is theRopeByteString
equivalent forByteArrayInputStream
.Nested classes/interfaces inherited from class com.google.protobuf.ByteString
ByteString.AbstractByteIterator, ByteString.ByteIterator, ByteString.CodedBuilder, ByteString.LeafByteString, ByteString.Output
-
Field Summary
FieldsModifier and TypeFieldDescriptionprivate final ByteString
private final int
(package private) static final int[]
BAP95.private final ByteString
private static final long
private final int
private final int
Fields inherited from class com.google.protobuf.ByteString
CONCATENATE_BY_COPY_SIZE, EMPTY, MAX_READ_FROM_CHUNK_SIZE, MIN_READ_FROM_CHUNK_SIZE
-
Constructor Summary
ConstructorsModifierConstructorDescriptionprivate
RopeByteString
(ByteString left, ByteString right) Create a new RopeByteString, which can be thought of as a new tree node, by recording references to the two given strings. -
Method Summary
Modifier and TypeMethodDescriptionConstructs a read-onlyjava.nio.ByteBuffer
whose content is equal to the contents of this byte string.Constructs a list of read-onlyjava.nio.ByteBuffer
objects such that the concatenation of their contents is equal to the contents of this byte string.byte
byteAt
(int index) Gets the byte at the given index.(package private) static ByteString
concatenate
(ByteString left, ByteString right) Concatenate the given strings while performing various optimizations to slow the growth rate of tree depth and tree node count.private static ByteString
concatenateBytes
(ByteString left, ByteString right) Concatenates two strings by copying data values.void
copyTo
(ByteBuffer target) Copies bytes into a ByteBuffer.protected void
copyToInternal
(byte[] target, int sourceOffset, int targetOffset, int numberToCopy) Internal (package private) implementation ofByteString.copyTo(byte[],int,int,int)
.boolean
private boolean
equalsFragments
(ByteString other) Determines if this string is equal to another of the same length by iterating over the leaf nodes.protected int
Return the depth of the tree representing thisByteString
, if any, whose root is this node.(package private) byte
internalByteAt
(int index) Gets the byte at the given index, assumes bounds checking has already been performed.protected boolean
Determines if the tree is balanced according to BAP95, which means the tree is flat-enough with respect to the bounds.boolean
Tells whether thisByteString
represents a well-formed UTF-8 byte sequence, such that the original bytes can be converted to a String object and then round tripped back to bytes without loss.iterator()
Return aByteString.ByteIterator
over the bytes in the ByteString.(package private) static int
minLength
(int depth) Returns the minimum length for which a tree of the given depth is considered balanced according to BAP95, which means the tree is flat-enough with respect to the bounds.Creates aCodedInputStream
which can be used to read the bytes.newInput()
Creates anInputStream
which can be used to read the bytes.(package private) static RopeByteString
newInstanceForTest
(ByteString left, ByteString right) Create a new RopeByteString for testing only while bypassing all the defenses ofconcatenate(ByteString, ByteString)
.protected int
partialHash
(int h, int offset, int length) Compute the hash across the value bytes starting with the given hash, and return the result.protected int
partialIsValidUtf8
(int state, int offset, int length) Tells whether the given byte sequence is a well-formed, malformed, or incomplete UTF-8 byte sequence.private void
int
size()
Gets the number of bytes.substring
(int beginIndex, int endIndex) Takes a substring of this one.protected String
toStringInternal
(Charset charset) Constructs a newString
by decoding the bytes using the specified charset.(package private) Object
(package private) void
writeTo
(ByteOutput output) Writes thisByteString
to the providedByteOutput
.void
writeTo
(OutputStream outputStream) Writes a copy of the contents of this byte string to the specified output stream argument.(package private) void
writeToInternal
(OutputStream out, int sourceOffset, int numberToWrite) Internal version ofByteString.writeTo(OutputStream,int,int)
that assumes all error checking has already been done.(package private) void
writeToReverse
(ByteOutput output) This method behaves exactly the same asByteString.writeTo(ByteOutput)
unless theByteString
is a rope.Methods inherited from class com.google.protobuf.ByteString
checkIndex, checkRange, concat, copyFrom, copyFrom, copyFrom, copyFrom, copyFrom, copyFrom, copyFrom, copyFromUtf8, copyTo, copyTo, endsWith, hashCode, isEmpty, newCodedBuilder, newOutput, newOutput, peekCachedHashCode, readFrom, readFrom, readFrom, startsWith, substring, toByteArray, toString, toString, toString, toStringUtf8, unsignedLexicographicalComparator, wrap, wrap, wrap, writeTo
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.lang.Iterable
forEach, spliterator
-
Field Details
-
minLengthByDepth
static final int[] minLengthByDepthBAP95. Let Fn be the nth Fibonacci number. ARopeByteString
of depth n is "balanced", i.e flat enough, if its length is at least Fn+2, e.g. a "balanced"RopeByteString
of depth 1 must have length at least 2, of depth 4 must have length >= 8, etc.There's nothing special about using the Fibonacci numbers for this, but they are a reasonable sequence for encapsulating the idea that we are OK with longer strings being encoded in deeper binary trees.
For 32-bit integers, this array has length 46.
The correctness of this constant array is validated in tests.
-
totalLength
private final int totalLength -
left
-
right
-
leftLength
private final int leftLength -
treeDepth
private final int treeDepth -
serialVersionUID
private static final long serialVersionUID- See Also:
-
-
Constructor Details
-
RopeByteString
Create a new RopeByteString, which can be thought of as a new tree node, by recording references to the two given strings.- Parameters:
left
- string on the left of this node, should havesize() > 0
right
- string on the right of this node, should havesize() > 0
-
-
Method Details
-
concatenate
Concatenate the given strings while performing various optimizations to slow the growth rate of tree depth and tree node count. The result is either aByteString.LeafByteString
or aRopeByteString
depending on which optimizations, if any, were applied.Small pieces of length less than
ByteString.CONCATENATE_BY_COPY_SIZE
may be copied by value here, as in BAP95. Large pieces are referenced without copy.- Parameters:
left
- string on the leftright
- string on the right- Returns:
- concatenation representing the same sequence as the given strings
-
concatenateBytes
Concatenates two strings by copying data values. This is called in a few cases in order to reduce the growth of the number of tree nodes.- Parameters:
left
- string on the leftright
- string on the right- Returns:
- string formed by copying data bytes
-
newInstanceForTest
Create a new RopeByteString for testing only while bypassing all the defenses ofconcatenate(ByteString, ByteString)
. This allows testing trees of specific structure. We are also able to insert empty leaves, though these are dis-allowed, so that we can make sure the implementation can withstand their presence.- Parameters:
left
- string on the left of this noderight
- string on the right of this node- Returns:
- an unsafe instance for testing only
-
minLength
static int minLength(int depth) Returns the minimum length for which a tree of the given depth is considered balanced according to BAP95, which means the tree is flat-enough with respect to the bounds. Defaults toInteger.MAX_VALUE
ifdepth >= minLengthByDepth.length
in order to avoid anArrayIndexOutOfBoundsException
.- Parameters:
depth
- tree depth- Returns:
- minimum balanced length
-
byteAt
public byte byteAt(int index) Gets the byte at the given index. ThrowsArrayIndexOutOfBoundsException
for backwards-compatibility reasons although it would more properly beIndexOutOfBoundsException
.- Specified by:
byteAt
in classByteString
- Parameters:
index
- index of byte- Returns:
- the value
- Throws:
ArrayIndexOutOfBoundsException
-index
is invalid input: '<' 0 or >= size
-
internalByteAt
byte internalByteAt(int index) Description copied from class:ByteString
Gets the byte at the given index, assumes bounds checking has already been performed.- Specified by:
internalByteAt
in classByteString
- Parameters:
index
- index of byte- Returns:
- the value
-
size
public int size()Description copied from class:ByteString
Gets the number of bytes.- Specified by:
size
in classByteString
- Returns:
- size in bytes
-
iterator
Description copied from class:ByteString
Return aByteString.ByteIterator
over the bytes in the ByteString. To avoid auto-boxing, you may get the iterator manually and callByteString.ByteIterator.nextByte()
.- Specified by:
iterator
in interfaceIterable<Byte>
- Overrides:
iterator
in classByteString
- Returns:
- the iterator
-
getTreeDepth
protected int getTreeDepth()Description copied from class:ByteString
Return the depth of the tree representing thisByteString
, if any, whose root is this node. If this is a leaf node, return 0.- Specified by:
getTreeDepth
in classByteString
- Returns:
- tree depth or zero
-
isBalanced
protected boolean isBalanced()Determines if the tree is balanced according to BAP95, which means the tree is flat-enough with respect to the bounds. Note that this definition of balanced is one where sub-trees of balanced trees are not necessarily balanced.- Specified by:
isBalanced
in classByteString
- Returns:
- true if the tree is balanced
-
substring
Takes a substring of this one. This involves recursive descent along the left and right edges of the substring, and referencing any wholly contained segments in between. Any leaf nodes entirely uninvolved in the substring will not be referenced by the substring.Substrings of
length < 2
should result in at most a single recursive call chain, terminating at a leaf node. Thus the result will be aByteString.LeafByteString
.- Specified by:
substring
in classByteString
- Parameters:
beginIndex
- start at this indexendIndex
- the last character is the one before this index- Returns:
- substring leaf node or tree
-
copyToInternal
protected void copyToInternal(byte[] target, int sourceOffset, int targetOffset, int numberToCopy) Description copied from class:ByteString
Internal (package private) implementation ofByteString.copyTo(byte[],int,int,int)
. It assumes that all error checking has already been performed and thatnumberToCopy > 0
.- Specified by:
copyToInternal
in classByteString
-
copyTo
Description copied from class:ByteString
Copies bytes into a ByteBuffer.To copy a subset of bytes, you call this method on the return value of
ByteString.substring(int, int)
. Example:byteString.substring(start, end).copyTo(target)
- Specified by:
copyTo
in classByteString
- Parameters:
target
- ByteBuffer to copy into.
-
asReadOnlyByteBuffer
Description copied from class:ByteString
Constructs a read-onlyjava.nio.ByteBuffer
whose content is equal to the contents of this byte string. The result uses the same backing array as the byte string, if possible.- Specified by:
asReadOnlyByteBuffer
in classByteString
- Returns:
- wrapped bytes
-
asReadOnlyByteBufferList
Description copied from class:ByteString
Constructs a list of read-onlyjava.nio.ByteBuffer
objects such that the concatenation of their contents is equal to the contents of this byte string. The result uses the same backing arrays as the byte string.By returning a list, implementations of this method may be able to avoid copying even when there are multiple backing arrays.
- Specified by:
asReadOnlyByteBufferList
in classByteString
- Returns:
- a list of wrapped bytes
-
writeTo
Description copied from class:ByteString
Writes a copy of the contents of this byte string to the specified output stream argument.- Specified by:
writeTo
in classByteString
- Parameters:
outputStream
- the output stream to which to write the data.- Throws:
IOException
- if an I/O error occurs.
-
writeToInternal
Description copied from class:ByteString
Internal version ofByteString.writeTo(OutputStream,int,int)
that assumes all error checking has already been done.- Specified by:
writeToInternal
in classByteString
- Throws:
IOException
-
writeTo
Description copied from class:ByteString
Writes thisByteString
to the providedByteOutput
. Calling this method may result in multiple operations on the targetByteOutput
.This method may expose internal backing buffers of the
ByteString
to theByteOutput
in order to avoid additional copying overhead. It would be possible for a maliciousByteOutput
to corrupt theByteString
. Use with caution!- Specified by:
writeTo
in classByteString
- Parameters:
output
- the output target to receive the bytes- Throws:
IOException
- if an I/O error occurs- See Also:
-
writeToReverse
Description copied from class:ByteString
This method behaves exactly the same asByteString.writeTo(ByteOutput)
unless theByteString
is a rope. For ropes, the leaf nodes are written in reverse order to thebyteOutput
.- Specified by:
writeToReverse
in classByteString
- Parameters:
output
- the output target to receive the bytes- Throws:
IOException
- if an I/O error occurs- See Also:
-
toStringInternal
Description copied from class:ByteString
Constructs a newString
by decoding the bytes using the specified charset.- Specified by:
toStringInternal
in classByteString
- Parameters:
charset
- encode using this charset- Returns:
- new string
-
isValidUtf8
public boolean isValidUtf8()Description copied from class:ByteString
Tells whether thisByteString
represents a well-formed UTF-8 byte sequence, such that the original bytes can be converted to a String object and then round tripped back to bytes without loss.More precisely, returns
true
whenever:Arrays.equals(byteString.toByteArray(), new String(byteString.toByteArray(), "UTF-8").getBytes("UTF-8"))
This method returns
false
for "overlong" byte sequences, as well as for 3-byte sequences that would map to a surrogate character, in accordance with the restricted definition of UTF-8 introduced in Unicode 3.1. Note that the UTF-8 decoder included in Oracle's JDK has been modified to also reject "overlong" byte sequences, but (as of 2011) still accepts 3-byte surrogate character byte sequences.See the Unicode Standard,
Table 3-6. UTF-8 Bit Distribution,
Table 3-7. Well Formed UTF-8 Byte Sequences.- Specified by:
isValidUtf8
in classByteString
- Returns:
- whether the bytes in this
ByteString
are a well-formed UTF-8 byte sequence
-
partialIsValidUtf8
protected int partialIsValidUtf8(int state, int offset, int length) Description copied from class:ByteString
Tells whether the given byte sequence is a well-formed, malformed, or incomplete UTF-8 byte sequence. This method accepts and returns a partial state result, allowing the bytes for a complete UTF-8 byte sequence to be composed from multipleByteString
segments.- Specified by:
partialIsValidUtf8
in classByteString
- Parameters:
state
- either0
(if this is the initial decoding operation) or the value returned from a call to a partial decoding method for the previous bytesoffset
- offset of the first byte to checklength
- number of bytes to check- Returns:
-1
if the partial byte sequence is definitely malformed,0
if it is well-formed (no additional input needed), or, if the byte sequence is "incomplete", i.e. apparently terminated in the middle of a character, an opaque integer "state" value containing enough information to decode the character when passed to a subsequent invocation of a partial decoding method.
-
equals
- Specified by:
equals
in classByteString
-
equalsFragments
Determines if this string is equal to another of the same length by iterating over the leaf nodes. On each step of the iteration, the overlapping segments of the leaves are compared.- Parameters:
other
- string of the same length as this one- Returns:
- true if the values of this string equals the value of the given one
-
partialHash
protected int partialHash(int h, int offset, int length) Description copied from class:ByteString
Compute the hash across the value bytes starting with the given hash, and return the result. This is used to compute the hash across strings represented as a set of pieces by allowing the hash computation to be continued from piece to piece.- Specified by:
partialHash
in classByteString
- Parameters:
h
- starting hash valueoffset
- offset into this value to start looking at data valueslength
- number of data values to include in the hash computation- Returns:
- ending hash value
-
newCodedInput
Description copied from class:ByteString
Creates aCodedInputStream
which can be used to read the bytes. Using this is often more efficient than creating aCodedInputStream
that wraps the result ofByteString.newInput()
.- Specified by:
newCodedInput
in classByteString
- Returns:
- stream based on wrapped data
-
newInput
Description copied from class:ByteString
Creates anInputStream
which can be used to read the bytes.The
InputStream
returned by this method is guaranteed to be completely non-blocking. The methodInputStream.available()
returns the number of bytes remaining in the stream. The methodsInputStream.read(byte[])
,InputStream.read(byte[],int,int)
andInputStream.skip(long)
will read/skip as many bytes as are available. The methodInputStream.markSupported()
returnstrue
.The methods in the returned
InputStream
might not be thread safe.- Specified by:
newInput
in classByteString
- Returns:
- an input stream that returns the bytes of this byte string.
-
writeReplace
Object writeReplace() -
readObject
- Throws:
IOException
-