Class RopeByteString

java.lang.Object
com.google.protobuf.ByteString
com.google.protobuf.RopeByteString
All Implemented Interfaces:
Serializable, Iterable<Byte>

final class RopeByteString extends ByteString
Class to represent ByteStrings formed by concatenation of other ByteStrings, without copying the data in the pieces. The concatenation is represented as a tree whose leaf nodes are each a ByteString.LeafByteString.

Most of the operation here is inspired by the now-famous paper BAP95 Ropes: an Alternative to Strings hans-j. boehm, russ atkinson and michael plass

The algorithms described in the paper have been implemented for character strings in com.google.common.string.Rope and in the c++ class cord.cc.

Fundamentally the Rope algorithm represents the collection of pieces as a binary tree. BAP95 uses a Fibonacci bound relating depth to a minimum sequence length, sequences that are too short relative to their depth cause a tree rebalance. More precisely, a tree of depth d is "balanced" in the terminology of BAP95 if its length is at least F(d+2), where F(n) is the n-th Fibonacci number. Thus for depths 0, 1, 2, 3, 4, 5,... we have minimum lengths 1, 2, 3, 5, 8, 13,...

  • Field Details

    • minLengthByDepth

      static final int[] minLengthByDepth
      BAP95. Let Fn be the nth Fibonacci number. A RopeByteString of depth n is "balanced", i.e flat enough, if its length is at least Fn+2, e.g. a "balanced" RopeByteString of depth 1 must have length at least 2, of depth 4 must have length >= 8, etc.

      There's nothing special about using the Fibonacci numbers for this, but they are a reasonable sequence for encapsulating the idea that we are OK with longer strings being encoded in deeper binary trees.

      For 32-bit integers, this array has length 46.

      The correctness of this constant array is validated in tests.

    • totalLength

      private final int totalLength
    • left

      private final ByteString left
    • leftLength

      private final int leftLength
    • treeDepth

      private final int treeDepth
    • serialVersionUID

      private static final long serialVersionUID
      See Also:
  • Constructor Details

    • RopeByteString

      private RopeByteString(ByteString left, ByteString right)
      Create a new RopeByteString, which can be thought of as a new tree node, by recording references to the two given strings.
      Parameters:
      left - string on the left of this node, should have size() > 0
      right - string on the right of this node, should have size() > 0
  • Method Details

    • concatenate

      static ByteString concatenate(ByteString left, ByteString right)
      Concatenate the given strings while performing various optimizations to slow the growth rate of tree depth and tree node count. The result is either a ByteString.LeafByteString or a RopeByteString depending on which optimizations, if any, were applied.

      Small pieces of length less than ByteString.CONCATENATE_BY_COPY_SIZE may be copied by value here, as in BAP95. Large pieces are referenced without copy.

      Parameters:
      left - string on the left
      right - string on the right
      Returns:
      concatenation representing the same sequence as the given strings
    • concatenateBytes

      private static ByteString concatenateBytes(ByteString left, ByteString right)
      Concatenates two strings by copying data values. This is called in a few cases in order to reduce the growth of the number of tree nodes.
      Parameters:
      left - string on the left
      right - string on the right
      Returns:
      string formed by copying data bytes
    • newInstanceForTest

      static RopeByteString newInstanceForTest(ByteString left, ByteString right)
      Create a new RopeByteString for testing only while bypassing all the defenses of concatenate(ByteString, ByteString). This allows testing trees of specific structure. We are also able to insert empty leaves, though these are dis-allowed, so that we can make sure the implementation can withstand their presence.
      Parameters:
      left - string on the left of this node
      right - string on the right of this node
      Returns:
      an unsafe instance for testing only
    • minLength

      static int minLength(int depth)
      Returns the minimum length for which a tree of the given depth is considered balanced according to BAP95, which means the tree is flat-enough with respect to the bounds. Defaults to Integer.MAX_VALUE if depth >= minLengthByDepth.length in order to avoid an ArrayIndexOutOfBoundsException.
      Parameters:
      depth - tree depth
      Returns:
      minimum balanced length
    • byteAt

      public byte byteAt(int index)
      Gets the byte at the given index. Throws ArrayIndexOutOfBoundsException for backwards-compatibility reasons although it would more properly be IndexOutOfBoundsException.
      Specified by:
      byteAt in class ByteString
      Parameters:
      index - index of byte
      Returns:
      the value
      Throws:
      ArrayIndexOutOfBoundsException - index is invalid input: '<' 0 or >= size
    • internalByteAt

      byte internalByteAt(int index)
      Description copied from class: ByteString
      Gets the byte at the given index, assumes bounds checking has already been performed.
      Specified by:
      internalByteAt in class ByteString
      Parameters:
      index - index of byte
      Returns:
      the value
    • size

      public int size()
      Description copied from class: ByteString
      Gets the number of bytes.
      Specified by:
      size in class ByteString
      Returns:
      size in bytes
    • iterator

      public ByteString.ByteIterator iterator()
      Description copied from class: ByteString
      Return a ByteString.ByteIterator over the bytes in the ByteString. To avoid auto-boxing, you may get the iterator manually and call ByteString.ByteIterator.nextByte().
      Specified by:
      iterator in interface Iterable<Byte>
      Overrides:
      iterator in class ByteString
      Returns:
      the iterator
    • getTreeDepth

      protected int getTreeDepth()
      Description copied from class: ByteString
      Return the depth of the tree representing this ByteString, if any, whose root is this node. If this is a leaf node, return 0.
      Specified by:
      getTreeDepth in class ByteString
      Returns:
      tree depth or zero
    • isBalanced

      protected boolean isBalanced()
      Determines if the tree is balanced according to BAP95, which means the tree is flat-enough with respect to the bounds. Note that this definition of balanced is one where sub-trees of balanced trees are not necessarily balanced.
      Specified by:
      isBalanced in class ByteString
      Returns:
      true if the tree is balanced
    • substring

      public ByteString substring(int beginIndex, int endIndex)
      Takes a substring of this one. This involves recursive descent along the left and right edges of the substring, and referencing any wholly contained segments in between. Any leaf nodes entirely uninvolved in the substring will not be referenced by the substring.

      Substrings of length < 2 should result in at most a single recursive call chain, terminating at a leaf node. Thus the result will be a ByteString.LeafByteString.

      Specified by:
      substring in class ByteString
      Parameters:
      beginIndex - start at this index
      endIndex - the last character is the one before this index
      Returns:
      substring leaf node or tree
    • copyToInternal

      protected void copyToInternal(byte[] target, int sourceOffset, int targetOffset, int numberToCopy)
      Description copied from class: ByteString
      Internal (package private) implementation of ByteString.copyTo(byte[],int,int,int). It assumes that all error checking has already been performed and that numberToCopy > 0.
      Specified by:
      copyToInternal in class ByteString
    • copyTo

      public void copyTo(ByteBuffer target)
      Description copied from class: ByteString
      Copies bytes into a ByteBuffer.

      To copy a subset of bytes, you call this method on the return value of ByteString.substring(int, int). Example: byteString.substring(start, end).copyTo(target)

      Specified by:
      copyTo in class ByteString
      Parameters:
      target - ByteBuffer to copy into.
    • asReadOnlyByteBuffer

      public ByteBuffer asReadOnlyByteBuffer()
      Description copied from class: ByteString
      Constructs a read-only java.nio.ByteBuffer whose content is equal to the contents of this byte string. The result uses the same backing array as the byte string, if possible.
      Specified by:
      asReadOnlyByteBuffer in class ByteString
      Returns:
      wrapped bytes
    • asReadOnlyByteBufferList

      public List<ByteBuffer> asReadOnlyByteBufferList()
      Description copied from class: ByteString
      Constructs a list of read-only java.nio.ByteBuffer objects such that the concatenation of their contents is equal to the contents of this byte string. The result uses the same backing arrays as the byte string.

      By returning a list, implementations of this method may be able to avoid copying even when there are multiple backing arrays.

      Specified by:
      asReadOnlyByteBufferList in class ByteString
      Returns:
      a list of wrapped bytes
    • writeTo

      public void writeTo(OutputStream outputStream) throws IOException
      Description copied from class: ByteString
      Writes a copy of the contents of this byte string to the specified output stream argument.
      Specified by:
      writeTo in class ByteString
      Parameters:
      outputStream - the output stream to which to write the data.
      Throws:
      IOException - if an I/O error occurs.
    • writeToInternal

      void writeToInternal(OutputStream out, int sourceOffset, int numberToWrite) throws IOException
      Description copied from class: ByteString
      Internal version of ByteString.writeTo(OutputStream,int,int) that assumes all error checking has already been done.
      Specified by:
      writeToInternal in class ByteString
      Throws:
      IOException
    • writeTo

      void writeTo(ByteOutput output) throws IOException
      Description copied from class: ByteString
      Writes this ByteString to the provided ByteOutput. Calling this method may result in multiple operations on the target ByteOutput.

      This method may expose internal backing buffers of the ByteString to the ByteOutput in order to avoid additional copying overhead. It would be possible for a malicious ByteOutput to corrupt the ByteString. Use with caution!

      Specified by:
      writeTo in class ByteString
      Parameters:
      output - the output target to receive the bytes
      Throws:
      IOException - if an I/O error occurs
      See Also:
    • writeToReverse

      void writeToReverse(ByteOutput output) throws IOException
      Description copied from class: ByteString
      This method behaves exactly the same as ByteString.writeTo(ByteOutput) unless the ByteString is a rope. For ropes, the leaf nodes are written in reverse order to the byteOutput.
      Specified by:
      writeToReverse in class ByteString
      Parameters:
      output - the output target to receive the bytes
      Throws:
      IOException - if an I/O error occurs
      See Also:
      • invalid reference
        UnsafeByteOperations#unsafeWriteToReverse(ByteString, ByteOutput)
    • toStringInternal

      protected String toStringInternal(Charset charset)
      Description copied from class: ByteString
      Constructs a new String by decoding the bytes using the specified charset.
      Specified by:
      toStringInternal in class ByteString
      Parameters:
      charset - encode using this charset
      Returns:
      new string
    • isValidUtf8

      public boolean isValidUtf8()
      Description copied from class: ByteString
      Tells whether this ByteString represents a well-formed UTF-8 byte sequence, such that the original bytes can be converted to a String object and then round tripped back to bytes without loss.

      More precisely, returns true whenever:

      
       Arrays.equals(byteString.toByteArray(),
           new String(byteString.toByteArray(), "UTF-8").getBytes("UTF-8"))
       

      This method returns false for "overlong" byte sequences, as well as for 3-byte sequences that would map to a surrogate character, in accordance with the restricted definition of UTF-8 introduced in Unicode 3.1. Note that the UTF-8 decoder included in Oracle's JDK has been modified to also reject "overlong" byte sequences, but (as of 2011) still accepts 3-byte surrogate character byte sequences.

      See the Unicode Standard,
      Table 3-6. UTF-8 Bit Distribution,
      Table 3-7. Well Formed UTF-8 Byte Sequences.

      Specified by:
      isValidUtf8 in class ByteString
      Returns:
      whether the bytes in this ByteString are a well-formed UTF-8 byte sequence
    • partialIsValidUtf8

      protected int partialIsValidUtf8(int state, int offset, int length)
      Description copied from class: ByteString
      Tells whether the given byte sequence is a well-formed, malformed, or incomplete UTF-8 byte sequence. This method accepts and returns a partial state result, allowing the bytes for a complete UTF-8 byte sequence to be composed from multiple ByteString segments.
      Specified by:
      partialIsValidUtf8 in class ByteString
      Parameters:
      state - either 0 (if this is the initial decoding operation) or the value returned from a call to a partial decoding method for the previous bytes
      offset - offset of the first byte to check
      length - number of bytes to check
      Returns:
      -1 if the partial byte sequence is definitely malformed, 0 if it is well-formed (no additional input needed), or, if the byte sequence is "incomplete", i.e. apparently terminated in the middle of a character, an opaque integer "state" value containing enough information to decode the character when passed to a subsequent invocation of a partial decoding method.
    • equals

      public boolean equals(Object other)
      Specified by:
      equals in class ByteString
    • equalsFragments

      private boolean equalsFragments(ByteString other)
      Determines if this string is equal to another of the same length by iterating over the leaf nodes. On each step of the iteration, the overlapping segments of the leaves are compared.
      Parameters:
      other - string of the same length as this one
      Returns:
      true if the values of this string equals the value of the given one
    • partialHash

      protected int partialHash(int h, int offset, int length)
      Description copied from class: ByteString
      Compute the hash across the value bytes starting with the given hash, and return the result. This is used to compute the hash across strings represented as a set of pieces by allowing the hash computation to be continued from piece to piece.
      Specified by:
      partialHash in class ByteString
      Parameters:
      h - starting hash value
      offset - offset into this value to start looking at data values
      length - number of data values to include in the hash computation
      Returns:
      ending hash value
    • newCodedInput

      public CodedInputStream newCodedInput()
      Description copied from class: ByteString
      Creates a CodedInputStream which can be used to read the bytes. Using this is often more efficient than creating a CodedInputStream that wraps the result of ByteString.newInput().
      Specified by:
      newCodedInput in class ByteString
      Returns:
      stream based on wrapped data
    • newInput

      public InputStream newInput()
      Description copied from class: ByteString
      Creates an InputStream which can be used to read the bytes.

      The InputStream returned by this method is guaranteed to be completely non-blocking. The method InputStream.available() returns the number of bytes remaining in the stream. The methods InputStream.read(byte[]), InputStream.read(byte[],int,int) and InputStream.skip(long) will read/skip as many bytes as are available. The method InputStream.markSupported() returns true.

      The methods in the returned InputStream might not be thread safe.

      Specified by:
      newInput in class ByteString
      Returns:
      an input stream that returns the bytes of this byte string.
    • writeReplace

      Object writeReplace()
    • readObject

      private void readObject(ObjectInputStream in) throws IOException
      Throws:
      IOException