Class BreakTransliterator

  • All Implemented Interfaces:
    StringTransform, Transform<java.lang.String,java.lang.String>

    final class BreakTransliterator
    extends Transliterator
    Inserts the specified characters at word breaks. To restrict the insertion to particular characters, use a filter. TODO: this is a temporary internal class; remove it once Transliterator supports \b notation.
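
The idea of inserting a string at word breaks can be sketched with the JDK alone. This is a minimal illustration, not this class's implementation: it assumes java.text.BreakIterator in place of the ICU BreakIterator, inserts at every interior boundary the iterator reports, and uses the hypothetical names WordBreakInsertSketch and insertAtWordBreaks.

```java
import java.text.BreakIterator;
import java.util.Locale;

public class WordBreakInsertSketch {

    // Hypothetical helper: copies text, adding `insertion` at each word
    // boundary that lies strictly inside the text.
    static String insertAtWordBreaks(String text, String insertion) {
        // Assumption: the JDK word iterator stands in for the ICU one.
        BreakIterator bi = BreakIterator.getWordInstance(Locale.ROOT);
        bi.setText(text);

        StringBuilder out = new StringBuilder();
        int prev = bi.first();
        for (int b = bi.next(); b != BreakIterator.DONE; b = bi.next()) {
            out.append(text, prev, b);      // copy the segment [prev, b)
            if (b < text.length()) {
                out.append(insertion);      // skip the boundary at the very end
            }
            prev = b;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Boundaries fall at 0, 5, 6 and 11, so this prints "hello| |world".
        System.out.println(insertAtWordBreaks("hello world", "|"));
    }
}
```

Note that the space between the two words is itself a segment, so the insertion appears on both sides of it. Deciding which boundaries should actually receive the insertion is where a filter would come in.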
    • Field Detail

      • insertion

        private java.lang.String insertion
      • boundaries

        private int[] boundaries
      • boundaryCount

        private int boundaryCount
    • Constructor Detail

      • BreakTransliterator

        public BreakTransliterator(java.lang.String ID,
                                   UnicodeFilter filter,
                                   BreakIterator bi,
                                   java.lang.String insertion)
      • BreakTransliterator

        public BreakTransliterator(java.lang.String ID,
                                   UnicodeFilter filter)
    • Method Detail

      • getInsertion

        public java.lang.String getInsertion()
      • setInsertion

        public void setInsertion(java.lang.String insertion)
      • setBreakIterator

        public void setBreakIterator(BreakIterator bi)
      • handleTransliterate

        protected void handleTransliterate(Replaceable text,
                                           Transliterator.Position pos,
                                           boolean incremental)
        Description copied from class: Transliterator
        Abstract method that concrete subclasses define to implement their transliteration algorithm. This method handles both incremental and non-incremental transliteration. Let originalStart refer to the value of pos.start upon entry.
        • If incremental is false, then this method should transliterate all characters between pos.start and pos.limit. Upon return, pos.start must equal pos.limit.
        • If incremental is true, then this method should transliterate all characters between pos.start and pos.limit that can be unambiguously transliterated, regardless of future insertions of text at pos.limit. Upon return, pos.start should be in the range [originalStart, pos.limit). pos.start should be positioned such that characters [originalStart, pos.start) will not be changed in the future by this transliterator and characters [pos.start, pos.limit) are unchanged.
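
The difference between the two modes can be sketched with a toy rule set in which "ab" becomes "X" and a lone "a" becomes "Y". Everything here is an assumption made for illustration: the class IncrementalSketch, its simplified Pos (only start and limit), and the rules themselves are hypothetical and are not part of ICU.

```java
public class IncrementalSketch {

    // Hypothetical, reduced stand-in for Transliterator.Position.
    static final class Pos {
        int start;
        int limit;
        Pos(int start, int limit) { this.start = start; this.limit = limit; }
    }

    // Toy rules: "ab" > "X", otherwise "a" > "Y"; other characters pass through.
    static void transliterate(StringBuilder text, Pos pos, boolean incremental) {
        while (pos.start < pos.limit) {
            if (text.charAt(pos.start) != 'a') {
                pos.start++;                       // nothing to do for this character
                continue;
            }
            boolean lastChar = pos.start + 1 == pos.limit;
            if (lastChar && incremental) {
                // A trailing 'a' is ambiguous: a later 'b' would make it "ab".
                // Leave pos.start in front of it and stop.
                return;
            }
            if (!lastChar && text.charAt(pos.start + 1) == 'b') {
                text.replace(pos.start, pos.start + 2, "X");
                pos.limit -= 1;                    // text shrank by one character
            } else {
                text.setCharAt(pos.start, 'Y');
            }
            pos.start++;
        }
    }

    public static void main(String[] args) {
        StringBuilder text = new StringBuilder("caba");
        Pos pos = new Pos(0, 4);
        transliterate(text, pos, true);
        System.out.println(text + " start=" + pos.start + " limit=" + pos.limit);
        transliterate(text, pos, false);
        System.out.println(text + " start=" + pos.start + " limit=" + pos.limit);
    }
}
```

In the incremental call the final "a" is left untouched with pos.start in front of it; the non-incremental call then resolves it and advances pos.start to pos.limit.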

        Implementations of this method should also obey the following invariants:

        • pos.limit and pos.contextLimit should be updated to reflect changes in length of the text between pos.start and pos.limit. The difference pos.contextLimit - pos.limit should not change.
        • pos.contextStart should not change.
        • Upon return, neither pos.start nor pos.limit should be less than originalStart.
        • Text before originalStart and text after pos.limit should not change.
        • Text before pos.contextStart and text after pos.contextLimit should be ignored.
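
As a rough illustration of the invariants listed above, the following sketch performs a non-incremental pass that lengthens the text and adjusts the indices accordingly. It is an assumption-laden toy: PositionInvariantSketch, its Pos class, and the rule "insert '-' after every 'a'" are hypothetical, and a StringBuilder stands in for Replaceable.

```java
public class PositionInvariantSketch {

    // Hypothetical stand-in for the four indices of Transliterator.Position.
    static final class Pos {
        int contextStart, start, limit, contextLimit;
        Pos(int contextStart, int start, int limit, int contextLimit) {
            this.contextStart = contextStart;
            this.start = start;
            this.limit = limit;
            this.contextLimit = contextLimit;
        }
    }

    // Toy non-incremental pass: inserts '-' after every 'a' in [start, limit).
    static void transliterate(StringBuilder text, Pos pos) {
        int i = pos.start;
        while (i < pos.limit) {
            if (text.charAt(i) == 'a') {
                text.insert(i + 1, "-");
                // The text grew by one, so both limits move together and
                // contextLimit - limit stays constant.
                pos.limit += 1;
                pos.contextLimit += 1;
                i += 2;                 // skip the 'a' and the inserted '-'
            } else {
                i++;
            }
        }
        pos.start = pos.limit;          // non-incremental: everything is consumed
        // contextStart is never modified, and text outside the original
        // [start, limit) range is never touched.
    }

    public static void main(String[] args) {
        StringBuilder text = new StringBuilder("xxabaxx");
        Pos pos = new Pos(0, 2, 5, 7);
        transliterate(text, pos);
        System.out.println(text + " start=" + pos.start
                + " limit=" + pos.limit + " contextLimit=" + pos.contextLimit);
    }
}
```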

        Subclasses may safely assume that all characters in [pos.start, pos.limit) are filtered. In other words, the filter has already been applied by the time this method is called. See filteredTransliterate().

        This method is not for public consumption. Calling this method directly will transliterate [pos.start, pos.limit) without applying the filter. End user code should call transliterate() instead of this method. Subclass code should call filteredTransliterate() instead of this method.

        Specified by:
        handleTransliterate in class Transliterator
        Parameters:
        text - the buffer holding transliterated and untransliterated text
        pos - the indices indicating the start, limit, context start, and context limit of the text.
        incremental - if true, assume more text may be inserted at pos.limit and act accordingly. Otherwise, transliterate all text between pos.start and pos.limit and move pos.start up to pos.limit.
        See Also:
        Transliterator.transliterate(com.ibm.icu.text.Replaceable, int, int)
      • register

        static void register()
        Registers standard variants with the system. Called by Transliterator during initialization.
      • addSourceTargetSet

        public void addSourceTargetSet(UnicodeSet inputFilter,
                                       UnicodeSet sourceSet,
                                       UnicodeSet targetSet)
        Description copied from class: Transliterator
        Adds to targetSet the characters that may be generated as replacement text by this transliterator, and to sourceSet the characters it may transliterate, each filtered by BOTH the input filter and the current getFilter().

        SHOULD BE OVERRIDDEN BY SUBCLASSES. It is probably an error for any transliterator to NOT override this, but we can't force them to for backwards compatibility.

        Other methods delegate to this one.

        When gathering the information on source and target, the compound transliterator makes things complicated. For example, suppose we have:

         Global FILTER = [ax]
         a > b;
         :: NULL;
         b > c;
         x > d;
         
        While the filter allows only a and x, b is an intermediate result that can produce c, so the source and target sets cannot be gathered independently. Instead, filter the sources of the first transliterator by the global filter intersected with that transliterator's own filter, and derive the target from the result. The next transliterator then receives (global + last target) as its global filter, and so on.
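
The propagation described above can be sketched with plain Java collections. This is an illustration under several assumptions, not the ICU implementation: each stage is modeled as a hypothetical map of single-character rules, stages have no filter of their own, and reporting the intermediate character b in the source set is a choice made by this sketch.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class SourceTargetSketch {

    // Walks the stages in order, widening the working filter with each
    // stage's targets before moving on to the next stage.
    static void gather(List<Map<Character, Character>> stages,
                       Set<Character> globalFilter,
                       Set<Character> sourceSet,
                       Set<Character> targetSet) {
        Set<Character> filter = new TreeSet<>(globalFilter);
        for (Map<Character, Character> stage : stages) {
            Set<Character> stageTargets = new TreeSet<>();
            for (Map.Entry<Character, Character> rule : stage.entrySet()) {
                if (filter.contains(rule.getKey())) {   // source passes the filter
                    sourceSet.add(rule.getKey());
                    stageTargets.add(rule.getValue());
                }
            }
            targetSet.addAll(stageTargets);
            filter.addAll(stageTargets);                // global + last target
        }
    }

    public static void main(String[] args) {
        // Stage 1: a > b;   (:: NULL; separates the stages)   Stage 2: b > c; x > d;
        List<Map<Character, Character>> stages =
                List.of(Map.of('a', 'b'), Map.of('b', 'c', 'x', 'd'));
        Set<Character> sources = new TreeSet<>();
        Set<Character> targets = new TreeSet<>();
        gather(stages, Set.of('a', 'x'), sources, targets);
        System.out.println("sources=" + sources + " targets=" + targets);
    }
}
```

With the global filter [ax], the second stage only sees b because the first stage's target was added to the working filter; gathering each stage against the original filter alone would miss c.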

        There is another complication:

         Global FILTER = [ax]
         a >|b;
         b >c;
         
        Even though b would be filtered from the input, whenever the cursor backs up (as with >| above), b can become part of the input. So ideally the global filter would be updated as we go.
        Overrides:
        addSourceTargetSet in class Transliterator
        Parameters:
        targetSet - TODO
        See Also:
        Transliterator.getTargetSet()