Class CompressionUtils

java.lang.Object
htsjdk.samtools.cram.compression.CompressionUtils

public class CompressionUtils extends Object
Utility methods shared across CRAM 3.1 compression codecs (rANS, Range, Name Tokeniser, etc.), including uint7 encoding, bit-packing, and STRIPE data transformation.
  • Constructor Details

    • CompressionUtils

      public CompressionUtils()
  • Method Details

    • writeUint7

      public static void writeUint7(int i, ByteBuffer cp)
      Write an unsigned integer using 7-bit variable-length encoding (uint7). Each output byte uses 7 bits for data and the high bit as a continuation flag (1 = more bytes follow).
      Parameters:
      i - the value to write (must be non-negative)
      cp - the output buffer
    • readUint7

      public static int readUint7(ByteBuffer cp)
      Read an unsigned integer using 7-bit variable-length encoding (uint7). Each byte uses 7 bits for data and the high bit as a continuation flag (1 = more bytes follow).
      Parameters:
      cp - the input buffer
      Returns:
      the decoded unsigned integer value
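      To make the uint7 byte layout concrete, here is a minimal stand-alone sketch of both directions. It assumes the most-significant 7-bit group is written first (the order we understand the CRAM 3.1 codec specification to use); the class name Uint7Sketch is hypothetical and this is not the htsjdk source.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical sketch of uint7 encoding; assumes most-significant group first. */
final class Uint7Sketch {

    /** Writes a non-negative int as uint7; high bit set on every byte except the last. */
    static void writeUint7(int value, ByteBuffer out) {
        if (value < 0) {
            throw new IllegalArgumentException("uint7 requires a non-negative value");
        }
        // Find the shift of the most significant non-empty 7-bit group (0 for values < 128).
        int shift = 0;
        for (int rest = value >>> 7; rest != 0; rest >>>= 7) {
            shift += 7;
        }
        // Emit groups from most to least significant, flagging "more bytes follow".
        for (; shift > 0; shift -= 7) {
            out.put((byte) (((value >>> shift) & 0x7F) | 0x80));
        }
        out.put((byte) (value & 0x7F)); // final group: continuation bit clear
    }

    /** Reads uint7 bytes, accumulating 7 data bits each until the continuation bit is 0. */
    static int readUint7(ByteBuffer in) {
        int value = 0;
        int b;
        do {
            b = in.get() & 0xFF;
            value = (value << 7) | (b & 0x7F);
        } while ((b & 0x80) != 0);
        return value;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(16).order(ByteOrder.LITTLE_ENDIAN);
        writeUint7(300, buf); // 300 = (2 << 7) | 44, so the bytes are 0x82 0x2C
        buf.flip();
        System.out.println(buf.remaining()); // 2
        System.out.println(readUint7(buf));  // 300
    }
}
```

      Byte order of the ByteBuffer does not matter here because uint7 is written one byte at a time.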
    • writeUint7

      public static void writeUint7(int i, byte[] buf, int[] posHolder)
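      Write i as a uint7 value into buf, starting at index posHolder[0], then advance posHolder[0] past the bytes written. posHolder is a one-element array that serves as a mutable cursor.
      Parameters:
      i - the value to write (must be non-negative)
      buf - the output array
      posHolder - one-element array holding the write position; updated in place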
    • readUint7

      public static int readUint7(byte[] buf, int[] posHolder)
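      Read a uint7 value from buf, starting at index posHolder[0], then advance posHolder[0] past the bytes consumed. posHolder is a one-element array that serves as a mutable cursor.
      Parameters:
      buf - the input array
      posHolder - one-element array holding the read position; updated in place
      Returns:
      the decoded unsigned integer value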
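      The byte[] overloads use a one-element int[] as a cursor that the callee updates in place, so successive calls continue where the previous one stopped. A minimal sketch of that idiom follows, using the same assumed most-significant-group-first layout; Uint7ArraySketch is a hypothetical name, not htsjdk code.

```java
/** Hypothetical sketch of the int[] posHolder cursor idiom for uint7 on byte arrays. */
final class Uint7ArraySketch {

    static void writeUint7(int value, byte[] buf, int[] posHolder) {
        int shift = 0;
        for (int rest = value >>> 7; rest != 0; rest >>>= 7) {
            shift += 7;
        }
        int pos = posHolder[0];
        for (; shift > 0; shift -= 7) {
            buf[pos++] = (byte) (((value >>> shift) & 0x7F) | 0x80);
        }
        buf[pos++] = (byte) (value & 0x7F);
        posHolder[0] = pos; // publish the advanced cursor to the caller
    }

    static int readUint7(byte[] buf, int[] posHolder) {
        int pos = posHolder[0];
        int value = 0;
        int b;
        do {
            b = buf[pos++] & 0xFF;
            value = (value << 7) | (b & 0x7F);
        } while ((b & 0x80) != 0);
        posHolder[0] = pos; // caller sees where the next value starts
        return value;
    }

    public static void main(String[] args) {
        byte[] buf = new byte[16];
        int[] pos = {0};
        writeUint7(5, buf, pos);   // 1 byte
        writeUint7(300, buf, pos); // 2 bytes
        System.out.println(pos[0]); // 3
        int[] readPos = {0};
        System.out.println(readUint7(buf, readPos) + " " + readUint7(buf, readPos)); // 5 300
    }
}
```

      An int[] is used because Java passes primitives by value; a one-element array is a lightweight way to return both a decoded value and an updated offset.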
    • encodePack

      public static ByteBuffer encodePack(ByteBuffer inBuffer, ByteBuffer outBuffer, int[] frequencyTable, int[] packMappingTable, int numSymbols)
      Pack input symbols into fewer bits per value, with the bit width determined by the number of distinct symbols. Writes the pack header (symbol count, mapping table, packed length) to outBuffer and returns the packed data as a separate buffer.
      Parameters:
      inBuffer - the input data to pack
      outBuffer - the output buffer for the pack header (symbol count, mapping table, packed length)
      frequencyTable - frequency counts for each byte value (0-255)
      packMappingTable - mapping from original symbol to packed value
      numSymbols - the number of distinct symbols in the input
      Returns:
      a ByteBuffer containing the packed data
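      The packing step can be sketched as follows, without the header. This assumes the CRAM 3.1 PACK rules as we understand them: 0 bits for a single symbol, 1 bit for 2 symbols, 2 bits for up to 4, 4 bits for up to 16, and no packing above 16; packed codes assigned to present byte values in increasing order; and each output byte filled starting from its low-order bits. PackSketch, bitsPerSymbol, buildMapping, and pack are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of the bit-packing step only (no header); not the htsjdk source. */
final class PackSketch {

    /** Bits per packed value under the assumed thresholds; -1 means "do not pack". */
    static int bitsPerSymbol(int numSymbols) {
        if (numSymbols <= 1) return 0;
        if (numSymbols <= 2) return 1;
        if (numSymbols <= 4) return 2;
        if (numSymbols <= 16) return 4;
        return -1;
    }

    /** Assigns codes 0, 1, 2, ... to byte values with non-zero frequency, in byte order. */
    static int[] buildMapping(int[] frequencyTable) {
        int[] mapping = new int[256];
        int next = 0;
        for (int sym = 0; sym < 256; sym++) {
            if (frequencyTable[sym] > 0) mapping[sym] = next++;
        }
        return mapping;
    }

    /** Packs already-mapped codes, filling each output byte from its low-order bits up. */
    static byte[] pack(byte[] mapped, int numSymbols) {
        int bits = bitsPerSymbol(numSymbols);
        if (bits < 0) throw new IllegalArgumentException("too many symbols to pack");
        if (bits == 0) return new byte[0]; // a single symbol needs no packed data
        int perByte = 8 / bits;
        byte[] out = new byte[(mapped.length + perByte - 1) / perByte];
        for (int i = 0; i < mapped.length; i++) {
            int shift = (i % perByte) * bits;
            out[i / perByte] |= (byte) ((mapped[i] & 0xFF) << shift);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] input = "ACGTAC".getBytes(StandardCharsets.US_ASCII);
        int[] freq = new int[256];
        for (byte b : input) freq[b & 0xFF]++;
        int[] mapping = buildMapping(freq); // A->0, C->1, G->2, T->3
        byte[] mapped = new byte[input.length];
        for (int i = 0; i < input.length; i++) mapped[i] = (byte) mapping[input[i] & 0xFF];
        byte[] packed = pack(mapped, 4);    // 4 symbols -> 2 bits, 4 values per byte
        System.out.printf("%d bytes: %02X %02X%n", packed.length, packed[0] & 0xFF, packed[1] & 0xFF);
        // 2 bytes: E4 04
    }
}
```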
    • decodePack

      public static ByteBuffer decodePack(ByteBuffer inBuffer, byte[] packMappingTable, int numSymbols, int uncompressedPackOutputLength)
      Unpack bit-packed data back to one byte per symbol, reversing the transformation performed by encodePack(ByteBuffer, ByteBuffer, int[], int[], int).
      Parameters:
      inBuffer - the packed input data
      packMappingTable - mapping from packed value back to original symbol
      numSymbols - the number of distinct symbols (determines bits per value)
      uncompressedPackOutputLength - the expected number of output bytes
      Returns:
      a ByteBuffer containing the unpacked data
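      Unpacking reverses those steps: read each bits-wide field and look it up in the mapping table. The sketch below mirrors the low-order-bits-first layout assumed in the packing sketch and works on byte[] rather than ByteBuffer for brevity; UnpackSketch and unpack are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical sketch of unpacking under the assumed low-bits-first layout. */
final class UnpackSketch {

    static byte[] unpack(byte[] packed, byte[] packMappingTable, int numSymbols, int outputLength) {
        byte[] out = new byte[outputLength];
        if (numSymbols <= 1) {
            // Constant data: every output byte is the single mapped symbol.
            Arrays.fill(out, packMappingTable[0]);
            return out;
        }
        if (numSymbols > 16) throw new IllegalArgumentException("data was not packed");
        int bits = numSymbols <= 2 ? 1 : numSymbols <= 4 ? 2 : 4;
        int perByte = 8 / bits;
        int mask = (1 << bits) - 1;
        for (int i = 0; i < outputLength; i++) {
            // Select the i-th field: which byte, then which bit offset inside it.
            int code = ((packed[i / perByte] & 0xFF) >>> ((i % perByte) * bits)) & mask;
            out[i] = packMappingTable[code];
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] table = {'A', 'C', 'G', 'T'};
        byte[] packed = {(byte) 0xE4, 0x04};
        System.out.println(new String(unpack(packed, table, 4, 6), StandardCharsets.US_ASCII)); // ACGTAC
    }
}
```

      The expected output length must be supplied because the final packed byte may contain unused padding fields.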
    • allocateOutputBuffer

      public static ByteBuffer allocateOutputBuffer(int inSize)
      Allocate an output buffer large enough to hold compressed rANS data, including worst-case frequency table overhead and header bytes.
      Parameters:
      inSize - the uncompressed input size
      Returns:
      a little-endian ByteBuffer sized for the worst-case compressed output
    • allocateByteBuffer

      public static ByteBuffer allocateByteBuffer(int bufferSize)
      Allocate a new little-endian ByteBuffer of the specified size.
      Parameters:
      bufferSize - the capacity of the buffer
      Returns:
      a new little-endian ByteBuffer
    • wrap

      public static ByteBuffer wrap(byte[] inputBytes)
      Wrap a byte array in a little-endian ByteBuffer.
      Parameters:
      inputBytes - the byte array to wrap
      Returns:
      a little-endian ByteBuffer backed by the input array
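      Little-endian order changes how multi-byte reads interpret the same bytes, while wrapping shares the array rather than copying it. The sketch below shows both with standard java.nio calls; wrapLittleEndian is a hypothetical stand-in, not the htsjdk method.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Sketch of a little-endian wrap using only standard java.nio calls. */
final class LittleEndianWrapSketch {

    /** Wrap without copying, then switch the buffer to little-endian order. */
    static ByteBuffer wrapLittleEndian(byte[] bytes) {
        return ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
    }

    public static void main(String[] args) {
        byte[] bytes = {0x01, 0x02, 0x03, 0x04};
        System.out.println(Integer.toHexString(ByteBuffer.wrap(bytes).getInt()));  // 1020304 (big-endian default)
        System.out.println(Integer.toHexString(wrapLittleEndian(bytes).getInt())); // 4030201
        // The buffer shares the array: writes through the buffer are visible in 'bytes'.
        wrapLittleEndian(bytes).put(0, (byte) 0x7F);
        System.out.println(bytes[0]); // 127
    }
}
```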
    • slice

      public static ByteBuffer slice(ByteBuffer inputBuffer)
      Create a little-endian slice of the given ByteBuffer (from position to limit).
      Parameters:
      inputBuffer - the buffer to slice
      Returns:
      a new little-endian ByteBuffer sharing the input's content
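      A dedicated helper is useful here because the standard ByteBuffer.slice() always returns a BIG_ENDIAN buffer, even when the source buffer is little-endian. The sketch below demonstrates that pitfall and the fix; sliceLittleEndian is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Sketch showing why a slice must have its byte order re-applied. */
final class LittleEndianSliceSketch {

    /** ByteBuffer.slice() yields BIG_ENDIAN, so set LITTLE_ENDIAN on the result. */
    static ByteBuffer sliceLittleEndian(ByteBuffer in) {
        return in.slice().order(ByteOrder.LITTLE_ENDIAN);
    }

    public static void main(String[] args) {
        ByteBuffer src = ByteBuffer.wrap(new byte[] {9, 9, 0x01, 0x00, 0x00, 0x00})
                .order(ByteOrder.LITTLE_ENDIAN);
        src.position(2); // the slice starts at the current position
        System.out.println(src.slice().order()); // BIG_ENDIAN
        ByteBuffer s = sliceLittleEndian(src);
        System.out.println(s.order() + " " + s.getInt()); // LITTLE_ENDIAN 1
    }
}
```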
    • buildStripeUncompressedSizes

      public static int[] buildStripeUncompressedSizes(int totalSize)
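      Compute the uncompressed size of each stripe stream. Each stream receives totalSize / 4 bytes, and the first totalSize % 4 streams receive one extra byte, so earlier streams absorb the remainder when totalSize is not evenly divisible by the number of streams.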
      Parameters:
      totalSize - the total uncompressed size
      Returns:
      array of per-stream sizes
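      The per-stream sizes follow directly from round-robin distribution: stream i receives one byte per full round of 4, plus one more if i is less than the remainder. A minimal sketch, with the hypothetical name stripeSizes and the stream count fixed at 4 as documented for getStripeNumStreams():

```java
import java.util.Arrays;

/** Hypothetical sketch of per-stream sizes for a 4-way round-robin stripe. */
final class StripeSizesSketch {
    static final int NUM_STREAMS = 4; // matches the documented STRIPE stream count

    static int[] stripeSizes(int totalSize) {
        int[] sizes = new int[NUM_STREAMS];
        for (int i = 0; i < NUM_STREAMS; i++) {
            // Base share plus one extra byte for the first (totalSize % 4) streams.
            sizes[i] = totalSize / NUM_STREAMS + (i < totalSize % NUM_STREAMS ? 1 : 0);
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(stripeSizes(10))); // [3, 3, 2, 2]
        System.out.println(Arrays.toString(stripeSizes(3)));  // [1, 1, 1, 0]
    }
}
```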
    • stripeTranspose

      public static ByteBuffer[] stripeTranspose(ByteBuffer inBuffer, int[] sizes)
      Transpose (de-interleave) input data into N=4 separate streams using round-robin byte distribution. Stream i gets bytes at positions i, i+4, i+8, ...
      Parameters:
      inBuffer - the input data (position to limit)
      sizes - per-stream uncompressed sizes from buildStripeUncompressedSizes(int)
      Returns:
      array of ByteBuffers, one per stream
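      The de-interleave can be sketched as sending input byte k to stream k % 4. This assumes the caller supplies sizes consistent with round-robin distribution (otherwise put() overflows) and that each returned stream should be flipped for reading; both are choices of this sketch, and StripeTransposeSketch is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of round-robin de-interleaving into sizes.length streams. */
final class StripeTransposeSketch {

    static ByteBuffer[] transpose(ByteBuffer in, int[] sizes) {
        int n = sizes.length;
        ByteBuffer[] streams = new ByteBuffer[n];
        for (int s = 0; s < n; s++) {
            streams[s] = ByteBuffer.allocate(sizes[s]);
        }
        // Byte k (counted from the input's position) goes to stream k % n.
        for (int k = 0; in.hasRemaining(); k++) {
            streams[k % n].put(in.get());
        }
        for (ByteBuffer stream : streams) {
            stream.flip(); // make each stream readable from index 0
        }
        return streams;
    }

    public static void main(String[] args) {
        ByteBuffer in = ByteBuffer.wrap("ABCDEFGHIJ".getBytes(StandardCharsets.US_ASCII));
        for (ByteBuffer s : transpose(in, new int[] {3, 3, 2, 2})) {
            System.out.println(StandardCharsets.US_ASCII.decode(s)); // AEI, BFJ, CG, DH on separate lines
        }
    }
}
```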
    • getStripeNumStreams

      public static int getStripeNumStreams()
      Returns:
      the number of streams used by the STRIPE codec (always 4)
    • toByteArray

      public static byte[] toByteArray(ByteBuffer buffer)
      Return a byte array containing the buffer's bytes from index 0 up to its limit (the current position is ignored). If the buffer has a backing array whose length equals its limit, that backing array is returned directly without copying, so the result may share storage with the buffer. Otherwise the bytes are copied into a new array.
      Parameters:
      buffer - the source ByteBuffer
      Returns:
      a byte array containing the buffer's data
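      The copy-avoidance rule can be sketched with standard ByteBuffer calls. The extra arrayOffset() == 0 condition is an assumption added in this sketch so that the zero-copy path is only taken when index 0 of the buffer is index 0 of the array; ToByteArraySketch is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical sketch of a zero-copy-when-possible ByteBuffer to byte[] conversion. */
final class ToByteArraySketch {

    static byte[] toByteArray(ByteBuffer buffer) {
        // Zero-copy path: the backing array holds exactly the bytes [0, limit).
        if (buffer.hasArray() && buffer.arrayOffset() == 0 && buffer.array().length == buffer.limit()) {
            return buffer.array();
        }
        // Otherwise copy bytes 0..limit with absolute gets, leaving the position untouched.
        byte[] copy = new byte[buffer.limit()];
        for (int i = 0; i < copy.length; i++) {
            copy[i] = buffer.get(i);
        }
        return copy;
    }

    public static void main(String[] args) {
        byte[] backing = {1, 2, 3};
        System.out.println(toByteArray(ByteBuffer.wrap(backing)) == backing); // true: same array, no copy
        ByteBuffer larger = ByteBuffer.allocate(8);
        larger.put(new byte[] {4, 5});
        larger.flip(); // limit = 2, smaller than the backing array's length of 8
        System.out.println(Arrays.toString(toByteArray(larger))); // [4, 5]
    }
}
```

      Because the zero-copy path returns the live backing array, a caller that needs an independent copy should clone the result.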