Class CRAMEncodingStrategy

java.lang.Object
htsjdk.samtools.cram.structure.CRAMEncodingStrategy

public class CRAMEncodingStrategy extends Object
Parameters that control the encoding strategy used when writing CRAM. Includes the CRAM version, compression level, container/slice sizing, and per-DataSeries compressor assignments.

The default constructor applies the CRAMCompressionProfile.NORMAL profile. Use CRAMCompressionProfile.toStrategy() or CRAMCompressionProfile.applyTo(CRAMEncodingStrategy) to configure a specific profile.

  • Field Details

    • DEFAULT_MINIMUM_SINGLE_REFERENCE_SLICE_THRESHOLD

      public static final int DEFAULT_MINIMUM_SINGLE_REFERENCE_SLICE_THRESHOLD
    • DEFAULT_READS_PER_SLICE

      public static final int DEFAULT_READS_PER_SLICE
    • DEFAULT_BASES_PER_READ

      public static final int DEFAULT_BASES_PER_READ
      Default ratio of bases-per-slice to reads-per-slice. Matches htslib's default rule bases_per_slice = seqs_per_slice * 500. A slice is flushed when either the record-count threshold or the accumulated base-count threshold is reached, which prevents individual slices from growing pathologically large when input reads are long (PacBio HiFi, ONT).
  • Constructor Details

  • Method Details

    • getCramVersion

      public CRAMVersion getCramVersion()
      Returns:
      the CRAM version to write
    • setCramVersion

      public CRAMEncodingStrategy setCramVersion(CRAMVersion cramVersion)
      Set the CRAM version to write.
      Parameters:
      cramVersion - the CRAM version (e.g., CramVersions.CRAM_v3 or CramVersions.CRAM_v3_1)
      Returns:
      this strategy for chaining
    • setReadsPerSlice

      public CRAMEncodingStrategy setReadsPerSlice(int readsPerSlice)
      Set the number of reads per slice. In some cases, a container containing fewer slices than requested will be produced in order to honor the specification rule that all slices in a container must have the same ReferenceContextType. Note: this value must be >= getMinimumSingleReferenceSliceSize().
      Parameters:
      readsPerSlice - number of reads written per slice
      Returns:
      updated CRAMEncodingStrategy
    • setMinimumSingleReferenceSliceSize

      public CRAMEncodingStrategy setMinimumSingleReferenceSliceSize(int minimumSingleReferenceSliceSize)
      Set the minimum number of reads that must have been seen before a single-reference slice is emitted. If fewer than this number have been seen and more reads arrive from a different reference context, the writer prefers to switch to, and subsequently emit, a multiple-reference slice rather than a small single-reference slice containing fewer than this number of records. This value must be <= getReadsPerSlice().
      Parameters:
      minimumSingleReferenceSliceSize - the minimum slice size
      Returns:
      this strategy for chaining
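The decision described for setMinimumSingleReferenceSliceSize can be modeled roughly as follows. This is a minimal sketch, not htsjdk's implementation: the class SliceContextSketch, its Decision enum, and the threshold values in main are hypothetical, and the sketch ignores the case where getReadsPerSlice() is reached first.

```java
/** Illustrative model only; names here are hypothetical and not part of htsjdk. */
final class SliceContextSketch {
    enum Decision { CONTINUE_SINGLE_REF, EMIT_SINGLE_REF, SWITCH_TO_MULTI_REF }

    /** Mirrors the documented constraint: the minimum must be <= reads per slice. */
    static void validate(int readsPerSlice, int minimumSingleReferenceSliceSize) {
        if (minimumSingleReferenceSliceSize > readsPerSlice) {
            throw new IllegalArgumentException(
                    "minimum single-reference slice size must be <= reads per slice");
        }
    }

    /**
     * Decide what to do when the next read arrives.
     *
     * @param readsInSlice reads accumulated so far, all from one reference
     * @param minimumSingleReferenceSliceSize the threshold described above
     * @param nextReadSameReference whether the next read has the same reference context
     */
    static Decision decide(int readsInSlice,
                           int minimumSingleReferenceSliceSize,
                           boolean nextReadSameReference) {
        if (nextReadSameReference) {
            return Decision.CONTINUE_SINGLE_REF;
        }
        // Enough reads seen: emit the single-reference slice.
        // Too few: prefer one multi-reference slice over a tiny single-reference one.
        return readsInSlice >= minimumSingleReferenceSliceSize
                ? Decision.EMIT_SINGLE_REF
                : Decision.SWITCH_TO_MULTI_REF;
    }

    public static void main(String[] args) {
        validate(10_000, 1_000);                         // hypothetical settings
        System.out.println(decide(50, 1_000, false));    // SWITCH_TO_MULTI_REF
        System.out.println(decide(1_000, 1_000, false)); // EMIT_SINGLE_REF
    }
}
```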
    • getMinimumSingleReferenceSliceSize

      public int getMinimumSingleReferenceSliceSize()
    • setBasesPerSlice

      public CRAMEncodingStrategy setBasesPerSlice(long basesPerSlice)
      Set the maximum number of accumulated bases per slice. When the accumulated base count in a slice reaches this threshold, the slice is flushed even if getReadsPerSlice() has not been reached. This prevents individual slices from growing pathologically large for long-read data (PacBio HiFi, ONT).

      Setting a value of 0 reverts to the default (getReadsPerSlice() * DEFAULT_BASES_PER_READ), matching htslib's rule.

      Parameters:
      basesPerSlice - maximum bases per slice, or 0 to use the default
      Returns:
      this strategy for chaining
    • getBasesPerSlice

      public long getBasesPerSlice()
      Returns:
      the bases-per-slice threshold. If no explicit value was set, returns getReadsPerSlice() * DEFAULT_BASES_PER_READ (matching htslib).
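The default and the flush rule for bases per slice can be worked through with a small model. This is a sketch under stated assumptions: the class BasesPerSliceSketch and its methods are hypothetical, the factor 500 comes from the htslib rule quoted for DEFAULT_BASES_PER_READ, and the reads-per-slice value of 10,000 in main is chosen only for the arithmetic.

```java
/** Illustrative model only; names here are hypothetical and not part of htsjdk. */
final class BasesPerSliceSketch {
    /** Ratio taken from the rule bases_per_slice = seqs_per_slice * 500. */
    static final int BASES_PER_READ = 500;

    /** An explicit value of 0 means "use readsPerSlice * BASES_PER_READ". */
    static long effectiveBasesPerSlice(long explicitBasesPerSlice, int readsPerSlice) {
        return explicitBasesPerSlice == 0
                ? (long) readsPerSlice * BASES_PER_READ
                : explicitBasesPerSlice;
    }

    /** A slice is flushed when either threshold is reached. */
    static boolean shouldFlush(int readsInSlice, long basesInSlice,
                               int readsPerSlice, long basesPerSlice) {
        return readsInSlice >= readsPerSlice || basesInSlice >= basesPerSlice;
    }

    public static void main(String[] args) {
        int readsPerSlice = 10_000;                             // hypothetical setting
        long limit = effectiveBasesPerSlice(0, readsPerSlice);  // 5,000,000 bases
        // 400 long reads of 15,000 bases each reach the base limit
        // long before the read-count limit.
        long bases = 400L * 15_000;
        System.out.println(limit);
        System.out.println(shouldFlush(400, bases, readsPerSlice, limit));
    }
}
```

With short reads the read-count limit is normally reached first. With long reads the base limit takes over, which is the case the setter is documented to address.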
    • setGZIPCompressionLevel

      public CRAMEncodingStrategy setGZIPCompressionLevel(int compressionLevel)
      Set the GZIP compression level used for data series compressed with GZIP.
      Parameters:
      compressionLevel - GZIP compression level (0-10)
      Returns:
      this strategy for chaining
    • setSlicesPerContainer

      public CRAMEncodingStrategy setSlicesPerContainer(int slicesPerContainer)
      Set the number of slices per container. If greater than 1, multiple slices will be placed in the same container when they share the same reference context (container records mapped to the same contig). MULTI-REF slices are always emitted as a single container to avoid conferring MULTI-REF on the next slice, which might otherwise be single-ref (the spec requires a MULTI_REF container to contain only multi-ref slices).
      Parameters:
      slicesPerContainer - requested number of slices per container
      Returns:
      this strategy for chaining
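The grouping rule for slices per container can be illustrated with a small packing model. This is a sketch, not htsjdk's container factory: the class ContainerPackingSketch is hypothetical, each slice is reduced to an integer reference context, and the MULTI_REF sentinel value is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative model only; names here are hypothetical and not part of htsjdk. */
final class ContainerPackingSketch {
    /** Sentinel standing in for a multi-reference slice (assumed for this sketch). */
    static final int MULTI_REF = -2;

    /** Groups slice reference contexts into containers according to the rule above. */
    static List<List<Integer>> pack(List<Integer> sliceContexts, int slicesPerContainer) {
        List<List<Integer>> containers = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        for (int ctx : sliceContexts) {
            // A slice joins the open container only if it is single-ref,
            // matches the container's reference context, and there is room.
            boolean fits = !current.isEmpty()
                    && ctx != MULTI_REF
                    && current.get(0) == ctx
                    && current.size() < slicesPerContainer;
            if (!fits && !current.isEmpty()) {
                containers.add(current);
                current = new ArrayList<>();
            }
            current.add(ctx);
            // A multi-ref slice closes its container immediately.
            if (ctx == MULTI_REF) {
                containers.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            containers.add(current);
        }
        return containers;
    }

    public static void main(String[] args) {
        List<Integer> slices = Arrays.asList(0, 0, 0, 1, MULTI_REF, MULTI_REF, 1);
        System.out.println(pack(slices, 2)); // [[0, 0], [0], [1], [-2], [-2], [1]]
    }
}
```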
    • setCompressorMap

      public CRAMEncodingStrategy setCompressorMap(EnumMap<DataSeries, CompressorDescriptor> compressorMap)
      Set the per-DataSeries compressor map. Each entry maps a DataSeries to the CompressorDescriptor that should be used to compress its block.
      Parameters:
      compressorMap - the compressor map (defensively copied)
      Returns:
      this strategy for chaining
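The "defensively copied" note on compressorMap means that later changes to the caller's map should not affect the strategy. A minimal sketch of that pattern follows. The Series and Codec enums are hypothetical stand-ins for DataSeries and CompressorDescriptor, and the class DefensiveCopySketch is not htsjdk code.

```java
import java.util.EnumMap;

/** Illustrative model only; names here are hypothetical and not part of htsjdk. */
final class DefensiveCopySketch {
    enum Series { BASES, QUALITY_SCORES, READ_NAMES }  // stand-in for DataSeries
    enum Codec { GZIP, RANS, NAME_TOKENISER }          // stand-in for CompressorDescriptor

    private EnumMap<Series, Codec> compressorMap;

    /** Stores a copy so the caller's map can change without affecting this object. */
    DefensiveCopySketch setCompressorMap(EnumMap<Series, Codec> map) {
        this.compressorMap = new EnumMap<>(map);
        return this; // returned for chaining, as in the documented setters
    }

    Codec codecFor(Series series) {
        return compressorMap.get(series);
    }

    public static void main(String[] args) {
        EnumMap<Series, Codec> map = new EnumMap<>(Series.class);
        map.put(Series.BASES, Codec.RANS);
        DefensiveCopySketch sketch = new DefensiveCopySketch().setCompressorMap(map);
        map.put(Series.BASES, Codec.GZIP);                 // mutate the caller's map
        System.out.println(sketch.codecFor(Series.BASES)); // RANS
    }
}
```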
    • getCompressorMap

      public EnumMap<DataSeries, CompressorDescriptor> getCompressorMap()
      Returns:
      the per-DataSeries compressor map, or null if not set
    • setTrialCandidatesMap

      public CRAMEncodingStrategy setTrialCandidatesMap(EnumMap<DataSeries, List<CompressorDescriptor>> trialCandidatesMap)
      Set additional trial compression candidates per DataSeries. For data series with entries in this map, a TrialCompressor will be created that tries the primary compressor plus all listed candidates, selecting the smallest output.
      Parameters:
      trialCandidatesMap - map of data series to additional candidate descriptors
      Returns:
      this strategy for chaining
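The idea behind trial compression (try the primary compressor plus each candidate and keep the smallest output) can be sketched with the JDK's Deflater. This is an illustration of the selection step only, not the TrialCompressor implementation: the class TrialCompressionSketch, the RAW and DEFLATE_BEST candidates, and the smallest method are hypothetical, and the sketch does not record which method won, which a real encoder would need for decoding.

```java
import java.io.ByteArrayOutputStream;
import java.util.Collections;
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.zip.Deflater;

/** Illustrative model only; names here are hypothetical and not part of htsjdk. */
final class TrialCompressionSketch {
    /** Candidate that leaves the data uncompressed. */
    static final UnaryOperator<byte[]> RAW = bytes -> bytes.clone();
    /** Candidate that applies zlib DEFLATE at the highest JDK level. */
    static final UnaryOperator<byte[]> DEFLATE_BEST =
            bytes -> deflate(bytes, Deflater.BEST_COMPRESSION);

    static byte[] deflate(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Runs the primary and every candidate, returning the smallest output. */
    static byte[] smallest(byte[] input,
                           UnaryOperator<byte[]> primary,
                           List<UnaryOperator<byte[]>> candidates) {
        byte[] best = primary.apply(input);
        for (UnaryOperator<byte[]> candidate : candidates) {
            byte[] trial = candidate.apply(input);
            if (trial.length < best.length) {
                best = trial;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        byte[] repetitive = new byte[10_000]; // all zeros, highly compressible
        byte[] tiny = {1, 2, 3};              // too small to benefit from DEFLATE
        List<UnaryOperator<byte[]>> candidates = Collections.singletonList(DEFLATE_BEST);
        System.out.println(smallest(repetitive, RAW, candidates).length < repetitive.length);
        System.out.println(smallest(tiny, RAW, candidates).length);
    }
}
```

The trade-off is write-time cost: every extra candidate compresses the same block again, so candidates are worth listing only for data series where the best method varies with the input.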
    • getTrialCandidatesMap

      public EnumMap<DataSeries, List<CompressorDescriptor>> getTrialCandidatesMap()
      Returns:
      the trial candidates map, or null if trial compression is not configured
    • setCustomCompressionHeaderEncodingMap

      public void setCustomCompressionHeaderEncodingMap(CompressionHeaderEncodingMap encodingMap)
      Set a pre-built CompressionHeaderEncodingMap that bypasses the compressor map. This is an advanced override intended for tests that need low-level control over encoding descriptors. When set, CompressionHeaderFactory will use this map directly instead of building one from the compressor map.
      Parameters:
      encodingMap - the encoding map to use, or null to use the compressor map
    • getCustomCompressionHeaderEncodingMap

      public CompressionHeaderEncodingMap getCustomCompressionHeaderEncodingMap()
      Returns:
      the custom encoding map, or null if the compressor map should be used
    • setStoreNM

      public CRAMEncodingStrategy setStoreNM(boolean storeNM)
      Set whether to store the NM:i tag verbatim. When false (default), NM is stripped during encoding for mapped reads and regenerated from features + reference during decoding. Matches htslib's CRAM_OPT_STORE_NM option.
      Parameters:
      storeNM - true to store NM verbatim, false to strip and regenerate
      Returns:
      this strategy for chaining
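To show why NM can be dropped and rebuilt, the sketch below recomputes it from read bases, reference bases, and alignment operations. It assumes the SAM definition of NM (mismatches plus inserted and deleted bases) and supports only M, I, and D operations. The class NmSketch and its method are hypothetical and do not reflect how htsjdk derives the value from CRAM read features.

```java
/** Illustrative model only; names here are hypothetical and not part of htsjdk. */
final class NmSketch {
    /**
     * Recomputes NM for one aligned read.
     *
     * @param read read bases
     * @param reference reference bases starting at the alignment start
     * @param ops alignment operators, each one of 'M', 'I', 'D'
     * @param lengths length of each operator, parallel to ops
     */
    static int editDistance(String read, String reference, char[] ops, int[] lengths) {
        int nm = 0;
        int readPos = 0;
        int refPos = 0;
        for (int i = 0; i < ops.length; i++) {
            int len = lengths[i];
            switch (ops[i]) {
                case 'M': // aligned bases: count mismatches only
                    for (int k = 0; k < len; k++) {
                        if (read.charAt(readPos + k) != reference.charAt(refPos + k)) {
                            nm++;
                        }
                    }
                    readPos += len;
                    refPos += len;
                    break;
                case 'I': // inserted bases consume the read only
                    nm += len;
                    readPos += len;
                    break;
                case 'D': // deleted bases consume the reference only
                    nm += len;
                    refPos += len;
                    break;
                default:
                    throw new IllegalArgumentException("unsupported operator: " + ops[i]);
            }
        }
        return nm;
    }

    public static void main(String[] args) {
        // 4M1D3M: one mismatch in the first block plus one deleted base.
        int nm = editDistance("ACGACGT", "ACGTACGT",
                new char[] {'M', 'D', 'M'}, new int[] {4, 1, 3});
        System.out.println(nm); // 2
    }
}
```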
    • getStoreNM

      public boolean getStoreNM()
      Returns:
      whether NM:i tags are stored verbatim (false = stripped and regenerated)
    • setStoreMD

      public CRAMEncodingStrategy setStoreMD(boolean storeMD)
      Set whether to store the MD:Z tag verbatim. When false (default), MD is stripped during encoding for mapped reads and regenerated from features + reference during decoding. Matches htslib's CRAM_OPT_STORE_MD option.
      Parameters:
      storeMD - true to store MD verbatim, false to strip and regenerate
      Returns:
      this strategy for chaining
    • getStoreMD

      public boolean getStoreMD()
      Returns:
      whether MD:Z tags are stored verbatim (false = stripped and regenerated)
    • getGZIPCompressionLevel

      public int getGZIPCompressionLevel()
    • getReadsPerSlice

      public int getReadsPerSlice()
    • getSlicesPerContainer

      public int getSlicesPerContainer()
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • equals

      public boolean equals(Object o)
      Overrides:
      equals in class Object
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Object