Class CRAMEncodingStrategy
java.lang.Object
htsjdk.samtools.cram.structure.CRAMEncodingStrategy
Parameters that control the encoding strategy used when writing CRAM. Includes the CRAM version,
compression level, container/slice sizing, and per-DataSeries compressor assignments.
The default constructor applies the CRAMCompressionProfile.NORMAL profile. Use
CRAMCompressionProfile.toStrategy() or CRAMCompressionProfile.applyTo(CRAMEncodingStrategy)
to configure a specific profile.
-
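Field Summary
Fields
Modifier and Type | Field | Description
static final int | DEFAULT_BASES_PER_READ | Default ratio of bases-per-slice to reads-per-slice.
static final int | DEFAULT_MINIMUM_SINGLE_REFERENCE_SLICE_THRESHOLD |
static final int | DEFAULT_READS_PER_SLICE |
-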
Constructor Summary
Constructors
Constructor | Description
CRAMEncodingStrategy() | Create an encoding strategy with the CRAMCompressionProfile.NORMAL profile applied.
-
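Method Summary
Modifier and Type | Method | Description
boolean | equals(Object) |
long | getBasesPerSlice() |
 | getCompressorMap() |
 | getCramVersion() |
 | getCustomCompressionHeaderEncodingMap() |
int | getGZIPCompressionLevel() |
int | getMinimumSingleReferenceSliceSize() |
int | getReadsPerSlice() |
int | getSlicesPerContainer() |
boolean | getStoreMD() |
boolean | getStoreNM() |
 | getTrialCandidatesMap() |
int | hashCode() |
CRAMEncodingStrategy | setBasesPerSlice(long basesPerSlice) | Set the maximum accumulated bases per slice.
CRAMEncodingStrategy | setCompressorMap(EnumMap<DataSeries, CompressorDescriptor> compressorMap) | Set the per-DataSeries compressor map.
CRAMEncodingStrategy | setCramVersion(CRAMVersion cramVersion) | Set the CRAM version to write.
void | setCustomCompressionHeaderEncodingMap(CompressionHeaderEncodingMap encodingMap) | Set a pre-built CompressionHeaderEncodingMap that bypasses the compressor map.
CRAMEncodingStrategy | setGZIPCompressionLevel(int compressionLevel) | Set the GZIP compression level used for data series compressed with GZIP.
CRAMEncodingStrategy | setMinimumSingleReferenceSliceSize(int minimumSingleReferenceSliceSize) | The minimum number of reads we need to have seen to emit a single-reference slice.
CRAMEncodingStrategy | setReadsPerSlice(int readsPerSlice) | Set number of reads per slice.
CRAMEncodingStrategy | setSlicesPerContainer(int slicesPerContainer) | Set the number of slices per container.
CRAMEncodingStrategy | setStoreMD(boolean storeMD) | Set whether to store the MD:Z tag verbatim.
CRAMEncodingStrategy | setStoreNM(boolean storeNM) | Set whether to store the NM:i tag verbatim.
CRAMEncodingStrategy | setTrialCandidatesMap(EnumMap<DataSeries, List<CompressorDescriptor>> trialCandidatesMap) | Set additional trial compression candidates per DataSeries.
String | toString() |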
-
Field Details
-
DEFAULT_MINIMUM_SINGLE_REFERENCE_SLICE_THRESHOLD
public static final int DEFAULT_MINIMUM_SINGLE_REFERENCE_SLICE_THRESHOLD
-
DEFAULT_READS_PER_SLICE
public static final int DEFAULT_READS_PER_SLICE
-
DEFAULT_BASES_PER_READ
public static final int DEFAULT_BASES_PER_READ
Default ratio of bases-per-slice to reads-per-slice. Matches htslib's default rule bases_per_slice = seqs_per_slice * 500. A slice is flushed when either the record-count limit or the accumulated-base limit is reached, which prevents individual slices from growing pathologically large when input reads are long (PacBio HiFi, ONT).
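The "flush on whichever limit is hit first" rule can be sketched in a few lines. This is an illustrative model only, not htsjdk's implementation: the class SliceFlushSketch and the method shouldFlush are hypothetical names, and the reads-per-slice value of 10,000 is an assumed figure chosen for the example.

```java
/** Hypothetical model of the slice flush rule; not part of htsjdk. */
final class SliceFlushSketch {
    // Mirrors the documented ratio (htslib: bases_per_slice = seqs_per_slice * 500).
    static final int BASES_PER_READ = 500;

    /** Returns true once either the read-count limit or the base-count limit has been reached. */
    static boolean shouldFlush(int readsInSlice, long basesInSlice,
                               int readsPerSlice, long basesPerSlice) {
        return readsInSlice >= readsPerSlice || basesInSlice >= basesPerSlice;
    }

    public static void main(String[] args) {
        int readsPerSlice = 10_000;                                  // assumed value
        long basesPerSlice = (long) readsPerSlice * BASES_PER_READ;  // 5,000,000

        // Short reads (150 bases each): the read-count limit is reached first.
        System.out.println(shouldFlush(10_000, 10_000L * 150, readsPerSlice, basesPerSlice)); // true
        // Long reads (20,000 bases each): 250 reads already reach the base limit.
        System.out.println(shouldFlush(250, 250L * 20_000, readsPerSlice, basesPerSlice));    // true
        // Neither limit reached yet.
        System.out.println(shouldFlush(100, 100L * 150, readsPerSlice, basesPerSlice));       // false
    }
}
```

With long reads the base limit bounds the slice at a few hundred records, which is the behavior the description attributes to this constant.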
-
-
Constructor Details
-
CRAMEncodingStrategy
public CRAMEncodingStrategy()
Create an encoding strategy with the CRAMCompressionProfile.NORMAL profile applied.
-
-
Method Details
-
getCramVersion
- Returns:
- the CRAM version to write
-
setCramVersion
Set the CRAM version to write.
- Parameters:
cramVersion - the CRAM version (e.g., CramVersions.CRAM_v3 or CramVersions.CRAM_v3_1)
- Returns:
- this strategy for chaining
-
setReadsPerSlice
Set the number of reads per slice. In some cases, a container containing fewer slices than the requested value will be produced in order to honor the specification rule that all slices in a container must have the same ReferenceContextType. Note: this value must be >= getMinimumSingleReferenceSliceSize().
- Parameters:
readsPerSlice - number of reads written per slice
- Returns:
- updated CRAMEncodingStrategy
-
setMinimumSingleReferenceSliceSize
The minimum number of reads that must have been seen before a single-reference slice is emitted. If fewer than this number have been seen, and more reads from a different reference context are available, the writer prefers to switch to, and subsequently emit, a multiple-reference slice rather than a small single-reference slice containing fewer than this number of records. This number must be <= the value of getReadsPerSlice().
- Parameters:
minimumSingleReferenceSliceSize - the minimum slice size
- Returns:
- this strategy for chaining
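The threshold decision can be sketched as a small function. This is a deliberately simplified, hypothetical model (the class SingleRefThresholdSketch, the Action enum, and the decide method are invented for illustration); the real writer also has to deal with unmapped reads and container boundaries, which are ignored here.

```java
/** Hypothetical, simplified model of the single-reference threshold; not htsjdk code. */
final class SingleRefThresholdSketch {
    enum Action { KEEP_ACCUMULATING, EMIT_SINGLE_REF_SLICE, SWITCH_TO_MULTI_REF }

    /**
     * Decide what to do before consuming the next read.
     *
     * @param readsSeen             reads accumulated so far for the current reference
     * @param nextChangesReference  true if the next read belongs to a different reference context
     * @param minimumSingleRefSize  the minimum single-reference slice size
     * @param readsPerSlice         the reads-per-slice limit
     */
    static Action decide(int readsSeen, boolean nextChangesReference,
                         int minimumSingleRefSize, int readsPerSlice) {
        // Documented constraint: the minimum must not exceed reads-per-slice.
        if (minimumSingleRefSize > readsPerSlice) {
            throw new IllegalArgumentException("minimum single-reference slice size must be <= reads per slice");
        }
        if (readsSeen >= readsPerSlice) {
            return Action.EMIT_SINGLE_REF_SLICE;      // slice is full
        }
        if (readsSeen == 0 || !nextChangesReference) {
            return Action.KEEP_ACCUMULATING;          // same reference, room left
        }
        // The reference changes: emit a single-ref slice only if it is large enough.
        return readsSeen >= minimumSingleRefSize
                ? Action.EMIT_SINGLE_REF_SLICE
                : Action.SWITCH_TO_MULTI_REF;
    }

    public static void main(String[] args) {
        System.out.println(decide(50, true, 100, 10_000));   // SWITCH_TO_MULTI_REF
        System.out.println(decide(500, true, 100, 10_000));  // EMIT_SINGLE_REF_SLICE
    }
}
```

A larger minimum favors multi-reference slices for inputs with many small contigs; a smaller minimum keeps more slices single-reference at the cost of emitting small slices.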
-
getMinimumSingleReferenceSliceSize
public int getMinimumSingleReferenceSliceSize()
-
setBasesPerSlice
Set the maximum accumulated bases per slice. When the accumulated bases in a slice reach this threshold, the slice is flushed even if getReadsPerSlice() has not been reached. This prevents individual slices from growing pathologically large for long-read data (PacBio HiFi, ONT).
Setting a value of 0 reverts to the default (getReadsPerSlice() * DEFAULT_BASES_PER_READ), matching htslib's rule.
- Parameters:
basesPerSlice - maximum bases per slice, or 0 to use the default
- Returns:
- this strategy for chaining
-
getBasesPerSlice
public long getBasesPerSlice()
- Returns:
- the bases-per-slice threshold. If no explicit value was set, returns getReadsPerSlice() * DEFAULT_BASES_PER_READ (matching htslib).
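The "0 means use the default" convention can be modeled as follows. This is a sketch under stated assumptions rather than the actual class: BasesPerSliceSketch is a hypothetical stand-in, the initial reads-per-slice value of 10,000 is assumed, and the default is computed at read time so that it tracks later changes to reads-per-slice.

```java
/** Hypothetical stand-in showing how a 0 sentinel can resolve to a derived default. */
final class BasesPerSliceSketch {
    static final int DEFAULT_BASES_PER_READ = 500; // ratio stated in the field documentation

    private int readsPerSlice = 10_000; // assumed initial value
    private long basesPerSlice = 0;     // 0 = no explicit value set

    BasesPerSliceSketch setReadsPerSlice(int readsPerSlice) {
        this.readsPerSlice = readsPerSlice;
        return this; // fluent chaining, as the setters on this page describe
    }

    BasesPerSliceSketch setBasesPerSlice(long basesPerSlice) {
        this.basesPerSlice = basesPerSlice;
        return this;
    }

    /** Explicit value if one was set, otherwise readsPerSlice * DEFAULT_BASES_PER_READ. */
    long getBasesPerSlice() {
        return basesPerSlice > 0
                ? basesPerSlice
                : (long) readsPerSlice * DEFAULT_BASES_PER_READ;
    }

    public static void main(String[] args) {
        BasesPerSliceSketch s = new BasesPerSliceSketch().setReadsPerSlice(2_000);
        System.out.println(s.getBasesPerSlice());                              // 1000000
        System.out.println(s.setBasesPerSlice(250_000).getBasesPerSlice());    // 250000
        System.out.println(s.setBasesPerSlice(0).getBasesPerSlice());          // 1000000
    }
}
```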
-
setGZIPCompressionLevel
Set the GZIP compression level used for data series compressed with GZIP.
- Parameters:
compressionLevel - GZIP compression level (0-10)
- Returns:
- this strategy for chaining
-
setSlicesPerContainer
Set the number of slices per container. If > 1, multiple slices will be placed in the same container if the slices share the same reference context (container records mapped to the same contig). A MULTI-REF slice is always emitted in a container of its own, to avoid conferring MULTI-REF on the next slice, which might otherwise be single-ref (the spec requires a MULTI_REF container to contain only multi-ref slices).
- Parameters:
slicesPerContainer - requested number of slices per container
- Returns:
- this strategy for chaining
-
setCompressorMap
public CRAMEncodingStrategy setCompressorMap(EnumMap<DataSeries, CompressorDescriptor> compressorMap)
Set the per-DataSeries compressor map. Each entry maps a DataSeries to the CompressorDescriptor that should be used to compress its block.
- Parameters:
compressorMap - the compressor map (defensively copied)
- Returns:
- this strategy for chaining
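What "defensively copied" means in practice is shown below with stand-in types. The enum Series and the String descriptors are hypothetical placeholders for DataSeries and CompressorDescriptor, and accepting null in the setter is an assumption made for this sketch.

```java
import java.util.EnumMap;

/** Hypothetical illustration of a defensive copy in a fluent setter; not htsjdk code. */
final class DefensiveCopySketch {
    /** Stand-in for DataSeries. */
    enum Series { BASES, QUALITY_SCORES, READ_NAMES }

    private EnumMap<Series, String> compressorMap; // null until set

    DefensiveCopySketch setCompressorMap(EnumMap<Series, String> compressorMap) {
        // Copy so that later changes to the caller's map do not affect this object.
        this.compressorMap = compressorMap == null ? null : new EnumMap<>(compressorMap);
        return this;
    }

    EnumMap<Series, String> getCompressorMap() {
        return compressorMap;
    }

    public static void main(String[] args) {
        EnumMap<Series, String> callerMap = new EnumMap<>(Series.class);
        callerMap.put(Series.BASES, "rans");

        DefensiveCopySketch strategy = new DefensiveCopySketch().setCompressorMap(callerMap);

        // Mutating the caller's map after the call leaves the stored copy unchanged.
        callerMap.put(Series.BASES, "gzip");
        System.out.println(strategy.getCompressorMap().get(Series.BASES)); // rans
    }
}
```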
-
getCompressorMap
- Returns:
- the per-DataSeries compressor map, or null if not set
-
setTrialCandidatesMap
public CRAMEncodingStrategy setTrialCandidatesMap(EnumMap<DataSeries, List<CompressorDescriptor>> trialCandidatesMap)
Set additional trial compression candidates per DataSeries. For data series with entries in this map, a TrialCompressor will be created that tries the primary compressor plus all listed candidates, selecting the smallest output.
- Parameters:
trialCandidatesMap - map of data series to additional candidate descriptors
- Returns:
- this strategy for chaining
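The "try every candidate and keep the smallest output" idea can be sketched with the JDK's own Deflater. This is not the TrialCompressor implementation: the class TrialCompressionSketch and its methods are hypothetical, the candidates are plain functions rather than CompressorDescriptor instances, and a real implementation would also have to record which method produced the chosen output so the block can be decoded.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.zip.Deflater;

/** Hypothetical sketch of trial compression: run all candidates, keep the smallest result. */
final class TrialCompressionSketch {

    /** Compress the input with java.util.zip.Deflater at the given level. */
    static byte[] deflate(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native resources
        }
    }

    /** Apply the primary compressor and every candidate; ties keep the earlier result. */
    static byte[] smallest(byte[] input, UnaryOperator<byte[]> primary,
                           List<UnaryOperator<byte[]>> candidates) {
        byte[] best = primary.apply(input);
        for (UnaryOperator<byte[]> candidate : candidates) {
            byte[] output = candidate.apply(input);
            if (output.length < best.length) {
                best = output;
            }
        }
        return best;
    }

    /** Demo: primary stores the bytes unchanged; two deflate levels act as trial candidates. */
    static byte[] demo(byte[] input) {
        List<UnaryOperator<byte[]>> candidates = new ArrayList<>();
        candidates.add(in -> deflate(in, Deflater.BEST_SPEED));
        candidates.add(in -> deflate(in, Deflater.BEST_COMPRESSION));
        return smallest(input, in -> in, candidates);
    }

    public static void main(String[] args) {
        byte[] repetitive = new byte[10_000]; // all zeros, highly compressible
        System.out.println(demo(repetitive).length < repetitive.length); // true
    }
}
```

Trial compression trades encoding time for size, since every candidate is run on every block of the configured data series.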
-
getTrialCandidatesMap
- Returns:
- the trial candidates map, or null if trial compression is not configured
-
setCustomCompressionHeaderEncodingMap
Set a pre-built CompressionHeaderEncodingMap that bypasses the compressor map. This is an advanced override intended for tests that need low-level control over encoding descriptors. When set, CompressionHeaderFactory will use this map directly instead of building one from the compressor map.
- Parameters:
encodingMap - the encoding map to use, or null to use the compressor map
-
getCustomCompressionHeaderEncodingMap
- Returns:
- the custom encoding map, or null if the compressor map should be used
-
setStoreNM
Set whether to store the NM:i tag verbatim. When false (default), NM is stripped during encoding for mapped reads and regenerated from features + reference during decoding. Matches htslib's CRAM_OPT_STORE_NM option.
- Parameters:
storeNM - true to store NM verbatim, false to strip and regenerate
- Returns:
- this strategy for chaining
-
getStoreNM
public boolean getStoreNM()
- Returns:
- whether NM:i tags are stored verbatim (false = stripped and regenerated)
-
setStoreMD
Set whether to store the MD:Z tag verbatim. When false (default), MD is stripped during encoding for mapped reads and regenerated from features + reference during decoding. Matches htslib's CRAM_OPT_STORE_MD option.
- Parameters:
storeMD - true to store MD verbatim, false to strip and regenerate
- Returns:
- this strategy for chaining
-
getStoreMD
public boolean getStoreMD()
- Returns:
- whether MD:Z tags are stored verbatim (false = stripped and regenerated)
-
getGZIPCompressionLevel
public int getGZIPCompressionLevel()
-
getReadsPerSlice
public int getReadsPerSlice()
-
getSlicesPerContainer
public int getSlicesPerContainer()
-
toString
-
equals
-
hashCode
-