Class CRAMRecordReadFeatures

java.lang.Object
htsjdk.samtools.cram.structure.CRAMRecordReadFeatures

public class CRAMRecordReadFeatures extends Object
Class for handling the read features for a CRAMCompressionRecord.
  • Constructor Details

    • CRAMRecordReadFeatures

      public CRAMRecordReadFeatures()
      Create a CRAMRecordReadFeatures with no actual read features (i.e. an unmapped record).
    • CRAMRecordReadFeatures

      public CRAMRecordReadFeatures(List<ReadFeature> readFeatures)
      Create a CRAMRecordReadFeatures from a list of read features consumed from a stream.
      Parameters:
      readFeatures - the list of read features consumed from the stream
    • CRAMRecordReadFeatures

      public CRAMRecordReadFeatures(SAMRecord samRecord, byte[] bamReadBases, byte[] refBases)
      Create the read features for a given SAMRecord.
      Parameters:
      samRecord - the SAMRecord for which to create read features
      bamReadBases - a modifiable copy of the read bases from the original SAM/BAM record, with the individual bases mapped to BAM bases (upper case)
      refBases - the reference bases for the entire reference contig to which this record is mapped
  • Method Details

    • getReadFeaturesList

      public final List<ReadFeature> getReadFeaturesList()
      Return the list of read features for this record.
    • getAlignmentEnd

      public int getAlignmentEnd(int alignmentStart, int readLength)
      Compute the alignment end position from the read features, alignment start, and read length.
      Parameters:
      alignmentStart - 1-based alignment start position
      readLength - length of the read in bases
      Returns:
      1-based alignment end position
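The relationship between read length, read features, and alignment end can be illustrated with a minimal sketch. It does not use htsjdk: the `AlignmentEndSketch` class, its `Kind` enum, and its `Feature` type are hypothetical stand-ins for ReadFeature, and the rule shown (insertions and soft clips shorten the reference span, deletions and reference skips lengthen it) is an assumption about how such a computation typically works, not a copy of this method's implementation.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class AlignmentEndSketch {
    /** Hypothetical, simplified feature kinds; not the htsjdk ReadFeature hierarchy. */
    enum Kind { INSERTION, SOFT_CLIP, DELETION, REF_SKIP, SUBSTITUTION }

    /** Hypothetical feature: a kind plus the number of bases it covers. */
    static final class Feature {
        final Kind kind;
        final int length;
        Feature(Kind kind, int length) { this.kind = kind; this.length = length; }
    }

    /** Returns the 1-based, inclusive alignment end under the assumptions above. */
    static int alignmentEnd(int alignmentStart, int readLength, List<Feature> features) {
        int span = readLength; // reference span if every read base matched a reference base
        for (Feature f : features) {
            switch (f.kind) {
                case INSERTION:
                case SOFT_CLIP:
                    span -= f.length; // read bases that consume no reference
                    break;
                case DELETION:
                case REF_SKIP:
                    span += f.length; // reference bases that consume no read
                    break;
                default:
                    break;            // substitutions consume one base of each
            }
        }
        return alignmentStart + span - 1;
    }

    public static void main(String[] args) {
        List<Feature> none = Collections.emptyList();
        List<Feature> mixed = Arrays.asList(
                new Feature(Kind.INSERTION, 2), new Feature(Kind.DELETION, 3));
        System.out.println(alignmentEnd(100, 10, none));  // prints 109
        System.out.println(alignmentEnd(100, 10, mixed)); // prints 110
    }
}
```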
    • getCigarForReadFeatures

      public Cigar getCigarForReadFeatures(int readLength)
      Build a Cigar from these read features and the given read length.
      Parameters:
      readLength - the length of the read in bases
      Returns:
      the reconstructed CIGAR
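How a CIGAR might be rebuilt from positioned read features can be sketched as follows. This is an illustration only and does not call htsjdk: `CigarSketch`, `Feature`, and `Runs` are hypothetical names, features are assumed to be sorted by 1-based read position, substitutions are marked with the made-up operator `'X'` and emitted as `M`, and stretches of the read not covered by any feature are assumed to be matches.

```java
import java.util.Arrays;
import java.util.List;

class CigarSketch {
    /** Hypothetical feature: 1-based read position, operator ('X','I','S','D','N'), length. */
    static final class Feature {
        final int readPos;
        final char op;
        final int length;
        Feature(int readPos, char op, int length) {
            this.readPos = readPos; this.op = op; this.length = length;
        }
    }

    /** Accumulates CIGAR elements, merging adjacent runs of the same operator. */
    static final class Runs {
        private final StringBuilder out = new StringBuilder();
        private char op;
        private int len;
        void add(char newOp, int n) {
            if (n <= 0) return;
            if (newOp != op) { flush(); op = newOp; }
            len += n;
        }
        private void flush() {
            if (len > 0) out.append(len).append(op);
            len = 0;
        }
        String build() { flush(); return out.toString(); }
    }

    static String cigar(int readLength, List<Feature> features) {
        Runs runs = new Runs();
        int covered = 0; // number of read bases accounted for so far
        for (Feature f : features) {
            runs.add('M', f.readPos - 1 - covered); // implicit matches before this feature
            switch (f.op) {
                case 'X': // substitution: one aligned base
                    runs.add('M', 1);
                    covered = f.readPos;
                    break;
                case 'I':
                case 'S': // consume read bases only
                    runs.add(f.op, f.length);
                    covered = f.readPos + f.length - 1;
                    break;
                case 'D':
                case 'N': // consume reference bases only
                    runs.add(f.op, f.length);
                    covered = f.readPos - 1;
                    break;
                default:
                    throw new IllegalArgumentException("unsupported operator: " + f.op);
            }
        }
        runs.add('M', readLength - covered); // trailing matches
        return runs.build();
    }

    public static void main(String[] args) {
        List<Feature> features = Arrays.asList(
                new Feature(1, 'S', 2), new Feature(5, 'X', 1), new Feature(8, 'D', 3));
        System.out.println(cigar(10, features)); // prints 2S5M3D3M
    }
}
```

With no features at all, the sketch yields a single match run covering the whole read (for example `10M`).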
    • restoreReadBases

      public static byte[] restoreReadBases(List<ReadFeature> readFeatures, boolean isUnknownBases, int readAlignmentStart, int readLength, CRAMReferenceRegion cramReferenceRegion, SubstitutionMatrix substitutionMatrix)
      Get the read bases for a CRAMRecord given a list of read features and a reference region.
      Parameters:
      readFeatures - list of ReadFeatures for this record; may be null
      isUnknownBases - true if CF_UNKNOWN_BASES CRAM flag is set for this read
      readAlignmentStart - 1-based CRAM record alignment start
      readLength - read length for this read
      cramReferenceRegion - CRAMReferenceRegion spanning the reference bases required for this read, if reference-compressed. It is the caller's responsibility to have already fetched the correct bases (that is, the CRAMReferenceRegion's current bases must overlap this read's reference span). The region's span may be shorter than the entire read span when the read span extends beyond the end of the underlying reference sequence.
      substitutionMatrix - substitution matrix to use for base resolution
      Returns:
      byte[] of read bases for this read
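The idea of restoring read bases from a reference plus read features can be sketched without htsjdk. Everything here is a simplifying assumption: `RestoreBasesSketch` and `Feature` are hypothetical, the reference is a plain String whose first character is reference position 1, substitutions carry the already-resolved read base rather than going through a SubstitutionMatrix, and positions past the end of the reference are filled with `N`.

```java
import java.util.Arrays;
import java.util.List;

class RestoreBasesSketch {
    /** Hypothetical feature: 1-based read position, operator, literal bases, length. */
    static final class Feature {
        final int readPos;
        final char op;      // 'X' substitution, 'I' insertion, 'S' soft clip, 'D' deletion
        final String bases; // read bases for X/I/S; empty for D
        final int length;   // reference bases skipped for D; unused otherwise
        Feature(int readPos, char op, String bases, int length) {
            this.readPos = readPos; this.op = op; this.bases = bases; this.length = length;
        }
    }

    /** Reference base at a 0-based index, or 'N' past the end of the reference. */
    private static char refBase(String ref, int idx) {
        return idx < ref.length() ? ref.charAt(idx) : 'N';
    }

    static String restore(String ref, int alignmentStart, int readLength, List<Feature> features) {
        char[] read = new char[readLength];
        int readIdx = 0;                 // next 0-based read position to fill
        int refIdx = alignmentStart - 1; // 0-based reference cursor
        for (Feature f : features) {
            while (readIdx < f.readPos - 1) {        // copy matching bases up to the feature
                read[readIdx++] = refBase(ref, refIdx++);
            }
            switch (f.op) {
                case 'X':                            // replaces one reference base
                    read[readIdx++] = f.bases.charAt(0);
                    refIdx++;
                    break;
                case 'I':
                case 'S':                            // read-only bases
                    for (char c : f.bases.toCharArray()) read[readIdx++] = c;
                    break;
                case 'D':                            // reference-only bases
                    refIdx += f.length;
                    break;
                default:
                    throw new IllegalArgumentException("unsupported operator: " + f.op);
            }
        }
        while (readIdx < readLength) {               // trailing matches
            read[readIdx++] = refBase(ref, refIdx++);
        }
        return new String(read);
    }

    public static void main(String[] args) {
        List<Feature> features = Arrays.asList(
                new Feature(2, 'X', "T", 0),
                new Feature(4, 'I', "GG", 0),
                new Feature(6, 'D', "", 1));
        System.out.println(restore("ACGTACGTAC", 3, 6, features)); // prints GTAGGG
    }
}
```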
    • restoreBasesAndTags

      public static CRAMRecordReadFeatures.DecodeResult restoreBasesAndTags(List<ReadFeature> readFeatures, boolean isUnknownBases, int readAlignmentStart, int readLength, CRAMReferenceRegion cramReferenceRegion, SubstitutionMatrix substitutionMatrix, boolean computeMdNm)
      Fused single-pass decode: restore read bases from the reference + read features, build the CIGAR, and optionally compute the MD string and NM edit distance, all in a single iteration through the features list. This replaces the previous 3-4 pass approach (restoreReadBases + getCigarForReadFeatures + calculateMdAndNm + toBamReadBasesInPlace).

      Base normalization (upper-casing, replacing invalid bases with N) is done inline as bases are written, eliminating the need for a separate toBamReadBasesInPlace pass.

      Parameters:
      readFeatures - list of read features (may be null for pure reference matches)
      isUnknownBases - true if the CF_UNKNOWN_BASES flag is set
      readAlignmentStart - 1-based alignment start
      readLength - read length
      cramReferenceRegion - reference region covering this read's span
      substitutionMatrix - substitution matrix for base resolution
      computeMdNm - whether to compute MD string and NM count
      Returns:
      DecodeResult containing bases, CIGAR, and optionally MD/NM
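The inline base normalization described above (upper-casing and replacing invalid bases with N) can be illustrated with a small standalone sketch. It assumes that the valid set is the sixteen BAM base codes `=ACMGRSVTWYHKDBN` from the SAM specification; `BaseNormalizationSketch` and its methods are hypothetical names, not part of htsjdk, and the actual set accepted by this method may differ.

```java
import java.nio.charset.StandardCharsets;

class BaseNormalizationSketch {
    /** Assumed valid set: the sixteen BAM base codes. */
    private static final String VALID_BAM_BASES = "=ACMGRSVTWYHKDBN";

    /** Upper-cases a single base and maps anything outside the assumed valid set to 'N'. */
    static byte normalize(byte base) {
        char upper = Character.toUpperCase((char) (base & 0xFF));
        return VALID_BAM_BASES.indexOf(upper) >= 0 ? (byte) upper : (byte) 'N';
    }

    /** Normalizes every base in place, as a fused decoder could do while writing each base. */
    static byte[] normalizeInPlace(byte[] bases) {
        for (int i = 0; i < bases.length; i++) {
            bases[i] = normalize(bases[i]);
        }
        return bases;
    }

    public static void main(String[] args) {
        byte[] bases = "acgtnx.=".getBytes(StandardCharsets.US_ASCII);
        System.out.println(new String(normalizeInPlace(bases), StandardCharsets.US_ASCII));
        // prints ACGTNNN=
    }
}
```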
    • equals

      public boolean equals(Object o)
      Overrides:
      equals in class Object
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Object