
PETMapperV3

    class chr3d.peak_based.PETMapperV3(
        genome_index: str,
        mapping_quality_cutoff: int = 30,
        n_threads: int = 4,
        use_bwa_mem: bool = True,
    )

Maps ChIA-PET tags to a reference genome and generates BEDPE files.

Workflow:

  1. BWA paired-end alignment → SAM
  2. SAM → sorted BAM (coordinate sorted for stats)
  3. samtools flagstat → alignment statistics
  4. Name-sorted BAM → BEDPE
  5. Deduplicate BEDPE
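The steps above can be sketched as a list of command lines. This is a rough illustration under stated assumptions, not the commands `PETMapperV3` actually runs: the helper `build_workflow_commands`, the output file names, and the use of `bedtools bamtobed -bedpe` for step 4 are all hypothetical choices. Step 5 (deduplication) is left out because it operates on the BEDPE text rather than through an external tool.

```python
from typing import List, Optional, Tuple

# (argv, file that receives stdout; None when the tool writes its own output file)
Step = Tuple[List[str], Optional[str]]


def build_workflow_commands(
    genome_index: str,
    fastq_r1: str,
    fastq_r2: str,
    prefix: str,
    n_threads: int = 4,
) -> List[Step]:
    """Build argv lists for workflow steps 1-4 (hypothetical helper)."""
    threads = str(n_threads)
    sam = f"{prefix}.sam"
    coord_bam = f"{prefix}.sorted.bam"
    name_bam = f"{prefix}.namesorted.bam"
    return [
        # 1. Paired-end alignment; bwa mem writes SAM to stdout.
        (["bwa", "mem", "-t", threads, genome_index, fastq_r1, fastq_r2], sam),
        # 2. Coordinate-sorted BAM, used for statistics.
        (["samtools", "sort", "-@", threads, "-o", coord_bam, sam], None),
        # 3. Alignment statistics, printed to stdout.
        (["samtools", "flagstat", coord_bam], f"{prefix}.flagstat.txt"),
        # 4a. Name-sort so both mates of a pair are adjacent.
        (["samtools", "sort", "-n", "-@", threads, "-o", name_bam, coord_bam], None),
        # 4b. Convert mate pairs to BEDPE (assumed tool choice).
        (["bedtools", "bamtobed", "-bedpe", "-i", name_bam], f"{prefix}.bedpe"),
    ]


if __name__ == "__main__":
    steps = build_workflow_commands("hg38.fa", "s_R1.fastq.gz", "s_R2.fastq.gz", "s")
    for argv, stdout_path in steps:
        redirect = f" > {stdout_path}" if stdout_path else ""
        print(" ".join(argv) + redirect)
```

Keeping each step as an argv list rather than a shell string avoids quoting problems when paths contain spaces.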

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| genome_index | str | Path to BWA genome index |
| mapping_quality_cutoff | int | Minimum mapping quality (default: 30) |
| n_threads | int | Number of threads for BWA/SAMtools (default: 4) |
| use_bwa_mem | bool | Use BWA-MEM (True) or BWA-ALN (False) (default: True) |
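Because `genome_index` is a path to a BWA index, it can help to confirm the index files exist before starting a long run. The sketch below assumes the index was built with classic `bwa index`, which writes `.amb`, `.ann`, `.bwt`, `.pac`, and `.sa` files next to the prefix. The helper `missing_bwa_index_files` is hypothetical and not part of `chr3d`.

```python
import os
from typing import List

# Sidecar files written by `bwa index <prefix>`.
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_bwa_index_files(genome_index: str) -> List[str]:
    """Return the index files that are absent for the given prefix."""
    return [
        genome_index + suffix
        for suffix in BWA_INDEX_SUFFIXES
        if not os.path.isfile(genome_index + suffix)
    ]


if __name__ == "__main__":
    missing = missing_bwa_index_files("/data/genomes/hg38.fa")
    if missing:
        print("Missing BWA index files:", ", ".join(missing))
```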

Methods

map_paired_fastq

    def map_paired_fastq(
        self,
        fastq_r1: str,
        fastq_r2: str,
        output_prefix: str,
        output_dir: str = None,
        keep_bam: bool = True,
        remove_duplicates: bool = True,
    ) -> Dict[str, Any]

Complete mapping workflow for paired FASTQ files.

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| fastq_r1 | str | Path to R1 FASTQ file |
| fastq_r2 | str | Path to R2 FASTQ file |
| output_prefix | str | Prefix for output files |
| output_dir | str | Output directory (default: current directory) |
| keep_bam | bool | Keep intermediate BAM file (default: True) |
| remove_duplicates | bool | Remove duplicate PETs (default: True) |

Returns:

Dict[str, Any] with keys:

  • 'output_bam': Path to BAM file
  • 'output_bedpe': Path to final BEDPE file
  • 'flagstat': Path to the samtools flagstat output
  • 'total_reads', 'mapped_reads', 'dedup_bedpe': Statistics
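Since `'flagstat'` holds a path rather than parsed numbers, the file can be read back to cross-check the read counts. The sketch below parses the standard `samtools flagstat` text layout and reports QC-passed counts only. The function `parse_flagstat` is hypothetical, and whether its numbers match the class's own `'total_reads'` and `'mapped_reads'` values is an assumption.

```python
import re
from typing import Dict

# flagstat lines look like: "<passed> + <failed> in total (...)"
_TOTAL = re.compile(r"^(\d+) \+ (\d+) in total", re.MULTILINE)
# Anchored so the "primary mapped" line in newer samtools is not matched.
_MAPPED = re.compile(r"^(\d+) \+ (\d+) mapped \(", re.MULTILINE)


def parse_flagstat(text: str) -> Dict[str, int]:
    """Extract QC-passed total and mapped read counts from flagstat text."""
    total = _TOTAL.search(text)
    mapped = _MAPPED.search(text)
    if total is None or mapped is None:
        raise ValueError("text does not look like samtools flagstat output")
    return {
        "total_reads": int(total.group(1)),
        "mapped_reads": int(mapped.group(1)),
    }
```

With a result dictionary `stats` from `map_paired_fastq`, this would be called as `parse_flagstat(open(stats['flagstat']).read())`.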

run_bwa_mem

    def run_bwa_mem(
        self,
        fastq_r1: str,
        fastq_r2: str,
        output_bam: str,
    ) -> bool

Run BWA-MEM and pipe to samtools for BAM output.

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| fastq_r1 | str | Path to R1 FASTQ file |
| fastq_r2 | str | Path to R2 FASTQ file |
| output_bam | str | Path to output BAM file |

Returns:

bool: True if successful, False otherwise.
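Piping BWA-MEM into samtools avoids writing an intermediate SAM file. The following is a minimal sketch of that pattern using `subprocess`, not the library's implementation: `pipe_to_file` and `bwa_mem_commands` are hypothetical helpers, and `samtools view -b -` is an assumed choice for the SAM-to-BAM conversion.

```python
import subprocess
from typing import List, Tuple


def pipe_to_file(producer_cmd: List[str], consumer_cmd: List[str], output_path: str) -> bool:
    """Run `producer | consumer > output_path`; True if both exit with 0."""
    with open(output_path, "wb") as out:
        producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
        consumer = subprocess.Popen(consumer_cmd, stdin=producer.stdout, stdout=out)
        # Close our copy so the producer sees a broken pipe if the consumer exits early.
        producer.stdout.close()
        consumer_rc = consumer.wait()
        producer_rc = producer.wait()
    return producer_rc == 0 and consumer_rc == 0


def bwa_mem_commands(
    genome_index: str, fastq_r1: str, fastq_r2: str, n_threads: int = 4
) -> Tuple[List[str], List[str]]:
    """Build the aligner and converter argv lists (assumed flags)."""
    bwa = ["bwa", "mem", "-t", str(n_threads), genome_index, fastq_r1, fastq_r2]
    to_bam = ["samtools", "view", "-b", "-"]  # read SAM from stdin, write BAM to stdout
    return bwa, to_bam
```

A call such as `pipe_to_file(*bwa_mem_commands(index, r1, r2), "sample.bam")` mirrors the `bool` return described above. If either executable is not on `PATH`, `subprocess.Popen` raises `FileNotFoundError` instead of returning `False`.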

run_bwa_aln

    def run_bwa_aln(
        self,
        fastq_r1: str,
        fastq_r2: str,
        output_bam: str,
    ) -> bool

Run BWA-ALN + SAMPE for short reads.

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| fastq_r1 | str | Path to R1 FASTQ file |
| fastq_r2 | str | Path to R2 FASTQ file |
| output_bam | str | Path to output BAM file |

Returns:

bool: True if successful, False otherwise.
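BWA-ALN needs more steps than BWA-MEM: each mate is aligned separately with `bwa aln`, and `bwa sampe` then pairs the two `.sai` files into SAM. The sketch below lists those commands in order. The helper `bwa_aln_commands` and the intermediate file names are hypothetical, and the conversion of the resulting SAM to `output_bam` is omitted.

```python
from typing import List, Tuple

# (argv, file that receives stdout)
Step = Tuple[List[str], str]


def bwa_aln_commands(
    genome_index: str,
    fastq_r1: str,
    fastq_r2: str,
    prefix: str,
    n_threads: int = 4,
) -> List[Step]:
    """Build the aln + sampe command sequence (hypothetical helper)."""
    threads = str(n_threads)
    sai_r1 = f"{prefix}_R1.sai"
    sai_r2 = f"{prefix}_R2.sai"
    return [
        # Align each mate independently; bwa aln writes the .sai data to stdout.
        (["bwa", "aln", "-t", threads, genome_index, fastq_r1], sai_r1),
        (["bwa", "aln", "-t", threads, genome_index, fastq_r2], sai_r2),
        # Pair the two alignments; bwa sampe writes SAM to stdout.
        (["bwa", "sampe", genome_index, sai_r1, sai_r2, fastq_r1, fastq_r2], f"{prefix}.sam"),
    ]
```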

remove_duplicates

    def remove_duplicates(
        self,
        input_bedpe: str,
        output_bedpe: str,
    ) -> int

Remove duplicate PETs by coordinate.

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| input_bedpe | str | Path to input BEDPE file |
| output_bedpe | str | Path to output deduplicated BEDPE file |

Returns:

int: Number of duplicate PETs removed.
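One way to deduplicate by coordinate is to keep the first PET seen for each unique pair of anchors. The sketch below is an illustration, not the library's algorithm: the function `dedup_bedpe` is hypothetical, and the key it uses (the six coordinate columns plus the two strand columns when present) is an assumption about what counts as a duplicate.

```python
from typing import Set, Tuple


def dedup_bedpe(input_bedpe: str, output_bedpe: str) -> int:
    """Write the first PET per coordinate key; return how many were dropped."""
    seen: Set[Tuple[str, ...]] = set()
    removed = 0
    with open(input_bedpe) as src, open(output_bedpe, "w") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            fields = line.rstrip("\n").split("\t")
            # chrom1, start1, end1, chrom2, start2, end2
            key = tuple(fields[:6])
            if len(fields) >= 10:
                key += (fields[8], fields[9])  # strand1, strand2
            if key in seen:
                removed += 1
                continue
            seen.add(key)
            dst.write("\t".join(fields) + "\n")
    return removed
```

Holding every key in a set is simple but uses memory proportional to the number of unique PETs. For very large files, sorting by coordinate first and comparing adjacent lines is a common alternative.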

Example:

    from chr3d.peak_based import PETMapperV3

    mapper = PETMapperV3(
        genome_index="/data/genomes/hg38.fa",
        mapping_quality_cutoff=30,
        n_threads=24,
        use_bwa_mem=True,
    )

    stats = mapper.map_paired_fastq(
        fastq_r1="sample_R1.fastq.gz",
        fastq_r2="sample_R2.fastq.gz",
        output_prefix="sample",
        output_dir="mapped/",
        keep_bam=True,
        remove_duplicates=True,
    )

    print(f"Dedup BEDPE: {stats['dedup_bedpe']}")
    print(f"Duplicates removed: {stats.get('duplicates_removed', 0)}")