
ChiaPetPipeline

```python
class chr3d.peak_based.ChiaPetPipeline(
    genome_index: str,
    linkers: list,
    threads: int = 4,
    mapq: int = 30,
    genome_size: str = 'hs',
    qvalue: float = 0.05,
    alpha: float = 0.05,
    min_score: int = 20,
    min_tag: int = 15,
    max_tag: int = 40,
    standard_chroms_only: bool = True,
    cytoband_file: Optional[str] = None,
    keep_intermediates: bool = False,
)
```

End-to-end ChIA-PET pipeline orchestrator.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| genome_index | str | Path to BWA-indexed genome FASTA |
| linkers | list | One or more linker sequences to filter against |
| threads | int | CPU threads for BWA / samtools / linker filtering (default: 4) |
| mapq | int | Minimum mapping quality for BAM filtering (default: 30) |
| genome_size | str | MACS3 genome size string ('hs', 'mm', or integer; default: 'hs') |
| qvalue | float | MACS3 q-value cutoff (default: 0.05) |
| alpha | float | FDR significance threshold (default: 0.05) |
| min_score | int | Minimum parasail alignment score for linker matching (default: 20) |
| min_tag | int | Minimum tag length after linker removal (default: 15) |
| max_tag | int | Maximum tag length after linker removal (default: 40) |
| standard_chroms_only | bool | Restrict loop calling to chr1-22 + chrX/Y (default: True) |
| cytoband_file | Optional[str] | Path to UCSC cytoband file for centromere exclusion |
| keep_intermediates | bool | Keep intermediate BAM files (default: False) |
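
To make two of the filtering parameters concrete, here is a minimal sketch of the tag-length window (min_tag, max_tag) and the standard-chromosome restriction (standard_chroms_only). The helper names `tag_length_ok` and `is_standard_chrom` are hypothetical and are not part of chr3d. Inclusive bounds and UCSC-style `chr` prefixes are assumptions, so the library's actual behavior may differ.

```python
# Hypothetical helpers, not part of chr3d: they only illustrate the
# documented meaning of min_tag, max_tag and standard_chroms_only.

# chr1-22 plus chrX and chrY, assuming UCSC-style "chr" prefixes.
STANDARD_CHROMS = frozenset(
    [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"]
)


def tag_length_ok(tag: str, min_tag: int = 15, max_tag: int = 40) -> bool:
    """Return True if the tag length lies in [min_tag, max_tag].

    Treating both bounds as inclusive is an assumption.
    """
    return min_tag <= len(tag) <= max_tag


def is_standard_chrom(chrom: str) -> bool:
    """Return True only for chr1-22, chrX and chrY."""
    return chrom in STANDARD_CHROMS
```

Under these assumptions, a 20 bp tag passes the default window, while contigs such as chrM or unplaced scaffolds are rejected by the chromosome filter.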

Methods

run

```python
def run(
    self,
    fastq_r1: Optional[str] = None,
    fastq_r2: Optional[str] = None,
    output_dir: str = './results',
    sample_id: str = 'sample',
    start_from: int = 1,
) -> Dict[str, Any]
```

Run the full ChIA-PET pipeline, or resume from a later step.

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| fastq_r1 | Optional[str] | Path to R1 FASTQ (required when start_from <= 1) |
| fastq_r2 | Optional[str] | Path to R2 FASTQ (required when start_from <= 1) |
| output_dir | str | Root output directory, created if absent (default: './results') |
| sample_id | str | Sample name used as file prefix (default: 'sample') |
| start_from | int | Step to resume from: 1 = linker filtering, 2 = mapping, 3 = peak calling, 4 = loop calling (default: 1) |
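
The start_from semantics can be sketched as a small planning function. This is an illustration only: `plan_run` is a hypothetical helper, not a chr3d function. It assumes that resuming runs every step from start_from through loop calling, and that a value above 4 is rejected; neither behavior is confirmed by this page.

```python
from typing import List, Optional

# Step numbers and names as listed for start_from in the table above.
STEPS = {
    1: "linker filtering",
    2: "mapping",
    3: "peak calling",
    4: "loop calling",
}


def plan_run(
    start_from: int = 1,
    fastq_r1: Optional[str] = None,
    fastq_r2: Optional[str] = None,
) -> List[str]:
    """Return the names of the steps that would run (hypothetical helper)."""
    # Assumption: a step number beyond the last step is an error.
    if start_from > max(STEPS):
        raise ValueError(f"start_from must be at most {max(STEPS)}")
    # Documented rule: both FASTQ paths are required when start_from <= 1.
    if start_from <= 1 and (fastq_r1 is None or fastq_r2 is None):
        raise ValueError("fastq_r1 and fastq_r2 are required when start_from <= 1")
    # Assumption: every step from start_from onward is executed in order.
    return [name for step, name in sorted(STEPS.items()) if step >= start_from]
```

For instance, `plan_run(start_from=3)` needs no FASTQ paths and returns only the peak-calling and loop-calling steps, which mirrors resuming a run whose mapping output already exists in output_dir.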

Returns:

A Dict[str, Any] containing the statistics collected from every pipeline step, plus a timing breakdown.
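
Because the values are typed only as Any, it can be convenient to archive the returned dictionary alongside the results. The sketch below uses only the standard library. `save_stats` is a hypothetical helper, not part of chr3d, and apart from peak_file and significant_loops (which appear in the example on this page) the key names in the dictionary are not documented here.

```python
import json
from pathlib import Path
from typing import Any, Dict


def save_stats(stats: Dict[str, Any], path: str) -> Path:
    """Write a stats dictionary to a JSON file and return the file path."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    # default=str stringifies values that json cannot encode natively
    # (for example Path objects), since the value types are not documented.
    out.write_text(json.dumps(stats, indent=2, sort_keys=True, default=str))
    return out
```

A call such as `save_stats(stats, "chiapet_results/sample1.stats.json")` would then keep a record of each run; the file name here is only a placeholder.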

Example:

```python
from chr3d.peak_based.chiapet_pipeline import ChiaPetPipeline

pipeline = ChiaPetPipeline(
    genome_index="/data/genomes/hg38.fa",
    linkers=["AAGTGGTAGTGTGGTG", "CACTGTGGCTGTGTGG"],
    threads=24,
    mapq=30,
    genome_size='hs',
    qvalue=0.05,
    alpha=0.05,
)

stats = pipeline.run(
    fastq_r1="sample_R1.fastq.gz",
    fastq_r2="sample_R2.fastq.gz",
    output_dir="chiapet_results/",
    sample_id="sample1",
    start_from=1,
)

print(f"Peaks: {stats.get('peak_file', 'N/A')}")
print(f"Significant loops: {stats.get('significant_loops', 'N/A')}")
```