detect_restriction_enzyme

chr3d.utils.detect_restriction_enzyme(fastq_path: str, max_reads: int = 1_000_000) -> None

Detect which restriction enzyme was used in a Hi-C/HiChIP sample.

Analyzes the first max_reads reads of a FASTQ file from a paired-end run to identify which restriction enzyme(s) were used in the library preparation. Results are reported as output; the function returns None.
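As a rough illustration of what "the first N reads" means in practice, the sketch below pulls the sequence line from up to max_reads FASTQ records. It is not chr3d's implementation: iter_read_sequences is a hypothetical helper, and it assumes a standard four-line-per-record FASTQ, gzip-compressed when the path ends in .gz.

```python
import gzip
from itertools import islice
from typing import Iterator


def iter_read_sequences(fastq_path: str, max_reads: int = 1_000_000) -> Iterator[str]:
    """Yield the sequence of up to max_reads FASTQ records (hypothetical helper)."""
    # Assumption: compression is signalled by a .gz suffix.
    opener = gzip.open if fastq_path.endswith(".gz") else open
    with opener(fastq_path, "rt") as handle:
        # A FASTQ record is 4 lines: header, sequence, '+', qualities.
        # Sequence lines therefore sit at line indices 1, 5, 9, ...
        for line in islice(handle, 1, 4 * max_reads, 4):
            yield line.strip().upper()
```

Capping the read count this way keeps the scan fast on large files, since the rest of the file is never decompressed.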

Logic:

  1. Enrichment test — A restriction-enzyme cut site should appear at the start of reads far more often than at background positions (because biotin pull-down enriches for ligation junctions).
  2. Junction test — After fill-in and ligation, two identical overhangs are joined, producing a site+site junction (e.g., GATCGATC for MboI). The presence of these junctions at the read start is a definitive signature.
  3. Positional profile — The true enzyme shows a characteristic dual-peak pattern: a large peak at position 0 (cut site) and a second peak offset by len(site)+1 bases (partner ligation site).
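The three checks above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the library's code: the function names, the 8-base window, and the choice of "mean count at all other offsets" as the background are all hypothetical, and the junction is built as site+site following the MboI example above.

```python
from typing import List, Sequence


def positional_profile(reads: Sequence[str], site: str, window: int = 20) -> List[int]:
    """Count reads in which `site` begins at each offset 0..window-1."""
    counts = [0] * window
    for seq in reads:
        # Only offsets where the whole site still fits inside the read.
        for pos in range(min(window, len(seq) - len(site) + 1)):
            if seq.startswith(site, pos):
                counts[pos] += 1
    return counts


def enrichment_fold(profile: List[int]) -> float:
    """Position-0 count divided by the mean count at the other offsets."""
    background = sum(profile[1:]) / max(len(profile) - 1, 1)
    if background == 0:
        return float("inf") if profile[0] > 0 else 0.0
    return profile[0] / background


def junction_count(reads: Sequence[str], junction: str) -> int:
    """Number of reads that start with the ligation junction."""
    return sum(1 for seq in reads if seq.startswith(junction))


# Toy reads, invented for illustration only.
reads = ["GATCGATCAAAA", "GATCTTTTGATC", "ACGTGATCAAAA", "GATCAAAAAAAA"]
site = "GATC"  # MboI recognition site

profile = positional_profile(reads, site, window=8)  # [3, 0, 0, 0, 2, 0, 0, 0]
fold = enrichment_fold(profile)
junctions = junction_count(reads, site + site)  # 1 read starts with GATCGATC
```

A real detector would repeat this for every candidate enzyme and compare the fold-changes and junction counts before recommending one.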

This tool is designed for Hi-C and in-situ Hi-C data, where reads start at restriction enzyme cut sites. In HiChIP / ChIA-PET data, reads start at linker/adapter sequences instead, so for those libraries run the tool on trimmed reads or after the linker-filtering step for accurate detection.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| fastq_path | str | Path to FASTQ file (R1 recommended) |
| max_reads | int | Maximum reads to analyze (default: 1,000,000) |

Example:

    from chr3d.utils import detect_restriction_enzyme

    # Analyze first 500k reads
    detect_restriction_enzyme(
        fastq_path="sample_R1.fastq.gz",
        max_reads=500_000
    )

    # Output includes:
    # - Positional profile for each candidate enzyme
    # - Enrichment fold-change at position 0
    # - Junction counts
    # - Recommended enzyme