
FastqSplitter

class chr3d.FastqSplitter(n_chunks: int = 10, reads_per_chunk: Optional[int] = None)

Split large FASTQ files into smaller chunks for parallel processing.

Useful for very large Hi-C datasets: each chunk is small enough to handle on its own, so the chunks can be processed in parallel.
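To make the idea concrete, here is a minimal sketch of splitting paired FASTQ files in lockstep so that mates stay in the same chunk. It is not chr3d's implementation: the function name `split_paired_fastq`, the `open_text` helper, the uncompressed output, and the `{prefix}_{index}_R1.fastq` naming are all assumptions made for illustration.

```python
import gzip
import itertools
from pathlib import Path
from typing import List, Tuple


def open_text(path: str, mode: str):
    """Open a FASTQ file as text, transparently handling .gz input."""
    if str(path).endswith(".gz"):
        return gzip.open(path, mode + "t")
    return open(path, mode)


def split_paired_fastq(
    fastq1: str,
    fastq2: str,
    output_dir: str,
    reads_per_chunk: int,
    prefix: str = "chunk",
) -> List[Tuple[str, str]]:
    """Write consecutive blocks of read pairs to numbered chunk files.

    Hypothetical stand-in for a paired splitter; file naming is assumed.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    chunks = []
    with open_text(fastq1, "r") as r1, open_text(fastq2, "r") as r2:
        for index in itertools.count():
            # One FASTQ record is four lines, so take 4 * reads_per_chunk lines
            # from each mate file to keep R1 and R2 aligned.
            block1 = list(itertools.islice(r1, 4 * reads_per_chunk))
            block2 = list(itertools.islice(r2, 4 * reads_per_chunk))
            if not block1 and not block2:
                break
            if len(block1) != len(block2):
                raise ValueError("R1 and R2 contain different numbers of reads")
            path1 = out / f"{prefix}_{index:04d}_R1.fastq"
            path2 = out / f"{prefix}_{index:04d}_R2.fastq"
            path1.write_text("".join(block1))
            path2.write_text("".join(block2))
            chunks.append((str(path1), str(path2)))
    return chunks
```

Reading both files in the same loop is what keeps read pairs together: chunk N of R1 always holds the mates of chunk N of R2.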

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| n_chunks | int | Number of chunks to split into (default: 10) |
| reads_per_chunk | Optional[int] | Reads per chunk (overrides n_chunks if set) |
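The two parameters describe the same split from opposite directions: one fixes the chunk count, the other fixes the chunk size. The sketch below shows the arithmetic under the assumption that reads are divided as evenly as possible; `plan_chunks` is a hypothetical helper, and how chr3d actually distributes reads is not specified here.

```python
import math
from typing import Optional, Tuple


def plan_chunks(
    total_reads: int,
    n_chunks: int = 10,
    reads_per_chunk: Optional[int] = None,
) -> Tuple[int, int]:
    """Return (number_of_chunks, reads_per_chunk) for a given read count.

    Hypothetical helper: assumes an even split with a smaller final chunk.
    """
    if reads_per_chunk is not None:
        # Fixed chunk size: the chunk count follows from the data size,
        # and n_chunks is ignored.
        return math.ceil(total_reads / reads_per_chunk), reads_per_chunk
    # Fixed chunk count: the chunk size follows from the data size.
    return n_chunks, math.ceil(total_reads / n_chunks)
```

Under these assumptions, one million read pairs with the defaults would give 10 chunks of 100,000 pairs, while `reads_per_chunk=300_000` would give 4 chunks regardless of `n_chunks`.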

Methods

split

def split(self, fastq1: str, fastq2: str, output_dir: str, prefix: str = "chunk") -> List[Tuple[str, str]]

Split paired FASTQ files into chunks.

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| fastq1 | str | Path to R1 FASTQ file |
| fastq2 | str | Path to R2 FASTQ file |
| output_dir | str | Output directory for chunks |
| prefix | str | Prefix for chunk files (default: 'chunk') |

Returns:

List[Tuple[str, str]]: a list of (chunk_r1, chunk_r2) path tuples, one tuple per chunk.

Example:

    import chr3d as c3d

    splitter = c3d.FastqSplitter(n_chunks=10)
    chunks = splitter.split(
        fastq1="sample_R1.fastq.gz",
        fastq2="sample_R2.fastq.gz",
        output_dir="split_fastq/"
    )
    print(f"Created {len(chunks)} chunk pairs")
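Because split returns a list of (R1, R2) path tuples, the result can be handed straight to a worker pool. The following is a sketch using only the standard library; `count_pairs` and `process_chunks` are hypothetical names, the per-chunk task is a placeholder for real work such as alignment, and the code assumes the chunk files are uncompressed text (if chunks are gzip-compressed, open them with `gzip.open` instead).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def count_pairs(chunk: Tuple[str, str]) -> int:
    """Hypothetical per-chunk task: count the read pairs in one chunk."""
    r1_path, r2_path = chunk
    with open(r1_path) as r1, open(r2_path) as r2:
        # A FASTQ record spans four lines.
        n1 = sum(1 for _ in r1) // 4
        n2 = sum(1 for _ in r2) // 4
    if n1 != n2:
        raise ValueError(f"Mate files out of sync: {r1_path}, {r2_path}")
    return n1


def process_chunks(chunks: List[Tuple[str, str]], workers: int = 4) -> List[int]:
    """Run count_pairs on every chunk pair, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(count_pairs, chunks))
```

A thread pool suits I/O-bound tasks like this one; for CPU-bound work, `concurrent.futures.ProcessPoolExecutor` offers the same `map` interface.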