crisprzip.nucleic_acid module
Represents nucleic acid hybrids, either by mismatch positions or sequences.
- class crisprzip.nucleic_acid.GuideTargetHybrid(guide: str, target: TargetDna, state: int = 0)
Bases:
object
An ssRNA guide interacting with a dsDNA site through R-loop formation.
- guide
The RNA guide strand, in 5’-to-3’ notation
- Type:
str
- target
The dsDNA site to be interrogated
- Type:
TargetDna
- state
Length of the R-loop. Only for illustration purposes for now.
- Type:
int
- apply_point_mut(mutation: str) GuideTargetHybrid
- bp_map = {'A': 'U', 'C': 'G', 'G': 'C', 'T': 'A'}
- find_mismatches()
Identify the positions of mismatching guide-target basepairs.
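As a rough illustration of what identifying mismatches involves, the sketch below compares an RNA guide with a DNA target strand using the DNA-to-RNA pairing table listed under bp_map. It is a standalone sketch and not the library's implementation: the function name `find_mismatch_positions` is hypothetical, and it assumes the guide (5’-to-3’) and the target strand (3’-to-5’) are already aligned index by index.

```python
from typing import List

# DNA-to-RNA pairing table, copied from the bp_map attribute above.
BP_MAP = {"A": "U", "C": "G", "G": "C", "T": "A"}


def find_mismatch_positions(guide: str, target_strand: str) -> List[bool]:
    """Return True at every index where guide and target do not pair.

    Hypothetical helper. Assumes `guide` is RNA written 5'-to-3' and
    `target_strand` is DNA written 3'-to-5', so index i pairs with index i.
    """
    if len(guide) != len(target_strand):
        raise ValueError("guide and target strand must have equal length")
    return [BP_MAP[dna_nt] != rna_nt
            for rna_nt, dna_nt in zip(guide, target_strand)]


# Target strand TACG pairs perfectly with guide AUGC.
print(find_mismatch_positions("AUGC", "TACG"))
# Changing the second guide nucleotide introduces one mismatch.
print(find_mismatch_positions("AAGC", "TACG"))
```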
- classmethod from_cas9_offtarget(offtarget_seq: str, protospacer: str, state: int = 0) GuideTargetHybrid
Instantiate from a protospacer and an (off-)target sequence.
- Parameters:
offtarget_seq (str) –
Full sequence of the (off-)target. Can be provided in 3 formats:
- 20 nts: 5’-target-3’. All nucleotides should be specified.
- 23 nts: 5’-target-PAM-3’. The PAM should be specified or provided as ‘NGG’.
- 24 nts: 5’-upstream_nt-target-PAM-3’. The upstream_nt can be specified or provided as ‘N’.
protospacer (str) –
Full sequence of the protospacer/on-target. Can be provided in 3 formats:
- 20 nts: 5’-target-3’. All nucleotides should be specified.
- 23 nts: 5’-target-PAM-3’. The PAM should be specified or provided as ‘NGG’.
- 24 nts: 5’-upstream_nt-target-PAM-3’. The upstream_nt can be specified or provided as ‘N’.
state (int) – R-loop hybridization state
- classmethod from_cas9_protospacer(protospacer: str, mismatches: str = '', state: int = 0) GuideTargetHybrid
Instantiate from protospacer and point mutations.
- Parameters:
protospacer (str) –
Full sequence of the protospacer/on-target. Can be provided in 3 formats:
- 20 nts: 5’-target-3’. All nucleotides should be specified.
- 23 nts: 5’-target-PAM-3’. The PAM should be specified or provided as ‘NGG’.
- 24 nts: 5’-upstream_nt-target-PAM-3’. The upstream_nt can be specified or provided as ‘N’.
mismatches (str) – Mismatch descriptors (in the form “A02T”) describing how the target deviates from the protospacer. Multiple mismatches should be space-separated.
state (int) – R-loop hybridization state
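The descriptor format described above (original nucleotide, position, new nucleotide, with several descriptors separated by spaces) can be parsed along the following lines. This is a minimal sketch with a hypothetical helper name, `parse_mismatches`; it only splits the string and does not interpret the position convention, which the library may define differently from what a reader would guess.

```python
import re
from typing import List, Tuple

# One descriptor: original nucleotide, position digits, new nucleotide.
_DESCRIPTOR = re.compile(r"^([ACGT])(\d+)([ACGT])$")


def parse_mismatches(mismatches: str) -> List[Tuple[str, int, str]]:
    """Split a space-separated descriptor string into (old, position, new)."""
    parsed = []
    for token in mismatches.split():
        match = _DESCRIPTOR.match(token)
        if match is None:
            raise ValueError(f"not a valid mismatch descriptor: {token!r}")
        old_nt, position, new_nt = match.groups()
        parsed.append((old_nt, int(position), new_nt))
    return parsed


print(parse_mismatches("A02T G15C"))  # [('A', 2, 'T'), ('G', 15, 'C')]
print(parse_mismatches(""))           # [] (no mismatches, i.e. on-target)
```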
- get_mismatch_pattern() MismatchPattern
- set_rloop_state(rloop_state)
- class crisprzip.nucleic_acid.MismatchPattern(array: Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes])
Bases:
object
Positions of the mismatched bases in a target sequence.
- pattern
Array with True indicating mismatched basepairs
- Type:
numpy.ndarray
- length
Guide length
- Type:
int
- mm_num
Number of mismatches in the array
- Type:
int
- is_on_target
Indicates whether the array is the on-target array
- Type:
bool
Notes
Assumes a 3’-to-5’ DNA direction (CRISPR-Cas9 directionality).
- classmethod from_mm_pos(guide_length: int, mm_pos_list: list = None, zero_based_index=False)
Alternative constructor. Uses 1-based indexing by default.
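To make the indexing option concrete, the sketch below builds a boolean mismatch pattern from a list of positions, using 1-based positions unless `zero_based_index` is set. It is an assumption-laden illustration: `pattern_from_positions` is a hypothetical name, a plain list stands in for the numpy.ndarray the class stores, and the sketch says nothing about which end of the guide position 1 refers to.

```python
from typing import List, Optional


def pattern_from_positions(guide_length: int,
                           mm_pos_list: Optional[List[int]] = None,
                           zero_based_index: bool = False) -> List[bool]:
    """Build a boolean mismatch pattern from a list of mismatch positions."""
    pattern = [False] * guide_length
    for pos in mm_pos_list or []:
        # Shift 1-based positions down by one to get a list index.
        index = pos if zero_based_index else pos - 1
        if not 0 <= index < guide_length:
            raise ValueError(f"position {pos} is outside the guide")
        pattern[index] = True
    return pattern


pattern = pattern_from_positions(5, [1, 4])
print(pattern)           # [True, False, False, True, False]
print(sum(pattern))      # 2, the number of mismatches
print(not any(pattern))  # False, so this is not the on-target pattern
```

The last two lines mirror what the mm_num and is_on_target attributes describe: a count of the True entries and a check that there are none.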
- classmethod from_string(mm_array_string)
- classmethod from_target_sequence(protospacer: str, target_sequence: str) MismatchPattern
Alternative constructor
- get_mm_pos()
- class crisprzip.nucleic_acid.NearestNeighborModel
Bases:
object
A model to estimate nucleic acid stability.
An implementation of the nearest neighbor model predicting energies for guide RNA-target DNA R-loops. Instantiating this class is only necessary to load the parameter files; a single object can be used to make all energy landscapes.
- energy_unit
Unit of output free energy. For kBT, assuming a temperature of 20°C.
- Type:
{‘kbt’, ‘kcalmol’}
Notes
Method adapted from Alkan et al. (2018). DNA duplex parameters from SantaLucia & Hicks (2004), RNA-DNA hybrid duplex parameters from Alkan et al. (2018).
There are 4 contributions to the R-loop energy:
- Basestacks in the DNA duplex that should be broken. These parameters can be loaded directly from the SantaLucia & Hicks dataset. Unlike Alkan et al., we also consider basestacks with the basepairs flanking the target region. If these are unknown, we take the average energy from all 4 possible basestacks.
- Basestacks in the RNA/DNA hybrid that are created. Some of these energies are experimentally determined, others are an average of dsDNA and dsRNA values.
- Internal loops, corresponding to (regions of) mismatches flanked by matching basepairs. Internal loops of length 1 and 2 have specific energies; for length > 2, the energy is the sum of the left and right basestack and a length-specific energy contribution.
- Basepair terminals at the end and beginning of the R-loop. Alkan et al. consider only external loops, which appear only when the guide-target hybrid starts or ends with a mismatch, but we always consider the energy contribution due to the first and last matching basepair. These energies are typically quite small.
- dna_dna_params_file = 'santaluciahicks2004.json'
- classmethod dna_opening_energy(hybrid: GuideTargetHybrid) ndarray
Get the energy required to open the DNA duplex.
Calculated following the methods from Alkan et al. (2018). The DNA opening energy is the sum of all the basestack energies in the sequence (negative).
- Parameters:
hybrid (GuideTargetHybrid) – Hybrid object of which the hybridization energies are calculated
- Returns:
open_energy – The energy required for opening the DNA duplex (in the desired units of energy), for each step in the R-loop formation process.
- Return type:
numpy.ndarray
- energy_unit = 'kbt'
- classmethod get_hybridization_energy(hybrid: GuideTargetHybrid, weight: float | Tuple[float, float] = None) ndarray
Calculate the R-loop cost.
Calculates the energy that is required to open an R-loop between the guide RNA and target DNA of the hybrid object for each R-loop length. Converts energy units if necessary.
- Parameters:
hybrid (GuideTargetHybrid) – Hybrid object of which the hybridization energies are calculated
weight (float or tuple[float], optional) – Optional weighting of the DNA opening energy and RNA duplex energy. If None (default), no weighting is applied. If float, both DNA and RNA energies are multiplied by the weight parameter. If a tuple of two floats, the first value is used as a multiplier for the DNA opening energy, and the second is used as a multiplier for the RNA-DNA hybridization energy.
- Returns:
energy – The energy required for hybridization (in the desired units of energy), for each step in the R-loop formation process.
- Return type:
numpy.ndarray
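The three accepted forms of the weight parameter can be illustrated with a small standalone sketch. It assumes the total energy at each R-loop step is the sum of the weighted DNA opening and RNA-DNA hybrid contributions, which the description suggests but does not state outright. The name `combine_energies` is hypothetical, plain lists stand in for numpy arrays, and the numbers are made up purely to show the arithmetic.

```python
from typing import List, Optional, Sequence, Tuple, Union

Weight = Optional[Union[float, Tuple[float, float]]]


def combine_energies(dna_opening: Sequence[float],
                     rna_duplex: Sequence[float],
                     weight: Weight = None) -> List[float]:
    """Weight and add the two energy contributions per R-loop step."""
    if weight is None:
        w_dna, w_rna = 1.0, 1.0          # no weighting
    elif isinstance(weight, (int, float)):
        w_dna = w_rna = float(weight)    # same weight for both terms
    else:
        w_dna, w_rna = weight            # separate DNA and RNA weights
    return [w_dna * d + w_rna * r for d, r in zip(dna_opening, rna_duplex)]


# Made-up numbers, only to show the arithmetic.
dna = [1.0, 2.0, 3.0]
rna = [-0.5, -1.0, -1.5]
print(combine_energies(dna, rna))              # [0.5, 1.0, 1.5]
print(combine_energies(dna, rna, 2.0))         # [1.0, 2.0, 3.0]
print(combine_energies(dna, rna, (1.0, 0.0)))  # [1.0, 2.0, 3.0]
```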
- classmethod load_data(force=False)
- rna_dna_params_file = 'alkan2018.json'
- classmethod rna_duplex_energy(hybrid: GuideTargetHybrid) ndarray
Get the energy required to create the RNA:DNA hybrid.
Calculated following the methods from Alkan et al. (2018). The RNA duplex energy has three contributions: 1) basestacks, 2) internal loops, 3) external loops / terminals. Alkan et al. only look at external loops, but here, we instead look at both basepair terminals, whether or not they are part of an external loop.
- Parameters:
hybrid (GuideTargetHybrid) – Hybrid object of which the hybridization energies are calculated
- Returns:
duplex_energy – The energy required for creating the RNA:DNA hybrid (in the desired units of energy), for each step in the R-loop formation process.
- Return type:
numpy.ndarray
- classmethod set_temperature(temperature)
- temperature = 20
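Since the energy unit can be either kBT or kcal/mol and the default temperature is 20°C, the following sketch shows how such a conversion can be done in general. It is not taken from the library: `kcalmol_to_kbt` is a hypothetical helper, and it uses the molar gas constant so that the thermal energy is expressed per mole.

```python
# Molar gas constant in kcal/(mol*K).
GAS_CONSTANT_KCAL = 1.987204e-3


def kcalmol_to_kbt(energy_kcalmol: float,
                   temperature_celsius: float = 20.0) -> float:
    """Convert an energy from kcal/mol to units of kBT at a temperature."""
    temperature_kelvin = temperature_celsius + 273.15
    return energy_kcalmol / (GAS_CONSTANT_KCAL * temperature_kelvin)


# At 20 degrees C, 1 kcal/mol is roughly 1.72 kBT.
print(round(kcalmol_to_kbt(1.0), 2))
```

A higher temperature makes kBT larger, so the same energy in kcal/mol corresponds to fewer kBT.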
- class crisprzip.nucleic_acid.TargetDna(target_sequence, upstream_nt: str = None, downstream_nt: str = None)
Bases:
object
Double-stranded DNA site to be opened during R-loop formation.
- seq2
The “target sequence”, as present on the nontarget DNA strand (=protospacer), in 5’-to-3’ notation.
- Type:
str
- seq1
The target strand (=spacer), in 3’-to-5’ notation
- Type:
str
- upstream_bp
The basepair upstream (5’-side) of the nontarget strand.
- Type:
str
- dnstream_bp
The basepair downstream (3’-side) of the nontarget strand. For Cas9, corresponds to the last basepair of the PAM.
- Type:
str
- apply_point_mut(mutation: str)
Change DNA hybrid according to a single point mutation.
Mutation strings have the form A02T, where the NTS nucleotide A at position 2 would get replaced by a nucleotide T.
- bp_map = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
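The relation between seq2 (nontarget strand, 5’-to-3’) and seq1 (target strand, 3’-to-5’) follows from this pairing table: because the two strands are written in opposite directions, complementing base by base is enough and no reversal is needed. A minimal sketch, with the hypothetical helper name `target_strand_from_nontarget`:

```python
# DNA-to-DNA pairing table, copied from the bp_map attribute above.
BP_MAP = {"A": "T", "C": "G", "G": "C", "T": "A"}


def target_strand_from_nontarget(seq2: str) -> str:
    """Complement the nontarget strand (5'-to-3') base by base.

    The result is read 3'-to-5', so the order of the bases is kept.
    """
    return "".join(BP_MAP[nt] for nt in seq2)


print(target_strand_from_nontarget("GACGT"))  # CTGCA
```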
- classmethod from_cas9_target(full_target: str) TargetDna
Make a TargetDna instance from a Cas9 target sequence string.
- Parameters:
full_target (str) –
Full sequence of the protospacer/on-target. Can be provided in 3 formats:
- 20 nts: 5’-target-3’. All nucleotides should be specified.
- 23 nts: 5’-target-PAM-3’. The PAM should be specified or provided as ‘NGG’.
- 24 nts: 5’-upstream_nt-target-PAM-3’. The upstream_nt can be specified or provided as ‘N’.
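The three accepted input lengths can be told apart as in the sketch below, which splits a string into its upstream nucleotide, 20-nt target, and PAM. This is only an illustration of the formats listed above: `split_cas9_target` is a hypothetical helper, the example sequence is arbitrary, and how the library maps these pieces onto the TargetDna attributes is not shown here.

```python
from typing import Optional, Tuple


def split_cas9_target(full_target: str) -> Tuple[Optional[str], str, Optional[str]]:
    """Split a Cas9 target string into (upstream_nt, target, PAM)."""
    if len(full_target) == 20:
        return None, full_target, None
    if len(full_target) == 23:
        return None, full_target[:20], full_target[20:]
    if len(full_target) == 24:
        return full_target[0], full_target[1:21], full_target[21:]
    raise ValueError("expected a sequence of 20, 23, or 24 nucleotides")


target = "GACGCATAAAGATGAGACGC"  # arbitrary 20-nt example sequence
print(split_cas9_target(target))
print(split_cas9_target(target + "TGG"))
print(split_cas9_target("N" + target + "NGG"))
```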
- crisprzip.nucleic_acid.find_average_mm_penalties(protospacer: str, weight: float | Tuple[float, float] = None)
Find the effective penalties for single point mutations.
Finds the effective penalties for all possible single point mutations on a target, and averages over them to return the position-dependent mismatch penalty due to undetermined mismatches.
- crisprzip.nucleic_acid.find_mismatches_cached(seq1, guide)
Identify the positions of mismatching guide-target basepairs (cached).
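One common way to cache a function of two sequence strings in Python is `functools.lru_cache`; whether the library uses this mechanism is an assumption. The sketch below uses a hypothetical function, `mismatches_cached`, and assumes seq1 (3’-to-5’) and the guide (5’-to-3’) are aligned index by index. It returns a tuple so that the cached value cannot be modified by the caller.

```python
from functools import lru_cache
from typing import Tuple

# DNA-to-RNA pairing table, as listed for GuideTargetHybrid.bp_map.
BP_MAP = {"A": "U", "C": "G", "G": "C", "T": "A"}


@lru_cache(maxsize=None)
def mismatches_cached(seq1: str, guide: str) -> Tuple[bool, ...]:
    """Cached mismatch lookup; both arguments are strings, hence hashable."""
    return tuple(BP_MAP[dna_nt] != rna_nt
                 for dna_nt, rna_nt in zip(seq1, guide))


mismatches_cached("TACG", "AUGC")
mismatches_cached("TACG", "AUGC")           # served from the cache
print(mismatches_cached.cache_info().hits)  # 1
```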
- crisprzip.nucleic_acid.format_point_mutations(protospacer: str, target_sequence: str) List[str]
List the point mutations between target_sequence and protospacer.
- crisprzip.nucleic_acid.get_hybridization_energy(protospacer: str, offtarget_seq: str = None, mutations: str = '', weight: float | Tuple[float, float] = None) ndarray
Calculate the free energy cost of R-loop formation.
- Parameters:
protospacer (str) –
Full sequence of the protospacer/on-target. Can be provided in 3 formats:
- 20 nts: 5’-target-3’. All nucleotides should be specified.
- 23 nts: 5’-target-PAM-3’. The PAM should be specified or provided as ‘NGG’.
- 24 nts: 5’-upstream_nt-target-PAM-3’. The upstream_nt can be specified or provided as ‘N’.
offtarget_seq (str) –
Full sequence of the (off-)target. Can be provided in 3 formats:
- 20 nts: 5’-target-3’. All nucleotides should be specified.
- 23 nts: 5’-target-PAM-3’. The PAM should be specified or provided as ‘NGG’.
- 24 nts: 5’-upstream_nt-target-PAM-3’. The upstream_nt can be specified or provided as ‘N’.
mutations (str) – Mismatch descriptors (in the form “A02T”) describing how the target deviates from the protospacer. Multiple mismatches should be space-separated. Empty by default, indicating no mismatches (=on-target hybridization energy).
weight (float or tuple[float], optional) – Optional weighting of the DNA opening energy and RNA duplex energy. If None (default), no weighting is applied. If float, both DNA and RNA energies are multiplied by the weight parameter. If a tuple of two floats, the first value is used as a multiplier for the DNA opening energy, and the second is used as a multiplier for the RNA-DNA hybridization energy.
- Returns:
hybridization_energy – Free energies required to create an R-loop.
- Return type:
numpy.ndarray
- crisprzip.nucleic_acid.get_na_energies_cached(protospacer: str, offtarget_seq: str = None) Tuple[Tuple[float], Tuple[float]]
Calculate the DNA and RNA contributions to the R-loop cost (with caching).
- Parameters:
protospacer (str) –
Full sequence of the protospacer/on-target. Can be provided in 3 formats:
- 20 nts: 5’-target-3’. All nucleotides should be specified.
- 23 nts: 5’-target-PAM-3’. The PAM should be specified or provided as ‘NGG’.
- 24 nts: 5’-upstream_nt-target-PAM-3’. The upstream_nt can be specified or provided as ‘N’.
offtarget_seq (str) –
Full sequence of the (off-)target: 5’-20nt-PAM-3’. Can be provided in 3 formats:
- 20 nts: 5’-target-3’. All nucleotides should be specified.
- 23 nts: 5’-target-PAM-3’. The PAM should be specified or provided as ‘NGG’.
- 24 nts: 5’-upstream_nt-target-PAM-3’. The upstream_nt can be specified or provided as ‘N’.
- Returns:
dna_opening_energy (tuple [float]) – Free energies required to open the DNA duplex
rna_duplex_energy (tuple [float]) – Free energies required to form the RNA duplex