crisprzip.nucleic_acid module

Represents nucleic acid hybrids, either by mismatch positions or sequences.

class crisprzip.nucleic_acid.GuideTargetHybrid(guide: str, target: TargetDna, state: int = 0)

Bases: object

An ssRNA guide interacting with a dsDNA site through R-loop formation.

guide

The RNA guide strand, in 5’-to-3’ notation

Type:

str

target

The dsDNA site to be interrogated

Type:

TargetDna

state

Length of the R-loop. Currently used for illustration purposes only.

Type:

int

apply_point_mut(mutation: str) GuideTargetHybrid
bp_map = {'A': 'U', 'C': 'G', 'G': 'C', 'T': 'A'}
find_mismatches()

Identify the positions of mismatching guide-target basepairs.
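The comparison behind find_mismatches() can be sketched as follows. This is a minimal illustration, not the crisprzip implementation: it assumes the guide (5’-to-3’) and the target strand seq1 (3’-to-5’) are aligned index by index, so that a guide nucleotide matches when it equals the RNA complement of the target-strand nucleotide, as given by a map like bp_map above. The function name find_mismatch_positions is hypothetical.

```python
# RNA complement of each target-strand DNA nucleotide (same mapping as bp_map above)
BP_MAP = {'A': 'U', 'C': 'G', 'G': 'C', 'T': 'A'}


def find_mismatch_positions(seq1: str, guide: str) -> list:
    """Return True at each index where guide and target strand do not pair.

    Hypothetical helper: assumes seq1 (target strand, 3'-to-5') and guide
    (RNA, 5'-to-3') have equal length and are aligned index by index.
    """
    if len(seq1) != len(guide):
        raise ValueError("guide and target strand must have equal length")
    return [BP_MAP[dna_nt] != rna_nt for dna_nt, rna_nt in zip(seq1, guide)]


# Target strand 3'-TACG-5' pairs perfectly with guide 5'-AUGC-3'
print(find_mismatch_positions("TACG", "AUGC"))  # [False, False, False, False]
print(find_mismatch_positions("TACG", "AUCC"))  # [False, False, True, False]
```

The returned list follows string order; how crisprzip orders positions for a MismatchPattern (which assumes a 3’-to-5’ DNA direction) is not reproduced here.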

classmethod from_cas9_offtarget(offtarget_seq: str, protospacer: str, state: int = 0) GuideTargetHybrid

Instantiate from a protospacer and an off-target sequence.

Parameters:
  • offtarget_seq (str) –

    Full sequence of the (off-)target. Can be provided in 3 formats:

    • 20 nts: 5’-target-3’. All nucleotides should be specified.

    • 23 nts: 5’-target-PAM-3’. The PAM should be specified or provided as ‘NGG’.

    • 24 nts: 5’-upstream_nt-target-PAM-3’. The upstream_nt can be specified or provided as ‘N’.

  • protospacer (str) –

    Full sequence of the protospacer/on-target. Can be provided in 3 formats:

    • 20 nts: 5’-target-3’. All nucleotides should be specified.

    • 23 nts: 5’-target-PAM-3’. The PAM should be specified or provided as ‘NGG’.

    • 24 nts: 5’-upstream_nt-target-PAM-3’. The upstream_nt can be specified or provided as ‘N’.

  • state (int) – R-loop hybridization state

classmethod from_cas9_protospacer(protospacer: str, mismatches: str = '', state: int = 0) GuideTargetHybrid

Instantiate from protospacer and point mutations.

Parameters:
  • protospacer (str) –

    Full sequence of the protospacer/on-target. Can be provided in 3 formats:

    • 20 nts: 5’-target-3’. All nucleotides should be specified.

    • 23 nts: 5’-target-PAM-3’. The PAM should be specified or provided as ‘NGG’.

    • 24 nts: 5’-upstream_nt-target-PAM-3’. The upstream_nt can be specified or provided as ‘N’.

  • mismatches (str) – Mismatch descriptors (in the form “A02T”) describing how the target deviates from the protospacer. Multiple mismatches should be space-separated.

  • state (int) – R-loop hybridization state

get_mismatch_pattern() MismatchPattern
set_rloop_state(rloop_state)
class crisprzip.nucleic_acid.MismatchPattern(array: numpy.typing.ArrayLike)

Bases: object

Positions of the mismatched bases in a target sequence.

pattern

Array with True indicating mismatched basepairs

Type:

numpy.ndarray

length

Guide length

Type:

int

mm_num

Number of mismatches in the array

Type:

int

is_on_target

Indicates whether the array is the on-target array

Type:

bool

Notes

Assumes a 3’-to-5’ DNA direction (CRISPR-Cas9 directionality).

classmethod from_mm_pos(guide_length: int, mm_pos_list: list = None, zero_based_index=False)

Alternative constructor. Uses 1-based indexing by default.
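With 1-based indexing, position 1 refers to the first element of the pattern array. A minimal NumPy sketch of such a constructor is shown below; the name mm_positions_to_pattern is hypothetical and the function does not reproduce MismatchPattern itself.

```python
import numpy as np


def mm_positions_to_pattern(guide_length: int, mm_pos_list=None,
                            zero_based_index: bool = False) -> np.ndarray:
    """Boolean mismatch array from a list of mismatch positions.

    Hypothetical helper mirroring the from_mm_pos description: positions are
    1-based unless zero_based_index is True.
    """
    pattern = np.zeros(guide_length, dtype=bool)
    if not mm_pos_list:
        return pattern  # no mismatches: on-target pattern
    offset = 0 if zero_based_index else 1
    indices = np.asarray(mm_pos_list, dtype=int) - offset
    if indices.min() < 0 or indices.max() >= guide_length:
        raise IndexError("mismatch position outside the guide")
    pattern[indices] = True
    return pattern


print(mm_positions_to_pattern(5, [1, 4]).astype(int))  # [1 0 0 1 0]
```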

classmethod from_string(mm_array_string)
classmethod from_target_sequence(protospacer: str, target_sequence: str) MismatchPattern

Alternative constructor

get_mm_pos()
classmethod make_random(guide_length: int, mm_num: int, rng: int | Generator = None)
class crisprzip.nucleic_acid.NearestNeighborModel

Bases: object

A model to estimate nucleic acid stability.

An implementation of the nearest neighbor model predicting energies for guide RNA-target DNA R-loops. Instantiating this class is only necessary to load the parameter files; a single object can be used to compute all energy landscapes.

energy_unit

Unit of the output free energy. Conversion to kBT assumes a temperature of 20°C (the default value of temperature).

Type:

{‘kbt’, ‘kcalmol’}
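Converting between the two units amounts to dividing a molar energy by RT. The sketch below assumes that convert_units performs a conversion of this kind and that 20°C is taken as 293.15 K; kcalmol_to_kbt is a hypothetical name, not part of crisprzip.

```python
# Molar gas constant in kcal/(mol*K)
R_KCAL_PER_MOL_K = 1.987204259e-3


def kcalmol_to_kbt(energy_kcalmol: float, temperature_celsius: float = 20.0) -> float:
    """Convert a free energy from kcal/mol to units of kBT (per molecule).

    Hypothetical helper: dividing by R*T converts a molar energy into
    thermal-energy units at the given temperature.
    """
    temperature_kelvin = temperature_celsius + 273.15
    return energy_kcalmol / (R_KCAL_PER_MOL_K * temperature_kelvin)


# At 20 °C, kBT is about 0.5825 kcal/mol, so 1 kcal/mol is about 1.717 kBT
print(round(kcalmol_to_kbt(1.0), 3))
```

Because kBT grows with temperature, the same energy in kcal/mol corresponds to fewer kBT at higher temperatures.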

Notes

Method adapted from Alkan et al. (2018). DNA duplex parameters from SantaLucia & Hicks (2004), RNA-DNA hybrid duplex parameters from Alkan et al. (2018).

There are 4 contributions to the R-loop energy.

  1. Basestacks in the DNA duplex that must be broken. These parameters can be loaded directly from the SantaLucia & Hicks dataset. Unlike Alkan et al., we also consider basestacks with the basepairs flanking the target region. If these flanking basepairs are unknown, we take the average energy of all 4 possible basestacks.

  2. Basestacks in the RNA/DNA hybrid that are created. Some of these energies are experimentally determined; others are an average of dsDNA and dsRNA values.

  3. Internal loops, corresponding to (regions of) mismatches flanked by matching basepairs. Internal loops of length 1 and 2 have specific energies; for length > 2, the energy is the sum of the left and right basestacks and a length-specific energy contribution.

  4. Basepair terminals at the beginning and end of the R-loop. Alkan et al. consider only external loops, which appear only when the guide-target hybrid starts or ends with a mismatch, but we always consider the energy contribution of the first and last matching basepair. These energies are typically quite small.
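Contributions 3 and 4 depend on how mismatches are grouped. The sketch below is a hypothetical helper, not taken from crisprzip: applied to a fully formed hybrid, it splits a boolean mismatch pattern into runs of consecutive mismatches and labels a run internal when matching basepairs flank it on both sides, and external when it touches either end of the hybrid.

```python
from typing import List, Sequence, Tuple


def find_mismatch_loops(mismatches: Sequence[bool]) -> Tuple[List[Tuple[int, int]], List[Tuple[int, int]]]:
    """Group consecutive mismatches into loops.

    Hypothetical helper: returns (internal, external), each a list of
    (start_index, length) tuples. A run is internal when matching basepairs
    flank it on both sides and external when it touches either end.
    """
    internal, external = [], []
    n = len(mismatches)
    i = 0
    while i < n:
        if not mismatches[i]:
            i += 1
            continue
        start = i
        while i < n and mismatches[i]:
            i += 1  # extend the run of consecutive mismatches
        run = (start, i - start)
        if start > 0 and i < n:
            internal.append(run)
        else:
            external.append(run)
    return internal, external


pattern = [False, True, False, True, True, False, True]
print(find_mismatch_loops(pattern))  # ([(1, 1), (3, 2)], [(6, 1)])
```

In this example, the run of length 2 would fall under the length-specific internal-loop rule, while the final mismatch forms an external loop.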


classmethod convert_units(energy_value: float | ndarray)
dna_dna_params: dict = None
dna_dna_params_file = 'santaluciahicks2004.json'
classmethod dna_opening_energy(hybrid: GuideTargetHybrid) ndarray

Get the energy required to open the DNA duplex.

Calculated following the method of Alkan et al. (2018). The DNA opening energy is the negative of the sum of all basestack energies in the sequence.

Parameters:

hybrid (GuideTargetHybrid) – Hybrid object for which the hybridization energies are calculated

Returns:

open_energy – The energy required for opening the DNA duplex (in the desired units of energy), for each step in the R-loop formation process.

Return type:

numpy.ndarray

energy_unit = 'kbt'
classmethod get_hybridization_energy(hybrid: GuideTargetHybrid, weight: float | Tuple[float, float] = None) ndarray

Calculate the R-loop cost.

Calculates the energy required to open an R-loop between the guide RNA and target DNA of the hybrid object, for each R-loop length. Converts energy units if necessary.

Parameters:
  • hybrid (GuideTargetHybrid) – Hybrid object for which the hybridization energies are calculated

  • weight (float or tuple[float, float], optional) – Optional weighting of the DNA opening energy and the RNA duplex energy. If None (default), no weighting is applied. If a float, both the DNA and RNA energies are multiplied by this weight. If a tuple of two floats, the first value is used as a multiplier for the DNA opening energy, and the second as a multiplier for the RNA-DNA hybridization energy.

Returns:

energy – The energy required for hybridization (in the desired units of energy), for each step in the R-loop formation process.

Return type:

numpy.ndarray
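The documented semantics of weight fit in a few lines. This sketch assumes the total hybridization energy at each R-loop step is the sum of the weighted DNA opening and RNA duplex energies; the helper name and the toy numbers are illustrative, not taken from crisprzip.

```python
import numpy as np


def weighted_hybridization_energy(dna_opening, rna_duplex, weight=None):
    """Combine per-step DNA opening and RNA duplex energies with a weight.

    Hypothetical helper following the documented weight semantics:
    None -> no weighting, float -> same factor for both terms,
    (w_dna, w_rna) -> separate factors. Assumes the total is the sum
    of the two weighted contributions.
    """
    if weight is None:
        w_dna, w_rna = 1.0, 1.0
    elif isinstance(weight, (int, float)):
        w_dna = w_rna = float(weight)
    else:
        w_dna, w_rna = weight
    return (w_dna * np.asarray(dna_opening, dtype=float)
            + w_rna * np.asarray(rna_duplex, dtype=float))


# Toy per-step energies (illustrative numbers, not model output)
dna = [0.0, 2.0, 4.0]
rna = [0.0, -1.5, -3.5]
print(weighted_hybridization_energy(dna, rna).tolist())              # [0.0, 0.5, 0.5]
print(weighted_hybridization_energy(dna, rna, (1.0, 2.0)).tolist())  # [0.0, -1.0, -3.0]
```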

classmethod load_data(force=False)
rna_dna_params: dict = None
rna_dna_params_file = 'alkan2018.json'
classmethod rna_duplex_energy(hybrid: GuideTargetHybrid) ndarray

Get the energy required to create the RNA:DNA hybrid.

Calculated following the method of Alkan et al. (2018). The RNA duplex energy has three contributions: 1) basestacks, 2) internal loops, and 3) external loops / terminals. Alkan et al. consider only external loops; here, both basepair terminals are always included, whether or not they are part of an external loop.

Parameters:

hybrid (GuideTargetHybrid) – Hybrid object for which the hybridization energies are calculated

Returns:

duplex_energy – The energy required for creating the RNA:DNA hybrid (in the desired units of energy), for each step in the R-loop formation process.

Return type:

numpy.ndarray

classmethod set_energy_unit(unit: str)
classmethod set_temperature(temperature)
temperature = 20
class crisprzip.nucleic_acid.TargetDna(target_sequence, upstream_nt: str = None, downstream_nt: str = None)

Bases: object

Double-stranded DNA site to be opened during R-loop formation.

seq2

The “target sequence”, as present on the nontarget DNA strand (=protospacer), in 5’-to-3’ notation.

Type:

str

seq1

The target strand (=spacer), in 3’-to-5’ notation

Type:

str

upstream_bp

The basepair upstream (5’-side) of the nontarget strand.

Type:

str

dnstream_bp

The basepair downstream (3’-side) of the nontarget strand. For Cas9, corresponds to the last basepair of the PAM.

Type:

str

apply_point_mut(mutation: str)

Change the target DNA according to a single point mutation.

Mutation strings have the form A02T, where the nontarget-strand (NTS) nucleotide A at position 2 is replaced by a T.
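A minimal sketch of the descriptor handling, under two assumptions not stated above: positions are counted from the 3’ (PAM-proximal) end of the 5’-to-3’ nontarget strand, following the 3’-to-5’ convention noted for MismatchPattern, and the target strand is recomputed as the base-by-base complement using a map like bp_map. The function below is a standalone illustration, not TargetDna.apply_point_mut itself.

```python
import re

# Complementary DNA basepairing (same mapping as bp_map below)
BP_MAP = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
MUTATION_RE = re.compile(r"^([ACGT])(\d+)([ACGT])$")


def apply_point_mut(nts_seq: str, mutation: str) -> tuple:
    """Apply a descriptor such as 'A02T' to a nontarget-strand sequence.

    Hypothetical helper. Assumes position 1 is the 3' (PAM-proximal) end of
    the 5'-to-3' nontarget strand; returns the mutated nontarget strand and
    its complement, the target strand, in 3'-to-5' notation.
    """
    match = MUTATION_RE.match(mutation)
    if match is None:
        raise ValueError(f"malformed mutation descriptor: {mutation!r}")
    old_nt, pos, new_nt = match.group(1), int(match.group(2)), match.group(3)
    index = len(nts_seq) - pos  # count positions from the 3' end
    if not 0 <= index < len(nts_seq):
        raise IndexError(f"position {pos} outside the target")
    if nts_seq[index] != old_nt:
        raise ValueError(f"expected {old_nt} at position {pos}, found {nts_seq[index]}")
    mutated = nts_seq[:index] + new_nt + nts_seq[index + 1:]
    target_strand = "".join(BP_MAP[nt] for nt in mutated)
    return mutated, target_strand


print(apply_point_mut("GGACCA", "C02T"))  # ('GGACTA', 'CCTGAT')
```

Checking the original nucleotide guards against descriptors that do not match the sequence they are applied to.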

bp_map = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
classmethod from_cas9_target(full_target: str) TargetDna

Make a TargetDna instance from a Cas9 target sequence string.

Parameters:

full_target (str) –

Full sequence of the protospacer/on-target. Can be provided in 3 formats:

  • 20 nts: 5’-target-3’. All nucleotides should be specified.

  • 23 nts: 5’-target-PAM-3’. The PAM should be specified or provided as ‘NGG’.

  • 24 nts: 5’-upstream_nt-target-PAM-3’. The upstream_nt can be specified or provided as ‘N’.
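The three accepted input lengths can be told apart by length alone. The sketch below splits a sequence accordingly; split_cas9_target is hypothetical, and treating an ‘N’ upstream nucleotide as unspecified (returned as None) is an assumption about how the wildcard is interpreted. The PAM, including an ‘NGG’ placeholder, is passed through unchanged.

```python
from typing import Optional, Tuple


def split_cas9_target(full_target: str) -> Tuple[Optional[str], str, Optional[str]]:
    """Split a 20-, 23- or 24-nt Cas9 target into (upstream_nt, target, pam).

    Hypothetical helper: parts that are absent, or an upstream nucleotide
    given as 'N', are returned as None, meaning "unspecified".
    """
    full_target = full_target.upper()
    if len(full_target) == 20:
        upstream, target, pam = None, full_target, None
    elif len(full_target) == 23:
        upstream, target, pam = None, full_target[:20], full_target[20:]
    elif len(full_target) == 24:
        upstream, target, pam = full_target[0], full_target[1:21], full_target[21:]
    else:
        raise ValueError("expected a sequence of 20, 23 or 24 nucleotides")
    if set(target) - set("ACGT"):
        raise ValueError("all target nucleotides must be specified (A, C, G or T)")
    if upstream == "N":
        upstream = None  # unspecified upstream nucleotide
    return upstream, target, pam


print(split_cas9_target("N" + "ACGT" * 5 + "NGG"))  # (None, 'ACGTACGTACGTACGTACGT', 'NGG')
```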

classmethod make_random(length: int, seed=None) TargetDna

Make a random target DNA of the specified length.

crisprzip.nucleic_acid.find_average_mm_penalties(protospacer: str, weight: float | Tuple[float, float] = None)

Find the effective penalties for single point mutations.

Finds the effective penalties for all possible single point mutations on a target, and averages over them to return the position-dependent mismatch penalty due to undetermined mismatches.
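The averaging step can be sketched as below. This is not the crisprzip implementation: how the effective penalty of each single mutation is derived from the energy landscape is left to a user-supplied function, and both average_mm_penalties and the toy penalty are hypothetical and purely illustrative.

```python
from statistics import mean
from typing import Callable, List


def average_mm_penalties(protospacer: str,
                         penalty: Callable[[str, int, str], float]) -> List[float]:
    """Average a per-mutation penalty over all single point mutations.

    Hypothetical helper: penalty(protospacer, index, new_nt) returns the
    effective penalty of replacing protospacer[index] by new_nt. For each
    position, the three possible substitutions are averaged.
    """
    averages = []
    for index, original_nt in enumerate(protospacer):
        alternatives = [nt for nt in "ACGT" if nt != original_nt]
        averages.append(mean(penalty(protospacer, index, nt) for nt in alternatives))
    return averages


def toy_penalty(seq: str, index: int, new_nt: str) -> float:
    """Illustrative only: transitions cost 1, transversions 2, scaled by position."""
    transitions = {frozenset("AG"), frozenset("CT")}
    base_cost = 1.0 if frozenset(seq[index] + new_nt) in transitions else 2.0
    return (index + 1) * base_cost


print(average_mm_penalties("AC", toy_penalty))  # approximately [1.667, 3.333]
```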

crisprzip.nucleic_acid.find_mismatches_cached(seq1, guide)

Identify the positions of mismatching guide-target basepairs (cached).

crisprzip.nucleic_acid.format_point_mutations(protospacer: str, target_sequence: str) List[str]

List the point mutations between target_sequence and protospacer.
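Assuming that positions are counted from the 3’ (PAM-proximal) end of the 5’-to-3’ nontarget strand, descriptors in the “A02T” form can be produced as below. list_point_mutations is a hypothetical name; the two-digit zero padding follows the “A02T” examples in this module.

```python
from typing import List


def list_point_mutations(protospacer: str, target_sequence: str) -> List[str]:
    """List descriptors such as 'A02T' for positions where two sequences differ.

    Hypothetical helper. Assumes both sequences are the 5'-to-3' nontarget
    strand, have equal length, and that position 1 is the 3' (PAM-proximal) end.
    """
    if len(protospacer) != len(target_sequence):
        raise ValueError("sequences must have equal length")
    length = len(protospacer)
    mutations = []
    for position in range(1, length + 1):
        index = length - position  # position 1 is the last (3') nucleotide
        ref_nt, alt_nt = protospacer[index], target_sequence[index]
        if ref_nt != alt_nt:
            mutations.append(f"{ref_nt}{position:02d}{alt_nt}")
    return mutations


print(list_point_mutations("GGACCA", "TGACTA"))  # ['C02T', 'G06T']
```

Joining the result with spaces yields the space-separated mismatch string accepted by the mutations and mismatches parameters.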

crisprzip.nucleic_acid.get_hybridization_energy(protospacer: str, offtarget_seq: str = None, mutations: str = '', weight: float | Tuple[float, float] = None) ndarray

Calculate the free energy cost of R-loop formation.

Parameters:
  • protospacer (str) –

    Full sequence of the protospacer/on-target. Can be provided in 3 formats:

    • 20 nts: 5’-target-3’. All nucleotides should be specified.

    • 23 nts: 5’-target-PAM-3’. The PAM should be specified or provided as ‘NGG’.

    • 24 nts: 5’-upstream_nt-target-PAM-3’. The upstream_nt can be specified or provided as ‘N’.

  • offtarget_seq (str) –

    Full sequence of the (off-)target. Can be provided in 3 formats:

    • 20 nts: 5’-target-3’. All nucleotides should be specified.

    • 23 nts: 5’-target-PAM-3’. The PAM should be specified or provided as ‘NGG’.

    • 24 nts: 5’-upstream_nt-target-PAM-3’. The upstream_nt can be specified or provided as ‘N’.

  • mutations (str) – Mismatch descriptors (in the form “A02T”) describing how the target deviates from the protospacer. Multiple mismatches should be space-separated. Empty by default, indicating no mismatches (=on-target hybridization energy).

  • weight (float or tuple[float, float], optional) – Optional weighting of the DNA opening energy and the RNA duplex energy. If None (default), no weighting is applied. If a float, both the DNA and RNA energies are multiplied by this weight. If a tuple of two floats, the first value is used as a multiplier for the DNA opening energy, and the second as a multiplier for the RNA-DNA hybridization energy.

Returns:

hybridization_energy – Free energies required to create an R-loop.

Return type:

numpy.ndarray

crisprzip.nucleic_acid.get_na_energies_cached(protospacer: str, offtarget_seq: str = None) Tuple[Tuple[float], Tuple[float]]

Calculate the DNA and RNA contributions to the R-loop cost (with caching).

Parameters:
  • protospacer (str) –

    Full sequence of the protospacer/on-target. Can be provided in 3 formats:

    • 20 nts: 5’-target-3’. All nucleotides should be specified.

    • 23 nts: 5’-target-PAM-3’. The PAM should be specified or provided as ‘NGG’.

    • 24 nts: 5’-upstream_nt-target-PAM-3’. The upstream_nt can be specified or provided as ‘N’.

  • offtarget_seq (str) –

    Full sequence of the (off-)target. Can be provided in 3 formats:

    • 20 nts: 5’-target-3’. All nucleotides should be specified.

    • 23 nts: 5’-target-PAM-3’. The PAM should be specified or provided as ‘NGG’.

    • 24 nts: 5’-upstream_nt-target-PAM-3’. The upstream_nt can be specified or provided as ‘N’.

Returns:

  • dna_opening_energy (tuple [float]) – Free energies required to open the DNA duplex

  • rna_duplex_energy (tuple [float]) – Free energies required to form the RNA duplex
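The caching pattern can be sketched with functools.lru_cache. Whether crisprzip uses lru_cache specifically is an assumption, and the energies below are placeholders rather than nearest-neighbor values. Returning tuples instead of arrays keeps cached results immutable, which is one plausible reason for the Tuple return type: a caller cannot modify a value that later calls will share.

```python
from functools import lru_cache
from typing import Optional, Tuple


@lru_cache(maxsize=None)
def cached_toy_energies(protospacer: str,
                        offtarget_seq: Optional[str] = None
                        ) -> Tuple[Tuple[float, ...], Tuple[float, ...]]:
    """Toy stand-in for an expensive energy calculation, memoized by arguments.

    Hypothetical example: energies are made-up per-step numbers
    (+1 per opened basepair, -1 per matching RNA:DNA basepair).
    """
    target = protospacer if offtarget_seq is None else offtarget_seq
    dna, rna = [0.0], [0.0]
    for ref_nt, tgt_nt in zip(protospacer, target):
        dna.append(dna[-1] + 1.0)
        rna.append(rna[-1] - (1.0 if ref_nt == tgt_nt else 0.0))
    return tuple(dna), tuple(rna)


first = cached_toy_energies("ACGT", "ACCT")
second = cached_toy_energies("ACGT", "ACCT")  # served from the cache
print(first is second)                         # True
print(cached_toy_energies.cache_info().hits)   # 1
```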

crisprzip.nucleic_acid.make_hybr_energy_func(protospacer: str, weight: float | Tuple[float, float] = None) Callable

Make a hybridization energy function.