Contribute Media
A thank you to everyone who makes this possible: Read More

tfmodisco-lite: an attribution based motif discovery algorithm | SciPy 2023

Description

An important problem in genomics is identifying the proteins that bind to DNA. Although many methods attempt to learn DNA motifs underlying protein binding as position-weight matrices (PWMs), these PWMs cannot faithfully represent real biology. For instance, a static PWM cannot describe a zinc-finger protein whose fingers can optionally include one-nucleotide spacing. TF-MoDISco is a framework for extracting motifs using attribution scores from a machine-learning model. The learned motifs and syntax overcome many of the limitations presented by PWM. I will describe the TF-MoDISco algorithm and showcase its efficient re-implementation, tfmodisco-lite.

Details

Improve this page