Codebook Data Page v2

"Datasets that accompany resubmission of journal manuscripts starting in September 2025."

Summary of changes

This file contains the changes applied in v2: CB_Summary_of_changes

Description

This page contains basic information about the Codebook data sets, including plasmid information, experiment metadata, processed data (peaks, etc.), and PWMs. Please refer to the publications for details. Raw data are available from SRA and GEO under accessions PRJEB78913 (ChIP), PRJEB76622 (GHT-SELEX), PRJEB61115 (HT-SELEX), and GSE275577 (PBM).


The changes relative to v1 (the bioRxiv version) are as follows (see Excel document above for details):

  • We fixed several sample mislabelings by examining k-mer enrichments between replicate SELEX experiments, and by searching for the recoded gene synthesis constructs in ChIP-seq reads. We relabeled affected experiments and PWMs.
  • Next, we revised the MAGIX pipeline to begin with a peak-calling process rather than genomic bins, and applied a threshold to the count of unique reads per peak. As a result, the GHT-SELEX peak locations and scores are "jittered" relative to the original versions.
  • Next, we recalculated the TOPs and CTOPs.
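
The replicate check mentioned in the first item above can be illustrated with a minimal sketch. This is not the Codebook pipeline: the function names, the k-mer length, the pseudocount, and the use of Jaccard overlap between top-enriched k-mers are all assumptions chosen for illustration. The idea is that two replicates of the same TF should share their most enriched k-mers, while a mislabeled sample should not.

```python
from collections import Counter
from math import log2


def kmer_counts(reads, k=6):
    """Count every k-mer made only of A/C/G/T across a list of reads."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def kmer_enrichment(selected, background, k=6, pseudocount=1.0):
    """Log2 ratio of k-mer frequency in selected reads vs. background reads.

    The pseudocount is a hypothetical smoothing choice, not a published setting.
    """
    sel = kmer_counts(selected, k)
    bg = kmer_counts(background, k)
    vocab = set(sel) | set(bg)
    sel_total = sum(sel.values()) + pseudocount * len(vocab)
    bg_total = sum(bg.values()) + pseudocount * len(vocab)
    return {
        kmer: log2(((sel[kmer] + pseudocount) / sel_total)
                   / ((bg[kmer] + pseudocount) / bg_total))
        for kmer in vocab
    }


def top_kmers(enrichment, n=10):
    """Return the set of the n most enriched k-mers."""
    ranked = sorted(enrichment, key=enrichment.get, reverse=True)
    return set(ranked[:n])


def replicate_agreement(enr_a, enr_b, n=10):
    """Jaccard overlap between the top-n k-mer sets of two experiments."""
    a, b = top_kmers(enr_a, n), top_kmers(enr_b, n)
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

In use, one would compute `kmer_enrichment` for each replicate against a shared background (for example, an unselected library) and flag replicate pairs whose `replicate_agreement` is unexpectedly low for manual review.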

Browsable Data

TFs and plasmids

ChIP-Seq

GHT-SELEX

PBM

HT-SELEX

PWMs and PWM scan results

Comparison data