Codebook Data Page v2

Datasets that accompany the 2025/2026 Codebook publications

Description

This page contains basic information about the Codebook datasets, including plasmid information, experiment metadata, processed data (peaks, etc.) and PWMs. Please refer to the publications for details. Raw data are available from SRA and GEO under accessions PRJEB78913 (ChIP), PRJEB76622 (GHT-SELEX), PRJEB61115 (HT-SELEX) and GSE275577 (PBM).


The changes relative to v1 (the bioRxiv version) are as follows (see Excel document for details):

  • We fixed several sample mislabelings by examining k-mer enrichments between replicate SELEX experiments, and by searching for the recoded gene synthesis constructs in ChIP-seq reads. We relabeled affected experiments and PWMs.
  • We revised the MAGIX pipeline to begin with a peak-calling step rather than genomic bins, and applied a threshold on the count of unique reads per peak. As a result, the GHT-SELEX peak locations and scores are "jittered" relative to the original versions.
  • We then recalculated the TOPs and CTOPs.
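
As a rough illustration of the replicate check mentioned in the first item above, the sketch below compares the k-mer profiles of two read sets and reports their Pearson correlation. It is not the Codebook procedure: the function names, the choice of k, and the use of plain k-mer frequencies (rather than enrichment over a background library) are all assumptions made for this example.

```python
from collections import Counter
from math import sqrt


def kmer_freqs(reads, k):
    """Relative frequency of every k-mer observed across a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def replicate_similarity(reads_a, reads_b, k=4):
    """Correlate k-mer frequency profiles of two read sets (hypothetical helper)."""
    freqs_a, freqs_b = kmer_freqs(reads_a, k), kmer_freqs(reads_b, k)
    kmers = sorted(set(freqs_a) | set(freqs_b))
    return pearson(
        [freqs_a.get(m, 0.0) for m in kmers],
        [freqs_b.get(m, 0.0) for m in kmers],
    )
```

Under these assumptions, a pair of experiments labeled as replicates of the same TF would be expected to score higher with each other than with experiments for other TFs; a labeled replicate that correlates better with a different TF's experiment would be a candidate mislabeling.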

Browsable Data

TFs and plasmids

ChIP-Seq

GHT-SELEX

PBM

HT-SELEX

PWMs and PWM scan results

ATAC-seq and H3K9me3 ChIP-seq data for selected Dark TFs

Comparison data