**Overview of GHT-SELEX Data**

This section provides an overview of the GHT-SELEX files utilized in the {Codebook, 2024} paper.

Genomic High-Throughput Systematic Evolution of Ligands by EXponential Enrichment (GHT-SELEX) assays were conducted for 392 transcription factors (TFs) as part of this project. The data generated from these assays is accessible here.

### Metadata

For detailed experimental information, please download "GHT_SELEX_Metadata.xlsx" from the project's website. This Excel file comprises 6 individual spreadsheets:

1. **README**: Contains column descriptions for the remaining 5 spreadsheets.
2. **GHT_Overview_TF**: Provides data for each of the 392 TFs, including the number of GHT-SELEX experiments performed and their success rates.
3. **GHT_Overview_DBD**: Offers insights into experiments conducted for different constructs across each TF, distinguishing between full-length constructs and DNA binding domain (DBD) constructs.
4. **GHT_Experiment_Info**: Contains details for each individual experiment.
5. **Controls_Info**: Contains information about individual control reads.
6. **MAGIX_Peaks**: Offers information about peak files generated using the novel MAGIX method (Najafabadi lab, McGill University).

### Data Download

The GHT-SELEX data can be downloaded from the project website, under the GHT-SELEX heading.

The data is organized into the following directories:

1. **Reads (TrimmedReads.tar.gz):**
   This directory contains trimmed sequencing reads from all GHT-SELEX experiments conducted in this project.
   GHT-SELEX experiments typically involve 3 to 4 cycles, and the reads are organized into sub-folders by SELEX cycle (Cycle1, Cycle2, Cycle3, Cycle4).
   Additionally, there is a "Control" subdirectory that houses control experiments.
   Depending on the sequencing method, reads are either single-end or paired-end.
   For further details, please consult the "GHT_SELEX_Metadata.xlsx" and the accompanying paper.

2. **Genome Coverage Reads (BigWigs.tar.gz):**
   This directory contains genome coverage reads in "BigWig" format.
   The reads for each experiment were aligned to the genome using Bowtie2, and the coverage maps were generated using deepTools bamCoverage.
   More information can be found in the Methods section of the paper.

3. **Peak Summits (Peaks_Summits_Toronto.tar.gz):**
   These files are in "BED" format and list the binding sites identified in each experiment. Each entry is the summit of a peak called by MACS2, indicating the most likely binding site of a TF.
   Each BED file consists of 5 columns: chromosome, start, end, peak_name, and peak_score.

4. **Narrow Peaks (Peaks_NarrowPeaks_Toronto.tar.gz):**
   These files are similar to the summit files but encompass a wider region for each peak, corresponding to the peak width. Each narrowPeak file contains 10 columns, providing information on chromosome, start, end, peak_name, peak_score, strand, signalValue, pValue, qValue, and peak_summit.

5. **MAGIX Peaks (Peaks_MAGIX_McGill.tar.gz):**
   Peaks were called with the novel "MAGIX" method, developed by the Najafabadi lab (McGill University), from experiments that met high-quality control standards. This results in a single peak file for each TF.
   Each file comprises 10 columns: chr, start, stop, name, coefficient.br, coefficient.ar, full_LL, reduced_LL, p-value, and false discovery rate (FDR).
   For a more comprehensive understanding, please refer to the accompanying paper.
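
As an illustration of the three peak file layouts described above, the following is a minimal Python sketch that reads tab-separated peak records into dictionaries keyed by the column names listed in this section. Several things here are assumptions rather than documented facts: the function names (`read_peaks`, `summit_position`, `significant_magix`) are hypothetical, the files are assumed to be plain tab-separated text, a header line is skipped only if one happens to be present, and the `peak_summit` column is assumed to follow the usual narrowPeak convention of an offset relative to `start`. The example records are made up for demonstration and are not taken from the data.

```python
import csv
import io
from typing import Dict, Iterator, List, Sequence, TextIO

# Column names as listed in this README (order assumed to match the files).
SUMMIT_COLUMNS = ["chromosome", "start", "end", "peak_name", "peak_score"]
NARROWPEAK_COLUMNS = SUMMIT_COLUMNS + [
    "strand", "signalValue", "pValue", "qValue", "peak_summit",
]
MAGIX_COLUMNS = [
    "chr", "start", "stop", "name", "coefficient.br", "coefficient.ar",
    "full_LL", "reduced_LL", "p-value", "FDR",
]


def read_peaks(handle: TextIO, columns: Sequence[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per record, skipping blank, comment, and header lines."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row or row[0].startswith(("#", "track", "browser")):
            continue
        if row[0] == columns[0]:
            # Looks like a header line (assumption: headers repeat the column names).
            continue
        if len(row) != len(columns):
            raise ValueError(f"expected {len(columns)} columns, got {len(row)}: {row}")
        yield dict(zip(columns, row))


def summit_position(peak: Dict[str, str]) -> int:
    """Absolute summit coordinate, assuming peak_summit is an offset from start."""
    return int(peak["start"]) + int(peak["peak_summit"])


def significant_magix(peaks: List[Dict[str, str]], max_fdr: float = 0.05) -> List[Dict[str, str]]:
    """Keep MAGIX peaks whose FDR is at or below max_fdr (threshold is illustrative)."""
    return [p for p in peaks if float(p["FDR"]) <= max_fdr]


if __name__ == "__main__":
    # Made-up narrowPeak record, for demonstration only.
    narrowpeak_text = "chr1\t100\t300\tpeak_1\t55\t.\t7.2\t12.1\t9.8\t80\n"
    for peak in read_peaks(io.StringIO(narrowpeak_text), NARROWPEAK_COLUMNS):
        print(peak["peak_name"], peak["chromosome"], summit_position(peak))
```

To read a real file, the same function can be given an open file handle, for example `read_peaks(open(path), SUMMIT_COLUMNS)` for a summit BED file, where `path` points at a file extracted from the corresponding archive. The 0.05 FDR cutoff is only a placeholder; an appropriate threshold should be chosen with reference to the accompanying paper.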