Outlier scan
============

``flexsweep scan`` is a standalone positive-selection scan independent of the
CNN/DANN pipeline. It computes one or more selection statistics from VCF data
and assigns each locus an empirical p-value based on the genome-wide
distribution of that statistic. No neutral simulations are required.

How it works
------------

Each statistic is computed at its natural resolution — per-SNP statistics
(iHS, nSL, etc.) produce one value per polymorphic site; sliding-window
statistics (H12, LASSI, etc.) produce one value per window. Once all contigs
are processed, raw values from the entire genome are pooled and ranked
together to produce genome-wide empirical p-values.

Empirical p-values
~~~~~~~~~~~~~~~~~~

For each statistic an empirical p-value is assigned following the
empirical outlier approach (Akey 2009):

.. math::

   p_i = \frac{\mathrm{rank}(-x_i)}{N_{\mathrm{valid}}}

where rank is computed on the negative value so that the **largest** statistic
gets the **smallest** p-value (rank 1 → p ≈ 0, outlier). :math:`N_{\mathrm{valid}}`
is the count of non-NaN loci. NaN values are excluded from the ranking and do
not contribute to :math:`N_{\mathrm{valid}}`.

For **signed statistics** (iHS, nSL, Tajima's D, Fay-Wu H, Zeng E), the
absolute value :math:`|x_i|` is taken before negation, so the rank is computed
on :math:`-|x_i|` and extreme values at both tails are flagged as outliers.

The output column is named ``{stat}_pvalue``. A value close to 0 means the
locus is among the most extreme in the genome for that statistic.
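
As a rough illustration of the formula above, the following pure-Python sketch
computes rank-based empirical p-values. It is not flexsweep's implementation:
the helper name ``empirical_pvalues`` is hypothetical, and tied values are
assumed to share the largest rank in their group, which the text above does
not specify.

.. code-block:: python

    import math
    from bisect import bisect_left

    def empirical_pvalues(values, signed=False):
        """Return rank(-x_i) / N_valid for each value; NaN stays NaN."""
        # Signed statistics are ranked on |x| so both tails count as extreme.
        scores = [abs(v) if signed else v for v in values]
        valid = sorted(s for s in scores if not math.isnan(s))
        n_valid = len(valid)
        pvals = []
        for s in scores:
            if math.isnan(s):
                pvals.append(math.nan)  # excluded from ranking and from N_valid
            else:
                # Number of valid scores >= s, i.e. the descending rank of s
                # (assumption: ties all receive the largest rank in the group).
                rank = n_valid - bisect_left(valid, s)
                pvals.append(rank / n_valid)
        return pvals

    pvals = empirical_pvalues([0.5, 3.0, float("nan"), 1.0, 2.0])
    # pvals is [1.0, 0.25, nan, 0.75, 0.5]: the largest value gets 1 / N_valid

With only four valid values the smallest attainable p-value is 1/4, so
p-values computed from very few loci are necessarily coarse.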

.. note::

   The empirical p-value is not an analytical p-value. It reflects the
   position of a locus within *this* genome's distribution; it does not
   correspond to a controlled false-positive rate.


Available statistics
--------------------

Per-SNP statistics
~~~~~~~~~~~~~~~~~~

One score per polymorphic site. No sliding window.

.. list-table::
   :header-rows: 1
   :widths: 15 20 65

   * - Key
     - Rank column
     - Description
   * - ``ihs``
     - ``ihs``
     - Integrated haplotype score (Voight et al. 2006). Detects incomplete
       hard sweeps via extended haplotype homozygosity. Normalized within
       DAF bins. Configurable: ``min_maf`` (0.05), ``include_edges`` (False),
       ``gap_scale`` (20000), ``max_gap`` (200000).
   * - ``nsl``
     - ``nsl``
     - Number of segregating sites by length (Ferrer-Admetlla et al. 2014).
       Robust alternative to iHS; no genetic map required. Normalized within
       DAF bins. Configurable: ``min_maf`` (0.05).
   * - ``isafe``
     - ``isafe``
     - Identifying the favored allele in a sweep (Akbari et al. 2018).
       Pinpoints the causal mutation within a detected sweep region. Runs on
       non-overlapping regions. Configurable: ``region_size_bp`` (1000000),
       ``isafe_window`` (300), ``isafe_step`` (150), ``top_k`` (1),
       ``max_rank`` (15).
   * - ``dind``
     - ``dind``
     - Derived intra-allelic nucleotide diversity ratio (Barreiro et al.
       2009). Configurable: ``window_size`` (50000), ``min_focal_freq``
       (0.25), ``max_focal_freq`` (0.95).
   * - ``high_freq``
     - ``high_freq``
     - Frequency of high-frequency derived variants in a focal window
       (Lauterbur et al. 2023). Configurable: ``window_size`` (50000),
       ``min_focal_freq`` (0.25), ``max_focal_freq`` (0.95).
   * - ``low_freq``
     - ``low_freq``
     - Frequency of low-frequency derived variants in a focal window
       (Lauterbur et al. 2023). Configurable: ``window_size`` (50000),
       ``min_focal_freq`` (0.25), ``max_focal_freq`` (0.95).
   * - ``s_ratio``
     - ``s_ratio``
     - Ratio of segregating sites on derived vs. ancestral haplotypes
       (Lauterbur et al. 2023). Configurable: ``window_size`` (50000),
       ``min_focal_freq`` (0.25), ``max_focal_freq`` (0.95).
   * - ``hapdaf_o``
     - ``hapdaf_o``
     - Haplotype-derived allele frequency, other background (Lauterbur et al.
       2023). Configurable: ``window_size`` (50000), ``min_focal_freq``
       (0.25), ``max_focal_freq`` (0.95), ``max_ancest_freq`` (0.25),
       ``min_tot_freq`` (0.25).
   * - ``hapdaf_s``
     - ``hapdaf_s``
     - Haplotype-derived allele frequency, sweep background (Lauterbur et al.
       2023). Stricter ancestral-frequency thresholds than ``hapdaf_o``.
       Configurable: ``window_size`` (50000), ``min_focal_freq`` (0.25),
       ``max_focal_freq`` (0.95), ``max_ancest_freq`` (0.10),
       ``min_tot_freq`` (0.10).
   * - ``hscan``
     - ``hscan``
     - Average pairwise haplotype homozygosity tract length H(x) (Messer
       2015). Measures the mean shared haplotype block length across all
       sample pairs; detects hard and soft sweeps. Configurable:
       ``max_gap`` (200000), ``dist_mode`` (0), ``hscan_step`` (1).
       Use ``hscan_step`` (not ``step``) to control scan resolution;
       ``step`` is a shared SNP-window parameter and is not used by hscan.

Sliding SNP-window statistics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

One score per window of a fixed number of SNPs.

.. list-table::
   :header-rows: 1
   :widths: 15 12 12 12 49

   * - Key
     - Rank column
     - Default window
     - Default step
     - Description
   * - ``haf``
     - ``haf``
     - 201 SNPs
     - 10 SNPs
     - Haplotype allele frequency (Ronen et al. 2015). Mean pairwise
       haplotype similarity across a SNP window. Configurable via shared
       ``w_size`` and ``step``.
   * - ``h12``
     - ``h12``
     - 200 SNPs
     - 10 SNPs
     - H12 haplotype homozygosity (Garud et al. 2015). Combines the two most
       common haplotype frequencies. Configurable via shared ``w_size`` and
       ``step``.
   * - ``garud``
     - ``h12``
     - 200 SNPs
     - 10 SNPs
     - Full Garud statistics: H1, H12, H2/H1 (Garud et al. 2015).
       Configurable via shared ``w_size`` and ``step``.
   * - ``lassi``
     - ``T_m``
     - 201 SNPs
     - 10 SNPs
     - Composite likelihood sweep scan using the haplotype frequency spectrum
       (Harris & DeGiorgio 2020). Configurable: ``K_truncation`` (10),
       ``sweep_mode`` (4), and shared ``w_size``, ``step``.
   * - ``lassip``
     - ``Lambda``
     - 201 SNPs
     - 10 SNPs
     - Spatially-aware saltiLASSI (DeGiorgio & Szpiech 2022). Configurable:
       ``K_truncation`` (10), ``sweep_mode`` (4), ``max_extend`` (100000),
       ``n_A`` (100), and shared ``w_size``, ``step``.
   * - ``raisd``
     - ``mu_total``
     - 50 SNPs
     - 1 SNP
     - RAiSD μ composite statistic combining SFS, SNP density variation, and
       LD (Alachiotis & Pavlidis 2018). Configurable: ``window_size`` (50).
       Additional output columns: ``mu_var``, ``mu_sfs``, ``mu_ld``.

Sliding bp-window statistics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

One score per physical window. All configurable via shared ``w_size_bp``
(default 1 Mb) and ``step_bp`` (default 10 kb), except ``omega``, ``zns``,
``beta``, and ``ncd`` which have their own narrower defaults.

.. list-table::
   :header-rows: 1
   :widths: 15 10 10 10 55

   * - Key
     - Rank column
     - Default window
     - Default step
     - Description
   * - ``tajima_d``
     - ``tajima_d``
     - 1 Mb
     - 10 kb
     - Tajima's D (Tajima 1989). SFS-based test; negative values signal
       directional selection. Signed stat — ranked by ``abs(value)``.
   * - ``pi``
     - ``pi``
     - 1 Mb
     - 10 kb
     - Nucleotide diversity θ\ :sub:`π`.
   * - ``theta_w``
     - ``theta_w``
     - 1 Mb
     - 10 kb
     - Watterson's θ\ :sub:`W`.
   * - ``fay_wu_h``
     - ``fay_wu_h``
     - 1 Mb
     - 10 kb
     - Fay & Wu's H (Fay & Wu 2000). Sensitive to high-frequency derived
       alleles. Signed stat — ranked by ``abs(value)``.
   * - ``zeng_e``
     - ``zeng_e``
     - 1 Mb
     - 10 kb
     - Zeng's E (Zeng et al. 2006). Signed stat — ranked by ``abs(value)``.
   * - ``achaz_y``
     - ``achaz_y``
     - 1 Mb
     - 10 kb
     - Achaz Y (Achaz 2009). Robust to sequencing errors.
   * - ``fuli_d``
     - ``fuli_d``
     - 1 Mb
     - 10 kb
     - Fu & Li's D (Fu & Li 1993).
   * - ``fuli_d_star``
     - ``fuli_d_star``
     - 1 Mb
     - 10 kb
     - Fu & Li's D* (no outgroup required).
   * - ``fuli_f``
     - ``fuli_f``
     - 1 Mb
     - 10 kb
     - Fu & Li's F (Fu & Li 1993).
   * - ``fuli_f_star``
     - ``fuli_f_star``
     - 1 Mb
     - 10 kb
     - Fu & Li's F* (no outgroup required).
   * - ``neutrality``
     - ``tajima_d``
     - 1 Mb
     - 10 kb
     - Composite: Tajima's D, π, θ\ :sub:`W`, Fay-Wu H in one pass.
       Ranked by ``abs(tajima_d)``.
   * - ``omega``
     - ``omega_max``
     - 100 kb
     - 10 kb
     - Kim & Nielsen's ω (Kim & Nielsen 2004). LD patterns around a putative
       sweep centre.
   * - ``zns``
     - ``zns``
     - 100 kb
     - 10 kb
     - Kelly's Z\ :sub:`nS` (Kelly 1997). Mean pairwise r² across all SNP
       pairs in a window.
   * - ``beta``
     - ``beta1``
     - 50 kb
     - 5 kb
     - Beta1 statistic for balancing selection (Siewert & Voight 2017).
       Configurable: ``m`` (0.1).
   * - ``ncd``
     - ``ncd1``
     - 3 kb
     - 1.5 kb
     - NCD1 for balancing selection (Bitarello et al. 2018). Configurable:
       ``tf`` (0.5), ``w`` (3000), ``minIS`` (2).

Window mode
~~~~~~~~~~~

SNP-count windows are required for H12, LASSI, saltiLASSI, and RAiSD —
physical windows confound SNP density with haplotype diversity for those
statistics. SFS-based statistics (Tajima's D, Fay-Wu H, etc.) use physical
bp windows by default.

With the default ``window_mode="auto"``, each statistic uses its built-in
mode. Pass ``window_mode="snp"`` or ``window_mode="bp"`` to force a uniform
mode across all window statistics.
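
The difference between the two window modes can be sketched in a few lines of
plain Python. This is an illustration only, not flexsweep code: the helper
names ``snp_windows`` and ``bp_windows`` are hypothetical, physical windows
are assumed to be half-open intervals, and the first physical window is
assumed to start at the first SNP.

.. code-block:: python

    from bisect import bisect_left

    def snp_windows(positions, w_size, step):
        """Yield (first_pos, last_pos, n_snps) for windows of a fixed SNP count."""
        for start in range(0, len(positions) - w_size + 1, step):
            chunk = positions[start:start + w_size]
            yield chunk[0], chunk[-1], len(chunk)

    def bp_windows(positions, w_size_bp, step_bp):
        """Yield (start_bp, end_bp, n_snps) for physical windows [start, end)."""
        start = positions[0]
        while start <= positions[-1]:
            end = start + w_size_bp
            # positions is sorted, so two binary searches count the SNPs inside.
            n_snps = bisect_left(positions, end) - bisect_left(positions, start)
            yield start, end, n_snps
            start += step_bp

    # Toy SNP positions: dense at both ends, sparse in the middle.
    positions = [100, 150, 180, 200, 5000, 9000, 9050, 9100]
    snp = list(snp_windows(positions, w_size=4, step=2))
    bp = list(bp_windows(positions, w_size_bp=1000, step_bp=3000))

Every SNP-count window here holds exactly four SNPs but spans a different
physical length, while the physical windows hold 4, 0, 0 and 1 SNPs. That
varying SNP count is the density effect that haplotype-based statistics avoid
by using SNP-count windows.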


Normalization
-------------

Statistics sensitive to allele frequency (iHS, nSL, DIND, high_freq,
low_freq, s_ratio, hapdaf_o, hapdaf_s) are z-scored within genome-wide DAF
bins before p-values are computed. This removes the frequency-dependent bias
that would otherwise cause high-frequency SNPs to dominate outlier lists.

**DAF-only normalization** (default):

1. Compute 50 equal-frequency DAF bin edges over the genome-wide DAF
   distribution.
2. Assign each SNP to a bin.
3. Within each bin: z-score = (value − mean) / std. Bins with fewer than 2
   SNPs are left as NaN.
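
The three steps above can be sketched in pure Python as follows. This is a
minimal illustration under stated assumptions, not the flexsweep code path:
the function name ``daf_bin_zscores`` is hypothetical, bin edges are taken as
simple order-statistic quantiles, the sample standard deviation is used, and
bins with zero variance are also left as NaN.

.. code-block:: python

    import math
    import statistics
    from bisect import bisect_right

    def daf_bin_zscores(daf, values, n_bins=50):
        """Z-score each value within its equal-frequency DAF bin."""
        # Step 1: interior bin edges from the genome-wide DAF distribution.
        sorted_daf = sorted(daf)
        n = len(sorted_daf)
        edges = [sorted_daf[(k * n) // n_bins] for k in range(1, n_bins)]

        # Step 2: assign each SNP (by index) to a bin; skip NaN statistics.
        bins = {}
        for i, (d, v) in enumerate(zip(daf, values)):
            if not math.isnan(v):
                bins.setdefault(bisect_right(edges, d), []).append(i)

        # Step 3: z-score within each bin; bins with < 2 SNPs stay NaN.
        z = [math.nan] * len(values)
        for members in bins.values():
            if len(members) < 2:
                continue
            vals = [values[i] for i in members]
            mean = statistics.fmean(vals)
            std = statistics.stdev(vals)  # assumption: sample std (n - 1)
            if std == 0:
                continue  # assumption: constant bins are left as NaN
            for i in members:
                z[i] = (values[i] - mean) / std
        return z

    daf = [0.1, 0.1, 0.1, 0.9, 0.9, 0.9]
    raw = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
    z = daf_bin_zscores(daf, raw, n_bins=2)
    # z is [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0]

In the toy input the high-frequency SNPs have raw values ten times larger, yet
after binning both groups end up on the same scale.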

**Joint DAF × recombination rate normalization** (when ``recombination_map``
and ``n_r_bins`` are both set):

The genome is additionally stratified by recombination rate. Each SNP is
assigned to a (DAF bin, recomb rate bin) cell and z-scored within that cell.
This further reduces false positives in low-recombination regions (Johnson et
al. approach). To enable it, pass both ``recombination_map`` and
``n_r_bins`` (typically ``n_r_bins=10``):

.. code-block:: python

    results = scan(
        "data/vcf/",
        "results/YRI",
        stats=["ihs", "nsl"],
        recombination_map="data/decode_sexavg_2019.txt.gz",
        n_daf_bins=50,
        n_r_bins=10,
    )

.. note::

   Passing only ``recombination_map`` without ``n_r_bins`` uses DAF-only
   normalization with genetic-distance windows for T3 stats (dind, hapdaf,
   s_ratio). Set ``n_r_bins`` explicitly to enable joint normalization.
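
To make the joint stratification concrete, here is a small pure-Python sketch
of z-scoring within (DAF bin, recombination-rate bin) cells. The names
``quantile_edges`` and ``joint_zscores`` are hypothetical, recombination-rate
bins are assumed to be equal-frequency like the DAF bins, and the inputs are
assumed to be NaN-free; flexsweep's actual binning may differ.

.. code-block:: python

    import math
    import statistics
    from bisect import bisect_right
    from collections import defaultdict

    def quantile_edges(xs, n_bins):
        """Interior equal-frequency bin edges (n_bins - 1 cut points)."""
        s = sorted(xs)
        return [s[(k * len(s)) // n_bins] for k in range(1, n_bins)]

    def joint_zscores(daf, rec_rate, values, n_daf_bins=50, n_r_bins=10):
        """Z-score each value within its (DAF bin, recomb-rate bin) cell."""
        daf_edges = quantile_edges(daf, n_daf_bins)
        r_edges = quantile_edges(rec_rate, n_r_bins)

        # Group SNP indices by their two-dimensional cell.
        cells = defaultdict(list)
        for i, (d, r) in enumerate(zip(daf, rec_rate)):
            cell = (bisect_right(daf_edges, d), bisect_right(r_edges, r))
            cells[cell].append(i)

        z = [math.nan] * len(values)
        for members in cells.values():
            if len(members) < 2:
                continue  # too few SNPs to estimate a spread
            vals = [values[i] for i in members]
            mean = statistics.fmean(vals)
            std = statistics.stdev(vals)  # assumption: sample std (n - 1)
            if std > 0:
                for i in members:
                    z[i] = (values[i] - mean) / std
        return z

    # Two DAF classes crossed with two recombination-rate classes.
    daf = [0.1, 0.1, 0.9, 0.9, 0.1, 0.1, 0.9, 0.9]
    rec = [0.5, 0.5, 0.5, 0.5, 2.0, 2.0, 2.0, 2.0]
    raw = [1.0, 3.0, 10.0, 30.0, 100.0, 300.0, 1000.0, 3000.0]
    z = joint_zscores(daf, rec, raw, n_daf_bins=2, n_r_bins=2)

Each of the four cells is standardized on its own, so the large raw values in
the high-recombination cells no longer dominate. Note that crossing 50 DAF
bins with 10 rate bins produces up to 500 cells, so sparse cells are more
likely than with DAF-only binning.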


Multi-contig usage
------------------

Pass a directory to ``--vcf_path`` to process all ``*.vcf.gz`` / ``*.bcf.gz``
files. The scan uses a two-step approach:

1. Each contig is processed independently; raw unranked values are
   produced for every requested statistic.

2. Results from all contigs are concatenated per statistic.
   Normalization and empirical p-values are computed across all contigs
   together, ensuring p-values reflect the true genome-wide distribution.
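
The reason for pooling before ranking can be seen in a toy example. The sketch
below is illustrative only: the helper ``descending_rank_pvalues`` is
hypothetical, the numbers are made up, and the inputs are assumed to contain
no NaN values.

.. code-block:: python

    from bisect import bisect_left

    def descending_rank_pvalues(values):
        """p_i = (number of values >= x_i) / N for a NaN-free list."""
        ordered = sorted(values)
        n = len(ordered)
        return [(n - bisect_left(ordered, v)) / n for v in values]

    # Step 1: raw, unranked values per contig (made-up numbers).
    per_contig = {
        "chr1": [0.2, 0.4, 0.3],
        "chr2": [2.5, 3.1, 2.8],
    }

    # Step 2: concatenate, then rank once across the whole genome.
    pooled = [v for values in per_contig.values() for v in values]
    genome_wide = descending_rank_pvalues(pooled)

    # Ranking chr1 on its own would give its top value p = 1/3, although
    # four of the six genome-wide values are at least as large (p = 4/6).
    chr1_only = descending_rank_pvalues(per_contig["chr1"])

Ranking per contig would flag the top locus of every contig as an outlier,
whether or not it is extreme relative to the rest of the genome.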

.. code-block:: bash

    flexsweep scan \
        --vcf_path data/vcf/ \
        --out_prefix results/YRI \
        --stats ihs,nsl,h12,lassip \
        --recombination_map data/decode_sexavg_2019.txt.gz \
        --nthreads 4


CLI reference
-------------

.. code-block:: text

    flexsweep scan [OPTIONS]

.. list-table::
   :header-rows: 1
   :widths: 25 12 63

   * - Option
     - Default
     - Description
   * - ``--vcf_path PATH``
     - required
     - Directory of ``*.vcf.gz`` files (one per chromosome/contig), or a
       single VCF file.
   * - ``--out_prefix PREFIX``
     - required
     - Output prefix. Writes ``{PREFIX}.{stat}.txt`` for each stat.
   * - ``--stats LIST``
     - required
     - Comma-separated stat keys, e.g. ``ihs,nsl,h12,lassip``.
   * - ``--w_size INT``
     - 201
     - SNP-count window size for SNP-mode stats.
   * - ``--step INT``
     - 10
     - SNP step size for SNP-mode stats.
   * - ``--w_size_bp INT``
     - 1000000
     - Physical window size (bp) for bp-mode stats.
   * - ``--step_bp INT``
     - 10000
     - Physical step size (bp) for bp-mode stats.
   * - ``--window_mode``
     - auto
     - ``auto`` uses per-stat defaults; ``snp`` forces SNP-count windows;
       ``bp`` forces physical bp windows for all window stats.
   * - ``--min_maf FLOAT``
     - 0.05
     - Minimum minor allele frequency for iHS and nSL.
   * - ``--window_size INT``
     - 50000
     - Focal window size (bp) for per-SNP stats (dind, hapdaf_o,
       hapdaf_s, s_ratio, high_freq, low_freq).
   * - ``--recombination_map PATH``
     - None
     - TSV recombination map (chr, start, end, cm_mb, cm). Enables
       genetic-distance windows for T3 stats.
   * - ``--n_daf_bins INT``
     - 50
     - Number of equal-frequency DAF bins for normalization.
   * - ``--n_r_bins INT``
     - None
     - Number of recombination rate bins for joint DAF × recomb
       normalization. Set to 10 to match Johnson et al. Requires
       ``--recombination_map``.
   * - ``--max_extend FLOAT``
     - 100000
     - saltiLASSI spatial decay cutoff in bp.
   * - ``--K_truncation INT``
     - 10
     - K truncation for LASSI/saltiLASSI (number of HFS classes).
   * - ``--sweep_mode INT``
     - 4
     - Sweep spectral model for LASSI/saltiLASSI (1–5; 4 = Gaussian decay).
   * - ``--raisd_window INT``
     - 50
     - SNP window size for RAiSD.
   * - ``--nthreads INT``
     - 1
     - Number of parallel workers.

Examples:

.. code-block:: bash

    # iHS + nSL on a single chromosome
    flexsweep scan \
        --vcf_path YRI.chr22.vcf.gz \
        --out_prefix results/YRI.chr22 \
        --stats ihs,nsl

    # H12 + saltiLASSI + RAiSD with custom SNP window
    flexsweep scan \
        --vcf_path YRI.chr22.vcf.gz \
        --out_prefix results/YRI.chr22 \
        --stats h12,lassip,raisd \
        --w_size 400 \
        --nthreads 4

    # SFS statistics using physical bp windows
    flexsweep scan \
        --vcf_path YRI.chr22.vcf.gz \
        --out_prefix results/YRI.chr22 \
        --stats tajima_d,fay_wu_h,zeng_e,omega

    # DIND + HapDAF with a larger focal window
    flexsweep scan \
        --vcf_path YRI.chr22.vcf.gz \
        --out_prefix results/YRI.chr22 \
        --stats dind,hapdaf_o,hapdaf_s \
        --window_size 100000 \
        --recombination_map data/decode_sexavg_2019.txt.gz

    # Genome-wide scan — joint DAF × recomb normalization
    flexsweep scan \
        --vcf_path data/vcf/ \
        --out_prefix results/YRI \
        --stats ihs,nsl,h12,lassip \
        --recombination_map data/decode_sexavg_2019.txt.gz \
        --n_r_bins 10 \
        --nthreads 4


Python API
----------

.. code-block:: python

    from flexsweep.scan import scan, available_stats, stat_params

    # List all available stat keys
    print(available_stats())

    # Inspect default parameters for all stats
    stat_params()

    # Inspect a single stat — shows rank_col, resolution, window_mode,
    # default_window, default_step, shared_params, and stat_params
    stat_params("raisd")
    # {'raisd': {'rank_col': 'mu_total', 'resolution': 'window',
    #            'window_mode': 'snp', 'default_window': '50 SNPs',
    #            'default_step': '10 SNPs',
    #            'stat_params': {'window_size': 50}, ...}}

    stat_params("hscan")
    # {'hscan': {'rank_col': 'hscan', 'resolution': 'snp',
    #            'window_mode': 'n/a (per-SNP stat)',
    #            'stat_params': {'max_gap': 200000, 'dist_mode': 0,
    #                            'hscan_step': 1}, ...}}

    # Basic scan
    results = scan(
        "data/vcf/",
        "results/YRI",
        stats=["ihs", "nsl", "h12", "lassip"],
        min_maf=0.05,
        recombination_map="data/decode_sexavg_2019.txt.gz",
        nthreads=4,
    )
    # results["ihs"]    → Polars DataFrame, SNP resolution, ihs_pvalue column
    # results["lassip"] → Polars DataFrame, window resolution, Lambda_pvalue column

    # Joint DAF × recomb normalization
    results = scan(
        "data/vcf/",
        "results/YRI",
        stats=["ihs", "nsl"],
        recombination_map="data/decode_sexavg_2019.txt.gz",
        n_daf_bins=50,
        n_r_bins=10,
        nthreads=4,
    )

    # Per-stat parameter overrides via config dict
    results = scan(
        "data/vcf/",
        "results/YRI",
        stats=["lassip", "raisd", "hscan"],
        config={
            "lassip": {"max_extend": 5e4, "K_truncation": 15},
            "raisd":  {"window_size": 100},
            "hscan":  {"hscan_step": 5, "max_gap": 100_000},
        },
        nthreads=4,
    )


Output format
-------------

Each statistic writes one tab-separated file ``{out_prefix}.{stat}.txt``:

.. list-table::
   :header-rows: 1
   :widths: 20 80

   * - Column
     - Description
   * - ``chrom``
     - Chromosome/contig name from VCF.
   * - ``pos``
     - Genomic position (bp). SNP stats: SNP position. Window stats: centre
       of window.
   * - ``daf``
     - Derived allele frequency (iHS, nSL, DIND, HapDAF, s_ratio, iSAFE).
   * - ``n_snps``
     - Number of SNPs in window (window stats only).
   * - ``{stat_col}``
     - Main statistic value (after DAF normalization for iHS/nSL/dind/hapdaf/
       s_ratio).
   * - ``{stat_col}_pvalue``
     - Empirical p-value: rank(−value) / N_valid. Range (0, 1]; smaller =
       more extreme. Signed stats (iHS, nSL, Tajima's D, Fay-Wu H, Zeng E)
       are ranked by ``abs(value)``.

Additional columns vary by statistic (e.g., ``h2_h1`` for garud, ``m`` and
``A`` for lassip, ``mu_var``, ``mu_sfs``, ``mu_ld`` for raisd).
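
A saved file can be filtered for outliers with the standard library alone. The
sketch below uses an in-memory stand-in for ``results/YRI.ihs.txt`` with
made-up rows. The helper ``outliers`` is hypothetical, and the exact column
order and NaN spelling in real output files are assumptions.

.. code-block:: python

    import csv
    import io

    # Stand-in for open("results/YRI.ihs.txt"); the rows are made-up values.
    tsv = io.StringIO(
        "chrom\tpos\tdaf\tihs\tihs_pvalue\n"
        "22\t17000100\t0.42\t3.91\t0.0004\n"
        "22\t17000950\t0.18\t-0.35\t0.7120\n"
        "22\t17002210\t0.77\t-3.02\t0.0081\n"
    )

    def outliers(handle, stat_col, max_p=0.01):
        """Return (chrom, pos, value, p) for rows with p-value <= max_p."""
        hits = []
        for row in csv.DictReader(handle, delimiter="\t"):
            p = float(row[f"{stat_col}_pvalue"])
            # A p-value parsed as NaN compares False here and is skipped.
            if p <= max_p:
                hits.append(
                    (row["chrom"], int(row["pos"]), float(row[stat_col]), p)
                )
        # Most extreme loci first.
        return sorted(hits, key=lambda hit: hit[3])

    hits = outliers(tsv, "ihs")

Because iHS is a signed statistic ranked by absolute value, both the strongly
positive and the strongly negative locus pass the filter.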


Scan results visualization
---------------------------

Use ``plot_scan`` from ``flexsweep.utils`` to generate Manhattan-style or
regional zoom plots directly from ``scan()`` output or saved TSV files.

**Genome-wide plot:**

.. code-block:: python

    from flexsweep.utils import plot_scan

    # When passing a scan() dict, stat_cols is resolved automatically
    # from STAT_REGISTRY (e.g. "raisd" → "mu_total", "lassi" → "T_m").
    # No need to specify stat_cols.

    # Single statistic — raw values, top 1% highlighted
    plot_scan({"ihs": results["ihs"]}, out="results/YRI.ihs.png")

    # Single statistic — empirical p-values (-log10 scale)
    plot_scan({"ihs": results["ihs"]}, pvalue=True, out="results/YRI.ihs.png")

    # Stacked multi-statistic panels — rank columns resolved automatically
    plot_scan(
        {k: results[k] for k in ["ihs", "h12", "lassip", "raisd"]},
        pvalue=True,
        out="results/YRI.multi.png",
    )
    # Plots: ihs, h12, Lambda (lassip), mu_total (raisd)

**Regional zoom plot** (raw + p-value side by side):

.. code-block:: python

    plot_scan(
        {k: results[k] for k in ["ihs", "lassip"]},
        chrom="22",
        center=17_000_000,
        window_bp=500_000,
        out="results/YRI.zoom.png",
    )

**From saved TSV files** (stat_cols must be provided explicitly):

.. code-block:: python

    plot_scan(
        ["results/YRI.ihs.txt", "results/YRI.lassip.txt"],
        stat_cols=["ihs", "Lambda"],   # must match column name in file
        pvalue=True,
        out="results/YRI.multi.png",
    )

``plot_scan`` parameters:

.. list-table::
   :header-rows: 1
   :widths: 20 12 68

   * - Parameter
     - Default
     - Description
   * - ``stats``
     - required
     - ``dict`` from ``scan()``, a single TSV path, or a list of TSV paths.
   * - ``stat_cols``
     - None
     - Stat column name(s). Required when ``stats`` is a file path / list.
       Defaults to all keys when ``stats`` is a dict.
   * - ``pvalue``
     - False
     - If True, plot :math:`-\log_{10}(p_{\mathrm{emp}})` with threshold
       lines at p = 0.01 and p = 0.001. If False, plot raw values with
       outliers highlighted.
   * - ``top_pct``
     - 0.01
     - Fraction of loci highlighted as outliers in raw mode.
   * - ``chrom``
     - None
     - Chromosome for zoom mode. Provide together with ``center``.
   * - ``center``
     - None
     - Centre position (bp) for zoom mode.
   * - ``window_bp``
     - 500000
     - Half-window in bp for zoom mode (±500 kb around ``center``).
   * - ``out``
     - None
     - Save path. If None, shows interactively.
   * - ``figsize``
     - None
     - Figure size tuple. Defaults to (14, 4) genome-wide, (10, 2.5×n) zoom.
   * - ``sharey``
     - False
     - Share y-axis across stacked panels.
   * - ``threshold_lines``
     - None
     - List of ``(y_value, linestyle, label)`` for horizontal lines in
       p-value mode. Pass ``[]`` to suppress defaults.
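
Custom threshold lines can be built from p-value cutoffs with a small helper.
This sketch assumes that ``y_value`` is expressed on the plotted
:math:`-\log_{10}(p)` scale and that ``linestyle`` is a Matplotlib-style
string. Neither is confirmed by the table above, and ``threshold_lines_for``
is a hypothetical name.

.. code-block:: python

    import math

    def threshold_lines_for(p_cutoffs, linestyle="--"):
        """Build (y_value, linestyle, label) tuples on the -log10(p) scale."""
        return [(-math.log10(p), linestyle, f"p = {p:g}") for p in p_cutoffs]

    # Cutoffs at p = 0.05 and p = 0.001 instead of the default lines.
    lines = threshold_lines_for([0.05, 0.001])

The resulting list could then be passed as ``threshold_lines=lines`` together
with ``pvalue=True``.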


References
----------

**iHS**
  Voight, B.F., Kudaravalli, S., Wen, X. and Pritchard, J.K. (2006) A map of recent
  positive selection in the human genome. *PLOS Biology*, 4, e72.

**nSL**
  Ferrer-Admetlla, A., Liang, M., Korneliussen, T. and Nielsen, R. (2014) On detecting
  incomplete soft or hard selective sweeps using haplotype structure. *Molecular Biology
  and Evolution*, 31, 1275–1286.

**iSAFE**
  Akbari, A., Vitti, J.J., Iranmehr, A., Bakhtiari, M., Sabeti, P.C., Mirarab, S. and
  Bafna, V. (2018) Identifying the favored mutation in a positive selective sweep.
  *Nature Methods*, 15, 183–185.

**DIND**
  Barreiro, L.B., Henriques, R., Soares, M.J., Oliveira, J., Gasche, C., … and
  Quintana-Murci, L. (2009) Evolutionary dynamics of human Toll-like receptors and their
  different contributions to host defense. *PLOS Genetics*, 5, e1000562.

**HapDAF-s/o, s_ratio, high_freq, low_freq**
  Lauterbur, M.E., Munch, K. and Enard, D. (2023) Versatile detection of diverse selective
  sweeps with Flex-sweep. *Molecular Biology and Evolution*, 40, msad139.

**H12, H2/H1**
  Garud, N.R., Messer, P.W., Buzbas, E.O. and Petrov, D.A. (2015) Recent selective sweeps
  in North American *Drosophila melanogaster* show signatures of soft sweeps. *PLOS
  Genetics*, 11, e1005004.

**h-scan**
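  Schlamp, F., van der Made, J., Stambler, R., Chesebrough, L., Boyko, A.R. and
  Messer, P.W. (2016) Evaluating the performance of selection scans to detect selective
  sweeps in domestic dogs. *Molecular Ecology*, 25, 342–356.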

**LASSI**
  Harris, A.M. and DeGiorgio, M. (2020) A likelihood approach for uncovering selective
  sweep signatures from haplotype data. *Molecular Biology and Evolution*, 37, 3023–3046.

**saltiLASSI**
  DeGiorgio, M. and Szpiech, Z.A. (2022) A spatially aware likelihood test to detect sweeps
  from haplotype distributions. *PLOS Genetics*, 18, e1010134.

**RAiSD**
  Alachiotis, N. and Pavlidis, P. (2018) RAiSD detects positive selection based on multiple
  signatures of a selective sweep and SNP vectors. *Communications Biology*, 1, 79.

**ω (omega)**
  Kim, Y. and Nielsen, R. (2004) Linkage disequilibrium as a signature of selective sweeps.
  *Genetics*, 167, 1513–1524.

**ZnS**
  Kelly, J.K. (1997) A test of neutrality based on interlocus associations. *Genetics*,
  146, 1197–1206.

**Tajima's D**
  Tajima, F. (1989) Statistical method for testing the neutral mutation hypothesis by DNA
  polymorphism. *Genetics*, 123, 585–595.

**Fay-Wu H**
  Fay, J.C. and Wu, C.I. (2000) Hitchhiking under positive Darwinian selection. *Genetics*,
  155, 1405–1413.

**Zeng E**
  Zeng, K., Fu, Y.X., Shi, S. and Wu, C.I. (2006) Statistical tests for detecting positive
  selection by utilizing high-frequency variants. *Genetics*, 174, 1431–1439.

**Fu-Li D, F**
  Fu, Y.X. and Li, W.H. (1993) Statistical tests of neutrality of mutations. *Genetics*,
  133, 693–709.

**Beta (balancing selection)**
  Siewert, K.M. and Voight, B.F. (2020) BetaScan2: Standardized statistics to detect
  balancing selection utilizing substitution data. *Genome Biology and Evolution*, 12, evaa013.

**NCD1**
  Bitarello, B.D., de Filippo, C., Teixeira, J.C., Schmidt, J.M., Kleinert, P., Meyer, D.
  and Andrés, A.M. (2018) Signatures of long-term balancing selection in human genomes.
  *The American Journal of Human Genetics*, 102, 725–742.

**Outlier approach**
  Akey, J.M. (2009) Constructing genomic maps of positive selection in humans: where do
  we go from here? *Genome Research*, 19, 711–722.