Package 'snplinkage' reference manual

Title:	Single Nucleotide Polymorphisms Linkage Disequilibrium Visualizations
Description:	Linkage disequilibrium visualizations of up to several hundreds of single nucleotide polymorphisms (SNPs), annotated with chromosomic positions and gene names. Two types of plots are available for small numbers of SNPs (<40) and for large numbers (tested up to 500). Both can be extended by combining other ggplots, e.g. association studies results, and functions enable to directly visualize the effect of SNP selection methods, as minor allele frequency filtering and TagSNP selection, with a second correlation heatmap. The SNPs correlations are computed on Genotype Data objects from the 'GWASTools' package using the 'SNPRelate' package, and the plots are customizable 'ggplot2' and 'gtable' objects and are annotated using the 'biomaRt' package. Usage is detailed in the vignette with example data and results from up to 500 SNPs of 1,200 scans are in Charlon T. (2019) <doi:10.13097/archive-ouverte/unige:161795>.
Authors:	Thomas Charlon [aut, cre] , Karl Forner [aut], Alessandro Di Cara [aut], Jérôme Wojcik [aut]
Maintainer:	Thomas Charlon <[email protected]>
License:	GPL-3
Version:	1.2.0
Built:	2025-03-22 06:02:36 UTC
Source:	https://gitlab.com/thomaschln/snplinkage

Pipe

Description

Pipe an object forward into a function or call expression. Magrittr imported function, see details and examples in the magrittr package.

Arguments

`lhs`	A value or the magrittr placeholder.
`rhs`	A function call using the magrittr semantics.

Value

Result of rhs applied to lhs, see details in magrittr package.

Exposition pipe

Description

Expose the names in 'lhs' to the 'rhs' expression. Magrittr imported function, see details and examples in the magrittr package.

Arguments

`lhs`	A list, environment, or a data.frame.
`rhs`	An expression where the names in lhs is available.

Value

Result of rhs applied to one or several names of lhs.

Compute Chi-squared p-values on a Genotype data object

Description

Compute Chi-squared p-values on a Genotype data object

Usage

chisq_pvalues_gdata(
  gdata,
  snp_idxs,
  response_column = "region",
  response_value = "Europe",
  threshold = 2,
  ...
)
chisq_pvalues_gdata(
  gdata,
  snp_idxs,
  response_column = "region",
  response_value = "Europe",
  threshold = 2,
  ...
)

Arguments

`gdata`	Genotype data object
`snp_idxs`	SNPs indexes
`response_column`	Response column in gdata scans annotations data frame
`response_value`	Response value. The response vector will be a logical, true if equal to the value, false otherwise.
`threshold`	Keep only associations greater than the threshold
`...`	Passed to chisq_pvalues

Value

SNPs annotation data frame, chi-squared p-values in column pvalues

The data set consist of 103 common (>5% minor allele frequency) SNPs genotyped in 129 trios from an European-derived population. These SNPs are in a 500-kb region on human chromosome 5q31 implicated as containing a genetic risk factor for Crohn disease.

Imported from the gap R package.

An example use of the data is with the following paper, Kelly M. Burkett, Celia M. T. Greenwood, BradMcNeney, Jinko Graham. Gene genealogies for genetic association mapping, with application to Crohn's disease. Fron Genet 2013, 4(260) doi: 10.3389/fgene.2013.00260

Usage

data(crohn)data(crohn)

Format

A data frame containing 387 rows and 212 columns

Source

MJ Daly, JD Rioux, SF Schaffner, TJ Hudson, ES Lander (2001) High-resolution haplotype structure in the human genome Nature Genetics 29:229-232

Get diamond ggplot layer.

Description

Diamond ggplot layer for ggplot_ld

Usage

diamond_annots(data, x = "x", y = "y", color = "color", size = 0.5)
diamond_annots(data, x = "x", y = "y", color = "color", size = 0.5)

Arguments

`data`	Data frame of 3 columns defining the diamonds
`x`	Name of the column for horizontal positions
`y`	Name of the column for vertical positions
`color`	Name of the column for color values
`size`	Radius of the diamonds

Value

gglayers

Fetch allele 1 (default object)

Description

Fetch allele 1 (default object)

Usage

## Default S3 method:
fetch_allele1(obj, snps_idx)
## Default S3 method:
fetch_allele1(obj, snps_idx)

Arguments

`obj`	Default object
`snps_idx`	SNPs indexes

Fetch allele 1 (GdsGenotypeReader object)

Description

Fetch allele 1 (GdsGenotypeReader object)

Usage

## S3 method for class 'GdsGenotypeReader'
fetch_allele1(obj, snps_idx)
## S3 method for class 'GdsGenotypeReader'
fetch_allele1(obj, snps_idx)

Arguments

`obj`	GenotypeData object
`snps_idx`	SNPs indexes

Value

Allele 1

Fetch allele 1 (GenotypeData object)

Description

Fetch allele 1 (GenotypeData object)

Usage

## S3 method for class 'GenotypeData'
fetch_allele1(obj, ...)
## S3 method for class 'GenotypeData'
fetch_allele1(obj, ...)

Arguments

`obj`	GenotypeData object
`...`	Passed to getAlleleA

Value

Allele 1

Fetch allele 1 (GenotypeDataSubset object)

Description

Fetch allele 1 (GenotypeDataSubset object)

Usage

## S3 method for class 'GenotypeDataSubset'
fetch_allele1(obj, snps_idx)
## S3 method for class 'GenotypeDataSubset'
fetch_allele1(obj, snps_idx)

Arguments

`obj`	GenotypeDataSubset object
`snps_idx`	SNPs indexes

Value

Allele 1

Fetch allele 2 (default object)

Description

Fetch allele 2 (default object)

Usage

## Default S3 method:
fetch_allele2(obj, snps_idx)
## Default S3 method:
fetch_allele2(obj, snps_idx)

Arguments

`obj`	Default object
`snps_idx`	SNPs indexes

Fetch allele 2 (GdsGenotypeReader object)

Description

Fetch allele 2 (GdsGenotypeReader object)

Usage

## S3 method for class 'GdsGenotypeReader'
fetch_allele2(obj, snps_idx)
## S3 method for class 'GdsGenotypeReader'
fetch_allele2(obj, snps_idx)

Arguments

`obj`	GenotypeData object
`snps_idx`	SNPs indexes

Value

Allele 2

Fetch allele 2 (GenotypeData object)

Description

Fetch allele 2 (GenotypeData object)

Usage

## S3 method for class 'GenotypeData'
fetch_allele2(obj, ...)
## S3 method for class 'GenotypeData'
fetch_allele2(obj, ...)

Arguments

`obj`	GenotypeData object
`...`	Passed to getAlleleB

Value

Allele 2

Fetch allele 1 (GenotypeDataSubset object)

Description

Fetch allele 1 (GenotypeDataSubset object)

Usage

## S3 method for class 'GenotypeDataSubset'
fetch_allele2(obj, snps_idx)
## S3 method for class 'GenotypeDataSubset'
fetch_allele2(obj, snps_idx)

Arguments

`obj`	GenotypeDataSubset object
`snps_idx`	SNPs indexes

Value

Allele 2

Fetch GDS (default)

Description

Fetch GDS (default)

Usage

## Default S3 method:
fetch_gds(obj, ...)
## Default S3 method:
fetch_gds(obj, ...)

Arguments

`obj`	Default object
`...`	Not passed

Fetch GDS (GdsGenotypeReader)

Description

Fetch GDS (GdsGenotypeReader)

Usage

## S3 method for class 'GdsGenotypeReader'
fetch_gds(obj, ...)
## S3 method for class 'GdsGenotypeReader'
fetch_gds(obj, ...)

Arguments

`obj`	GdsGenotypeReader object
`...`	Not passed

Value

S4 slot 'handler' of obj

Fetch GDS (GenotypeData)

Description

Fetch GDS (GenotypeData)

Usage

## S3 method for class 'GenotypeData'
fetch_gds(obj, ...)
## S3 method for class 'GenotypeData'
fetch_gds(obj, ...)

Arguments

`obj`	GenotypeData object
`...`	Not passed

Value

fetch_gds output on S4 slot 'data' of obj

Fetch GDS (GenotypeDataSubset)

Description

Fetch GDS (GenotypeDataSubset)

Usage

## S3 method for class 'GenotypeDataSubset'
fetch_gds(obj, ...)
## S3 method for class 'GenotypeDataSubset'
fetch_gds(obj, ...)

Arguments

`obj`	GenotypeDataSubset object
`...`	Not passed

gdata_add_gene_annots

Description

Add biomaRt gene annotations to Genotype Data object.

Usage

gdata_add_gene_annots(
  gdata,
  snp_idxs,
  rsids_colname = "probe_id",
  biomart_metadb = get_biomart_metadb()
)
gdata_add_gene_annots(
  gdata,
  snp_idxs,
  rsids_colname = "probe_id",
  biomart_metadb = get_biomart_metadb()
)

Arguments

`gdata`	Genotype Data object
`snp_idxs`	SNP indexes
`rsids_colname`	Column of SNP annotation data frame with rs identifiers
`biomart_metadb`	List with slots snpmart and ensembl, corresponding to the biomart databases to query for SNP identifiers and gene names, respectively. See get_biomart_metadb function.

Value

Genotype Data object

gdata_add_gene_annots_aim_example

Description

Add ancestry informative markers gene annotations to Genotype Data object. Convenience function for the vignette to avoid querying biomaRt on build.

Usage

gdata_add_gene_annots_aim_example(gdata, aim_idxs)
gdata_add_gene_annots_aim_example(gdata, aim_idxs)

Arguments

`gdata`	Genotype Data object
`aim_idxs`	AIM indexes in the example Genotype data object

Value

Genotype Data object

gdata_add_gene_annots_hladr_example

Description

Add HLA-DR gene annotations to Genotype Data object. Convenience function for the vignette to avoid querying biomaRt on build.

Usage

gdata_add_gene_annots_hladr_example(gdata, hla_dr_idxs)
gdata_add_gene_annots_hladr_example(gdata, hla_dr_idxs)

Arguments

`gdata`	Genotype Data object
`hla_dr_idxs`	HLA-DR indexes in the example Genotype data object

Value

Genotype Data object

gdata_scan_annots

Description

Get scans annotations from a Genotype Data object or a subset.

Usage

gdata_scans_annots(gdata, scan_ids)
gdata_scans_annots(gdata, scan_ids)

Arguments

`gdata`	Genotype Data object
`scan_ids`	Scan identifiers to subset

Value

Scans annotations data frame

gdata_snp_annots

Description

Get SNPs annotations from a Genotype Data object or a subset.

Usage

gdata_snps_annots(gdata, snp_ids = NULL)
gdata_snps_annots(gdata, snp_ids = NULL)

Arguments

`gdata`	Genotype Data object
`snp_ids`	SNP identifiers to subset

Value

SNP annotation data frame

get_biomart_metadb

Description

To query gene names of SNPs, it is necessary to retrieve two objects using biomaRt::useMart. First, the object required to map SNP rs identifiers to ENSEMBL identifiers. Second, the object required to map ENSEMBL identifiers to common gene names. The function returns a list of two slots named snpmart and ensembl corresponding to each one, respectively. Once obtained it is saved to a local file.

Usage

get_biomart_metadb(
  filepath = extdata_filepath("bmart_meta.rds"),
  host = "https://grch37.ensembl.org"
)
get_biomart_metadb(
  filepath = extdata_filepath("bmart_meta.rds"),
  host = "https://grch37.ensembl.org"
)

Arguments

`filepath`	Path to save the biomaRt objects
`host`	BiomaRt Ensembl host, by default https://grch37.ensembl.org

Value

List of slots snpmart and ensembl as detailed above

Get scans annotations (GenotypeData object)

Description

Get scans annotations (GenotypeData object)

Usage

## S3 method for class 'GenotypeData'
get_scan_annot(obj, ...)
## S3 method for class 'GenotypeData'
get_scan_annot(obj, ...)

Arguments

`obj`	GenotypeData object
`...`	Not passed

Value

Data frame

Get scans annotations (GenotypeDataSubset object)

Description

Get scans annotations (GenotypeDataSubset object)

Usage

## S3 method for class 'GenotypeDataSubset'
get_scan_annot(obj, ...)
## S3 method for class 'GenotypeDataSubset'
get_scan_annot(obj, ...)

Arguments

`obj`	GenotypeDataSubset object
`...`	Not passed

Value

Data frame

Get SNPs annotations (GenotypeData object)

Description

Get SNPs annotations (GenotypeData object)

Usage

## S3 method for class 'GenotypeData'
get_snp_annot(obj, ...)
## S3 method for class 'GenotypeData'
get_snp_annot(obj, ...)

Arguments

`obj`	GenotypeData object
`...`	Not passed

Value

Data frame

Get SNPs annotations (GenotypeDataSubset object)

Description

Get SNPs annotations (GenotypeDataSubset object)

Usage

## S3 method for class 'GenotypeDataSubset'
get_snp_annot(obj, ...)
## S3 method for class 'GenotypeDataSubset'
get_snp_annot(obj, ...)

Arguments

`obj`	GenotypeDataSubset object
`...`	Not passed

Value

Data frame

Ggplot associations

Description

Get SNPs associations ggplot, either as points or as a linked area. Optionally add labels to most associated points using ggrepel.

Usage

ggplot_associations(
  df_snp,
  pvalue_colname = "pvalues",
  labels_colname = "probe_id",
  n_labels = 10,
  nudge = c(0, 1),
  linked_area = FALSE,
  byindex = linked_area,
  colors = if (linked_area) snp_position_colors(nrow(df_snp)) else "black"
)
ggplot_associations(
  df_snp,
  pvalue_colname = "pvalues",
  labels_colname = "probe_id",
  n_labels = 10,
  nudge = c(0, 1),
  linked_area = FALSE,
  byindex = linked_area,
  colors = if (linked_area) snp_position_colors(nrow(df_snp)) else "black"
)

Arguments

`df_snp`	SNP annotation data frame with columns chromosome, position, and as specified by parameters pvalue_colname and optionally labels_colname.
`pvalue_colname`	Column name of df_snp with association values
`labels_colname`	Optional column name of df_snp with labels. Set to NULL to remove.
`n_labels`	Number of labels of most associated points to display.
`nudge`	Nudge parameter passed to ggrepel::geom_label_repel.
`linked_area`	Add a linked area to associations points, default FALSE
`byindex`	Display by SNP index or chromosomic position (default)
`colors`	Colors of SNPs

Value

ggplot

Ggplot linkage disequilibrium

Description

Display SNP r2 correlations using points or diamonds with text.

Usage

ggplot_ld(
  df_ld,
  diamonds = length(unique(df_ld$SNP_A)) < 40,
  point_size = 120/sqrt(nrow(df_ld)),
  reverse = FALSE,
  reindex = TRUE
)
ggplot_ld(
  df_ld,
  diamonds = length(unique(df_ld$SNP_A)) < 40,
  point_size = 120/sqrt(nrow(df_ld)),
  reverse = FALSE,
  reindex = TRUE
)

Arguments

`df_ld`	Data frame with columns SNP_A, SNP_B, and R2. As returned by the snprelate_ld function.
`diamonds`	Should the values be displayed as diamonds or points ? Default is TRUE for less than 40 SNPs.
`point_size`	Size for geom_point. Ignored if diamonds is TRUE.
`reverse`	Reverse the display (horizontal symmetry)
`reindex`	If FALSE, SNPs are positionned following their IDs

Value

ggplot

Ggplot SNPs position

Description

Get SNPs position ggplot with mappings to combine with other ggplots. Optionally add labels and an upper subset.

Usage

ggplot_snp_pos(
  df_snp,
  upper_subset = NULL,
  labels_colname = NULL,
  colors = snp_position_colors(nrow(df_snp))
)
ggplot_snp_pos(
  df_snp,
  upper_subset = NULL,
  labels_colname = NULL,
  colors = snp_position_colors(nrow(df_snp))
)

Arguments

`df_snp`	SNP annotation data frame with a column named position and, if specified, one named as the labels_colname parameter.
`upper_subset`	Subset of df_snp for the positions on the upper side
`labels_colname`	Optional column name of df_snp to use as SNP labels.
`colors`	Colors for each SNP

Value

ggplot

Gtable of linkage disequilibrium and chromosomic positions

Description

Creates a gtable of linkage disequilibrium and chromosomic positions ggplots. A biplot_subset parameter is available to add a second linkage disequibrium ggplot to visualize the effect of a SNP selection.

Usage

gtable_ld(
  df_ld,
  df_snp,
  biplot_subset = NULL,
  labels_colname = NULL,
  diamonds = length(unique(df_ld$SNP_A)) < 40,
  point_size = ifelse(is.null(biplot_subset), 120, 80)/sqrt(nrow(df_ld)),
  title = "",
  title_biplot = "",
  ...
)
gtable_ld(
  df_ld,
  df_snp,
  biplot_subset = NULL,
  labels_colname = NULL,
  diamonds = length(unique(df_ld$SNP_A)) < 40,
  point_size = ifelse(is.null(biplot_subset), 120, 80)/sqrt(nrow(df_ld)),
  title = "",
  title_biplot = "",
  ...
)

Arguments

`df_ld`	Data frame returned by snprelate_ld
`df_snp`	SNP annotations with columns snpID and position
`biplot_subset`	SNP indexes of the subset for the second ld plot
`labels_colname`	Column name of df_snp to use as SNP labels
`diamonds`	Display the values as diamonds or as points Default is TRUE for less than 40 SNPs.
`point_size`	Size for geom_point. Ignored if diamonds is TRUE.
`title`	Plot title
`title_biplot`	Optional biplot title
`...`	Passed to ggplot_ld

Value

gtable of ggplots

Examples

library(snplinkage)
gds_path <- save_hgdp_as_gds()
gdata <- load_gds_as_genotype_data(gds_path)
qc <- snprelate_qc(gdata, tagsnp = .99)
snp_idxs_8p23 <- select_region_idxs(qc$gdata, chromosome = 8,
  position_min = 11e6, position_max = 12e6)

df_ld <- snprelate_ld(qc$gdata, snps_idx = snp_idxs_8p23, quiet = TRUE)
plt <- gtable_ld(df_ld, df_snp = gdata_snps_annots(qc$gdata))

library(snplinkage)
gds_path <- save_hgdp_as_gds()
gdata <- load_gds_as_genotype_data(gds_path)
qc <- snprelate_qc(gdata, tagsnp = .99)
snp_idxs_8p23 <- select_region_idxs(qc$gdata, chromosome = 8,
  position_min = 11e6, position_max = 12e6)

df_ld <- snprelate_ld(qc$gdata, snps_idx = snp_idxs_8p23, quiet = TRUE)
plt <- gtable_ld(df_ld, df_snp = gdata_snps_annots(qc$gdata))

Gtable of linkage disequilibrium and associations

Description

Creates a gtable of a linkage disequilibrium, chromosomic positions, and association scores ggplots.

Usage

gtable_ld_associations(
  df_assocs,
  df_ld,
  pvalue_colname = "pvalues",
  labels_colname = "probe_id",
  n_labels = 5,
  diamonds = nrow(df_assocs) <= 40,
  linked_area = diamonds,
  point_size = 150/nrow(df_assocs),
  colors = snp_position_colors(nrow(df_assocs)),
  ...
)
gtable_ld_associations(
  df_assocs,
  df_ld,
  pvalue_colname = "pvalues",
  labels_colname = "probe_id",
  n_labels = 5,
  diamonds = nrow(df_assocs) <= 40,
  linked_area = diamonds,
  point_size = 150/nrow(df_assocs),
  colors = snp_position_colors(nrow(df_assocs)),
  ...
)

Arguments

`df_assocs`	SNP annotation data frame with columns chromosome, position, and as specified by parameters pvalue_colname and optionally labels_colname.
`df_ld`	Data frame with columns SNP_A, SNP_B, and R2, as returned by the snprelate_ld function.
`pvalue_colname`	Column name of df_snp with association values
`labels_colname`	Optional column name of df_snp with labels. Set NULL to remove labels.
`n_labels`	Number of labels of most associated SNPs to display.
`diamonds`	Should the values be displayed as diamonds or points ? Default is TRUE for up to 40 SNPs.
`linked_area`	Add a linked area to associations points. Default same as diamonds.
`point_size`	Point size for ggplot_ld, ignored if diamonds is TRUE.
`colors`	Colors of SNPs
`...`	Passed to ggplot_associations

Value

gtable

Build gtable by combining ggplots

Description

Build gtable by combining ggplots

Usage

gtable_ld_associations_combine(ggplots, diamonds)
gtable_ld_associations_combine(ggplots, diamonds)

Arguments

`ggplots`	List of ggplots
`diamonds`	Does the LD visualization use diamond-type layout

Value

gtable of ggplots

Examples


library(snplinkage)

# example rnaseq data frame, 20 variables of 20 patients
m_rna = matrix(runif(20 ^ 2), nrow = 20)

# pair-wise correlation matrix
m_ld = cor(m_rna) ^ 2

# keep only upper triangle and reshape to data frame
m_ld[lower.tri(m_ld, diag = TRUE)] = NA
df_ld = reshape2::melt(m_ld) |> na.omit()

# rename for SNPLinkage
names(df_ld) = c('SNP_A', 'SNP_B', 'R2')

# visualize with ggplot_ld
gg_ld = ggplot_ld(df_ld)

# let's imagine the 20 variables came from 3 physically close regions
positions = c(runif(7, 10e5, 15e5), runif(6, 25e5, 30e5),
              runif(7, 45e5, 50e5)) |> sort()

# build the dataframe
df_snp_pos = data.frame(position = positions)
df_snp_pos$label = c(rep('HLA-A', 7), rep('HLA-B', 6), rep('HLA-C', 7))

gg_pos_biplot = ggplot_snp_pos(df_snp_pos, labels_colname = 'label',
                               upper_subset = TRUE)

# let's assume HLA-B is more associated with the outcome than the other genes
pvalues = c(runif(7, 1e-3, 1e-2), runif(6, 1e-8, 1e-6), runif(7, 1e-3, 1e-2))
log10_pvals = -log10(pvalues)

# we can reuse the df_snp_pos object
df_snp_pos$pvalues = log10_pvals

# add the chromosome column
df_snp_pos$chromosome = 6

gg_assocs = ggplot_associations(df_snp_pos, labels_colname = 'label',
                                linked_area = TRUE, nudge = c(0, 0.5),
                                n_labels = 12)

l_ggs = list(pos = gg_pos_biplot, ld = gg_ld, pval = gg_assocs)
gt_ld = gtable_ld_associations_combine(l_ggs, diamonds = TRUE)
grid::grid.draw(gt_ld)

library(snplinkage)

# example rnaseq data frame, 20 variables of 20 patients
m_rna = matrix(runif(20 ^ 2), nrow = 20)

# pair-wise correlation matrix
m_ld = cor(m_rna) ^ 2

# keep only upper triangle and reshape to data frame
m_ld[lower.tri(m_ld, diag = TRUE)] = NA
df_ld = reshape2::melt(m_ld) |> na.omit()

# rename for SNPLinkage
names(df_ld) = c('SNP_A', 'SNP_B', 'R2')

# visualize with ggplot_ld
gg_ld = ggplot_ld(df_ld)

# let's imagine the 20 variables came from 3 physically close regions
positions = c(runif(7, 10e5, 15e5), runif(6, 25e5, 30e5),
              runif(7, 45e5, 50e5)) |> sort()

# build the dataframe
df_snp_pos = data.frame(position = positions)
df_snp_pos$label = c(rep('HLA-A', 7), rep('HLA-B', 6), rep('HLA-C', 7))

gg_pos_biplot = ggplot_snp_pos(df_snp_pos, labels_colname = 'label',
                               upper_subset = TRUE)

# let's assume HLA-B is more associated with the outcome than the other genes
pvalues = c(runif(7, 1e-3, 1e-2), runif(6, 1e-8, 1e-6), runif(7, 1e-3, 1e-2))
log10_pvals = -log10(pvalues)

# we can reuse the df_snp_pos object
df_snp_pos$pvalues = log10_pvals

# add the chromosome column
df_snp_pos$chromosome = 6

gg_assocs = ggplot_associations(df_snp_pos, labels_colname = 'label',
                                linked_area = TRUE, nudge = c(0, 0.5),
                                n_labels = 12)

l_ggs = list(pos = gg_pos_biplot, ld = gg_ld, pval = gg_assocs)
gt_ld = gtable_ld_associations_combine(l_ggs, diamonds = TRUE)
grid::grid.draw(gt_ld)

Gtable of linkage disequilibrium and associations using a GenotypeData object

Description

Compute linkage disequilibrium using snprelate_ld on the set of SNPs in the associations data frame and call gtable_ld_associations. Creates a gtable of a linkage disequilibrium, chromosomic positions, and association scores ggplots.

Usage

gtable_ld_associations_gdata(
  df_assocs,
  gdata,
  pvalue_colname = "pvalues",
  labels_colname = "probe_id",
  diamonds = nrow(df_assocs) <= 40,
  window = 15,
  ...
)
gtable_ld_associations_gdata(
  df_assocs,
  gdata,
  pvalue_colname = "pvalues",
  labels_colname = "probe_id",
  diamonds = nrow(df_assocs) <= 40,
  window = 15,
  ...
)

Arguments

`df_assocs`	SNP annotation data frame with columns chromosome, position, and as specified by parameters pvalue_colname and optionally labels_colname.
`gdata`	GenotypeData object, as returned by load_gds_as_genotype_data
`pvalue_colname`	Column name of df_snp with association values
`labels_colname`	Optional column name of df_snp with labels. Set NULL to remove labels.
`diamonds`	Should the values be displayed as diamonds or points ? Default is TRUE for up to 40 SNPs.
`window`	Window size for snprelate_ld. Forced to the total number of SNPs if diamonds is FALSE
`...`	Passed to gtable_ld_associations

Value

gtable

Examples

library(snplinkage)
gds_path <- save_hgdp_as_gds()
gdata <- load_gds_as_genotype_data(gds_path)
qc <- snprelate_qc(gdata, tagsnp = .99)

snp_idxs_mhc <- select_region_idxs(qc$gdata,
  chromosome = 6, position_min = 29e6, position_max = 33e6)
df_assocs <- chisq_pvalues_gdata(qc$gdata, snp_idxs_mhc)

df_top_aim <- subset(df_assocs, rank(-pvalues, ties.method = 'first') <= 20)

#qc$gdata <- gdata_add_gene_annots(qc$gdata, rownames(df_top_aim))
qc$gdata <- gdata_add_gene_annots_aim_example(qc$gdata, rownames(df_top_aim))

plt <- gtable_ld_associations_gdata(df_top_aim, qc$gdata,
  labels_colname = 'gene')

library(snplinkage)
gds_path <- save_hgdp_as_gds()
gdata <- load_gds_as_genotype_data(gds_path)
qc <- snprelate_qc(gdata, tagsnp = .99)

snp_idxs_mhc <- select_region_idxs(qc$gdata,
  chromosome = 6, position_min = 29e6, position_max = 33e6)
df_assocs <- chisq_pvalues_gdata(qc$gdata, snp_idxs_mhc)

df_top_aim <- subset(df_assocs, rank(-pvalues, ties.method = 'first') <= 20)

#qc$gdata <- gdata_add_gene_annots(qc$gdata, rownames(df_top_aim))
qc$gdata <- gdata_add_gene_annots_aim_example(qc$gdata, rownames(df_top_aim))

plt <- gtable_ld_associations_gdata(df_top_aim, qc$gdata,
  labels_colname = 'gene')

Gtable of linkage disequilibrium and positions using a GenotypeData object

Description

Compute linkage disequilibrium using snprelate_ld on a set of SNP indexes and call gtable_ld. Two parameters are available to compute and compare minor allele frequency filtering and TagSNP selection by displaying two LD plots with their positions in the center. The maf and r2 parameters are used similarly and as follows: - compare baseline with MAF 5 gtable_ld(gdata, snps_idx, maf = 0.05) - compare baseline with TagSNP r2 = 0.8 gtable_ld(gdata, snps_idx, r2 = 0.8) - compare 5 gtable_ld(gdata, snps_idx, maf = c(0.05, 0.05), r2 = 0.8) - compare MAF 5 gtable_ld(gdata, snps_idx, maf = c(0.05, 0.1), r2 = c(0.8, 0.6))

Usage

gtable_ld_gdata(
  gdata,
  snps_idx,
  maf = NULL,
  r2 = NULL,
  diamonds = length(snps_idx) < 40,
  window = 15,
  autotitle = TRUE,
  autotitle_bp = TRUE,
  double_title = FALSE,
  ...
)
gtable_ld_gdata(
  gdata,
  snps_idx,
  maf = NULL,
  r2 = NULL,
  diamonds = length(snps_idx) < 40,
  window = 15,
  autotitle = TRUE,
  autotitle_bp = TRUE,
  double_title = FALSE,
  ...
)

Arguments

`gdata`	GenotypeData object returned by load_gds_as_genotype_data
`snps_idx`	SNPs indexes to select
`maf`	Minor allele frequency threshold(s), see description
`r2`	TagSNP r2 threshold(s), see description
`diamonds`	Display the values as diamonds or as points Default is TRUE for less than 40 SNPs.
`window`	Window size for snprelate_ld. Forced to the total number of SNPs if diamonds is FALSE
`autotitle`	Set title to feature selection method(s), number of SNPs and chromosome
`autotitle_bp`	Set biplot title to feature selection method(s), number of SNPs and chromosome
`double_title`	Logical, if false (default) keep only biplot title
`...`	Passed to gtable_ld

Value

gtable of ggplots

Examples

library(snplinkage)
gds_path <- save_hgdp_as_gds()
gdata <- load_gds_as_genotype_data(gds_path)
qc <- snprelate_qc(gdata, tagsnp = .99)

snp_idxs_1p13_large <- select_region_idxs(qc$gdata, chromosome = 1,
  position_min = 114e6, n_snps = 100)
plt <- gtable_ld_gdata(qc$gdata, snp_idxs_1p13_large)

library(snplinkage)
gds_path <- save_hgdp_as_gds()
gdata <- load_gds_as_genotype_data(gds_path)
qc <- snprelate_qc(gdata, tagsnp = .99)

snp_idxs_1p13_large <- select_region_idxs(qc$gdata, chromosome = 1,
  position_min = 114e6, n_snps = 100)
plt <- gtable_ld_gdata(qc$gdata, snp_idxs_1p13_large)

Build gtable by combining ggplots

Description

Build gtable by combining ggplots

Usage

gtable_ld_grobs(plots, labels_colname, title)
gtable_ld_grobs(plots, labels_colname, title)

Arguments

`plots`	List of ggplots
`labels_colname`	Does the SNP position plot contain labels
`title`	Title text string

Value

gtable of ggplots

Examples


library(snplinkage)

# example rnaseq data frame, 20 variables of 20 patients
m_rna = matrix(runif(20 ^ 2), nrow = 20)

# pair-wise correlation matrix
m_ld = cor(m_rna) ^ 2

# keep only upper triangle and reshape to data frame
m_ld[lower.tri(m_ld, diag = TRUE)] = NA
df_ld = reshape2::melt(m_ld) |> na.omit()

# rename for SNPLinkage
names(df_ld) = c('SNP_A', 'SNP_B', 'R2')

# visualize with ggplot_ld
gg_ld = ggplot_ld(df_ld)
# let's imagine the 20 variables came from 3 physically close regions
positions = c(runif(7, 10e5, 15e5), runif(6, 25e5, 30e5),
              runif(7, 45e5, 50e5)) |> sort()

# build the dataframe
df_snp_pos = data.frame(position = positions)
df_snp_pos$label = c(rep('HLA-A', 7), rep('HLA-B', 6), rep('HLA-C', 7))
gg_snp_pos = ggplot_snp_pos(df_snp_pos, labels_colname = 'label')

l_ggs = list(snp_pos = gg_snp_pos, ld = gg_ld)
gt_ld = gtable_ld_grobs(l_ggs, labels_colname = TRUE,
                        title = 'RNASeq correlations')
grid::grid.draw(gt_ld)

library(snplinkage)

# example rnaseq data frame, 20 variables of 20 patients
m_rna = matrix(runif(20 ^ 2), nrow = 20)

# pair-wise correlation matrix
m_ld = cor(m_rna) ^ 2

# keep only upper triangle and reshape to data frame
m_ld[lower.tri(m_ld, diag = TRUE)] = NA
df_ld = reshape2::melt(m_ld) |> na.omit()

# rename for SNPLinkage
names(df_ld) = c('SNP_A', 'SNP_B', 'R2')

# visualize with ggplot_ld
gg_ld = ggplot_ld(df_ld)
# let's imagine the 20 variables came from 3 physically close regions
positions = c(runif(7, 10e5, 15e5), runif(6, 25e5, 30e5),
              runif(7, 45e5, 50e5)) |> sort()

# build the dataframe
df_snp_pos = data.frame(position = positions)
df_snp_pos$label = c(rep('HLA-A', 7), rep('HLA-B', 6), rep('HLA-C', 7))
gg_snp_pos = ggplot_snp_pos(df_snp_pos, labels_colname = 'label')

l_ggs = list(snp_pos = gg_snp_pos, ld = gg_ld)
gt_ld = gtable_ld_grobs(l_ggs, labels_colname = TRUE,
                        title = 'RNASeq correlations')
grid::grid.draw(gt_ld)

Is SNP first dimension (default)

Description

Is SNP first dimension (default)

Usage

## Default S3 method:
is_snp_first_dim(obj, ...)
## Default S3 method:
is_snp_first_dim(obj, ...)

Arguments

`obj`	Default object
`...`	Not passed

Value

Is SNP first dimension (GDS object)

Description

Is SNP first dimension (GDS object)

Usage

## S3 method for class 'gds.class'
is_snp_first_dim(obj, ...)
## S3 method for class 'gds.class'
is_snp_first_dim(obj, ...)

Arguments

`obj`	GDS object
`...`	Not passed

Value

Logical, TRUE if SNP is first dimension

Is SNP first dimension (GdsGenotypeReader object)

Description

Is SNP first dimension (GdsGenotypeReader object)

Usage

## S3 method for class 'GdsGenotypeReader'
is_snp_first_dim(obj, ...)
## S3 method for class 'GdsGenotypeReader'
is_snp_first_dim(obj, ...)

Arguments

`obj`	GdsGenotypeReader object
`...`	Not passed

Value

is_snp_first_dim output on S4 slot 'handler'

Is SNP first dimension (GenotypeData object)

Description

Is SNP first dimension (GenotypeData object)

Usage

## S3 method for class 'GenotypeData'
is_snp_first_dim(obj, ...)
## S3 method for class 'GenotypeData'
is_snp_first_dim(obj, ...)

Arguments

`obj`	Genotype data object
`...`	Not passed

Value

is_snp_first_dim output on S4 slot 'data'

Is SNP first dimension (MatrixGenotypeReader object)

Description

Is SNP first dimension (MatrixGenotypeReader object)

Usage

## S3 method for class 'MatrixGenotypeReader'
is_snp_first_dim(obj, ...)
## S3 method for class 'MatrixGenotypeReader'
is_snp_first_dim(obj, ...)

Arguments

`obj`	MatrixGenotypeReader object
`...`	Not passed

Value

TRUE

Is SNP first dimension (NcdfGenotypeReader object)

Description

Is SNP first dimension (NcdfGenotypeReader object)

Usage

## S3 method for class 'NcdfGenotypeReader'
is_snp_first_dim(obj, ...)
## S3 method for class 'NcdfGenotypeReader'
is_snp_first_dim(obj, ...)

Arguments

`obj`	NcdfGenotypeReader object
`...`	Not passed

Value

TRUE

Load GDS as Genotype Data

Description

Open a connection to a snpgds file (cf. SNPRelate package) as a Genotype Data object.

Usage

load_gds_as_genotype_data(
  gds_file,
  read_snp_annot = TRUE,
  read_scan_annot = TRUE
)
load_gds_as_genotype_data(
  gds_file,
  read_snp_annot = TRUE,
  read_scan_annot = TRUE
)

Arguments

`gds_file`	Path of snpgds file
`read_snp_annot`	Read the SNPs' annotations
`read_scan_annot`	Read the scans' annotations

Value

Genotype Data object

Examples

library(snplinkage)
gds_path <- save_hgdp_as_gds()
gdata <- load_gds_as_genotype_data(gds_path)

library(snplinkage)
gds_path <- save_hgdp_as_gds()
gdata <- load_gds_as_genotype_data(gds_path)

Separate a matrix in a list of matrices of length the number of cores and apply a function on the columns in parallel

Description

Separate a matrix in a list of matrices of length the number of cores and apply a function on the columns in parallel

Usage

parallel_apply(m_data, apply_fun, n_cores = 1, ...)
parallel_apply(m_data, apply_fun, n_cores = 1, ...)

Arguments

`m_data`	Data matrix
`apply_fun`	Function to apply
`n_cores`	Number of cores
`...`	Passed to apply_fun

Value

apply_fun return

print_qc_as_tex_table

Description

Print information about quality control performed by the snprelate_qc function.

Usage

print_qc_as_tex_table(
  gdata_qc,
  label = "qc",
  caption = paste("Quality control and feature selection of the subset of the",
    "human genome diversity project dataset.")
)
print_qc_as_tex_table(
  gdata_qc,
  label = "qc",
  caption = paste("Quality control and feature selection of the subset of the",
    "human genome diversity project dataset.")
)

Arguments

`gdata_qc`	Genotype Data object object returned by snprelate_qc
`label`	Label of the Tex table
`caption`	Caption of the Tex table

Value

Prints knitr::kable object using cat

save_hgdp_as_gds

Description

Save the HGDP SNP data text file as a Genomic Data Structure file

Usage

save_hgdp_as_gds(paths = hgdp_filepaths(), outpath = tempfile(), ...)
save_hgdp_as_gds(paths = hgdp_filepaths(), outpath = tempfile(), ...)

Arguments

`paths`	Paths of the zip, txt, and gds files
`outpath`	Output GDS file path
`...`	Passed to save_genotype_data_as_gds

Value

Path of the saved gds file

select_region_idxs

Description

Select SNP indexes corresponding to a specific genomic region.

Usage

select_region_idxs(
  gdata,
  chromosome,
  position_min = -Inf,
  position_max = Inf,
  n_snps = 0,
  offset = 0
)
select_region_idxs(
  gdata,
  chromosome,
  position_min = -Inf,
  position_max = Inf,
  n_snps = 0,
  offset = 0
)

Arguments

`gdata`	Genotype Data object
`chromosome`	Chromosome to select
`position_min`	Minimum base pair position to select
`position_max`	Maximum base pair position to select
`n_snps`	Maximum number of SNPs to return
`offset`	Number of SNPs to offset

Value

SNP indexes of Genotype Data object

Compute allele frequencie and snp missing rate

Description

Wrapper over SNPRelate::snpgdsSNPRateFreq

Usage

snprelate_allele_frequencies(
  gdata,
  snps_idx = NULL,
  scans_idx = NULL,
  quiet = FALSE
)
snprelate_allele_frequencies(
  gdata,
  snps_idx = NULL,
  scans_idx = NULL,
  quiet = FALSE
)

Arguments

`gdata`	A GenotypeData object
`snps_idx`	Vector of snps indices
`scans_idx`	Vector of scans indices
`quiet`	Whether to be quiet

Value

A data frame of snps_idx, snps_ids, allele1, allele2, maf, missing where allele1 and allele2 are the rates of the alleles, and maf the minimum of the 2. Missing is the missing rate. N.B: the allele rates are computed on the non missing genotypes, i.e. their sum equals 1.

Wrapper for snpgdsLDMat to compute r2

Description

Wrapper for snpgdsLDMat to compute r2

Usage

snprelate_ld(
  gdata,
  window_size = 0,
  min_r2 = 0,
  snps_idx = NULL,
  scans_idx = NULL,
  threads = 1,
  quiet = FALSE
)
snprelate_ld(
  gdata,
  window_size = 0,
  min_r2 = 0,
  snps_idx = NULL,
  scans_idx = NULL,
  threads = 1,
  quiet = FALSE
)

Arguments

`gdata`	A GenotypeData object
`window_size`	Max number of SNPs in LD window, 0 for no window
`min_r2`	Minimum r2 value to report
`snps_idx`	Indices of snps to use
`scans_idx`	Indices of scans to use
`threads`	The number of threads to use
`quiet`	Whether to be quiet

Value

A data frame with columns SNP_A, SNP_B, R2 for r2 >= min_r2

Wrapper for snpgdsLDpruning to select Tag SNPs

Description

The tagged snp set is (by sliding window) representative and strongly not redundant.

Usage

snprelate_ld_select(
  gdata,
  window_length = 500L,
  min_r2,
  window_size = NA,
  snps_idx = NULL,
  scans_idx = NULL,
  remove.monosnp = FALSE,
  autosome.only = FALSE,
  method = "r",
  threads = 1,
  quiet = FALSE,
  ...
)
snprelate_ld_select(
  gdata,
  window_length = 500L,
  min_r2,
  window_size = NA,
  snps_idx = NULL,
  scans_idx = NULL,
  remove.monosnp = FALSE,
  autosome.only = FALSE,
  method = "r",
  threads = 1,
  quiet = FALSE,
  ...
)

Arguments

`gdata`	A GenotypeData object
`window_length`	Max length in kb of the window
`min_r2`	Minimum r2 value to report
`window_size`	Max number of SNPs in LD window
`snps_idx`	Indices of snps to use
`scans_idx`	Indices of scans to use
`remove.monosnp`	if TRUE, remove monomorphic SNPs
`autosome.only`	if `TRUE`, use autosomal SNPs only; if it is a numeric or character value, keep SNPs according to the specified chromosome
`method`	"composite", "r", "dprime", "corr", see details
`threads`	The number of threads to use, currently ignored
`quiet`	Whether to be quiet
`...`	Forwarded to SNPRelate::snpgdsLDpruning

Value

A list of SNP IDs stratified by chromosomes.

snprelate_qc

Description

Quality control using SNPRelate functions.

Usage

snprelate_qc(
  gdata,
  samples_nas = 0.03,
  ibs = 0.99,
  keep_ids = NULL,
  snps_nas = 0.01,
  maf = 0.05,
  tagsnp = 0.8,
  n_cores = 1
)
snprelate_qc(
  gdata,
  samples_nas = 0.03,
  ibs = 0.99,
  keep_ids = NULL,
  snps_nas = 0.01,
  maf = 0.05,
  tagsnp = 0.8,
  n_cores = 1
)

Arguments

`gdata`	Genotype data object
`samples_nas`	NA threshold for samples, default 3 pct
`ibs`	Samples identity by state threshold, default 99 pct
`keep_ids`	Samples ids to keep even if IBS is higher than threshold. Used for monozygotic twins.
`snps_nas`	NA threshold for SNPs, default 1 pct
`maf`	Minor allele frequency threshold, default 5 pct
`tagsnp`	TagSNP r2 correlation threshold, default 0.8
`n_cores`	Number of cores

Value

List of gdata, Genotype data object, and df_qc, QC info data frame

Examples

library(snplinkage)
gds_path <- save_hgdp_as_gds()
gdata <- load_gds_as_genotype_data(gds_path)
qc <- snprelate_qc(gdata, tagsnp = .99)

library(snplinkage)
gds_path <- save_hgdp_as_gds()
gdata <- load_gds_as_genotype_data(gds_path)
qc <- snprelate_qc(gdata, tagsnp = .99)

`m_data`	Data matrix of observations by variables
`response`	Response vector of length the number of observations
`adjust_method`	Multiple testing p-value adjustment method. Passed to stats::p.adjust. 'fdr' by default.
`mlog10_transform`	Logical, transform p-values by minus log10. True by default.
`n_cores`	Number of cores
`...`	Passed to stats::chisq.test

`lhs`	An object which serves both as the initial value and as target.
`rhs`	a function call using the magrittr semantics.

Package 'snplinkage'

Help Index

Assignment pipe

Description

Arguments

Value

Pipe

Description

Arguments

Value

Exposition pipe

Description

Arguments

Value

Compute Chi-squared p-values

Description

Usage

Arguments

Value

Compute Chi-squared p-values on a Genotype data object

Description

Usage

Arguments

Value

Crohn's disease data

Description

Usage

Format

Source

Get diamond ggplot layer.

Description

Usage

Arguments

Value

Fetch allele 1 (default object)

Description

Usage

Arguments

Fetch allele 1 (GdsGenotypeReader object)

Description

Usage

Arguments

Value

Fetch allele 1 (GenotypeData object)

Description

Usage

Arguments

Value

Fetch allele 1 (GenotypeDataSubset object)

Description

Usage

Arguments

Value

Fetch allele 2 (default object)

Description

Usage

Arguments

Fetch allele 2 (GdsGenotypeReader object)

Description

Usage

Arguments

Value

Fetch allele 2 (GenotypeData object)

Description

Usage

Arguments

Value

Fetch allele 1 (GenotypeDataSubset object)

Description

Usage

Arguments

Value

Fetch GDS (default)

Description

Usage

Arguments

Fetch GDS (GdsGenotypeReader)

Description

Usage

Arguments