Download this dataset for the tutorial: https://uni-duesseldorf.sciebo.de/s/maeMvg3lQiYYegl

Installation commands for the packages are available at the end.

1 Overview:

Figure 1: Here you can see the flowchart of the workflow. Single-cell RNA sequencing data traverses multiple standard analysis steps (yellow), such as clustering and biomarker discovery, before applying tools for biological interpretation (green) and enhanced visualization of the data (red) with a variety of established packages but also with a new developed package (iCCL clustering) within the thesis. NGS dta was already converted into a SeuratObject beforehand.

2 Standard Analysis

The Standard Analysis part of the workflow is fully conducted with the tool Seurat (4.0.0) and orientates along their established vignette. [1]

# we need to load Seurat and ggplot2 packages (ggplot enables some plotting options like titles)
library(Seurat)
library(ggplot2)

2.1 Data preparation

Before we can start the analysis, we need to load the SeuratObject. In this thesis we will only look at a subset of the data, therefore we will firstly display cluster names and subset the desired complex accordingly

#loading scRNAseq data (Cd45+ cells)
Granulocytes <- readRDS("F:/yourdirectory/tofile,rds")

2.1.1 Identification of highly variable genes

For downstream analysis we firstly calculate a subset of 2000 genes showing the highest cell-to-cell variation, meaning their expression fluctuates from highly expressed in some, to lowly expressed in other cells.

#identification of highly variable features
Granulocytes <- FindVariableFeatures(Granulocytes, selection.method = "vst", nfeatures = 2000)
#identification of the 10 most highly variable genes
top10 <- head(VariableFeatures(Granulocytes), 10)
top10

#plotting highly variable features
#without labels
plot1 <- VariableFeaturePlot(Granulocytes)
#with labels of top10
plot2 <- LabelPoints(plot = plot1, points = top10, repel = TRUE)
#combined plot
plot1 + plot2

2.1.2 Scaling data

To prevent domination of highly expressed genes in downstream analysis we apply linear transformation on the genes

#scaling data (linear transformation)
all.genes <- rownames(Granulocytes)
Granulocytes <- ScaleData(Granulocytes, features = all.genes)

2.1.3 Linear dimensionality reduction

To reduce data size for downstream analysis we use the method of Principal Component Analysis (PCA) which in general allows to extract relevant information from within a dataset and reduce its complexity into a lower dimension of 50 vectors, the so called principal components (PC), through linearity.

#principal component analysis
Granulocytes <- RunPCA(Granulocytes, features = VariableFeatures(object = Granulocytes))
#visuaize PCA result as DimPlot
DimPlot(Granulocytes, reduction = "pca") + ggtitle("PCA")

2.2 2D Clustering

Figure 2: Overview of the clustering steps for iCCL. Currently these are the standard Seurat Clustering Steps. User input through iCCL(SeuratObject, min.dim = x, max.dim = y), with x and y as numerals, enables the user to run Seurat clustering for the given SeuratObject for the specified dimension. The loop will generate four UMAP 2D projections for each dimension within the specified range x to y, for four granularity values (resolution z), which then can be compared.

Clustering in Seurat makes use of prior PCA scores of cells wherefore we want to detect the best fitting number of principal components that represent a robust compression of the dataset. Seurat therefore provides identification methods, such as the elbow plot, which however are very complex, complicated for beginners and provide a subjective scope. To overcome these problems, I developed the interactive Comparative Clustering Loop package (iCCL). iCCL currently adapts two tools, both offering great improvements to workflows for beginners and advanced researchers.

2.3 Dimensionality prediction

The first tool, predictdimension() offers a quantitative approach on determining the dimensionality before clustering. The method behind it builds on a method established by the Harvard Chan Bioinformatics Core [2], which defines following mathematical condition as a reliable entry point for choosing the principal components prior to clustering: The algorithm determines the minimum of two metric values (i.e. dimension), firstly the PC only contributing to five percent of the standard deviation with a cumulative contribution of 90 percent, and secondly the PC where the percent change in variation between PCs is less than 0.1 percent. The output is an elbow plot with metric values as dimensions, with the determined value leading to a color change in the plot.

#predict a approximative range for which clustering would make most sense
library(iCCL1)
predictdimension(Granulocytes)

Algorithm determined dimension 7 as the minimal point covering majority of variation in data, thus we will run iCCL on a range starting at 7 to determine the best dimension which bests represents our clusters in consideration of the clusters in the old dataset.

#proceed to run iCCL clustering algorithm for  the range between 18-22 (20 +- 2)
iCCL(Granulocytes, 7, 10) #8

#after visual checking for best fit for cluster representation
#we choose 22 dimensions and resolution 0.5 to proceed

#it is also possible to cluster via Seurat standard commands
Granulocytes <- FindNeighbors(Granulocytes, dims = 1:7)
Granulocytes <- FindClusters(Granulocytes, resolution = 0.5)
Granulocytes <- RunUMAP(Granulocytes, dims = 1:7)
#We can use further plotting options like label size:
DimPlot(Granulocytes, label = 1, label.size = 8, repel = 1)
#or plot cells corresponding to treatment
#for this we reverse the treatment order to 1. Sham and 2. Mi
Granulocytes$sample <- factor(Granulocytes$sample, levels = c('Sham', 'Mi'))
DimPlot(Granulocytes, split.by = "sample", label = 1, repel = 1, label.size= 7) + NoLegend()
DimPlot(Granulocytes, group.by = "sample", label = 1, repel = 1, label.size= 5) + ggtitle("Treatment")
#we can change colors for better visualization
DimPlot(Granulocytes, group.by = "sample", cols = c("Mi" = "red", "Sham" = "blue")) + ggtitle("Experimental Condition")

#save our Granulocytes Object
saveRDS(Granulocytes, file = "Granulocytes2.rds")

2.4 Biomarker

For downstream analysis, determining characteristic biomarkers of each cluster is of special importance, allowing biological interpretation. The FindMarkers() function automates this process via differential expression and generates a list containing genes and their respective values sorted by descending expression (log2 fold change). This list can be saved as a .csv and used to research markers of your clusters.

#determine Biomarkers####
Granulocytes.markers <- FindAllMarkers(Granulocytes, min.pct = 0.25, logfc.threshold = 0.25)
#save as .csv
write.csv(Granulocytes.markers,"Granulocytes.markers_march.csv")
#get top 1 marker for each cluster
library(dplyr)
top1 <- Granulocytes.markers %>% group_by(cluster) %>% top_n(n = 1, wt = avg_log2FC) #if an error occurs, you may try removing the 2 from avlog2fc
#plot their expression
FeaturePlot(Granulocytes, features = top1$gene)


#It is also possible to heatmap our overall top5
top5 <- Granulocytes.markers %>% group_by(cluster) %>% top_n(n = 5, wt = avg_log2FC)
DoHeatmap(Granulocytes, features = top5$gene) #+ NoLegend()

3 Biological Interpretation

##Cell Type Annotation To label cells of a datasets with unknown cell type composition (or as proof) we use SingleR [3] reference-based cell type classification. The algorithm behind it takes our dataset and annotates it against predefined databases.

First we need to convert the SeruatObject into a SingleCellExperiment File and access the Benayoun et al. [4] “MouseRNAse” database.

library(scater)
library(SingleR)
library(celldex)
library(scuttle)


##preproccesing steps
# convert Seurat Object into SingleCellExperiment file
Granulocytes.sce <- as.SingleCellExperiment(Granulocytes)
#load external Database you want to do the prediction from ONTO your Object (SummarizedExperiment class .se)
#here we use built-in celldex MouseRNAseq database by Benayoun et al.
mouseref.se <- MouseRNAseqData()
#Rownames represent genes, store them in "common" value and paste into both databases
common <- intersect(rownames(Granulocytes.sce), rownames(mouseref.se))
mouseref.se <- mouseref.se[common, ]
Granulocytes.sce <- Granulocytes.sce[common, ]
#normalize and log-transform dataset
Granulocytes.sce <- logNormCounts(Granulocytes.sce)

Now we can start the annotation:

##Annotation
##create a prediction data frame with normal specificity (= label.main)
pred.Granulocytes.main <- SingleR(test = Granulocytes.sce, ref = mouseref.se, labels = mouseref.se$label.main)
#look at the results as a table
table(pred.Granulocytes.main$labels)
#or as a heatmap
plotScoreHeatmap(pred.Granulocytes.main)
##Integration of labels of data frame onto your Database
Granulocytes$SingleR = pred.Granulocytes.main$labels
#visualize with new labels
DimPlot(Granulocytes, group.by = "SingleR", label = TRUE, repel = 1, label.size = 7 ) + ggtitle("SingleR cell type prediction (normal specificity)")


#We can also visualize our SingleR results splitted by treatment
DimPlot(Granulocytes, group.by = "SingleR", split.by = "sample", label = TRUE, repel = 1) +ggtitle("Cell Type by treatment")

We can also access he high specificity frame of the database:

##create a prediction data frame with high specificity (=label.fine)
pred.Granulocytes.fine <- SingleR(test = Granulocytes.sce, ref = mouseref.se, labels = mouseref.se$label.fine)
table(pred.Granulocytes.fine$labels)
#Integration of high spec. dataframe
Granulocytes$SingleR_fine = pred.Granulocytes.fine$labels

## visualizw with fine labels
DimPlot(Granulocytes, group.by = "SingleR_fine", label = TRUE, repel = TRUE, label.size = 6) + ggtitle("SingleR (high specificity)")
#We can also visualize our SingleR results splitted by treatment
DimPlot(Granulocytes, group.by = "SingleR", split.by = "sample", label = TRUE, repel = 1) +ggtitle("Cell Type by treatment")
#save our Granulocytes Object
saveRDS(Granulocytes, file = "Granulocytes_singleR.rds")

3.1 Pathway Responsive Genes

PROGENy [5] [6] is a linear model for inferring the activity of our expressed biomarkers for selected, predefined pathways against each respective cluster. In contrast to classic pathway mapping methods, PROGENy specializes in overcoming the limitations of disregarding post-translational modification effects by training a regression model on a large pool of perturbation experiments.

Preparation steps:

library(progeny)
library(dplyr)
library(tidyr)
library(readr)
library(pheatmap)
library(tibble)

## Compute Progeny activity scores and add them to our Seurat object as a new assay called Progeny. 
Granulocytes <- progeny(Granulocytes, scale=FALSE, organism="Mouse", top=1000, perm=1, 
                return_assay = TRUE)
# scale the pathway activity scores. 
Granulocytes <- Seurat::ScaleData(Granulocytes, assay = "progeny") 
# transform Progeny scores into a data frame
progeny_scores_df <- 
  as.data.frame(t(GetAssayData(Granulocytes, slot = "scale.data", 
                               assay = "progeny"))) %>%
  rownames_to_column("Cell") %>%
  gather(Pathway, Activity, -Cell) 

# create a data frame with the specification of the cells that belong to each cluster to match with the Progeny scores.
CellsClusters <- data.frame(Cell = names(Idents(Granulocytes)), 
                            CellType = as.character(Idents(Granulocytes)),
                            stringsAsFactors = FALSE)
# match Progeny scores with the cell clusters.
progeny_scores_df <- inner_join(progeny_scores_df, CellsClusters)
# summarize the Progeny scores by cell population
summarized_progeny_scores <- progeny_scores_df %>% 
  group_by(Pathway, CellType) %>%
  summarise(avg = mean(Activity), std = sd(Activity))
##Plotting pathway activities for the different cell populations
#prepare the data and set colorcoding
summarized_progeny_scores_df <- summarized_progeny_scores %>%
  dplyr::select(-std) %>%   
  spread(Pathway, avg) %>%
  data.frame(row.names = 1, check.names = FALSE, stringsAsFactors = FALSE) 
paletteLength = 100
myColor = colorRampPalette(c("Darkblue", "white","red"))(paletteLength)
progenyBreaks = c(seq(min(summarized_progeny_scores_df), 0, 
                      length.out=ceiling(paletteLength/2) + 1),
                  seq(max(summarized_progeny_scores_df)/paletteLength, 
                      max(summarized_progeny_scores_df), 
                      length.out=floor(paletteLength/2)))
##Plot the heatmap
progeny_hmap = pheatmap(t(summarized_progeny_scores_df[,-1]),fontsize=14, 
                        fontsize_row = 10, 
                        color=myColor, breaks = progenyBreaks, 
                        main = "PROGENy (1000)", angle_col = 45,
                        treeheight_col = 0,  border_color = NA)
#save scores
write.csv(summarized_progeny_scores_df,"progeny_scores_grouped.csv")
saveRDS(Granulocytes, file = "Granulocytes_progeny.rds")

3.2 Gene enrichment / Ontology analysis

An approach to understand properties and characteristics of the priorly identified gene list we can apply Gene Set Enrichment Analysis (GSEA) algorithms on latter, checking for significant over-represented matches inside predefined databases, such as Gene Ontology (GO) which is the largest database for information on annotated genes.

Here we use ClusterProfileR [7] for GO analysis.

library(clusterProfiler)


#GO Analysis
#for this we use our markers
##if you need to load them again use:
Granulocytes.markers <- read.csv("Granulocytes.markers_march.csv")
#possible domains to check for: BP MF CC ALL
#in our case we want to understand biolical processes our markers are involved in -> BP

plot_cluster_go(Granulocytes.markers, cluster_name = '0', org = "mouse", ont = "BP") + ggtitle("GO - Cluster 0: Biologial Process")

#it's also possible to plot all in a loop
list <- c(0:3)
wd <- getwd()
#name a directory you want to plot into
dir.create("GO")
#run loop
for(k in list){
  go <- plot_cluster_go(Granulocytes.markers, cluster_name = k, org = "mouse", ont = "BP") + ggtitle("GO Biological Process - Cluster:", k)
  print(k)
  mypath <- file.path(wd, "GO", paste(k, ".png", sep = ""))
  png(file=mypath, width = 1500, height = 480, res = 120)
  print(go)
  dev.off()
}

3.3 Ligand-Receptor Interaction

In the traditional way, using bulk-analysis methods, it was unmanageable to generate insight into paracrine signalling pathways between different cells and to interpret them among varying tissues. Bioinformatic solutions to study intercellular communication, for instance NicheNet [8][8], established a weighted network out of existing databases for ligand–receptor, signal transduction and gene regulatory interactions. They compute ligand-receptor interaction predictions for gene expression data of interacting cells of our dataset (i.e. sender, receiver).

Click to expand code

First we need to load the databases and convert them to Mouse origin: (NicheNet heavily relies on tidyverse pipelines (%>%))

devtools::install_github("saeyslab/nichenetr")
BiocManager::install("limma")
library(nichenetr)
library(tidyverse)
library(dplyr)

#load ligand Matrix
ligand_target_matrix = readRDS(url("https://zenodo.org/record/3260758/files/ligand_target_matrix.rds"))
#load Ligand-Receptor-Network
lr_network = readRDS(url("https://zenodo.org/record/3260758/files/lr_network.rds"))
#load weightened networks
weighted_networks = readRDS(url("https://zenodo.org/record/3260758/files/weighted_networks.rds"))
weighted_networks_lr = weighted_networks$lr_sig %>% inner_join(lr_network %>% distinct(from,to), by = c("from","to"))

#Because the expression data is of mouse origin,
#we will convert the NicheNet network gene symbols from human to mouse based on one-to-one orthology:
lr_network = lr_network %>% mutate(from = convert_human_to_mouse_symbols(from), to = convert_human_to_mouse_symbols(to)) %>% drop_na()
colnames(ligand_target_matrix) = ligand_target_matrix %>% colnames() %>% convert_human_to_mouse_symbols()
rownames(ligand_target_matrix) = ligand_target_matrix %>% rownames() %>% convert_human_to_mouse_symbols()
ligand_target_matrix = ligand_target_matrix %>% .[!is.na(rownames(ligand_target_matrix)), !is.na(colnames(ligand_target_matrix))]
weighted_networks_lr = weighted_networks_lr %>% mutate(from = convert_human_to_mouse_symbols(from), to = convert_human_to_mouse_symbols(to)) %>% drop_na()

Now we can run the pipeline, which indeed is very long. We will further plot into a .pdf. To make adjustment easier for different data, this pipeline uses the object “seurat” which you can just assign your current SeuratObject to.

#Define a "sender/niche" cell population and a "receiver/target" cell population present
#in your expression data and determine which genes are expressed in both populations
#consider a gene to be expressed when it is expressed in at least 10% of cells in one cluster.

#so you dont have to change all according variables in the loop:
seurat <- Granulocytes

#make a list with the cluster numbers, for which the loop(i) will run
#Idents(seurat) <- "seurat_clusters"
list <- levels(Idents(seurat))
# or if you want to only run for a certain cluster (e.g. 1) use following:
#list <- c(1:1)

#start the pdf device with a file name
pdf("NichNet.pdf")

##start loop
for (i in list){
  #receiver
  receiver = i
  expressed_genes_receiver = get_expressed_genes(receiver, seurat, pct = 0.25)
  background_expressed_genes = expressed_genes_receiver %>% .[. %in% rownames(ligand_target_matrix)]
  #sender
  sender_celltypes = list
  list_expressed_genes_sender = sender_celltypes %>% unique() %>% lapply(get_expressed_genes, seurat, 0.25)
  #lapply to get the expressed genes of every sender cell type separately here
  expressed_genes_sender = list_expressed_genes_sender %>% unlist() %>% unique()
 
  #Define a gene set of interest: these are the genes in the "receiver/target" cell
  #population that are potentially affected by ligands expressed by interacting cells
  #(e.g. genes differentially expressed upon cell-cell interaction)
  seurat_obj_receiver= subset(seurat, idents = receiver)
  seurat_obj_receiver = SetIdent(seurat_obj_receiver, value = seurat_obj_receiver[["sample"]])
  DE_table_receiver = FindMarkers(object = seurat, ident.1 = i, min.pct = 0.25) %>% rownames_to_column("gene")
  geneset_oi = DE_table_receiver %>% filter(p_val_adj <= 0.05 & abs(avg_log2FC) >= 0.25) %>% pull(gene)
  geneset_oi = geneset_oi %>% .[. %in% rownames(ligand_target_matrix)]
  
  #Define a set of potential ligands: these are ligands that are expressed by the "sender/niche" cell population and bind a (putative) receptor expressed by the "receiver/target" population
  #Because we combined the expressed genes of each sender cell type, in this example, we will perform one NicheNet analysis by pooling all ligands from all cell types together. Later on during the interpretation of the output, we will check which sender cell type expresses which ligand.
  ligands = lr_network %>% pull(from) %>% unique()
  receptors = lr_network %>% pull(to) %>% unique()
  expressed_ligands = intersect(ligands,expressed_genes_sender)
  expressed_receptors = intersect(receptors,expressed_genes_receiver)
  potential_ligands = lr_network %>% filter(from %in% expressed_ligands & to %in% expressed_receptors) %>% pull(from) %>% unique()
  
  ##Perform NicheNet ligand activity analysis: rank the potential ligands based on the presence of their target genes in the gene set of interest (compared to the background set of genes)
  ligand_activities = predict_ligand_activities(geneset = geneset_oi, background_expressed_genes = background_expressed_genes, ligand_target_matrix = ligand_target_matrix, potential_ligands = potential_ligands)
  ligand_activities = ligand_activities %>% arrange(-pearson) %>% mutate(rank = rank(desc(pearson)))
  ligand_activities
  best_upstream_ligands = ligand_activities %>% top_n(20, pearson) %>% arrange(-pearson) %>% pull(test_ligand) %>% unique()
  
  #Dotplot of Ligands
  a <- DotPlot(seurat, features = best_upstream_ligands %>% rev(), cols = "RdYlBu") + RotatedAxis() + FontSize(x.text = 10, y.text = 7)
  #Dotplot split by Sample
  a2 <- DotPlot(seurat, features = best_upstream_ligands %>% rev(), cols = c("Green3", "Red3"), split.by = "sample", dot.scale = 5, scale.min = 10) + RotatedAxis() + FontSize(x.text = 10, y.text = 7)
  #Infer receptors and top-predicted target genes of ligands that are top-ranked in the ligand activity analysis
  
  #Active target gene inference
  active_ligand_target_links_df = best_upstream_ligands %>% lapply(get_weighted_ligand_target_links,geneset = geneset_oi, ligand_target_matrix = ligand_target_matrix, n = 200) %>% bind_rows() %>% drop_na()
  active_ligand_target_links = prepare_ligand_target_visualization(ligand_target_df = active_ligand_target_links_df, ligand_target_matrix = ligand_target_matrix, cutoff = 0.33)
  order_ligands = intersect(best_upstream_ligands, colnames(active_ligand_target_links)) %>% rev() %>% make.names()
  order_targets = active_ligand_target_links_df$target %>% unique() %>% intersect(rownames(active_ligand_target_links)) %>% make.names()
  rownames(active_ligand_target_links) = rownames(active_ligand_target_links) %>% make.names() #make.names() for heatmap visualization of genes like H2-T23
  colnames(active_ligand_target_links) = colnames(active_ligand_target_links) %>% make.names() #make.names() for heatmap visualization of genes like H2-T23
  vis_ligand_target = active_ligand_target_links[order_targets,order_ligands] %>% t()
  p_ligand_target_network = vis_ligand_target %>% make_heatmap_ggplot("Prioritized ligands","Predicted target genes", color = "purple",legend_position = "top", x_axis_position = "top",legend_title = "Regulatory potential")  + theme(axis.text.x = element_text(face = "italic")) + scale_fill_gradient2(low = "whitesmoke",  high = "purple", breaks = c(0,0.006,0.012))
  
  #plot Ligand and regulatory functions
  b <- p_ligand_target_network
  
  ##Receptors of top-ranked ligands
  lr_network_top = lr_network %>% filter(from %in% best_upstream_ligands & to %in% expressed_receptors) %>% distinct(from,to)
  best_upstream_receptors = lr_network_top %>% pull(to) %>% unique()
  lr_network_top_df_large = weighted_networks_lr %>% filter(from %in% best_upstream_ligands & to %in% best_upstream_receptors)
  lr_network_top_df = lr_network_top_df_large %>% spread("from","weight",fill = 0)
  lr_network_top_matrix = lr_network_top_df %>% select(-to) %>% as.matrix() %>% magrittr::set_rownames(lr_network_top_df$to)
  dist_receptors = dist(lr_network_top_matrix, method = "binary")
  hclust_receptors = hclust(dist_receptors, method = "ward.D2")
  order_receptors = hclust_receptors$labels[hclust_receptors$order]
  dist_ligands = dist(lr_network_top_matrix %>% t(), method = "binary")
  hclust_ligands = hclust(dist_ligands, method = "ward.D2")
  order_ligands_receptor = hclust_ligands$labels[hclust_ligands$order]
  order_receptors = order_receptors %>% intersect(rownames(lr_network_top_matrix))
  order_ligands_receptor = order_ligands_receptor %>% intersect(colnames(lr_network_top_matrix))
  vis_ligand_receptor_network = lr_network_top_matrix[order_receptors, order_ligands_receptor]
  rownames(vis_ligand_receptor_network) = order_receptors %>% make.names()
  colnames(vis_ligand_receptor_network) = order_ligands_receptor %>% make.names()
  p_ligand_receptor_network = vis_ligand_receptor_network %>% t() %>% make_heatmap_ggplot("Ligands","Receptors", color = "mediumvioletred", x_axis_position = "top",legend_title = "Prior interaction potential")
  
  #plot ligand receptor netwok
  c <- p_ligand_receptor_network
  
  #Receptors of top-ranked ligands, but after considering only bona fide ligand-receptor interactions documented in literature and publicly available databases
  lr_network_strict = lr_network %>% filter(database != "ppi_prediction_go" & database != "ppi_prediction")
  ligands_bona_fide = lr_network_strict %>% pull(from) %>% unique()
  receptors_bona_fide = lr_network_strict %>% pull(to) %>% unique()
  lr_network_top_df_large_strict = lr_network_top_df_large %>% distinct(from,to) %>% inner_join(lr_network_strict, by = c("from","to")) %>% distinct(from,to)
  lr_network_top_df_large_strict = lr_network_top_df_large_strict %>% inner_join(lr_network_top_df_large, by = c("from","to"))
  lr_network_top_df_strict = lr_network_top_df_large_strict %>% spread("from","weight",fill = 0)
  lr_network_top_matrix_strict = lr_network_top_df_strict %>% select(-to) %>% as.matrix() %>% magrittr::set_rownames(lr_network_top_df_strict$to)
  dist_receptors = dist(lr_network_top_matrix_strict, method = "binary")
  hclust_receptors = hclust(dist_receptors, method = "ward.D2")
  order_receptors = hclust_receptors$labels[hclust_receptors$order]
  dist_ligands = dist(lr_network_top_matrix_strict %>% t(), method = "binary")
  hclust_ligands = hclust(dist_ligands, method = "ward.D2")
  order_ligands_receptor = hclust_ligands$labels[hclust_ligands$order]
  order_receptors = order_receptors %>% intersect(rownames(lr_network_top_matrix_strict))
  order_ligands_receptor = order_ligands_receptor %>% intersect(colnames(lr_network_top_matrix_strict))
  vis_ligand_receptor_network_strict = lr_network_top_matrix_strict[order_receptors, order_ligands_receptor]
  rownames(vis_ligand_receptor_network_strict) = order_receptors %>% make.names()
  colnames(vis_ligand_receptor_network_strict) = order_ligands_receptor %>% make.names()
  
  #plot ligand receptor network - strict
  p_ligand_receptor_network_strict = vis_ligand_receptor_network_strict %>% t() %>% make_heatmap_ggplot("Ligands","Receptors", color = "mediumvioletred", x_axis_position = "top",legend_title = "Prior interaction potential\n(bona fide)")
  d <- p_ligand_receptor_network_strict
  
  ##Summary visualizations of the NicheNet analysis
  #combined heatmap: overlay ligand activities with target genes
  ligand_pearson_matrix = ligand_activities %>% select(pearson) %>% as.matrix() %>% magrittr::set_rownames(ligand_activities$test_ligand)
  rownames(ligand_pearson_matrix) = rownames(ligand_pearson_matrix) %>% make.names()
  colnames(ligand_pearson_matrix) = colnames(ligand_pearson_matrix) %>% make.names()
  vis_ligand_pearson = ligand_pearson_matrix[order_ligands, ] %>% as.matrix(ncol = 1) %>% magrittr::set_colnames("Pearson")
  p_ligand_pearson = vis_ligand_pearson %>% make_heatmap_ggplot("Prioritized ligands","Ligand activity", color = "darkorange",legend_position = "top", x_axis_position = "top", legend_title = "Pearson correlation coefficient\ntarget gene prediction ability)") + theme(legend.text = element_text(size = 9))
  figures_without_legend = cowplot::plot_grid(p_ligand_pearson + theme(legend.position = "none", axis.ticks = element_blank()) + theme(axis.title.x = element_text()),
                                              p_ligand_target_network + theme(legend.position = "none", axis.ticks = element_blank()) + ylab(""),
                                              align = "hv",
                                              nrow = 1,
                                              rel_widths = c(ncol(vis_ligand_pearson)+10, ncol(vis_ligand_target)))
  legends = cowplot::plot_grid(
    ggpubr::as_ggplot(ggpubr::get_legend(p_ligand_pearson)),
    ggpubr::as_ggplot(ggpubr::get_legend(p_ligand_target_network)),
    nrow = 1,
    align = "h")
  combined_plot = cowplot::plot_grid(figures_without_legend, legends, rel_heights = c(10,2), nrow = 2, align = "hv", labels = i)
  
  ##final combined plot
  e <- combined_plot
  f <- DoHeatmap(subset(seurat, downsample = 300), features = best_upstream_receptors, size = 3)
  g <- DoHeatmap(subset(seurat, downsample = 300), features = best_upstream_ligands, size = 3)
  print(a + labs(title = i))
  print(a2 + labs(title = i))
  print(b + labs(title = i))
  print(c + labs(title = i))
  print(d + labs(title = i))
  print(e + labs(title = i))
  print(f)
  print(g)
}

dev.off()

#another combined plot to print inside RStudio
combined_plot2 = cowplot::plot_grid(figures_without_legend, legends, rel_heights = c(10,5), nrow = 2, align = "hv")
combined_plot2

This is a plot I selected for the thesis (cluster1) and displays the bona fide interaction potential

4 Enhanced Visualization

4.1 3D Clusters

We can also visualize our UMAP results in 3D for a better spatial clarity of distribution and correlations between our clusters. For this we compute and extract a third dimension which we assign to the z-axis. Fetching data from previous calculations (meta-data), such as SingleR and also experimental conditions allows us to color clusters accordingly. The output is a fully interactive, rotable HTML file with on-hover information about cell identity, that can be used locally but also allows implementation online. This method was established by [9] and uses plotly [10] and our SeuratObject.

library(plotly)
Granulocytes <- RunUMAP(Granulocytes, dims = 1:22, n.components = 3L)
umap_1 <- Granulocytes[["umap"]]@cell.embeddings[,1]
umap_2 <- Granulocytes[["umap"]]@cell.embeddings[,2]
umap_3 <- Granulocytes[["umap"]]@cell.embeddings[,3]
Embeddings(object = Granulocytes, reduction = "umap")
plot.data <- FetchData(object = Granulocytes, vars = c("UMAP_1", "UMAP_2", "UMAP_3", "seurat_clusters", "SingleR", "sample"))
plot.data$label <- paste(rownames(plot.data))

In the plotting command you can adjust the metadata to color the cells for. (Also on-hover info and the assigned colors). In this case we will plot the CellTypes that we determined from the meta analysis.

#the plotting command:
#we can choose the assay/meta.data we want to color by, and further assign colors to clusters
plot_ly(data = plot.data, 
        x = ~UMAP_3, y = ~UMAP_1, z = ~UMAP_2, 
        color = ~seurat_clusters, 
        colors = c( "salmon",      
                    "olivedrab",   
                    "green",       
                    "cyan"
        ),
        type = "scatter3d", 
        mode = "markers", 
        marker = list(size = 1.5, width=0.5), # controls size of points 1. 0.5
        text=~seurat_clusters, #This is that extra column we made earlier for which we will use for cell ID
        hoverinfo="text")#When you visualize your plotly object, hovering your mouse pointer over a point shows cell names

4.2 Enhanced Plots

Scillus [11] offers some further plots about relations of our cells and clusters.

library(ggnewscale)
library(Scillus)

##Enhanced Plots for more Info about clusters & cell distribution
#Number of Cells per treatment
plot_stat(Granulocytes, plot_type = "group_count") + ggtitle("Number of cells by treatment")
#Number of Cells per cluster without grid
plot_stat(Granulocytes, plot_type = "cluster_count" ) + ggtitle(" Number of cells by cluster") +NoGrid()
#Proportion of treatment for clusters
#plot_stat(Granulocytes, plot_type = "prop_fill") + ggtitle(" Proportion of clusters in treatment [Scillus]")
#See which clusters contribute to which treatment
plot_stat(Granulocytes, plot_type = "prop_multi") + ggtitle("Contribution of clusters total cell count of condition") + NoGrid() #% of all cells in treatment

4.3 Making Data interactive

This section also contains enrichR [12] GSEA analysis, as we will include it in the interactive cerebro [13] file.

GSEA:

#explorable Data (Cerebro)####
#this also includes enrichr

##enrichR
library(EnrichR)
library(cerebroApp)
library(msigdbr)

##Get most expressed genes
Granulocytes <- getMarkerGenes(
  Granulocytes,
  assay = 'RNA',
  organism = 'mm',
  groups = c('seurat_clusters', 'SingleR'),
  name = 'cerebro_seurat', #will be used for pathwayenrichment (see next step)
  only_pos = TRUE
)

#Perform pathway enrichment on marker genes
Granulocytes <- getEnrichedPathways(
  Granulocytes,
  marker_genes_input = 'cerebro_seurat',
  #adj_p_cutoff = 0.01,
  max_terms = 100
)

##GenesetEnrichment with downloaded gmt file from #http://www.gsea-msigdb.org/gsea/msigdb
#musmusculus, all of C2 CGP

Granulocytes <- performGeneSetEnrichmentAnalysis(
  Granulocytes,
  assay = 'RNA',
  GMT_file = "genesets_C2CGPcomplete_musmusculus.gmt" ,
  groups = c('seurat_clusters','SingleR')
)

saveRDS(Granulocytes, "Granulocytes_cerebro_ende.rds")

Make a Cerebro .crb file and launch

##preparefor cerebro
##cerebro needs the metadata slots to be called by_sample, by_cluster
Granulocytes$by_sample <- Granulocytes$sample
Granulocytes$by_cluster <- Granulocytes$seurat_clusters


#### cerebro file
exportFromSeurat(
  object = Granulocytes,
  file = 'cerebro_granulocytes.crb', 
  experiment_name = 'Granulocytes Tutorial',
  organism = 'mm',
  groups = c("by_sample", "by_cluster", "SingleR"),
  nUMI = 'nCount_RNA',
  nGene = 'nFeature_RNA',
  add_all_meta_data = TRUE,
  use_delayed_array = FALSE,
  verbose = TRUE
)

##launch
launchCerebro()
#or use http://cardinet.hhu.de

5 Packages

Here are commands to download the main packages that were used:

5.1 iCCL

if (!require(devtools)) { install.packages(“devtools”) } devtools::install_github(“ari7cr/iCCL1”)

5.2 Seurat

install.packages(‘Seurat’)

5.3 SingleR

devtools::install_github(‘dviraran/SingleR’)

5.4 PROGENy

if (!requireNamespace(“BiocManager”, quietly = TRUE)) install.packages(“BiocManager”) BiocManager::install(“progeny”)

5.5 NicheNet

devtools::install_github(“saeyslab/nichenetr”)

5.6 ClustrProfileR

BiocManager::install(“clusterProfiler”)

5.7 Scillus

devtools::install_github(“xmc811/Scillus”, ref = “development”)

6 References

1: Hao and Hao et al. Integrated analysis of multimodal single-cell data. bioRxiv (2020) [Seurat V4] https://satijalab.org/seurat/

2: Khetani, R. (n.d.). Harvard Chan Bioinformatics Core (HBC). Retrieved March 10, 2021, from https://bioinformatics.sph.harvard.edu/ , https://hbctraining.github.io/scRNA-seq/lessons/elbow_plot_metric.html

3: Aran D, Looney AP, Liu L, Wu E, Fong V, Hsu A, Chak S, Naikawadi RP, Wolters PJ, Abate AR, Butte AJ, Bhattacharya M (2019). “Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage.” Nat. Immunol., 20, 163-172. doi: 10.1038/s41590-018-0276-y (URL: https://doi.org/10.1038/s41590-018-0276-y).

4: Benayoun, Bérénice A., Elizabeth A. Pollina, Param Priya Singh, Salah Mahmoudi, Itamar Harel, Kerriann M. Casey, Ben W. Dulken, Anshul Kundaje, and Anne Brunet. “Remodeling of Epigenome and Transcriptome Landscapes with Aging in Mice Reveals Widespread Induction of Inflammatory Responses.” Genome Research 29.4 (2019): 697-709. Print.

5: Schubert, Michael, et al. “Perturbation-response genes reveal signaling footprints in cancer gene expression.” Nature communications 9.1 (2018): 1-11.

6: Holland, Christian H., Bence Szalai, and Julio Saez-Rodriguez. “Transfer of regulatory knowledge from human to mouse for functional genomics analysis.” Biochimica et Biophysica Acta (BBA)-Gene Regulatory Mechanisms 1863.6 (2020): 194431.

7: Yu, Guangchuang, et al. “clusterProfiler: an R package for comparing biological themes among gene clusters.” Omics: a journal of integrative biology 16.5 (2012): 284-287

8: Browaeys, Robin, Wouter Saelens, and Yvan Saeys. “NicheNet: modeling intercellular communication by linking ligands to target genes.” Nature methods 17.2 (2020): 159-162., https://github.com/saeyslab/nichenetr for the vignettes that were used

9: Fahd Qadir, et al. 3D Plotting of scRNAseq data using Seurat objects. 1.3, Zenodo, 11 Oct. 2019. https://github.com/Dragonmasterx87/Interactive-3D-Plotting-in-Seurat-3.0.0/

10: C. Sievert. Interactive Web-Based Data Visualization with R, plotly, and shiny. Chapman and Hall/CRC Florida, 2020.

11: Mingchu Xu and Zhongqi Ge (2020). Scillus: Seurat wrapper package enhancing the processing and visualization of single cell data. R package version 0.4.0.

12: Kai Guo (2021). EnrichR: Functional Enrichment analysis. R package version 0.3.0.

13: Hillje, R., Pelicci, P.G. & Luzi, L. Cerebro: Interactive visualization of scRNA-seq data. Bioinformatics (2019).

Granulocyte tutorial

Arian Abbasi

18 3 2021