Introduction

This R notebook enables the reproduction of the analysis described in:

Tzani et al. Tracing production instability in a clonally-derived CHO cell line using single cell transcriptomics (2020) bioRxiv

In this study, the four biological replicates of a clonally derived CHO cell line were sampled at 72hrs post-seeding and analysed on the BD Rhapsody system whole transcriptome analysis. The FASTQ data were converted to a cell gene count matrix using the Seven Bridges Genomics BD Rhapsody WTA pipeline and combined into a single sample.

Raw Data Availibility

The FASTQ files for this study are available from the NCBI Sequence Read Archive ID: PRJNA661407

The associated github repository can be found here

Prepare for Analysis

Load packages

package_list <- c("monocle", "plyr", "dplyr", "ggExtra", "reshape", 
    "colorRamps", "stringr", "BioParallel", "ggpubr", "DT", "WebGestaltR", 
    "ggalt", "biomaRt", "patchwork", "viridis", "gridExtra", 
    "grid", "gganimate", "hrbthemes", "jcolors", "cowplot", "data.table", 
    "GGally", "ggplotify", "writexl", "Seurat")

lapply(package_list, require, character.only = TRUE)

Set seed

Here we set a seed to facilitate the reproducibility of the analysis

# set a seed for reproducibility
set.seed(38)

Plotting setup

# ggplot theme
theme_ms <- function(base_size = 10, base_family = "Helvetica") {
    library(grid)
    (theme_bw(base_size = base_size, base_family = base_family) + 
        theme(text = element_text(color = "black"), axis.title = element_text(face = "bold", 
            size = rel(1)), plot.title = element_text(face = "bold", 
            size = rel(1)), axis.text = element_text(size = rel(1), 
            color = "black"), legend.title = element_text(face = "bold"), 
            legend.text = element_text(face = "bold"), legend.background = element_rect(fill = "transparent"), 
            legend.key.size = unit(0.8, "lines"), panel.border = element_rect(color = "black", 
                size = 1), panel.grid = element_blank()))
}

# pallette for groups
colour.set <- c("#29BF12", "#009ACD", "#B1740F", "#DB2763", "#9A32CD", 
    "#628395", "#17377A", "#E5C616", "#FF7844", "#E48F1B")

# color for expression values
color.gradient <- scale_color_gradientn(colors = matlab.like(200), 
    trans = "log10")

# create a directory for saving figures
figsdir <- "./figures/"
if (!dir.exists(figsdir)) {
    dir.create(figsdir)
}

# function to save figures
saveManuscriptPlot <- function(p, width, height) {
    figfile <- file.path(figsdir, sprintf("%s.png", gsub("\\.", 
        "_", deparse(substitute(p)))))
    ggsave(figfile, plot = p, width = width, height = height, 
        units = "in", device = "png")
}

Calculate relative expression

Create a function to determine the relative expression of a gene from a Monocle cell dataset. When the expression of a gene is than the detection limit, the gene is equal to the detection limit.

monocle_relative_expression <- function(cds, gene) {
    normalized_expression <- t(exprs(cds[gene, ]))/sizeFactors(cds)
    normalized_expression[normalized_expression < cds@lowerDetectionLimit] <- cds@lowerDetectionLimit
    return(normalized_expression)
}

Import raw data

scRNASeq

We import data from the Seven Bridges Genomics (SBG) platform for analysis in R. The UMI data for each replicate is stored in a separate CSV file with each row representing an individual cell, the column represents a gene. In this experiment, 4 biological replicates for the CHOK1 DP12 cell line were analysed using Rhapsody WTA. We create a batch ID than be used to regress any technical variation due to batch. We determine the genes detected in at least one replicate. If a gene is not detected in a replicate we assign a 0 UMI value for that replicate. The replicates are then combined in a single matrix.

SBG file import

raw_umi_r1 <- read.table("scrnaseq_data/CHOK1DP12_r1.csv", sep = ",", 
    header = T, row.names = 1, check.names = FALSE)
raw_umi_r1_batch <- rep("rep1", dim(raw_umi_r1)[1])

raw_umi_r2 <- read.table("scrnaseq_data/CHOK1DP12_r2.csv", sep = ",", 
    header = T, row.names = 1, check.names = FALSE)
raw_umi_r2_batch <- rep("rep2", dim(raw_umi_r2)[1])

raw_umi_r3 <- read.table("scrnaseq_data/CHOK1DP12_r3.csv", sep = ",", 
    header = T, row.names = 1, check.names = FALSE)
raw_umi_r3_batch <- rep("rep3", dim(raw_umi_r3)[1])

raw_umi_r4 <- read.table("scrnaseq_data/CHOK1DP12_r4.csv", sep = ",", 
    header = T, row.names = 1, check.names = FALSE)
raw_umi_r4_batch <- rep("rep4", dim(raw_umi_r4)[1])

# determine the genes detected in at least 1 replicate
detected_genes <- unique(c(colnames(raw_umi_r1), colnames(raw_umi_r2), 
    colnames(raw_umi_r3), colnames(raw_umi_r4)))
unique_genes <- unique(detected_genes)

# add a 0 column if gene is not detected in that replicate
raw_umi_r1[, setdiff(detected_genes, colnames(raw_umi_r1))] <- 0

raw_umi_r2[, setdiff(detected_genes, colnames(raw_umi_r2))] <- 0

raw_umi_r3[, setdiff(detected_genes, colnames(raw_umi_r3))] <- 0

raw_umi_r4[, setdiff(detected_genes, colnames(raw_umi_r4))] <- 0

raw_umi <- rbind(raw_umi_r1, raw_umi_r2, raw_umi_r3, raw_umi_r4)

# add text to cell label for Monocle compatability later
rownames(raw_umi) <- str_c("cell_", rownames(raw_umi))  # add text to ensure comparability with monocle plotting later

raw_umi_batch <- c(raw_umi_r1_batch, raw_umi_r2_batch, raw_umi_r3_batch, 
    raw_umi_r4_batch)  # record which Rhapsody cartridge the data came from

Create Monocle object

The primary analysis package for this analysis Monocle v2 . To use Monocle we must first create a cell dataset (CDS) object containing the UMI data. During this process we assign a detection limit along with the expressionFamily - in this case negbinomial().

pheno.data <- rep("cho", dim(raw_umi)[1])
pheno.data.df <- data.frame(type = pheno.data, batch = raw_umi_batch)
rownames(pheno.data.df) <- rownames(raw_umi)
pd <- new("AnnotatedDataFrame", data = pheno.data.df)

f.data <- colnames(raw_umi)
f.data.df <- data.frame(type = f.data)  # Must be data frame object
rownames(f.data.df) <- colnames(raw_umi)
fd <- new("AnnotatedDataFrame", data = f.data.df)

cells <- new("CellDataSet", exprs = t(raw_umi), phenoData = pd, 
    featureData = fd, lowerDetectionLimit = 0.5, expressionFamily = negbinomial())

paste("Raw data for", dim(cells)[2], "and", dim(cells)[1], " genes captured")

[1] "Raw data for 4673 and 20594  genes captured"

bulk RNASeq

To compare a matched bulkRNASeq dataset to our single cell data were calculated a gene-level transcripts per million (TPM) value using Kallisto.

# import the TPM results from Kalisto for the four matched bulk samples
bulk_tpm <- read.table("bulkrnaseq_data/gene_mab_tpms_all_samples.tsv", 
    header = T, row.names = 1)

# separate the bulk samples from the Kalisto TPM output and convert to log(tpm+1)
bulk_tpm_r1 <- log((bulk_tpm[, 2]) + 1)
names(bulk_tpm_r1) <- rownames(bulk_tpm)

bulk_tpm_r2 <- log((bulk_tpm[, 3]) + 1)
names(bulk_tpm_r2) <- rownames(bulk_tpm)

bulk_tpm_r3 <- log((bulk_tpm[, 4]) + 1)
names(bulk_tpm_r3) <- rownames(bulk_tpm)

bulk_tpm_r4 <- log((bulk_tpm[, 5]) + 1)
names(bulk_tpm_r4) <- rownames(bulk_tpm)

# determine the correlation between each of the scRNASeq samples
bulk_tpm <- data.frame(cbind(bulk_tpm_r1, bulk_tpm_r2, bulk_tpm_r3, 
    bulk_tpm_r4))

# rename the columns for each replicate
colnames(bulk_tpm) <- c("Replicate 1", "Replicate 2", "Replicate 3", 
    "Replicate 4")

Replicate comparison

scRNASeq

To compare to the bulk RNASeq data we merge the scRNASeq to pseudoBulk profiles, 1 for each replicate and convert expression to TPM. We then determine the agreement between the replicates

# separate the scRNA by Rhapsody cartridge from the monocle CDS sum the UMI counts and convert to log(TPM+1)

pseudocounts_r1 <- rowSums(exprs(cells)[, pData(cells)$batch == 
    "rep1"])
pseudocounts_r2 <- rowSums(exprs(cells)[, pData(cells)$batch == 
    "rep2"])
pseudocounts_r3 <- rowSums(exprs(cells)[, pData(cells)$batch == 
    "rep3"])
pseudocounts_r4 <- rowSums(exprs(cells)[, pData(cells)$batch == 
    "rep4"])

# create logged pseduobulk TPM
norm_pseudocounts_r1 <- log(((pseudocounts_r1)/sum(pseudocounts_r1/1e+06)) + 
    1)
norm_pseudocounts_r2 <- log(((pseudocounts_r2)/sum(pseudocounts_r2/1e+06)) + 
    1)
norm_pseudocounts_r3 <- log(((pseudocounts_r3)/sum(pseudocounts_r3/1e+06)) + 
    1)
norm_pseudocounts_r4 <- log(((pseudocounts_r4)/sum(pseudocounts_r4/1e+06)) + 
    1)

# determine the correlation between each of the scRNASeq samples
norm_pseudocounts <- data.frame(cbind(norm_pseudocounts_r1, norm_pseudocounts_r2, 
    norm_pseudocounts_r3, norm_pseudocounts_r4))

colnames(norm_pseudocounts) <- c("Replicate 1", "Replicate 2", 
    "Replicate 3", "Replicate 4")

fig.cor.scRNASeq <- ggpairs(norm_pseudocounts, lower = list(continuous = wrap("points", 
    alpha = 0.3, size = 0.01), combo = wrap("dot", alpha = 0.4, 
    size = 0.2))) + theme_ms() + theme(panel.spacing = grid::unit(1, 
    "lines"))

# save the plot for Supplementary Figure 1
s.fig1 <- fig.cor.scRNASeq
saveManuscriptPlot(s.fig1, 8, 8)
s.fig1

bulk RNASeq

Determine the agreement between the bulkRNASeq replicates

fig.cor.bulkRNASeq <- ggpairs(bulk_tpm, lower = list(continuous = wrap("points", 
    alpha = 0.3, size = 0.01), combo = wrap("dot", alpha = 0.4, 
    size = 0.2))) + theme_ms() + theme(panel.spacing = grid::unit(1, 
    "lines"))

s.fig2 <- fig.cor.bulkRNASeq
saveManuscriptPlot(s.fig2, 8, 8)
s.fig2

scRNASeq v bulkRNASeq

Comparison of each scRNASeq sample to it’s respective matched bulkRNASeq sample.

Map gene IDs

# annotate the genes with biomaRt
ensembl <- useMart(host = "http://sep2019.archive.ensembl.org", 
    biomart = "ENSEMBL_MART_ENSEMBL", dataset = "cgpicr_gene_ensembl")

picr_annotation <- getBM(attributes = c("ensembl_gene_id", "entrezgene_id", 
    "external_gene_name", "description", "gene_biotype"), filters = c("ensembl_gene_id"), 
    values = rownames(bulk_tpm), mart = ensembl, uniqueRows = TRUE)

# the seven bridges genomics pipeline outputs both symbols and ens gene ids get the ids that are matched to gene symbols in the cds
symbols_cds <- as.character(fData(cells)$type[grep("ENSCGR", 
    fData(cells)$type, invert = TRUE)])
matched_to_cds_symbols <- picr_annotation[as.character(picr_annotation$external_gene_name) %in% 
    fData(cells)$type, ]
matched_to_cds_symbols <- unique(setDT(matched_to_cds_symbols), 
    by = c("ensembl_gene_id", "external_gene_name"))
matched_to_cds_symbols <- matched_to_cds_symbols[, c(1, 3)]

# ensembl ids are common to bulk and scRNASeq
ensembl_cds <- as.character(fData(cells)$type[grep("ENSCGR", 
    fData(cells)$type)])

# the PICR geneome has no MT chromosome, add the ensembl ids for the CHOK1 mtDNA genome
mito.genes <- c("ND1", "ND2", "COX1", "COX2", "ATP8", "ATP6", 
    "COX3", "ND3", "ND4L", "ND4", "ND5", "ND6", "CYTB")

mito.ensembl <- c("ENSCGRG00000000006", "ENSCGRG00000000010", 
    "ENSCGRG00000000016", "ENSCGRG00000000019", "ENSCGRG00000000021", 
    "ENSCGRG00000000022", "ENSCGRG00000000023", "ENSCGRG00000000025", 
    "ENSCGRG00000000027", "ENSCGRG00000000028", "ENSCGRG00000000032", 
    "ENSCGRG00000000033", "ENSCGRG00000000035")

# create the ids for both datasets and include the mAb genes
cds_ids <- c(matched_to_cds_symbols$external_gene_name, ensembl_cds, 
    "heavy_chain", "light_chain", mito.genes)
bulk_ids <- c(matched_to_cds_symbols$ensembl_gene_id, ensembl_cds, 
    "heavy_chain", "light_chain", mito.ensembl)

Correlation between single cell and bulk RNASeq data

# determine the density of points to colour plot
density_r1 <- densCols(bulk_tpm_r1[bulk_ids], norm_pseudocounts_r1[cds_ids], 
    colramp = colorRampPalette(rev(rainbow(10, end = 4/6))), 
    nbin = 64)
density_r2 <- densCols(bulk_tpm_r2[bulk_ids], norm_pseudocounts_r2[cds_ids], 
    colramp = colorRampPalette(rev(rainbow(10, end = 4/6))), 
    nbin = 64)
density_r3 <- densCols(bulk_tpm_r3[bulk_ids], norm_pseudocounts_r3[cds_ids], 
    colramp = colorRampPalette(rev(rainbow(10, end = 4/6))), 
    nbin = 64)
density_r4 <- densCols(bulk_tpm_r4[bulk_ids], norm_pseudocounts_r4[cds_ids], 
    colramp = colorRampPalette(rev(rainbow(10, end = 4/6))), 
    nbin = 64)

# create the data frame for the plot
bulk_sc_cor_df <- data.frame(replicate = c(rep("Replicate 1", 
    length(bulk_ids)), rep("Replicate 2", length(bulk_ids)), 
    rep("Replicate 3", length(bulk_ids)), rep("Replicate 4", 
        length(bulk_ids))), bulk_tpm = c(bulk_tpm_r1[bulk_ids], 
    bulk_tpm_r2[bulk_ids], bulk_tpm_r3[bulk_ids], bulk_tpm_r4[bulk_ids]), 
    scrnaseq_tpm = c(norm_pseudocounts_r1[cds_ids], norm_pseudocounts_r1[cds_ids], 
        norm_pseudocounts_r1[cds_ids], norm_pseudocounts_r1[cds_ids]), 
    density = c(density_r1, density_r2, density_r3, density_r4))

fig.scatter.sc.v.bulk <- bulk_sc_cor_df %>% ggplot() + geom_point(aes(x = bulk_tpm, 
    y = scrnaseq_tpm, col = density), size = 0.1) + scale_color_identity() + 
    stat_cor(method = "pearson", aes(x = bulk_tpm, y = scrnaseq_tpm), 
        inherit.aes = TRUE) + facet_wrap(~replicate) + theme_ms() + 
    labs(x = "Bulk RNASeq Log(TPM+1)", y = "Pseudobulk RNASeq Log(TPM+1)") + 
    theme(strip.background = element_blank())

s.fig3 <- fig.scatter.sc.v.bulk
saveManuscriptPlot(s.fig3, 8, 8)
s.fig3

scRNASeq data preprocessing

The first stage of preprocessing in the Monocle workflow aims to eliminate poor quality cells from further analysis. In this section we calculate metrics to enable filtering of poor quality cells from the analysis - total UMIs, and fraction of UMIs that originate from mitochondrial genes.

Filter by Total UMIs

Cells that have too few UMIs per cells as well as those that have too many UMIs per cell are removed from the analysis.

# determine the sum of UMIs captured for each cell
pData(cells)$total.mRNAs <- Matrix::colSums(exprs(cells))

# determine the average number of UMIs captured per cell
pData(cells)$sample.mean.mRNAs <- mean(pData(cells)$total.mRNAs)

# set the limits
pData(cells)$lower.bound <- pData(cells)$sample.mean.mRNAs/2.5
pData(cells)$upper.bound <- pData(cells)$sample.mean.mRNAs * 
    2.5

# identify cells passing the filter
total_mRNA_passed <- row.names(subset(pData(cells), total.mRNAs > 
    lower.bound[1] & total.mRNAs < upper.bound[1]))

# plot the distribution of all cells and the filters used
fig.total.mRNA <- ggplot(data = pData(cells)$totalmRNAs, aes(pData(cells)$total.mRNAs)) + 
    geom_histogram(binwidth = 200, fill = "#084594") + geom_vline(xintercept = pData(cells)$lower.bound[1], 
    linetype = "dashed", color = "red", size = 0.75) + geom_vline(xintercept = pData(cells)$upper.bound[1], 
    linetype = "dashed", color = "red", size = 0.75) + xlab("Total UMIs") + 
    ylab("Number of cells") + theme_ms() + theme(axis.text = element_text(size = rel(1)), 
    axis.title = element_text(face = "bold", size = rel(1)))

fig1.b <- fig.total.mRNA
saveManuscriptPlot(fig1.b, 6, 6)
fig1.b

Filter by mtDNA reads

A cell with >= 15% of the total UMIs originating from mitochondrial genes is removed.

# determine the fraction of UMIs from mito mRNAs
pData(cells)$mito.mRNAs <- Matrix::colSums(exprs(cells[row.names(subset(fData(cells), 
    rownames(fData(cells)) %in% mito.genes)), ]))

pData(cells)$frac.mito.mRNAs <- pData(cells)$mito.mRNAs/pData(cells)$total.mRNAs  # percentage of total UMIs from mtDNA

# filter
mito_mRNA_passed <- row.names(subset(pData(cells), frac.mito.mRNAs < 
    0.15))  # the cells with < 15% of UMIs from  mtDNA

fig.mtDNA.frac <- pData(cells) %>% group_by(as.factor(raw_umi_batch)) %>% 
    dplyr::select(frac.mito.mRNAs) %>% ggplot(aes(y = frac.mito.mRNAs, 
    x = as.factor(raw_umi_batch), fill = as.factor(raw_umi_batch))) + 
    geom_violin(show.legend = FALSE) + geom_point(position = "jitter", 
    alpha = 0.2, size = 0.01, color = "black") + geom_hline(yintercept = 0.15, 
    linetype = "dashed", color = "red", size = 1) + ylab("Mitochondrial UMIs (%)") + 
    xlab(" ") + scale_y_continuous(labels = scales::percent) + 
    scale_x_discrete(labels = c(rep1 = "Rep 1", rep2 = "Rep 2", 
        rep3 = "Rep 3", rep4 = "Rep 4")) + theme_ms() + theme(legend.position = "none", 
    axis.text = element_text(size = rel(1)), axis.title = element_text(face = "bold", 
        size = rel(1))) + scale_fill_jcolors("pal3")

fig1.c <- fig.mtDNA.frac
saveManuscriptPlot(fig1.c, 6, 6)
fig1.c

Remove poor quality cells

# IDs of cells passing both criteria
qc_passed_cells <- intersect(mito_mRNA_passed, total_mRNA_passed)

# retain cells passsing
cells.filter <- cells[, qc_passed_cells]

# plot for each replicate
total_cells <- table(pData(cells)$batch)
qc_passed_cells <- table(pData(cells.filter)$batch)
filtered_cells <- total_cells - qc_passed_cells
df_qc <- t(rbind(qc_passed_cells, filtered_cells))

dat.m <- melt(df_qc)
qc_df <- dat.m %>%
  dplyr::group_by(Var1) %>%
  dplyr::arrange(Var1, desc(Var2)) %>%
  dplyr::mutate(lab_ypos = cumsum(value) - 0.5 * value)

fig.cells.removed <- ggplot(data = qc_df, aes(x = Var1, y = value)) +
  geom_col(aes(fill = Var2), width = 0.7, position = position_stack(reverse = TRUE)) +
  theme(
    legend.title = element_blank(),
    legend.position = "top",
    panel.background = element_blank(),
    axis.line.y = element_line(),
    axis.text.x = element_text(face = "bold")
  ) +
  scale_y_continuous(limits = c(0, 1500)) +
  labs(x = " ", y = "Number of cells") +
  scale_x_discrete(labels = c(
    "rep1" = "Rep 1", "rep2" = "Rep 2",
    "rep3" = "Rep 3", "rep4" = "Rep 4"
  )) +
  theme_ms() +
  theme(legend.position = "none", axis.text = element_text(size = rel(1)), axis.title = element_text(face = "bold", size = rel(1))) +
  theme(
    legend.title = element_blank(),
    legend.position = (c(.75, .94))
    # legend.text = element_text(size = 12)
  ) +
  scale_fill_manual(values = c("#3CA437", "#D7263D"), labels = c("Passed", "Removed"))

fig1.d <- fig.cells.removed
saveManuscriptPlot(fig1.d, 6, 6)
fig1.d

Filter Genes by expression

We now remove genes that are not expressed in more than 50 cells in our dataset or the maximum expression is less than 2 UMIs in any cell

cells.filter <- estimateSizeFactors(cells.filter)

cells.filter <- estimateDispersions(cells.filter)

cells.filter <- detectGenes(cells.filter, min_expr = 0.1)

cells.filter <- cells.filter[row.names(subset(fData(cells.filter), 
    num_cells_expressed > 100)), ]

fData(cells.filter)$max.expression <- cells.filter %>% exprs() %>% 
    (qlcMatrix::rowMax) %>% as.double()
max.expr.cutoff <- 4

cells.filter <- cells.filter[row.names(subset(fData(cells.filter), 
    max.expression > max.expr.cutoff)), ]

# remove mitochondrial genes from the analysis
cells.filter <- cells.filter[!row.names(subset(fData(cells.filter))) %in% 
    mito.genes, ]

fig.genes.expression <- ggplot(data = pData(cells)$num_genes_expressed, 
    aes(pData(cells.filter)$num_genes_expressed)) + geom_histogram(binwidth = 15, 
    fill = "#084594") + theme_ms() + xlab("Detected genes") + 
    ylab("Number of cells") + theme(legend.position = "none", 
    axis.text = element_text(size = rel(1)), axis.title = element_text(face = "bold", 
        size = rel(1)))

fig1.e <- fig.genes.expression
saveManuscriptPlot(fig1.e, 6, 6)
fig1.e

Figure 2

p1 <- (ggplot() + theme_void() + labs(tag = "(a)"))/(ggplot() + 
    theme_void())
p2 <- (ggplot() + theme_void())/(ggplot() + theme_void())
p3 <- (fig1.b + labs(tag = "(b)"))/(fig1.d + labs(tag = "(d)"))
p4 <- (fig1.c + labs(tag = "(c)"))/(fig1.e + labs(tag = "(e)"))

figure1 <- p1 | p2 | p3 | p4
# figure1 <- figure1 + plot_annotation(tag_levels = 'a',tag_prefix = '(', tag_suffix = ')') + plot_annotation(tag_levels = 'a',tag_prefix = '(', tag_suffix = ')')
saveManuscriptPlot(figure1, 12, 8)

Gene annotation

The SBG platform utilises the ENSEMBL GTF file for the PICR genome. To improve the annotation of the genes we utlise the corresponding Entrez IDs for those genes labelled with and “ENSCGRG” IDs to obtain a gene symbol

# set bioMart
ensembl <- useMart(host = "http://sep2019.archive.ensembl.org", 
    biomart = "ENSEMBL_MART_ENSEMBL", dataset = "cgpicr_gene_ensembl")

# make a gene_short_name column for the CDS object
fData(cells.filter)$gene_short_name <- rownames(cells.filter)

# how many of the genes have no gene symbol sum(str_count(fData(cells.filter)$gene_short_name, 'ENSCGR*'))
unannotated <- fData(cells.filter)$gene_short_name[grep("ENSCGR", 
    fData(cells.filter)$gene_short_name)]

# biomart query
annotation <- getBM(attributes = c("ensembl_gene_id", "entrezgene_id", 
    "external_gene_name", "description", "gene_biotype"), filters = c("ensembl_gene_id"), 
    values = unannotated, mart = ensembl, uniqueRows = TRUE)

# store the final annotation
annotation$final <- "Unannotated"

# 1) add the ensembl annotated external gene names to final columns annotation$final[!annotation$external_gene_name == ''] <- annotation$external_gene_name[!annotation$external_gene_name == '']

# 2) where we have no external gene name from ensembl but we do have an entrez ID
match_entrez_ids <- annotation %>% dplyr::filter(!is.na(entrezgene_id))
# load NCBI annotations
cgr_entrez <- readLines("./genelists/cgr_ncbi_ids.txt")

# match entrezID
split_gene_info <- strsplit(cgr_entrez[grepl(paste(match_entrez_ids[, 
    2], collapse = "|"), cgr_entrez)], "\t")

# extract the matched symbols
entrez_matched <- cbind(sapply(split_gene_info, "[", 2), sapply(split_gene_info, 
    "[", 3))
colnames(entrez_matched) <- c("entrez_gene_id", "symbol")

# add the symbols to the annotation
annotation$final[match(entrez_matched[, 1], annotation$entrezgene_id)] <- entrez_matched[, 
    2]

# add the new annotations to the CDS object
fData(cells.filter)$gene_short_name[match(annotation$ensembl_gene_id, 
    fData(cells.filter)$gene_short_name)] <- annotation$final

Cell cycle scoring

For the next section we used the cell cycle scoring function of Seurat to determine which phase of the cell cycles cells were in. Using this function will allow us to correct the effect of cell cycle in subsequent analyses.

Create a Seurat object

# extract the raw counts from the cds and make a Seurat oject containing the batch variable
cho.obj <- CreateSeuratObject(counts = exprs(cells.filter), project = "chok1_dp12", 
    min.cells = 3, min.features = 200)

cho.obj <- AddMetaData(object = cho.obj, metadata = pData(cells.filter)$batch, 
    col.name = "batch")

Preprocess Seurat

Before scoring cells by the cell cycle phase any variation due to batch is removed

Specify cell cycle genes

Here we use custom CHO cell genes involved in cell cycle

s.genes <- read.table("./genelists/cho.s.txt", stringsAsFactors = F)

g2m.genes <- read.table("./genelists/cho.g2m.txt", stringsAsFactors = F)

Calculate the cell cycle scores

# run the seurat cell cycle scoring module
cho.obj <- CellCycleScoring(cho.obj, s.features = s.genes$V1, 
    g2m.features = g2m.genes$V1, set.ident = TRUE)

Add cell cycle scores to CDS

pData(cells.filter)$S.Score <- cho.obj$S.Score
pData(cells.filter)$G2M.Score <- cho.obj$G2M.Score
pData(cells.filter)$cc_phase <- cho.obj$Phase

tSNE Analysis

To conduct an exploratory analysis of CHO K1 DP12N1 scRNASeq data we perform an unsupervised dimensionality reduction and visualisation using t-Stocastic neighbour embedding (t-SNE).

Identify variable genes

Identify the most variable genes in the scRNASeq dataset.

options(warn = -1)  # suppress warnings that otherwise clutter output
#' prioritize variable genes for the analysis
disp_table <- dispersionTable(cells.filter)
unsup_clustering_genes <- subset(disp_table, mean_expression > 
    0.1)
cells.filter <- setOrderingFilter(cells.filter, unsup_clustering_genes$gene_id)

fig.variable.genes <- plot_ordering_genes(cells.filter) + theme_ms() + 
    theme(aspect.ratio = 1) + xlab("Mean Expression") + ylab("Dispersion") + 
    theme(aspect.ratio = 1, axis.text = element_text(size = rel(1)), 
        axis.title = element_text(face = "bold", size = rel(1)))

fig2.a <- fig.variable.genes
saveManuscriptPlot(fig2.a, 6, 6)
fig2.a

Select PCs

The dimensionality of the single cell expression data was further reduced using principal components analysis (PCA) and following determination of the proportion of variance captured by each component, the first 12 PCs were selected for t-SNE.

fig.pc.variance <- plot_pc_variance_explained(cells.filter, return_all = FALSE) + 
    geom_vline(xintercept = 12, linetype = "dashed", color = "red", 
        size = 0.5) + labs(x = "Principal Component", y = "Variance Explained") + 
    theme_ms() + theme(aspect.ratio = 1, axis.text = element_text(size = rel(1)), 
    axis.title = element_text(face = "bold", size = rel(1)))

fig2.b <- fig.pc.variance
saveManuscriptPlot(fig2.b, 6, 6)
fig2.b

t-Stocastic neighbour embedding

To account for potential heterogeneity from technical and biological confounding factors, we corrected the UMI count data prior to t-SNE, ensuring that covariation between multiple factors was removed simultaneously. For technical covariates, possible batch effects arising from use of different cell isolation cartridges as well as differences in the total number of UMIs captured and genes detected per cell were regressed out.

cells.filter <- reduceDimension(cells.filter, max_components = 2, 
    num_dim = 12, reduction_method = "tSNE", verbose = FALSE, 
    residualModelFormulaStr = "~batch + num_genes_expressed + total.mRNAs + S.Score + G2M.Score", 
    norm_method = "vstExprs")

Identify tSNE clusters

The Monocle densityPeak with parameters rho = 2 & delta = 4 is used to identify subclusters in the data. Note: the t-SNE plot with identified clusters is shown below.

cells.filter <- clusterCells(cells.filter, rho_threshold = 2, 
    delta_threshold = 4, skip_rho_sigma = T)

Distance cutoff calculated to 3.224882

fig.scatter.rho.delta <- plot_rho_delta(cells.filter, rho_threshold = 2, 
    delta_threshold = 4) + theme_ms() + theme(aspect.ratio = 1)

fig_s_5 <- fig.scatter.rho.delta
saveManuscriptPlot(fig_s_5, 6, 6)
fig_s_5

Effect of CC correction

Here we show the effect of correcting for cell cycle on the t-SNE visualisation.

cells.filter <- reduceDimension(cells.filter, max_components = 2, 
    num_dim = 12, reduction_method = "tSNE", verbose = FALSE, 
    residualModelFormulaStr = "~batch + num_genes_expressed + total.mRNAs", 
    norm_method = "vstExprs")

fig.tsne.cc_uncorrected <- plot_cell_clusters(cells.filter, color = "cc_phase", 
    cell_size = 0.8) + scale_y_continuous(limits = c(-35, 35)) + 
    scale_x_continuous(limits = c(-35, 35)) + labs(x = "t-SNE 1", 
    y = "t-SNE 2") + guides(color = guide_legend(override.aes = list(size = 5))) + 
    theme_ms() + theme(legend.position = "right", aspect.ratio = 1, 
    axis.text = element_text(size = rel(1)), axis.title = element_text(face = "bold", 
        size = rel(1.5)))

fig_s_4a <- fig.tsne.cc_uncorrected
saveManuscriptPlot(fig_s_4a, 6, 6)
fig_s_4a

# correct and repeat with Seurat cc_scores
cells.filter <- reduceDimension(cells.filter, max_components = 2, 
    num_dim = 12, reduction_method = "tSNE", verbose = FALSE, 
    residualModelFormulaStr = "~batch + num_genes_expressed + total.mRNAs + S.Score + G2M.Score", 
    norm_method = "vstExprs")

# show the cell cycle uncorrected
fig.tsne.cc_corrected <- plot_cell_clusters(cells.filter, color = "cc_phase", 
    cell_size = 0.8) + scale_y_continuous(limits = c(-35, 35)) + 
    scale_x_continuous(limits = c(-35, 35)) + labs(x = "t-SNE 1", 
    y = "t-SNE 2") + guides(color = guide_legend(override.aes = list(size = 5))) + 
    theme_ms() + theme(legend.position = "right", aspect.ratio = 1, 
    axis.text = element_text(size = rel(1)), axis.title = element_text(face = "bold", 
        size = rel(1.5)))

fig_s_4b <- fig.tsne.cc_corrected
saveManuscriptPlot(fig_s_4b, 6, 6)
fig_s_4b

tSNE density

Visualiation of cell density in 2D t-SNE plot, show the presence of sub-clusters confirming the presence of transcriptional heterogeneity.

# extract the monocle reduced dimesions
tsne_coordinates <- t(reducedDimA(cells.filter))

df <- data.frame(x = tsne_coordinates[, 1], y = tsne_coordinates[, 
    2], d = densCols(tsne_coordinates[, 1], tsne_coordinates[, 
    2], nbin = 64, bandwidth = 1.5, colramp = colorRampPalette(rev(rainbow(10, 
    end = 4/6)))))

fig.tsne.density <- ggplot(df) + geom_point(aes(x, y, col = d), 
    size = 0.2) + scale_color_identity() + theme_ms() + theme(aspect.ratio = 1, 
    panel.grid.major = element_blank(), panel.grid.minor = element_blank(), 
    axis.text = element_text(size = rel(1)), axis.title = element_text(face = "bold", 
        size = rel(1))) + scale_y_continuous(limits = c(-35, 
    35)) + scale_x_continuous(limits = c(-35, 35)) + labs(x = "t-SNE 1", 
    y = "t-SNE 2")

fig2.c <- fig.tsne.density
saveManuscriptPlot(fig2.c, 6, 6)
fig2.c

tSNE clusters

To identify groups of cells form the tSNE plot we use densityPeak algorithm. Each cluster is assinged a color for reference.

fig.tsne.cluster <- plot_cell_clusters(cells.filter, color = "Cluster", 
    cell_size = 0.5) + scale_y_continuous(limits = c(-35, 35)) + 
    scale_x_continuous(limits = c(-35, 35)) + labs(x = "t-SNE 1", 
    y = "t-SNE 2") + guides(color = guide_legend(override.aes = list(size = 5))) + 
    theme_ms() + theme(legend.position = "none", aspect.ratio = 1, 
    axis.text = element_text(size = rel(1)), axis.title = element_text(face = "bold", 
        size = rel(1))) + scale_color_jcolors("pal8")

fig2.d <- fig.tsne.cluster
saveManuscriptPlot(fig2.d, 6, 6)

fig2.d

Figure 2

figure2 <- (fig2.a | fig2.b)/(fig2.c | fig2.d) + plot_annotation(tag_levels = "a", 
    tag_prefix = "(", tag_suffix = ")")
saveManuscriptPlot(figure2, 9, 9)

Differential expression

Using Monocle’s differential gene test the genes that differ between the clusters of cells are identified. We again regress out confounding factors i.e. batch and cell cycle phase

# monocle differential expression by cluster
tictoc::tic()
cluster_de <- differentialGeneTest(cells.filter, fullModelFormulaStr = "~batch + num_genes_expressed + total.mRNAs + S.Score + G2M.Score + Cluster", 
    reducedModelFormulaStr = "~batch + num_genes_expressed + total.mRNAs + S.Score + G2M.Score", 
    cores = 32)
tictoc::toc()

472.447 sec elapsed

Anti-IL8 mAb gene expression

bulk RNASeq

From the bulk RNASeq data we find that both the heavy and light chain were expressed in the top 2% of all genes. The expression of the light chain (median TPM = 7.9) was higher than that of the heavy chain (median TPM = 6.4).

mab_genes <- c("heavy_chain", "light_chain")

# expression of the mAb genes in bulkRNASeq
bulk_tpm_replicates <- cbind(bulk_tpm_r1[bulk_ids], bulk_tpm_r2[bulk_ids], 
    bulk_tpm_r3[bulk_ids], bulk_tpm_r4[bulk_ids])
# take the median of the log(TPM+1)
bulk_tpm_median <- data.frame(median_tpm = apply(bulk_tpm_replicates, 
    1, median))
bulk_tpm_max <- data.frame(median_tpm = apply(bulk_tpm_replicates, 
    1, max))
bulk_tpm_min <- data.frame(median_tpm = apply(bulk_tpm_replicates, 
    1, min))

hc_median_tpm <- bulk_tpm_median["heavy_chain", 1]
lc_median_tpm <- bulk_tpm_median["light_chain", 1]
gapdh_median_tpm <- bulk_tpm_median["ENSCGRG00015003621", 1]
actb_median_tpm <- bulk_tpm_median["ENSCGRG00015013883", 1]

fig.density.mab.bulk <- bulk_tpm_median %>% ggplot() + geom_density(aes(x = median_tpm), 
    fill = "#2E6657", alpha = 0.2) + theme_ms() + # theme(aspect.ratio = 1) +
labs(x = "Log (median bulk TPM +1)") + geom_segment(aes(x = hc_median_tpm, 
    y = 0, xend = hc_median_tpm, yend = 0.4), linetype = "dashed", 
    color = "#558aa6", arrow = arrow(length = unit(0.15, "cm"), 
        ends = "first", type = "closed")) + annotate("text", 
    x = hc_median_tpm, y = 0.42, label = "Heavy chain", color = "#558aa6") + 
    geom_segment(aes(x = lc_median_tpm, y = 0, xend = lc_median_tpm, 
        yend = 0.5), linetype = "dashed", color = "#B1740F", 
        arrow = arrow(length = unit(0.15, "cm"), ends = "first", 
            type = "closed")) + annotate("text", x = lc_median_tpm + 
    0.45, y = 0.52, label = "Light chain", color = "#B1740F") + 
    geom_segment(aes(x = gapdh_median_tpm, y = 0, xend = gapdh_median_tpm, 
        yend = 0.6), linetype = "dashed", color = "grey", arrow = arrow(length = unit(0.15, 
        "cm"), ends = "first", type = "closed")) + annotate("text", 
    x = gapdh_median_tpm, y = 0.62, label = "Gapdh", color = "grey") + 
    scale_x_continuous(expand = c(0, 0)) + scale_y_continuous(expand = c(0, 
    0))


s_fig5 <- fig.density.mab.bulk
saveManuscriptPlot(s_fig5, 12, 6)
s_fig5

scRNASeq

HC & LC violin

Heavy and light chain expression across the population.

# create a relative expression violin plot for the heavy and light chain
pData(cells.filter)$rel.exprs.hc <- monocle_relative_expression(cells.filter, 
    "heavy_chain")
pData(cells.filter)$rel.exprs.lc <- monocle_relative_expression(cells.filter, 
    "light_chain")
cds_exprs <- cbind(pData(cells.filter)$rel.exprs.hc, pData(cells.filter)$rel.exprs.lc)
cds_exprs <- reshape2::melt(as.matrix(cds_exprs))
colnames(cds_exprs) <- c("Cell", "Gene", "expression")

# log10 adjust the expression
cds_exprs$adjusted_expression <- log10(cds_exprs$expression)

fig.violin.mab <- ggplot(aes_string(x = "Gene", y = "expression", 
    fill = "Gene"), data = cds_exprs) + geom_violin(alpha = 0.2) + 
    geom_boxplot(width = 0.2) + expand_limits(y = c(0.5, 1)) + 
    scale_y_log10() + scale_fill_jcolors("pal6") + theme(legend.position = "none") + 
    theme_ms() + scale_x_discrete(labels = c(heavy_chain = "Heavy Chain", 
    light_chain = "Light Chain")) + # stat_summary(fun.y='median',geom='line', aes(group='f_id', col='grey'), position=position_dodge(width= 0.9)) +
theme(legend.position = "none", aspect.ratio = 1, axis.text = element_text(size = rel(1)), 
    axis.title = element_text(face = "bold", size = rel(1))) + 
    labs(y = "Relative expression", x = "") + stat_compare_means(method = "wilcox.test", 
    label.x = 0.75)
fig3.b <- fig.violin.mab
saveManuscriptPlot(fig3.b, 6, 6)
fig3.b

HC tSNE

Light expression visualised in the t-SNE space.

fig.tsne.rel.hc <- plot_cell_clusters(cells.filter, color = "rel.exprs.hc", 
    cell_size = 0.5) + color.gradient + theme_ms() + theme(panel.grid.major = element_blank(), 
    panel.grid.minor = element_blank(), legend.position = "right", 
    aspect.ratio = 1, axis.text = element_text(size = rel(1)), 
    axis.title = element_text(face = "bold", size = rel(1)), 
    plot.title = element_text(face = "bold", size = rel(1))) + 
    scale_y_continuous(limits = c(-35, 35)) + scale_x_continuous(limits = c(-35, 
    35)) + labs(x = "t-SNE 1", y = "t-SNE 2") + labs(x = "t-SNE 1", 
    y = "t-SNE 2", color = "Relative\nExpression", title = "Heavy Chain")
fig3.d <- fig.tsne.rel.hc
saveManuscriptPlot(fig3.d, 6, 6)
fig3.d

HC cluster expression

Assessing the difference in heavy chain expession for the t-SNE clusters.

pData(cells.filter) %>%
  group_by(Cluster) %>%
  summarise(`Cell number` = n(), `Median relative HC expression` = median(rel.exprs.hc), `Max. expression` = max(rel.exprs.hc)) %>%
  DT::datatable(
    rownames = T,
    options = list(
      pagelength = T,
      scrollX = T,
      columnDefs = list(list(
        className = "dt-head-center dt-center",
        targets = 1:2
      ))
    )
  ) %>%
  formatSignif(digits = 2, columns = 3:4)

print(paste("Heavy chain DE q-value for cluster = ", signif(cluster_de["heavy_chain", ]$qval, 2)), sep = "")

[1] "Heavy chain DE q-value for cluster =  1.6e-53"

fig.box.hc.cluster <- pData(cells.filter) %>%
  group_by(Cluster) %>%
  ggplot(aes(x = Cluster, y = rel.exprs.hc, fill = Cluster)) +
  geom_violin(alpha = 0.2) +
  geom_boxplot(alpha = 0.5, width = 0.2) +
  scale_y_log10(limits = c(0.5, 1e2)) +
  theme_ms() +
  theme(
    legend.position = "none",
    # aspect.ratio = 1,
    axis.text = element_text(size = rel(1)),
    axis.title = element_text(face = "bold", size = rel(1)),
    plot.title = element_text(size = rel(1))
  ) +
  labs(y = "Relative expression", x = "Cluster", title = "Heavy Chain") +
  scale_fill_jcolors("pal8") +
  annotate("text", x = "2", y = 75, label = expression("DE q-value = " * 1.6 * " \u00D7 " * 10^{
    -53
  }))

fig3.f <- fig.box.hc.cluster
saveManuscriptPlot(fig3.f, 6, 3)
fig3.f

LC tSNE

Light expression visualised in the t-SNE space.

fig.tsne.rel.lc <- plot_cell_clusters(cells.filter, color = "rel.exprs.lc", 
    cell_size = 0.5) + color.gradient + theme_ms() + theme(panel.grid.major = element_blank(), 
    panel.grid.minor = element_blank(), legend.position = "right", 
    aspect.ratio = 1, axis.text = element_text(size = rel(1)), 
    axis.title = element_text(face = "bold", size = rel(1)), 
    plot.title = element_text(face = "bold", size = rel(1))) + 
    scale_y_continuous(limits = c(-35, 35)) + scale_x_continuous(limits = c(-35, 
    35)) + labs(x = "t-SNE 1", y = "t-SNE 2") + labs(x = "t-SNE 1", 
    y = "t-SNE 2", color = "Relative\nExpression", title = "Light Chain")
fig3.e <- fig.tsne.rel.lc
saveManuscriptPlot(fig3.e, 6, 6)
fig3.e

LC cluster expression

Assessing the difference in heavy chain expession for the t-SNE clusters.

pData(cells.filter) %>%
  group_by(Cluster) %>%
  summarise(`Cell number` = n(), `Median relative LC expression` = median(rel.exprs.lc), `Max. expression` = max(rel.exprs.lc)) %>%
  DT::datatable(
    rownames = T,
    options = list(
      pagelength = T,
      scrollX = T,
      columnDefs = list(list(
        className = "dt-head-center dt-center",
        targets = 1:2
      ))
    )
  ) %>%
  formatSignif(digits = 2, columns = 3:4)

print(paste("Light chain DE q-values for cluster = ", signif(cluster_de["light_chain", ]$qval, 2)), sep = "")

[1] "Light chain DE q-values for cluster =  1.3e-36"

fig.box.lc.cluster <- pData(cells.filter) %>%
  group_by(Cluster) %>%
  ggplot(aes(x = Cluster, y = rel.exprs.lc, fill = Cluster)) +
  geom_violin(alpha = 0.2) +
  geom_boxplot(alpha = 0.5, width = 0.2) +
  scale_y_log10(limits = c(0.5, 1e2)) +
  theme_ms() +
  theme(
    legend.position = "none",
    # aspect.ratio = 1,
    axis.text = element_text(size = rel(1)),
    axis.title = element_text(face = "bold", size = rel(1)),
    plot.title = element_text(size = rel(1))
  ) +
  labs(y = "Relative expression", x = "Cluster", title = "Light Chain") +
  scale_fill_jcolors("pal8") +
  annotate("text", x = "2", y = 75, label = expression("DE q-value = " * 1.3 * " \u00D7 " * 10^{
    -36
  }))

fig3.g <- fig.box.lc.cluster
saveManuscriptPlot(fig3.g, 6, 3)
fig3.g

Figure 3

Create and save figure 3 for the manuscript.

p1 <- fig3.b | fig2.d + theme(legend.position = "right")
p2 <- fig3.f/fig3.g
p3 <- fig3.d + theme(legend.position = "right") | fig3.e + theme(legend.position = "right")


p4 <- p1/p2/p3
figure3 <- p4 + plot_annotation(tag_levels = "a", tag_prefix = "(", 
    tag_suffix = ")")
saveManuscriptPlot(figure3, 9, 12)

Trajectory analysis

Construct trajectory

Using the genes found to vary beteen the identified clusters we first project onto a 2D space using the DDRTree algorithm before ording the cells in pseduotime

de_genes <- cluster_de %>%
  arrange(qval) %>%
  filter(qval < 0.01)

cells.filter <- setOrderingFilter(cells.filter, de_genes$type)

cells.filter <- reduceDimension(cells.filter,
  max_components = 2,
  reduction_method = "DDRTree",
  residualModelFormulaStr = "~batch + num_genes_expressed + total.mRNAs + S.Score + G2M.Score",
  norm_method = "vstExprs"
)

cells.filter <- orderCells(cells.filter)

# cell states are assigned arbitrary numbers, here we reassign #
# to read left to right in the trajectory plot
pData(cells.filter)$State <- mapvalues(pData(cells.filter)$State,
  from = c(
    "6", "7", "5", "4",
    "8", "3", "2", "9", "1"
  ),
  to = c(
    "1", "2", "3", "4",
    "5", "6", "7", "8", "9"
  )
)

pData(cells.filter)$State <- factor(pData(cells.filter)$State,
  levels = c("1", "2", "3", "4", "5", "6", "7", "8", "9")
)

fig.trajectory.state <- plot_cell_trajectory(cells.filter,
  theta = 0,
  show_branch_points = FALSE,
  color_by = "State",
  cell_size = 0.5
) +
  labs(x = "Component 1", y = "Component 2", title = "Monocle trajectory") +
  guides(color = guide_legend(override.aes = list(size = 5))) +
  theme_ms() +
  scale_x_reverse() + # we reverse the direction of the x-axis so that State 1 is at the left
  scale_color_manual(values = colour.set) +
  scale_alpha(pData(cells.filter)$State) +
  theme(
    axis.text = element_text(size = rel(1)),
    axis.title = element_text(face = "bold", size = rel(1))
  )

# modify the transperancy of the cells on the plot
# fig.trajectory.state <- ggplot_build(fig.trajectory.state)
# fig.trajectory.state$data[[2]]$alpha <- 0.3
#
# fig.trajectory.state <- ggplot_gtable(fig.trajectory.state)
fig5.a <- fig.trajectory.state
saveManuscriptPlot(fig5.a, 8, 4)
fig5.a

# plot the number of cells in each state
fig.bar.state.cells <- pData(cells.filter) %>%
  dplyr::group_by(State) %>%
  dplyr::summarise(num_cells = n()) %>%
  ggplot(aes(x = State, y = num_cells, fill = State)) +
  geom_text(aes(label = num_cells), position = position_dodge(width = 0.9), vjust = -0.25) +
  geom_bar(alpha = 1, stat = "identity") +
  ylim(c(0, 1050)) +
  theme(legend.position = "none") +
  theme_ms() +
  theme(
    legend.position = "none",
    axis.text = element_text(size = rel(1)),
    axis.title = element_text(face = "bold", size = rel(1))
  ) +
  labs(y = "Number of cells", x = "State") +
  scale_fill_manual(values = colour.set)

fig5.b <- fig.bar.state.cells
saveManuscriptPlot(fig5.b, 6, 6)
fig5.b

Median relative HC & LC in each cell state

pData(cells.filter) %>% group_by(State) %>% summarise(`Cell number` = n(), 
    `Median relative HC expression` = median(rel.exprs.hc), `Max. expression` = max(rel.exprs.hc)) %>% 
    DT::datatable(rownames = T, options = list(pagelength = T, 
        scrollX = T, columnDefs = list(list(className = "dt-head-center dt-center", 
            targets = 1:2)))) %>% formatSignif(digits = 2, columns = 3:4)

pData(cells.filter) %>% group_by(State) %>% summarise(`Cell number` = n(), 
    `Median relative LC expression` = median(rel.exprs.lc), `Max. expression` = max(rel.exprs.lc)) %>% 
    DT::datatable(rownames = T, options = list(pagelength = T, 
        scrollX = T, columnDefs = list(list(className = "dt-head-center dt-center", 
            targets = 1:2)))) %>% formatSignif(digits = 2, columns = 3:4)

HC state expression

fig.box.hc.state <- pData(cells.filter) %>% group_by(State) %>% 
    ggplot(aes(x = State, y = rel.exprs.hc, fill = State)) + 
    geom_violin(alpha = 0.2) + geom_boxplot(alpha = 1, width = 0.15) + 
    scale_y_log10(limits = c(0.5, 100)) + theme_ms() + theme(legend.position = "none", 
    axis.text = element_text(size = rel(1)), axis.title = element_text(face = "bold", 
        size = rel(1)), plot.title = element_text(size = rel(1))) + 
    labs(y = "Relative expression", x = "State", title = "Heavy Chain") + 
    stat_compare_means(label.y.npc = c(0.95), label = "p.signif", 
        method = "wilcox.test", ref.group = "1", hide.ns = TRUE, 
        size = 7) + scale_fill_manual(values = colour.set)

fig5.c <- fig.box.hc.state
saveManuscriptPlot(fig5.c, 6, 6)
fig5.c

HC trajectory

# overlay the heavy chain expression on the trajectory plot
fig.trajectory.rel.hc <- plot_cell_trajectory(cells.filter, theta = 0, 
    show_branch_points = FALSE, color_by = "rel.exprs.hc", cell_size = 0.5) + 
    color.gradient + labs(x = "Component 1", y = "Component 2", 
    color = "Relative\nExpression", title = "Heavy Chain") + 
    theme_ms() + theme(legend.position = "right", axis.text = element_text(size = rel(1)), 
    axis.title = element_text(face = "bold", size = rel(1)), 
    plot.title = element_text(size = rel(1))) + scale_x_reverse()

fig5.e <- fig.trajectory.rel.hc
saveManuscriptPlot(fig5.e, 12, 6)
fig5.e

LC state expression

# relative expression of the light chain in each cell state
fig.box.lc.state <- pData(cells.filter) %>%
  group_by(State) %>%
  ggplot(aes(x = State, y = rel.exprs.lc, fill = State)) +
  geom_violin(alpha = 0.2) +
  geom_boxplot(alpha = 1, width = 0.15) +
  scale_y_log10(limits = c(0.5, 1e2)) +
  theme(legend.position = "none") +
  theme_ms() +
  theme(
    legend.position = "none",

    axis.text = element_text(size = rel(1)),
    axis.title = element_text(face = "bold", size = rel(1)),
    plot.title = element_text(size = rel(1))
  ) +
  labs(y = "Relative expression", x = "State", title = "Light Chain") +
  stat_compare_means(
    label.y.npc = c(0.95),
    label = "p.signif",
    method = "wilcox.test",
    ref.group = "1",
    hide.ns = TRUE,
    size = 7,
    color = "black"
  ) +
  scale_fill_manual(values = colour.set)

fig5.d <- fig.box.lc.state
saveManuscriptPlot(fig5.d, 6, 6)
fig5.d

LC trajectory

# overlay the light chain expression on the trajectory plot
fig.trajectory.rel.lc <- plot_cell_trajectory(cells.filter, theta = 0, 
    show_branch_points = FALSE, color_by = "rel.exprs.lc", cell_size = 0.5) + 
    color.gradient + labs(x = "Component 1", y = "Component 2", 
    color = "Relative\nExpression", title = "Light Chain") + 
    theme_ms() + theme(legend.position = "right", axis.text = element_text(size = rel(1)), 
    axis.title = element_text(face = "bold", size = rel(1)), 
    plot.title = element_text(size = rel(1))) + scale_x_reverse()

fig5.f <- fig.trajectory.rel.lc
saveManuscriptPlot(fig5.f, 12, 6)
fig5.f

Figure 5

Create and save figure 5 in the manuscript.

p1 <- (fig5.a + labs(tag = "(a)"))/(fig5.e + labs(tag = "(c)"))/(fig5.f + 
    labs(tag = "(e)"))
p2 <- (fig5.b + labs(tag = "(b)"))/(fig5.c + labs(tag = "(d)"))/(fig5.d + 
    labs(tag = "(f)"))

figure5 <- p1 | p2 + plot_annotation(tag_levels = "a", tag_prefix = "(", 
    tag_suffix = ")")
saveManuscriptPlot(figure5, 16, 11)

Expression of CHO genes along trajectory

Set the root of the Monocle trajectory

Cell state 1 is set as the root or beginning of the trajectory

# set the root state
cells.filter <- orderCells(cells.filter, root_state = 1)

fig.trajectory.pseudotime <- plot_cell_trajectory(cells.filter, 
    color_by = "Pseudotime", theta = 0, show_branch_points = FALSE, 
    cell_size = 0.5) + labs(x = "Component 1", y = "Component 2", 
    title = "Monocle pseudotime") + scale_color_viridis_c(option = "D") + 
    theme_ms() + theme(legend.position = "right", legend.title.align = 0.5, 
    axis.text = element_text(size = rel(1)), axis.title = element_text(face = "bold", 
        size = rel(1))) + scale_x_reverse()

fig6.a <- fig.trajectory.pseudotime
saveManuscriptPlot(fig6.a, 7, 3)

fig6.a

Pseudotime differential expression

Identification of genes that correlated with Pseudotime. Those genes found to have a qval < 0.01 using the Monocle differentialGeneTest function were considered significant

pseudotime_de <- differentialGeneTest(cells.filter, fullModelFormulaStr = "~batch + num_genes_expressed + total.mRNAs + S.Score + G2M.Score + sm.ns(Pseudotime, df = 3)", 
    reducedModelFormulaStr = "~batch + num_genes_expressed + total.mRNAs + S.Score + G2M.Score", 
    cores = 32)

# select significant genes
pseudotime_de <- pseudotime_de[pseudotime_de$qval < 0.01, ]

# save the results for Table S3
table.s3 <- pseudotime_de %>% arrange(qval) %>% dplyr::select(type, 
    gene_short_name, num_cells_expressed, max.expression, pval, 
    qval) %>% dplyr::rename(`SBG Annotation` = type, `Gene Symbol` = gene_short_name, 
    `p-value` = pval, `adj p-value` = qval)

tabsdir <- "./tables/"
if (!dir.exists(tabsdir)) {
    dir.create(tabsdir)
}

fn <- paste("tables/Table S3.xlsx", sep = "")
suppressMessages(if (file.exists(fn)) {
    file.remove(fn)
})

[1] TRUE

write_xlsx(list(pseudotime_de = table.s3), path = fn, format_headers = TRUE)

Cluster analysis

cluster_number <- 5

fig.heatmap.pseudotime <- plot_pseudotime_heatmap(cells.filter[rownames(pseudotime_de), 
    ], cores = 32, use_gene_short_name = F, num_clusters = cluster_number, 
    show_rownames = F, return_heatmap = T, norm_method = "vstExprs")

fig6.b <- fig.heatmap.pseudotime
saveManuscriptPlot(fig6.b, 6, 6)
fig6.b

Enrichment analysis

Use WebGestaltR to determine if the identified clusters of genes are enriched when compared to GO. Those biological processes with a Benjamini Hochberg adjusted p-value < 0.05 were considered significantly enriched.

Enriched GO BPs

Cluster 1

datatable(enrich_result[[1]]$enrichResult[, c(1, 2, 4, 5, 7, 
    8, 9)], rownames = T, filter = "top", options = list(pagelength = T, 
    scrollX = T, columnDefs = list(list(className = "dt-head-center dt-center", 
        targets = 3:6)))) %>% formatSignif(digits = 2, columns = 5:7)

Cluster 2

datatable(enrich_result[[2]]$enrichResult[, c(1, 2, 4, 5, 8, 
    9)], rownames = T, filter = "top", options = list(pagelength = T, 
    scrollX = T, columnDefs = list(list(className = "dt-head-center dt-center", 
        targets = 3:6)))) %>% formatSignif(digits = 2, columns = 5:6)

CHO cell gene expression

Overlay the expression of genes found to significantly correlate with the progression of cells along the trajectory.

Psap

Plot the expression of the Prosaposin gene

# relative expression
pData(cells.filter)$rel.exprs.psap <- monocle_relative_expression(cells.filter, 
    "Psap")

fig.traj.rel.psap <- plot_cell_trajectory(cells.filter, theta = 0, 
    show_branch_points = FALSE, color_by = "rel.exprs.psap", 
    cell_size = 0.5) + color.gradient + labs(x = "Component 1", 
    y = "Component 2", title = "Psap", color = "Relative\nexpression") + 
    theme_ms() + theme(legend.position = "right", axis.text = element_text(size = rel(1)), 
    axis.title = element_text(face = "bold", size = rel(1)), 
    plot.title = element_text(size = rel(1))) + scale_x_reverse()


fig6.c <- fig.traj.rel.psap
saveManuscriptPlot(fig6.c, 7, 3)
fig6.c

Hmox1

Plot the expression of the Heme oxygenase 1 gene

# relative expression
pData(cells.filter)$rel.exprs.hmox1 <- monocle_relative_expression(cells.filter, 
    "Hmox1")

fig.traj.rel.hmox1 <- plot_cell_trajectory(cells.filter, theta = 0, 
    show_branch_points = FALSE, color_by = "rel.exprs.hmox1", 
    cell_size = 0.5) + color.gradient + labs(x = "Component 1", 
    y = "Component 2", title = "Hmox1", color = "Relative\nexpression") + 
    theme_ms() + theme(legend.position = "right", axis.text = element_text(size = rel(1)), 
    axis.title = element_text(face = "bold", size = rel(1)), 
    plot.title = element_text(size = rel(1))) + scale_x_reverse()


fig6.e <- fig.traj.rel.hmox1
saveManuscriptPlot(fig6.e, 7, 3)
fig6.e

Fth1

Plot the expression of the Ferritin 1 heavy chain 1 gene

# relative expression
pData(cells.filter)$rel.exprs.fth1 <- monocle_relative_expression(cells.filter, 
    "Fth1")

fig.traj.rel.fth1 <- plot_cell_trajectory(cells.filter, theta = 0, 
    show_branch_points = FALSE, color_by = "rel.exprs.fth1", 
    cell_size = 0.5) + color.gradient + labs(x = "Component 1", 
    y = "Component 2", title = "Fth1", color = "Relative\nexpression") + 
    theme_ms() + theme(legend.position = "right", axis.text = element_text(size = rel(1)), 
    axis.title = element_text(face = "bold", size = rel(1)), 
    plot.title = element_text(size = rel(1))) + scale_x_reverse()


fig6.f <- fig.traj.rel.fth1
saveManuscriptPlot(fig6.f, 7, 3)
fig6.f

Figure 6

Create and save figure 6 in the manuscript.

p1 <- (fig6.a + labs(tag = "(a)"))/(ggplot() + theme_void() + 
    labs(tag = "(b)"))/(ggplot() + theme_void())/(ggplot() + 
    theme_void())/(ggplot() + theme_void())
p2 <- (fig6.f + labs(tag = "(c)"))/(fig6.e + labs(tag = "(d)"))/(fig6.c + 
    labs(tag = "(e)"))

figure6 <- p1 | p2
saveManuscriptPlot(figure6, 12, 10)

Session info

devtools::session_info()

─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value                       
 version  R version 4.0.2 (2020-06-22)
 os       Ubuntu 16.04.5 LTS          
 system   x86_64, linux-gnu           
 ui       X11                         
 language (EN)                        
 collate  en_IE.UTF-8                 
 ctype    en_IE.UTF-8                 
 tz       Europe/Dublin               
 date     2020-11-05                  

─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 ! package        * version  date       lib source        
   abind            1.4-5    2016-07-21 [1] CRAN (R 4.0.2)
   AnnotationDbi    1.50.1   2020-06-29 [1] Bioconductor  
   apcluster        1.4.8    2019-08-21 [1] CRAN (R 4.0.2)
   ape              5.4      2020-06-03 [1] CRAN (R 4.0.2)
   ash              1.0-15   2015-09-01 [1] CRAN (R 4.0.2)
   askpass          1.1      2019-01-13 [1] CRAN (R 4.0.2)
   assertthat       0.2.1    2019-03-21 [1] CRAN (R 4.0.2)
   backports        1.1.8    2020-06-17 [1] CRAN (R 4.0.2)
   Biobase        * 2.48.0   2020-04-27 [1] Bioconductor  
   BiocFileCache    1.12.0   2020-04-27 [1] Bioconductor  
   BiocGenerics   * 0.34.0   2020-04-27 [1] Bioconductor  
   BiocManager      1.30.10  2019-11-16 [1] CRAN (R 4.0.2)
   biomaRt        * 2.44.1   2020-06-17 [1] Bioconductor  
   bit              1.1-15.2 2020-02-10 [1] CRAN (R 4.0.2)
   bit64            0.9-7    2017-05-08 [1] CRAN (R 4.0.2)
   blob             1.2.1    2020-01-20 [1] CRAN (R 4.0.2)
   bookdown         0.20     2020-06-23 [1] CRAN (R 4.0.2)
   broom            0.7.0    2020-07-09 [1] CRAN (R 4.0.2)
   callr            3.4.3    2020-03-28 [1] CRAN (R 4.0.2)
   car              3.0-8    2020-05-21 [1] CRAN (R 4.0.2)
   carData          3.0-4    2020-05-22 [1] CRAN (R 4.0.2)
   cellranger       1.1.0    2016-07-27 [1] CRAN (R 4.0.2)
   cli              2.0.2    2020-02-28 [1] CRAN (R 4.0.2)
 P cluster          2.1.0    2019-06-19 [?] CRAN (R 4.0.0)
 P codetools        0.2-16   2018-12-24 [?] CRAN (R 4.0.0)
   colorRamps     * 2.3      2012-10-29 [1] CRAN (R 4.0.2)
   colorspace       1.4-1    2019-03-18 [1] CRAN (R 4.0.2)
   combinat         0.0-8    2012-10-29 [1] CRAN (R 4.0.2)
   cowplot        * 1.0.0    2019-07-11 [1] CRAN (R 4.0.2)
   cpp11            0.2.1    2020-08-11 [1] CRAN (R 4.0.2)
   crayon           1.3.4    2017-09-16 [1] CRAN (R 4.0.2)
   crosstalk        1.1.0.1  2020-03-13 [1] CRAN (R 4.0.2)
   curl             4.3      2019-12-02 [1] CRAN (R 4.0.2)
   data.table     * 1.12.8   2019-12-09 [1] CRAN (R 4.0.2)
   DBI              1.1.0    2019-12-15 [1] CRAN (R 4.0.2)
   dbplyr           1.4.4    2020-05-27 [1] CRAN (R 4.0.2)
   DDRTree        * 0.1.5    2017-04-30 [1] CRAN (R 4.0.2)
   densityClust     0.3      2017-10-24 [1] CRAN (R 4.0.2)
   desc             1.2.0    2018-05-01 [1] CRAN (R 4.0.2)
   devtools         2.3.0    2020-04-10 [1] CRAN (R 4.0.2)
   digest           0.6.25   2020-02-23 [1] CRAN (R 4.0.2)
   docopt           0.7.1    2020-06-24 [1] CRAN (R 4.0.2)
   doParallel       1.0.15   2019-08-02 [1] CRAN (R 4.0.2)
   doRNG            1.8.2    2020-01-27 [1] CRAN (R 4.0.2)
   dplyr          * 1.0.0    2020-05-29 [1] CRAN (R 4.0.2)
   DT             * 0.14     2020-06-24 [1] CRAN (R 4.0.2)
   ellipsis         0.3.1    2020-05-15 [1] CRAN (R 4.0.2)
   evaluate         0.14     2019-05-28 [1] CRAN (R 4.0.2)
   extrafont        0.17     2014-12-08 [1] CRAN (R 4.0.2)
   extrafontdb      1.0      2012-06-11 [1] CRAN (R 4.0.2)
   fansi            0.4.1    2020-01-08 [1] CRAN (R 4.0.2)
   farver           2.0.3    2020-01-16 [1] CRAN (R 4.0.2)
   fastICA          1.2-2    2019-07-08 [1] CRAN (R 4.0.2)
   fastmap          1.0.1    2019-10-08 [1] CRAN (R 4.0.2)
   fitdistrplus     1.1-1    2020-05-19 [1] CRAN (R 4.0.2)
   FNN              1.1.3    2019-02-15 [1] CRAN (R 4.0.2)
   forcats          0.5.0    2020-03-01 [1] CRAN (R 4.0.2)
   foreach          1.5.0    2020-03-30 [1] CRAN (R 4.0.2)
 P foreign          0.8-80   2020-05-24 [?] CRAN (R 4.0.2)
   formatR          1.7      2019-06-11 [1] CRAN (R 4.0.2)
   fs               1.4.2    2020-06-30 [1] CRAN (R 4.0.2)
   future           1.18.0   2020-07-09 [1] CRAN (R 4.0.2)
   future.apply     1.6.0    2020-07-01 [1] CRAN (R 4.0.2)
   gdtools          0.2.2    2020-04-03 [1] CRAN (R 4.0.2)
   generics         0.0.2    2018-11-29 [1] CRAN (R 4.0.2)
   GGally         * 2.0.0    2020-06-06 [1] CRAN (R 4.0.2)
   ggalt          * 0.4.0    2017-02-15 [1] CRAN (R 4.0.2)
   gganimate      * 1.0.6    2020-07-08 [1] CRAN (R 4.0.2)
   ggExtra        * 0.9      2019-08-27 [1] CRAN (R 4.0.2)
   ggplot2        * 3.3.2    2020-06-19 [1] CRAN (R 4.0.2)
   ggplotify      * 0.0.5    2020-03-12 [1] CRAN (R 4.0.2)
   ggpubr         * 0.4.0    2020-06-27 [1] CRAN (R 4.0.2)
   ggrepel          0.8.2    2020-03-08 [1] CRAN (R 4.0.2)
   ggridges         0.5.2    2020-01-12 [1] CRAN (R 4.0.2)
   ggsignif         0.6.0    2019-08-08 [1] CRAN (R 4.0.2)
   globals          0.12.5   2019-12-07 [1] CRAN (R 4.0.2)
   glue             1.4.1    2020-05-13 [1] CRAN (R 4.0.2)
   gridExtra      * 2.3      2017-09-09 [1] CRAN (R 4.0.2)
   gridGraphics     0.5-0    2020-02-25 [1] CRAN (R 4.0.2)
   gtable           0.3.0    2019-03-25 [1] CRAN (R 4.0.2)
   haven            2.3.1    2020-06-01 [1] CRAN (R 4.0.2)
   hms              0.5.3    2020-01-08 [1] CRAN (R 4.0.2)
   HSMMSingleCell   1.8.0    2020-05-07 [1] Bioconductor  
   htmltools        0.5.0    2020-06-16 [1] CRAN (R 4.0.2)
   htmlwidgets      1.5.1    2019-10-08 [1] CRAN (R 4.0.2)
   httpuv           1.5.4    2020-06-06 [1] CRAN (R 4.0.2)
   httr             1.4.1    2019-08-05 [1] CRAN (R 4.0.2)
   ica              1.0-2    2018-05-24 [1] CRAN (R 4.0.2)
   igraph           1.2.5    2020-03-19 [1] CRAN (R 4.0.2)
   IRanges          2.22.2   2020-05-21 [1] Bioconductor  
   irlba          * 2.3.3    2019-02-05 [1] CRAN (R 4.0.2)
   iterators        1.0.12   2019-07-26 [1] CRAN (R 4.0.2)
   jcolors        * 0.0.4    2019-05-22 [1] CRAN (R 4.0.2)
   jsonlite         1.7.0    2020-06-25 [1] CRAN (R 4.0.2)
 P KernSmooth       2.23-17  2020-04-26 [?] CRAN (R 4.0.0)
   knitr          * 1.29     2020-06-23 [1] CRAN (R 4.0.2)
   labeling         0.3      2014-08-23 [1] CRAN (R 4.0.2)
   later            1.1.0.1  2020-06-05 [1] CRAN (R 4.0.2)
 P lattice          0.20-41  2020-04-02 [?] CRAN (R 4.0.0)
   lazyeval         0.2.2    2019-03-15 [1] CRAN (R 4.0.2)
   leiden           0.3.3    2020-02-04 [1] CRAN (R 4.0.2)
   lifecycle        0.2.0    2020-03-06 [1] CRAN (R 4.0.2)
   limma            3.44.3   2020-06-12 [1] Bioconductor  
   listenv          0.8.0    2019-12-05 [1] CRAN (R 4.0.2)
   lmtest           0.9-37   2019-04-30 [1] CRAN (R 4.0.2)
   magick           2.4.0    2020-06-23 [1] CRAN (R 4.0.2)
   magrittr         1.5      2014-11-22 [1] CRAN (R 4.0.2)
   maps             3.3.0    2018-04-03 [1] CRAN (R 4.0.2)
 P MASS             7.3-51.6 2020-04-26 [?] CRAN (R 4.0.0)
 P Matrix         * 1.2-18   2019-11-27 [?] CRAN (R 4.0.0)
   matrixStats      0.56.0   2020-03-13 [1] CRAN (R 4.0.2)
   memoise          1.1.0    2017-04-21 [1] CRAN (R 4.0.2)
   mime             0.9      2020-02-04 [1] CRAN (R 4.0.2)
   miniUI           0.1.1.1  2018-05-18 [1] CRAN (R 4.0.2)
   monocle        * 2.16.0   2020-04-27 [1] Bioconductor  
   munsell          0.5.0    2018-06-12 [1] CRAN (R 4.0.2)
 P nlme             3.1-148  2020-05-24 [?] CRAN (R 4.0.2)
   openssl          1.4.2    2020-06-27 [1] CRAN (R 4.0.2)
   openxlsx         4.1.5    2020-05-06 [1] CRAN (R 4.0.2)
   packrat          0.5.0    2018-11-14 [1] CRAN (R 4.0.2)
   patchwork      * 1.0.1    2020-06-22 [1] CRAN (R 4.0.2)
   pbapply          1.4-2    2019-08-31 [1] CRAN (R 4.0.2)
   pheatmap         1.0.12   2019-01-04 [1] CRAN (R 4.0.2)
   pillar           1.4.6    2020-07-10 [1] CRAN (R 4.0.2)
   pkgbuild         1.0.8    2020-05-07 [1] CRAN (R 4.0.2)
   pkgconfig        2.0.3    2019-09-22 [1] CRAN (R 4.0.2)
   pkgload          1.1.0    2020-05-29 [1] CRAN (R 4.0.2)
   plotly           4.9.2.1  2020-04-04 [1] CRAN (R 4.0.2)
   plyr           * 1.8.6    2020-03-03 [1] CRAN (R 4.0.2)
   png              0.1-7    2013-12-03 [1] CRAN (R 4.0.2)
   prettyunits      1.1.1    2020-01-24 [1] CRAN (R 4.0.2)
   processx         3.4.3    2020-07-05 [1] CRAN (R 4.0.2)
   progress         1.2.2    2019-05-16 [1] CRAN (R 4.0.2)
   proj4            1.0-10   2020-03-02 [1] CRAN (R 4.0.2)
   promises         1.1.1    2020-06-09 [1] CRAN (R 4.0.2)
   proxy            0.4-24   2020-04-25 [1] CRAN (R 4.0.2)
   ps               1.3.3    2020-05-08 [1] CRAN (R 4.0.2)
   purrr            0.3.4    2020-04-17 [1] CRAN (R 4.0.2)
   qlcMatrix        0.9.7    2018-04-20 [1] CRAN (R 4.0.2)
   R6               2.4.1    2019-11-12 [1] CRAN (R 4.0.2)
   RANN             2.6.1    2019-01-08 [1] CRAN (R 4.0.2)
   rappdirs         0.3.1    2016-03-28 [1] CRAN (R 4.0.2)
   RColorBrewer     1.1-2    2014-12-07 [1] CRAN (R 4.0.2)
   Rcpp             1.0.5    2020-07-06 [1] CRAN (R 4.0.2)
   RcppAnnoy        0.0.16   2020-03-08 [1] CRAN (R 4.0.2)
   readr            1.3.1    2018-12-21 [1] CRAN (R 4.0.2)
   readxl           1.3.1    2019-03-13 [1] CRAN (R 4.0.2)
   remotes          2.1.1    2020-02-15 [1] CRAN (R 4.0.2)
   reshape        * 0.8.8    2018-10-23 [1] CRAN (R 4.0.2)
   reshape2         1.4.4    2020-04-09 [1] CRAN (R 4.0.2)
   reticulate       1.16     2020-05-27 [1] CRAN (R 4.0.2)
   rio              0.5.16   2018-11-26 [1] CRAN (R 4.0.2)
   rlang            0.4.7    2020-07-09 [1] CRAN (R 4.0.2)
   rmarkdown        2.3      2020-06-18 [1] CRAN (R 4.0.2)
   rmdformats     * 0.3.7    2020-03-11 [1] CRAN (R 4.0.2)
   rngtools         1.5      2020-01-23 [1] CRAN (R 4.0.2)
   ROCR             1.0-11   2020-05-02 [1] CRAN (R 4.0.2)
   rprojroot        1.3-2    2018-01-03 [1] CRAN (R 4.0.2)
   RSQLite          2.2.0    2020-01-07 [1] CRAN (R 4.0.2)
   rstatix          0.6.0    2020-06-18 [1] CRAN (R 4.0.2)
   rsvd             1.0.3    2020-02-17 [1] CRAN (R 4.0.2)
   Rtsne            0.15     2018-11-10 [1] CRAN (R 4.0.2)
   Rttf2pt1         1.3.8    2020-01-10 [1] CRAN (R 4.0.2)
   rvcheck          0.1.8    2020-03-01 [1] CRAN (R 4.0.2)
   S4Vectors        0.26.1   2020-05-16 [1] Bioconductor  
   scales           1.1.1    2020-05-11 [1] CRAN (R 4.0.2)
   sctransform      0.2.1    2019-12-17 [1] CRAN (R 4.0.2)
   sessioninfo      1.1.1    2018-11-05 [1] CRAN (R 4.0.2)
   Seurat         * 3.1.5    2020-04-16 [1] CRAN (R 4.0.2)
   shiny            1.5.0    2020-06-23 [1] CRAN (R 4.0.2)
   slam             0.1-47   2019-12-21 [1] CRAN (R 4.0.2)
   sparsesvd        0.2      2019-07-15 [1] CRAN (R 4.0.2)
   stringi          1.4.6    2020-02-17 [1] CRAN (R 4.0.2)
   stringr        * 1.4.0    2019-02-10 [1] CRAN (R 4.0.2)
 P survival         3.2-3    2020-06-13 [?] CRAN (R 4.0.1)
   svglite          1.2.3.2  2020-07-07 [1] CRAN (R 4.0.2)
   systemfonts      0.3.0    2020-09-01 [1] CRAN (R 4.0.2)
   testthat         2.3.2    2020-03-02 [1] CRAN (R 4.0.2)
   tibble           3.0.3    2020-07-10 [1] CRAN (R 4.0.2)
   tictoc           1.0      2014-06-17 [1] CRAN (R 4.0.2)
   tidyr            1.1.0    2020-05-20 [1] CRAN (R 4.0.2)
   tidyselect       1.1.0    2020-05-11 [1] CRAN (R 4.0.2)
   tsne             0.1-3    2016-07-15 [1] CRAN (R 4.0.2)
   tweenr           1.0.1    2018-12-14 [1] CRAN (R 4.0.2)
   usethis          1.6.1    2020-04-29 [1] CRAN (R 4.0.2)
   uwot             0.1.8    2020-03-16 [1] CRAN (R 4.0.2)
   vctrs            0.3.1    2020-06-05 [1] CRAN (R 4.0.2)
   VGAM           * 1.1-3    2020-04-28 [1] CRAN (R 4.0.2)
   viridis        * 0.5.1    2018-03-29 [1] CRAN (R 4.0.2)
   viridisLite    * 0.3.0    2018-02-01 [1] CRAN (R 4.0.2)
   WebGestaltR    * 0.4.3    2020-01-16 [1] CRAN (R 4.0.2)
   whisker          0.4      2019-08-28 [1] CRAN (R 4.0.2)
   withr            2.2.0    2020-04-20 [1] CRAN (R 4.0.2)
   writexl        * 1.3      2020-05-05 [1] CRAN (R 4.0.2)
   xfun             0.15     2020-06-21 [1] CRAN (R 4.0.2)
   XML              3.99-0.4 2020-07-05 [1] CRAN (R 4.0.2)
   xtable           1.8-4    2019-04-21 [1] CRAN (R 4.0.2)
   yaml             2.2.1    2020-02-01 [1] CRAN (R 4.0.2)
   zip              2.0.4    2019-09-01 [1] CRAN (R 4.0.2)
   zoo              1.8-8    2020-05-02 [1] CRAN (R 4.0.2)

[1] /mnt/HDD2/colin/CHO_cell_scRNASeq/packrat/lib/x86_64-pc-linux-gnu/4.0.2
[2] /mnt/HDD2/colin/CHO_cell_scRNASeq/packrat/lib-ext/x86_64-pc-linux-gnu/4.0.2
[3] /mnt/HDD2/colin/CHO_cell_scRNASeq/packrat/lib-R/x86_64-pc-linux-gnu/4.0.2

 P ── Loaded and on-disk path mismatch.

CHOK1 DP12 cell line scRNASeq

Clarke Lab. NIBRT

Introduction

Raw Data Availibility

Prepare for Analysis

Load packages

Set seed

Plotting setup

Calculate relative expression

Import raw data

scRNASeq

SBG file import

Create Monocle object

bulk RNASeq

Replicate comparison

scRNASeq

bulk RNASeq

scRNASeq v bulkRNASeq

Map gene IDs

Correlation between single cell and bulk RNASeq data

scRNASeq data preprocessing

Filter by Total UMIs

Filter by mtDNA reads

Remove poor quality cells

Filter Genes by expression

Figure 2

Gene annotation

Cell cycle scoring

Create a Seurat object

Preprocess Seurat

Specify cell cycle genes

Calculate the cell cycle scores

Add cell cycle scores to CDS

tSNE Analysis

Identify variable genes

Select PCs

t-Stocastic neighbour embedding

Identify tSNE clusters

Effect of CC correction

tSNE density

tSNE clusters

Figure 2

Differential expression

Anti-IL8 mAb gene expression

bulk RNASeq

scRNASeq

HC & LC violin

HC tSNE

HC cluster expression

LC tSNE

LC cluster expression

Figure 3

Trajectory analysis

Construct trajectory

Median relative HC & LC in each cell state

HC state expression

HC trajectory

LC state expression

LC trajectory

Figure 5

Expression of CHO genes along trajectory

Set the root of the Monocle trajectory

Pseudotime differential expression

Cluster analysis

Enrichment analysis

Enriched GO BPs

Cluster 1

Cluster 2

CHO cell gene expression

Psap

Hmox1

Fth1

Figure 6

Session info