April 15, 2019  |  CITE-seq, Education

How-to: Analyze CITE-seq Data in Cytobank

CITE-seq, or Cellular Indexing of Transcriptomes and Epitopes by sequencing [1], is one of the latest innovations for studying single-cell biology. It enables researchers to capture RNA and surface protein expression simultaneously on the same cells using next-generation sequencing technology. Scientists can correlate the two data types to identify biomarkers and better characterize cell phenotypes [2]. Normalizing and comparing the two data types, however, presents data analysis challenges. At Cytobank, we want to make CITE-seq data analysis easier and help you discover more.

Here’s an approach that utilizes the Cytobank platform and should help you effectively analyze your multi-omic CITE-seq data.

Overview of the CITE-seq workflow in Cytobank


Figure 1. A simple, three-step workflow to analyze CITE-seq data


This three-step workflow is a general approach to analyzing CITE-seq data. It integrates third-party software, a Cytobank-developed R script, and the existing Cytobank platform.

  1. In step 1, a sequencing mapper aligns raw sequencing reads to a reference genome (such as GRCh38 or GRCh37) and infers gene expression from the locations of the mapped reads. We recommend using Cell Ranger to align data from 10x Genomics. For alignment of other types of single-cell RNA-seq data, you can use STAR. CITE-seq-Count can then process the antibody reads contained in the CITE-seq data and produce the raw antibody expression count matrix.
  2. Step 2 will help you normalize and filter the raw gene and antibody expression data using an R script we developed. The script will automatically apply several data QC filters to remove noise from the raw data and normalize the filtered data to correct for the sequencing depth bias. The script will also combine gene and antibody expression and output the merged expression file per sample.
  3. After steps 1 and 2, upload the processed data into Cytobank via DROP and run the Cytobank machine learning algorithms (viSNE, FlowSOM, and CITRUS) to unpack the rich information encoded in your CITE-seq data. See below for more details.
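The Cytobank R script is not reproduced in this post, so the following is only a minimal Python sketch of what step 2 describes: filter low-quality cells, correct for sequencing depth, and merge gene and antibody values per cell. The QC thresholds, the scale factor of 10,000, the log-normalization of gene counts, and the centered log-ratio style transform for antibody counts are all assumptions chosen for illustration, not the settings the script actually uses.

```python
import math


def qc_pass(gene_row, min_genes_detected, min_total_counts):
    """Return True if a cell has enough detected genes and enough total counts."""
    detected = sum(1 for c in gene_row if c > 0)
    return detected >= min_genes_detected and sum(gene_row) >= min_total_counts


def log_normalize(gene_row, scale=10_000):
    """Depth-normalize one cell: rescale counts to a fixed total, then log1p.

    Assumes the cell has a non-zero total (guaranteed after qc_pass).
    """
    total = sum(gene_row)
    return [math.log1p(c / total * scale) for c in gene_row]


def clr_normalize(adt_row):
    """Centered log-ratio style transform of antibody counts (pseudocount of 1)."""
    logs = [math.log1p(c) for c in adt_row]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def merge_sample(gene_counts, adt_counts, min_genes_detected=2, min_total_counts=5):
    """Filter cells, normalize both data types, and concatenate them per cell."""
    merged = []
    for gene_row, adt_row in zip(gene_counts, adt_counts):
        if not qc_pass(gene_row, min_genes_detected, min_total_counts):
            continue  # drop cells that fail the (assumed) QC thresholds
        merged.append(log_normalize(gene_row) + clr_normalize(adt_row))
    return merged


# Toy data: 3 cells x 3 genes, and the same 3 cells x 2 antibodies.
genes = [[4, 0, 6], [0, 0, 1], [10, 5, 5]]
adts = [[100, 3], [2, 1], [8, 250]]
table = merge_sample(genes, adts)
print(len(table), "cells kept,", len(table[0]), "columns per cell")
```

In this toy input, the second cell has only one detected gene and is dropped; each remaining row holds the gene columns followed by the antibody columns, which is the general shape of a merged per-sample expression file.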

Cytobank for biological discovery through CITE-seq data

Explore Underlying Data Patterns with CITE-seq + viSNE

For a quick view of your CITE-seq data, you can run viSNE on the filtered and normalized single-cell RNA-seq gene expression data. Using an example data set downloaded from the Satija Lab, we found that viSNE cleanly separates cell events based on the CITE-seq gene expression data (Figure 2).


Figure 2. CITE-seq data renders well-defined populations. Rather than using all the genes, we clustered cells using the 700 most variable genes to effectively uncover the underlying data pattern of a cord blood mononuclear cell sample.
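As a rough illustration of restricting the analysis to the most variable genes, the sketch below ranks genes by their plain variance across cells and keeps the top few. This is an assumption made for simplicity: the post does not say how variability was measured, and dedicated single-cell tools typically use mean-adjusted dispersion measures instead. The gene names and values are placeholders.

```python
def top_variable_genes(matrix, gene_names, n_top):
    """Rank genes by variance across cells and return the n_top most variable names.

    matrix is a list of cells, each a list of normalized values in gene_names order.
    """
    n_cells = len(matrix)
    scored = []
    for j, name in enumerate(gene_names):
        column = [row[j] for row in matrix]
        mean = sum(column) / n_cells
        variance = sum((v - mean) ** 2 for v in column) / n_cells
        scored.append((variance, name))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:n_top]]


# Toy normalized expression: 3 cells x 3 placeholder genes.
expr = [
    [0.0, 2.0, 1.0],
    [0.5, 0.0, 1.0],
    [0.0, 4.0, 1.0],
]
names = ["geneA", "geneB", "geneC"]
selected = top_variable_genes(expr, names, n_top=2)
print(selected)
```

For a real data set, `n_top=700` would mirror the choice made for Figure 2. Trimming the gene list this way also keeps the number of columns in the merged file manageable for import.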


A Closer Look at Antibody Expression on Gene Clusters

To confirm the gene expression clustering result, you can overlay the antibody expression on the gene clusters (Figure 3).


Figure 3. Red and yellow dots are events with high expression of the surface protein named in each t-SNE plot header.


In Cytobank, you can also visualize surface protein expression across the identified cell populations with a heatmap (Figure 4).


Figure 4. Heatmap shows the expression of multiple surface proteins per cell type.


You can also look at the expression of surface proteins and their corresponding protein-coding genes together (Figure 5).


Figure 5. CD3D and CD3E are protein-coding genes of CD3. FCGR3A is the coding gene of CD16. CD8A and CD8B are the protein-coding genes of CD8. The heatmap demonstrates that gene expression corresponds with protein expression across the identified cell populations.
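The values behind a heatmap like this are per-population summaries. The sketch below computes the mean of a surface protein and its coding genes within each cluster, using the protein-to-gene pairs listed in the Figure 5 caption. The `_ADT` suffix for antibody columns, the cluster labels, and the toy numbers are hypothetical, and using the mean as the summary statistic is an assumption rather than a description of what Cytobank computes.

```python
# Protein-to-gene pairs taken from the Figure 5 caption.
PROTEIN_TO_GENES = {
    "CD3": ["CD3D", "CD3E"],
    "CD16": ["FCGR3A"],
    "CD8": ["CD8A", "CD8B"],
}


def cluster_means(cells, features):
    """Average each feature over the cells of every cluster."""
    grouped = {}
    for cell in cells:
        grouped.setdefault(cell["cluster"], []).append(cell)
    return {
        cluster: {f: sum(c[f] for c in members) / len(members) for f in features}
        for cluster, members in grouped.items()
    }


# Toy merged table: one dict per cell (all names and values are illustrative).
cells = [
    {"cluster": "T", "CD3_ADT": 2.0, "CD3D": 1.0, "CD3E": 1.5},
    {"cluster": "T", "CD3_ADT": 4.0, "CD3D": 3.0, "CD3E": 2.5},
    {"cluster": "B", "CD3_ADT": 0.5, "CD3D": 0.0, "CD3E": 0.0},
]
features = ["CD3_ADT"] + PROTEIN_TO_GENES["CD3"]
means = cluster_means(cells, features)
for cluster, row in means.items():
    print(cluster, row)
```

Placing the antibody column next to its coding genes in `features` makes it easy to compare, cluster by cluster, whether the protein and gene signals move together.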

You can examine individual surface proteins to find out how well the protein expression correlates with the gene expression data by creating an overlaid dot plot in the Cytobank Working Illustration (Figure 6). In this example, expression of CD3 is positively correlated with the expression of the CD3E gene even though there is a cluster of cells that have low gene expression and high antibody expression.

Figure 6. The x-axis of the dot plot represents protein expression; the y-axis represents gene expression.
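To put a number on the relationship shown in a protein-versus-gene dot plot, one option is a Pearson correlation across cells, plus a simple filter for the discordant cells with high antibody signal but low gene expression. This is a sketch under stated assumptions: the choice of Pearson correlation, the cutoff values, and the toy numbers are illustrative and do not come from the data set used in the figures.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def discordant_cells(protein, gene, protein_cut, gene_cut):
    """Indices of cells with high protein signal but low gene expression."""
    return [
        i
        for i, (p, g) in enumerate(zip(protein, gene))
        if p >= protein_cut and g <= gene_cut
    ]


# Toy per-cell values for one protein and its coding gene.
cd3_protein = [0.1, 0.2, 2.0, 2.2, 2.1]
cd3e_gene = [0.0, 0.1, 1.5, 1.8, 0.0]

r = pearson(cd3_protein, cd3e_gene)
outliers = discordant_cells(cd3_protein, cd3e_gene, protein_cut=1.0, gene_cut=0.5)
print(f"r = {r:.2f}, discordant cell indices: {outliers}")
```

In the toy data the correlation is positive overall, while the last cell is flagged as discordant, which is the same kind of pattern described for CD3 and CD3E above.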

CITE-seq is a powerful technology that allows you to look at intracellular gene expression and extracellular protein marker expression simultaneously at single-cell resolution. This workflow allows you to conduct an integrated analysis on the Cytobank platform. This is the first step for Cytobank in this area. We are interested in developing additional workflows for analyzing CITE-seq data, and we would appreciate your feedback on what would be helpful.

Want to try it for yourself?

  • Start generating the protein data you need in your single-cell RNA-seq experiment with BioLegend TotalSeq antibodies.
  • Sequencing TCRs or BCRs? BioLegend has you covered as well with their TotalSeq™-C reagents. Learn more about BioLegend reagents and their oligo-antibody conjugate applications.
  • If you’re not already using Cytobank, sign up for a free, full-featured 30-day trial of Cytobank Premium.
  • The pre-processed CITE-seq demo data can be obtained on Cytobank Premium and Enterprise. Create a new account to access a free 30-day trial of Cytobank Premium, and test out our machine learning algorithms with the CITE-seq demo datasets or your own data.
  • CITE-seq analysis utilizing DROP is available to both Premium and Enterprise-level license users. Currently, Premium users can import up to 100 parameters with DROP, and Enterprise users can import up to 818. Cytobank is working to increase the number of parameters.
  • Check out our recent webinar featuring Dr. Adeeb Rahman from Mt. Sinai’s Human Immune Monitoring Center.
  • Contact us at info@cytobank.org if you want to obtain a copy of the Cytobank CITE-seq R script and the raw expression data of the demo dataset.
  • Contact our Scientific Services team for help with analyzing your unmapped CITE-seq data.

If you encounter any issues when following our analysis workflow, please feel free to reach us at support@cytobank.org.



[1] Stoeckius M, Hafemeister C, Stephenson W, Houck-Loomis B, Chattopadhyay PK, Swerdlow H et al (2017) Simultaneous epitope and transcriptome measurement in single cells. Nat Methods 14(9):865–868

[2] Ortega MA, Poirion O, Zhu X, Huang S, Wolfgruber TK, Sebra R, Garmire LX (2017) Using single-cell multiple omics approaches to resolve tumor heterogeneity. Clin Transl Med 6:46