June 8, 2017  |  Cytobank

Maximize Your Insight with Machine Learning Algorithms
and NanoString® 3D Biology™ Technology

Image courtesy: NanoString

Scientists across many therapeutic areas are striving to solve complex biological problems by measuring multiple analytes, expecting that the combined data will power deeper discovery. However, analyzing each data type independently is less effective and more time-consuming than analyzing them together. Cytobank’s new DROP feature lets scientists apply machine learning algorithms to many data types, including these multi-analyte datasets, and develop integrated insights quickly.

With bulk data, unsupervised machine learning algorithms on Cytobank can help you identify clinically relevant groups of samples by combining information from all of your markers at the same time. We’ll illustrate that here with NanoString® 3D Biology technology, which simultaneously analyzes up to 800 SNVs, RNAs, proteins, and phospho-proteins from the same sample. In this example, the assays profile 104 SNVs and small indels, 192 RNAs, and 28 total and phospho-proteins in 144 samples.

Applying viSNE to the protein and RNA markers reveals groups of samples with similar expression across those markers. Coloring by expression of each marker illustrates why the samples are segregated this way. For example, we see that only one of the large “islands” contains samples that express RAF1 mRNA, whereas another contains samples that express MAPK1 mRNA (Figure 1, top). We can see similar patterns with protein expression, where large islands on the viSNE map contain samples with differential expression of p53 and phospho-ERK. We also see differential expression of markers among the samples within the large islands. For example, phospho-GSK is expressed only in the samples that separate to the inside of each island, and MAPK3 mRNA is expressed only in some of the samples in one of the islands (Figure 1, bottom). The SNV data revealed that these samples are a mix of homozygous wild type (WT) for BRAF and heterozygous or homozygous for BRAF V600E. Overlaying this information on the viSNE map supports the conclusion that these mutations drive the RNA and protein expression differences that segregate these samples.


Figure 1. Similar samples are grouped on viSNE maps colored by RNA expression, protein expression, and genotype. Each point on a viSNE map is a single sample. The points are grouped together based on similarity in high-dimensional space across all of the markers.
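
For readers who want to experiment outside the platform, the idea behind such a map can be sketched with open-source tools. viSNE is based on the t-SNE algorithm, so the sketch below uses scikit-learn's `TSNE` as a stand-in; it is not Cytobank's implementation. The data are synthetic, and the z-score scaling applied before combining RNA and protein markers is an assumption made for illustration, not a step described in this post.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 144 samples x (192 RNA + 28 protein) markers.
# Three synthetic groups of 48 samples, shifted apart on a subset of markers.
n_samples, n_rna, n_protein = 144, 192, 28
group = np.repeat([0, 1, 2], n_samples // 3)
rna = rng.normal(size=(n_samples, n_rna))
protein = rng.normal(size=(n_samples, n_protein))
rna[:, :20] += 3.0 * group[:, None]
protein[:, :5] += 3.0 * group[:, None]


def zscore(x):
    """Scale each marker (column) to mean 0 and standard deviation 1."""
    return (x - x.mean(axis=0)) / x.std(axis=0)


# Assumption: put RNA and protein on a common scale, then combine them.
combined = np.hstack([zscore(rna), zscore(protein)])

# Embed the 220-dimensional samples in 2D; each row is one sample (one point).
embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(combined)
print(embedding.shape)  # (144, 2)
```

Plotting `embedding` and coloring each point by one marker column or by `group` gives a picture analogous to the colored maps in Figure 1.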

To define groups of samples that we could further characterize and analyze, we applied the hierarchical clustering method SPADE to the viSNE map coordinates (Figure 2). The resulting groups can then be overlaid on the viSNE map, and marker expression can be viewed on the SPADE tree, to visualize how these groups capture the patterns of expression we explored above.
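
As a rough illustration of clustering samples by their map coordinates, here is a minimal sketch of agglomerative (bottom-up hierarchical) clustering in plain Python. It is a simplified stand-in: SPADE itself includes further steps, such as density-dependent downsampling and building a tree, that are not shown here. The function name, the average-linkage choice, and the coordinates are all hypothetical.

```python
import math


def agglomerate(points, n_groups):
    """Merge the two closest clusters (average linkage) until n_groups remain."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Average Euclidean distance over all cross-cluster pairs of points.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_groups:
        # Find the closest pair of clusters and merge the second into the first.
        pairs = [(linkage(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        _, i, j = min(pairs)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]

    # Convert the list of clusters into one group label per sample.
    labels = [None] * len(points)
    for label, members in enumerate(clusters):
        for idx in members:
            labels[idx] = label
    return labels


# Hypothetical 2D map coordinates for nine samples forming three "islands".
coords = [(0.0, 0.0), (0.5, 0.2), (0.1, 0.6),
          (10.0, 10.0), (10.4, 9.7), (9.8, 10.5),
          (0.0, 20.0), (0.3, 19.6), (-0.4, 20.2)]
labels = agglomerate(coords, n_groups=3)
print(labels)  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

Each label can then be used to color the corresponding point on the map, which is the same idea as overlaying the groups on the viSNE map.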

Figure 2. Defining groups of samples with SPADE.

After defining groups of samples based on combined 3D Biology marker expression, we used tools in Cytobank’s platform to see how these groups correlate with known characteristics of the samples. The samples came from three different melanoma cell lines that were treated with vemurafenib, DMSO, or Calyculin A for varying lengths of time.

Overlaying these variables on the viSNE map and inspecting Cytobank’s summary tables show us that these variables correlate with the groups we defined based on marker expression (Figure 3). A quick verification of marker expression patterns with heatmaps and histogram overlays stratified by these known characteristics confirms that the expression signatures distinguishing the known characteristics are the same ones that drive the unsupervised sample groups (Figure 4).
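
The kind of summary table used to compare unsupervised groups with known sample characteristics can be sketched as a simple cross-tabulation. The helper names and the per-sample annotations below are hypothetical and purely illustrative; they are not values from this dataset or part of Cytobank's tooling.

```python
from collections import Counter

# Hypothetical annotations: unsupervised group label and known treatment.
samples = (
    [{"group": 0, "treatment": "DMSO"}] * 3
    + [{"group": 0, "treatment": "vemurafenib"}]
    + [{"group": 1, "treatment": "vemurafenib"}] * 4
    + [{"group": 2, "treatment": "Calyculin A"}] * 3
    + [{"group": 2, "treatment": "DMSO"}]
)


def crosstab(records, row_key, col_key):
    """Count records for every (row value, column value) combination."""
    return Counter((r[row_key], r[col_key]) for r in records)


def dominant(table, row):
    """Return the most common column value in a row and its share of that row."""
    row_counts = {col: n for (r, col), n in table.items() if r == row}
    best = max(row_counts, key=row_counts.get)
    return best, row_counts[best] / sum(row_counts.values())


table = crosstab(samples, "group", "treatment")
for g in sorted({s["group"] for s in samples}):
    treatment, share = dominant(table, g)
    print(f"group {g}: mostly {treatment} ({share:.0%})")
```

A group whose samples mostly share one treatment, cell line, or time point suggests that the characteristic is associated with the expression signature defining that group.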

Figure 3. Understanding correlations between known sample characteristics and high-dimensional 3D marker expression signatures.
Figure 4. Summarizing marker expression patterns by known sample characteristics.

This example illustrates how applying unsupervised machine learning algorithms on the Cytobank platform to NanoString’s 3D Biology data can efficiently extract relevant information and reveal the similarities and differences between samples, based on the combined expression of hundreds of multi-analyte markers.

Want to try it for yourself?