January 11, 2017  |  Announcements, API, Cytobank

New API Enhancements: 4 Sample Workflows for Advanced Analysis

Cytobank has released version 5.5.0 with enhancements to our API, enabling more flexible and functional workflows that leverage Cytobank’s secure infrastructure and cloud-based compute and storage. Among the enhancements are new API endpoints for viSNE, CITRUS, SPADE, sharing, Sample Tags, and compensation.

At Cytobank, we’ve seen emerging needs among scientists and research organizations around the world that are driving the development of our API. These needs often call for functionality beyond what basic browser-based analysis sessions provide. Recurring themes include connecting the Cytobank platform directly to other information systems, batch processing and chaining of native functionality, and pushing and pulling data, configurations, statistics, and attachments between Cytobank and external pipelines, algorithms, and studies.

Visual overview of the high level themes driving development of the Cytobank API.

In this article, we present a variety of workflows highlighting how the Cytobank API can increase the efficiency and velocity of research efforts. Illustrated workflows include:

  • Pulling clinical data and programmatically applying Sample Tags in Cytobank, as well as batching numerous CITRUS runs based on these data
  • Automatically applying bubbles to SPADE trees
  • Broad-scope analysis across different analyzed datasets to gather information for comparing analysis methods and training a cell identity classifier
  • Running PCA on data in Cytobank

Workflow 1: Fetch Clinical Research Data, Apply Sample Tags, and Batch CITRUS Runs

Researchers are often interested in the association between single cell analysis results stored in Cytobank and multiple clinical variables, including those that weren’t the primary endpoints of their study. Key information on these clinical variables is usually stored separately from the single cell analysis results in Cytobank, and the cross-referencing of this information can be burdensome. This burden increases when consolidating analytical results and findings in other information systems for compliance.

Even once the clinical data and specimen data have been combined, the analysis itself presents another set of challenges. In a dataset involving many participants with a variety of clinical variables and outcomes, combing through the data to find signatures correlated with each clinical variable can be tedious and time consuming.

Using the Cytobank API, you can seamlessly transfer structured clinical information to Cytobank as Sample Tags to provide context for analysis. You can then automatically batch numerous CITRUS analyses, covering every clinically relevant grouping of samples, to evaluate the data thoroughly and quickly discover any significant correlations.

Clinical data are pulled from an information system and then applied to specimen data in Cytobank as Sample Tags via the API. Local processing happens using a generic script to identify groupings of samples within clinical attributes. The groupings for each clinical attribute are then used to create batches of CITRUS runs using samples on Cytobank. Cytobank compute will scale to process the queue and leave results ready to interpret for the researcher.
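The "generic script" for local processing could look something like the minimal sketch below. It assumes the clinical export arrives as one record per sample file; the field names (`sample`, `response`, `stage`), the helper names, and the job-spec dictionaries are all hypothetical, and the actual Cytobank API calls for writing Sample Tags and launching CITRUS runs are not shown. Running one CITRUS analysis per pair of groups is only one possible batching choice; one run per attribute containing all of its groups would be another.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical clinical export: one record per sample file.
clinical_records = [
    {"sample": "pt01.fcs", "response": "responder", "stage": "II"},
    {"sample": "pt02.fcs", "response": "non-responder", "stage": "II"},
    {"sample": "pt03.fcs", "response": "responder", "stage": "III"},
    {"sample": "pt04.fcs", "response": "non-responder", "stage": "I"},
]

def sample_tags(records, key="sample"):
    """Map each sample to its clinical attributes (candidate Sample Tags)."""
    return {r[key]: {k: v for k, v in r.items() if k != key} for r in records}

def groupings(records, attribute, key="sample"):
    """Group sample names by the value of one clinical attribute."""
    groups = defaultdict(list)
    for r in records:
        groups[r[attribute]].append(r[key])
    return dict(groups)

def citrus_batches(records, attributes, key="sample"):
    """Build one hypothetical CITRUS job spec per attribute and pair of groups."""
    jobs = []
    for attr in attributes:
        groups = groupings(records, attr, key)
        for a, b in combinations(sorted(groups), 2):
            jobs.append({
                "name": f"{attr}: {a} vs {b}",
                "groups": {a: groups[a], b: groups[b]},
            })
    return jobs

for job in citrus_batches(clinical_records, ["response", "stage"]):
    print(job["name"])
```

With these four records the loop prints one job for `response` and three for `stage` (I vs II, I vs III, II vs III). Each job spec would then be translated into whatever request the CITRUS endpoint expects, and Cytobank compute handles the resulting queue.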


Workflow 2: Automatically Bubble a SPADE Tree

SPADE is a useful way to analyze and visualize data; however, one drawback of the method is that the nodes of each SPADE tree must be categorized into biological populations from scratch for every run of the algorithm, a process termed “bubbling.” This manual step is a bottleneck for analysis. Automating the categorization of SPADE clusters into phenotypic groups would greatly accelerate the interpretation of SPADE results.

How best to categorize clustered or single cell data into discrete populations is an oft-debated topic. However, for users who have defined their own criteria for assigning cellular identity, Cytobank offers an API endpoint for setting node-to-bubble relationships. A script can easily be written to read the lightweight cluster metadata for a SPADE run, establish a biological identity for each cluster, and then post these relationships to Cytobank as bubbles. Researchers can then interact with the categorizations and further explore the dataset without the time burden of manual bubbling.

The Cytobank API is used to extract lightweight SPADE metadata. Local processing is done to establish biological identity for each cluster. These categorizations are then sent to Cytobank as bubbles via the API. The criteria used to assign cell identity will be up to the researcher, and will use expression attributes of the clusters and perhaps other available metadata such as position in the SPADE tree.
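As an illustration of the local processing step, the minimal sketch below assigns SPADE nodes to bubbles with an ordered list of gating-style rules (first match wins). Everything here is hypothetical: the per-node median values stand in for whatever cluster metadata the API returns, the marker panel and the positivity threshold `POS` are invented, and the resulting mapping would still need to be serialized into the format the node-to-bubble endpoint expects.

```python
# Hypothetical per-node medians (transformed scale) parsed from SPADE metadata.
nodes = {
    1: {"CD3": 5.1, "CD4": 4.2, "CD8": 0.3, "CD19": 0.1},
    2: {"CD3": 4.8, "CD4": 0.2, "CD8": 3.9, "CD19": 0.2},
    3: {"CD3": 0.1, "CD4": 0.1, "CD8": 0.1, "CD19": 4.5},
    4: {"CD3": 0.2, "CD4": 0.3, "CD8": 0.2, "CD19": 0.3},
}

POS = 2.0  # hypothetical positivity threshold; researcher-defined in practice

# Ordered rules: a node joins the first bubble whose predicate it satisfies.
RULES = [
    ("CD4 T cells", lambda m: m["CD3"] > POS and m["CD4"] > POS),
    ("CD8 T cells", lambda m: m["CD3"] > POS and m["CD8"] > POS),
    ("B cells", lambda m: m["CD19"] > POS and m["CD3"] <= POS),
]

def assign_bubbles(nodes, rules):
    """Return {bubble name: [node ids]}, omitting bubbles with no members."""
    bubbles = {name: [] for name, _ in rules}
    for node_id in sorted(nodes):
        for name, matches in rules:
            if matches(nodes[node_id]):
                bubbles[name].append(node_id)
                break
    return {name: ids for name, ids in bubbles.items() if ids}

bubbles = assign_bubbles(nodes, RULES)
unassigned = [n for n in sorted(nodes)
              if not any(n in ids for ids in bubbles.values())]
print(bubbles, unassigned)
```

Nodes that match no rule (node 4 here) are left unbubbled so the researcher can inspect them in Cytobank rather than having them forced into a population. Position in the SPADE tree could be added as another rule input.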


Workflow 3: Meta-Analysis of Analyzed Datasets

As the number of analyses executed by the many researchers using Cytobank grows over time, a rich repository of information and knowledge accumulates. Using search tools on Cytobank, years of existing research can be polled for data relevant to current questions, be it for a particular biomarker, disease area, therapeutic compound, or any scientific variable. With operations via the Cytobank API, more sophisticated analyses that combine data from multiple experiments can be orchestrated to ask deeper questions and extract value from large swaths of the centralized, structured data on Cytobank.

One example of a meta-analysis you might want to perform using the Cytobank API would be to train a classifier that can assign population identities to single cell data. The large number of datasets that have already been analyzed and labeled with biological context by human experts on Cytobank can be used as training and validation sets. Regardless of the method used to categorize cells into populations (sequential gating, SPADE, viSNE, CITRUS, etc.), an identical core set of statistical metadata describing these populations can be obtained.

Using the API, all of this summary classification data from a variety of experiments can be extracted into a standard format. This information can then be combined and mined for patterns that inform the automatic classification of future single cell data, controlling for the categorization method that was used. Alternatively, it can be used to compare categorization methods with one another.

Populations analyzed and labeled by human experts come in many forms from different analysis methods. The Cytobank API can be used to extract this information in a structured way on a large scale in order to gain deeper insights about the biological categorization of single cell data across analysis methods and data sets.
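Once population statistics have been pulled down, a minimal sketch of the "standard format plus classifier" idea might look like the following. The export records, field names, and marker panel are hypothetical stand-ins for data extracted via the API, and nearest-centroid classification is just one simple technique chosen for illustration, not a method prescribed by Cytobank. The optional `method` filter shows how to control for the categorization method.

```python
from math import dist
from statistics import mean

# Hypothetical population summaries extracted from several experiments.
exports = [
    {"experiment": "exp1", "method": "gating", "label": "CD4 T cells",
     "medians": {"CD3": 5.0, "CD4": 4.0, "CD8": 0.4}},
    {"experiment": "exp2", "method": "SPADE", "label": "CD4 T cells",
     "medians": {"CD3": 4.6, "CD4": 3.8, "CD8": 0.2}},
    {"experiment": "exp1", "method": "gating", "label": "CD8 T cells",
     "medians": {"CD3": 4.8, "CD4": 0.3, "CD8": 3.6}},
    {"experiment": "exp2", "method": "SPADE", "label": "CD8 T cells",
     "medians": {"CD3": 5.2, "CD4": 0.5, "CD8": 4.0}},
    # Missing CD8, so it cannot be placed in the common feature space.
    {"experiment": "exp3", "method": "viSNE", "label": "B cells",
     "medians": {"CD3": 0.2, "CD4": 0.1}},
]
MARKERS = ("CD3", "CD4", "CD8")

def to_rows(exports, markers):
    """Flatten records into (method, label, feature vector) rows."""
    rows = []
    for rec in exports:
        med = rec["medians"]
        if all(m in med for m in markers):  # skip incomplete panels
            rows.append((rec["method"], rec["label"], tuple(med[m] for m in markers)))
    return rows

def centroids(rows, method=None):
    """Mean feature vector per label, optionally for one categorization method."""
    by_label = {}
    for row_method, label, vec in rows:
        if method is None or row_method == method:
            by_label.setdefault(label, []).append(vec)
    return {label: tuple(mean(col) for col in zip(*vecs))
            for label, vecs in by_label.items()}

def classify(vec, cents):
    """Assign the label whose centroid is nearest in Euclidean distance."""
    return min(cents, key=lambda label: dist(vec, cents[label]))

rows = to_rows(exports, MARKERS)
print(classify((4.9, 3.5, 0.5), centroids(rows)))
```

Comparing `centroids(rows, method="gating")` with `centroids(rows, method="SPADE")` label by label is one simple way to see how much two categorization methods agree.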

Workflow 4: Run PCA on Data in Cytobank

The number of useful analytical methods applicable to cytometry data has grown rapidly in recent years. Cytobank doesn’t natively support many of the methods being developed in the field, nor does it currently allow custom scripts to be executed inside our platform. However, we are developing a middle ground: a researcher installs a software package such as the R software environment (a simple process akin to installing any computer application), and Cytobank provides the more challenging scaffolding that connects data on Cytobank to published algorithms with a single command. The experience will be as similar as possible to setting up a native algorithm within Cytobank and will free the researcher from grappling with the complexities of the underlying analytical packages.

A single command is entered to initialize a PCA run. Information from Cytobank is fetched via the API and presented for graphical configuration of settings. The run is executed and the results automatically ported to Cytobank for visualization and analysis. This functionality is currently in development. To stay informed about progress or to provide input, get in touch with Cytobank Support.
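The integration described above is still in development and is built around packages such as R, so the sketch below does not reflect its interface. It only illustrates the computation at the core of a PCA run, written in Python with NumPy: mean-center an events × channels matrix, take its singular value decomposition, and project onto the leading axes. The function name and the synthetic data are hypothetical; fetching events from Cytobank and uploading the results are not shown, and the sketch assumes channels have already been transformed (for example with an arcsinh transform, as is common for cytometry data).

```python
import numpy as np

def pca(events, n_components=2):
    """Project an events x channels matrix onto its top principal components."""
    X = np.asarray(events, dtype=float)
    centered = X - X.mean(axis=0)  # PCA operates on mean-centered channels
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T
    explained = s**2 / np.sum(s**2)  # fraction of variance per component
    return scores, explained[:n_components]

# Synthetic stand-in for transformed events: two correlated channels
# plus one low-variance channel.
rng = np.random.default_rng(0)
base = rng.normal(size=1000)
events = np.column_stack([
    base,
    2 * base + 0.1 * rng.normal(size=1000),
    0.1 * rng.normal(size=1000),
])
scores, explained = pca(events)
print(scores.shape, explained.round(3))
```

The per-event `scores` are what would be sent back to Cytobank as new channels for visualization, while `explained` summarizes how much of the variance each component captures.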


Get Started Using the Cytobank API

Our API opens up the Cytobank platform to workflows and creative uses far beyond the native functionality of Cytobank. The resources below will help you get started:


About Cytobank

Modern efforts in basic and clinical research routinely bring together teams of many people with different skill sets investigating biology across different data types and geographies. Beyond the immediate challenges of doing high quality science (producing and analyzing data) lies an equally challenging problem of coordination, logistics, and centralization, so that research programs can produce results in a timely and reproducible fashion while being assured of the security, privacy, and storage fidelity of their data and results. This latter problem is often tackled with tools poorly suited to the job, and the resulting overhead diminishes productivity.

The Cytobank platform was designed with these needs in mind to make modern research programs efficient and productive. Our integrated suite of tools handles basic analysis, advanced algorithms and visualization, organization and searchable archiving, and permissioned sharing. A secured and redundant cloud infrastructure scalably powers the platform and allows access from anywhere in the world on a web-enabled device.