May 10, 2017  |  API, Cytobank, User Stories

Two Disciplines, One Analysis Pipeline:
Leveraging the Cytobank API to Find Biological Insights

This week we interview scientists from two different disciplines to hear how they worked together and used the Cytobank API to develop an automated pipeline to find biomarkers for several chronic graft-versus-host disease (cGVHD) outcomes.  We asked bioinformatician and Cytobank consultant Ashu Sethi to share her experience using the API, and asked Cytobank’s own Hannah Polikowsky, on behalf of Vanderbilt, how this pipeline improved her analysis workflow.


Current understanding of cGVHD pathophysiology is limited; acute GVHD (aGVHD) manifestations may coexist; and scoring systems for diagnosis, classification, staging, and response are complex and lack quantitative metrics.  This collaboration built an automated pipeline to enable easier investigation of cellular signatures that correlate with various clinical outcomes.

Fig 1. Automated Analysis Pipeline.  Clinical outcomes from 24 cGVHD patients prior to extracorporeal photopheresis (ECP) treatment were annotated and recorded in a REDCap database; corresponding mass cytometry data were stored in Cytobank. A) Clinical data, including history of aGVHD, prior exposure to myeloablative conditioning (MAC), prior total body irradiation (TBI) treatment, development of skin sclerosis, 2-month response to ECP, and day of mass cytometry data acquisition, were retrieved from the Vanderbilt REDCap database. B) CITRUS runs were batched via the Cytobank API to discover signatures correlated with different clinical outcome groups. C) Results from a CITRUS run using SAM to identify markers associated with skin sclerosis found cluster 119509 (and its child clusters) to be more abundant in the patient group that did not develop skin sclerosis. Histogram output from CITRUS described marker expression in this cluster compared to other cells.
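The flow from panel A to panel B, pairing clinical annotations with batched CITRUS runs, can be sketched in outline. Below is a minimal, hypothetical Python sketch of that grouping step; the sample names, outcome field names, and `group_assignments` helper are illustrative stand-ins, not the actual Vanderbilt/REDCap schema or the Cytobank API:

```python
from collections import defaultdict

# Hypothetical clinical annotations, as if exported from a REDCap-style database.
# Field names and values are illustrative, not the actual study schema.
patients = [
    {"sample": "pt01.fcs", "prior_aGVHD": True,  "skin_sclerosis": False, "ecp_response_2mo": True},
    {"sample": "pt02.fcs", "prior_aGVHD": False, "skin_sclerosis": True,  "ecp_response_2mo": False},
    {"sample": "pt03.fcs", "prior_aGVHD": True,  "skin_sclerosis": True,  "ecp_response_2mo": True},
]

# Each clinical outcome defines one batched CITRUS run: samples are split
# into comparison groups by that outcome's value.
outcomes = ["prior_aGVHD", "skin_sclerosis", "ecp_response_2mo"]

def group_assignments(patients, outcome):
    """Map each comparison group (outcome value) to its sample files."""
    groups = defaultdict(list)
    for p in patients:
        groups[p[outcome]].append(p["sample"])
    return dict(groups)

# One group assignment per outcome -- the input a CITRUS association
# model needs before a run can be launched.
run_configs = {outcome: group_assignments(patients, outcome) for outcome in outcomes}
```

The same pattern generalizes to any number of annotated outcomes: each new clinical feature in the database becomes one more candidate grouping to test, without any manual regrouping of files.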


Ashu Sethi:

Describe your background in bioinformatics.

My journey to become a bioinformatician actually began at Accenture, where I worked as a software engineer helping our large pharmaceutical clients mine and manage their large article databases.  Working with pharma clients piqued my interest in working with biological data types, and I jumped at the opportunity to pursue a master’s in Bioinformatics at San Diego State University when relocating to California.  During my master’s studies, I built a pipeline, which incorporated tools from the Broad Institute GATK toolkit, to analyze genomic data from 400 samples of mycobacteria with the goal of finding mutations that correlate with resistance to tuberculosis (TB) treatment.

As a bioinformatician, what types of problems have you been tasked with investigating?
What data types have you explored?

In a recent role, I worked at Gensignia alongside a team of bioinformaticians to develop a multi-marker diagnostic test that combines the measurement of cell-free miRNA in plasma with a computational algorithm to improve early detection of lung cancer.  In an earlier role at the La Jolla Institute for Allergy and Immunology, I worked with a variety of data types, such as RNAseq and ChIPseq data, with the goal of understanding and exploring different human diseases.  One central role for me as a bioinformatician has been to build computational pipelines for biomarker discovery. This project with Cytobank was unique because I hadn’t previously worked with mass cytometry data; however, the goal of building a pipeline for biomarker discovery was similar to previous projects.

What types of tools/methods have you used to mine data?
How did this help you solve the research problems you and your team were investigating?

Tools and methods to mine the data have depended on the data type.  For example, I have built databases to store data for some projects, whereas other projects did not require this step.  The key to each project is understanding and defining the project goals before deciding which tools will best solve the given problem (such as understanding the mutations in mycobacteria that cause resistance to TB drugs).  Also, the time it takes to write a software program varies with each project goal. Sometimes it may take just a few minutes if the underlying algorithm is sufficiently clear. On other occasions, it may take days or weeks to design the algorithm and write the software program. Furthermore, building these pipelines permits more flexibility in adjusting parameters, which can save time and help the team find results faster.

Describe your experience using the Cytobank API.   What resources helped you get started?

Overall, using the Cytobank API was a positive experience.  Resources to get started were extremely well documented.  Honestly, it would be great if all tools were this well-documented!

What did you like about using the Cytobank API in this project?  What are some key advantages to kicking off analyses using the API?

The Cytobank API R library has been very helpful in this project. Using the R package made it easier to adjust parameters.  A few lines of R code can kick off multiple analyses simultaneously, which saves time.  Additionally, defining the logic up front is a requirement of programming; this forces experiment planning, which ultimately reduces potential analysis challenges.  Furthermore, it is straightforward to check a script during development, which helps reduce experimental error.
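The batching pattern Ashu describes, a short script launching many runs at once, looks roughly like the following. This is a hedged Python sketch against a toy in-memory client; `CytobankClient`, `start_citrus_run`, and the settings fields are hypothetical stand-ins, not the real Cytobank R package or REST API:

```python
# Hypothetical illustration of batching analysis runs through an API client.
# CytobankClient and its methods are stand-ins, not the real Cytobank API.
class CytobankClient:
    """Toy in-memory stand-in for an authenticated API session."""

    def __init__(self):
        self._next_id = 100
        self.runs = {}

    def start_citrus_run(self, experiment_id, outcome, settings):
        """Record a queued run and return its id, mimicking a launch call."""
        run_id = self._next_id
        self._next_id += 1
        self.runs[run_id] = {"experiment": experiment_id, "outcome": outcome, **settings}
        return run_id


client = CytobankClient()

# Shared analysis settings, reused across every queued run (fields illustrative).
settings = {"association_model": "SAM", "min_cluster_size_pct": 5}

# A few lines are enough to kick off one CITRUS run per clinical outcome,
# instead of configuring each run by hand in the UI.
outcomes = ["prior_aGVHD", "skin_sclerosis", "ecp_response_2mo"]
run_ids = [client.start_citrus_run(42, outcome, settings) for outcome in outcomes]
```

Because the settings live in one place in the script, tweaking a parameter and relaunching the whole batch is a one-line change, which is the flexibility Ashu highlights.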

How do you see the field of bioinformatics augmenting decision making in the clinic and/or improving understanding of human disease?

I believe strongly in bioinformatics and its ability to provide insights into human health. By mining large datasets for insights, bioinformatics can help us diagnose disease earlier, while curative treatment options are still available.  If we can diagnose disease earlier, we can improve survival rates. I’m very much looking forward to discoveries in the field that continue to advance personalized medicine.


Hannah Polikowsky:

Describe your role at Vanderbilt University.

As Program Manager of Tissue Research, I develop and improve efficient processes for collecting, analyzing, and obtaining quality high-parameter single-cell data for clinical translational applications.  I am also Managing Director of the recently developed Cancer & Immunology Core, a fee-for-service mass cytometry core that applies well-established protocols and methods to help investigators use mass cytometry to further their research endeavors.  Ultimately, my role is to help clinicians and scientists systematically use mass cytometry and associated analyses to investigate the immune milieu in varying contexts (e.g., after perturbations such as drug treatment or cell culture).

Why was it of interest to develop this automated pipeline?

cGVHD is a complex disease.  cGVHD manifestations involve multiple organs (skin, liver, mouth, GI tract, eye, lung, joint/fascia, and genital tract), all of which must be evaluated and scored for diagnosis, classification, staging, and response to therapy.  It is unclear which clinical manifestations or ‘features’ provide the most value for clinical assessment.  As such, developing an automated analysis pipeline that could use multiple clinical features to find cellular correlates enabled us to mine more data for insights.

What did you like about using the Cytobank API in this project?

Data analysis is very time consuming.  For any given experiment, at least 50% of the effort is devoted to data analysis, which is challenging to communicate to collaborators.  Tools that speed up the analysis process and find results faster are welcome.  Leveraging the Cytobank API to kick off multiple CITRUS analyses simultaneously gave us time, which would otherwise have been spent meticulously setting up each CITRUS run, to explore and understand the results.

What’s next?  How do you plan on continuing to use this pipeline?

Data analysis is a continuous process.  As new findings appear in the literature and new data are collected, there are additional insights to investigate and subsequently additional data to analyze. We plan on using this pipeline to help facilitate data analysis in other projects.  This pipeline will help us mine large datasets, such as the now-recruiting longitudinal, multi-center phase II ibrutinib study (for patients post donor stem cell transplant due to treatment-unresponsive lymphoma).

How do you see computational analyses augmenting decision making in the clinic and/or improving understanding of human disease?

Advances in biotechnology, such as mass cytometry or scRNAseq, enable us to measure more and more parameters at the single-cell level.  The amount of data that can be captured is exponentially increasing, meaning that the insights that can be mined from these data are also exponentially increasing, but only if we develop computational pipelines to do so.  Computational analyses can help us explore the vast expanse of human disease and lead to discoveries in arenas where knowledge is limited, such as cGVHD pathophysiology.


Tips for Getting Started Using the Cytobank API

To accelerate your workflows and find your own biological insights by leveraging the API: