As you’ve probably heard from us before, the era of Big Data is here, and we want to make sure you’re prepared to face the unique challenges it brings. However, we at Cytobank aren’t the only ones thinking about how to manage all the data generated by high-dimensional flow and mass cytometry. Ryan Duggan of the Cytometry Google community recently hosted a Cytometry Hangout on Data Management, with a guest panel of our very own Nikesh Kotecha, Kevin Krouse from labkey.org, and Wade Rogers from the University of Pennsylvania. If you didn’t catch it live, below is a quick summary of the panel.
In the context of cytometry, data management refers to an electronic system for managing the data generated by instruments, the associated metadata, and the analysis tools that work on them. Systems range from static shared storage such as Dropbox, all the way to a LIMS (Laboratory Information Management System) or a platform like Cytobank, which adds analysis and annotation on top of data storage and collaboration.
Regardless of the platform or solution, Ryan and the panel agreed that one of the most essential features of a data management platform is the ability to annotate the data and metadata collected. One of the most prominent efforts in cytometry annotation is the MIFlowCyt standard recently recommended by ISAC, which outlines the minimum information required to report the experimental details of a flow cytometry experiment. Many other ways exist to annotate your flow data, from naming channels and samples in the acquisition software to tagging files in Cytobank, with the eventual goal of being able to search across completely different assay results using natural language processing. To start down this path, however, the people in charge of the instruments need to train users to annotate, either at the cytometer or in subsequent analysis software. Our previous blog post on Future Proofing Your Experiments and Files covers some simple techniques for quality annotation.
Annotating data and recording experiment protocols is clearly important, but what about keeping track of the tools used to analyze that data? Ensuring that analysis software and algorithms produce consistent results across updates is also essential to the reproducibility of an experiment. Because Cytobank is a web-based platform, all updates are tested and deployed on the server, with no action required from the end user, so everyone works with the same version of the software.
As the field of cytometry generates more and more data, it will be up to core facilities to lead the way in managing that data responsibly, and up to analysis companies to make that easier for the cores. This idea is covered in depth in our blog post on the Transparency of Data Analysis, which discusses the recent Institute of Medicine report written after several large studies were retracted because of mismanagement of the data analysis process.
To close the panel, each participant named what they considered most important, and the answer was a resounding “Annotation.”
This discussion emphasizes not only the importance of thinking now about what the near future will require, but also the value of an active community to the development of a field. The efforts of Ryan and the Cytometry Google community, as well as everyone who contributes to the Purdue mailing list, are essential as cytometry grows and tackles tough issues in the future.