Image Analysis Collaboration: A Practical Guide

What is the role of an image analyst?

An image analyst is an expert in extracting quantitative measurements from images so that they can be used for biological discovery. We do not do image acquisition, though most of us have a background as wet-lab biologists and microscopists that we draw on. Image analysts are most effective when we are involved at every stage of a project, so that we can help ensure that decisions made in one step do not make a later step more challenging than it needs to be.

Some of the roles that we perform:

Experimental design

This can include plate layout, microscope setup, label selection, sample preparation, metadata tracking, image acquisition, etc. Considering the final metric(s) to be extracted and quantified during the experimental design process will greatly increase the likelihood that your data will be well suited to the question you're designing the experiment to answer. Because we want your data to be the best it can be, we typically do not charge for short experimental design consultations, such as through our free office hour program (broad.io/imagingofficehours).

The steps of any quantitative bioimaging experiment are mutually interconnected and are ideally optimized together. Reproduced from Senft and Diaz-Rohrer et al, PLoS Biology 2023

Assay development

This is determining the tasks that need to be performed and the parameters to best perform them. This can involve a variety of tasks including image handling (e.g. aligning images of the same cells taken during different acquisitions), object segmentation (e.g. identifying cells in an image), quantification method selection (e.g. finding the method that best distinguishes control populations), etc.

Pipeline/workflow development

This is assembling assay steps into a pipeline or workflow that can be used in a repeatable manner. We often build CellProfiler pipelines as our pipeline endpoint, but can build workflows in other image analysis software and can bring together multiple software packages in a single workflow.

Quality control

We can build quality control assays/workflows that can be applied at many individual points in an analysis workflow (e.g. to raw images to find images with debris, to segmented images to flag poor segmentation, etc.) or we can integrate quality control metrics into the steps themselves (e.g. not performing image analysis on images with debris).

Data handling

Data often needs to be wrangled from the way that it is output by image analysis software before it is in a format optimized for biological discovery. This can include aggregating many individual .csv files into a database or annotating data outputs with metadata.
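
As a minimal sketch of this kind of wrangling, the example below combines per-plate .csv exports into one SQLite table and annotates each row with treatment metadata, using only the Python standard library. The column names (`Well`, `Cell_Area`), plate names, and plate map are hypothetical and do not come from any specific tool's output.

```python
import csv
import io
import sqlite3


def load_rows(csv_text, plate):
    """Parse one per-plate .csv export and tag each row with its plate name."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        rows.append((plate, record["Well"], float(record["Cell_Area"])))
    return rows


def build_database(exports, plate_map):
    """Combine many exports into one in-memory SQLite table with metadata.

    exports: dict of plate name -> csv text (hypothetical column names).
    plate_map: dict of (plate, well) -> treatment label.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cells (plate TEXT, well TEXT, cell_area REAL, treatment TEXT)"
    )
    for plate, csv_text in exports.items():
        for plate_name, well, area in load_rows(csv_text, plate):
            # Wells missing from the plate map are labeled rather than dropped.
            treatment = plate_map.get((plate_name, well), "unknown")
            conn.execute(
                "INSERT INTO cells VALUES (?, ?, ?, ?)",
                (plate_name, well, area, treatment),
            )
    conn.commit()
    return conn


# Made-up exports standing in for files on disk.
exports = {
    "Plate1": "Well,Cell_Area\nA01,100.0\nA01,120.0\nA02,90.0\n",
    "Plate2": "Well,Cell_Area\nA01,110.0\n",
}
plate_map = {
    ("Plate1", "A01"): "DMSO",
    ("Plate1", "A02"): "compound_X",
    ("Plate2", "A01"): "DMSO",
}
conn = build_database(exports, plate_map)
dmso_count = conn.execute(
    "SELECT COUNT(*) FROM cells WHERE treatment = 'DMSO'"
).fetchone()[0]
print(dmso_count)
```

In practice the csv text would be read from files, and the database would be written to disk instead of memory.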

Data analysis

We are biologists and, with assistance from you (the subject matter expert), we can perform data analysis to answer specific biological questions. We emphasize writing scripts so that analyses are reproducible and the same analysis can easily be applied to subsequent datasets/batches.

Training

We can teach members of your lab how to perform any of the image analysis tasks listed above. This can be done purely as pedagogy, by shadowing an image analyst as they perform any of the tasks above on your data, or as part of hand-off of a project so that you are empowered to use the tools/workflows we develop on subsequent batches of your data.

Overview of the key skills and capabilities of a bioimage analyst. Reproduced from Cimini et al, Journal of Cell Science 2024

How do you describe your image analysis goals?

To get the best help from an image analyst, you need to be able to answer a number of questions.

What is your biological goal?

You should be able to communicate your research in 1-3 sentences.

Examples

We study lipid metabolism and are interested in the mechanisms of lipid storage organelle biogenesis. We primarily use the 3T3-L1 mouse adipocyte cell line for basic discovery and have a couple of different mouse lines that we use for exploring physiology.
We study a rare Mendelian genetic disease. We have a cell line that models the disease and we want to find compounds that rescue the disease phenotype.

What help do you want from image analysts?

You should be able to state exactly what you are hoping for from our help in 1-3 sentences. This needs to be discrete and deliverable-focused and is generally separate from your biological goals. Note that having a discrete deliverable helps guide our work, but that we will always provide as much documentation and as many data intermediates as possible. For example, if your deliverable is a list of hits from a Cell Painting screen, we would also return raw numerical data, each step of data processing used to generate morphological profiles, QC metrics, and data analysis notebooks used to find hits.

Examples

We need image analysis pipelines that we can run in our lab that will identify cells and lipid droplets and calculate the percentage of each cell occupied by lipid droplets in both 3T3-L1 cell culture and mouse adipocyte tissue slices.
We want a rank-ordered list of the compounds that best cause our disease-model cell line to return to a wild-type phenotype in a Cell Painting screen.

What are your biological controls?

Just like any other biological experiment, you need biological controls. At the least, you must have a negative control, meaning for any treatment that you are applying, you must also have an untreated condition (e.g. DMSO for drug treatment, a non-targeting vector for CRISPR treated cells, a wild-type cell line for genetic variant analysis). If we are performing assay development, it’s very helpful for us to also have a positive control (e.g. a drug or genetic perturbation that you know causes the phenotype you are interested in capturing) because we can then best optimize the assay to fully separate positive from negative conditions.
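
One common way to put a number on how well an assay separates positive from negative controls is the Z'-factor, defined as 1 - 3(sd_pos + sd_neg) / |mean_pos - mean_neg|. The sketch below illustrates that idea; it is not necessarily the metric we would choose for your assay, and the control values are made up.

```python
from statistics import mean, stdev


def z_prime(positive, negative):
    """Z'-factor: 1 - 3 * (sd_pos + sd_neg) / |mean_pos - mean_neg|."""
    separation = abs(mean(positive) - mean(negative))
    if separation == 0:
        raise ValueError("control means are identical; the assay cannot separate them")
    return 1 - 3 * (stdev(positive) + stdev(negative)) / separation


# Made-up per-well measurements, for illustration only.
positive_wells = [9.0, 10.0, 11.0]
negative_wells = [1.0, 2.0, 3.0]
score = z_prime(positive_wells, negative_wells)
print(round(score, 2))
```

Values closer to 1 indicate better separation between the two control populations, which is why having both controls available during assay development is so useful.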

What are your technical controls?

Technical controls cover a broad variety of controls and are very dependent upon the specific assay and desired measurements/features. If you don’t know what technical controls you should have, that’s okay - that’s why it’s best to involve us from the very beginning of a project so that we can help you figure them out! A few examples of technical controls are listed below.

Single label controls

Single label controls allow us to assess the specificity and intensity of individual labels. They are particularly important if the metric you are interested in extracting from the images involves quantifying the relationship between two or more labels. For example, colocalization analysis assumes that if there is no colocalization then there is no overlap of signal, and ratiometric analysis assumes that if there is no expression then there will be no signal in the channel.

In a hypothetical immunofluorescence experiment where we want to measure the ratio between the expression of proteins A and B in the cytoplasm, our assay would use the following channels:

Channel 1: DAPI
Channel 2: primary antibody anti-A + a secondary antibody A
Channel 3: primary antibody anti-B + a secondary antibody B

The controls for non-specific labeling would be:

Secondary A (nothing else)
Secondary B (nothing else)

The single label bleedthrough controls would be:

Primary A + Secondary A (nothing else)
Primary B + Secondary B (nothing else)
DAPI (nothing else)

The controls for cross-reactivity would also include:

Cells labeled with all of the experimental reagents, except for primary antibody A
Cells labeled with all of the experimental reagents, except for primary antibody B

In all three control cases, the images of the controls should be acquired identically to the experimental samples (e.g. they would all have 3 channels, taken with the same exposure settings, and ideally prepared and imaged in parallel with the "real" samples).

Target plates

Large-scale Cell Painting screens are often acquired in multiple batches. While developing the JUMP dataset, we developed a Target plate with paired compound, ORF overexpression, and CRISPR knockdown versions that were run along with every batch of the screen. Including the plate with every batch has been used for inter-batch alignment, and eventually we hope to use it for alignment of data between datasets.

What does your phenotype or structure of interest look like?

Though we are all biologists, we are likely not experts in your biology of interest so we need you to clearly communicate to us what you are looking at to define your phenotype. Often a great way to do this is to annotate a couple of images of your positive and negative controls.

Does the Cimini lab do Cell Painting?

Yes! We love to help you with your Cell Painting or other morphological profiling data.

We do not do any wet lab work. If you need cells cultured and imaged, we recommend working with the Broad Center for the Development of Therapeutics (CDoT) as they are a frequent collaborator. If you are generating your own images or working with a different partner, we are happy to consult on experimental design including plate layout and imager configuration.

For large-scale arrayed Cell Painting experiments, we have a fixed cost model for which we transfer image data to our storage, create flat-field illumination correction images, optimize segmentation parameters, perform some QC, and run an image analysis pipeline in CellProfiler to create single cell morphological profiling measurements. We then perform data handling and generate per-well profiles. Smaller experiments, including pilots, will typically involve most of the same steps; these are charged on a per-hour-spent basis.
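
To illustrate what generating per-well profiles means, here is a minimal sketch that collapses single-cell measurements into one profile per well by taking the median of each feature. The feature names and the choice of the median are assumptions made for illustration; this is not our production workflow.

```python
from collections import defaultdict
from statistics import median


def per_well_profiles(cells, features):
    """Aggregate single-cell rows into one median profile per well."""
    by_well = defaultdict(list)
    for cell in cells:
        by_well[cell["well"]].append(cell)
    return {
        well: {f: median(c[f] for c in rows) for f in features}
        for well, rows in by_well.items()
    }


# Made-up single-cell measurements with hypothetical feature names.
cells = [
    {"well": "A01", "area": 100.0, "intensity": 0.2},
    {"well": "A01", "area": 120.0, "intensity": 0.9},
    {"well": "A01", "area": 140.0, "intensity": 0.4},
    {"well": "B02", "area": 80.0, "intensity": 0.5},
]
profiles = per_well_profiles(cells, ["area", "intensity"])
print(profiles["A01"])
```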

For morphological profiling data, we have a set of “standard” data analyses that we often perform for collaborators after performing image analysis on their Cell Painting or other morphological profiling data. Data exploration is outside of our fixed cost model and is charged hourly. In increasing order of time/complexity, the standard analyses are:

Similarity matrices and hierarchical clustering. We can make observations about how metadata groupings affect profiles this way, but deep exploration of the clusters is typically performed by subject-matter expert biologists (i.e. you).
Calculation of phenotypic activity and phenotypic consistency. Requires sufficient negative control population in your experiment.
Basic hit calling, using phenotypic activity.
Advanced hit calling, using positive controls and custom profile exploration for best separation of positive from negative controls.
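
The first analysis in the list above, a similarity matrix, can be sketched as pairwise Pearson correlation between per-well profiles. This is a minimal illustration with made-up profiles and hypothetical well labels; real profiles have many more features and are normalized before comparison.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length feature vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (spread_x * spread_y)


def similarity_matrix(profiles):
    """Return sorted profile names and their pairwise correlation matrix."""
    names = sorted(profiles)
    matrix = [[pearson(profiles[a], profiles[b]) for b in names] for a in names]
    return names, matrix


# Made-up profiles: two negative-control wells and one treated well.
profiles = {
    "DMSO_1": [1.0, 2.0, 3.0, 4.0],
    "DMSO_2": [2.0, 4.0, 6.0, 8.0],
    "drug": [4.0, 3.0, 2.0, 1.0],
}
names, matrix = similarity_matrix(profiles)
for name, row in zip(names, matrix):
    print(name, [round(value, 2) for value in row])
```

Hierarchical clustering then groups wells whose rows in this matrix look alike.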

We also have a fixed cost model for profile generation in Pooled Cell Painting/Optical Pooled Screening experiments and can perform similar data analyses. It is currently a much more manually intensive, and therefore more expensive, workflow but we are actively working on overhauling the workflow to bring down costs in the future.

What is a batch?

We think of a batch as a group of images that can be analyzed with the same pipeline. A batch is a set of data (images) with common sources of uncontrolled technical variability and common metadata. In practice, this usually means images that were collected on the same day or within a short time window. Uncontrolled technical variability can have many sources, including fluctuations in incubator temperature or humidity and changes in stain concentration caused by precipitation, differences between aliquots, or degradation. It can also encompass variability that you thought was controlled but turned out not to be, such as a laser strength being accidentally changed or a filter being bumped out of place on the microscope.

Using the same pipeline on all your images generally requires that your images have common experimental metadata. There are exceptions to this rule, particularly during early phases of assay development, but it’s always best to have a conversation with us before acquiring your data if your experimental setup has multiple metadata configurations as part of a single batch.

Your experimental question determines if a single or multiple pipelines are necessary. A couple examples are:

Different stain concentrations

If your question is “how does my stain concentration affect signal:noise?” and you are not using that stain for segmentation, then you can use a single pipeline with many stain concentrations. If you use that stain for segmentation/object identification then multiple pipelines are likely necessary as the segmentation parameters probably require tuning across the concentrations.

Different channels

If your question is “which combination of stains best enables me to detect my phenotype?”, you can use the same pipeline if you acquire the same channels for all your images (even if you don’t have a stain in some of them), i.e. the superset of all possible channels in your experiment. If you acquire just the channels that include stain, then you will need different pipelines, as a pipeline will have to be configured for each combination of input channels.
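
The superset idea can be checked before acquisition. The sketch below, with hypothetical condition and channel names, computes the superset of channels across conditions and reports which channels each condition would still need to acquire for a single pipeline to work.

```python
def channel_report(channels_by_condition):
    """Return the channel superset, missing channels per condition, and
    whether every condition already has the full superset."""
    superset = set().union(*channels_by_condition.values())
    missing = {
        condition: sorted(superset - set(channels))
        for condition, channels in channels_by_condition.items()
    }
    single_pipeline = all(not channels for channels in missing.values())
    return sorted(superset), missing, single_pipeline


# Hypothetical acquisition plan: one condition skips the channel it has no stain in.
plan = {
    "stainA_only": ["DNA", "StainA"],
    "stainA_and_B": ["DNA", "StainA", "StainB"],
}
superset, missing, single_pipeline = channel_report(plan)
print(superset)
print(missing)
print(single_pipeline)
```

Here the plan would need the empty StainB channel acquired for the first condition before one pipeline could process both.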

Subsequent batches of data may be analyzed with the same pipeline, but we always have to check that a new batch of data performs similarly to a previous batch. Pipelines may require tuning of thresholds, segmentation parameters, etc.

What images should you send us for assay development?

To create the most robust pipelines during assay development, we need the full diversity of your images as inputs so that the pipeline performs equally well across the diversity and is not overfit to a particular data subset (e.g. just pretty images, or just a single batch of images). This includes positive and negative controls. It also includes any diversity that may be present in the data that you want analyzed with the pipeline we develop. This can mean images acquired on multiple microscopes, on different days or in different batches. It can include images that you know you’d want caught and filtered out of analysis (e.g. images that are out of focus). We do not want just your prettiest images.
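
One way to assemble such a set is to sample a few images from every combination of batch and condition instead of hand-picking. The sketch below assumes a hypothetical list of image records with `batch` and `condition` keys; the record structure and function name are illustrative only.

```python
import random
from collections import defaultdict


def sample_development_set(images, keys, per_group, seed=0):
    """Pick up to `per_group` images at random from every metadata group."""
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    groups = defaultdict(list)
    for image in images:
        groups[tuple(image[k] for k in keys)].append(image)
    chosen = []
    for group_key in sorted(groups):
        members = groups[group_key]
        chosen.extend(rng.sample(members, min(per_group, len(members))))
    return chosen


# Made-up image records: 2 batches x 2 control conditions x 5 images each.
images = [
    {"file": f"{batch}_{condition}_{i}.tiff", "batch": batch, "condition": condition}
    for batch in ("Batch1", "Batch2")
    for condition in ("positive_control", "negative_control")
    for i in range(5)
]
chosen = sample_development_set(images, keys=("batch", "condition"), per_group=2)
print(len(chosen))
```

Adding more keys, such as microscope or acquisition day, extends the same approach to other sources of diversity.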

What can be accomplished in 2 hours/10 hours/many hours of image analysis?

In 2 hours, we can look at your data, recommend specific tools, and make suggestions about what changes you might make to your assay or acquisition to make your image data more quantifiable. We can talk you through specific image analysis strategies. We generally cannot deliver a specific product.

In 10 hours, we can also make a first draft of a pipeline or workflow, show you how the pipeline works, point you at resources for using and fine-tuning the workflow, and point you to general image analysis help resources. Depending on the scale, diversity, and complexity of your data, we can sometimes return a specific deliverable such as “Batch 1 run through a CellProfiler pipeline” or “a classifier to identify images with debris”. We generally cannot create and run a workflow end-to-end from raw images through biological discovery.

For larger scale collaborations, we can do any/all of the activities described in “What is the role of an image analyst?” For previous large-scale collaborations we have assisted with setting up cloud tooling for image analysis, written custom code notebooks for downstream analyses, written custom CellProfiler plugins, and run large scale screen data ourselves.

Why do we have data management requirements?

Quantitative, reproducible image analysis relies on scripting. The easier it is for a script to parse information from your file and folder names, the less time we have to spend rewriting scripts or renaming your files, and the more time we have to use our actual image analysis expertise to help you. We understand that our data management requirements may not be intuitive to biologists who aren’t used to working with scripts, but we ask that you please do your best to carefully follow our requirements and ask us for clarification or help if you have any questions.

What are the data management requirements?

Data must be organized in batches with common metadata - see “What is a batch?” above for more.
No spaces or punctuation (besides underscores and dashes - see below) in any folder names. (We prefer they aren’t in any file names either).
Make sure that your naming pattern is consistent between folders/files. Things to pay attention to:
- Ordering. e.g. if one folder is COS7_treated, please call the other folder COS7_untreated NOT untreated_COS7
- Underscores (_) and dashes (-). Either is okay, but use the same choice across corresponding files/folders. e.g. if using Batch1_treated, please use Batch2_treated NOT Batch2-treated
- Abbreviations. e.g. if using SH1, please also use SH2 NOT shRNA2
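
A quick script can catch some violations of these rules before transfer. The sketch below assumes folder names are passed in as plain strings; the allowed-character rule (letters, digits, underscores, dashes) is one reading of the requirements above, and the function name is illustrative.

```python
import re

# Letters, digits, underscores, and dashes only (an assumed reading of the rules).
ALLOWED = re.compile(r"[A-Za-z0-9_-]+")


def check_folder_names(names):
    """Return a list of human-readable problems found in the folder names."""
    problems = []
    for name in names:
        if not ALLOWED.fullmatch(name):
            problems.append(
                f"{name}: contains spaces or punctuation other than _ and -"
            )
    # Heuristic only: flags mixed separators for a person to review.
    if any("_" in n for n in names) and any("-" in n for n in names):
        problems.append("mixed separators: both _ and - are used across names")
    return problems


for problem in check_folder_names(["Batch1_treated", "Batch2-treated", "Batch 3_treated"]):
    print(problem)
```

Consistent ordering and abbreviations are harder to check automatically and still need a manual look.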

How do I transfer files to the Cimini Lab?

The ideal method for file transfer depends on the total size of the data being transferred.

If a very small set of images is being shared (<10GB) you can upload your files to a Google Drive that you give us access to. This is not our preferred method.
For files up to ~200GB, we recommend using a File Transfer Program and uploading to Imaging Platform’s Dropbox. We can provide our FTP protocol by request.
For larger file transfers, we can provide instructions and permissions for you to upload to our AWS S3 staging bucket.
For files of any scale, you can host your data in your own AWS S3 bucket and we can help you set up permissions so that we can read from and write to your bucket directly.
Broad internal collaborators are encouraged to upload their data to a Broad server and give the Cimini Lab access by filling out a Group Access Request form.

How do I acknowledge help from the Cimini lab?

For help taking only a couple hours, we appreciate acknowledgement of the image analyst, Cimini Lab, and/or Imaging Platform; acknowledgements are critical for our continued funding and existence (as is true in most service facilities and/or core facilities). If you wish to cite the Imaging Platform, please use RRID:SCR_024653. If you have worked with us via the Center for Open Bioimage Analysis (COBA) (which includes our office hours), please acknowledge National Institute of General Medical Sciences NIH P41 GM135019 in the acknowledgements of any resulting publications. Please do tell us about these papers when you have a minute - we are genuinely happy to see you succeed!

If we have provided substantial work for your group, especially if we have provided intellectual contribution in design of the analysis or the overall experiment, regardless of fee structure it is a scientific collaboration - one where, if the analysis is published, co-authorship would be appropriate. In this we follow the ICMJE guidelines, which list "Substantial contributions to the conception or design of the work; or the acquisition, analysis, or interpretation of data for the work;" as a criterion for authorship. Please see also Angeletti et al. 1999: “Intellectual interactions between resource and research scientists are essential to the success of each project. When this success results in publication, a citation in the acknowledgments section of a manuscript may be appropriate for routine analysis. However, contributions from resource scientists that involve novel resource laboratory work and insight, experimental design, or advanced data analysis that make a publication possible or significantly enhance its value require co-authorship as the appropriate acknowledgment.”

Tips for successful collaborations with your friendly local bioimage analyst. Reproduced from Cimini et al, Journal of Cell Science 2024

Cimini Lab