Factor Support for flowWorkspace - Extended Class Approach #406
…levels in pData by delegating to flowSet's phenoData slot
Usage
Basic Usage
library(flowWorkspace)
# Load data and convert to cytoset_factors
cs <- load_cytoset_from_fcs(...)
cs_f <- cytoset_factors(cs)
# Create pData with custom factor levels
pd <- pData(cs_f)
pd$Patient <- factor(pd$Patient, levels = c("C", "B", "A"))
pd$Visit <- factor(pd$Visit, levels = c("V3", "V2", "V1"))
# Assign - factors are now preserved in the factor_data slot!
# Strings are also synced to C++ for compatibility.
pData(cs_f) <- pd
# Retrieve - factors restored with correct levels from factor_data
pd2 <- pData(cs_f)
stopifnot(is.factor(pd2$Patient))
stopifnot(identical(levels(pd2$Patient), c("C", "B", "A")))
With GatingSet
gs <- load_gs(...)
gs_f <- GatingSet_factors(gs)
pd <- pData(gs_f)
pd$Batch <- factor(pd$Batch, levels = c("Batch3", "Batch2", "Batch1"))
pData(gs_f) <- pd
# Factors preserved in GatingSet
pd2 <- pData(gs_f)
stopifnot(is.factor(pd2$Batch))
With ggcyto (visualization)
library(ggcyto)
# ggcyto automatically respects factor ordering from pData
cs_f <- cytoset_factors(cs)
pd <- pData(cs_f)
pd$Patient <- factor(pd$Patient, levels = c("Control", "Treatment"))
pData(cs_f) <- pd
# Plots will respect factor order
ggcyto(cs_f, aes(x = "CD4")) +
  geom_histogram() +
  facet_wrap(~Patient) # Facets ordered: Control, then Treatment
Thanks for taking a look @mikejiang. I've been thinking about this and I have a new proposal that avoids the need for extended classes.
For flowSet objects we store the metadata in the […]. For cytosets, we instead store the metadata on the C++ side as character strings, which is accessed using […]. So we could simply store the metadata on the R side and return that to the user when […].
So all we would need to do is make the pData accessor for cytosets return the R-side metadata, and update the replacement methods to update the R-side metadata first and then the C++-side metadata.
We don't want to change the current behavior of cs/gs, which are meant to be pure thin wrappers around the C++ structures.
Overview
This implementation adds factor-level preservation to flowWorkspace through new extended classes, keeping changes isolated and non-intrusive to existing code.
Problem
The C++ backend of cytoset stores pData as strings (map<string, string>), which causes factor columns to lose their level information when assigned.
Solution: Extended Classes
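Before the class details, the level loss described under Problem can be reproduced in base R alone. This is a minimal sketch that uses `as.character()` as a stand-in for a string-only store; it does not call flowWorkspace or its C++ backend, and the variable names are illustrative only.

```r
# Base R only; `stored` is a hypothetical stand-in for a string-only store.
patient <- factor(c("A", "B", "C"), levels = c("C", "B", "A"))

# Only the labels survive the trip through strings.
stored <- as.character(patient)

# Rebuilding a factor from plain strings falls back to sorted levels,
# so the custom ordering C, B, A is gone.
restored <- factor(stored)

stopifnot(identical(levels(patient), c("C", "B", "A")))
stopifnot(identical(levels(restored), c("A", "B", "C")))
```

The values are unchanged, but anything that depends on level order (facet order, plot legends, model contrasts) would see a different ordering after the round trip.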
New Classes
- cytoset_factors - Extends cytoset with factor preservation
- GatingSet_factors - Extends GatingSet with factor preservation
Design
- […] (cytoset_factors.R, GatingSet_factors.R)
- […] cytoset and GatingSet classes unchanged
- […] *_factors classes
- […] factor_data slot (data.frame) to store pData with factors
- […] factor_data and syncs string representations to C++ backend
Architecture
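The design points above (a `factor_data` slot on an extended class, with string copies synced to the backend) can be sketched with toy S4 classes. This is only an illustration under stated assumptions: `toy_backend`, `toy_factors`, `set_pd()` and `get_pd()` are hypothetical names invented here, and the real classes in this PR talk to the C++ store rather than to an R data.frame.

```r
library(methods)

# Hypothetical stand-in for the string-only store (not flowWorkspace code).
setClass("toy_backend", slots = c(strings = "data.frame"))

# Hypothetical extended class: inherits the backend and adds factor_data.
setClass("toy_factors", contains = "toy_backend",
         slots = c(factor_data = "data.frame"))

# Assignment: keep the full data.frame on the R side, then sync a
# character-only copy to the backend slot.
set_pd <- function(x, value) {
  x@factor_data <- value
  as_strings <- value
  as_strings[] <- lapply(value, as.character)
  x@strings <- as_strings
  x
}

# Retrieval: return the R-side copy, so factor levels come back intact.
get_pd <- function(x) x@factor_data

pd <- data.frame(Patient = factor(c("A", "B", "C"), levels = c("C", "B", "A")))
obj <- set_pd(new("toy_factors"), pd)

stopifnot(identical(levels(get_pd(obj)$Patient), c("C", "B", "A")))
stopifnot(is.character(obj@strings$Patient))
```

Because the base class is untouched and the extra state lives only in the subclass, code that works with the parent class keeps seeing plain strings, which is the opt-in behavior the design list describes.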