hormone2cell.hormone_strength

hormone2cell.hormone_strength(ave_all: DataFrame, geneset_definition: DataFrame, celltype_column: str = 'Celltype_unique', tissue_col: str | None = 'Tissue', adjustment: bool = True, assay: str | None = 'cell', include_cols: str = ['hormonegene_include1', 'hormonegene_include2', 'hormonegene_include3', 'hormonegene_include4'], exclude_cols: str = ['hormonegene_exclude1', 'hormonegene_exclude2'], thresh_expr_low=0.01, thresh_pct=5, specificity_threshold=0.8, coverage_threshold=0.5, thresh_included_initial=0.1, thresh_excluded_initial=0.15, thresh_included_adjusted=0.5, thresh_excluded_adjusted=0.15, use_precomputed: str | None = None, max_expression_file='gene_max_expression.csv') → DataFrame

End-to-end pipeline to compute hormone strength per cell type and return a long-form, annotated table.

Workflow

Collect all included/excluded genes from geneset_definition.
Compute a wide hormone × cell-type matrix of strengths.
- If adjustment=True, run a specificity/coverage-adjusted pass and replace the affected hormones.
- If adjustment=False, run a single unadjusted pass.
Convert the wide matrix to long format and append hormone annotations (and the assay label if provided).
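
The first workflow step, collecting the included/excluded genes, might look roughly like the sketch below. This is not the library's internal code: collect_genes is a hypothetical helper, the gene names are toy placeholders, and it assumes each cell of the include/exclude columns holds a single gene ID or is empty.

```python
import pandas as pd


def collect_genes(geneset_definition, include_cols, exclude_cols):
    """Return the sorted unique gene IDs found in the include/exclude columns.

    Hypothetical helper: assumes one gene ID per cell, with missing
    entries stored as NaN/None or empty strings.
    """
    # Only look at columns that are actually present in the table.
    cols = [c for c in list(include_cols) + list(exclude_cols)
            if c in geneset_definition.columns]
    values = geneset_definition[cols].to_numpy().ravel()
    genes = {str(v).strip() for v in values if pd.notna(v)}
    genes.discard("")  # drop empty placeholders
    return sorted(genes)


# Toy definition table with placeholder hormone and gene names.
geneset_definition = pd.DataFrame({
    "hormone_short": ["H1", "H2"],
    "hormonegene_include1": ["geneA", "geneB"],
    "hormonegene_include2": ["geneC", None],
    "hormonegene_exclude1": [None, "geneC"],
})

genes = collect_genes(
    geneset_definition,
    include_cols=["hormonegene_include1", "hormonegene_include2"],
    exclude_cols=["hormonegene_exclude1"],
)
print(genes)  # ['geneA', 'geneB', 'geneC']
```

A gene that appears in both an include and an exclude column (geneC above) is listed once, since this step only gathers the set of genes involved.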

Parameters

ave_all (pd.DataFrame): Input table used by downstream steps. Must contain at least the column named by celltype_column; if adjustment=True, it must also contain tissue_col.
geneset_definition (pd.DataFrame): Hormone definition/annotation table. Must contain ‘hormone_short’; may optionally include ‘hormone_display’, ‘hormone_figures’, ‘Tier’. Columns listed in include_cols / exclude_cols should hold gene IDs.
celltype_column (str, default “Celltype_unique”): Column name identifying cell types.
tissue_col (Optional[str], default “Tissue”): Column name identifying tissue; only required when adjustment=True.
adjustment (bool, default True): Whether to run the specificity/coverage-adjusted second pass.
assay (Optional[str], default “cell”): If provided, added as a constant ‘assay’ column in the output.
include_cols, exclude_cols (list of str): Column lists in geneset_definition that define include/exclude gene sets.
thresh_expr_low, thresh_pct (float, defaults 0.01 and 5): Initial filtering thresholds used inside the calculation.
specificity_threshold, coverage_threshold (float, defaults 0.8 and 0.5): Thresholds for identifying hormones that need adjustment (τ specificity and coverage across cell types within tissues).
thresh_included_initial, thresh_excluded_initial (float, defaults 0.1 and 0.15): Expression thresholds for the first pass (included/excluded genes).
thresh_included_adjusted, thresh_excluded_adjusted (float, defaults 0.5 and 0.15): Expression thresholds used for the adjusted pass.
use_precomputed (str or None, default None): Either ‘cell’ or ‘nucleus’, or None. If set, precomputed maximum log-expression values for hormone-related genes are loaded; the value selects which assay's precomputed thresholds to load.
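
The column requirements stated above can be checked before calling the function. The following is a minimal sketch, not part of hormone2cell: check_inputs is a hypothetical helper that only encodes the requirements listed in this section (celltype_column in ave_all, tissue_col when adjustment=True, and ‘hormone_short’ in geneset_definition).

```python
import pandas as pd


def check_inputs(ave_all, geneset_definition,
                 celltype_column="Celltype_unique",
                 tissue_col="Tissue", adjustment=True):
    """Return a list of human-readable problems (empty if none found).

    Hypothetical pre-flight check based only on the documented
    column requirements; the library may validate more or less.
    """
    problems = []
    if celltype_column not in ave_all.columns:
        problems.append(f"ave_all is missing column {celltype_column!r}")
    # The tissue column is only required for the adjusted pass.
    if adjustment and (tissue_col is None or tissue_col not in ave_all.columns):
        problems.append(f"adjustment=True requires column {tissue_col!r} in ave_all")
    if "hormone_short" not in geneset_definition.columns:
        problems.append("geneset_definition is missing column 'hormone_short'")
    return problems


# Toy inputs: ave_all lacks the tissue column.
ave_all = pd.DataFrame({"Celltype_unique": ["alpha", "beta"]})
geneset_definition = pd.DataFrame({"hormone_short": ["H1"]})

print(check_inputs(ave_all, geneset_definition, adjustment=True))
print(check_inputs(ave_all, geneset_definition, adjustment=False))  # []
```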

Returns

pd.DataFrame

Long-form DataFrame with columns like: [‘Hormone’, <celltype_column>, ‘Strength’, ‘hormone_short’, ‘hormone_display’, ‘hormone_figures’, ‘Tier’, ‘assay’ (optional)].

‘Strength’ is derived from the wide matrix’s values.
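
To illustrate the output shape, the sketch below converts a toy wide hormone × cell-type matrix to long format with pandas and appends annotations. It is an illustration under stated assumptions rather than the function's actual implementation: the data are placeholders, and joining ‘Hormone’ to ‘hormone_short’ is an assumption about how the annotations are matched.

```python
import pandas as pd

# Toy wide matrix: rows are hormones, columns are cell types.
wide = pd.DataFrame(
    [[0.9, 0.0], [0.1, 0.7]],
    index=pd.Index(["H1", "H2"], name="Hormone"),
    columns=["alpha", "beta"],
)

# Wide -> long: one row per (hormone, cell type) pair.
long_df = wide.reset_index().melt(
    id_vars="Hormone",
    var_name="Celltype_unique",
    value_name="Strength",
)

# Toy annotation table (placeholder values).
annotations = pd.DataFrame({
    "hormone_short": ["H1", "H2"],
    "hormone_display": ["Hormone 1", "Hormone 2"],
    "Tier": [1, 2],
})

# Assumption: 'Hormone' values correspond to 'hormone_short'.
long_df = long_df.merge(
    annotations, left_on="Hormone", right_on="hormone_short", how="left"
)
long_df["assay"] = "cell"  # constant label, added only if an assay is given

print(long_df)
```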