hormone2cell.hormone_strength

hormone2cell.hormone_strength(ave_all: DataFrame, geneset_definition: DataFrame, celltype_column: str = 'Celltype_unique', tissue_col: str | None = 'Tissue', adjustment: bool = True, assay: str | None = 'cell', include_cols: str = ['hormonegene_include1', 'hormonegene_include2', 'hormonegene_include3', 'hormonegene_include4'], exclude_cols: str = ['hormonegene_exclude1', 'hormonegene_exclude2'], thresh_expr_low=0.01, thresh_pct=5, specificity_threshold=0.8, coverage_threshold=0.5, thresh_included_initial=0.1, thresh_excluded_initial=0.15, thresh_included_adjusted=0.5, thresh_excluded_adjusted=0.15, use_precomputed: str | None = None, max_expression_file='gene_max_expression.csv') → DataFrame

End-to-end pipeline to compute hormone strength per cell type and return a long-form, annotated table.

Workflow

  1. Collect all included/excluded genes from geneset_definition (the columns listed in include_cols and exclude_cols).

  2. Compute a wide hormone × cell-type matrix of strengths.

    • If adjustment=True, follow the initial pass with a specificity/coverage-adjusted pass and replace the results for the affected hormones.

    • If adjustment=False, run a single unadjusted pass.

  3. Convert the wide matrix to long format and append hormone annotations (and the assay label if provided).
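Step 1 of the workflow can be illustrated with a minimal sketch. The helper name collect_genes and the toy table below are hypothetical and are not part of hormone2cell; the sketch only assumes that each include/exclude column holds one gene ID per row, with missing entries left empty.

```python
import pandas as pd

# Toy definition table (illustrative values only, not the package's real gene sets).
geneset_definition = pd.DataFrame({
    "hormone_short": ["INS", "GCG"],
    "hormonegene_include1": ["INS", "GCG"],
    "hormonegene_include2": ["PCSK1", "PCSK2"],
    "hormonegene_exclude1": [None, "PCSK1"],
})

def collect_genes(df, cols):
    """Return the sorted unique gene IDs found in the given columns.

    Hypothetical helper: columns absent from the table are skipped,
    and empty or missing cells are ignored.
    """
    present = [c for c in cols if c in df.columns]
    values = df[present].to_numpy().ravel()
    return sorted({g for g in values if isinstance(g, str) and g})

include_cols = ["hormonegene_include1", "hormonegene_include2"]
exclude_cols = ["hormonegene_exclude1", "hormonegene_exclude2"]

included = collect_genes(geneset_definition, include_cols)
excluded = collect_genes(geneset_definition, exclude_cols)
```

In this sketch a column that is listed but absent from the table (here hormonegene_exclude2) is skipped silently; whether hormone_strength itself tolerates missing columns is not stated in this page.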

Parameters

ave_all : pd.DataFrame

Input table used by downstream steps. Must contain at least the column named by celltype_column; if adjustment=True, it must also contain tissue_col.

geneset_definition : pd.DataFrame

Hormone definition/annotation table. Must contain ‘hormone_short’; may optionally include ‘hormone_display’, ‘hormone_figures’, ‘Tier’. Columns listed in include_cols / exclude_cols should hold gene IDs.

celltype_column : str, default “Celltype_unique”

Column name identifying cell types.

tissue_col : Optional[str], default “Tissue”

Column name identifying tissue; only required when adjustment=True.

adjustment : bool, default True

Whether to run the specificity/coverage-adjusted second pass.

assay : Optional[str], default “cell”

If provided, added as a constant ‘assay’ column in the output.

include_cols, exclude_cols : list of str

Lists of column names in geneset_definition that define the include/exclude gene sets.

thresh_expr_low, thresh_pct : float, defaults 0.01 and 5

Initial filtering thresholds used inside the calculation.

specificity_threshold, coverage_threshold : float, defaults 0.8 and 0.5

Thresholds for identifying hormones that need adjustment (τ specificity and coverage across cell types within tissues).

thresh_included_initial, thresh_excluded_initial : float, defaults 0.1 and 0.15

Expression thresholds for the first pass (included/excluded genes).

thresh_included_adjusted, thresh_excluded_adjusted : float, defaults 0.5 and 0.15

Expression thresholds used for the adjusted pass.

use_precomputed : “cell”, “nucleus”, or None, default None

If set, load precomputed maximum log-expression values for hormone-related genes. The value selects the assay (“cell” or “nucleus”) whose precomputed thresholds are loaded.
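The column requirements on ave_all described above can be checked before calling the function. The following is a minimal sketch; check_ave_all is a hypothetical helper, not a function provided by hormone2cell, and the error type it raises is an arbitrary choice.

```python
import pandas as pd

def check_ave_all(ave_all, celltype_column="Celltype_unique",
                  tissue_col="Tissue", adjustment=True):
    """Raise KeyError if ave_all lacks a column the documentation requires.

    Hypothetical pre-flight check: celltype_column is always required,
    tissue_col only when adjustment=True.
    """
    required = [celltype_column]
    if adjustment:
        required.append(tissue_col)
    missing = [c for c in required if c not in ave_all.columns]
    if missing:
        raise KeyError(f"ave_all is missing required column(s): {missing}")

# Toy input with a cell-type column but no tissue column.
toy = pd.DataFrame({"Celltype_unique": ["Beta cell"], "INS": [3.2]})

check_ave_all(toy, adjustment=False)  # passes: tissue column not needed
```

Calling check_ave_all(toy, adjustment=True) on the same table raises KeyError, because the adjusted pass needs the tissue column.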

Returns

pd.DataFrame

Long-form DataFrame with columns like: [‘Hormone’, <celltype_column>, ‘Strength’, ‘hormone_short’, ‘hormone_display’, ‘hormone_figures’, ‘Tier’, ‘assay’ (optional)].

‘Strength’ is derived from the wide matrix’s values.
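The shape of the returned table can be pictured with a small pandas sketch of a wide-to-long conversion. This is an illustration under assumptions, not the package's implementation: the helper wide_to_long, the toy values, and the choice to join annotations on ‘hormone_short’ are all hypothetical.

```python
import pandas as pd

# Hypothetical wide matrix: rows are hormones, columns are cell types.
wide = pd.DataFrame(
    {"Alpha cell": [0.9, 0.0], "Beta cell": [0.1, 0.8]},
    index=pd.Index(["GCG", "INS"], name="Hormone"),
)

# Hypothetical annotation table keyed by 'hormone_short'.
annotations = pd.DataFrame({"hormone_short": ["GCG", "INS"], "Tier": [1, 1]})

def wide_to_long(wide, annotations, celltype_column="Celltype_unique", assay="cell"):
    """Melt a hormone x cell-type matrix and attach annotation columns."""
    long = wide.reset_index().melt(
        id_vars="Hormone", var_name=celltype_column, value_name="Strength"
    )
    # Assumption: hormone names in the matrix match 'hormone_short'.
    long = long.merge(
        annotations, left_on="Hormone", right_on="hormone_short", how="left"
    )
    if assay is not None:
        long["assay"] = assay  # constant label, added only when provided
    return long

long_df = wide_to_long(wide, annotations)
```

Each row of long_df holds one hormone and cell-type pair, so a 2 × 2 matrix yields four rows; passing assay=None omits the ‘assay’ column.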