Feature engineering

Prepare input coordinates

Format input coordinates and compute intermediary results to prepare feature computation:

bigfish.classification.prepare_extracted_data(cell_mask, nuc_mask=None, ndim=None, rna_coord=None, centrosome_coord=None)

Prepare data extracted from images.

Parameters:
cell_mask: np.ndarray, np.uint, np.int or bool

Surface of the cell with shape (y, x).

nuc_mask: np.ndarray, np.uint, np.int or bool

Surface of the nucleus with shape (y, x).

ndim: int

Number of spatial dimensions to consider (2 or 3). Mandatory if rna_coord is provided.

rna_coord: np.ndarray, np.int

Coordinates of the detected spots with shape (nb_spots, 4) or (nb_spots, 3). One coordinate per dimension (zyx or yx dimensions) plus the index of the cluster assigned to the spot. If no cluster was assigned, the value is -1.

centrosome_coord: np.ndarray, np.int

Coordinates of the detected centrosomes with shape (nb_elements, 3) or (nb_elements, 2). One coordinate per dimension (zyx or yx dimensions).

Returns:
cell_mask: np.ndarray, bool

Surface of the cell with shape (y, x).

distance_cell: np.ndarray, np.float32

Distance map from the cell with shape (y, x), in pixels.

distance_cell_normalized: np.ndarray, np.float32

Normalized distance map from the cell with shape (y, x).

centroid_cell: np.ndarray, np.int

Coordinates of the cell centroid with shape (2,).

distance_centroid_cell: np.ndarray, np.float32

Distance map from the cell centroid with shape (y, x), in pixels.

nuc_mask: np.ndarray, bool

Surface of the nucleus with shape (y, x).

cell_mask_out_nuc: np.ndarray, bool

Surface of the cell (outside the nucleus) with shape (y, x).

distance_nuc: np.ndarray, np.float32

Distance map from the nucleus with shape (y, x), in pixels.

distance_nuc_normalized: np.ndarray, np.float32

Normalized distance map from the nucleus with shape (y, x).

centroid_nuc: np.ndarray, np.int

Coordinates of the nucleus centroid with shape (2,).

distance_centroid_nuc: np.ndarray, np.float32

Distance map from the nucleus centroid with shape (y, x), in pixels.

rna_coord_out_nuc: np.ndarray, np.int

Coordinates of the detected spots with shape (nb_spots, 4) or (nb_spots, 3). One coordinate per dimension (zyx or yx dimensions) plus the index of the cluster assigned to the spot. If no cluster was assigned, the value is -1. Spots detected inside the nucleus are removed.

centroid_rna: np.ndarray, np.int

Coordinates of the RNA centroid with shape (2,) or (3,).

distance_centroid_rna: np.ndarray, np.float32

Distance map from the RNA centroid with shape (y, x), in pixels.

centroid_rna_out_nuc: np.ndarray, np.int

Coordinates of the RNA centroid (outside the nucleus) with shape (2,) or (3,).

distance_centroid_rna_out_nuc: np.ndarray, np.float32

Distance map from the RNA centroid (outside the nucleus) with shape (y, x), in pixels.

distance_centrosome: np.ndarray, np.float32

Distance map from the centrosome with shape (y, x), in pixels.
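To make a few of these intermediary outputs concrete, here is a minimal numpy sketch that derives four of them (cell_mask_out_nuc, centroid_nuc, distance_nuc and rna_coord_out_nuc) from toy masks. It is not the bigfish implementation: prepare_sketch and distance_to_mask are hypothetical helpers, and the brute-force distance map (distance to the nearest nucleus pixel, 0 inside the nucleus) only suits small arrays.

```python
import numpy as np


def distance_to_mask(mask):
    """Brute-force Euclidean distance (in pixels) from every pixel to the
    nearest True pixel of `mask`. Requires a non-empty mask; toy sizes only."""
    target = np.argwhere(mask)                      # (n_target, 2) yx
    grid = np.indices(mask.shape).reshape(2, -1).T  # (n_pixels, 2) yx
    diff = grid[:, None, :] - target[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)
    return dist.reshape(mask.shape).astype(np.float32)


def prepare_sketch(cell_mask, nuc_mask, rna_coord, ndim):
    """Hypothetical helper mirroring four outputs of prepare_extracted_data."""
    cell_mask = cell_mask.astype(bool)
    nuc_mask = nuc_mask.astype(bool)
    # cell surface outside the nucleus
    cell_mask_out_nuc = cell_mask & ~nuc_mask
    # integer (y, x) centroid of the nucleus
    centroid_nuc = np.argwhere(nuc_mask).mean(axis=0).round().astype(np.int64)
    # distance from the nucleus, in pixels (0 inside the nucleus)
    distance_nuc = distance_to_mask(nuc_mask)
    # y and x are the last two spatial columns, before the cluster index
    y = rna_coord[:, ndim - 2]
    x = rna_coord[:, ndim - 1]
    # keep only the spots whose (y, x) position is outside the nucleus
    rna_coord_out_nuc = rna_coord[~nuc_mask[y, x]]
    return cell_mask_out_nuc, centroid_nuc, distance_nuc, rna_coord_out_nuc
```

With 3D spots (z, y, x, cluster) and ndim=3, the same column arithmetic picks columns 1 and 2, so the masks are always indexed in the yx plane.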


Compute features

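Functions to compute features about cell morphology and RNA localization. The two main functions are compute_features, which computes spatial and morphological features, and get_features_name, which returns the matching feature names.

Groups of features can also be computed separately with features_distance, features_in_out_nucleus, features_protrusion, features_dispersion, features_topography, features_foci, features_area and features_centrosome.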

See an example of application here.

bigfish.classification.compute_features(cell_mask, nuc_mask, ndim, rna_coord, smfish=None, voxel_size_yx=None, foci_coord=None, centrosome_coord=None, compute_distance=False, compute_intranuclear=False, compute_protrusion=False, compute_dispersion=False, compute_topography=False, compute_foci=False, compute_area=False, compute_centrosome=False, return_names=False)

Compute requested features.

Parameters:
cell_mask: np.ndarray, np.uint, np.int or bool

Surface of the cell with shape (y, x).

nuc_mask: np.ndarray, np.uint, np.int or bool

Surface of the nucleus with shape (y, x).

ndim: int

Number of spatial dimensions to consider (2 or 3).

rna_coord: np.ndarray, np.int

Coordinates of the detected spots with shape (nb_spots, 4) or (nb_spots, 3). One coordinate per dimension (zyx or yx dimensions) plus the index of the cluster assigned to the spot. If no cluster was assigned, the value is -1. If the cluster index is not provided, foci-related features are not computed.

smfish: np.ndarray, np.uint

Image of RNAs, with shape (y, x).

voxel_size_yx: int, float or None

Size of a voxel on the yx plane, in nanometers.

foci_coord: np.ndarray, np.int

Array with shape (nb_foci, 5) or (nb_foci, 4). One coordinate per dimension for the focus centroid (zyx or yx coordinates), the number of spots detected in the focus and its index.

centrosome_coord: np.ndarray, np.int

Coordinates of the detected centrosomes with shape (nb_elements, 3) or (nb_elements, 2). One coordinate per dimension (zyx or yx dimensions). These coordinates are mandatory to compute centrosome-related features.

compute_distance: bool

Compute distance-related features.

compute_intranuclear: bool

Compute nucleus-related features.

compute_protrusion: bool

Compute protrusion-related features.

compute_dispersion: bool

Compute dispersion indices.

compute_topography: bool

Compute topographic features.

compute_foci: bool

Compute foci-related features.

compute_area: bool

Compute area-related features.

compute_centrosome: bool

Compute centrosome-related features.

return_names: bool

Return feature names.

Returns:
features: np.ndarray, np.float32

Array of features.
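The boolean compute_* flags select groups of features that end up in a single float32 array, with names that line up with the values when return_names is set. The sketch below only illustrates that assembly pattern under stated assumptions: assemble_features is a hypothetical helper, the groups are passed in precomputed, and the feature names used in the example are illustrative rather than the exact strings returned by get_features_name.

```python
import numpy as np


def assemble_features(groups, flags, return_names=False):
    """Concatenate the enabled feature groups into one float32 array.

    groups: dict mapping a group name to a (names, values) pair; its
        insertion order fixes the output order, whatever the flag order.
    flags: dict mapping a group name to True/False (missing means False).
    """
    names, values = [], []
    for group, (group_names, group_values) in groups.items():
        if flags.get(group, False):
            names.extend(group_names)
            values.extend(group_values)
    features = np.array(values, dtype=np.float32)
    if return_names:
        return features, names
    return features


groups = {
    "area": (["nuc_relative_area", "cell_area"], [0.25, 400.0]),
    "foci": (["proportion_rna_in_foci"], [0.1]),
}
features, names = assemble_features(groups, {"foci": True, "area": True},
                                    return_names=True)
```

Iterating over a fixed group order, rather than over the flags, is what keeps names and values aligned across calls with different flag combinations.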

bigfish.classification.get_features_name(names_features_distance=False, names_features_intranuclear=False, names_features_protrusion=False, names_features_dispersion=False, names_features_topography=False, names_features_foci=False, names_features_area=False, names_features_centrosome=False)

Return the current list of feature names.

Parameters:
names_features_distance: bool

Return names of features related to distances from the nucleus or the cell membrane.

names_features_intranuclear: bool

Return names of features related to the nucleus.

names_features_protrusion: bool

Return names of features related to protrusions.

names_features_dispersion: bool

Return names of features used to quantify mRNA dispersion within the cell.

names_features_topography: bool

Return names of topographic features of the cell.

names_features_foci: bool

Return names of features related to foci.

names_features_area: bool

Return names of features related to the area of the cell.

names_features_centrosome: bool

Return names of features related to centrosomes.

Returns:
features_name: List[str]

A list of feature names.

bigfish.classification.features_distance(rna_coord, distance_cell, distance_nuc, cell_mask, ndim, check_input=True)

Compute distance-related features.

Parameters:
rna_coord: np.ndarray, np.int

Coordinates of the detected RNAs with zyx or yx coordinates in the first 3 or 2 columns.

distance_cell: np.ndarray, np.float32

Distance map from the cell with shape (y, x).

distance_nuc: np.ndarray, np.float32

Distance map from the nucleus with shape (y, x).

cell_mask: np.ndarray, bool

Surface of the cell with shape (y, x).

ndim: int

Number of spatial dimensions to consider.

check_input: bool

Check input validity.

Returns:
index_mean_dist_cell: float

Normalized mean distance of RNAs to the cell membrane.

index_median_dist_cell: float

Normalized median distance of RNAs to the cell membrane.

index_mean_dist_nuc: float

Normalized mean distance of RNAs to the nucleus.

index_median_dist_nuc: float

Normalized median distance of RNAs to the nucleus.
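The documentation does not spell out how the distances are normalized. One natural choice, assumed here and not confirmed as the bigfish formula, is to divide the mean (or median) distance of the RNAs by the same statistic over every pixel of the cell, i.e. the value expected if RNAs were spread uniformly. distance_indices is a hypothetical helper.

```python
import numpy as np


def distance_indices(rna_coord, distance_map, cell_mask, ndim):
    """Mean and median RNA distance, each divided by the same statistic over
    all cell pixels (assumed reference: a uniform spread of RNAs)."""
    # read the distance map at each spot's (y, x) position
    y = rna_coord[:, ndim - 2]
    x = rna_coord[:, ndim - 1]
    rna_dist = distance_map[y, x]
    cell_dist = distance_map[cell_mask]
    ref_mean, ref_median = cell_dist.mean(), np.median(cell_dist)
    index_mean = rna_dist.mean() / ref_mean if ref_mean > 0 else np.nan
    index_median = np.median(rna_dist) / ref_median if ref_median > 0 else np.nan
    return float(index_mean), float(index_median)


cell = np.ones((3, 3), dtype=bool)
dmap = np.array([[1, 1, 1], [1, 2, 1], [1, 1, 1]], dtype=np.float32)
rna = np.array([[1, 1, -1], [1, 1, -1]])  # both spots on the central pixel
index_mean, index_median = distance_indices(rna, dmap, cell, ndim=2)
```

Under this reading, an index above 1 means RNAs sit farther from the reference structure than a random distribution would, and an index below 1 means they sit closer.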

bigfish.classification.features_in_out_nucleus(rna_coord, rna_coord_out_nuc, check_input=True)

Compute nucleus-related features.

Parameters:
rna_coord: np.ndarray, np.int

Coordinates of the detected RNAs with zyx or yx coordinates in the first 3 or 2 columns.

rna_coord_out_nuc: np.ndarray, np.int

Coordinates of the detected RNAs with zyx or yx coordinates in the first 3 or 2 columns. Spots detected inside the nucleus are removed.

check_input: bool

Check input validity.

Returns:
proportion_rna_in_nuc: float

Proportion of RNAs detected inside the nucleus.

nb_rna_out_nuc: float

Number of RNAs detected outside the nucleus.

nb_rna_in_nuc: float

Number of RNAs detected inside the nucleus.
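Since rna_coord_out_nuc is rna_coord with the intranuclear spots removed, the three outputs follow from two row counts. A minimal sketch of that arithmetic, with in_out_nucleus as a hypothetical name:

```python
import numpy as np


def in_out_nucleus(rna_coord, rna_coord_out_nuc):
    """Return (proportion_rna_in_nuc, nb_rna_out_nuc, nb_rna_in_nuc)."""
    nb_rna = len(rna_coord)
    nb_rna_out_nuc = len(rna_coord_out_nuc)
    # spots missing from rna_coord_out_nuc are the intranuclear ones
    nb_rna_in_nuc = nb_rna - nb_rna_out_nuc
    proportion = nb_rna_in_nuc / nb_rna if nb_rna > 0 else np.nan
    return float(proportion), float(nb_rna_out_nuc), float(nb_rna_in_nuc)


rna = np.array([[3, 3, -1], [1, 1, -1], [5, 4, 0], [0, 2, -1]])
result = in_out_nucleus(rna, rna[1:])  # first spot assumed intranuclear
```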

bigfish.classification.features_protrusion(rna_coord, cell_mask, nuc_mask, ndim, voxel_size_yx, check_input=True)

Compute protrusion-related features.

Parameters:
rna_coord: np.ndarray, np.int

Coordinates of the detected RNAs with zyx or yx coordinates in the first 3 or 2 columns.

cell_mask: np.ndarray, bool

Surface of the cell with shape (y, x).

nuc_mask: np.ndarray, bool

Surface of the nucleus with shape (y, x).

ndim: int

Number of spatial dimensions to consider.

voxel_size_yx: int or float

Size of a voxel on the yx plane, in nanometers.

check_input: bool

Check input validity.

Returns:
index_rna_protrusion: float

Number of RNAs detected in a protrusion, normalized by the expected number of RNAs under a random distribution.

proportion_rna_protrusion: float

Proportion of RNAs detected in a protrusion.

protrusion_area: float

Protrusion area (in pixels).
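How protrusions are segmented from cell_mask, nuc_mask and voxel_size_yx is not described here, so the sketch below takes a protrusion mask as given (protrusion_mask is a hypothetical input) and shows only the normalization: the expected count under a uniform spread is the number of RNAs times the protrusion's share of the cell area.

```python
import numpy as np


def protrusion_indices(rna_coord, cell_mask, protrusion_mask, ndim):
    """protrusion_mask (hypothetical input): boolean (y, x) mask of the
    protrusion pixels, assumed to lie inside cell_mask."""
    y = rna_coord[:, ndim - 2]
    x = rna_coord[:, ndim - 1]
    nb_rna = len(rna_coord)
    nb_in = int(protrusion_mask[y, x].sum())
    protrusion_area = float(protrusion_mask.sum())  # in pixels
    cell_area = float(cell_mask.sum())
    proportion = nb_in / nb_rna if nb_rna > 0 else np.nan
    # expected count if RNAs were spread uniformly over the cell surface
    expected = nb_rna * protrusion_area / cell_area
    index = nb_in / expected if expected > 0 else np.nan
    return index, proportion, protrusion_area


cell = np.ones((4, 4), dtype=bool)
protrusion = np.zeros((4, 4), dtype=bool)
protrusion[:, 3] = True  # last column plays the role of a protrusion
rna = np.array([[0, 3, -1], [2, 3, -1], [1, 0, -1], [3, 1, -1]])
index, proportion, area = protrusion_indices(rna, cell, protrusion, ndim=2)
```

Here half of the RNAs fall in a region covering a quarter of the cell, so the index is 2: twice as many RNAs as a random distribution would place there.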

bigfish.classification.features_dispersion(smfish, rna_coord, centroid_rna, cell_mask, centroid_cell, centroid_nuc, ndim, check_input=True)

Compute the RNA Distribution Index (RDI) features described in:

RDI Calculator: An analysis tool to assess RNA distributions in cells, Stueland M., Wang T., Park H. Y., Mili S., 2019.

Parameters:
smfish: np.ndarray, np.uint

Image of RNAs, with shape (y, x).

rna_coord: np.ndarray, np.int

Coordinates of the detected RNAs with zyx or yx coordinates in the first 3 or 2 columns.

centroid_rna: np.ndarray, np.int

Coordinates of the RNA centroid with shape (2,) or (3,).

cell_mask: np.ndarray, bool

Surface of the cell with shape (y, x).

centroid_cell: np.ndarray, np.int

Coordinates of the cell centroid with shape (2,).

centroid_nuc: np.ndarray, np.int

Coordinates of the nucleus centroid with shape (2,).

ndim: int

Number of spatial dimensions to consider.

check_input: bool

Check input validity.

Returns:
index_polarization: float

Polarization index (PI).

index_dispersion: float

Dispersion index (DI).

index_peripheral_distribution: float

Peripheral distribution index (PDI).
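The three indices compare the spread of the RNAs with that of a uniform reference over the cell. The sketch follows our reading of the cited paper, simplified to unweighted yx spot coordinates: PI is the distance between the RNA and cell centroids divided by the cell's radius of gyration, DI is the RNAs' second moment about their centroid divided by the cell's second moment about its centroid, and PDI uses the nucleus centroid as the reference point for both moments. The bigfish function also takes the smfish image, which this sketch ignores; rdi_sketch is a hypothetical name and not the library's code.

```python
import numpy as np


def second_moment(points, center):
    """Mean squared Euclidean distance of `points` to `center`."""
    points = np.asarray(points, dtype=np.float64)
    return float(((points - np.asarray(center)) ** 2).sum(axis=1).mean())


def rdi_sketch(rna_yx, cell_mask, centroid_rna, centroid_cell, centroid_nuc):
    """Unweighted PI, DI and PDI from yx spot coordinates (assumed reading)."""
    cell_yx = np.argwhere(cell_mask)  # every cell pixel = uniform reference
    # radius of gyration of the cell surface
    rg_cell = np.sqrt(second_moment(cell_yx, centroid_cell))
    offset = np.subtract(centroid_rna, centroid_cell)
    index_polarization = float(np.linalg.norm(offset) / rg_cell)
    index_dispersion = (second_moment(rna_yx, centroid_rna)
                        / second_moment(cell_yx, centroid_cell))
    index_peripheral = (second_moment(rna_yx, centroid_nuc)
                        / second_moment(cell_yx, centroid_nuc))
    return index_polarization, index_dispersion, index_peripheral


cell = np.ones((5, 5), dtype=bool)
uniform_rna = np.argwhere(cell)  # one RNA on every cell pixel
uniform = rdi_sketch(uniform_rna, cell, (2, 2), (2, 2), (2, 2))
clustered = rdi_sketch([[0, 2], [0, 2]], cell, (0, 2), (2, 2), (2, 2))
```

A perfectly uniform pattern gives PI = 0, DI = 1 and PDI = 1; RNAs piled at the edge of the cell raise PI and drop DI towards 0.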

bigfish.classification.features_topography(rna_coord, cell_mask, nuc_mask, cell_mask_out_nuc, ndim, voxel_size_yx, check_input=True)

Compute topographic features.

Parameters:
rna_coord: np.ndarray, np.int

Coordinates of the detected RNAs with zyx or yx coordinates in the first 3 or 2 columns.

cell_mask: np.ndarray, bool

Surface of the cell with shape (y, x).

nuc_mask: np.ndarray, bool

Surface of the nucleus with shape (y, x).

cell_mask_out_nuc: np.ndarray, bool

Surface of the cell (outside the nucleus) with shape (y, x).

ndim: int

Number of spatial dimensions to consider.

voxel_size_yx: int or float

Size of a voxel on the yx plane, in nanometers.

check_input: bool

Check input validity.

Returns:
index_rna_nuc_marge: float

Number of RNAs detected in a specific region around the nucleus, normalized by the expected number of RNAs under a random distribution. Six regions are targeted (less than 500nm, 500-1000nm, 1000-1500nm, 1500-2000nm, 2000-2500nm and 2500-3000nm from the nucleus boundary).

proportion_rna_nuc_marge: float

Proportion of RNAs detected in a specific region around the nucleus. Six regions are targeted (less than 500nm, 500-1000nm, 1000-1500nm, 1500-2000nm, 2000-2500nm and 2500-3000nm from the nucleus boundary).

index_rna_cell_marge: float

Number of RNAs detected in a specific region around the cell membrane, normalized by the expected number of RNAs under a random distribution. Six regions are targeted (0-500nm, 500-1000nm, 1000-1500nm, 1500-2000nm, 2000-2500nm and 2500-3000nm from the cell membrane).

proportion_rna_cell_marge: float

Proportion of RNAs detected in a specific region around the cell membrane. Six regions are targeted (0-500nm, 500-1000nm, 1000-1500nm, 1500-2000nm, 2000-2500nm and 2500-3000nm from the cell membrane).
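The six 500 nm bands can be sketched by converting a pixel distance map to nanometers with voxel_size_yx, then comparing each band's share of RNAs with its share of the available area. In this sketch band_indices is a hypothetical helper, the half-open (low, high] band boundaries are our choice, and region_mask stands for the pixels an RNA may occupy (for example cell_mask_out_nuc for the nucleus bands).

```python
import numpy as np


def band_indices(rna_coord, distance_map, region_mask, ndim, voxel_size_yx,
                 band_width_nm=500, nb_bands=6):
    """Per-band RNA proportion, and that proportion divided by the band's
    share of region_mask (observed / expected under a uniform spread)."""
    y = rna_coord[:, ndim - 2]
    x = rna_coord[:, ndim - 1]
    # pixel distances converted to nanometers
    rna_dist_nm = distance_map[y, x] * voxel_size_yx
    pix_dist_nm = distance_map[region_mask] * voxel_size_yx
    nb_rna, nb_pix = len(rna_coord), len(pix_dist_nm)
    indices, proportions = [], []
    for i in range(nb_bands):
        low, high = i * band_width_nm, (i + 1) * band_width_nm
        rna_in = (rna_dist_nm > low) & (rna_dist_nm <= high)
        pix_in = (pix_dist_nm > low) & (pix_dist_nm <= high)
        proportion = rna_in.sum() / nb_rna
        area_fraction = pix_in.sum() / nb_pix
        proportions.append(float(proportion))
        indices.append(float(proportion / area_fraction)
                       if area_fraction > 0 else np.nan)
    return indices, proportions


dmap = np.array([[1, 2, 3, 4]], dtype=np.float32)  # distances in pixels
region = np.ones((1, 4), dtype=bool)
rna = np.array([[0, 0, -1], [0, 1, -1]])  # at 250 nm and 500 nm
idx, prop = band_indices(rna, dmap, region, ndim=2, voxel_size_yx=250)
```

Bands that contain no pixel of the region have no expected count, so the sketch reports their index as NaN rather than dividing by zero.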

bigfish.classification.features_foci(rna_coord, foci_coord, ndim, check_input=True)

Compute foci-related features.

Parameters:
rna_coord: np.ndarray, np.int

Coordinates of the detected RNAs with zyx or yx coordinates in the first 3 or 2 columns.

foci_coord: np.ndarray, np.int

Array with shape (nb_foci, 5) or (nb_foci, 4). One coordinate per dimension for the focus centroid (zyx or yx coordinates), the number of spots detected in the focus and its index.

ndim: int

Number of spatial dimensions to consider.

check_input: bool

Check input validity.

Returns:
proportion_rna_in_foci: float

Proportion of RNAs detected in foci.
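Given the layout of foci_coord described above (ndim coordinate columns, then the number of spots, then the focus index), column ndim holds the spot count of each focus. A minimal sketch of the proportion under that layout, with proportion_in_foci as a hypothetical name:

```python
import numpy as np


def proportion_in_foci(rna_coord, foci_coord, ndim):
    """Share of the detected RNAs that belong to a focus."""
    # column `ndim` of foci_coord: number of spots in each focus
    nb_rna_in_foci = foci_coord[:, ndim].sum()
    nb_rna = len(rna_coord)
    return float(nb_rna_in_foci / nb_rna) if nb_rna > 0 else np.nan


foci = np.array([[10, 12, 3, 0],   # y, x, nb_spots, focus index
                 [20, 5, 2, 1]])
rna = np.zeros((10, 3), dtype=np.int64)  # 10 detected spots (toy values)
proportion = proportion_in_foci(rna, foci, ndim=2)
```

When rna_coord carries cluster indices, the share of rows whose last column differs from -1 should give the same value, which makes a cheap consistency check between the two arrays.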

bigfish.classification.features_area(cell_mask, nuc_mask, cell_mask_out_nuc, check_input=True)

Compute area-related features.

Parameters:
cell_mask: np.ndarray, bool

Surface of the cell with shape (y, x).

nuc_mask: np.ndarray, bool

Surface of the nucleus with shape (y, x).

cell_mask_out_nuc: np.ndarray, bool

Surface of the cell (outside the nucleus) with shape (y, x).

check_input: bool

Check input validity.

Returns:
nuc_relative_area: float

Proportion of nucleus area in the cell.

cell_area: float

Cell area (in pixels).

nuc_area: float

Nucleus area (in pixels).

cell_area_out_nuc: float

Cell area outside the nucleus (in pixels).
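Areas in pixels are simply counts of True pixels in each mask. A minimal sketch of the four outputs (area_features is a hypothetical name):

```python
import numpy as np


def area_features(cell_mask, nuc_mask, cell_mask_out_nuc):
    """Return (nuc_relative_area, cell_area, nuc_area, cell_area_out_nuc)."""
    cell_area = float(cell_mask.sum())        # pixels in the cell
    nuc_area = float(nuc_mask.sum())          # pixels in the nucleus
    cell_area_out_nuc = float(cell_mask_out_nuc.sum())
    nuc_relative_area = nuc_area / cell_area
    return nuc_relative_area, cell_area, nuc_area, cell_area_out_nuc


cell = np.ones((6, 6), dtype=bool)
nuc = np.zeros((6, 6), dtype=bool)
nuc[1:4, 1:4] = True
areas = area_features(cell, nuc, cell & ~nuc)
```

To express a pixel area in square micrometers, multiply it by (voxel_size_yx / 1000) ** 2 when voxel_size_yx is given in nanometers.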

bigfish.classification.features_centrosome(smfish, rna_coord, distance_centrosome, cell_mask, ndim, voxel_size_yx, check_input=True)

Compute centrosome-related features (in 2 dimensions).

Parameters:
smfish: np.ndarray, np.uint

Image of RNAs, with shape (y, x).

rna_coord: np.ndarray, np.int

Coordinates of the detected RNAs with zyx or yx coordinates in the first 3 or 2 columns.

distance_centrosome: np.ndarray, np.float32

Distance map from the centrosome with shape (y, x), in pixels.

cell_mask: np.ndarray, bool

Surface of the cell with shape (y, x).

ndim: int

Number of spatial dimensions to consider.

voxel_size_yx: int or float

Size of a voxel on the yx plane, in nanometers.

check_input: bool

Check input validity.

Returns:
index_mean_dist_cent: float

Normalized mean distance of RNAs to the closest centrosome.

index_median_dist_cent: float

Normalized median distance of RNAs to the closest centrosome.

index_rna_centrosome: float

Number of RNAs within a 2000nm radius from a centrosome, normalized by the expected number of RNAs under a random distribution.

proportion_rna_centrosome: float

Proportion of RNAs within a 2000nm radius from a centrosome.

index_centrosome_dispersion: float

Centrosomal dispersion index. It quantifies the dispersion of RNAs around centrosomes: the lower the index, the closer the RNAs are to the centrosomes.
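Because distance_centrosome is in pixels, the 2000 nm radius must first be converted with voxel_size_yx. The sketch below covers only index_rna_centrosome and proportion_rna_centrosome, and it assumes that the expected proportion under a random distribution is the fraction of cell pixels within the radius; centrosome_neighborhood is a hypothetical helper, not the bigfish code.

```python
import numpy as np


def centrosome_neighborhood(rna_coord, distance_centrosome, cell_mask, ndim,
                            voxel_size_yx, radius_nm=2000):
    """Return (index, proportion) of RNAs within radius_nm of a centrosome."""
    y = rna_coord[:, ndim - 2]
    x = rna_coord[:, ndim - 1]
    radius_px = radius_nm / voxel_size_yx  # nanometers -> pixels
    rna_close = distance_centrosome[y, x] <= radius_px
    cell_close = distance_centrosome[cell_mask] <= radius_px
    proportion = float(rna_close.mean())
    # assumed random baseline: share of the cell surface inside the radius
    area_fraction = float(cell_close.mean())
    index = proportion / area_fraction if area_fraction > 0 else np.nan
    return index, proportion


dist = np.array([[0, 10, 20, 30]], dtype=np.float32)  # pixels
cell = np.ones((1, 4), dtype=bool)
rna = np.array([[0, 0, -1], [0, 1, -1]])
index, proportion = centrosome_neighborhood(rna, dist, cell, ndim=2,
                                            voxel_size_yx=100)
```

With 100 nm pixels the radius is 20 pixels, so three quarters of the toy cell lies within it; both RNAs do, which gives an index of 4/3.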