Features engineering
Prepare input coordinates
Format the input coordinates and compute the intermediate results needed for feature computation:
- bigfish.classification.prepare_extracted_data(cell_mask, nuc_mask=None, ndim=None, rna_coord=None, centrosome_coord=None)
Prepare data extracted from images.
- Parameters:
- cell_mask: np.ndarray, np.uint, np.int or bool
Surface of the cell with shape (y, x).
- nuc_mask: np.ndarray, np.uint, np.int or bool
Surface of the nucleus with shape (y, x).
- ndim: int
Number of spatial dimensions to consider (2 or 3). Mandatory if rna_coord is provided.
- rna_coord: np.ndarray, np.int
Coordinates of the detected spots with shape (nb_spots, 4) or (nb_spots, 3). One coordinate per dimension (zyx or yx dimensions) plus the index of the cluster assigned to the spot. If no cluster was assigned, the value is -1.
- centrosome_coord: np.ndarray, np.int
Coordinates of the detected centrosome with shape (nb_elements, 3) or (nb_elements, 2). One coordinate per dimension (zyx or yx dimensions).
- Returns:
- cell_mask: np.ndarray, bool
Surface of the cell with shape (y, x).
- distance_cell: np.ndarray, np.float32
Distance map from the cell with shape (y, x), in pixels.
- distance_cell_normalized: np.ndarray, np.float32
Normalized distance map from the cell with shape (y, x).
- centroid_cell: np.ndarray, np.int
Coordinates of the cell centroid with shape (2,).
- distance_centroid_cell: np.ndarray, np.float32
Distance map from the cell centroid with shape (y, x), in pixels.
- nuc_mask: np.ndarray, bool
Surface of the nucleus with shape (y, x).
- cell_mask_out_nuc: np.ndarray, bool
Surface of the cell (outside the nucleus) with shape (y, x).
- distance_nuc: np.ndarray, np.float32
Distance map from the nucleus with shape (y, x), in pixels.
- distance_nuc_normalized: np.ndarray, np.float32
Normalized distance map from the nucleus with shape (y, x).
- centroid_nuc: np.ndarray, np.int
Coordinates of the nucleus centroid with shape (2,).
- distance_centroid_nuc: np.ndarray, np.float32
Distance map from the nucleus centroid with shape (y, x), in pixels.
- rna_coord_out_nuc: np.ndarray, np.int
Coordinates of the detected spots with shape (nb_spots, 4) or (nb_spots, 3). One coordinate per dimension (zyx or yx dimensions) plus the index of the cluster assigned to the spot. If no cluster was assigned, the value is -1. Spots detected inside the nucleus are removed.
- centroid_rna: np.ndarray, np.int
Coordinates of the RNA centroid with shape (2,) or (3,).
- distance_centroid_rna: np.ndarray, np.float32
Distance map from the RNA centroid with shape (y, x), in pixels.
- centroid_rna_out_nuc: np.ndarray, np.int
Coordinates of the RNA centroid (outside the nucleus) with shape (2,) or (3,).
- distance_centroid_rna_out_nuc: np.ndarray, np.float32
Distance map from the RNA centroid (outside the nucleus) with shape (y, x), in pixels.
- distance_centrosome: np.ndarray, np.float32
Distance map from the centrosome with shape (y, x), in pixels.
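To make the expected shapes concrete, here is a minimal NumPy sketch of toy inputs in the documented layout, together with two of the simpler derived results (rna_coord_out_nuc and cell_mask_out_nuc). The toy image, the spot positions and the filtering logic are illustrative assumptions; this is not the library's implementation.

```python
import numpy as np

# Hypothetical 2D toy cell: a 40x40 image with a square cell and a smaller nucleus.
cell_mask = np.zeros((40, 40), dtype=bool)
cell_mask[5:35, 5:35] = True
nuc_mask = np.zeros((40, 40), dtype=bool)
nuc_mask[15:25, 15:25] = True

# Spots as (y, x, cluster_id); -1 means no cluster was assigned.
rna_coord = np.array([
    [8, 8, -1],
    [18, 20, 0],
    [19, 21, 0],
    [30, 12, -1],
], dtype=np.int64)

# Keep only the spots whose (y, x) position falls outside the nucleus mask.
in_nuc = nuc_mask[rna_coord[:, 0], rna_coord[:, 1]]
rna_coord_out_nuc = rna_coord[~in_nuc]

# Surface of the cell outside the nucleus.
cell_mask_out_nuc = cell_mask & ~nuc_mask
```

With 3D coordinates the array would carry an extra leading z column, and the yx columns used for indexing the masks would shift by one.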
Compute features
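Functions to compute features about cell morphology and RNA localization. The two main functions to compute spatial and morphological features are compute_features and get_features_name.
Groups of features can be computed separately with features_distance, features_in_out_nucleus, features_protrusion, features_dispersion, features_topography, features_foci, features_area and features_centrosome.
See an example of application here.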
- bigfish.classification.compute_features(cell_mask, nuc_mask, ndim, rna_coord, smfish=None, voxel_size_yx=None, foci_coord=None, centrosome_coord=None, compute_distance=False, compute_intranuclear=False, compute_protrusion=False, compute_dispersion=False, compute_topography=False, compute_foci=False, compute_area=False, compute_centrosome=False, return_names=False)
Compute requested features.
- Parameters:
- cell_mask: np.ndarray, np.uint, np.int or bool
Surface of the cell with shape (y, x).
- nuc_mask: np.ndarray, np.uint, np.int or bool
Surface of the nucleus with shape (y, x).
- ndim: int
Number of spatial dimensions to consider (2 or 3).
- rna_coord: np.ndarray, np.int
Coordinates of the detected spots with shape (nb_spots, 4) or (nb_spots, 3). One coordinate per dimension (zyx or yx dimensions) plus the index of the cluster assigned to the spot. If no cluster was assigned, the value is -1. If the cluster index is not provided, foci related features are not computed.
- smfish: np.ndarray, np.uint
Image of RNAs, with shape (y, x).
- voxel_size_yx: int, float or None
Size of a voxel on the yx plane, in nanometers.
- foci_coord: np.ndarray, np.int
Array with shape (nb_foci, 5) or (nb_foci, 4). One coordinate per dimension for the foci centroid (zyx or yx coordinates), the number of spots detected in the foci and its index.
- centrosome_coord: np.ndarray, np.int
Coordinates of the detected centrosome with shape (nb_elements, 3) or (nb_elements, 2). One coordinate per dimension (zyx or yx dimensions). These coordinates are mandatory to compute centrosome related features.
- compute_distance: bool
Compute distance related features.
- compute_intranuclear: bool
Compute nucleus related features.
- compute_protrusion: bool
Compute protrusion related features.
- compute_dispersion: bool
Compute dispersion indices.
- compute_topography: bool
Compute topographic features.
- compute_foci: bool
Compute foci related features.
- compute_area: bool
Compute area related features.
- compute_centrosome: bool
Compute centrosome related features.
- return_names: bool
Return feature names.
- Returns:
- features: np.ndarray, np.float32
Array of features.
- bigfish.classification.get_features_name(names_features_distance=False, names_features_intranuclear=False, names_features_protrusion=False, names_features_dispersion=False, names_features_topography=False, names_features_foci=False, names_features_area=False, names_features_centrosome=False)
Return the current list of feature names.
- Parameters:
- names_features_distance: bool
Return names of features related to distances from nucleus or cell membrane.
- names_features_intranuclear: bool
Return names of features related to nucleus.
- names_features_protrusion: bool
Return names of features related to protrusions.
- names_features_dispersion: bool
Return names of features used to quantify mRNAs dispersion within the cell.
- names_features_topography: bool
Return names of topographic features of the cell.
- names_features_foci: bool
Return names of features related to foci.
- names_features_area: bool
Return names of features related to area of the cell.
- names_features_centrosome: bool
Return names of features related to centrosome.
- Returns:
- features_name: List[str]
A list of feature names.
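Since the features come back as a flat array, it can help to pair each value with its name. The helper below is a hypothetical convenience function (features_to_dict is not part of the library), and the values and names are placeholders chosen for illustration.

```python
import numpy as np

def features_to_dict(features, names):
    """Pair each feature value with its name; fail loudly on a length mismatch."""
    if len(features) != len(names):
        raise ValueError("features and names must have the same length")
    return {name: float(value) for name, value in zip(names, features)}

# Placeholder values and names, for illustration only.
features = np.array([0.25, 120.0, 40.0], dtype=np.float32)
names = ["proportion_rna_in_nuc", "nb_rna_out_nuc", "nb_rna_in_nuc"]
named = features_to_dict(features, names)
```

The length check guards against requesting one set of feature groups while asking for the names of another.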
- bigfish.classification.features_distance(rna_coord, distance_cell, distance_nuc, cell_mask, ndim, check_input=True)
Compute distance related features.
- Parameters:
- rna_coord: np.ndarray, np.int
Coordinates of the detected RNAs with zyx or yx coordinates in the first 3 or 2 columns.
- distance_cell: np.ndarray, np.float32
Distance map from the cell with shape (y, x).
- distance_nuc: np.ndarray, np.float32
Distance map from the nucleus with shape (y, x).
- cell_mask: np.ndarray, bool
Surface of the cell with shape (y, x).
- ndim: int
Number of spatial dimensions to consider.
- check_input: bool
Check input validity.
- Returns:
- index_mean_dist_cell: float
Normalized mean distance of RNAs to the cell membrane.
- index_median_dist_cell: float
Normalized median distance of RNAs to the cell membrane.
- index_mean_dist_nuc: float
Normalized mean distance of RNAs to the nucleus.
- index_median_dist_nuc: float
Normalized median distance of RNAs to the nucleus.
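The normalization is not spelled out above. One plausible reading, sketched below, divides the mean (or median) distance of the RNAs by the mean (or median) distance over all pixels of the cell, so that a value near 1 matches a uniform distribution. This normalization is an assumption made for illustration, and normalized_distance_indices is a hypothetical helper.

```python
import numpy as np

def normalized_distance_indices(rna_coord, distance_map, cell_mask, ndim):
    """Return (mean index, median index) of RNA distances, relative to the cell."""
    # The y and x columns are the last two spatial columns.
    yx = rna_coord[:, ndim - 2:ndim]
    rna_dist = distance_map[yx[:, 0], yx[:, 1]]
    # Reference distribution: distances of every pixel inside the cell.
    cell_dist = distance_map[cell_mask]
    index_mean = float(rna_dist.mean() / cell_dist.mean())
    index_median = float(np.median(rna_dist) / np.median(cell_dist))
    return index_mean, index_median

# Toy 2x2 distance map in which every pixel belongs to the cell.
distance_map = np.array([[1, 2], [3, 4]], dtype=np.float32)
cell_mask = np.ones((2, 2), dtype=bool)
rna_coord = np.array([[1, 0, -1], [1, 1, -1]])  # (y, x, cluster_id)
index_mean, index_median = normalized_distance_indices(
    rna_coord, distance_map, cell_mask, ndim=2)
```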
- bigfish.classification.features_in_out_nucleus(rna_coord, rna_coord_out_nuc, check_input=True)
Compute nucleus related features.
- Parameters:
- rna_coord: np.ndarray, np.int
Coordinates of the detected RNAs with zyx or yx coordinates in the first 3 or 2 columns.
- rna_coord_out_nuc: np.ndarray, np.int
Coordinates of the detected RNAs with zyx or yx coordinates in the first 3 or 2 columns. Spots detected inside the nucleus are removed.
- check_input: bool
Check input validity.
- Returns:
- proportion_rna_in_nuc: float
Proportion of RNAs detected inside the nucleus.
- nb_rna_out_nuc: float
Number of RNAs detected outside the nucleus.
- nb_rna_in_nuc: float
Number of RNAs detected inside the nucleus.
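These three values follow directly from the sizes of the two coordinate arrays. The sketch below shows the arithmetic; the function name and the handling of an empty input are assumptions, not the library's code.

```python
import numpy as np

def in_out_nucleus(rna_coord, rna_coord_out_nuc):
    """Return (proportion inside, number outside, number inside) as floats."""
    nb_rna = len(rna_coord)
    nb_out = len(rna_coord_out_nuc)
    nb_in = nb_rna - nb_out
    # Assumption: report 0 when no RNA was detected, to avoid dividing by zero.
    proportion_in = nb_in / nb_rna if nb_rna > 0 else 0.0
    return float(proportion_in), float(nb_out), float(nb_in)

# Four spots as (y, x, cluster_id), one of which lies inside the nucleus.
rna_coord = np.array([[8, 8, -1], [18, 20, 0], [30, 12, -1], [6, 30, -1]])
rna_coord_out_nuc = rna_coord[[0, 2, 3]]
proportion_in, nb_out, nb_in = in_out_nucleus(rna_coord, rna_coord_out_nuc)
```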
- bigfish.classification.features_protrusion(rna_coord, cell_mask, nuc_mask, ndim, voxel_size_yx, check_input=True)
Compute protrusion related features.
- Parameters:
- rna_coord: np.ndarray, np.int
Coordinates of the detected RNAs with zyx or yx coordinates in the first 3 or 2 columns.
- cell_mask: np.ndarray, bool
Surface of the cell with shape (y, x).
- nuc_mask: np.ndarray, bool
Surface of the nucleus with shape (y, x).
- ndim: int
Number of spatial dimensions to consider.
- voxel_size_yx: int or float
Size of a voxel on the yx plane, in nanometers.
- check_input: bool
Check input validity.
- Returns:
- index_rna_protrusion: float
Number of RNAs detected in a protrusion, normalized by the expected number of RNAs under a random distribution.
- proportion_rna_protrusion: float
Proportion of RNAs detected in a protrusion.
- protrusion_area: float
Protrusion area (in pixels).
- bigfish.classification.features_dispersion(smfish, rna_coord, centroid_rna, cell_mask, centroid_cell, centroid_nuc, ndim, check_input=True)
Compute RNA Distribution Index features (RDI) described in:
RDI Calculator: An analysis Tool to assess RNA distributions in cells, Stueland M., Wang T., Park H. Y., Mili, S., 2019.
- Parameters:
- smfish: np.ndarray, np.uint
Image of RNAs, with shape (y, x).
- rna_coord: np.ndarray, np.int
Coordinates of the detected RNAs with zyx or yx coordinates in the first 3 or 2 columns.
- centroid_rna: np.ndarray, np.int
Coordinates of the RNA centroid with shape (2,) or (3,).
- cell_mask: np.ndarray, bool
Surface of the cell with shape (y, x).
- centroid_cell: np.ndarray, np.int
Coordinates of the cell centroid with shape (2,).
- centroid_nuc: np.ndarray, np.int
Coordinates of the nucleus centroid with shape (2,).
- ndim: int
Number of spatial dimensions to consider.
- check_input: bool
Check input validity.
- Returns:
- index_polarization: float
Polarization index (PI).
- index_dispersion: float
Dispersion index (DI).
- index_peripheral_distribution: float
Peripheral distribution index (PDI).
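As a rough illustration of the polarization index, the sketch below takes the distance between the RNA centroid and the cell centroid and divides it by the radius of gyration of the cell. This 2D, pixel-based definition is an assumption made for the example; the cited paper and the library code are the references for the exact computation, and polarization_index is a hypothetical helper.

```python
import numpy as np

def polarization_index(centroid_rna, centroid_cell, cell_mask):
    """Distance between RNA and cell centroids, scaled by the cell's radius of gyration."""
    centroid_cell = np.asarray(centroid_cell, dtype=float)
    # Use only the yx part of the RNA centroid (drop z if it is present).
    centroid_rna_yx = np.asarray(centroid_rna, dtype=float)[-2:]
    # Radius of gyration: root mean squared distance of cell pixels to the cell centroid.
    cell_yx = np.argwhere(cell_mask)
    rg = np.sqrt(np.mean(np.sum((cell_yx - centroid_cell) ** 2, axis=1)))
    return float(np.linalg.norm(centroid_rna_yx - centroid_cell) / rg)

cell_mask = np.ones((3, 3), dtype=bool)
centroid_cell = np.array([1, 1])
pi_centered = polarization_index(np.array([1, 1]), centroid_cell, cell_mask)
pi_shifted = polarization_index(np.array([0, 1, 2]), centroid_cell, cell_mask)
```

An index of 0 means the RNA centroid coincides with the cell centroid; larger values indicate a more polarized distribution.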
- bigfish.classification.features_topography(rna_coord, cell_mask, nuc_mask, cell_mask_out_nuc, ndim, voxel_size_yx, check_input=True)
Compute topographic features.
- Parameters:
- rna_coord: np.ndarray, np.int
Coordinates of the detected RNAs with zyx or yx coordinates in the first 3 or 2 columns.
- cell_mask: np.ndarray, bool
Surface of the cell with shape (y, x).
- nuc_mask: np.ndarray, bool
Surface of the nucleus with shape (y, x).
- cell_mask_out_nuc: np.ndarray, bool
Surface of the cell (outside the nucleus) with shape (y, x).
- ndim: int
Number of spatial dimensions to consider.
- voxel_size_yx: int or float
Size of a voxel on the yx plane, in nanometers.
- check_input: bool
Check input validity.
- Returns:
- index_rna_nuc_marge: float
Number of RNAs detected in a specific region around the nucleus, normalized by the expected number of RNAs under a random distribution. Six regions are targeted (less than 500nm, 500-1000nm, 1000-1500nm, 1500-2000nm, 2000-2500nm and 2500-3000nm from the nucleus boundary).
- proportion_rna_nuc_marge: float
Proportion of RNAs detected in a specific region around the nucleus. Six regions are targeted (less than 500nm, 500-1000nm, 1000-1500nm, 1500-2000nm, 2000-2500nm and 2500-3000nm from the nucleus boundary).
- index_rna_cell_marge: float
Number of RNAs detected in a specific region around the cell membrane, normalized by the expected number of RNAs under a random distribution. Six regions are targeted (0-500nm, 500-1000nm, 1000-1500nm, 1500-2000nm, 2000-2500nm and 2500-3000nm from the cell membrane).
- proportion_rna_cell_marge: float
Proportion of RNAs detected in a specific region around cell membrane. Six regions are targeted (0-500nm, 500-1000nm, 1000-1500nm, 1500-2000nm, 2000-2500nm and 2500-3000nm from the cell membrane).
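The idea of normalizing a count by its expectation under a random distribution can be sketched as follows: if RNAs were spread uniformly over the cell, the share found in a region would equal the share of the cell area that the region covers. The helper name, the toy distance map and the inclusive or exclusive ring boundaries are assumptions made for illustration, not the library's implementation.

```python
import numpy as np

def region_enrichment(rna_yx, region_mask, cell_mask):
    """Return (index, proportion) of RNAs located in a region of the cell."""
    in_region = region_mask[rna_yx[:, 0], rna_yx[:, 1]]
    proportion = in_region.sum() / len(rna_yx)
    # Expected proportion under a uniform distribution: the region's share of the cell area.
    expected = region_mask.sum() / cell_mask.sum()
    return float(proportion / expected), float(proportion)

# Toy strip-shaped cell whose distance to the nucleus grows by one pixel per column.
cell_mask = np.ones((1, 20), dtype=bool)
distance_nuc = np.arange(20, dtype=np.float32).reshape(1, 20)
voxel_size_yx = 100  # nanometers per pixel (hypothetical value)

# Region located between 500nm and 1000nm from the nucleus.
distance_nm = distance_nuc * voxel_size_yx
ring = cell_mask & (distance_nm >= 500) & (distance_nm < 1000)

rna_yx = np.array([[0, 5], [0, 6], [0, 15], [0, 16]])
index_ring, proportion_ring = region_enrichment(rna_yx, ring, cell_mask)
```

An index above 1 indicates more RNAs in the region than a uniform distribution would predict. The same ratio applies to any region mask, for example a protrusion.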
- bigfish.classification.features_foci(rna_coord, foci_coord, ndim, check_input=True)
Compute foci related features.
- Parameters:
- rna_coord: np.ndarray, np.int
Coordinates of the detected RNAs with zyx or yx coordinates in the first 3 or 2 columns.
- foci_coord: np.ndarray, np.int
Array with shape (nb_foci, 5) or (nb_foci, 4). One coordinate per dimension for the foci centroid (zyx or yx coordinates), the number of spots detected in the foci and its index.
- ndim: int
Number of spatial dimensions to consider.
- check_input: bool
Check input validity.
- Returns:
- proportion_rna_in_foci: float
Proportion of RNAs detected in foci.
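One simple way to obtain such a proportion, assuming rna_coord carries a cluster index in the column that follows the spatial coordinates, is to count the spots whose index is not -1. The library may instead rely on the spot counts stored in foci_coord; the helper below is only a hypothetical sketch.

```python
import numpy as np

def proportion_in_foci(rna_coord, ndim):
    """Fraction of spots that were assigned to a cluster (index different from -1)."""
    if len(rna_coord) == 0:
        return 0.0
    # The cluster index sits right after the ndim spatial columns.
    cluster_id = rna_coord[:, ndim]
    return float(np.mean(cluster_id != -1))

# Four 3D spots as (z, y, x, cluster_id); two of them belong to cluster 0.
rna_coord = np.array([
    [0, 8, 8, -1],
    [1, 18, 20, 0],
    [1, 19, 21, 0],
    [2, 30, 12, -1],
])
proportion = proportion_in_foci(rna_coord, ndim=3)
```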
- bigfish.classification.features_area(cell_mask, nuc_mask, cell_mask_out_nuc, check_input=True)
Compute area related features.
- Parameters:
- cell_mask: np.ndarray, bool
Surface of the cell with shape (y, x).
- nuc_mask: np.ndarray, bool
Surface of the nucleus with shape (y, x).
- cell_mask_out_nuc: np.ndarray, bool
Surface of the cell (outside the nucleus) with shape (y, x).
- check_input: bool
Check input validity.
- Returns:
- nuc_relative_area: float
Proportion of nucleus area in the cell.
- cell_area: float
Cell area (in pixels).
- nuc_area: float
Nucleus area (in pixels).
- cell_area_out_nuc: float
Cell area outside the nucleus (in pixels).
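Because the masks are boolean, areas in pixels are simply counts of True values. The toy masks below are hypothetical, and the sketch illustrates the arithmetic rather than the library's code.

```python
import numpy as np

# Hypothetical toy masks: a 30x30 square cell containing a 10x10 square nucleus.
cell_mask = np.zeros((40, 40), dtype=bool)
cell_mask[5:35, 5:35] = True
nuc_mask = np.zeros((40, 40), dtype=bool)
nuc_mask[15:25, 15:25] = True
cell_mask_out_nuc = cell_mask & ~nuc_mask

# Areas are pixel counts; the relative area is the nucleus share of the cell.
cell_area = float(cell_mask.sum())
nuc_area = float(nuc_mask.sum())
cell_area_out_nuc = float(cell_mask_out_nuc.sum())
nuc_relative_area = nuc_area / cell_area
```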
- bigfish.classification.features_centrosome(smfish, rna_coord, distance_centrosome, cell_mask, ndim, voxel_size_yx, check_input=True)
Compute centrosome related features (in 2 dimensions).
- Parameters:
- smfish: np.ndarray, np.uint
Image of RNAs, with shape (y, x).
- rna_coord: np.ndarray, np.int
Coordinates of the detected RNAs with zyx or yx coordinates in the first 3 or 2 columns.
- distance_centrosome: np.ndarray, np.float32
Distance map from the centrosome with shape (y, x), in pixels.
- cell_mask: np.ndarray, bool
Surface of the cell with shape (y, x).
- ndim: int
Number of spatial dimensions to consider.
- voxel_size_yx: int or float
Size of a voxel on the yx plane, in nanometers.
- check_input: bool
Check input validity.
- Returns:
- index_mean_dist_cent: float
Normalized mean distance of RNAs to the closest centrosome.
- index_median_dist_cent: float
Normalized median distance of RNAs to the closest centrosome.
- index_rna_centrosome: float
Number of RNAs within a 2000nm radius from a centrosome, normalized by the expected number of RNAs under a random distribution.
- proportion_rna_centrosome: float
Proportion of RNAs within a 2000nm radius from a centrosome.
- index_centrosome_dispersion: float
Centrosomal dispersion index. It quantifies the dispersion of RNAs around centrosomes. The lower the index, the closer the RNAs are to the centrosomes.
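The 2000nm radius is expressed in nanometers while the distance map is in pixels, so the voxel size is needed to compare the two. The sketch below converts the map and counts the RNAs inside the radius. The helper name, the toy distance map and the strict inequality at the boundary are assumptions made for illustration.

```python
import numpy as np

def proportion_near_centrosome(rna_coord, distance_centrosome, ndim,
                               voxel_size_yx, radius_nm=2000):
    """Fraction of RNAs closer than radius_nm to a centrosome, measured in the yx plane."""
    yx = rna_coord[:, ndim - 2:ndim]
    # Convert pixel distances to nanometers before thresholding.
    dist_nm = distance_centrosome[yx[:, 0], yx[:, 1]] * voxel_size_yx
    return float(np.mean(dist_nm < radius_nm))

# Toy distance map (in pixels) around a single centrosome located at (10, 10).
yy, xx = np.indices((21, 21))
distance_centrosome = np.sqrt((yy - 10) ** 2 + (xx - 10) ** 2).astype(np.float32)

# Four 3D spots as (z, y, x, cluster_id).
rna_coord = np.array([
    [0, 10, 10, -1],
    [1, 10, 12, -1],
    [0, 10, 16, -1],
    [2, 0, 0, -1],
])
proportion = proportion_near_centrosome(
    rna_coord, distance_centrosome, ndim=3, voxel_size_yx=500)
```

With a hypothetical voxel size of 500nm, the radius corresponds to 4 pixels, so only the first two spots fall inside it.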