interdim.pipeline
Functions
- Perform clustering on the input data.
- Create an interactive scatter plot using Dash.
- Perform dimensionality reduction on the input data.
- Evaluate clustering performance using the specified method.
Classes
- A class for creating interactive plots based on data points.
- class interdim.pipeline.InterDimAnalysis(data: ndarray, true_labels: ndarray | None = None, verbose: bool = True)
- __init__(data: ndarray, true_labels: ndarray | None = None, verbose: bool = True)
Initialize the InterDimAnalysis object.
- Parameters:
data – Input data for analysis.
true_labels – True labels for supervised evaluation (optional).
verbose – Whether to print progress information.
- reduce(method: Literal['pca', 'tsne', 'umap', 'truncated_svd', 'fast_ica', 'nmf', 'isomap', 'lle', 'mds', 'spectral_embedding', 'gaussian_random_projection', 'sparse_random_projection'] = 'tsne', n_components: int = 2, **kwargs) → ndarray
Perform dimensionality reduction on the data.
- Parameters:
method – Dimensionality reduction method to use.
n_components – Number of components to reduce to.
**kwargs – Additional arguments for the reduction method.
- Returns:
The reduced data.
- cluster(method: Literal['kmeans', 'dbscan', 'hdbscan', 'agglomerative', 'birch', 'mini_batch_kmeans', 'spectral', 'affinity_propagation', 'mean_shift', 'optics', 'gaussian_mixture'] = 'dbscan', which_data: str = 'reduced', **kwargs) → ndarray
Perform clustering on the chosen dataset (original or reduced).
- Parameters:
method – Clustering method to use.
which_data – Which dataset to use for clustering (‘original’ or ‘reduced’).
**kwargs – Additional arguments for the clustering method.
- Returns:
The cluster labels.
- Raises:
ValueError – If the selected dataset hasn’t been prepared.
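The `ValueError` above implies an ordering: `which_data='reduced'` can only work once `reduce()` has prepared the reduced dataset. A minimal sketch of a helper that enforces that order is shown below. `cluster_reduced` is a hypothetical name and not part of interdim; it accepts any object that exposes the `reduce` and `cluster` methods documented here, and it assumes that extra keyword arguments are passed through to the clustering method as described above.

```python
def cluster_reduced(analysis, reduce_method="pca", cluster_method="kmeans",
                    n_components=2, **cluster_kwargs):
    """Run reduce() first, then cluster() on the reduced data.

    `analysis` is expected to behave like InterDimAnalysis: it must expose
    reduce(method=..., n_components=...) and
    cluster(method=..., which_data=..., **kwargs).
    This helper is a hypothetical convenience, not part of interdim.
    """
    # Prepare the reduced dataset so that which_data="reduced" is valid.
    reduced = analysis.reduce(method=reduce_method, n_components=n_components)
    # Cluster on the data that was just prepared.
    labels = analysis.cluster(method=cluster_method, which_data="reduced",
                              **cluster_kwargs)
    return reduced, labels
```

With the real class this might be called as `cluster_reduced(InterDimAnalysis(data, verbose=False), cluster_method="dbscan")`, where `data` is the input ndarray.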
- score(method: Literal['adjusted_mutual_info', 'adjusted_rand', 'completeness', 'fowlkes_mallows', 'homogeneity', 'mutual_info', 'normalized_mutual_info', 'rand', 'v_measure', 'contingency_matrix', 'pair_confusion_matrix', 'calinski_harabasz', 'davies_bouldin', 'silhouette'] = 'silhouette', true_labels: ndarray | None = None) → float | ndarray
Evaluate the clustering performance.
- Parameters:
method – Scoring method to use.
true_labels – True labels for supervised scoring methods (optional).
- Returns:
The computed score.
- Raises:
ValueError – If clustering hasn’t been performed yet.
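The method names in the signature correspond to well-known clustering metrics, which fall into two groups: internal metrics computed from the data and the predicted labels alone, and supervised metrics that compare predicted labels against true labels. The sketch below groups the names accordingly. The grouping follows the usual definitions of these metrics and is an assumption about how interdim treats them; the constants and the helpers `needs_true_labels` and `default_method` are hypothetical and not part of the library.

```python
# Scoring method names taken from the score() signature above.
# The grouping follows the usual definitions of these metrics; it is an
# assumption about how interdim treats them, not something the docs state.
INTERNAL_METHODS = frozenset({"calinski_harabasz", "davies_bouldin", "silhouette"})
SUPERVISED_METHODS = frozenset({
    "adjusted_mutual_info", "adjusted_rand", "completeness", "fowlkes_mallows",
    "homogeneity", "mutual_info", "normalized_mutual_info", "rand", "v_measure",
    "contingency_matrix", "pair_confusion_matrix",
})
# These two produce a matrix rather than a single number, which would
# explain the `float | ndarray` return annotation.
MATRIX_METHODS = frozenset({"contingency_matrix", "pair_confusion_matrix"})


def needs_true_labels(method):
    """Return True if `method` compares predicted labels against true labels."""
    if method in SUPERVISED_METHODS:
        return True
    if method in INTERNAL_METHODS:
        return False
    raise ValueError(f"Unknown scoring method: {method!r}")


def default_method(have_true_labels):
    """Pick a scoring method based on label availability (hypothetical helper)."""
    return "adjusted_rand" if have_true_labels else "silhouette"
```

A caller could check `needs_true_labels(method)` before calling `score()` to decide whether `true_labels` has to be supplied.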
- show(n_components: int | None = 3, which_data: str = 'reduced', point_visualization: Callable | Literal['bar', 'box', 'histogram', 'line', 'violin'] | None = None, marker_kwargs: Dict | None = None, scatter_kwargs: Dict | None = None, interact_mode: Literal['hover', 'click'] = 'hover', port: int | None = None) → Dash
Generate an interactive visualization of the data.
- Parameters:
n_components – Number of components to show (1, 2, or 3).
which_data – Which dataset to show (‘original’ or ‘reduced’).
point_visualization – Either a function or a string (‘bar’, ‘box’, ‘histogram’, ‘line’, or ‘violin’) specifying the plot drawn for a point on interaction events.
marker_kwargs – Dictionary of marker properties.
scatter_kwargs – Dictionary of scatter plot properties.
interact_mode – Interaction mode (‘hover’ or ‘click’).
port – Port to run the Dash server on. If None, a free port will be found automatically.
- Returns:
A Dash application instance for the interactive plot.
- Return type:
dash.Dash
- Raises:
ValueError – If invalid options are selected or required methods haven’t been run.
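The `port` parameter falls back to an automatically chosen free port when it is `None`. One common way to do such a lookup is to bind a socket to port 0 and let the operating system pick. The sketch below illustrates that technique with the standard library only; `find_free_port` is a hypothetical helper, and nothing here claims to be how interdim itself selects the port.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for an unused TCP port on `host` and return its number."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Binding to port 0 lets the operating system choose a free port.
        sock.bind((host, 0))
        # getsockname() returns (address, port) for an IPv4 socket.
        return sock.getsockname()[1]
```

The returned number could then be passed explicitly as `show(port=...)`. Note that the port is released when the socket closes, so another process could in principle claim it before the server starts.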