interdim.pipeline

Functions

`cluster_data`(data[, method, n_clusters])	Perform clustering on the input data.
`interactive_scatterplot`(x[, y, z, ...])	Create an interactive scatter plot using Dash.
`reduce_dimensionality`(data[, method, ...])	Perform dimensionality reduction on the input data.
`score_clustering`(X, labels[, true_labels, ...])	Evaluate clustering performance using the specified method.

Classes

`InterDimAnalysis`(data[, true_labels, verbose])
`InteractionPlot`(data_source[, plot_type, ...])	A class for creating interactive plots based on data points.

class interdim.pipeline.InterDimAnalysis(data: ndarray, true_labels: ndarray | None = None, verbose: bool = True)

__init__(data: ndarray, true_labels: ndarray | None = None, verbose: bool = True)

Initialize the InterDimAnalysis object.

Parameters:

data – Input data for analysis.
true_labels – True labels for supervised evaluation (optional).
verbose – Whether to print progress information.

reduce(method: Literal['pca', 'tsne', 'umap', 'truncated_svd', 'fast_ica', 'nmf', 'isomap', 'lle', 'mds', 'spectral_embedding', 'gaussian_random_projection', 'sparse_random_projection'] = 'tsne', n_components: int = 2, **kwargs) → ndarray

Perform dimensionality reduction on the data.

Parameters:

method – Dimensionality reduction method to use.
n_components – Number of components to reduce to.
**kwargs – Additional arguments for the reduction method.

Returns:

The reduced data.
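A minimal sketch of constructing the analysis object and calling reduce, based only on the signatures documented above. The helper names make_toy_data and reduce_toy_data and the synthetic data are hypothetical; 'pca' is chosen over the default 'tsne' only because it is fast and deterministic.

```python
import numpy as np


def make_toy_data(n_per_cluster=50, n_features=8, seed=0):
    """Build three Gaussian blobs (hypothetical data, for illustration only)."""
    rng = np.random.default_rng(seed)
    # One random center per blob, spread out so the blobs are separable.
    centers = rng.normal(scale=5.0, size=(3, n_features))
    data = np.vstack(
        [center + rng.normal(size=(n_per_cluster, n_features)) for center in centers]
    )
    # Label i marks the points drawn around center i.
    labels = np.repeat(np.arange(3), n_per_cluster)
    return data, labels


def reduce_toy_data(method="pca", n_components=2):
    """Construct an InterDimAnalysis on the toy data and reduce it."""
    # Imported lazily so make_toy_data can be used without interdim installed.
    from interdim.pipeline import InterDimAnalysis

    data, labels = make_toy_data()
    analysis = InterDimAnalysis(data, true_labels=labels, verbose=False)
    reduced = analysis.reduce(method=method, n_components=n_components)
    return analysis, reduced
```

Calling reduce_toy_data() returns the analysis object together with the array that reduce produced, so later steps such as cluster and score can reuse the same object.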

cluster(method: Literal['kmeans', 'dbscan', 'hdbscan', 'agglomerative', 'birch', 'mini_batch_kmeans', 'spectral', 'affinity_propagation', 'mean_shift', 'optics', 'gaussian_mixture'] = 'dbscan', which_data: str = 'reduced', **kwargs) → ndarray

Perform clustering on the chosen dataset (original or reduced).

Parameters:

method – Clustering method to use.
which_data – Which dataset to use for clustering (‘original’ or ‘reduced’).
**kwargs – Additional arguments for the clustering method.

Returns:

The cluster labels.

Raises:

ValueError – If the selected dataset hasn’t been prepared (for example, ‘reduced’ is requested before reduce has been run).
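The sketch below shows one way to handle the documented ValueError. It assumes that the error is raised when which_data is ‘reduced’ and no reduction has been run yet, and that n_clusters is forwarded through **kwargs to the clustering method; neither detail is confirmed by this page. The helper name cluster_with_fallback is hypothetical.

```python
def cluster_with_fallback(analysis, n_clusters=3):
    """Cluster the reduced data, running a reduction first if none exists.

    `analysis` is expected to expose the reduce() and cluster() methods
    documented for InterDimAnalysis.
    """
    try:
        return analysis.cluster(
            method="kmeans", which_data="reduced", n_clusters=n_clusters
        )
    except ValueError:
        # Assumed cause: no reduced data yet. A ValueError raised for another
        # reason (e.g. a bad keyword argument) will surface on the retry below.
        analysis.reduce(method="pca", n_components=2)
        return analysis.cluster(
            method="kmeans", which_data="reduced", n_clusters=n_clusters
        )
```

Retrying only once keeps the failure visible: if the second call also raises, the exception propagates to the caller unchanged.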

score(method: Literal['adjusted_mutual_info', 'adjusted_rand', 'completeness', 'fowlkes_mallows', 'homogeneity', 'mutual_info', 'normalized_mutual_info', 'rand', 'v_measure', 'contingency_matrix', 'pair_confusion_matrix', 'calinski_harabasz', 'davies_bouldin', 'silhouette'] = 'silhouette', true_labels: ndarray | None = None) → float | ndarray

Evaluate the clustering performance.

Parameters:

method – Scoring method to use.
true_labels – True labels for supervised scoring methods (optional).

Returns:

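The computed score: a float for scalar metrics, or an ndarray for matrix-valued methods such as ‘contingency_matrix’ and ‘pair_confusion_matrix’.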

Raises:

ValueError – If clustering hasn’t been performed yet.
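The scoring methods fall into two groups: those that compare cluster labels against true labels, and those that use only the data and the cluster labels. The sketch below encodes that grouping as it applies to the scikit-learn metrics of the same names; it assumes, without confirmation from this page, that score dispatches to those metrics. The names SUPERVISED, INTERNAL, MATRIX_VALUED and describe_scoring_method are hypothetical.

```python
# Methods that compare predicted labels with true labels.
SUPERVISED = frozenset({
    "adjusted_mutual_info", "adjusted_rand", "completeness",
    "fowlkes_mallows", "homogeneity", "mutual_info",
    "normalized_mutual_info", "rand", "v_measure",
    "contingency_matrix", "pair_confusion_matrix",
})

# Methods that need only the data and the predicted labels.
INTERNAL = frozenset({"calinski_harabasz", "davies_bouldin", "silhouette"})

# Supervised methods whose result is a matrix rather than a single number.
MATRIX_VALUED = frozenset({"contingency_matrix", "pair_confusion_matrix"})


def describe_scoring_method(method):
    """Report whether a method needs true labels and whether it returns a matrix."""
    if method not in SUPERVISED and method not in INTERNAL:
        raise ValueError(f"Unknown scoring method: {method!r}")
    return {
        "needs_true_labels": method in SUPERVISED,
        "returns_matrix": method in MATRIX_VALUED,
    }
```

A check like this can be run before calling score to decide whether true_labels has to be supplied and whether the result should be treated as a float or an array.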

show(n_components: int | None = 3, which_data: str = 'reduced', point_visualization: Callable | Literal['bar', 'box', 'histogram', 'line', 'violin'] | None = None, marker_kwargs: Dict | None = None, scatter_kwargs: Dict | None = None, interact_mode: Literal['hover', 'click'] = 'hover', port: int | None = None) → Dash

Generate an interactive visualization of the data.

Parameters:

n_components – Number of components to show (1, 2, or 3).
which_data – Which dataset to show (‘original’ or ‘reduced’).
point_visualization – Either a callable or one of ‘bar’, ‘box’, ‘histogram’, ‘line’, or ‘violin’, specifying the plot shown for a point on an interaction event (hover or click).
marker_kwargs – Dictionary of marker properties.
scatter_kwargs – Dictionary of scatter plot properties.
interact_mode – Interaction mode (‘hover’ or ‘click’).
port – Port to run the Dash server on. If None, a free port will be found automatically.

Returns:

A Dash application instance for the interactive plot.

Return type:

dash.Dash

Raises:

ValueError – If invalid options are selected or required methods haven’t been run.
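Because show raises ValueError for invalid options, it can help to validate the arguments before calling it. The sketch below checks a subset of the documented options against the values listed in the signature. The helper name build_show_kwargs is hypothetical, and it deliberately does not cover n_components=None, which the signature allows but this page does not describe.

```python
PLOT_TYPES = ("bar", "box", "histogram", "line", "violin")


def build_show_kwargs(n_components=3, which_data="reduced",
                      point_visualization=None, interact_mode="hover",
                      port=None):
    """Validate options against the documented values and return them as a dict."""
    if n_components not in (1, 2, 3):
        raise ValueError("n_components must be 1, 2, or 3")
    if which_data not in ("original", "reduced"):
        raise ValueError("which_data must be 'original' or 'reduced'")
    if interact_mode not in ("hover", "click"):
        raise ValueError("interact_mode must be 'hover' or 'click'")
    # point_visualization may be None, a callable, or one of the named plot types.
    if not (point_visualization is None
            or callable(point_visualization)
            or point_visualization in PLOT_TYPES):
        raise ValueError(f"point_visualization must be callable or one of {PLOT_TYPES}")
    return {
        "n_components": n_components,
        "which_data": which_data,
        "point_visualization": point_visualization,
        "interact_mode": interact_mode,
        "port": port,  # None lets show pick a free port, per the parameter notes
    }
```

Given an InterDimAnalysis instance named analysis on which the required steps have been run, the result could be passed as analysis.show(**build_show_kwargs(n_components=2, point_visualization="bar", interact_mode="click")).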