interdim.pipeline

Functions

cluster_data(data[, method, n_clusters])

Perform clustering on the input data.

interactive_scatterplot(x[, y, z, ...])

Create an interactive scatter plot using Dash.

reduce_dimensionality(data[, method, ...])

Perform dimensionality reduction on the input data.

score_clustering(X, labels[, true_labels, ...])

Evaluate clustering performance using the specified method.

Classes

InterDimAnalysis(data[, true_labels, verbose])

Pipeline for reducing, clustering, scoring, and interactively visualizing a dataset.

InteractionPlot(data_source[, plot_type, ...])

A class for creating interactive plots based on data points.

class interdim.pipeline.InterDimAnalysis(data: ndarray, true_labels: ndarray | None = None, verbose: bool = True)
__init__(data: ndarray, true_labels: ndarray | None = None, verbose: bool = True)

Initialize the InterDimAnalysis object.

Parameters:
  • data – Input data for analysis.

  • true_labels – True labels for supervised evaluation (optional).

  • verbose – Whether to print progress information.

reduce(method: Literal['pca', 'tsne', 'umap', 'truncated_svd', 'fast_ica', 'nmf', 'isomap', 'lle', 'mds', 'spectral_embedding', 'gaussian_random_projection', 'sparse_random_projection'] = 'tsne', n_components: int = 2, **kwargs) → ndarray

Perform dimensionality reduction on the data.

Parameters:
  • method – Dimensionality reduction method to use.

  • n_components – Number of components to reduce to.

  • **kwargs – Additional arguments for the reduction method.

Returns:

The reduced data.
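For intuition about what the reduced data contains, here is a minimal sketch of the 'pca' case using NumPy's SVD; the result is presumably an array of shape (n_samples, n_components). This is a stand-in for illustration, not interdim's code: the method names mirror scikit-learn estimators (and 'umap' the umap-learn package), which suggests the library wraps those, but this page does not confirm it. The function name pca_reduce is hypothetical.

```python
import numpy as np


def pca_reduce(data: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project data onto its top principal components (minimal PCA via SVD)."""
    # Center each feature so the principal axes pass through the data mean.
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    # Coordinates of every sample along the first n_components axes.
    return centered @ vt[:n_components].T


rng = np.random.default_rng(0)
data = rng.normal(size=(100, 10))
reduced = pca_reduce(data, n_components=2)
print(reduced.shape)  # (100, 2)
```

Because the axes come out sorted by singular value, the first column always carries at least as much variance as the second.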

cluster(method: Literal['kmeans', 'dbscan', 'hdbscan', 'agglomerative', 'birch', 'mini_batch_kmeans', 'spectral', 'affinity_propagation', 'mean_shift', 'optics', 'gaussian_mixture'] = 'dbscan', which_data: str = 'reduced', **kwargs) → ndarray

Perform clustering on the chosen dataset (original or reduced).

Parameters:
  • method – Clustering method to use.

  • which_data – Which dataset to use for clustering (‘original’ or ‘reduced’).

  • **kwargs – Additional arguments for the clustering method.

Returns:

The cluster labels.

Raises:

ValueError – If the selected dataset hasn’t been prepared.
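Since which_data defaults to 'reduced', calling cluster() before any reduction presumably triggers the ValueError above. The sketch below illustrates that selection rule under the assumption that "prepared" means reduce() has already produced the reduced dataset; the PipelineState class and its error message are hypothetical, not interdim's internals.

```python
from typing import Optional

import numpy as np


class PipelineState:
    """Hypothetical stand-in for the data an analysis object keeps between steps."""

    def __init__(self, data: np.ndarray) -> None:
        self.data = data
        self.reduced_data: Optional[np.ndarray] = None  # filled in by a reduce step

    def select(self, which_data: str = "reduced") -> np.ndarray:
        """Return the dataset to cluster, mirroring the documented which_data options."""
        if which_data == "original":
            return self.data
        if which_data == "reduced":
            if self.reduced_data is None:
                raise ValueError("Reduced data not available; call reduce() first.")
            return self.reduced_data
        raise ValueError(f"which_data must be 'original' or 'reduced', got {which_data!r}")


state = PipelineState(np.ones((4, 3)))
error_message = None
try:
    state.select()  # default which_data='reduced', but nothing has been reduced yet
except ValueError as err:
    error_message = str(err)

state.reduced_data = state.data[:, :2]  # pretend a 2-component reduction has run
print(state.select().shape)  # (4, 2)
```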

score(method: Literal['adjusted_mutual_info', 'adjusted_rand', 'completeness', 'fowlkes_mallows', 'homogeneity', 'mutual_info', 'normalized_mutual_info', 'rand', 'v_measure', 'contingency_matrix', 'pair_confusion_matrix', 'calinski_harabasz', 'davies_bouldin', 'silhouette'] = 'silhouette', true_labels: ndarray | None = None) → float | ndarray

Evaluate the clustering performance.

Parameters:
  • method – Scoring method to use.

  • true_labels – True labels for supervised scoring methods (optional).

Returns:

The computed score: a float for most methods, or an array for ‘contingency_matrix’ and ‘pair_confusion_matrix’.

Raises:

ValueError – If clustering hasn’t been performed yet.
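The scoring methods fall into two groups: ‘calinski_harabasz’, ‘davies_bouldin’, and ‘silhouette’ are computed from the data and the predicted labels alone, while the others compare the predicted labels against true labels. As an example of the second group, here is a minimal from-scratch sketch of the Rand index (the quantity behind ‘rand’, matching scikit-learn's rand_score): the fraction of sample pairs on which two labelings agree about "same cluster" versus "different cluster". The function rand_index is hypothetical and not part of interdim.

```python
from itertools import combinations
from typing import Hashable, Sequence


def rand_index(labels_true: Sequence[Hashable], labels_pred: Sequence[Hashable]) -> float:
    """Fraction of sample pairs on which two labelings agree."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    pairs = list(combinations(range(len(labels_true)), 2))
    if not pairs:
        return 1.0  # fewer than two samples: treat as perfect agreement
    # A pair agrees when both labelings put it together, or both keep it apart.
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)


print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0 (label names don't matter)
print(rand_index([0, 0, 1, 1], [0, 0, 1, 2]))  # 0.8333333333333334
```

In the second call only the pair (2, 3) disagrees, giving 5 of 6 pairs.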

show(n_components: int | None = 3, which_data: str = 'reduced', point_visualization: Callable | Literal['bar', 'box', 'histogram', 'line', 'violin'] | None = None, marker_kwargs: Dict | None = None, scatter_kwargs: Dict | None = None, interact_mode: Literal['hover', 'click'] = 'hover', port: int | None = None) → Dash

Generate an interactive visualization of the data.

Parameters:
  • n_components – Number of components to show (1, 2, or 3).

  • which_data – Which dataset to show (‘original’ or ‘reduced’).

  • point_visualization – What to display when a point is hovered over or clicked: either a callable, or one of ‘bar’, ‘box’, ‘histogram’, ‘line’, or ‘violin’.

  • marker_kwargs – Dictionary of marker properties.

  • scatter_kwargs – Dictionary of scatter plot properties.

  • interact_mode – Interaction mode (‘hover’ or ‘click’).

  • port – Port to run the Dash server on. If None, a free port will be found automatically.

Returns:

A Dash application instance for the interactive plot.

Return type:

dash.Dash

Raises:

ValueError – If invalid options are selected or required methods haven’t been run.
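The options above imply some validation before the Dash server starts, plus a free-port lookup when port is None. The sketch below checks only the documented values; check_show_options and find_free_port are hypothetical helpers, and binding a socket to port 0 so the OS assigns an unused port is one common technique, not necessarily what interdim does.

```python
import socket
from typing import Callable, Optional, Union

PLOT_TYPES = {"bar", "box", "histogram", "line", "violin"}


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


def check_show_options(
    n_components: Optional[int] = 3,
    which_data: str = "reduced",
    point_visualization: Union[Callable, str, None] = None,
    interact_mode: str = "hover",
    port: Optional[int] = None,
) -> int:
    """Validate show()-style options and return the port to serve on."""
    if n_components is not None and n_components not in (1, 2, 3):
        raise ValueError("n_components must be 1, 2, or 3")
    if which_data not in ("original", "reduced"):
        raise ValueError("which_data must be 'original' or 'reduced'")
    if isinstance(point_visualization, str):
        if point_visualization not in PLOT_TYPES:
            raise ValueError(f"unknown plot type: {point_visualization!r}")
    elif point_visualization is not None and not callable(point_visualization):
        raise ValueError("point_visualization must be a callable, a plot-type string, or None")
    if interact_mode not in ("hover", "click"):
        raise ValueError("interact_mode must be 'hover' or 'click'")
    # An explicit port is used as-is; otherwise let the OS pick one.
    return port if port is not None else find_free_port()


print(check_show_options(port=8050))  # 8050
free = check_show_options(point_visualization="violin")  # OS-assigned port
```

The OS releases the port as soon as the probe socket closes, so another process could claim it before the server binds; for a local interactive tool that race is usually acceptable.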