Welcome to THOI’s documentation!

README

THOI: Torch - Higher Order Interactions

Description

THOI is a Python package for computing O-information and related higher-order interaction measures using batch processing. It leverages PyTorch for efficient tensor operations.

Installation

Prerequisites

Ensure you have Python 3.6 or higher installed.

Installing THOI with Your Preferred Version of PyTorch

Because the appropriate PyTorch build depends on your environment and requirements (CPU or GPU support, or a specific PyTorch version), you need to install PyTorch separately before installing THOI. Follow these steps:

  1. Visit the official PyTorch installation guide:

    • Go to the PyTorch website and navigate to the “Get Started” page.

    • Select your preferences for the following options:

      • PyTorch Build: Stable or LTS (long-term support)

      • Your Operating System: Linux, Mac, or Windows

      • Package: Pip (recommended)

      • Language: Python

      • Compute Platform: CPU, CUDA 10.2, CUDA 11.1, etc.

  2. Get the Installation Command:

    • Based on your selections, the PyTorch website will provide the appropriate installation command.

    • For example, for the CPU-only version, the command will look like this:

      pip install torch==1.8.1+cpu -f https://download.pytorch.org/whl/torch_stable.html
      
    • For the GPU version with CUDA 11.1, the command will look like this:

      pip install torch==1.8.1+cu111 -f https://download.pytorch.org/whl/torch_stable.html
      
  3. Install PyTorch:

    • Copy and run the command provided by the PyTorch website in your terminal.

  4. Install THOI:

    • Once PyTorch is installed, install THOI using:

      pip install thoi
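
To verify the installation, you can run a quick sanity check from Python (this only confirms that both packages import and reports whether your PyTorch build detects a GPU):

      import torch
      import thoi

      print(torch.__version__)          # installed PyTorch version
      print(torch.cuda.is_available())  # True only for a CUDA build with a visible GPU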
      

Usage

After installation, you can start using THOI in your projects. Here is a simple example:

from thoi.measures.gaussian_copula import multi_order_measures, nplets_measures
from thoi.heuristics import simulated_annealing, greedy
import numpy as np

X = np.random.normal(0,1, (1000, 10))

# Compute the O-information for the single n-plet containing all the variables of X
measures = nplets_measures(X)

# Compute the O-information for a single n-plet (it must be passed as a list of n-plets, even if there is only one)
measures = nplets_measures(X, [[0,1,3]])

# Compute the O-information for multiple n-plets
measures = nplets_measures(X, [[0,1,3],[3,7,4],[2,6,3]])

# Exhaustive computation of O-information measures over all combinations of features in X
measures = multi_order_measures(X)

# Compute the best 10 combinations of features (n-plets) using the greedy heuristic, starting
# from an exhaustive search at the lower order and building up from there. The result shows
# the best O-information for each constructed optimal n-plet
best_nplets, best_scores = greedy(X, 3, 5, repeat=10)

# Compute the best 10 combinations of features (n-plets) using simulated annealing. There are two initialization options:
# 1. Start from a custom initial solution with shape (repeat, order), explicitly provided by the user.
# 2. Start from random samples of the given order (the default).
# The result shows the best O-information for each constructed optimal n-plet
best_nplets, best_scores = simulated_annealing(X, 5, repeat=10)

For detailed usage and examples, please refer to the documentation.

Contributing

We welcome contributions from the community. If you encounter any issues or have suggestions for improvements, please open an issue or submit a pull request on GitHub.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Citation

If you use the thoi library in a scientific project, please cite it using one of the following formats:

BibTeX
@misc{thoi,
  author       = {Laouen Belloli and Rubén Herzog},
  title        = {THOI: An efficient and accessible library for computing higher-order interactions enhanced by batch-processing},
  year         = {2024},
  url          = {https://pypi.org/project/thoi/}
}

APA Belloli, L., & Herzog, R. (2023). THOI: An efficient library for higher order interactions analysis based on Gaussian copulas enhanced by batch-processing. Retrieved from https://pypi.org/project/thoi/

MLA Belloli, Laouen, and Rubén Herzog. THOI: An efficient library for higher order interactions analysis based on Gaussian copulas enhanced by batch-processing. 2023. Web. https://pypi.org/project/thoi/.

Authors

For more details, visit the GitHub repository.

License

MIT License

Copyright (c) 2024 Laouen Mayal Louan Belloli

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Functions

thoi.measures.gaussian_copula.multi_order_measures(X: Tensor | ndarray | Sequence[ndarray | Sequence[Any]], min_order: int = 3, max_order: int | None = None, *, covmat_precomputed: bool = False, T: List[int] | int | None = None, batch_size: int = 1000000, device: device = device(type='cpu'), num_workers: int = 0, batch_aggregation: Callable[[any], any] | None = None, batch_data_collector: Callable[[ndarray, ndarray, ndarray, ndarray, ndarray], any] | None = None)

Compute multi-order measures (TC, DTC, O, S) for the given data matrix X.

The measurements computed are:
  • Total Correlation (TC)

  • Dual Total Correlation (DTC)

  • O-information (O)

  • S-information (S)

Parameters:
  • X (TensorLikeArray) – Input data, which can be one of the following: - A single torch.Tensor or np.ndarray with shape (T, N). - A sequence (e.g., list) of torch.Tensor or np.ndarray, each with shape (T, N), representing multiple datasets. - A sequence of sequences, where each inner sequence is an array-like object of shape (T, N). If covmat_precomputed is True, X should be: - A single torch.Tensor or np.ndarray covariance matrix with shape (N, N). - A sequence of covariance matrices, each with shape (N, N).

  • min_order (int, optional) – Minimum order to compute. Default is 3. Note: 3 <= min_order <= max_order <= N.

  • max_order (int, optional) – Maximum order to compute. If None, uses N (number of variables). Default is None. Note: min_order <= max_order <= N.

  • covmat_precomputed (bool, optional) – If True, X is treated as covariance matrices instead of raw data. Default is False.

  • T (int or list of int, optional) – Number of samples used to compute bias correction. This parameter is used only if covmat_precomputed is True. If X is a sequence of covariance matrices, T should be a list of sample sizes corresponding to each matrix. If T is None and covmat_precomputed is True, bias correction is not applied. Default is None.

  • batch_size (int, optional) – Batch size for DataLoader. Default is 1,000,000.

  • device (torch.device, optional) – Device to use for computation. Default is torch.device(‘cpu’).

  • num_workers (int, optional) – Number of workers for DataLoader. Default is 0.

  • batch_aggregation (callable, optional) – Function to aggregate the collected batch data into the final result. It should accept a list of outputs from batch_data_collector and return the final aggregated result. The return type of this function determines the return type of multi_order_measures. By default, it uses concat_and_sort_csv, which concatenates CSV data and sorts it, returning a pandas DataFrame. For more information see Collectors Concat and sort CSV

  • batch_data_collector (callable, optional) –

    Function to process and collect data from each batch. It should accept the following parameters:

    • nplets: torch.Tensor of n-plet indices, shape (batch_size, order)

    • nplets_tc: torch.Tensor of total correlation values, shape (batch_size, D)

    • nplets_dtc: torch.Tensor of dual total correlation values, shape (batch_size, D)

    • nplets_o: torch.Tensor of O-information values, shape (batch_size, D)

    • nplets_s: torch.Tensor of S-information values, shape (batch_size, D)

    • batch_number: int, the current batch number

    The output of batch_data_collector must be compatible with the input expected by batch_aggregation. By default, it uses batch_to_csv, which collects data into CSV. For more information see Collectors Batch to CSV

Returns:

  • Any – The aggregated result of the computed measures. The exact type depends on the batch_aggregation function used. By default, it returns a pandas DataFrame containing the computed metrics (DTC, TC, O, S), the n-plets indexes, the order and the dataset information.

Where:

  • D (int) – Number of datasets. If X is a single dataset, D = 1.

  • N (int) – Number of variables (features) in each dataset.

  • T (int) – Number of samples in each dataset (if applicable).

  • order (int) – The size of the n-plets being analyzed, ranging from min_order to max_order.

  • batch_size (int) – Number of n-plets processed in each batch.

Notes

  • The default batch_data_collector and batch_aggregation functions are designed to work together. If you provide custom functions, ensure that the output of batch_data_collector is compatible with the input of batch_aggregation.

  • Ensure that the length of T matches the number of datasets when covmat_precomputed is True and X is a sequence of covariance matrices.

  • The function computes measures for all combinations of variables of orders ranging from min_order to max_order.

  • The function is optimized for batch processing using PyTorch tensors, facilitating efficient computations on large datasets.

Examples

Using default batch data collector and aggregation:

>>> result = multi_order_measures(X, min_order=3, max_order=5)

Using custom batch data collector and aggregation:

>>> def custom_batch_data_collector(nplets, tc, dtc, o, s, batch_number):
...     # Custom processing
...     return custom_data
...
>>> def custom_batch_aggregation(batch_data_list):
...     # Custom aggregation
...     return final_result
...
>>> result = multi_order_measures(
...     X,
...     min_order=3,
...     max_order=5,
...     batch_data_collector=custom_batch_data_collector,
...     batch_aggregation=custom_batch_aggregation
... )

thoi.measures.gaussian_copula.nplets_measures(X: Tensor | ndarray | Sequence[ndarray | Sequence[Any]], nplets: Tensor | ndarray | Sequence[ndarray | Sequence[Any]] | None = None, *, covmat_precomputed: bool = False, T: List[int] | int | None = None, device: device = device(type='cpu'), verbose: int = 20, batch_size: int = 1000000)

Compute higher-order measures (TC, DTC, O, S) for specified n-plets in the given data matrices X.

The computed measures are:
  • Total Correlation (TC)

  • Dual Total Correlation (DTC)

  • O-information (O)

  • S-information (S)

Parameters:
  • X (TensorLikeArray) – Input data, which can be one of the following: - A single torch.Tensor or np.ndarray with shape (T, N). - A sequence (e.g., list) of torch.Tensor or np.ndarray, each with shape (T, N), representing multiple datasets. - A sequence of sequences, where each inner sequence is an array-like object of shape (T, N). If covmat_precomputed is True, X should be: - A single torch.Tensor or np.ndarray covariance matrix with shape (N, N). - A sequence of covariance matrices, each with shape (N, N).

  • nplets (TensorLikeArray, optional) – The n-plets to calculate the measures, with shape (n_nplets, order). If None, all possible n-plets of the given order are considered.

  • covmat_precomputed (bool, optional) – If True, X is treated as covariance matrices instead of raw data. Default is False.

  • T (int or list of int, optional) – Number of samples used to compute bias correction. This parameter is used only if covmat_precomputed is True. If X is a sequence of covariance matrices, T should be a list of sample sizes corresponding to each matrix. If T is None and covmat_precomputed is True, bias correction is not applied. Default is None.

  • device (torch.device, optional) – Device to use for computation. Default is torch.device(‘cpu’).

  • verbose (int, optional) – Logging verbosity level. Default is logging.INFO.

  • batch_size (int, optional) – Batch size for processing n-plets. Default is 1,000,000.

Returns:

  • torch.Tensor – Tensor containing the computed measures for each n-plet with shape (n_nplets, D, 4)

Where:

  • D (int) – Number of datasets. If X is a single dataset, D = 1.

  • N (int) – Number of variables (features) in each dataset.

  • T (int) – Number of samples in each dataset (if applicable).

  • order (int) – The size of the n-plets being analyzed.

  • n_nplets (int) – Number of n-plets processed.

Examples

Compute measures for all possible 3-plets in a single dataset:

```python
import torch
import numpy as np
from thoi.measures.gaussian_copula import nplets_measures

# Sample data matrix with 100 samples and 5 variables
X = np.random.randn(100, 5)

# Compute measures for all 3-plets
measures = nplets_measures(X, nplets=None, covmat_precomputed=False, T=100)
```

Compute measures for specific n-plets in multiple datasets:

```python
import torch
import numpy as np
from thoi.measures.gaussian_copula import nplets_measures

# Sample data matrices for 2 datasets, each with 100 samples and 5 variables
X1 = np.random.randn(100, 5)
X2 = np.random.randn(100, 5)
X = [X1, X2]

# Define specific n-plets to analyze
nplets = torch.tensor([[0, 1, 2], [1, 2, 3]])

# Compute measures for the specified n-plets
measures = nplets_measures(X, nplets=nplets, covmat_precomputed=False, T=[100, 100])
```

Compute measures with precomputed covariance matrices:

```python
import torch
import numpy as np
from thoi.measures.gaussian_copula import nplets_measures

# Precompute covariance matrices for 2 datasets
covmat1 = np.cov(np.random.randn(100, 5), rowvar=False)
covmat2 = np.cov(np.random.randn(100, 5), rowvar=False)
X = [covmat1, covmat2]

# Number of samples for each covariance matrix
T = [100, 100]

# Define specific n-plets to analyze
nplets = torch.tensor([[0, 1], [2, 3]])

# Compute measures using precomputed covariance matrices
measures = nplets_measures(X, nplets=nplets, covmat_precomputed=True, T=T)
```

Notes

  • If nplets is None, the function considers all possible n-plets of the specified order within the datasets.

  • Ensure that the length of T matches the number of datasets when covmat_precomputed is True and X is a sequence of covariance matrices.

  • The function is optimized for batch processing using PyTorch tensors, facilitating efficient computations on large datasets.

thoi.measures.gaussian_copula_hot_encoded.multi_order_measures_hot_encoded(X: Tensor | ndarray | Sequence[ndarray | Sequence[Any]], min_order: int = 3, max_order: int | None = None, *, covmat_precomputed: bool = False, T: int | None = None, batch_size: int = 100000, device: device = device(type='cpu'), num_workers: int = 0, batch_aggregation: Callable[[any], any] | None = None, batch_data_collector: Callable[[ndarray, ndarray, ndarray, ndarray, ndarray], any] | None = None)

Compute multi-order Gaussian Copula (GC) measurements for the given data matrix X. The measurements computed are:

  • Total Correlation (TC)

  • Dual Total Correlation (DTC)

  • O-information (O)

  • S-information (S)

Parameters:
  • X (TensorLikeArray) – T samples x N variables matrix. If covmat_precomputed is False, it should be a numpy array.

  • min_order (int, optional) – Minimum order to compute. Default is 3.

  • max_order (int, optional) – Maximum order to compute. If None, uses N. Default is None.

  • covmat_precomputed (bool, optional) – If True, X is a covariance matrix. Default is False.

  • T (int, optional) – Number of samples used to compute bias correction. This parameter is only used if covmat_precomputed is True. Default is None.

  • batch_size (int, optional) – Batch size for DataLoader. Default is 100,000.

  • device (torch.device, optional) – The device to use for the computation. Default is 'cpu'.

  • num_workers (int, optional) – Number of workers for DataLoader. Default is 0.

  • batch_aggregation (callable, optional) – Function to aggregate the batched data. Default is pd.concat.

  • batch_data_collector (callable, optional) – Function to collect the batched data. Default is batch_to_csv.

Returns:

DataFrame containing computed metrics.

Return type:

pd.DataFrame
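
Examples

A minimal usage sketch (assuming raw data input, i.e. covmat_precomputed left at its default of False):

```python
import numpy as np
from thoi.measures.gaussian_copula_hot_encoded import multi_order_measures_hot_encoded

# 1000 samples, 8 variables
X = np.random.normal(0, 1, (1000, 8))

# DataFrame with TC, DTC, O and S for every combination of 3 to 5 variables
df = multi_order_measures_hot_encoded(X, min_order=3, max_order=5)
```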

thoi.measures.gaussian_copula_hot_encoded.nplets_measures_hot_encoded(X: Tensor | ndarray | Sequence[ndarray | Sequence[Any]], nplets: Tensor | ndarray | Sequence[ndarray | Sequence[Any]] | None = None, *, covmat_precomputed: bool = False, T: int | None = None, batch_size: int = 100000, device: device = device(type='cpu'))

Compute the higher-order measures (TC, DTC, O, and S) for the given data matrices X over the specified n-plets.

Parameters:
  • X (TensorLikeArray) – The input data to compute the n-plets. It can be a list of 2D numpy arrays or tensors of shape: 1. (T, N) where T is the number of samples if X are multivariate series. 2. A list of 2D covariance matrices with shape (N, N).

  • nplets (np.ndarray or torch.Tensor, optional) – The n-plets to calculate the measures, with shape (batch_size, order).

  • covmat_precomputed (bool, optional) – If True, X is treated as a list of covariance matrices instead of multivariate series. Default is False.

  • T (int or list of int, optional) – The number of samples for each multivariate series.

  • device (torch.device, optional) – The device to use for the computation. Default is 'cpu'.

  • batch_size (int, optional) – Batch size for processing n-plets. Default is 100,000.

Returns:

torch.Tensor – The measures for the n-plets with shape (n_nplets, D, 4), where D is the number of matrices, n_nplets is the number of n-plets to calculate over, and 4 is the number of metrics (TC, DTC, O, S).
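
Examples

A short sketch following the parameter shapes described above; the n-plet indices are illustrative:

```python
import numpy as np
import torch
from thoi.measures.gaussian_copula_hot_encoded import nplets_measures_hot_encoded

# 1000 samples, 6 variables
X = np.random.normal(0, 1, (1000, 6))

# n-plets with shape (batch_size, order)
nplets = torch.tensor([[0, 1, 2], [2, 3, 4]])

# Tensor of shape (n_nplets, D, 4) holding TC, DTC, O and S per n-plet
measures = nplets_measures_hot_encoded(X, nplets=nplets)
```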

thoi.commons.gaussian_copula(X: ndarray)

Gaussian Copula Transformation

Transform the data into a Gaussian copula and compute the covariance matrix.

Parameters:

X (np.ndarray) – A 2D numpy array of shape (T, N) where T is the number of samples and N is the number of variables.

Returns:
  • X_gaussian (np.ndarray) – The data transformed into the Gaussian copula (same shape as the input).

  • X_gaussian_covmat (np.ndarray) – The covariance matrix of the Gaussian copula transformed data.

Notes

  • The Gaussian copula transformation involves ranking the data, normalizing the ranks, and applying the inverse CDF of the standard normal distribution.

  • Infinite values resulting from the inverse CDF transformation are set to 0.

  • The covariance matrix is computed from the Gaussian copula transformed data.
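
Examples

A minimal sketch (the uniform input is arbitrary; it only illustrates the returned shapes):

```python
import numpy as np
from thoi.commons import gaussian_copula

X = np.random.rand(500, 4)  # 500 samples, 4 variables

X_gaussian, X_gaussian_covmat = gaussian_copula(X)

print(X_gaussian.shape)         # (500, 4), same shape as the input
print(X_gaussian_covmat.shape)  # (4, 4), covariance of the transformed data
```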

thoi.commons.gaussian_copula_covmat(X: ndarray)

Compute the covariance matrix of the Gaussian copula transformed data.

Parameters:

X (np.ndarray) – A 2D numpy array of shape (T, N) where T is the number of samples and N is the number of variables.

Returns:

The covariance matrix of the Gaussian copula transformed data.

Return type:

np.ndarray
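
Examples

A minimal sketch, mirroring the example above for the covariance-only variant:

```python
import numpy as np
from thoi.commons import gaussian_copula_covmat

X = np.random.rand(500, 4)  # 500 samples, 4 variables

covmat = gaussian_copula_covmat(X)
print(covmat.shape)  # (4, 4)
```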

thoi.collectors.batch_to_csv(nplets_idxs: Tensor, nplets_tc: Tensor, nplets_dtc: Tensor, nplets_o: Tensor, nplets_s: Tensor, bn: int, only_synergetic: bool = False, columns: List[str] | None = None, N: int | None = None, sep: str = '\t', indexing_method: str = 'indexes', output_path: str | None = None) DataFrame | None

Collectors Batch to CSV

Convert batch results to a pandas DataFrame and optionally save to CSV.

This function processes the measures computed for n-plets in a batch and converts them into a pandas DataFrame. It can also save the DataFrame to a CSV file if an output path is provided.

Parameters:
  • nplets_idxs (torch.Tensor) – Indices of the n-plets. Shape: (batch_size, order).

  • nplets_tc (torch.Tensor) – Total correlation values. Shape: (batch_size, D).

  • nplets_dtc (torch.Tensor) – Dual total correlation values. Shape: (batch_size, D).

  • nplets_o (torch.Tensor) – O-information values. Shape: (batch_size, D).

  • nplets_s (torch.Tensor) – S-information values. Shape: (batch_size, D).

  • bn (int) – Batch number, used for identification in output files.

  • only_synergetic (bool, optional) – If True, only includes n-plets with negative O-information (synergetic). Default is False.

  • columns (list of str, optional) – Names of the variables (features). If None, variable names are generated as 'var_0', 'var_1', …, 'var_N-1'.

  • N (int, optional) – Total number of variables. Required if columns is not provided.

  • sep (str, optional) – Separator to use in the CSV file. Default is tab ('\t').

  • indexing_method (str, optional) – Method used to represent n-plets. Can be 'indexes' or 'hot_encoded'. Default is 'indexes'.

  • output_path (str, optional) – Path to save the CSV file. If None, the DataFrame is returned instead of being saved.

Returns:

  • pd.DataFrame or None – DataFrame containing the measures and variable information for the n-plets. Returns None if output_path is provided and the DataFrame is saved to a file.

Where:

  • D (int) – Number of datasets. If measures are computed over multiple datasets, D > 1.

  • N (int) – Number of variables (features).

  • batch_size (int) – Number of n-plets in the batch.

  • order (int) – Order of the n-plets (number of variables in each n-plet).

Notes

  • The function can filter out n-plets with non-negative O-information if only_synergetic is True.

  • The resulting DataFrame includes the measures and a binary indicator for each variable indicating its presence in the n-plet.

  • The DataFrame also includes columns for ‘order’ and ‘dataset’.

Examples

```python
import torch
from thoi.collectors import batch_to_csv

# Sample inputs
nplets_idxs = torch.tensor([[0, 1], [1, 2], [0, 2]])
nplets_tc = torch.rand(3, 1)
nplets_dtc = torch.rand(3, 1)
nplets_o = torch.rand(3, 1)
nplets_s = torch.rand(3, 1)
bn = 0
columns = ['A', 'B', 'C']
N = 3

# Convert batch to DataFrame
df = batch_to_csv(
    nplets_idxs, nplets_tc, nplets_dtc, nplets_o, nplets_s, bn, columns=columns, N=N
)
```

thoi.collectors.batch_to_tensor(nplets_idxs: Tensor, nplets_tc: Tensor, nplets_dtc: Tensor, nplets_o: Tensor, nplets_s: Tensor, bn: int | None = None, top_k: int | None = None, metric: str | Callable[[Tensor], Tensor] = 'o', largest: bool = False) Tuple[Tensor, Tensor, Tensor | None]

Process batch measures and optionally select top-k n-plets.

Parameters:
  • nplets_idxs (torch.Tensor) – Indices of the n-plets. Shape: (batch_size, order).

  • nplets_tc (torch.Tensor) – Total correlation values. Shape: (batch_size, D).

  • nplets_dtc (torch.Tensor) – Dual total correlation values. Shape: (batch_size, D).

  • nplets_o (torch.Tensor) – O-information values. Shape: (batch_size, D).

  • nplets_s (torch.Tensor) – S-information values. Shape: (batch_size, D).

  • bn (int, optional) – Batch number. Not used in the function but kept for compatibility.

  • top_k (int, optional) – If provided, selects the top-k n-plets based on the specified metric.

  • metric (string with value 'dtc', 'tc', 'o' or 's' or Callable, optional) – Metric to use for ranking if top_k is provided. Default is ‘o’ (O-information).

  • largest (bool, optional) – If True, selects the n-plets with the largest metric values; if False, returns the n-plets with the smallest values. Default is False.

Returns:

  • Tuple[torch.Tensor, torch.Tensor, Optional[torch.Tensor]]

    • n-plets measures. Shape: (batch_size or k, D, 4).

    • n-plets indices. Shape: (batch_size or k, order).

    • Metric values of the selected n-plets if top_k is provided, else None.

Where:

  • D (int) – Number of datasets.

  • order (int) – Order of the n-plets.

  • batch_size (int) – Number of n-plets in the batch.

Notes

  • If top_k is provided and less than batch_size, the function selects the top-k n-plets based on the metric.

  • The measures are stacked along the last dimension in the order: (tc, dtc, o, s).

Examples

```python
import torch
from thoi.collectors import batch_to_tensor

# Sample inputs
nplets_idxs = torch.tensor([[0, 1], [1, 2], [0, 2]])
nplets_tc = torch.rand(3, 1)
nplets_dtc = torch.rand(3, 1)
nplets_o = torch.rand(3, 1)
nplets_s = torch.rand(3, 1)

# Process batch without top-k selection
measures, idxs, _ = batch_to_tensor(
    nplets_idxs, nplets_tc, nplets_dtc, nplets_o, nplets_s
)

# Process batch with top-2 selection based on O-information
measures_topk, idxs_topk, metric_values = batch_to_tensor(
    nplets_idxs, nplets_tc, nplets_dtc, nplets_o, nplets_s, top_k=2, metric='o', largest=False
)
```

thoi.collectors.concat_and_sort_csv(batched_dataframes) DataFrame

Collectors Concat and sort CSV

Concatenate a list of DataFrames and sort them by the ‘dataset’ column.

Parameters:

batched_dataframes (list of pd.DataFrame) – List of DataFrames to concatenate and sort.

Returns:

The concatenated and sorted DataFrame.

Return type:

pd.DataFrame

Notes

  • The DataFrames are concatenated along the rows.

  • Sorting is performed using the ‘dataset’ column in ascending order.

  • The index is reset after sorting.

Examples

```python
import pandas as pd
from thoi.collectors import concat_and_sort_csv

df1 = pd.DataFrame({'dataset': [0, 0], 'value': [1, 2]})
df2 = pd.DataFrame({'dataset': [1, 1], 'value': [3, 4]})
combined_df = concat_and_sort_csv([df1, df2])
```

thoi.collectors.concat_batched_tensors(batched_tensors: List[Tuple[Tensor, Tensor, Tensor | None]], top_k: int | None = None, metric: str | Callable | None = 'o', largest: bool = False) Tuple[Tensor, Tensor, Tensor | None]

Concatenate batched tensors and optionally select top-k n-plets.

Parameters:
  • batched_tensors (list of tuples) –

    Each tuple contains:
    • nplets_measures: torch.Tensor, shape (batch_size, D, 4)

    • nplets_idxs: torch.Tensor, shape (batch_size, order)

    • nplets_scores: torch.Tensor or None, shape (batch_size,)

  • top_k (int, optional) – If provided, selects the top-k n-plets across all batches. Default is None.

  • metric (str or Callable, optional) – Metric to use for ranking if top_k is provided. Default is ‘o’.

  • largest (bool, optional) – If True, selects the n-plets with the largest metric values; otherwise, selects the n-plets with the smallest values. Default is False.

Returns:

  • Tuple[torch.Tensor, torch.Tensor, Optional[torch.Tensor]]

    • Concatenated n-plets measures. Shape: (total_nplets or k, D, 4).

    • Concatenated n-plets indices. Shape: (total_nplets or k, order).

    • Metric values of the selected n-plets if top_k is provided, else None.

Where:

  • D (int) – Number of datasets.

  • order (int) – Order of the n-plets.

  • total_nplets (int) – Total number of n-plets across all batches.

Notes

  • If top_k is provided, the function selects the top-k n-plets across all batches.

  • If top_k is provided, nplets_scores must be provided in batched_tensors.

Examples

```python
from thoi.collectors import concat_batched_tensors

# Suppose we have batched tensors from two batches
batched_tensors = [
    (measures_batch1, idxs_batch1, None),
    (measures_batch2, idxs_batch2, None),
]

# Concatenate without top-k selection
measures_all, idxs_all, _ = concat_batched_tensors(batched_tensors)

# Concatenate and select top-5 n-plets based on O-information
measures_topk, idxs_topk, metric_values = concat_batched_tensors(
    batched_tensors, top_k=5, metric='o', largest=False
)
```

thoi.collectors.top_k_nplets(nplets_idxs: Tensor, nplets_measures: Tensor, k: int, metric: str | Callable[[Tensor], Tensor], largest: bool) Tuple[Tensor, Tensor, Tensor]

Select the top-k n-plets based on a specified metric.

Parameters:
  • nplets_idxs (torch.Tensor) – Indices of the n-plets. Shape: (batch_size, order).

  • nplets_measures (torch.Tensor) – Measures for each n-plet. Shape: (batch_size, D, 4).

  • k (int) – Number of top n-plets to select.

  • metric (string with value 'dtc', 'tc', 'o' or 's' or Callable) – Metric to use for ranking the n-plets. Can be a string specifying a measure (‘tc’, ‘dtc’, ‘o’, ‘s’), or a custom callable that takes nplets_measures and returns a tensor of values.

  • largest (bool) – If True, selects the n-plets with the largest metric values; if False, returns the n-plets with the smallest values. Default is False.

Returns:

  • Tuple[torch.Tensor, torch.Tensor, torch.Tensor]

    • Selected n-plets measures. Shape: (k, D, 4).

    • Selected n-plets indices. Shape: (k, order).

    • Metric values of the selected n-plets. Shape: (k,).

Where:

  • D (int) – Number of datasets.

  • order (int) – Order of the n-plets (number of variables in each n-plet).

  • batch_size (int) – Number of n-plets in the batch.

Notes

  • The function computes the specified metric for each n-plet and selects the top-k based on this metric.

  • The metric can be one of the predefined measures or a custom function.

Examples

```python
import torch
from thoi.collectors import top_k_nplets

# Sample data
nplets_idxs = torch.tensor([[0, 1], [1, 2], [0, 2]])
nplets_measures = torch.rand(3, 1, 4)  # Assuming D=1
k = 2
metric = 'o'  # Use O-information for ranking

# Get top-k n-plets
top_measures, top_idxs, top_values = top_k_nplets(
    nplets_idxs, nplets_measures, k, metric, largest=False
)
```

thoi.heuristics.greedy.greedy(X: Tensor | ndarray | Sequence[ndarray | Sequence[Any]], initial_order: int = 3, order: int | None = None, *, covmat_precomputed: bool = False, T: List[int] | int | None = None, repeat: int = 10, batch_size: int = 1000000, repeat_batch_size: int = 1000000, device: device = device(type='cpu'), metric: str | Callable = 'o', largest: bool = False)

Greedy algorithm to find, order by order, the n-plets that best optimize the metric for a given multivariate series or set of covariance matrices.

Parameters:
  • X (TensorLikeArray) – The input data to compute the n-plets. It can be a list of 2D numpy arrays or tensors of shape: 1. (T, N) where T is the number of samples if X are multivariate series. 2. A list of 2D covariance matrices with shape (N, N).

  • initial_order (int, optional) – The initial order to start the greedy algorithm. Default is 3.

  • order (int, optional) – The final order to stop the greedy algorithm. If None, it will be set to N.

  • covmat_precomputed (bool, optional) – A boolean flag to indicate if the input data is a list of covariance matrices or multivariate series. Default is False.

  • T (int or list of int, optional) – A list of integers indicating the number of samples for each multivariate series. Default is None.

  • repeat (int, optional) – The number of repetitions to do to obtain different solutions starting from less optimal initial solutions. Default is 10.

  • batch_size (int, optional) – The batch size to use for the computation. Default is 1,000,000.

  • repeat_batch_size (int, optional) – The batch size for repeating the computation. Default is 1,000,000.

  • device (torch.device, optional) – The device to use for the computation. Default is ‘cpu’.

  • metric (Union[str, Callable], optional) – The metric to evaluate. One of ‘tc’, ‘dtc’, ‘o’, ‘s’ or a callable function. Default is ‘o’.

  • largest (bool, optional) – A flag to indicate if the metric is to be maximized or minimized. Default is False.

Returns:

  • best_nplets (torch.Tensor) – The n-plets with the best score found with shape (repeat, order).

  • best_scores (torch.Tensor) – The best scores for the best n-plets with shape (repeat,).

Notes

  • The function uses a greedy algorithm to iteratively find the best n-plets that maximize or minimize the specified metric.

  • The initial solutions are computed using the multi_order_measures function.

  • The function iterates over the remaining orders to get the best solution for each order.
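
Examples

A minimal usage sketch with random data; keyword names follow the signature above and the commented shapes follow the Returns section:

```python
import numpy as np
from thoi.heuristics import greedy

# 1000 samples, 10 variables
X = np.random.normal(0, 1, (1000, 10))

# Grow n-plets greedily from order 3 up to order 5, with 10 repetitions
best_nplets, best_scores = greedy(X, initial_order=3, order=5, repeat=10)

# best_nplets: shape (repeat, order) -> indices of the selected variables
# best_scores: shape (repeat,)       -> O-information of each solution (metric='o' by default)
```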

thoi.heuristics.simulated_annealing.random_sampler(N: int, order: int, repeat: int, device: device | None = None) Tensor

Generate random samples of n-plets.

Parameters:
  • N (int) – The number of variables.

  • order (int) – The order of the n-plets.

  • repeat (int) – The number of samples to generate.

  • device (torch.device, optional) – The device to use for the computation. Default is ‘cpu’.

Returns:

A tensor of shape (repeat, order) containing the random samples.

Return type:

torch.Tensor
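
Examples

A short sketch of the sampler in isolation:

```python
from thoi.heuristics.simulated_annealing import random_sampler

# 20 random n-plets of order 4 drawn from 10 variables
samples = random_sampler(N=10, order=4, repeat=20)
print(samples.shape)  # torch.Size([20, 4])
```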

thoi.heuristics.simulated_annealing.simulated_annealing(X: ndarray | Tensor | List[ndarray] | List[Tensor], order: int | None = None, *, covmat_precomputed: bool = False, T: List[int] | int | None = None, initial_solution: Tensor | None = None, repeat: int = 10, batch_size: int = 1000000, device: device = device(type='cpu'), max_iterations: int = 1000, early_stop: int = 100, initial_temp: float = 100.0, cooling_rate: float = 0.99, metric: str | Callable = 'o', largest: bool = False, verbose: int = 20) tuple[Tensor, Tensor]

Simulated annealing algorithm to find the n-plets that best optimize the metric for a given multivariate series or set of covariance matrices.

Parameters:
  • X (Union[np.ndarray, torch.Tensor, List[np.ndarray], List[torch.Tensor]]) – The input data to compute the n-plets. It can be a list of 2D numpy arrays or tensors of shape: 1. (T, N) where T is the number of samples if X are multivariate series. 2. A list of 2D covariance matrices with shape (N, N).

  • order (int, optional) – The order of the n-plets. If None, it will be set to N.

  • covmat_precomputed (bool, optional) – A boolean flag to indicate if the input data is a list of covariance matrices or multivariate series. Default is False.

  • T (int or list of int, optional) – A list of integers indicating the number of samples for each multivariate series. Default is None.

  • initial_solution (torch.Tensor, optional) – The initial solution with shape (repeat, order). If None, a random initial solution is generated.

  • repeat (int, optional) – The number of repetitions to do to obtain different solutions starting from less optimal initial solutions. Default is 10.

  • batch_size (int, optional) – The batch size to use for the computation. Default is 1,000,000.

  • device (torch.device, optional) – The device to use for the computation. Default is ‘cpu’.

  • max_iterations (int, optional) – The maximum number of iterations for the simulated annealing algorithm. Default is 1000.

  • early_stop (int, optional) – The number of iterations with no improvement to stop early. Default is 100.

  • initial_temp (float, optional) – The initial temperature for the simulated annealing algorithm. Default is 100.0.

  • cooling_rate (float, optional) – The cooling rate for the simulated annealing algorithm. Default is 0.99.

  • metric (Union[str, Callable], optional) – The metric to evaluate. One of ‘tc’, ‘dtc’, ‘o’, ‘s’ or a callable function. Default is ‘o’.

  • largest (bool, optional) – A flag to indicate if the metric is to be maximized or minimized. Default is False.

  • verbose (int, optional) – Logging verbosity level. Default is logging.INFO.

Returns:

  • best_solution (torch.Tensor) – The n-plets with the best score found with shape (repeat, order).

  • best_energy (torch.Tensor) – The best scores for the best n-plets with shape (repeat,).

Notes

  • The function uses a simulated annealing algorithm to iteratively find the best n-plets that maximize or minimize the specified metric.

  • The initial solutions are computed using the random_sampler function if not provided.

  • The function iterates over the remaining orders to get the best solution for each order.
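
Examples

A brief sketch of the two initialization options mentioned above; the custom initial solution is purely illustrative:

```python
import numpy as np
import torch
from thoi.heuristics import simulated_annealing

X = np.random.normal(0, 1, (1000, 10))

# Option 1: random initial solutions of the requested order (default behaviour)
best_nplets, best_scores = simulated_annealing(X, 5, repeat=10)

# Option 2: a user-provided initial solution with shape (repeat, order)
initial = torch.tensor([[0, 1, 2, 3, 4]] * 10)  # 10 repeats, order 5
best_nplets, best_scores = simulated_annealing(X, 5, repeat=10, initial_solution=initial)
```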

thoi.heuristics.simulated_annealing_multi_order.hot_encode_to_indexes(nplets)

Convert hot-encoded n-plets to index-based representation.

Parameters:

nplets (torch.Tensor) – The hot-encoded n-plets with shape (batch_size, N).

Returns:

The index-based representation of the n-plets with shape (batch_size, order).

Return type:

torch.Tensor
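
Examples

A small sketch; the values are illustrative, and all rows must have the same order for the output shape to hold:

```python
import torch
from thoi.heuristics.simulated_annealing_multi_order import hot_encode_to_indexes

# Two hot-encoded n-plets over N = 5 variables, both of order 3
nplets_hot = torch.tensor([[1, 0, 1, 1, 0],
                           [0, 1, 1, 0, 1]])

nplets_idx = hot_encode_to_indexes(nplets_hot)  # expected shape (batch_size, order) -> (2, 3)
```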

thoi.heuristics.simulated_annealing_multi_order.simulated_annealing_multi_order(X: ndarray | Tensor | List[ndarray] | List[Tensor], *, covmat_precomputed: bool = False, T: List[int] | int | None = None, initial_solution: Tensor | None = None, repeat: int = 10, batch_size: int = 1000000, device: device = device(type='cpu'), max_iterations: int = 1000, early_stop: int = 100, initial_temp: float = 100.0, cooling_rate: float = 0.99, step_size: int = 1, metric: str | Callable = 'o', largest: bool = False, verbose: int = 20) tuple[Tensor, Tensor]

Simulated annealing algorithm to find the multi-order n-plets that best optimize the metric for a given multivariate series or set of covariance matrices.

Parameters:
  • X (Union[np.ndarray, torch.Tensor, List[np.ndarray], List[torch.Tensor]]) – The input data to compute the n-plets. It can be a list of 2D numpy arrays or tensors of shape: 1. (T, N) where T is the number of samples if X are multivariate series. 2. A list of 2D covariance matrices with shape (N, N).

  • covmat_precomputed (bool, optional) – A boolean flag to indicate if the input data is a list of covariance matrices or multivariate series. Default is False.

  • T (int or list of int, optional) – A list of integers indicating the number of samples for each multivariate series. Default is None.

  • initial_solution (torch.Tensor, optional) – The initial solution with shape (repeat, N). If None, a random initial solution is generated.

  • repeat (int, optional) – The number of repetitions to do to obtain different solutions starting from less optimal initial solutions. Default is 10.

  • batch_size (int, optional) – The batch size to use for the computation. Default is 1,000,000.

  • device (torch.device, optional) – The device to use for the computation. Default is ‘cpu’.

  • max_iterations (int, optional) – The maximum number of iterations for the simulated annealing algorithm. Default is 1000.

  • early_stop (int, optional) – The number of iterations with no improvement to stop early. Default is 100.

  • initial_temp (float, optional) – The initial temperature for the simulated annealing algorithm. Default is 100.0.

  • cooling_rate (float, optional) – The cooling rate for the simulated annealing algorithm. Default is 0.99.

  • step_size (int, optional) – The number of elements to change in each step. Default is 1.

  • metric (Union[str, Callable], optional) – The metric to evaluate. One of ‘tc’, ‘dtc’, ‘o’, ‘s’ or a callable function. Default is ‘o’.

  • largest (bool, optional) – A flag to indicate if the metric is to be maximized or minimized. Default is False.

  • verbose (int, optional) – Logging verbosity level. Default is logging.INFO.

Returns:

  • best_solution (torch.Tensor) – The n-plets with the best score found with shape (repeat, N).

  • best_energy (torch.Tensor) – The best scores for the best n-plets with shape (repeat,).

Notes

  • The function uses a simulated annealing algorithm to iteratively find the best n-plets that maximize or minimize the specified metric.

  • The initial solutions are computed using the _random_solutions function if not provided.

  • The function iterates over the remaining orders to get the best solution for each order.
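
Examples

A minimal sketch; note that solutions are returned hot-encoded with shape (repeat, N), as described in the Returns section:

```python
import numpy as np
from thoi.heuristics.simulated_annealing_multi_order import simulated_annealing_multi_order

X = np.random.normal(0, 1, (1000, 10))

# Search across orders; each of the 10 solutions is a hot-encoded vector of length N = 10
best_solution, best_energy = simulated_annealing_multi_order(X, repeat=10)
```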