aphin.utils.data package

Submodules

aphin.utils.data.data module

Encapsulate data loading and data generation

class aphin.utils.data.data.Data(t, X, X_dt=None, U=None, Mu=None, J=None, R=None, Q=None, B=None)

Bases: ABC

Container class for datasets in a linear dynamical system framework.

This class is designed to store and manage various datasets required for system analysis and identification, including states, time derivatives, inputs, parameters, and port-Hamiltonian matrices.

property Data

Retrieve the state and derivative data from the container.

Returns:

tuple

A tuple containing:

  • X: States array with shape (n_sim, n_t, n_n, n_dn).

  • X_dt: Time derivatives of the states, with the same shape as X.

  • U: Input array with shape (n_sim, n_t, n_u), if available.

  • Mu: Parameters array with shape (n_sim, n_mu), if available.

calculate_errors(ph_identified_data_instance, domain_split_vals=None)

Calculate and store RMS and latent errors between true and predicted states.

This method computes the Root Mean Square (RMS) errors for the states and latent variables between the true and predicted values. It calculates the RMS errors for each domain, computes the mean RMS error across all simulations, and, if available, computes the latent variable errors.

Parameters:

ph_identified_data_instance : Data

An instance of the Data class containing the predicted state variables and latent variables (if applicable). This instance is used to compute errors against the true states stored in the current instance.

domain_split_vals : list of int, optional

List specifying the number of degrees of freedom (DOFs) for each domain. If provided, it splits the states into domains for more granular error analysis.

Returns:

None

The method updates internal attributes to store the computed state errors and latent errors (if latent variables are present).

Notes:

  • self.state_error_list contains the RMS errors for each domain.

  • self.state_error_mean is the mean RMS error averaged over all simulations.

  • self.latent_error and self.latent_error_mean are computed only if the ph_identified_data_instance contains latent variables (Z).

calculate_latent_errors(ph_identified_data_instance)

Compute RMS errors for latent variables between true and predicted values.

This method calculates the Root Mean Square (RMS) error between the true latent variables (Z) and the predicted latent variables (Z_ph) from an identified data instance.

Parameters:

ph_identified_data_instance : instance of Data

An instance of the Data class containing the predicted latent variables (Z_ph) and true latent variables (Z) used for comparison.

Returns:

ndarray

Array of RMS errors for the latent variables, with shape consistent with the latent variable dimensions.

static calculate_rms_error(X_or_Z, X_or_Z_id)

Calculates the root mean square (RMS) error between the given dataset and its identified counterpart.

Parameters:

X_or_Z : ndarray

Array of states (X) or latent states (Z) with shape (n_sim, n_t, n_n, n_dn) or (n_sim, n_t, r), respectively.

X_or_Z_id : ndarray

Array of identified states (X_id) or identified latent states (Z_id) with the same shape as X_or_Z.

Returns:

norm_rms_error : ndarray

Array of normalized RMS errors with shape (n_sim, n_t).
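The following minimal sketch produces an error array with the documented shape (n_sim, n_t). The normalization is an assumption: the RMS of the state difference at each time step is divided by the largest RMS value of the reference trajectory in the same simulation, and the normalization actually used by calculate_rms_error may differ. normalized_rms_error is a hypothetical helper, not part of aphin.

```python
import numpy as np


def normalized_rms_error(X_or_Z, X_or_Z_id):
    """Hypothetical stand-in for Data.calculate_rms_error (normalization assumed)."""
    n_sim, n_t = X_or_Z.shape[:2]
    # Flatten all trailing axes: (n_sim, n_t, n_n, n_dn) or (n_sim, n_t, r) -> (n_sim, n_t, n_f)
    ref = X_or_Z.reshape(n_sim, n_t, -1)
    pred = X_or_Z_id.reshape(n_sim, n_t, -1)
    # Root mean square over the feature axis gives one value per (simulation, time step)
    rms_diff = np.sqrt(np.mean((ref - pred) ** 2, axis=-1))
    rms_ref = np.sqrt(np.mean(ref**2, axis=-1))
    # Assumed normalization: largest reference RMS of each simulation (guarding against zero)
    scale = np.max(rms_ref, axis=1, keepdims=True)
    return rms_diff / np.where(scale > 0, scale, 1.0)


rng = np.random.default_rng(0)
X = rng.standard_normal((3, 50, 4, 2))  # (n_sim, n_t, n_n, n_dn)
err = normalized_rms_error(X, X + 0.01)
print(err.shape)  # (3, 50)
```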

calculate_state_errors(ph_identified_data_instance, domain_split_vals=None)

Calculate the normalized RMS error and the relative error between true states and predicted states.

This method computes errors by comparing the true state values (self.X) with the predicted state values from an identified data instance (ph_identified_data_instance). Errors are calculated for each domain if specified.

Parameters:

ph_identified_data_instance : instance of Data

An instance of the Data class containing the predicted states used for comparison.

domain_split_vals : list of int, optional

List specifying the number of degrees of freedom in each domain. This is used to split the state arrays into different domains for error calculation. The sum of these values must match the total number of dofs per node (self.n_dn). If None, the entire state is considered as a single domain.

Returns:

list of ndarray

A list of normalized RMS errors for each domain. Each element of the list represents the RMS error for a specific domain, calculated as the difference between the true and predicted states, normalized by the total number of samples and features.

property data

Returns the stored datasets for time, states, state derivatives, inputs, and parameters.

Returns:

tuple

A tuple containing:

  • t: Array of time steps.

  • x: States array.

  • dx_dt: Time derivatives of the states.

  • u: Input array.

  • mu: Parameters array.

decrease_num_simulations(num_sim: int, seed=None)

Reduces the number of simulations to speed up training by selecting a random subset.

This method randomly selects a specified number of simulations from the available data, which can be useful for speeding up computations or experiments.

Parameters:

num_sim : int

The target number of simulations to retain. The data is reduced to this number of simulations.

seed : int, optional

Random seed for reproducibility. If provided, it initializes the random number generator.
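One way to implement the random selection is sketched below, assuming that axis 0 of every per-simulation array indexes the simulation and that the subset is drawn without replacement. The helper name select_simulation_subset and the use of NumPy's Generator API are illustrative choices, not necessarily what aphin does internally.

```python
import numpy as np


def select_simulation_subset(X, num_sim, seed=None, U=None, Mu=None):
    """Hypothetical helper: keep a random subset of num_sim simulations."""
    n_sim = X.shape[0]
    if not 0 < num_sim <= n_sim:
        raise ValueError(f"num_sim must be in [1, {n_sim}], got {num_sim}")
    rng = np.random.default_rng(seed)
    # Draw distinct indices and sort them so the original simulation order is kept
    idx = np.sort(rng.choice(n_sim, size=num_sim, replace=False))

    def take(A):
        # Optional arrays (U, Mu) may be missing
        return None if A is None else A[idx]

    return idx, X[idx], take(U), take(Mu)


X = np.arange(10 * 4 * 3 * 2, dtype=float).reshape(10, 4, 3, 2)
Mu = np.arange(10.0).reshape(10, 1)
idx, X_sub, U_sub, Mu_sub = select_simulation_subset(X, 4, seed=42, Mu=Mu)
```

Passing the same seed again yields the same indices, which is what makes the reduced training set reproducible.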

decrease_num_time_steps(num_time_steps: int)

Reduces the number of time steps in the dataset by truncating the time series.

This method selects a subset of time steps from the original time series to reduce its length. It ensures that the number of remaining time steps matches the specified target.

Parameters:

num_time_steps : int

The target number of time steps to retain. Must be less than or equal to the current number of time steps.

features_to_states()

Transforms the feature array back into the state array required for validation.

This method reshapes the feature arrays x and dx_dt back into the original state arrays X and X_dt. The transformation restores the dimensions to match the number of simulations, time steps, nodes, and degrees of freedom per node.

Parameters:

None

The method uses internal attributes x, dx_dt, n_sim, n_t, n_n, and n_dn to perform the transformation.

Returns:

None

The method updates the internal state of the object with the reshaped state arrays X and X_dt.

filter_data(window=10, order=3, interp_equidis_t=False)

Apply Savitzky-Golay filtering to the state data to smooth it and compute the time derivatives.

This method filters the state data X and its derivative X_dt using the Savitzky-Golay filter to reduce noise and smooth the data. If interp_equidis_t is set to True, the method will first interpolate the data to equally spaced time points before applying the filter.

Parameters:

window : int, optional

The length of the filter window (i.e., the number of points used to calculate the smoothing). It must be an odd integer. Default is 10.

order : int, optional

The order of the polynomial used to fit the samples. It must be less than the window length. Default is 3.

interp_equidis_t : bool, optional

If True, interpolate the data to equally spaced time points before filtering. Default is False.
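The smoothing and differentiation step can be sketched with SciPy's savgol_filter, applied along the time axis (axis 1) of an array shaped (n_sim, n_t, n_n, n_dn). The sketch assumes equally spaced time steps, which is the situation interp_equidis_t=True is meant to produce; filter_states is a hypothetical helper, and the example uses an odd window of 11 points.

```python
import numpy as np
from scipy.signal import savgol_filter


def filter_states(X, t, window=11, order=3):
    """Hypothetical helper: smooth X along its time axis and differentiate it.

    Assumes t is equally spaced, so a single step size dt applies everywhere.
    """
    dt = t[1] - t[0]
    X_filtered = savgol_filter(X, window, order, axis=1)
    # deriv=1 with delta=dt returns the first time derivative of the local polynomial fits
    X_dt = savgol_filter(X, window, order, deriv=1, delta=dt, axis=1)
    return X_filtered, X_dt


t = np.linspace(0.0, 2.0 * np.pi, 201)
rng = np.random.default_rng(0)
clean = np.broadcast_to(np.sin(t)[None, :, None, None], (2, t.size, 3, 2)).copy()
X_noisy = clean + 0.01 * rng.standard_normal(clean.shape)
X_filtered, X_dt = filter_states(X_noisy, t)
```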

classmethod from_data(data_path, **kwargs)

Loads a dataset from a .npz file and creates an instance of the class.

This class method reads time steps, state data, and inputs from a .npz file and initializes an instance of the class using the loaded data. The file should contain arrays for time steps (t), states (X), and inputs (U), and optionally other parameters.

Parameters:

data_path : str

Path to the .npz file containing the dataset.

**kwargs : keyword arguments

Additional parameters to pass to the class constructor.

Returns:

instance of cls

An instance of the class initialized with the data from the .npz file and any additional parameters provided.

Notes:

The .npz file should contain the following arrays:

  • ‘t’: ndarray with shape (n_t,)

  • ‘X’: ndarray with shape (n_sim, n_t, n_n, n_dn)

  • ‘U’: ndarray with shape (n_sim, n_t, n_u)

get_initial_conditions()

Retrieve the initial conditions from the dataset.

Returns:

ndarray

Initial conditions with shape (n_sim, n_f), where n_f is derived from reshaping the data.
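Assuming the initial conditions are the first time step of every simulation, flattened in row-major (C) order consistent with the n_f = n_n * n_dn feature layout of states_to_features, the operation reduces to a slice and a reshape. initial_conditions is a hypothetical helper.

```python
import numpy as np


def initial_conditions(X):
    """Hypothetical helper: states at t[0] for each simulation, shape (n_sim, n_f)."""
    n_sim = X.shape[0]
    # X[:, 0] has shape (n_sim, n_n, n_dn); flatten the node and dof axes
    return X[:, 0].reshape(n_sim, -1)


X = np.arange(3 * 4 * 2 * 5).reshape(3, 4, 2, 5)  # (n_sim, n_t, n_n, n_dn)
x0 = initial_conditions(X)
print(x0.shape)  # (3, 10)
```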

property ph_matrices

Returns the port-Hamiltonian matrices used in the data container.

Returns:

tuple

A tuple containing:

  • J: Port-Hamiltonian interconnection matrix.

  • R: Port-Hamiltonian dissipation matrix.

  • Q: Port-Hamiltonian energy matrix.

  • B: Port-Hamiltonian input matrix.

static read_data_from_npz(data_path)

Reads data from a .npz file and returns it as a dictionary.

Parameters:

data_path : str

Path to the .npz file or the directory containing the .npz file. If a directory is provided, the method searches for the first .npz file in the directory.

Returns:

dict

Dictionary containing the following keys:

  • ‘t’: ndarray, array of time steps.

  • ‘X’: ndarray, states array.

  • ‘X_dt’: ndarray, time derivatives of the states (optional, may be None).

  • ‘U’: ndarray, input array (optional, may be None).

  • ‘Mu’: ndarray, parameters array (optional, may be None).

  • ‘J’: ndarray, pH interconnection matrix (optional, may be None).

  • ‘R’: ndarray, pH dissipation matrix (optional, may be None).

  • ‘Q’: ndarray, pH energy matrix (optional, may be None).

  • ‘B’: ndarray, pH input matrix (optional, may be None).
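A minimal sketch of the documented behavior, including the directory lookup and the None defaults for optional arrays, follows. It assumes that “first” means the alphabetically first .npz file (directory listing order is otherwise arbitrary) and that no stored array requires pickling; read_npz is a hypothetical stand-in, not the aphin implementation.

```python
import glob
import os
import tempfile

import numpy as np

KEYS = ("t", "X", "X_dt", "U", "Mu", "J", "R", "Q", "B")


def read_npz(data_path):
    """Hypothetical stand-in for Data.read_data_from_npz."""
    if os.path.isdir(data_path):
        # Assumption: pick the alphabetically first .npz file in the directory
        candidates = sorted(glob.glob(os.path.join(data_path, "*.npz")))
        if not candidates:
            raise FileNotFoundError(f"no .npz file found in {data_path}")
        data_path = candidates[0]
    with np.load(data_path) as npz:
        # Optional arrays that are not stored in the file are reported as None
        return {key: (npz[key] if key in npz.files else None) for key in KEYS}


with tempfile.TemporaryDirectory() as tmp:
    np.savez_compressed(
        os.path.join(tmp, "sim.npz"), t=np.linspace(0.0, 1.0, 5), X=np.zeros((2, 5, 3, 1))
    )
    data = read_npz(tmp)  # a directory works as well as a file path
```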

rescale_X()

Undo the scaling of states and time derivatives.

This method reverses the scaling applied to the state array X with shape (n_sim, n_t, n_n, n_dn) and the time derivatives X_dt. It must be called after scale_X has been performed to restore the original data values. The method assumes that scaling has been previously applied and will raise an error if no scaling has been performed.

Parameters:

None

Returns:

None

The method updates the internal state to reflect the rescaled data and sets self.is_scaled to False.

static save_data(data_path, t, X, U, Mu=None)

Saves time steps, state data, and inputs to a compressed .npz file.

This method saves the provided time steps, state data, inputs, and optional parameters to a .npz file for efficient storage and retrieval.

Parameters:

data_path : str

The path to the .npz file where the data will be saved.

t : ndarray

Array of time steps with shape (n_t,).

X : ndarray

Array of system states at all time steps with shape (n_sim, n_t, n_n, n_dn).

U : ndarray

Array of system inputs with shape (n_sim, n_t, n_u).

Mu : ndarray, optional

Array of parameters with shape (n_sim, n_mu). Default is None. If provided, it will be saved along with the other data.

Returns:

None

This method does not return any value. It saves the data to the specified file path.
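The storage step amounts to a single call to NumPy's savez_compressed. In this sketch, which is not the aphin implementation, arrays that are None are skipped rather than stored as object arrays, so the file can later be read with np.load's default allow_pickle=False; save_npz is a hypothetical helper.

```python
import os
import tempfile

import numpy as np


def save_npz(data_path, t, X, U, Mu=None):
    """Hypothetical stand-in for Data.save_data."""
    arrays = {"t": t, "X": X, "U": U, "Mu": Mu}
    # Skip missing arrays so that no object (pickled) arrays end up in the file
    np.savez_compressed(data_path, **{k: v for k, v in arrays.items() if v is not None})


t = np.linspace(0.0, 1.0, 11)
X = np.zeros((2, t.size, 3, 2))  # (n_sim, n_t, n_n, n_dn)
U = np.ones((2, t.size, 1))  # (n_sim, n_t, n_u)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dataset.npz")
    save_npz(path, t, X, U)
    with np.load(path) as npz:
        stored_keys = sorted(npz.files)
        X_loaded = npz["X"]
print(stored_keys)  # ['U', 'X', 't']
```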

save_state_traj_as_csv(path, dof=0, second_oder=False, filename='state_trajectories')

Save the state trajectories and their time derivatives as CSV files.

This method saves the state trajectories and, if applicable, their time derivatives to CSV files. The method can handle both first-order and second-order systems, with the option to exclude redundant derivative data for second-order systems.

Parameters:

path : str

Directory path where the CSV files will be saved.

dof : int, optional

Degree of freedom (dof) index to save. Default is 0.

second_oder : bool, optional

If True, the system is treated as second-order, and additional processing is applied to handle state derivatives accordingly. Default is False.

filename : str, optional

Base name for the output CSV files. Default is “state_trajectories”.

save_traj_as_csv(state, state_dt, n_dn, path, state_var='x', derivatives=['', '_dt'], filename='state_trajectories')

Save state trajectories and their time derivatives to a CSV file.

This method concatenates state trajectories with their time derivatives and saves them to a CSV file. The CSV file includes time steps, state variables, and their derivatives. The output file is saved in the specified directory with a filename that can be customized.

Parameters:

state : np.ndarray

Array of state trajectories with shape (n_sim, n_t, n_dn), where n_sim is the number of simulations, n_t is the number of time steps, and n_dn is the number of degrees of freedom per node.

state_dt : np.ndarray

Array of state time derivatives with shape (n_sim, n_t, n_dn), where n_sim, n_t, and n_dn are as defined above.

n_dn : int

Number of degrees of freedom per node.

path : str

Directory path where the CSV file will be saved.

state_var : str, optional

Prefix for the state variables in the CSV header. Default is “x”.

derivatives : list of str, optional

List of strings representing the derivatives to include in the header. Default is [“”, “_dt”] for the state and its first time derivative.

filename : str, optional

Base name for the output CSV file. The file will be saved with this name and a “.csv” extension. Default is “state_trajectories”.
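The header construction can be illustrated for a single simulation. The column layout below (t, then every state component, then every derivative component, each named state_var, index, and derivative suffix) is an assumption modeled on the latent-trajectory format documented for save_latent_traj_as_csv; the exact layout written by save_traj_as_csv may differ, and write_traj_csv is hypothetical.

```python
import os
import tempfile

import numpy as np


def write_traj_csv(path, t, state, state_dt, state_var="x",
                   derivatives=("", "_dt"), filename="state_trajectories"):
    """Hypothetical helper: write one simulation; state and state_dt have shape (n_t, n_dn)."""
    if len(derivatives) != 2:
        raise ValueError("this sketch expects one suffix for the state and one for its derivative")
    n_dn = state.shape[1]
    # e.g. t, x_0, x_1, ..., x_0_dt, x_1_dt, ...
    header = ["t"] + [f"{state_var}_{i}{suffix}" for suffix in derivatives for i in range(n_dn)]
    table = np.column_stack([t, state, state_dt])
    file_path = os.path.join(path, f"{filename}.csv")
    # comments="" keeps NumPy from prefixing the header line with "# "
    np.savetxt(file_path, table, delimiter=",", header=",".join(header), comments="")
    return file_path


t = np.linspace(0.0, 1.0, 4)
state = np.column_stack([t, 2 * t])
state_dt = np.column_stack([np.ones_like(t), 2 * np.ones_like(t)])
with tempfile.TemporaryDirectory() as tmp:
    csv_path = write_traj_csv(tmp, t, state, state_dt)
    with open(csv_path) as f:
        header_line = f.readline().strip()
    table = np.loadtxt(csv_path, delimiter=",", skiprows=1)
print(header_line)  # t,x_0,x_1,x_0_dt,x_1_dt
```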

scale_Mu(mu_train_bounds=None, desired_bounds=[-1, 1])

Scale parameter values to the specified range and return the scaled values along with scaling factors.

This method scales the parameter values in Mu to fit within the desired_bounds range. Each parameter dimension is scaled individually. Scaling can be based on maximum values, a scalar, or provided training bounds.

Parameters:

mu_train_bounds : np.ndarray, optional

Scaling factors obtained from the training parameter dataset. Expected shape is (2, n_mu), where n_mu is the number of parameter dimensions in Mu. If not provided, scaling factors are computed from the data.

desired_bounds : list, scalar, or “max”, optional

Desired range for the scaled values. It can be:

  • A list of two values (e.g., [-1, 1]) specifying the lower and upper bounds.

  • A scalar, where all values are scaled by this single value.

  • The string “max”, where scaling is based on the maximum value in Mu.

Returns:

scaled_Mu : np.ndarray

The scaled values of Mu with the same shape as the input.

mu_train_bounds : np.ndarray

The computed or provided scaling factors used for scaling.

scale_U(u_train_bounds=None, desired_bounds=[-1, 1])

Scale input values to a specified range.

This method scales the input values U to the range defined by desired_bounds, which is typically [-1, 1]. Each dimension of the input is scaled individually. The scaling factors are either provided through u_train_bounds or computed based on the provided data.

Parameters:

u_train_bounds : np.ndarray, optional

Scaling factors obtained from the training input dataset. Expected shape is (2, n_u), where n_u is the number of input dimensions. If provided, these factors are used for scaling the input values.

desired_bounds : list, scalar, or “max”, optional

Desired range for the scaled inputs. It can be:

  • A list of two values (e.g., [-1, 1]) specifying the lower and upper bounds.

  • A scalar, where all input values are scaled by this value.

  • The string “max”, where all input values are scaled by the maximum value observed.

Returns:

None

The method updates the internal state by scaling the input values U, setting self.u_train_bounds, and updating self.desired_bounds_u. The scaled inputs are reshaped into feature format.

scale_X(scaling_values=None, domain_split_vals=None)

Scale the state array based on specified scaling values.

This method scales the state array X with shape (n_sim, n_t, n_n, n_dn) and afterwards the feature array x with shape (n_s, n_f), where n_s = n_sim * n_t and n_f = n_n * n_dn, using the provided scaling values. It can handle multiple domains if specified by domain_split_vals. If scaling values are not provided, it defaults to scaling by the maximum value in each domain.

Parameters:

scaling_values : list of float, optional

Scalar values used to scale each domain, as defined by domain_split_vals. If None, scaling is performed by the maximum value of each domain.

domain_split_vals : list of int, optional

List of integers specifying the number of degrees of freedom (DOFs) for each domain. The sum of these values must equal n_dn. If None, the data is treated as a single domain.

Returns:

None

The method updates the internal state to reflect the scaled data and sets self.is_scaled to True.

Notes:

  • The method performs scaling for both the state data (X) and its time derivatives (X_dt).

  • If scaling_values are not provided, the maximum value for each domain is used for scaling.

  • The method assumes that the domains defined by domain_split_vals add up to the total number of DOFs (n_dn).

  • After scaling, the feature representation of states is updated using self.states_to_features().
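A sketch of per-domain scaling and of its inverse (the operation rescale_X undoes) follows. It assumes that “maximum value” means the maximum absolute value of each domain and that domains are contiguous blocks along the last (DOF) axis. The same factors would apply to X_dt, since d(x/s)/dt = (dx/dt)/s for a constant s. scale_states and rescale_states are hypothetical helpers.

```python
import numpy as np


def _domain_bounds(domain_split_vals, n_dn):
    if sum(domain_split_vals) != n_dn:
        raise ValueError("domain_split_vals must sum to n_dn")
    # Split points between consecutive domains, e.g. [1, 2, 2] -> [1, 3]
    return np.cumsum(domain_split_vals)[:-1]


def scale_states(X, domain_split_vals=None, scaling_values=None):
    """Hypothetical sketch: divide each domain of X by one scalar."""
    n_dn = X.shape[-1]
    domain_split_vals = [n_dn] if domain_split_vals is None else domain_split_vals
    domains = np.split(X, _domain_bounds(domain_split_vals, n_dn), axis=-1)
    if scaling_values is None:
        # Assumption: scale each domain by its maximum absolute value
        scaling_values = [float(np.max(np.abs(d))) for d in domains]
    scaled = np.concatenate([d / s for d, s in zip(domains, scaling_values)], axis=-1)
    return scaled, list(scaling_values)


def rescale_states(X_scaled, domain_split_vals, scaling_values):
    """Inverse of scale_states: multiply each domain by its factor again."""
    bounds = _domain_bounds(domain_split_vals, X_scaled.shape[-1])
    domains = np.split(X_scaled, bounds, axis=-1)
    return np.concatenate([d * s for d, s in zip(domains, scaling_values)], axis=-1)


rng = np.random.default_rng(1)
X = rng.standard_normal((2, 6, 3, 5))  # (n_sim, n_t, n_n, n_dn)
X_scaled, factors = scale_states(X, domain_split_vals=[1, 2, 2])
X_restored = rescale_states(X_scaled, [1, 2, 2], factors)
```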

scale_all(scaling_values=None, domain_split_vals=None, u_train_bounds=None, u_desired_bounds=[-1, 1], mu_train_bounds=None, mu_desired_bounds=[-1, 1])

Scale the states X, the inputs, and the parameters.

Parameters:

scaling_values : list of float, optional

See scale_X.

domain_split_vals : list of int, optional

See scale_X.

scale_quantity(Quantity, train_bounds=None, desired_bounds=[-1, 1])

Scale a given quantity to a specified range and return the scaled values along with scaling factors.

This method scales the input Quantity values to fit within the desired_bounds range. Each dimension of the input is scaled individually. The scaling can be based on maximum values, a scalar, or training bounds if provided.

Parameters:

Quantity : np.ndarray

The data to be scaled. It can be of shape (n_sim, n_quantity) for time-independent data or (n_sim, n_t, n_quantity) for time-dependent data, where:

  • n_sim: number of simulations

  • n_t: number of time steps

  • n_quantity: number of quantities to be scaled

train_bounds : np.ndarray, optional

Scaling factors obtained from the training dataset. Expected shape is (2, n_quantity), where n_quantity is the number of dimensions in Quantity. If not provided, scaling factors are computed from the data.

desired_bounds : list, scalar, or “max”, optional

Desired range for the scaled values. It can be:

  • A list of two values (e.g., [-1, 1]) specifying the lower and upper bounds.

  • A scalar, where all values are scaled by this single value.

  • The string “max”, where scaling is based on the maximum value in Quantity.

Returns:

scaled_Quantity : np.ndarray

The scaled values of Quantity with the same shape as the input.

train_bounds : np.ndarray

The computed or provided scaling factors used for scaling.

desired_bounds : list

The bounds used for scaling, as provided or computed.
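For the list-of-bounds case, one plausible reading of the documented (2, n_quantity) train_bounds is a row of per-dimension minima and a row of maxima, which gives the min-max mapping sketched below. That interpretation is an assumption, the scalar and “max” modes are not covered, and minmax_scale is a hypothetical helper. Reusing the training bounds for test data keeps both sets on the same scale.

```python
import numpy as np


def minmax_scale(Q, train_bounds=None, desired_bounds=(-1.0, 1.0)):
    """Hypothetical sketch: map each dimension of Q (last axis) to desired_bounds.

    Q may have shape (n_sim, n_quantity) or (n_sim, n_t, n_quantity).
    """
    if train_bounds is None:
        flat = Q.reshape(-1, Q.shape[-1])  # all samples of each dimension in one column
        train_bounds = np.stack([flat.min(axis=0), flat.max(axis=0)])  # (2, n_quantity)
    lower, upper = train_bounds
    a, b = desired_bounds
    # Constant dimensions would divide by zero; use a span of 1 for them
    span = np.where(upper > lower, upper - lower, 1.0)
    return a + (Q - lower) / span * (b - a), train_bounds


Mu_train = np.array([[0.0, 10.0], [2.0, 30.0], [4.0, 20.0]])  # (n_sim, n_mu)
Mu_train_scaled, mu_bounds = minmax_scale(Mu_train)
# Test parameters are scaled with the training bounds, not their own
Mu_test_scaled, _ = minmax_scale(np.array([[1.0, 20.0]]), train_bounds=mu_bounds)
```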

property shape

Return the shape of the dataset.

Returns:

tuple

A tuple containing:

  • n_sim: number of simulations

  • n_t: number of time steps

  • n_n: number of nodes

  • n_dn: number of degrees of freedom per node

  • n_u: number of inputs

  • n_mu: number of parameters

split_state_into_domains(domain_split_vals=None)

Splits the state array into different domains based on the specified dimensions.

This method divides the state array X into multiple domains according to the given domain_split_vals. Each domain corresponds to a subset of degrees of freedom (dofs), and the sum of these values must equal the total number of dofs per node (self.n_dn).

Parameters:

domain_split_vals : list of int, optional

List specifying the number of dofs in each domain. For example, [1, 2, 2] indicates three domains with 1, 2, and 2 dofs, respectively. The sum of these values must equal self.n_dn. If None, the entire state is considered as a single domain.

Returns:

list of ndarray

A list where each element is a state array corresponding to a specific domain. The length of the list equals the number of domains specified by domain_split_vals.
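Because the domains are given as DOF counts, the split points are the cumulative sums of domain_split_vals without the last entry, and numpy.split does the rest along the DOF axis. This sketch assumes the domains are contiguous along the last axis; split_into_domains is a hypothetical helper.

```python
import numpy as np


def split_into_domains(X, domain_split_vals=None):
    """Hypothetical sketch: split X along its last (dof) axis into domains."""
    n_dn = X.shape[-1]
    if domain_split_vals is None:
        return [X]  # the whole state is one domain
    if sum(domain_split_vals) != n_dn:
        raise ValueError(f"domain_split_vals must sum to n_dn={n_dn}")
    # [1, 2, 2] -> split points [1, 3]
    split_points = np.cumsum(domain_split_vals)[:-1]
    return np.split(X, split_points, axis=-1)


X = np.zeros((2, 4, 3, 5))  # (n_sim, n_t, n_n, n_dn)
domains = split_into_domains(X, [1, 2, 2])
print([d.shape for d in domains])  # [(2, 4, 3, 1), (2, 4, 3, 2), (2, 4, 3, 2)]
```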

states_to_features()

Transforms the state array into a feature array for identification purposes.

This method reshapes the state array X and its time derivatives X_dt into feature arrays that are required for system identification. It also reshapes the input array U and parameter array Mu if they are provided.

The transformation results in:

  • x: a feature array of shape (n_s, n_f), where n_s = n_sim * n_t is the number of samples and n_f = n_n * n_dn is the number of features

  • dx_dt: the time derivatives of the states, with the same shape as x

  • u: reshaped input array of shape (n_s, n_u), if U is provided

  • mu: reshaped parameters array of shape (n_s, n_mu), if Mu is provided

Returns:

None

The method updates the internal state of the object with the transformed feature arrays.
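The reshapes can be sketched in plain NumPy. Two assumptions are made here: samples are ordered simulation by simulation (row-major order, so sample k * n_t + i is time step i of simulation k), and the time-independent Mu is expanded to one row per sample by repetition. The last line shows the inverse reshape, which is the role of features_to_states.

```python
import numpy as np

n_sim, n_t, n_n, n_dn, n_u, n_mu = 2, 5, 3, 2, 1, 4
rng = np.random.default_rng(0)
X = rng.standard_normal((n_sim, n_t, n_n, n_dn))
X_dt = rng.standard_normal((n_sim, n_t, n_n, n_dn))
U = rng.standard_normal((n_sim, n_t, n_u))
Mu = rng.standard_normal((n_sim, n_mu))

n_s, n_f = n_sim * n_t, n_n * n_dn
x = X.reshape(n_s, n_f)  # states -> features
dx_dt = X_dt.reshape(n_s, n_f)
u = U.reshape(n_s, n_u)
# Mu has no time axis, so each simulation's parameters are repeated for its n_t samples
mu = np.repeat(Mu, n_t, axis=0)  # (n_s, n_mu)

# Inverse transformation (features -> states)
X_back = x.reshape(n_sim, n_t, n_n, n_dn)
```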

train_test_split(test_size, seed)

see Dataset.train_test_split

train_test_split_sim_idx(sim_idx_train, sim_idx_test)

see Dataset.train_test_split_sim_idx

truncate_time(trunc_time_ratio)

Truncates the time values of states for performing time generalization experiments.

This method shortens the time series of states and associated data based on the given ratio. The truncation is applied to time steps, states, time derivatives, inputs, and parameters, if they are provided.

Parameters:

trunc_time_ratio : float

Ratio of the time series to retain. A value of 1.0 means no truncation, while a value between 0 and 1 truncates the time series to the specified proportion of the total time steps.

Returns:

None

The method modifies the internal state of the object to reflect the truncated time series.
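A sketch of the truncation follows, assuming the number of retained steps is n_t * trunc_time_ratio rounded down and that only arrays with a time axis (t, X, X_dt, U) are sliced; parameters stored per simulation with shape (n_sim, n_mu) have no time axis and are left out of this sketch. truncate_time_sketch is hypothetical.

```python
import numpy as np


def truncate_time_sketch(t, X, trunc_time_ratio, X_dt=None, U=None):
    """Hypothetical sketch: keep the first part of every time series."""
    if not 0.0 < trunc_time_ratio <= 1.0:
        raise ValueError("trunc_time_ratio must be in (0, 1]")
    # Assumption: round down to a whole number of time steps
    n_keep = int(t.shape[0] * trunc_time_ratio)

    def cut(A):
        # Axis 1 is the time axis of X, X_dt and U
        return None if A is None else A[:, :n_keep]

    return t[:n_keep], cut(X), cut(X_dt), cut(U)


t = np.linspace(0.0, 1.0, 10)
X = np.zeros((3, t.size, 4, 2))
U = np.zeros((3, t.size, 1))
t_tr, X_tr, X_dt_tr, U_tr = truncate_time_sketch(t, X, 0.5, U=U)
print(t_tr.shape, X_tr.shape)  # (5,) (3, 5, 4, 2)
```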

class aphin.utils.data.data.LTIDataset(t, X, U=None, X_dt=None)

Bases: Data

Dataset class for Linear Time-Invariant (LTI) systems.

Inherits from the Data class and is designed to handle datasets specifically for LTI systems. This class extends the functionality of the base Data class by including data that might be used for LTI system identification or analysis.

class aphin.utils.data.data.PHIdentifiedData(t, X, X_dt=None, U=None, Mu=None, x_ph=None, dx_dt_ph=None, z=None, Z=None, z_dt=None, Z_dt=None, x_rec=None, X_rec=None, x_rec_dt=None, X_rec_dt=None, z_dt_ph_map=None, Z_dt_ph_map=None, z_ph=None, Z_ph=None, z_dt_ph=None, Z_dt_ph=None, H_ph=None, n_red=None, J=None, R=None, B=None, Q=None, solving_times=None, **kwargs)

Bases: Data

Class representing the identified port-Hamiltonian data which was obtained through the aphin framework.

classmethod from_identification(data, system_layer, ph_network, integrator_type='IMR', decomp_option='lu', **kwargs)

Create an instance of PHIdentifiedData from the identified pH system.

This method initializes an instance of the PHIdentifiedData class using results obtained from an identified port-Hamiltonian (pH) system. It calculates latent variables, their time derivatives, and reconstructed states, and performs simulation using the identified pH system. The method supports both APHIN and PHIN networks.

Parameters:
  • cls (type) – The class to instantiate.

  • data (Data) – The dataset object containing the initial conditions and other data.

  • system_layer (SystemLayer) – The system layer object used to extract system matrices.

  • ph_network (APHIN or PHIN) – The pH network object used for encoding, reconstructing, and calculating time derivatives.

  • integrator_type (str, optional) – The type of integrator used for simulation (default is “IMR”).

  • decomp_option (str, optional) – The decomposition option for solving the system (default is “lu”).

  • **kwargs (dict) – Additional keyword arguments to pass to the PHIdentifiedData class constructor.

Returns:

An instance of the PHIdentifiedData class initialized with results from the identified pH system.

Return type:

PHIdentifiedData

static obtain_ph_data(data, ph_network, system_layer, J_ph, R_ph, B_ph, Q_ph, integrator_type, decomp_option)

Obtain the port-Hamiltonian (pH) system data, including latent variables, time derivatives, reconstructed states, and Hamiltonian values.

This method computes the reduced trajectories for the identified pH system based on the provided data, pH network, and system parameters. It calculates latent variables and their time derivatives, reconstructs states and their time derivatives, and computes the Hamiltonian for each simulation.

Parameters:
  • data (Data) – Dataset object containing the input data and initial conditions.

  • ph_network (APHIN or PHIN) – Port-Hamiltonian autoencoder object used for encoding, reconstructing, and computing derivatives.

  • system_layer (SystemLayer) – System layer object defining the latent space dimensionality and configuration.

  • J_ph (numpy.ndarray) – System matrix J for the pH system.

  • R_ph (numpy.ndarray) – System matrix R for the pH system.

  • B_ph (numpy.ndarray or None) – System matrix B for the pH system; may be None.

  • Q_ph (numpy.ndarray or None) – System matrix Q for the pH system; may be None.

  • integrator_type (str) – Type of integrator used for solving the pH system.

  • decomp_option (str) – Decomposition option for solving the pH system.

Returns:

A tuple containing:

  • z_ph (numpy.ndarray) – Latent variables for each simulation.

  • dz_dt_ph (numpy.ndarray) – Time derivatives of the latent variables.

  • x_ph (numpy.ndarray) – Reconstructed states for each simulation.

  • dx_dt_ph (numpy.ndarray) – Time derivatives of the reconstructed states.

  • Z_ph (numpy.ndarray) – Latent variables reshaped into state array format.

  • Z_dt_ph (numpy.ndarray) – Time derivatives of the latent variables reshaped into state array format.

  • X_ph (numpy.ndarray) – Reconstructed states reshaped into state array format.

  • X_dt_ph (numpy.ndarray) – Time derivatives of the reconstructed states reshaped into state array format.

  • H_ph (numpy.ndarray) – Hamiltonian values for each simulation.

Return type:

tuple
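The simulation inside obtain_ph_data can be illustrated for a linear latent pH model dz/dt = (J - R) Q z + B u with Hamiltonian H(z) = 1/2 z^T Q z. This sketch assumes that integrator_type='IMR' denotes the implicit midpoint rule and that decomp_option='lu' denotes an LU factorization computed once and reused, which is possible because the step matrix is constant for equidistant time steps. simulate_imr is a hypothetical helper built on SciPy's lu_factor and lu_solve, not the aphin solver.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve


def simulate_imr(J, R, Q, B, z0, t, u=None):
    """Hypothetical sketch: implicit midpoint rule for dz/dt = (J - R) Q z + B u.

    Each step solves (I - dt/2 A) z_{k+1} = (I + dt/2 A) z_k + dt B u_mid with A = (J - R) Q.
    """
    n_t, r = t.shape[0], z0.shape[0]
    dt = t[1] - t[0]  # assumes equidistant time steps
    A = (J - R) @ Q
    eye = np.eye(r)
    lu_piv = lu_factor(eye - 0.5 * dt * A)  # factorized once, reused every step
    explicit_part = eye + 0.5 * dt * A
    Z = np.empty((n_t, r))
    Z[0] = z0
    for k in range(n_t - 1):
        rhs = explicit_part @ Z[k]
        if u is not None:
            # Approximate the input at the midpoint by averaging u_k and u_{k+1}
            rhs += dt * (B @ (0.5 * (u[k] + u[k + 1])))
        Z[k + 1] = lu_solve(lu_piv, rhs)
    # Hamiltonian H = 1/2 z^T Q z at every time step
    H = 0.5 * np.einsum("ti,ij,tj->t", Z, Q, Z)
    return Z, H


J = np.array([[0.0, 1.0], [-1.0, 0.0]])  # skew-symmetric interconnection
Q = np.eye(2)
B = np.zeros((2, 1))
t = np.linspace(0.0, 10.0, 1001)
z0 = np.array([1.0, 0.0])
_, H_lossless = simulate_imr(J, np.zeros((2, 2)), Q, B, z0, t)
_, H_damped = simulate_imr(J, 0.1 * np.eye(2), Q, B, z0, t)
```

For u = 0 and symmetric Q, the implicit midpoint rule conserves H up to rounding when R = 0, and with a positive semidefinite R the computed H never increases, so the discrete trajectory inherits the passivity of the continuous pH model.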

static obtain_ph_map_data(ph_network, z, data, n_f)

Obtain time derivatives of latent variables using the pH network and reshape them into state format.

This method calculates the time derivatives of latent variables using the pH network and reshapes them into a format consistent with the state arrays.

Parameters:
  • ph_network (PHNetwork) – pH network object used to compute time derivatives.

  • z (numpy.ndarray) – Latent variables for which the time derivatives are to be computed.

  • data (Data) – Dataset object containing input data and parameters.

  • n_f (int) – Number of latent variables (features) used in reshaping.

Returns:

A tuple containing:

  • z_dt_ph_map (numpy.ndarray) – Time derivatives of the latent variables as computed by the pH network.

  • Z_dt_ph_map (numpy.ndarray) – Time derivatives of the latent variables reshaped into state array format.

Return type:

tuple

static obtain_results_from_ph_autoencoder(data, system_layer, ph_network)

Obtain relevant results from the identified port-Hamiltonian autoencoder (APHIN), including latent variables, time derivatives, and reconstructed states.

This method extracts and computes the following from the port-Hamiltonian autoencoder:

  • Latent variables and their time derivatives.

  • Reconstructed states and their time derivatives.

It also reshapes the feature arrays into state arrays suitable for further analysis.

Parameters:
  • data (Data) – Dataset object containing the input data used for encoding and reconstructing.

  • system_layer (SystemLayer) – System layer object defining the system’s latent space dimensionality and configuration.

  • ph_network (APHIN) – Port-Hamiltonian autoencoder object used for encoding, reconstructing, and computing time derivatives.

Returns:

A tuple containing:

  • z (numpy.ndarray) – Latent variables obtained from encoding the input data.

  • z_dt (numpy.ndarray) – Time derivatives of the latent variables.

  • x_rec (numpy.ndarray) – Reconstructed states from the latent variables.

  • x_rec_dt (numpy.ndarray) – Time derivatives of the reconstructed states.

  • Z (numpy.ndarray) – Latent variables reshaped into state array format.

  • Z_dt (numpy.ndarray) – Time derivatives of the latent variables reshaped into state array format.

  • X_rec (numpy.ndarray) – Reconstructed states reshaped into state array format.

  • X_rec_dt (numpy.ndarray) – Time derivatives of the reconstructed states reshaped into state array format.

Return type:

tuple

static obtain_results_from_ph_network(data, system_layer)

Obtain relevant results from the identified pH network, including state variables and time derivatives.

This method extracts and reshapes the state variables and their time derivatives from the provided dataset. It also prepares placeholders for the reconstructed states and their time derivatives to ensure conformity with the expected output format.

Parameters:
  • data (Data) – Dataset object containing the input data and time derivatives.

  • system_layer (SystemLayer) – System layer object defining the latent space dimensionality.

Returns:

A tuple containing:

  • z (numpy.ndarray) – Latent variables (high-dimensional states).

  • z_dt (numpy.ndarray) – Time derivatives of the latent variables.

  • x_rec (numpy.ndarray) – Reconstructed states (if applicable, otherwise None).

  • x_rec_dt (numpy.ndarray) – Time derivatives of the reconstructed states (if applicable, otherwise None).

  • Z (numpy.ndarray) – Latent variables reshaped into state array format.

  • Z_dt (numpy.ndarray) – Time derivatives of the latent variables reshaped into state array format.

  • X_rec (numpy.ndarray) – Reconstructed states reshaped into state array format (if applicable, otherwise None).

  • X_rec_dt (numpy.ndarray) – Time derivatives of the reconstructed states reshaped into state array format (if applicable, otherwise None).

Return type:

tuple

save_latent_traj_as_csv(path, filename='latent_trajectories')

Save latent trajectories to a CSV file with the following format: t, z_0_isim, z_1_isim, …, z_{r-1}_isim, z_0_dt_isim, z_1_dt_isim, …, z_{r-1}_dt_isim

This method saves both the predicted latent trajectories (Z_ph) and their time derivatives (Z_dt_ph) as well as the reference latent trajectories (Z) and their time derivatives (Z_dt) to CSV files. Each file includes the time vector t, followed by the latent variables and their time derivatives.

Parameters:
  • path (str) – The directory path where the CSV files will be saved.

  • filename (str, optional) – The base filename for the CSV files. The default is “latent_trajectories”. Additional suffixes will be added to differentiate between predicted and reference trajectories.

aphin.utils.data.dataset module

Encapsulate data loading and data generation

class aphin.utils.data.dataset.Dataset(t, X, X_dt, U=None, Mu=None, J=None, R=None, Q=None, B=None)

Bases: Data

Container for multiple datasets (train and test) with states, inputs, and parameters.

Parameters:

t : ndarray

Array of time steps.

X : ndarray

States array with shape (n_sim, n_t, n_n, n_dn).

X_dt : ndarray

Time derivatives of the states, with the same shape as X.

U : ndarray, optional

Input array with shape (n_sim, n_t, n_u). Default is None.

Mu : ndarray, optional

Parameters array with shape (n_sim, n_mu). Default is None.

J : ndarray, optional

pH interconnection matrix with shape (r, r, n_sim). Default is None.

R : ndarray, optional

pH dissipation matrix with shape (r, r, n_sim). Default is None.

Q : ndarray, optional

pH energy matrix with shape (r, r, n_sim). Default is None.

B : ndarray, optional

pH input matrix with shape (r, n_u, n_sim). Default is None.

Attributes:

TRAIN : Dataset or None

Training dataset. Initialized to None.

TEST : Dataset or None

Testing dataset. Initialized to None.

Notes:

  • Inherits from the Data class, which handles the initialization and validation of the time steps, states, inputs, and parameters.

  • This class is designed to facilitate the handling of multiple datasets, including training and testing datasets.

property Data

Retrieve the state and derivative data from the container for the training dataset.

Returns:

A tuple containing:

  • X: States array with shape (n_sim, n_t, n_n, n_dn).

  • X_dt: Time derivatives of the states, with the same shape as X.

  • U: Input array with shape (n_sim, n_t, n_u), if available.

  • Mu: Parameters array with shape (n_sim, n_mu), if available.

Return type:

tuple

property Data_test

Retrieve the state and derivative data from the test dataset.

Returns:

A tuple containing:

  • X: States array with shape (n_sim, n_t, n_n, n_dn) from the test dataset.

  • X_dt: Time derivatives of the states, with the same shape as X, from the test dataset.

  • U: Input array with shape (n_sim, n_t, n_u) from the test dataset, if available.

  • Mu: Parameters array with shape (n_sim, n_mu) from the test dataset, if available.

Return type:

tuple

calculate_errors(ph_identified_data_instance, domain_split_vals=None, save_to_txt=False, result_dir=None)[source]

Calculate and store RMS and latent errors between true and predicted states.

This method computes the Root Mean Square (RMS) errors for the states and latent variables between the true and predicted values. It calculates the RMS errors for each domain, computes the mean RMS error across all simulations, and, if available, computes the latent variable errors.

Parameters:

ph_identified_data_instance : Data

An instance of the Data class containing the predicted state variables and latent variables (if applicable). This instance is used to compute errors against the true states stored in the current instance.

domain_split_vals : list of int, optional

List specifying the number of degrees of freedom (DOFs) for each domain. If provided, it splits the states into domains for more granular error analysis.

Returns:

None

The method updates internal attributes to store the computed state errors and latent errors (if latent variables are present).

Notes:

  • self.state_error_list contains the RMS errors for each domain.

  • self.state_error_mean is the mean RMS error averaged over all simulations.

  • self.latent_error and self.latent_error_mean are computed only if the ph_identified_data_instance contains latent variables (Z).
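For illustration, the per-domain RMS error can be sketched in NumPy as below. The helper `rms_errors_per_domain` is hypothetical, and it assumes a plain (absolute) RMS error over the time steps, nodes, and DOFs of each simulation; the exact normalization used by calculate_errors is not documented here and may differ.

```python
import numpy as np


def rms_errors_per_domain(X_true, X_pred, domain_split_vals=None):
    """Hypothetical helper: absolute RMS error per domain and simulation.

    X_true, X_pred: arrays of shape (n_sim, n_t, n_n, n_dn).
    domain_split_vals: number of DOFs per domain; must sum to n_dn.
    Returns an array of shape (n_domains, n_sim).
    """
    n_dn = X_true.shape[-1]
    if domain_split_vals is None:
        domain_split_vals = [n_dn]  # treat the whole state as one domain
    if sum(domain_split_vals) != n_dn:
        raise ValueError("domain_split_vals must sum to n_dn")
    # Cut positions along the DOF axis, e.g. [1, 2] -> [1]
    cut_idx = np.cumsum(domain_split_vals)[:-1]
    errors = [
        # Average over time steps, nodes and DOFs: one value per simulation
        np.sqrt(np.mean((x_t - x_p) ** 2, axis=(1, 2, 3)))
        for x_t, x_p in zip(np.split(X_true, cut_idx, axis=-1),
                            np.split(X_pred, cut_idx, axis=-1))
    ]
    return np.stack(errors)


X_true = np.zeros((2, 5, 4, 3))
X_pred = X_true.copy()
X_pred[..., 0] = 2.0  # error only in the first domain (1 DOF)
err = rms_errors_per_domain(X_true, X_pred, domain_split_vals=[1, 2])
err_mean = err.mean(axis=1)  # mean over simulations, one value per domain
```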

property data

Get training data.

Returns:

The training dataset object. This property returns the data associated with the training dataset, which is accessible through self.TRAIN.data.

Return type:

Data

decrease_num_simulations(num_sim: int, seed=None)[source]

Reduce the number of training simulations to a specified target number by randomly selecting a subset.

Parameters:
  • num_sim (int) – The target number of simulations to retain.

  • seed (int, optional) – Random seed for reproducibility of the selection process. If not provided, a random seed is used.

Notes

  • This method only affects the training dataset (TRAIN), not the test dataset (TEST).
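The random subset selection can be sketched as follows, assuming NumPy's `default_rng` as the random source. `select_simulations` is a hypothetical helper, and whether aphin keeps the selected indices in their original order (as the sorting here does) is an assumption.

```python
import numpy as np


def select_simulations(X, num_sim, seed=None):
    """Hypothetical sketch: keep num_sim randomly chosen simulations of X.

    X has shape (n_sim, ...). The returned indices would be applied to
    X_dt, U and Mu as well so that all arrays stay aligned.
    """
    n_sim = X.shape[0]
    if not 0 < num_sim <= n_sim:
        raise ValueError(f"num_sim must lie in [1, {n_sim}]")
    rng = np.random.default_rng(seed)  # seed=None draws fresh entropy
    # Sample without replacement; sorting keeps the original order
    idx = np.sort(rng.choice(n_sim, size=num_sim, replace=False))
    return X[idx], idx


X = np.arange(10)[:, None, None, None] * np.ones((10, 4, 3, 2))
X_sub, idx = select_simulations(X, num_sim=4, seed=0)
```

Passing the same seed reproduces the same subset, which is what makes the seed argument useful for repeatable experiments.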

decrease_num_time_steps(num_time_steps: int)[source]

Truncate the number of time steps in both the training and testing datasets to the specified target.

Parameters:

num_time_steps (int) – The target number of time steps to retain in the datasets.

Notes

  • This method applies the truncation to both the training dataset (TRAIN) and the testing dataset (TEST).

features_to_states()[source]

Transforms the state array into a feature array for identification purposes.

This method reshapes the state array X and its time derivatives X_dt into feature arrays that are required for system identification. It also reshapes the input array U and parameter array Mu if they are provided.

The transformation results in:

  • x: a feature array of shape (n_s, n_f), where n_s = n_sim * n_t is the number of samples and n_f = n_n * n_dn is the number of features

  • u: reshaped input array of shape (n_s, n_u) if U is provided

  • mu: reshaped parameters array of shape (n_s, n_mu) if Mu is provided

Returns:

None

The method updates the internal state of the object with the transformed feature arrays.

Notes

  • The transformation applies to both the training and testing datasets.

property ph_matrices

Get port-Hamiltonian matrices from the training dataset.

Returns:

A tuple containing the port-Hamiltonian matrices associated with the training dataset, accessible through self.TRAIN.ph_matrices.

Return type:

tuple

property ph_matrices_test

Get port-Hamiltonian matrices from the testing dataset.

Returns:

A tuple containing the port-Hamiltonian matrices associated with the testing dataset, accessible through self.TEST.ph_matrices.

Return type:

tuple

rescale_X()[source]

Undo the scaling of states and time derivatives. The rescaling is applied to both the training and testing datasets.

This method reverses the scaling applied to the state array X with shape (n_sim, n_t, n_n, n_dn) and the time derivatives X_dt. It must be called after scale_X has been performed to restore the original data values. The method assumes that scaling has been previously applied and will raise an error if no scaling has been performed.

Parameters:

None

Returns:

None

The method updates the internal state to reflect the rescaled data and sets self.is_scaled to False.

scale_Mu(mu_train_bounds=None, desired_bounds=[-1, 1])[source]

Scale parameter values to the specified range. The scaling is applied to both the training and testing datasets.

This method scales the parameter values in Mu to fit within the desired_bounds range. Each parameter dimension is scaled individually. Scaling can be based on maximum values, a scalar, or provided training bounds.

Parameters:

desired_bounds : list, scalar, or “max”, optional

Desired range for the scaled values. It can be:

  • A list of two values (e.g., [-1, 1]) specifying the lower and upper bounds.

  • A scalar, where all values are scaled by this single value.

  • The string “max”, where scaling is based on the maximum value in Mu.

Default is [-1, 1].

mu_train_bounds : np.ndarray, optional

Scaling factors obtained from the training parameter dataset. Expected shape is (2, n_mu), where n_mu is the number of parameter dimensions in Mu. If not provided, the scaling factors are computed from the training data and reused for the testing dataset.

Returns:

None

The method updates the internal state to reflect the scaled parameters.
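For a list-valued desired_bounds, the most common form of this scaling is a per-dimension min-max map. The sketch below assumes that form (it covers only the list option, not the scalar or “max” options); `scale_to_bounds` is a hypothetical helper, not the aphin implementation. It shows why the training bounds are reused for the test set: test values are mapped with the training minima and maxima, so they can fall outside desired_bounds.

```python
import numpy as np


def scale_to_bounds(Mu, desired_bounds=(-1.0, 1.0), mu_train_bounds=None):
    """Hypothetical sketch: min-max scale each column of Mu (n_sim, n_mu).

    mu_train_bounds has shape (2, n_mu): row 0 holds the column minima,
    row 1 the column maxima. If None, they are computed from Mu itself.
    Constant columns (min == max) would need special handling (not shown).
    """
    if mu_train_bounds is None:
        mu_train_bounds = np.stack([Mu.min(axis=0), Mu.max(axis=0)])
    lo, hi = desired_bounds
    mu_min, mu_max = mu_train_bounds
    Mu_scaled = lo + (Mu - mu_min) * (hi - lo) / (mu_max - mu_min)
    return Mu_scaled, mu_train_bounds


Mu_train = np.array([[1.0, 10.0], [3.0, 30.0], [2.0, 20.0]])
Mu_test = np.array([[2.5, 25.0]])
Mu_train_s, bounds = scale_to_bounds(Mu_train)
# The test set reuses the training bounds instead of its own min/max
Mu_test_s, _ = scale_to_bounds(Mu_test, mu_train_bounds=bounds)
```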

scale_U(u_train_bounds=None, desired_bounds=[-1, 1])[source]

Scale input values to a specified range. The scaling is applied to both the training and testing datasets.

This method scales the input values U to the range defined by desired_bounds, which is typically [-1, 1]. Each dimension of the input is scaled individually. The scaling factors are either provided through u_train_bounds or computed based on the provided data.

Parameters:

u_train_bounds : np.ndarray, optional

Scaling factors obtained from the training input dataset. Expected shape is (2, n_u), where n_u is the number of input dimensions. If provided, these factors are used for scaling the input values. If not provided, the bounds are computed from the training data and reused for the testing dataset.

desired_bounds : list, scalar, or “max”, optional

Desired range for the scaled inputs. Can be:

  • A list of two values (e.g., [-1, 1]) to specify the lower and upper bounds.

  • A scalar to scale all input values by this value.

  • The string “max” to scale all input values by the maximum value observed.

Default is [-1, 1].

Returns:

None

The method updates the internal state by scaling the input values U, setting self.u_train_bounds, and updating self.desired_bounds_u. The scaled inputs are reshaped into feature format.
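Because U carries a time axis, per-dimension bounds have to be taken over both simulations and time steps. A minimal sketch of that reduction, assuming a min-max map to a list-valued desired_bounds; `input_bounds` and `scale_inputs` are hypothetical helpers.

```python
import numpy as np


def input_bounds(U):
    """Hypothetical sketch: per-input bounds of U with shape (n_sim, n_t, n_u).

    Minima and maxima are taken over all simulations and time steps,
    giving an array of shape (2, n_u) like u_train_bounds.
    """
    U_flat = U.reshape(-1, U.shape[-1])  # (n_sim * n_t, n_u)
    return np.stack([U_flat.min(axis=0), U_flat.max(axis=0)])


def scale_inputs(U, u_train_bounds, desired_bounds=(-1.0, 1.0)):
    """Map each input dimension linearly from u_train_bounds to desired_bounds."""
    lo, hi = desired_bounds
    u_min, u_max = u_train_bounds  # shape (n_u,) each, broadcast over (n_sim, n_t)
    return lo + (U - u_min) * (hi - lo) / (u_max - u_min)


U_train = np.array([[[0.0, -1.0], [1.0, 0.0], [2.0, 1.0]],
                    [[4.0, 0.0], [3.0, 0.0], [2.0, 0.0]]])  # (n_sim=2, n_t=3, n_u=2)
u_train_bounds = input_bounds(U_train)
U_scaled = scale_inputs(U_train, u_train_bounds)
```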

scale_X(scaling_values=None, domain_split_vals=None)[source]

Scale the state array based on specified scaling values. The scaling is applied to both the training and testing datasets.

This method scales the state array X with shape (n_sim, n_t, n_n, n_dn) and afterwards the feature array x with shape (n_sim * n_t, n_n * n_dn) using the provided scaling values. It can handle multiple domains if specified by domain_split_vals. If scaling values are not provided, it defaults to scaling by the maximum value in each domain.

Parameters:

scaling_values : list of float, optional

Scalar values used to scale each domain, as defined by domain_split_vals. If None, the scaling values are computed from the training dataset (maximum value of each domain) and reused for the testing dataset.

domain_split_vals : list of int, optional

List of integers specifying the number of degrees of freedom (DOFs) for each domain. The sum of these values must equal n_dn. If None, the data is treated as a single domain.

Returns:

None

The method updates the internal state to reflect the scaled data and sets self.is_scaled to True.

Notes:

  • The method performs scaling for both the state data (X) and its time derivatives (X_dt).

  • If scaling_values are not provided, the maximum value for each domain is used for scaling.

  • The method assumes that the domains defined by domain_split_vals add up to the total number of DOFs (n_dn).

  • After scaling, the feature representation of states is updated using self.states_to_features().
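A minimal sketch of per-domain scaling and its inverse, assuming that “maximum value” means the maximum absolute value of X within each domain and that X_dt is divided by the same factor as X. `scale_states` and `rescale_states` are hypothetical helpers, not the scale_X/rescale_X implementation.

```python
import numpy as np


def _dof_factors(scaling_values, domain_split_vals):
    # One scale factor per DOF, e.g. values [a, b] with splits [1, 2] -> [a, b, b]
    return np.repeat(np.asarray(scaling_values, dtype=float), domain_split_vals)


def scale_states(X, X_dt, domain_split_vals=None, scaling_values=None):
    """Hypothetical sketch: divide each domain of X and X_dt by one scalar.

    X, X_dt: shape (n_sim, n_t, n_n, n_dn). If scaling_values is None,
    the maximum absolute value of X within each domain is used.
    """
    n_dn = X.shape[-1]
    if domain_split_vals is None:
        domain_split_vals = [n_dn]
    if sum(domain_split_vals) != n_dn:
        raise ValueError("domain_split_vals must sum to n_dn")
    edges = np.concatenate([[0], np.cumsum(domain_split_vals)])
    if scaling_values is None:
        scaling_values = [np.abs(X[..., a:b]).max() for a, b in zip(edges[:-1], edges[1:])]
    factors = _dof_factors(scaling_values, domain_split_vals)
    # X_dt uses the same factors, since d(x / s)/dt = (dx/dt) / s
    return X / factors, X_dt / factors, list(scaling_values)


def rescale_states(X_s, X_dt_s, domain_split_vals, scaling_values):
    """Undo scale_states by multiplying with the same per-DOF factors."""
    factors = _dof_factors(scaling_values, domain_split_vals)
    return X_s * factors, X_dt_s * factors


X = np.arange(36, dtype=float).reshape(2, 3, 2, 3) - 10.0
X_dt = np.ones_like(X)
X_s, X_dt_s, s = scale_states(X, X_dt, domain_split_vals=[1, 2])
X_back, X_dt_back = rescale_states(X_s, X_dt_s, [1, 2], s)
```

Returning the scaling values is what makes the inverse possible; the same values would also be applied unchanged to the test data.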

scale_all(scaling_values=None, domain_split_vals=None, u_train_bounds=None, u_desired_bounds=[-1, 1], mu_train_bounds=None, mu_desired_bounds=[-1, 1])[source]

Scales states X, inputs U, and parameters Mu.

This method applies scaling to the states, inputs, and parameters of both the training and testing datasets. It first scales the state data using the specified scaling values and domain splits, then scales the inputs and parameters according to the given bounds.

Parameters:

scaling_values : list of float, optional

Scalar values used to scale each domain of the state data. If None, scaling is based on the maximum value of each domain. See scale_X for more details.

domain_split_vals : list of int, optional

List specifying the number of degrees of freedom (DOFs) in each domain. The sum of these values must equal the total number of DOFs per node (self.n_dn). If None, the data is treated as a single domain. See scale_X for more details.

u_train_bounds : np.ndarray, optional

Lower and upper bounds of the training input data U, with expected shape (2, n_u). If None, the bounds are automatically determined from the training data.

u_desired_bounds : list, scalar, or “max”, optional

Desired range for the scaled inputs. Can be:

  • A list of two values (e.g., [-1, 1]) to specify the lower and upper bounds.

  • A scalar to scale all input values by this value.

  • The string “max” to scale all input values by the maximum value observed.

Default is [-1, 1].

mu_train_bounds : np.ndarray, optional

Lower and upper bounds of the training parameter data Mu, with expected shape (2, n_mu). If None, the bounds are automatically determined from the training data.

mu_desired_bounds : list, scalar, or “max”, optional

Desired range for the scaled values. It can be:

  • A list of two values (e.g., [-1, 1]) specifying the lower and upper bounds.

  • A scalar, where all values are scaled by this single value.

  • The string “max”, where scaling is based on the maximum value in Mu.

Default is [-1, 1].

Returns:

None

The method updates the internal state, input, and parameter data to reflect the scaled values.

Notes:

  • This method scales the state data X, input data U, and parameter data Mu of both the training and testing datasets.

  • The state scaling is performed by self.scale_X, which also updates the feature representation of the states.

  • The scaling for inputs and parameters is performed based on the specified or automatically determined bounds.

property shape

Return the shape of the training dataset.

Returns:

tuple

A tuple containing:

  • n_sim: number of simulations

  • n_t: number of time steps

  • n_n: number of nodes

  • n_dn: number of degrees of freedom per node

  • n_u: number of inputs

  • n_mu: number of parameters

property shape_test

Return the shape of the testing dataset.

Returns:

tuple

A tuple containing:

  • n_sim: number of simulations

  • n_t: number of time steps

  • n_n: number of nodes

  • n_dn: number of degrees of freedom per node

  • n_u: number of inputs

  • n_mu: number of parameters

split_state_into_domains(domain_split_vals)[source]

Splits the state array into different domains based on the specified dimensions. The splitting is applied to both the training and testing datasets.

This method divides the state array X into multiple domains according to the given domain_split_vals. Each domain corresponds to a subset of degrees of freedom (dofs), and the sum of these values must equal the total number of dofs per node (self.n_dn).

Parameters:

domain_split_vals : list of int, optional

List specifying the number of dofs in each domain. For example, [1, 2, 2] indicates three domains with 1, 2, and 2 dofs, respectively. The sum of these values must equal self.n_dn. If None, the entire state is considered as a single domain.

Returns:

X_dom_list: list of ndarray

A list where each element is a state array of the training dataset corresponding to a specific domain. The length of the list equals the number of domains specified by domain_split_vals.

X_dom_test_list: list of ndarray

A list where each element is a state array of the testing dataset corresponding to a specific domain. The length of the list equals the number of domains specified by domain_split_vals.
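The domain split maps directly onto np.split along the last axis once the DOF counts are turned into cut positions. A sketch with a hypothetical helper `split_into_domains`, applied to a single array (the method applies the same split to the training and testing states):

```python
import numpy as np


def split_into_domains(X, domain_split_vals=None):
    """Hypothetical sketch: split X (n_sim, n_t, n_n, n_dn) along the DOF axis."""
    n_dn = X.shape[-1]
    if domain_split_vals is None:
        return [X]  # whole state as a single domain
    if sum(domain_split_vals) != n_dn:
        raise ValueError(f"domain_split_vals must sum to n_dn={n_dn}")
    # np.split expects cut positions, e.g. [1, 2, 2] -> [1, 3]
    cut_idx = np.cumsum(domain_split_vals)[:-1]
    return np.split(X, cut_idx, axis=-1)


X = np.zeros((2, 4, 3, 5))  # n_dn = 5
X_dom_list = split_into_domains(X, [1, 2, 2])
```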

states_to_features()[source]

Transforms the state array into a feature array for identification purposes.

This method reshapes the state array X and its time derivatives X_dt into feature arrays that are required for system identification. It also reshapes the input array U and parameter array Mu if they are provided.

The transformation results in:

  • x: a feature array of shape (n_s, n_f), where n_s = n_sim * n_t is the number of samples and n_f = n_n * n_dn is the number of features

  • u: reshaped input array of shape (n_s, n_u) if U is provided

  • mu: reshaped parameters array of shape (n_s, n_mu) if Mu is provided

Returns:

None

The method updates the internal state of the object with the transformed feature arrays.

Notes

  • The transformation applies to both the training and testing datasets.
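These reshapes can be sketched in NumPy, assuming a C-order (simulation-major) layout of the samples and assuming that Mu, which has no time axis, is repeated for every time step. All three helpers are hypothetical; the round trip shows that the inverse reshape recovers X unchanged.

```python
import numpy as np


def states_to_features(X):
    """Hypothetical sketch: (n_sim, n_t, n_n, n_dn) -> (n_sim * n_t, n_n * n_dn)."""
    n_sim, n_t, n_n, n_dn = X.shape
    # C-order reshape: row sim * n_t + t holds all nodes and DOFs of that snapshot
    return X.reshape(n_sim * n_t, n_n * n_dn)


def features_to_states(x, n_sim, n_t, n_n, n_dn):
    """Inverse reshape back to (n_sim, n_t, n_n, n_dn)."""
    return x.reshape(n_sim, n_t, n_n, n_dn)


def mu_to_features(Mu, n_t):
    """Repeat each simulation's parameters for all of its n_t time steps."""
    return np.repeat(Mu, n_t, axis=0)  # (n_sim, n_mu) -> (n_sim * n_t, n_mu)


X = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)
x = states_to_features(X)
mu = mu_to_features(np.array([[1.0, 2.0], [3.0, 4.0]]), n_t=3)
```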

property test_data

Get test data.

Returns:

The testing dataset object. This property returns the data associated with the testing dataset, which is accessible through self.TEST.data.

Return type:

Data

train_test_split(test_size, seed)[source]

Split the data into training and testing sets.

Parameters:
  • test_size (float or int) – Proportion of the dataset to include in the test split if a float, or the absolute number of test samples if an integer.

  • seed (int) – Random seed for reproducibility of the split.

Returns:

This method does not return any values. Instead, it assigns the split datasets to the TRAIN and TEST attributes of the instance.

Return type:

None
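A minimal sketch of a seeded split over whole simulations, with a hypothetical helper `split_sim_indices`. Splitting at the simulation level (so no trajectory contributes time steps to both sets) and the rounding rule for a float test_size are assumptions, not documented behaviour.

```python
import numpy as np


def split_sim_indices(n_sim, test_size, seed):
    """Hypothetical sketch: random train/test split over simulation indices.

    test_size: float -> fraction of simulations used for testing;
               int   -> absolute number of test simulations.
    """
    if isinstance(test_size, float):
        n_test = int(round(test_size * n_sim))  # rounding rule is an assumption
    else:
        n_test = int(test_size)
    if not 0 < n_test < n_sim:
        raise ValueError("each set needs at least one simulation")
    perm = np.random.default_rng(seed).permutation(n_sim)
    return np.sort(perm[n_test:]), np.sort(perm[:n_test])


idx_train, idx_test = split_sim_indices(10, test_size=0.3, seed=1)
```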

train_test_split_sim_idx(sim_idx_train, sim_idx_test)[source]

Manually split the data into training and testing sets based on simulation indices.

Parameters:
  • sim_idx_train (list or array-like) – List or array of indices for selecting the training simulations.

  • sim_idx_test (list or array-like) – List or array of indices for selecting the testing simulations.

Notes

  • The method ensures that there are no overlapping indices between the training and testing sets.

  • The method updates the TRAIN and TEST attributes with the corresponding subsets of data, including states, time derivatives, inputs, parameters, and pH matrices.
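When splitting manually, note that the state-like arrays store simulations on the first axis, while the pH matrices documented for this class (e.g. J with shape (r, r, n_sim)) store them on the last axis. A sketch with a hypothetical helper `split_by_sim_idx` that rejects overlapping indices and indexes both layouts:

```python
import numpy as np


def split_by_sim_idx(X, J, sim_idx_train, sim_idx_test):
    """Hypothetical sketch: select train/test simulations by explicit indices.

    X: (n_sim, n_t, n_n, n_dn), simulations on the first axis.
    J: (r, r, n_sim), simulations on the last axis (same for R, Q, B).
    """
    idx_train = np.asarray(sim_idx_train, dtype=int)
    idx_test = np.asarray(sim_idx_test, dtype=int)
    overlap = np.intersect1d(idx_train, idx_test)
    if overlap.size > 0:
        raise ValueError(f"indices used in both sets: {overlap.tolist()}")
    train = (X[idx_train], J[..., idx_train])
    test = (X[idx_test], J[..., idx_test])
    return train, test


X = np.arange(5.0)[:, None, None, None] * np.ones((5, 2, 1, 1))
J = np.ones((2, 2, 5)) * np.arange(5.0)  # J[..., k] == k
(X_tr, J_tr), (X_te, J_te) = split_by_sim_idx(X, J, [0, 2, 4], [1, 3])
```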

truncate_time(trunc_time_ratio)[source]

Truncate the time series of the states for time generalization experiments. Only the training data is truncated.

This method shortens the time series of states and associated data based on the given ratio. The truncation is applied to time steps, states, time derivatives, inputs, and parameters, if they are provided.

Parameters:

trunc_time_ratio : float

Ratio of the time series to retain. A value of 1.0 means no truncation, while a value between 0 and 1 truncates the time series to the specified proportion of the total time steps.

Returns:

None

The method modifies the internal state of the object to reflect the truncated time series.
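The truncation amounts to a slice along the time axis. The helper `keep_leading_fraction` below is hypothetical, and rounding the kept step count to the nearest integer is an assumption. Since the method truncates only the training data, the test set keeps the full horizon for evaluating time generalization.

```python
import numpy as np


def keep_leading_fraction(t, X, trunc_time_ratio):
    """Hypothetical sketch: keep the leading fraction of the time series.

    t: (n_t,) time steps; X: (n_sim, n_t, n_n, n_dn), time on axis 1.
    U (n_sim, n_t, n_u) would be sliced the same way, U[:, :n_keep].
    """
    if not 0.0 < trunc_time_ratio <= 1.0:
        raise ValueError("trunc_time_ratio must lie in (0, 1]")
    # Rounding to the nearest step count is an assumption
    n_keep = max(1, int(round(trunc_time_ratio * len(t))))
    return t[:n_keep], X[:, :n_keep]


t = np.linspace(0.0, 1.0, 10)
X = np.zeros((2, 10, 3, 1))
t_tr, X_tr = keep_leading_fraction(t, X, trunc_time_ratio=0.5)
```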

class aphin.utils.data.dataset.DiscBrakeDataset(t, X, X_dt=None, U=None, Mu=None, use_velocities=False, **kwargs)[source]

Bases: Dataset

A dataset class specifically for handling data related to the linear thermoelastic disc brake model.

classmethod from_data(data_path, use_velocities=False, **kwargs)[source]

Reads data from a .npz file and returns it as a dictionary.

Parameters:

data_path : str

Path to the .npz file or the directory containing the .npz file. If a directory is provided, the method searches for the first .npz file in the directory.

Returns:

dict

Dictionary containing the following keys:

  • ‘t’: ndarray, array of time steps.

  • ‘X’: ndarray, states array.

  • ‘X_dt’: ndarray, time derivatives of the states (optional, may be None).

  • ‘U’: ndarray, input array (optional, may be None).

  • ‘Mu’: ndarray, parameters array (optional, may be None).

  • ‘J’: ndarray, pH interconnection matrix (optional, may be None).

  • ‘R’: ndarray, pH dissipation matrix (optional, may be None).

  • ‘Q’: ndarray, pH energy matrix (optional, may be None).

  • ‘B’: ndarray, pH input matrix (optional, may be None).

classmethod from_txt(txt_path, idx_mu=None, n_t=None, t_start=0.0, save_cache=False, cache_path=None, use_velocities=False, **kwargs)[source]

Load a disc brake dataset from .txt files generated by Abaqus and postprocessed with Abaqus-Python.

This method reads temperature and displacement values for all nodes from a specified directory containing .txt files obtained from the Abaqus field outputs. The method handles the parsing and processing of these files, including optional downsampling of time steps and extraction of parameters that influence the system. The order of the trajectories and parameters Mu may not match the sample numbers from the data files obtained after the postprocessing with Abaqus-Python.

Parameters:

txt_path : str

Path to the folder containing the .txt files.

idx_mu : array-like, optional

Index numbers of the columns corresponding to parameters (not inputs) that influence the system. If None, parameter extraction is skipped. Default is None.

n_t : int, optional

Number of time steps after downsampling. If None, all time steps are used. Default is None.

t_start : float, optional

The starting time for the data. Data before this time is discarded. Default is 0.0.

save_cache : bool, optional

If True, the processed dataset will be saved to a cache file at cache_path. Default is False.

cache_path : str, optional

Path where the cached dataset will be saved if save_cache is True. Default is None.

use_velocities : bool, optional

If True, the dataset will include velocity information by augmenting the state arrays. Default is False.

**kwargs : dict, optional

Additional arguments passed to the DiscBrakeDataset constructor.

Returns:

DiscBrakeDataset

An instance of the DiscBrakeDataset class containing the loaded and processed data.
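The time-step selection controlled by t_start and n_t can be sketched as below; this covers only the trimming and downsampling, not the parsing of the Abaqus .txt files. `select_time_steps` is hypothetical, and evenly spaced downsampling is an assumption.

```python
import numpy as np


def select_time_steps(t, t_start=0.0, n_t=None):
    """Hypothetical sketch: indices of the time steps kept after loading.

    Steps with t < t_start are discarded; if n_t is given, the remaining
    steps are downsampled to n_t evenly spaced indices that include the
    first and last retained step (an assumption about the downsampling).
    """
    idx = np.flatnonzero(t >= t_start)
    if n_t is not None and n_t < idx.size:
        pick = np.round(np.linspace(0, idx.size - 1, n_t)).astype(int)
        idx = idx[pick]
    return idx


t = np.arange(11, dtype=float)  # 0, 1, ..., 10
idx = select_time_steps(t, t_start=2.0, n_t=5)
```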

class aphin.utils.data.dataset.PHIdentifiedDataset[source]

Bases: Dataset

Class representing the identified port-Hamiltonian data obtained through the aphin framework. It stores the PHIdentifiedData instances of the training and testing datasets.

classmethod from_identification(data, system_layer, ph_network, integrator_type='IMR', decomp_option='lu', **kwargs)[source]

Create an instance of PHIdentifiedDataset from the identified pH system, with the results for the training and testing datasets stored under TRAIN and TEST.

This method generates both the training and testing datasets by processing the results obtained from a port-Hamiltonian identification procedure. It uses the specified system layer, pH network, integrator type, and decomposition option to compute the relevant data.

Parameters:

data : Dataset

The dataset containing the raw data for training and testing.

system_layer : PHLayer or PHQLayer

The port-Hamiltonian system layer that defines the system matrices.

ph_network : PHIN or APHIN

The network responsible for the identification of the pH system.

integrator_type : str, optional

The type of integrator to use for time integration (default is “IMR”).

decomp_option : str, optional

The decomposition option for the pH matrices (default is “lu”).

**kwargs : dict, optional

Additional parameters to pass to the identification process.

Returns:

PHIdentifiedDataset

An instance of PHIdentifiedDataset containing the training and testing datasets with the identified pH system results.

Notes:

  • This method logs the progress of obtaining results for both the training and testing datasets.

  • The PHIdentifiedData.from_identification method is called separately for training and testing datasets.
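For orientation, the default integrator_type “IMR” presumably refers to the implicit midpoint rule. The sketch below applies that rule to one simulation of a linear pH system of the assumed form dx/dt = (J - R) Q x + B u, using (r, r) slices of the matrices J, R, Q, B listed for Dataset. `imr_linear_ph` is a hypothetical helper, not the aphin integrator. For R = 0 and u = 0 the midpoint rule keeps the Hamiltonian H(x) = 0.5 x^T Q x constant (up to round-off), and for positive semidefinite R it can only decrease, which the usage lines check.

```python
import numpy as np


def imr_linear_ph(J, R, Q, B, x0, u, h):
    """Hypothetical sketch: implicit midpoint rule for dx/dt = (J - R) Q x + B u.

    J, R, Q: (r, r); B: (r, n_u); x0: (r,); u: (n_steps + 1, n_u) sampled on
    the time grid; h: time step. The midpoint input is approximated by the
    average of neighbouring samples (an assumption).
    """
    A = (J - R) @ Q
    eye = np.eye(A.shape[0])
    lhs = eye - 0.5 * h * A  # constant matrix; could be factorized once (e.g. LU)
    rhs_mat = eye + 0.5 * h * A
    x = np.empty((u.shape[0], A.shape[0]))
    x[0] = x0
    for k in range(u.shape[0] - 1):
        u_mid = 0.5 * (u[k] + u[k + 1])
        # (I - h/2 A) x_{k+1} = (I + h/2 A) x_k + h B u_{k+1/2}
        x[k + 1] = np.linalg.solve(lhs, rhs_mat @ x[k] + h * (B @ u_mid))
    return x


J = np.array([[0.0, 1.0], [-1.0, 0.0]])  # skew-symmetric interconnection
Q = np.eye(2)
B = np.zeros((2, 1))
u = np.zeros((101, 1))
x_cons = imr_linear_ph(J, np.zeros((2, 2)), Q, B, np.array([1.0, 0.0]), u, h=0.1)
x_diss = imr_linear_ph(J, 0.1 * np.eye(2), Q, B, np.array([1.0, 0.0]), u, h=0.1)
# Hamiltonian H(x) = 0.5 * x^T Q x along each trajectory
H_cons = 0.5 * np.einsum("ki,ij,kj->k", x_cons, Q, x_cons)
H_diss = 0.5 * np.einsum("ki,ij,kj->k", x_diss, Q, x_diss)
```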

Module contents