WPLSC

class pyplsc.WPLSC(boot_stat='score-covariate-corr', svd_method='lapack', random_state=None)

Within-participants PLSC (Roberts et al., 2016). Used for analyzing within-participants correlations. Cross-correlation matrices are computed within participants, averaged, and submitted to singular value decomposition.
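As a rough illustration of the pipeline described above, the following NumPy-only sketch computes one cross-correlation matrix per participant, averages them, and decomposes the average. It is not pyplsc's implementation: the helper cross_corr and all variable names are hypothetical, and details such as preprocessing and weighting are assumed away.

```python
import numpy as np

rng = np.random.default_rng(0)
n_var, n_cov = 6, 2
n_trials_per_ptpt = [10, 12, 9]

def cross_corr(covs, data):
    """Pearson correlation of each covariate column with each data column."""
    zc = (covs - covs.mean(axis=0)) / covs.std(axis=0)
    zd = (data - data.mean(axis=0)) / data.std(axis=0)
    return zc.T @ zd / len(data)

# One cross-correlation matrix per participant, each of shape (n_cov, n_var)
ptpt_corrs = [
    cross_corr(rng.normal(size=(n, n_cov)), rng.normal(size=(n, n_var)))
    for n in n_trials_per_ptpt
]
mean_corr = np.mean(ptpt_corrs, axis=0)  # unweighted average (an assumption)

# SVD of the average: left vectors play the role of design saliences,
# right vectors the role of data saliences
u, s, vt = np.linalg.svd(mean_corr, full_matrices=False)
design_sals, data_sals = u, vt.T
```

Here data_sals has shape (n. observed vars, n. latent vars), matching the shape documented for the data_sals_ property.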

Parameters:
  • boot_stat (str, optional) –

    Name of statistic to recompute on each bootstrap resample to get a confidence interval. Must be one of:

    • 'score-covariate-corr' (default): Correlations between covariates and data scores (i.e., output of transform()). Covariates and data may be original or resampled, but scores are always computed by multiplying data by data_sals_ (i.e., the saliences from the initial decomposition). This is what is computed in the original Matlab version of PLS.

    • 'condwise-scores': Condition-wise average data (original or resampled) multiplied by data_sals_.

  • svd_method (str, optional) –

    Method to use for singular value decomposition. Must be one of:

    • 'lapack' (default): use numpy.linalg.svd.

    • 'randomized': use sklearn.utils.extmath.randomized_svd.

  • random_state (int, optional) – Random state of model for reproducible permutation and bootstrap resampling. Passed to numpy.random.default_rng internally. Default is None.
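To make the 'score-covariate-corr' statistic more concrete, here is a minimal single-array sketch: data are multiplied by a fixed set of saliences, and the resulting scores are correlated with the covariates. The saliences below are random stand-ins, all names are hypothetical, and how pyplsc combines participants is not shown. The seeded generator illustrates why passing an integer to numpy.random.default_rng makes the draws repeatable.

```python
import numpy as np

rng = np.random.default_rng(42)  # the same seed always gives the same draws
n_trials, n_var, n_cov, n_lv = 20, 5, 2, 2

data = rng.normal(size=(n_trials, n_var))        # (n. trials, n. observed vars)
covariates = rng.normal(size=(n_trials, n_cov))  # (n. trials, n. covariates)
# Stand-in for data_sals_: orthonormal columns, not from a real decomposition
data_sals = np.linalg.qr(rng.normal(size=(n_var, n_lv)))[0]

scores = data @ data_sals  # data scores, (n. trials, n. latent vars)

# Correlate every covariate with every column of scores
full = np.corrcoef(covariates, scores, rowvar=False)
score_cov_corr = full[:n_cov, n_cov:]  # (n. covariates, n. latent vars)
```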

bootstrap(n_boot=5000, confint_level=0.95, alignment_method='rotate-design-sals', return_boot_stat_dist=True, n_jobs=1, print_prog=True)

Perform (stratified) bootstrap resampling to assess the reliability of the data saliences.

Parameters:
  • n_boot (int, optional) – Number of bootstrap resamples to compute. The default is 5000.

  • confint_level (float, optional) – The confidence level of the quantile-based confidence intervals to compute. The default is 0.95.

  • alignment_method (str, optional) –

    Method to be used for aligning recomputed data saliences with original data saliences. Must be one of:

    • 'rotate-design-sals' (default): Find the rotation that solves the orthogonal Procrustes problem to align the recomputed and original design saliences, then apply this to the recomputed data saliences. This is what is computed in the original Matlab version of PLS.

    • 'rotate-data-sals': Find the rotation that solves the orthogonal Procrustes problem to align the recomputed and original data saliences, then apply this to the recomputed data saliences.

    • 'flip-design-sals': Find the set of sign flips that ensures the inner products of the recomputed and original design saliences are positive, then apply these sign flips to the recomputed data saliences.

    • 'flip-data-sals': Find the set of sign flips that ensures the inner products of the recomputed and original data saliences are positive, then apply these sign flips to the recomputed data saliences.

    • 'none': Perform no alignment.

  • return_boot_stat_dist (bool, optional) – If True, distribution of boot_stat from resampling is returned. This is the distribution used to compute quantile-based confidence intervals. Default is True.

  • n_jobs (int, optional) – Number of parallel jobs to deploy to compute bootstrap resamples. -1 automatically deploys the maximum number of jobs. The default is 1.

  • print_prog (bool, optional) – Specifies whether to display a progress bar. Default is True.
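The two families of alignment methods listed under alignment_method can be sketched with NumPy alone. This is an illustration of the general techniques (orthogonal Procrustes rotation and column-wise sign flipping), not pyplsc's internal code; the function names procrustes_rotation and sign_flips are hypothetical.

```python
import numpy as np

def procrustes_rotation(resampled, original):
    """Orthogonal matrix R minimising ||resampled @ R - original||_F."""
    u, _, vt = np.linalg.svd(resampled.T @ original)
    return u @ vt

def sign_flips(resampled, original):
    """+1/-1 per column so each column pair gets a non-negative inner product."""
    inner = np.sum(resampled * original, axis=0)
    return np.where(inner < 0, -1.0, 1.0)

rng = np.random.default_rng(0)
design_sals = np.linalg.qr(rng.normal(size=(4, 3)))[0]  # stand-in saliences
data_sals = np.linalg.qr(rng.normal(size=(8, 3)))[0]

# Pretend a resample returned the same latent variables, reordered and sign-flipped
scramble = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, -1.0]])
design_boot = design_sals @ scramble
data_boot = data_sals @ scramble

# Like 'rotate-design-sals': rotation found on design saliences, applied to data saliences
rot = procrustes_rotation(design_boot, design_sals)
data_aligned = data_boot @ rot

# Like 'flip-data-sals': sign flips found on data saliences, applied to data saliences
flipped_input = data_sals * np.array([-1.0, 1.0, -1.0])
flips = sign_flips(flipped_input, data_sals)
data_flipped_back = flipped_input * flips
```

In this toy case the rotation undoes the scrambling exactly. Sign flipping can only repair sign changes, not reordering or mixing of latent variables, which is one reason a rotation-based default may be preferable.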

Returns:

design_resampled – If return_boot_stat_dist is True, the bootstrap distribution of the statistic named by boot_stat.

Return type:

numpy.ndarray

Examples

>>> mod.bootstrap(1000, n_jobs=-1)
>>> print(mod.data_sals_z_)
>>> print(mod.boot_stat_ci_[..., 0]) # Print CI of boot_stat for first LV

fit(data, covariates, design=None, within=None, participant=None, weighted=False)

Fit a within-participants PLSC model.

Parameters:
  • data (list) – List of participant-specific data arrays. Each should be a numpy.ndarray of shape (n. trials, n. observed vars).

  • covariates (list or str) – List of participant-specific covariates (in which case each list element must be a valid covariates argument to PLSC.fit), or the names of the columns in design that contain the covariates.

  • design (list, optional) – List of participant-specific design matrices. Each list element must be a valid design argument to PLSC.fit. The default is None.

  • within (list or str, optional) – List of participant-specific indicators of within-participant condition (in which case each list element must be a valid between argument to PLSC.fit), or the names of the columns in design that contain the within-participant condition indicators.

  • participant (list, optional) – A list of participant identifiers (integers or strings).

  • weighted (bool, optional) – Specifies whether participant-level cross-covariance matrices should be weighted by number of trials when averaged together. Default is False.

Return type:

None

Examples

>>> import numpy as np
>>> import pyplsc
>>> # Simulate null data
>>> n_var = 10
>>> ptptwise_n_trials = [10, 10, 9, 8, 12]
>>> data = [np.random.normal(size=(n_trials, n_var)) for n_trials in ptptwise_n_trials]
>>> covs = [np.random.normal(size=(n_trials, 1)) for n_trials in ptptwise_n_trials]
>>> # Fit model
>>> mod = pyplsc.WPLSC()
>>> mod.fit(data=data, covariates=covs, weighted=True)
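The effect of weighted=True can be pictured as replacing a plain mean over participants with a trial-count-weighted mean. The sketch below uses random stand-in matrices and assumes weights proportional to each participant's number of trials, normalised to sum to one; the exact normalisation used by pyplsc is not stated here, so treat it as an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
ptptwise_n_trials = np.array([10, 10, 9, 8, 12])

# Stand-in participant-level matrices: (n. participants, n. covariates, n. vars)
ptpt_mats = rng.normal(size=(len(ptptwise_n_trials), 1, 10))

# weighted=False: every participant counts equally
unweighted = ptpt_mats.mean(axis=0)

# weighted=True (assumed form): weights proportional to number of trials
weights = ptptwise_n_trials / ptptwise_n_trials.sum()
weighted = np.average(ptpt_mats, axis=0, weights=weights)
```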

flip_signs(lv_idx=None)

Flips the signs of one or more latent variables, to aid with interpretation.

Parameters:
  • lv_idx (indexer) – The index or indices of latent variables whose signs should be flipped. If None (default), signs are flipped for all latent variables.

Examples

>>> mod.flip_signs() # Flip all signs
>>> mod.flip_signs(0) # Flip signs for the first latent variable
>>> mod.flip_signs([0, 1]) # Flip signs for the first two latent variables

get_boot_stat_frame(lv_idx=None)

Get boot_stat as a dataframe, including upper and lower confidence limits if bootstrap resampling has been done.

Parameters:

lv_idx (indexer, optional) – Index of latent variable the dataframe should cover. The default is None, which yields a dataframe covering all latent variables.

Returns:

df – boot_stat as a dataframe.

Return type:

pandas.DataFrame

get_boot_stat_yerr(lv_idx)

Get yerr for statistic named by boot_stat that can be passed to a matplotlib bar plot.

Parameters:

lv_idx (int) – Integer indexing the latent variable of interest.

Returns:

yerr – 2D array with shape (2, n. design saliences) that can be passed to matplotlib’s pyplot.bar() as the yerr= argument.

Return type:

numpy.ndarray

Examples

>>> import matplotlib.pyplot
>>> # Make bar plot of boot_stat
>>> x = mod.design_sal_labels_['between']
>>> lv_idx = 0 # First latent variable
>>> height = mod.boot_stat_val_[:, lv_idx]
>>> yerr = mod.get_boot_stat_yerr(lv_idx)
>>> matplotlib.pyplot.bar(x=x, height=height, yerr=yerr)
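For context, matplotlib expects a (2, N) yerr array to hold distances from the bar height: the first row is the distance down to the lower limit and the second row the distance up to the upper limit. The sketch below builds such an array from hypothetical point estimates and confidence limits; the numbers and variable names are made up for illustration and do not come from a fitted model.

```python
import numpy as np

# Hypothetical point estimates and confidence limits for one latent variable
height = np.array([0.40, -0.10, 0.25])
ci_lower = np.array([0.20, -0.35, 0.05])
ci_upper = np.array([0.55, 0.10, 0.50])

# Row 0: distance from bar height down to the lower limit
# Row 1: distance from bar height up to the upper limit
yerr = np.vstack([height - ci_lower, ci_upper - height])
```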

get_scores_frame(lv_idx=None)

Get dataframe containing design and data scores for each trial.

Parameters:

lv_idx (indexer, optional) – Index of latent variable(s) for which to include design and data scores. The default is None, which includes scores for all latent variables.

Returns:

df – Dataframe containing design and data scores for each trial.

Return type:

pandas.DataFrame

Notes

Data is in long format, with a column specifying the latent variable corresponding to each score.

Examples

>>> mod.get_scores_frame().to_csv('scores.csv')

permute(n_perm=5000, return_null_dist=True, n_jobs=1, print_prog=True)

Perform permutation testing to assess the significance of the latent variables. After running this method, p values are available through the pvals_ property.

Parameters:
  • n_perm (int, optional) – Number of permutations to perform. The default is 5000.

  • return_null_dist (bool, optional) – If True, permutation samples will be returned as a 2D (n. perms, n. latent vars) array. Default is True.

  • n_jobs (int, optional) – Number of parallel jobs to deploy to compute permutations. -1 automatically deploys the maximum number of jobs. The default is 1.

  • print_prog (bool, optional) – Specifies whether to display a progress bar. Default is True.

Returns:

null_dist – 2D array containing null distribution of singular values, where each row is a different permutation and each column is a different singular value.

Return type:

numpy.ndarray

Examples

>>> mod.permute(n_perm=1000, n_jobs=-1)
>>> print(mod.pvals_)
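One common way to turn a null distribution of singular values into p values is to count, for each latent variable, how many permuted singular values are at least as large as the observed one. The sketch below uses simulated numbers and the "+1" correction, which is one widespread convention; whether pyplsc uses exactly this formula is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
n_perm = 1000

# Hypothetical observed singular values and simulated null distribution
observed = np.array([5.0, 1.2])                    # (n. latent vars,)
null_dist = rng.gamma(2.0, 0.6, size=(n_perm, 2))  # (n. perms, n. latent vars)

# Proportion of permutations at least as extreme as the observed value,
# with +1 in numerator and denominator so p is never exactly zero
pvals = (np.sum(null_dist >= observed, axis=0) + 1) / (n_perm + 1)
```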

transform(data=None, lv_idx=None)

Compute data scores, i.e., the coordinates of the data in the new basis defined by the latent variables, by multiplying a data array by the data saliences (the data_sals_ property).

Parameters:
  • data (numpy.ndarray, optional) – Data to transform. The default is None, which yields scores for the data on which the model was fit (the data_ property).

  • lv_idx (indexer, optional) – Index of latent variable(s) for which to compute scores. Default is None, which computes scores for all latent variables.

Returns:

data_scores – A 2D array of scores where rows correspond to different observations and columns correspond to different latent variables.

Return type:

numpy.ndarray

Examples

>>> scores = mod.transform() # Get scores for data used to fit model
>>> scores = mod.transform(new_data) # Get scores for new data
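At its core, the multiplication described above is a single matrix product. The following sketch uses random stand-in saliences and data, and ignores any preprocessing the library may apply to the data before multiplying (an assumption); selecting columns of the saliences plays the role of lv_idx.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for data_sals_: (n. observed vars, n. latent vars)
data_sals = np.linalg.qr(rng.normal(size=(10, 3)))[0]
new_data = rng.normal(size=(7, 10))  # (n. observations, n. observed vars)

scores_all = new_data @ data_sals                   # all latent variables
scores_first_two = new_data @ data_sals[:, [0, 1]]  # analogous to lv_idx=[0, 1]
```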

boot_stat

Name of statistic whose distribution is derived during bootstrap resampling.

Type:

str

boot_stat_ci_

Confidence interval on stat named by boot_stat derived from bootstrap resampling. CI level is determined by confint_level_. Set by bootstrap().

Type:

numpy.ndarray

boot_stat_val_

Point estimate from initial decomposition of statistic whose distribution is derived during bootstrap() resampling. Set by fit().

Type:

numpy.ndarray

confint_level_

Level of confidence interval on stat named by boot_stat to derive during bootstrap resampling (e.g., 0.95). Set by bootstrap().

Type:

float
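A quantile-based interval at a given confidence level takes the alpha/2 and 1 - alpha/2 quantiles of the bootstrap distribution, where alpha = 1 - confint_level. The sketch below applies this to a simulated distribution; the array layout and names are hypothetical and need not match boot_stat_ci_.

```python
import numpy as np

rng = np.random.default_rng(1)
confint_level = 0.95

# Simulated bootstrap distribution: (n. resamples, n. statistics)
boot_dist = rng.normal(loc=0.3, scale=0.1, size=(2000, 3))

alpha = 1 - confint_level
lower = np.quantile(boot_dist, alpha / 2, axis=0)
upper = np.quantile(boot_dist, 1 - alpha / 2, axis=0)
ci = np.stack([lower, upper])  # (2, n. statistics): lower limits, then upper limits
```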

data_

Data used to fit model. Set by fit().

Type:

numpy.ndarray

data_sals_

Right saliences/singular vectors used to compute data scores. Shape (n. observed vars, n. latent vars). Set by fit().

Type:

numpy.ndarray

data_sals_std_

Standard deviations of data saliences (data_sals_) as estimated during bootstrap resampling. Set by bootstrap().

Type:

numpy.ndarray

data_sals_z_

Data saliences (data_sals_) divided by their standard deviations (data_sals_std_) as estimated during bootstrap resampling. Set by bootstrap().

Type:

numpy.ndarray
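The ratio described above (often called a bootstrap ratio) can be sketched as follows. The bootstrap saliences are simulated, the use of ddof=1 for the standard deviation is an assumption, and the threshold of 2 is only a common rule of thumb rather than anything pyplsc prescribes.

```python
import numpy as np

rng = np.random.default_rng(0)
data_sals = rng.normal(size=(6, 2))  # stand-in for data_sals_

# Simulated aligned bootstrap saliences: (n. resamples, n. vars, n. latent vars)
boot_sals = data_sals + rng.normal(scale=0.2, size=(500, 6, 2))

data_sals_std = boot_sals.std(axis=0, ddof=1)  # spread across resamples
data_sals_z = data_sals / data_sals_std        # salience divided by its std

# Rule-of-thumb flag for saliences that are stable across resamples
stable = np.abs(data_sals_z) > 2
```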

design_sals_

Left saliences/singular vectors used to compute design scores. Shape (n. design saliences, n. latent vars). Set by fit().

Type:

numpy.ndarray

models_

Participant-specific PLSC models. Set by fit().

Type:

list

n_boot_

Number of bootstrap resamples used. Set by bootstrap().

Type:

int

n_sv_

Number of singular values, i.e., the number of latent variable pairs in the model. Set by fit().

Type:

int

participant_labels_

Participant labels. Set by fit().

Type:

pandas.Categorical

pvals_

Permutation p values for the latent variable pairs. Set by permute().

Type:

numpy.ndarray

random_state

Random state for reproducible permutation and bootstrap resampling.

Type:

int

singular_vals_

Singular values from the decomposition of the averaged cross-correlation matrix. Set by fit().

Type:

numpy.ndarray

svd_method

Name of the SVD method used ('lapack' or 'randomized').

Type:

str

variance_explained_

Proportion of variance explained by each latent variable pair. Set by fit().

Type:

numpy.ndarray
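In PLS analyses, the proportion attributed to each latent variable pair is commonly computed as its squared singular value divided by the sum of all squared singular values. The sketch below shows that convention with made-up singular values; that pyplsc uses exactly this formula is an assumption.

```python
import numpy as np

singular_vals = np.array([3.0, 2.0, 1.0])  # hypothetical singular values

# Squared singular values, normalised to sum to one (assumed convention)
variance_explained = singular_vals**2 / np.sum(singular_vals**2)
```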

weights_

Weights, based on number of trials, applied to participant-level cross-correlation matrices when averaged together. Set by weighted argument to fit().

Type:

numpy.ndarray