PLSC

class pyplsc.PLSC(boot_stat='score-covariate-corr', svd_method='lapack', random_state=None)

Partial least squares correlation model, also known as behavioural PLS. Used for analyzing between-participants relationships between data and covariates across multiple conditions. For analyzing within-participant correlations, see WPLSC.

Parameters:
  • boot_stat (str, optional) –

    Name of statistic to recompute on each bootstrap resample to get a confidence interval. Must be one of:

    • 'score-covariate-corr' (default): Correlations between covariates and data scores (i.e., output of transform()). Covariates and data may be original or resampled, but scores are always computed by multiplying data by data_sals_ (i.e., the saliences from the initial decomposition). This is what is computed in the original Matlab version of PLS.

    • 'condwise-scores': Condition-wise average data (original or resampled) multiplied by data_sals_.

  • svd_method (str, optional) –

    Method to use for singular value decomposition. Must be one of:

    • 'lapack' (default): use numpy.linalg.svd.

    • 'randomized': use sklearn.utils.extmath.randomized_svd.

  • random_state (int, optional) – Random state of model for reproducible permutation and bootstrap resampling. Passed to numpy.random.default_rng internally. Default is None.
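As a rough illustration of what 'score-covariate-corr' measures, the sketch below uses plain NumPy rather than pyplsc. It assumes a single group with no conditions, and it assumes the initial decomposition is an SVD of the covariate-by-feature correlation matrix, as in standard behavioural PLS. All variable names are illustrative; this is not the library's implementation.

```python
import numpy as np

def zscore(x):
    """Standardise each column to zero mean and unit (population) variance."""
    return (x - x.mean(axis=0)) / x.std(axis=0)

rng = np.random.default_rng(0)
n_obs, n_features, n_covariates = 20, 5, 2
data = rng.normal(size=(n_obs, n_features))
covariates = rng.normal(size=(n_obs, n_covariates))

# Assumed initial decomposition: SVD of the covariate-by-feature correlations
cross_corr = zscore(covariates).T @ zscore(data) / n_obs
design_sals, singular_vals, data_sals_t = np.linalg.svd(cross_corr, full_matrices=False)
data_sals = data_sals_t.T  # shape (n_features, n_latent_vars)

# Data scores: data multiplied by the saliences from the initial decomposition
scores = data @ data_sals

# Statistic: correlation of each covariate with each latent variable's scores
score_covariate_corr = zscore(covariates).T @ zscore(scores) / n_obs
print(score_covariate_corr.shape)
```

On a bootstrap resample, the same last two steps would be repeated with resampled data and covariates while data_sals stays fixed.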

bootstrap(n_boot=5000, confint_level=0.95, alignment_method='rotate-design-sals', return_boot_stat_dist=True, n_jobs=1, print_prog=True)

Perform (stratified) bootstrap resampling to assess the reliability of the data saliences.

Parameters:
  • n_boot (int, optional) – Number of bootstrap resamples to compute. The default is 5000.

  • confint_level (float, optional) – The confidence level of the quantile-based confidence intervals to compute. The default is 0.95.

  • alignment_method (string, optional) –

    Method to be used for aligning recomputed data saliences with original data saliences. Must be one of:

    • 'rotate-design-sals' (default): Find the rotation that solves the orthogonal Procrustes problem to align the recomputed and original design saliences, then apply this to the recomputed data saliences. This is what is computed in the original Matlab version of PLS.

    • 'rotate-data-sals': Find the rotation that solves the orthogonal Procrustes problem to align the recomputed and original data saliences, then apply this to the recomputed data saliences.

    • 'flip-design-sals': Find the set of sign flips that ensures the inner products of the recomputed and original design saliences are positive, then apply these sign flips to the recomputed data saliences.

    • 'flip-data-sals': Find the set of sign flips that ensures the inner products of the recomputed and original data saliences are positive, then apply these sign flips to the recomputed data saliences.

    • 'none': Perform no alignment.

  • return_boot_stat_dist (bool, optional) – If True, distribution of boot_stat from resampling is returned. This is the distribution used to compute quantile-based confidence intervals. Default is True.

  • n_jobs (int, optional) – Number of parallel jobs to deploy to compute bootstrap resamples. -1 automatically deploys the maximum number of jobs. The default is 1.

  • print_prog (bool, optional) – Specifies whether to display a progress bar. Default is True.
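The two families of alignment_method can be sketched in plain NumPy. The helper names procrustes_rotation and sign_flips are hypothetical and are not part of pyplsc; the example simulates a resample whose latent variables came out reordered and sign-flipped, and shows that each approach recovers the original orientation.

```python
import numpy as np

def procrustes_rotation(resampled, original):
    """Orthogonal R minimising ||resampled @ R - original||_F."""
    u, _, vt = np.linalg.svd(resampled.T @ original)
    return u @ vt

def sign_flips(resampled, original):
    """+1/-1 per column so each inner product with the original is non-negative."""
    inner_products = np.sum(resampled * original, axis=0)
    return np.where(inner_products < 0, -1.0, 1.0)

rng = np.random.default_rng(0)
# Hypothetical original saliences with orthonormal columns (3 latent variables)
design_sals = np.linalg.qr(rng.normal(size=(4, 3)))[0]
data_sals = np.linalg.qr(rng.normal(size=(6, 3)))[0]

# Simulated resample: first two latent variables swapped, one with its sign changed
mix = np.array([[0.0, 1.0, 0.0], [-1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
resampled_design_sals = design_sals @ mix
resampled_data_sals = data_sals @ mix

# Rotation idea: estimate the rotation on the design saliences,
# then apply it to the recomputed data saliences
rotation = procrustes_rotation(resampled_design_sals, design_sals)
rotated_data_sals = resampled_data_sals @ rotation

# Sign-flip idea: only correct the sign of each latent variable
sign_changed = data_sals * np.array([1.0, -1.0, 1.0])
flips = sign_flips(sign_changed, data_sals)
flipped_data_sals = sign_changed * flips
```

A rotation can undo reordering and mixing of latent variables across resamples, whereas sign flips only correct the arbitrary sign of each singular vector.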

Returns:

design_resampled – If return_boot_stat_dist is True, returns the bootstrap distribution of the statistic named by boot_stat.

Return type:

numpy.ndarray
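A quantile-based confidence interval can be derived from a bootstrap distribution along the following lines. The array layout (resamples on the first axis, latent variables on the last) and the simulated values are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical bootstrap distribution: (n_boot, n. design saliences, n. latent vars)
boot_stat_dist = rng.normal(loc=0.3, scale=0.1, size=(1000, 4, 2))

confint_level = 0.95
alpha = 1 - confint_level

# Lower and upper limits are the alpha/2 and 1 - alpha/2 quantiles across resamples
lower, upper = np.quantile(boot_stat_dist, [alpha / 2, 1 - alpha / 2], axis=0)
print(lower.shape, upper.shape)
```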

Examples

>>> mod.bootstrap(1000, n_jobs=-1)
>>> print(mod.data_sals_z_)
>>> print(mod.boot_stat_ci_[..., 0]) # Print CI of boot_stat for first LV

fit(data, covariates, design=None, between=None, within=None, participant=None)

Fit a partial least squares correlation model.

Parameters:
  • data (numpy.ndarray) – Data array of shape (n. observations, n. features). Each row should contain the average data for a participant, possibly the average for some within-participants condition for a participant.

  • covariates (numpy.ndarray or pandas.DataFrame or list) – 2D data of size (n. observations, n. covariates) to be used as covariates, or names of columns in design containing covariates.

  • design (pandas.DataFrame, optional) – DataFrame with columns to indicate between-participant group membership, within-participant condition, and/or participant identity, as applicable. The default is None.

  • between (str or iterable, optional) – Between-participants condition. This can be specified as a string referring to the appropriate column in design or as an iterable containing an indicator of group membership (e.g., a list of strings or integers). The default is None, indicating an absence of between-participant conditions.

  • within (str or iterable, optional) – Within-participants condition. This can be specified as a string referring to the appropriate column in design or as an iterable containing an indicator of condition (e.g., a list of strings or integers). The default is None, indicating an absence of within-participant conditions.

  • participant (str or iterable, optional) – Participant identifier. This can be specified as a string referring to the appropriate column in design or as an iterable containing an indicator of participant identity (e.g., a list of strings or integers). The default is None, which is only permitted when there are no within-participant conditions.
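Because each row of data is expected to hold an average for one participant (within one condition, where applicable), trial-level recordings need to be averaged first. The sketch below shows one way to do this with NumPy; the trial-level arrays and their names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features = 3
# Hypothetical trial-level data: 4 participants, 6 trials each, alternating conditions
trial_participant = np.repeat(np.arange(4), 6)
trial_condition = np.tile(np.array(['a', 'b']), 12)
trials = rng.normal(size=(24, n_features))

# Give each participant-by-condition cell a unique integer code
cond_labels, cond_idx = np.unique(trial_condition, return_inverse=True)
cell = trial_participant * len(cond_labels) + cond_idx
cells = np.unique(cell)

# One row per cell: the average over that cell's trials
data = np.stack([trials[cell == c].mean(axis=0) for c in cells])
participant = cells // len(cond_labels)
within = cond_labels[cells % len(cond_labels)]
print(data.shape, participant, within)
```

Given covariates with one row per row of data, the resulting arrays could then be supplied as mod.fit(data, covariates, within=within, participant=participant).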

Examples

>>> import numpy, pandas, pyplsc
>>> mod = pyplsc.PLSC()
>>> data = numpy.random.normal(size=(4, 3)) # 4 observations of 3 variables
>>> covariates = numpy.random.normal(size=(4, 2)) # 4 observations of 2 covariates
>>> design = pandas.DataFrame({'group': [0, 0, 1, 1]})
>>> # Pattern 1: provide design matrix, specify column names of condition indicators
>>> mod.fit(data, covariates, design=design, between='group')
>>> # Pattern 2: provide condition indicators directly as iterables
>>> mod.fit(data, covariates, between=[0, 0, 1, 1])

flip_signs(lv_idx=None)

Flips the signs of one or more latent variables, to aid with interpretation.

Parameters:
  • lv_idx (indexer) – The index or indices of latent variables whose signs should be flipped. If None (default), signs are flipped for all latent variables.
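Flipping signs is safe because the sign of each pair of singular vectors is arbitrary: negating both members of a pair leaves the decomposition unchanged. A minimal NumPy sketch of this property, independent of pyplsc:

```python
import numpy as np

rng = np.random.default_rng(0)
matrix = rng.normal(size=(4, 6))
u, s, vt = np.linalg.svd(matrix, full_matrices=False)

lv_idx = [0, 2]  # latent variables whose signs are flipped
u_flipped = u.copy()
v_flipped = vt.T.copy()
u_flipped[:, lv_idx] *= -1  # left singular vectors (design side)
v_flipped[:, lv_idx] *= -1  # right singular vectors (data side)

# The flipped factors reproduce the same matrix
reconstructed = u_flipped @ np.diag(s) @ v_flipped.T
print(np.allclose(reconstructed, matrix))
```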

Examples

>>> mod.flip_signs() # Flip all signs
>>> mod.flip_signs(0) # Flip signs for the first latent variable
>>> mod.flip_signs([0, 1]) # Flip signs for the first two latent variables

get_boot_stat_frame(lv_idx=None)

Get boot_stat as a dataframe, including upper and lower confidence limits if bootstrap resampling has been done.

Parameters:

lv_idx (indexer, optional) – Index of latent variable the dataframe should cover. The default is None, which yields a dataframe covering all latent variables.

Returns:

df – boot_stat as a dataframe.

Return type:

pandas.DataFrame

get_boot_stat_yerr(lv_idx)

Get yerr for statistic named by boot_stat that can be passed to a matplotlib bar plot.

Parameters:

lv_idx (int) – Integer indexing the latent variable of interest.

Returns:

yerr – 2D array with shape (2, n. design saliences) that can be passed to matplotlib’s pyplot.bar() as the yerr= argument.

Return type:

numpy.ndarray
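For reference, matplotlib expects a (2, N) yerr array to hold distances from the bar height, with the lower errors in the first row and the upper errors in the second. The sketch below converts confidence limits into that form; the numbers are made up, and whether this method performs exactly this computation is an assumption.

```python
import numpy as np

# Hypothetical point estimates and confidence limits for one latent variable
value = np.array([0.4, -0.2, 0.1])
lower = np.array([0.1, -0.5, -0.3])
upper = np.array([0.6, 0.0, 0.2])

# Row 0: distance down to the lower limit; row 1: distance up to the upper limit
yerr = np.stack([value - lower, upper - value])
print(yerr.shape)
```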

Examples

>>> import matplotlib.pyplot
>>> # Make bar plot of boot_stat
>>> x = mod.design_sal_labels_['between']
>>> lv_idx = 0 # First latent variable
>>> height = mod.boot_stat_val_[:, lv_idx]
>>> yerr = mod.get_boot_stat_yerr(lv_idx)
>>> matplotlib.pyplot.bar(x=x, height=height, yerr=yerr)

get_scores_frame(lv_idx=None)

Get dataframe containing design and data scores for each observation in data_, alongside condition information from the design matrix (design_).

Parameters:

lv_idx (indexer, optional) – Index of latent variable(s) for which to include design and data scores. The default is None, which includes scores for all latent variables.

Returns:

df – Dataframe containing design and data scores for each observation.

Return type:

pandas.DataFrame

Notes

Data is in long format, with a column specifying the latent variable corresponding to each score.

Examples

>>> mod.get_scores_frame().to_csv('scores.csv')

permute(n_perm=5000, return_null_dist=True, n_jobs=1, print_prog=True)

Perform permutation testing to assess the significance of the latent variables. After running this method, p values are available through the pvals_ property.

Parameters:
  • n_perm (int, optional) – Number of permutations to perform. The default is 5000.

  • return_null_dist (bool, optional) – If True, permutation samples will be returned as a 2D (n. perms, n. latent vars) array. Default is True.

  • n_jobs (int, optional) – Number of parallel jobs to deploy to compute permutations. -1 automatically deploys the maximum number of jobs. The default is 1.

  • print_prog (bool, optional) – Specifies whether to display a progress bar. Default is True.

Returns:

null_dist – 2D array containing null distribution of singular values, where each row is a different permutation and each column is a different singular value.

Return type:

numpy.ndarray
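The logic of a permutation test on singular values can be sketched with NumPy as follows. This assumes a single group, an SVD of the covariate-by-feature correlation matrix, and the (count + 1) / (n_perm + 1) convention for p values; the helper singular_values is hypothetical, and the library's permutation scheme and p value formula may differ.

```python
import numpy as np

def singular_values(data, covariates):
    """Singular values of the covariate-by-feature correlation matrix."""
    zd = (data - data.mean(axis=0)) / data.std(axis=0)
    zc = (covariates - covariates.mean(axis=0)) / covariates.std(axis=0)
    return np.linalg.svd(zc.T @ zd / len(data), compute_uv=False)

rng = np.random.default_rng(0)
data = rng.normal(size=(30, 5))
covariates = rng.normal(size=(30, 2))
observed = singular_values(data, covariates)

# Break the pairing between data rows and covariate rows on each permutation
n_perm = 200
null_dist = np.stack([
    singular_values(data[rng.permutation(len(data))], covariates)
    for _ in range(n_perm)
])

# Proportion of null singular values at least as large as the observed ones
pvals = (np.sum(null_dist >= observed, axis=0) + 1) / (n_perm + 1)
print(null_dist.shape, pvals)
```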

Examples

>>> mod.permute(n_perm=1000, n_jobs=-1)
>>> print(mod.pvals_)

transform(data=None, lv_idx=None)

Compute data scores, i.e., coordinates of array data in the new basis defined by the latent variables, by multiplying a data array by the data saliences (the data_sals_ property).

Parameters:
  • data (numpy.ndarray, optional) – Data to transform. The default is None, which yields scores for the data on which the model was fit (the data_ property).

  • lv_idx (indexer, optional) – Index of latent variable(s) for which to compute scores. Default is None, which computes scores for all latent variables.

Returns:

data_scores – A 2D array of scores where rows correspond to different observations and columns correspond to different latent variables.

Return type:

numpy.ndarray

Examples

>>> scores = mod.transform() # Get scores for data used to fit model
>>> scores = mod.transform(new_data) # Get scores for new data

boot_stat

Name of statistic whose distribution is derived during bootstrap resampling.

Type:

str

boot_stat_ci_

Confidence interval on stat named by boot_stat derived from bootstrap resampling. CI level is determined by confint_level_. Set by bootstrap().

Type:

numpy.ndarray

boot_stat_val_

Point estimate from initial decomposition of statistic whose distribution is derived during bootstrap() resampling. Set by fit().

Type:

numpy.ndarray

confint_level_

Level of confidence interval on stat named by boot_stat to derive during bootstrap resampling (e.g., 0.95). Set by bootstrap().

Type:

float

covariates_

Data frame containing covariate data. One column per covariate, one row per observation. Set by fit().

Type:

pandas.DataFrame

data_

Data used to fit model. Set by fit().

Type:

numpy.ndarray

data_sals_

Right saliences/singular vectors used to compute data scores. Shape (n. observed vars, n. latent vars). Set by fit().

Type:

numpy.ndarray

data_sals_std_

Standard deviations of data saliences (data_sals_) as estimated during bootstrap resampling. Set by bootstrap().

Type:

numpy.ndarray

data_sals_z_

Data saliences (data_sals_) divided by their standard deviations (data_sals_std_) as estimated during bootstrap resampling. Set by bootstrap().

Type:

numpy.ndarray
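The ratio described here (often called a bootstrap ratio) can be illustrated with simulated resamples. The salience values, the noise level, and the use of the sample standard deviation (ddof=1) are assumptions for the sake of the example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical original saliences: 3 observed variables, 2 latent variables
data_sals = np.array([[0.8, 0.1], [0.0, -0.5], [0.6, 0.3]])

# Simulated aligned saliences from 500 bootstrap resamples
boot_sals = data_sals + rng.normal(scale=0.05, size=(500, 3, 2))

# Standard deviation across resamples, then salience divided by it
data_sals_std = boot_sals.std(axis=0, ddof=1)
data_sals_z = data_sals / data_sals_std
print(np.round(data_sals_z, 1))
```

Large absolute values indicate saliences that are stable across resamples relative to their magnitude.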

design_

Design matrix with columns “between”, “within”, and “participant”. Set by fit().

Type:

pandas.DataFrame

design_sal_labels_

Dataframe with rows corresponding to rows of the design saliences and columns specifying between-participants conditions, within-participants conditions, and covariates. Set by fit().

Type:

pandas.DataFrame

design_sals_

Left saliences/singular vectors used to compute design scores. Shape (n. design saliences, n. latent variables). Set by fit().

Type:

numpy.ndarray

design_scores_

Design scores for the data used to fit the model. Set by fit().

Type:

numpy.ndarray

n_boot_

Number of bootstrap resamples used. Set by bootstrap().

Type:

int

n_sv_

Number of singular values, i.e., the number of latent variable pairs in the model. Set by fit().

Type:

int

pvals_

Permutation p values for the latent variable pairs. Set by permute().

Type:

numpy.ndarray

random_state

Random state for reproducible permutation and bootstrap resampling.

Type:

int

singular_vals_

Singular values from the decomposition of the mean-centred data. Set by fit().

Type:

numpy.ndarray

stratifier_

Integer array that indexes each unique combination of between- and within-participants condition. Used to stratify the data for mean-centering. Set by fit().

Type:

numpy.ndarray
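An index of this kind, and its use for mean-centering within each stratum, might look like the following in NumPy. The condition labels are invented, and the integer codes the library assigns may be ordered differently.

```python
import numpy as np

between = np.array(['ctrl', 'ctrl', 'pat', 'pat', 'ctrl', 'pat'])
within = np.array(['pre', 'post', 'pre', 'post', 'pre', 'pre'])

# Integer code per factor, combined into one code per unique combination
_, between_idx = np.unique(between, return_inverse=True)
within_labels, within_idx = np.unique(within, return_inverse=True)
combined = between_idx * len(within_labels) + within_idx
_, stratifier = np.unique(combined, return_inverse=True)
print(stratifier)

# Mean-centre the data separately within each stratum
rng = np.random.default_rng(0)
data = rng.normal(size=(6, 3))
centred = data.copy()
for stratum in np.unique(stratifier):
    rows = stratifier == stratum
    centred[rows] -= data[rows].mean(axis=0)
```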

svd_method

SVD method used for the decomposition.

Type:

str

variance_explained_

Proportion of variance explained by each latent variable pair. Set by fit().

Type:

numpy.ndarray
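One common convention in PLS is to express each latent variable pair's share as its squared singular value divided by the sum of squared singular values. Whether pyplsc uses exactly this formula is an assumption; the sketch only illustrates the convention.

```python
import numpy as np

# Hypothetical singular values for three latent variable pairs
singular_vals = np.array([3.0, 2.0, 1.0])

# Squared singular values normalised to sum to 1
variance_explained = singular_vals**2 / np.sum(singular_vals**2)
print(variance_explained)
```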