BDA
- class pyplsc.BDA(boot_stat='condwise-scores-centred', svd_method='lapack', random_state=None)
Barycentric discriminant analysis model, also known as mean-centred PLS. Used for analyzing condition-wise differences.
- Parameters:
boot_stat (str, optional) –
Name of statistic to recompute on each bootstrap resample to get a confidence interval. Must be one of:
'condwise-scores-centred' (default): Mean-centred condition-wise average data (original or resampled) multiplied by data_sals_. This is what is computed in the original Matlab version of PLS.
'condwise-scores': Condition-wise average data (original or resampled) multiplied by data_sals_.
svd_method (str, optional) –
Method to use for singular value decomposition. Must be one of:
'lapack' (default): use numpy.linalg.svd.
'randomized': use sklearn.utils.extmath.randomized_svd.
random_state (int, optional) – Random state of model for reproducible permutation and bootstrap resampling. Passed to numpy.random.default_rng internally. Default is None.
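As a rough illustration of what a fixed random_state provides, the sketch below calls numpy.random.default_rng directly, the function this parameter is said to be passed to. It does not use pyplsc, and the seed value and variable names are illustrative assumptions.

```python
import numpy as np

seed = 42  # illustrative value for random_state

# Two generators built from the same seed produce identical draws
rng_a = np.random.default_rng(seed)
rng_b = np.random.default_rng(seed)
perm_a = rng_a.permutation(10)
perm_b = rng_b.permutation(10)
print(np.array_equal(perm_a, perm_b))  # True

# With None, the generator is seeded from OS entropy, so results
# are not expected to repeat from one run to the next
rng_c = np.random.default_rng(None)
```

Passing an integer therefore makes permutation and bootstrap draws repeatable across runs, whereas None does not.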
- bootstrap(n_boot=5000, confint_level=0.95, alignment_method='rotate-design-sals', return_boot_stat_dist=True, n_jobs=1, print_prog=True)
Perform (stratified) bootstrap resampling to assess the reliability of the data saliences.
- Parameters:
n_boot (int, optional) – Number of bootstrap resamples to compute. The default is 5000.
confint_level (float, optional) – The confidence level of the quantile-based confidence intervals to compute. The default is 0.95.
alignment_method (str, optional) –
Method to be used for aligning recomputed data saliences with original data saliences. Must be one of:
'rotate-design-sals' (default): Find the rotation that solves the orthogonal Procrustes problem to align the recomputed and original design saliences, then apply this to the recomputed data saliences. This is what is computed in the original Matlab version of PLS.
'rotate-data-sals': Find the rotation that solves the orthogonal Procrustes problem to align the recomputed and original data saliences, then apply this to the recomputed data saliences.
'flip-design-sals': Find the set of sign flips that ensures the inner products of the recomputed and original design saliences are positive, then apply these sign flips to the recomputed data saliences.
'flip-data-sals': Find the set of sign flips that ensures the inner products of the recomputed and original data saliences are positive, then apply these sign flips to the recomputed data saliences.
'none': Perform no alignment.
return_boot_stat_dist (bool, optional) – If True, the distribution of boot_stat from resampling is returned. This is the distribution used to compute quantile-based confidence intervals. Default is True.
n_jobs (int, optional) – Number of parallel jobs to deploy to compute bootstrap resamples. -1 automatically deploys the maximum number of jobs. The default is 1.
print_prog (bool, optional) – Specifies whether to display a progress bar. Default is True.
- Returns:
design_resampled – If return_boot_stat_dist is True, returns the bootstrap distribution of the statistic named by boot_stat.
- Return type:
numpy.ndarray
Examples
>>> mod.bootstrap(1000, n_jobs=-1)
>>> print(mod.data_sals_z_)
>>> print(mod.boot_stat_ci_[..., 0])  # Print CI of boot_stat for first LV
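The alignment options can be pictured with plain NumPy. The sketch below is not pyplsc's implementation: the helper names procrustes_rotation and sign_flips and all arrays are hypothetical. It shows the two ideas described for alignment_method, namely finding an orthogonal rotation (or a set of sign flips) from the design saliences and then applying it to the recomputed data saliences.

```python
import numpy as np

def procrustes_rotation(recomputed, original):
    """Orthogonal R minimising ||recomputed @ R - original|| (Frobenius norm)."""
    p, _, qt = np.linalg.svd(recomputed.T @ original)
    return p @ qt

def sign_flips(recomputed, original):
    """Per-column signs making each column-wise inner product non-negative."""
    signs = np.sign(np.sum(recomputed * original, axis=0))
    signs[signs == 0] = 1.0
    return signs

rng = np.random.default_rng(0)
# Hypothetical original design saliences (3 conditions, 2 latent variables)
design_orig, _ = np.linalg.qr(rng.normal(size=(3, 2)))
# A random 2 x 2 orthogonal matrix scrambles them, mimicking a resample
scramble, _ = np.linalg.qr(rng.normal(size=(2, 2)))
design_boot = design_orig @ scramble.T
data_boot = rng.normal(size=(5, 2))  # hypothetical recomputed data saliences

# Rotation is found on the design saliences, then applied to the data saliences
rot = procrustes_rotation(design_boot, design_orig)
data_boot_aligned = data_boot @ rot
print(np.allclose(design_boot @ rot, design_orig))  # True

# Sign-flip alternative: one sign per latent variable
flips = sign_flips(design_boot, design_orig)
data_boot_flipped = data_boot * flips
```

Rotation can undo reordering and mixing of latent variables across resamples, while sign flipping only corrects the arbitrary sign of each singular vector.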
- fit(data, design=None, between=None, within=None, participant=None, effects='all')
Fit a barycentric discriminant analysis model.
- Parameters:
data (numpy.ndarray) – Data array of shape (n. observations, n. features). Each row should contain the average data for a participant, possibly the average for some within-participants condition for a participant.
design (pandas.DataFrame, optional) – DataFrame with columns to indicate between-participant group membership, within-participant condition, and/or participant identity, as applicable. The default is None.
between (str or iterable, optional) – Between-participants factor. This can be specified as a string referring to the appropriate column in design or as an iterable containing an indicator of group membership (e.g., a list of strings or integers). The default is None, indicating an absence of between-participant conditions.
within (str or iterable, optional) – Within-participants factor. This can be specified as a string referring to the appropriate column in design or as an iterable containing an indicator of condition (e.g., a list of strings or integers). The default is None, indicating an absence of within-participant conditions.
participant (str or iterable, optional) – Participant identifier. This can be specified as a string referring to the appropriate column in design or as an iterable containing an indicator of participant identity (e.g., a list of strings or integers). The default is None, which is only permitted when there are no within-participant conditions.
effects (str or iterable, optional) –
Effects to be included in the model. If only a between-participants factor is specified, then only a main effect of between-participants condition can be measured (same goes for within-participants, mutatis mutandis). However, if both a between- and a within-participants factor are specified, then any of the following effects can be specified:
'between': main effect of between-participants condition
'within': main effect of within-participants condition
'interaction': interaction of between- and within-participants factors
In the original Matlab PLS, the default behaviour is to remove the between-participants factor, which is equivalent to effects=('within', 'interaction').
Examples
>>> mod = pyplsc.BDA()
>>> data = numpy.random.normal(size=(4, 3))
>>> design = pandas.DataFrame({'group': [0, 0, 1, 1]})
>>> # Pattern 1: provide design matrix, specify column names of condition indicators
>>> mod.fit(data, design, between='group')
>>> # Pattern 2: provide condition indicators directly as iterables
>>> mod.fit(data, between=design['group'])
>>> # Specifying effects (between, within and participant are iterables defined beforehand)
>>> mod.fit(data, between=between, within=within, participant=participant,
...         effects=('within', 'interaction'))
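For orientation, the decomposition behind mean-centred PLS can be sketched in a few lines of NumPy. This is a simplified illustration for a single between-participants factor, under the assumption that centring subtracts the mean across conditions. It is not pyplsc's code, and the handling of within-participant factors and of the effects argument is not shown.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=(6, 4))          # 6 observations, 4 features
group = np.array([0, 0, 1, 1, 2, 2])    # between-participants labels

# Condition-wise average of the data: one row per group
labels = np.unique(group)
cond_means = np.vstack([data[group == g].mean(axis=0) for g in labels])

# Mean-centre the condition means across conditions
centred = cond_means - cond_means.mean(axis=0)

# SVD: left vectors play the role of design saliences,
# right vectors the role of data saliences
u, s, vt = np.linalg.svd(centred, full_matrices=False)
design_sals = u          # shape (n. conditions, n. latent vars)
data_sals = vt.T         # shape (n. features, n. latent vars)
print(design_sals.shape, data_sals.shape, s.shape)  # (3, 3) (4, 3) (3,)
```

Because centring removes one degree of freedom, the last singular value in this sketch is numerically zero, so at most n. conditions minus one latent variables carry information.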
- flip_signs(lv_idx=None)
Flips the signs of one or more latent variables, to aid with interpretation.
- Parameters:
lv_idx (indexer) – The index or indices of latent variables whose signs should be flipped. If None (default), signs are flipped for all latent variables.
Examples
>>> mod.flip_signs()  # Flip all signs
>>> mod.flip_signs(0)  # Flip signs for the first latent variable
>>> mod.flip_signs([0, 1])  # Flip signs for the first two latent variables
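Sign flipping is harmless because the sign of a singular vector pair is arbitrary. The NumPy sketch below (independent of pyplsc, with made-up data) flips the left and right vectors of the selected latent variables together and checks that the reconstructed matrix is unchanged. Which attributes pyplsc actually flips is not stated here, so treat the pairing as an assumption.

```python
import numpy as np

rng = np.random.default_rng(2)
mat = rng.normal(size=(3, 5))
u, s, vt = np.linalg.svd(mat, full_matrices=False)

lv_idx = [0, 1]              # latent variables to flip
u_flipped = u.copy()
vt_flipped = vt.copy()
u_flipped[:, lv_idx] *= -1   # flip left singular vectors
vt_flipped[lv_idx, :] *= -1  # flip the matching right singular vectors

# The two sign changes cancel, so the decomposition still reproduces mat
print(np.allclose((u_flipped * s) @ vt_flipped, mat))  # True
```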
- get_boot_stat_frame(lv_idx=None)
Get boot_stat as a dataframe, including upper and lower confidence limits if bootstrap resampling has been done.
- Parameters:
lv_idx (indexer, optional) – Index of latent variable the dataframe should cover. The default is None, which yields a dataframe covering all latent variables.
- Returns:
df – boot_stat as a dataframe.
- Return type:
pandas.DataFrame
- get_boot_stat_yerr(lv_idx)
Get yerr for the statistic named by boot_stat that can be passed to a matplotlib bar plot.
- Parameters:
lv_idx (int) – Integer indexing the latent variable of interest.
- Returns:
yerr – 2D array with shape (2, n. design saliences) that can be passed to matplotlib’s pyplot.bar() as the yerr= argument.
- Return type:
numpy.ndarray
Examples
>>> # Make bar plot of boot_stat
>>> x = mod.design_sal_labels_['between']
>>> lv_idx = 0  # First latent variable
>>> height = mod.boot_stat_val_[:, lv_idx]
>>> yerr = mod.get_boot_stat_yerr(lv_idx)
>>> matplotlib.pyplot.bar(x=x, height=height, yerr=yerr)
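Matplotlib expects a (2, N) yerr array to hold distances from the bar height rather than absolute limits: the first row is the distance down to the lower limit and the second row the distance up to the upper limit. The sketch below shows that conversion with made-up numbers. The arrays val, lower and upper are hypothetical and do not come from pyplsc.

```python
import numpy as np

# Hypothetical point estimates and confidence limits for 3 conditions
val = np.array([1.0, -0.5, 2.0])
lower = np.array([0.6, -0.9, 1.2])
upper = np.array([1.5, 0.1, 2.4])

# Row 0: distance down to the lower limit; row 1: distance up to the upper limit
yerr = np.vstack([val - lower, upper - val])
print(yerr.shape)  # (2, 3)
```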
- get_scores_frame(lv_idx=None)
Get dataframe containing design and data scores for each observation in data_, alongside condition information from the design matrix (design_).
- Parameters:
lv_idx (indexer, optional) – Index of latent variable(s) for which to include design and data scores. The default is None, which includes scores for all latent variables.
- Returns:
df – Dataframe containing design and data scores for each observation.
- Return type:
pandas.DataFrame
Notes
Data is in long format, with a column specifying the latent variable corresponding to each score.
Examples
>>> mod.get_scores_frame().to_csv('scores.csv')
- permute(n_perm=5000, return_null_dist=True, n_jobs=1, print_prog=True)
Perform permutation testing to assess the significance of the latent variables. After running this method, p values are available through the pvals_ property.
- Parameters:
n_perm (int, optional) – Number of permutations to perform. The default is 5000.
return_null_dist (bool, optional) – If True, permutation samples will be returned as a 2D (n. perms, n. latent vars) array. Default is True.
n_jobs (int, optional) – Number of parallel jobs to deploy to compute permutations. -1 automatically deploys the maximum number of jobs. The default is 1.
print_prog (bool, optional) – Specifies whether to display a progress bar. Default is True.
- Returns:
null_dist – 2D array containing null distribution of singular values, where each row is a different permutation and each column is a different singular value.
- Return type:
numpy.ndarray
Examples
>>> mod.permute(n_perm=1000, n_jobs=-1)
>>> print(mod.pvals_)
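The logic of a permutation test on singular values can be sketched with NumPy alone. The example below is a simplified stand-in, not pyplsc's procedure. It assumes a single between-participants factor, uses a hypothetical helper singular_values, and estimates p values as the share of permuted singular values at least as large as the observed one, which is one common estimator among several.

```python
import numpy as np

def singular_values(data, group):
    """Singular values of the mean-centred condition-wise averages."""
    means = np.vstack([data[group == g].mean(axis=0) for g in np.unique(group)])
    return np.linalg.svd(means - means.mean(axis=0), compute_uv=False)

rng = np.random.default_rng(3)
group = np.repeat([0, 1], 10)
data = rng.normal(size=(20, 4))
data[group == 1] += 2.0          # build in a large group difference

observed = singular_values(data, group)
n_perm = 200
null_dist = np.empty((n_perm, observed.size))
for i in range(n_perm):
    # Shuffling labels breaks any link between data and condition
    null_dist[i] = singular_values(data, rng.permutation(group))

# Share of permuted singular values at least as large as the observed ones.
# With two groups only the first latent variable is meaningful, because
# centring leaves the second singular value numerically zero.
pvals = (null_dist >= observed).mean(axis=0)
print(null_dist.shape)  # (200, 2)
```

Designs with within-participant conditions generally call for restricted shuffling that respects participant identity, which this sketch does not attempt.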
- transform(data=None, lv_idx=None)
Compute data scores, i.e., coordinates of array data in the new basis defined by the latent variables, by multiplying a data array by the data saliences (the data_sals_ property).
- Parameters:
data (numpy.ndarray, optional) – Data to transform. The default is None, which yields scores for the data on which the model was fit (the data_ property).
lv_idx (indexer, optional) – Index of latent variable(s) for which to compute scores. Default is None, which computes scores for all latent variables.
- Returns:
data_scores – A 2D array of scores where rows correspond to different observations and columns correspond to different latent variables.
- Return type:
numpy.ndarray
Examples
>>> scores = mod.transform()  # Get scores for data used to fit model
>>> scores = mod.transform(new_data)  # Get scores for new data
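The multiplication described above amounts to a single matrix product. The NumPy sketch below uses made-up arrays in place of a fitted model. Whether pyplsc applies any centring to the data before multiplying is not stated here, so none is assumed.

```python
import numpy as np

rng = np.random.default_rng(4)
data = rng.normal(size=(6, 4))                        # 6 observations, 4 features
# Hypothetical data saliences with orthonormal columns: 4 features, 2 latent vars
data_sals, _ = np.linalg.qr(rng.normal(size=(4, 2)))

scores = data @ data_sals                # scores on all latent variables
scores_lv0 = data @ data_sals[:, [0]]    # first latent variable only
print(scores.shape, scores_lv0.shape)    # (6, 2) (6, 1)
```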
- boot_stat
Name of statistic whose distribution is derived during bootstrap resampling.
- Type:
str
- boot_stat_ci_
Confidence interval on the statistic named by boot_stat, derived from bootstrap resampling. The CI level is determined by confint_level_. Set by bootstrap().
- Type:
numpy.ndarray
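A quantile-based interval of this kind can be computed from a bootstrap distribution with numpy.quantile. The sketch below uses a simulated distribution and an assumed array layout (resamples on the first axis, latent variables on the last). The actual layout of boot_stat_ci_ should be checked against the fitted model.

```python
import numpy as np

rng = np.random.default_rng(5)
# Hypothetical bootstrap distribution: 1000 resamples of a 3 x 2 statistic
boot_dist = rng.normal(loc=1.0, scale=0.5, size=(1000, 3, 2))

confint_level = 0.95
alpha = 1 - confint_level
# Quantiles across the resample axis give the lower and upper limits
ci = np.quantile(boot_dist, [alpha / 2, 1 - alpha / 2], axis=0)
print(ci.shape)  # (2, 3, 2)
```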
- boot_stat_val_
Point estimate from initial decomposition of statistic whose distribution is derived during bootstrap() resampling. Set by fit().
- Type:
numpy.ndarray
- confint_level_
Level of confidence interval on the statistic named by boot_stat to derive during bootstrap resampling (e.g., 0.95). Set by bootstrap().
- Type:
float
- data_sals_
Right saliences/singular vectors used to compute data scores. Shape (n. observed vars, n. latent vars). Set by fit().
- Type:
numpy.ndarray
- data_sals_std_
Standard deviations of data saliences (data_sals_) as estimated during bootstrap resampling. Set by bootstrap().
- Type:
numpy.ndarray
- data_sals_z_
Data saliences (data_sals_) divided by their standard deviations (data_sals_std_) as estimated during bootstrap resampling. Set by bootstrap().
- Type:
numpy.ndarray
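The ratio described above can be sketched as follows. The arrays are simulated, and using the sample standard deviation (ddof=1) is an assumption, since the estimator pyplsc uses is not stated here.

```python
import numpy as np

rng = np.random.default_rng(6)
data_sals = rng.normal(size=(4, 2))   # hypothetical saliences: 4 features, 2 latent vars
# Hypothetical aligned saliences from 500 bootstrap resamples
boot_sals = data_sals + rng.normal(scale=0.2, size=(500, 4, 2))

data_sals_std = boot_sals.std(axis=0, ddof=1)   # spread across resamples
data_sals_z = data_sals / data_sals_std         # salience relative to its variability
print(data_sals_z.shape)  # (4, 2)
```

Large absolute values mark features whose saliences are stable across resamples relative to their size.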
- design_sals_
Left saliences/singular vectors used to compute design scores. Shape (n. design saliences, n. latent vars). Set by fit().
- Type:
numpy.ndarray
- n_boot_
Number of bootstrap resamples used. Set by bootstrap().
- Type:
int
- n_sv_
Number of singular values, i.e., the number of latent variable pairs in the model. Set by fit().
- Type:
int
- random_state
Random state for reproducible permutation and bootstrap resampling.
- Type:
int
- singular_vals_
Singular values from the decomposition of the mean-centred data. Set by fit().
- Type:
numpy.ndarray
- svd_method
Method used for singular value decomposition ('lapack' or 'randomized').
- Type:
str