Discussion Week Nine: Chapter 18 Multivariate Statistics

Hope Enechukwu, Rudolf Ezemba, Julie Pelletier, Brittany Provenza, Deanna Roper and Valencia Suggs

July 31, 2015

**Multivariate Statistics**

**Multivariate statistics**: statistical procedures for analyzing the relationships among three or more variables simultaneously, in contrast to bivariate statistics, which involve a single independent variable (IV) and a single dependent variable (DV).

**Regression analysis** is used to make predictions.

- Multiple correlation and multiple regression are the link between correlation and regression
- In simple regression, one IV (X) is used to predict a dependent variable (Y)
- The higher the correlation between the variables, the more accurate the prediction
- Prediction errors occur when the correlation between X and Y is not perfect

Correlation coefficients express the direction and strength of relationships among variables.

- The stronger the correlation, the better the prediction, and the greater the percentage of variance explained.

The correlation between two variables is rarely perfect. Researchers try to improve predictions of Y by adding multiple IVs (predictor variables) in multiple regression.

Redundancy among predictors increases as more variables are added to the equation.
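As a sketch of the simple-regression case described above (one predictor X used to predict Y), the correlation, slope, and intercept can be computed directly from the data. The data values here are made up purely for illustration:

```python
import math

# Toy data: one predictor X and one outcome Y (hypothetical values).
x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [3.1, 5.0, 7.2, 8.9, 11.1]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Pearson correlation r between X and Y.
cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
r = cov / (sd_x * sd_y)

# Least-squares slope and intercept for predicting Y from X.
slope = r * (sd_y / sd_x)
intercept = mean_y - slope * mean_x

# R-squared: the proportion of variance in Y explained by X.
r_squared = r ** 2
```

The near-perfect correlation in this toy data set yields accurate predictions; as the notes state, prediction errors grow as the correlation weakens.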

**Tests of Significance**

**Tests of significance** address different aspects of the data in multiple regression.

**Tests of the Overall Equation and R** evaluate the null hypothesis that R equals zero in the population. The null hypothesis is not what researchers are interested in, but what they hope to reject.

**Tests for Adding Predictors** show whether adding more predictors, or one specific IV, significantly increases the correlation with the dependent variable.

In **Tests of the Regression Coefficients**, each individual coefficient can be tested, as in simple regression.

In **multiple regression**, holding extraneous variables constant strengthens internal validity.

**Simultaneous Multiple Regression** is appropriate when the predictors have no particular causal or theoretical priority and are considered equally important; all predictors are entered into the equation at once.

**Hierarchical Multiple Regression**: predictors are entered in a sequence specified by theory; it is often used to examine the effect of a key IV after the effects of extraneous variables have been removed.

**Stepwise Multiple Regression**: in the first step, the IV that correlates most strongly with the DV is selected; the second variable entered is the one that produces the largest increase in R² when used together with the first. This continues until no remaining IV significantly increases R². Remember that variance shared among IVs is attributed to the first variable entered into the analysis.


**Power Analysis for Multiple Regression**

Because small samples can lead to errors and inaccuracies, power analysis provides a better way to estimate sample size needs.

- The number of participants needed to reject the null hypothesis that R equals zero is based on effect size, number of predictors, desired power, and significance criterion.
- In multiple regression, the estimated effect size is a function of the value of R². It can be predicted from earlier research or from the conventions of small (R² = .02), moderate (R² = .13), or large (R² = .30).
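Power-analysis tables and software typically work with Cohen's f² rather than R² directly; the standard conversion is f² = R² / (1 − R²). A minimal sketch applying it to the conventions above (the conversion formula is standard, not taken from these notes):

```python
def cohens_f2(r_squared: float) -> float:
    """Convert R-squared to Cohen's f-squared effect size: f2 = R2 / (1 - R2)."""
    return r_squared / (1.0 - r_squared)

# The small/moderate/large R-squared conventions from the notes:
for label, r2 in [("small", 0.02), ("moderate", 0.13), ("large", 0.30)]:
    print(f"{label}: R2 = {r2:.2f} -> f2 = {cohens_f2(r2):.3f}")
```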

**Analysis of Covariance (ANCOVA)**

- ANCOVA combines features of multiple regression and ANOVA; it compares the means of two or more groups, and the central question for ANOVA and ANCOVA is the same.
- Allows researchers to control confounding variables.
- Even after randomization, ANCOVA can be used to improve the precision and validity of estimates of group differences.

**Example: Effectiveness of biofeedback therapy on patients’ anxiety**

- The group in hospital A is exposed to the treatment; the group in hospital B is not
- Anxiety is measured before and after treatment
- Pretest anxiety scores are controlled through ANCOVA
- DV = posttest anxiety scores
- IV = experimental/comparison group status
- Covariate (a continuous variable) = pretest anxiety scores


**Selection of Covariates**

- Background demographic characteristics, such as age and education. These should correlate with the DV as strongly as possible. Control is especially important when the comparison groups differ on confounding demographic characteristics.
- A pretest measure (i.e., an early measure of the DV).

**Adjusted Means**: allow researchers to determine net effects (i.e., group differences on the DV after the effects of covariates are removed), which inform the decision to reject or retain the null hypothesis.


**Other Least Squares Multivariate Techniques**

- Analysis of variance (ANOVA) and multiple regression are very similar
- Both analyze total variability in a continuous dependent measure and contrast variability due to IVs with that attributable to individual differences or error
- Experimental data are typically analyzed by ANOVA
- Correlational data are analyzed by regression
- Any data for which ANOVA is appropriate can also be analyzed by multiple regression
- The General Linear Model (GLM) is a broad class of statistical techniques based on linear solutions; it is the foundational procedure underlying the *t*-test, ANOVA, and multiple regression


**Repeated measures ANOVA for mixed designs**: used when data are collected from the same participants three or more times

**Multivariate Analysis of Variance (MANOVA)**: the extension of ANOVA to more than one dependent variable.

**Multivariate Analysis of Covariance (MANCOVA)**: allows for the control of confounding variables (covariates) when there are two or more dependent variables.

**Discriminant Analysis**: makes predictions about membership in groups; for example, predicting membership in groups such as compliant versus noncompliant cancer patients.

**Discriminant function**: used with a categorical dependent variable and independent variables that are either dichotomous or continuous.

**Wilks’ lambda (λ)**: indicates the proportion of variance unaccounted for by the predictors, or **λ = 1 − R²**
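Since lambda is simply the complement of R², the relationship above can be expressed as a one-line function (a trivial sketch for illustration):

```python
def wilks_lambda(r_squared: float) -> float:
    """Wilks' lambda: proportion of variance NOT accounted for by the predictors."""
    return 1.0 - r_squared

# If the predictors explain 30% of the variance (R2 = .30),
# lambda = .70: 70% of the variance remains unexplained.
lam = wilks_lambda(0.30)
```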

**Logistic Regression**

**Logistic regression** is a widely used multivariate technique. Like multiple regression, it analyzes the relationship between multiple independent variables and a dependent variable and yields a predictive equation; like discriminant analysis, it is used to predict a categorical dependent variable.

It relies on an estimation procedure with less restrictive assumptions than the multivariate procedures within the GLM.

**Basic Concepts for Logistic Regression**

Logistic regression uses **maximum likelihood estimation (MLE)**, which estimates the parameters most likely to have generated the observed data.

**Logistic Regression**: models the probability of an outcome rather than predicting group membership

**Odds**: reflect the ratio of two probabilities (the probability of an event occurring, to the probability that it will not occur)

Example: if 40% of women practice breast self-exams, the **odds** would be 0.40 divided by 0.60, or 0.667.

**Logit** (short for *log*istic probability un*it*): the natural log of the odds; it can range from minus to plus infinity.
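The odds and logit definitions above can be sketched directly; the breast self-exam example (p = .40) reproduces the 0.667 odds given in the text:

```python
import math

def odds(p: float) -> float:
    """Odds: probability of the event divided by probability of no event."""
    return p / (1.0 - p)

def logit(p: float) -> float:
    """Logit: the natural log of the odds; ranges from -inf to +inf."""
    return math.log(odds(p))

# Chapter example: 40% of women practice breast self-exams.
p = 0.40
o = odds(p)    # 0.40 / 0.60 = 0.667 (rounded)
l = logit(p)   # ln(0.667), a negative value, since p < .50
```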

**The Odds Ratio (OR)**: the ratio of the odds for one group to the odds for another.

**Variables in Logistic Regression**

The dependent variable is typically coded 1 to represent the presence of an event or characteristic.

**Significance Tests in Logistic Regression**

**Likelihood Index**: the probability of the observed results, given the parameters estimated in the analysis. If the overall model fits the data perfectly, the likelihood index is 1.0. The index is usually reported as −2 times the log likelihood (−2*LL*); a chi-square statistic is then used to test the null hypothesis, in what is called the **likelihood ratio test**.
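A minimal sketch of the likelihood ratio test described above, using made-up log-likelihood values (the numbers are hypothetical, not from the chapter):

```python
# Hypothetical log-likelihoods for illustration (not from the chapter):
ll_null = -120.5   # model with no predictors
ll_full = -104.2   # model including the predictors

# Likelihood ratio chi-square statistic: -2 * (LL_null - LL_full).
# It is compared to a chi-square distribution with df equal to the
# number of predictors added, to test the null hypothesis.
lr_chi2 = -2.0 * (ll_null - ll_full)
```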

**Goodness-of-fit Statistic**: is the analog of the overall *F* test in multiple regression.

**Hosmer-Lemeshow Test**: compares the prediction model to a hypothetically “perfect” model by computing the differences between observed frequencies and expected frequencies. A nonsignificant chi-square is desired: it indicates that the model being tested is not reliably different from the perfect model.

**Wald Statistic**: a chi-square statistic used to test individual predictors.

**Effect Size in Logistic Regression**

**Nagelkerke R²**: the most frequently reported pseudo-R² statistic.

**Survival and Event History Analysis**

**Survival analysis** calculates survival rates among study participants. Survival scores can also be compared between experimental and control groups, using a test statistic to evaluate the null hypothesis.

**Causal modeling** examines relationships among three or more variables. Path analysis and structural equation modeling are two types of causal modeling.

**Path analysis** is used to study causal patterns among variables, not to discover causes. A **path diagram** illustrates the hypothesized impact of one variable on another. In a **recursive model**, causal flow is in one direction: for example, variable X may have a causal effect on variable Y, but Y has no causal effect on X. Variables are classified as exogenous, endogenous, or residual.

**Path coefficients** indicate the relative importance of determinants. As standardized partial regression slopes, they represent the expected change, in standard deviation (SD) units, in the outcome for a one-SD change in the predictor.

**Structural Equation Modeling (SEM)** addresses the drawbacks of path analysis, such as its inability to accommodate unmeasured (latent) variables. The overall fit of the causal model to the research data is tested with indexes such as the **Goodness of Fit Index (GFI)**; a value greater than .90 (90%) indicates a good fit.

**Computers and Multivariate Statistics**

Multivariate analyses enhance researchers’ ability to make predictions, for example by adding predictor variables in a multiple regression. Because of their complexity, these analyses are performed by computer. An example of a multiple regression equation: predicted birth weight = (3.119 × age) + 48.040. Computers are likewise used for logistic regression and analysis of covariance.
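Assuming the equation in the notes reads predicted birth weight = (3.119 × age) + 48.040 (the coefficients come from the text, but the units and the interpretation of "age" as maternal age are assumptions), applying it is a single multiply-and-add:

```python
def predict_birth_weight(age: float) -> float:
    """Hypothetical regression equation from the notes (units assumed):
    predicted birth weight = 3.119 * age + 48.040."""
    return 3.119 * age + 48.040

# e.g., for an age of 20:
bw = predict_birth_weight(20)   # 3.119 * 20 + 48.040 = 110.42
```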

**References**

Polit, D. F., & Beck, C. T. (2012). *Nursing research: Generating and assessing evidence for nursing practice* (Laureate Education, Inc., custom ed.). Philadelphia, PA: Lippincott Williams & Wilkins.
