May 19 SPSS Commands

SPSS

Week 1: 

 

·      Data cleaning 

·      Visualizing categorical ordinal data in a Frequency Table or Bar Chart: 

o   Analyze - - >  Descriptive - - >  Frequencies 

o   Analyze - - >  Descriptive - - >  Frequencies - - >  Charts - - >  Bar Chart 

·      Visualizing numerical continuous data in a Frequency Table or Histogram: 

o   Analyze - - >  Descriptive - - >  Frequencies - - >  Statistics (choose Mean, Median, Mode, Quartiles, St Deviation, Min, Max, Range) 

o   Analyze - - >  Descriptive - - >  Frequencies - - >  Charts - - >  Histogram

·      Visualizing numerical data in a Box Plot: 

o   Graphs - - >  Boxplot - - >  Simple - - >  add variable of interest (Y) into Variables - - >  add group variable into Category axis - - >  OK

 

Week 2: 

 

·      Confidence Intervals:                 

o   Analyze - - >  Descriptive Statistics - - >  Explore - - >  put variable in Dependent list - - >  Statistics 

 

Week 3: 

 

·      One Sample T-test (means): 

o   Analyze - - >  Compare means - - >  One sample t-test - - >  add variable of interest - - >  add in the known test value - - >  OK

§  Output: one table for Descriptive Stats, one table for t-test
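The one-sample t-test SPSS runs here can be sketched by hand; a minimal Python sketch with made-up sample data and test value (SPSS reports the same mean, SD, t, and df in its two output tables):

```python
# Hand computation of a one-sample t-test, mirroring the SPSS output
# (Analyze -> Compare Means -> One-Sample T Test).
# The sample data and test value below are hypothetical.
import math
import statistics

sample = [5.1, 4.8, 5.6, 5.0, 4.9, 5.3, 5.2, 4.7]  # hypothetical measurements
test_value = 5.0                                    # known value to compare against

n = len(sample)
mean = statistics.mean(sample)
sd = statistics.stdev(sample)    # sample SD (n-1 denominator), as SPSS reports
se = sd / math.sqrt(n)           # standard error of the mean
t = (mean - test_value) / se     # t statistic with n-1 degrees of freedom

print(f"mean={mean:.3f}, SD={sd:.3f}, t={t:.3f}, df={n - 1}")
```

SPSS then compares t against the t distribution with n-1 df to produce the p-value.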

·      One Sample Chi-Squared (x^2) test (proportions):

o   Analyze - - >  Non-parametric tests - - >  Chi-square 

o   If an expected split is suggested: Recall button - - >  Chi-square - - >  enter the expected proportions under 'Expected Values' (if the expected split is 80/20, type 80 and then 20) - - >  OK
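The statistic behind this menu path can be checked by hand; a sketch with hypothetical observed counts and the 80/20 expected split from the example above:

```python
# Hand computation of the one-sample chi-square statistic, mirroring
# Analyze -> Nonparametric Tests -> Chi-square with Expected Values 80 and 20.
# The observed counts are hypothetical.
observed = [70, 30]          # hypothetical counts in each category
expected_props = [0.8, 0.2]  # the 80/20 split entered under 'Expected Values'

total = sum(observed)
expected = [p * total for p in expected_props]          # [80.0, 20.0]
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(f"chi-square = {chi_sq:.2f}, df = {len(observed) - 1}")
```

SPSS compares chi_sq against the chi-square distribution with (categories - 1) df.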

 

 

Week 4: 

·      Checking normality (numerical continuous): 

o   Analyze - - >  Non-parametric tests - - >  Legacy Dialogs - - >  1 Sample K-S - - >  enter variables - - >  OK 

·      Independent Samples T-test: 

o   Check suitability: Data - - >  Split file - - >  Split file by groups (ie Gender) - - >  OK

§  SPSS can now show us frequencies for each gender separately

§  Recall - - >  Frequencies OR Analyze - - >  Descriptive Stats - - >  Frequencies - - >  Charts - - >  Histogram 

§  Re-unite the data after checking suitability: Recall - - >  Split File - - >  Analyze all cases

o   Independent samples t-test: Analyze - - >  Compare means - - >  Independent samples t test - - >  add variable of interest and grouping variable - - >  Define Groups (use values that gender has been coded ie 0,1) - - >  Continue - - >  OK

·      Paired Samples T-test: 

o   Check suitability: 

§  Create a new variable for the difference between two variables: Transform - - >  Compute Variable - - >  give the new variable a name in Target Variable - - >  Add the two variables in Numeric Expression separated by a subtract (-) sign - - >  OK

§  Check suitability of new variable: Recall - - >  Frequencies OR Analyze - - >  Descriptive Stats - - >  Frequencies - - >  add new variable - - >  Charts - - >  histogram - - >  OK

o   Paired Samples T-test: Analyze - - >  Compare Means - - >  Paired Samples T-test - - >  add variables of interest into Paired Variables - - >  OK
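The two steps above (compute the difference variable, then test it against 0) can be sketched numerically; the before/after scores below are hypothetical:

```python
# Paired-samples t-test computed by hand: take the per-subject difference
# (as in Transform -> Compute Variable) and run a one-sample t-test on it
# against 0. Scores are hypothetical before/after values.
import math
import statistics

before = [120, 132, 118, 125, 140, 126]
after  = [115, 130, 117, 120, 135, 124]

diffs = [b - a for b, a in zip(before, after)]   # the new difference variable
n = len(diffs)
mean_d = statistics.mean(diffs)
sd_d = statistics.stdev(diffs)
t = mean_d / (sd_d / math.sqrt(n))               # t with n-1 df
print(f"mean difference={mean_d:.3f}, t={t:.3f}, df={n - 1}")
```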

 

Non-Parametric Chi-squared test:

·      One Sample Chi Square test:

o    Analyze - - >  non-parametric tests - - >  Legacy Dialogs - - >  Chi-square

·      Pearson’s Chi-Square test: 

o   Analyze - - >  Descriptive Stats - - >  Crosstabs - - >  add variable to row and column - - >  Statistics - - >  Chi-square - - >  Cells - - >  Observed, Column - - >  OK

§  Can also do Observed, Row to look at inversely 

§  Cells - - >  untick Observed, tick Expected - - >  OK

·      McNemar Chi-Square test: 

o   Crosstabs - - >  add in variables to Rows and Columns - - >  Statistics - - >  McNemar - - >  Cells - - >  Observed, Total - - >  OK
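The classic McNemar statistic behind this procedure uses only the discordant cells of the paired 2x2 table; a sketch with hypothetical counts (note SPSS itself switches to an exact binomial test when discordant counts are small):

```python
# Classic McNemar chi-square for a paired 2x2 table.
# Hypothetical table:
#              After: yes   After: no
# Before yes      a=30        b=12
# Before no       c=5         d=20
b, c = 12, 5                       # discordant pairs only
mcnemar = (b - c) ** 2 / (b + c)   # chi-square with 1 df
print(f"McNemar chi-square = {mcnemar:.3f} (df = 1)")
```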

 

 

Week 5: 

·      Wilcoxon Signed Rank Test: 

o   Check suitability: Analyze - - >  Descriptive Stats - - >  Frequencies - - >  Statistics, Charts - - >  OK

§  Non-normal distribution - - >  use non-parametric test, equality of medians

o   Analyze - - >  Non-parametric tests - - >  One sample - - >  Fields - - >  Add variable of interest - - >  Settings - - >  Customize tests - - >  compare median (enter hypothesized median) - - >  Run 

·      Mann-Whitney U Test (Wilcoxon Sum Rank)

o   Data - - >  Split file (ie based on gender) 

o   Check suitability: Analyze - - >  Descriptive stats - - >  Frequencies - - >  Charts (histogram) - - >  OK

o   Recall button to unsplit file - - >  Analyze all cases 

o   Analyze - - >  Non parametric test - - >  Independent samples - - >  Fields - - >  Add variable of interest and grouping variable - - >  Settings - - >  Customized tests - - >  Mann Whitney - - >  Run 

·      Wilcoxon Matched-Pair Signed Rank Test: 

o   Analyze - - >  Non parametric Tests - - >  Related Samples - - >  Fields - - >  Add in matched pair variables - - >  Settings - - >  Customize tests - - >  Wilcoxon Matched-pair signed rank (2 samples) - - >  Run

 

Exact Test (assumptions violated)

·      One Sample Chi square test: Analyze - - >  Non parametric Test - - >  Legacy Dialogs - - >  Chi-square - - >  add variable of interest - - >  check coding and add values - - >  OK

o   If assumptions do not hold: Recall - - >  Chi-square - - >  Exact - - >  Exact test - - >  OK

·      Pearson’s Chi-Square test: 

o   Create crosstabs (ie gender): Analyze - - >  Descriptive Stats - - >  Crosstabs - - >  add variables of interest into Row and Column - - >  OK

o   Pearson Chi-Square test: 

§  In crosstabs - - >  Statistics - - >  Chi-square - - >  Cells - - >  Observed, Column - - >  OK

§  If assumptions do not hold: refer to Fisher’s exact test in output 

·      McNemar Chi Square test (Binomial Test): 

o   Crosstabs - - >  add variables to Row and Column - - >  Statistics - - >  McNemar

§  Interpret McNemar Bowker test 

 

📌 KEY REMINDERS:

  • Use t-tests when data is continuous and normally distributed
  • Use Wilcoxon/Mann-Whitney when data is not normal or ordinal
  • Use χ²/Fisher/McNemar for categorical data (like counts, yes/no)
  • Paired = same people; independent = different people

 

Week 6: 

 

·      Scatter plot: 

o   Graphs - - >  Scatter/Dot - - >  Simple scatter - - >  enter X and Y - - >  Label cases by “ID” - - >  OK 

§  Double click to add line of best fit 

·      Correlation Coefficient: 

o   Pearson's CC: 

§  Analyze - - >  Correlate - - >  Bivariate - - >  enter variables - - >  Pearson’s - - >  OK 

o   Spearman’s CC: 

§  Check suitability of data: Analyze - - >  Descriptive stats - - >  Frequencies - - >  St Dev, min, max, mean, median, mode - - >  Charts - - >  Histogram 

§  Analyze - - >  Correlate - - >  Bivariate - - >  enter variables - - >  Spearman - - >  OK

·      Linear Regression Model: 

o   Analyze - - >  Regression - - >  Linear - - >  add IV and DV, Case Labels ID - - >  Statistics - - >  Estimates, Confidence Intervals - - >  Continue 

·      Predicting a variable:

o   Add value into data set 

o   Analyze - - >  Regression - - >  Linear - - >  add in IV and DV - - >  Save - - >  Unstandardized, Prediction Intervals (mean for a group, individual for a given person) - - >  Continue - - >  OK

·      Dummy Variables (when a categorical variable has more than 2 levels): 

o   Transform - - >  Recode into Different Variables - - >  new variable name and code - - >  old and new values - - >  code 0s and 1s - - >  Change - - >  repeat for each new variable (n-1 dummies for n categories)
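The n-1 dummy-coding rule can be sketched outside SPSS; a minimal example with a hypothetical 3-level variable (reference category "A", so two dummies are created):

```python
# Sketch of n-1 dummy coding for a 3-level categorical variable, mirroring
# Transform -> Recode into Different Variables. Labels are hypothetical.
data = ["A", "C", "B", "A", "C"]   # reference category: "A"

# One dummy per non-reference level (n-1 = 2 new variables)
dummy_b = [1 if x == "B" else 0 for x in data]
dummy_c = [1 if x == "C" else 0 for x in data]
print(dummy_b)  # [0, 0, 1, 0, 0]
print(dummy_c)  # [0, 1, 0, 0, 1]
```

A case with 0 on both dummies is the reference category; this is why only n-1 dummies are needed.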

·      Linear Regression with Dummy Variables: 

o   Analyze - - >  Regression - - >  Linear - - >  Add IV and DVs , Case label - - >  Statistics - - >  Estimates, Confidence Intervals - - >  Continue 

 

 

Week 7: 

 

·      Multiple Linear Regression: 

o   Analyze - - >  Regression - - >  Linear - - >  add DV and IVs - - >  Statistics, Confidence Intervals - - >  OK

·      R squared: 

o   Analyze - - >  Regression - - >  Linear 

·      Checking assumptions for multiple linear regression model: 

o   Analyze - - >  Regression - - >  Linear - - >  add DV and IVs - - >  Plots - - >  Histogram, Normal probability plot, Produce all partial plots, ZRESID (Y), ZPRED (X) - - >  ok

·      Predicting a value: 

o   Enter data 

o   Analyze - - >  Regression - - >  Linear - - >  Save - - >  Unstandardized, Individual - - >  OK 

o   View predicted value and 95% CI (low and upper limit values) in the dataset

 

 

Week 8: 

·      Baron & Kenny Steps: 

o   Steps 1-2 (Simple Linear Regression)

o   Steps 3-4 (Multiple Linear Regression)

 

Week 9: 

·      Creating an interaction term: 

o   Transform - - >  Compute variable - - >  write name of interaction term in Target variable - - >  enter (___ * ___ ) in Numeric Expression - - >  OK
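The cross-product that Compute Variable builds is simple arithmetic; a sketch with hypothetical values and a binary moderator:

```python
# Creating an interaction (cross-product) term by hand, mirroring
# Transform -> Compute Variable with (x1 * z) in Numeric Expression.
# Values are hypothetical.
x1 = [2.0, 4.0, 6.0]
z  = [0, 1, 1]    # e.g. a binary moderator

interaction = [a * b for a, b in zip(x1, z)]
print(interaction)  # [0.0, 4.0, 6.0]
```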

·      Estimating the Interaction Effect: 

o   Analyze - - >  Regression - - >  Linear - - >  add DV and IV (and interaction term)

·      Categorical variables with more than two categories: 

o   Recoding into Dummy variables: 

§  Transform - - >  Recode into Different Variables 

o   Create interaction terms

o   Run multiple linear regression with DV and IV’s (including interaction terms)

·      Sorting to find Outliers: 

o   Sort - - >  Ascending

o   Descriptives - - >  min, max

o   Graphs - - >  Regression Variable Plot - - >  DV in vertical axis, IV in horizontal axis, label

·      Run a regression with and without outlier

o   Select cases - - >  remove outlier - - >  re-run regression 

o   DFBETA and DFFIT: 

§  Analyze - - >  Regression - - >  Linear - - >  Add DV and IV - - >  Save - - >  Influence Statistics - - >  Standardized DBETA, Standardized DFFIT - - >  OK

 

 

Week 10: 

·      Contingency table: 

o   Analyze - - >  Descriptive stats - - >  Crosstabs - - >  add variable of interest (DV or outcome) into Columns, add IV to Rows - - >  Cells - - >  Observed, Column - - >  OK

·      Pearson’s Chi Square: 

o   Recall - - >  Crosstabs - - >  Statistics - - >  change Observed to Expected cell counts - - >  OK

·      Risk: 

o   Create a contingency table - - >  Statistics - - >  Risk - - >  OK
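The Risk output can be reproduced by hand from the 2x2 table; a sketch with hypothetical counts:

```python
# Risk, relative risk, and odds ratio from a 2x2 contingency table,
# matching what SPSS reports under Crosstabs -> Statistics -> Risk.
# Hypothetical counts:
#              Outcome: yes   Outcome: no
# Exposed          a=20          b=80
# Unexposed        c=10          d=90
a, b, c, d = 20, 80, 10, 90

risk_exposed   = a / (a + b)                     # 0.20
risk_unexposed = c / (c + d)                     # 0.10
relative_risk  = risk_exposed / risk_unexposed   # 2.0
odds_ratio     = (a * d) / (b * c)               # 2.25
print(relative_risk, odds_ratio)
```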

·      Binary Logistic Regression: 

o   Analyze - - >  Regression - - >  Binary Logistic - - >  add DV and covariate - - >  Categorical - - >  add categorical covariate - - >  Reference Category - - >  “First” 

o   Options - - >  CI for exp - - >  Continue - - >  OK

o   Exp(B) is the odds ratio 

o   Nagelkerke R squared is the % of variation that can be explained by the model 

·      Goodness of Fit: 

o   (1) Classification table: 

§  Binary logistic regression - - >  Categorical variables - - >  Options - - >  Classification plots - - >  classification cut-off (usually 0.5) - - >  OK

·      Shows sensitivity (true positives) and specificity (true negatives)

o   (2) Hosmer and Lemeshow (only with multiple predictors): 

§  Binary Logistic Regression - - >  Options - - >  Classification Plots, Hosmer Lemeshow goodness of fit - - >  OK

·      Non-significant p-value (> 0.05) = good fit 

 

 

≤ 4 categories = treat as categorical data; ≥ 5 categories = treat as continuous data 

Symmetrical – report on mean and SD

Skewed – report on median and min/max and IQR

 

Type I error = false positive 

Type II error = false negative 

 

Power = probability of rejecting a false null

 

If the CI (for a difference) includes 0, the result is not significant

 

Correlation – direction (+/-) and magnitude [-1,1]

 

Pearson’s Correlation Coefficient (parametric/normal) 

Spearman’s Correlation Coefficient (nonparametric/skewed) – measures the monotonic relationship 

 

R: degree of simple correlation

R^2 (goodness of fit): % of variation explained, does not indicate causation

R^2 adjusted: better indicator, higher one should be selected 

 

Simple Linear Regression 

Y = B0 + B1x + e

 

Homoscedasticity: variance of the errors is constant (equally spread across values of the IV) 

 

Multiple Linear Regression

Y = B0 + B1x1 + B2x2 + e

Fits a regression plane

 

Confounder: causes both IV and DV 

 

Assumptions for Linearity: 

1.        The relationship between the DV and each continuous IV is linear.

2.        Residuals (error terms) should be normally distributed. (histogram)

3.        Homoscedasticity: stability in variance of residuals. (ZRESID and ZPRED)

4.        Independent observations

 

Mediator: third variable (x2), explains a portion of association

C’ : direct effect 

a*b : indirect effect (mediated effect)

c : total effect = c’ + (a*b) 
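The identity c = c' + (a*b) can be checked with a tiny numeric example; the path coefficients below are made up for illustration:

```python
# Numeric check of the mediation identity c = c' + (a*b),
# with hypothetical path coefficients.
a_path  = 0.50   # x1 -> M
b_path  = 0.40   # M -> Y (controlling for x1)
c_prime = 0.30   # direct effect of x1 on Y (controlling for M)

indirect = a_path * b_path   # mediated effect a*b
total = c_prime + indirect   # total effect c
print(f"indirect={indirect:.2f}, total effect c={total:.2f}")
```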

 

path a: M=B0 + B1x1 + e

path b: Y=B0 + B2M + B3x1 + e

path c': c' = B3 (direct effect of x1, controlling for M)

 

Baron & Kenny Steps: 

1.        Test path c (x1 - - > Y) using simple linear regression to get c (the total effect) 

2.        Test path a (x1 - -> M) using simple linear regression to get B1

3.        Test path b (M - - > Y, controlling for x1) using multiple linear regression to get B2

4.        Test path c’ (x1 - -> Y, controlling for M) using multiple linear regression to get B3. 

 

Step 1 – not essential for establishing mediation

Steps 2+3 – essential 

 

Complete mediation: c' is not significantly different from 0 (p > 0.05), i.e. no association between x1 and Y when controlling for M 

 

Partial mediation: B3 is significantly different from 0 (p < 0.05) and c’ is smaller than c

 

Sobel Test of indirect effect: based on a z statistic. If |z| > 1.96, reject the null that the indirect effect is 0
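The Sobel z combines the path a and path b estimates with their standard errors; a sketch with made-up numbers (the formula z = a*b / sqrt(b²·SEa² + a²·SEb²) is the standard Sobel form):

```python
# Sobel test of the indirect effect:
# z = a*b / sqrt(b^2*SEa^2 + a^2*SEb^2)
# Coefficients and standard errors are hypothetical.
import math

a, se_a = 0.50, 0.10   # path a estimate and its standard error
b, se_b = 0.40, 0.12   # path b estimate and its standard error

z = (a * b) / math.sqrt(b**2 * se_a**2 + a**2 * se_b**2)
print(f"Sobel z = {z:.3f}")  # reject H0 of no indirect effect if |z| > 1.96
```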

 

Modifier (moderator) (z): has an interaction effect on assoc between y and x1. 

 

Establishing moderation: 

1.        Create a new variable (interaction term/cross product) (x1 * z) 

2.        Add new variable to linear regression model 

a.        Y = B0 + B1x1 + B2z + B3(x1*z)

3.        Test coefficient B3 

 

B1:  effect of x1 on Y when z=0

B2: effect of z on Y when x1=0

B3: difference of effect of x1 on Y by levels of z

 

Effect of x1 = b1 + (b3 * z)

Effect of z = b2 + (b3 * x1)
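The simple-slope formulas above are plain arithmetic once the coefficients are estimated; a sketch with hypothetical B values:

```python
# Simple-slope arithmetic for moderation: the effect of x1 at a given
# level of z is B1 + B3*z. Coefficients are hypothetical.
b1, b2, b3 = 2.0, 1.5, 0.8

effect_x1_at_z0 = b1 + b3 * 0   # effect of x1 when z = 0
effect_x1_at_z1 = b1 + b3 * 1   # effect of x1 when z = 1
print(effect_x1_at_z0, effect_x1_at_z1)
```

If B3 = 0 the two slopes coincide, i.e. no moderation.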

 

Odds: ratio of with versus without event (# of times outcome occurs / # of times doesn’t occur)

Risk: probability of occurrence (# of times outcome occurs / total # of possible outcomes)

Relative risk: probability of outcome in exposure versus no exposure

 

Nagelkerke R^2: % of variation explained by the model 

 

Odds = exp(L) 

Probability = exp(L) / (1 + exp(L))
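Converting the logit L from a logistic model into odds and probability follows directly from these two formulas; a sketch with a hypothetical logit value:

```python
# Converting a logit L from binary logistic regression into odds and
# probability: odds = exp(L), p = exp(L) / (1 + exp(L)).
import math

L = 0.0                                  # hypothetical logit (linear predictor)
odds = math.exp(L)                       # odds of the event
prob = math.exp(L) / (1 + math.exp(L))   # probability of the event
print(odds, prob)                        # L = 0 gives odds 1.0, probability 0.5
```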

 

Sensitivity (true positives) = TP / (TP + FN)

 

Specificity (true negatives) = TN / (TN + FP) 
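Both rates come straight from the classification table; a sketch with hypothetical confusion-matrix counts:

```python
# Sensitivity and specificity from a classification table,
# matching the formulas above. Counts are hypothetical.
tp, fn = 45, 5     # true positives, false negatives
tn, fp = 80, 20    # true negatives, false positives

sensitivity = tp / (tp + fn)   # proportion of actual positives detected
specificity = tn / (tn + fp)   # proportion of actual negatives detected
print(sensitivity, specificity)
```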

 

Hosmer & Lemeshow test for goodness of fit, produces a chi-square statistic

                  Nonsignificant (P >0.05) means GOOD FIT 

 

 
