SPSS Commands
Week 1:
· Data cleaning
· Visualizing categorical ordinal data in a Frequency Table or Bar Chart:
o Analyze - - > Descriptive Statistics - - > Frequencies
o Analyze - - > Descriptive Statistics - - > Frequencies - - > Charts - - > Bar Chart
· Visualizing numerical continuous data in a Frequency Table or Histogram:
o Analyze - - > Descriptive Statistics - - > Frequencies - - > Statistics (choose Mean, Median, Mode, Quartiles, St Deviation, Min, Max, Range)
o Analyze - - > Descriptive Statistics - - > Frequencies - - > Charts - - > Histogram
· Visualizing numerical data in a Box Plot:
o Graphs - - > Boxplot - - > Simple - - > add variable of interest (Y) into Variables - - > add group variable into Category axis - - > OK
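Equivalent syntax, as a sketch ('age', 'smoker', and 'gender' are hypothetical variable names — the Paste button in each dialog generates comparable commands for your own variables):
* Frequency table with summary statistics, quartiles, and a histogram.
FREQUENCIES VARIABLES=age
  /STATISTICS=MEAN MEDIAN MODE STDDEV MINIMUM MAXIMUM RANGE
  /NTILES=4
  /HISTOGRAM.
* Frequency table with a bar chart for a categorical variable.
FREQUENCIES VARIABLES=smoker
  /BARCHART.
* Simple boxplot of a numerical variable split by a grouping variable.
EXAMINE VARIABLES=age BY gender
  /PLOT=BOXPLOT
  /STATISTICS=NONE.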
Week 2:
· Confidence Intervals:
o Analyze - - > Descriptive Statistics - - > Explore - - > put variable in Dependent list - - > Statistics
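Syntax sketch for the same Explore output ('weight' is a hypothetical variable; 95 is the default confidence level):
EXAMINE VARIABLES=weight
  /STATISTICS=DESCRIPTIVES
  /CINTERVAL 95.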
Week 3:
· One Sample T-test (means):
o Analyze - - > Compare means - - > One sample t-test - - > add variable of interest - - > add in the known test value - - > OK
§ Output: one table for Descriptive Stats, one table for t-test
· One Sample Chi-Squared (x^2) test (proportions):
o Analyze - - > Non-parametric tests - - > Legacy Dialogs - - > Chi-square
o Setting the expected proportions: Recall button - - > Chi-square - - > enter the expected proportions in category order under ‘Expected Values’ (e.g. for an expected 80/20 split, enter 80 and then 20) - - > OK
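Syntax sketch for both tests ('weight' and 'smoker' are hypothetical variables; 70 and the 80/20 split are made-up values):
* One-sample t-test against a known test value.
T-TEST
  /TESTVAL=70
  /VARIABLES=weight.
* One-sample chi-square with expected proportions 80:20.
NPAR TESTS
  /CHISQUARE=smoker
  /EXPECTED=80 20.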
Week 4:
· Checking normality (numerical continuous):
o Analyze - - > Non-parametric tests - - > Legacy Dialogs - - > 1 Sample K-S - - > enter variables - - > OK
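Syntax sketch ('age' is a hypothetical variable):
NPAR TESTS
  /K-S(NORMAL)=age.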
· Independent Samples T-test:
o Check suitability: Data - - > Split file - - > Split file by groups (ie Gender) - - > OK
§ SPSS can now show us frequencies for each gender separately
§ Recall - - > Frequencies OR Analyze - - > Descriptive Stats - - > Frequencies - - > Charts - - > Histogram
§ Restore the full dataset after checking suitability: Recall - - > Split File - - > Analyze all cases
o Independent samples t-test: Analyze - - > Compare means - - > Independent samples t test - - > add variable of interest and grouping variable - - > Define Groups (use values that gender has been coded ie 0,1) - - > Continue - - > OK
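Syntax sketch of the whole routine ('weight' and 'gender' are hypothetical; gender coded 0/1):
* Inspect each group separately, then restore the full file.
SORT CASES BY gender.
SPLIT FILE SEPARATE BY gender.
FREQUENCIES VARIABLES=weight /HISTOGRAM.
SPLIT FILE OFF.
* Independent samples t-test.
T-TEST GROUPS=gender(0 1)
  /VARIABLES=weight.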
· Paired Samples T-test:
o Check suitability:
§ Create a new variable for the difference between two variables: Transform - - > Compute Variable - - > give the new variable a name in Target Variable - - > Add the two variables in Numeric Expression separated by a subtract (-) sign - - > OK
§ Check suitability of new variable: Recall - - > Frequencies OR Analyze - - > Descriptive Stats - - > Frequencies - - > add new variable - - > Charts - - > histogram - - > OK
o Paired Samples T-test: Analyze - - > Compare Means - - > Paired Samples T-test - - > add variables of interest into Paired Variables - - > OK
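Syntax sketch ('pre' and 'post' are hypothetical paired variables):
* Difference variable for the suitability check.
COMPUTE diff = post - pre.
EXECUTE.
FREQUENCIES VARIABLES=diff /HISTOGRAM.
* Paired samples t-test.
T-TEST PAIRS=pre WITH post (PAIRED).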
Non-Parametric Chi-squared test:
· One Sample Chi Square test:
o Analyze - - > non-parametric tests - - > Legacy Dialogs - - > Chi-square
· Pearson’s Chi-Square test:
o Analyze - - > Descriptive Stats - - > Crosstabs - - > add variable to row and column - - > Statistics - - > Chi-square - - > Cells - - > Observed, Column - - > OK
§ Can also tick Observed, Row to read the percentages the other way around
§ Cells - - > untick Observed, tick Expected - - > OK
· McNemar Chi-Square test:
o Crosstabs - - > add in variables to Rows and Columns - - > Statistics - - > McNemar - - > Cells - - > Observed, Total - - > OK
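Syntax sketch ('smoker', 'gender', 'before', and 'after' are hypothetical variables):
* Pearson chi-square with observed counts and column percentages.
CROSSTABS /TABLES=smoker BY gender
  /STATISTICS=CHISQ
  /CELLS=COUNT COLUMN.
* McNemar test for paired binary data.
CROSSTABS /TABLES=before BY after
  /STATISTICS=MCNEMAR
  /CELLS=COUNT TOTAL.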
Week 5:
· Wilcoxon Signed Rank Test:
o Check Suitability: Analyze - - > Descriptive Stats - - > Frequencies - - > Statistics, Charts - - > OK
§ Non-normal distribution - - > use non-parametric test, equality of medians
o Analyze - - > Non-parametric tests - - > One sample - - > Fields - - > Add variable of interest - - > Settings - - > Customize tests - - > compare median (enter hypothesized median) - - > Run
· Mann-Whitney U Test (Wilcoxon Rank-Sum):
o Data - - > Split file (ie based on gender)
o Check suitability: Analyze - - > Descriptive stats - - > Frequencies - - > Charts (histogram) - - > OK
o Recall button to unsplit file - - > Analyze all cases
o Analyze - - > Non parametric test - - > Independent samples - - > Fields - - > Add variable of interest and grouping variable - - > Settings - - > Customized tests - - > Mann Whitney - - > Run
· Wilcoxon Matched-Pair Signed Rank Test:
o Analyze - - > Non parametric Tests - - > Related Samples - - > Fields - - > Add in matched pair variables - - > Settings - - > Customize tests - - > Wilcoxon Matched-pair signed rank (2 samples) - - > Run
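Syntax sketch of the three tests via NPTESTS ('pain', 'gender', 'pre', 'post', and the median of 50 are hypothetical):
* One-sample Wilcoxon signed rank against a hypothesized median.
NPTESTS /ONESAMPLE TEST (pain) WILCOXON(TESTVALUE=50).
* Mann-Whitney U test for two independent groups.
NPTESTS /INDEPENDENT TEST (pain) GROUP (gender) MANN_WHITNEY.
* Wilcoxon matched-pair signed rank test.
NPTESTS /RELATED TEST (pre post) WILCOXON.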
Exact Test (assumptions violated)
· One Sample Chi square test: Analyze - - > Non parametric Test - - > Legacy Dialogs - - > Chi-square - - > add variable of interest - - > check coding and add values - - > OK
o If assumptions do not hold: Recall - - > Chi-square - - > Exact - - > Exact test - - > OK
· Pearson’s Chi-Square test:
o Create crosstabs (ie gender): Analyze - - > Descriptive Stats - - > Crosstabs - - > add variables of interest into Row and Column - - > OK
o Pearson Chi-Square test:
§ In crosstabs - - > Statistics - - > Chi-square - - > Cells - - > Observed, Column - - > OK
§ If assumptions do not hold: refer to Fisher’s exact test in output
· McNemar Chi Square test (Binomial Test):
o Crosstabs - - > add variables to Row and Column - - > Statistics - - > McNemar
§ Interpret McNemar Bowker test
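Syntax sketch (exact methods need the SPSS Exact Tests module; 'bloodtype', 'smoker', and 'gender' are hypothetical variables):
* Exact one-sample chi-square.
NPAR TESTS
  /CHISQUARE=bloodtype
  /METHOD=EXACT TIMER(5).
* Crosstabs: Fisher's exact test appears in the chi-square output.
CROSSTABS /TABLES=smoker BY gender
  /STATISTICS=CHISQ
  /METHOD=EXACT TIMER(5)
  /CELLS=COUNT COLUMN.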
📌 KEY REMINDERS:
- Use t-tests when data is continuous and normally distributed
- Use Wilcoxon/Mann-Whitney when data is not normal or ordinal
- Use χ²/Fisher/McNemar for categorical data (like counts, yes/no)
- Paired = same people, independent = different people
Week 6:
· Scatter plot:
o Graphs - - > Scatter/Dot - - > Simple scatter - - > enter X and Y - - > Label cases by “ID” - - > OK
§ Double-click the chart to open the Chart Editor, then add a line of best fit (Elements - - > Fit Line at Total)
· Correlation Coefficient:
o Pearson’s CC:
§ Analyze - - > Correlate - - > Bivariate - - > enter variables - - > Pearson’s - - > OK
o Spearman’s CC:
§ Check suitability of data: Analyze - - > Descriptive stats - - > Frequencies - - > St Dev, min, max, mean, median, mode - - > Charts - - > Histogram
§ Analyze - - > Correlate - - > Bivariate - - > enter variables - - > Spearman - - > OK
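Syntax sketch ('height' and 'weight' are hypothetical variables):
* Pearson's correlation coefficient.
CORRELATIONS /VARIABLES=height weight
  /PRINT=TWOTAIL NOSIG.
* Spearman's correlation coefficient for skewed or ordinal data.
NONPAR CORR /VARIABLES=height weight
  /PRINT=SPEARMAN TWOTAIL NOSIG.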
· Linear Regression Model:
o Analyze - - > Regression - - > Linear - - > add IV and DV, Case Labels ID - - > Statistics - - > Estimates, Confidence Intervals - - > Continue - - > OK
· Predicting a variable:
o Add value into data set
o Analyze - - > Regression - - > Linear - - > add in IV and DV - - > Save - - > Unstandardized, Prediction Intervals (Mean for a group, Individual for a given person) - - > Continue - - > OK
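Syntax sketch ('weight' on 'height' is a hypothetical model; ICIN saves individual prediction intervals, MCIN would save the mean ones):
REGRESSION
  /STATISTICS COEFF OUTS CI(95) R ANOVA
  /DEPENDENT weight
  /METHOD=ENTER height
  /SAVE PRED ICIN.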
· Dummy Variables (when a categorical variable has more than 2 levels):
o Transform - - > Recode into Different Variables - - > new variable name and code - - > old and new values - - > code 0s and 1s - - > Change - - > repeat for each new variable (k categories need k - 1 dummies; see the syntax sketch below)
· Linear Regression with Dummy Variables:
o Analyze - - > Regression - - > Linear - - > Add DV and IVs (the dummies), Case label - - > Statistics - - > Estimates, Confidence Intervals - - > Continue - - > OK
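Syntax sketch ('region' coded 1/2/3 and 'weight' are hypothetical; region 1 becomes the reference category):
* Recode a 3-level variable into k - 1 = 2 dummies.
RECODE region (2=1) (ELSE=0) INTO region2.
RECODE region (3=1) (ELSE=0) INTO region3.
EXECUTE.
* Regression with the dummies.
REGRESSION
  /STATISTICS COEFF OUTS CI(95) R ANOVA
  /DEPENDENT weight
  /METHOD=ENTER region2 region3.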
Week 7:
· Multiple Linear Regression:
o Analyze - - > Regression - - > Linear - - > add DV and IVs - - > Statistics - - > Estimates, Confidence Intervals - - > OK
· R squared:
o Analyze - - > Regression - - > Linear
· Checking assumptions for multiple linear regression model:
o Analyze - - > Regression - - > Linear - - > add DV and IVs - - > Plots - - > Histogram, Normal probability plot, Produce all partial plots, ZRESID (Y), ZPRED (X) - - > OK (syntax sketch below)
· Predicting a value:
o Enter data
o Analyze - - > Regression - - > Linear - - > Save - - > Unstandardized, Individual - - > OK
o View predicted value and 95% CI (low and upper limit values) in the dataset
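Syntax sketch covering the assumption checks and the prediction save ('weight', 'height', and 'age' are hypothetical):
REGRESSION
  /STATISTICS COEFF OUTS CI(95) R ANOVA
  /DEPENDENT weight
  /METHOD=ENTER height age
  /PARTIALPLOT ALL
  /SCATTERPLOT=(*ZRESID ,*ZPRED)
  /RESIDUALS HISTOGRAM(ZRESID) NORMPROB(ZRESID)
  /SAVE PRED ICIN.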
Week 8:
· Baron & Kenny Steps:
o Steps 1-2 (Simple Linear Regression)
o Steps 3-4 (Multiple Linear Regression)
Week 9:
· Creating an interaction term:
o Transform - - > Compute variable - - > write name of interaction term in Target variable - - > enter (___ * ___ ) in Numeric Expression - - > OK
· Estimating the Interaction Effect:
o Analyze - - > Regression - - > Linear - - > add DV and IV (and interaction term)
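Syntax sketch ('age', 'smoker', and 'weight' are hypothetical):
* Create the interaction term, then enter it alongside the main effects.
COMPUTE age_x_smoker = age * smoker.
EXECUTE.
REGRESSION
  /STATISTICS COEFF OUTS CI(95)
  /DEPENDENT weight
  /METHOD=ENTER age smoker age_x_smoker.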
· Categorical variables with more than two categories:
o Recoding into Dummy variables:
§ Transform - - > Recode into Different Variables
o Create interaction terms
o Run multiple linear regression with DV and IV’s (including interaction terms)
· Sorting to find Outliers:
o Sort - - > Ascending
o Descriptives - - > min, max
o Graphs - - > Regression Variable Plot - - > DV in vertical axis, IV in horizontal axis, label
· Run a regression with and without outlier
o Select cases - - > remove outlier - - > re-run regression
o DFBETA and DFFIT:
§ Analyze - - > Regression - - > Linear - - > Add DV and IV - - > Save - - > Influence Statistics - - > Standardized DfBeta(s), Standardized DfFit - - > OK
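Syntax sketch ('weight' and 'height' are hypothetical; SDBETA and SDFIT save the standardized DfBeta and DfFit values into the dataset):
REGRESSION
  /STATISTICS COEFF OUTS CI(95)
  /DEPENDENT weight
  /METHOD=ENTER height
  /SAVE SDBETA SDFIT.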
Week 10:
· Contingency table:
o Analyze - - > Descriptive stats - - > Crosstabs - - > add variable of interest (DV or outcome) into Columns, add IV to Rows - - > Cells - - > Observed, Column - - > OK
· Pearson’s Chi Square:
o Recall - - > Crosstabs - - > Statistics - - > change Observed to Expected cell counts - - > OK
· Risk:
o Create a contingency table - - > Statistics - - > Risk - - > OK
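Syntax sketch ('exposure' and 'outcome' are hypothetical binary variables; RISK adds the odds ratio and relative risks to the output):
CROSSTABS /TABLES=exposure BY outcome
  /STATISTICS=CHISQ RISK
  /CELLS=COUNT COLUMN.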
· Binary Logistic Regression:
o Analyze - - > Regression - - > Binary Logistic - - > add DV and covariate - - > Categorical - - > add categorical covariate - - > Reference Category - - > “First”
o Options - - > CI for exp(B) - - > Continue - - > OK
o Exp(B) is the odds ratio
o Nagelkerke R squared is the % of variation that can be explained by the model
· Goodness of Fit:
o (1) Classification table:
§ Binary logistic regression - - > Categorical variables - - > Options - - > Classification plots - - > classification cut-off (usually 0.5) - - > OK
· Shows sensitivity (true positives) and specificity (true negatives)
o (2) Hosmer and Lemeshow (only with multiple predictors):
§ Binary Logistic Regression - - > Options - - > Classification Plots, Hosmer Lemeshow goodness of fit - - > OK
· Non-significant p-value (> 0.05) indicates good fit
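Syntax sketch pulling the Week 10 options together ('outcome', 'age', and 'group' are hypothetical; INDICATOR(1) sets the first category as reference, GOODFIT requests Hosmer-Lemeshow, CLASSPLOT the classification plot):
LOGISTIC REGRESSION VARIABLES outcome
  /METHOD=ENTER age group
  /CONTRAST (group)=INDICATOR(1)
  /PRINT=CI(95) GOODFIT
  /CLASSPLOT
  /CRITERIA=PIN(0.05) POUT(0.10) ITERATE(20) CUT(0.5).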
Rule of thumb: a variable with 4 or fewer distinct values is treated as categorical; 5 or more as continuous
Symmetrical – report on mean and SD
Skewed – report on median, min/max, and IQR
Type I error = false positive
Type II error = false negative
Power = probability of rejecting a false null
If the CI includes 0, the result is not significant
Correlation – direction (+/-) and magnitude [-1,1]
Pearson’s Correlation Coefficient (parametric/normal)
Spearman’s Correlation Coefficient (nonparametric/skewed) – measures the monotonic relationship
R: degree of simple correlation
R^2 (goodness of fit): % of variation explained, does not indicate causation
R^2 adjusted: better indicator for comparing models; prefer the model with the higher adjusted R^2
Simple Linear Regression
Y = B0 + B1x + e
Homoscedasticity: variance of errors is equally distributed
Multiple Linear Regression
Y = B0 + B1x1 + B2x2 + e
Fits a regression plane
Confounder: causes both IV and DV
Assumptions for linear regression:
1. The relationship between the DV and each continuous IV is linear.
2. Residuals (error terms) should be normally distributed. (histogram)
3. Homoscedasticity: stability in variance of residuals. (ZRESID and ZPRED)
4. Independent observations
Mediator: third variable (x2), explains a portion of association
c’: direct effect
a*b : indirect effect (mediated effect)
c : total effect = c’ + (a*b)
path a: M = B0 + B1x1 + e (a = B1)
path b: Y = B0 + B2M + B3x1 + e (b = B2)
path c’: c’ = B3 (from the path b model)
Baron & Kenny Steps:
1. Test path C (x1 - - > Y) using simple linear regression to get B
2. Test path a (x1 - -> M) using simple linear regression to get B1
3. Test path b (M - - > Y, controlling for x1) using multiple linear regression to get B2
4. Test path c’ (x1 - -> Y, controlling for M) using multiple linear regression to get B3.
Step 1 – not essential for establishing mediation
Steps 2+3 – essential
Complete mediation: B3 not significant (p > 0.05), i.e. no association between x1 and Y when controlling for M
Partial mediation: B3 is significantly different from 0 (p < 0.05) and c’ is smaller than c
Sobel Test of indirect effect: based on a z statistic; if |z| > 1.96, reject the null that the indirect effect is 0
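Syntax sketch of the Baron & Kenny regressions ('y', 'x1', and 'm' are hypothetical variables):
* Step 1: total effect c (x1 -> Y).
REGRESSION /DEPENDENT y /METHOD=ENTER x1.
* Step 2: path a (x1 -> M).
REGRESSION /DEPENDENT m /METHOD=ENTER x1.
* Steps 3-4: paths b and c' come from the same model (M and x1 -> Y).
REGRESSION /DEPENDENT y /METHOD=ENTER x1 m.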
Modifier (moderator) (z): has an interaction effect on assoc between y and x1.
Establishing moderation:
1. Create a new variable (interaction term/cross product) (x1 * z)
2. Add new variable to linear regression model
a. Y = B0 + B1x1 + B2z + B3(x1*z)
3. Test coefficient B3
B1: effect of x1 on Y when z=0
B2: effect of z on Y when x1=0
B3: difference of effect of x1 on Y by levels of z
Effect of x1 = B1 + (B3 * z)
Effect of z = B2 + (B3 * x1)
Odds: ratio of with versus without event (# of times outcome occurs / # of times doesn’t occur)
Risk: probability of occurrence (# of times outcome occurs / total # of possible outcomes)
Relative risk: probability of outcome in exposure versus no exposure
Nagelkerke R^2: % of variation explained by the model
Odds = exp(L), where L is the linear predictor (logit)
Probability = exp(L) / (1 + exp(L))
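Worked example (hypothetical numbers): for a linear predictor L = 0.8, odds = exp(0.8) ≈ 2.23 and probability = 2.23 / (1 + 2.23) ≈ 0.69.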
Sensitivity (true positives) = TP / (TP + FN)
Specificity (true negatives) = TN / (TN + FP)
Hosmer & Lemeshow test for goodness of fit, produces a chi-square statistic
Non-significant (p > 0.05) means GOOD FIT