Summary of Six Sigma Analysis - Cheat Sheet
I have put together all the basic statistical analyses used in the Measure and Analyze phases of the DMAIC methodology of a Six Sigma project in a form that is succinct and convenient, so it can be used as a cheat sheet for quick reference.
Measure Phase – Continuous Data
→ Confidence Interval
(1-α)×100% CI = Sample mean ± Critical value * Std error
CI = Xbar ± Zα/2 * (SD/√n)
Zα/2 = 1.96 at 95% CL from Z table
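A minimal Python sketch of the interval above, using only the standard library. It assumes the z-based formula as written (for small samples a t critical value is commonly used instead), and the sample data is made up for illustration.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(data, confidence=0.95):
    """Two-sided CI for the mean: Xbar ± Zα/2 * (SD/√n)."""
    n = len(data)
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # ≈ 1.96 for 95%
    std_error = stdev(data) / math.sqrt(n)    # sample SD uses the n-1 denominator
    xbar = mean(data)
    return xbar - z * std_error, xbar + z * std_error

low, high = mean_confidence_interval([10, 12, 11, 13, 9])
print(f"95% CI: ({low:.3f}, {high:.3f})")   # 95% CI: (9.614, 12.386)
```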
→ Sample Size
n = (Zα/2 * SD / Δ)^2
Δ = precision (the acceptable margin of error, in the same units as the data)
If n/N > 5% (N = population size), adjust the sample size with the finite population correction:
n finite = n / (1+(n/N))
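The sample size formula and its finite correction can be sketched in Python as follows. Rounding up to a whole number of observations is an assumption (common practice), and the SD, precision and population values are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_continuous(sd, precision, confidence=0.95, population=None):
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = (z * sd / precision) ** 2
    # Apply the finite correction only when the sample is more than 5% of the population
    if population is not None and n / population > 0.05:
        n = n / (1 + n / population)
    return math.ceil(n)  # round up to a whole number of observations

print(sample_size_continuous(sd=10, precision=2))                   # 97
print(sample_size_continuous(sd=10, precision=2, population=1000))  # 88
```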
→ Basic Statistics
Measures of central tendency: Mean, Median, Mode
Measures of dispersion: Range, Variance, Std Deviation
Std dev = √[ Σ(x - mean)² / (n-1) ] (square root of the sum of squared deviations from the mean, divided by n-1)
In Excel, use the STDEV(data) function
Variance = SD²
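The same basic statistics computed with Python's standard `statistics` module on a made-up data set; `statistics.stdev` and `statistics.variance` use the n-1 denominator, matching the formula above.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

summary = {
    "mean": statistics.mean(data),
    "median": statistics.median(data),
    "mode": statistics.mode(data),
    "range": max(data) - min(data),
    "variance": statistics.variance(data),  # sample variance (n-1 denominator)
    "std_dev": statistics.stdev(data),      # same idea as Excel STDEV
}
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```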
Histogram
Number of classes = √n
Class interval = (Max - Min) / number of classes
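A rough Python sketch of the histogram rules above. Rounding √n up to a whole number of classes, and treating each class as [lower, upper) with the maximum value placed in the last class, are assumptions of this sketch; the data is made up.

```python
import math

def histogram(data):
    n_classes = math.ceil(math.sqrt(len(data)))   # √n, rounded up
    lo, hi = min(data), max(data)
    if hi == lo:                                  # all values identical: one class
        return [lo, hi], [len(data)]
    width = (hi - lo) / n_classes                 # class interval
    counts = [0] * n_classes
    for x in data:
        # Classes are [lower, upper); the maximum sits on the last upper edge
        idx = min(int((x - lo) / width), n_classes - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(n_classes + 1)]
    return edges, counts

data = [3, 5, 7, 8, 9, 10, 10, 11, 12, 12, 13, 14, 15, 17, 19, 21]
edges, counts = histogram(data)
print(edges)   # [3.0, 7.5, 12.0, 16.5, 21.0]
print(counts)  # [3, 5, 5, 3]
```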
→ Probability and Distributions
Normal distribution
Data can be treated as normal when the normality test in Minitab gives p value >= 0.05 (no significant departure from normality)
Standard normal dist: Mean = 0 and SD = 1
Z=(x-mean) / SD
Excel formula to calculate probability: NORMDIST(x, mean, SD, cumulative)
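Python's `statistics.NormalDist` gives the same quantities as the Z formula and NORMDIST; the mean, SD and x below are illustrative values.

```python
from statistics import NormalDist

mu, sd, x = 100, 15, 130
z = (x - mu) / sd                      # standardize: Z = (x - mean) / SD
dist = NormalDist(mu, sd)
p_le = dist.cdf(x)                     # like NORMDIST(130, 100, 15, TRUE)
density = dist.pdf(x)                  # like NORMDIST(130, 100, 15, FALSE)
print(f"z = {z:.2f}, P(X <= {x}) = {p_le:.4f}")   # z = 2.00, P(X <= 130) = 0.9772

# The standard normal (mean 0, SD 1) gives the same probability for the z value
assert abs(NormalDist().cdf(z) - p_le) < 1e-12
```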
→ Gage R&R (Reproducibility, Repeatability)
σ² total = σ² part-part + σ² R&R
Use ANOVA or Xbar-R method to conduct the test
→ Conducting a Gage R&R study (crossed) in Minitab
If %R&R < 20%, the measurement system is acceptable; otherwise it is not acceptable
In the Xbar chart, all appraisers should show almost the same pattern, with most points falling outside the control limits (the limits reflect measurement variation only, so points outside them show the gage can distinguish between parts)
All points in the R chart should be within the control limits for all appraisers
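Once the variance components have been estimated (for example by Minitab's ANOVA method), %R&R can be expressed either as a ratio of variances (%Contribution) or as a ratio of standard deviations (%Study Var); Minitab reports both. The sketch below, with hypothetical variance components, shows how different the two numbers can be, so it is worth checking which one a threshold such as 20% refers to.

```python
import math

def gage_rr_percentages(var_part_to_part, var_rr):
    """Return (%Contribution, %Study Var) of the measurement system."""
    var_total = var_part_to_part + var_rr                 # σ²total = σ²part-part + σ²R&R
    pct_contribution = 100 * var_rr / var_total           # ratio of variances
    pct_study_var = 100 * math.sqrt(var_rr / var_total)   # ratio of standard deviations
    return pct_contribution, pct_study_var

contrib, study = gage_rr_percentages(var_part_to_part=0.96, var_rr=0.04)
print(f"%Contribution = {contrib:.1f}%, %Study Var = {study:.1f}%")
# %Contribution = 4.0%, %Study Var = 20.0%
```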
→ Process Capability
Cp = Tolerance / 6SD
Tolerance = USL - LSL
Cpk = Min[ (USL - mean)/3SD , (mean - LSL)/3SD ]
Cpk<=Cp
Zlt = 3*Cpk ; Zst = Zlt + 1.5
If Cpk>1.33, process is capable
Check whether the data is normal. If not, apply a Box-Cox transformation; if it is still not normal, treat the data as discrete and calculate capability from DPMO
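A small Python sketch of the capability formulas above; the specification limits, mean and SD are hypothetical.

```python
def capability(usl, lsl, mean, sd):
    cp = (usl - lsl) / (6 * sd)                                  # Tolerance / 6SD
    cpk = min((usl - mean) / (3 * sd), (mean - lsl) / (3 * sd))  # nearer spec limit governs
    z_lt = 3 * cpk
    z_st = z_lt + 1.5                                            # conventional 1.5 sigma shift
    return cp, cpk, z_lt, z_st

cp, cpk, z_lt, z_st = capability(usl=110, lsl=90, mean=102, sd=2)
print(f"Cp={cp:.2f} Cpk={cpk:.2f} Zlt={z_lt:.2f} Zst={z_st:.2f}")
# Cp=1.67 Cpk=1.33 Zlt=4.00 Zst=5.50
```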
Measure Phase – Discrete Data
→ Types of data: Binary, Count, Ordinal
→ Confidence Interval
CI = p ± Zα/2 * √[p(1-p)/n]
Zα/2 = 1.96 at 95% CL from Z table
→ Sample Size
n = (Zα/2)² * p(1-p) / Δ²
Δ=precision
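The discrete-data confidence interval and sample size formulas, sketched in Python. The example values are hypothetical; p = 0.5 is often used when no estimate of p is available because it gives the largest sample size.

```python
import math
from statistics import NormalDist

def z_crit(confidence=0.95):
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # ≈ 1.96 for 95%

def proportion_ci(p, n, confidence=0.95):
    margin = z_crit(confidence) * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin

def sample_size_proportion(p, precision, confidence=0.95):
    # n = (Zα/2)² * p(1-p) / Δ², rounded up
    return math.ceil(z_crit(confidence) ** 2 * p * (1 - p) / precision ** 2)

print(proportion_ci(0.10, 400))            # about (0.0706, 0.1294)
print(sample_size_proportion(0.5, 0.05))   # 385
```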
→ Probability and Distributions
Binomial distribution (Binary data)
Excel formula: BINOMDIST(number_s, trials, probability_s, cumulative)
Example: p = 0.02 (2% error rate)
Audited (n)   Errors (x)   P(x or fewer defectives)
400           0            0.031%
400           2            1.312%
400           4            9.733%
400           6            31.090%
400           8            59.255%
400           10           81.790%
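Python's `math.comb` (Python 3.8+) can be used to compute the cumulative binomial values for comparison with the table above. This sketch sums the probability mass function directly, which is fine for small numbers of defectives.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); same quantity as BINOMDIST(k, n, p, TRUE)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

for errors in (0, 2, 4, 6, 8, 10):
    print(400, errors, f"{100 * binom_cdf(errors, 400, 0.02):.3f}%")
```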
Poisson distribution (Count data)
Excel formula: poisson(x, mean, cumulative)
Avg repeat calls (mean)   Repeat calls (d)   P(d or fewer)
20                        0                  0.00%
20                        5                  0.01%
20                        10                 1.08%
20                        15                 15.65%
20                        20                 55.91%
20                        24                 84.32%
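A matching Python sketch for the cumulative Poisson probability, built up term by term from P(X = 0) = e^(-mean).

```python
import math

def poisson_cdf(k, mean):
    """P(X <= k) for X ~ Poisson(mean); same quantity as POISSON(k, mean, TRUE)."""
    term = math.exp(-mean)   # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= mean / i     # P(X = i) = P(X = i-1) * mean / i
        total += term
    return total

for calls in (0, 5, 10, 15, 20, 24):
    print(20, calls, f"{100 * poisson_cdf(calls, 20):.2f}%")
```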
→ Process Capability
Zlt = -NORMSINV(DPO), where DPO = defects per opportunity
Zst = Zlt + 1.5
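A Python sketch of the discrete capability calculation. The counts below are hypothetical; DPO must be strictly between 0 and 1 for the inverse normal to exist.

```python
from statistics import NormalDist

def sigma_level(defects, units, opportunities_per_unit):
    dpo = defects / (units * opportunities_per_unit)   # defects per opportunity
    dpmo = dpo * 1_000_000
    z_lt = -NormalDist().inv_cdf(dpo)                  # same as -NORMSINV(DPO)
    z_st = z_lt + 1.5
    return dpmo, z_lt, z_st

dpmo, z_lt, z_st = sigma_level(defects=34, units=10_000, opportunities_per_unit=1000)
print(f"DPMO={dpmo:.1f} Zlt={z_lt:.2f} Zst={z_st:.2f}")   # DPMO=3.4 Zlt=4.50 Zst=6.00
```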
Analyze Phase - Continuous Data
→ Hypothesis tests
1. 1-sample t test (Mean equal to a specified value µ=µ₀)
Test statistic t = (Mean - Specified value) / (SD/√n)
If t is close to zero (as judged by the p value), the mean can be taken as equal to the specified value
Use TDIST() in Excel, or the 1-Sample t test in Minitab (Stats>Basic stats) after selecting a confidence level and the alternative hypothesis
If p >= 0.05, fail to reject H₀ (accept H₀)
2. Paired t test (Two means equal or not µ₁=µ₂)
Paired t test is done for dependent samples like before and after improvement of the same process
Use TTEST() in Excel, or the Paired t test in Minitab (Stats>Basic stats)
3. 2-sample t test (Two means equal or not µ₁=µ₂)
The 2-sample t test is used for two independent samples
Use TTEST() in Excel, or the 2-Sample t test in Minitab (Stats>Basic stats)
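A standard-library Python sketch of the three t tests. The two-sided p value is obtained by numerically integrating the t density with Simpson's rule (an implementation choice for this sketch, not how Excel or Minitab compute it), and the 2-sample version uses Welch's unequal-variance formula, which corresponds to TTEST type 3 in Excel. The data is made up.

```python
import math
from statistics import mean, stdev

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """p = 1 - 2 * (area under the t density between 0 and |t|), via Simpson's rule."""
    upper = abs(t)
    h = upper / steps                      # steps must be even for Simpson's rule
    area = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    return max(0.0, 1 - 2 * area)

def one_sample_t(data, mu0):
    n = len(data)
    t = (mean(data) - mu0) / (stdev(data) / math.sqrt(n))
    return t, two_sided_p(t, n - 1)

def paired_t(before, after):
    diffs = [b - a for b, a in zip(before, after)]
    return one_sample_t(diffs, 0)          # paired test = 1-sample t on the differences

def welch_t(x, y):
    va, vb = stdev(x) ** 2 / len(x), stdev(y) ** 2 / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(x) - 1) + vb ** 2 / (len(y) - 1))
    return t, two_sided_p(t, df)

t, p = one_sample_t([10, 12, 11, 13, 9], mu0=10)
print(f"1-sample: t = {t:.3f}, p = {p:.3f}")   # 1-sample: t = 1.414, p = 0.230
t, p = paired_t(before=[10, 12, 11, 13, 9], after=[9, 11, 11, 12, 8])
print(f"paired:   t = {t:.3f}, p = {p:.3f}")   # paired:   t = 4.000, p = 0.016
```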
4. F test (Two variances equal or not σ₁²=σ₂²)
Used when X is discrete and Y is continuous
Here H₀: σ₁² = σ₂²; Ha: σ₁² ≠ σ₂²
F = S1²/S2² = Variance1 / Variance2
If p >= 0.05, the F value is close enough to 1 to fail to reject H₀ (treat the variances as equal)
Use FTEST() in Excel or the 2 Variances test in Minitab (Stats>Basic stats)
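Computing the F ratio and its degrees of freedom in Python is straightforward; the p value then comes from the F distribution with (n1-1, n2-1) degrees of freedom, for example via FTEST() in Excel, which returns the two-tailed probability. The data is hypothetical.

```python
from statistics import variance

def f_ratio(sample1, sample2):
    f = variance(sample1) / variance(sample2)      # F = S1² / S2²
    return f, len(sample1) - 1, len(sample2) - 1   # numerator and denominator degrees of freedom

f, df1, df2 = f_ratio([20.1, 19.8, 20.4, 20.0, 19.7], [20.5, 19.2, 21.0, 19.6, 20.3])
print(f"F = {f:.3f} with ({df1}, {df2}) degrees of freedom")   # F = 0.145 with (4, 4) degrees of freedom
```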
5. 1-sample Wilcoxon test
Used for non-normal data to test whether the median equals a given value
Use Minitab (Stats> Non parametric)
6. Mann-Whitney test
Used for non-normal data to test whether two medians are equal
Use Minitab (Stats> Non parametric)
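The two tests above can be sketched in Python with a large-sample normal approximation. This sketch omits the tie and continuity corrections, so its p values may differ from Minitab's, especially for small samples; the function names and example data are illustrative.

```python
import math
from statistics import NormalDist

def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def two_sided_normal_p(z):
    return 2 * (1 - NormalDist().cdf(abs(z)))

def wilcoxon_signed_rank(data, median0):
    diffs = [x - median0 for x in data if x != median0]   # drop zero differences
    n = len(diffs)
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return w_plus, two_sided_normal_p((w_plus - mu) / sigma)

def mann_whitney(x, y):
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u1, two_sided_normal_p((u1 - mu) / sigma)

print(wilcoxon_signed_rank([5, 6, 7, 8], median0=4.5))
print(mann_whitney([1, 2, 3], [4, 5, 6]))
```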
7. Mood's Median test
Used for non-normal data to test whether more than two medians are equal
Here H₀: all medians are equal; Ha: at least one median differs
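A Python sketch of Mood's median test: count how many values in each group lie above the grand median and how many lie at or below it, then run a chi-square test on that 2 × k table with k-1 degrees of freedom. Counting ties with the lower row and the series-based chi-square tail are implementation choices of this sketch; the groups are made up.

```python
import math
from statistics import median

def chi2_sf(x, df):
    """Upper-tail chi-square probability via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 1.0
    a, z = df / 2, x / 2
    term = total = 1 / a
    n = 0
    while abs(term) > 1e-15 * abs(total) and n < 10_000:
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1 - lower)

def moods_median_test(*groups):
    grand_median = median([x for g in groups for x in g])
    table = [[sum(x > grand_median for x in g) for g in groups],    # above the grand median
             [sum(x <= grand_median for x in g) for g in groups]]   # at or below it
    n_total = sum(map(sum, table))
    chi2 = 0.0
    for row in table:
        for j, observed in enumerate(row):
            expected = sum(row) * sum(r[j] for r in table) / n_total
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    df = len(groups) - 1
    return chi2, df, chi2_sf(chi2, df)

print(moods_median_test([1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]))
```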
→ ANOVA
H₀: All means are equal; Ha: At least one mean differs; if p >= 0.05, fail to reject H₀ (treat all means as equal)
1. One Way ANOVA
Here X is discrete (categorical grouped) and Y is continuous
eg: Effect of Experience (12,6,0) on processing time
Use Minitab (Stat>ANOVA>One way) to calculate p value
If p >= 0.05, the factor does not have a significant impact on the response
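A Python sketch of the one-way ANOVA F statistic. The p value Minitab reports is the upper-tail area of the F distribution with (k-1, N-k) degrees of freedom; the three groups below are hypothetical processing times for three experience levels.

```python
from statistics import mean

def one_way_anova(*groups):
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1                # k - 1
    df_within = len(all_values) - len(groups)   # N - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

print(one_way_anova([1, 2, 3], [4, 5, 6], [7, 8, 9]))   # (27.0, 2, 6)
```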
2. Two Way ANOVA
Here X is discrete (categorical grouped) and Y is continuous
Here there are 2 factors, with at least 2 replicates for each combination of levels
eg. Effect of vintage (low, med, high), speed of data entry (low, high) and their interaction on AHT (2 observations each)
Use Minitab's two-way ANOVA (Stat>ANOVA) to calculate p values
Draw Main effects plot and Interaction plot (Stat>ANOVA) for further inferences
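For a balanced two-way design (the same number of replicates in every cell, as in the example's 2 observations each), the F statistics can be sketched in Python as below. The nested-list layout `cells[i][j]` (replicates for level i of factor A and level j of factor B) is an assumption of this sketch, the data is made up, and the p values would still come from Minitab or the F distribution.

```python
from statistics import mean

def two_way_anova(cells):
    """cells[i][j] holds the replicates for level i of factor A and level j of factor B."""
    a, b, r = len(cells), len(cells[0]), len(cells[0][0])
    grand = mean(x for row in cells for cell in row for x in cell)
    a_means = [mean(x for cell in row for x in cell) for row in cells]
    b_means = [mean(x for row in cells for x in row[j]) for j in range(b)]
    ss_a = b * r * sum((m - grand) ** 2 for m in a_means)
    ss_b = a * r * sum((m - grand) ** 2 for m in b_means)
    ss_cells = r * sum((mean(cell) - grand) ** 2 for row in cells for cell in row)
    ss_ab = ss_cells - ss_a - ss_b                # interaction sum of squares
    ss_e = sum((x - mean(cell)) ** 2 for row in cells for cell in row for x in cell)
    df_a, df_b, df_e = a - 1, b - 1, a * b * (r - 1)
    ms_e = ss_e / df_e
    return {
        "A": (ss_a / df_a) / ms_e,
        "B": (ss_b / df_b) / ms_e,
        "AxB": (ss_ab / (df_a * df_b)) / ms_e,
    }

print(two_way_anova([[[1, 3], [3, 5]], [[5, 7], [7, 9]]]))   # {'A': 16.0, 'B': 4.0, 'AxB': 0.0}
```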
→ Correlation
X and Y need to be continuous
Draw scatter plot and check the kind of correlation
Use CORREL() in Excel with the 2 data arrays to find the correlation coefficient 'r'
Strong +ve correlation: r >= 0.8; strong -ve correlation: r <= -0.8; no correlation when r is close to zero
r=0 means no linear relationship but there still could be a relation in a non linear form
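A plain-Python sketch of the correlation coefficient r (the same quantity Excel's CORREL returns). The second call illustrates the point above: y = x² is an exact relationship, yet r is 0 because it is not linear.

```python
import math

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))    # 1.0 (perfect positive)
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))   # 0.0, although y = x² exactly
```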
→ Regression
Provides mathematical model for Xs(continuous) effect on Y(continuous)
1. Simple linear regression
Draw a scatter plot and calculate r
Use Minitab (Stat>Regression>Fitted line plot) and select linear, Quadratic or cubic model (whichever gives a higher Rsquare value)
Rsq should be>=65%; if p<0.05 then factor has significant effect
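A minimal least-squares sketch of simple linear regression with R-sq, on made-up data; for a linear model, Minitab's fitted line plot should report the same intercept, slope and R-sq.

```python
def simple_linear_regression(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return intercept, slope, 1 - ss_res / ss_tot   # last value is R-sq

b0, b1, r_sq = simple_linear_regression([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(f"Y = {b0:.2f} + {b1:.2f}X, R-sq = {100 * r_sq:.1f}%")   # Y = 0.05 + 1.99X, R-sq = 99.7%
```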
2. Multiple Linear regression
Fitted linear relation between Y and multiple Xs
Draw scatter plot for all Xs and calculate r
Use Minitab (Stat>Basic stats>Correlation) to check the correlation between the Xs; if any two factors are strongly correlated (multicollinearity), drop one of them
Use Minitab (Stat>Regression)
Rsq (adj) should be>=65%; if p<0.05 then factor has significant effect
3. Polynomial Regression
Fitted non linear relation between Y and X
Use Minitab (Stat>Regression>Fitted line plot) and select linear, Quadratic or cubic model (whichever gives a higher Rsquare value)
Rsq should be>=65%; if p<0.05 then factor has significant effect
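Multiple and polynomial regression are both ordinary least squares with extra columns (a quadratic model simply uses x and x² as two Xs). The sketch below solves the normal equations directly, which is adequate for a few Xs but becomes numerically unstable when the Xs are strongly correlated, one more reason to drop one of a correlated pair. The function name and data are illustrative.

```python
def least_squares(columns, y):
    """Fit y = b0 + b1*X1 + ... + bk*Xk by solving the normal equations (X'X) b = X'y."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]   # intercept column first
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    coeffs = [A[i][p] / A[i][i] for i in range(p)]
    fitted = [sum(b * v for b, v in zip(coeffs, row)) for row in X]
    my = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    r_sq = 1 - ss_res / ss_tot
    r_sq_adj = 1 - (1 - r_sq) * (n - 1) / (n - p)   # p = number of Xs + 1
    return coeffs, r_sq, r_sq_adj

# Quadratic (polynomial) model: use x and x² as the two Xs
x = [0, 1, 2, 3, 4, 5]
y = [1, 6, 17, 34, 57, 86]                          # exactly 1 + 2x + 3x²
coeffs, r_sq, r_sq_adj = least_squares([x, [v ** 2 for v in x]], y)
print([round(c, 6) for c in coeffs])                # [1.0, 2.0, 3.0]
# Multiple regression works the same way: least_squares([x1_values, x2_values], y_values)
```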
Analyze Phase - Discrete Data
→ Hypothesis tests
1. 1P test (Proportion equal to a specified value p=p₀)
Here X and Y both are binary
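Test statistic z = (p̂ - p₀) / √(p₀(1-p₀)/n), where p̂ is the sample proportion and p₀ the specified value
If p value >= 0.05 (z close to zero), the proportion can be taken as equal to the specified value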
Use one proportion test in Minitab (Stats>Basic stats)
2. 2P test (Two proportions equal p₁=p₂)
Here X and Y both are binary
Test statistic z = (p̂₁ - p̂₂) / √(p̄(1-p̄)(1/n₁ + 1/n₂)), where p̄ is the pooled proportion of both samples
If p value >= 0.05 (z close to zero), the two proportions can be taken as equal
Use two proportion test in Minitab (Stats>Basic stats)
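A Python sketch of the two proportion tests using the normal approximation and two-sided p values. Minitab may use an exact method for the 1-proportion test by default, so its p value can differ slightly; the counts are hypothetical.

```python
import math
from statistics import NormalDist

def two_sided_p(z):
    return 2 * (1 - NormalDist().cdf(abs(z)))

def one_proportion_z(successes, n, p0):
    p_hat = successes / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    return z, two_sided_p(z)

def two_proportion_z(x1, n1, x2, n2):
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)   # pooled proportion p̄
    z = (p1 - p2) / math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return z, two_sided_p(z)

print(one_proportion_z(60, 100, 0.5))      # z ≈ 2.0, p ≈ 0.0455
print(two_proportion_z(45, 300, 30, 300))  # z ≈ 1.85, p ≈ 0.064
```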
→ Regression
Provides a mathematical model for the effect of Xs (continuous) on Y (discrete)
Logistic regression
Here X is continuous and Y is discrete
Draw scatter plot for all Xs and calculate r
eg. Regression model for Vintage and Accuracy score
Use Minitab (Stat>Regression>Binary logistic regression)
In Minitab window: Success = total pass; Trial = total audited; Model = vintage
Minitab will generate values: a(const coeff) and b(vintage coeff)
P value should be <=0.05
Regression equation: Accuracy score = e^(a + b*vintage) / (1 + e^(a + b*vintage))
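A sketch of how the fitted equation turns Minitab's coefficients into a predicted accuracy score. The values of a and b below are hypothetical placeholders, not results from the source.

```python
import math

def logistic_prediction(a, b, vintage):
    """Accuracy score = e^(a + b*vintage) / (1 + e^(a + b*vintage))."""
    eta = a + b * vintage
    return math.exp(eta) / (1 + math.exp(eta))

# Hypothetical coefficients; read the real a and b from Minitab's output
for v in (0, 2, 6, 12):
    print(v, round(logistic_prediction(a=-1.0, b=0.5, vintage=v), 3))
```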