Summary of Six Sigma Analysis - Cheat Sheet

I have put together the basic statistical analyses used in the Measure and Analyze phases of the DMAIC methodology of a Six Sigma project, in a succinct form that can be used as a cheat sheet for quick reference.


Measure Phase – Continuous Data

→ Confidence Interval 
 (1-α)% CI = Sample mean ± Constant * Std error          
 CI = Xbar ± Zα/2 * (SD/√n)     
 Zα/2 = 1.96 at 95% CL from Z table    
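
A minimal Python sketch of the confidence interval formula above, using only the standard library. The sample values and the function name are hypothetical and only illustrate the arithmetic.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(data, confidence=0.95):
    """Return (lower, upper) for the mean: Xbar ± Zα/2 * (SD/√n)."""
    n = len(data)
    xbar = mean(data)
    sd = stdev(data)                          # sample SD (n-1 denominator)
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 at 95% CL
    margin = z * sd / sqrt(n)                 # constant * std error
    return xbar - margin, xbar + margin

# Hypothetical processing times (minutes)
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.0, 12.1]
low, high = mean_confidence_interval(sample)
print(f"95% CI: {low:.2f} to {high:.2f}")
```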
 
→ Sample Size 
 n = (Zα/2 * SD / Δ)^2           
 Δ=precision     
 If n/N > 5% (N = population size), adjust the sample size with the finite population correction
 n finite = n / (1+(n/N))
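
The sample size rule above can be sketched in Python as follows. The SD, precision and population size are hypothetical inputs, and rounding up to a whole unit is an assumption of this sketch.

```python
from math import ceil
from statistics import NormalDist

def sample_size_continuous(sd, precision, confidence=0.95, population=None):
    """n = (Zα/2 * SD / Δ)^2, corrected to n finite when n/N > 5%."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = (z * sd / precision) ** 2
    if population is not None and n / population > 0.05:
        n = n / (1 + n / population)      # n finite
    return ceil(n)                        # round up to a whole unit

n_large_population = sample_size_continuous(sd=10, precision=2)
n_small_population = sample_size_continuous(sd=10, precision=2, population=500)
print(n_large_population, n_small_population)
```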
 
→ Basic Statistics 
Measures of central tendency: Mean, Median, Mode
Measures of dispersion: Range, Variance, Std Deviation

Std dev = √( Σ(x - mean)² / (n-1) )
In Excel, use the stdev(data) function
Variance = SD²

Histogram
Number of classes = √n
Class interval = (Max-Min)/no of classes
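
As a rough illustration, the basic statistics and histogram rules above map directly onto Python's statistics module. The data set is hypothetical, and rounding √n to the nearest whole number is an assumption of this sketch.

```python
from math import sqrt
from statistics import mean, median, mode, stdev, variance

data = [4, 7, 7, 8, 9, 10, 12, 13, 15]    # hypothetical observations

# Measures of central tendency
print("mean", round(mean(data), 3), "median", median(data), "mode", mode(data))

# Measures of dispersion (sample formulas, n-1 denominator)
data_range = max(data) - min(data)
sd = stdev(data)
var = variance(data)                       # equals SD squared
print("range", data_range, "SD", round(sd, 3), "variance", round(var, 3))

# Histogram layout: number of classes = √n, class interval = range / classes
num_classes = round(sqrt(len(data)))
class_interval = data_range / num_classes
print("classes", num_classes, "class interval", round(class_interval, 3))
```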

→ Probability and Distributions 
 Normal distribution 
 Data is normal when normality test in Minitab shows p value>=0.05          
 Standard normal dist: Mean = 0 and SD = 1              
 Z=(x-mean) / SD              
 Excel formula for calc probability = normdist(x,mean,SD,cumulative)
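
A small sketch of the Z calculation and the cumulative probability in Python. The mean, SD and x values are hypothetical; `NormalDist(...).cdf(x)` is used here as the counterpart of normdist(x, mean, SD, TRUE).

```python
from statistics import NormalDist

process_mean, sd = 50.0, 5.0     # hypothetical process parameters
x = 60.0

z = (x - process_mean) / sd                      # standardized value
p_below = NormalDist(process_mean, sd).cdf(x)    # P(X <= x)
p_above = 1 - p_below                            # P(X > x)

print(f"Z = {z:.2f}, P(X <= {x}) = {p_below:.4f}, P(X > {x}) = {p_above:.4f}")
```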

→ Gage R&R (Reproducibility, Repeatability) 
 σ² total = σ² part-to-part + σ² R&R
 Use ANOVA or Xbar-R method to conduct the test     
       
 → Conduct Gage R&R study (crossed) testing in Minitab     
 if %R&R < 20%, measurement is acceptable, else not acceptable     
 In the Xbar chart, all appraisers should show almost the same pattern, with most of the points falling outside the control limits (part-to-part variation should dominate measurement variation)
 All points in the R chart should be within the control limits for all appraisers

→ Process Capability 
 Cp = Tolerance / 6SD                      
 Tolerance = USL - LSL    
      
 Cpk = Min[ (USL - mean)/3SD , (mean - LSL)/3SD ]    
 Cpk<=Cp    
 Zlt = 3*Cpk ; Zst = Zlt + 1.5    
 If Cpk>1.33, process is capable    
      
 Check whether the data is normal. If not, apply a 'Box Cox' transformation; if it is still not normal, treat the data as discrete and calculate DPMO (capability)
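
The capability formulas above can be sketched as a single Python function. The mean, SD and specification limits are hypothetical, and the sketch assumes the data has already passed the normality check.

```python
def process_capability(mean, sd, lsl, usl):
    """Return (Cp, Cpk, Zlt, Zst) from the formulas above."""
    cp = (usl - lsl) / (6 * sd)                   # tolerance / 6SD
    cpk = min((usl - mean) / (3 * sd),
              (mean - lsl) / (3 * sd))            # worst side of the spec
    z_lt = 3 * cpk
    z_st = z_lt + 1.5
    return cp, cpk, z_lt, z_st

# Hypothetical process: slightly off-center within the spec limits
cp, cpk, z_lt, z_st = process_capability(mean=10.2, sd=0.1, lsl=9.7, usl=10.6)
print(f"Cp={cp:.2f} Cpk={cpk:.2f} Zlt={z_lt:.2f} Zst={z_st:.2f}")
print("capable" if cpk > 1.33 else "not capable")
```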

Measure Phase – Discrete Data

Types of data: Binary, Count, Ordinal  
  
→ Confidence Interval  
 CI = p ± Zα/2 * √(p(1-p)/n)
 Zα/2 = 1.96 at 95% CL from Z table    
  
→ Sample Size  
 n = (Zα/2)² * p(1-p) / Δ²    
 Δ=precision     
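
A short Python sketch of the proportion confidence interval and sample size formulas above. The defect rate, sample size and precision are hypothetical, and rounding the sample size up is an assumption of this sketch.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z_95 = NormalDist().inv_cdf(0.975)    # about 1.96 at 95% CL

def proportion_ci(p, n, z=Z_95):
    """CI = p ± Zα/2 * √(p(1-p)/n)"""
    margin = z * sqrt(p * (1 - p) / n)
    return p - margin, p + margin

def sample_size_proportion(p, precision, z=Z_95):
    """n = (Zα/2)² * p(1-p) / Δ², rounded up."""
    return ceil(z ** 2 * p * (1 - p) / precision ** 2)

low, high = proportion_ci(p=0.10, n=400)
n_required = sample_size_proportion(p=0.10, precision=0.03)
print(f"95% CI: {low:.4f} to {high:.4f}; required n = {n_required}")
```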

→ Probability and Distributions
Binomial distribution (Binary data)
Excel formula: binomdist(numbers, trials, probability, cumulative)
eg: if p=0.02 or 2%
Audited (n)   Errors (x)   Prob of x or fewer defectives
400           0            0.031%
400           2            1.312%
400           4            9.733%
400           6            31.090%
400           8            59.255%
400           10           81.790%
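
For readers without Excel, the cumulative binomial probabilities in the table above can be reproduced with a few lines of Python. The helper name `binom_cdf` is hypothetical; it plays the role of binomdist(x, trials, p, TRUE).

```python
from math import comb

def binom_cdf(x, trials, p):
    """P(X <= x) for a binomial distribution with the given trials and p."""
    return sum(comb(trials, k) * p ** k * (1 - p) ** (trials - k)
               for k in range(x + 1))

# 400 audited items, defect probability 2%
for x in (0, 2, 4, 6, 8, 10):
    print(f"400  {x:>2}  {binom_cdf(x, 400, 0.02):.3%}")
```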

Poisson distribution (Count data)
Excel formula: poisson(x, mean, cumulative)
Avg repeat calls (mean)   Repeat calls (d)   Prob of getting d defects or fewer
20                        0                  0.00%
20                        5                  0.01%
20                        10                 1.08%
20                        15                 15.65%
20                        20                 55.91%
20                        24                 84.32%
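
The Poisson table above can be checked the same way in Python. The helper name `poisson_cdf` is hypothetical; it corresponds to poisson(x, mean, TRUE).

```python
from math import exp, factorial

def poisson_cdf(x, mean):
    """P(X <= x) for a Poisson distribution with the given mean."""
    return sum(exp(-mean) * mean ** k / factorial(k) for k in range(x + 1))

# Average of 20 repeat calls
for d in (0, 5, 10, 15, 20, 24):
    print(f"20  {d:>2}  {poisson_cdf(d, 20):.2%}")
```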

→ Process Capability
 Zlt = -normsinv(DPO)
 Zst = Zlt + 1.5
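
A sketch of the discrete capability calculation in Python, assuming DPO is defects divided by total opportunities (units × opportunities per unit). The defect counts are hypothetical; `NormalDist().inv_cdf` stands in for normsinv.

```python
from statistics import NormalDist

def sigma_levels(defects, units, opportunities_per_unit):
    """Return (DPO, Zlt, Zst) for discrete data."""
    dpo = defects / (units * opportunities_per_unit)
    z_lt = -NormalDist().inv_cdf(dpo)     # Zlt = -normsinv(DPO)
    z_st = z_lt + 1.5                     # add the 1.5 sigma shift
    return dpo, z_lt, z_st

# Hypothetical audit: 27 defects in 2000 units with 10 opportunities each
dpo, z_lt, z_st = sigma_levels(defects=27, units=2000, opportunities_per_unit=10)
print(f"DPO={dpo:.5f} DPMO={dpo * 1_000_000:.0f} Zlt={z_lt:.2f} Zst={z_st:.2f}")
```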

Analyze Phase - Continuous Data

→ Hypothesis tests 
1. 1t Test (Mean equal to a specified value µ=µ₀)  
 Test statistic t = (Mean -Specified Value) / (SD/√n)                
 If t is close to zero (as judged by the p value), the mean does not differ significantly from the specified value
 Use tdist() in Excel, or the 1-Sample t test in Minitab (Stats>Basic stats), selecting a confidence level and an alternate hypothesis
 if p>=0.05, fail to reject H₀
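
The 1t test statistic can be computed by hand as in the Python sketch below. The sample is hypothetical. Python's standard library has no t distribution, so the sketch stops at the statistic and its degrees of freedom; the p value would still come from tdist() in Excel or from Minitab.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(data, specified_value):
    """t = (mean - specified value) / (SD/√n), with n-1 degrees of freedom."""
    n = len(data)
    t = (mean(data) - specified_value) / (stdev(data) / sqrt(n))
    return t, n - 1

# Hypothetical measurements tested against a target of 10.0
sample = [10.2, 9.9, 10.1, 10.4, 9.8, 10.0, 10.3, 10.1]
t, dof = one_sample_t(sample, specified_value=10.0)
print(f"t = {t:.3f} with {dof} degrees of freedom")
```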
  
 2. Paired t test (Two means equal or not µ₁=µ₂)  
 Paired t test is done for dependent samples like before and after improvement of the same process  
 Use ttest() in excel; or Paired t test in Minitab (Stats>Basic stats)        
  
 3. 2t test (Two means equal or not µ₁=µ₂)  
 2t test is done for two independent samples       
 Use ttest() in excel; or 2-Sample test in Minitab (Stats>Basic stats)  
  
 4. F test (Two variances are equal or not σ₁²=σ₂²)
 used when X is discrete and Y is continuous
 here H₀: S1²=S2²; and Ha: S1²≠S2²
 F = S1²/S2² = Variance1 / Variance2
 if p>=0.05, the F value is close to 1, so fail to reject H₀
 Use Ftest() in Excel or the 2-Variances test in Minitab (Stats>Basic Stats)
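
A minimal sketch of the F statistic in Python, with two hypothetical samples. Only the ratio of sample variances is computed here; the p value would come from Ftest() in Excel or from Minitab.

```python
from statistics import variance

def f_statistic(sample1, sample2):
    """F = S1² / S2², the ratio of the two sample variances."""
    return variance(sample1) / variance(sample2)

# Hypothetical cycle times from two teams
team_a = [1, 2, 3, 4, 5]
team_b = [2, 4, 6, 8, 10]
f = f_statistic(team_a, team_b)
print(f"F = {f:.3f}")    # a value far from 1 suggests unequal variances
```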
  
 5. One Sample Wilcoxon test
 Used for non normal data to test if the median is equal to a given value
 Use Minitab (Stats> Non parametric)        
  
 6. Mann - Whitney test  
 Used for non normal data to test if two medians are equal     
 Use Minitab (Stats> Non parametric)        
  
 7. Mood's Median test  
 Used for non normal data to test if three or more medians are equal
 Here H₀: all medians are equal; Ha: at least one median differs
  
→ ANOVA  
 H₀: All means are equal; Ha: Not all means are equal (at least one differs); if p>=0.05, fail to reject H₀
  
 1. One Way ANOVA  
 Here X is discrete (categorical grouped) and Y is continuous    
 eg: Effect of Experience (12,6,0) on processing time      
 Use Minitab (Stat>ANOVA>One way) to calculate p value    
 p>=0.05 means the factor does not have a significant impact on the response
  
 2. Two Way ANOVA  
 Here X is discrete (categorical grouped) and Y is continuous              
 Here the number of factors is 2 and replication >= 2
 eg. Effect of vintage (low, med, high) and speed of data entry (low, high) and their interaction on AHT (2 observations each)
 Use Minitab (Stat>ANOVA>Two way) to calculate p value
 Draw Main effects plot and Interaction plot (Stat>ANOVA) for further inferences          
  
→ Correlation  
 X and Y need to be continuous                
 Draw scatter plot and check the kind of correlation      
 Use correl() in Excel with the 2 data arrays to find the coefficient of correlation 'r'
 Strong +ve correlation r>=0.8; Strong -ve correlation r<=-0.8; no correlation when r is close to zero
 r=0 means no linear relationship, but there could still be a non linear relationship
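
For illustration, the coefficient of correlation can be computed from its definition in Python. The data is hypothetical, and the helper name `pearson_r` is an assumption of this sketch; it should give the same result as correl() in Excel.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson coefficient of correlation between two equal-length arrays."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: experience (months) vs. items processed per hour
experience = [1, 2, 3, 4, 5]
throughput = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pearson_r(experience, throughput)
print(f"r = {r:.3f}", "(strong +ve)" if r >= 0.8 else "")
```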
  
→ Regression  
 Provides mathematical model for Xs(continuous) effect on Y(continuous)      
  
 1. Simple linear regression  
  
 Draw a scatter plot and calculate r                   
 Use Minitab (Stat>Regression>Fitted line plot) and select linear, Quadratic or cubic model (whichever gives a higher Rsquare value) 
 Rsq should be>=65%; if p<0.05 then factor has significant effect             
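
A rough Python sketch of what a linear fitted line plot computes: a least-squares line and its Rsq. The data and the function name `fit_line` are hypothetical, and the sketch covers only the linear model, not the quadratic or cubic options.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares fit y = intercept + slope * x; returns (slope, intercept, Rsq)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot

# Hypothetical X and Y observations
xs = [1, 2, 3, 4, 5, 6]
ys = [2.2, 4.1, 6.1, 7.9, 10.2, 11.8]
slope, intercept, r_squared = fit_line(xs, ys)
print(f"Y = {intercept:.2f} + {slope:.2f} X, Rsq = {r_squared:.1%}")
```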
  
 2. Multiple Linear regression  
 Fitted linear relation between Y and multiple Xs                 
 Draw scatter plot for all Xs and calculate r     
 Use Minitab (Stat>Basic stats>Correlation); if any two factors are strongly correlated with each other, one of them needs to be dropped from the model
 Use Minitab (Stat>Regression)      
 Rsq (adj) should be>=65%; if p<0.05 then factor has significant effect             
  
 3. Polynomial Regression  
 Fitted non linear relation between Y and X                 
 Use Minitab (Stat>Regression>Fitted line plot) and select linear, Quadratic or cubic model (whichever gives a higher Rsquare value) 
 Rsq should be>=65%; if p<0.05 then factor has significant effect   

Analyze Phase - Discrete Data

→ Hypothesis tests
1. 1P test (Proportion equal to a specified value p=p₀)  
 Here X and Y both are binary        
 Test statistic z = (p - p₀) / √(p₀(1-p₀)/n), where p is the sample proportion and p₀ is the specified value
 z will be close to zero when the p value >= 0.05
 Use one proportion test in Minitab (Stats>Basic stats)  
  
 2. 2P test (Two proportions equal p₁=p₂)  
 Here X and Y both are binary        
 Test statistic z is calculated by standardizing (p1 - p2)
 z will be close to zero when the p value >= 0.05
 Use two proportion test in Minitab (Stats>Basic stats) 
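
Both proportion tests can be sketched in Python with the normal approximation. Two choices here are assumptions of this sketch rather than something the cheat sheet specifies: the 1P standard error uses the specified value p₀, and the 2P test standardizes (p1 - p2) with a pooled proportion. The counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p_value(z):
    """Two-sided p value for a z statistic under the standard normal."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def one_proportion_z(defectives, n, p0):
    """1P test: sample proportion against a specified value p0."""
    p = defectives / n
    return (p - p0) / sqrt(p0 * (1 - p0) / n)

def two_proportion_z(x1, n1, x2, n2):
    """2P test: standardized difference using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    return (p1 - p2) / sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))

z1 = one_proportion_z(defectives=30, n=200, p0=0.10)
z2 = two_proportion_z(x1=30, n1=200, x2=18, n2=200)
print(f"1P: z = {z1:.3f}, p value = {two_sided_p_value(z1):.4f}")
print(f"2P: z = {z2:.3f}, p value = {two_sided_p_value(z2):.4f}")
```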

→ Regression  
 Provides mathematical model for Xs (continuous) effect on Y (discrete)

 Logistic regression
 Here X is continuous and Y is discrete       
 Draw scatter plot for all Xs and calculate r   
 eg. Regression model for Vintage and Accuracy score   
 Use Minitab (Stat>Regression>Binary logistic regression)   
 In Minitab window: Success = total pass; Trial = total audited; Model = vintage
 Minitab will generate values: a(const coeff) and b(vintage coeff) 
 P value should be <=0.05   
 Regression equation: Accuracy score = e^(a + b*vintage) / (1 + e^(a + b*vintage))
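
Once Minitab has produced the coefficients, the regression equation can be evaluated as in this Python sketch. The values of a and b below are hypothetical placeholders, not output from a real model.

```python
from math import exp

def predicted_accuracy(vintage, a, b):
    """Accuracy score = e^(a + b*vintage) / (1 + e^(a + b*vintage))"""
    odds = exp(a + b * vintage)
    return odds / (1 + odds)

# Hypothetical coefficients: a (const coeff) and b (vintage coeff)
a, b = -1.0, 0.5
for vintage in (0, 2, 6, 12):
    print(f"vintage {vintage:>2}: predicted accuracy {predicted_accuracy(vintage, a, b):.1%}")
```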


