Summary of Six Sigma Analysis

I have put together the basic statistical analyses used in the Measure and Analyze phases of the DMAIC methodology of a Six Sigma project, in a form that is succinct and convenient enough to use as a cheat sheet for quick reference.

Measure Phase – Continuous Data

→ Confidence Interval

100(1-α)% CI = Sample mean ± Constant * Std error

CI = Xbar ± Zα/2 * (SD/√n)

Zα/2 = 1.96 at 95% CL from Z table
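
A minimal Python sketch of the confidence interval formula above, using only the standard library. The function name and the sample data are made up for illustration, and the Z value is looked up from the normal distribution rather than a printed Z table.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(data, confidence=0.95):
    """Return (lower, upper) for Xbar ± Z(alpha/2) * SD/sqrt(n)."""
    n = len(data)
    xbar = mean(data)
    sd = stdev(data)                          # sample SD (n-1 denominator)
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 at 95% CL
    margin = z * sd / sqrt(n)
    return xbar - margin, xbar + margin

# Hypothetical measurements, for illustration only
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]
low, high = mean_confidence_interval(sample)
print(f"95% CI: {low:.3f} to {high:.3f}")
```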

→ Sample Size

n = (Zα/2 * SD / Δ)^2

Δ=precision

If n/N > 5% (N = population size), adjust the sample size by calculating n finite

n finite = n / (1+(n/N))
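
The two sample size formulas above can be sketched in Python as follows. The function name and the example inputs are hypothetical, and rounding up to the next whole unit is my assumption, not something the formulas themselves specify.

```python
from math import ceil
from statistics import NormalDist

def sample_size_continuous(sd, precision, confidence=0.95, population=None):
    """n = (Z * SD / precision)^2, with the finite-population adjustment
    n / (1 + n/N) applied when n/N exceeds 5%."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = (z * sd / precision) ** 2
    if population is not None and n / population > 0.05:
        n = n / (1 + n / population)
    return ceil(n)  # assumption: round up to a whole sample

n_large = sample_size_continuous(sd=10, precision=2)                  # 97
n_small = sample_size_continuous(sd=10, precision=2, population=500)  # 81
print(n_large, n_small)
```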

→ Basic Statistics

Measures of central tendency: Mean, Median, Mode

Measures of dispersion: Range, Variance, Std Deviation

Std dev = √[ Σ(x - mean)² / (n-1) ]

In Excel, use the stdev(data) function

Variance = SD²
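
As a quick check of the standard deviation and variance definitions above, here is a small Python sketch that computes the sample SD by hand and compares it with the standard library. The data values are made up.

```python
from math import sqrt
from statistics import stdev, variance

data = [4, 8, 6, 5, 7]          # hypothetical data
n = len(data)
xbar = sum(data) / n

# Sum of squared deviations from the mean, divided by (n - 1)
sum_sq_dev = sum((x - xbar) ** 2 for x in data)
sd_manual = sqrt(sum_sq_dev / (n - 1))

print(sd_manual, stdev(data))        # both use the n-1 denominator
print(sd_manual ** 2, variance(data))
```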

Histogram

Number of classes = √n

Class interval = (Max-Min)/no of classes

→ Probability and Distributions

Normal distribution

Treat data as normal when the normality test in Minitab shows p value >= 0.05

Standard normal dist: Mean = 0 and SD = 1

Z=(x-mean) / SD

Excel formula for calc probability = normdist(x,mean,SD,cumulative)
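
A minimal Python sketch of the Z calculation and the cumulative probability, which should correspond to normdist(x, mean, SD, TRUE) in Excel. The mean, SD and x values are assumed for illustration.

```python
from statistics import NormalDist

mu, sd = 50.0, 5.0      # hypothetical process mean and SD
x = 60.0

z = (x - mu) / sd                          # standardized value
p_below = NormalDist(mu, sd).cdf(x)        # P(X <= x) on the original scale
p_below_std = NormalDist().cdf(z)          # same probability on the Z scale

print(z, round(p_below, 5))
```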

→ Gage R&R (Reproducibility, Repeatability)

σ² total = σ² part-part + σ² R&R

Use ANOVA or Xbar-R method to conduct the test

→ Conduct Gage R&R study (crossed) testing in Minitab

if %R&R < 20%, measurement is acceptable, else not acceptable

In the Xbar chart, all appraisers should show almost the same pattern, with most of the points falling outside the control limits (this indicates that part-to-part variation dominates measurement variation)

All points in R chart should be in control limits for all appraisers

→ Process Capability

Cp = Tolerance / 6SD

Tolerance = USL - LSL

Cpk = Min[ (USL - mean)/3SD , (mean - LSL)/3SD ]

Cpk<=Cp

Zlt = 3*Cpk ; Zst = Zlt + 1.5

If Cpk>1.33, process is capable

Check whether the data is normal. If not, apply a 'Box Cox' transformation; if it is still not normal, treat the data as discrete and calculate capability from DPMO
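
The capability formulas above can be sketched in Python like this. The function name and the spec limits are hypothetical, and the sketch assumes the data has already passed the normality check.

```python
def process_capability(mean, sd, lsl, usl):
    """Return (Cp, Cpk, Zlt, Zst) for normally distributed data."""
    cp = (usl - lsl) / (6 * sd)                       # Tolerance / 6SD
    cpk = min((usl - mean) / (3 * sd), (mean - lsl) / (3 * sd))
    z_lt = 3 * cpk
    z_st = z_lt + 1.5                                 # 1.5 sigma shift
    return cp, cpk, z_lt, z_st

# Hypothetical process: mean 10.2, SD 0.1, specs 9.7 to 10.6
cp, cpk, z_lt, z_st = process_capability(mean=10.2, sd=0.1, lsl=9.7, usl=10.6)
print(f"Cp={cp:.2f} Cpk={cpk:.2f} Zlt={z_lt:.2f} Zst={z_st:.2f}")
```

Because the mean sits closer to the USL than to the LSL in this example, Cpk comes out lower than Cp.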

Measure Phase – Discrete Data

→ Types of data: Binary, Count, Ordinal

→ Confidence Interval

CI = p ± Zα/2 * √(p(1-p)/n)

Zα/2 = 1.96 at 95% CL from Z table
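
A minimal Python sketch of the confidence interval for a proportion, assuming the normal approximation used in the formula above. The function name and the counts are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(defectives, n, confidence=0.95):
    """Return (lower, upper) for p ± Z(alpha/2) * sqrt(p(1-p)/n)."""
    p = defectives / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p * (1 - p) / n)
    return p - margin, p + margin

# Hypothetical audit: 8 defectives found in 400 transactions (p = 2%)
p_low, p_high = proportion_confidence_interval(8, 400)
print(f"95% CI: {p_low:.4f} to {p_high:.4f}")
```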

→ Sample Size

n = (Zα/2)² * p(1-p) / Δ²

Δ=precision
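
The sample size formula for proportions can be sketched the same way. The function name is hypothetical and rounding up is my assumption.

```python
from math import ceil
from statistics import NormalDist

def sample_size_proportion(p, precision, confidence=0.95):
    """n = Z^2 * p(1-p) / precision^2, rounded up (assumption)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * p * (1 - p) / precision ** 2)

# Expected defective rate 2%, desired precision of ±1 percentage point
n_required = sample_size_proportion(p=0.02, precision=0.01)
print(n_required)  # 753
```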

→ Probability and Distributions

Binomial distribution (Binary data)

Excel formula: binomdist(numbers, trials, probability, cumulative)

eg: if p=0.02 or 2%

Audited (trials)   Errors (n)   Prob of n or fewer defectives
400                0            0.031%
400                2            1.312%
400                4            9.733%
400                6            31.090%
400                8            59.255%
400                10           81.790%
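
The cumulative binomial probabilities in the table above (p = 2%, 400 audited) can be recomputed with a short Python sketch. The helper name binom_cdf is my own; it plays the role of binomdist(n, trials, p, TRUE) in Excel.

```python
from math import comb

def binom_cdf(k, trials, p):
    """P(X <= k) for a binomial distribution, summed term by term."""
    return sum(comb(trials, i) * p ** i * (1 - p) ** (trials - i)
               for i in range(k + 1))

for k in (0, 2, 4, 6, 8, 10):
    print(f"400 audited, {k:>2} or fewer defectives: {binom_cdf(k, 400, 0.02):.3%}")
```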

Poisson distribution (Count data)

Excel formula: poisson(x, mean, cumulative)

Avg repeat calls   Repeat calls (d)   Prob of getting d defects or fewer
20                 0                  0.00%
20                 5                  0.01%
20                 10                 1.08%
20                 15                 15.65%
20                 20                 55.91%
20                 24                 84.32%
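
Similarly, the cumulative Poisson probabilities for an average of 20 repeat calls can be sketched in Python. The helper name poisson_cdf is my own; it stands in for poisson(x, mean, TRUE) in Excel.

```python
from math import exp, factorial

def poisson_cdf(d, mean):
    """P(X <= d) for a Poisson distribution with the given mean."""
    return sum(exp(-mean) * mean ** i / factorial(i) for i in range(d + 1))

for d in (0, 5, 10, 15, 20, 24):
    print(f"avg 20, {d:>2} or fewer: {poisson_cdf(d, 20):.2%}")
```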

→ Process Capability

Zlt = -normsinv(DPO)

Zst = Zlt + 1.5
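
A minimal Python sketch of the discrete capability calculation, assuming DPO is obtained by dividing DPMO by one million. The function name and the DPMO figure are only illustrative.

```python
from statistics import NormalDist

def sigma_level_from_dpmo(dpmo):
    """Return (Zlt, Zst) using Zlt = -normsinv(DPO) and Zst = Zlt + 1.5."""
    dpo = dpmo / 1_000_000
    z_lt = -NormalDist().inv_cdf(dpo)   # inv_cdf plays the role of normsinv
    z_st = z_lt + 1.5
    return z_lt, z_st

z_long, z_short = sigma_level_from_dpmo(6210)
print(f"Zlt={z_long:.2f} Zst={z_short:.2f}")
```

A DPMO of about 6,210 gives Zlt close to 2.5 and therefore a short-term sigma level close to 4.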

Analyze Phase - Continuous Data

→ Hypothesis tests

1. 1t Test (Mean equal to a specified value µ=µ₀)

Test statistic t = (Mean -Specified Value) / (SD/√n)

If t is close to zero (as judged by the p value), the mean can be taken as equal to the specified value

Use tdist() in Excel, or the 1-Sample t test in Minitab (Stats>Basic stats), selecting a confidence level and alternative hypothesis

If p>=0.05, fail to reject H₀ (treat the mean as equal to the specified value)
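
The one-sample t statistic can be sketched in Python with the standard library. This computes only the statistic; the p value still has to come from the t distribution with n-1 degrees of freedom (tdist() in Excel or Minitab). The function name and data are made up.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(data, mu0):
    """t = (sample mean - specified value) / (SD / sqrt(n))."""
    n = len(data)
    return (mean(data) - mu0) / (stdev(data) / sqrt(n))

data = [4, 8, 6, 5, 7]                 # hypothetical sample, mean = 6
t_stat = one_sample_t(data, mu0=5)
print(round(t_stat, 4))
```

The same function can be applied to the before-minus-after differences of a paired sample with mu0 = 0, which is how a paired t statistic is usually formed.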

2. Paired t test (Two means equal or not µ₁=µ₂)

Paired t test is done for dependent samples like before and after improvement of the same process

Use ttest() in excel; or Paired t test in Minitab (Stats>Basic stats)

3. 2t test (Two means equal or not µ₁=µ₂)

2t test is done for two independent samples

Use ttest() in Excel; or 2-Sample t test in Minitab (Stats>Basic stats)

4. F test (Two variances are equal or not σ₁²=σ₂²)

Used when X is discrete and Y is continuous

Here H₀: S1²=S2²; and Ha: S1²≠S2²

F = S1²/S2² = Variance1 / Variance2

If p>=0.05, the F value is close to 1, hence fail to reject H₀

Use Ftest() in excel or 2variance test in Minitab (Stats>Basic Stats)
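
A small Python sketch of the F statistic as a ratio of sample variances. The p value itself comes from the F distribution (Ftest() in Excel or Minitab) and is not computed here. The function name and both samples are hypothetical.

```python
from statistics import variance

def f_statistic(sample1, sample2):
    """F = S1^2 / S2^2, the ratio of the two sample variances."""
    return variance(sample1) / variance(sample2)

group_a = [4, 8, 6, 5, 7]   # sample variance 2.5
group_b = [5, 6, 7, 6, 6]   # sample variance 0.5
f_value = f_statistic(group_a, group_b)
print(f_value)
```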

5. One Sample Wilcoxon test

Used for non normal data to test if median is equal to a given value

Use Minitab (Stats> Non parametric)

6. Mann - Whitney test

Used for non normal data to test if two medians are equal

Use Minitab (Stats> Non parametric)

7. Mood's Median test

Used for non normal data to test if the medians of two or more groups (for example, three) are equal

Here H₀: all medians are equal; Ha: at least one median is different

→ ANOVA

H₀: All means are equal; Ha: At least one mean is different; if p>=0.05, fail to reject H₀ (treat all means as equal)

1. One Way ANOVA

Here X is discrete (categorical grouped) and Y is continuous

eg: Effect of Experience (12,6,0) on processing time

Use Minitab (Stat>ANOVA>One way) to calculate p value

If p>=0.05, the factor doesn't have a significant impact on the response

2. Two Way ANOVA

Here X is discrete (categorical grouped) and Y is continuous

Here the number of factors is 2 and replication >= 2

eg. Effect of vintage (low, med, high) and speed of data entry (low, high) and their interaction on AHT (2 observations each)

Use Minitab (Stat>ANOVA>Two way) to calculate p value

Draw Main effects plot and Interaction plot (Stat>ANOVA) for further inferences

→ Correlation

X and Y need to be continuous

Draw scatter plot and check the kind of correlation

Use correl() in Excel with the 2 data arrays to find the coefficient of correlation 'r'

Strong +ve correlation r>=0.8; Strong -ve correlation r<=-0.8; no correlation when r is close to zero

r=0 means no linear relationship but there still could be a relation in a non linear form
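
To illustrate the point about r = 0, here is a Python sketch that computes the correlation coefficient by hand for a perfectly linear relation and for a perfectly quadratic one. The helper name pearson_r and the data are made up; it should match what correl() returns in Excel.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient r for two equal-length arrays."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]
y_linear = [2 * v + 1 for v in x]   # exact straight line
y_curve = [v ** 2 for v in x]       # exact parabola, symmetric about 0

r_linear = pearson_r(x, y_linear)
r_curve = pearson_r(x, y_curve)
print(r_linear, r_curve)
```

The parabola gives r = 0 even though Y is completely determined by X, which is why a scatter plot should be drawn before relying on r.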

→ Regression

Provides mathematical model for Xs(continuous) effect on Y(continuous)

1. Simple linear regression

Draw a scatter plot and calculate r

Use Minitab (Stat>Regression>Fitted line plot) and select linear, Quadratic or cubic model (whichever gives a higher Rsquare value)

Rsq should be>=65%; if p<0.05 then factor has significant effect
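
A minimal least-squares sketch in Python for the simple linear case, returning the fitted line and R-square. It does not compute the p value that Minitab reports. The function name and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit y = intercept + slope * x; returns (intercept, slope, R^2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return intercept, slope, 1 - ss_res / ss_tot

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]      # made-up, roughly y = 2x
intercept, slope, r_sq = fit_line(x, y)
print(f"y = {intercept:.2f} + {slope:.2f}x, Rsq = {r_sq:.1%}")
```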

2. Multiple Linear regression

Fitted linear relation between Y and multiple Xs

Draw scatter plot for all Xs and calculate r

Use Minitab (Stat>Basic stats>Correlation); if there is a strong relation between any two of the factors, one of them needs to be dropped

Use Minitab (Stat>Regression)

Rsq (adj) should be>=65%; if p<0.05 then factor has significant effect

3. Polynomial Regression

Fitted non linear relation between Y and X

Use Minitab (Stat>Regression>Fitted line plot) and select linear, Quadratic or cubic model (whichever gives a higher Rsquare value)

Rsq should be>=65%; if p<0.05 then factor has significant effect

Analyze Phase - Discrete Data

→ Hypothesis tests

1. 1P test (Proportion equal to a specified value p=p₀)

Here X and Y both are binary

Test statistic z = (p - p₀) / √(p₀(1-p₀)/n), where p is the sample proportion and p₀ the specified value

z will be close to zero if the p value >= 0.05

Use one proportion test in Minitab (Stats>Basic stats)
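
A Python sketch of the one-proportion z test using the normal approximation. Note the assumption: the standard error here is computed from the specified value p0, which is the usual convention for this test, and the p value is two-sided. The function name and counts are made up, and Minitab may use an exact method by default, so results can differ slightly.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z(defectives, n, p0):
    """Return (z, two-sided p value) for H0: proportion = p0."""
    p_hat = defectives / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # assumption: SE based on p0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical audit: 12 defectives in 400, tested against p0 = 2%
z1, p_value1 = one_proportion_z(12, 400, 0.02)
print(f"z={z1:.3f} p={p_value1:.3f}")
```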

2. 2P test (Two proportions equal p₁=p₂)

Here X and Y both are binary

Test statistic z is calculated by standardizing the difference (p₁ - p₂)

z will be close to zero if the p value >= 0.05

Use two proportion test in Minitab (Stats>Basic stats)
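
One common way to standardize the difference between two proportions is with a pooled standard error, sketched below in Python. The pooled form is my assumption, since the text above does not say which variant is meant. The function name and counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Return (z, two-sided p value) for H0: p1 = p2, pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)               # assumption: pooled estimate
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical: 20 defectives of 400 before, 10 of 400 after
z2, p_value2 = two_proportion_z(20, 400, 10, 400)
print(f"z={z2:.3f} p={p_value2:.3f}")
```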

→ Regression
Provides mathematical model for Xs(continuous) effect on Y(discrete)

Logistic regression
Here X is continuous and Y is discrete
Draw scatter plot for all Xs and calculate r
eg. Regression model for Vintage and Accuracy score
Use Minitab (Stat>Regression>Binary logistic regression)
In Minitab window: Success = total pass; Trial = total audited; Model = vintage
Minitab will generate values: a(const coeff) and b(vintage coeff)
P value should be <=0.05
Regression equation: Accuracy score = {e^(a+b.vintage)} / {1+e^(a+b.vintage)}
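
Once Minitab has produced the coefficients, the regression equation can be evaluated with a few lines of Python. The coefficient values below are invented purely for illustration and are not output from any real model.

```python
from math import exp

def predicted_accuracy(vintage, a, b):
    """Accuracy score = e^(a + b*vintage) / (1 + e^(a + b*vintage))."""
    linear = a + b * vintage
    return exp(linear) / (1 + exp(linear))

a, b = -1.0, 0.5   # hypothetical const and vintage coefficients
for vintage in (0, 2, 6, 12):
    print(vintage, round(predicted_accuracy(vintage, a, b), 3))
```

The output always lies between 0 and 1, and with a positive b it rises as vintage increases.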
