Cluster Analysis
Merle Canfield
The purpose of cluster analysis is to categorize a set of objects, variables, or people by placing them into clusters based on their similarities or differences. It is a method for developing a taxonomy. If variables are clustered, the method resembles factor analysis, although there are differences that will be demonstrated below. In fact, a correlation may be used in cluster analysis.
The questionnaire on the next page is used in this cluster analysis example. Like discriminant function, cluster analysis is a method of grouping individuals or variables. In the first example individuals are clustered and in the second example variables are clustered. The same set of data is used in both examples.
The difference between discriminant function and cluster analysis is that in discriminant function the groups are known and the task is to identify the variables that will predict which group an individual should be assigned to, while in cluster analysis the groups are empirically derived from the variables.
SETTING QUESTIONNAIRE
NAME ________________________________________ DATE ___________________
SETTING _____________________________________ TIME ___________________
INSTRUCTIONS: For each item draw a circle around the number that you think best describes the setting. IMPORTANT: If you think that some people are acting or feeling one way and other people are acting or feeling another way then both may be circled. Two numbers may be circled for one item. Do not circle more than two. Try hard to circle only one.
When people are in this setting they are:
   seldom      often
1. 0 1 2 3 4 5 6 7 8 tense
2. 0 1 2 3 4 5 6 7 8 satisfied
3. 0 1 2 3 4 5 6 7 8 easy going
4. 0 1 2 3 4 5 6 7 8 caring
5. 0 1 2 3 4 5 6 7 8 good
6. 0 1 2 3 4 5 6 7 8 friendly
7. 0 1 2 3 4 5 6 7 8 confident
8. 0 1 2 3 4 5 6 7 8 suspicious
9. 0 1 2 3 4 5 6 7 8 lazy
10. 0 1 2 3 4 5 6 7 8 forced to do things
11. 0 1 2 3 4 5 6 7 8 busy
12. 0 1 2 3 4 5 6 7 8 ordered around
When people are in this setting they:
13. 0 1 2 3 4 5 6 7 8 have a say about what to do
14. 0 1 2 3 4 5 6 7 8 share
15. 0 1 2 3 4 5 6 7 8 know what's going on
16. 0 1 2 3 4 5 6 7 8 think
17. 0 1 2 3 4 5 6 7 8 work together
18. 0 1 2 3 4 5 6 7 8 have high self‑esteem
19. 0 1 2 3 4 5 6 7 8 learn
20. 0 1 2 3 4 5 6 7 8 joke around
21. 0 1 2 3 4 5 6 7 8 can come and go as they want
22. 0 1 2 3 4 5 6 7 8 talk about personal problems
23. 0 1 2 3 4 5 6 7 8 have a good time
In this setting:
24. 0 1 2 3 4 5 6 7 8 things get done
25. 0 1 2 3 4 5 6 7 8 it's easy to fit in
26. 0 1 2 3 4 5 6 7 8 there is conflict
27. 0 1 2 3 4 5 6 7 8 people like each other
File Name = psscls1.sps
get file="E:\rdda\pssstf16.sav".
* Cluster the nine raters on all 27 items and request a two-cluster solution.
CLUSTER tense satisfie easygoin caring good friendly confiden suspicio lazy
    forced busy ordered whattodo share goingon think worktoge selfeste learn
    joke comeandg personal goodtime thingsge easyfiti conflict peopleli
  /METHOD BAVERAGE
  /MEASURE= SEUCLID
  /ID=personra
  /PRINT SCHEDULE CLUSTER(2)
  /PRINT DISTANCE
  /PLOT DENDROGRAM HICICLE.
The above file was generated with the following clicks:
Click Analyze
Click Classify
Click Hierarchical Cluster
Select ID
Click right delta for Label Cases By:
Select Variables to use for clustering
Click right delta for Variables
Click Statistics
Select Agglomeration
Select Proximity Matrix
Select Single Solution and 2 clusters
Click Continue
Click Plots
Select Dendrogram
Select Horizontal
Click Continue
Click Method
Click OK
PERSONRA | TENSE | SATISFIED | EASYGOING | CARING | GOOD | FRIENDLY | CONFIDENT | SUSPICIOUS | LAZY | FORCED | BUSY | ORDERED | WHATTODO | SHARE | GOINGON | THINK | WORKTOGETH | SELFESTEEM | LEARN | JOKE | COMEANDGO | PERSONAL | GOODTIME | THINGSGET | EASYFITIN | CONFLICT | PEOPLELIKE
Barb | 2 | 6 | 6 | 8 | 6 | 7 | 6 | 1 | 2 | 2 | 5 | 1 | 7 | 7 | 7 | 6 | 7 | 7 | 6 | 6 | 5 | 4 | 6 | 6 | 7 | 3 | 6
John | 4 | 5 | 4 | 6 | 6 | 6 | 5 | 2 | 2 | 2 | 5 | 2 | 5 | 6 | 6 | 7 | 6 | 6 | 7 | 4 | 5 | 4 | 4 | 5 | 5 | 2 | 6
Leona | 2 | 5 | 5 | 7 | 6 | 7 | 6 | 1 | 1 | 1 | 6 | 2 | 6 | 6 | 7 | 7 | 6 | 6 | 7 | 5 | 5 | 4 | 4 | 6 | 7 | 5 | 6
Leslie | 3 | 5 | 5 | 7 | 5 | 6 | 6 | 2 | 2 | 2 | 5 | 2 | 6 | 6 | 7 | 6 | 6 | 6 | 6 | 6 | 5 | 5 | 6 | 6 | 6 | 4 | 6
Nolita | 3 | 5 | 5 | 7 | 6 | 7 | 6 | 2 | 3 | 3 | 5 | 3 | 6 | 6 | 6 | 5 | 6 | 6 | 5 | 5 | 4 | 6 | 5 | 6 | 6 | 4 | 6
Reece | 3 | 5 | 5 | 6 | 6 | 6 | 5 | 2 | 2 | 2 | 5 | 2 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 4 | 5 | 5 | 4 | 5 | 6 | 4 | 6
Ruth | 4 | 5 | 4 | 5 | 5 | 5 | 6 | 3 | 2 | 3 | 6 | 4 | 5 | 6 | 6 | 6 | 6 | 5 | 5 | 4 | 4 | 4 | 4 | 6 | 4 | 5 | 5
Sue | 4 | 5 | 4 | 7 | 6 | 6 | 6 | 2 | 2 | 4 | 6 | 3 | 5 | 6 | 6 | 6 | 6 | 6 | 6 | 4 | 3 | 4 | 4 | 7 | 5 | 4 | 6
Couns | 4 | 5 | 5 | 7 | 6 | 6 | 6 | 2 | 2 | 4 | 5 | 4 | 5 | 6 | 6 | 5 | 6 | 5 | 4 | 5 | 4 | 6 | 4 | 5 | 5 | 4 | 6
Cluster
Average Linkage (Between Groups)
Dendrogram
* * * * * * H I E R A R C H I C A L  C L U S T E R  A N A L Y S I S * * * * * *
Dendrogram using Average Linkage (Between Groups)
                Rescaled Distance Cluster Combine
  C A S E       0         5        10        15        20        25
  Label     Num +---------+---------+---------+---------+---------+
  Nolita      5 -+-----------------------+
  Couns       9 -+                       +-----+
  Ruth        7 -----------+-------------+     +-----------------+
  Sue         8 -----------+                   |                 |
  John        2 -+-----------------------------+                 |
  Reece       6 -+                                               |
  Barb        1 ---------+---------+                             |
  Leslie      4 ---------+         +-----------------------------+
  Leona       3 -------------------+
The next analysis requests four clusters.
File Name = psscls2.sps
get file = '\rdda\pssstf16.sav'.
* Same analysis with a four-cluster solution, using the TO keyword for the item list.
cluster tense to peoplelia
  /id=personra
  /print=distance
  /print=schedule cluster(4)
  /plot=dendrogram hicicle.
Cluster
>Warning # 708 in column 18. Text: PEOPLELIA
>A variable name is more than 8 characters long. Only the first 8
>characters will be used.
Average Linkage (Between Groups)
Dendrogram
* * * * * * H I E R A R C H I C A L  C L U S T E R  A N A L Y S I S * * * * * *
Dendrogram using Average Linkage (Between Groups)
                Rescaled Distance Cluster Combine
  C A S E       0         5        10        15        20        25
  Label     Num +---------+---------+---------+---------+---------+
  Nolita      5 -+-----------------------+
  Couns       9 -+                       +-----+
  Ruth        7 -----------+-------------+     +-----------------+
  Sue         8 -----------+                   |                 |
  John        2 -+-----------------------------+                 |
  Reece       6 -+                                               |
  Barb        1 ---------+---------+                             |
  Leslie      4 ---------+         +-----------------------------+
  Leona       3 -------------------+
Transposing a File
Click on Data; Click on Transpose; Click on PERSONRA; Click on delta to Variable Name; Select remaining variables; Click on delta to Variables; Click OK. SAVE AS pssstf18.sav.
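What the transpose step does to the layout can be sketched outside SPSS. The fragment below is only an illustration in Python, not part of the original jobstream: it uses the first three items for three raters from the table above, and the function name `transpose` is hypothetical. The label column name `case_lbl` is the one used as the ID in psscls3.sps.

```python
# A small slice of the person-by-item file: three raters, three items.
header = ["personra", "tense", "satisfie", "easygoin"]
rows = [
    ["Barb", 2, 6, 6],
    ["John", 4, 5, 4],
    ["Leona", 2, 5, 5],
]

def transpose(header, rows):
    """Turn a person-by-item table into an item-by-person table.

    The first column (rater names) supplies the new variable names, and the
    old variable names become the case labels of the new rows.
    """
    names = [row[0] for row in rows]
    new_header = ["case_lbl"] + names
    new_rows = [
        [variable] + [row[k] for row in rows]
        for k, variable in enumerate(header[1:], start=1)
    ]
    return new_header, new_rows

new_header, new_rows = transpose(header, rows)
print(new_header)
for row in new_rows:
    print(row)
```

After the transpose each item is a case and each rater is a variable, which is the arrangement needed to cluster the items rather than the people.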
ITEM | BARBE | JOHN | LEONA | LESLIE | NOLITA | REECE | RUTH | SUE | COUNS
TENSE | 2 | 4 | 2 | 3 | 3 | 3 | 4 | 4 | 4
SATIS | 6 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5
EASY | 6 | 4 | 5 | 5 | 5 | 5 | 4 | 4 | 5
CARE | 8 | 6 | 7 | 7 | 7 | 6 | 5 | 7 | 7
GOOD | 6 | 6 | 6 | 5 | 6 | 6 | 5 | 6 | 6
FRIEND | 7 | 6 | 7 | 6 | 7 | 6 | 5 | 6 | 6
CONFI | 6 | 5 | 6 | 6 | 6 | 5 | 6 | 6 | 6
SUSP | 1 | 2 | 1 | 2 | 2 | 2 | 3 | 2 | 2
LAZY | 2 | 2 | 1 | 2 | 3 | 2 | 2 | 2 | 2
FORCED | 2 | 2 | 1 | 2 | 3 | 2 | 3 | 4 | 4
BUSY | 5 | 5 | 6 | 5 | 5 | 5 | 6 | 6 | 5
ORDER | 1 | 2 | 2 | 2 | 3 | 2 | 4 | 3 | 4
WHATDO | 7 | 5 | 6 | 6 | 6 | 6 | 5 | 5 | 5
SHARE | 7 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6
GOON | 7 | 6 | 7 | 7 | 6 | 6 | 6 | 6 | 6
THINK | 6 | 7 | 7 | 6 | 5 | 6 | 6 | 6 | 5
WORKT | 7 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6
SELFE | 7 | 6 | 6 | 6 | 6 | 6 | 5 | 6 | 5
LEARN | 6 | 7 | 7 | 6 | 5 | 6 | 5 | 6 | 4
JOKE | 6 | 4 | 5 | 6 | 5 | 4 | 4 | 4 | 5
COMEGO | 5 | 5 | 5 | 5 | 4 | 5 | 4 | 3 | 4
PERSONAL | 4 | 4 | 4 | 5 | 6 | 5 | 4 | 4 | 6
GOODT | 6 | 4 | 4 | 6 | 5 | 4 | 4 | 4 | 4
THINGSD | 6 | 5 | 6 | 6 | 6 | 5 | 6 | 7 | 5
EASYF | 7 | 5 | 7 | 6 | 6 | 6 | 4 | 5 | 5
CONFLCT | 3 | 2 | 5 | 4 | 4 | 4 | 5 | 4 | 4
PEOPLL | 6 | 6 | 6 | 6 | 6 | 6 | 5 | 6 | 6
File Name = psscls3.sps
get file = '\rdda\pssstf18.sav'.
* Cluster the 27 items using the transposed file, three-cluster solution.
cluster barb to couns
  /id=case_lbl
  /print=distance
  /print=schedule cluster(3)
  /plot=dendrogram hicicle.
Cluster
Average Linkage (Between Groups)
Dendrogram
Dendrogram using Average Linkage (Between Groups)
                Rescaled Distance Cluster Combine
  C A S E       0         5        10        15        20        25
  Label     Num +---------+---------+---------+---------+---------+
  SHARE      14 -+
  WORKTOGE   17 -+
  SELFESTE   18 -+
  GOOD        5 -+-+
  PEOPLELI   27 -+ |
  FRIENDLY    6 -+ |
  GOINGON    15 -+ |
  CONFIDEN    7 -+-+
  THINGSGE   24 -+ |
  WHATTODO   13 -+-+-+
  EASYFITI   25 -+ | |
  THINK      16 -+-+ +-+
  LEARN      19 -+   | |
  SATISFIE    2 -+---+ +-----+
  BUSY       11 -+     |     |
  CARING      4 -------+     |
  JOKE       20 -+           +-----------------------------------+
  GOODTIME   23 -+-+         |                                   |
  EASYGOIN    3 -+ +-+       |                                   |
  COMEANDG   21 ---+ +---+   |                                   |
  PERSONAL   22 -----+   +---+                                   |
  CONFLICT   26 ---------+                                       |
  SUSPICIO    8 -+-----+                                         |
  LAZY        9 -+     +-----------------------------------------+
  FORCED     10 -+-+   |
  ORDERED    12 -+ +---+
  TENSE       1 ---+
The purpose of this next section is twofold: (1) to demonstrate another method of using the statistics and (2) to compare the various statistics methodologically.
The purpose of this section is to show the relationships between correlation (and factor analysis) and cluster analysis. In this example 4 people have taken 4 tests (tests are like variables). The data are as follows:
CLSDAT1.TXT
"PER1",2,3,5,2
"PER2",3,2,6,3
"PER3",2,3,5,3
"PER4",3,2,6,2
The data is presented graphically:
Correlation of variables (and consequently factor analysis) indicates the similarity of tests in terms of the relative position of each individual on the tests, while cluster analysis indicates the similarity of tests using the absolute difference in each individual's position on the tests. The correlation is presented:
File Name = crscor16.sps
get file = '\proeval\CLSDAT1.sav' /keep= PERs TEST1 TEST2 TEST3 TEST4.
COR TEST1 TO TEST4
  /STATISTICS=all.
Correlations
In Frame CRSCOR16.LIS TEST1, TEST2, and TEST3 all correlate perfectly with each other, even though TEST2 is negatively correlated with the other two. TEST4 correlates zero with all three tests. It can be seen in the graphic that the profiles of TEST1 and TEST3 are identical even though they are separated in terms of distance. TEST2 is the mirror image of the other two. TEST4, although close in distance to TEST1 and TEST2, is quite dissimilar in terms of relative shape or profile.
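The correlations described here can be checked by hand or with a few lines of code. The following is a minimal sketch in Python rather than SPSS, using only the four rows of CLSDAT1.TXT; the helper names `deviations` and `pearson` are illustrative and not part of the original jobstream.

```python
from math import sqrt

# Scores from CLSDAT1.TXT: one row per person, columns TEST1..TEST4.
scores = [
    [2, 3, 5, 2],  # PER1
    [3, 2, 6, 3],  # PER2
    [2, 3, 5, 3],  # PER3
    [3, 2, 6, 2],  # PER4
]
names = ["TEST1", "TEST2", "TEST3", "TEST4"]

def deviations(values):
    """Scores expressed as differences from their own mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def pearson(x, y):
    """Pearson correlation computed from deviation scores."""
    dx, dy = deviations(x), deviations(y)
    sxy = sum(a * b for a, b in zip(dx, dy))
    return sxy / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

# One list of four scores per test.
columns = [[row[j] for row in scores] for j in range(len(names))]
r = {
    (names[i], names[j]): pearson(columns[i], columns[j])
    for i in range(len(names))
    for j in range(i + 1, len(names))
}

for pair, value in r.items():
    print(pair, round(value, 2))
```

Because each test is centered on its own mean before the products are formed, the three-point gap between TEST1 and TEST3 has no effect on their correlation.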
Factor analysis shows how this small set of variables can be summarized. It should be noted that there are not nearly enough subjects in this set for what would be considered appropriate; there should be at a minimum 40 subjects to compute this analysis. The purpose of this example is to show the differential effects of factor analysis and cluster analysis. As indicated, the two analyses are similar in that they both summarize the possible underlying characteristics of a set of variables, simplifying the set and consequently obtaining more parsimony. However, the summarization process is somewhat different for the two methods, and this demonstration is designed to show that difference.
File Name = crsfac8.sps
get file = '\proeval\CLSDAT1.sav' /keep= PERs TEST1 TEST2 TEST3 TEST4.
fac var= test1 to test4
  / rotation.
┌───────────────────────────────────────────────────────────────────────────┐
│ CRSFAC8.SPS │
├───────────────────────────────────────────────────────────────────────────┤
  Final Statistics:

  Variable   Communality  *  Factor  Eigenvalue  Pct of Var  Cum Pct
                          *
  TEST1          1.00000  *     1       3.00000        75.0     75.0
  TEST2          1.00000  *     2       1.00000        25.0    100.0
  TEST3          1.00000  *
  TEST4          1.00000  *

  Varimax Rotation 1, Extraction 1, Analysis 1 - Kaiser Normalization.

  Varimax converged in 2 iterations.

  Rotated Factor Matrix:

             FACTOR 1   FACTOR 2

  TEST1       1.00000     .00000
  TEST2      -1.00000     .00000
  TEST3       1.00000     .00000
  TEST4        .00000    1.00000
└───────────────────────────────────────────────────────────────────────────┘
TEST1, TEST2, and TEST3 form the first factor and TEST4 forms a factor of its own. Further, the first three variables are perfectly correlated with the first factor. However, TEST2 is negatively correlated with the factor. The relative weights are perfectly related.
The cluster analysis is presented next. It is necessary to transpose the data in order for the analyses to be comparable, as shown in Frame CLSDAT2.TXT. Frame CRSCLS7.SPS contains the jobstream and Frame CRSCLS7.LIS contains the output.
CLSDAT2.sav
"TEST1",2,3,2,3
"TEST2",3,2,3,2
"TEST3",5,6,5,6
"TEST4",2,3,3,2
File Name = crscls7.sps
get file = '\proeval\CLSDAT2.sav' /keep= ID PER1 PER2 PER3 PER4.
* Cluster the four tests using the transposed data, two-cluster solution.
cluster PER1 TO PER4
  /id=ID
  /print=distance
  /print=schedule cluster(2)
  /plot=dendrogram hicicle.
┌───────────────────────────────────────────────────────────────────────────┐
│ CRSCLS7.LIS │
├───────────────────────────────────────────────────────────────────────────┤
  Squared Euclidean measure used.

  1 Agglomeration method specified.

  Squared Euclidean Dissimilarity Coefficient Matrix

  Case            1          2          3

     2       4.0000
     3      36.0000    40.0000
     4       2.0000     2.0000    38.0000

                  Number of Clusters

  Label    Case          2

  TEST1       1          1
  TEST2       2          1
  TEST3       3          2
  TEST4       4          1

  Dendrogram using Average Linkage (Between Groups)

                  Rescaled Distance Cluster Combine

    C A S E       0         5        10        15        20        25
    Label     Seq +---------+---------+---------+---------+---------+

    TEST2       2 -+
    TEST4       4 -+-----------------------------------------------+
    TEST1       1 -+                                               |
    TEST3       3 -------------------------------------------------+
└───────────────────────────────────────────────────────────────────────────┘
Note in the cluster analysis that there are also two clusters representing the four variables, but they are constructed of different variables or tests than the factors in the factor analysis. TEST1, TEST2, and TEST4 make up cluster 1 and TEST3 is in a cluster alone. The calculations below show that correlation (and factor analysis) assesses a relative relationship among variables while cluster analysis assesses an absolute relationship.
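The distance matrix and the two-cluster solution in CRSCLS7.LIS can be reproduced with a short sketch. This is a plain Python illustration of squared Euclidean distance with average linkage between groups, written for this example and not a description of how SPSS implements CLUSTER; the function names are hypothetical, and ties are broken by taking the first closest pair found, which may join the tests in a different order than the SPSS dendrogram without changing the final two clusters.

```python
from itertools import combinations

# Rows of CLSDAT2: one row per test, columns PER1..PER4.
tests = {
    "TEST1": [2, 3, 2, 3],
    "TEST2": [3, 2, 3, 2],
    "TEST3": [5, 6, 5, 6],
    "TEST4": [2, 3, 3, 2],
}

def sq_euclid(a, b):
    """Sum of squared differences between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def between_groups_distance(g1, g2):
    """Average of all pairwise distances between members of two clusters."""
    pairs = [(a, b) for a in g1 for b in g2]
    return sum(sq_euclid(tests[a], tests[b]) for a, b in pairs) / len(pairs)

def average_linkage(labels, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[name] for name in labels]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: between_groups_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters

distances = {(a, b): sq_euclid(tests[a], tests[b]) for a, b in combinations(tests, 2)}
solution = average_linkage(list(tests), 2)

for pair, d in distances.items():
    print(pair, d)
print(solution)
```

The distances depend on the raw score levels, so TEST3, whose scores are three points higher than everyone else's, ends up alone even though its profile matches TEST1 exactly.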
A more detailed inspection of the analysis will demonstrate the differences. The following jobstream and output show how the correlation and factor analysis operate in relative terms.
File Name = crslis1.sps
get file = '\proeval\CLSDAT1.sav' /keep= PERs TEST1 TEST2 TEST3 TEST4.
* Deviation scores: each test score minus the mean of that test.
COMPUTE T1LX = TEST1 - 2.5.
COMPUTE T2LX = TEST2 - 2.5.
COMPUTE T3LX = TEST3 - 5.5.
COMPUTE T4LX = TEST4 - 2.5.
* Squared deviation scores.
COMPUTE T1LX2=T1LX*T1LX.
COMPUTE T2LX2=T2LX*T2LX.
COMPUTE T3LX2=T3LX*T3LX.
COMPUTE T4LX2=T4LX*T4LX.
* Cross-products of the TEST1 deviations with each of the other tests.
COMPUTE T1LXT2LY=T1LX*T2LX.
COMPUTE T1LXT3LY=T1LX*T3LX.
COMPUTE T1LXT4LY=T1LX*T4LX.
LIST T1LX T2LX T3LX T4LX .
LIST T1LX2 T2LX2 T3LX2 T4LX2 T1LXT2LY T1LXT3LY T1LXT4LY.
CRSLIS1.LIS
 T1LX  T2LX  T3LX  T4LX
 -.50   .50  -.50  -.50
  .50  -.50   .50   .50
 -.50   .50  -.50   .50
  .50  -.50   .50  -.50
 T1LX2  T2LX2  T3LX2  T4LX2  T1LXT2LY  T1LXT3LY  T1LXT4LY
   .25    .25    .25    .25      -.25       .25       .25
   .25    .25    .25    .25      -.25       .25       .25
   .25    .25    .25    .25      -.25       .25      -.25
   .25    .25    .25    .25      -.25       .25      -.25
Recall that the formula for the
correlation is:
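The formula itself is missing from this copy. In deviation-score form, which matches the "little x" scores computed in crslis1.sps, the Pearson correlation is usually written as follows; the symbols are supplied here and are an assumption about the notation the original used.

```latex
r_{xy} = \frac{\sum xy}{\sqrt{\sum x^{2}\,\sum y^{2}}},
\qquad x = X - \bar{X}, \quad y = Y - \bar{Y}
```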
Note that all the little x scores are either -.5 or +.5, indicating that the differences from the means are the same size for all cases. That is true even for the scores on TEST3, which on the plot is considerably distant from the other tests. The scores are the differences from their own mean, so the distance between tests is lost. Each score represents a difference from the mean for that variable (in this example a test); however, the relative distribution of the cases for that test remains. Consequently, the correlation for TEST1 and TEST2 is:
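The worked equation is missing from this copy. Assuming the deviation-score formula and the deviations listed in CRSLIS1.LIS (every cross-product of TEST1 with TEST2 is -.25 and every squared deviation is .25), it presumably works out as:

```latex
r_{12} = \frac{\sum x_{1}x_{2}}{\sqrt{\sum x_{1}^{2}\,\sum x_{2}^{2}}}
       = \frac{(-.25) + (-.25) + (-.25) + (-.25)}{\sqrt{(1.00)(1.00)}}
       = -1.00
```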
While the correlation between
TEST1 and TEST3 is:
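Again the equation is missing here. Under the same assumptions, with every cross-product of TEST1 and TEST3 equal to .25, it presumably reads:

```latex
r_{13} = \frac{\sum x_{1}x_{3}}{\sqrt{\sum x_{1}^{2}\,\sum x_{3}^{2}}}
       = \frac{.25 + .25 + .25 + .25}{\sqrt{(1.00)(1.00)}}
       = 1.00
```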
And one more example of the
relationship between TEST1 and TEST4.
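The equation for this pair is also missing. Under the same assumptions, the cross-products of TEST1 and TEST4 are .25, .25, -.25, and -.25, so they cancel and the result is presumably:

```latex
r_{14} = \frac{\sum x_{1}x_{4}}{\sqrt{\sum x_{1}^{2}\,\sum x_{4}^{2}}}
       = \frac{.25 + .25 - .25 - .25}{\sqrt{(1.00)(1.00)}}
       = .00
```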
In this instance TEST1, TEST2,
and TEST3 are similar while TEST4 is different.
A look at cluster analysis tells
a different story.
File Name = crslis2.sps
get file = '\proeval\CLSDAT1.sav' /keep= PERs TEST1 TEST2 TEST3 TEST4.
* Raw differences between TEST1 and each of the other tests.
COMPUTE dif12 = TEST1 - TEST2.
COMPUTE dif13 = TEST1 - TEST3.
COMPUTE dif14 = TEST1 - TEST4.
* Squared differences.
compute dif12s=dif12*dif12.
compute dif13s=dif13*dif13.
compute dif14s=dif14*dif14.
LIST dif12 dif13 dif14 dif12s dif13s dif14s.
CRSLIS2.LIS
 DIF12  DIF13  DIF14  DIF12S  DIF13S  DIF14S
 -1.00  -3.00    .00    1.00    9.00     .00
  1.00  -3.00    .00    1.00    9.00     .00
 -1.00  -3.00  -1.00    1.00    9.00    1.00
  1.00  -3.00   1.00    1.00    9.00    1.00
First note that the absolute differences between TEST1 and TEST2 (TEST1 minus TEST2) are all 1. However, half of them are in one direction and the other half are in the opposite direction (note the minus signs). The differences squared and summed equal 4. The differences between TEST1 and TEST3 are all -3; the values squared and summed equal 36, indicating the most dissimilarity. In the correlation analysis these latter two variables had a perfect correlation. On the other hand, tests 1 and 4 show the most similarity, with squared differences that sum to only 2. In the correlation analysis these two variables had a correlation of zero, indicating the relative positions to be the most dissimilar. [The point of this is for the investigator to decide what question is being asked.]
***
There is a difference in profile
but also a difference in that profiles can be opposite and still be a part of
the same factor (negatively related to the factor).
***
It might be useful at this point to compare and contrast the various statistical procedures used in this set. From a practical point of view different techniques were selected, and it might be useful to note why they were selected for the various questions.
This chapter is provided to show similarities and differences between the various statistical procedures.
The data set to be used is made up of ratings of personality theories by 12 to 16 raters. The questionnaire used for these ratings follows:
Personality Theory Rating Scale
Name: _________________________________________ Date: ________________
Use the scale below to rate the personality theory of ____________________________.
╔═══════════════════════════════════════════════════════════════════════════╗
║ None A Little Somewhat Quite a Bit A Lot ║
╟───────────────────────────────────────────────────────────────────────────╢
║ 0 1 2 3 4 5 6 7 8 ║
╚═══════════════════════════════════════════════════════════════════════════╝
╔══════════════════════╗
║LEAVE THE QUESTION    ║
║BLANK IF YOU DON'T    ║
║KNOW OR IF IT DOESN'T ║
║APPLY. ║
╚══════════════════════╝
ACCORDING TO THIS THEORY:
_____ ...motivation is based on drive reduction.
_____ ...the person is an intentional (goal-oriented) being.
_____ ...people are hedonistic.
_____ ...cognition accounts for the actions of people.
_____ ...values account for the actions of people.
_____ ...people are actively involved in the development of their personality.
_____ ...people's early experiences influence their personality.
_____ ...the person imposes perception on the world.
_____ ...the environment or learning accounts for the person's actions.
_____ ...people are basically good.
_____ ...heredity affects the person's actions.
_____ This theory stresses the individual's conscious view of the world.
_____ This theory stresses the individual's unconscious view of the world.
_____ This theory stresses the individual's social consciousness.
_____ This theory accounts for the individual's perception of reality.
_____ This theory has influenced psychology (clinical, research, literature).
_____ This theory focuses on "the here and now", the past, or the future. (0 = past, 4 = here and now, 8 = future)
_____ This theory is empirically based.
_____ This theory is parsimonious.
_____ This theory assumes that the individual has free choice.
_____ This theory employs a method of therapeutic intervention.
_____ This theory emphasizes psychopathology.
_____ I agree with this theory.
The names for the respective
items are as follows:
TDATE
THER
THID
CLUS
DRIVE
GOAL
HEDON
COG
VALUE
ACTIVE
EARLY
IMPOSE
LEARN
GOOD
HERED
CONSCI
UNCONS
SOCIAL
PERCEP
INFLU
TIME
DATA
PARSI
FREE
THERA
PATH
AGREE
The theorists rated were:
FREUD    Sigmund Freud
ADLER    Alfred Adler
JUNG     Carl Jung
ROGERS   Carl Rogers
KELLY    George Kelly
HORNEY   Karen Horney
SULLIVA  Harry Stack Sullivan
BANDURA  Albert Bandura
CATTELL  Raymond B. Cattell
MASLOW   Abraham Maslow
BINSWAN  Ludwig Binswanger
ERIKSON  Erik Erikson
This data was part of a class assignment for graduate students taking a theories of personality class. Each week the students read the assignments and completed the questionnaire the day before the class meeting. There were 17 students enrolled in the class; however, not all students completed the forms each week and consequently there is some missing data. There were ___ completed forms.
In this first example the items of the questionnaire are grouped using factor analysis. Recall that in this condition the items with similar profiles will be grouped together (into factors), not necessarily the items that are closest in distance (refer to the above discussion). The data is in a dBase IV file with 9 indicating that data was omitted. As can be seen, mostly defaults were used in the computer run (see Frame PERFAC5.SPS): a principal components extraction method was used and the rotation was orthogonal. Using the eigenvalue of 1.00 is usually not considered the best method of deciding upon the number of factors; however, both interpretation and the scree method also seemed to indicate 5 factors.
File Name = perfac5.sps
get file= '\proeval\perall4.sav'
  /keep= tDATE THER THID CLUS DRIVE GOAL HEDON COG VALUE ACTIVE EARLY
         IMPOSE LEARN GOOD HERED CONSCI UNCONS SOCIAL PERCEP INFLU TIME
         DATA PARSI FREE THERA PATH AGREE .
* A rating of 9 marks an omitted response.
missing values drive to agree (9).
fac var= drive to agree
  /missing=pairwise
  /plot=eigen
  /criteria=factors(5)
  /rotate.
┌────────────────────────────────────────────────────────────────────────────┐
│ PERFAC5.LIS │
├────────────────────────────────────────────────────────────────────────────┤
  Final Statistics:

  Variable   Communality  *  Factor  Eigenvalue  Pct of Var  Cum Pct
                          *
  DRIVE           .54238  *     1       6.98937        30.4     30.4
  GOAL            .50485  *     2       2.15730         9.4     39.8
  HEDON           .54444  *     3       1.72904         7.5     47.3
  COG             .56063  *     4       1.47348         6.4     53.7
  VALUE           .66169  *     5       1.32890         5.8     59.5
  ACTIVE          .70979  *
  EARLY           .58670  *
  IMPOSE          .64661  *
  LEARN           .58716  *
  GOOD            .51995  *
  HERED           .58137  *
  CONSCI          .64024  *
  UNCONS          .68112  *
  SOCIAL          .61566  *
  PERCEP          .61891  *
  INFLU           .59501  *
  TIME            .58200  *
  DATA            .56921  *
  PARSI           .60125  *
  FREE            .61128  *
  THERA           .64608  *
  PATH            .52881  *
  AGREE           .54294  *

  Rotated Factor Matrix:

            FACTOR 1    FACTOR 2    FACTOR 3    FACTOR 4    FACTOR 5

  DRIVE     -.67035**   -.10424     -.12588     -.21679      .13893
  GOAL       .44300      .44580*     .16215      .17344      .23128
  HEDON     -.72226**   -.01600      .14498      .01324      .03653
  COG        .50422*     .28914      .40228      .23887     -.06251
  VALUE      .15529      .79294**   -.08091     -.04701      .00768
  ACTIVE     .58000**    .41073      .21364      .39876     -.00630
  EARLY     -.69231**    .27344     -.13863      .07009      .09220
  IMPOSE     .22239      .23344     -.10607      .72878**    .01706
  LEARN      .02767      .45879      .49137*     .21355     -.29809
  GOOD       .57563**    .41920      .00750     -.00350      .11316
  HERED      .10169      .28821     -.34325     -.60077**   -.09606
  CONSCI     .55734**    .40202      .29750      .26467     -.09712
  UNCONS    -.48833*    -.19803     -.48205     -.38498      .15119
  SOCIAL    -.05895      .71266**    .26140      .18852      .02080
  PERCEP     .29944      .16227     -.10921      .69839**    .05684
  INFLU     -.10405      .01029      .21463     -.17453      .71242**
  TIME       .72841**    .04942      .11085      .18045     -.06419
  DATA       .29151      .04499      .63344**   -.20723      .19498
  PARSI      .05321      .06473      .76207**    .00803      .11581
  FREE       .51295*     .32588      .28510      .39853      .04315
  THERA     -.13541     -.11914     -.26013      .24730      .69622**
  PATH      -.51195*    -.08011     -.40072     -.11859      .29269
  AGREE     -.01068      .34696      .25436      .21198      .55930**
└────────────────────────────────────────────────────────────────────────────┘
We were somewhat arbitrary in selecting 5 factors in this solution so that it would match the five-cluster solution in the cluster analysis that follows. It should be noted that one should not be so casual in determining the number of factors in a solution; the reader is referred to chapter __ on testing for the number of factors. In developing theory the researcher may do that in an armchair fashion, by reviewing the literature, or with exploratory factor analysis. The major purpose here is to compare factor analysis with cluster analysis, so the number of factors was chosen with that purpose in mind.
The factors in Figure __ are presented in two ways: (1) the criterion of .60 is used to determine whether a variable loads on a factor, and (2) if a variable does not load on any factor then it is placed on the factor with the highest loading.
Factor I
DRIVE -.67
HEDON -.72
EARLY -.69
TIME .73
---------
GOAL .44
COG .50
ACTIVE .58
GOOD .58
CONSCI .56
UNCONS -.49
FREE .51
PATH -.51
Factor II
VALUE .79
SOCIAL .71
-----------
GOAL .45
Factor III
DATA .63
PARSI .76
----------
LEARN .49
Factor IV
IMPOSE .73
HERED -.60
PERCEP .70
Factor V
INFLU .71
THERA .70
AGREE .56
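The two-step placement rule used for these lists can be written out as a short sketch. The Python fragment below is an illustration only: it uses five rows copied from the rotated factor matrix in PERFAC5.LIS, takes "highest loading" to mean largest absolute value (an assumption, since the text does not say how negative loadings are compared), and reports only the single strongest factor for each item. The function name `assign` is hypothetical.

```python
# Rotated loadings for a few items, copied from the PERFAC5.LIS matrix.
loadings = {
    "DRIVE": [-0.67035, -0.10424, -0.12588, -0.21679, 0.13893],
    "VALUE": [0.15529, 0.79294, -0.08091, -0.04701, 0.00768],
    "LEARN": [0.02767, 0.45879, 0.49137, 0.21355, -0.29809],
    "HERED": [0.10169, 0.28821, -0.34325, -0.60077, -0.09606],
    "AGREE": [-0.01068, 0.34696, 0.25436, 0.21198, 0.55930],
}
CRITERION = 0.60

def assign(row, criterion=CRITERION):
    """Return (factor number, meets_criterion) for one item.

    The item goes to the factor with the largest absolute loading; the flag
    records whether that loading reaches the criterion or the item was only
    placed there as a fallback.
    """
    best = max(range(len(row)), key=lambda k: abs(row[k]))
    return best + 1, abs(row[best]) >= criterion

assignments = {item: assign(row) for item, row in loadings.items()}
for item, (factor, meets) in assignments.items():
    note = "loads" if meets else "placed by highest loading"
    print(item, "Factor", factor, note)
```

An item that reached the criterion on two factors would need separate handling; none of the five rows used here does.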
The next example shows how cluster analysis can be used to group the same set of data. The data needs to be conditioned before the cluster analysis can be run. The means are computed within each theorist for each item. For example, the ratings on the first item, DRIVE, for all respondents to Freud were summed and divided by the number of respondents (the result was also rounded to the nearest integer to keep it on the same scale). The matrix was then transposed because the computer program requires that format for this problem. This data is presented in the frame THER11.sav.
ITEM | FREUD | ADLER | JUNG | ROGERS | KELLY | HORNEY | SULLIVA | BANDURA | CATTELL | MASLOW | BINSWAN | ERIKSON
DRIVE | 8 | 2 | 3 | 2 | 2 | 3 | 4 | 1 | 3 | 4 | 2 | 4
GOAL | 4 | 7 | 5 | 7 | 7 | 5 | 5 | 6 | 5 | 7 | 5 | 6
HEDON | 7 | 3 | 2 | 2 | 2 | 4 | 4 | 2 | 3 | 4 | 3 | 3
COG | 3 | 6 | 4 | 6 | 7 | 4 | 5 | 7 | 5 | 6 | 6 | 6
VALUE | 4 | 6 | 5 | 6 | 4 | 4 | 5 | 5 | 4 | 6 | 6 | 6
ACTIVE | 2 | 7 | 5 | 7 | 7 | 5 | 5 | 6 | 5 | 6 | 7 | 6
EARLY | 8 | 7 | 4 | 5 | 4 | 6 | 6 | 5 | 4 | 5 | 4 | 7
IMPOSE | 4 | 6 | 4 | 7 | 7 | 5 | 6 | 5 | 5 | 6 | 7 | 6
LEARN | 3 | 6 | 3 | 5 | 5 | 6 | 6 | 7 | 6 | 5 | 5 | 6
GOOD | 2 | 5 | 5 | 8 | 5 | 4 | 4 | 5 | 4 | 6 | 4 | 6
HERED | 3 | 4 | 5 | 4 | 2 | 3 | 3 | 2 | 5 | 4 | 3 | 4
CONSCI | 2 | 6 | 5 | 6 | 6 | 4 | 5 | 6 | 5 | 6 | 6 | 6
UNCONS | 8 | 2 | 7 | 3 | 2 | 6 | 4 | 2 | 4 | 3 | 2 | 5
SOCIAL | 4 | 7 | 3 | 6 | 5 | 5 | 6 | 6 | 5 | 5 | 5 | 6
PERCEP | 5 | 6 | 5 | 7 | 7 | 5 | 6 | 6 | 5 | 6 | 7 | 5
INFLU | 8 | 5 | 5 | 7 | 4 | 3 | 5 | 6 | 5 | 6 | 4 | 5
TIME | 0 | 5 | 5 | 4 | 5 | 3 | 4 | 4 | 5 | 5 | 5 | 3
DATA | 3 | 3 | 2 | 4 | 4 | 2 | 4 | 6 | 6 | 3 | 2 | 4
PARSI | 4 | 5 | 3 | 5 | 6 | 4 | 4 | 5 | 5 | 5 | 3 | 5
FREE | 2 | 5 | 3 | 7 | 7 | 5 | 4 | 6 | 4 | 6 | 7 | 5
THERA | 7 | 5 | 6 | 7 | 6 | 5 | 6 | 5 | 3 | 3 | 5 | 5
PATH | 7 | 3 | 5 | 3 | 3 | 6 | 5 | 3 | 4 | 3 | 4 | 4
AGREE | 5 | 5 | 4 | 5 | 5 | 5 | 5 | 5 | 4 | 5 | 4 | 5
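The conditioning described before this table (item means within each theorist, rounding, then transposing so that items are the rows) can be sketched as follows. The ratings in this Python fragment are hypothetical and serve only to show the steps; the names `forms`, `rounded_mean`, and `condition` are illustrative. Rounding halves upward is an assumption, since the text says only "nearest integer", and the code 9 is treated as missing as in the MISSING VALUES statement in perfac5.sps.

```python
MISSING = 9  # code used in the data file for an omitted rating

# Hypothetical ratings, for illustration only: one dict per rater and theorist.
forms = {
    "FREUD": [{"DRIVE": 8, "GOAL": 4}, {"DRIVE": 7, "GOAL": 9}, {"DRIVE": 8, "GOAL": 3}],
    "ADLER": [{"DRIVE": 2, "GOAL": 7}, {"DRIVE": 3, "GOAL": 6}],
}
ITEMS = ["DRIVE", "GOAL"]

def rounded_mean(values):
    """Mean of the non-missing ratings, rounded to the nearest integer.

    Halves are rounded up (an assumption). Ratings are never negative, so
    adding 0.5 and truncating is sufficient. An item with no valid ratings
    would need separate handling.
    """
    valid = [v for v in values if v != MISSING]
    return int(sum(valid) / len(valid) + 0.5)

def condition(forms, items):
    """Build the item-by-theorist table of rounded means."""
    return {
        item: {
            theorist: rounded_mean([form[item] for form in raters])
            for theorist, raters in forms.items()
        }
        for item in items
    }

table = condition(forms, ITEMS)
for item, row in table.items():
    print(item, row)
```

Each row of the result corresponds to one line of the item-by-theorist table, with one column per theorist.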
File Name = percls3.sps
get file = '\proeval\ther11.sav'
  /keep= ITEM FREUD ADLER JUNG ROGERS KELLY HORNEY SULLIVA BANDURA
         CATTELL MASLOW BINSWAN ERIKSON.
* Cluster the 23 items across the twelve theorists, five-cluster solution.
cluster freud to erikson
  /id=item
  /print=distance
  /print=schedule cluster(5)
  /plot=dendrogram hicicle.
* * * * * * H I E R A R C H I C A L  C L U S T E R  A N A L Y S I S * * * * * *
Dendrogram using Average Linkage (Between Groups)
                Rescaled Distance Cluster Combine
  C A S E       0         5        10        15        20        25
  Label     Num +---------+---------+---------+---------+---------+
  IMPOSE      8 -+-+
  PERCEP     15 -+ |
  GOAL        2 ---+-+
  COG         4 -+ | |
  CONSCI     12 -+-+ +---+
  ACTIVE      6 -+   |   +-+
  FREE       20 -----+   | |
  VALUE       5 -------+-+ +---------+
  GOOD       10 -------+   |         |
  LEARN       9 -+-----+   |         |
  SOCIAL     14 -+     +---+         +-----------+
  PARSI      19 -+-----+             |           |
  AGREE      23 -+                   |           |
  INFLU      16 -------------+       |           +---------------+
  THERA      21 -------------+-------+           |               |
  EARLY       7 -------------+                   |               |
  HERED      11 ---------------+---+             |               |
  TIME       17 ---------------+   +-------------+               |
  DATA       18 -------------------+                             |
  DRIVE       1 -+-------------+                                 |
  HEDON       3 -+             +---------------------------------+
  UNCONS     13 -----+---------+
  PATH       22 -----+
If five clusters are chosen (to be comparable to the 5 factor solution above) they are as follows:
Cluster 1
IMPOSE
PERCEP
GOAL
COG
CONSCI
ACTIVE
FREE
VALUE
GOOD
LEARN
SOCIAL
PARSI
AGREE
Cluster 2
INFLU
THERA
EARLY
Cluster 3
HERED
TIME
Cluster 4
DATA
Cluster 5
DRIVE
HEDON
UNCONS
PATH
The first question is whether there is a difference between the factor analysis solution and the cluster analysis solution. There is no test of significance that can be run [or would Chi Square be appropriate? There is the problem of what counts as a match: is it two or more variables in the same group (cluster or factor), or must all overlap?], so it is mostly a matter of judging whether the solutions appear to be the same or different. If one chooses the criterion of two or more variables in the same group, then it does not look too bad. Four variables from cluster 1 can be found in factor 1; 3 variables from cluster 1 can be found in factor 2 (all of factor 2); 2 variables in cluster 2 can be found in factor 5; and 4 variables in cluster 5 can be found in factor 1. That is 16 variables that overlap and 8 variables that do not [something wrong with this count]. That does give some indication that there is some fit between the two methods. However, cluster 3 does not have any variables that are shared in any of the factors and factor 4 does not have any variables that are shared in any of the clusters. Further, cluster 1 and factor 1 are fragmented across the two methods. Finally, if one tries to develop a taxonomy from the two methods, the taxonomy would seem to be different for the two methods.
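Since the hand count above is flagged as doubtful, a mechanical tally may help in rechecking it. The Python sketch below is an illustration, not part of the original analysis. It takes the factor memberships as listed earlier (GOAL appears under both Factor I and Factor II there, and is kept in both) and reads the cluster memberships from the dendrogram, with HERED and TIME as cluster 3 and DATA alone as cluster 4, which is an assumption about the intended five-cluster cut. A "match" is taken to be two or more shared variables.

```python
clusters = {
    1: {"IMPOSE", "PERCEP", "GOAL", "COG", "CONSCI", "ACTIVE", "FREE",
        "VALUE", "GOOD", "LEARN", "SOCIAL", "PARSI", "AGREE"},
    2: {"INFLU", "THERA", "EARLY"},
    3: {"HERED", "TIME"},
    4: {"DATA"},
    5: {"DRIVE", "HEDON", "UNCONS", "PATH"},
}
factors = {
    "I": {"DRIVE", "HEDON", "EARLY", "TIME", "GOAL", "COG", "ACTIVE",
          "GOOD", "CONSCI", "UNCONS", "FREE", "PATH"},
    "II": {"VALUE", "SOCIAL", "GOAL"},
    "III": {"DATA", "PARSI", "LEARN"},
    "IV": {"IMPOSE", "HERED", "PERCEP"},
    "V": {"INFLU", "THERA", "AGREE"},
}

# Number of variables shared by every cluster/factor pair.
overlap = {
    (c, f): len(members & loaded)
    for c, members in clusters.items()
    for f, loaded in factors.items()
}

# Pairs that satisfy the "two or more variables in the same group" criterion.
matches = {pair: n for pair, n in overlap.items() if n >= 2}

for (c, f), n in sorted(matches.items()):
    print("cluster", c, "and factor", f, "share", n, "variables")
```

The tally depends on how the marginal items below the dividing lines are assigned to factors, so it may not agree with a count made from the strongly loading items alone.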
[COMMENT1]The graph with the 4 people is clsdat3.cdr