Cluster Analysis

Merle Canfield

The purpose of cluster analysis is to categorize a set of objects, variables, or people by placing them into clusters based on their similarities or differences.  It is a method for developing a taxonomy.  When variables are clustered, cluster analysis resembles factor analysis, although there are differences that are demonstrated below.  In fact, a correlation may be used as the measure of similarity in cluster analysis.

The questionnaire below is used in this cluster analysis example.  Like discriminant function analysis, cluster analysis is a method of grouping individuals or variables.  In the first example individuals are clustered, and in the second example variables are clustered.  The same set of data is used in both examples.

The difference between discriminant function analysis and cluster analysis is this: in discriminant function analysis the groups are known, and the task is to identify the variables that predict which group an individual should be assigned to.  In cluster analysis the groups are derived empirically from the variables.

SETTING QUESTIONNAIRE

NAME ________________________________________   DATE ___________________

SETTING _____________________________________   TIME ___________________

INSTRUCTIONS:  For each item draw a circle around the number that you think best describes the setting.  IMPORTANT:  If you think that some people are acting or feeling one way and other people are acting or feeling another way then both may be circled.  Two numbers may be circled for one item.  Do not circle more than two.  Try hard to circle only one.

When people are in this setting they are:

seldom      often

1.     0 1 2 3 4 5 6 7 8  tense

2.     0 1 2 3 4 5 6 7 8  satisfied

3.     0 1 2 3 4 5 6 7 8  easy going

4.     0 1 2 3 4 5 6 7 8  caring

5.     0 1 2 3 4 5 6 7 8  good

6.     0 1 2 3 4 5 6 7 8  friendly

7.     0 1 2 3 4 5 6 7 8  confident

8.     0 1 2 3 4 5 6 7 8  suspicious

9.     0 1 2 3 4 5 6 7 8  lazy

10.     0 1 2 3 4 5 6 7 8  forced to do things

11.     0 1 2 3 4 5 6 7 8  busy

12.     0 1 2 3 4 5 6 7 8  ordered around

When people are in this setting they:

13.     0 1 2 3 4 5 6 7 8  have a say about what to do

14.     0 1 2 3 4 5 6 7 8  share

15.     0 1 2 3 4 5 6 7 8  know what's going on

16.     0 1 2 3 4 5 6 7 8  think

17.     0 1 2 3 4 5 6 7 8  work together

18.     0 1 2 3 4 5 6 7 8  have high self‑esteem

19.     0 1 2 3 4 5 6 7 8  learn

20.     0 1 2 3 4 5 6 7 8  joke around

21.     0 1 2 3 4 5 6 7 8  can come and go as they want

22.     0 1 2 3 4 5 6 7 8  talk about personal problems

23.     0 1 2 3 4 5 6 7 8  have a good time

In this setting:

24.     0 1 2 3 4 5 6 7 8  things get done

25.     0 1 2 3 4 5 6 7 8  it's easy to fit in

26.     0 1 2 3 4 5 6 7 8  there is conflict

27.     0 1 2 3 4 5 6 7 8  people like each other

File Name = psscls1.sps

get file="E:\rdda\pssstf16.sav".
CLUSTER tense satisfie easygoin caring good friendly confiden suspicio lazy
    forced busy ordered whattodo share goingon think worktoge selfeste learn
    joke comeandg personal goodtime thingsge easyfiti conflict peopleli
  /METHOD BAVERAGE
  /MEASURE= SEUCLID
  /ID=personra
  /PRINT SCHEDULE CLUSTER(2)
  /PRINT DISTANCE
  /PLOT DENDROGRAM HICICLE.

The above file was generated with the following clicks:

Click Analyze

Click Classify

Click Hierarchical Cluster

Select ID

Click right delta for Label Cases By:

Select Variables to use for clustering

Click right delta for Variables

Click Statistics

Select Agglomeration

Select Proximity Matrix

Select Single Solution and 2 clusters

Click Continue

Click Plots

Select Dendrogram

Select Horizontal

Click Continue

Click Method

Click OK

Variables, in item order 1 to 27: TENSE, SATISFIED, EASYGOING, CARING, GOOD, FRIENDLY, CONFIDENT, SUSPICIOUS, LAZY, FORCED, BUSY, ORDERED, WHATTODO, SHARE, GOINGON, THINK, WORKTOGETH, SELFESTEEM, LEARN, JOKE, COMEANDGO, PERSONAL, GOODTIME, THINGSGET, EASYFITIN, CONFLICT, PEOPLELIKE

PERSONRA  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
Barb      2  6  6  8  6  7  6  1  2  2  5  1  7  7  7  6  7  7  6  6  5  4  6  6  7  3  6
John      4  5  4  6  6  6  5  2  2  2  5  2  5  6  6  7  6  6  7  4  5  4  4  5  5  2  6
Leona     2  5  5  7  6  7  6  1  1  1  6  2  6  6  7  7  6  6  7  5  5  4  4  6  7  5  6
Leslie    3  5  5  7  5  6  6  2  2  2  5  2  6  6  7  6  6  6  6  6  5  5  6  6  6  4  6
Nolita    3  5  5  7  6  7  6  2  3  3  5  3  6  6  6  5  6  6  5  5  4  6  5  6  6  4  6
Reece     3  5  5  6  6  6  5  2  2  2  5  2  6  6  6  6  6  6  6  4  5  5  4  5  6  4  6
Ruth      4  5  4  5  5  5  6  3  2  3  6  4  5  6  6  6  6  5  5  4  4  4  4  6  4  5  5
Sue       4  5  4  7  6  6  6  2  2  4  6  3  5  6  6  6  6  6  6  4  3  4  4  7  5  4  6
Couns     4  5  5  7  6  6  6  2  2  4  5  4  5  6  6  5  6  5  4  5  4  6  4  5  5  4  6
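The /MEASURE= SEUCLID subcommand in psscls1.sps requests squared Euclidean distance. For each pair of people, the differences between their ratings are squared and then summed over all 27 items. The Python sketch below works through that arithmetic for three rows of the listing above. The function name `squared_euclidean` is our own and is not an SPSS routine.

```python
# Ratings on the 27 setting items (questionnaire order), copied from the listing above.
RATINGS = {
    "Barb":   [2, 6, 6, 8, 6, 7, 6, 1, 2, 2, 5, 1, 7, 7, 7, 6, 7, 7, 6, 6, 5, 4, 6, 6, 7, 3, 6],
    "Leslie": [3, 5, 5, 7, 5, 6, 6, 2, 2, 2, 5, 2, 6, 6, 7, 6, 6, 6, 6, 6, 5, 5, 6, 6, 6, 4, 6],
    "Leona":  [2, 5, 5, 7, 6, 7, 6, 1, 1, 1, 6, 2, 6, 6, 7, 7, 6, 6, 7, 5, 5, 4, 4, 6, 7, 5, 6],
}


def squared_euclidean(x, y):
    """Sum of the squared item-by-item differences between two rating profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same number of items")
    return sum((a - b) ** 2 for a, b in zip(x, y))


# Print the distance for every pair of the three people.
names = list(RATINGS)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        print(first, second, squared_euclidean(RATINGS[first], RATINGS[second]))
```

For these three rows the distances are:

- 15 for Barb and Leslie
- 22 for Barb and Leona
- 17 for Leslie and Leona

Barb and Leslie are the closest pair. This is consistent with the dendrogram below, where they are joined before Leona.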

Cluster

Dendrogram

* * * * * * H I E R A R C H I C A L  C L U S T E R   A N A L Y S I S * * * * * *

Dendrogram using Average Linkage (Between Groups)

Rescaled Distance Cluster Combine

C A S E         0         5        10        15        20        25
Label            Num  +---------+---------+---------+---------+---------+

Nolita             5   ─┬───────────────────────┐
Couns              9   ─┘                       ├─────┐
Ruth               7   ───────────┬─────────────┘     ├─────────────────┐
Sue                8   ───────────┘                   │                 │
John               2   ─┬─────────────────────────────┘                 │
Reece              6   ─┘                                               │
Barb               1   ─────────┬─────────┐                             │
Leslie             4   ─────────┘         ├─────────────────────────────┘
Leona              3   ───────────────────┘

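The next analysis requests four clusters.  The number of clusters requested affects only the cluster membership table, so the dendrogram is the same as in the first analysis.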

File Name = psscls2.sps

get file = '\rdda\pssstf16.sav'.
cluster tense to peoplelia
  /id=personra
  /print=distance
  /print=schedule cluster(4)
  /plot=dendrogram hicicle.

Cluster

>Warning # 708 in column 18.  Text: PEOPLELIA

>A variable name is more than 8 characters long.  Only the first 8

>characters will be used.

Dendrogram

* * * * * * H I E R A R C H I C A L  C L U S T E R   A N A L Y S I S * * * * * *

Dendrogram using Average Linkage (Between Groups)

Rescaled Distance Cluster Combine

C A S E         0         5        10        15        20        25
Label            Num  +---------+---------+---------+---------+---------+

Nolita             5   ─┬───────────────────────┐
Couns              9   ─┘                       ├─────┐
Ruth               7   ───────────┬─────────────┘     ├─────────────────┐
Sue                8   ───────────┘                   │                 │
John               2   ─┬─────────────────────────────┘                 │
Reece              6   ─┘                                               │
Barb               1   ─────────┬─────────┐                             │
Leslie             4   ─────────┘         ├─────────────────────────────┘
Leona              3   ───────────────────┘

Transposing a File

Click on Data; click on Transpose; click on PERSONRA; click on the delta to move it to Name Variable; select the remaining variables; click on the delta to move them to Variables; click OK.  Save the result as pssstf18.sav.

ITEM         BARB   JOHN  LEONA LESLIE NOLITA  REECE   RUTH    SUE  COUNS
TENSE           2      4      2      3      3      3      4      4      4
SATIS           6      5      5      5      5      5      5      5      5
EASY            6      4      5      5      5      5      4      4      5
CARE            8      6      7      7      7      6      5      7      7
GOOD            6      6      6      5      6      6      5      6      6
FRIEND          7      6      7      6      7      6      5      6      6
CONFI           6      5      6      6      6      5      6      6      6
SUSP            1      2      1      2      2      2      3      2      2
LAZY            2      2      1      2      3      2      2      2      2
FORCED          2      2      1      2      3      2      3      4      4
BUSY            5      5      6      5      5      5      6      6      5
ORDER           1      2      2      2      3      2      4      3      4
WHATDO          7      5      6      6      6      6      5      5      5
SHARE           7      6      6      6      6      6      6      6      6
GOON            7      6      7      7      6      6      6      6      6
THINK           6      7      7      6      5      6      6      6      5
WORKT           7      6      6      6      6      6      6      6      6
SELFE           7      6      6      6      6      6      5      6      5
LEARN           6      7      7      6      5      6      5      6      4
JOKE            6      4      5      6      5      4      4      4      5
COMEGO          5      5      5      5      4      5      4      3      4
PERSONAL        4      4      4      5      6      5      4      4      6
GOODT           6      4      4      6      5      4      4      4      4
THINGSD         6      5      6      6      6      5      6      7      5
EASYF           7      5      7      6      6      6      4      5      5
CONFLCT         3      2      5      4      4      4      5      4      4
PEOPLL          6      6      6      6      6      6      5      6      6
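Data > Transpose swaps rows and columns. After the swap, the items are the cases to be clustered and the people are the variables. The Python sketch below performs the same reshaping on the first three items for Barb, John, and Leona. The helper `transpose` and its argument layout are our own and are not an SPSS API.

```python
def transpose(case_ids, variable_names, data):
    """Swap the rows and columns of a table.

    data[i][j] is the value of variable j for case i.  After the swap the old
    variable names become the case labels (like SPSS's CASE_LBL) and the old
    case ids become the variable names.
    """
    if any(len(row) != len(variable_names) for row in data):
        raise ValueError("every row needs one value per variable")
    flipped = [list(column) for column in zip(*data)]
    return list(variable_names), list(case_ids), flipped


persons = ["Barb", "John", "Leona"]
items = ["tense", "satisfie", "easygoin"]
ratings = [
    [2, 6, 6],  # Barb
    [4, 5, 4],  # John
    [2, 5, 5],  # Leona
]

new_ids, new_vars, new_data = transpose(persons, items, ratings)
for label, row in zip(new_ids, new_data):
    print(label, row)
```

The first row of the result, tense: 2, 4, 2, matches the first three entries of the TENSE row in the transposed listing.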

File Name = psscls3.sps

get file = '\rdda\pssstf18.sav'.
cluster barb to couns
  /id=case_lbl
  /print=distance
  /print=schedule cluster(3)
  /plot=dendrogram hicicle.


Cluster

Dendrogram

Dendrogram using Average Linkage (Between Groups)

Rescaled Distance Cluster Combine

C A S E      0         5        10        15        20        25
Label     Num  +---------+---------+---------+---------+---------+

SHARE      14   ─┐
WORKTOGE   17   ─┤
SELFESTE   18   ─┤
GOOD        5   ─┼─┐
PEOPLELI   27   ─┤ │
FRIENDLY    6   ─┤ │
GOINGON    15   ─┘ │
CONFIDEN    7   ─┬─┘
THINGSGE   24   ─┘ │
WHATTODO   13   ─┬─┼─┐
EASYFITI   25   ─┘ │ │
THINK      16   ─┬─┘ ├─┐
LEARN      19   ─┘   │ │
SATISFIE    2   ─┬───┘ ├─────┐
BUSY       11   ─┘     │     │
CARING      4   ───────┘     │
JOKE       20   ─┐           ├───────────────────────────────────┐
GOODTIME   23   ─┼─┐         │                                   │
EASYGOIN    3   ─┘ ├─┐       │                                   │
COMEANDG   21   ───┘ ├───┐   │                                   │
PERSONAL   22   ─────┘   ├───┘                                   │
CONFLICT   26   ─────────┘                                       │
SUSPICIO    8   ─┬─────┐                                         │
LAZY        9   ─┘     ├─────────────────────────────────────────┘
FORCED     10   ─┬─┐   │
ORDERED    12   ─┘ ├───┘
TENSE       1   ───┘


The purpose of this next section is twofold: (1) to demonstrate another use of these statistics and (2) to compare the statistical methods with one another.  Specifically, it shows the relationships between correlation (and factor analysis) and cluster analysis.  In this example 4 people have taken 4 tests (tests are like variables).  The data are as follows:

CLSDAT1.TXT

"PER1",2,3,5,2
"PER2",3,2,6,3
"PER3",2,3,5,3
"PER4",3,2,6,2

The data is presented graphically:

[Graphic not reproduced.]

Correlation of variables (and consequently factor analysis) indicates the similarity of tests in terms of the relative position of each individual on each test.  Cluster analysis with a distance measure indicates the similarity of tests in terms of the absolute differences between each individual's scores on the tests.  The correlation is presented:

File Name = crscor16.sps

get file = '\proeval\CLSDAT1.sav'
    /keep= PERs TEST1 TEST2 TEST3 TEST4.
COR TEST1 TO TEST4
   /STATISTICS=all.

Correlations

In Frame CRSCOR16.LIS, TEST1, TEST2, and TEST3 all correlate perfectly with one another, although TEST2 correlates negatively (-1.00) with the other two.  TEST4 correlates zero with all three tests.  It can be seen in the graphic that the profiles of TEST1 and TEST3 are identical even though they are separated in terms of distance.  TEST2 is the mirror image of the other two.  TEST4 is close in distance to TEST1 and TEST2, but it is quite dissimilar in terms of its relative shape or profile.
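Both kinds of similarity can be checked by hand from the CLSDAT1 scores. The Python sketch below compares TEST1 with each other test in two ways:

- the Pearson correlation used by COR and FACTOR
- the squared Euclidean distance used by CLUSTER

It uses the standard library only, and the function names are our own.

```python
import math

# CLSDAT1 scores: one list per test, in the order PER1..PER4.
TESTS = {
    "TEST1": [2, 3, 2, 3],
    "TEST2": [3, 2, 3, 2],
    "TEST3": [5, 6, 5, 6],
    "TEST4": [2, 3, 3, 2],
}


def pearson_r(x, y):
    """Pearson correlation computed from deviation (little x, little y) scores."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    sxy = sum(a * b for a, b in zip(dx, dy))
    return sxy / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def squared_euclidean(x, y):
    """Sum of squared score differences (the SEUCLID measure)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


for other in ("TEST2", "TEST3", "TEST4"):
    r = pearson_r(TESTS["TEST1"], TESTS[other])
    d = squared_euclidean(TESTS["TEST1"], TESTS[other])
    print(f"TEST1 vs {other}: r = {r:+.2f}, squared distance = {d}")
```

The results show that the two measures rank the tests differently:

| Comparison | r | Squared distance |
|---|---|---|
| TEST1 vs TEST2 | -1 | 4 |
| TEST1 vs TEST3 | +1 | 36 |
| TEST1 vs TEST4 | 0 | 2 |

TEST3 is perfectly correlated with TEST1, yet it is the farthest from it. TEST4 is uncorrelated with TEST1, yet it is the closest.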

Factor analysis shows how this small set of variables can be summarized.  It should be noted that this data set has far too few subjects for a factor analysis to be considered appropriate; there should be a minimum of 40 subjects to compute this analysis.  The purpose of this example is to show the differential effects of factor analysis and cluster analysis.  As indicated, the two analyses are similar in that both summarize the possible underlying characteristics of a set of variables, thus simplifying the data and achieving more parsimony.  However, the two procedures summarize the variables in somewhat different ways, and this demonstration is designed to show how.

File Name = crsfac8.sps

get file = '\proeval\CLSDAT1.sav'
    /keep= PERs TEST1 TEST2 TEST3 TEST4.
fac var= test1 to test4
   / rotation.

┌───────────────────────────────────────────────────────────────────────────┐

CRSFAC8.LIS

├───────────────────────────────────────────────────────────────────────────┤

Final Statistics:

Variable     Communality  *  Factor   Eigenvalue   Pct of Var   Cum Pct
                          *
TEST1            1.00000  *     1       3.00000       75.0         75.0
TEST2            1.00000  *     2       1.00000       25.0        100.0
TEST3            1.00000  *
TEST4            1.00000  *

Varimax   Rotation  1,  Extraction  1,  Analysis  1 - Kaiser Normalization.

Varimax converged in    2 iterations.

Rotated Factor Matrix:

             FACTOR  1     FACTOR  2
TEST1          1.00000        .00000
TEST2         -1.00000        .00000
TEST3          1.00000        .00000
TEST4           .00000       1.00000

└───────────────────────────────────────────────────────────────────────────┘

TEST1, TEST2, and TEST3 form the first factor, and TEST4 forms a factor of its own.  Further, the first three variables are perfectly correlated with the first factor, although TEST2 is negatively correlated with it.  The loadings of the three tests are equal in magnitude (1.00) and differ only in sign.
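Every correlation here is exactly 1, -1, or 0, so the two-factor solution can be checked by hand. The derivation below is our own sketch of the principal components computation, which is SPSS FACTOR's default extraction. The sign of an eigenvector is arbitrary, so the signs in the output could equally have been reversed.

$$
R=\begin{pmatrix}1&-1&1&0\\-1&1&-1&0\\1&-1&1&0\\0&0&0&1\end{pmatrix},\qquad
R\begin{pmatrix}1\\-1\\1\\0\end{pmatrix}=3\begin{pmatrix}1\\-1\\1\\0\end{pmatrix},\qquad
R\begin{pmatrix}0\\0\\0\\1\end{pmatrix}=1\begin{pmatrix}0\\0\\0\\1\end{pmatrix}
$$

$$
\lambda_1+\lambda_2+\lambda_3+\lambda_4=\operatorname{tr}R=4\;\Rightarrow\;\lambda_3=\lambda_4=0,\qquad
\frac{3}{4}=75\%,\qquad\frac{1}{4}=25\%
$$

$$
\text{loadings}=\sqrt{\lambda}\,\mathbf{v}:\qquad
\sqrt{3}\cdot\tfrac{1}{\sqrt{3}}(1,-1,1,0)=(1,-1,1,0),\qquad
\sqrt{1}\cdot(0,0,0,1)=(0,0,0,1)
$$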

The cluster analysis is presented next.  For the analyses to be comparable, the data must be transposed so that the tests become the cases and the persons become the variables, as shown in Frame CLSDAT2.TXT.  Frame CRSCLS7.SPS contains the jobstream and Frame CRSCLS7.LIS contains the output.

CLSDAT2.sav

"TEST1",2,3,2,3
"TEST2",3,2,3,2
"TEST3",5,6,5,6
"TEST4",2,3,3,2

File Name = crscls7.sps

get file = '\proeval\CLSDAT2.sav'
    /keep= ID PER1 PER2 PER3 PER4.
cluster PER1 TO PER4
  /id=ID
  /print=distance
  /print=schedule cluster(2)
  /plot=dendrogram hicicle.

┌───────────────────────────────────────────────────────────────────────────┐

CRSCLS7.LIS

├───────────────────────────────────────────────────────────────────────────┤

Squared Euclidean measure used.

1 Agglomeration method specified.

Squared Euclidean Dissimilarity Coefficient Matrix

Case              1             2             3
2         4.0000
3        36.0000       40.0000
4         2.0000        2.0000       38.0000

Number of Clusters

Label       Case      2
TEST1          1      1
TEST2          2      1
TEST3          3      2
TEST4          4      1

Dendrogram using Average Linkage (Between Groups)

Rescaled Distance Cluster Combine

C A S E       0         5        10        15        20        25
Label       Seq  +---------+---------+---------+---------+---------+

TEST2         2   -+
TEST4         4   -+-----------------------------------------------+
TEST1         1   -+                                               |
TEST3         3   -------------------------------------------------+

└───────────────────────────────────────────────────────────────────────────┘

Note that the cluster analysis also yields two clusters for the four variables, but the clusters are made up of different tests than the factors.  TEST1, TEST2, and TEST4 make up cluster 1, and TEST3 is in a cluster alone.  The calculations below show that correlation (and hence factor analysis) assesses a relative relationship among variables, while cluster analysis assesses an absolute relationship.
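The agglomeration behind this dendrogram can be retraced from the distance matrix. The Python sketch below implements average linkage between groups, which corresponds to SPSS's BAVERAGE: the distance between two clusters is the mean of all pairwise distances between their members. It takes the squared Euclidean distances printed in CRSCLS7.LIS as input. Ties are broken by taking the first pair found, which is our assumption and may differ from SPSS's internal ordering. The function names are our own.

```python
from itertools import combinations

# Squared Euclidean distances copied from CRSCLS7.LIS; the cases are the four tests.
LABELS = ["TEST1", "TEST2", "TEST3", "TEST4"]
DIST = {
    (0, 1): 4.0, (0, 2): 36.0, (0, 3): 2.0,
    (1, 2): 40.0, (1, 3): 2.0,
    (2, 3): 38.0,
}


def dist(i, j):
    """Distance between two original cases (the matrix is symmetric)."""
    return DIST[(min(i, j), max(i, j))]


def between_groups(a, b):
    """Average linkage: mean distance over every pair with one member in each cluster."""
    return sum(dist(i, j) for i in a for j in b) / (len(a) * len(b))


def merge(clusters, a, b):
    """Replace clusters a and b by their union."""
    return [c for c in clusters if c not in (a, b)] + [a | b]


def average_linkage(n):
    """Agglomerate n cases; return the merges as (cluster_a, cluster_b, distance)."""
    clusters = [frozenset([i]) for i in range(n)]
    merges = []
    while len(clusters) > 1:
        # min() keeps the first closest pair it meets, so ties go to the earlier pair.
        a, b = min(combinations(clusters, 2), key=lambda pair: between_groups(*pair))
        merges.append((a, b, between_groups(a, b)))
        clusters = merge(clusters, a, b)
    return merges


def cut(n, merges, k):
    """Cluster membership for a k-cluster solution: replay the first n - k merges."""
    clusters = [frozenset([i]) for i in range(n)]
    for a, b, _ in merges[: n - k]:
        clusters = merge(clusters, a, b)
    return clusters


merges = average_linkage(len(LABELS))
for a, b, height in merges:
    print(sorted(LABELS[i] for i in a), "+", sorted(LABELS[i] for i in b), "at", height)
for group in cut(len(LABELS), merges, 2):
    print(sorted(LABELS[i] for i in group))
```

The merges occur at distances of 2, 3, and 38. The two-cluster cut reproduces the CRSCLS7.LIS membership: TEST3 alone, with the other three tests together.

The distance of 2 is a tie between TEST1/TEST4 and TEST2/TEST4. SPSS joined TEST2 and TEST4 first, while this sketch joins TEST1 and TEST4 first. Either order gives a second merge at 3 ((4 + 2) / 2) and the same final clusters.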

A more detailed inspection of the analysis demonstrates the differences.  The following jobstream and output show how correlation and factor analysis operate in relative terms.

File Name = crslis1.sps

get file = '\proeval\CLSDAT1.sav'
    /keep= PERs TEST1 TEST2 TEST3 TEST4.
* Deviation (little x) scores: each test minus its own mean.
COMPUTE T1LX = TEST1 - 2.5.
COMPUTE T2LX = TEST2 - 2.5.
COMPUTE T3LX = TEST3 - 5.5.
COMPUTE T4LX = TEST4 - 2.5.
* Squared deviations, and cross-products of TEST1 with each other test.
COMPUTE T1LX2=T1LX*T1LX.
COMPUTE T2LX2=T2LX*T2LX.
COMPUTE T3LX2=T3LX*T3LX.
COMPUTE T4LX2=T4LX*T4LX.
COMPUTE T1LXT2LY=T1LX*T2LX.
COMPUTE T1LXT3LY=T1LX*T3LX.
COMPUTE T1LXT4LY=T1LX*T4LX.
LIST T1LX T2LX T3LX T4LX .
LIST T1LX2 T2LX2 T3LX2 T4LX2 T1LXT2LY T1LXT3LY T1LXT4LY.

CRSLIS1.LIS

     T1LX     T2LX     T3LX     T4LX
     -.50      .50     -.50     -.50
      .50     -.50      .50      .50
     -.50      .50     -.50      .50
      .50     -.50      .50     -.50

    T1LX2    T2LX2    T3LX2    T4LX2 T1LXT2LY T1LXT3LY T1LXT4LY
      .25      .25      .25      .25     -.25      .25      .25
      .25      .25      .25      .25     -.25      .25      .25
      .25      .25      .25      .25     -.25      .25     -.25
      .25      .25      .25      .25     -.25      .25     -.25

Recall that the formula for the correlation is:
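Here x and y are the little x scores, that is, each case's difference from its own test mean (T1LX to T4LX above). In this deviation-score form, the Pearson correlation is:

$$
r_{xy}=\frac{\sum xy}{\sqrt{\sum x^{2}\,\sum y^{2}}}
$$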

Note that all the little x (deviation) scores are either -.5 or +.5, indicating that every case differs from its test mean by the same amount.  This is true even for the scores on TEST3, which on the plot is considerably distant from the other tests.  Because each score is a difference from its own test's mean, the distance between tests is lost.  Each score represents a difference from the mean for that variable (in this example, a test); however, the relative distribution of the cases on that test remains.  Consequently, the correlation for TEST1 and TEST2 is:
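Substituting the CRSLIS1.LIS values gives the result. Every squared deviation is .25, so each test's squared deviations sum to 1.00, and every T1LXT2LY product is -.25:

$$
r_{12}=\frac{4(-.25)}{\sqrt{4(.25)\cdot 4(.25)}}=\frac{-1.00}{\sqrt{1.00\cdot 1.00}}=-1.00
$$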

While the correlation between TEST1 and TEST3 is:
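Here every T1LXT3LY product is +.25, and the sums of squared deviations are again 1.00:

$$
r_{13}=\frac{4(.25)}{\sqrt{4(.25)\cdot 4(.25)}}=\frac{1.00}{\sqrt{1.00\cdot 1.00}}=1.00
$$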

One more example is the correlation between TEST1 and TEST4:
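The T1LXT4LY products are .25, .25, -.25, and -.25, so they cancel. The TEST4 deviations are also plus or minus .50, so their squares again sum to 1.00:

$$
r_{14}=\frac{.25+.25-.25-.25}{\sqrt{1.00\cdot 1.00}}=\frac{0}{1.00}=0
$$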

In this instance TEST1, TEST2, and TEST3 are similar while TEST4 is different.

A look at cluster analysis tells a different story.

File Name = crslis2.sps

get file = '\proeval\CLSDAT1.sav'
    /keep= PERs TEST1 TEST2 TEST3 TEST4.
COMPUTE dif12 = TEST1 - TEST2.
COMPUTE dif13 = TEST1 - TEST3.
COMPUTE dif14 = TEST1 - TEST4.
compute dif12s=dif12*dif12.
compute dif13s=dif13*dif13.
compute dif14s=dif14*dif14.
LIST dif12 dif13 dif14 dif12s dif13s dif14s.

CRSLIS2.LIS

    DIF12    DIF13    DIF14   DIF12S   DIF13S   DIF14S
    -1.00    -3.00      .00     1.00     9.00      .00
     1.00    -3.00      .00     1.00     9.00      .00
    -1.00    -3.00    -1.00     1.00     9.00     1.00
     1.00    -3.00     1.00     1.00     9.00     1.00

First note that the absolute differences between TEST1 and TEST2 (TEST1 minus TEST2) are all 1.  However, half of them are in one direction and the other half are in the opposite direction (note the minus signs).  The squared differences sum to 4.  The differences between TEST1 and TEST3 are all -3; squared and summed, they equal 36, the greatest dissimilarity.  In the correlation analysis these two variables had a perfect correlation.  On the other hand, TEST1 and TEST4 are the most similar: their squared differences sum to only 2.  In the correlation analysis these two variables had a correlation of zero, indicating that their relative positions were the most dissimilar.  [The point of this is that the investigator must decide what question is being asked.]

***

There is a difference in profile, but profiles can also be opposite (mirror images) and still be part of the same factor, with one of them negatively related to the factor.

***

It might be useful at this point to compare and contrast the various statistical procedures used in this set.  From a practical point of view different techniques were selected and it might be useful to note why they were selected for the various questions.

This chapter is provided to show similarities and differences between the various statistical procedures.

The data set used here is made up of ratings of personality theories by 12 to 16 raters.  The questionnaire used for these ratings follows:

Personality Theory Rating Scale

Name: _________________________________________   Date: ________________

Use the scale below to rate the personality theory of ____________________________.

╔═══════════════════════════════════════════════════════════════════════════╗

None           A Little          Somewhat        Quite a Bit        A Lot

╟───────────────────────────────────────────────────────────────────────────╢

0        1        2        3        4        5        6        7        8

╚═══════════════════════════════════════════════════════════════════════════╝

╔══════════════════════╗

LEAVE THE QUESTION

BLANK IF YOU DON'T

KNOW OR IF IT DOESN'T

APPLY.

╚══════════════════════╝

ACCORDING TO THIS THEORY:

_____   ...motivation is based on drive reduction.

_____   ...the person is an intentional (goal-oriented) being.

_____   ...people are hedonistic.

_____   ...cognition accounts for the actions of people.

_____   ...values account for the actions of people.

_____   ...people are actively involved in the development of their personality.

_____   ...people's early experiences influence their personality.

_____   ...the person imposes perception on the world.

_____   ...the environment or learning accounts for the person's actions.

_____   ...people are basically good.

_____   ...heredity affects the person's actions.

_____   This theory stresses the individual's conscious view of the world.

_____   This theory stresses the individual's unconscious view of the world.

_____   This theory stresses the individual's social consciousness.

_____   This theory accounts for the individual's perception of reality.

_____   This theory has influenced psychology (clinical, research, literature).

_____   This theory focuses on "the here and now", the past, or the future.

(0 = past, 4 = here and now, 8 = future)

_____   This theory is empirically based.

_____   This theory is parsimonious.

_____   This theory assumes that the individual has free choice.

_____   This theory employs a method of therapeutic intervention.

_____   This theory emphasizes psychopathology.

_____   I agree with this theory.

The names for the respective items are as follows:

TDATE

THER

THID

CLUS

DRIVE

GOAL

HEDON

COG

VALUE

ACTIVE

EARLY

IMPOSE

LEARN

GOOD

HERED

CONSCI

UNCONS

SOCIAL

PERCEP

INFLU

TIME

DATA

PARSI

FREE

THERA

PATH

AGREE

The theorists rated were:

FREUD      Sigmund Freud

ADLER      Alfred Adler

JUNG       Carl Jung

ROGERS     Carl Rogers

KELLY      George Kelly

HORNEY     Karen Horney

SULLIVA    Harry Stack Sullivan

BANDURA    Albert Bandura

CATTELL    Raymond B. Cattell

MASLOW     Abraham Maslow

BINSWAN    Ludwig Binswanger

ERIKSON    Erik Erikson

This data was part of a class assignment for graduate students taking a theories of personality class.  Each week the students read the assignments and completed the questionnaire the day before the class meeting.  There were 17 students enrolled in the class; however, not all students completed the forms each week, and consequently some data are missing.  There were ___ completed forms.

In this first example the items of the questionnaire are grouped using factor analysis.  Recall that with factor analysis the items with similar profiles are grouped together (into factors), not necessarily the items that are closest in distance (refer to the above discussion).  The data is in a dBase IV file, with 9 indicating that an answer was omitted.  As can be seen in Frame PERFAC5.SPS, mostly defaults were used in the computer run: a principal components extraction method was used, and the rotation was orthogonal (varimax).  Retaining factors with an eigenvalue greater than 1.00 is usually not considered the best method of deciding upon the number of factors; however, both interpretation and the scree method also seemed to indicate 5 factors.

File Name = perfac5.sps

get file= '\proeval\perall4.sav'
  /keep= tDATE THER THID CLUS DRIVE GOAL HEDON COG VALUE ACTIVE EARLY IMPOSE
         LEARN GOOD HERED CONSCI UNCONS SOCIAL PERCEP INFLU TIME DATA PARSI
         FREE THERA PATH AGREE .
missing values drive to agree (9).
fac var= drive to agree
   /missing=pairwise
   /plot=eigen
   /criteria=factors(5)
   /rotate.

┌────────────────────────────────────────────────────────────────────────────┐

PERFAC5.LIS

├────────────────────────────────────────────────────────────────────────────┤

Final Statistics:

Variable     Communality  *  Factor   Eigenvalue   Pct of Var   Cum Pct
                          *
DRIVE             .54238  *     1       6.98937       30.4         30.4
GOAL              .50485  *     2       2.15730        9.4         39.8
HEDON             .54444  *     3       1.72904        7.5         47.3
COG               .56063  *     4       1.47348        6.4         53.7
VALUE             .66169  *     5       1.32890        5.8         59.5
ACTIVE            .70979  *
EARLY             .58670  *
IMPOSE            .64661  *
LEARN             .58716  *
GOOD              .51995  *
HERED             .58137  *
CONSCI            .64024  *
UNCONS            .68112  *
SOCIAL            .61566  *
PERCEP            .61891  *
INFLU             .59501  *
TIME              .58200  *
DATA              .56921  *
PARSI             .60125  *
FREE              .61128  *
THERA             .64608  *
PATH              .52881  *
AGREE             .54294  *

Rotated Factor Matrix:

         FACTOR  1     FACTOR  2     FACTOR  3     FACTOR  4     FACTOR  5
DRIVE      -.67035**     -.10424       -.12588       -.21679        .13893
GOAL        .44300        .44580*       .16215        .17344        .23128
HEDON      -.72226**     -.01600        .14498        .01324        .03653
COG         .50422*       .28914        .40228        .23887       -.06251
VALUE       .15529        .79294**     -.08091       -.04701        .00768
ACTIVE      .58000**      .41073        .21364        .39876       -.00630
EARLY      -.69231**      .27344       -.13863        .07009        .09220
IMPOSE      .22239        .23344       -.10607        .72878**      .01706
LEARN       .02767        .45879        .49137*       .21355       -.29809
GOOD        .57563**      .41920        .00750       -.00350        .11316
HERED       .10169        .28821       -.34325       -.60077**     -.09606
CONSCI      .55734**      .40202        .29750        .26467       -.09712
UNCONS     -.48833*      -.19803       -.48205       -.38498        .15119
SOCIAL     -.05895        .71266**      .26140        .18852        .02080
PERCEP      .29944        .16227       -.10921        .69839**      .05684
INFLU      -.10405        .01029        .21463       -.17453        .71242**
TIME        .72841**      .04942        .11085        .18045       -.06419
DATA        .29151        .04499        .63344**     -.20723        .19498
PARSI       .05321        .06473        .76207**      .00803        .11581
FREE        .51295*       .32588        .28510        .39853        .04315
THERA      -.13541       -.11914       -.26013        .24730        .69622**
PATH       -.51195*      -.08011       -.40072       -.11859        .29269
AGREE      -.01068        .34696        .25436        .21198        .55930**

└────────────────────────────────────────────────────────────────────────────┘

We were somewhat arbitrary in selecting 5 factors, so that the solution would match the five-cluster solution in the cluster analysis that follows.  One should not be so casual in determining the number of factors in a solution; the reader is referred to chapter __ for testing the number of factors.  In developing theory, the researcher may decide the number in an armchair fashion, by reviewing the literature, or with exploratory factor analysis.  The major purpose here is to compare factor analysis with cluster analysis, and the number of factors was chosen with that purpose in mind.

The factors in Figure __ are presented using two rules: (1) a variable loads on a factor if its loading is at least .60 in absolute value; (2) if a variable does not load on any factor, it is placed on the factor with its highest absolute loading.
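The two-step assignment rule can be written out directly. The Python sketch below applies it to four rows of the rotated factor matrix above. The function name `assign` is our own. Comparing absolute loadings is our reading of the rule, since UNCONS and PATH are placed on Factor I by negative loadings.

```python
# Rows copied from the PERFAC5.LIS rotated factor matrix (factors 1..5).
LOADINGS = {
    "DRIVE": [-.67035, -.10424, -.12588, -.21679, .13893],
    "GOAL": [.44300, .44580, .16215, .17344, .23128],
    "LEARN": [.02767, .45879, .49137, .21355, -.29809],
    "HERED": [.10169, .28821, -.34325, -.60077, -.09606],
}


def assign(loadings, criterion=0.60):
    """Return (factor numbers, meets_criterion) for one variable.

    Step 1: every factor whose absolute loading reaches the criterion.
    Step 2: if there is none, the one factor with the largest absolute loading.
    """
    hits = [k + 1 for k, v in enumerate(loadings) if abs(v) >= criterion]
    if hits:
        return hits, True
    best = max(range(len(loadings)), key=lambda k: abs(loadings[k]))
    return [best + 1], False


for name, row in LOADINGS.items():
    factors, loads = assign(row)
    status = "loads on" if loads else "placed on"
    print(f"{name:6s} {status} factor(s) {factors}")
```

GOAL is a near tie: .443 on Factor I and .446 on Factor II. The rule places it on Factor II, although the listing shows it under both factors.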

Factor I

DRIVE    -.67
HEDON    -.72
EARLY    -.69
TIME      .73
---------
GOAL      .44
COG       .50
ACTIVE    .58
GOOD      .58
CONSCI    .56
UNCONS   -.49
FREE      .51
PATH     -.51

Factor II

VALUE     .79
SOCIAL    .71
---------
GOAL      .45

Factor III

DATA      .63
PARSI     .76
---------
LEARN     .49

Factor IV

IMPOSE    .73
HERED    -.60
PERCEP    .70

Factor V

INFLU     .71
THERA     .70
---------
AGREE     .56

The next example shows how cluster analysis can be used to group the same set of data.  The data need to be conditioned before the cluster analysis can be run.  The means are computed within each theorist for each item.  For example, the ratings on the first item, DRIVE, from all respondents for Freud were summed and divided by the number of respondents; the mean was also rounded to the nearest integer to keep it on the same scale.  The matrix was then transposed, because the program requires the items, which are what is being clustered, to be the cases (rows).  This data is presented in frame THER11.sav.
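The conditioning step can be sketched in Python. It takes one mean per theorist and item, ignores the missing-data code 9, and rounds the result to the nearest integer. The ratings below are hypothetical, not the class data, and are invented only to show the computation. Rounding halves upward is our assumption, since the text says only "rounded to the nearest integer". The function name is our own.

```python
import math
from collections import defaultdict

MISSING = 9  # code used in the raw file for an omitted answer


def theorist_means(records, items):
    """records: iterable of (theorist, {item: rating}).

    Returns {item: {theorist: rounded mean}}, i.e. already "transposed":
    items are the rows and theorists the columns.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for theorist, ratings in records:
        for item in items:
            value = ratings.get(item, MISSING)
            if value == MISSING:
                continue  # leave omitted answers out instead of averaging the 9
            sums[(item, theorist)] += value
            counts[(item, theorist)] += 1
    table = defaultdict(dict)
    for (item, theorist), total in sums.items():
        mean = total / counts[(item, theorist)]
        table[item][theorist] = math.floor(mean + 0.5)  # assumed: round half up (ratings are 0-8)
    return dict(table)


# Hypothetical ratings (NOT the class data), for illustration only.
RECORDS = [
    ("FREUD", {"DRIVE": 8, "GOAL": 4}),
    ("FREUD", {"DRIVE": 7, "GOAL": 9}),  # GOAL omitted (coded 9)
    ("FREUD", {"DRIVE": 8, "GOAL": 5}),
    ("ROGERS", {"DRIVE": 2, "GOAL": 7}),
    ("ROGERS", {"DRIVE": 1, "GOAL": 8}),
]

for item, row in theorist_means(RECORDS, ["DRIVE", "GOAL"]).items():
    print(item, row)
```

With these made-up ratings:

- Freud's DRIVE mean of 7.67 rounds to 8.
- The half-way means 4.5, 1.5, and 7.5 round up to 5, 2, and 8.
- The GOAL answer coded 9 is left out of Freud's GOAL mean rather than averaged in.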

ITEM       FREUD   ADLER    JUNG  ROGERS   KELLY  HORNEY SULLIVA BANDURA CATTELL  MASLOW BINSWAN ERIKSON
DRIVE          8       2       3       2       2       3       4       1       3       4       2       4
GOAL           4       7       5       7       7       5       5       6       5       7       5       6
HEDON          7       3       2       2       2       4       4       2       3       4       3       3
COG            3       6       4       6       7       4       5       7       5       6       6       6
VALUE          4       6       5       6       4       4       5       5       4       6       6       6
ACTIVE         2       7       5       7       7       5       5       6       5       6       7       6
EARLY          8       7       4       5       4       6       6       5       4       5       4       7
IMPOSE         4       6       4       7       7       5       6       5       5       6       7       6
LEARN          3       6       3       5       5       6       6       7       6       5       5       6
GOOD           2       5       5       8       5       4       4       5       4       6       4       6
HERED          3       4       5       4       2       3       3       2       5       4       3       4
CONSCI         2       6       5       6       6       4       5       6       5       6       6       6
UNCONS         8       2       7       3       2       6       4       2       4       3       2       5
SOCIAL         4       7       3       6       5       5       6       6       5       5       5       6
PERCEP         5       6       5       7       7       5       6       6       5       6       7       5
INFLU          8       5       5       7       4       3       5       6       5       6       4       5
TIME           0       5       5       4       5       3       4       4       5       5       5       3
DATA           3       3       2       4       4       2       4       6       6       3       2       4
PARSI          4       5       3       5       6       4       4       5       5       5       3       5
FREE           2       5       3       7       7       5       4       6       4       6       7       5
THERA          7       5       6       7       6       5       6       5       3       3       5       5
PATH           7       3       5       3       3       6       5       3       4       3       4       4
AGREE          5       5       4       5       5       5       5       5       4       5       4       5

File Name = percls3.sps

get file = '\proeval\ther11.sav'
    /keep= ITEM FREUD ADLER JUNG ROGERS KELLY HORNEY SULLIVA BANDURA CATTELL
           MASLOW BINSWAN ERIKSON.
cluster freud to erikson
  /id=item
  /print=distance
  /print=schedule cluster(5)
  /plot=dendrogram hicicle.

* * * * * * H I E R A R C H I C A L  C L U S T E R   A N A L Y S I S * * * * * *

Dendrogram using Average Linkage (Between Groups)

Rescaled Distance Cluster Combine

C A S E      0         5        10        15        20        25
Label     Num  +---------+---------+---------+---------+---------+

IMPOSE      8   ─┬─┐
PERCEP     15   ─┘ │
GOAL        2   ───┼─┐