I frequently play around with multivariate techniques like K-means cluster analysis in SPSS. There were some big holes in the SPSS procedure that performs cluster analysis, so I wrote an SPSS macro to automate what I wanted done. Below is a quick demo of the macro in use. It can really save a bunch of time! A few things mine does differently:
- Assigns labels to the segments (these will be changed later, but they are better than just 1, 2, 3)
- Computes frequencies on the size of the segments (why SPSS doesn’t do this automatically is beyond me)
- Color-codes the significance testing and merges it into the profile plot, showing what is different
SPSS macro performing K-means Cluster analysis
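For readers outside SPSS, the first two items (placeholder segment labels and segment sizes) can be illustrated with a minimal Python sketch. This is not the macro itself: the function names, the "Segment" prefix, and the cluster assignments are all made up for illustration, and the clustering step is assumed to have already produced a numeric cluster id per respondent.

```python
from collections import Counter


def label_segments(assignments, prefix="Segment"):
    """Turn numeric cluster ids into placeholder labels such as 'Segment 1'."""
    return [f"{prefix} {cluster_id}" for cluster_id in assignments]


def segment_sizes(labels):
    """Return {label: (count, percent)} for each segment, sorted by label."""
    counts = Counter(labels)
    total = len(labels)
    return {
        label: (n, round(100 * n / total, 1))
        for label, n in sorted(counts.items())
    }


# Hypothetical cluster membership for eight respondents.
assignments = [1, 3, 2, 1, 1, 2, 3, 1]
labels = label_segments(assignments)
sizes = segment_sizes(labels)

for label, (n, pct) in sizes.items():
    print(f"{label}: n={n} ({pct}%)")
```

The placeholder labels are only there so the output is readable until proper segment names are chosen.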
Older online vendor tools and databases would frequently put multi-select questions into one column with a pipe, tab, semicolon or comma delimiter (the real fun was when they used a comma as the delimiter in a CSV file).
This can be very problematic in nearly any tool. In this video I demonstrate how easy it can be to move data from one column to many with an SPSS macro.
Move one column to many
Here is the SPSS macro demonstrated in the video:
DEFINE !Parse (Var !TOKENS (1) / Stem !TOKENS (1) /Del !TOKENS (1))
LOOP IF INDEX(#,!Del)>0.
COMPUTE #cnt=#cnt + 1.
Var width !Concat(!Stem,1) to !Concat(!Stem,25) (10).
/* !Parse Var=NAIC_All Stem=NAIC Del=";".
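The same one-column-to-many idea can be sketched in Python for anyone who wants to see the logic outside SPSS. This is an illustration under assumptions, not a translation of the macro: `split_column` is a hypothetical helper, the sample codes are invented, and only the stem `NAIC`, the `;` delimiter, and the limit of 25 columns are borrowed from the example call above.

```python
def split_column(values, delimiter=";", stem="NAIC", max_parts=25):
    """Split each delimited string into max_parts columns named stem1..stemN."""
    rows = []
    for value in values:
        # An empty cell yields no parts rather than one empty part.
        parts = [p.strip() for p in value.split(delimiter)] if value else []
        if len(parts) > max_parts:
            raise ValueError(f"{len(parts)} parts exceed max_parts={max_parts}")
        # Pad with empty strings so every row has the same set of columns.
        parts += [""] * (max_parts - len(parts))
        rows.append({f"{stem}{i}": part for i, part in enumerate(parts, start=1)})
    return rows


# Hypothetical multi-select values stored in a single column.
codes = ["5241;5242", "6211", ""]
rows = split_column(codes, delimiter=";", stem="NAIC", max_parts=3)

for row in rows:
    print(row)
```

Raising an error when a cell has more parts than columns is a design choice in this sketch; it avoids silently dropping answers.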
I frequently need to randomize my lists and save them into separate text files. This SPSS macro makes it a breeze!
If you haven’t already played with macros, you’ll want to get familiar with them by reviewing this post on Intro to Macros. The macro needs a few parameters from you: the path to save the files, the stem (beginning) of the file names to create, the number of groups to create, and whether or not to keep the variable used to create the groups.
DEFINE !Rand_Gp_txt (Path !Tokens(1)/Stem !Tokens (1)/Groups !TOKENS (1) /DropNewVars !Tokens(1))
*/////drop GP if already exists .
Match files file=* / DROP Temp_GP.
RANK VARIABLES=rand (A) /NTILES (!Groups) into Temp_GP /PRINT=Yes /TIES=HIGH.
***********************break out and save in groups********************************.
!DO !cnt=1 !TO !Groups.
Select if Temp_GP=!cnt.
SAVE TRANSLATE OUTFILE = !Path+!stem+!QUOTE(!CONCAT(!cnt,'.txt'))
/TYPE=TAB /FIELDNAMES /replace /KEEP= ALL /drop=Temp_GP rand.
***Drop unwanted vars if *******.
!IF (!Unquote(!Upcase(!DropNewVars))="Y") !THEN
Match files file=* / DROP Rand Temp_GP.
!Rand_Gp_txt Path="c:\temp\" Stem="Rand_" Groups=2 DropNewVars="Y".
Remember to define the SPSS macro before calling it!
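For comparison, here is a rough Python sketch of the same workflow: randomize a list, break it into a set number of groups, and save each group to its own text file. It is not a port of the macro. The macro ranks a random variable into ntiles, while this sketch shuffles the list and deals records out in turn. `save_random_groups`, the `seed` parameter, and the sample addresses are hypothetical, and the demo writes to a temporary folder rather than `c:\temp\`.

```python
import random
import tempfile
from pathlib import Path


def save_random_groups(records, path, stem, groups, seed=None):
    """Shuffle records, split them into `groups` files named stem1.txt, stem2.txt, ..."""
    rng = random.Random(seed)  # pass a seed only if you need a repeatable split
    shuffled = list(records)
    rng.shuffle(shuffled)

    out_dir = Path(path)
    out_dir.mkdir(parents=True, exist_ok=True)

    written = []
    for g in range(1, groups + 1):
        # Deal out every groups-th record so group sizes differ by at most one.
        chunk = shuffled[g - 1::groups]
        target = out_dir / f"{stem}{g}.txt"
        target.write_text("".join(line + "\n" for line in chunk), encoding="utf-8")
        written.append(target)
    return written


# Hypothetical list of seven records split into two files.
emails = [f"person{i}@example.com" for i in range(1, 8)]
with tempfile.TemporaryDirectory() as tmp:
    for f in save_random_groups(emails, tmp, "Rand_", groups=2, seed=42):
        print(f.name, len(f.read_text(encoding="utf-8").splitlines()))
```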
Heteroscedasticity & Transformations
Financial data often has problems with heteroscedasticity (also spelled heteroskedasticity), where the spread of the data is funnel shaped. Performing transformations is necessary; however, I often have a hard time explaining what the transformation is actually doing. This video helps show what is happening during a log transformation.
After watching the data literally “transform”, it was much easier to grasp exactly what was happening and how it would help. Just remember you’ll have to transform it back if you wish to interpret the values. While the video below was created using SYSTAT, I transform data so often that I have some killer SPSS macros which transform data effortlessly.
You want your data to be spread evenly around the line, with roughly the same variance along its whole length, which is “homoscedasticity”.
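A small Python sketch can make the effect of the log transformation concrete. The numbers are invented: each value is a baseline multiplied by an error factor, which is one common way a funnel shape arises. The spread is compared between a low group and a high group rather than around a fitted line, purely to keep the example short.

```python
import math
import statistics

# Hypothetical data: baseline times a multiplicative error, so the raw
# spread grows with the baseline (the funnel shape).
baseline = [10, 10, 10, 10, 1000, 1000, 1000, 1000]
factors = [0.8, 0.9, 1.1, 1.25] * 2
values = [b * f for b, f in zip(baseline, factors)]

# Spread (standard deviation) of the low and high groups on the raw scale.
raw_low = statistics.stdev(values[:4])
raw_high = statistics.stdev(values[4:])

# The same comparison after a natural-log transformation.
logged = [math.log(v) for v in values]
log_low = statistics.stdev(logged[:4])
log_high = statistics.stdev(logged[4:])

print(f"raw spread: low={raw_low:.2f}, high={raw_high:.2f}")
print(f"log spread: low={log_low:.3f}, high={log_high:.3f}")

# Back-transform with exp() to read values in the original units again.
restored = [math.exp(v) for v in logged]
```

On the raw scale the high group is far more spread out than the low group, while on the log scale the two spreads match, because a multiplicative error becomes an additive one after taking logs.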
If you’re interested in reading more, SAGE has a specific book on heteroscedasticity and transformations.