
Part 1: Setting up Data Files
I. Using this document
II. Getting data into SPSS
    A. Opening SPSS
    B. Sample data
    C. Defining variables
    D. Saving your dataset
    E. Entering data
    F. Saving a copy of your rawdata
III. Editing and modifying data
    A. Data integrity
    B. Inserting a new variable
    C. Collapsing categories within a variable
    D. Using formulas to create new variables
    E. Moving a variable
IV. Generating a Data Dictionary
V. A few words about output files
    A. Saving output files
    B. Printing output files
VI. Rawdata

Part 2: Sample Analyses
I. Frequencies
II. Graphing Procedures
III. Descriptive Statistics
IV. Regression
V. Correlation
VI. One-sample t-test
VII. Two-sample t-test
VIII. Oneway ANOVA
IX. Two-way ANOVA
X. Chi-square

Part 3: Within-Subject Analyses
I. Two-sample Correlated t-test
II. Repeated Measures ANOVA
III. Mixed Factorial ANOVA

Part 4: Multivariate Analyses
I. Multiple Regression
II. Multivariate ANOVA
III. Factor Analysis
I. Using this document: Pull-down menu commands and options that you click on with the mouse are written in bold. Dialog box (window) names and variable names are italicized.
II. Getting data into SPSS
A. Opening SPSS: Click on the Start button in the lower left-hand corner of the PC screen; select Programs > SPSS for Windows > SPSS 11.0 for Windows. This procedure will open SPSS and create an empty data screen (entitled SPSS Data Editor) into which you can enter data directly. SPSS datasets should be set up so that all the data from an individual participant appear in one row of the dataset. Each column should represent a different variable (or type of demographic information).
B. Sample data: To illustrate the process of entering and analyzing data we will use a hypothetical dataset based on actual research findings about gender and school performance. This exercise will teach you the basics of defining variables, entering data, and running some simple transformations and analyses. Below is a description of the variables used in the dataset. The raw data are at the end of this web page. It is good practice to define your variable names (see below) before entering any data.
Variable name   Description and coding
case            participant identification number
gender          1 = female, 2 = male
school          type of high school: 1 = all female, 2 = all male, 3 = coed
write           writing score on the National Assessment of Educational Progress test
math            math score on the National Assessment of Educational Progress test
esteem          response to the statement, "On the whole, I am satisfied with myself." 1 = strongly disagree, 2 = somewhat disagree, 3 = somewhat agree, 4 = strongly agree
conf            response to the question, "How do you feel about participating in class discussions?" 1-10 Likert scale: 1 = not at all confident, 10 = very confident
C. Defining variables: Before entering any data, you need to define all your variable names. While SPSS will assign default names, such as var00001, it is better to give your variables descriptive names. This will help you keep track of what you are doing and will cause descriptive labels to automatically appear in results, printouts, and graphs.
- To define variables you will need to switch to the Variable View spreadsheet on the SPSS Data Editor. At the bottom left of the SPSS Data Editor screen you will see two tabs, one labeled Data View and the other labeled Variable View. Click on the Variable View tab.
- In the Variable View window, each variable (which appeared as a column header in the Data View window, in the grey area at the top of the spreadsheet) now occupies a single row, under a new header row that begins with Name. In the first row (which represents the first variable in your dataset) under the Name column, type the variable name. The name can be up to 8 characters, without spaces. For example, if you look at the raw data at the bottom of this page, the first column is labeled case. Enter case into the Name column and press Enter.
Note: If it is not already built into your study design, it is a good idea to assign each participant in your study a number (from 1 to n), which is written on the hard copy of each person's data and also entered as the first column of your computer dataset (this is the case variable in our dataset). If you later sort or rearrange your data, you will not lose track of which person produced which data (note that data may also be "shuffled" by SPSS during some analyses). If you later discover (e.g., by looking at an individual's data sheet) that you need to drop someone or edit a line of data, it will be easy to identify the row in question. This is especially important when participants have written comments on the data sheets that you may later want to match up with their numerical data.
- The Type column will allow you to set the format of your data. The default setting is for Numeric data (i.e., consisting of numbers only). You do not need to change this setting unless you are entering non-numeric data such as dates or words. To make a change, click on the 3 dots shaded in grey. A Variable Type window appears which allows you to change the setting by placing a dot in the circle next to the new setting.
The Variable Type window also allows you to change the column width (the default is 8) and the number of decimal places displayed (the default is 2 places). Note that this setting affects only what is displayed; SPSS carries about 16 digits internally for doing calculations. Try changing your width to 9 and the number of decimal places to 0. Click on OK. You should see that the changes appear in the next two columns (i.e., a 9 in the Width column and a 0 in the Decimal column). You can also adjust these numbers directly by clicking on a cell in the Width or Decimal column and entering the appropriate number.
- While still in the Variable View window, move to the Label column (by clicking on the cell in the first row under the Label column). In this column you can write a more descriptive name for your variable than the one you used for Name in the previous column (here, you can use spaces, and the character limit is 256). When you are dealing with complex datasets, it becomes difficult to be descriptive enough with your 8-character names and surprisingly easy to forget which abbreviation refers to which variable. For practice, type in the description noted above (i.e., participant identification number).
Note: SPSS, by default, will print the Label, which may be useful when printing out some information. However, it can also clutter up your printouts. Rather than leave the Label column blank and risk forgetting the descriptive information, you can modify the SPSS output to print only the Name. To alter what appears in the SPSS output, select Edit > Options from the menu bar. An Options window will open. Click on the Output Labels tab. In the Pivot Table Labeling section (bottom left of window), click on the arrow in the Variables in labels shown as: box. Choose Name. Click OK.
- Move your cursor to the second row to define your second variable, gender. Type gender in the Name column. Leave the Type, Width, and Decimal columns in the default setting (i.e., do not make changes in these columns).
Note: Gender is a categorical variable rather than a quantitative variable (i.e., it tells you what category a participant belongs in rather than quantifying some characteristic of the participant). When you enter gender data into the data entry screen, you can either enter the information using words, such as "female" and "male," or give each gender a numeric code, such as "female = 1," "male = 2," and then enter the numbers "1" and "2." To enter words, go to the Type column and change Numeric to String; you can then go to the Data View spreadsheet and type the word "female" under the gender column. The decision to enter numbers or category labels (i.e., words) depends on how you will be analyzing your data. Using numbers will almost always make it easier to transfer your data to other programs (e.g., Excel, Datasim, graphing programs).
When you use numeric codes for categorical data, you should tell SPSS what codes you are using (e.g., is "male = 1" or is "female = 1"?). This is beneficial both as a back-up record for you, in case you forget your coding scheme, and also because SPSS will use the descriptive names rather than the codes when generating graphs and statistical tables.
To assign value labels, go to the Values column and click on the 3 dots shaded in grey. A Value Labels window will appear. Type "1" into the Value box and "female" into the Value Label box. Click on the Add button. This will add the code to your list of codes for that variable and empty the Value and Value Label boxes so that you can add additional codes. Now enter "2" and "male." Be sure to click on Add after each addition including the final one. Assigning value labels is only necessary for categorical variables to which you have assigned numeric codes.
- Label your remaining variables: school, write, math, esteem, and conf. Enter variable labels when appropriate. For example, the variable label for write should be “writing score on the National Assessment of Educational Progress test.” When values are needed, assign them such that the first possible answer to the question is coded as a "1," the second as a "2," and so forth (e.g., for esteem, "strongly disagree" will be assigned the value "1"). When using a Likert-type scale, such as with the conf question, you do not need to assign value labels because you will be treating the data as quantitative data. The esteem variable can be treated as either quantitative or categorical (we will treat it as a categorical variable in order to illustrate a chi-square analysis, which requires categorical data).
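The variable definitions above can also be written down outside SPSS as a plain codebook. The following is a minimal Python sketch, not SPSS functionality: the `VARIABLES` structure and the helper names `is_valid_name` and `display_value` are hypothetical, chosen only to mirror the Name, Label, and Values columns and the 8-character name limit described above. The label given to gender is likewise an assumption, since the tutorial lists only its codes.

```python
# Hypothetical codebook mirroring the Name, Label, and Values columns.
NAEP = "on the National Assessment of Educational Progress test"

VARIABLES = {
    "case": {"label": "participant identification number", "values": {}},
    "gender": {"label": "gender", "values": {1: "female", 2: "male"}},
    "school": {
        "label": "type of high school",
        "values": {1: "all female", 2: "all male", 3: "coed"},
    },
    "write": {"label": "writing score " + NAEP, "values": {}},
    "math": {"label": "math score " + NAEP, "values": {}},
    "esteem": {
        "label": "On the whole, I am satisfied with myself.",
        "values": {
            1: "strongly disagree",
            2: "somewhat disagree",
            3: "somewhat agree",
            4: "strongly agree",
        },
    },
    # Likert-type item treated as quantitative, so no value labels.
    "conf": {
        "label": "How do you feel about participating in class discussions?",
        "values": {},
    },
}


def is_valid_name(name):
    """Check the naming rule stated in this tutorial: 1 to 8 characters, no spaces."""
    return 0 < len(name) <= 8 and " " not in name


def display_value(variable, code):
    """Return the value label for a code if one is defined, otherwise the code itself."""
    return VARIABLES[variable]["values"].get(code, code)


if __name__ == "__main__":
    for name in VARIABLES:
        print(name, is_valid_name(name))
    print(display_value("gender", 1))  # female
    print(display_value("conf", 7))    # 7
```

Keeping such a codebook alongside the dataset gives you the same back-up record of the coding scheme that the Values column provides inside SPSS.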
D. Saving your dataset: Make it a habit to save frequently while you are defining variables and entering data. For your first save, choose File > Save to open the Save Data As window. Make sure you save to your disk (i.e., the name of your disk should be in the little box at the top of the Save window). Click on Save when you are finished naming your file. Any future saves can now be done by choosing File > Save from the menu bar.
E. Entering data: To begin entering data you must get back to the Data View spreadsheet. Go to the bottom left of the SPSS Data Editor screen and click on the Data View tab. You can now begin to enter data, starting with the first participant. Enter the case number "1" in the case column. You can use the Tab key or the arrow keys to move around through rows and columns. Continue to enter the remaining data.
Note: If you have missing data, leave the cells blank rather than entering zeros. When SPSS does calculations it will not include these missing values. SPSS printouts will tell you how many cases were missing and on how many cases a given calculation was based.
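To see why a blank is safer than a zero, here is a small Python sketch (an illustration only, not what SPSS runs internally). The helper `mean_of_valid` and the four scores are hypothetical; `None` stands in for a blank cell.

```python
def mean_of_valid(values):
    """Average the non-missing entries; None marks a blank (missing) cell.

    Returns (mean, number of cases used), or (None, 0) if nothing is valid.
    """
    valid = [v for v in values if v is not None]
    if not valid:
        return None, 0
    return sum(valid) / len(valid), len(valid)


# Hypothetical writing scores: the third participant has no score.
scores_blank = [286, 281, None, 300]  # missing value left blank
scores_zero = [286, 281, 0, 300]      # missing value wrongly entered as 0

mean_blank, n_blank = mean_of_valid(scores_blank)
mean_zero, n_zero = mean_of_valid(scores_zero)

if __name__ == "__main__":
    print(mean_blank, n_blank)  # 289.0 3
    print(mean_zero, n_zero)    # 216.75 4
```

Entering a zero silently pulls the mean down and inflates the case count, whereas the blank is excluded and the reduced number of cases is reported.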
Note: You also have two options for viewing your data as you enter it. The first option allows you to see only the numbers you enter. For example, when you enter "1" for gender you will see "1." The second option allows you to view the value labels you entered into the Values column in the Variable View spreadsheet. For instance, when you enter "1" for gender you will see "female." The default option is to view only numbers. To view the value labels, select View > Value Labels.
F. Saving a copy of your rawdata: Once you have entered your raw data, always SAVE a copy somewhere safe and DO NOT use this copy for any transformations or analyses. Especially if you are new to SPSS, and even if you are not, it is surprisingly easy to corrupt a dataset when running analyses and doing transformations of your data. To make matters worse, sometimes a problem is not immediately obvious and you might accidentally save over a good copy with a copy you do not yet realize has been corrupted.
III. Editing and modifying data
You will frequently need to edit or modify your raw data before you run specific analyses. Below are two general editing procedures as well as two common methods for modifying data (collapsing categories and using formulas to create new variables), but first a note on data integrity...
A. Data integrity: Once you have entered your data, it is important to check carefully for errors. For many inferential statistical tests, random errors (typos) will reduce the power of your test. Ideally, you should enter the data twice and have the computer compare the two datasets. If you do not wish to do this, there are a few other easy ways to quickly check large datasets for mistakes. You should run descriptive statistics on all your variables (see below). Check that the minimum and maximum values for each variable do not fall outside the possible value range for that variable (e.g., if you find your maximum value for gender is 3, you know there is a problem). Examine the standard deviation and sample size for all variables. Depending on your data, it can be very helpful to make a scatter plot (to spot outliers easily) or a histogram (to check for normality). Looking over your data early on can save you from rerunning all your analyses when you discover problems later.
B. Inserting a new variable: Sometimes it is necessary to enter a new variable within a dataset already created. The new variable may represent a new piece of information that was collected (e.g., ethnicity of the participant) or represent a transformed or recoded variable already within the dataset. To add a new variable, click on the variable to the right of where you would like the new variable to be located. For practice, create a new variable after the math variable. Click on the esteem variable (the grey portion containing the variable name). Select Data > Insert Variable from the menu bar.
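The two checks suggested under Data integrity (comparing a double entry and checking minimum and maximum values) can be sketched in Python as follows. This is an illustration under stated assumptions, not an SPSS feature: the function names `out_of_range` and `entry_mismatches` are hypothetical, the allowed ranges come from the variable descriptions earlier on this page, and the typo in case 2 is invented for the example.

```python
# Allowed ranges for the coded variables, taken from the variable descriptions.
VALID_RANGES = {"gender": (1, 2), "school": (1, 3), "esteem": (1, 4), "conf": (1, 10)}


def out_of_range(rows):
    """Return (case, variable, value) for every value outside its allowed range."""
    problems = []
    for row in rows:
        for name, (low, high) in VALID_RANGES.items():
            value = row.get(name)
            # Blank cells (None) are missing, not wrong, so they are skipped.
            if value is not None and not (low <= value <= high):
                problems.append((row["case"], name, value))
    return problems


def entry_mismatches(first, second):
    """Compare two independent entries of the same data, row by row.

    Returns (case, variable, first value, second value) for each disagreement.
    Assumes both entries list the same cases in the same order.
    """
    return [
        (a["case"], key, a[key], b.get(key))
        for a, b in zip(first, second)
        for key in a
        if a[key] != b.get(key)
    ]


# First entry contains an invented typo: gender for case 2 typed as 3.
entry_1 = [
    {"case": 1, "gender": 1, "school": 1, "esteem": 1, "conf": 5},
    {"case": 2, "gender": 3, "school": 2, "esteem": 3, "conf": 6},
]
entry_2 = [
    {"case": 1, "gender": 1, "school": 1, "esteem": 1, "conf": 5},
    {"case": 2, "gender": 2, "school": 2, "esteem": 3, "conf": 6},
]

if __name__ == "__main__":
    print(out_of_range(entry_1))               # [(2, 'gender', 3)]
    print(entry_mismatches(entry_1, entry_2))  # [(2, 'gender', 3, 2)]
```

Because every row carries its case number, each flagged value can be traced back to the participant's hard-copy data sheet.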
C. Collapsing categories within a variable: The esteem variable indicates students' responses to the statement "On the whole, I am satisfied with myself." You may be interested in comparing students who generally disagree with the statement (all students indicating "strongly disagree" and "somewhat disagree") to students who generally agree with the statement ("strongly agree" and "somewhat agree"). To do this, you decide to collapse answers 1 and 2 into a new category that you define as "disagree" and to which you assign the number "1." Likewise you collapse 3 and 4 into the category "agree" and assign it a value of "2."
To carry out the above transformation:
- Select Transform > Recode > Into Different Variables from the menu bar. This will open a window called Recode into Different Variables. If you recode into the same variable, the new data will overwrite the old data and you will have permanently lost the more detailed information. If you recode into a new variable, then you will have two columns: one column containing the original data (scores from 1-4) and another column containing the new data (scores from 1-2). Generally it is better to recode into different variables, since you never know when you might want the original data. The new variable will be appended to the end of your dataset.
- The box on the left lists all the variables in your dataset. Highlight the variable you want to recode, in this case esteem.
Note: The esteem variable may not appear by name in the variable list if you have created a label for it in the Variable View spreadsheet. To change the variable list to show the Name (esteem) rather than the Label (e.g., "Self-Esteem (On the whole, I am satisfied with myself.)"), select Edit > Options from the menu bar. An Options window will appear. Click on the General tab. In the Variable Lists area (upper right corner), click on the circle next to Display names. Click on OK.
Now make sure the esteem variable is highlighted and click on the little arrow button to the right and this will move the highlighted variable into the Input Variable box. Notice that after doing this the arrow reverses so you can move variables in the other direction: from the Input Variable box to the list of variables.
- Place your cursor in the Output Variable box under Name. Type in the name of the new variable: esteem2.
- Click on the Change button to the right of the Name box. This tells SPSS to recode the input variable (esteem) into a new variable (esteem2).
- Click on the button Old and New Values. This opens a window in which you specify exactly how you want the recoding to be done.
- To recode 1 and 2 into 1, go to the Old Value box. You can specify a single value or a range of values that you want to recode. You want to recode values from 1-2, so click on the circle next to the first Range option, and enter "1" and "2" in the two boxes.
- Go to the New Value box and enter the new value, in this case "1."
- Click on Add. The specified transformation should appear in the Old to New box and should tell you that values "1 thru 2" will be recoded as "1."
- Go back to the Old Value box and repeat the steps to enter information for transforming 3 and 4 into 2.
- Click on Continue to return to the Recode into Different Variables box and click on OK to implement the change. SPSS will have created a new column at the end of your dataset, with the new variable name (esteem2) and the new codes. Remember that if you want to give descriptive names to your new categories (e.g., "1 = disagree," "2 = agree"), you need to enter the Variable View spreadsheet and make changes in the Values column as discussed earlier.
Note on reverse coding: Using the recode procedure outlined above is the easiest way to reverse code a variable. Sometimes a recoding scheme requires that a specific value remain the same in the original and recoded data (e.g., when you recode 1, 2, 3 into 3, 2, 1, values that were "2" in the original data will still be "2" in the recoded data). For these cases, you need to SPECIFY that "2" is to be recoded as "2." If you omit this step when defining old and new values, any value that was "2" in the original data will be listed as a missing value in the new data.
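The recoding logic described above, including the reverse-coding pitfall, can be sketched in Python. This is a minimal illustration, not SPSS syntax: the `recode` helper and the rule lists are hypothetical names, and returning `None` for an unmatched value is meant to mimic the missing value that the note above describes. The esteem values are those of the first six cases in the raw data.

```python
def recode(value, rules):
    """Recode one value using (low, high, new) rules.

    The first rule whose range contains the value wins. A value that no rule
    covers becomes None, standing in for a missing value in the new variable.
    """
    if value is None:
        return None
    for low, high, new in rules:
        if low <= value <= high:
            return new
    return None


# Collapse esteem: 1 thru 2 -> 1 (disagree), 3 thru 4 -> 2 (agree).
COLLAPSE = [(1, 2, 1), (3, 4, 2)]
esteem = [1, 3, 3, 3, 3, 4]  # cases 1-6 from the raw data
esteem2 = [recode(v, COLLAPSE) for v in esteem]  # original list is untouched

# Reverse coding a 3-point item: 1, 2, 3 -> 3, 2, 1.
REVERSE_INCOMPLETE = [(1, 1, 3), (3, 3, 1)]          # forgot to specify 2 -> 2
REVERSE_COMPLETE = [(1, 1, 3), (2, 2, 2), (3, 3, 1)]

item = [1, 2, 3]
reversed_bad = [recode(v, REVERSE_INCOMPLETE) for v in item]
reversed_ok = [recode(v, REVERSE_COMPLETE) for v in item]

if __name__ == "__main__":
    print(esteem2)       # [1, 2, 2, 2, 2, 2]
    print(reversed_bad)  # [3, None, 1]
    print(reversed_ok)   # [3, 2, 1]
```

Building `esteem2` as a new list corresponds to recoding into a different variable: the original 1-4 scores remain available.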
D. Using formulas to create new variables: SPSS allows you to create new variables that are based on mathematical transformations of already existing variables. For example, suppose you wanted a single numerical index of students' combined writing and math scores. One way to do this is to create a new variable that is the sum of these two scores.
To carry out the above transformation:
- Select Transform > Compute from the menu bar. This will open a Compute Variable window.
- Enter the name of your new variable in the Target Variable box. For example, you might name your new variable total. In this case, the new variable will be the sum of the two old variables: write and math. It will appear at the end of your dataset and will not affect the two original variables as long as you give it a name different from any variables that already exist in your dataset.
- Highlight write in the box listing all the variables and click on the arrow to copy it into the Numeric Expression box. (Note that the name write will now appear in both the Numeric Expression box and the variable list box).
- Click on the "+" from the calculator keypad below. This places a "+" after write in the Numeric Expression box.
- Highlight math and click on the arrow button to move it into the Numeric Expression box. Notice how you have built an equation along the top of the Compute Variable window that reads total = write + math.
- Click on OK. SPSS will add the new variable to the end of your dataset. Remember that if you want to provide further description of the total variable you need to enter the Variable View spreadsheet and provide a description in the Label column as discussed earlier.
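The equation built in the steps above, total = write + math, can be sketched in Python using the first three cases from the raw data. The `add_total` helper is a hypothetical name, and giving a missing total whenever either score is missing is an assumption of this sketch rather than something stated in this tutorial.

```python
# write and math scores for cases 1-3 from the raw data.
rows = [
    {"case": 1, "write": 286, "math": 279},
    {"case": 2, "write": 281, "math": 321},
    {"case": 3, "write": 306, "math": 341},
]


def add_total(row):
    """Return a copy of the row with total = write + math.

    The input row is not modified, so the original variables are preserved.
    Assumption: if either score is missing (None), total is missing too.
    """
    new_row = dict(row)
    if row["write"] is None or row["math"] is None:
        new_row["total"] = None
    else:
        new_row["total"] = row["write"] + row["math"]
    return new_row


with_total = [add_total(r) for r in rows]

if __name__ == "__main__":
    for r in with_total:
        print(r["case"], r["write"], r["math"], r["total"])
    # 1 286 279 565
    # 2 281 321 602
    # 3 306 341 647
```

As in the Compute Variable window, the new variable is added as the last column and the two source variables are left as they were.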
E. Moving a variable: It is sometimes necessary to move variables to better organize the data file. For example, when a new variable is created, such as the total variable created above, it is placed at the end of your dataset. However, it may be more appropriate to place the total variable after the write and math variables.
IV. Generating a Data Dictionary: Once all the variables have been defined you can generate a data dictionary. To do this, select Utilities > File Info from the menu bar. An Output1 window will appear with your data dictionary displayed. To print your data dictionary, refer to the section on Printing output files.
V. A few words about output files: Output files are created automatically by SPSS to hold the results of any analyses you run. Before getting into the specifics of data analysis, here are some things you should know:
A. Saving output files: Output generated in SPSS automatically appears in an Output window that is separate from the Data Editor window. All results from all the tests you run during a given SPSS session will appear in a single output window (one result after another). Thus, running a new test will not overwrite the results of a previous test. The important thing to remember is that the contents of your output window are not saved in the same file as your dataset; the output file needs to be named and saved separately from your data file.
To save the contents of your output window to a file, make sure the output window is active (click anywhere in the output window to make it active), then select File > Save on the menu bar. SPSS defaults to saving output files as SPSS files, which means that you have to open and run SPSS to look at them.
It is often better to save your output within a word processor. To do this, highlight the tables or text in the output window, copy them, and paste them into your word processor. The output will be pasted as a table, but any formatting within the table (such as borders) will be lost. You will be able to recreate the table formatting or do other editing (e.g., changing the font) in your word processing program.
If you would like to keep the table formatting found in SPSS, you can highlight the desired output and copy it using the copy object command (it is directly under the copy command). It can then be pasted as a graphic into a word processor. This option is less desirable because the output can no longer be edited and takes up more memory, which may cause your word processor to run more slowly or crash. Avoid it unless you have only a small number of tables to transfer.
When you quit SPSS, you may get the following prompt: "Save contents of output viewer to Output1?" If you have not run any analyses, or do not want to save the results, you can click on No without losing any changes you have made to your dataset. You will be prompted separately to save your dataset, unless you have already done so.
B. Printing output files: Printing directly from SPSS output files (without modifying files or saving into a word processing file and printing from there) is enormously wasteful of paper. SPSS prints out many extra blank pages and other junk, which can be easily eliminated by manual editing in SPSS or in a word processor, so unless you are in a great hurry, please make the effort to edit and condense your output. If you have to print directly from the SPSS output window, you can delete unneeded output and hard page breaks directly from the output window before printing. Be aware that deleting hard page breaks in SPSS may cause individual analyses to be printed across page breaks, so if you are handing this work in, you may want to delete cautiously.
VI. Rawdata
case  gender  school  write  math  esteem  conf
1     1       1       286    279   1       5
2     2       2       281    321   3       6
3     1       1       306    341   3       7
4     1       1       300    298   3       8
5     1       3       277    303   3       5
6     2       3       290    312   4       6
7     1       3       292    301   4       6
8     1       1       257    332   2       6
9     2       2       274    350   1       5
10    2       2       278    320   4       8
11    1       1       311    303   3       8
12    2       3       273    301   3       6
13    2       2       265    295   4       8
14    2       2       229    312   4       7
15    2       3       286    337   4       6
16    1       1       301    340   2       6
17    1       1       284    275   3       7
18    2       2       285    306   2       8
19    1       1       254    249   3       8
20    1       3       301    301   1       3
21    2       3       235    267   4       6
22    1       3       323    321   4       5
23    1       1       311    350   3       8
24    2       2       274    301   4       7
25    2       2       291    337   4       9
26    2       2       280    311   1       5
27    2       3       241    298   4       6
28    1       3       289    315   2       3
29    2       2       283    341   2       9
30    1       1       254    283   3       7