header
Cooperative Extension Skip Navigation UW Extension
Local Program Evaluation in Tobacco Control
Home Sitemap Contact Search

 

 

Navigation

RESOURCES

About Our Program
Evaluation Manual
Multi Year Action Planning
Existing Data
Evaluation Planning

Evaluation Methods

Analyzing Data
Using Results
Resources
Restaurant and Worksite Surveys
Clean Indoor Air
Coalition Development
Youth Prevention
Upcoming Training
Conferences & Presentations

Download a copy of the free Adobe Acrobat Reader to view and print information provided as PDF files.
Get Adobe Acrobat Reader

 

Data Coding and Cleaning for Worksite Surveys

  1. Overview of Coding and Cleaning
  2. Data Coding
  3. Data Entry
  4. Data Cleaning (Excel)
  5. Data Entry (Access)
  6. Data Cleaning(Access)
  7. Other questions about data management

You can find templates and instructions for data entry in the worksite and restaurant surveys at www.uwex.edu/ces/tobaccoeval/.

  • 1. Overview of Data Coding and Cleaning:

    Principal Questions that will be answered:
    How do you get your data into a format that a computer can help you analyze?
    How do you increase the quality of your data?


    These tips will address the steps to:
    • "Code" your data so that the computer can work with numbers rather than text. Coding your data will also help you "clean" your data.
    • Enter your data into a spreadsheet or database.
    • "Clean" your data to make sure responses are accurately entered according to guidelines you establish and record.


The goal of data coding, entry and cleaning is for you to simplify the information you have collected. This is a very important step which can "make or break" your data analysis.

The following information will focus on using Excel and Access to assist you in these tasks, but can be applied to other programs you may want to use.

2. Data Coding:

  • Coding is the process of assigning numbers to each response.

 

  • Why code data?: Coded data is much easier to analyze with a computer and is easier to check for errors. It allows straightforward analysis.

 

  • We have already provided codes for many of the questions as can be seen in figure 1 below, which is is an example of assigning codes to response items.
    Figure 1: Data coding example
    From WTCB MEP Worksite questionnaire

12.              Is this establishment supervised or owned by another office, such as a corporate office?

13.  Yes

14.  No

8. Don’t Know

9. Refused to respond

The numbers next to the responses are called "codes."

 

  • Coding choices: Why do we assign "8" and "9" to "Don’t know" and "Refused to respond"? We looked at all the possible survey responses and saw that the highest number of responses was 7. We decided that 8 and 9 would universally represent these answers.

 

  • Open-ended questions: In survey design, we try to design questions that are not open-ended. It allows for more straightforward data analysis. We try to categorize people’s answers the best that we can. Open-ended questions are sometimes useful, but they should be used sparingly on a longer survey. You can also assign codes to qualitative, or text data. The survey has been pilot tested to make sure the response possibilities make sense and that there are a minimum of open ended questions.

 

  • Q: Will you be providing codes for Question 12 in the worksite survey?
    (Q12: What is your Worksite's industry?)

    A: Yes, the industry codes can be found in the Guide to Conducting a Survey on this site (available in multiple formats) and they will be in the Access database template we are providing for data entry which can be found on the same page.

    From Joyce Mann: Fond du Lac - Categorizing industry codes presented some challenging decisions. Make sure you keep track of how you made these decisions and what you have done.

 

  • Q: The numbers in the coding scheme do not always mean the same thing throughout the survey (i.e. "1" does not always stand for "yes"). Will this affect data analysis?
    A: No. The key is good record keeping. Make sure you keep track of what each number was assigned to represent. As long as you provide the key when you submit your data to the state level, we will be able to deal with it. If you are doing your own coding (or data coding for another survey) you can certainly make sure "1" will always represent "yes" and the like, if that would be easier for you. Keeping good and consistant records will allow you to aviod problems.

3. Data Entry:

Why do we assign a "unique identifier" to each questionnaire?

We use a unique identifier for the same reasons that credit card companies, doctors offices, and the social security administration use them. They allow us to track records more precisely. (We often refer to each completed questionnaire as a "record" or "case" in the dataset). Questionnaires which have been assigned a unique identidication number are easier to put into a logical order, which is helpful when you need to go back and look at the original document (helpful in data cleaning). One can also strip away the identifying factors (name, phone, etc.) but still keep track of the record.
Figure 2 below shows and example of unique identifiers.

Figure 2: Unique ID
Unique ID

  • To create a unique number for your surveys, simply start at 1001 and count upward. (The number 1 may be confused with response codes). This will allow you to easily tell how many total cases you have entered.

    Q: Does it matter if we use a mix of letter and numbers in our unique id's?
    A: No, but make sure to keep track of your identification scheme and how it will be implemented so it will be apparent should you need to revisit the data a few years down the road.

    Q: If we have already assigned unique ID's beginning with "1" should we go back and change them?
    A: No. Again, just be sure you documented this so that it will not be confusing in the future.



Figure 3: What the data will look like when entered in Excel
Entering data into Excel.

  • We have labeled the column headings "Q1, Q2, Q3…”. These refer to question numbers on the MEP Worksite Questionnaire. Note that each row is a "record." It represents answers to one questionnaire. When you begin to analyze your data, you’ll need a code book, or a "legend" that tells you what each of the questions and response options is.

 

  • When you’re entering data, be sure to include the records you screened out. For example, on the worksite survey, question 10 asks how many people are employed at the worksite and tells the interviewer to terminate the interview if the worksite has fewer than 5 employees. You should keep those worksites in the data set for now so that you can track the worksites you actually contacted and record the number you found with fewer than 5 employees. The same goes for restaurants—you may wish to track those you screened out early.

    If your county has a lot of worksites with fewer than 5 employees, you may want to include all those worksites with less than 5 employees. Or, there may be seasonal employees that, say, fluctuate between two on the off-season and 20 during the high season.

 

  • Important: Always remember to save your data often as you enter it. In addition, always make a backup copy before you begin to work with the data.

 

4. Data Cleaning (Excel):

Figure 4: Data Cleaning Example
Note that question number 13 on the worksite questionnaire only allows response options 1, 2, 8 or 9. The number 6 is not a valid response entry. To determine the correct entry, use the Unique ID to find the paper questionnaire and enter the correct number.
data cleaning


Q: Is there an automated way to do this?
A: Yes, use the Autofilter in Excel. I would recommend "eyeballing" the data (i.e. you don't have to spend time to look over every questionnaire or piece of data, just look for anything out of the normal range of responses) using the "Auto Filter" command by highlighting the top row in your spreadsheet (the row with the question numbers or varialbe names) and choosing Data > Filter > Autofilter from the Excel menu. This provides a dropdown list of your data for each variable. You can use the Autofilter to examine several characteristics. For example, if you wanted to look at the worksites with more than 50 employees who said they had a formal, written smoking policy, Autofilter will help you do this. (See the Excel "Help" function for more on Autofilter.)

If you are comfortable with Excel, you can also begin to explore the Pivot Table command. You can manipulate data spreadsheets using Pivot Table. It is step-by-step and fairly straightforward to use. See Data > Pivot table report. Look at the "Help" feature of Excel for more information.

Note the bottom of the Excel Spreadsheet (in the figure above). There are tabs that you can click to go back and forth between different spreadsheets in the same Excel "workbook." You can rename these spreadsheets by clicking twice on the tab’s title. You can add multiple spreadsheets to one workbook by using the Insert > Worksheet command on the menu at the top.

Important: Always remember to save your data often as you enter it. In addition, always make a backup copy before you begin to work with the data.

5. Data Entry (Access):

Microsoft Access is a database. It is different from a spreadsheet in that it allows you to break up your data into more manageable parts. It also allows you to create forms to enter your data and to delineate more specifically the data you put into the database and will cut down on data entry errors.

Figure 5: Data Entry Template - Microsoft Access
data entry

This is a picture of the MS Access form that we will provide you to use for data entry. Note that coded response options have been programmed into the form, and for multiple choice questions, you can select a response from the pull-down list. For text questions, you should be able to type beyond the box size limit on the form, and the data will be entered.

Data entered on this form is stored in a "data table" within your MS Access database. A picture of a data table appears in Figure 6. The data can either be analyzed in Access or imported into Excel.

Important: Always remember to save your data often as you enter it. In addition, always make a backup copy before you begin to work with the data.

6. Data Cleaning (Access):

The following shows a Microsoft Access data table with a missing value. Each row represents a "record," or one completed questionnaire. Each column represents a "variable," or one question. Note that question 5 (County) has numeric values. Because there is a finite list of counties in Wisconsin, we have assigned them codes for you. (A list of these codes appears in the June 2000 survey implementation guide.) This data table represents a statewide survey, whereas your surveys will most likely reflect just one or two counties.

Figure 6: Data Table - Microsoft Access
data table

Important: Always remember to save your data often as you enter it. In addition, always make a backup copy before you begin to work with the data.

7. Other questions about data management:

  • Q: What is each coalition expected to present to the statewide data collection effort?
    A: Please submit your raw data to us so that we may compare approaches and data sets on the local level. We want to collect whatever we can, but our priority is for each coalition to collect data that will be useful for themselves (i.e. if you need to take a few questions out of the survey, we do not encourage it, but if it is necessary to collect useful data, you should do it).

 

  • Q: How do we enter data for interviews where the people would hang up or refuse to participate in the survey?
    A: Create a new field in whatever program you are using for data analysis or use one of the already existing text boxes in the Access application we will provide.

 

  • Q: Do you advocate using Access or Excel?
    A: You can choose whatever you are more comfortable with using. There are advantages and disadvantages to each, however you can transfer data between these software packages.

 

  • Q: How do we get help with data analysis?
    A: Using a local college is a great resource. In Fond du Lac, Joyce Mann got the phone number of a professor at a local college and he suggested students to help who used the community service time toward their degrees. Fond du Lac chose Excel to analyze data due to their personal comfort level and the comfort level of their helpers. Your local health department or University will also likely have other statistical software packages (such as SPSS) which they could use to analyze your data (if you can get them to volunteer to help or are able to pay something for their analysis services). Whatever the case, make sure you document what you do. You will want to go back to this data at some point and you need to have good records to keep things clear when you do.

    We are preparing materials to guide you through survey data analysis and reporting. For your analysis, we will concentrate on percentages, frequencies (counts) and averages. To familiarize yourself with these descriptive statistics, refer to the MEP Guide to conducting restaurant and worksite tobacco policy surveys.

    Or check out the UW Cooperative Extension’s publication, Analyzing Quantitative Data, found at:
    www.uwex.edu/ces/pdande/evaluation/index.html (will open in a new browser window)
  • Q: How do you trim down your list of worksites (before beginning the survey)?
    A: Include the worksites that will help guide your programs. For example, you could choose to target a specific industry. Decide what you want your survey to accomplish. For a longer description of sampling methods, see the notes from the July 9 Worksite Implementation Conference Call.

 

  • Q: How should I go about obtaining a list of worksites?
    A: The local UW Extension Community Natural Resources Economic Development agent (CNRED) should be able to help or a local commerce association or the local sanitarian. If someone wants to charge you for a list, try to negotiate a deal to swap survey results for it.