Electronic vs. paper-based data collection

PAPI in action

Computer Assisted Personal Interviewing (CAPI or just “electronic”) and Paper and Pencil Interviewing (PAPI or just “paper-based”) are two different methods of conducting surveys and collecting data more generally. PAPI is the traditional method in which an enumerator fills in a paper form or questionnaire. CAPI is the newer method, gaining in popularity, where the enumerator uses a tablet, smartphone, or laptop computer to move through the interview and record responses.

CAPI in action

CAPI technology is improving by the day and its popularity is increasing steadily – but PAPI may still be the right choice in some situations. To help you find the method that best fits your needs, we’ve produced the summary comparison below.

Think something about this comparison is inaccurate or biased? Leave us a comment below!

Summary comparison of PAPI vs. CAPI

| | PAPI | CAPI |
| --- | --- | --- |
| Template design | Physical survey forms must be designed before data collection can begin; a data-entry template needs to be designed before data from the paper surveys can be entered into a computer. The template will define each variable or question, specify what entries are valid, and indicate which fields should be skipped under which circumstances. This is often done after the survey has already launched. | A data-collection template needs to be designed before data collection can begin. This template will be loaded onto the devices data collectors use in the field, and the data will be recorded directly from the interview onto the device. |
| Question types | Forms can allow both multiple-choice and write-in responses. They can also allow spaces for the respondent to sign or draw directly onto the form. | Forms can allow multiple-choice and write-in responses; they can also capture the current GPS location, take photos, record audio or video, perform calculations, capture signatures, allow respondents to draw onto photos, and more. |
| Hardware | Computers will be needed for data entry. If the project will do in-field data entry, laptops or netbooks will be needed. | Smartphones, tablets, laptops, or netbooks will be needed for each of the data collectors to perform interviews and record responses. * Hardware is one of the key costs in CAPI data collection. |
| Software | Data-entry software will be needed. Some options are available at no cost (e.g., CSPro), while others are not (e.g., Microsoft Access). | Different software options are available. Pricing, programming difficulty, support, and flexibility vary widely. |
| Printing | Paper-based survey forms must be printed. And, if they are printed in bulk and then revisions are later required (a common situation), stickers or replacement pages may need to be printed and then inserted into existing forms. * Printing is one of the key costs in PAPI data collection. | No printing is required for a CAPI survey. |
| Enumerator training | Complex or numerous skip patterns can be difficult for enumerators to master, and it may take much practice for enumerators to reliably enter valid responses into all fields. Therefore, the necessary training period could be lengthy. | Since skip patterns and field validation are handled by the data-collection device, enumerators can be trained more quickly. However, those not used to using smartphone or tablet technology may need extra time to become comfortable using the devices. |
| Transportation & storage | A secure system must be designed to bring the surveys from the field to the office, and to store the surveys in the office. Once data is entered into the computer, it should be stored securely (generally using data encryption). | Surveys are typically “transported” via USB connection, local wi-fi network, or the Internet, then “stored” on a server or local hard drive. Proper precautions must be taken (generally using data encryption) to ensure that the data is secure both in transit and, to the greatest extent possible, at rest. |
| Data entry | Data must be entered into the programmed template by trained data-entry operators. For each form or questionnaire, typically two entries are completed, they are compared, and any differences between the two are corrected. Data can be entered in the field if laptops or other mobile hardware are available and the data-entry operators are able to travel. The advantage of in-field entry is that the data is available faster for scrutiny, allowing errors to be caught while the team is still in the field. * Data entry is one of the key costs in PAPI data collection. | The data is instantly digitized as it is entered into the tablet. |
| Quality control | Typically the survey team will include “scrutinizers”, whose job it is to look carefully at the completed questionnaires and catch enumerator errors. The questionnaire can then be sent back to the field for correction. | The flow of questions is automated, so the enumerator does not need to decide which question comes next; it is calculated by the phone or tablet. Many other quality control measures are possible with CAPI, such as logical checks, pre-filling information, setting constraints on answer ranges, and secretly recording audio for later review. The data is also available right away in a digitized format to look for any other problems. |
| Data cleaning | Once the data entry is complete, the cleaning process can begin. | The data cleaning process can begin after the first day of surveying. At this time the enumerators are still able to return to any household if they need to make a correction. |
| Time to data | Data typically becomes available only after all of it has been entered and cleaned. This can be weeks or months after the actual data collection. | In the typical set-up where field teams securely upload data to a server on a nightly basis, data is available by day two of data collection. |
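
To make the PAPI double-entry step in the "Data entry" row concrete, here is a minimal Python sketch of comparing two independent entries of the same paper form. The function name `compare_entries` and the field names are hypothetical, chosen only for illustration; real data-entry packages have their own verification tools.

```python
def compare_entries(first, second):
    """Return the fields where two independent entries of one form disagree."""
    discrepancies = {}
    # Look at every field that appears in either entry.
    for field in sorted(set(first) | set(second)):
        a = first.get(field)
        b = second.get(field)
        if a != b:
            discrepancies[field] = (a, b)
    return discrepancies


# Hypothetical paper form, keyed in twice by two different operators.
entry_a = {"household_id": "H-017", "age": 34, "owns_land": "yes"}
entry_b = {"household_id": "H-017", "age": 43, "owns_land": "yes"}

for field, (a, b) in compare_entries(entry_a, entry_b).items():
    print(f"{field}: first entry = {a!r}, second entry = {b!r}")
```

Each flagged field would then be checked against the paper questionnaire and corrected, which is part of why data entry is a key PAPI cost.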

Weighing the options

CAPI in action

The table above was meant to give an overview of the key elements involved in data collection, and how they vary between CAPI and PAPI. You’ll notice that, actually, the key elements are all very similar – but the content within them varies.

Just a few other points, as you compare these options against your own needs:

  1. Questionnaire & template design: It is a common misconception that PAPI lets you avoid creating a computer template, and especially the tedious work of defining each variable and the logic surrounding it. Just as CAPI does, however, PAPI requires a template for data entry. Because this template isn’t needed to begin fieldwork, there is some flexibility in when to allot time to the task, but it is better to do it sooner rather than later: the sooner it is complete, the sooner data entry can begin, and the sooner data will be available. With CAPI, the template must be created before training or fieldwork begins, so there is less flexibility there.
  2. Hardware: Note that hardware is required for both PAPI and CAPI surveys. However, more hardware is required for CAPI since all enumerators need to have a device. This can be a large capital expense for a project. Rental options may exist, or the same hardware can be shared with other projects or used for other rounds of data collection down the line – but budgeting and accounting might be a challenge. The largest expense for PAPI is typically data-entry; it’s more common to outsource this task or, if doing it in-house, to rent computers for the necessary period, so budgeting and accounting can be easier.
  3. Interview flow: One great advantage of CAPI is that the enumerator spends less time thinking about what question to ask next, and can focus on the respondent. As a result, the CAPI interview has a more natural conversational flow than a PAPI interview might. It also tends to move faster, reducing the total time it takes to administer the questionnaire. Respondents (and enumerators) are less likely to become tired or frustrated with the interview.
  4. Data quality: It is worth drawing attention to the great number of options for controlling data quality with CAPI. Some, such as setting constraints on answer ranges, are typically done anyway in the PAPI data-entry template. The advantage with CAPI is that they are implemented in real time, so corrections are made on the spot if the enumerator tries to enter an out-of-range value. Other quality-control measures, such as audio recordings or the pre-filling of information, may take additional time and effort to set up initially, but can be very helpful. Finally, the nature of CAPI itself – because the computer controls the enumerator’s movement through the questionnaire – avoids many errors typical in PAPI (such as skip pattern violations).
  5. Timeline: Finally, it is important to note the differences in timelines for the two methods of data collection. Because PAPI requires data entry, it will typically take longer. This can be reduced if data is entered in the field, concurrently with the survey, but it will still take extra time for cleaning and corrections (and data entry of those corrections). CAPI data skips the paper step, and is instantly available for use. The automated template minimizes human error, and reduces the time needed for data cleaning and corrections.
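
The real-time checks described in points 1 and 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `TEMPLATE` structure, the question names, and the `run_interview` helper are all hypothetical, not the format of any particular CAPI product. The sketch shows a template that defines each question, constrains valid answers, and encodes a skip pattern that the device enforces instead of the enumerator.

```python
# Hypothetical template: each question has a constraint on valid answers and,
# optionally, a relevance rule that implements a skip pattern.
TEMPLATE = [
    {"name": "age", "min": 0, "max": 120},
    {"name": "has_children", "choices": {"yes", "no"}},
    {"name": "num_children", "min": 1, "max": 20,
     "relevant": lambda answers: answers.get("has_children") == "yes"},
]


def is_valid(question, value):
    """Check a value against the question's choice list or numeric range."""
    if "choices" in question:
        return value in question["choices"]
    return question["min"] <= value <= question["max"]


def run_interview(template, responses):
    """Walk the template in order, skipping irrelevant questions and
    rejecting invalid values at the moment they are entered.

    `responses` maps a question name to the list of values the enumerator
    attempts to enter, in order (a stand-in for live input).
    """
    answers = {}
    rejected = []
    for question in template:
        relevant = question.get("relevant")
        if relevant is not None and not relevant(answers):
            continue  # skip pattern: this question is never shown
        for value in responses.get(question["name"], []):
            if is_valid(question, value):
                answers[question["name"]] = value
                break
            rejected.append((question["name"], value))  # caught on the spot
    return answers, rejected


answers, rejected = run_interview(
    TEMPLATE,
    {"age": [340, 34], "has_children": ["no"], "num_children": [3]},
)
print(answers)   # {'age': 34, 'has_children': 'no'}
print(rejected)  # [('age', 340)]
```

Here the mistyped age is rejected immediately and the question about the number of children is never asked. In PAPI, the same two checks would only surface later, during scrutiny or data entry.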

In general, for most projects, investment in CAPI will pay off in better-quality, more timely data. The CAPI vs. PAPI decision may be straightforward for some projects and less so for others. Often, it comes down to your timeline and the complexity of your questionnaire. For example, if you have a very simple survey with a very flexible timeline for the data (you don’t need the data any time soon) – but you don’t have any time now to set up a data-collection or data-entry template – then PAPI might be the way to go. If you can invest the up-front time in a solid template, however, then CAPI is likely better for getting you high-quality data fast.

* Photo credits: Cindy Sobieski (the two enumerators) and Md. Muntasir (the tablet close-up)

Chris Robert

Founder

Chris is the founder of SurveyCTO. He now serves as Director and Founder Emeritus, supporting Dobility in a variety of part-time capacities. Over the course of Dobility’s first 10 years, he held several positions, including CEO, CTO, and Head of Product.

Before founding Dobility, he was involved in a long-term project to evaluate the impacts of microfinance in South India; developed online curriculum for a program to promote the use of evidence in policy-making in Pakistan and India; and taught statistics and policy analysis at the Harvard Kennedy School. Before that, he co-founded and helped grow an internet technology consultancy and led technology efforts for the top provider of software and hardware for multi-user bulletin board systems (the online systems most prominent before the Internet).