Flawed data can guide even the greatest leaders to the wrong conclusions. When success hangs in the balance, you need to be absolutely sure that you're gathering the right data with the right methods.



So we asked our data scientist, Christopher Peters, to craft this guide about how to collect and analyze data. It's like a college-level course in survey design: you'll learn how to write questions, distribute them, and synthesize the responses.



Surveys can make a major impact on the direction of your company—especially if you get the results in front of decision-makers.

Whether that impact is positive or negative depends on the quality of your survey. Sound survey design and analysis can illuminate new opportunities; faulty design leaves your team swinging in the dark.

As Zapier's data scientist, I lead testing and analysis for everything related to our app automation tool. I've used surveys to dissect how many seconds each Zapier Task saves someone (it's close to 180 seconds), and why people upgrade to a paid Zapier plan.

I've seen how data can be used as an instrument to help teams make smart choices. In this chapter, I'll teach you more than a dozen techniques that I use to build an effective survey the first time.

Before We Start

It's important to note that there's a great deal of controversy among social scientists about survey design, with conflicting suggestions about methods. Statistics like "margin of error" are still widely used, but they're rarely appropriate for online surveys—The Huffington Post's senior data scientist and senior polling editor, for example, consider them an "ethical lapse". Conventional wisdom about what matters is not always grounded in statistical science. To cope with this, the chapter sticks to simple, tried-and-true methods. I hope you'll find them useful.

1. How to Design a Survey

Before creating a survey, it's important to think about its purpose. Common purposes include:

Compiling market research

Soliciting feedback

Monitoring performance

Write down specific knowledge you'd like to gain from your survey, along with a couple of simple questions you think might answer your hypotheses (including the set of possible answers).

Next to the answers, write down the percentage of responses you'd expect in each bucket—comparing the future results against these guesses will reveal where your intuition is strong and where blind spots exist.

This pre-survey process will also help you synthesize the important aspects of the survey and guide your design process. Remember: As the scope of your survey widens, fewer people are likely to respond, making it more difficult for stakeholders to act on results. Simplicity is probably the most important—and most under-appreciated—survey design feature.

2. The Best Survey Question and Answer Styles

The way you structure questions and answers will define the limits of analysis that are available to you when summarizing results. These limits can make or break your ability to gain insights about your key questions. So it's important to think about how you'll summarize the response to questions as you design them—not afterwards.

There are four main question and answer styles, and therefore four main response data types:

Categorical - Unordered labels like colors or brand names; also known as "nominal"

Ordinal - Likert scales like "strongly disagree to strongly agree" or "never to often"

Interval - Ranges like "number of employees"

Ratio - Numbers like inches of rain

Survey apps provide a wide range of data-collection tools, but every data type falls into at least one of these four buckets.

Categorical Data

The categorical type of data uses specific names or labels as the possible set of answers. For example:

What do you like (most / least) about our product? Fast customer service

Ease of use

Quality

Quantity

Categorical data is sometimes referred to as "nominal" data, and it's a popular route for survey questions. Categorical data is the easiest type of data to analyze because you're limited to calculating the share of responses in each category. Collect, count, divide and you're done.

However, categorical data can't answer "How much?" type questions, such as "How much do you value the speed of customer service?"

If you're not sure which dimensions are important (e.g. customer service, ease of use, etc.), start with a categorical question—they're more compact than the other question types, and can help your survey stay focused. Then, in a follow-up survey, you can ask "How much?" It's better to send out a few rounds of improving surveys than a huge blast that misses the mark.

Sampling is your friend. Consider dividing your sample group so that you can send multiple successive surveys as you learn more about your respondents.

Ordinal Data

Once you've identified categories of importance, asking ordinal style questions can help you assess that "How much?" type question. The ordinal response type presents answers that make sense as an order.

Never / Rarely / Sometimes / Often / Always

Strongly Disagree / Disagree / Neutral / Agree / Strongly Agree

Not important / Somewhat important / Neutral / Important / Very important

If you're wondering, order can matter! Researchers at the University of Michigan's Institute for Social Research found that the order in which answers like these were read to respondents determined how they answered.

If it's possible, randomly flip the order of answers to ordinal questions for each participant. Be sure to keep the order consistent throughout the survey, though, or you might confuse respondents and collect data that doesn't represent their true feelings.

Alternatively, you could achieve the same effect by randomly splitting respondents into two groups and administering two surveys: one with the order of questions flowing from left-to-right, and the other from right-to-left.
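As a sketch of that split-group approach (assuming Python and a hypothetical list of respondent IDs), you could randomly assign each respondent one of the two orders up front, then reuse that same order for every question they see:

```python
import random

# The five-point scale used throughout the survey.
SCALE = ["Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Agree"]

def assign_answer_order(respondent_ids, seed=42):
    """Randomly split respondents into two groups: one sees the scale
    left-to-right, the other sees it reversed. Each respondent sees the
    SAME order for every question, so the survey stays consistent."""
    rng = random.Random(seed)
    assignments = {}
    for rid in respondent_ids:
        forward = rng.random() < 0.5
        assignments[rid] = SCALE if forward else list(reversed(SCALE))
    return assignments

orders = assign_answer_order(["r1", "r2", "r3", "r4"])
```

The fixed seed makes the assignment reproducible; in a live survey tool you'd drop it so each deployment gets a fresh random split.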

Interval Data

Data must meet two requirements to be called "interval": it needs to be ordered, and the distance between the values needs to be meaningful.

For example, a predetermined set of incomes like "$20k, $30k, $40k" fits the interval data model. Another example might be: "1-50 employees, 51-100 employees, 101-150 employees."

Interval data is useful for collecting segmentation data (that is, it's useful for categorizing other questions). For example, you might want to ask a follow-up question about a respondent's plans to purchase a specific product—you could segment this question based on their response to a previous interval-style question.

If possible, it's best to use equally-sized intervals. This will allow for clarity in visualization when summarizing results, and also allow for the use of averages. If intervals aren't equal sizes, you should treat this data as categorical data.
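One way to put equal-sized intervals to work is to map each bucket to its midpoint before averaging. This is a sketch with hypothetical employee-count buckets, and it's only sensible when every bucket has the same width:

```python
from statistics import mean

# Hypothetical equal-width employee-count buckets and response counts.
buckets = {"1-50": 12, "51-100": 7, "101-150": 6}  # each bucket spans 50

def interval_average(bucket_counts):
    """Approximate an average by assigning each response its bucket's
    midpoint. Valid only for equally-sized intervals; for uneven
    intervals, fall back to categorical-style frequency counts."""
    values = []
    for label, count in bucket_counts.items():
        lo, hi = (int(x) for x in label.split("-"))
        values.extend([(lo + hi) / 2] * count)
    return mean(values)

avg = interval_average(buckets)  # 63.5 for the counts above
```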

Ratio Data

Ratio data is said to be the richest form of survey data. It represents precise measurements. A key characteristic of ratio data is that it contains an amount that could be referred to as "none of some quantity"—where the value "0" or "none" is just as valid a response as "45" or "987,123" or any other number.

Here's an example of ratio data: You might ask respondents about their income level with an input field that allows for numeric responses, like $24,315, $48,630 or even $0.

The defining characteristic of ratio data is that it's possible to represent the responses as fractions, like "$24,315/$48,630 = 1/2". This means that summary statistics like averages and variance are valid for ratio data—they wouldn't be with data from the previously listed response types.

If you'd like to calculate averages and measures of variance like standard deviation, asking for a specific number as a response is the way to go.

3. How to Phrase Survey Questions and Answers

Avoid leading questions

It's easy to accidentally suggest a certain answer in your question—like a hidden psychological nudge that says "hey, pick that one!"

Imagine that you're taking a poll on your local newspaper's website. It asks "Would you support putting a waste management facility next to the town square if it was privately or publicly funded?"

A. Privately funded

B. Publicly funded

But what if you don't want a waste management facility next to the town square at all? The smell of garbage wafting through the air probably won't encourage people to visit your city. The survey only gives us two options, though: build it with private funding, or build it with public funding.

Without a "neither" option, you can't capture how every respondent truly feels. The question in the example assumes a piece of information that the respondent didn't agree on. The fancy word for that is "presupposition."

It's perfectly fine to ask questions like "How useful do you consider Product XYZ?", as long as the answer "Not at all" is included as an option. The key thing to avoid is "presuppositions."

Presuppositions are an artifact of your own cultural sphere; you probably won't even recognize when you're including them in questions. The best way to avoid this is to send your survey to a few people in your target audience who you think would disagree with you on the topic. Soliciting feedback from a diverse audience can help you squash presuppositions and avoid creating a biased feedback loop in your results.

Allow for Neutral or NA Responses

It's hard to cover all of the possible ways a person might feel about a question. When you force a respondent to give an answer, it can pollute your data with non-responses masquerading as real answers. At first it may seem undesirable to let respondents off the hook, but doing so can improve the quality of your data.

Avoid Compound Questions

If I asked:

On a scale of 1-100, rate the following statement: "Zapier and its blog posts help me do my job."

You would be forced to give a single answer reflecting feelings about both Zapier and its blog. This is sometimes called a "double-barrel question," and it can cause respondents to choose the subject they feel most strongly about. These cases can lead you to falsely interpret the results. It may also be possible that respondents have opposing views about both subjects. In that case, you're sure to collect misleading results.

Split questions like these into multiple questions. Remember: Keep your questions as short and direct as possible.

Use Simple Language

Cleverness, humor, and business jargon can confuse respondents, especially if they cause them to misinterpret the question you're asking. Intentionally or not, we tend to write questions using ourselves and our cultural experiences as a reference, which can lead to poorly phrased copy that confuses people. Using simple language reduces the risk of collecting data that doesn't reflect the respondent's true meaning.

Randomize Answers

Suppose you want to ask which of three products your users value the most (after making sure to include NA and "none"!). It's common for respondents to select the first answer simply because it's the easiest and most available. Randomization for categorical-type answers can help you avoid this bias.

Beware, though: if your question asks for an ordered answer (e.g. from Strongly disagree to Strongly agree), you should keep the order of the answers consistent throughout the survey to avoid confusion.

4. How to Select Survey Respondents

Most surveys are sent to a small subset of a larger population. Using such samples to make general statements about the population is called inference. Descriptive statistics are statements about just the sample; inferential statistics are statements about a population using a sample.

It's worth noting that inferential statistics with surveys is difficult and commonly impossible, even for experts. Sometimes you just can't generalize the sample to the population in a reliable way—you're stuck making statements about people who actually filled out the survey.

Most of the time, you can chalk this up to sampling bias: when your sample is not reflective of the population that you're interested in. Avoiding sampling bias is particularly important if you intend to analyze the results by segment.

One of the most famous examples of this problem occurred in the U.S. presidential election of 1948.

Pollsters during this era used a technique called quota sampling. Interviewers were each assigned a certain number of people to survey. Republicans during that time tended to be easier to interview than Democrats, according to Arthur Aron, Elaine N. Aron, and Elliot J. Coups in Statistics for the Behavioral and Social Sciences, a Brief Course. This caused interviewers to survey a higher proportion of Republicans than existed in the overall voting population. The quota system was actually an attempt to avoid this problem, as CBS News found, by creating representative cohorts of sex, age, and social status—but it missed that the segment (political party) itself was related to the survey mode.

The message is clear: Insofar as respondents don't match the population you wish to make a statement about, your survey statistics can be misleading. So what can you do?

If you send a survey by email, consider how respondents by email may differ from the population you wish to make a statement about.

Keep in mind that respondents to an emailed survey may not be representative of those who use your website. The opposite is true, too: if you place the survey on your website, the sample may not reflect those who interact with your organization through other methods.

To counteract that, try administering the same survey via each of the channels that your organization uses to interact with customers (email, website, phone, in-person, etc.).

If you can only use one mode, carefully consider if that mode is related to segments you'd like to analyze (e.g. are repeat customers more likely to respond?). The goal is to use a mode that will yield segment proportions that are representative of the whole population. This might mean you should distribute the survey through a variety of channels.

5. How to Calculate the Number of Survey Respondents You Need

The short answer is: as many as it takes to achieve a useful level of variability in responses. The right amount can be found by running consecutive surveys and calculating the standard deviation of measures like ratio data.

If you're asking nominal, ordinal, or interval-type questions, conduct a few baseline surveys and compare the results.

If the variability from survey-to-survey is low enough for the purpose of the survey, you've found the right number of people to sample. If your purpose requires less variability, increase your sample size relative to the population.

Another technique is to randomly break a sample group into a few equal-sized groups, administer the survey, analyze the results and then compare the results across the groups. The results will be statistically equivalent and the difference between the groups will be due to what statisticians call sampling error. If the differences are smaller than what you consider a difference important enough to act on, the group size is large enough for future surveys. However, if the differences between the groups are large in your view, increase your sample size—repeat these steps until the difference between the random groups is smaller than you'd consider important enough to act on.
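The group-splitting check described above can be sketched in a few lines of Python. The 60/40 yes/no answers here are hypothetical stand-ins for real survey responses:

```python
import random
from collections import Counter

def split_and_compare(responses, n_groups=2, seed=0):
    """Randomly split responses into equal-sized groups and return the
    share of 'yes' answers in each group. If the shares differ by more
    than a difference you'd act on, the sample size is too small."""
    rng = random.Random(seed)
    shuffled = responses[:]
    rng.shuffle(shuffled)
    size = len(shuffled) // n_groups
    shares = []
    for i in range(n_groups):
        group = shuffled[i * size:(i + 1) * size]
        shares.append(Counter(group)["yes"] / len(group))
    return shares

# Hypothetical raw answers: 60% yes overall.
answers = ["yes"] * 60 + ["no"] * 40
shares = split_and_compare(answers)
```

Because the two groups partition the sample, their shares always average back to the overall 60%; what matters is how far apart the individual group shares land.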

Unless you're a surveying expert, deploying a voluntary survey in a way that delivers a valid measure of margin of error won't be possible—so the only way to get a feel for the number of people to survey is guess-and-check.

Need more precision? Increase your sample size.

6. How to Analyze Survey Results

It's easier than ever to build an online survey and send it out to customers, but analyzing the results is the tricky part.

As previously mentioned in the survey design section, there are four main ways to collect responses to each question and hence four main data types that you might confront when analyzing the results of a survey.

Categorical Data

Calculate the total number of responses and then divide the number in each category by the total. These are called relative frequency statistics. Many just call them percentages or shares, but the important aspect is that the sum should be 100%. For example:

What do you like most about our product?

(Relative) Frequency Table

Answer | Responses | Share
Fast customer service | 30 | 30 / 100 = 30%
Ease of use | 40 | 40 / 100 = 40%
Quality | 16 | 16 / 100 = 16%
Quantity | 14 | 14 / 100 = 14%
Total | 100 | 100%
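A relative frequency table like this takes only a few lines to compute. Here's a sketch in Python; the raw response list is reconstructed from the example counts, not real data:

```python
from collections import Counter

# Hypothetical raw answers to "What do you like most about our product?"
responses = (["Fast customer service"] * 30 + ["Ease of use"] * 40
             + ["Quality"] * 16 + ["Quantity"] * 14)

def relative_frequencies(responses):
    """Count each category and divide by the total; shares sum to 100%."""
    counts = Counter(responses)
    total = len(responses)
    return {answer: count / total for answer, count in counts.items()}

shares = relative_frequencies(responses)
# shares["Ease of use"] == 0.40, and all shares sum to 1.0
```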

Categorical data can be made more useful by grouping results by customer segment. For example, you might want to know if new customers answered differently than long-time customers. Other popular categories include:

Product segments like "low-end", "mid-level", and "high-end"

Geographical segments like ZIP codes, county, or country

New customer versus established customers

The important thing is to carefully think about which categories are likely to be most meaningful to your organization. The worst thing you can do is blindly choose categories that aren't meaningful to your business. Age groups and differentiation by sex are commonly seen market segmentations, but what will you actually do with that information?

After categorizing by groups, make a table or graph to report the data. For example, a contingency table (also called a cross-tabulation or crosstab)—which is a matrix of response counts or shares with one segment structured as rows and another as columns—can be very useful.

Contingency Table

This table summarizes a fictitious set of 100 responses. First, I split the surveys into two groups that become the rows of the contingency table: those who were new customers, and those who were established customers. The groups are mutually exclusive (not overlapping) and exhaustive (sum to 100%).

Next, I count the number of responses by answer to the question: What do you like most about our product? Finally, I divide each count by its row's total number of responses (75 new customers, 25 established customers), so each cell shows the share within that customer group; the Total row and column show shares of all 100 responses.

Share (count) | Fast Customer Service | Ease of Use | Quality | Quantity | Total
New customer | 37% (28/75) | 43% (32/75) | 12% (9/75) | 8% (6/75) | 75% (75)
Est. customer | 8% (2/25) | 32% (8/25) | 24% (6/25) | 36% (9/25) | 25% (25)
Total | 30% (30) | 40% (40) | 16% (16) | 14% (14) | 100% (100)

Contingency tables show how responses differ by each category. What's interesting in this fictitious set of data is that new customers tend to like fast customer service the most, 4.6 times the rate that established customers do (37% / 8%). Also, established customers chose quality and quantity as most-liked characteristics 2- and 4.5-times more often than new customers chose those same characteristics, respectively.
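The same idea extends to (segment, answer) pairs. This sketch rebuilds the fictitious counts from the table above using only the standard library:

```python
from collections import Counter

# Hypothetical (segment, answer) pairs matching the contingency table.
pairs = ([("New", "Fast customer service")] * 28 + [("New", "Ease of use")] * 32
         + [("New", "Quality")] * 9 + [("New", "Quantity")] * 6
         + [("Est.", "Fast customer service")] * 2 + [("Est.", "Ease of use")] * 8
         + [("Est.", "Quality")] * 6 + [("Est.", "Quantity")] * 9)

def contingency_table(pairs):
    """Cross-tabulate counts: rows are segments, columns are answers."""
    counts = Counter(pairs)
    table = {}
    for (segment, answer), n in counts.items():
        table.setdefault(segment, {})[answer] = n
    return table

table = contingency_table(pairs)
# Within-row share: new customers choosing fast service = 28 / 75, about 37%
```

If you already use pandas, `pandas.crosstab` produces the same matrix in one call; the hand-rolled version just makes the arithmetic explicit.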

Ordinal Data

Ordinal-type questions are very popular, but many people make a critical mistake when it comes to analyzing the data they produce. The worst thing you can do is convert the responses to numbers and then calculate the average of those numbers. The reason is that an arithmetic mean (the most common type of average, and there are many) like (1 + 2 + 3 + 4 + 5) / 5 = 3 implies that there is some measure of distance between values.

However, it doesn't make sense to say that feeling neutral is three times the feeling of strong disagreement, or that the feeling that something is important is twice the feeling that something is somewhat important. These are simple clues that converting ordinal labels to numbers can cause misleading results.

Instead, the best thing to do is to create a simple relative frequency table or contingency table like those shown above for categorical data.

How wrong can things really go? Well, consider a controversial question where most people are in either strong disagreement or strong agreement. In that case, an average would indicate that the data are centered in the neutral category. That's an extreme example, but the same thing can happen if the largest buckets are, say, "somewhat important" and "very important." Suppose responses were like:

Don't do this:

Not important (1) | Somewhat important (2) | Neutral (3) | Important (4) | Very important (5) | Average
1 × 3 = 3 | 2 × 60 = 120 | 3 × 5 = 15 | 4 × 2 = 8 | 5 × 30 = 150 | 2.96
3% | 60% | 5% | 2% | 30% |

The average of 2.96 would seem to imply that respondents felt neutral, when in reality a majority felt the subject was "somewhat important" (60%) and another large group (30%) felt the subject was "very important." In this context, even the label "neutral" feels out of place.

Instead, leave the data as a frequency table and allow the end-user to see the distribution of results directly. Avoid influencing stakeholders by showing the average. People love averages and tend to focus on them instead of the real story. Intentionally avoid averages and instead describe the data.

Do this instead:

Not important | Somewhat important | Neutral | Important | Very important
3% (3) | 60% (60) | 5% (5) | 2% (2) | 30% (30)

Most respondents felt the subject was only somewhat important, but another large group felt the subject was very important. There are two main groups of customers here—we should try to figure out what those segments might be. This could let us focus resources on those who feel the subject is important and avoid wasting resources on those that feel the subject is only somewhat important.
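To see the pitfall concretely, this sketch computes both the misleading coded average and the honest distribution from the example counts above:

```python
LABELS = ["Not important", "Somewhat important", "Neutral",
          "Important", "Very important"]

# Response counts from the example above.
counts = {"Not important": 3, "Somewhat important": 60, "Neutral": 5,
          "Important": 2, "Very important": 30}

# The misleading approach: map labels to 1-5 and take the arithmetic mean.
codes = {label: i + 1 for i, label in enumerate(LABELS)}
total = sum(counts.values())
fake_average = sum(codes[lab] * n for lab, n in counts.items()) / total  # 2.96

# The honest approach: report the distribution itself.
shares = {lab: n / total for lab, n in counts.items()}
```

The coded average of 2.96 points at "Neutral" even though only 5% of respondents actually chose it; the frequency table keeps the two real clusters visible.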

How to Graph Ordinal Scale Data

Diverging bar charts are a great way to visualize ordinal data. The distinctive element is a common baseline that allows the eye to measure the length of each bar very quickly. These charts are great for comparisons across segments. Let's take a look at a public data set for an example graph.

Every year since 2010, the Federal Reserve Bank of New York has published a survey of small businesses (defined as businesses with fewer than 500 employees) covered by the Reserve Banks of Cleveland, Atlanta, New York and Philadelphia. The main purpose of this study is to determine which small businesses are applying for and receiving loans—that's the context being referred to when you see the term "(credit) applicants" in this data.

By graphing the data with a common baseline, comparisons of losses, breaking even, and profits are made clear across categories.

In the first half of 2014, did your business operate at a profit, break even or at a loss?

Retail businesses did poorly in the first half of 2014. Successful applicants for credit were also much more likely to be profitable, and the larger the business, the more likely it was profitable. This could be due to survivorship bias: businesses tend to become large only insofar as they are profitable. It could be that smaller businesses are more willing to operate at a loss. Or it could be that larger U.S. businesses recovered faster from the financial crisis that began in late 2007.

If you find this graph style useful, I've made a template that you can use. For more info on this graphical style, be sure to check out Naomi B. Robbins and Richard M. Heiberger's article "Plotting Likert and Other Rating Scales."

Interval Data

A useful and safe way to summarize interval data is to treat it as ordinal data.

Summarizing interval data with averages and standard deviations (see the "Ratio Data" section below for a guide) is possible, but only if the distance between intervals is even. For example, questions like "on a scale of 1-10" with answers of 1, 2, …, 9, 10 are generally considered even intervals. However, there is some controversy about this.

People tend to avoid extremes, so the perceived distance between adjacent answers may not actually be equal. Think of measures of pain, for example: is the distance from 5 to 6 really the same as from 0 to 1, or from 9 to 10? I bet not.

My suggestion is to treat interval data as ordinal data if the intervals are even, otherwise treat it as nominal data and use a contingency table for summary.

Below is an example of the way that uneven interval data can misrepresent results. This example comes directly from someone I consider a great visualizer of information: Stephen Few. I highly recommend Stephen's site on visualization, especially his article about selecting the right graph for your data.

You can also use a free template for Google Sheets.



Ratio Data

There's one big advantage to using ratio data: it's rich enough to support averages. As before, for our purposes here, when I say "average" I'm specifically referring to the popular arithmetic mean, for example (1 + 2) / 2 = 1.5.

It's perfectly valid to take a set of ratio data and calculate the arithmetic mean like ($38,500 + $65,214) / 2 = $51,857.

Averages give you, the surveyor, a measure of where the data is centered. It's also useful to measure the spread of responses, especially with the standard deviation statistic, which can be thought of intuitively as the average distance from the center of the data. Calculating the standard deviation is a two-step process:

1. Calculate the variance statistic

2. Take the square root of the variance statistic

The variance statistic is defined as: SUM( [each value - mean]^2 ) / (N - 1)

For example:

Response (N = 3) | Sessions Attended | Avg. Sessions | Deviation | Sq. Deviation
1 | 2 | 5 | -3 | 9
2 | 6 | 5 | 1 | 1
3 | 7 | 5 | 2 | 4
Sum | | | | 14
N - 1 | | | | 2
Variance | | | | 7
Std. Dev. | | | | 2.65

For this survey data, we would report, "the average number of sessions attended was 5 +/- 2.65 sessions." Ratio data is special because it allows for measures of centrality (average) and dispersion (standard deviation), unlike nominal, ordinal, and non-equal interval data.
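The same two-step calculation can be checked in a few lines of Python; `statistics.stdev` applies the identical N - 1 (sample) formula:

```python
from math import sqrt
from statistics import mean, stdev

sessions = [2, 6, 7]  # the three responses from the table above

avg = mean(sessions)                      # 5
deviations = [x - avg for x in sessions]  # -3, 1, 2
# Step 1: variance = sum of squared deviations divided by N - 1
variance = sum(d ** 2 for d in deviations) / (len(sessions) - 1)  # 14 / 2 = 7
# Step 2: standard deviation = square root of the variance
std_dev = sqrt(variance)                  # about 2.65

# statistics.stdev uses the same sample (N - 1) formula:
assert abs(stdev(sessions) - std_dev) < 1e-9
```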

7. How to Interpret Survey Results

Focus on the High-Points

Visualizing data is one of the most important activities I carry out at Zapier. It's a passion of mine because graphs can elicit a wide variety of emotional responses. People have very different reactions to data based on how it's graphed, so it's important to be thoughtful when creating visualizations.

Knowing the challenges with measurement, I guide my coworkers at Zapier to focus on trends and avoid reading too much into small differences in data. It's easy to lose the big picture when looking at statistics and graphs, so it's important to remember that some error exists with any method.

Don't miss the forest for the trees; when interpreting results, start with the largest differences first, not the most unusual. If you notice an unusual result, be skeptical and see if the result can be replicated in another survey.

Collect a Few Baseline Surveys Before Making a Large Change

If it's practical, try repeating and summarizing surveys a few times before making a large product or business change. Get a feel for what's normal and how much responses vary from survey to survey. It's possible to fall into a trap of chasing noise (sampling error) and effects that are not repeatable. Replication (repeated surveys) is the best way to learn what represents signal and what represents statistical noise.

When repeating the same survey, you might find that responses vary wildly for the same question even though no great change was made (see section 5, "How to Calculate the Number of Survey Respondents You Need"). In this case, you'll learn that the question is not a reliable metric for defining success. Or, you might be lucky and find that responses are generally similar before making a large change.

Once you make the change, you'll have a better idea of whether changes in response to the survey question are due to the decision you made or not. The point is to learn a bit about how users respond to the survey before using it to make a large decision.

Respect Your Survey's Limits of Precision

It's crucial to understand the limits of precision for each dataset you work with. Since most surveys represent only a small fraction of the group of interest, any inference about the population carries error. If the same survey were sent to several groups at the same time, the resulting relative frequencies (percentages) would likely vary by more than 1%. That means showing numbers like 25.67% would communicate a false degree of precision.

When reporting your survey results, round to numbers like 25% to avoid communicating a false degree of precision. How much should you round? That depends on your survey's sampling variability (see section 5, "How to Calculate the Number of Survey Respondents You Need").
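A tiny helper (the name `report_share` is hypothetical) makes the rounding habit automatic when formatting results:

```python
def report_share(count, total):
    """Round a share to the nearest whole percent, avoiding the false
    precision of figures like 25.67% from a small sample."""
    return f"{round(100 * count / total)}%"

# 77 of 300 respondents is 25.666...%, reported simply as 26%
label = report_share(77, 300)
```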

Table or Graph?

Tables are most useful when you're looking at precise numbers, or when there are few comparisons. Large tables make it hard to reason about the distribution of outcomes, and in these cases visualizations are preferable.

I'll use an interesting survey of small businesses as an example. The survey is carried out by Gallup, Inc. for Wells Fargo bank, and they present a raw table for you to use on the Wells Fargo site.

In this case, the individual numbers are the important part. The table is useful, but it's very difficult to digest. Below is a visual representation of the table titled "Financial Situation 12 Months From Now":

The visualization (called a diverging bar chart) makes it clear that small businesses turned very pessimistic about their financial situations beginning in the first quarter of 2009. It's also clear that optimism hasn't yet returned to the levels seen when the survey data begins in 2004.

Conclusion

Surveys and polls are a very effective tool for gathering feedback from customers and reducing the uncertainty around important decisions. By writing down the purpose of your survey and hypotheses up front, you'll be able to learn where your intuition is strong and find organizational blind spots.

Surveying is hard and biases can enter through poor survey delivery and poor question design. It's important to think about which data type will be most useful to answer the questions at hand. Focused surveys are the most likely to yield actionable results.

Rather than sending out one massive survey, iterate on a set of survey instruments sampling a bit of the population as you go. The process is as much about finding the right questions as it is about finding their respective answers.

Once you feel confident with your design, send out one large final survey. Keep in mind that the best designed survey in the world is useless if its results are not communicated effectively to stakeholders. Don't abuse categorical or ordinal data by taking averages; summarize by relative frequencies instead. Don't bombard readers with huge tables that are impossible to digest—take a bit of time and create a diverging bar chart. If you use interval data, keep in mind its utility for segmentation and don't fool readers by visualizing uneven intervals.

Finally, surveys are no place to get fancy. Keep it simple and you'll find that no matter the results you'll learn something of use!

You've made it. You've learned about the difference between forms, surveys, and polls, have found the best form apps and survey builders, learned how to integrate forms into your work, and now have the tools you need to analyze your data. But there's something more. Sometimes, you need a bit more power than a standard survey or form builder gives you. Perhaps you want an easier way to analyze your data directly from a database, or want to build your forms into an in-house tool that works together with the rest of your data.

For that and more, there are database-powered app builders. In Chapter 9, for some bonus apps to help you do even more with forms and surveys, you'll find a roundup of the best apps to build your own in-house tools without much more work than most form builder apps require.

Go to Chapter 9!

