You can be a good data scientist by sitting at your computer. After all, the job description involves poring through huge quantities of often disparate data to find insights that may prove helpful in every aspect of a business, including marketing, logistics, and human resources. It also includes cleaning data, dealing with gaps, and sifting through incomplete or poor definitions.

But great data scientists know they must do more. They recognize that there are nuances and quality issues in the data that they can’t understand while sitting at their desks. They recognize that the world is filled with “soft data”: relevant sights, sounds, smells, tastes, and textures that are yet to be digitized and hence are unavailable to those working at their computers. (Think of things like the electricity in the air at a political rally or the fear in the eyes of an executive faced with an unexpected threat.) They know they must understand the larger context, the real problems and opportunities, how decision makers decide, and how their predictions will be used.

Great data scientists know the only way to acquire this smorgasbord of information is to go get it. So they spend time on the road with truckers, probe decision makers, wander the factory floor, pretend to be a customer, ask experts in other disciplines for help, and so forth. They delve deeply into processes of data creation and the complexities of measurement equipment. They ask old hands how their recommendations will be used, what the likely results are, and what can go wrong.

Consider an example from the oil business. Where the oil is thick, it is hard to pump out of the ground. To make this process easier, companies heat the oil first with steam. Steam is expensive and must be used according to strict ecological guidelines, so putting the right amount in is critical. A good data scientist can take many factors into account — the underlying geology, the current temperature of the oil, the well’s production history — to optimize the amount of steam.

But a great data scientist would also spend some time in the oil field. There, they would notice that the probe used to estimate current temperature is sometimes lowered into the well clean, while at other times it is covered with mud. As it happens, mud is a great insulator, leading to a “too-low” temperature reading and, in turn, too much steam. Having verified this through a simple experiment, the great data scientist will tackle the root of the issue, namely the lack of a work instruction advising the technician to insert a clean probe.

Great data scientists are deeply curious about the data and everything surrounding it. In this case, optimizing the amount of steam is important, but rooting out the data quality issue (the mud-covered probe) is more fundamental and saves millions of dollars.

Not all data scientists spend enough time understanding the deeper reality they study. Some concentrate too much on the numbers. For example, in predicting the 2016 U.S. presidential election, the place to be was in the mind of the potential voter. You can’t go there directly, so many individuals and publications, from the New York Times to the Princeton Election Consortium, used polls to predict who would win. But most were way off.

Many people have dissected how this may have happened, but I’d argue that it comes down to what a great data scientist does that a lesser one may not. Great data scientists know they have to understand the strengths and weaknesses in the data in great detail. Polling is fraught, as Nate Silver cautioned, and great data scientists worry more about nonsampling error than about models for aggregating polls. They study the accuracy of past polls, ask themselves what would happen if people lie to pollsters, and ponder biases such as whether people who say they are likely to vote will actually do so. A few in the media commented on how they felt much more energy at Trump rallies than at Clinton rallies, even suggesting that this could translate to greater turnout for his supporters. Great data scientists conduct such analyses to develop a wider perspective.
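To see why nonsampling error deserves more worry than the aggregation model, consider a minimal simulation sketch. Every number in it is invented for illustration: a true vote share of 50%, 100 polls of 1,000 respondents each, and a hypothetical shared bias of three points (say, from respondents who misreport their intentions). The function names are likewise made up for this example. Averaging many polls shrinks the random sampling error, but it does nothing about a bias that all the polls share.

```python
import random
import statistics


def simulate_poll(true_share, sample_size, shared_bias, rng):
    """Return the candidate's measured share in one simulated poll."""
    # Nonsampling error: every respondent is drawn from a skewed population,
    # so the poll measures (true_share + shared_bias), not true_share.
    measured_share = true_share + shared_bias
    supporters = sum(rng.random() < measured_share for _ in range(sample_size))
    return supporters / sample_size


def average_of_polls(n_polls, true_share=0.50, sample_size=1000,
                     shared_bias=0.0, seed=0):
    """Aggregate polls by simple averaging (an assumed, deliberately naive model)."""
    rng = random.Random(seed)
    polls = [simulate_poll(true_share, sample_size, shared_bias, rng)
             for _ in range(n_polls)]
    return statistics.mean(polls)


# Hypothetical scenarios: same true share, with and without a shared bias.
unbiased_average = average_of_polls(100, shared_bias=0.0)
biased_average = average_of_polls(100, shared_bias=-0.03)

print(f"Average of unbiased polls: {unbiased_average:.3f}")
print(f"Average of biased polls:   {biased_average:.3f}")
```

In this sketch the unbiased average lands close to the true 50%, while the biased average settles near 47% no matter how many polls are added. A more sophisticated aggregation model would not rescue the second case; only understanding how the data was collected would.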

What’s more, great data scientists cast a wide net in looking for relevant data. Might Americans distrust political dynasties, lowering the chances of a candidate seeking election after a two-term president from her party? Might economic performance help, or hurt, the incumbent party? Might the winner of the Super Bowl correlate with the winner of the election?

Of course, a healthy dose of skepticism should go with each of these analyses (it’s hard to see any relationship between a football game and a presidential election). But for great data scientists, the business of exploring the world from as many angles as possible, being deeply curious about the data, whether digitized or not, and asking how the pieces fit together never stops. If you’re not actively doing these things, take steps to fold them into your daily work.

First, see how the data is actually collected. Treat Osborn’s Law — “variables won’t; constants aren’t” — as your watchword. Measurement instruments get clogged with sand, pollsters don’t follow their scripts, and survey developers inadvertently design their instruments in ways that bias results. You can’t assume that your data is unbiased and correct. Take a hard look, in person.

Second, get to know the full context in which you work. Look at the business you serve, the critical issues you face, and so on. Read, study, and go to conferences. Build and utilize an extensive network with special focus on people outside the field of data science. Seek out veterans and managers who will make time to explain the business. Get them to introduce you to others, and ask a couple of them to serve as informal mentors. You may have to nudge them from time to time, but you’ll very likely find plenty of people eager to help.

Third, integrate these efforts into your day-in, day-out work. In important analyses, engage those who can help you frame and flesh out the real problem and suggest data sets and theories that you may not have considered. Connect with people who have different perspectives. Test your initial results on others and get them to help you work through everything that could go wrong. Make sure you report your assumptions, the uncertainties in your results, and your concerns in ways decision makers can understand.

Great data scientists know that the goal is to solve real-world problems. They use data to do so, but they don’t stop there. Make it your mission to learn all you can about your data, starting from where it was first created. Embrace the broader reality, with a special emphasis on all the information that is yet to be stored in technology.