BOLTON LANDING, New York—When we arrived at Rensselaer Polytechnic Institute's field research station on the shores of lovely Lake George, the offices appeared deserted. The station's staff weren't hiding from us; they had all relocated to another building for a training session on a new piece of technology. They've been doing a lot of that lately.

The scene in their meeting room was mostly pretty standard—tables, chairs, coffee, and snacks—but not many field stations have a shiny new nine-panel computer display on the wall. And no field stations have what that display will soon be showing.

Nestled along the eastern edge of New York’s stout and beautiful Adirondack Mountains, south of sprawling Lake Champlain, Lake George is a long, glacially sculpted basin filled by clear waters. The lake is 51 kilometers long, doesn’t get much more than three kilometers wide, and has long been a natural attraction. Thomas Jefferson once called it “the most beautiful water I ever saw.” But today, a partnership between IBM, the Rensselaer Polytechnic Institute, and the local FUND for Lake George has a different descriptor in mind—“smartest” lake in the world. It’s an effort dubbed the “Jefferson Project.”

Over 30 years ago, Rensselaer established its field station at a donated property in the town of Bolton Landing. (The space was previously a lodge, and it still provides a place to sleep for visiting students and scientists.) This station has served as a base for long-term monitoring of Lake George, as well as other research in the area—including monitoring a number of Adirondack lakes following the acid rain regulations passed in 1990. Now, it is home to the Jefferson Project. And with IBM's technological and financial support, researchers are getting ready to take advantage of a whole new approach to studying Lake George: Big Data.

How can this shiny research buzzword help out these environmental initiatives? Well, why do fruit flies get so much attention from biologists? It’s not that they’re necessarily more interesting than other creatures; they’re simply a nice model organism for research. Since we know so much about them, it’s easier to ask complex questions with some hope of figuring out the answer. The same strategy can be applied to aquatic ecosystems. If you can nail down enough of the basic processes in a specific lake, that rich context can open up new opportunities for research—your investments pay scientific dividends beyond one body of water. So in this way, the Jefferson Project is out to turn Lake George into a larger, more elegant fruit fly.

Since science is driven by data, it's frequently held back by a lack of it. One way for scientists to make the most of their funding and time is to conscript robotic assistants—devices that stay out in the field, dutifully making measurements on a pre-determined schedule. That’s a huge service, but such robots don't always gather the kind of data that a scientist wants (more is definitely better when it comes to data, but better is better, too).

The Jefferson Project sees the next evolutionary research step as putting some brains in those robots and increasing their potential. Let’s say you want to measure something that happens during rainstorms. If your instrument measures too infrequently, it’s likely to miss most of the action when a storm happens to roll through. If you set it to measure too frequently, its memory banks will fill and its battery will empty before a storm even shows up. A smarter instrument would just hang out until a rainstorm arrives and then sample like crazy until the window of interest closes.

The result would be more of the data you’re after. And that will put a smile on any scientist's face.
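To make the rainstorm trade-off concrete, here is a minimal Python sketch of an event-triggered sampling schedule. It is not the Jefferson Project's code; the class name, the intervals, and the rain threshold are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveSampler:
    """Hypothetical sampler: slow when quiet, fast while it is raining."""
    idle_interval_s: int = 3600            # assumed: one reading per hour when quiet
    burst_interval_s: int = 60             # assumed: one reading per minute in a storm
    rain_threshold_mm_per_h: float = 1.0   # assumed trigger level

    def next_interval(self, rain_rate_mm_per_h: float) -> int:
        """Return the wait (in seconds) before the next measurement."""
        if rain_rate_mm_per_h >= self.rain_threshold_mm_per_h:
            return self.burst_interval_s
        return self.idle_interval_s


def count_samples(sampler: AdaptiveSampler, hourly_rain: list[float]) -> int:
    """Count the readings taken over a series of hours with given rain rates."""
    total = 0
    for rate in hourly_rain:
        total += 3600 // sampler.next_interval(rate)
    return total


if __name__ == "__main__":
    sampler = AdaptiveSampler()
    # Three dry hours and one stormy hour (made-up numbers).
    print(count_samples(sampler, [0.0, 0.0, 5.0, 0.0]))
```

With these made-up settings, the instrument spends almost all of its memory and battery budget on the one hour in which something is actually happening.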

Get smart

When Ars visited the Bolton Landing field station in February, IBM’s Harry Kolar described the tools they are developing to make all this possible. “From [IBM’s] perspective, we are supporting the entire project by building the most advanced infrastructure that we can, to really extract the maximum knowledge out of the lake,” he said.

Part of that is a flood of measurements. The largest streams that flow into Lake George are now being monitored year-round by sampling stations. Water is pulled into a heated enclosure (ice isn’t helpful) where it is analyzed by a data-logging sensor device with probes for things like temperature, pH, dissolved oxygen, conductivity, dissolved organic matter, and algae content. Other instruments measure the water level and flow velocity of the stream. The data is periodically uploaded over a cellular Internet connection and entered into the project database. And at the same time, a carousel of sample bottles is automatically filled and periodically retrieved for additional tests in the analytical chemistry labs back at the field station.

That same kind of sensor device is also going to be active out in the lake itself, operating from a set of anchored, floating platforms. The floats raise and lower the sensors, allowing them to sample a vertical profile of the lake. That’s enormously useful, because the lake stratifies into a warmer surface layer and a cooler deep portion (as most lakes do). The first two floats hit the lake last summer, but five will go out this spring after the ice melts.

A number of current profilers will be placed at the bottom of the lake as well. These devices bounce acoustic waves off particles drifting by, using the slight Doppler shift of the returning waves to calculate velocities at various heights above the device. For now, the profilers will simply store data to be downloaded when they are retrieved, but researchers may run cables to them in the future to allow near real-time access.
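The physics behind that measurement fits in a few lines. The sketch below uses the standard two-way Doppler relation for an echo (shift = 2 × transmit frequency × velocity / sound speed) and the echo delay to estimate distance from the device. The function names, the 1,480 m/s sound speed for fresh water, and the example frequency are assumptions for illustration, and a real profiler combines several angled beams, which this single-beam sketch ignores.

```python
def along_beam_velocity(f_transmit_hz: float, doppler_shift_hz: float,
                        sound_speed_m_s: float = 1480.0) -> float:
    """Velocity of scatterers along the beam, in m/s.

    Uses the two-way Doppler relation shift = 2 * f * v / c, so
    v = c * shift / (2 * f). Positive means moving toward the device.
    The default sound speed is an assumed typical value for fresh water.
    """
    return sound_speed_m_s * doppler_shift_hz / (2.0 * f_transmit_hz)


def range_from_delay(echo_delay_s: float,
                     sound_speed_m_s: float = 1480.0) -> float:
    """Distance to the scatterers, in meters, from the round-trip echo time."""
    return sound_speed_m_s * echo_delay_s / 2.0


if __name__ == "__main__":
    # Hypothetical 600 kHz ping whose echo comes back shifted by 600 Hz
    # after a 20 ms round trip.
    print(along_beam_velocity(600_000.0, 600.0))
    print(range_from_delay(0.02))
```

Sorting echoes by their delay is what lets one instrument on the lake bottom report velocities at many heights above it.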

Of course, researchers also want to put some brains in these devices. “[IBM is] not only building the architecture to bring the data back together, back into the system with sophisticated analytics,” Kolar said, “but we’re also building an embedded [computing system] on the sensors[…] that will have custom code on it—open-standards-based kind of stuff, and Linux platforms and all that. We can actually work to put some of the analytics there. It increases the data availability, it increases the data integrity, the data quality, and it allows us to do some interesting things.”

That includes things like ramping up sampling frequency when something anomalous happens. In the case of the profiling floats, the devices might first find the boundary between the surface and deeper layers—called the thermocline—and automatically set the rest of the sampling depths accordingly to target whatever region researchers are most interested in.
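As a rough illustration of the thermocline idea, the Python sketch below treats the thermocline as the depth where temperature changes fastest in a quick survey profile, then spreads follow-up sampling depths around it. This is one common simplification, not necessarily the method IBM's embedded software uses; the function names, the window size, and the sample profile are hypothetical.

```python
def find_thermocline_depth(depths_m: list[float], temps_c: list[float]):
    """Return the midpoint depth of the steepest temperature change.

    Assumes depths are strictly increasing. Returns None if the
    profile shows no temperature change at all.
    """
    if len(depths_m) != len(temps_c) or len(depths_m) < 2:
        raise ValueError("need two or more matching depth/temperature readings")
    best_depth, best_gradient = None, 0.0
    for i in range(len(depths_m) - 1):
        dz = depths_m[i + 1] - depths_m[i]
        gradient = abs(temps_c[i + 1] - temps_c[i]) / dz  # degrees C per meter
        if gradient > best_gradient:
            best_gradient = gradient
            best_depth = (depths_m[i] + depths_m[i + 1]) / 2.0
    return best_depth


def target_depths(center_m: float, half_width_m: float = 2.0, n: int = 5):
    """Evenly spaced sampling depths in a window around center_m (n >= 2)."""
    step = 2.0 * half_width_m / (n - 1)
    return [center_m - half_width_m + i * step for i in range(n)]


if __name__ == "__main__":
    # Made-up summer profile: warm surface layer over cold deep water.
    depths = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
    temps = [22.0, 21.8, 21.5, 15.0, 9.0, 8.5]
    center = find_thermocline_depth(depths, temps)
    print(center, target_depths(center))
```

Running the survey pass first means the dense sampling follows the thermocline as it moves through the season rather than sitting at fixed depths.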

[Photo gallery. Image credits: Rensselaer Polytechnic Institute; Scott K. Johnson]

The system will also be able to go beyond that level of intelligence because of the other main portion of the project: modeling. The first example is an implementation of IBM’s “Deep Thunder” weather model. Utilizing all the weather data they can get their hands on (including from their own weather stations around the lake), the weather model will generate a local (with 1-kilometer resolution) 48-hour forecast twice a day.

The output of that model can feed right into the others. One models surface runoff throughout the area that drains into Lake George, simulating the pulses of water that flow into the lake through streams following rainfall. The other is a circulation model of the lake. From the lake’s inflows and outflows, and the winds from the weather model, it simulates the movement and mixing of water throughout the lake.

To build these models, extremely detailed topography and lake-bottom bathymetry were acquired. The bathymetry data (the product of sonar mapping) is gorgeous, with a resolution of just half a meter. It was stitched together with plane-based laser mapping of the surrounding hills and shallow shoreline for a seamless digital representation of the landscape.