How deep is that snow? Machine learning helps us know

CU Boulder researchers apply machine learning to snow hydrology in Colorado mountain drainage basins, finding a new way to accurately predict the availability of water

Determining how much water is contained as snow in mountain drainage basins is very important for water management, because measuring it is a necessary part of predicting the availability of water—especially in places that rely on snowmelt for their water supply, like Colorado and other western states.

Snow water equivalent is the amount of water in a mass of snow or snowpack. Expressed as a depth, it is a fraction of the snow depth: multiply the snow depth by the snow density, which is expressed as a percentage of the density of water. If there are 10 inches of snow with a density of 10%, the snow water equivalent is 1 inch.
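The arithmetic can be written out as a minimal sketch. The function name and the input checks are illustrative assumptions, not something taken from the study.

```python
def snow_water_equivalent(depth, density_percent):
    """Return snow water equivalent in the same unit as `depth`.

    `density_percent` is snow density as a percentage of the density of water.
    The name and the input checks are illustrative assumptions.
    """
    if depth < 0 or not 0 <= density_percent <= 100:
        raise ValueError("depth must be >= 0 and density between 0 and 100 percent")
    return depth * density_percent / 100.0


# The example from the text: 10 inches of snow at 10% density.
swe_inches = snow_water_equivalent(10, 10)
print(swe_inches)
```

The same function works for any length unit, since the density percentage is dimensionless.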

A persistent challenge is that snow water content is calculated from both snow depth and snow density, yet it remains unfeasible to directly measure snow density over a large area. Traditionally, this issue has been addressed with remote sensing, which allows for consistent and relatively large-scale measurements. However, remote sensing methods have their own limitations, which has prompted the search for an alternative in machine-learning technology.

CU Boulder researchers Jordan Herbert (left), a PhD candidate, and Eric Small, a professor of geological sciences, developed a model that can estimate the snow density at times when and in places where it has not been observed or sensed.

CU Boulder PhD candidate Jordan Herbert and Professor Eric Small of the Department of Geological Sciences developed a model that can estimate the snow density at times when and in places where it has not been observed or sensed. This model is split into different scenarios, each trained on a different subset of the data, and while performance varied, all scenarios were more accurate than extrapolation from remote sensing methods, according to Herbert and Small.

Analyses of model performance also demonstrated that information from airborne light detection and ranging (LIDAR) can be transferred to different times and places within the region where it was collected.

LIDAR and SNOTEL data

LIDAR surveys are an important tool in snow hydrology, as they provide detailed information about snow properties, specifically through their detection of snow depth.

“You fly the plane twice,” Small says, “once when there’s no snow, once when there is snow. The laser reflects off the surface, and if you know where the plane is and the distance to the surface, then you know the height of the snow relative to the ground surface.” This is called differential LIDAR altimetry.
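The two-flight idea Small describes amounts to a per-cell subtraction. The sketch below assumes both surveys have already been turned into elevation grids on the same cells. The grid values, the function name and the choice to clamp small negative differences to zero are all illustrative assumptions, not part of the study.

```python
def snow_depth_grid(snow_on, snow_off):
    """Per-cell snow depth: snow-on surface elevation minus snow-free elevation.

    Both inputs are lists of rows of elevations in meters on the same grid.
    Negative differences are clamped to zero (an assumption made here to
    absorb measurement noise).
    """
    if len(snow_on) != len(snow_off):
        raise ValueError("grids must have the same number of rows")
    depths = []
    for row_on, row_off in zip(snow_on, snow_off):
        if len(row_on) != len(row_off):
            raise ValueError("grids must have the same number of columns")
        depths.append([max(0.0, on - off) for on, off in zip(row_on, row_off)])
    return depths


# Hypothetical 2 x 2 elevation grids, in meters.
bare_ground = [[3000.0, 3001.5], [3002.0, 3004.0]]
snow_surface = [[3001.0, 3002.5], [3003.5, 3004.0]]
depth = snow_depth_grid(snow_surface, bare_ground)
print(depth)
```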

While LIDAR is very useful in snow hydrology, it does have some limitations. The first is that it only measures snow depth, but snow density (either measured or modeled) is also needed to determine snow water equivalent. This limitation isn’t unique to LIDAR, however, because snow density cannot be surveyed over large areas in the same way as snow depth.

“Measuring snow density in the field reveals just how variable the snowpack is,” Herbert explains. “Depending on if you dig a snow pit under a tree or on a north versus south facing aspect, you can get a completely different answer.”

This is a major limitation of on-site observations. Density also varies with depth, and remote sensing signals will be affected by the amount of liquid water content in snow, which makes measuring snow density remotely or over a broad scale impossible for the foreseeable future.

The second and more easily addressed issue with LIDAR surveys is the logistics of the necessary plane flights.

“You can’t fly a plane all the time,” Small says. “It’s too expensive, and we don’t have enough planes to fly everywhere.” Planes also cannot be flown when the weather is bad, and surveys only provide a snapshot of snow depth, which can change rapidly as snow falls or melts.

“Measuring snow density in the field reveals just how variable the snowpack is. Depending on if you dig a snow pit under a tree or on a north versus south facing aspect, you can get a completely different answer,” says CU Boulder researcher Jordan Herbert. (Photo: Pixabay)

These limitations can be worked around by using the LIDAR data to train computer models. “Based on that,” Small says, “you can use the LIDAR information to make predictions in the absence of LIDAR at another time or date or location. So, you’re leveraging the scientific information from LIDAR to improve your knowledge generally.”

Snow telemetry (SNOTEL) is an automated system of snow and climate sensors run by the Natural Resources Conservation Service, which is part of the U.S. Department of Agriculture. SNOTEL stations are spread across the western United States: small wilderness areas filled with sensing equipment that measures precipitation, snow mass and snow depth.

“All snow hydrology is based on data from these stations,” Small says. “The problem is that they only cover a small area. If you take all the SNOTEL stations in the western U.S. and put them next to each other, they’d be about the size of a football field, so they’re vastly under sampling. That’s why people want to use LIDAR to fill in all the spaces around them.”

The random forest model

Linear regression makes quantitative predictions based on one or more variables, but it becomes difficult to perform when many of these variables interact with each other in complex ways. In this case, such variables include elevation, solar radiation, slope and tree cover. The difficulty of working with all these variables can be minimized by a modeling tool called a regression tree.

“A binary regression tree splits your sample into two groups, and it splits that sample to figure out which variable has the most effect on the thing you're trying to predict,” Small explains. The branching structure created by these splits gives the model its name and is designed to minimize errors. Each branching point is a condition like true/false or yes/no, the answer to which determines the path taken.
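To make the idea of a split concrete, here is a minimal sketch of how a single branching point might be chosen: try every variable and every candidate threshold, and keep the one that leaves the smallest squared error in the two resulting groups. The variable names and the four data points are invented for illustration, and this is a textbook-style procedure rather than the researchers' code.

```python
def sum_squared_error(values):
    """Squared error of a group around its own mean (0.0 for an empty group)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, targets):
    """Return (feature, threshold, error) for the best single yes/no split.

    `rows` is a list of dicts mapping feature name to value; `targets` holds
    the quantity being predicted (here, snow depth) for each row.
    """
    best = None
    for feature in rows[0]:
        levels = sorted({row[feature] for row in rows})
        # Candidate thresholds sit halfway between neighboring observed values.
        for low, high in zip(levels, levels[1:]):
            threshold = (low + high) / 2
            left = [t for row, t in zip(rows, targets) if row[feature] <= threshold]
            right = [t for row, t in zip(rows, targets) if row[feature] > threshold]
            error = sum_squared_error(left) + sum_squared_error(right)
            if best is None or error < best[2]:
                best = (feature, threshold, error)
    return best


# Hypothetical sites: depth tracks elevation far more than tree cover.
rows = [
    {"elevation_m": 2800, "tree_cover": 0.8},
    {"elevation_m": 2900, "tree_cover": 0.2},
    {"elevation_m": 3300, "tree_cover": 0.7},
    {"elevation_m": 3400, "tree_cover": 0.3},
]
depths_cm = [60, 70, 120, 130]
feature, threshold, error = best_split(rows, depths_cm)
print(feature, threshold, error)
```

In this toy data the chosen condition is "is the elevation at or below 3,100 meters?", because that question separates shallow sites from deep ones most cleanly. A full tree repeats the same search inside each of the two groups.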

Regression trees are useful in that they fit the data better than multiple linear regression, the other option when many variables are involved. The better a model fits the observed data, the better it will be at predicting data that have not been observed, Small says.

However, regression trees have their own limitations.

“The downside of a binary regression tree is that it only gives you categorized values,” Small says. “For example, snow depth could be 70 centimeters, 92 centimeters or 123 centimeters. You end up with a map that just has these particular values.” This issue can be solved by combining multiple regression trees into a random forest model.

“What a random forest does,” Small explains, “is take a bunch of these binary regression trees and samples them randomly to give you continuous distributions of the variable that you care about. So instead of it being in these categories, it's more like how we think about snow depth.”
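The contrast between a single tree and a forest can be sketched with very small trees. The example below assumes one-split trees ("stumps") on a single variable, each trained on a random resample of the data, with the forest's prediction being the average over all of them. Real random forests use deeper trees and also randomize which variables each split may use, so this is a simplified illustration with invented data, not the model from the study.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression tree; returns (threshold, left_mean, right_mean)."""
    overall = sum(ys) / len(ys)
    best = (float("inf"), None, overall, overall)  # (error, threshold, left, right)
    levels = sorted(set(xs))
    for low, high in zip(levels, levels[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        error = (sum((y - left_mean) ** 2 for y in left)
                 + sum((y - right_mean) ** 2 for y in right))
        if error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    if threshold is None:  # resample had a single x value, so no split was possible
        return left_mean
    return left_mean if x <= threshold else right_mean


def fit_forest(xs, ys, n_trees, seed=0):
    """Train each stump on a bootstrap resample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        picks = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([xs[i] for i in picks], [ys[i] for i in picks]))
    return forest


def predict_forest(forest, x):
    """Average the predictions of every tree in the forest."""
    return sum(predict_stump(stump, x) for stump in forest) / len(forest)


# Hypothetical elevations (m) and snow depths (cm).
elevations = [2800, 2900, 3000, 3100, 3200, 3300, 3400, 3500]
depths = [55, 62, 70, 85, 97, 110, 121, 133]
queries = [2850, 3050, 3250, 3450]

single = fit_stump(elevations, depths)
forest = fit_forest(elevations, depths, n_trees=50)
single_preds = [predict_stump(single, q) for q in queries]
forest_preds = [predict_forest(forest, q) for q in queries]
print(single_preds)
print(forest_preds)
```

A single stump can only ever return one of two values, which is the "categorized" behavior Small describes. Because each resample tends to place its split somewhere slightly different, the averaged forest output can take many intermediate values.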

“All snow hydrology is based on data from (SNOTEL) stations. The problem is that they only cover a small area. If you take all the SNOTEL stations in the western U.S. and put them next to each other, they’d be about the size of a football field, so they’re vastly under sampling,” says CU Boulder Professor Eric Small. (Photo: Ruvin Miksanskiy/Pexels)

Machine learning

While using binary regression trees allows the predictive model discussed in this study to fit the data better, there are other things to consider, Small says. “In machine learning and other statistics, there’s this trade-off between how well a model can fit the information you give it and how generalizable it is. If I keep adding training data, training the model and tuning the parameters, I can have it fit the data pretty well, but then it becomes fixated on those very specific data, and it’s not going to make good predictions elsewhere.”

This is called “overfitting,” and it can be described simply as the model becoming too used to patterns in the data it was trained on. Because it anticipates those patterns, the model makes predictions that would be right in the place or under the circumstances where the training data were collected, but are wrong elsewhere.

This explains the different performance of the three versions of the model: the site-specific model, the regional model and the site-specific and regional (SS+Reg) model. The site-specific model makes predictions about a given basin using LIDAR data collected in the same basin on other dates, whereas the regional model makes predictions about a basin using data from other basins and other dates. The SS+Reg model was trained using all available data.
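One way to picture the three scenarios is as three rules for choosing which LIDAR surveys may be used for training when predicting a given survey. The sketch below is an interpretation of the description above, not the study's code: the basin names, dates and scenario labels are hypothetical, and it assumes the survey being predicted is always held out, including in the SS+Reg case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Survey:
    basin: str  # hypothetical basin identifier
    date: str   # ISO date of the LIDAR flight


def training_surveys(surveys, target, scenario):
    """Select the surveys allowed to train a model that predicts `target`."""
    others = [s for s in surveys if s != target]  # assumption: target is held out
    if scenario == "site_specific":
        # Same basin, other dates.
        return [s for s in others if s.basin == target.basin]
    if scenario == "regional":
        # Other basins, other dates.
        return [s for s in others
                if s.basin != target.basin and s.date != target.date]
    if scenario == "ss_reg":
        # Everything else that is available.
        return others
    raise ValueError(f"unknown scenario: {scenario}")


surveys = [
    Survey("basin_a", "2019-04-01"),
    Survey("basin_a", "2019-05-15"),
    Survey("basin_b", "2019-04-01"),
    Survey("basin_b", "2019-06-01"),
]
target = surveys[0]
site = training_surveys(surveys, target, "site_specific")
regional = training_surveys(surveys, target, "regional")
combined = training_surveys(surveys, target, "ss_reg")
print(len(site), len(regional), len(combined))
```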

The SS+Reg model was the most accurate, but all three were generally accurate compared with both models from prior studies and remote sensing methods. Because existing models of this sort produce output at the 50-meter scale, the comparison was made at that scale, and this study’s models were more accurate. The 50-meter outputs were also upscaled to 1- and 4-kilometer scales.
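The article does not say how the 50-meter outputs were upscaled. A common approach, assumed here purely for illustration, is to average all the fine cells that fall inside each coarse cell: 20 by 20 cells of 50 meters make one 1-kilometer cell, and 80 by 80 make one 4-kilometer cell.

```python
def upscale(grid, factor):
    """Block-average a fine grid so each output cell covers factor x factor inputs.

    Block averaging is an assumed method, not one confirmed by the study.
    """
    rows, cols = len(grid), len(grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of the factor")
    coarse = []
    for r in range(0, rows, factor):
        coarse_row = []
        for c in range(0, cols, factor):
            block = [grid[i][j]
                     for i in range(r, r + factor)
                     for j in range(c, c + factor)]
            coarse_row.append(sum(block) / len(block))
        coarse.append(coarse_row)
    return coarse


CELL_M = 50
factor_1km = 1000 // CELL_M  # 20 fine cells per side of a 1 km cell

# Hypothetical 2 km x 2 km area with alternating values of 0 and 1.
fine = [[float((i + j) % 2) for j in range(40)] for i in range(40)]
coarse = upscale(fine, factor_1km)
print(coarse)
```

Averaging smooths out cell-to-cell differences, which is one plausible reason errors shrink at coarser scales.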

The 1- and 4-kilometer scales are more typically used in water management applications, and all three models became more accurate when applied to these scales, outperforming SNOTEL. This means that the models were more accurate than extrapolation from observation data. The success of both the SS+Reg and regional models indicates that information gained from LIDAR is transferable to different times and locations within the Rocky Mountain Region.

Beyond fitting the data well and adapting to different spatial scales across the three model scenarios, this approach has another benefit: it does not rely on modeling physical processes (like snow formation, accumulation and melt) or on uncertain weather data. As a result, once a model is trained, it doesn’t take long to make predictions. “The big gain is that it's much more computationally efficient and it just takes a fraction of the time,” Small says. “It's about 100 times faster.”

Herbert says, “Machine learning has been a huge benefit to my research, with the results to back it up. It’s freed up my time in the winter to put skis on and dig more snow pits to get the density data we desperately need.”

“For whatever reason, all our physically based models and our knowledge of science just gets in our way of making predictions,” Small explains, “because we've tried to boil it down to these simple equations, but it's not simple.”

Expanding to other regions

The primary limitation of the snow density-measuring framework that the researchers created for this study was its reliance on on-site and LIDAR data for snow depth measurements. Small says that this could be addressed by bringing in other data sets, which would provide a more independent test of success than models’ ability to predict snow density in regions they were not trained on.

One of these data sets, the fractional snow-covered area (how much of the ground is covered by snow), could be measured using LIDAR equipment mounted to a satellite rather than relying on airplanes. Although LIDAR has been used with satellite technology, this doesn’t address the limitations of plane-mounted LIDAR because, as Small says, “the (satellite) overpass interval is very slow. It’s about 90 days before it comes back to the place you’re looking at. So, you get a snapshot very infrequently, but it’s everywhere on the planet.”

The next step of developing this kind of model is to apply it to other regions, and it remains to be seen how easily that translation can be made, Herbert says.

“We’ve just begun running the model in California to see if the model works in regions with different climates,” he says. “We want to see how transferable data from one region is to another, and California is an ideal test site since it has more LIDAR than anywhere else in the world.”

The presence of LIDAR is important because these data were the most useful when it came to statistical model validation, or making sure that the models were accurate and reliable, compared to data limited by the small-area reporting of SNOTEL and the variability of on-the-ground snow density measurements. Without data to judge models’ predictions against, it is impossible to determine how well they do, because the actual snow depth is unknown.

Also, because LIDAR isn’t available everywhere, it is important to continue developing other methods of validation, the researchers say. Small says reducing reliance on LIDAR will help the innovative modeling framework apply to many parts of the country.
