How deep is that snow? Machine learning helps us know
CU Boulder researchers apply machine learning to snow hydrology in Colorado mountain drainage basins, finding a new way to accurately predict the availability of water
Determining how much water is contained as snow in mountain drainage basins is very important for water management, because measuring it is a necessary part of predicting the availability of water, especially in places that rely on snowmelt for their water supply, like Colorado and other western states.
Snow water equivalent is the amount of water in a mass of snow or snowpack. The depth of this water is a fraction of the snow depth, obtained by multiplying the snow depth by the snow density, which is expressed as a percentage of the density of water. If there are 10 inches of snow with a density of 10%, the snow water equivalent is 1 inch.
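The arithmetic above can be written as a short Python sketch. The function name and the choice to express density as a fraction between 0 and 1 are illustrative assumptions, not part of the researchers' code.

```python
def snow_water_equivalent(snow_depth, density_fraction):
    """Return the depth of water held in the snow, in the same units as snow_depth.

    density_fraction is snow density relative to water (0.10 means 10%).
    """
    if not 0.0 <= density_fraction <= 1.0:
        raise ValueError("density_fraction must be between 0 and 1")
    return snow_depth * density_fraction


# The example from the text: 10 inches of snow at 10% density.
print(snow_water_equivalent(10.0, 0.10))  # 1.0 (inch of water)
```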
A persistent challenge is that snow water content is calculated from both snow depth and snow density, yet it remains unfeasible to directly measure snow density over a large area. Traditionally, this issue has been addressed with remote sensing, which allows for consistent and relatively large-scale measurements. However, remote sensing methods have their own limitations, which has prompted the search for an alternative in machine-learning technology.
CU Boulder researchers Jordan Herbert (left), a PhD candidate, and Eric Small, a professor of geological sciences, developed a model that can estimate the snow density at times when and in places where it has not been observed or sensed.
CU Boulder Ph.D. candidate Jordan Herbert and Professor Eric Small of the Department of Geological Sciences developed a model that can estimate the snow density at times when and in places where it has not been observed or sensed. This model is split into different scenarios, each trained on a different subset of the data, and while performance varied, all scenarios were more accurate than extrapolation from remote sensing methods, according to Herbert and Small.
Model performance analyses also demonstrated that information from airborne light detection and ranging (LIDAR) can be transferred to different times and places within the region where it was collected.
LIDAR and SNOTEL data
LIDAR surveys are an important tool in snow hydrology, as they provide detailed information about snow properties, specifically through their detection of snow depth.
“You fly the plane twice,” Small says, “once when there’s no snow, once when there is snow. The laser reflects off the surface, and if you know where the plane is and the distance to the surface, then you know the height of the snow relative to the ground surface.” This is called differential LIDAR altimetry.
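The subtraction Small describes can be sketched in a few lines of Python. The function name and the sample elevations are made up for illustration, and clamping negative differences to zero is an assumption about how measurement noise might be handled, not something the article states.

```python
def snow_depth_from_lidar(snow_on_elev, snow_off_elev):
    """Subtract snow-free surface elevations from snow-covered ones, cell by cell.

    Both inputs are lists of elevations (meters) for the same grid cells.
    """
    if len(snow_on_elev) != len(snow_off_elev):
        raise ValueError("both surveys must cover the same cells")
    # Assumption: a slightly negative difference is noise, so treat it as no snow.
    return [max(0.0, on - off) for on, off in zip(snow_on_elev, snow_off_elev)]


# Hypothetical elevations for three grid cells.
snow_off = [3000.0, 3001.5, 3003.0]  # flight with no snow
snow_on = [3001.0, 3002.0, 3002.9]   # flight with snow
print(snow_depth_from_lidar(snow_on, snow_off))  # [1.0, 0.5, 0.0]
```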
While LIDAR is very useful in snow hydrology, it does have some limitations. The first is that it only measures snow depth, but snow density (either measured or modeled) is also needed to determine snow water equivalent. This isn’t a unique limitation, however, because snow density cannot be surveyed in the same way as snow depth.
“Measuring snow density in the field reveals just how variable the snowpack is,” Herbert explains. “Depending on if you dig a snow pit under a tree or on a north versus south facing aspect, you can get a completely different answer.”
This is a major limitation of on-site observations. Density also varies with depth, and remote sensing signals will be affected by the amount of liquid water content in snow, which makes measuring snow density remotely or over a broad scale impossible for the foreseeable future.
The second, more easily addressed limitation of LIDAR surveys is the logistics of the plane flights they require.
“You can’t fly a plane all the time,” Small says. “It’s too expensive, and we don’t have enough planes to fly everywhere.” Planes also cannot be flown when the weather is bad, and surveys only provide a snapshot of snow depth, which can change rapidly as snow falls or melts.
“Measuring snow density in the field reveals just how variable the snowpack is. Depending on if you dig a snow pit under a tree or on a north versus south facing aspect, you can get a completely different answer,” says CU Boulder researcher Jordan Herbert. (Photo: Pixabay)
These limitations can be worked around by using the LIDAR data to train computer models. “Based on that,” Small says, “you can use the LIDAR information to make predictions in the absence of LIDAR at another time or date or location. So, you’re leveraging the scientific information from LIDAR to improve your knowledge generally.”
Snow telemetry (SNOTEL) is an automated system of snow and climate sensors run by the Natural Resources Conservation Service, which is part of the U.S. Department of Agriculture. Its stations are spread across the western United States; each is a small wilderness area filled with sensing equipment that measures precipitation, snow mass and snow depth.
“All snow hydrology is based on data from these stations,” Small says. “The problem is that they only cover a small area. If you take all the SNOTEL stations in the western U.S. and put them next to each other, they’d be about the size of a football field, so they’re vastly under sampling. That’s why people want to use LIDAR to fill in all the spaces around them.”
The random forest model
Linear regression makes quantitative predictions based on one or more variables, but it becomes difficult to perform when many of these variables interact with each other in complex ways. In this case, the variables include elevation, solar radiation, slope and tree cover, among others. The difficulty of working with all these variables can be minimized by a modeling tool called a regression tree.
“A binary regression tree splits your sample into two groups, and it splits that sample to figure out which variable has the most effect on the thing you're trying to predict,” Small explains. The branching structure created by these splits gives the model its name and is designed to minimize errors. Each branching point is a condition like true/false or yes/no, the answer to which determines the path taken.
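A single branching point of this kind can be sketched in Python as a search for the variable and threshold that best divide the sample into two groups. This is a minimal illustration that assumes squared error as the measure being minimized; the function names, feature names and numbers are invented, and it is not the researchers' implementation.

```python
def sum_squared_error(values):
    """Squared deviation of the values from their own mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, targets):
    """Try every feature and threshold; return (feature, threshold, error).

    rows is a list of dicts mapping feature name to value, and targets
    holds the quantity to predict (here, snow depth) for each row.
    """
    best = None
    for feature in rows[0]:
        for threshold in sorted({row[feature] for row in rows}):
            left = [t for row, t in zip(rows, targets) if row[feature] <= threshold]
            right = [t for row, t in zip(rows, targets) if row[feature] > threshold]
            if not left or not right:
                continue  # a split must leave samples on both sides
            error = sum_squared_error(left) + sum_squared_error(right)
            if best is None or error < best[2]:
                best = (feature, threshold, error)
    return best


# Hypothetical sample: snow depth (cm) here depends mostly on elevation.
rows = [
    {"elevation_m": 2800, "tree_cover": 0.6},
    {"elevation_m": 2900, "tree_cover": 0.1},
    {"elevation_m": 3300, "tree_cover": 0.5},
    {"elevation_m": 3400, "tree_cover": 0.2},
]
depths_cm = [70, 72, 120, 123]
print(best_split(rows, depths_cm))  # ('elevation_m', 2900, 6.5)
```

In this toy sample the chosen condition is "elevation at or below 2,900 meters: yes or no," because that question separates shallow from deep snow with the least remaining error.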
Regression trees are useful because they fit the data better than multiple linear regression models, the other option when many variables are involved. The better a model fits the observed data, the better it will be at predicting data that have not been observed, Small says.
However, regression trees have their own limitations.
“The downside of a binary regression tree is that it only gives you categorized values,” Small says. “For example, snow depth could be 70 centimeters, 92 centimeters or 123 centimeters. You end up with a map that just has these particular values.” This issue can be solved by combining multiple regression trees into a random forest model.
“What a random forest does,” Small explains, “is take a bunch of these binary regression trees and samples them randomly to give you continuous distributions of the variable that you care about. So instead of it being in these categories, it's more like how we think about snow depth.”
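The contrast between one tree and a forest can be illustrated with a deliberately simplified Python sketch. It assumes one-split trees on a single variable and averages trees fitted to random resamples of the data; a full random forest also grows deeper trees and randomizes which variables each split may use. All names and numbers are hypothetical.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split tree; return (threshold, left_mean, right_mean)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so the right side is never empty
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((y - left_mean) ** 2 for y in left)
                 + sum((y - right_mean) ** 2 for y in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:  # the resample contained a single x value
        mean = sum(ys) / len(ys)
        return (xs[0], mean, mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_forest(xs, ys, n_trees=50, seed=0):
    """Fit each stump to a bootstrap resample (drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        picks = [rng.randrange(len(xs)) for _ in range(len(xs))]
        forest.append(fit_stump([xs[i] for i in picks], [ys[i] for i in picks]))
    return forest


def predict_forest(forest, x):
    """Average the predictions of all stumps."""
    return sum(predict_stump(stump, x) for stump in forest) / len(forest)


# Hypothetical elevations (m) and snow depths (cm).
elevations = [2800, 2900, 3000, 3100, 3200, 3300, 3400]
depths_cm = [70, 75, 85, 92, 105, 118, 123]

single_tree = fit_stump(elevations, depths_cm)
forest = fit_forest(elevations, depths_cm)
print(sorted({predict_stump(single_tree, x) for x in elevations}))
print([round(predict_forest(forest, x), 1) for x in elevations])
```

The single tree can only output its two group averages, while the averaged forest produces intermediate values, which is the behavior Small describes.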
“All snow hydrology is based on data from (SNOTEL) stations. The problem is that they only cover a small area. If you take all the SNOTEL stations in the western U.S. and put them next to each other, they’d be about the size of a football field, so they’re vastly under sampling,” says CU Boulder Professor Eric Small. (Photo: Ruvin Miksanskiy/Pexels)
Machine learning
While using binary regression trees allows the predictive model discussed in this study to fit the data better, there are other things to consider, Small says. “In machine learning and other statistics, there’s this trade-off between how well a model can fit the information you give it and how generalizable it is. If I keep adding training data, training the model and tuning the parameters, I can have it fit the data pretty well, but then it becomes fixated on those very specific data, and it’s not going to make good predictions elsewhere.”
This is called “overfitting,” and it can be described simply as the model becoming too used to patterns in the data it was trained on. A model that anticipates these patterns makes predictions that would have been right in the place or under the circumstances where the training data were collected, but that are wrong elsewhere.
This explains the different performance of the three different versions of the model: the site-specific model, the regional model and the site-specific and regional (SS+Reg) model. The site-specific model makes predictions about a given basin using LIDAR data from the same basin that was collected at other dates, whereas the regional model makes predictions about a basin using data from other basins and at other dates. The SS+Reg model was trained using all available data.
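One way to picture the three scenarios is as three rules for choosing which LIDAR surveys go into training. The Python sketch below is an illustration only: the record layout, basin names and dates are hypothetical, and withholding the survey from the date being predicted in every scenario is an assumption rather than a detail given in the article.

```python
def training_surveys(surveys, target_basin, target_date, scenario):
    """Pick the surveys used to train a model for one basin and date."""
    # Assumption: surveys flown on the date being predicted are always held out.
    other_dates = [s for s in surveys if s["date"] != target_date]
    if scenario == "site_specific":
        return [s for s in other_dates if s["basin"] == target_basin]
    if scenario == "regional":
        return [s for s in other_dates if s["basin"] != target_basin]
    if scenario == "ss_reg":
        return other_dates
    raise ValueError(f"unknown scenario: {scenario}")


# Hypothetical survey catalog.
surveys = [
    {"basin": "basin_a", "date": "2021-04-01"},
    {"basin": "basin_a", "date": "2022-04-01"},
    {"basin": "basin_b", "date": "2021-04-15"},
    {"basin": "basin_c", "date": "2022-03-20"},
]
for scenario in ("site_specific", "regional", "ss_reg"):
    chosen = training_surveys(surveys, "basin_a", "2022-04-01", scenario)
    print(scenario, len(chosen))
```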
The SS+Reg model was the most accurate, but all three models were generally accurate compared with both models from prior studies and remote sensing methods. Because models of this sort produce output at a 50-meter scale, the comparison with existing models was made at that scale, and this study’s models were more accurate. The 50-meter outputs were also upscaled to 1- and 4-kilometer scales.
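The article does not say how the upscaling was done. A common approach, assumed here purely for illustration, is to average blocks of fine cells into one coarse cell: 20 by 20 cells of 50 meters make one 1-kilometer cell. The function name and grid values in this Python sketch are hypothetical.

```python
def block_average(grid, factor):
    """Average factor-by-factor blocks of a 2D grid (a list of equal-length rows)."""
    n_rows, n_cols = len(grid), len(grid[0])
    if n_rows % factor or n_cols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    coarse = []
    for r0 in range(0, n_rows, factor):
        coarse_row = []
        for c0 in range(0, n_cols, factor):
            block = [grid[r][c]
                     for r in range(r0, r0 + factor)
                     for c in range(c0, c0 + factor)]
            coarse_row.append(sum(block) / len(block))
        coarse.append(coarse_row)
    return coarse


# A toy 4 x 4 grid averaged in 2 x 2 blocks; going from 50 m to 1 km would use factor=20.
fine = [
    [1, 3, 5, 7],
    [1, 3, 5, 7],
    [2, 2, 8, 8],
    [2, 2, 8, 8],
]
print(block_average(fine, 2))  # [[2.0, 6.0], [2.0, 8.0]]
```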
The 1- and 4-kilometer scales are more typically used in water management applications, and all three models became more accurate when applied to these scales, outperforming SNOTEL. This means that the models were more accurate than extrapolation from observation data. The success of both the SS+Reg and regional models indicates that information gained from LIDAR is transferable to different times and locations within the Rocky Mountain Region.
Besides fitting the data well and adapting to different scales across the three model scenarios, this approach is also beneficial because it does not rely on modeling physical processes (like snow formation, accumulation and melt) or on uncertain weather data. As a result, once a model is trained, it doesn’t take long to make predictions. “The big gain is that it's much more computationally efficient and it just takes a fraction of the time,” Small says. “It's about 100 times faster.”
“Machine learning has been a huge benefit to my research, with the results to back it up,” Herbert says. “It’s freed up my time in the winter to put skis on and dig more snow pits to get the density data we desperately need.”
“For whatever reason, all our physically based models and our knowledge of science just gets in our way of making predictions,” Small explains, “because we've tried to boil it down to these simple equations, but it's not simple.”
Expanding to other regions
The primary limitation of the snow density-measuring framework that the researchers created for this study was its reliance on on-site and LIDAR data for snow depth measurements. Small says that this could be addressed by bringing in other data sets, which would provide a more independent test of success than the models’ ability to predict snow density in regions they were not trained on.
One of these data sets, the fractional snow-covered area (how much of the ground is covered by snow), could be measured using LIDAR equipment mounted to a satellite rather than relying on airplanes. While LIDAR has been used with satellite technology, this doesn’t address the limitations of plane-mounted LIDAR, because, as Small says, “the (satellite) overpass interval is very slow. It’s about 90 days before it comes back to the place you’re looking at. So, you get a snapshot very infrequently, but it’s everywhere on the planet.”
The next step of developing this kind of model is to apply it to other regions, and it remains to be seen how easily that translation can be made, Herbert says.
“We’ve just begun running the model in California to see if the model works in regions with different climates,” he says. “We want to see how transferable data from one region is to another, and California is an ideal test site since it has more LIDAR than anywhere else in the world.”
The presence of LIDAR is important because these data were the most useful when it came to statistical model validation, or making sure that the models were accurate and reliable, compared to data limited by the small-area reporting of SNOTEL and the variability of on-the-ground snow density measurements. Without data to judge models’ predictions against, it is impossible to determine how well they do, because the actual snow depth is unknown.
Also, because LIDAR isn’t available everywhere, it is important to continue developing other methods of validation, the researchers say. Small says reducing reliance on LIDAR will help the innovative modeling framework apply to many parts of the country.