A Culture of Checking: Open Climate Models (4)
This is a continuation of my series on community approaches to climate science and, specifically, climate models. I am exploring the ideas of open innovation and open source communities. This article will be on verification, testing, evaluation, and validation. I know that this is, perhaps, a bit off the path of what I imagine is my main audience, but these ideas are at the core of climate science, and also part of the politicization (or de-politicization) of science. Before I dig into the arcane, I wanted to reference Jeff Masters' post on the major floods of the past few months. This combination of warm seas, warm air, water vapor, rain, geography, and people is difficult to ignore, and I plan to revisit these floods in the context of the recent guest blog by Christine Shearer. Now, deeper into the world of how we build climate models. Here are links to the previous articles in the series: #1 in series, #2 in series, #3 in series
Verification, Testing, Evaluation, Validation: Validation is an important part of the scientific method. The scientific method always relies on observations, in concert with the development of testable hypotheses, and experiments or predictions which provide the tests for the hypotheses. In practice the validation process includes not only a scientist evaluating the results of the experiment or prediction, but also providing written documentation which describes the work and the methods of validation. Using this documentation, other scientists are able to evaluate the work and design independent experiments or predictions and methods of validation. It is not until there is independent confirmation of a scientist’s work that the work is accepted into the scientific body of knowledge. Once in the body of knowledge, work is not recognized as fact; rather, it sits as a contribution that is, often, continuously challenged. It is a harsh, competitive process – not sublime.
By the very definition of scientific practice, a certain level of transparency is required. The transparency allows those who are, essentially, competitors to examine, reproduce, and independently confirm or refute the work of the scientist who initiated the original study. This general process of validation, reporting, and independent certification is a remarkably conservative process. It is slow to introduce major shifts and changes into the knowledge base. This conservativeness strongly affects the way scientists speak to each other and the public; there are always nuances of uncertainty and equivocation (again, see this entry). If you think about this culture of validation, a culture of checking, it has a lot in common with how markets and systems of governance are run. Governance? A set of rules, buy-in by participants, and a system of checks and balances. I have a number of previous articles on validation and transparency in a more general sense: Opinions and Anecdotal Evidence, Trust, but Verify, Uncertainty.
Verification, testing, evaluation, validation: I am grouping these words together to describe the culture of checking that is pervasive in the good practice of science. When I went to school, the idea of checking my arithmetic was taught again and again. It had to be taught again and again, because I was not smart enough to understand the value of checking as an abstract concept. As problems get more and more complex, strategies for checking get more and more sophisticated. When I used to hire a lot of new graduates at NASA, during the interviews we explored how they checked their work. During the first year of employment we were often coaching, teaching, and training people in how to check.
How do we translate the skills of checking to climate models? I have been laboring over the words verification, testing, evaluation, and validation. What is the difference? To start, it is important to realize that climate models are computer codes, programs, software. And though many scientists object to the following categorization, the product that climate modelers produce is software. It is software for a purpose: the scientific investigation of climate. As suggested by the previous entries in this series, the software is complex: it represents individual sub-processes (like cloud formation), it represents an approximation to those sub-processes, and it is developed by geographically dispersed individuals who are generally not managed as a coordinated group.
I will start with the easy one. Verification is the process of assuring that the software you have written or implemented is doing what you intended for it to do. Suppose you are writing a simple computer program to calculate how long it will take you to drive from Limon, Colorado to DeKalb, Illinois. This is a simple equation of motion: distance traveled equals speed multiplied by time traveled. You might check your program (your model) by seeing how long it takes to drive 100 miles at 20 miles per hour – a simple problem for which you can confidently state the answer (5 hours). Maybe you try different pairs of speeds and distances. If you are thoroughly scientific, you might collect some data with your car. Of course, that might raise questions about the accuracy of your speedometer and your odometer, another requirement for checking. With some confidence, you can develop a program that, without your driving from Limon to DeKalb, makes a very good approximation of how long the trip will take at a given speed. You can perhaps add another question: how fast do I have to drive to make the trip in 24 hours? 18 hours? With this example you can imagine the process of verification, checking that your program is doing what it is supposed to do. You might also say that you are testing your code. Testing is another word in the culture of checking, and it takes on more specific meanings in different processes.
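To make the driving example concrete, here is a minimal Python sketch of what verification of that trip program might look like. The function names are illustrative assumptions, and the 850-mile route length is a rough placeholder, not a measured Limon-to-DeKalb road distance.

```python
"""Verification sketch for the trip-time model: distance = speed * time."""


def travel_time_hours(distance_miles: float, speed_mph: float) -> float:
    """Hours needed to cover a distance at constant speed (time = distance / speed)."""
    if speed_mph <= 0:
        raise ValueError("speed must be positive")
    return distance_miles / speed_mph


def required_speed_mph(distance_miles: float, hours: float) -> float:
    """Constant speed needed to cover a distance in a given time (speed = distance / time)."""
    if hours <= 0:
        raise ValueError("time must be positive")
    return distance_miles / hours


# Verification 1: a problem whose answer we can state with confidence.
assert travel_time_hours(100, 20) == 5.0

# Verification 2: try different pairs, and check that the two functions
# invert each other (a consistency check, not a comparison with data).
for distance, speed in [(100, 20), (250, 50), (600, 75)]:
    t = travel_time_hours(distance, speed)
    assert abs(required_speed_mph(distance, t) - speed) < 1e-9

# Hypothetical placeholder route length, NOT a measured road distance.
ROUTE_MILES = 850.0

if __name__ == "__main__":
    for hours in (24, 18):
        speed = required_speed_mph(ROUTE_MILES, hours)
        print(f"{hours} h trip: {speed:.1f} mph")
```

These checks confirm that the code solves the equation it was meant to solve; they say nothing about whether the equation describes real driving, with its traffic, stops, and imperfect speedometer. That second question belongs to evaluation and validation.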
Evaluation and validation are more difficult to explain. Both words are linked with a comparison with independent information, specifically, observed information. At the risk of being tedious, when I worked at NASA, there were different sub-cultures represented by those who made instruments and those who made models. Validation at NASA often referred to the process by which people who took measurements from space assured that those measurements, say, measured temperature. This would require deploying different types of temperature measuring devices, like thermometers, to take concurrent measurements at the same place. The point here is that a new way to measure temperature was being evaluated against an accepted, established way to measure temperature. Within NASA, it was a widely held belief that models could not be “validated,” because in general there was not such a clean comparison to a standard of accepted knowledge. Hence, the word “evaluation” emerged as the way to state that the model was being compared with observed information.
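For the instrument case, the comparison might be sketched as computing the mean difference (bias) and the root-mean-square difference between collocated readings from the new method and from thermometers. This is a minimal illustration under stated assumptions: the choice of these two metrics, the function names, and the numbers are hypothetical, not NASA's actual procedure or data.

```python
"""Compare a new measurement against an accepted reference at the same places and times."""
import math
from typing import Sequence


def _check(new: Sequence[float], reference: Sequence[float]) -> None:
    if len(new) != len(reference) or len(new) == 0:
        raise ValueError("need equal-length, non-empty, collocated samples")


def bias(new: Sequence[float], reference: Sequence[float]) -> float:
    """Mean difference (new - reference); a systematic offset shows up here."""
    _check(new, reference)
    return sum(n - r for n, r in zip(new, reference)) / len(new)


def rmse(new: Sequence[float], reference: Sequence[float]) -> float:
    """Root-mean-square difference; combines systematic and random error."""
    _check(new, reference)
    return math.sqrt(sum((n - r) ** 2 for n, r in zip(new, reference)) / len(new))


# Toy, made-up values in degrees C: NOT real satellite or station data.
satellite = [15.2, 16.8, 14.1, 18.0]
thermometer = [15.0, 17.0, 13.5, 17.6]

if __name__ == "__main__":
    print(f"bias = {bias(satellite, thermometer):+.2f} C")
    print(f"rmse = {rmse(satellite, thermometer):.2f} C")
```

For a climate model, the difficulty is that there is rarely a single accepted reference playing the role of the thermometer, which is exactly why the word "evaluation" took hold.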
I will ultimately maintain that models can be validated in a formal sense. This remains an assertion that many of my colleagues disagree with. While I accept the nuances of evaluation, validation, and testing, it is important that climate science embrace the rigor implied by “validation” of models. Before I go on, I provide links to a couple of papers that were provided to me by Doug Post some time ago. These papers drew largely from the experience in the U.S. National Laboratories responsible for assuring the robustness and safety of our nuclear weapons through the use of computational models (think about that application!). These papers generated transient discussion in the climate community: Computational Science Demands a New Paradigm and Software Project Management …..
As happens with my blogs, this one is getting a bit long, so in the spirit of the medium, I am going to search for the take-away message. The existence of the semantic arguments concerning the words evaluation and validation suggests that defining measures of quality assurance for climate models is difficult. It is not uniquely defined. It depends on what you are trying to do, for example, predict El Niño or simulate how the ocean melts the bases of glaciers in Antarctica. The evaluation or validation process also depends on how a modeling system performs. This system is constructed from sub-components developed by individual scientists who are, in practice, spread all over the world. Individual components sometimes migrate from one model to another when others read the literature and reproduce the work as described there; local adaptation of algorithms is common.
As I stated above, there is a culture of checking in our field. Individuals check their work at multiple levels. But as the components are brought together, the ability to check gets more and more difficult. Remember, these pieces are, themselves, neither unique nor absolutely accurate. As we consider the Earth’s climate, the question becomes, what do we check against? We get to the question of the quality of measurements - just how good are they? And we get to the social problem that as the climate model is built from its pieces developed by individuals how do we define and codify a process that rises to the standard of validation? As this process becomes more and more complex, we are often moved to using the word “art” to describe the process of building models (see The Art of Climate Modeling ).
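A toy illustration of why checking gets harder as pieces are brought together: below, a hypothetical ocean and atmosphere each pass their own component-level tests, yet a sign-convention mismatch in how they are coupled only shows up in a system-level check (total heat should not drift when nothing external is forcing the system). Everything here is an assumption for illustration, with heat content in arbitrary units; real model coupling is far more involved.

```python
"""Toy sketch: components that pass their own checks can still fail when coupled."""
from dataclasses import dataclass


@dataclass
class Ocean:
    heat: float  # heat content, arbitrary units

    def step(self, flux_into_ocean: float, dt: float) -> None:
        # Convention: a positive flux warms the ocean.
        self.heat += flux_into_ocean * dt


@dataclass
class Atmosphere:
    heat: float  # heat content, arbitrary units

    def step(self, flux_into_atmosphere: float, dt: float) -> None:
        # Convention: a positive flux warms the atmosphere.
        self.heat += flux_into_atmosphere * dt


def coupled_step(atm: Atmosphere, ocn: Ocean, surface_flux_up: float,
                 dt: float, sign_bug: bool = False) -> None:
    """Exchange heat across the sea surface.

    surface_flux_up > 0 moves heat from ocean to atmosphere. With
    sign_bug=True the coupler hands the ocean the wrong sign, a hypothetical
    convention mismatch that no single-component test would see.
    """
    atm.step(surface_flux_up, dt)
    ocn.step(surface_flux_up if sign_bug else -surface_flux_up, dt)


def total_heat(atm: Atmosphere, ocn: Ocean) -> float:
    return atm.heat + ocn.heat


# Component-level checks: each piece does what its author intended.
_o = Ocean(heat=100.0)
_o.step(2.0, 1.0)
assert _o.heat == 102.0
_a = Atmosphere(heat=50.0)
_a.step(-2.0, 1.0)
assert _a.heat == 48.0

if __name__ == "__main__":
    # System-level check: with no external forcing, total heat should not drift.
    for bug in (False, True):
        atm, ocn = Atmosphere(50.0), Ocean(100.0)
        before = total_heat(atm, ocn)
        for _ in range(10):
            coupled_step(atm, ocn, surface_flux_up=3.0, dt=1.0, sign_bug=bug)
        drift = total_heat(atm, ocn) - before
        print(f"sign_bug={bug}: drift in total heat = {drift:+.1f}")
```

The conservation check works only because the combined system has a property that neither component can check alone. For a real climate model, deciding which such properties to check, and against what observations, is the open question.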
The final point that I want to make in this entry is that the culture of checking that scientists intuitively accept as individuals extends as an essential ingredient to the collective development of complex software systems. There are, in the development of these complex software systems, tensions that are not resolved by convergent, deductive reasoning. These tensions might represent the choice between the quality of the oceanic circulation and the quality of the atmospheric circulation in an environment constrained by computational and human resources. These tensions might extend to conflicts between oceanographers and meteorologists, perhaps even between the program managers who fund oceanographers and meteorologists.
The description and codification of a validation plan for a climate model, therefore, extends far beyond the definition of a set of observations that uniquely or adequately define the Earth’s climate. There are judgments and decisions that need to be made. There are tensions, perhaps conflicts, which need to be reconciled. There are even philosophical discussions about whether or not climate models can be validated. If the open innovation communities I am exploring in this series are to be realized, then the description and codification of the validation process is necessary. Beyond the narrow world of scientists, we need to be able to point to the elements and measurements of validation in order to provide the foundation for the use of models in mitigation, adaptation, and geo-engineering.
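One way to read "description and codification" is that the judgments get written down next to the numbers. The sketch below is a hypothetical structure, not an existing standard or any modeling center's practice: each criterion names an observational reference, a threshold, and the rationale for that threshold, so that choices such as ocean versus atmosphere quality are at least recorded. All names, references, and thresholds are placeholders.

```python
"""Hypothetical sketch: a validation plan written down as data."""
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Criterion:
    name: str         # quantity being checked
    observation: str  # observational reference it is compared against
    threshold: float  # largest acceptable error (a judgment, not a law of nature)
    rationale: str    # who chose the threshold, and why


def evaluate_plan(plan: List[Criterion], errors: Dict[str, float]) -> Dict[str, bool]:
    """Pass/fail for each criterion; a criterion with no reported error fails."""
    return {c.name: c.name in errors and errors[c.name] <= c.threshold for c in plan}


# Placeholder plan: names, references, and thresholds are illustrative only.
PLAN = [
    Criterion("surface_temperature_rmse_C", "placeholder station analysis", 0.5,
              "placeholder: agreed by atmosphere and ocean groups"),
    Criterion("tropical_pacific_sst_rmse_C", "placeholder SST analysis", 0.3,
              "placeholder: tighter, because El Nino prediction is the target"),
]

if __name__ == "__main__":
    # Made-up model errors, for illustration only.
    results = evaluate_plan(PLAN, {"surface_temperature_rmse_C": 0.42,
                                   "tropical_pacific_sst_rmse_C": 0.35})
    for name, ok in results.items():
        print(f"{name}: {'pass' if ok else 'fail'}")
```

The code is trivial on purpose; the hard part is agreeing on the entries in the plan, and the `rationale` field is where that agreement would be documented for people outside the modeling group.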
Figure 1. From Online Math Tutor.
Updated: 10:28 PM GMT on January 25, 2011
Open Source Communities, What Are the Problems? Open Climate Models (3)
I want to return to the series that I started about community approaches to climate modeling. Just to help me get started I am going to repeat the last two paragraphs from the previous entry in the series. (#1 in series, #2 in series)
I managed large weather and climate modeling activities when I was at NASA. On a good day, I maintain that I managed successfully. When I was a manager I sought control, and I grimaced at some naïve ideas of community. My experience tells me that we need to investigate new ways of model development and model use. This need arises because the complexity is too large to control, and this is especially true as we extend the need to use climate models to investigate energy policy decisions and adaptation to climate change.
In the past decade we have seen the emergence of community approaches to complex problem solving. Within these communities we see the convergence of creativity and the emergence of solution paths. We see self-organizing and self-correcting processes evolve. Counter intuitively, perhaps, we see not anarchy, but the emergence of governance in these open communities. The next entry in the series will focus more on describing open communities.
Open Communities, Open Innovation: The past 10 years have seen the emergence of open communities that do everything from building software to collecting information about birds to building large knowledge bases. An example that often comes to mind is Wikipedia. Wikipedia represents an immense knowledge base. Experts (and non-experts) can write and modify entries. And while anyone can modify the entries, that does not mean that there is complete anarchy. There are rules of governance, which in this case translate to editorial standards that assure some level of evaluation of information and affirm some level of accuracy. Such a standard is exemplified by Wikipedia’s policy of no original research. Wikipedia is even evolving as a place to provide documentation about Earth system modeling infrastructure.
Open communities also include efforts to build software. One of the most famous examples is the development of the computer operating system Linux. Another example of software development is the Apache Foundation. The Apache Foundation represents many software projects, and according to its website is “not simply a group of projects sharing a server, but rather a community of developers and users.” “The Apache projects are defined by collaborative consensus based processes, an open, pragmatic software license and a desire to create high quality software that leads the way in its field.” If you explore these websites, you will see that the community is open, but there are rules and values that are shared by those working in the community. There is a process by which individuals’ contributions migrate into the products that are branded and provided by the community. That is, there is a governance model.
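As a loose illustration of what a written-down rule for how contributions migrate into a community product can look like, here is a hypothetical acceptance gate in Python. It is not the Apache Software Foundation's actual process; the rule (tests must pass, no reviewer objects, and at least two approve) is an assumption chosen only to show that governance can be explicit and checkable.

```python
"""Hypothetical sketch of an explicit rule for accepting community contributions."""
from dataclasses import dataclass, field
from typing import List

MIN_APPROVALS = 2  # hypothetical threshold, not any real project's policy


@dataclass
class Contribution:
    author: str
    tests_passed: bool
    # Reviewer votes: "+1" approve, "-1" object, "0" abstain.
    reviews: List[str] = field(default_factory=list)


def accepted(c: Contribution) -> bool:
    """Accept only if automated checks pass, nobody objects, and enough reviewers approve."""
    if not c.tests_passed:
        return False
    if "-1" in c.reviews:
        return False
    return c.reviews.count("+1") >= MIN_APPROVALS


if __name__ == "__main__":
    for c in [Contribution("a", True, ["+1", "+1"]),
              Contribution("b", True, ["+1", "-1", "+1"]),
              Contribution("c", False, ["+1", "+1", "+1"])]:
        print(c.author, accepted(c))
```

The particular rule matters less than the fact that it is explicit and shared, which is what separates a governed open community from anarchy.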
The two previous paragraphs are examples of two types of community approaches, and there are other types of communities such as Project Budburst and the Encyclopedia of Life. There are grassroots communities such as the atmospheric chemistry community GEOS-Chem. Some communities have been remarkably successful. They inspire and harvest creative solutions to complex problems. They provide a culture in which ideas and solutions converge and emerge; they contain the attributes of being self-organizing and self-correcting. And in many cases people contribute to these communities without traditional compensation; that is, they do it for free.
What is the motivation to participate in such a community for free? And are such communities sustainable and reliable? Participation without pay is contrary to the intuition of traditional managers. There are people who study the motivation and governance of communities, for example, Sonali Shah and Matthias Stürmer. Some participants are motivated by contributing to knowledge, and others by making their mark in some large effort. Others are motivated because they need something that is otherwise not available, and the existing efforts in the community provide the foundation for filling that need. In this case, participation in the community lets them do something that is not otherwise possible. A reason that I often find amongst scientists is the feeling that certain tools should be free; therefore, they are willing to spend the time to make a tool free, with the expectation that others will also contribute their efforts. Within the federal research community, there is often the value that if tax dollars paid for the generation of data or knowledge, then that data or knowledge should be as widely available as possible (see National Institutes of Health Public Access Policy). In the same vein, sponsors of research are constantly advocating more community interaction in order to enhance capabilities and, potentially, reduce unneeded duplication of effort.
Community-based approaches and open access to information are concepts that have been around for far longer than what we might call the internet age. Paul Edwards, in his book A Vast Machine, describes how the need to share information emerged in the study of weather, because weather forecasts are only useful if observations are shared. Throughout my career at NASA, we would occasionally be asked to do model experiments of what would happen if we (the U.S.) or some other country decided to start charging for all or part of weather data. Sometimes the studies were motivated by the idea that if “our” weather data is “so” important, then others should be paying for it. Well, it turns out everyone’s data is important; for forecasts to be good in the U.S. we need to know what is happening in Canada and out in the Pacific Ocean. So we benefit from open access to the basic information about the Earth’s environment.
I have been exploring the need for open community approaches to addressing climate change in general. The subject of the current set of articles is climate models, and whether or not we could have climate models that are not only accessible, but that could be correctly configured and run by a wide range of what might be non-expert customers of climate information. To note once again, there are numerous climate models that are accessible, and which can be altered and run by the user, for example, the Community Earth System Model. These models require highly specialized expertise and computational resources.
Sticking with just the focus on climate models, arguably the open source software communities named above provide what might be called an existence criterion. That is, a solution exists. Given that existence, there seem to be two questions to motivate how to go forward.
1) What are the important elements of successful open source development communities that would be required in an open innovation climate modeling community?
2) What are the similarities and differences of climate modeling to these communities that might help to advance or prohibit the development of a broader, more inclusive, climate modeling activity? Or stated in another way: is climate modeling in some way unique?
I have already hinted at one of the elements of successful software communities: namely, there must be governance. When I first started discussing open communities with my manager colleagues in national laboratories, their first response was that climate models could not be developed, evaluated, and implemented in an uncontrolled, anarchist environment. In case you have forgotten, I started this blog with the statement that I was a government manager, and I felt that control was important to me to deliver evaluated systems on time and within budget. It is important to realize, to inculcate, that open communities are not ungoverned, and if they are functional, they are not anarchist. So the development of governance approaches is an essential element, one that will be addressed more fully in future entries.
Approaching the second question posed above, how is climate modeling different from the software developed in the successful software communities mentioned above? One difference is the need to express complex phenomena with quantitative, scientific expressions. In an earlier entry I suggested that you could imagine a climate model by posing the following questions: If you were to look around at the clouds, the sky, the plants, the people, the landscape, and the streams, and ask, how do I represent these things as numbers? How do I represent how these things will change? How do I represent how these things interact with each other?
If you imagine developing the operating system for a computer, there are certain well-defined tasks that need to be done, and it is possible to check with some precision whether or not you have accomplished the task. In climate modeling such precise definition is not possible, which means there is always an element of scientific judgment needed to evaluate whether or not the development of a component or sub-component has been successful. And there is no reason to expect that combining successful sub-components and components yields a functioning climate model. Some would state that building, evaluating, and deploying a successful climate model is not just a matter of building software, but is a combined science-software activity. There is concern that community approaches that have been successful for task-oriented software projects cannot adequately incorporate the scientific integrity needed for proper climate model evaluation. This need to maintain science-based evaluation is perhaps the most formidable hurdle that must be addressed, not only for the ambitious goal I outlined of configurable models for use by non-experts, but even for broader inclusion of the expert community.
I will end this entry here. Note a couple of new things below.
Another Big Flood
There have been a lot of big floods in the past year. Now we have the record flood in Australia (a great summary in the Boston Globe). I argued that the 2010 flood in Pakistan brought together people, geography, societal assets, wealth, weather, and climate in such a way that it was a case study in climate disaster. So does the Australian flood, but it is, perhaps, on the opposite side of the scale.
Figure 1. From the Australian flood. Taken from the excellent summary at the Boston Globe.
Pakistani Flood Relief Links
Doctors Without Borders
The International Red Cross
MERLIN medical relief charity
U.S. State Department Recommended Charities
The mobile giving service mGive allows one to text the word "SWAT" to 50555. The text will result in a $10 donation to the UN Refugee Agency (UNHCR) Pakistan Flood Relief Effort.
Portlight Disaster Relief at Wunderground.com
An impressive list of organizations