by Terry Oldberg
This essay continues the argument which I initiated in Part I. To summarize, in Part I, I described a kind of model that was a procedure for making inferences. One kind of inference was a prediction from a known state of nature called a “condition” to an uncertain state of nature called an “outcome.” Conditions and outcomes were both examples of abstracted states. I pointed out that sets of conditions of infinite number could be defined on the Cartesian product space of a model’s independent variables and that each of these sets defined a different model. Thus, models of infinite number were candidates for being built.
Using this description of a model, I posed the problem of induction. As I posed it, this problem was to identify that unique model for which no inference made by it was incorrect. The correct inferences were identified by the principles of reasoning. The problem of induction was to identify these principles.
I introduced a terminological convention under which conditions were called “patterns” when they belonged to that unique model for which no inference made by it was incorrect.
I pointed out that events were of two kinds. “Observed” events were a product of observational science. “Virtual” events were a product of theoretical science. A model could be built from observed events or virtual events or both kinds of events.
A path toward a solution
A path toward a solution for the problem of induction opens up when one realizes that when we make a deductive inference to the outcome of an event of specific description, sometimes we err. In a sequence of statistically independent observations, the relative frequency with which we do not err can be measured. As the number of observations increases, sometimes we observe that the numerical value of the relative frequency gives the appearance of approaching a stable value. This value is called the “limiting relative frequency.” The limiting relative frequency is the empirical counterpart of the theoretical entity which we call “probability.”
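To make the notion of a limiting relative frequency concrete, here is a minimal simulation sketch (the event, the sample sizes and the random seed are illustrative assumptions of mine, not part of the argument):

```python
import random

random.seed(0)  # illustrative seed, for reproducibility only

def relative_frequency(num_trials):
    """Relative frequency of the event 'the die shows an even face' in num_trials rolls."""
    hits = sum(1 for _ in range(num_trials) if random.randint(1, 6) % 2 == 0)
    return hits / num_trials

for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"{n:>7} observations: relative frequency = {relative_frequency(n):.4f}")
# The printed values give the appearance of approaching a stable value near 0.5,
# the empirical counterpart of the probability of the event.
```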
The advent of the probabilistic logic
Writing in the 16th century, Girolamo Cardano took this path. In doing so, he provided the first description of the mathematical theory of probability. Though Cardano could not have known it, for measure theory would not be described for another four centuries, a probability was an example of a measure. In particular, it was the measure of an event.
In its mathematical aspects, probability theory was rooted in the deductive branch of logic. However, in its logical aspects, probability theory potentially extended beyond the deductive logic and into the inductive branch of logic.
With it extended in this way, I’ll call this logic the “probabilistic logic.” In the deductive branch of logic, every proposition is associated with a variable that is called its “truth value.” The truth value takes on the values of true and false. In the probabilistic logic, the truth value of the proposition that an event of a particular description will be observed is replaced by the probability of this proposition. When the values of the probability are restricted to 0 or 1, this logic reduces to the deductive logic, for 0 for the probability corresponds to false and 1 to true. Otherwise, this logic is inductive.
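One way to see the reduction, using the entropy that appears later in this essay as the gauge of missing information: when every probability is 0 or 1, no information is missing and the logic behaves deductively; otherwise the logic is inductive. A minimal sketch, with probability assignments of my own choosing:

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy in bits; terms with probability 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

deductive = [0.0, 1.0, 0.0]    # every proposition definitely false or true
inductive = [0.25, 0.5, 0.25]  # probabilities strictly between 0 and 1

print(entropy_bits(deductive))  # 0.0 -> no missing information: the deductive case
print(entropy_bits(inductive))  # 1.5 -> missing information: the inductive case
```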
The probabilistic logic having been described, I’ll proceed with the task of describing its principles of reasoning.
The law of non-contradiction
To commence discovery of the principles of reasoning, I’ll observe that the law of non-contradiction is a principle of reasoning. Unlike the other principles of reasoning, it cannot be derived. Rather, it serves as a part of the definition of what is meant by “logic.” By the definition of “logic,” a proposition is false if self-contradictory.
The principle of entropy maximization
Acting intuitively, Cardano stumbled into making an application of a principle of reasoning before this principle was articulated. This was the principle of entropy maximization.
When a model makes an inference to the numerical value that is assigned to the probability that an event of a particular description will be observed, inferences of infinite number are candidates for being made. Each of these inferences corresponds to a different numerical value for this probability. Which of these inferences is correct? This is a question that is posed by the problem of induction.
In the context of this question, it can be proved that the quantity which we call the “entropy” is the unique measure of an inference from state space A to state space B in which A contains a single state and this state is abstracted from the states in B.
Additionally, it can be proved that if and only if the states belonging to B are at the level of least abstraction, then the entropy possesses a maximum. I'll call the states at the level of least abstraction the "ways in which an abstracted state can occur" or "ways" for short. In the roll of a pair of dice, there are 36 ways in which an abstracted state can occur, of which 2 ways are associated with the abstracted state (1, 2) OR (2, 1). Here, (1, 2) signifies that 1 dot is facing upward on die A and 2 dots are facing upward on die B, while (2, 1) signifies that 2 dots are facing upward on A while 1 dot is facing upward on B.
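The counts in the dice example can be checked by brute-force enumeration; the following sketch is my own illustration:

```python
from itertools import product

ways = list(product(range(1, 7), repeat=2))  # all (die A, die B) outcomes
abstracted_state = {(1, 2), (2, 1)}          # the abstracted state "(1, 2) OR (2, 1)"

print(len(ways))                                      # 36 ways in total
print(sum(1 for w in ways if w in abstracted_state))  # 2 ways occur in the abstracted state
```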
In view of the uniqueness of the entropy as the measure of this kind of inference and provided that the states of B are examples of ways, the inductive question can be answered by optimization.
In particular, that inference is correct which maximizes the entropy, under constraints expressing the available information. As will be shown later, the entropy of the inference is the missing information in this inference about the state in B for a deductive conclusion about this state, given the state in A. Maximization of the entropy pulls the missing information upward. The constraints push the missing information downward by the amount of the available information. The constraints are applied mathematically but they reflect information collected by observations made in nature.
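As an entirely illustrative sketch of maximization under a constraint, consider assigning probabilities to the six faces of a single die when the only available information is that the mean number of dots is 4.5 rather than the fair value of 3.5. The maximum-entropy assignment is an exponential tilt of the uniform distribution, and the Lagrange multiplier can be found by bisection; with no constraint at all, the same machinery returns the uniform assignment. (The die, the constraint and the numerical method are my assumptions, not part of the essay.)

```python
import math

faces = [1, 2, 3, 4, 5, 6]

def tilted(lam):
    """Maximum-entropy distribution over the faces for Lagrange multiplier lam."""
    weights = [math.exp(lam * k) for k in faces]
    total = sum(weights)
    return [w / total for w in weights]

def mean(dist):
    return sum(k * p for k, p in zip(faces, dist))

def solve_for_mean(target, lo=-10.0, hi=10.0, tol=1e-10):
    """Bisect on the multiplier until the constrained mean is matched."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if mean(tilted(mid)) < target:
            lo = mid
        else:
            hi = mid
    return tilted((lo + hi) / 2.0)

def entropy_bits(dist):
    return -sum(p * math.log2(p) for p in dist if p > 0)

unconstrained = tilted(0.0)        # no constraint: uniform, entropy log2(6), about 2.585 bits
constrained = solve_for_mean(4.5)  # constraint: mean number of dots = 4.5

print([round(p, 4) for p in unconstrained], round(entropy_bits(unconstrained), 4))
print([round(p, 4) for p in constrained], round(entropy_bits(constrained), 4))
# The constraint pushes the entropy (the missing information) below its unconstrained maximum.
```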
The principle that the model builder should maximize the entropy under the constraints is called the "principle of entropy maximization." It joins the law of non-contradiction as one of several principles of reasoning for the probabilistic logic.
Cardano’s interest was in modeling games of chance. The designs of these games defined state-spaces containing the ways in which an abstracted state could occur. As I’ve already pointed out, in the throw of a pair of dice, there are 36 ways.
Though unaware of the principle of entropy maximization, Cardano maximized the entropy without constraints. This procedure uniquely identified an inference to the numerical values that should be assigned to the probabilities of the ways. Maximization of the entropy without constraints assigned equal numerical values to the probabilities of the ways. Thus, for example, it assigned 1/36 to the probability of each way in which an abstracted state could occur in a throw of two dice. By this assignment, Cardano’s model provided no information to the user of this model about the way in which an abstracted state would occur. A game that had this characteristic of providing no information was called “fair.”
The probability of an abstracted state was the sum of the probabilities of the ways in which it could occur. Thus, for example, the probability of the abstracted state (1, 2) OR (2, 1) was 1/36 + 1/36.
Let t designate the number of ways in which an abstracted state can occur. Let f designate the number of ways in which this abstracted state cannot occur. Let Pr designate the probability of observing the event of the abstracted state. By entropy maximization without constraints one builds the inference that
Pr = t / (t + f) (1)
In equation (1), t is the frequency of virtual events of a particular abstracted state while t + f is the frequency of virtual events of any description.
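A minimal sketch of equation (1) applied to the dice example above, with t = 2 and f = 34 for the abstracted state (1, 2) OR (2, 1):

```python
def pr_virtual(t, f):
    """Equation (1): probability from the frequencies of virtual events."""
    return t / (t + f)

print(pr_virtual(2, 34))  # 2/36, about 0.0556, for the abstracted state (1, 2) OR (2, 1)
```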
Tying the probabilistic logic to observational science
In arriving at the rule that equal numerical values were assigned to the probabilities of the ways in which an abstracted state could occur, Cardano had made a purely theoretical argument. In order for the probabilistic logic to be tied to observational science, a mathematical procedure had to be found by which observed frequencies influenced assignments of numerical values to the probabilities that events of various descriptions would be observed. In the 18th century, the mathematician Thomas Bayes and the mathematical astronomer Pierre-Simon Laplace independently responded with a proposed solution. Their solution lay in the theorem from probability theory that became known as "Bayes' inverse probability theorem." Purportedly, this theorem proved the existence of a function that mapped event-descriptions, together with the frequencies of these descriptions in observed events, to the probability values of these descriptions. This function became known as the "posterior probability distribution function" (posterior PDF).
However, there was a catch. The catch was that an input to the posterior PDF was the “prior PDF.” The latter was similar to the posterior PDF but differed in the respect that it supposedly was known in the absence of observational data.
How could the prior PDF be known when there was no empirical evidence? In effect, Bayes and Laplace argued that the probabilities of the prior PDF must have equal numerical values because there was insufficient reason to argue otherwise. The logician John Venn countered that the choice of equal probability values was arbitrary, thus violating the law of non-contradiction.
The followers of Venn sought means for accomplishment of the same task that required no prior PDF. The solution which they found was called “frequency probability.” This was a definition of the word “probability” under which the probability of observing an event of given description was identical to the limiting relative frequency of this event description. For the followers of Venn, probability was not just the theoretical counterpart of the limiting relative frequency; probability was the limiting relative frequency! This was a purely empirical definition of “probability” and stood in stark contrast to Cardano’s purely theoretical definition.
The followers of Bayes and Laplace became known as “Bayesians.” The followers of Venn became known as “frequentists.” The conflict between the two camps continues to this day.
From reading the literature one might get the impression that in building a model it is necessary to choose between the approach of the Bayesians and the approach of the frequentists. However, to choose between these two is logically unpalatable for both approaches violate the law of non-contradiction.
Excepting special circumstances, the critique of Bayesianism that is offered by the frequentists is accurate: the selection of the prior PDF really is arbitrary. However, frequentism has an element of arbitrariness of its own.
Under frequentism, supposedly a model exists in nature which is complete save the numerical values of its parameters. This supposition is wrong, for in nature there are not parameterized models for us to observe. Models belong to the theoretical world and not to nature. In nature, we observe events and their descriptions. We do not observe parameterized models. Thus, the choice of parameterized model is arbitrary, violating the law of non-contradiction.
Bayesianism fails to tie the probabilistic logic to observational science in a logical manner. Frequentism has the same failing. Fortunately, there is a third alternative that makes this tie in a logical manner. This alternative is the principles of reasoning.
The advent of thermodynamics
In the 19th century, the physicist Rudolf Clausius found, in data on the efficiencies of steam engines, a previously unrecognized property of matter. Clausius called this property “entropy.” The entropy became a key ingredient of the theory of heat which became known as “thermodynamics.” Under the first law of thermodynamics, the energy of a closed system was conserved. Under the second law, the entropy of a closed system rose to a maximum.
Later in the same century, Ludwig Boltzmann and Willard Gibbs discovered, in effect, that the entropy was the measure of an inference to a state-space whose states were the ways in which the abstracted state called the “macrostate” could occur. The ways became known as “microstates” for they described a chunk of matter in microscopic detail. The entropy of the inference to the microstate of a chunk rose to a maximum under the constraint of energy conservation. In this way, the second law of thermodynamics was an application of the principle of entropy maximization.
Measure theory
Early in the 20th century, Henri Lebesgue published measure theory. Under the theory, a measure was a real-valued function on a class of sets. Measure theory had a pair of precepts. They were:
- the measure of the empty set was nil and,
- the measure of a union of disjoint sets was the sum of the measures of the individual sets.
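A toy numerical check of the two precepts, using the probability measure on the 36 dice ways as the example (the choice of example is mine):

```python
from itertools import product
from fractions import Fraction

ways = set(product(range(1, 7), repeat=2))   # the 36 ways of the two-dice example

def measure(event):
    """Probability measure of an event (a set of ways)."""
    return Fraction(len(event & ways), 36)

print(measure(set()))                             # 0: the measure of the empty set is nil
a, b = {(1, 2)}, {(2, 1)}                         # two disjoint events
print(measure(a | b) == measure(a) + measure(b))  # True: additivity over disjoint sets
```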
Entropy and information
Measure theory forges a link between the ideas of entropy and information. It follows from the precepts of measure theory that if A is a set and B is a set, the measure of the set difference B – A is the measure of B less the measure of the intersection of A with B.
Now, let A designate a state-space and let B designate a state-space. Let the set difference B – A designate the inference from A to B. From the precepts of probability theory it can be proved that the conditional entropy is the unique measure of this inference in the probabilistic logic.
Under measure theory, A, B, and the intersection of A with B have measures, just as B – A does. The measure of A is the entropy of A. The measure of B is the entropy of B. The measure of the intersection of A with B is the information about the state in B given the state in A, as the word "information" is defined by the developer of information theory, Claude Shannon. Conversely, the measure of this intersection is the information about the state in A given the state in B.
It follows from the semantics that Shannon imparted to the word "information" that the conditional entropy of the inference from A to B is the missing information in this inference for a deductive conclusion about the state in B given the state in A. Provided that A does not intersect with B, knowledge of the state in A provides no information about the state in B; under this circumstance, the conditional entropy of the inference from A to B reduces to the entropy of this inference. Thus, the entropy of the inference is the missing information in this inference for a deductive conclusion about the state in B. Whether or not knowing the state in A provides information about the state in B, the measure of the inference from A to B is the missing information in it for a deductive conclusion about the state in B given the state in A, and this missing information is the unique measure of the inference. This finding is of great significance for philosophy, as I'll soon show.
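The identity asserted here, that the conditional entropy of the inference from A to B is the entropy of B less Shannon's information between A and B, can be checked numerically for any joint distribution. The joint distribution below is an arbitrary illustration of mine:

```python
import math

# Joint probabilities Pr(a, b) over a small state-space A x B (illustrative values).
joint = {
    ("a1", "b1"): 0.30, ("a1", "b2"): 0.10,
    ("a2", "b1"): 0.20, ("a2", "b2"): 0.40,
}

def h(dist):
    """Shannon entropy in bits of a distribution given as a dict of probabilities."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

p_a, p_b = {}, {}
for (a, b), p in joint.items():
    p_a[a] = p_a.get(a, 0) + p
    p_b[b] = p_b.get(b, 0) + p

h_b_given_a = h(joint) - h(p_a)                 # conditional entropy H(B|A)
mutual_information = h(p_a) + h(p_b) - h(joint) # Shannon's information I(A;B)

print(round(h_b_given_a, 6))
print(round(h(p_b) - mutual_information, 6))    # equals H(B|A): the missing-information identity
```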
Optimization
That the missing information in an inference exists as the measure of this inference and that this measure is unique has the consequence that the problem of induction can be solved by optimization. There is a "principle of entropy maximization," which I've already covered. There is a "principle of conditional entropy minimization," which I'll cover immediately below. There is a "principle of maximum entropy expectation," which I'll cover later. Each of these principles satisfies the law of non-contradiction and is consistent with all of the other principles of reasoning for the probabilistic logic.
The principle of minimum conditional entropy
From part I, please recall that a model makes an inference from a known condition in a set of conditions to the uncertain outcomes of statistical events. Sets of conditions of infinite number are possibilities. Working in concert with all of the other principles of reasoning, the principle of minimum conditional entropy selects that unique set of conditions which maximizes the information about the outcome, given the condition. By terminological convention, these conditions are called “patterns.”
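As a sketch of how the principle might be applied in practice (the data and the candidate sets of conditions are assumptions of mine), one computes the conditional entropy of the outcome given the condition for each candidate set of conditions and selects the set for which this entropy is smallest:

```python
import math
from collections import Counter

# Observed events: (value of an independent variable, outcome).  Illustrative data.
events = [(1, "wet"), (2, "wet"), (3, "dry"), (4, "dry"), (5, "wet"), (6, "dry"),
          (1, "wet"), (2, "dry"), (3, "dry"), (4, "wet"), (5, "wet"), (6, "dry")]

# Two candidate sets of conditions (partitions of the independent variable's values).
candidates = {
    "low/high": lambda x: "low" if x <= 3 else "high",
    "odd/even": lambda x: "odd" if x % 2 else "even",
}

def conditional_entropy(assign_condition):
    """H(outcome | condition) in bits for a candidate partition."""
    joint = Counter((assign_condition(x), y) for x, y in events)
    cond = Counter(assign_condition(x) for x, _ in events)
    n = len(events)
    return -sum((k / n) * math.log2(k / cond[c]) for (c, _), k in joint.items())

for name, partition in candidates.items():
    print(name, round(conditional_entropy(partition), 4))
# The candidate with the smallest conditional entropy leaves the least missing
# information about the outcome, given the condition.
```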
The principle of maximum entropy expectation
In the discovery of the set of patterns, the model builder must compute the conditional entropy that is associated with each set of conditions in the class of sets of all possible conditions. Before calculating the conditional entropy corresponding to a particular set of conditions, it is necessary to assign numerical values to the probabilities of the conditions and to the probabilities of the outcomes, given the conditions. Under the language of mathematical statistics, a certain procedure for doing so is both the “unbiased estimator” and the “maximum likelihood estimator.” This procedure makes the assignment
Pr = x / n (2)
where x is the frequency of observed events of a particular description and n is the frequency of observed events of any description. Note that equation (2) makes an inference that is based entirely upon the frequencies of observed events. In this respect, equation (2) differs diametrically from equation (1), which makes an inference that is based entirely upon the frequencies of virtual events. The inference that is made by equation (1) belongs entirely to the theoretical world while the inference that is made by equation (2) belongs entirely to the empirical world. Generally the two inferences differ in the assignments that they make. That they differ violates the law of non-contradiction.
If equation (2) is used with conditional entropy minimization, the effect is to discover patterns for which each pattern is based upon a single observed event. The finding of the existence of each pattern lacks statistical significance and thus the model fails when tested. The failure is observable as a disparity between predicted probability values and observed relative frequency values. The failure can be traced to presumption by equation (2) of more information than is available in the observed events.
Entropy maximization provides a route of escape from this predicament. The possibility of applying the principle of entropy maximization to the problem of assigning a numerical value to Pr arises from the following set of facts. In 1 trial of an experiment, the relative frequency of a particular event-description will be 0 or 1. In 2 trials, the relative frequency will be 0 or ½ or 1. In N trials, the relative frequency will be 0 or 1/N or 2/N or…or 1. Note that the distance between adjacent values of the relative frequency possibilities is 1/N signifying that the relative frequency possibilities are evenly spaced in the interval along the real line that lies between 0 and 1.
Now let N increase without limit. The relative frequencies become limiting relative frequencies. The limiting relative frequency possibilities are 0, 1/N, 2/N,…,1. As they are states at the level of least abstraction, these possibilities are examples of ways in which an abstracted state can occur. It follows that the principle of entropy maximization can be applied to the problem of identifying the correct inference to the set of limiting relative frequency possibilities. Under this principle, the entropy of the inference to the state-space { 0, 1/N, 2/N,…,1 } is maximized, under constraints expressing the available information. As it turns out, maximization of the entropy provides a unique answer to the question of the prior PDF over the values in the state-space { 0, 1/N, 2/N,…,1 } and in this way, violation of the law of non-contradiction by the use of Bayes’ theorem is avoided. This invention of Ron Christensen exploits a loophole in the generalization that the prior PDF is arbitrary.
Given the existence of the prior PDF, the posterior PDF is determined by the observed frequencies of events of particular descriptions through Bayes’ theorem. The numerical value which is assigned to the probability of a condition or to the probability of an outcome, given a condition, is then the expected value of the posterior PDF.
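A minimal numerical sketch of this machinery, under the simplifying assumption that unconstrained entropy maximization yields a uniform prior over the limiting-relative-frequency possibilities {0, 1/N, 2/N, ..., 1} (the grid size and the observed counts are illustrative): Bayes' theorem gives a posterior over these possibilities, and the assigned probability is its expected value, which in this special case approaches the classical (x + 1) / (n + 2) rule.

```python
# Posterior expected value over the relative-frequency possibilities {0, 1/N, ..., 1},
# assuming (for illustration) a uniform prior produced by unconstrained entropy maximization.
N = 1000                      # grid fineness (illustrative)
x, n = 3, 10                  # observed "hits" and total observed events (illustrative)

grid = [k / N for k in range(N + 1)]
prior = [1.0 / (N + 1)] * (N + 1)
likelihood = [p ** x * (1 - p) ** (n - x) for p in grid]

unnormalized = [pr * lk for pr, lk in zip(prior, likelihood)]
total = sum(unnormalized)
posterior = [u / total for u in unnormalized]

expected = sum(p * w for p, w in zip(grid, posterior))
print(round(expected, 4))            # close to 0.3333
print(round((x + 1) / (n + 2), 4))   # the classical (x + 1) / (n + 2) = 0.3333
```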
The forms of the prior and posterior PDFs vary depending upon the kinds of information that are available. In practice, they often take on the form of a Beta distribution. In this circumstance, maximum entropy expectation makes the assignment
Pr = (t + x) / (t + f + n) (3)
In equation (3), note that the value assigned to Pr lies between the purely theoretically based value of equation (1) and the purely empirically based value that is produced by equation (2). By equation (1), the information about the way in which an abstracted state will occur is nil. By equation (2), this information is overstated. As the value that is assigned by equation (3) lies between the value that is assigned by equation (1) and the value that is assigned by equation (2), there is the possibility of correctly representing the information that is available. In particular, t + f can be viewed as a parameter of the model and empirically tuned to that value for which the available information is correctly represented. In an elaboration of this idea, the virtual events that are generated by a mechanistic model (for example, a mechanistic climate model resembling today’s IPCC climate models) are conceived to provide an additional constraint on entropy maximization and these virtual events are weighted such that the information content of them is correctly represented.
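The interpolation can be seen directly. In the sketch below (the counts and the candidate weights are illustrative), varying the a priori weight t + f slides the assignment of equation (3) between the purely empirical value of equation (2) and the purely theoretical value of equation (1):

```python
def pr_eq1(t, f):
    return t / (t + f)            # equation (1): virtual events only

def pr_eq2(x, n):
    return x / n                  # equation (2): observed events only

def pr_eq3(t, f, x, n):
    return (t + x) / (t + f + n)  # equation (3): maximum entropy expectation

x, n = 3, 10                      # illustrative observed frequencies
t_ratio = 0.5                     # illustrative theoretical value of t / (t + f)

print("equation (2):", pr_eq2(x, n))        # 0.3, purely empirical
for weight in (1, 10, 100, 1000):           # candidate values of the a priori weight t + f
    t, f = t_ratio * weight, (1 - t_ratio) * weight
    print("t + f =", weight, "->", round(pr_eq3(t, f, x, n), 4))
print("equation (1):", pr_eq1(1, 1))        # 0.5, purely theoretical
# Small weights give values near x/n; large weights give values near t/(t+f).
# In practice the weight is tuned so that the available information is correctly represented.
```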
These are the ideas underlying Christensen’s principle of maximum entropy expectation. This procedure is unique in representing the available information correctly. By defining a unique procedure for assignment to Pr, the principle of maximum entropy expectation avoids violation of the law of non-contradiction.
Reduction to the deductive logic
To recapitulate, I’ve provided an argument for the existence of principles of reasoning for the probabilistic logic. These principles are:
- The law of non-contradiction,
- The principle of entropy maximization,
- The principle of conditional entropy minimization and,
- The principle of maximum entropy expectation.
You should understand that my claim is that these are the principles of reasoning for the deductive branch of logic as well as for the inductive branch of the probabilistic logic. These principles relate to traditional thinking about the deductive logic in that, under them, one can construct a model that makes the arguments known as Modus Ponens and Modus Tollens.
The nature of demonstrable knowledge
An interesting sidelight to discovery of the principles of reasoning for the probabilistic logic is that this discovery provides a logical definition for the Latin word scientia meaning “demonstrable knowledge” in English. The English word “science” is rooted in scientia. Thus, provision of a logical definition for scientia clarifies what one means by “science.”
Under its definition in the probabilistic logic, scientia is the information about the outcomes of events, given the associated patterns. Through conformity to the principles of reasoning in the construction of a model, the maximum possible scientia is created from fixed informational resources. The scientia is created by the discovery of patterns. Thus, the role of the scientific investigator is to support pattern discovery.
Neither the Bayesian approach to the construction of a model nor the frequentist approach discovers patterns. Thus, both approaches are stumbling blocks rather than helpmates for the scientific investigator.
Empirical support
As logic is a science, its principles are subject to falsification by the empirical evidence. There seems to be general agreement that the deductive branch of logic has frequently been tested but has never been falsified by the resulting evidence. Similarly, the inductive branch of the probabilistic logic has frequently been tested but has never been falsified by the resulting evidence.
In its application to the second law of thermodynamics, the principle of entropy maximization is continuously tested in the real world by the machines and processes which engineers construct on the assumption that entropy maximization is a law of nature. If a machine or process were to be discovered that violated entropy maximization, the principle of entropy maximization would be falsified by the evidence.
In communications theory, the information rate of a communications channel is the entropy of the encoder less the conditional entropy of the decoder. The information rate is maximized by maximizing the entropy of the encoder; this is an application of the principle of entropy maximization. The information rate is maximized by minimizing the conditional entropy of the decoder; this is an application of the principle of conditional entropy minimization. Virtually all modern communications channels incorporate encoders and decoders. There are huge numbers of these devices and they are used continuously. If an encoder were to be discovered whose performance exceeded the maximum entropy limit, then the principle of entropy maximization would be falsified by the evidence. If a decoder were to be discovered whose performance exceeded the minimum conditional entropy limit then the principle of conditional entropy minimization would be falsified by the evidence.
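A sketch of these two limits on the simplest possible channel, the binary symmetric channel (the crossover probability and the candidate encoder distributions are my illustrative choices): the information rate I(X;Y) = H(Y) - H(Y|X) is largest when the encoder's output distribution has maximum entropy, i.e. is uniform.

```python
import math

def h2(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def information_rate(p_one, eps):
    """I(X;Y) for a binary symmetric channel with crossover probability eps
    and encoder output distribution (1 - p_one, p_one)."""
    p_y1 = p_one * (1 - eps) + (1 - p_one) * eps   # probability that a 1 is received
    return h2(p_y1) - h2(eps)                      # H(Y) - H(Y|X)

eps = 0.1
for p_one in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p_one, round(information_rate(p_one, eps), 4))
# The rate peaks at p_one = 0.5 (maximum-entropy encoder); its value there,
# 1 - h2(eps), is the channel capacity.
```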
Tests of maximum entropy expectation are reported in the works of Christensen and his colleagues. Tests of the entire set of principles of reasoning for the probabilistic logic are reported by Christensen in a 1986 paper (Int. J. General Systems, 1986, Vol. 12, pp. 227-305). In the paper, Christensen compares performance in every case for which, at the date of writing, a model had been constructed under the principles of reasoning for the probabilistic logic and at least one model had been constructed from the same data without them. In every case, the model that had been constructed under the principles of reasoning exhibited superior performance. In some cases, the degree of outperformance was of an astounding order of magnitude.
A fuzzy logic
In the development of my topic, I have tacitly assumed that the elements of partitions of Cartesian product spaces are crisply defined sets. By this assumption, the probabilistic logic is created. By relaxation of this restriction, a generalization from the probabilistic logic can be created with principles of reasoning that are similar to those of the probabilistic logic. This example of a “fuzzy logic” has uses in model building when it is necessary for the abstracted states in a partition of the Cartesian product space of the independent variables or of the dependent variables of the model to be ordered.
Applications in meteorology
In meteorology, models built under the probabilistic logic and its principles of reasoning have consistently outperformed models built under the method of heuristics; the outperformance has been of an astounding order of magnitude. In the year 1978, a team that included an engineer-physicist (Christensen), a criminologist and a psychologist but no meteorologists set out to try to build the first long range weather forecasting model. Initially, the plan called for generation of virtual events by a general circulation model (GCM). However, this facet of the plan had to be aborted when it was discovered that the GCM was numerically unstable over periods of prediction of more than a few weeks.
In the construction of the model, more than 100,000 time series were examined for their potential for providing information about the outcome. These time series included ones that provided measurements of tree ring widths, stream flows, sea surface temperatures, atmospheric temperatures, atmospheric pressures, Palmer drought severity index and other measurements.
In 1980, Christensen et al published the characteristics of the first weather forecasting model to be built under the principles of reasoning for the probabilistic logic. At the time, centuries of meteorological research using the method of heuristics had made it possible to predict the weather with statistical significance no more than 1 month in advance. By using publicly available data but replacing the method of heuristics with the principles of reasoning, Christensen et al extended the time span over which precipitation could be forecasted to 12 to 36 months. This was a factor of 12 to 36 improvement, and this improvement produced the first successful long range weather forecasting model. Later, replacement of the method of heuristics by the principles of reasoning was tried in building long-range weather forecasting models for all of the far Western states of the U.S.; the outcomes of these models included whether or not the spatially and temporally averaged temperature at Earth's surface would be above the median as well as whether the spatially and temporally averaged precipitation would be above the median. The resulting models exhibited startling improvements in performance over heuristically based models (Int. J. General Systems, ibid.).
A barrier to application
There is a barrier to construction in IPCC climatology of a model under the probabilistic logic and its principles of reasoning. This is that, as IPCC climatology has thus far been conceived, it has many fewer observed events than meteorology.
In the work of Christensen et al that produced the first long range weather forecasting model, the weather was averaged over 1 year. Time series extending backward in time to the year 1852 provided 126 independent observed statistical events. By comparison, it is not unusual for an epidemiological study to have 100,000 independent observed events at its disposal.
For IPCC climatology, the situation is much worse than for meteorology because the averaging period is much longer. If the averaging period had been 10 years, Christensen et al would have had only 12 observed events at their disposal. If the averaging period had been 20 years, they would have had only 6 observed events and if the averaging period had been 100 years, they would have had only one observed event. In any case, there would have been far too few observed events for patterns to be discovered. The attempt at building a long range weather forecasting model would have failed.
In order for a successful climatological model to be built under the probabilistic logic and its principles of reasoning, it would be necessary to have available many more observed events than 1, 6 or 12. In order for this to happen, it would be necessary to reorient the plan for the research such that it yielded these events. This would result in the model being based upon paleoclimatological data; a consequence would be that it would be necessary to use a proxy for the surface temperature rather than the surface temperature itself.
Separating climatology from politics
Some of us favor separation of climatology from politics. The grounds for separation are suggested by the slogan that “science is value-free.” On the other hand, politics is value-laden.
How to go about making a field of inquiry value free is not completely obvious. Interestingly, an argument from the probabilistic logic establishes a procedure which, if followed, would make climatology value-free. Under the probabilistic logic an inference has a unique measure that is unrelated to values such as the costs or benefits from regulating CO2 emissions. Under the probabilistic logic, the measure of an inference is uniquely the missing information in this inference for a deductive conclusion. By religiously measuring inferences by the missing information in them, climatologists would make their field value-free.
Resources for further learning
Readers who wish to learn more about my topic will find a bibliography at Ron Christensen's web site and at my web site. The tutorial at my Web site could be helpful. For readers with serious interest in the topic, I recommend engagement of a competent tutor over trying to come up to speed by reading the literature, as tutoring would be far more cost-effective than unguided study.
Terry was so busy working on Part II that he didn’t reply to comments on Part I; he is now available for discussion on both Parts I and II.
I’d like to have a PDF version of Terry’s article. Instead of converting these Web pages in Judith’s blog, may one find a PDF version somewhere else?
Including part I and II, I mean.
Regarding your statement “…Christensen et al extended the time span over which precipitation could be forecasted to 12 to 36 months.” (In 1980)
So you folks claim to have been able to predict precipitation 12 to 36 months out for the last 30 years. Where can we see that having been done over an extended period of time? Who is running this model?
David Wojick
I noted that W.J.R. Alexander has been predicting droughts in Southern Africa based on the 22 year Hale magnetic solar cycle. The three lowest years before a sunspot minimum had average flow of 52. The three following years average was 300. i.e. a 577% increase.
Alexander WJR, Bailey F, Bredenkamp DB, van der Merwe A and Willemse N, (2007). Linkages between solar activity, climate predictability and water resource development. Journal of the South African Institution of Civil Engineering, Vol 49, No 2, June 2007, pages 32-44, paper 659.
Alexander, W.J.R. (2008). The likelihood of a global drought in 2009-2016. Civil Engineering (Yeoville), Vol. 16, No. 6, pp. 22-26, June 2008.
Alexander, W.J.R. (2005). Development of a multi-year climate prediction model. Water SA, Vol. 31, No. 2, April 2005. ISSN 0378-4738.
And you believe this is really predictable? For a region this small?
What “small” region did you mean?
Alexander addresses the statistics on one river basin in his linkages (2007) paper. See Table 3.
Alexander addresses "The likelihood of a global drought in 2009-2016".
How do you compare Alexander’s 125 year statistics versus IPCC GWM “projections”?
I mean Southern Africa. I do not believe that the Hale magnetic cycle or anything else makes rainfall predictable at this scale. These are chaotic processes so intrinsically unpredictable. I have no faith in GCM’s either but whether global precipitation is chaotic or not on decade to century scales is still an open research question. It is probably the most important question in climate science.
I successfully predicted the 1983 drought in Swaziland based on my study of the regional cycles in the summer rainfall area as a necessary part of my work. It saved the town of Siteki on the Lubombo mountain, as the hospital had spent 4 years improving their water collection and storage at my behest. In 1983 it was the only place in town with water and was able to remain open. Only cyclone Demoina saved the maize crop by dumping rain once in January. Much of RSA was not so lucky.
At the moment Southern Africa in the summer rainfall area is coming to the peak of the wet end of the cycle (two more years). Sow and reap while it lasts!
The winter rainfall area (Cape Town) has a 10 year cycle for which there is 400 years of data: a clear sine wave.
David:
1) When you say "you folks claim" you mix up the provenance of the claim. This claim is made by the folks who did the study, among whom I am not.
2) I’m not aware of a published study covering the past 30 years.
3) When the model was built, the database of 126 observed events was randomly split into 2 equally populated parts. One of these parts was used in the construction of the model. The other part was used in the validation of the model.
In the validation, predictions were made of “wet” or “dry” in 49 of the 63 observed events. In the remaining 14 events, a prediction could not be made because no pattern was matched. The predictions were correct in 63% of the 63 observed events; this yielded a confidence level of 94% that the result was not due to random chance.
27 of the observed events were classified as "extremely wet" or "extremely dry" years. In these events, the predictions were correct in 88% of the observed events; this yielded a confidence level exceeding 99.8% that the result was not due to random chance.
Source: Final report for OWRT Contract #14-34-001-8409, September 5, 1980.
This one test is hardly a credible basis for claiming to be able to predict rainfall 12 to 36 months out. Any other evidence? Is this model running anywhere now?
David:
Your paragraph contains a strawman argument and a question. The strawman argument is that someone is claiming to be able to predict rainfall 12 to 36 months out. The actual claim is to be able to predict rainfall 12 to 36 months out with statistical significance at stated levels. This claim is supported by the evidence. Now regarding your question, I don't know the answer.
I admit that I did not read the whole argument very carefully, but it seems to me to be obvious that the argumentation has a fundamental error.
Calculating the entropies requires the choice of the measure, but choosing the measure is an arbitrary choice equivalent to the choice of the prior PDF in the Bayesian approach.
It is not possible to escape this arbitrariness. It is only possible to state it in different ways.
Pekka:
Entropy is an example of a measure. Under the probabilistic logic, it is the unique measure of an inference. In view of the existence and uniqueness, we can distinguish the one correct inference from the many incorrect ones by measuring the buggers and identifying as correct the one that is of greatest or least measure. This process is not arbitrary.
Hmm…. What are the units you are using to designate specific levels of entropy? How do you derive a numeric value for a specific argument? Do you use numeric values to compare different arguments in question? If not, how do you value one against another? I guess at this point I see that value judgments should be made to derive a final best model but am missing a description of how to rationally assign relative levels of entropy. Surely it is not just a matter of what feels right.
Gary:
Each inference that is a candidate for being made by a model has an entropy that is uniquely determined by the numerical values which are assigned to the probabilities of the states in the state-space to which these inferences are made. The ordering of these candidates is independent of the unit of measure of the entropy. Hence, the ordering of the candidates is by actual measurement of the various candidates and not by what "feels right."
Perhaps an example will help. If a basketball player had only the single measure of height, the coach could identify the best candidate for the school basketball team by measuring each of the candidates with a tape measure and selecting the candidate of greatest height. For real human candidates for a basketball team, this procedure would not work, for human candidates have many measures in addition to height. However, an inference has only a single measure.
For an inference, selecting the candidate of greatest entropy has the effect of maximizing the missing information. This having been accomplished, we drive the missing information downward by the exact amount of the available information. This results in the missing information being accurately represented by the model.
I’m sorry Terry but it does not help to give an example and then conclude that it would not work in the real world.
Gary:
I don’t know what works for you but thought the basketball player analogy might help.
This is not true. There is no unique way of determining entropy. Just as the posterior PDF depends on the prior PDF in the Bayesian approach, the measure of entropy depends on a basic measure determined arbitrarily.
This is very well known in physics and I cannot see any way of avoiding the same issue in your case. You are trying to reach an impossible goal and as in other cases, where people try that, you appear to have just made a trivial error.
Pekka:
In this set of circumstances, the terminology in which there is a "prior PDF" and a "posterior PDF" is misleading. In the procedure called "maximum entropy expectation," there is a feedback mechanism by which the prior PDF is dependent upon the posterior PDF. That the prior PDF is dependent upon the posterior PDF makes the "prior" and "posterior" terminology misleading. Because it is misleading, I use this terminology with misgivings and only because it is the traditional way of labelling the two PDFs.
The prior PDF is dependent upon the numerical value of the parameter which is designated by “t + f” in equation (3); Christensen calls this parameter the “a priori weight normalization.” The feedback mechanism homes in on a numerical value for the a priori weight normalization that minimizes the informational difference between predicted probability values and observed relative frequency values in a sample that is independent of the one from which the predicted probability values were extracted.
The prior PDF is dependent upon the a priori weight normalization which is dependent upon the informational difference which is dependent upon the posterior PDF. Thus, the prior PDF is dependent upon the posterior PDF making the “prior” and “posterior” terminology misleading. Also, by the feedback mechanism the prior PDF is uniquely determined by the observational data; in view of this uniqueness, the selection of the prior PDF is not arbitrary.
Terry,
We have a problem.
The laws of thermodynamics do not include centrifugal force. This is the energy the planet generates to counter gravity (electro-magnetics) and atmospheric pressure.
Solar heat can ONLY be absorbed, reflected or a combination. Alone, without planetary rotation, it is like the moon with no atmosphere.
Stop the planet dead and energy at 1669.8 km/hr will shear off the planet from the atmosphere being pulled around by this planet.
So going by current theories, they are incorrect as the pressure build-up is generating all kinds of nifty things that cannot be explained by AGW.
Joe:
By my use of the term "entropy," you might be getting the idea that my "entropy" is defined on the set of a system's microstates as it is in thermodynamics. However, while my "entropy" is a measure on a set of states, these states are not necessarily microstates. The state-space {centrifugal force absent, centrifugal force present} has an entropy, for example. The notion of "entropy" is perfectly capable of measuring inferences about centrifugal forces.
I got lost around “Entropy and information.” This would require probably more time than I have to get my arms around.
Jim:
To get a good feel for the entropy and information argument you need to spend a few days proving theorems for yourself. If you’ve not done that, the notion that the set difference between two state-spaces is an inference from one of these state-spaces to the other sounds really wild!
Terry, are there any text books that I could use as a better foundation? Preferably, one.
Terry’s website has tutorials. Let us know if you become a believer.
Probability Theory: The Logic of Science (Vol 1) by E.T. Jaynes
the reviews on amazon.com are quite amazing, i’ve also googled jaynes, i’m convinced, i’ve ordered the book.
See also:
Unofficial Errata and Commentary for E. T. Jaynes’s Probability Theory: The Logic of Science
Since the book was incomplete at Jaynes’ death there are plenty of errors, and, as the amazon reviews say, it gets a little rough towards the latter chapters. Good reason to work the examples out for yourself. Having said that; it is still highly recommended.
I consider his insightful take on Converging and Diverging Views worth the cost of the book.
Jim:
"An Introduction to Information Theory" by Fazlollah Reza would take you part way there. There is the need for a book that starts with measure theory, derives probability theory from measure theory, derives information theory from probability theory, proves that the missing information is the unique measure of an inference and from this basis solves the problem of induction. This would be fairly easy to do but I'm not aware of a book or article that does the whole job. I'm able to do the whole job and would write the book if I thought this would be profitable but my market research suggests it would be very unprofitable. The evidence is that very few people have yet grasped the stupendous importance for science and philosophy of the fact that the unique measure of an inference is the missing information in it for a deductive conclusion, or the fact that the existence and uniqueness of this measure support a solution to the problem of induction. There is no market for the book because people don't know what they don't know.
Hmmm … looks like if you got together some climate modelers, climatologists, professional statisticians, professional programmers, and Entropy Limited in one place; some good could come of it.
Wow. It seems as if an entire century of progress in epistemology never happened. Such fetishization of deductive inference probably reached its pinnacle with the publication of Whitehead and Russell's Principia Mathematica in 1910. Ever since then, it's been all downhill. So you say "There seems to be general agreement that the deductive branch of logic has frequently been tested but has never been falsified by the resulting evidence". Maybe this was true in 1910. But then along came Gödel and Wittgenstein, to demonstrate that no deductive system can be both sound and complete. Or did you miss that bit?
And all this worry over the problem of induction. Popper demonstrated that the formulation of “the problem” that you give is just plain wrong, and that scientific work doesn’t proceed by induction at all. Maybe you never read Popper.
According to you, the law of non-contradiction is a fundamental principle of reasoning. So you're obviously not aware of several decades of work on reasoning systems that don't depend on it (sometimes known as paraconsistent logics, or, in its strongest form, dialetheism). Scientific theories do not depend on non-contradiction. Go read Toulmin and Laudan, and you'll see that science has very little to do with either deduction, induction or heuristic reasoning.
But what’s the point anyway? Your entire essay on forms of reasoning was put together merely to support a conclusion that starts “Some of us favor separation of climatology from politics”. Such a non-sequitur! You babble on about reasoning systems, in order to get to this? Your argument is that any scientist who doesn’t follow your prescription for reasoning is politicized? Give me a break. You could apply the same argument to an entire century of scientific progress. *poof*. Quantum physics? Gone. Genomics? Gone. Neuroscience? Gone. Astrophysics? Gone. and so on. According to your reasoning, they are all politicized, and therefore wrong.
Or maybe it’s your grasp of epistemology that’s wrong.
PS The bit where you went round touting your ideas to the entire faculty of a university and were resoundingly rejected was amusing. Professors often have to fend off people like you with half-baked theories of everything. It's a professional hazard.
The ghost of Wittgenstein hovers around many of these threads. But it’s worth remembering that a career of ever thinner salami-slicing drove him nuts. Caveat pensor.
My philosophy of mind tutor told me that the philosophy of language was a bit like going to a good restaurant... and eating the menu. :)
I have suspected for some time that the end result of this may be simply a dispute (Swiftian or Wittgensteinian, as you will) over, and a plumbing of the limits of, language.
Somewhere I remember a story about Dr Johnson, who knew a bit about words, responding to some counterintuitive hypothesis he had heard by kicking a rock. I googled it, and here’s Boswell’s account.
“After we came out of the church, we stood talking for some time together of Bishop Berkeley’s ingenious sophistry to prove the nonexistence of matter, and that every thing in the universe is merely ideal. I observed, that though we are satisfied his doctrine is not true, it is impossible to refute it. I never shall forget the alacrity with which Johnson answered, striking his foot with mighty force against a large stone, till he rebounded from it — “I refute it thus.”
Boswell: Life”
Better to kick the wall than bang your head against it. ;)
Dr Johnson’s take on “observation trumps theory”….
Steve,
In your article on ‘plug compatibility and climate models’ you say that:
“So plug compatibility and code sharing sound great in principle. In practice, no amount of interface specification and coding standards can reduce the essential complexity of earth system processes.”
I put it to you that the real problem is the instability of the models and the ad hoc tuning which has to be done to get them to produce output which 'looks right'. This is the underlying reason for the difficulty of incorporating externally produced 'modules'.
It seems to me that Terry’s approach is aimed at a transparency that is naturally resisted by climate modelers. It would bring their tweaks out into the open.
I expect that you will vehemently deny this.
tallbloke: The models are not unstable. I don’t know where you got that from. If it were true, then spinning up the model would be pointless – it would never stabilize. The fact that the models do stabilize after a spin-up period is really quite remarkable – a testament to the validity of the modelling approach.
Some tuning is required to ensure the climatology matches that of the earth, but I don’t think it’s correct to call this ad hoc; it’s more to do with compensating for the approximations that are needed to make the models computationally tractable. What’s important here is that the modellers understand the trade-offs they’re making, in choosing, for example, which processes to resolve and which to parameterize for a particular scientific question. The need for tuning doesn’t invalidate a model – if it reproduces a particular observed pattern in specific earth system processes very well, but with the mean values consistently shifted, then tuning it just makes it into a more scientifically useful tool. Nobody is claiming the models are perfect, and nobody is presenting model results as truth.
The real problem is that people like Terry want all-or-nothing: if they can’t be given a model that is perfect (and always gives correct results), then they want to claim the whole thing is worthless. But that’s not how such modeling works, and that’s not how science in general proceeds. Models aren’t supposed to produce “truth”. They’re supposed to simulate (approximately) a particular set of theories about salient earth processes, in order to test our understanding of them. Modellers learn more when the model goes wrong than when it looks right. Because the models are learning tools, not generators of truth.
The other problem of course, is that people who don’t understand modeling (and that includes both Terry, and our host on this blog) go round claiming that policy prescriptions are being based on model outputs. Which is pure bullshit. Policy prescriptions are based on the sum total of our understanding of climatic processes, *some* of which was arrived at by *experimenting* with models. But of course, those people who are politically motivated to reject large parts of our knowledge of climatic processes find it relatively easy to mislead their audiences by lying about how models are actually used as scientific tools.
Steve, thanks for coming by and posting. Your posts are spot on and well articulated.
This is just a poor version of the “science is settled” argument. Steve first invokes “the sum total of our understanding of climatic processes” as though this were something real, as opposed to what individual people believe. Put another way, just who is this “our”? Am I in there? Are the skeptics in there, or just the AGW believers?
Then he explains the debate in terms of the usual suspects, ignorance and evil. Specifically — (1) lack of understanding, in this case "people who don't understand modeling" including Dr. Curry, and (2) "those people who are politically motivated to reject large parts of our knowledge of climatic processes." (Note the invocation of "our knowledge" again.)
The problem is that skeptics have a great deal of knowledge too. If “the sum total of our understanding of climatic processes” includes everyone’s understanding then it includes a lot of controversy and uncertainty as well. In fact there are certain aspects of the logic of modeling that I probably understand far better than Steve, like the relation between modeling and truth. Sorry Steve, but the people who disagree with you are neither ignorant nor evil. The scientific debate is real.
> This is just a poor version of the “science is settled” argument.
I believe it’s more an argument to the effect that Terry Oldberg has no idea what he’s talking about, both in epistemology and modelling.
As far as epistemology is concerned, Steve has a very strong case.
Steve’s remarks are hardly about Terry at all and I think my quotations make this clear. If you want to defend his case as “strong” please respond to my comments.
Steve Easterbrook's remarks started there. Here is a quote where he is claiming that Terry Oldberg knows nothing about epistemology:
> It seems as if an entire century of progress in epistemology never happened.
Here is another one:
> Maybe you never read Popper.
And another one:
> Go read Toulmin and Laudan, and you’ll see that science has very little to do with either deduction, induction or heuristic reasoning.
And a last one:
> Or maybe it’s your grasp of epistemology that’s wrong.
***
Steve's remarks have continued there. Here is a quote where he is arguing that Terry Oldberg knows nothing about modelling:
> The models are not unstable.
This remark was directed at Tallbloke, but unless I am incorrect, Oldberg does say something about instability.
Here is another one, where Easterbrook is criticizing a popular pea-and-thimble game against modelling:
> The real problem is that people like Terry want all-or-nothing: if they can’t be given a model that is perfect (and always gives correct results), then they want to claim the whole thing is worthless. But that’s not how such modeling works, and that’s not how science in general proceeds. Models aren’t supposed to produce “truth”. They’re supposed to simulate (approximately) a particular set of theories about salient earth processes, in order to test our understanding of them. Modellers learn more when the model goes wrong than when it looks right. Because the models are learning tools, not generators of truth.
Notice the name we emphasized. Here is another quote:
> The other problem of course, is that people who don’t understand modeling (and that includes both Terry, and our host on this blog) go round claiming that policy prescriptions are being based on model outputs. Which is pure bullshit.
Notice again the name we emphasized.
These two quotes should suffice to show that to say:
> Steve’s remarks are hardly about Terry at all […]
is plainly false, unless we are willing to weasel our way out by overinterpreting "hardly".
***
Finally, note this last quote from Steve Easterbrook:
> Nobody is claiming the models are perfect, and nobody is presenting model results as truth.
I presume Steve Easterbrook has read the IPCC's relevant work. This contradicts what Shub is claiming elsewhere in this thread and that David Wojick seems to endorse.
” The fact that the models do stabilize after a spin-up period is really quite remarkable – a testament to the validity of the modelling approach.”
And perhaps a reflection of the stability of Earth’s climate system?
“Bullshit”
“Terry, and our host on this blog) go round claiming that policy prescriptions are being based on model outputs.”
“Lying”
Are you sure you are justified in being so hostile? I think you’d need to show that they were saying policy prescriptions were being based on models *and nothing else*, to warrant such a statement.
Are you intending your mode of discourse to lead to rational and reasonable discussion?
Steve
As you well know, models are useful for analyzing how subsystems work and to enhance testing of hypotheses about those and their relationships to other subsystems. However, that models stabilize may provide insights, or it may just reflect how they have been specified.
Equally as you well know models in their current form weren’t developed to forecast future climates, nor to make inferences about local climates, nor to be used as surrogates for reality in experiments.
Regrettably all the above are regular features of the published literature.
Was that a lie?
“Nobody is claiming the models are perfect, and nobody is presenting model results as truth.”
You’re kidding, right? It’s all AGW has as “truth” for the future.
So how confident are you that, when conditions change after 10, 20 or 50 years of a run, the models are producing useful results? It seems to me that models which needed to be tuned to represent currently known conditions are almost certainly going to be un-tuned under different conditions.
> nobody is presenting model results as truth.
The IPCC is.
“If climate is modelled without the anthropogenic contributions, it would have been different and we know how that would have been. We can therefore attribute the difference between such output and actual temperatures to CO2.”
In order to do this, a model run and real temperatures have to be considered to belong to the same class of truth.
> The IPCC is [presenting model results as truth].
Citation needed.
Where do you think the idea of dangerous warming comes from, if not the high end model predictions? Have you read the TAR and AR4 WGI SPM’s?
If I understand you correctly, you are citing the TAR and AR4 WGI SPM’s to back up the claim that
> The IPCC is [presenting model results as truth].
If that’s the case, would you care to provide a quote or two? I am unsure where the IPCC is presenting their model results as truth in these documents.
Take a look. They are short documents. I know the TAR presents two pictures showing 20th century model runs without and with human emissions and says this shows that the warming is due to our emissions. Do you see those pictures? Read the accompanying text.
This is the standard argument for AGW from the models. They then go on to discuss the range of future warming based on more model runs. This is the threat of dangerous warming. It does not matter if they use the word “projection”. The world clearly understands the IPCC to be saying this will happen if we do not act; otherwise the UNFCCC negotiations are insane.
You are aware, I hope, that the folks in Cancun claim to know what the CO2 levels have to be held to in order to limit warming to 2 °C? Where do you think these numbers come from?
There are other arguments for AGW that don’t depend on climate models at all. Do you recognize those?
Please back up your claim.
Watch this fun video:
http://www.youtube.com/watch?v=m-AXBbuDxRY
It talks about man-made global warming before there were climate models. How did they manage that?
Thanks, but in case it was not clear, I was asking David Wojick.
Your claims are preposterous. Modeling results are the only evidence for dangerous AGW.
David:
The claims are not only “preposterous”. In some cases they are demonstrable falsehoods.
For example, these statements,
“The need for tuning doesn’t invalidate a model – if it reproduces a particular observed pattern in specific earth system processes very well, but with the mean values consistently shifted, then tuning it just makes it into a more scientifically useful tool. ”
In the abstract, that could be true, but in the case of the climate models under discussion it is demonstrably not true.
As I explained in detail on another thread, the models each use a different climate sensitivity and then compensate for the difference by inputting a different value of assumed aerosol cooling. Hence, the tuning by the assumed aerosol cooling forces agreement of each model with observed mean global temperatures over the twentieth century.
The climate sensitivities used vary by a factor of nearly 3 between the models. Hence, the models are each modelling a different climate system, and the tuning forces each model to emulate the Earth’s mean global temperature over the twentieth century.
But there is only one climate system of the real Earth. And the need for the tuning demonstrates that at most only one of the models is emulating the climate system of the real Earth.
And, in this case, the need for the tuning to reproduce “particular observed pattern in specific earth system processes” (i.e. mean global temperature) demonstrates the models are not useful scientific tools (except as a proof that our understanding of the real Earth system is insufficient for it to be modelled other than superficially).
Richard
Ironically Richard, I agree completely with Steve’s basic sentiment, namely that the modeling results have no truth value. In modal logic we would say they express possibility, not (physical) necessity, or even probability. However, clearly the sum of my understanding is very different from the sum of Steve’s understanding, for I go on to conclude that we do not understand climate change and there is no reason whatever to accept dangerous AGW.
But this is the point that people who claim the science is settled seemingly cannot grasp, that different people can look at the same body of evidence and draw opposite conclusions, because the sums of their understandings differ. That is why I want to know if my understanding (and yours) is included in Steve’s concept of “our” understanding? Only then will we know what he is claiming. (You can see why I, an expert on conceptual confusion, love the climate debate. It is confusion personified.)
Did you read the story of Johnson’s riposte? Sure, “different people can look at the same body of evidence and draw opposite conclusions”, but there’s a nice touchstone in that some of the conclusions some people draw are entirely wrong, because they don’t comport with reality. In other words, two people can stand at the top of a cliff and step off, and the one who accepts that gravity exists will fall to her death, and the one who doesn’t accept that gravity exists will fall to his death as well. The 2nd person’s “opposite conclusion” to what the 1st person knows is entirely irrelevant. Nature has the final say.
Dr Johnson accomplished his riposte by kicking a rock and experiencing both its existence and its physical properties – painfully. Where is your rock? Latimer Alder has asked you for it, and I would be interested to see it. So far, you haven’t provided it.
See my post on what can we learn from climate models, especially the section on “Fit for purpose?” I also agree that model simulations should be interpreted as modal statements of possibilities.
David:
Thank you for your response.
To be clear, I agree with your statement that “In modal logic we would say they [i.e. the climate models] express possibility”. Indeed, that is what I meant when I said, “at most only one of the models is emulating the climate system of the real Earth”. Simply, the climate models are each a possible description of the real climate system and nothing more.
However, it is important to note that the models do not encompass the total range of possible descriptions.
Assuming for the moment that the models are correct in their basic assumption that climate change is driven by change to radiative forcing, then none of the models represents the possibilities provided by the low values of climate sensitivity obtained empirically by Idso, Lindzen & Choi, etc.
But the range of the models’ outputs is presented (e.g. in the AR4) as being the true range of possibilities. It is not the true range and, therefore, the presentation is – in fact – a misrepresentation.
And this misrepresentation is why I said it is a “falsehood” to assert that an ability to tune each of the models with assumed aerosol forcing makes any of the models a “more useful scientific tool”. Such tuning merely makes each model capable of representing the possibility which that model is showing.
Furthermore, none of the models attempts to emulate possible causes of climate change other than the assumed one, namely that climate change is defined by change to radiative forcing. One such alternative possibility is that the climate system is constantly seeking chaotic attractors while being continuously forced from equilibrium by the Earth’s orbit varying the Earth’s distance from the Sun (n.b. I explained this possibility in a response to another of your comments on another thread of this blog). That possibility reduces the possible climate sensitivity to zero.
Please note that I am stating more than there is uncertainty about the models’ results. I am talking about a clear misrepresentation of what the models do and what their outputs represent.
It is bad science to report a range of possibilities derived from assumed climate sensitivities without clearly explaining that other possibilities derived from lower values of climate sensitivity (which are empirically obtained) are equally possible.
And if that reported range is taken as being sufficient to justify claims that actions to change the future are required, then that claimed justification is pseudo-science of precisely the same type as astrology.
Richard
> [I] agree completely with Steve’s basic sentiment, namely that the modeling results have no truth value. In modal logic we would say they express possibility, not (physical) necessity, or even probability.
It would be interesting to know in which modal logic we do not have truth values. Modalities qualify the truth of a judgment; they do not eliminate truth:
http://en.wikipedia.org/wiki/Modal_logic
Let’s take a rough example. Saying that it might rain (i.e. it is possible that it will rain) roughly means that there are worlds (or state spaces, or whatever) where it will rain, and others where it will not. There are clearly two sets of worlds: one where the statement “It will rain” is true, and the other where it is false.
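To make that concrete, here is a minimal sketch (a toy example of my own, using nothing beyond textbook Kripke-style semantics): “possibly P” is evaluated as truth of P in at least one accessible world, and each world still assigns an ordinary truth value to P.

```python
# Toy Kripke-style evaluation: "possibly P" means P is true in at least one
# accessible world; "necessarily P" means P is true in all of them.
# Each world still assigns an ordinary truth value to the proposition.

worlds = {
    "w1": {"it will rain": True},
    "w2": {"it will rain": False},
}

def possibly(prop, accessible=worlds):
    return any(valuation[prop] for valuation in accessible.values())

def necessarily(prop, accessible=worlds):
    return all(valuation[prop] for valuation in accessible.values())

print(possibly("it will rain"))     # True  -- some world where it rains
print(necessarily("it will rain"))  # False -- not true in every world
```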
This is not very surprising, as the logical definition of a model entails something like truth:
http://en.wikipedia.org/wiki/Model_theory
It would be interesting to know how the notion of model used in climate science compares to the one of model theory. Perhaps Steve Easterbrook could help us with that one.
Steve, perhaps you need to read my post on “What can we learn from climate models?”
http://judithcurry.com/2010/10/03/what-can-we-learn-from-climate-models/
I would appreciate being enlightened on exactly what I don’t understand about modeling, and which of my modeling publications are incorrect.
http://curry.eas.gatech.edu/onlinepapers.html
I would also appreciate your pointing out any instance where I have said or written that policy prescriptions are based on models. In fact, I have gone to great lengths to say exactly the opposite.
While there is merit in some of what you say, some of your statements are absolutely incorrect, which detracts from the merit in your arguments.
In my experience this situation is a sure indicator that the model is not yet production grade. The objective of production grade models / software / applications is to learn more about the application area, not to learn more about problems with the model.
Yet another description of research-grade models / software / applications / users. And yet another generalization-to-the-max. It is undeniable that many models are used specifically because they provide exceedingly useful truths about application areas.
The climate model developers could use more money to turn their models from research tools to “production grade”. Then again, are the places where climate models come from scientific research centers, or software shops?
Contrary to what Steve says, the modelers are already asking for more money so they can make forecasts for specific local regions, for policy makers to act on. However, more money will not help as long as most natural variability is ignored, as it presently is. Lack of understanding is the problem, not lack of money.
Show me that “natural variability” is ignored by climate modelers. Perhaps you can introduce the concept to them. Find a few and tell them it exists – I’m sure they’ll be suitably surprised.
“Show me that “natural variability” is ignored by climate modelers.”
It’s not ignored; rather, it’s not understood. For example, no models can reproduce the MWP or LIA to any useful extent.
Mann, Jones et al. will strenuously argue that these events never really happened, but archaeology trumps model results every time; even if these events were regional, they had significant duration that the models simply can’t reproduce.
The current CMIP project asks for modelers to run an 850-1850 experiment, so we’ll see if climate models can reproduce the MWP and LIA.
You’ve got the impetus backwards – it’s the policy makers who want information upon which to make decisions, and since they operate on local and regional scales, they want climate models to be able to inform their decisions. Ergo, climate models are being pressed to the regional and local scale.
I guess I’m missing something, but is the climate ever stable? Why would the models be stable? Do you have some special meaning for “stable” here?
ditto.
Steve, Popper didn’t have the last word. Jaynes provides an alternative, Bayesian approach (and includes entropy maximization as a useful principle, rather than something to fetishize ; ).
In reply to Terry:
You are still wrong in this small thing; it damages your credibility with regard to the larger (and more interesting) topic of sound inference (if you were pitching this stuff for me to spend money on, and you couldn’t get the simple linear algebra right, I’d reject your proposals too).
I’d recommend Jaynes’ (unfortunately incomplete) work to folks interested in reading more on this. His development is very accessible; really, little more than high-school math skills are required to start.
You write
“Jaynes provides an alternative, Bayesian approach (and includes entropy maximization as a useful principle, rather than something to fetishize ; ).”
But does Jaynes suggest that entropy maximization might remove the influence of the prior PDF and provide a unique objective result, as Terry claims?
Entropy maximization does generally give you a unique solution, but if this gave you “the objective truth,” then there would be a lot fewer pins and angels for statisticians to argue over. Jaynes also recommends Jeffreys priors, but it’s easy to point out that this is sensitive to your chosen parameterization (of course the parameterization you choose is information itself, so shouldn’t we leverage that?). I like Jaynes because at the end of the day he’s a physicist interested in getting useful results; his hard-nosed pragmatism appeals to me, but I have no objective justification for my prior in this matter.
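For what it’s worth, here is a minimal sketch of the “unique solution” point (my own toy setup, not anything from Jaynes): maximizing Shannon entropy over a finite set of outcomes under linear constraints is a concave problem, so the optimizer lands on a single distribution.

```python
import numpy as np
from scipy.optimize import minimize

# Maximize Shannon entropy over 6 outcomes (a die) subject to a mean constraint.
# Concave objective with linear constraints, so the maximizer is unique.
faces = np.arange(1, 7)

def neg_entropy(p):
    p = np.clip(p, 1e-12, 1.0)
    return np.sum(p * np.log(p))   # negative entropy (we minimize this)

constraints = [
    {"type": "eq", "fun": lambda p: np.sum(p) - 1.0},         # normalization
    {"type": "eq", "fun": lambda p: np.dot(p, faces) - 4.5},  # constrained mean
]
p0 = np.full(6, 1.0 / 6.0)
result = minimize(neg_entropy, p0, constraints=constraints,
                  bounds=[(0.0, 1.0)] * 6)
print(result.x)  # the unique maximum-entropy distribution with mean 4.5
```

Whether that unique answer counts as “objective truth” is, of course, exactly the point under dispute; the constraints and the parameterization are choices we bring to the problem.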
jstults:
There is no linear algebra in any of my assertions!
Your assertions about decadal averages and sample sizes in the comments, which I pointed out at the bottom of the thread, are conveniently written as matrix operations, but you are free to use whatever bookkeeping method you prefer.
Steve:
Do you know how your HDTV works, from a logical perspective?
Sounds like someone has a major bug up their butt!
Terry, I’m doing my best to follow this, but find it frustrating to encounter the likes of “when we make a deductive inference to the outcome of an event”. Personally, I draw inferences from things. Is this a special use of “inference”, and if so, can you explain it? And if what you really mean is “when we DRAW a deductive inference FROM the outcome of an event”,
a/ why not say so?
b/ is that really any different from “when we make a deduction from an event”?, and if so, how?
Terry should have had part II pre-written with plenty of practical examples to illustrate his points before posting part I. I anticipate confusion will now reign.
One obvious problem is that the only practical example given is the dice rolling. Dice have a known number of sides with known outcomes.
We don’t know how many sides the solar die has. We only know a few of the outcome magnitudes on some of the unknown number of faces. To use Terry’s method we would have to make guesses, assumptions and exclusions. This would introduce a ‘false precision’ to the outcome.
Terry should have had part II pre-written with plenty of practical examples to illustrate his points before posting part I.
Heavens, no. Writing part II before part I would require a ‘prior PDF’ and the blog would have exploded.
Tom:
An inference is from one proposition and to another. Thus, for example, it is from the proposition “it is cloudy” to the proposition “it will rain.”
As I’ve defined the terms “condition” and “outcome,” an inference is from a condition to an outcome. This is true because a condition is defined on the Cartesian product space of a model’s independent variables while an outcome is defined on the Cartesian product space of a model’s dependent variables.
Thus, to “draw an inference” from a condition is OK. To “draw an inference” from an outcome is a contradiction in terms.
Terry, thank you for this, and either I was being sloppy, or you have clarified my thinking, or both!
I trust this is not simply an example of the common, and godawful, confusion between “infer” and “imply”?
In plain English, “it is cloudy” IMPLIES (sorry, I’ve given up trying to use italics) “it will rain” (qualified by probability). I don’t see how, in plain English, we can infer forwards in time – we must imply, wait for the outcome, and then infer. Our inference may have further implications, and so on. Likewise, I imagine we can infer nothing from an outcome without at least one condition.
So I should perhaps have said the following:
“I draw an inference from BOTH the condition AND the outcome.” Since that inference can only be made upon the later arrival of the outcome (until then the outcome is surely but an implication, perhaps having some of the qualities of a prediction?), I treated the condition as data and wrote “draw an inference from”.
But again, perhaps I have misunderstood your definitions of “condition” and “outcome”. If so, fine, so long as these words do not reappear in their more colloquial guise at a later stage in your argument. But this rather suggests to me that your thesis may involve such extensive use of special meanings for common words and concepts as to render it incomprehensible to all but those who subscribe to it, and of questionable practical value to the rest.
The ghost of Wittgenstein hovers….
Tom:
Here, by example, is how I understand the terminology:
“cloudy implies rain”
is an implication.
“cloudy implies rain”
“cloudy”
“therefore rain”
is an inference.
If the outcome “rain” lies in the future of the condition “cloudy” then the inference is a prediction.
Once again, Terry, thank you for taking the time to respond. I hope you agree that these things should be as plainly expressed as possible, consistent with accuracy, if they are to be accessible to the widest possible readership.
You write:
“cloudy implies rain”
is an implication. [I would prefer “is a description of an implication.” The implication exists whether we observe and describe it or not – see below]
“cloudy implies rain”
“cloudy”
“therefore rain”
is an inference.
I can only say – “maybe, but it’s not an inference TO something, if you want to be widely understood by users of common English.”
To me, and to most of us, I suspect, the second is still a prediction – simply parsing it doesn’t allow us to reverse the logical duty of the word “inference”.
Part of the problem here seems to be that we share a well-grounded (at least at the level of observation) familiarity with the consequential relationship between cloudiness and rain (and other forms of precipitation), whereas the reason we’re all here (or at least why I am) is to see if we can shed light on consequential relationships with which we are NOT familiar, which may be new, and which may be in dispute. For the cloudy/rain couple to work better as an example, we have to go back in time to when this was true of the cloudiness/raininess coupling – to the first human who conceived the relationship between cloudiness and subsequent rain/hail/what have you. For him/her, cloudiness became “intellectually coupled” (to avoid the use of either of our disputed terms) with “raininess” upon the commencement of a precipitation event, AND NOT BEFORE. All subsequent intellectual couplings of cloudiness and rain have depended on the foundation provided by this original, retrospective, inference. And that inference was, it seems to me, NECESSARILY retrospective.
Another way of looking at this MAY be to observe that “implications” are insentient couplings of phenomena that exist independently of human mind (“cloudiness implies rain, whether humans know it or not”), while “inference” is a necessarily human, or at least sentient, phenomenon.
The phrase that first troubled me was “make a deductive inference to” – might it not be rephrased “assign a deductive inference to”? This would make it clear to me that “to… “was the object of the verb “assign”, not of “inference”, and would not leave me wondering, in vain, how one can “infer to”, rather than “infer from”.
I’m not vain enough to suggest that all this may constitute a flaw in your thesis, rather than a revelation of my own intellectual limitations. But I do insist that it is unwise to allow counter-semantic English usage to creep into scholarship (which may be difficult enough to grasp at its plainest) without the strongest possible justification. I can see none here. And the reason I presume to press the point is that I have many times encountered explications in science and philosophy which appear to be beyond me, only to later encounter an alternative explication that I do, indeed grasp, merely because it was rendered in plain, semantically orthodox English.
If unorthodox usage really is indispensable to your thesis, then I hope I have not offended you with my naive enquiry.
Now THAT I follow…
Steve: “The other problem of course, is that people who don’t understand modeling (and that includes both Terry, and our host on this blog) go round claiming that policy prescriptions are being based on model outputs. Which is pure bullshit. Policy prescriptions are based on the sum total of our understanding of climatic processes, *some* of which was arrived at by *experimenting* with models. But of course, those people who are politically motivated to reject large parts of our knowledge of climatic processes find it relatively easy to mislead their audiences by lying about how models are actually used as scientific tools.”
You seem very angry about the allegations that climate models are the sole source for policy prescriptions on global warming, and you point out that these prescriptions are based on the sum total of our understanding of the process, including the outputs from models. Fair enough, but it is the output from the models that is telling us the world will warm by 1–5 °C, and it is this warming that is being used to try to drive policy proposals that will surely impoverish us all if they are implemented, so the modellers need to be damn sure that they convey the uncertainties.
“… politically motivated to reject large parts of our knowledge of climatic processes find it relatively easy to mislead their audiences…”
I have yet to come across a climate-consensus observer/scientist who gives explanations without mixing in this type of politics.
How does one measure disorder?
Mac:
That’s an interesting question. Entropy (or the generalization from it that is called the “conditional entropy”) is the measure of an inference. Traditionally, in courses on thermodynamics, students were taught that entropy was synonymous with disorder. This use of language would make disorder the measure of an inference. The measure of disorder would be the measure of the measure of an inference but a measure of a measure is mathematically undefined, for in measure theory a measure is a function but the entities that are measured are sets. Thus, if you retain the old idea that entropy and disorder are synonymous, disorder is the measure of an inference and disorder has no measure. However, it is more precise to say that “entropy” and “missing information” are synonymous, as we have the mathematical definition of “information” that was left to us by Shannon. Thus, perhaps there is a sense in which disorder is a set and possesses a measure. I don’t know what this measure is, though.
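As a rough numerical illustration of entropy and conditional entropy as measures of missing information (the joint probabilities below are invented purely for the example):

```python
import numpy as np

# Shannon entropy H(Y) and conditional entropy H(Y|X) for a small, made-up
# joint distribution over a condition X (cloudy or clear) and an outcome Y
# (rain or no rain).
joint = np.array([[0.30, 0.20],   # P(cloudy, rain), P(cloudy, no rain)
                  [0.05, 0.45]])  # P(clear,  rain), P(clear,  no rain)

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))        # in bits

H_Y  = entropy(joint.sum(axis=0))          # missing information about the outcome alone
H_X  = entropy(joint.sum(axis=1))          # entropy of the condition
H_XY = entropy(joint.ravel())              # joint entropy
H_Y_given_X = H_XY - H_X                   # chain rule: H(Y|X) = H(X,Y) - H(X)

print(round(H_Y, 3), round(H_Y_given_X, 3))
# Knowing the condition reduces the missing information about the outcome.
```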
Terry,
If you’re trying to generate a correct working model on this planet, then the science, to be absolutely correct, must work on other planets after just changing the atmospheric composition, rotational speed and the same variables.
But our basic understanding generated by current laws of science does not work well if you were to do this. Tying centrifugal force to planetary rotation works extremely well in determining how heavy an object will be on another planet with its atmospheric composition. The slower the rotational speed, the heavier the gravity, due to the countering force of centrifugal force. Our science could not figure out what that energy was and deemed the whole area pseudo-science. But it can be mechanically reproduced to show how this energy is viable.
I am glad to see I am not the only one who finds this post incomprehensible. But also, unlike some others, I say this respectfully. I am sure you have put a lot of thought into this. A lot more than me. And I could be wrong.
But quoting from the Wikipedia: “In the philosophy of science, abduction has been the key inference method to support scientific realism, and much of the debate about scientific realism is focused on whether abduction is an acceptable method of inference.” IMHO, this post does not contribute to this debate.
As a fundamental example: to the extent I understand you, I disagree with your notion of inconsistency. Take this perfectly consistent argument: I AM ALWAYS RIGHT. :-) Because, before new experimental evidence comes in, I can demonstrate that my theories always fit the facts as they are known to me. And on the rare occasion that new falsifying evidence is discovered, I immediately change my theories to fit the new total body of facts. Therefore, I am only wrong during that brief period of time required to adjust my theories. And, as clever and humble as I am, that period is always arbitrarily brief. :-) Therefore, for all practical purposes, I am always right.
My question to you would be, what is wrong with always being right?
“George Crews is only Wrong on a set of measure Zero”
Has a nice ring to it; pretty soon we’ll have a site for George to go along with Chuck Norris and Bruce Schneier ;-)
George:
I AM ALWAYS RIGHT is an example of a heuristic. It yields a theory but is not a logical process as it violates the law of non-contradiction.
Terry,
“I have many faults, but being wrong isn’t one of them. ”
Does this formulation get around non-contradiction? ;)
tallbloke:
It seems to me that it has the same deficiency as “I am always right.” This is that it identifies many different inferences as the correct one.
Terry, is it possible to be logically consistent and yet still violate the law of non-contradiction? If not, what specifically is inconsistent with my argument? My argument’s assumption is that the SOLE test of knowledge is experiment. So if, for all practical purposes, my beliefs are consistent with the experimental evidence, then, for all practical purposes, my beliefs represent knowledge.
Yes this is my question also. When you are dealing with complex issues and high levels of uncertainty, paraconsistent logic seems to have some advantages?
The principle of non contradiction is challenged by paraconsistent logic.
Note, the jigsaw puzzle analogy in the frames and narratives thread can be interpreted in the context of paraconsistent logic.
Judy:
I wouldn’t say paraconsistent logic “challenges” the law of non-contradiction but rather that it redefines the word “logic” such that non-contradiction is not one of its principles. To redefine “logic” in this way has the perverse effect that there is no solution to the problem of induction and no principles of reasoning. As a science of the principles of reasoning, logic disappears.
My interest in Terry’s induction approach comes from having flirted with using mutual information, entropy and bits in an analysis of hurricane data (Hoyos et al.; I’m trying to find the supplementary info where this is explained).
http://curry.eas.gatech.edu/currydoc/Hoyos_Science312.pdf
It seems like these techniques would be relevant for a range of applications.
I am not sure why the law of non-contradiction is needed for this, though.
Judy:
The law of non-contradiction is the driver toward discovery of the principles of reasoning. Absent the law of non-contradiction, we are left with heuristics as the deciders of which one of many inferences that are candidates for being made by a model is the one correct inference. The logical deficiency in the use of heuristics is precisely that this usage violates the law of non-contradiction.
A consequence of violating the law of non-contradiction via the method of heuristics is variability in the quality of the model that is built, in which the quality depends upon the model builder’s luck in selecting the heuristic or heuristics that decide upon the inferences that are made. In contrast, when a model is built under the principles of reasoning, the inferences that are made are selected by optimization. There is no variability, and the models that are built are always of the highest possible quality.
What I get from Terry’s essay is that it is “possible” to measure the unknown. Well if that’s the case, then a priori this “unknown” is not fully unknown. But how can that be the case, since we have defined the unknown as being unknown from the start?
The case seems completely straightforward to me: it is impossible to measure the unmeasurable, i.e., what we simply don’t know about something. Because we could have the most amazing model of reality on our hands, and make some calculations and say, “look, my model is just great, and the inferences I derive from it are guaranteed to work” and then something simply remarkably different happens. This is all too common, and so I get a little confused whenever I read someone who “believes” he got the “induction problem solved”. It’s like reading people who believe that they got the problem of squaring circles “solved”. No, they are unsolvable by definition.
Realism was defeated a century ago, with some whiners still polluting the nets, people who still believe that “Absolute Truth” is something tenable, meaningful, sensible and graspable. No, it isn’t. We live in a world full of prejudicial traditions, full of “words”, concepts, and the limitations of this brain of ours. We live in a world where it either is raining or not, and this has nothing to do with “Truth”; it’s a prejudice of ours, a fetish if you like.
At best, we can measure how models perform against observations, and how they differ from our own generated noise models. And we can reason how far they are useful to us or not. This shenanigan about “truth” is worthless and irrelevant.
Luis:
You’ve got the wrong idea of what I’m saying. What I’m saying is that an inference has a unique measure. The measure of an inference is the missing information in it for a deductive conclusion.
Here is an example. Let us suppose you are the captain of a football team. The head referee has tossed a coin and while it is in the air he has called upon you to infer whether the outcome will be heads or tails. If the coin toss is “fair,” the probability of heads equals the probability of tails. Thus the probability of heads is 1/2 and the probability of tails is 1/2. From the two probability values, the missing information in your inference to the outcome of the coin toss can be computed. The missing information is 1 bit. Thus, the measure of your inference is 1 bit.
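For anyone who wants to check the arithmetic, a trivial sketch of that computation:

```python
import math

# The missing information (Shannon entropy) in the inference to the outcome
# of a fair coin toss: -sum(p * log2(p)) over the two outcomes.
p_heads, p_tails = 0.5, 0.5
missing_information = -(p_heads * math.log2(p_heads) + p_tails * math.log2(p_tails))
print(missing_information)  # 1.0 bit, as stated in the example
```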
The fundamental problem with the entire essay, parts I & II, is that it is talking about probability models only. The example Terry gave for weather forecasting is a probabilistic model. Consider celestial mechanics. It is possible to compute the orbits of the planets, but not based on a probability distribution – instead there are equations of motion. These equations were developed by forming a hypothesis and then testing it. The various statistical approaches are useful for testing a theory/model, but only sometimes directly yield a predictive model. If we have a mechanism for how something works, we can forecast what will happen if we alter conditions in ways never seen before (and thus design new plastics, etc.), but a statistical model cannot do this.
While I like the entropy ideas here, the inference that dynamical models from physically based equations are somehow illogical is not a useful one.
Judy:
Can you expand upon your view here?
Nope. Completely incorrect. You are drawing distinctions where there are none.
Good example. Take your deterministic trajectory code, provide distributions for your initial conditions (because you don’t know the ICs down to the machine precision, do you?) and distributions of the parameters (because while we can measure big G pretty accurately, it’s still a measurement subject to uncertainty, and so are all of those stellar masses), and turn the crank on your deterministic code according to a method like stochastic collocation, and you end up with pdfs for your outputs. Guess what? Your “physical model” and the statistician’s “probabilistic model” or “linear regression” are no different. You just happen to be particularly choosy about your basis (and possibly lazy about uncertainty quantification) where the statistician might just be happy with polynomials, and the whole point is quantifying uncertainty.
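A minimal sketch of that point, using plain Monte Carlo sampling instead of stochastic collocation for brevity (the toy trajectory model and the input distributions are mine, chosen only for illustration):

```python
import numpy as np

# Push uncertain inputs through a deterministic model (here a toy ballistic
# trajectory) and you get a distribution of outputs, not a single answer.
# All numbers are illustrative.
rng = np.random.default_rng(0)
n_samples = 10_000

g = 9.81                                              # m/s^2, treated as known
v0 = rng.normal(100.0, 2.0, n_samples)                # uncertain launch speed, m/s
theta = rng.normal(np.deg2rad(45), 0.01, n_samples)   # uncertain launch angle, rad

# Deterministic model: range of a projectile on flat ground.
landing_range = v0**2 * np.sin(2 * theta) / g

print(landing_range.mean(), landing_range.std())      # summary of the output pdf
```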
Be my guest at sending a probe to Mars based on a polynomial or a pdf.
Gee, on second thought, you’re right. Your single sentence reply has convinced me of the error of my ways and added greatly to the discussion. It is obviously soooo stupid to plan a spacecraft trajectory that takes uncertainty into account:
stochastic collocation trajectory optimization
Craig:
It sounds as though fog continues to lie over the problem of induction for you. Let me try to lift this fog.
A solution has been found for the problem of induction. Under this solution, a model is built from two kinds of statistical event. Events of one kind are “virtual”; they are a product of theoretical science. Events of the other kind are “observed”; they are a product of observational science.
It sounds as though your thinking is that a model is built from observed events or from virtual events but is not built from both kinds of event. Though this is a traditional way of thinking, it is not the way of thinking that has led to a solution for the problem of induction.
” If we have a mechanism for how something works, we can forecast what will happen if we alter conditions in ways never seen before (and thus design new plastics, etc) but a statistical model can not do this.”
Neither can a deterministic model if it’s not a true representation of those processes. GCMs aren’t true representations of climatic processes; they’re a mix of approximations and simplifications on the whole.
Terry,
In order to move forward with my current research, I have had to show how current theories cannot withstand time and, in some cases, distance measurements in rotation and motion.
Theories that collapse are: Laws of relativity
Laws of motion
Laws of Thermodynamics
Quantum Physics
These theories collapse because this planet’s rotational speed was much faster in the past, generating more centrifugal force. Evaporation was not being released until a billion years ago due to the density it had with salt.
With all objects moving in space, quantum physics has no start point or end point accurate enough to triangulate an exact point (time travel and dimensional travel are impossible).
As I read the article and grappled with the ideas, one thought that occurred to me was the ‘lumpy’ nature of probability. This being the aspect of probability that allows for rolling ten ‘sixes’ in a row and for casinos to get rich off those ‘on a roll.’ Of course, over large numbers, the result should be as predicted. In the case of climate models, what leads people to believe that the numbers are large enough, and that we are not just ‘on a roll?’
There have been a couple of studies done to test the idea that global temperature variation could simply be a ‘random walk’. The outcome is that it could indeed be a ‘random walk’.
Personally I find such studies intellectually unsatisfying. While they may be ‘true’ and useful as null hypotheses, I think in general that ‘randomness’ is a lazy brained cop out. Things which are apparently random can turn out on deeper analysis to be the products of interacting variables.
While the solution of such problems may be intractable presently, we can anticipate that greater effort will be rewarded with deeper understanding.
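For what it’s worth, here is a rough sketch of how such a ‘random walk’ null hypothesis test can be set up (the step size and observed change below are placeholders of mine, not real temperature statistics):

```python
import numpy as np

# Generate many random walks whose step size matches year-to-year variability,
# then ask how often they wander as far as an observed century-scale change.
rng = np.random.default_rng(1)
n_years, n_walks = 150, 5_000
step_sd = 0.1          # assumed year-to-year standard deviation, deg C (placeholder)
observed_change = 0.8  # assumed net change over the record, deg C (placeholder)

walks = np.cumsum(rng.normal(0.0, step_sd, size=(n_walks, n_years)), axis=1)
net_change = walks[:, -1] - walks[:, 0]

fraction = np.mean(np.abs(net_change) >= observed_change)
print(fraction)  # how often a pure random walk matches the observed change
```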
Random walks are a legitimate inquiry, e.g. Ghil 2001:
“The global temperature increase through the 1990s is certainly rather unusual in terms of the instrumental record of the last 150 years or so. It does not correspond, however, to a rapidly accelerating increase in greenhouse-gas emissions or a substantial drop in aerosol emissions. How statistically significant is, therefore, this temperature rise, if the null hypothesis is not a random coincidence of small, stochastic excursions of global temperatures with all, or nearly all, the same sign? The presence of internally arising regularities in the climate system with periods of years and decades suggests the need for a different null hypothesis.”
Interference is observed in both physical and biological systems due to competing periodicities and where synchronization or mode locking occurs.
Random walks may also exhibit quasi-periodicity, e.g. http://www.whoi.edu/oceanus/viewImage.do?id=37230&aid=19947
Random walks may also, in a persistent way, exhibit historical behavior without being recurrent (e.g. David Ruelle’s millennium problems) – a significant constraint on probabilistic descriptions (e.g. Zeldovich et al.):
http://www.worldscibooks.com/physics/0862.html
The presence of internally arising regularities in the climate system with periods of years and decades suggests the need for a different null hypothesis.
This I agree with. But I may also have found that some of what were supposed to be internally arising regularities do in fact have external controlling phenomena. There is plenty of room for some randomness as well as for cyclic phenomena with underlying causes we haven’t yet determined to everyone’s satisfaction.
Chip:
Excellent! Research on the thinking of casino gamblers suggests that their behavior is motivated by their belief in the existence of 2 patterns that are not actually present. One is that a player has “hot hands” because he/she has had a string of winning outcomes. The other is that a particular outcome is “due” because it has not occurred in a long sequence of outcomes. Gamblers who base their decisions upon these phantom patterns have no information upon which to base a decision but think they have information.
As this relates to climatology, the number of observed statistical events upon which the IPCC reaches its conclusion is very few. How few is indeterminate because the IPCC does not identify these events. However, it can be said that if the time series upon which the IPCC’s conclusion is based extends backward in time for 150 years and the period of time over which weather is averaged in arriving at the “climate” is 20 years, then the number of observed events is 7. Seven observed events provide a grossly deficient basis for the discovery of patterns. Like the person who detects the “hot hands” pattern, the IPCC is gambling on a non-existent pattern.
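In symbols, using the figures given above:

$$N_{\text{observed events}} \;=\; \left\lfloor \frac{150\ \text{years of record}}{20\ \text{years per climate average}} \right\rfloor \;=\; 7$$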
I thank one and all for having the courage to expose their contentions to possible falsification by commenting. I extend particularly hearty thanks to those who have managed to criticize the ideas of other people without attacking these people.
In reviewing the comments, I’ve noticed a common thread among some of the readers who appear to have failed to get a grip on the subject matter. The common thread is to assume that a model is built either from the statistical events which I’ve called “virtual events” or from the statistical events which I’ve called “observed events” but not from both types of event. Actually, the problem of induction is solved by the rule that a model is built from both types of event. The false assumption that a model is built from one or the other type of event appears to be blocking the comprehension of these readers.
The assignment made by equation (1) is a result from maximum entropy expectation under no constraints on entropy maximization. Equation (2) is a result from application of the unbiased estimator. Equation (3) is a result from maximum entropy expectation under constraints on entropy maximization that express the available information.
In the assignment that is made by equation (1), “t” is the frequency of the ways in which the abstracted state can occur while “f” is the frequency of the ways in which the abstracted state can not occur. “t” and “f” are both the frequencies of virtual events.
In the assignment that is made by equation (2), “x” is the frequency of events that are in the abstracted state while “n” is the frequency of events of all descriptions. “x” and “n” are both the frequencies of observed events.
In the assignment that is made by equation (1) only virtual events are counted. In the assignment that is made by equation (2) only observed events are counted.
In the assignment that is made by equation (3) both virtual and observed events are counted. Counting both the virtual and observed events has the consequence that the value which is assigned lies between the value that is assigned by equation (1) and the value which is assigned by equation (2). This value expresses all of the available information but no more. To express all of it but no more is the principle of maximum entropy expectation.
Equation (1) expresses no information. Equation (2) expresses more information than is available. Equations (1) and (2) both violate the principle of maximum entropy expectation. For satisfaction of this principle, the virtual events and the observed events must both be counted.
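The equations themselves are not reproduced in this part, so the sketch below only illustrates the kind of assignment being described; the specific forms — equation (1) as t/(t+f), equation (2) as x/n, and equation (3) as (t+x)/(t+f+n) — are assumptions of mine chosen to match the description, not quotations of Part I.

```python
# Assumed forms, chosen only to match the description above:
# (1) counts virtual events only, (2) counts observed events only, and
# (3) counts both, giving a value between the first two.
def eq1(t, f):          # virtual events only: ways it can vs. cannot occur
    return t / (t + f)

def eq2(x, n):          # observed events only: relative frequency
    return x / n

def eq3(t, f, x, n):    # hypothetical combined assignment counting both
    return (t + x) / (t + f + n)

t, f = 1, 5             # e.g. one way to roll a six, five ways not to
x, n = 3, 10            # e.g. three sixes observed in ten rolls
print(eq1(t, f), eq2(x, n), eq3(t, f, x, n))
# 0.1667..., 0.3, 0.25 -- the combined value lies between the other two
```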
Terry, I appreciate what you are attempting with these two posts (though I’m still unsure why you persist in such a small error when it’s pointed out) . Here’s how Jaynes describes the relationship between the different inference methods, maybe this will help some folks who are already somewhat familiar with the subject matter:
We discuss the relations between maximum-entropy (MAXENT) and other methods of spectral analysis such as Schuster, Blackman-Tukey, maximum-likelihood, Bayesian, and Autoregressive (AR, ARMA, ARIMA) models, emphasizing that they are not in conflict, but rather are appropriate in different problems. We conclude that:
1) “Orthodox” sampling theory models are useful in problems where we have a known model (sampling distribution) for the properties of the noise, but no appreciable prior information about the quantities being estimated.
2) MAXENT is optimal in problems where we have prior information about multiplicities, but no noise.
3) The full Bayesian solution includes both of these special cases and is needed in problems where we have both prior information and noise.
4) AR models are in one sense a special case of MAXENT, but in another sense they are ubiquitous in all spectral analysis problems with discrete time series.
5) Empirical methods such as Blackman-Tukey, which do not invoke even a likelihood function, are useful in the preliminary, exploratory phase of a problem where our knowledge is sufficient to permit intuitive judgements about how to organize a calculation (smoothing, decimation, windows, prewhitening, padding with zeros, etc.) but insufficient to set up a quantitative model which would do the proper things for us automatically and optimally.
On the rationale of maximum-entropy methods
jstults:
Thanks for taking the time to comment!
I’m unsure of the nature of the small error which you say I persist in making. Please clarify.
Regarding Ed Jaynes, I’ve read many of his works including the recently published book. He was an early exponent of entropy maximization under constraints (MAXENT). As I’ve tried to show, entropy maximization under constraints is a principle of reasoning. However, it is not the only such principle.
So far as I am aware, Jaynes did not employ the principle of minimum conditional entropy in his life’s work though this principle appeared in Shannon’s paper “A Mathematical Theory of Communication” (1948) and Jaynes’s seminal paper “Information Theory and Statistical Mechanics” (1957) was based upon Shannon’s paper.
In the design of a communications channel, entropy maximization under constraints optimizes the design of the encoder. Conditional entropy minimization optimizes the design of the decoder.
Discovery of the remaining principles of reasoning resulted from Ron Christensen’s insight (1963) that a model (aka theory) could be viewed as the algorithm for an optimal decoder. A lapse in Shannon’s theory was that it failed to describe the procedure by which numerical values should be assigned to the probabilities of abstracted states. Christensen filled in this gap with maximum entropy expectation. In his book, Jaynes cites none of Christensen’s work. I gather from this evidence that he was unaware of Christensen’s work. Perhaps, then, he was unaware of the significance of the principles of conditional entropy minimization and maximum entropy expectation for the probabilistic logic.
Relevant wiki links:
Effective Degrees of Freedom
Moving Average
“Hat” Matrix
Relevant comment links:
one
two
three
I think it would be instructive for you to do a Part III (pending our host’s continued indulgence) that consisted of a worked example illustrating your solution to inference. A really germane example would be making an inference about the appropriate averaging window for global mean surface temperature since 1850. This has the advantage of being a well studied time-series, there are a wide variety of reconstructions to choose from, and you keep making references to 150 years of data averaged over decadal time-scales and how this relates to a number of “observations”. Comparing and contrasting your proposed solution to this example with various ad hoc solutions would provide something concrete for folks to wrap their thoughts around.
Maybe if you are feeling especially inspired you could do Part IV, and bring in discussion of auto-regressive models:
The Relevance of Rooting for a Unit Root
– a comment
– a comment
Does this have a unit root?
Maybe this is veering too technical for this site? I bet Blackboard or OurChangingClimate would host something that added to the discussion (applying MAXENT would, I think). Anyway, I certainly understand if time constraints keep you from following up on any or all of these suggestions.
In the next 18 days I’ll be on vacation and out of touch.
Readers of my essay on The Principles of Reasoning have posted a number of comments on it. So far as I’m aware, none of these comments falsify my contention regarding the nature of the principles of reasoning.
In an attempt at focusing the discussion or bringing it to closure, I shall repeat my contention and issue the request that anyone knowing of an argument leading to falsification of this contention shall post the detailed argument to the Comments section of Part II of the essay. My contention is that the principles of reasoning are:
* The law of non-contradiction,
* The principle of entropy maximization,
* The principle of conditional entropy minimization and,
* The principle of maximum entropy expectation.
Terry, any interest in doing a part III with a practical application of relevance to climate?
Judy:
Shortly, I’ll email you with outlines for possible additional parts.
Terry
David Harriman’s book “The Logical Leap” has some coverage of induction. A useful book in any case.