Stanford Complexity Group

19 April, 2017

Aaron Goodman of the Stanford Complexity Group invited me to give a talk there on Thursday April 20th. If you’re nearby—like in Silicon Valley—please drop by! It will be in Clark S361 at 4:20 pm.

Here’s the idea. Everyone likes to say that biology is all about information. There’s something true about this—just think about DNA. But what does this insight actually do for us, quantitatively speaking? To figure this out, we need to do some work.

Biology is also about things that make copies of themselves. So it makes sense to figure out how information theory is connected to the replicator equation—a simple model of population dynamics for self-replicating entities.

To see the connection, we need to use ‘relative information’: the information of one probability distribution relative to another, also known as the Kullback–Leibler divergence. Then everything pops into sharp focus.

It turns out that free energy—energy in forms that can actually be used, not just waste heat—is a special case of relative information. Since the decrease of free energy is what drives chemical reactions, biochemistry is founded on relative information.

But there’s a lot more to it than this! Using relative information we can also see evolution as a learning process, fix the problems with Fisher’s fundamental theorem of natural selection, and more.

So this is what I’ll talk about! You can see my slides here:

• John Baez, Biology as information dynamics.

but my talk will be videotaped, and it’ll eventually be put here:

Stanford complexity group, YouTube.

You can already see lots of cool talks at this location!

 


Periodic Patterns in Peptide Masses

6 April, 2017

Gheorghe Craciun is a mathematician at the University of Wisconsin who recently proved the Global Attractor Conjecture, which since 1974 was the most famous conjecture in mathematical chemistry. This week he visited U. C. Riverside and gave a talk on this subject. But he also told me about something else—something quite remarkable.

The mystery

A peptide is basically a small protein: a chain made of fewer than 50 amino acids. If you plot the number of peptides of different masses found in various organisms, you see peculiar oscillations:

These oscillations have a period of about 14 daltons, where a ‘dalton’ is roughly the mass of a hydrogen atom—or more precisely, 1/12 the mass of a carbon-12 atom.

Biologists had noticed these oscillations in databases of peptide masses. But they didn’t understand them.

Can you figure out what causes these oscillations?

It’s a math puzzle, actually.

Next I’ll give you the answer, so stop looking if you want to think about it first.

The solution

Almost all peptides are made of 20 different amino acids, which have different masses, which are almost integers. So, to a reasonably good approximation, the puzzle amounts to this: if you have 20 natural numbers m_1, ... , m_{20}, how many ways can you write any natural number N as a finite ordered sum of these numbers? Call it F(N) and graph it. It oscillates! Why?

(We count ordered sums because the amino acids are stuck together in a linear way to form a protein.)

There’s a well-known way to write down a formula for F(N). It obeys a linear recurrence:

F(N) = F(N - m_1) + \cdots + F(N - m_{20})

and we can solve this using the ansatz

F(N) = x^N

Then the recurrence relation will hold if

x^N = x^{N - m_1} + x^{N - m_2} + \dots + x^{N - m_{20}}

for all N. But this is fairly easy to achieve! If m_{20} is the biggest mass, we just need this polynomial equation to hold:

x^{m_{20}} = x^{m_{20} - m_1} + x^{m_{20} - m_2} + \dots + 1

There will be a bunch of solutions, about m_{20} of them. (If there are repeated roots things get a bit more subtle, but let’s not worry about that.) To get the actual formula for F(N) we need to find the right linear combination of functions x^N where x ranges over all the roots. That takes some work. Craciun and his collaborator Shane Hubler did that work.

But we can get a pretty good understanding with a lot less work. In particular, the root x with the largest magnitude will make x^N grow the fastest.

If you haven’t thought about this sort of recurrence relation it’s good to look at the simplest case, where we just have two masses m_1 = 1, m_2 = 2. Then the numbers F(N) are the Fibonacci numbers. I hope you know this: the Nth Fibonacci number is the number of ways to write N as the sum of an ordered list of 1’s and 2’s!

1

1+1,   2

1+1+1,   1+2,   2+1

1+1+1+1,   1+1+2,   1+2+1,   2+1+1,   2+2

If I drew edges between these sums in the right way, forming a ‘family tree’, you’d see the connection to Fibonacci’s original rabbit puzzle.
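If you'd like to check this by machine, here is a minimal Python sketch. The function names are made up for illustration, and it assumes the convention F(0) = 1 (the empty sum) to get the recurrence started. The first function lists the ordered sums, the second counts them using F(N) = F(N - 1) + F(N - 2):

```python
def ordered_sums(n, parts=(1, 2)):
    """Return every ordered list of parts adding up to n."""
    if n == 0:
        return [[]]  # the empty sum is the one way to write 0
    sums = []
    for p in parts:
        if p <= n:
            # choose the first term, then write the remainder n - p
            sums.extend([p] + rest for rest in ordered_sums(n - p, parts))
    return sums


def count_ordered_sums(n, parts=(1, 2)):
    """Count ordered sums with the recurrence F(N) = sum of F(N - m_i)."""
    f = [1] + [0] * n  # f[0] = 1 counts the empty sum
    for total in range(1, n + 1):
        f[total] = sum(f[total - p] for p in parts if p <= total)
    return f[n]


for n in range(1, 5):
    print(n, ["+".join(map(str, s)) for s in ordered_sums(n)])
```

The listing is only practical for small N, since the number of sums grows exponentially. The counting function works for any list of parts, not just 1 and 2.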

In this example the recurrence gives the polynomial equation

x^2 = x + 1

and the root with largest magnitude is the golden ratio:

\Phi = 1.6180339...

The other root is

1 - \Phi = -0.6180339...

With a little more work you get an explicit formula for the Fibonacci numbers in terms of the golden ratio:

\displaystyle{ F(N) = \frac{1}{\sqrt{5}} \left( \Phi^{N+1} - (1-\Phi)^{N+1} \right) }
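In case the ‘little more work’ isn’t obvious, here is a sketch, assuming the convention F(0) = F(1) = 1 that matches the lists of sums above. Any solution of the recurrence is a linear combination of the two powers:

\displaystyle{ F(N) = a \, \Phi^N + b \, (1 - \Phi)^N }

The initial conditions say

\displaystyle{ a + b = 1, \qquad a \Phi + b (1 - \Phi) = 1 }

and since \Phi - (1 - \Phi) = 2\Phi - 1 = \sqrt{5}, solving these gives

\displaystyle{ a = \frac{\Phi}{\sqrt{5}}, \qquad b = -\frac{1 - \Phi}{\sqrt{5}} }

Plugging these back in gives the formula with exponents N+1.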

But right now I’m more interested in the qualitative aspects! In this example both roots are real. The example from biology is different.

Puzzle 1. For which lists of natural numbers m_1 < \cdots < m_k are all the roots of

x^{m_k} = x^{m_k - m_1} + x^{m_k - m_2} + \cdots + 1

real?

I don’t know the answer. But apparently this kind of polynomial equation always has one root with the largest possible magnitude, which is real and has multiplicity one. I think it turns out that F(N) is asymptotically proportional to x^N where x is this root.
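Here is one way to see where the real root comes from, and to compute it. For x > 0 we can divide the polynomial equation by x^{m_k} and get x^{-m_1} + \cdots + x^{-m_k} = 1. The left side is strictly decreasing in x, so there is exactly one positive solution. By the triangle inequality, no complex root can have larger magnitude. The sketch below finds that solution by bisection. The function name is hypothetical, and it uses only the standard library:

```python
def dominant_root(masses, steps=100):
    """Positive real solution of sum(x**-m for m in masses) == 1."""
    def g(x):
        return sum(x ** -m for m in masses) - 1.0

    lo, hi = 1.0, 2.0
    # g(1) = len(masses) - 1 >= 0, so the root lies at or above 1
    while g(hi) > 0:  # widen the bracket until g changes sign
        hi *= 2.0
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        if g(mid) > 0:
            lo = mid  # g is decreasing, so the root is to the right
        else:
            hi = mid
    return (lo + hi) / 2.0


print(dominant_root([1, 2]))  # Fibonacci case: the golden ratio
```

This only finds the real positive root. The complex roots need a general polynomial root finder.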

But in the case that’s relevant to biology, there’s also a pair of roots with the second largest magnitude, which are not real: they’re complex conjugates of each other. And these give rise to the oscillations!

For the masses of the 20 amino acids most common in life, the roots look like this:

The aqua root at right has the largest magnitude and gives the dominant contribution to the exponential growth of F(N). The red roots have the second largest magnitude. These give the main oscillations in F(N), which have period 14.28.

For the full story, read this:

• Shane Hubler and Gheorghe Craciun, Periodic patterns in distributions of peptide masses, BioSystems 109 (2012), 179–185.

Most of the pictures here are from this paper.

My main question is this:

Puzzle 2. Suppose we take many lists of natural numbers m_1 < \cdots < m_k and draw all the roots of the equations

x^{m_k} = x^{m_k - m_1} + x^{m_k - m_2} + \cdots + 1

What pattern do we get in the complex plane?

I suspect that this picture is an approximation to the answer you’d get to Puzzle 2:

If you stare carefully at this picture, you’ll see some patterns, and I’m guessing those are hints of something very beautiful.

Earlier on this blog we looked at roots of polynomials whose coefficients are all 1 or -1:

The beauty of roots.

The pattern is very nice, and it repays deep mathematical study. Here it is, drawn by Sam Derbyshire:


But now we’re looking at polynomials where the leading coefficient is 1 and all the rest are -1 or 0. How does that change things? A lot, it seems!

By the way, the 20 amino acids we commonly see in biology have masses ranging between 57 and 186. It’s not really true that all their masses are different. Here are their masses:

57, 71, 87, 97, 99, 101, 103, 113, 113, 114, 115, 128, 128, 129, 131, 137, 147, 156, 163, 186

I pretended that none of the masses m_i are equal in Puzzle 2, and I left out the fact that only about 1/9th of the coefficients of our polynomial are nonzero. This may affect the picture you get!
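With the masses in hand, F(N) can be tabulated directly from the recurrence. This is a minimal sketch, not the method of the paper. It assumes F(0) = 1 for the empty sequence, and it treats the repeated masses 113 and 128 as different amino acids, so each one contributes its own term to the sum:

```python
MASSES = [57, 71, 87, 97, 99, 101, 103, 113, 113, 114,
          115, 128, 128, 129, 131, 137, 147, 156, 163, 186]


def peptide_counts(max_mass, masses=MASSES):
    """f[N] = number of amino acid sequences with total integer mass N."""
    f = [0] * (max_mass + 1)
    f[0] = 1  # the empty sequence
    for total in range(1, max_mass + 1):
        # repeated masses are summed twice on purpose: they stand for
        # different amino acids, hence different sequences
        f[total] = sum(f[total - m] for m in masses if m <= total)
    return f


f = peptide_counts(1000)
for n in (57, 113, 114, 128):
    print(n, f[n])
```

For example, mass 128 can be reached by either of the two amino acids of mass 128, or as 57 + 71 or 71 + 57. Plotting f against N on a logarithmic scale is one way to look for the oscillations.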


Applied Category Theory

6 April, 2017

The American Mathematical Society is having a meeting here at U. C. Riverside during the weekend of November 4th and 5th, 2017. I’m organizing a session on Applied Category Theory, and I’m looking for people to give talks.

The goal is to start a conversation about applications of category theory, not within pure math or fundamental physics, but to other branches of science and engineering—especially those where the use of category theory is not already well-established! For example, my students and I have been applying category theory to chemistry, electrical engineering, control theory and Markov processes.

Alas, we have no funds for travel and lodging. If you’re interested in giving a talk, please submit an abstract here:

General information about abstracts, American Mathematical Society.

More precisely, please read the information there and then click on the link on that page to submit an abstract. It should then magically fly through cyberspace to me! Abstracts are due September 12th, but the sooner you submit one, the greater the chance that we’ll have space.

For the program of the whole conference, go here:

Fall Western Sectional Meeting, U. C. Riverside, Riverside, California, 4–5 November 2017.

We’ll be having some interesting plenary talks:

• Paul Balmer, UCLA, An invitation to tensor-triangular geometry.

• Pavel Etingof, MIT, Double affine Hecke algebras and their applications.

• Monica Vazirani, U.C. Davis, Combinatorics, categorification, and crystals.


Jobs at U.C. Riverside

30 March, 2017

The Mathematics Department of the University of California at Riverside is trying to hire some visiting assistant professors. We plan to make decisions quite soon!

The positions are open to applicants from all research areas in mathematics who have a PhD or will have a PhD by the beginning of the term. The teaching load is six courses per year (i.e. 2 per quarter). In addition to teaching, the applicants will be responsible for attending advanced seminars and working on research projects.

This is initially a one-year appointment, and with successful annual teaching review, it is renewable for up to a third year term.

For more details, including how to apply, go here:

https://www.mathjobs.org/jobs/jobs/10162


Restoring the North Cascades Ecosystem

13 March, 2017

In 49 hours, the National Park Service will stop taking comments on an important issue: whether to reintroduce grizzly bears into the North Cascades near Seattle. If you leave a comment on their website before then, you can help make this happen! Follow the easy directions here:

http://theoatmeal.com/blog/grizzlies_north_cascades

Please go ahead! Then tell your friends to join in, and give them this link. This can be your good deed for the day.

But if you want more details:

Grizzly bears are traditionally the apex predator in the North Cascades. Without the apex predator, the whole ecosystem is thrown out of balance. I know this from my childhood in northern Virginia, where deer are stripping the forest of all low-hanging greenery with no wolves to control them. With the top predator, the whole ecosystem springs to life and starts humming like a well-tuned engine! For example, when wolves were reintroduced in Yellowstone National Park, it seems that even riverbeds were affected:

There are several plans to restore grizzlies to the North Cascades. On the link I recommended, Matthew Inman supports Alternative C — Incremental Restoration. I’m not an expert on this issue, so I went ahead and supported that. There are actually 4 alternatives on the table:

Alternative A — No Action. They’ll keep doing what they’re already doing. The few grizzlies already there would be protected from poaching, the local population would be advised on how to deal with grizzlies, and the bears would be monitored. All other alternatives will do these things and more.

Alternative B — Ecosystem Evaluation Restoration. Up to 10 grizzly bears will be captured from source populations in northwestern Montana and/or south-central British Columbia and released at a single remote site on Forest Service lands in the North Cascades. This will take 2 years, and then they’ll be monitored for 2 years before deciding what to do next.

Alternative C — Incremental Restoration. 5 to 7 grizzly bears will be captured and released into the North Cascades each year over roughly 5 to 10 years, with a goal of establishing an initial population of 25 grizzly bears. Bears would be released at multiple remote sites. They can be relocated or removed if they cause trouble. Alternative C is expected to reach the restoration goal of approximately 200 grizzly bears within 60 to 100 years.

Alternative D — Expedited Restoration. 5 to 7 grizzly bears will be captured and released into the North Cascades each year until the population reaches about 200, which is what the area can easily support.

So, pick your own alternative if you like!

By the way, the remaining grizzly bears in the western United States live within six recovery zones:

• the Greater Yellowstone Ecosystem (GYE) in Wyoming and southwest Montana,

• the Northern Continental Divide Ecosystem (NCDE) in northwest Montana,

• the Cabinet-Yaak Ecosystem (CYE) in extreme northwestern Montana and the northern Idaho panhandle,

• the Selkirk Ecosystem (SE) in northern Idaho and northeastern Washington,

• the Bitterroot Ecosystem (BE) in central Idaho and western Montana,

• and the North Cascades Ecosystem (NCE) in northwestern and north-central Washington.

The North Cascades Ecosystem consists of 24,800 square kilometers in Washington, with an additional 10,350 square kilometers in British Columbia. In the US, 90% of this ecosystem is managed by the US Forest Service, the US National Park Service, and the State of Washington, and approximately 41% falls within Forest Service wilderness or the North Cascades National Park Service Complex.

For more, read this:

• National Park Service, Draft Grizzly Bear Restoration Plan / Environmental Impact Statement: North Cascades Ecosystem.

The picture of grizzlies is from this article:

• Ron Judd, Why returning grizzlies to the North Cascades is the right thing to do, Pacific NW Magazine, 23 November 2015.

If you’re worried about reintroducing grizzly bears, read it!

The map is from here:

• Krista Langlois, Grizzlies gain ground, High Country News, 27 August 2014.

Here you’ll see the huge obstacles this project has overcome so far.


Pi and the Golden Ratio

7 March, 2017

Two of my favorite numbers are pi:

\pi = 3.14159...

and the golden ratio:

\displaystyle{ \Phi = \frac{\sqrt{5} + 1}{2} } = 1.6180339...

They’re related:

\pi = \frac{5}{\Phi} \cdot \frac{2}{\sqrt{2 + \sqrt{2 + \Phi}}} \cdot \frac{2}{\sqrt{2 + \sqrt{2 + \sqrt{2 + \Phi}}}}  \cdot \frac{2}{\sqrt{2 + \sqrt{2 + \sqrt{2 + \sqrt{2 + \Phi}}}}} \cdots

Greg Egan and I came up with this formula last weekend. It’s probably not new, and it certainly wouldn’t surprise experts, but it’s still fun coming up with a formula like this. Let me explain how we did it.

History has a fractal texture. It’s not exactly self-similar, but the closer you look at any incident, the more fine-grained detail you see. The simplified stories we learn about the history of math and physics in school are like blurry pictures of the Mandelbrot set. You can see the overall shape, but the really exciting stuff is hidden.

François Viète was a French mathematician who doesn’t show up in those simplified stories. He studied law at Poitiers, graduating in 1559. He began his career as an attorney at a quite high level, with cases involving the widow of King Francis I of France and also Mary, Queen of Scots. But his true interest was always mathematics. A friend said he could think about a single question for up to three days, his elbow on the desk, feeding himself without changing position.

Nonetheless, he was highly successful in law. By 1590 he was working for King Henry IV. The king admired his mathematical talents, and Viète soon confirmed his worth by cracking a Spanish cipher, thus allowing the French to read all the Spanish communications they were able to obtain.

In 1591, François Viète came out with an important book, introducing what is called the new algebra: a symbolic method for dealing with polynomial equations. This deserves to be much better known; it was very familiar to Descartes and others, and it was an important precursor to our modern notation and methods. For example, he emphasized care with the use of variables, and advocated denoting known quantities by consonants and unknown quantities by vowels. (Later people switched to using letters near the beginning of the alphabet for known quantities and letters near the end like x,y,z for unknowns.)

In 1593 he came out with another book, Variorum De Rebus Mathematicis Responsorum, Liber VIII. Among other things, it includes a formula for pi. In modernized notation, it looks like this:

\displaystyle{ \frac2\pi = \frac{\sqrt 2}2 \cdot \frac{\sqrt{2+\sqrt 2}}2 \cdot \frac{\sqrt{2+\sqrt{2+\sqrt 2}}}{2} \cdots}

This is remarkable! First of all, it looks cool. Second, it’s the earliest known example of an infinite product in mathematics. Third, it’s the earliest known formula for the exact value of pi. In fact, it seems to be the earliest formula representing a number as the result of an infinite process rather than of a finite calculation! So, Viète’s formula has been called the beginning of analysis. In his article “The life of pi”, Jonathan Borwein went even further and called Viète’s formula “the dawn of modern mathematics”.

How did Viète come up with his formula? I haven’t read his book, but the idea seems fairly clear. The area of the unit circle is pi. So, you can approximate pi better and better by computing the area of a square inscribed in this circle, and then an octagon, and then a 16-gon, and so on:

If you compute these areas in a clever way, you get this series of numbers:

\begin{array}{ccl} A_4 &=& 2 \\  \\ A_8 &=& 2 \cdot \frac{2}{\sqrt{2}} \\  \\ A_{16} &=& 2 \cdot \frac{2}{\sqrt{2}} \cdot \frac{2}{\sqrt{2 + \sqrt{2}}}  \\  \\ A_{32} &=& 2 \cdot \frac{2}{\sqrt{2}} \cdot \frac{2}{\sqrt{2 + \sqrt{2}}} \cdot \frac{2}{\sqrt{2 + \sqrt{2 + \sqrt{2}}}}  \end{array}

and so on, where A_n is the area of a regular n-gon inscribed in the unit circle. So, it was only a small step for Viète (though an infinite leap for mankind) to conclude that

\displaystyle{ \pi = 2 \cdot \frac{2}{\sqrt{2}} \cdot \frac{2}{\sqrt{2 + \sqrt{2}}} \cdot \frac{2}{\sqrt{2 + \sqrt{2 + \sqrt{2}}}} \cdots }

or, if square roots in a denominator make you uncomfortable:

\displaystyle{ \frac2\pi = \frac{\sqrt 2}2 \cdot \frac{\sqrt{2+\sqrt 2}}2 \cdot \frac{\sqrt{2+\sqrt{2+\sqrt 2}}}{2} \cdots}

The basic idea here would not have surprised Archimedes, who rigorously proved that

223/71 < \pi < 22/7

by approximating the circumference of a circle using a regular 96-gon. Since 96 = 2^5 \times 3, you can draw a regular 96-gon with ruler and compass by taking an equilateral triangle and bisecting its edges to get a hexagon, bisecting the edges of that to get a 12-gon, and so on up to 96. In a more modern way of thinking, you can figure out everything you need to know by starting with the angle \pi/3 and using half-angle formulas 5 times to work out the sine or cosine of \pi/96. And indeed, before Viète came along, Ludolph van Ceulen had computed pi to 35 digits using a regular polygon with 2^{62} sides! So Viète’s daring new idea was to give an exact formula for pi that involved an infinite process.
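The half-angle route to Archimedes’ bounds can be sketched in a few lines of Python. This is a floating-point illustration, not a reconstruction of what Archimedes did: he worked with rational bounds on square roots. It starts from \cos(\pi/3) = 1/2, halves the angle until it reaches \pi/96, and compares half the perimeters of the inscribed and circumscribed 96-gons with his fractions:

```python
import math

# Start from cos(pi/3) = 1/2 and halve the angle five times:
# pi/3 -> pi/6 -> pi/12 -> pi/24 -> pi/48 -> pi/96.
c = 0.5
for _ in range(5):
    c = math.sqrt((1 + c) / 2)  # half-angle formula for the cosine

s = math.sqrt(1 - c * c)  # sin(pi/96)
lower = 96 * s            # half the perimeter of the inscribed 96-gon
upper = 96 * s / c        # half the perimeter of the circumscribed 96-gon

print(lower, upper)
print(223 / 71 < lower < math.pi < upper < 22 / 7)
```

Only square roots are used to get `lower` and `upper`. The constant `math.pi` appears just for the final comparison.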

Now let’s see in detail how Viète’s formula works. Since there’s no need to start with a square, we might as well start with a regular n-gon inscribed in the circle and repeatedly bisect its sides, getting better and better approximations to pi. If we start with a pentagon, we’ll get a formula for pi that involves the golden ratio!

We have

\displaystyle{ \pi = \lim_{k \to \infty} A_k }

so we can also compute pi by starting with a regular n-gon and repeatedly doubling the number of vertices:

\displaystyle{ \pi = \lim_{k \to \infty} A_{2^k n} }

The key trick is to write A_{2^k n} as a ‘telescoping product’:

A_{2^k n} = A_n \cdot \frac{A_{2n}}{A_n} \cdot  \frac{A_{4n}}{A_{2n}} \cdot \frac{A_{8n}}{A_{4n}} \cdots \frac{A_{2^k n}}{A_{2^{k-1} n}}

Thus, taking the limit as k \to \infty we get

\displaystyle{ \pi = A_n \cdot \frac{A_{2n}}{A_n} \cdot \frac{A_{4n}}{A_{2n}} \cdot \frac{A_{8n}}{A_{4n}} \cdots }

where we start with the area of the n-gon and keep ‘correcting’ it to get the area of the 2n-gon, the 4n-gon, the 8n-gon and so on.

There’s a simple formula for the area of a regular n-gon inscribed in the unit circle. You can chop it into 2n right triangles, each of which has base \sin(\pi/n) and height \cos(\pi/n), and thus area \frac{1}{2} \sin(\pi/n) \cos(\pi/n):

Thus,

A_n = n \sin(\pi/n) \cos(\pi/n) = \displaystyle{\frac{n}{2} \sin(2 \pi / n)}

This lets us understand how the area changes when we double the number of vertices:

\displaystyle{ \frac{A_{n}}{A_{2n}} = \frac{\frac{n}{2} \sin(2 \pi / n)}{n \sin(\pi / n)} = \frac{n \sin( \pi / n) \cos(\pi/n)}{n \sin(\pi / n)} = \cos(\pi/n) }

This is nice and simple, but we really need a recursive formula for this quantity. Let’s define

\displaystyle{ R_n = 2\frac{A_{n}}{A_{2n}} = 2 \cos(\pi/n) }

Why the factor of 2? It simplifies our calculations slightly. We can express R_{2n} in terms of R_n using the half-angle formula for the cosine:

\displaystyle{ R_{2n} = 2 \cos(\pi/(2n)) = 2\sqrt{\frac{1 + \cos(\pi/n)}{2}} = \sqrt{2 + R_n} }

Now we’re ready for some fun! We have

\begin{array}{ccl} \pi &=& \displaystyle{ A_n \cdot \frac{A_{2n}}{A_n} \cdot \frac{A_{4n}}{A_{2n}} \cdot \frac{A_{8n}}{A_{4n}} \cdots }  \\ \\ & = &\displaystyle{ A_n \cdot \frac{2}{R_n} \cdot \frac{2}{R_{2n}} \cdot \frac{2}{R_{4n}} \cdots } \end{array}

so using our recursive formula R_{2n} = \sqrt{2 + R_n}, which holds for any n, we get

\pi =  \displaystyle{ A_n \cdot \frac{2}{R_n} \cdot \frac{2}{\sqrt{2 + R_n}} \cdot \frac{2}{\sqrt{2 + \sqrt{2 + R_n}}} \cdots }

I think this deserves to be called the generalized Viète formula. And indeed, if we start with a square, we get

A_4 = \displaystyle{\frac{4}{2} \sin(2 \pi / 4)} = 2

and

R_4 = 2 \cos(\pi/4) = \sqrt{2}

giving Viète’s formula:

\pi = \displaystyle{ 2 \cdot \frac{2}{\sqrt{2}} \cdot \frac{2}{\sqrt{2 + \sqrt{2}}} \cdot \frac{2}{\sqrt{2 + \sqrt{2 + \sqrt{2}}}} \cdots }

as desired!
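The generalized formula is easy to test numerically. The sketch below uses a made-up function name. It seeds A_n and R_n with the trig formulas above and then applies the recursion R_{2n} = \sqrt{2 + R_n}. It uses `math.pi` to compute the starting values, so it is a check of the recursion, not an independent way to compute pi:

```python
import math


def generalized_viete(n, factors):
    """Partial product A_n * 2/R_n * 2/R_{2n} * ... with `factors` factors."""
    area = (n / 2) * math.sin(2 * math.pi / n)  # A_n
    r = 2 * math.cos(math.pi / n)               # R_n
    for _ in range(factors):
        area *= 2 / r          # A_m -> A_{2m}
        r = math.sqrt(2 + r)   # R_m -> R_{2m}
    return area


for n in (3, 4, 5):
    print(n, generalized_viete(n, 25))
```

After `factors` steps the result is the area of the inscribed polygon with n times 2^factors sides, so each extra factor cuts the error by roughly a factor of 4.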

But what if we start with a pentagon? For this it helps to remember a beautiful but slightly obscure trig fact:

\cos(\pi / 5) = \Phi/2

and a slightly less beautiful one:

\displaystyle{ \sin(2\pi / 5) = \frac{1}{2} \sqrt{2 + \Phi} }

It’s easy to prove these, and I’ll show you how later. For now, note that they imply

A_5 = \displaystyle{\frac{5}{2} \sin(2 \pi / 5)} = \frac{5}{4} \sqrt{2 + \Phi}

and

R_5 = 2 \cos(\pi/5) = \Phi

Thus, the formula

\pi =  \displaystyle{ A_5 \cdot \frac{2}{R_5} \cdot \frac{2}{\sqrt{2 + R_5}} \cdot \frac{2}{\sqrt{2 + \sqrt{2 + R_5}}} \cdots }

gives us

\pi =  \displaystyle{ \frac{5}{4} \sqrt{2 + \Phi} \cdot \frac{2}{\Phi} \cdot \frac{2}{\sqrt{2 + \Phi}} \cdot \frac{2}{\sqrt{2 + \sqrt{2 + \Phi}}} \cdots }

or, cleaning it up a bit, the formula we want:

\pi = \frac{5}{\Phi} \cdot \frac{2}{\sqrt{2 + \sqrt{2 + \Phi}}} \cdot \frac{2}{\sqrt{2 + \sqrt{2 + \sqrt{2 + \Phi}}}}  \cdot \frac{2}{\sqrt{2 + \sqrt{2 + \sqrt{2 + \sqrt{2 + \Phi}}}}} \cdots

Voilà!
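Here is a quick numerical check of the golden ratio formula, using nothing but square roots. The function name is hypothetical. The partial products are the areas of the inscribed 20-gon, 40-gon, 80-gon and so on, so they creep up toward pi from below:

```python
import math

PHI = (1 + math.sqrt(5)) / 2


def pi_from_golden_ratio(factors):
    """5/PHI times the first `factors` nested-radical factors."""
    r = math.sqrt(2 + PHI)  # innermost radical, sqrt(2 + Phi)
    product = 5 / PHI
    for _ in range(factors):
        r = math.sqrt(2 + r)  # add one more layer of sqrt(2 + ...)
        product *= 2 / r
    return product


for k in (0, 1, 2, 5, 10, 25):
    print(k, pi_from_golden_ratio(k))
```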

There’s a lot more to say, but let me just explain the slightly obscure trigonometry facts we needed. To derive these, I find it nice to remember that a regular pentagon, and the pentagram inside it, contain lots of similar triangles:



Using the fact that all these triangles are similar, it’s easy to show that for any one, the ratio of the long side to the short side is \Phi to 1, since

\displaystyle{\Phi = 1 + \frac{1}{\Phi} }

Another important fact is that the pentagram trisects the interior angle of the regular pentagon, breaking the interior angle of 108^\circ = 3\pi/5 into 3 angles of 36^\circ = \pi/5:



Again this is easy and fun to show.

Combining these facts, we can prove that

\displaystyle{ \cos(2\pi/5) = \frac{1}{2\Phi}  }

and

\displaystyle{ \cos(\pi/5) = \frac{\Phi}{2} }

To prove the first equation, chop one of those golden triangles into two right triangles and do things you learned in high school. To prove the second, do the same things to one of the short squat isosceles triangles:

Starting from these equations and using \cos^2 \theta + \sin^2 \theta = 1, we can show

\displaystyle{ \sin(2\pi/5) = \frac{1}{2}\sqrt{2 + \Phi}}

and, just for completeness (we don’t need it here):

\displaystyle{ \sin(\pi/5) = \frac{1}{2}\sqrt{3 - \Phi}}

These require some mildly annoying calculations, where it helps to use the identity

\displaystyle{\frac{1}{\Phi^2} = 2 - \Phi }
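For the record, here is a sketch of those calculations. Both angles lie between 0 and \pi/2, so the sines are positive. The first uses the identity just mentioned:

\displaystyle{ \sin^2(2\pi/5) = 1 - \cos^2(2\pi/5) = 1 - \frac{1}{4\Phi^2} = 1 - \frac{2 - \Phi}{4} = \frac{2 + \Phi}{4} }

The second uses \Phi^2 = \Phi + 1, which is just \Phi = 1 + 1/\Phi multiplied through by \Phi:

\displaystyle{ \sin^2(\pi/5) = 1 - \cos^2(\pi/5) = 1 - \frac{\Phi^2}{4} = 1 - \frac{\Phi + 1}{4} = \frac{3 - \Phi}{4} }

Taking square roots gives the two formulas for the sines.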

Okay, that’s all for now! But if you want more fun, try a couple of puzzles:

Puzzle 1. We’ve gotten formulas for pi starting from a square or a regular pentagon. What formula do you get starting from an equilateral triangle?

Puzzle 2. Using the generalized Viète formula, prove Euler’s formula

\displaystyle{  \frac{\sin x}{x} = \cos\frac{x}{2} \cdot \cos\frac{x}{4} \cdot \cos\frac{x}{8} \cdots }

Conversely, use Euler’s formula to prove the generalized Viète formula.

So, one might say that the real point of Viète’s formula, and its generalized version, is not any special property of pi, but Euler’s formula.
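This doesn’t solve Puzzle 2, but Euler’s formula is easy to check numerically before trying to prove it. A minimal sketch with a made-up function name:

```python
import math


def cosine_product(x, factors=40):
    """Product cos(x/2) * cos(x/4) * ... with `factors` factors."""
    product = 1.0
    for k in range(1, factors + 1):
        product *= math.cos(x / 2 ** k)
    return product


for x in (0.5, 1.0, 2.0, math.pi / 2):
    print(x, cosine_product(x), math.sin(x) / x)
```

The last value, x = \pi/2, is Viète’s original formula in disguise: the left side is 2/\pi and the cosines are \sqrt{2}/2, \sqrt{2 + \sqrt{2}}/2 and so on.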


Saving Climate Data (Part 6)

23 February, 2017

Scott Pruitt, who filed legal challenges against Environmental Protection Agency rules fourteen times, working hand in hand with oil and gas companies, is now head of that agency. What does that mean about the safety of climate data on the EPA’s websites? Here is an inside report:

• Dawn Reeves, EPA preserves Obama-Era website but climate change data doubts remain, InsideEPA.com, 21 February 2017.

For those of us who are backing up climate data, the really important stuff is in red near the bottom.

The EPA has posted a link to an archived version of its website from Jan. 19, the day before President Donald Trump was inaugurated and the agency began removing climate change-related information from its official site, saying the move comes in response to concerns that it would permanently scrub such data.

However, the archived version notes that links to climate and other environmental databases will go to current versions of them—continuing the fears that the Trump EPA will remove or destroy crucial greenhouse gas and other data.

The archived version was put in place and linked to the main page in response to “numerous [Freedom of Information Act (FOIA)] requests regarding historic versions of the EPA website,” says an email to agency staff shared by the press office. “The Agency is making its best reasonable effort to 1) preserve agency records that are the subject of a request; 2) produce requested agency records in the format requested; and 3) post frequently requested agency records in electronic format for public inspection. To meet these goals, EPA has re-posted a snapshot of the EPA website as it existed on January 19, 2017.”

The email adds that the action is similar to the snapshot taken of the Obama White House website.

The archived version of EPA’s website includes a “more information” link that offers more explanation.

For example, it says the page is “not the current EPA website” and that the archive includes “static content, such as webpages and reports in Portable Document Format (PDF), as that content appeared on EPA’s website as of January 19, 2017.”

It cites technical limits for the database exclusions. “For example, many of the links contained on EPA’s website are to databases that are updated with the new information on a regular basis. These databases are not part of the static content that comprises the Web Snapshot.” Searches of the databases from the archive “will take you to the current version of the database,” the agency says.

“In addition, links may have been broken in the website as it appeared” on Jan. 19 and those will remain broken on the snapshot. Links that are no longer active will also appear as broken in the snapshot.

“Finally, certain extremely large collections of content… were not included in the Snapshot due to their size” such as AirNow images, radiation network graphs, historic air technology transfer network information, and EPA’s searchable news releases.

‘Smart’ Move

One source urging the preservation of the data says the snapshot appears to be a “smart” move on EPA’s behalf, given the FOIA requests it has received, and notes that even though other groups like NextGen Climate and scientists have been working to capture EPA’s online information, having it on EPA’s site makes it official.

But it could also be a signal that big changes are coming to the official Trump EPA site, and it is unclear how long the agency will maintain the archived version.

The source says while it is disappointing that the archive may signal the imminent removal of EPA’s climate site, “at least they are trying to accommodate public concerns” to preserve the information.

A second source adds that while it is good that EPA is seeking “to address the widespread concern” that the information will be removed by an administration that does not believe in human-caused climate change, “on the other hand, it doesn’t address the primary concern of the data. It is snapshots of the web text.” Also, information “not included,” such as climate databases, is what is difficult to capture by outside groups and is what really must be preserved.

“If they take [information] down” that groups have been trying to preserve, then the underlying concern about access to data remains. “Web crawlers and programs can do things that are easy,” such as taking snapshots of text, “but getting the data inside the database is much more challenging,” the source says.

The first source notes that EPA’s searchable databases, such as those maintained by its Clean Air Markets Division, are used by the public “all the time.”

The agency’s Office of General Counsel (OGC) Jan. 25 began a review of the implications of taking down the climate page—a planned wholesale removal that was temporarily suspended to allow for the OGC review.

But EPA did remove some specific climate information, including links to the Clean Power Plan and references to President Barack Obama’s Climate Action Plan. Inside EPA captured this screenshot of the “What EPA Is Doing” page regarding climate change. Those links are missing on the Trump EPA site. The archive includes the same version of the page as captured by our screenshot.

Inside EPA first reported the plans to take down the climate information on Jan. 17.

After the OGC investigation began, a source close to the Trump administration said Jan. 31 that climate “propaganda” would be taken down from the EPA site, but that the agency is not expected to remove databases on GHG emissions or climate science. “Eventually… the propaganda will get removed…. Most of what is there is not data. Most of what is there is interpretation.”

The Sierra Club and Environmental Defense Fund both filed FOIA requests asking the agency to preserve its climate data, while attorneys representing youth plaintiffs in a federal climate change lawsuit against the government have also asked the Department of Justice to ensure the data related to its claims is preserved.

The Azimuth Climate Data Backup Project and other groups are making copies of actual databases, not just the visible portions of websites.