Saving Climate Data (Part 1)


I try to stay out of politics on this website. This post is not mainly about politics. It’s a call to action. We’re trying to do something rather simple and clearly worthwhile. We’re trying to create backups of US government climate data.

The background is, of course, political. Many signs point to a dramatic change in US climate policy:

• Oliver Milman, Trump’s transition: sceptics guide every agency dealing with climate change, The Guardian, 12 December 2016.

So, scientists are now backing up large amounts of climate data, just in case the Trump administration tries to delete it after Trump takes office on January 20th:

• Brady Dennis, Scientists are frantically copying U.S. climate data, fearing it might vanish under Trump, Washington Post, 13 December 2016.

Of course saving the data publicly available on US government sites is not nearly as good as keeping climate programs fully funded! New data is coming in all the time from satellites and other sources. We need it—and we need the experts who understand it.

Also, it’s possible that the Trump administration won’t go so far as trying to delete big climate science databases. Still, I think it can’t be a bad thing to have backups. Or as my mother always said: better safe than sorry!

Quoting the Washington Post article:

Alarmed that decades of crucial climate measurements could vanish under a hostile Trump administration, scientists have begun a feverish attempt to copy reams of government data onto independent servers in hopes of safeguarding it from any political interference.

The efforts include a “guerrilla archiving” event in Toronto, where experts will copy irreplaceable public data, meetings at the University of Pennsylvania focused on how to download as much federal data as possible in the coming weeks, and a collaboration of scientists and database experts who are compiling an online site to harbor scientific information.

“Something that seemed a little paranoid to me before all of a sudden seems potentially realistic, or at least something you’d want to hedge against,” said Nick Santos, an environmental researcher at the University of California at Davis, who over the weekend began copying government climate data onto a nongovernment server, where it will remain available to the public. “Doing this can only be a good thing. Hopefully they leave everything in place. But if not, we’re planning for that.”


“What are the most important .gov climate assets?” Eric Holthaus, a meteorologist and self-proclaimed “climate hawk,” tweeted from his Arizona home Saturday evening. “Scientists: Do you have a US .gov climate database that you don’t want to see disappear?”

Within hours, responses flooded in from around the country. Scientists added links to dozens of government databases to a Google spreadsheet. Investors offered to help fund efforts to copy and safeguard key climate data. Lawyers offered pro bono legal help. Database experts offered to help organize mountains of data and to house it with free server space. In California, Santos began building an online repository to “make sure these data sets remain freely and broadly accessible.”

In Philadelphia, researchers at the University of Pennsylvania, along with members of groups such as Open Data Philly and the software company Azavea, have been meeting to figure out ways to harvest and store important data sets.

At the University of Toronto this weekend, researchers are holding what they call a “guerrilla archiving” event to catalogue key federal environmental data ahead of Trump’s inauguration. The event “is focused on preserving information and data from the Environmental Protection Agency, which has programs and data at high risk of being removed from online public access or even deleted,” the organizers said. “This includes climate change, water, air, toxics programs.”

The event is part of a broader effort to help San Francisco-based Internet Archive with its End of Term 2016 project, an effort by university, government and nonprofit officials to find and archive valuable pages on federal websites. The project has existed through several presidential transitions.

I hope that small “guerrilla archiving” efforts will be dwarfed by more systematic work, because it’s crucial that databases be copied along with all relevant metadata, and with some sort of cryptographic certificate of authenticity if possible. However, getting lots of people involved is bound to be a good thing, politically speaking.
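To make the point about metadata and authenticity a bit more concrete, here is a minimal sketch in Python of one way a volunteer might record what they copied. It is only an illustration, not a description of what any of the projects mentioned here actually do. The function names, the manifest layout and the `source_url` field are all my own assumptions. Note also that a checksum only lets someone detect later changes to a file; showing that a copy is authentic would require more, such as a digital signature from a trusted party.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir, source_url):
    """Describe every file under data_dir (hypothetical manifest layout)."""
    root = Path(data_dir)
    files = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        files.append({
            "path": path.relative_to(root).as_posix(),
            "bytes": path.stat().st_size,
            "sha256": sha256_of_file(path),
        })
    return {
        "source_url": source_url,  # where the copy came from (assumed field)
        "retrieved_utc": datetime.now(timezone.utc).isoformat(),
        "files": files,
    }


def write_manifest(manifest, out_path):
    """Save the manifest as JSON, outside the directory it describes."""
    with open(out_path, "w", encoding="utf-8") as handle:
        json.dump(manifest, handle, indent=2)
```

Someone holding the same files can rerun `build_manifest` and compare the `sha256` values to check that nothing changed in transit or in storage.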

If you have good computer skills, good understanding of databases, or lots of storage space, please get involved. Efforts are being coordinated by Bethany Wiggin and others at the Data Refuge Project:

• PPEHLab (Penn Program in the Environmental Humanities), DataRefuge.

You can contact them at [contact address missing]. Nick Santos is also involved, and if you want to get “more plugged into the project” you can contact him here. They are trying to build a climate database mirror website here:

Climate Mirror.

At the help form on this website you can nominate a dataset for rescue, claim a dataset to rescue, let them know about a data rescue event, or help in some other way (which you must specify).

PPEHLab and Penn Libraries are organizing a data rescue event this Thursday:

• PPEHLab, DataRefuge meeting, 14 December 2016.

At the American Geophysical Union meeting in San Francisco, where more than 20,000 earth and climate scientists gather from around the world, there was a public demonstration today starting at 1:30 PST:

Rally to stand up for science, 13 December 2016.

And the “guerrilla archiving” hackathon in Toronto is this Saturday; see below. If you know people with good computer skills in Toronto, get them to check it out!

To follow progress, also read Eric Holthaus’s tweets and replies here:

Eric Holthaus.

Guerrilla archiving in Toronto

Here are details on this:

Guerrilla Archiving Hackathon

Date: 10am-4pm, December 17, 2016

Location: Bissell Building, 4th Floor, 140 St. George St., University of Toronto

RSVP and up-to-date information: Guerilla archiving: saving environmental data from Trump.

Bring: laptops, power bars, and snacks. Coffee and pizza provided.

This event collaborates with the Internet Archive’s End of Term 2016 project, which seeks to archive the federal online pages and data that are in danger of disappearing during the Trump administration. Our event is focused on preserving information and data from the Environmental Protection Agency, which has programs and data at high risk of being removed from online public access or even deleted. This includes climate change, water, air, toxics programs. This project is urgent because the Trump transition team has identified the EPA and other environmental programs as priorities for the chopping block.

The Internet Archive is a San Francisco-based nonprofit digital library that aims to preserve knowledge and make it universally accessible. Its End of Term web archive captures and saves U.S. Government websites that are at risk of changing or disappearing altogether during government transitions. The Internet Archive has asked volunteers to help select and organize information that will be preserved before the Trump transition.

End of Term web archive:

New York Times article: “Harvesting Government History, One Web Page at a Time”


Identifying endangered programs and data

Seeding the End of Term webcrawler with priority URLs

Identifying and mapping the location of inaccessible environmental databases

Hacking scripts to make hard-to-reach databases accessible to the webcrawler

Building a toolkit so that other groups can hold similar events

Skills needed: We need all kinds of people — and that means you!

People who can locate relevant webpages for the Internet Archive’s webcrawler

People who can identify data targeted for deletion by the Trump transition team and the organizations they work with

People with knowledge of government websites and information, including the EPA

People with library and archive skills

People who are good at navigating databases

People interested in mapping where inaccessible data is located at the EPA

Hackers to figure out how to extract data and URLs from databases (in a way that Internet Archive can use)

People with good organization and communication skills

People interested in creating a toolkit for reproducing similar events
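As a rough illustration of the URL-gathering tasks listed above, here is a minimal Python sketch that pulls links out of one already-downloaded HTML page and keeps only those on a chosen site, producing a list that could be proposed as crawler seeds. This is a sketch under assumptions, not the event's or the Internet Archive's actual tooling: the names `LinkCollector` and `seed_urls` are hypothetical, and the domain `agency.example` is a placeholder.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(urljoin(self.base_url, value))


def seed_urls(html_text, base_url, host_suffix):
    """Return unique http(s) links whose host is host_suffix or a subdomain."""
    collector = LinkCollector(base_url)
    collector.feed(html_text)
    seen = set()
    seeds = []
    for link in collector.links:
        parsed = urlparse(link)
        host = parsed.hostname or ""
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar links
        if not (host == host_suffix or host.endswith("." + host_suffix)):
            continue  # skip links that leave the site of interest
        clean = parsed._replace(fragment="").geturl()  # drop "#section" parts
        if clean not in seen:
            seen.add(clean)
            seeds.append(clean)
    return seeds
```

A list like this only covers pages reachable by ordinary links. Data sitting behind search forms or query interfaces is exactly the "hard to reach" case that needs custom scripts.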


15 Responses to Saving Climate Data (Part 1)

  1. mjg0 says:

    How do you obtain next December’s Guardian and WaPo articles?

  2. Bruce Smith says:

    Hackers to figure out how to extract data and URLs from databases (in a way that Internet Archive can use)

    I’m sure (I hope) that by “hackers” they mean “good creative programmers” rather than “people skilled at illegal means of access”. Given various recent events, it would be good if they’d clarify that.

  3. John Baez says:

    We’re having a lively discussion on my G+ post. The most useful comment so far was by MK Taylor:

    Former digital archivist here, if the groups involved with making backups of all gov data haven’t considered it, I’d strongly recommend trying to coordinate/contact the Society of American Archivists. There are groups who have been working for decades on issues related to long term archiving/preservation of digital only assets. Including scientific data. Primary SIG would be the Electronic Records Section, Metadata and Digital Objects Roundtable

    I put Nancy Beaumont, executive director of the Society of American Archivists, in touch with Bethany Wiggin, director of Penn Program in Environmental Humanities, which is leading the DataRefuge project. They are both very eager to have each other’s help, and we’re going to have a conference call on Thursday.

  4. Keith McClary says:

    This inspired me to donate to Internet Archive.

  5. The Sage says:

    There are already guerilla archives of the various published series — they get used frequently on anti-alarmist sites like Watts Up With That to show how each new series with further corrections and adjustments has somehow managed to cool the past and warm the present yet further.

    • John Baez says:

      There’s a lot more data than the famous time series of global temperatures or even the larger amounts of temperature data those are based on. There’s carbon dioxide concentration data, sea ice data, sea level data, ocean pH and salinity data, air pollution data, hurricane data, etc. etc. etc.

  6. Carl says:

    Like Sage said, archiving is very bad when you want to change the past..

    NOAA global report for 1997 stated that global avg for 1997 was 16.92C (62.45F) without much alarming language.

    NOAA report for 2015 stated that global avg was 14.8 C, with a lot of alarm in the report, 2C global cooling.. Inconvenient truth, let’s hope this can be erased somehow..


    • John Baez says:

      Archiving is very good if you want to know the truth. The fact that 1997, one of the biggest El Niño years in recent history, had a higher global average temperature than 2005 is not a secret:

      1997 sticks up as a bump in this graph from GISTEMP. That’s why you chose it to argue for global cooling. That’s called cherry-picking. You can see it more clearly on the monthly averages:

      This graph ends on February 2016, another huge El Niño spike. But the overall trend is up.

      I’ll delete further global warming denial on this thread. I want to discuss ways of saving climate data.

  7. domenico says:

    Google has projects that give the public access to image data, for example the Google Art Project and Google Books.
    I think these projects let the public obtain free access to lots of data.
    Since Google has the software, the servers and the storage space, it could use its resources to distribute scientific data freely (not only climate data, so as not to alienate a political party) and create a reliable tool used by all the world’s scientists: the gains would come from the number of users, it would be a way to bring other users onto its platforms, and some billionaire could finance the project.
    I worry that data sources that are not independent would be easy to attack or manipulate.

  8. I want to get you involved in the Azimuth Environmental Data Backup Project. But first here’s a post with some background.

    A few days ago, many scientists, librarians, archivists, computer geeks and environmental activists started making backups of US government environmental data. We’re trying to beat the January 20th deadline just in case these backups are required.

    Backing up data is always a good thing, so there’s no point in arguing about politics or the likelihood that these backups are needed. The present situation is just a nice reason to hurry up and do some things we should have been doing anyway.

    As of 2 days ago the story looked like this:

    Saving climate data (Part 1), Azimuth, 13 December 2016.

    A lot has happened since then, but much more needs to be done.

  9. It’s about time. I suggested using multiple mirrors to prevent censorship years ago.

  10. […] States, where Eric Holthaus launched Data Refuge, with the help of local and non-local universities, and John Baez the Azimuth Backup Project, which seems to be going […]

  11. I think we’re doing well. The Climate Mirror effort shows how well we are doing overall, and their reports do not reflect the substantial captures this project, here, has been able to achieve, despite technical obstacles.

    It’s a good time!
