Azimuth Backup Project (Part 3)

22 January, 2017



Along with the bad news there is some good news:

• Over 380 people have pledged over $14,000 to the Azimuth Backup Project on Kickstarter, greatly surpassing our conservative initial goal of $5,000.

• Given our budget, we currently aim to back up 40 terabytes of data, and we are well on our way to this goal. You can see what we’ve done at Our Progress, and what we’re still doing at the Issue Tracker.

• I have gotten a commitment from Danna Gianforte, the head of Computing and Communications at U. C. Riverside, that eventually the university will maintain a copy of our data. (This commitment is based on my earlier estimate that we’d have 20 terabytes of data, so I need to see if 40 is okay.)

• I have gotten offers from two other people who say they too can hold our data.

I’m hoping that the data at U. C. Riverside will be made publicly available through a server. The other offers may involve the data being held ‘secretly’ until such time as it becomes needed; that has its own complementary advantages.

However, the interesting problem that confronts us now is: how to spend our money?

You can see how we’re currently spending it on our Budget and Spending page. Basically, we’re paying a firm called Hetzner for servers and storage boxes.

We could simply continue to do this until our money runs out. I hope that long before then, U. C. Riverside will have taken over some responsibilities. If so, there would be a long period where our money would largely pay for a redundant backup. Redundancy is good, but perhaps there is something better.

Two members of our team, Sakari Maaranen and Greg Kochanski, have thoughts on this matter which I’d like to share. Sakari posted his thoughts on Google+, while Greg posted his in an email which he’s letting me share here.

Please read these and offer us your thoughts! Maybe you can help us decide on the best strategy!

Sakari Maaranen

For the record, here are my views on how we should use the budget that the Azimuth Climate Data Backup Project now has.

People have contributed this money to this effort specifically.

Some non-government entities have offered “free hosting”. Of course the project should take any and all free offers to host our data. Those offers would not spend our budget, however. And someone is still paying for that hosting, even if it is offered to us “for free”.

As far as spending goes, I think we should think in terms of 1) terabyte-months and 2) sufficient redundancy, and do that as cost-efficiently as possible. We should not just hand the money to any takers, but think of the best bang for the buck. We owe that to the people who have contributed.

For example, if we burned through the cash quickly on expensive storage, I would consider that a failure. Instead, we must plan for the best use of the budget towards our mission.

What we have promised people is that we back up and serve these data sets with the money they have given us. Let’s do exactly that.

We are currently serving the mission at approximately €0.006 per gigabyte-month, at least for as long as we have volunteers working for free. The cost would be slightly higher if we paid for professional maintenance, which is a reasonable assumption if we plan for long-term service. Volunteer work cannot be guaranteed forever, even if it works temporarily.

This is one view and the question is open to public discussion.
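
For a sense of scale (my arithmetic, not Sakari’s, and treating dollars and euros as roughly comparable): at €0.006 per gigabyte-month, 40 terabytes costs about €240 per month, so our roughly $14,000 budget would cover storage for nearly five years. A quick sketch:

import math

rate = 0.006        # euros per gigabyte-month (Sakari's figure)
data_gb = 40_000    # about 40 terabytes
budget = 14_000     # roughly what the Kickstarter raised; illustrative only
monthly = rate * data_gb
print(f"~{monthly:.0f} per month, ~{math.floor(budget / monthly)} months of storage")
# prints: ~240 per month, ~58 months of storage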

Greg Kochanski

Some miscellaneous thoughts.

1) As I see it, we have made some promise of serving the data (“create a better interface for getting it”) which can be an expensive thing.

UI coding isn’t all that easy, and takes some time.

Beyond that, we’ve promised to back up the data, and once you say “backup”, you’ve also made an implicit promise to make the data available.

2) I agree that if we have a backup, it is a logical extension to take continuous backups, but I wouldn’t say it’s necessary.

Perhaps the way to think about it is to ask the question, “What do our donors likely want?”

3) Clearly they want to preserve the data, in case it disappears from the Federal sites. So, that’s job 1. And, if it does disappear, we need to make it available.

3a) Making it available will require some serving CPU, disk, and network. We may need to worry about DDoS attacks, though perhaps we could get free coverage from Akamai or Google Project Shield.

3b) Making it available may imply paying some students to write Javascript and HTML to put up a front-end to allow people to access the data we are collecting.

Not all the data we’re collecting is in strictly servable form. Some of the databases, for example, aren’t usefully servable in the form we collect them, and we know some links will be broken because of missing pages, or because of wget’s design flaw.*

[* Wget stores http://a/b/c as a file a/b/c, where a/b is a directory. But wget stores http://a/b as a file a/b.

Therefore the two cannot both exist on disk at once. If they do, wget drops one.]
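
Here is a small sketch (a hypothetical helper of mine, not part of our actual tooling) that flags URLs which would collide under wget’s file layout:

from urllib.parse import urlparse

def wget_collisions(urls):
    # Normalize each URL to the path wget would store it under.
    paths = {urlparse(u).netloc + urlparse(u).path.rstrip("/") for u in urls}
    # A stored file collides if it is also needed as a directory prefix.
    return {p for p in paths for q in paths if p != q and q.startswith(p + "/")}

print(wget_collisions(["http://a/b", "http://a/b/c"]))  # prints {'a/b'}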

Points 3 & 3a imply that we need to keep some money in the bank until either the websites are taken down, or we decide that the threat has abated. So, we need to figure out how much money to keep as a serving reserve. It doesn’t sound like UCR has committed to serve the data, though you could perhaps ask.

Beyond the serving reserve, I think we are free to do better backups (i.e. more than one data collection), and change detection.


Saving Climate Data (Part 4)

21 January, 2017

At noon today in Washington DC, while Trump was being inaugurated, all mentions of “climate change” and “global warming” were eliminated from the White House website.

Well, not all. The word “climate” still shows up here:

President Trump is committed to eliminating harmful and unnecessary policies such as the Climate Action Plan….

There are also reports that all mentions of climate change will be scrubbed from the website of the Environmental Protection Agency, or EPA.

From Motherboard

Let me quote from this article:

• Jason Koebler, All references to climate change have been deleted from the White House website, Motherboard, 20 January 2017.

Scientists and professors around the country had been rushing to download and rehost as much government science as was possible before the transition, based on a fear that Trump’s administration would neglect or outright delete government information, databases, and web applications about science. Last week, the Radio Motherboard podcast recorded an episode about these efforts, which you can listen to below, or anywhere you listen to podcasts.

The Internet Archive, too, has been keeping a close watch on the White House website; President Obama’s climate change page had been archived every single day in January.

So far, nothing on the Environmental Protection Agency’s website has changed under Trump, but a report earlier this week from Inside EPA, a newsletter and website that reports on the agency, suggested that pages about climate are destined to be cut within the first few weeks of his presidency.

Scientists I’ve spoken to who are archiving websites say they expect scientific data on the NASA, NOAA, Department of Energy, and EPA websites to be neglected or deleted eventually. They say they don’t expect agency sites to be updated immediately, but expect it to play out over the course of months. This sort of low-key data destruction might not be the type of censorship people typically think about, but scientists are treating it as such.

From Technology Review

Greg Egan pointed out another good article, in MIT’s magazine:

• James Temple, Climate data preservation efforts mount as Trump takes office, Technology Review, 20 January 2017.

Quoting from that:

Dozens of computer science students at the University of California, Los Angeles, will mark Inauguration Day by downloading federal climate databases they fear could vanish under the Trump Administration.

Friday’s hackathon follows a series of grassroots data preservation efforts in recent weeks, amid increasing concerns the new administration is filling agencies with climate deniers likely eager to cut off access to scientific data that undermine their policy views. Those worries only grew earlier this week, when Inside EPA reported that the Environmental Protection Agency transition team plans to scrub climate data from the agency’s website, citing a source familiar with the team.

Earlier federal data hackathons include the “Guerrilla Archiving” event at the University of Toronto last month, the Internet Archive’s Gov Data Hackathon in San Francisco at the beginning of January, and the DataRescue Philly event at the University of Pennsylvania last week.

Much of the collected data is being stored in the servers of the End of Term Web Archive, a collaborative effort to preserve government websites at the conclusion of presidential terms. The University of Pennsylvania’s Penn Program in Environmental Humanities launched the separate DataRefuge project, in part to back up environmental data sets that standard Web crawling tools can’t collect.

Many of the groups are working off a master list of crucial data sets from NASA, the National Oceanic and Atmospheric Administration, the U.S. Geological Survey, and other agencies. Meteorologist and climate journalist Eric Holthaus helped prompt the creation of that crowdsourced list with a tweet early last month.

Other key developments driving the archival initiatives included reports that the transition team had asked Energy Department officials for a list of staff who attended climate change meetings in recent years, and public statements from senior campaign policy advisors arguing that NASA should get out of the business of “politically correct environmental monitoring.”

“The transition team has given us no reason to believe that they will respect scientific data, particularly when it’s inconvenient,” says Gretchen Goldman, research director in the Center for Science and Democracy at the Union of Concerned Scientists. These historical databases are crucial to ongoing climate change research in the United States and abroad, she says.

To be clear, the Trump camp hasn’t publicly declared plans to erase or eliminate access to the databases. But there is certainly precedent for state and federal governments editing, removing, or downplaying scientific information that doesn’t conform to their political views.

Late last year, it emerged that text on Wisconsin’s Department of Natural Resources website was substantially rewritten to remove references to climate change. In addition, an extensive Congressional investigation concluded in a 2007 report that the Bush Administration “engaged in a systematic effort to manipulate climate change science and mislead policymakers and the public about the dangers of global warming.”

In fact these Bush Administration efforts were masterminded by Myron Ebell, who Trump chose to lead his EPA transition team!

Continuing:

In fact, there are wide-ranging changes to federal websites with every change in administration for a variety of reasons. The Internet Archive, which collaborated on the End of Term project in 2008 and 2012 as well, notes that more than 80 percent of PDFs on .gov sites disappeared during that four-year period.

The organization has seen a surge of interest in backing up sites and data this year across all government agencies, but particularly for climate information. In the end, they expect to collect well more than 100 terabytes of data, close to triple the amount in previous years, says Jefferson Bailey, director of Web archiving.

In fact the Azimuth Backup Project alone may gather about 40 terabytes!

From Inside EPA

And then there’s this view from inside the Environmental Protection Agency:

• Dawn Reeves, Trump transition preparing to scrub some climate data from EPA Website, Inside EPA, January 17, 2017

The incoming Trump administration’s EPA transition team intends to remove non-regulatory climate data from the agency’s website, including references to President Barack Obama’s June 2013 Climate Action Plan, the strategies for 2014 and 2015 to cut methane and other data, according to a source familiar with the transition team.

Additionally, Obama’s 2013 memo ordering EPA to establish its power sector carbon pollution standards “will not survive the first day,” the source says, a step that rule opponents say is integral to the incoming administration’s pledge to roll back the Clean Power Plan and new source power plant rules.

The Climate Action Plan has been the Obama administration’s government-wide blueprint for addressing climate change and includes information on cutting domestic greenhouse gas (GHG) emissions, including both regulatory and voluntary approaches; information on preparing for the impacts of climate change; and information on leading international efforts.

The removal of such information from EPA’s website — as well as likely removal of references to such programs that link to the White House and other agency websites — is being prepped now.

The transition team’s preparations fortify concerns from agency staff, environmentalists and many scientists that the Trump administration is going to destroy reams of EPA and other agencies’ climate data. Scientists have been preparing for this possibility for months, with many working to preserve key data on private websites.

Environmentalists are also stepping up their efforts to preserve the data. The Sierra Club Jan. 13 filed a Freedom of Information Act request seeking reams of climate-related data from EPA and the Department of Energy (DOE), including power plant GHG data. Even if the request is denied, the group said it should buy them some time.

“We’re interested in trying to download and preserve the information, but it’s going to take some time,” Andrea Issod, a senior attorney with the Sierra Club, told Bloomberg. “We hope our request will be a counterweight to the coming assault on this critical pollution and climate data.”

While Trump has pledged to take a host of steps to roll back Obama EPA climate and other high-profile actions on his first day in office, transition and other officials say the date may slip.

“In truth, it might not [happen] on the first day, it might be a week,” the source close to the transition says of the removal of climate information from EPA’s website. The source adds that in addition to EPA, the transition team is also looking at such information on the websites of DOE and the Interior Department.

Additionally, incoming Trump press secretary Sean Spicer told reporters Jan. 17 that not much may happen on Inauguration Day itself, but to expect major developments the following Monday, Jan. 23. “I think on [Jan. 23] you’re going to see a big flurry of activity” that is expected to include the disappearance of at least some EPA climate references.

Until Trump is inaugurated on Jan. 20, the transition team cannot tell agency staff what to do, and the source familiar with the transition team’s work is unaware of any communications requiring language removal or beta testing of websites happening now, though it appears that some of this work is occurring.

“We can only ask for information at this point until we are in charge. On [Jan. 20] at about 2 o’clock, then they can ask [staff] to” take actions, the source adds.

Scope & Breadth

The scope and breadth of the information to be removed is unclear. While it is likely to include executive actions on climate, it does not appear that the reams of climate science information, including models, tools and databases on the EPA Office of Research & Development’s (ORD) website will be impacted, at least not immediately.

ORD also has published climate, air and energy strategic research action plans, including one for 2016-2019 that includes research to assess impacts; prevent and reduce emissions; and prepare for and respond to changes in climate and air quality.

But other EPA information maintained on its websites, including its climate change page and its “What is EPA doing about climate change” page (which references the Climate Action Plan, the 2014 methane strategy and a 2015 oil and gas methane reduction strategy), is an expected target.

Another possible target is new information EPA just compiled—and hosted a Jan. 17 webinar to discuss—on climate change impacts to vulnerable communities.

One former EPA official who has experience with transitions says it is unlikely that any top Obama EPA official is on board with this. “I would think they would be violently against this. . . I would think that the last thing [EPA Administrator] Gina McCarthy would want to do would be to be complicit in Trump’s effort to purge the website” of climate-related work, and that if she knew she would “go ballistic.”

But the former official, the source close to the transition team and others note that EPA career staff is fearful and may be undertaking such prep work “as a defensive maneuver to avoid getting targeted,” the official says, adding that any directive would likely be coming from mid-level managers rather than political appointees or senior level officials.

But while the former official was surprised that such work might be happening now, the fact that it is only said to be targeting voluntary efforts “has a certain ring of truth to it. Someone who is knowledgeable would draw that distinction.”

Additionally, one science advocate says, “The people who are running the EPA transition have a long history of sowing misunderstanding about climate change and they tend to believe in a vast conspiracy in the scientific community to lie to the public. If they think the information is truly fraudulent, it would make sense they would try to scrub it. . . . But the role of the agency is to inform the public . . . [and not to satisfy] the musings of a band of conspiracy theorists.”

The source was referring to EPA transition team leader Myron Ebell, a long-time climate skeptic at the Competitive Enterprise Institute, along with David Schnare, another opponent of climate action, who is at the Energy & Environment Legal Institute.

And while “a new administration has the right to change information about policy, what they don’t have the right to do is change the scientific information about policies they wish to put forward and that includes removing resources on science that serve the public.”

The advocate adds that many state and local governments rely on EPA climate information.

EPA Concern

But there has been plenty of concern that such a move would take place, especially after transition team officials last month sought the names of DOE employees who worked on climate change, raising alarms and cries of a “political witch hunt” along with a Dec. 13 letter from Sen. Maria Cantwell (D-WA) that prompted the transition team to disavow the memo.

Since then, scientists have been scrambling to preserve government data.

On Jan. 10, High Country News reported that on a Saturday last month, 150 technology specialists, hackers, scholars and activists assembled in Toronto for the “Guerrilla Archiving Event: Saving Environmental Data from Trump” where the group combed the internet for key climate and environmental data from EPA’s website.

“A giant computer program would then copy the information onto an independent server, where it will remain publicly accessible—and safe from potential government interference.”

The organizer of the event, Henry Warwick, said, “Say Trump firewalls the EPA,” pulling reams of information from public access. “No one will have access to the data in these papers” unless the archiving took place.

Additionally, the Union of Concerned Scientists released a Jan. 17 report, “Preserving Scientific Integrity in Federal Policy Making,” urging the Trump administration to retain scientific integrity. It wrote in a related blog post, “So how will government science fare under Trump? Scientists are not just going to wait and see. More than 5,500 scientists have now signed onto a letter asking the president-elect to uphold scientific integrity in his administration. . . . We know what’s at stake. We’ve come too far with scientific integrity to see it unraveled by an anti-science president. It’s worth fighting for.”


Give the Earth a Present: Help Us Save Climate Data

28 December, 2016

[Photo: Getz Ice Shelf]

We’ve been busy backing up climate data before Trump becomes President. Now you can help too, with some money to pay for servers and storage space. Please give what you can at our Kickstarter campaign here:

Azimuth Climate Data Backup Project.

If we get $5000 by the end of January, we can save this data until we convince bigger organizations to take over. If we don’t get that much, we get nothing. That’s how Kickstarter works. Also, if you donate now, you won’t be billed until January 31st.

So, please help! It’s urgent.

I will make public how we spend this money. And if we get more than $5000, I’ll make sure it’s put to good use. There’s a lot of work we could do to make sure the data is authenticated, made easily accessible, and so on.

The idea

The safety of US government climate data is at risk. Trump plans to have climate change deniers running every agency concerned with climate change. So, scientists are rushing to back up the many climate databases held by US government agencies before he takes office.

We hope he won’t be rash enough to delete these precious records. But: better safe than sorry!

The Azimuth Climate Data Backup Project is part of this effort. So far our volunteers have backed up nearly 1 terabyte of climate data from NASA and other agencies. We’ll do a lot more! We just need some funds to pay for storage space and a server until larger institutions take over this task.

The team

• Jan Galkowski is a statistician with a strong interest in climate science. He works at Akamai Technologies, a company responsible for serving at least 15% of all web traffic. He began downloading climate data on the 11th of December.

• Shortly thereafter John Baez, a mathematician and science blogger at U. C. Riverside, joined in to publicize the project. He’d already founded an organization called the Azimuth Project, which helps scientists and engineers cooperate on environmental issues.

• When Jan started running out of storage space, Scott Maxwell jumped in. He used to work for NASA—driving a Mars rover among other things—and now he works for Google. He set up a 10-terabyte account on Google Drive and started backing up data himself.

• A couple of days later Sakari Maaranen joined the team. He’s a systems architect at Ubisecure, a Finnish firm, with access to a high-bandwidth connection. He set up a server, he’s downloading lots of data, he showed us how to authenticate it with SHA-256 hashes, and he’s managing many other technical aspects of this project.

There are other people involved too. You can watch the nitty-gritty details of our progress here:

Azimuth Backup Project – Issue Tracker.

and you can learn more here:

Azimuth Climate Data Backup Project.


Saving Climate Data (Part 3)

23 December, 2016

You can back up climate data, but how can anyone be sure your backups are accurate? Let’s suppose the databases you’ve backed up have been deleted, so that there’s no way to directly compare your backup with the original. And to make things really tough, let’s suppose that faked databases are being promoted as competitors with the real ones! What can you do?

One idea is ‘safety in numbers’. If a bunch of backups all match, and they were made independently, it’s less likely that they all suffer from the same errors.

Another is ‘safety in reputation’. If a bunch of backups of climate data are held by academic institutes of climate science, and another bunch are held by climate change denying organizations (conveniently listed here), you probably know which one you trust more. (And this is true even if you’re a climate change denier, though your answer may be different than mine.)

But a third idea is to use a cryptographic hash function. In very simplified terms, this is a method of taking a database and computing a fairly short string from it, called a ‘digest’.

[Diagram: a cryptographic hash function turning inputs of arbitrary size into fixed-size digests]

A good hash function makes it hard to change the database and get a new one with the same digest. So, if the person owning a database computes and publishes the digest, anyone can check that your backup is correct by computing its digest and comparing it to the original.

It’s not foolproof, but it works well enough to be helpful.

Of course, it only works if we have some trustworthy record of the original digest. But the digest is much smaller than the original database: for example, in the popular method called SHA-256, the digest is 256 bits long. So it’s much easier to make copies of the digest than to back up the original database. These copies should be stored in trustworthy ways—for example, the Internet Archive.
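
For example, here’s a minimal sketch in Python (my illustration; any language with a SHA-256 library would do) of computing the digest of a large file, reading it in chunks so the whole database never has to fit in memory:

import hashlib

def sha256_digest(path, chunk_size=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()  # 256 bits, written as 64 hex characters

Change even one bit of the file and the digest will, with overwhelming probability, come out completely different.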

When Sakari Maaranen made a backup of the University of Idaho Gridded Surface Meteorological Data, he asked the custodians of that data to publish a digest, or ‘hash file’. One of them responded:

Sakari and others,

I have made the checksums for the UofI METDATA/gridMET files (1979-2015) as both md5sums and sha256sums.

You can find these hash files here:

https://www.northwestknowledge.net/metdata/data/hash.md5

https://www.northwestknowledge.net/metdata/data/hash.sha256

After you download the files, you can check the sums with:

md5sum -c hash.md5

sha256sum -c hash.sha256

Please let me know if something is not ideal and we’ll fix it!

Thanks for suggesting we do this!

Sakari replied:

Thank you so much! This means everything to public mirroring efforts. If you’d like to help promote this Best Practice further, consider getting it recognized as a standard for online publishing of key public information.

1. Publishing those hashes is already a major improvement on its own.

2. Publishing them on a secure website offers people further guarantees that there has not been any man-in-the-middle tampering.

3. Digitally signing the checksum files offers the best easily achievable guarantees of data integrity by the person(s) who sign the checksum files.

Please consider having these three steps included in your science organisation’s online publishing training and standard Best Practices.

Feel free to forward this message to whom it may concern. Feel free to rephrase as necessary.

As a separate item, public mirroring instructions for how to best download your data and/or public websites would further guarantee permanence of all your uniquely valuable science data and public contributions.

Right now we should make this message go viral among the people doing government-funded science publishing. Please approach the key people directly, avoiding the delay of using official channels. We need to have all the uniquely valuable public data mirrored before possible changes in funding.

Again, thank you for your quick response!
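
For readers who don’t have the coreutils commands handy, here’s a minimal Python sketch (my illustration, not part of Sakari’s exchange) of what sha256sum -c hash.sha256 does: each line of the hash file is a digest followed by a filename, and we recompute and compare.

import hashlib

def verify(hash_file):
    all_ok = True
    with open(hash_file) as f:
        for line in f:
            if not line.strip():
                continue
            expected, name = line.split(maxsplit=1)
            name = name.strip().lstrip("*")  # "*" marks binary mode in coreutils
            h = hashlib.sha256()
            with open(name, "rb") as data:
                for chunk in iter(lambda: data.read(1 << 20), b""):
                    h.update(chunk)
            ok = h.hexdigest() == expected
            all_ok = all_ok and ok
            print(name + ": " + ("OK" if ok else "FAILED"))
    return all_ok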

There are probably lots of things to be careful about. Here’s one. Maybe you can think of more, and ways to deal with them.

What if the data keeps changing with time? This is especially true of climate records, where new temperatures and so on are added to a database every day, or month, or year. Then I think we need to ‘time-stamp’ everything. The owners of the original database need to keep a list of digests, with the time each one was made. And when you make a copy, you need to record the time it was made.
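
A minimal sketch of such time-stamped record-keeping (the JSON-lines log format here is my own illustration, not an agreed standard):

import json, time

def record_digest(log_path, dataset, digest):
    # Append one record per snapshot: which dataset, its digest, and when.
    entry = {"dataset": dataset,
             "sha256": digest,
             "recorded": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())}
    with open(log_path, "a") as log:
        log.write(json.dumps(entry) + "\n")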


Azimuth Backup Project (Part 2)

20 December, 2016



I want to list some databases that are particularly worth backing up. But to do this, we need to know what’s already been backed up. That’s what this post is about.

Azimuth backups

Here is information as of now (21:45 GMT 20 December 2016). I won’t update this information. For up-to-date information see

Azimuth Backup Project: Issue Tracker.

For up-to-date information on the progress of each of the individual databases listed below, click on my summary of what’s happening now.

Here are the databases that we’ve backed up:

• NASA GISTEMP website at http://data.giss.nasa.gov/gistemp/ — downloaded by Jan and uploaded to Sakari’s datarefuge server.

• NOAA Carbon Dioxide Information Analysis Center (CDIAC) data at ftp.ncdc.noaa.gov/pub/data/paleo/cdiac.ornl.gov-pub — downloaded by Jan and uploaded to Sakari’s datarefuge server.

• NOAA Carbon Tracker website at http://www.esrl.noaa.gov/psd/data/gridded/data.carbontracker.html — downloaded by Jan, uploaded to Sakari’s datarefuge server.

These are still in progress, but I think we have our hands on the data:

• NOAA Precipitation Frequency Data at http://hdsc.nws.noaa.gov/hdsc/pfds/ and ftp://hdsc.nws.noaa.gov/pub — downloaded by Borislav, not yet uploaded to Sakari’s datarefuge server.

• NOAA Carbon Dioxide Information Analysis Center (CDIAC) website at http://cdiac.ornl.gov — downloaded by Jan, uploaded to Sakari’s datarefuge server, but there’s evidence that the process was incomplete.

• NOAA website at https://www.ncdc.noaa.gov — downloaded by Jan, who is now attempting to upload it to Sakari’s datarefuge server.

• NOAA National Centers for Environmental Information (NCEI) website at https://www.ncdc.noaa.gov — downloaded by Jan, who is now attempting to upload it to Sakari’s datarefuge server, but there are problems.

• Ocean and Atmospheric Research data at ftp.oar.noaa.gov — downloaded by Jan, now attempting to upload it to Sakari’s datarefuge server.

• NOAA NCEP/NCAR Reanalysis ftp site at ftp.cdc.noaa.gov/Datasets/ncep.reanalysis/ — downloaded by Jan, now attempting to upload it to Sakari’s datarefuge server.

I think we’re getting these now, more or less:

• NOAA National Centers for Environmental Information (NCEI) ftp site at ftp://eclipse.ncdc.noaa.gov/pub/ — in the process of being downloaded by Jan, “Very large. May be challenging to manage with my facilities”.

• NASA Planetary Data System (PDS) data at https://pds.nasa.gov — in the process of being downloaded by Sakari.

• NOAA tides and currents products website at https://tidesandcurrents.noaa.gov/products.html, which includes the sea level trends data at https://tidesandcurrents.noaa.gov/sltrends/sltrends.html — Jan is downloading this.

• NOAA National Centers for Environmental Information (NCEI) satellite datasets website at https://www.ncdc.noaa.gov/data-access/satellite-data/satellite-data-access-datasets — Jan is downloading this.

• NASA JASON3 sea level data at http://sealevel.jpl.nasa.gov/missions/jason3/ — Jan is downloading this.

• U.S. Forest Service Climate Change Atlas website at http://www.fs.fed.us/nrs/atlas/ — Jan is downloading this.

• NOAA Global Monitoring Division website at http://www.esrl.noaa.gov/gmd/dv/ftpdata.html — Jan is downloading this.

• NOAA Global Monitoring Division ftp data at aftp.cmdl.noaa.gov/ — Jan is downloading this.

• NOAA National Data Buoy Center website at http://www.ndbc.noaa.gov/ — Jan is downloading this.

• NASA-ESDIS Oak Ridge National Laboratory Distributed Active Archive (DAAC) on Biogeochemical Dynamics at https://daac.ornl.gov/get_data.shtml — Jan is downloading this.

• NASA-ESDIS Oak Ridge National Laboratory Distributed Active Archive (DAAC) on Biogeochemical Dynamics website at https://daac.ornl.gov/ — Jan is downloading this.

Other backups

Other backups are listed at

The Climate Mirror Project, https://climate.daknob.net/.

This nicely provides the sizes of various backups, and other useful information. Some are ‘signed and verified’ with cryptographic keys, but I’m not sure exactly what that means, and the details matter.

About 90 databases are listed here, along with some size information and some information about whether people have already backed them up or are in process:

Gov. Climate Datasets (Archive). (Click on the tiny word “Datasets” at the bottom of the page!)




Saving Climate Data (Part 2)

16 December, 2016

I want to get you involved in the Azimuth Environmental Data Backup Project, so click on that for more. But first some background.

A few days ago, many scientists, librarians, archivists, computer geeks and environmental activists started to make backups of US government environmental databases. We’re trying to beat the January 20th deadline just in case.

Backing up data is always a good thing, so there’s no point in arguing about politics or the likelihood that these backups are needed. The present situation is just a nice reason to hurry up and do some things we should have been doing anyway.

As of 2 days ago the story looked like this:

Saving climate data (Part 1), Azimuth, 13 December 2016.

A lot has happened since then, but much more needs to be done. Right now you can see a list of 90 databases to be backed up here:

Gov. Climate Datasets (Archive). (Click on the tiny word “Datasets” at the bottom of the page!)

Despite the word ‘climate’, the scope includes other environmental databases, and rightly so. Here is a list of databases that have been backed up:

The Climate Mirror Project.

By going here and clicking “Start Here to Help”:

Climate Mirror.

you can nominate a dataset for rescue, claim a dataset to rescue, let everyone know about a data rescue event, or help in some other way (which you must specify). There’s also other useful information on this page, which was set up by Nick Santos.

The overall effort is being organized by the Penn Program in the Environmental Humanities, or ‘PPEHLab’ for short, headed by Bethany Wiggin. If you want to know what’s going on, it helps to look at their blog:

DataRescuePenn.

However, the people organizing the project are currently overwhelmed with offers of help! People worldwide are proceeding to take action in a decentralized way! So, everything is a bit chaotic, and nobody has an overall view of what’s going on.

I can’t overstate this: if you think that ‘they’ have a plan and ‘they’ know what’s going on, you’re wrong. ‘They’ is us. Act accordingly.

Here’s a list of news articles, a list of ‘data rescue events’ where people get together with lots of computers and do stuff, and a bit about archives and archivists.

News

Here are some things to read:

• Jason Koebler, Researchers are preparing for Trump to delete government science from the web, Vice, 13 December 2016.

• Brady Dennis, Scientists are frantically copying U.S. climate data, fearing it might vanish under Trump, Washington Post, 13 December 2016. (Also at the Chicago Tribune.)

• Eric Holthaus, Why I’m trying to preserve federal climate data before Trump takes office, Washington Post, 13 December 2016.

• Nicole Mortillaro, U of T heads ‘guerrilla archiving event’ to preserve climate data ahead of Trump presidency, CBC News, 14 December 2016.

• Audie Kornish and Eric Holthaus, Scientists race to preserve climate change data before Trump takes office, All Things Considered, National Public Radio, 14 December 2016.

Data rescue events

There’s one in Toronto:

Guerrilla archiving event, 10 am – 4 pm EST, Saturday 17 December 2016. Location: Bissell Building, 4th Floor, 140 St. George St. University of Toronto.

There will be one in Philadelphia:

DataRescuePenn Data Harvesting, Friday–Saturday 13–14 January 2017. Location: not determined yet, probably somewhere at the University of Pennsylvania, Philadelphia.

I hear there will also be events in New York City and Los Angeles, but I don’t know details. If you do, please tell me!

Archives and archivists

Today I helped catalyze a phone conversation between Bethany Wiggin, who heads the PPEHLab, and Nancy Beaumont, head of the Society of American Archivists. Digital archivists have a lot of expertise in saving information, so their skills are crucial here. Big wads of disorganized data are not very useful.

In this conversation I learned that some people are already in contact with the Internet Archive. This archive always tries to save US government websites and databases at the end of each presidential term. Their efforts are not limited to environmental data, and they save not only webpages but entire databases, e.g. data in ftp sites. You can nominate sites to be saved here:

• Internet Archive, End of Presidential Term Harvest 2016.

For more details read this:

• Internet Archive blog, Preserving U.S. Government Websites and Data as the Obama Term Ends, 15 December 2016.


Saving Climate Data (Part 1)

13 December, 2016


I try to stay out of politics on this website. This post is not mainly about politics. It’s a call to action. We’re trying to do something rather simple and clearly worthwhile. We’re trying to create backups of US government climate data.

The background is, of course, political. Many signs point to a dramatic change in US climate policy:

• Oliver Milman, Trump’s transition: sceptics guide every agency dealing with climate change, The Guardian, 12 December 2016.

So, scientists are now backing up large amounts of climate data, just in case the Trump administration tries to delete it after he takes office on January 20th:

• Brady Dennis, Scientists are frantically copying U.S. climate data, fearing it might vanish under Trump, Washington Post, 13 December 2016.

Of course saving the data publicly available on US government sites is not nearly as good as keeping climate programs fully funded! New data is coming in all the time from satellites and other sources. We need it—and we need the experts who understand it.

Also, it’s possible that the Trump administration won’t go so far as trying to delete big climate science databases. Still, I think it can’t be a bad thing to have backups. Or as my mother always said: better safe than sorry!

Quoting the Washington Post article:

Alarmed that decades of crucial climate measurements could vanish under a hostile Trump administration, scientists have begun a feverish attempt to copy reams of government data onto independent servers in hopes of safeguarding it from any political interference.

The efforts include a “guerrilla archiving” event in Toronto, where experts will copy irreplaceable public data, meetings at the University of Pennsylvania focused on how to download as much federal data as possible in the coming weeks, and a collaboration of scientists and database experts who are compiling an online site to harbor scientific information.

“Something that seemed a little paranoid to me before all of a sudden seems potentially realistic, or at least something you’d want to hedge against,” said Nick Santos, an environmental researcher at the University of California at Davis, who over the weekend began copying government climate data onto a nongovernment server, where it will remain available to the public. “Doing this can only be a good thing. Hopefully they leave everything in place. But if not, we’re planning for that.”

[…]

“What are the most important .gov climate assets?” Eric Holthaus, a meteorologist and self-proclaimed “climate hawk,” tweeted from his Arizona home Saturday evening. “Scientists: Do you have a US .gov climate database that you don’t want to see disappear?”

Within hours, responses flooded in from around the country. Scientists added links to dozens of government databases to a Google spreadsheet. Investors offered to help fund efforts to copy and safeguard key climate data. Lawyers offered pro bono legal help. Database experts offered to help organize mountains of data and to house it with free server space. In California, Santos began building an online repository to “make sure these data sets remain freely and broadly accessible.”

In Philadelphia, researchers at the University of Pennsylvania, along with members of groups such as Open Data Philly and the software company Azavea, have been meeting to figure out ways to harvest and store important data sets.

At the University of Toronto this weekend, researchers are holding what they call a “guerrilla archiving” event to catalogue key federal environmental data ahead of Trump’s inauguration. The event “is focused on preserving information and data from the Environmental Protection Agency, which has programs and data at high risk of being removed from online public access or even deleted,” the organizers said. “This includes climate change, water, air, toxics programs.”

The event is part of a broader effort to help San Francisco-based Internet Archive with its End of Term 2016 project, an effort by university, government and nonprofit officials to find and archive valuable pages on federal websites. The project has existed through several presidential transitions.

I hope that small “guerrilla archiving” efforts will be dwarfed by more systematic work, because it’s crucial that databases be copied along with all relevant metadata—and some sort of cryptographic certificate of authenticity, if possible. However, getting lots of people involved is bound to be a good thing, politically speaking.

If you have good computer skills, good understanding of databases, or lots of storage space, please get involved. Efforts are being coordinated by Bethany Wiggin and others at the Data Refuge Project:

• PPEHLab (Penn Program in the Environmental Humanities), DataRefuge.

You can contact them at DataRefuge@ppehlab.org. Nick Santos is also involved, and if you want to get “more plugged into the project” you can contact him here. They are trying to build a climate database mirror website here:

Climate Mirror.

At the help form on this website you can nominate a dataset for rescue, claim a dataset to rescue, let them know about a data rescue event, or help in some other way (which you must specify).

PPEHLab and Penn Libraries are organizing a data rescue event this Thursday:

• PPEHLab, DataRefuge meeting, 14 December 2016.

At the American Geophysical Union meeting in San Francisco, where more than 20,000 earth and climate scientists gather from around the world, there was a public demonstration today starting at 1:30 PST:

Rally to stand up for science, 13 December 2016.

And the “guerrilla archiving” hackathon in Toronto is this Saturday—see below. If you know people with good computer skills in Toronto, get them to check it out!

To follow progress, also read Eric Holthaus’s tweets and replies here:

Eric Holthaus.

Guerrilla archiving in Toronto

Here are details on this:

Guerrilla Archiving Hackathon

Date: 10am-4pm, December 17, 2016

Location: Bissell Building, 4th Floor, 140 St. George St. University of Toronto

RSVP and up-to-date information: Guerilla archiving: saving environmental data from Trump.

Bring: laptops, power bars, and snacks. Coffee and pizza provided.

This event collaborates with the Internet Archive’s End of Term 2016 project, which seeks to archive the federal online pages and data that are in danger of disappearing during the Trump administration. Our event is focused on preserving information and data from the Environmental Protection Agency, which has programs and data at high risk of being removed from online public access or even deleted. This includes climate change, water, air, toxics programs. This project is urgent because the Trump transition team has identified the EPA and other environmental programs as priorities for the chopping block.

The Internet Archive is a San Francisco-based nonprofit digital library which aims to preserve knowledge and make it universally accessible. Its End of Term web archive captures and saves U.S. Government websites that are at risk of changing or disappearing altogether during government transitions. The Internet Archive has asked volunteers to help select and organize information that will be preserved before the Trump transition.

End of Term web archive: http://eotarchive.cdlib.org/2016.html

New York Times article: “Harvesting Government History, One Web Page at a Time”

Activities:

• Identifying endangered programs and data

• Seeding the End of Term webcrawler with priority URLs

• Identifying and mapping the location of inaccessible environmental databases

• Hacking scripts to make hard-to-reach databases accessible to the webcrawler (see the sketch after this list)

• Building a toolkit so that other groups can hold similar events
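
As an example of the sort of script such an event might produce, here is a minimal sketch (my illustration; the starting URL is just a placeholder taken from the list above) that pulls the links out of one HTML index page so they can be fed to a webcrawler as seed URLs:

from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkCollector(HTMLParser):
    # Collect every href on a page, resolved against the page's own URL.
    def __init__(self, base):
        super().__init__()
        self.base, self.links = base, []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for key, value in attrs:
                if key == "href" and value:
                    self.links.append(urljoin(self.base, value))

base = "http://www.esrl.noaa.gov/gmd/dv/ftpdata.html"  # placeholder page
collector = LinkCollector(base)
collector.feed(urlopen(base).read().decode("utf-8", "replace"))
print("\n".join(collector.links))  # one seed URL per line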

Skills needed: We need all kinds of people — and that means you!

• People who can locate relevant webpages for the Internet Archive’s webcrawler

• People who can identify data targeted for deletion by the Trump transition team and the organizations they work with

• People with knowledge of government websites and information, including the EPA

• People with library and archive skills

• People who are good at navigating databases

• People interested in mapping where inaccessible data is located at the EPA

• Hackers to figure out how to extract data and URLs from databases (in a way that Internet Archive can use)

• People with good organization and communication skills

• People interested in creating a toolkit for reproducing similar events

Contacts: michelle.murphy@utoronto.ca, p.keilty@utoronto.ca