Climate Technology Primer (Part 1)

5 October, 2019

Here’s the first of a series of blog articles on how technology can help address climate change:

• Adam Marblestone, Climate technology primer (1/3): basics.

Adam Marblestone is a research scientist at Google DeepMind studying connections between neuroscience and artificial intelligence. Previously, he was Chief Strategy Officer of the brain-computer interface company Kernel, and a research scientist in Ed Boyden’s Synthetic Neurobiology Group at MIT working to develop new technologies for brain circuit mapping. He also helped to start companies like BioBright, and advised foundations such as the Open Philanthropy Project.

Now, like many of us, he’s thinking about climate change, and what to do about it. He writes:

In this first of three posts, I attempt an outsider’s summary of the basic physics/chemistry/biology of the climate system, focused on back of the envelope calculations where possible. At the end, I comment a bit about technological approaches for emissions reductions. Future posts will include a review of the science behind negative emissions technologies, as well as the science (with plenty of caveats, don’t worry) behind more controversial potential solar radiation management approaches. This first post should be very basic for anyone “in the know” about energy, but I wanted to cover the basics before jumping into carbon sequestration technologies.

Check it out! I like the focus on “back of the envelope” calculations because they serve as useful sanity checks for more complicated models… and also provide a useful vaccination against the common denialist argument “all the predictions rely on complicated computer models that could be completely wrong, so why should I believe them?” It’s a sad fact that one of the things we need to do is make sure most technically literate people have a basic understanding of climate science, to help provide ‘herd immunity’ to everyone else.

The ultimate goal here, though, is to think about “what can technology do about climate change?”


The Mathematics of the 21st Century

13 January, 2019

 

Check out the video of my talk, the first in the Applied Category Theory Seminar here at U. C. Riverside. It was nicely edited by Paola Fernandez and uploaded by Joe Moeller.

Abstract. The global warming crisis is part of a bigger transformation in which humanity realizes that the Earth is a finite system and that our population, energy usage, and the like cannot continue to grow exponentially. If civilization survives this transformation, it will affect mathematics—and be affected by it—just as dramatically as the agricultural revolution or industrial revolution. We should get ready!

The slides are rather hard to see in the video, but you can read them here while you watch the talk. Click on links in green for more information!


Azimuth Backup Project (Part 5)

5 October, 2017

I haven’t spoken much about the Azimuth Climate Data Backup Project, but it’s going well, and I’ll be speaking about it soon, here:

International Open Access Week, Wednesday 25 October 2017, 9:30–11:00 a.m., University of California, Riverside, Orbach Science Library, Room 240.

“Open in Order to Save Data for Future Research” is the 2017 event theme.

Open Access Week is an opportunity for the academic and research community to learn about the potential benefits of sharing what they’ve learned with colleagues, and to help inspire wider participation in helping to make “open access” a new norm in scholarship, research and data planning and preservation.

The Open Access movement is made up of advocates (librarians, publishers, university repositories, etc.) who promote the free, immediate, and online publication of research.

The program will provide information on issues related to saving open data, including climate change and scientific data. The panelists also will describe open access projects in which they have participated to save climate data and to preserve end-of-term presidential data, information likely to be utilized by the university community for research and scholarship.

The program includes:

• Brianna Marshall, Director of Research Services, UCR Library: Brianna welcomes guests and introduces panelists.

• John Baez, Professor of Mathematics, UCR: John will describe his activities to save US government climate data through his collaborative effort, the Azimuth Climate Data Backup Project. All of the saved data is now open access for everyone to utilize for research and scholarship.

• Perry Willett, Digital Preservation Projects Manager, California Digital Library: Perry will discuss the open data initiatives in which CDL participates, including the end-of-term presidential web archiving that is done in partnership with the Library of Congress, Internet Archive and University of North Texas.

• Kat Koziar, Data Librarian, UCR Library: Kat will give an overview of DASH, the UC system data repository, and provide suggestions for researchers interested in making their data open.

This will be the eighth International Open Access Week program hosted by the UCR Library.

The event is free and open to the public. Light refreshments will be served.


Saving Climate Data (Part 6)

23 February, 2017

Scott Pruitt, who filed legal challenges against Environmental Protection Agency rules fourteen times, working hand in hand with oil and gas companies, is now head of that agency. What does that mean about the safety of climate data on the EPA’s websites? Here is an inside report:

• Dawn Reeves, EPA preserves Obama-Era website but climate change data doubts remain, InsideEPA.com, 21 February 2017.

For those of us who are backing up climate data, the really important stuff is in red near the bottom.

The EPA has posted a link to an archived version of its website from Jan. 19, the day before President Donald Trump was inaugurated and the agency began removing climate change-related information from its official site, saying the move comes in response to concerns that it would permanently scrub such data.

However, the archived version notes that links to climate and other environmental databases will go to current versions of them—continuing the fears that the Trump EPA will remove or destroy crucial greenhouse gas and other data.

The archived version was put in place and linked to the main page in response to “numerous [Freedom of Information Act (FOIA)] requests regarding historic versions of the EPA website,” says an email to agency staff shared by the press office. “The Agency is making its best reasonable effort to 1) preserve agency records that are the subject of a request; 2) produce requested agency records in the format requested; and 3) post frequently requested agency records in electronic format for public inspection. To meet these goals, EPA has re-posted a snapshot of the EPA website as it existed on January 19, 2017.”

The email adds that the action is similar to the snapshot taken of the Obama White House website.

The archived version of EPA’s website includes a “more information” link that offers more explanation.

For example, it says the page is “not the current EPA website” and that the archive includes “static content, such as webpages and reports in Portable Document Format (PDF), as that content appeared on EPA’s website as of January 19, 2017.”

It cites technical limits for the database exclusions. “For example, many of the links contained on EPA’s website are to databases that are updated with the new information on a regular basis. These databases are not part of the static content that comprises the Web Snapshot.” Searches of the databases from the archive “will take you to the current version of the database,” the agency says.

“In addition, links may have been broken in the website as it appeared” on Jan. 19 and those will remain broken on the snapshot. Links that are no longer active will also appear as broken in the snapshot.

“Finally, certain extremely large collections of content… were not included in the Snapshot due to their size” such as AirNow images, radiation network graphs, historic air technology transfer network information, and EPA’s searchable news releases.

‘Smart’ Move

One source urging the preservation of the data says the snapshot appears to be a “smart” move on EPA’s behalf, given the FOIA requests it has received, and notes that even though other groups like NextGen Climate and scientists have been working to capture EPA’s online information, having it on EPA’s site makes it official.

But it could also be a signal that big changes are coming to the official Trump EPA site, and it is unclear how long the agency will maintain the archived version.

The source says while it is disappointing that the archive may signal the imminent removal of EPA’s climate site, “at least they are trying to accommodate public concerns” to preserve the information.

A second source adds that while it is good that EPA is seeking “to address the widespread concern” that the information will be removed by an administration that does not believe in human-caused climate change, “on the other hand, it doesn’t address the primary concern of the data. It is snapshots of the web text.” Also, information “not included,” such as climate databases, is what is difficult to capture by outside groups and is what really must be preserved.

“If they take [information] down” that groups have been trying to preserve, then the underlying concern about access to data remains. “Web crawlers and programs can do things that are easy,” such as taking snapshots of text, “but getting the data inside the database is much more challenging,” the source says.

The first source notes that EPA’s searchable databases, such as those maintained by its Clean Air Markets Division, are used by the public “all the time.”

The agency’s Office of General Counsel (OGC) Jan. 25 began a review of the implications of taking down the climate page—a planned wholesale removal that was temporarily suspended to allow for the OGC review.

But EPA did remove some specific climate information, including links to the Clean Power Plan and references to President Barack Obama’s Climate Action Plan. Inside EPA captured this screenshot of the “What EPA Is Doing” page regarding climate change. Those links are missing on the Trump EPA site. The archive includes the same version of the page as captured by our screenshot.

Inside EPA first reported the plans to take down the climate information on Jan. 17.

After the OGC investigation began, a source close to the Trump administration said Jan. 31 that climate “propaganda” would be taken down from the EPA site, but that the agency is not expected to remove databases on GHG emissions or climate science. “Eventually… the propaganda will get removed…. Most of what is there is not data. Most of what is there is interpretation.”

The Sierra Club and Environmental Defense Fund both filed FOIA requests asking the agency to preserve its climate data, while attorneys representing youth plaintiffs in a federal climate change lawsuit against the government have also asked the Department of Justice to ensure the data related to its claims is preserved.

The Azimuth Climate Data Backup Project and other groups are making copies of actual databases, not just the visible portions of websites.


Azimuth Backup Project (Part 4)

18 February, 2017

The Azimuth Climate Data Backup Project is going well! Our Kickstarter campaign ended on January 31st and the money has recently reached us. Our original goal was $5000. We got $20,427 of donations, and after Kickstarter took its cut we received $18,590.96.

Next time I’ll tell you what our project has actually been doing. This time I just want to give a huge “thank you!” to all 627 people who contributed money on Kickstarter!

I sent out thank you notes to everyone, updating them on our progress and asking if they wanted their names listed. The blanks in the following list represent people who either didn’t reply, didn’t want their names listed, or backed out and decided not to give money. I’ll list people in chronological order: first contributors first.

Only 12 people backed out; the vast majority of blanks on this list are people who haven’t replied to my email. I noticed some interesting but obvious patterns. For example, people who contributed later are less likely to have answered my email yet—I’ll update this list later. People who contributed more money were more likely to answer my email.

The magnitude of contributions ranged from $2000 to $1. A few people offered to help in other ways. The response was international—this was really heartwarming! People from the US were more likely than others to ask not to be listed.

But instead of continuing to list statistical patterns, let me just thank everyone who contributed.


Daniel Estrada
Ahmed Amer
Saeed Masroor
Jodi Kaplan
John Wehrle
Bob Calder
Andrea Borgia
L Gardner

Uche Eke
Keith Warner
Dean Kalahan
James Benson
Dianne Hackborn

Walter Hahn
Thomas Savarino
Noah Friedman
Eric Willisson
Jeffrey Gilmore
John Bennett
Glenn McDavid

Brian Turner

Peter Bagaric

Martin Dahl Nielsen
Broc Stenman

Gabriel Scherer
Roice Nelson
Felipe Pait
Kenneth Hertz

Luis Bruno


Andrew Lottmann
Alex Morse

Mads Bach Villadsen
Noam Zeilberger

Buffy Lyon

Josh Wilcox

Danny Borg

Krishna Bhogaonker
Harald Tveit Alvestrand


Tarek A. Hijaz, MD
Jouni Pohjola
Chavdar Petkov
Markus Jöbstl
Bjørn Borud


Sarah G

William Straub

Frank Harper
Carsten Führmann
Rick Angel
Drew Armstrong

Jesimpson

Valeria de Paiva
Ron Prater
David Tanzer

Rafael Laguna
Miguel Esteves dos Santos 
Sophie Dennison-Gibby




Randy Drexler
Peter Haggstrom


Jerzy Michał Pawlak
Santini Basra
Jenny Meyer


John Iskra

Bruce Jones
Māris Ozols
Everett Rubel



Mike D
Manik Uppal
Todd Trimble

Federer Fanatic

Forrest Samuel, Harmos Consulting








Annie Wynn
Norman and Marcia Dresner



Daniel Mattingly
James W. Crosby








Jennifer Booth
Greg Randolph





Dave and Karen Deeter

Sarah Truebe









Tieg Zaharia
Jeffrey Salfen
Birian Abelson

Logan McDonald

Brian Truebe
Jon Leland


Nicole



Sarah Lim







James Turnbull




John Huerta
Katie Mandel Bruce
Bethany Summer




Heather Tilert

Anna C. Gladstone



Naom Hart
Aaron Riley

Giampiero Campa

Julie A. Sylvia


Pace Willisson









Bangskij










Peter Herschberg

Alaistair Farrugia


Conor Hennessy




Stephanie Mohr




Torinthiel


Lincoln Muri 
Anet Ferwerda 


Hanna





Michelle Lee Guiney

Ben Doherty
Trace Hagemann







Ryan Mannion


Penni and Terry O'Hearn



Brian Bassham
Caitlin Murphy
John Verran






Susan


Alexander Hawson
Fabrizio Mafessoni
Anita Phagan
Nicolas Acuña
Niklas Brunberg

Adam Luptak
V. Lazaro Zamora






Branford Werner
Niklas Starck Westerberg
Luca Zenti and Marta Veneziano 


Ilja Preuß
Christopher Flint

George Read 
Courtney Leigh

Katharina Spoerri


Daniel Risse



Hanna
Charles-Etienne Jamme
rhackman41



Jeff Leggett

RKBookman


Aaron Paul
Mike Metzler


Patrick Leiser

Melinda

Ryan Vaughn
Kent Crispin

Michael Teague

Ben



Fabian Bach
Steven Canning


Betsy McCall

John Rees

Mary Peters

Shane Claridge
Thomas Negovan
Tom Grace
Justin Jones


Jason Mitchell




Josh Weber
Rebecca Lynne Hanginger
Kirby


Dawn Conniff


Michael T. Astolfi



Kristeva

Erik
Keith Uber

Elaine Mazerolle
Matthieu Walraet

Linda Penfold




Lujia Liu



Keith



Samar Tareem


Henrik Almén
Michael Deakin 
Rutger Ockhorst

Erin Bassett
James Crook



Junior Eluhu
Dan Laufer
Carl
Robert Solovay






Silica Magazine







Leonard Saers
Alfredo Arroyo García



Larry Yu













John Behemonth


Eric Humphrey


Svein Halvor Halvorsen



Karim Issa

Øystein Risan Borgersen
David Anderson Bell III











Ole-Morten Duesend







Adam North and Gabrielle Falquero

Robert Biegler 


Qu Wenhao






Steffen Dittmar




Shanna Germain






Adam Blinkinsop







John WS Marvin (Dread Unicorn Games)


Bill Carter
Darth Chronis 



Lawrence Stewart

Gareth Hodges

Colin Backhurst
Christopher Metzger

Rachel Gumper


Mariah Thompson

Falk Alexander Glade
Johnathan Salter




Maggie Unkefer
Shawna Maryanovich






Wilhelm Fitzpatrick
Dylan “ExoByte” Mayo
Lynda Lee




Scott Carpenter



Charles D, Payet
Vince Rostkowski


Tim Brown
Raven Daegmorgan
Zak Brueckner


Christian Page

Adi Shavit


Steven Greenberg
Chuck Lunney



Adriel Bustamente

Natasha Anicich



Bram De Bie
Edward L






Gray Detrick
Robert


Sarah Russell

Sam Leavin

Abilash Pulicken

Isabel Olondriz
James Pierce
James Morrison


April Daniels



José Tremblay Champagne


Chris Edmonds

Hans & Maria Cummings
Bart Gasiewiski


Andy Chamard



Andrew Jackson

Christopher Wright

Crystal Collins

ichimonji10


Alan Stern
Alison W


Dag Henrik Bråtane





Martin Nilsson


William Schrade


Saving Climate Data (Part 5)

6 February, 2017


There’s a lot going on! Here’s a news roundup. I will separately talk about what the Azimuth Climate Data Backup Project is doing.

I’ll start with the bad news, and then go on to some good news.

Tweaking the EPA website

Scientists are keeping track of how the Trump administration is changing the Environmental Protection Agency website, with before-and-after photos, and analysis:

• Brian Kahn, Behold the “tweaks” Trump has made to the EPA website (so far), Natural Resources Defense Council blog, 3 February 2017.

There’s more about “adaptation” to climate change, and less about how it’s caused by carbon emissions.

All of this would be nothing compared to the new bill to eliminate the EPA, or Myron Ebell’s plan to fire most of the people working there:

• Joe Davidson, Trump transition leader’s goal is two-thirds cut in EPA employees, Washington Post, 30 January 2017.

If you want to keep track of this battle, I recommend getting a 30-day free subscription to this online magazine:

InsideEPA.com.

Taking animal welfare data offline

The Trump team is taking animal-welfare data offline. The US Department of Agriculture will no longer make lab inspection results and violations publicly available, citing privacy concerns:

• Sara Reardon, US government takes animal-welfare data offline, Nature Breaking News, 3 February 2017.

Restricting access to geospatial data

A new bill would prevent the US government from providing access to geospatial data if it helps people understand housing discrimination. It goes like this:

Notwithstanding any other provision of law, no Federal funds may be used to design, build, maintain, utilize, or provide access to a Federal database of geospatial information on community racial disparities or disparities in access to affordable housing.

For more on this bill, and the important ways in which such data has been used, see:

• Abraham Gutman, Scott Burris, and the Temple University Center for Public Health Law Research, Where will data take the Trump administration on housing?, Philly.com, 1 February 2017.

The EDGI fights back

The Environmental Data and Governance Initiative or EDGI is working to archive public environmental data. They’re helping coordinate data rescue events. You can attend one and have fun eating pizza with cool people while saving data:

• 3 February 2017, Portland
• 4 February 2017, New York City
• 10-11 February 2017, Austin Texas
• 11 February 2017, U. C. Berkeley, California
• 18 February 2017, MIT, Cambridge Massachusetts
• 18 February 2017, Haverford Connecticut
• 18-19 February 2017, Washington DC
• 26 February 2017, Twin Cities, Minnesota

Or, work with EDGI to organize your own data rescue event! They provide some online tools to help download data.

I know there will also be another event at UCLA, so the above list is not complete, and it will probably change and grow over time. Keep up-to-date at their site:

Environmental Data and Governance Initiative.

Scientists fight back

The pushback is so big it’s hard to list it all! For now I’ll just quote some of this article:

• Tabitha Powledge, The gag reflex: Trump info shutdowns at US science agencies, especially EPA, 27 January 2017.

THE PUSHBACK FROM SCIENCE HAS BEGUN

Predictably, counter-tweets claiming to come from rebellious employees at the EPA, the Forest Service, the USDA, and NASA sprang up immediately. At The Verge, Rich McCormick says there’s reason to believe these claims may be genuine, although none has yet been verified. A lovely head on this post: “On the internet, nobody knows if you’re a National Park.”

At Hit&Run, Ronald Bailey provides handles for several of these alt tweet streams, which he calls “the revolt of the permanent government.” (That’s a compliment.)

Bailey argues, “with exception perhaps of some minor amount of national security intelligence, there is no good reason that any information, data, studies, and reports that federal agencies produce should be kept from the public and press. In any case, I will be following the Alt_Bureaucracy feeds for a while.”

NeuroDojo Zen Faulkes posted on how to demand that scientific societies show some backbone. “Ask yourself: “Have my professional societies done anything more political than say, ‘Please don’t cut funding?’” Will they fight?,” he asked.

Scientists associated with the group 500 Women Scientists donned lab coats and marched in DC as part of the Women’s March on Washington the day after Trump’s Inauguration, Robinson Meyer reported at the Atlantic. A wildlife ecologist from North Carolina told Meyer, “I just can’t believe we’re having to yell, ‘Science is real.’”

Taking a cue from how the Women’s March did its social media organizing, other scientists who want to set up a Washington march of their own have put together a closed Facebook group that claims more than 600,000 members, Kate Sheridan writes at STAT.

The #ScienceMarch Twitter feed says a date for the march will be posted in a few days. [The march will be on 22 April 2017.] The group also plans to release tools to help people interested in local marches coordinate their efforts and avoid duplication.

At The Atlantic, Ed Yong describes the political action committee 314Action. (314=the first three digits of pi.)

Among other political activities, it is holding a webinar on Pi Day—March 14—to explain to scientists how to run for office. Yong calls 314Action the science version of Emily’s List, which helps pro-choice candidates run for office. 314Action says it is ready to connect potential candidate scientists with mentors—and donors.

Other groups may be willing to step in when government agencies wimp out. A few days before the Inauguration, the Centers for Disease Control and Prevention abruptly and with no explanation cancelled a 3-day meeting on the health effects of climate change scheduled for February. Scientists told Ars Technica’s Beth Mole that CDC has a history of running away from politicized issues.

One of the conference organizers from the American Public Health Association was quoted as saying nobody told the organizers to cancel.

I believe it. Just one more example of the chilling effect on global warming. In politics, once the Dear Leader’s wishes are known, some hirelings will rush to gratify them without being asked.

The APHA guy said they simply wanted to head off a potential last-minute cancellation. Yeah, I guess an anticipatory pre-cancellation would do that.

But then—Al Gore to the rescue! He is joining with a number of health groups—including the American Public Health Association—to hold a one-day meeting on the topic Feb 16 at the Carter Center in Atlanta, CDC’s home base. Vox’s Julia Belluz reports that it is not clear whether CDC officials will be part of the Gore rescue event.

The Sierra Club fights back

The Sierra Club, of which I’m a proud member, is using the Freedom of Information Act or FOIA to battle or at least slow the deletion of government databases. They wisely started even before Trump took power:

• Jennifer A Dlouhy, Fearing Trump data purge, environmentalists push to get records, BloombergMarkets, 13 January 2017.

Here’s how the strategy works:

U.S. government scientists frantically copying climate data they fear will disappear under the Trump administration may get extra time to safeguard the information, courtesy of a novel legal bid by the Sierra Club.

The environmental group is turning to open records requests to protect the resources and keep them from being deleted or made inaccessible, beginning with information housed at the Environmental Protection Agency and the Department of Energy. On Thursday [January 12th], the organization filed Freedom of Information Act requests asking those agencies to turn over a slew of records, including data on greenhouse gas emissions, traditional air pollution and power plants.

The rationale is simple: Federal laws and regulations generally block government agencies from destroying files that are being considered for release. Even if the Sierra Club’s FOIA requests are later rejected, the record-seeking alone could prevent files from being zapped quickly. And if the records are released, they could be stored independently on non-government computer servers, accessible even if other versions go offline.


Azimuth Backup Project (Part 3)

22 January, 2017



Along with the bad news there is some good news:

• Over 380 people have pledged over $14,000 to the Azimuth Backup Project on Kickstarter, greatly surpassing our conservative initial goal of $5,000.

• Given our budget, we currently aim at backing up 40 terabytes of data, and we are well on our way to this goal. You can see what we’ve done at Our Progress, and what we’re still doing at the Issue Tracker.

• I have gotten a commitment from Danna Gianforte, the head of Computing and Communications at U. C. Riverside, that eventually the university will maintain a copy of our data. (This commitment is based on my earlier estimate that we’d have 20 terabytes of data, so I need to see if 40 is okay.)

• I have gotten two offers from other people, saying they too can hold our data.

I’m hoping that the data at U. C. Riverside will be made publicly available through a server. The other offers may involve it being held ‘secretly’ until such time as it became needed; that has its own complementary advantages.

However, the interesting problem that confronts us now is: how to spend our money?

You can see how we’re currently spending it on our Budget and Spending page. Basically, we’re paying a firm called Hetzner for servers and storage boxes.

We could simply continue to do this until our money runs out. I hope that long before then, U. C. Riverside will have taken over some responsibilities. If so, there would be a long period where our money would largely pay for a redundant backup. Redundancy is good, but perhaps there is something better.

Two members of our team, Sakari Maaranen and Greg Kochanski, have thoughts on this matter which I’d like to share. Sakari posted his thoughts on Google+, while Greg posted his in an email which he’s letting me share here.

Please read these and offer us your thoughts! Maybe you can help us decide on the best strategy!

Sakari Maaranen

For the record, my views on our strategy of using the budget that the Azimuth Climate Data Backup Project now has.

People have contributed it to this effort specifically.

Some non-government entities have offered “free hosting”. Of course the project should take any and all free offers to host our data. Those would not be spending our budget however. And they are still paying for it, even if they offered it to us “for free”.

As far as it comes to spending, I think we should think in terms of 1) terabyte-months, and 2) sufficient redundancy, and do that as cost-efficiently as possible. We should not just dump the money to any takers, but think of the best bang for the buck. We owe that to the people who have contributed now.

For example, if we burn the cash quick to expensive storage, I would consider that a failure. Instead, we must plan for the best use of the budget towards our mission.

What we have promised to the people is that we back up and serve these data sets, by the money they have given to us. Let’s do exactly that.

We are currently serving the mission at approximately €0.006 per gigabyte-month, at least for as long as we have volunteers to work for free. The cost could be slightly higher if we paid for professional maintenance, which should be a reasonable assumption if we plan for long term service. Volunteer work cannot be guaranteed forever, even if it works temporarily.

This is one view and the question is open to public discussion.
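To get a feel for what these numbers imply, here is a rough back-of-the-envelope sketch in Python. It uses the €0.006 per gigabyte-month figure quoted above and the 40 terabyte target mentioned earlier. Everything else is an assumption made for illustration: the function names are made up, a terabyte is taken to be 1000 gigabytes, and the budget figure in the usage note is hypothetical, not our actual balance in euros.

```python
def monthly_cost_eur(terabytes, eur_per_gb_month=0.006, gb_per_tb=1000):
    """Storage cost per month in euros.

    The default rate is the figure quoted in the post; gb_per_tb=1000
    assumes decimal terabytes, which is an assumption of this sketch.
    """
    return terabytes * gb_per_tb * eur_per_gb_month


def months_funded(budget_eur, terabytes, eur_per_gb_month=0.006):
    """How many months a budget lasts if storage is the only expense.

    This ignores paid maintenance, bandwidth and redundancy, all of
    which would shorten the time the money lasts.
    """
    return budget_eur / monthly_cost_eur(terabytes, eur_per_gb_month)
```

For example, `monthly_cost_eur(40)` comes to roughly 240 euros a month, and with a purely hypothetical budget of 12,000 euros, `months_funded(12000, 40)` gives about 50 months. Doubling the redundancy would halve that.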

Greg Kochanski

Some misc thoughts.

1) As I see it, we have made some promise of serving the data (“create a better interface for getting it”) which can be an expensive thing.

UI coding isn’t all that easy, and takes some time.

Beyond that, we’ve promised to back up the data, and once you say “backup”, you’ve also made an implicit promise to make the data available.

2) I agree that if we have a backup, it is a logical extension to take continuous backups, but I wouldn’t say it’s necessary.

Perhaps the way to think about it is to ask the question, “what do our donors likely want”?

3) Clearly they want to preserve the data, in case it disappears from the Federal sites. So, that’s job 1. And, if it does disappear, we need to make it available.

3a) Making it available will require some serving CPU, disk, and network. We may need to worry about DDOS attacks, though perhaps we could get free coverage from Akamai or Google Project Shield.

3b) Making it available may imply paying some students to write Javascript and HTML to put up a front-end to allow people to access the data we are collecting.

Not all the data we’re collecting is in strictly servable form. Some of the databases, for example, aren’t usefully servable in the form we collect, and we know some links will be broken because of missing pages, or because of wget’s design flaw.*

[* Wget stores http://a/b/c as a file, a/b/c, where a/b is a directory. Wget stores http://a/b as a file a/b, where a/b is a file.

Therefore, both cannot exist simultaneously on disk. If they do, wget drops one.]
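The collision described in this footnote can be illustrated with a small Python sketch. This is a simplified model, not wget's actual naming logic: it assumes each URL maps to host plus path, that URLs ending in a slash are saved as index.html, and it ignores query strings and wget's command-line options. The function names are hypothetical. Given a list of URLs, it reports pairs where one local path would have to be both a file and a directory.

```python
from urllib.parse import urlsplit


def local_path(url):
    """Simplified model of where a mirroring tool would save a URL."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if path.endswith("/"):
        # Assumption: directory-style URLs are saved as index.html.
        path += "index.html"
    return parts.netloc + path


def find_collisions(urls):
    """Return (file_url, deeper_url) pairs that cannot coexist on disk.

    A pair collides when the local path of file_url is also needed
    as a directory in order to store deeper_url.
    """
    paths = {local_path(u): u for u in urls}
    collisions = []
    for path, url in paths.items():
        parent = path.rpartition("/")[0]
        while parent:
            if parent in paths:
                collisions.append((paths[parent], url))
            parent = parent.rpartition("/")[0]
    return sorted(collisions)
```

Running `find_collisions(["http://a/b", "http://a/b/c", "http://a/d"])` flags the pair `http://a/b` and `http://a/b/c`, which is exactly the case in the footnote. A check like this over a crawl's URL list would tell us in advance which pages are at risk of being dropped.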

Points 3 & 3a imply that we need to keep some money in the bank until either the websites are taken down, or we decide that the threat has abated. So, we need to figure out how much money to keep as a serving reserve. It doesn’t sound like UCR has committed to serve the data, though you could perhaps ask.

Beyond the serving reserve, I think we are free to do better backups (i.e. more than one data collection), and change detection.