Submission to arXiv

4 February, 2022

guest post by Phillip Helbig

Monthly Notices of the Royal Astronomical Society is one of the oldest and most prestigious journals in the fields of astronomy, astrophysics, and cosmology. My latest MNRAS paper was not allowed to appear in the astro-ph category at the arXiv (https://arxiv.org, the main avenue of distribution for scientific articles in many fields) because it was reclassified to a category which is inappropriate for several reasons. This is definitely not due to some technical error, misunderstanding, or oversight. It took more than three months for me to even be told why it had been reclassified, and that only after a well-known cosmologist threatened the Scientific Director of arXiv that he would complain to the arXiv sponsors if things weren’t cleared up. Also, there is evidence that the reason I was given is not the real one.

Although I would like my paper to appear in astro-ph, this is not about just my paper. Rather, it is about the question of whether the community wants arXiv, rather than peer review by respected journals such as MNRAS, to decide which papers, and hence which people, are allowed to be part of that community. Below, after some general background on arXiv, I mention some policies which are probably not as well known as they should be, before briefly describing my own odyssey.

Like it or not, many if not most astronomers rely on arXiv at least for learning about new papers; some rely on it exclusively, despite the facts that not everything is on arXiv, that that which is there is not always in the definitive version, and that even if the definitive version is there, then that might not be clear. The last two (and, in some cases, the first as well) can be due to lazy authors or to restrictions imposed by journals as to what version, such as the ‘author’s accepted manuscript’, is allowed to appear; more-definitive versions hence either don’t appear or if so then that fact is not advertised. At the same time, publication in a respected journal is generally recognized as a mark of quality. In fact, the main reason that the quality of papers at arXiv is so high is that most of them will eventually appear in respected journals. So essentially journals are for separating the wheat from the chaff while arXiv has become the main method of distribution, because no subscription is required and because a majority of articles can be found at one website with a reasonably useful interface (the former is crucial for those without access to a subscription to every journal they might want to access and the latter saves large amounts of time). There is thus a problem if standards of acceptance between journals and arXiv differ.

The main reason, at least for me, to have my papers on arXiv is visibility. All else being equal, papers on arXiv are almost certainly read more, and probably cited more, than those which are not. (In a field in which a large fraction are on arXiv, the reason can’t be that only the better papers are put on arXiv. Also, at least a few years after the paper has appeared, having it on arXiv before it has appeared in the journal probably won’t substantially increase the number of times it is read and/or cited due to the only slightly increased time during which it has been available; the increased citation rate is due to the higher visibility from being on arXiv.) The ‘stamp of approval’ comes from the journal. It is easy to distribute open-access versions of the paper, although implementing a robust long-term storage strategy is not. Finding them is more difficult; that would be easiest via arXiv, but author-supplied links at the corresponding ADS& abstract web page are good enough.

People often look for open-access versions of papers via links on such web pages, especially if they want to make sure that they find the official version, not whatever version might be on arXiv; arXiv itself is not an option for papers which are not on arXiv; of course, ADS can be and is used completely independently of arXiv. Lack of visibility at arXiv is a serious disadvantage to an author and such decisions should be made only in extreme cases. (Also, having the paper at arXiv but in the wrong category can be worse than not having it there at all.)

arXiv is under no obligation to allow even a paper which has been accepted by a leading journal in the field to appear in the appropriate category (e.g., astro-ph for astronomy / astrophysics / cosmology), or even to appear at all. There are also some other things which are documented but not as well known as they should be, some things which are at best poorly documented, and inconsistent and/or incomplete recommendations. I think that it is important to alert the community to those in order to counter the impression held by many that everything worth reading is on arXiv and/or that if something is not on arXiv then it must be a matter of the author excluding himself from the community, rather than being excluded by arXiv (references intentionally not included to avoid public shaming). (Of course, most who claim that all papers in their field worth reading are on arXiv are not in a position to make that claim, because they don’t read any papers which are not on arXiv.) I suspect that at least some of those things are known by many, but also that many are reluctant to criticize arXiv in public for fear of getting banned, which is the modern-day equivalent of excommunication.

According to the submission agreement, “[t]he Submitter waives…[a]ny claims against arXiv…based upon actions…including…decisions to include the Work in, or exclude the Work from, the repository…the classification or characterization of the Work.” “arXiv reserves the right to reject or reclassify any submission.” In other words, the idea that any serious paper (‘serious’ being defined here as having appeared in a respected journal) can (assuming, of course, that the journal allows it) be uploaded to arXiv is wrong. Also, arXiv reserves the right to reclassify the article, e.g. a paper submitted to astro-ph can be reclassified to gen-ph. Moreover, after such a reclassification, the author is not allowed to withdraw the paper (Steinn Sigurdsson*, personal communication; Eleonora Presani@, personal communication), although that is technically possible (by first ‘unsubmitting’ it then ‘deleting’ it).

Of course, journals also decide which papers they accept and reject. However, the comparison of arXiv with journals is not appropriate, for several reasons: arXiv does not peer-review submissions and claims to do only a minimal amount of moderation. Also, journals offer something between acceptance and rejection, namely the possibility of revision, coupled with the opportunity to discuss the degree of revision, or even reasons for rejection, with the referee(s) and/or editor(s). Of course, revision of an article accepted by a journal doesn’t make sense, but the fact that it is not offered is another piece of evidence that interaction with arXiv shouldn’t be compared to interaction with a journal. Moreover, if an article is rejected by a journal, it is not automatically submitted to another journal, much less without any possibility for the author to choose to withdraw it completely, hence the claim that the various arXiv categories are comparable to various journals with different standards (Eleonora Presani, personal communication) is dubious at best. In addition, there is usually more than one journal of comparable reputation in a given field, so the author has the chance of getting an independent evaluation. In that case, competition between journals is good. In the case of arXiv, however, a monopoly is actually good, as long as it works, because one of the main advantages of arXiv is that there is only one place one needs to look in order to find most papers. This is the main point of my criticism: arXiv’s unique relevance to the community means that excluding a paper from its intended category should be done only under extreme circumstances. arXiv has become one of the most important resources for the astronomical community but that community has essentially no control over arXiv. Great power should be accompanied by great responsibility. Quis custodiet ipsos custodes?

It is possible to appeal a decision. However, the appeals process is not well documented, in part because astro-ph is sometimes seen as a top-level category, sometimes as one of the physics categories. As part of the appeals process, “[e]xtreme cases may be addressed to the appropriate advisory committee chair only”. The value of a successful appeal is questionable, because most rely on the abstract lists for recent papers in a particular category, either sent via email or available at the arXiv website. As far as I know, a paper reclassified after a successful appeal would not appear in the ‘recent’ list for that category. The main problem with such an appeal, though, is that arXiv is policing itself.

For various reasons, in recent years so-called arXiv-overlay journals have sprung up. There is even one for astrophysics, The Open Journal of Astrophysics, and I have published a review paper there. The basic idea is that there is a robust distribution structure already in place, namely arXiv, so the job of the journal is essentially only to provide refereeing. Such journals usually assume that all potential authors could post their paper to arXiv before submitting it to the journal, but obviously that is not the case. (Some even use the arXiv category as a filter to determine whether the paper could even be considered to be appropriate for the journal.) It is sometimes possible, though usually not widely advertised, to submit to the journal first and submit the paper to arXiv only after acceptance, which is what I did (like many, I prefer to put papers on arXiv only after acceptance). That paper had no problems at arXiv, but based on the reasons I’m presenting here, arXiv-overlay journals are no longer an option for me. (I have long suggested not only that the possibility to submit to the journal before submitting to arXiv should be more widely advertised, but also that the journal should have some sort of agreement with arXiv that any paper accepted by the journal automatically qualifies for the corresponding category at arXiv (after all, the purpose of a journal is publication); alas, the Open Journal of Astrophysics does not plan to pursue that at all: “OJA has no power to compel arXiv to accept submissions, nor would we want to. We see arXiv as the most important resource in astrophysics….”.) Despite the longevity and robustness of some traditional journals, the scientific publishing landscape is changing rapidly. That is a topic for another discussion, but part of it involves arXiv-overlay journals, and wrong assumptions about arXiv mean that a substantial part of the new system is built on shaky foundations.

Those who are interested in high-quality, free-for-readers-and-authors, well organized, open-access journals should check out https://scipost.org/. Is there any valid reason to submit anywhere else? Their astronomy journals are just getting underway; please consider supporting them.

I learned about some of the things discussed above the hard way when my latest MNRAS paper was reclassified from astro-ph to gen-ph (general physics). Of course, I appealed the decision quickly, after discussing the matter with a few colleagues, some of whom assumed that it must have been some sort of technical glitch. It took more than three months before I was told a reason for the classification (after having escalated up to the highest levels of arXiv)§, and more than four before the appeals process finally ended. That paper is not on arXiv, and I don’t intend to post anything else to arXiv before the procedure becomes fairer, more transparent, and more accountable (if it ever does). I had escalated as highly as possible within arXiv before I asked Cornell University (which hosts arXiv) to investigate possible academic misconduct, which led to an email from Eleonora Presani. Her stance is essentially the same as that of Licia Verde#: my accusations themselves don’t seem to have been investigated and authors just have to live with the fact that arXiv can reclassify papers at will and even prevent authors from withdrawing them completely before announcement if they disagree with the reclassification. Unfortunately, Cornell takes the point of view that although Cornell maintains and sustains arXiv, it is not the university’s role to interfere in the moderation or appeal process.

There is evidence that I wasn’t told the real reason why my paper was reclassified$, and no-one with whom I have discussed the matter thinks that arXiv was right to reclassify my paper. (That doesn’t mean that they necessarily have a high opinion of my paper, but those are two separate issues. One colleague stated (though not in reference to my paper) that even the occasional papers which appear in respected journals obviously by mistake should appear on arXiv; that would put pressure on journals to be more careful and also benefit those wishing to critically discuss or refute them.) However, I will discuss that and other aspects (hopefully) unique to my case elsewhere (perhaps in the comments if there is interest), and here concentrate on problems which the astronomical community should recognize and try to correct.

I certainly regard reclassifying a paper which has appeared in MNRAS to a category other than astro-ph, giving reasons for the reclassification only after threat from a famous colleague, and then giving me a completely different reason, to be an extreme case. Thus, I did contact the chair of the physics advisory committee, Robert Seiringer; that he is the appropriate person was also confirmed by Licia Verde. Nevertheless, his response was that he could not investigate disputes involving individual submissions, which was also Verde’s reply to my complaint. Hence, not only is there disagreement between arXiv’s documented appeals procedure and how those involved actually behave, there seems to be no system of checks and balances within arXiv, not to mention the problem that the community, despite relying on arXiv, in practice has no way to arbitrate disputes with it; it is judge, jury, and executioner.

All who believe that my paper should be on arXiv in the astro-ph.CO category if I so desire are encouraged to contact the Scientific Director, the Executive Director, the Chair of the Scientific Advisory Board, and the Chair of the Physics Advisory Committee and complain. It is not necessary to think that my paper is great. It is enough if one thinks that it is not so bad that it should be banned from astro-ph, or even if one can point to worse papers which are in astro-ph. (Of course, if one agrees that my paper should appear in astro-ph.CO, the reasons why arXiv has not (yet?) let it appear are irrelevant.)

Of course, my bad experience with arXiv is not the main point. The main point is that arXiv can, and does, make decisions which experts in the field (see third footnote; Tegmark wasn’t the only expert consulted by me) cannot understand at all. Due to fear of the consequences of criticizing arXiv, most of those probably go unnoticed. While arXiv does need the possibility to reject or reclassify some papers, that needs to be done transparently and fairly. However, in view of its value to the community, there should be some simple rules, such as a ‘white list’ of journals so that papers accepted by them automatically qualify for the corresponding category at arXiv. Fortunately, my own livelihood does not depend on submitting to arXiv (in either sense of the word). Imagine the consequences for a young scientist who, after a year or so of work, gets their first paper accepted by a serious journal, only to have it rejected by arXiv or reclassified into a category where no colleague, potential employer, and so on will see it. Not only that, but the decision is made by someone (or some thing; arXiv is now moving to classification based on machine learning, but that was not relevant to the reclassification of my paper (Licia Verde, personal communication)) via an untransparent algorithm and no reason is given. Any appeal is within arXiv itself and essentially consists of some people asking others if they are guilty and accepting the expected answer. Such behaviour should be an embarrassment to the scientific community.

I think that some action on the part of the community would be in order even if my paper were the only one affected. However, the problem is much larger. Many colleagues have told me that they disagree with the reclassification of my paper, but are afraid to say so publicly for fear of getting banned from arXiv themselves. Also, I have been told that I am far from the first person to make such complaints about arXiv. Since I started discussing this with colleagues, a few other similar cases have been mentioned to me. Considering that many of those affected probably don’t mention it at all out of a false sense of shame, the number of people affected is probably larger than many might at first guess. (I am not on Facebook, but I understand that a similar problem was recently discussed within a Facebook group for professional astronomers.)

A new development is that arXiv, by its own admission, doesn’t have the necessary means to do its job properly, and that I am not the only one complaining about it:

• Daniel Garisto, ArXiv.org reaches a milestone and a reckoning, Scientific American, 10 January 2022.

A red herring is that the American Astronomical Society has made all of its journals (which are some of the major journals in cosmology/astrophysics/astronomy) open-access. That probably won’t diminish the importance of arXiv, and hence the importance of making sure that it is run responsibly, for several reasons. First, an attraction of arXiv is that it is a one-stop shop with a reasonable interface, and by following it one can keep up with much of the literature in one’s field (though of course not all papers are posted to arXiv, but if it is run responsibly then there should be no reason for them not to be, except if the journal forbids posting (some version of) the paper to arXiv). Even if all papers were open-access, that would mean following websites, or RSS feeds, of several or even dozens of websites, not nearly as convenient as the abstract listings at arXiv. Second, the AAS journals have rather expensive publication fees, which are becoming increasingly hard to justify, especially in the case of online-only publications. (Note that there are journals with no publication fees which actually encourage the author to post something equivalent to the final version on arXiv with no embargo period; MNRAS is an example.) Third, items which would otherwise have limited circulation, such as theses and conference proceedings, can (in principle) be on arXiv.

I’m all for giving arXiv more support, but first my paper needs to be rehabilitated by being allowed into astro-ph, and the policies should be changed, and publicly communicated, so that such problems do not happen in the future (neither to me nor anyone else); I could then post my backlog. The evidence is that the goof is so large that a public apology is called for. The minimum which needs to be done:

  1. When a paper is reclassified, authors should be informed (now, there is not even an automatic email; that makes sense because arXiv thinks that it needs to reclassify some papers against the will of the submitter) and given a chance to approve the reclassification, delete the submission entirely, suggest another reclassification, or appeal. Until the matter is resolved, the submission should stay in the ‘hold’ status with no action required to keep it there (now, one has to unsubmit and resubmit it to keep it from going away).

  2. When a paper is reclassified, the submitter must be given concrete reasons.

  3. The appeals process needs to be overseen by some authority outside of arXiv which has the power to overrule arXiv’s decisions, otherwise it is more or less a farce. It seems to me that some committee in the corresponding professional organization would be a good choice, e.g. the International Astronomical Union for papers on cosmology / astrophysics / astronomy. There can be an internal appeals process, but the final authority of arXiv’s decisions should not reside with arXiv if arXiv is to provide a meaningful service to the community.

  4. Papers from the major journals should be essentially white-listed. If a paper is really so bad that it is obvious that it somehow slipped in by mistake, arXiv should request the journal to formally withdraw it. If the journal does so, then arXiv shouldn’t accept it either. If not, then it should go onto arXiv. (It should go on even if it is bad, to put pressure on journals to uphold quality and so that it can be discussed and rebutted.)

  5. arXiv needs to publicly apologize for reclassifying papers for reasons other than quality or content (e.g. my case), and invite those papers to be resubmitted after the other points above have been implemented.

  6. The points above should make (re)submissions by wronged authors viable, but perhaps some sort of special protection is needed for whistle-blowers such as myself.

  7. I was going to call for the resignation of Seiringer, Verde, and Presani, but it seems that they are all no longer in the posts they held when interacting with me. The main guilty person, though, Sigurdsson, is still Scientific Director. How anyone can be aware of my story (which can be backed up with evidence, in court if necessary) and still think that Sigurdsson should have anything at all to do with arXiv is beyond me. Also, although they have chosen (probably with good reason) to remain nameless, if arXiv were not drastically wrong on this point, the distinguished colleagues who put in a lot of time and effort trying to get arXiv to reverse its decision would not have done so. I am extremely grateful to them for their courage.

Of course, a boycott will not put pressure on arXiv. (It would actually remove pressure if people who are critical of arXiv stop using it.) If really famous people publicly announce that they will stop posting to arXiv until the points I raise have been cleared up, that might lead to something.

It is not clear how large the problem is, in part because not everyone feels able to complain. I don’t think that my case is a one-off, or even part of a small minority, because otherwise arXiv would not have invested so much time and effort to prevent one more abstract from appearing in astro-ph. I have given them several opportunities to revert their decision and hence cut their losses, but never even received a reply to such requests. Thus, the problem is probably substantial, and hence should be of interest to the entire community.

Information based on the web pages pointed to by the URLs in the reference list reflects the state of those pages on 28 August 2020; that based on the technical behaviour of the arXiv interface reflects my experiences between 20 April and 25 July 2020. References to ‘arXiv’ reflect my experience with the astro-ph category.

I would be interested in hearing anything relevant to this topic by email (my address is easy enough to find). Please indicate the degree of confidentiality you wish.

Please point as many people as possible, by all means at your disposal, to this post and related discussion. I am probably taking a big risk by going public, but if I do so, I want it to have the maximum effect. I see the lack of accountability of arXiv as a serious problem in modern academia.

Footnotes

* Steinn Sigurdsson is the Scientific Director of arXiv.

@ Eleonora Presani was the first Executive Director of arXiv, the post having been created only in 2020, while arXiv itself was created in 1991. She used to work for Elsevier. On 21 December 2021, it was announced that she would step down. According to the same announcement, Steinn Sigurdsson is still Scientific Director. Robert Seiringer is no longer Chair of the physics committee. I don’t see a new Executive Director listed on the arXiv Leadership Team web page.

§ Even that happened only after noted cosmologist Max Tegmark had threatened to complain to arXiv’s sponsors if my paper wasn’t taken out of limbo. Before, I had received only an extremely brief reply from Sigurdsson, and that only after a colleague who has known him for a long time discussed my complaints with him. Tegmark not only agrees that arXiv is overstepping its bounds by essentially overriding the refereeing process of a respected journal, but also that there is no reason that my paper should not be allowed to appear in astro-ph. He was also kind enough and brave enough to give me permission to quote from his emails to me. These do contain quotations of emails he received from arXiv. Ethically, I think that trying to correct the tremendous harm done to me and others because of wrong reclassification overrides any concerns about quoting without permission (which of course would not be given), especially since such quotations make my case much stronger than merely paraphrasing what others have told me or even just my own suspicions; this is a typical whistle-blower situation.

$ The only reason which I was given is the alleged lack of “substantiveness” of the paper. Max Tegmark, on the other hand, wasn’t told that, but was told that my case is “complicated” and that “[t]he reason for this [arXiv not automatically accepting a paper accepted by a journal] is partly the SCOAP3 agreement, which arXiv is not party to but still put certain obligations on us, and partly because we can not privilege any one journal or publisher for legal reasons. We get sued.” (Max Tegmark, personal communication.) I certainly don’t think that arXiv should automatically accept a paper just because it has been accepted by any journal, but do think that rejecting or reclassifying a paper which has been accepted by a respected journal should be done only under extreme circumstances, via a transparent and fair process, and for reasons which can be explained. Also, no one I have talked to has any idea how SCOAP3 could be relevant to my paper. Apart from Max Tegmark, several other colleagues (all full professors of cosmology / astrophysics / astronomy at major research universities) tried to intervene with arXiv (which did not even want to discuss the matter with a low-life such as myself). That none of them want their names mentioned publicly is a problem in itself: the people whom arXiv is supposed to serve do not feel free to offer constructive criticism in public. Between the lines (or even in them, if one is allowed to see them), it seems that, in my case, the reclassification was not due to the contents or quality of my paper, but rather indicates another, possibly even more serious, problem: arXiv appears to be afraid of getting sued by crackpots.
Apparently they abuse the gen-ph category (which is a mix of papers about general physics, papers which at first or even second or third glance obviously belong in another category and have nothing obviously wrong with them, and genuine crackpot stuff) by reclassifying some real papers to it and also letting through a few crackpot papers, thus avoiding the accusation of white-listing the major journals (which shouldn’t be a problem) and the crackpots can be appeased by having their papers in the same category as some major-journal papers. Of course this is not a policy which arXiv has published, but when several people get the same message behind the scenes, it is as certain as it needs to be to make my case. Although I believe that the concept still would have been deeply flawed, I offered to leave the paper in gen-ph but have it cross-listed to astro-ph, but that suggestion was rejected by arXiv. Of course, if their goal is to appease the crackpots but at the same time keep them out of the major categories, that strategy wouldn’t work, because they would then have to cross-list crackpot papers or make a distinction, which is what they are trying to avoid (or rather they want to have a few alibi papers with no distinction).

# Licia Verde was Chair of the arXiv Scientific Advisory Committee. The Chair is now Ralph Wijers, who is also Chair of the Physics Advisory Committee. I did contact him, but he sees no reason to investigate my case, as it happened before he took up his posts as Chair.

& The SAO/NASA Astrophysics Data System is the most important bibliographic database in astronomy/astrophysics/cosmology, operated by the Smithsonian Astrophysical Observatory (part of the Harvard/Smithsonian Center for Astrophysics, which also includes the Harvard College Observatory) under a grant from the National Aeronautics and Space Administration.


Category Theory Calendar

6 April, 2020

There are now enough online events in category theory that a calendar is needed. And here it is!

https://teamup.com/ksfss6k4j1bxc8vztb

It should show the times in your time zone, at least if you don’t prevent it from getting that information.


Category Theory Community Server

25 March, 2020

My student Christian Williams has started a community server for category theory, computer science, logic, as well as general science and industry. In just a few days, it has grown into a large and lively place, with people of many backgrounds and interests. Please feel free to join!

Register here:

https://categorytheory.zulipchat.com/join/vxijncyvot5japrc426ntmwr/

(this link will expire in a while) and from then on you can just go here:

http://categorytheory.zulipchat.com

If the link for registration has expired, just let me know and I’ll revive it.


A Double Conference

23 February, 2018

Here’s a cool way to cut carbon emissions: a double conference. The idea is to have a conference in two faraway locations connected by live video stream, to reduce the amount of long-distance travel!

Even better, it’s about a great subject:

• Higher algebra and mathematical physics, August 13–17, 2018, Perimeter Institute, Waterloo, Canada, and Max Planck Institute for Mathematics, Bonn, Germany.

Here’s the idea:

“Higher algebra” has become important throughout mathematics, physics, and mathematical physics, and this conference will bring together leading experts in higher algebra and its mathematical physics applications. In physics, the term “algebra” is used quite broadly: any time you can take two operators or fields, multiply them, and write the answer in some standard form, a physicist will be happy to call this an “algebra”. “Higher algebra” is characterized by the appearance of a hierarchy of multilinear operations (e.g. A-infinity and L-infinity algebras). These structures can be higher categorical in nature (e.g. derived categories, cohomology theories), and can involve mixtures of operations and co-operations (Hopf algebras, Frobenius algebras, etc.). Some of these notions are purely algebraic (e.g. algebra objects in a category), while others are quite geometric (e.g. shifted symplectic structures).

An early manifestation of higher algebra in high-energy physics was supersymmetry. Supersymmetry makes quantum field theory richer and thus more complicated, but at the same time many aspects become more tractable and many problems become exactly solvable. Since then, higher algebra has made numerous appearances in mathematical physics, both high- and low-energy.

Participation is limited. Some financial support is available for early-career mathematicians. For more information and to apply, please visit the conference website of the institute closer to you:

North America: http://www.perimeterinstitute.ca/HAMP
Europe: http://www.mpim-bonn.mpg.de/HAMP

If you have any questions, please write to double.conference.2018@gmail.com.

One of the organizers, Aaron Mazel-Gee, told me:

We are also interested in spreading the idea of double conferences more generally: we’re hoping that our own event’s success inspires other academic communities to organize their own double conferences. We’re hoping to eventually compile a sort of handbook to streamline the process for others, so that they can learn from our own experiences regarding the various unique challenges that organizing such an event poses. Anyways, all of this is just to say that I would be happy for you to publicize this event anywhere that it might reach these broader audiences.

So, if you’re interested in having a double conference, please contact the organizers of this one for tips on how to do it! I’m sure they’ll have better advice after they’ve actually done it. I’ve found that the technical details really matter for these things: it can be very frustrating when they don’t work correctly. Avoiding such problems requires testing everything ahead of time—under conditions that exactly match what you’re planning to do!


Saving Climate Data (Part 6)

23 February, 2017

Scott Pruitt, who filed legal challenges against Environmental Protection Agency rules fourteen times, working hand in hand with oil and gas companies, is now head of that agency. What does that mean about the safety of climate data on the EPA’s websites? Here is an inside report:

• Dawn Reeves, EPA preserves Obama-Era website but climate change data doubts remain, InsideEPA.com, 21 February 2017.

For those of us who are backing up climate data, the really important stuff is in red near the bottom.

The EPA has posted a link to an archived version of its website from Jan. 19, the day before President Donald Trump was inaugurated and the agency began removing climate change-related information from its official site, saying the move comes in response to concerns that it would permanently scrub such data.

However, the archived version notes that links to climate and other environmental databases will go to current versions of them—continuing the fears that the Trump EPA will remove or destroy crucial greenhouse gas and other data.

The archived version was put in place and linked to the main page in response to “numerous [Freedom of Information Act (FOIA)] requests regarding historic versions of the EPA website,” says an email to agency staff shared by the press office. “The Agency is making its best reasonable effort to 1) preserve agency records that are the subject of a request; 2) produce requested agency records in the format requested; and 3) post frequently requested agency records in electronic format for public inspection. To meet these goals, EPA has re-posted a snapshot of the EPA website as it existed on January 19, 2017.”

The email adds that the action is similar to the snapshot taken of the Obama White House website.

The archived version of EPA’s website includes a “more information” link that offers more explanation.

For example, it says the page is “not the current EPA website” and that the archive includes “static content, such as webpages and reports in Portable Document Format (PDF), as that content appeared on EPA’s website as of January 19, 2017.”

It cites technical limits for the database exclusions. “For example, many of the links contained on EPA’s website are to databases that are updated with the new information on a regular basis. These databases are not part of the static content that comprises the Web Snapshot.” Searches of the databases from the archive “will take you to the current version of the database,” the agency says.

“In addition, links may have been broken in the website as it appeared” on Jan. 19 and those will remain broken on the snapshot. Links that are no longer active will also appear as broken in the snapshot.

“Finally, certain extremely large collections of content… were not included in the Snapshot due to their size” such as AirNow images, radiation network graphs, historic air technology transfer network information, and EPA’s searchable news releases.

‘Smart’ Move

One source urging the preservation of the data says the snapshot appears to be a “smart” move on EPA’s behalf, given the FOIA requests it has received, and notes that even though other groups like NextGen Climate and scientists have been working to capture EPA’s online information, having it on EPA’s site makes it official.

But it could also be a signal that big changes are coming to the official Trump EPA site, and it is unclear how long the agency will maintain the archived version.

The source says while it is disappointing that the archive may signal the imminent removal of EPA’s climate site, “at least they are trying to accommodate public concerns” to preserve the information.

A second source adds that while it is good that EPA is seeking “to address the widespread concern” that the information will be removed by an administration that does not believe in human-caused climate change, “on the other hand, it doesn’t address the primary concern of the data. It is snapshots of the web text.” Also, information “not included,” such as climate databases, is what is difficult to capture by outside groups and is what really must be preserved.

“If they take [information] down” that groups have been trying to preserve, then the underlying concern about access to data remains. “Web crawlers and programs can do things that are easy,” such as taking snapshots of text, “but getting the data inside the database is much more challenging,” the source says.

The first source notes that EPA’s searchable databases, such as those maintained by its Clean Air Markets Division, are used by the public “all the time.”

The agency’s Office of General Counsel (OGC) Jan. 25 began a review of the implications of taking down the climate page—a planned wholesale removal that was temporarily suspended to allow for the OGC review.

But EPA did remove some specific climate information, including links to the Clean Power Plan and references to President Barack Obama’s Climate Action Plan. Inside EPA captured this screenshot of the “What EPA Is Doing” page regarding climate change. Those links are missing on the Trump EPA site. The archive includes the same version of the page as captured by our screenshot.

Inside EPA first reported the plans to take down the climate information on Jan. 17.

After the OGC investigation began, a source close to the Trump administration said Jan. 31 that climate “propaganda” would be taken down from the EPA site, but that the agency is not expected to remove databases on GHG emissions or climate science. “Eventually… the propaganda will get removed…. Most of what is there is not data. Most of what is there is interpretation.”

The Sierra Club and Environmental Defense Fund both filed FOIA requests asking the agency to preserve its climate data, while attorneys representing youth plaintiffs in a federal climate change lawsuit against the government have also asked the Department of Justice to ensure the data related to its claims is preserved.

The Azimuth Climate Data Backup Project and other groups are making copies of actual databases, not just the visible portions of websites.


Azimuth Backup Project (Part 4)

18 February, 2017

The Azimuth Climate Data Backup Project is going well! Our Kickstarter campaign ended on January 31st and the money has recently reached us. Our original goal was $5000. We got $20,427 of donations, and after Kickstarter took its cut we received $18,590.96.

Next time I’ll tell you what our project has actually been doing. This time I just want to give a huge “thank you!” to all 627 people who contributed money on Kickstarter!

I sent out thank you notes to everyone, updating them on our progress and asking if they wanted their names listed. The blanks in the following list represent people who either didn’t reply, didn’t want their names listed, or backed out and decided not to give money. I’ll list people in chronological order: first contributors first.

Only 12 people backed out; the vast majority of blanks on this list are people who haven’t replied to my email. I noticed some interesting but obvious patterns. For example, people who contributed later are less likely to have answered my email yet—I’ll update this list later. People who contributed more money were more likely to answer my email.

The magnitude of contributions ranged from $2000 to $1. A few people offered to help in other ways. The response was international—this was really heartwarming! People from the US were more likely than others to ask not to be listed.

But instead of continuing to list statistical patterns, let me just thank everyone who contributed.

Daniel Estrada
Ahmed Amer
Saeed Masroor
Jodi Kaplan
John Wehrle
Bob Calder
Andrea Borgia
L Gardner

Uche Eke
Keith Warner
Dean Kalahan
James Benson
Dianne Hackborn

Walter Hahn
Thomas Savarino
Noah Friedman
Eric Willisson
Jeffrey Gilmore
John Bennett
Glenn McDavid

Brian Turner

Peter Bagaric

Martin Dahl Nielsen
Broc Stenman

Gabriel Scherer
Roice Nelson
Felipe Pait
Kenneth Hertz

Luis Bruno


Andrew Lottmann
Alex Morse

Mads Bach Villadsen
Noam Zeilberger

Buffy Lyon

Josh Wilcox

Danny Borg

Krishna Bhogaonker
Harald Tveit Alvestrand


Tarek A. Hijaz, MD
Jouni Pohjola
Chavdar Petkov
Markus Jöbstl
Bjørn Borud


Sarah G

William Straub

Frank Harper
Carsten Führmann
Rick Angel
Drew Armstrong

Jesimpson

Valeria de Paiva
Ron Prater
David Tanzer

Rafael Laguna
Miguel Esteves dos Santos 
Sophie Dennison-Gibby




Randy Drexler
Peter Haggstrom


Jerzy Michał Pawlak
Santini Basra
Jenny Meyer


John Iskra

Bruce Jones
Māris Ozols
Everett Rubel



Mike D
Manik Uppal
Todd Trimble

Federer Fanatic

Forrest Samuel, Harmos Consulting








Annie Wynn
Norman and Marcia Dresner



Daniel Mattingly
James W. Crosby








Jennifer Booth
Greg Randolph





Dave and Karen Deeter

Sarah Truebe









Tieg Zaharia
Jeffrey Salfen
Birian Abelson

Logan McDonald

Brian Truebe
Jon Leland


Nicole



Sarah Lim







James Turnbull




John Huerta
Katie Mandel Bruce
Bethany Summer




Heather Tilert

Anna C. Gladstone



Naom Hart
Aaron Riley

Giampiero Campa

Julie A. Sylvia


Pace Willisson









Bangskij










Peter Herschberg

Alaistair Farrugia


Conor Hennessy




Stephanie Mohr




Torinthiel


Lincoln Muri 
Anet Ferwerda 


Hanna





Michelle Lee Guiney

Ben Doherty
Trace Hagemann







Ryan Mannion


Penni and Terry O'Hearn



Brian Bassham
Caitlin Murphy
John Verran






Susan


Alexander Hawson
Fabrizio Mafessoni
Anita Phagan
Nicolas Acuña
Niklas Brunberg

Adam Luptak
V. Lazaro Zamora






Branford Werner
Niklas Starck Westerberg
Luca Zenti and Marta Veneziano 


Ilja Preuß
Christopher Flint

George Read 
Courtney Leigh

Katharina Spoerri


Daniel Risse



Hanna
Charles-Etienne Jamme
rhackman41



Jeff Leggett

RKBookman


Aaron Paul
Mike Metzler


Patrick Leiser

Melinda

Ryan Vaughn
Kent Crispin

Michael Teague

Ben



Fabian Bach
Steven Canning


Betsy McCall

John Rees

Mary Peters

Shane Claridge
Thomas Negovan
Tom Grace
Justin Jones


Jason Mitchell




Josh Weber
Rebecca Lynne Hanginger
Kirby


Dawn Conniff


Michael T. Astolfi



Kristeva

Erik
Keith Uber

Elaine Mazerolle
Matthieu Walraet

Linda Penfold




Lujia Liu



Keith



Samar Tareem


Henrik Almén
Michael Deakin 
Rutger Ockhorst

Erin Bassett
James Crook



Junior Eluhu
Dan Laufer
Carl
Robert Solovay






Silica Magazine







Leonard Saers
Alfredo Arroyo García



Larry Yu













John Behemonth


Eric Humphrey


Svein Halvor Halvorsen



Karim Issa

Øystein Risan Borgersen
David Anderson Bell III











Ole-Morten Duesend







Adam North and Gabrielle Falquero

Robert Biegler 


Qu Wenhao






Steffen Dittmar




Shanna Germain






Adam Blinkinsop







John WS Marvin (Dread Unicorn Games)


Bill Carter
Darth Chronis 



Lawrence Stewart

Gareth Hodges

Colin Backhurst
Christopher Metzger

Rachel Gumper


Mariah Thompson

Falk Alexander Glade
Johnathan Salter




Maggie Unkefer
Shawna Maryanovich






Wilhelm Fitzpatrick
Dylan “ExoByte” Mayo
Lynda Lee




Scott Carpenter



Charles D, Payet
Vince Rostkowski


Tim Brown
Raven Daegmorgan
Zak Brueckner


Christian Page

Adi Shavit


Steven Greenberg
Chuck Lunney



Adriel Bustamente

Natasha Anicich



Bram De Bie
Edward L






Gray Detrick
Robert


Sarah Russell

Sam Leavin

Abilash Pulicken

Isabel Olondriz
James Pierce
James Morrison


April Daniels



José Tremblay Champagne


Chris Edmonds

Hans & Maria Cummings
Bart Gasiewiski


Andy Chamard



Andrew Jackson

Christopher Wright

Crystal Collins

ichimonji10


Alan Stern
Alison W


Dag Henrik Bråtane





Martin Nilsson


William Schrade


Saving Climate Data (Part 5)

6 February, 2017

There’s a lot going on! Here’s a news roundup. I will separately talk about what the Azimuth Climate Data Backup Project is doing.

I’ll start with the bad news, and then go on to some good news.

Tweaking the EPA website

Scientists are keeping track of how the Trump administration is changing the Environmental Protection Agency website, with before-and-after photos and analysis:

• Brian Kahn, Behold the “tweaks” Trump has made to the EPA website (so far), Natural Resources Defense Council blog, 3 February 2017.

There’s more about “adaptation” to climate change, and less about how it’s caused by carbon emissions.

All of this would be nothing compared to the new bill to eliminate the EPA, or Myron Ebell’s plan to fire most of the people working there:

• Joe Davidson, Trump transition leader’s goal is two-thirds cut in EPA employees, Washington Post, 30 January 2017.

If you want to keep track of this battle, I recommend getting a 30-day free subscription to this online magazine:

InsideEPA.com.

Taking animal welfare data offline

The Trump team is taking animal-welfare data offline. The US Department of Agriculture will no longer make lab inspection results and violations publicly available, citing privacy concerns:

• Sara Reardon, US government takes animal-welfare data offline, Nature Breaking News, 3 February 2017.

Restricting access to geospatial data

A new bill would prevent the US government from providing access to geospatial data if it helps people understand housing discrimination. It goes like this:

Notwithstanding any other provision of law, no Federal funds may be used to design, build, maintain, utilize, or provide access to a Federal database of geospatial information on community racial disparities or disparities in access to affordable housing.

For more on this bill, and the important ways in which such data has been used, see:

• Abraham Gutman, Scott Burris, and the Temple University Center for Public Health Law Research, Where will data take the Trump administration on housing?, Philly.com, 1 February 2017.

The EDGI fights back

The Environmental Data and Governance Initiative or EDGI is working to archive public environmental data. They’re helping coordinate data rescue events. You can attend one and have fun eating pizza with cool people while saving data:

• 3 February 2017, Portland
• 4 February 2017, New York City
• 10-11 February 2017, Austin Texas
• 11 February 2017, U. C. Berkeley, California
• 18 February 2017, MIT, Cambridge Massachusetts
• 18 February 2017, Haverford Connecticut
• 18-19 February 2017, Washington DC
• 26 February 2017, Twin Cities, Minnesota

Or, work with EDGI to organize your own data rescue event! They provide some online tools to help download data.

I know there will also be another event at UCLA, so the above list is not complete, and it will probably change and grow over time. Keep up-to-date at their site:

Environmental Data and Governance Initiative.

Scientists fight back

The pushback is so big it’s hard to list it all! For now I’ll just quote some of this article:

• Tabitha Powledge, The gag reflex: Trump info shutdowns at US science agencies, especially EPA, 27 January 2017.

THE PUSHBACK FROM SCIENCE HAS BEGUN

Predictably, counter-tweets claiming to come from rebellious employees at the EPA, the Forest Service, the USDA, and NASA sprang up immediately. At The Verge, Rich McCormick says there’s reason to believe these claims may be genuine, although none has yet been verified. A lovely head on this post: “On the internet, nobody knows if you’re a National Park.”

At Hit&Run, Ronald Bailey provides handles for several of these alt tweet streams, which he calls “the revolt of the permanent government.” (That’s a compliment.)

Bailey argues, “with exception perhaps of some minor amount of national security intelligence, there is no good reason that any information, data, studies, and reports that federal agencies produce should be kept from the public and press. In any case, I will be following the Alt_Bureaucracy feeds for a while.”

NeuroDojo Zen Faulkes posted on how to demand that scientific societies show some backbone. “Ask yourself: “Have my professional societies done anything more political than say, ‘Please don’t cut funding?’” Will they fight?,” he asked.

Scientists associated with the group 500 Women Scientists donned lab coats and marched in DC as part of the Women’s March on Washington the day after Trump’s Inauguration, Robinson Meyer reported at the Atlantic. A wildlife ecologist from North Carolina told Meyer, “I just can’t believe we’re having to yell, ‘Science is real.’”

Taking a cue from how the Women’s March did its social media organizing, other scientists who want to set up a Washington march of their own have put together a closed Facebook group that claims more than 600,000 members, Kate Sheridan writes at STAT.

The #ScienceMarch Twitter feed says a date for the march will be posted in a few days. [The march will be on 22 April 2017.] The group also plans to release tools to help people interested in local marches coordinate their efforts and avoid duplication.

At The Atlantic, Ed Yong describes the political action committee 314Action. (314=the first three digits of pi.)

Among other political activities, it is holding a webinar on Pi Day—March 14—to explain to scientists how to run for office. Yong calls 314Action the science version of Emily’s List, which helps pro-choice candidates run for office. 314Action says it is ready to connect potential candidate scientists with mentors—and donors.

Other groups may be willing to step in when government agencies wimp out. A few days before the Inauguration, the Centers for Disease Control and Prevention abruptly and with no explanation cancelled a 3-day meeting on the health effects of climate change scheduled for February. Scientists told Ars Technica’s Beth Mole that CDC has a history of running away from politicized issues.

One of the conference organizers from the American Public Health Association was quoted as saying nobody told the organizers to cancel.

I believe it. Just one more example of the chilling effect on global warming. In politics, once the Dear Leader’s wishes are known, some hirelings will rush to gratify them without being asked.

The APHA guy said they simply wanted to head off a potential last-minute cancellation. Yeah, I guess an anticipatory pre-cancellation would do that.

But then—Al Gore to the rescue! He is joining with a number of health groups—including the American Public Health Association—to hold a one-day meeting on the topic Feb 16 at the Carter Center in Atlanta, CDC’s home base. Vox’s Julia Belluz reports that it is not clear whether CDC officials will be part of the Gore rescue event.

The Sierra Club fights back

The Sierra Club, of which I’m a proud member, is using the Freedom of Information Act or FOIA to battle or at least slow the deletion of government databases. They wisely started even before Trump took power:

• Jennifer A Dlouhy, Fearing Trump data purge, environmentalists push to get records, BloombergMarkets, 13 January 2017.

Here’s how the strategy works:

U.S. government scientists frantically copying climate data they fear will disappear under the Trump administration may get extra time to safeguard the information, courtesy of a novel legal bid by the Sierra Club.

The environmental group is turning to open records requests to protect the resources and keep them from being deleted or made inaccessible, beginning with information housed at the Environmental Protection Agency and the Department of Energy. On Thursday [January 12th], the organization filed Freedom of Information Act requests asking those agencies to turn over a slew of records, including data on greenhouse gas emissions, traditional air pollution and power plants.

The rationale is simple: Federal laws and regulations generally block government agencies from destroying files that are being considered for release. Even if the Sierra Club’s FOIA requests are later rejected, the record-seeking alone could prevent files from being zapped quickly. And if the records are released, they could be stored independently on non-government computer servers, accessible even if other versions go offline.


Azimuth Backup Project (Part 3)

22 January, 2017


Along with the bad news there is some good news:

• Over 380 people have pledged over $14,000 to the Azimuth Backup Project on Kickstarter, greatly surpassing our conservative initial goal of $5,000.

• Given our budget, we currently aim at backing up 40 terabytes of data, and we are well on our way to this goal. You can see what we’ve done at Our Progress, and what we’re still doing at the Issue Tracker.

• I have gotten a commitment from Danna Gianforte, the head of Computing and Communications at U. C. Riverside, that eventually the university will maintain a copy of our data. (This commitment is based on my earlier estimate that we’d have 20 terabytes of data, so I need to see if 40 is okay.)

• I have gotten two offers from other people, saying they too can hold our data.

I’m hoping that the data at U. C. Riverside will be made publicly available through a server. The other offers may involve it being held ‘secretly’ until such time as it became needed; that has its own complementary advantages.

However, the interesting problem that confronts us now is: how to spend our money?

You can see how we’re currently spending it on our Budget and Spending page. Basically, we’re paying a firm called Hetzner for servers and storage boxes.

We could simply continue to do this until our money runs out. I hope that long before then, U. C. Riverside will have taken over some responsibilities. If so, there would be a long period where our money would largely pay for a redundant backup. Redundancy is good, but perhaps there is something better.

Two members of our team, Sakari Maaranen and Greg Kochanski, have thoughts on this matter which I’d like to share. Sakari posted his thoughts on Google+, while Greg posted his in an email which he’s letting me share here.

Please read these and offer us your thoughts! Maybe you can help us decide on the best strategy!

Sakari Maaranen

For the record, my views on our strategy of using the budget that the Azimuth Climate Data Backup Project now has.

People have contributed it to this effort specifically.

Some non-government entities have offered “free hosting”. Of course the project should take any and all free offers to host our data. Those would not be spending our budget however. And they are still paying for it, even if they offered it to us “for free”.

As far as it comes to spending, I think we should think in terms of 1) terabytemonths, and 2) sufficient redundancy, and do that as cost-efficiently as possible. We should not just dump the money to any takers, but think of the best bang for the buck. We owe that to the people who have contributed now.

For example, if we burn the cash quick to expensive storage, I would consider that a failure. Instead, we must plan for the best use of the budget towards our mission.

What we have promised to the people is that we back up and serve these data sets, by the money they have given to us. Let’s do exactly that.

We are currently serving the mission at approximately €0.006 per gigabytemonth at least for as long as we have volunteers to work for free. The cost could be slightly higher if we paid for professional maintenance, which should be a reasonable assumption if we plan for long term service. Volunteer work cannot be guaranteed forever, even if it works temporarily.

This is one view and the question is open to public discussion.
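
To get a feel for what a rate like this means, here is a back-of-the-envelope sketch in Python. It assumes the €0.006 per gigabyte-month figure above, a decimal terabyte (1000 gigabytes), and a flat rate with no traffic or maintenance costs. The function names and the budget used in the note below are made up for illustration and are not taken from the project's actual accounts.

```python
# Rough storage-cost arithmetic. All names here are hypothetical.
EUR_PER_GB_MONTH = 0.006  # rate quoted in the text above
GB_PER_TB = 1000          # assumption: decimal terabytes


def monthly_cost_eur(terabytes, rate=EUR_PER_GB_MONTH):
    """Cost in euros of holding one copy of `terabytes` for one month."""
    return terabytes * GB_PER_TB * rate


def months_covered(budget_eur, terabytes, copies=1, rate=EUR_PER_GB_MONTH):
    """How many months a budget lasts when holding `copies` full copies."""
    return budget_eur / (copies * monthly_cost_eur(terabytes, rate))


if __name__ == "__main__":
    print(round(monthly_cost_eur(40), 2))
    print(round(months_covered(2400, 40), 1))
    print(round(months_covered(2400, 40, copies=2), 1))
```

At this rate, 40 terabytes would come to roughly €240 per month, so a hypothetical €2,400 would cover about ten months of a single copy, or about five months of two copies. Paying for professional maintenance, as mentioned above, would shorten that.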

Greg Kochanski

Some misc thoughts.

1) As I see it, we have made some promise of serving the data (“create a better interface for getting it”) which can be an expensive thing.

UI coding isn’t all that easy, and takes some time.

Beyond that, we’ve promised to back up the data, and once you say “backup”, you’ve also made an implicit promise to make the data available.

2) I agree that if we have a backup, it is a logical extension to take continuous backups, but I wouldn’t say it’s necessary.

Perhaps the way to think about it is to ask the question, “what do our donors likely want”?

3) Clearly they want to preserve the data, in case it disappears from the Federal sites. So, that’s job 1. And, if it does disappear, we need to make it available.

3a) Making it available will require some serving CPU, disk, and network. We may need to worry about DDOS attacks, though perhaps we could get free coverage from Akamai or Google Project Shield.

3b) Making it available may imply paying some students to write Javascript and HTML to put up a front-end to allow people to access the data we are collecting.

Not all the data we’re collecting is in strictly servable form. Some of the databases, for example, aren’t usefully servable in the form we collect, and we know some links will be broken because of missing pages, or because of wget’s design flaw.*

[* Wget stores http://a/b/c as a file, a/b/c, where a/b is a directory. Wget stores http://a/b as a file a/b, where a/b is a file.

Therefore, both cannot exist simultaneously on disk. If they do, wget drops one.]
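
The collision described in this note can be spotted ahead of time. Below is a minimal Python sketch, not part of the project's actual tooling, that takes a list of URLs and reports pairs where one URL would be stored as a file at a path that another URL needs as a directory. It follows the simplified mapping described above (host plus path), ignores query strings and any wget options that change file naming, and the function names are hypothetical.

```python
from urllib.parse import urlsplit


def needed_dirs(url):
    """Directories that must exist on disk to store this URL (simplified)."""
    parts = urlsplit(url)
    segments = [s for s in (parts.netloc + parts.path).split("/") if s]
    if not parts.path.endswith("/"):
        segments = segments[:-1]  # the last segment is the file itself
    return ["/".join(segments[:i]) for i in range(1, len(segments) + 1)]


def find_collisions(urls):
    """Return (file_url, other_url) pairs that cannot both be stored."""
    # URLs with a non-empty path not ending in "/" are saved as plain files.
    files = {}
    for url in urls:
        parts = urlsplit(url)
        if parts.path and not parts.path.endswith("/"):
            files[parts.netloc + parts.path] = url
    collisions = []
    for url in urls:
        for directory in needed_dirs(url):
            if directory in files:
                collisions.append((files[directory], url))
    return sorted(collisions)


if __name__ == "__main__":
    for file_url, other_url in find_collisions(
        ["http://a/b/c", "http://a/b", "http://a/d"]
    ):
        print(file_url, "is a file, but", other_url, "needs it as a directory")
```

Run on the example from the note, this flags the single pair `http://a/b` and `http://a/b/c`. A list like this would tell us in advance which pages a mirror is likely to be missing.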

Points 3 & 3a imply that we need to keep some money in the bank until either the websites are taken down, or we decide that the threat has abated. So, we need to figure out how much money to keep as a serving reserve. It doesn’t sound like UCR has committed to serve the data, though you could perhaps ask.

Beyond the serving reserve, I think we are free to do better backups (i.e. more than one data collection), and change detection.
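
“Change detection” could mean many things. One simple reading, sketched below in Python as an assumption rather than a description of what the project does, is to hash every file in two successive snapshots of the same site and report which files were added, removed, or altered. The directory layout and function names are hypothetical.

```python
import hashlib
from pathlib import Path


def digest_tree(root):
    """Map each file's path, relative to root, to its SHA-256 hex digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            h = hashlib.sha256()
            with path.open("rb") as f:
                # Read in 1 MiB chunks so large data files fit in memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[path.relative_to(root).as_posix()] = h.hexdigest()
    return digests


def compare_snapshots(old_root, new_root):
    """Report files added, removed, or changed between two snapshots."""
    old, new = digest_tree(old_root), digest_tree(new_root)
    common = old.keys() & new.keys()
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in common if old[k] != new[k]),
    }
```

Calling `compare_snapshots("snapshot-old", "snapshot-new")` on two mirror directories would return three sorted lists of relative paths. This only catches changes in files we managed to download; it says nothing about databases that were never captured in the first place.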


Saving Climate Data (Part 4)

21 January, 2017

At noon today in Washington DC, while Trump was being inaugurated, all mentions of “climate change” and “global warming” were eliminated from the White House website.

Well, not all. The word “climate” still shows up here:

President Trump is committed to eliminating harmful and unnecessary policies such as the Climate Action Plan….

There are also reports that all mentions of climate change will be scrubbed from the website of the Environmental Protection Agency, or EPA.

From Motherboard

Let me quote from this article:

• Jason Koebler, All references to climate change have been deleted from the White House website, Motherboard, 20 January 2017.

Scientists and professors around the country had been rushing to download and rehost as much government science as was possible before the transition, based on a fear that Trump’s administration would neglect or outright delete government information, databases, and web applications about science. Last week, the Radio Motherboard podcast recorded an episode about these efforts, which you can listen to below, or anywhere you listen to podcasts.

The Internet Archive, too, has been keeping a close watch on the White House website; President Obama’s climate change page had been archived every single day in January.

So far, nothing on the Environmental Protection Agency’s website has changed under Trump, but a report earlier this week from Inside EPA, a newsletter and website that reports on the agency, suggested that pages about climate are destined to be cut within the first few weeks of his presidency.

Scientists I’ve spoken to who are archiving websites say they expect scientific data on the NASA, NOAA, Department of Energy, and EPA websites to be neglected or deleted eventually. They say they don’t expect agency sites to be updated immediately, but expect it to play out over the course of months. This sort of low-key data destruction might not be the type of censorship people typically think about, but scientists are treating it as such.

From Technology Review

Greg Egan pointed out another good article, on MIT’s magazine:

• James Temple, Climate data preservation efforts mount as Trump takes office, Technology Review, 20 January 2017.

Quoting from that:

Dozens of computer science students at the University of California, Los Angeles, will mark Inauguration Day by downloading federal climate databases they fear could vanish under the Trump Administration.

Friday’s hackathon follows a series of grassroots data preservation efforts in recent weeks, amid increasing concerns the new administration is filling agencies with climate deniers likely eager to cut off access to scientific data that undermine their policy views. Those worries only grew earlier this week, when Inside EPA reported that the Environmental Protection Agency transition team plans to scrub climate data from the agency’s website, citing a source familiar with the team.

Earlier federal data hackathons include the “Guerrilla Archiving” event at the University of Toronto last month, the Internet Archive’s Gov Data Hackathon in San Francisco at the beginning of January, and the DataRescue Philly event at the University of Pennsylvania last week.

Much of the collected data is being stored in the servers of the End of Term Web Archive, a collaborative effort to preserve government websites at the conclusion of presidential terms. The University of Pennsylvania’s Penn Program in Environmental Humanities launched the separate DataRefuge project, in part to back up environmental data sets that standard Web crawling tools can’t collect.

Many of the groups are working off a master list of crucial data sets from NASA, the National Oceanic and Atmospheric Administration, the U.S. Geological Survey, and other agencies. Meteorologist and climate journalist Eric Holthaus helped prompt the creation of that crowdsourced list with a tweet early last month.

Other key developments driving the archival initiatives included reports that the transition team had asked Energy Department officials for a list of staff who attended climate change meetings in recent years, and public statements from senior campaign policy advisors arguing that NASA should get out of the business of “politically correct environmental monitoring.”

“The transition team has given us no reason to believe that they will respect scientific data, particularly when it’s inconvenient,” says Gretchen Goldman, research director in the Center for Science and Democracy at the Union of Concerned Scientists. These historical databases are crucial to ongoing climate change research in the United States and abroad, she says.

To be clear, the Trump camp hasn’t publicly declared plans to erase or eliminate access to the databases. But there is certainly precedent for state and federal governments editing, removing, or downplaying scientific information that doesn’t conform to their political views.

Late last year, it emerged that text on Wisconsin’s Department of Natural Resources website was substantially rewritten to remove references to climate change. In addition, an extensive Congressional investigation concluded in a 2007 report that the Bush Administration “engaged in a systematic effort to manipulate climate change science and mislead policymakers and the public about the dangers of global warming.”

In fact these Bush Administration efforts were masterminded by Myron Ebell, who Trump chose to lead his EPA transition team!

Continuing:

In fact, there are wide-ranging changes to federal websites with every change in administration for a variety of reasons. The Internet Archive, which collaborated on the End of Term project in 2008 and 2012 as well, notes that more than 80 percent of PDFs on .gov sites disappeared during that four-year period.

The organization has seen a surge of interest in backing up sites and data this year across all government agencies, but particularly for climate information. In the end, they expect to collect well more than 100 terabytes of data, close to triple the amount in previous years, says Jefferson Bailey, director of Web archiving.

In fact the Azimuth Backup Project alone may gather about 40 terabytes!

From Inside EPA

And then there’s this view from inside the Environmental Protection Agency:

• Dawn Reeves, Trump transition preparing to scrub some climate data from EPA Website, Inside EPA, January 17, 2017

The incoming Trump administration’s EPA transition team intends to remove non-regulatory climate data from the agency’s website, including references to President Barack Obama’s June 2013 Climate Action Plan, the strategies for 2014 and 2015 to cut methane and other data, according to a source familiar with the transition team.

Additionally, Obama’s 2013 memo ordering EPA to establish its power sector carbon pollution standards “will not survive the first day,” the source says, a step that rule opponents say is integral to the incoming administration’s pledge to roll back the Clean Power Plan and new source power plant rules.

The Climate Action Plan has been the Obama administration’s government-wide blueprint for addressing climate change and includes information on cutting domestic greenhouse gas (GHG) emissions, including both regulatory and voluntary approaches; information on preparing for the impacts of climate change; and information on leading international efforts.

The removal of such information from EPA’s website — as well as likely removal of references to such programs that link to the White House and other agency websites — is being prepped now.

The transition team’s preparations fortify concerns from agency staff, environmentalists and many scientists that the Trump administration is going to destroy reams of EPA and other agencies’ climate data. Scientists have been preparing for this possibility for months, with many working to preserve key data on private websites.

Environmentalists are also stepping up their efforts to preserve the data. The Sierra Club Jan. 13 filed a Freedom of Information Act request seeking reams of climate-related data from EPA and the Department of Energy (DOE), including power plant GHG data. Even if the request is denied, the group said it should buy them some time.

“We’re interested in trying to download and preserve the information, but it’s going to take some time,” Andrea Issod, a senior attorney with the Sierra Club, told Bloomberg. “We hope our request will be a counterweight to the coming assault on this critical pollution and climate data.”

While Trump has pledged to take a host of steps to roll back Obama EPA climate and other high-profile actions on his first day in office, transition and other officials say the date may slip.

“In truth, it might not [happen] on the first day, it might be a week,” the source close to the transition says of the removal of climate information from EPA’s website. The source adds that in addition to EPA, the transition team is also looking at such information on the websites of DOE and the Interior Department.

Additionally, incoming Trump press secretary Sean Spicer told reporters Jan. 17 that not much may happen on Inauguration Day itself, but to expect major developments the following Monday, Jan. 23. “I think on [Jan. 23] you’re going to see a big flurry of activity” that is expected to include the disappearance of at least some EPA climate references.

Until Trump is inaugurated on Jan. 20, the transition team cannot tell agency staff what to do, and the source familiar with the transition team’s work is unaware of any communications requiring language removal or beta testing of websites happening now, though it appears that some of this work is occurring.

“We can only ask for information at this point until we are in charge. On [Jan. 20] at about 2 o’clock, then they can ask [staff] to” take actions, the source adds.

Scope & Breadth

The scope and breadth of the information to be removed is unclear. While it is likely to include executive actions on climate, it does not appear that the reams of climate science information, including models, tools and databases on the EPA Office of Research & Development’s (ORD) website will be impacted, at least not immediately.

ORD also has published climate, air and energy strategic research action plans, including one for 2016-2019 that includes research to assess impacts; prevent and reduce emissions; and prepare for and respond to changes in climate and air quality.

But other EPA information maintained on its websites including its climate change page and its “What is EPA doing about climate change” page that references the Climate Action Plan, the 2014 methane strategy and a 2015 oil and gas methane reduction strategy are expected targets.

Another possible target is new information EPA just compiled—and hosted a Jan. 17 webinar to discuss—on climate change impacts to vulnerable communities.

One former EPA official who has experience with transitions says it is unlikely that any top Obama EPA official is on board with this. “I would think they would be violently against this. . . I would think that the last thing [EPA Administrator] Gina McCarthy would want to do would be to be complicit in Trump’s effort to purge the website” of climate-related work, and that if she knew she would “go ballistic.”

But the former official, the source close to the transition team and others note that EPA career staff is fearful and may be undertaking such prep work “as a defensive maneuver to avoid getting targeted,” the official says, adding that any directive would likely be coming from mid-level managers rather than political appointees or senior level officials.

But while the former official was surprised that such work might be happening now, the fact that it is only said to be targeting voluntary efforts “has a certain ring of truth to it. Someone who is knowledgeable would draw that distinction.”

Additionally, one science advocate says, “The people who are running the EPA transition have a long history of sowing misunderstanding about climate change and they tend to believe in a vast conspiracy in the scientific community to lie to the public. If they think the information is truly fraudulent, it would make sense they would try to scrub it. . . . But the role of the agency is to inform the public . . . [and not to satisfy] the musings of a band of conspiracy theorists.”

The source was referring to EPA transition team leader Myron Ebell, a long-time climate skeptic at the Competitive Enterprise Institute, along with David Schnare, another opponent of climate action, who is at the Energy & Environment Legal Institute.

And while “a new administration has the right to change information about policy, what they don’t have the right to do is change the scientific information about policies they wish to put forward and that includes removing resources on science that serve the public.”

The advocate adds that many state and local governments rely on EPA climate information.

EPA Concern

But there has been plenty of concern that such a move would take place, especially after transition team officials last month sought the names of DOE employees who worked on climate change, raising alarms and cries of a “political witch hunt” along with a Dec. 13 letter from Sen. Maria Cantwell (D-WA) that prompted the transition team to disavow the memo.

Since then, scientists have been scrambling to preserve government data.

On Jan. 10, High Country News reported that on a Saturday last month, 150 technology specialists, hackers, scholars and activists assembled in Toronto for the “Guerrilla Archiving Event: Saving Environmental Data from Trump” where the group combed the internet for key climate and environmental data from EPA’s website.

“A giant computer program would then copy the information onto an independent server, where it will remain publicly accessible—and safe from potential government interference.”

The organizer of the event, Henry Warwick, said, “Say Trump firewalls the EPA,” pulling reams of information from public access. “No one will have access to the data in these papers” unless the archiving took place.

Additionally, the Union of Concerned Scientists released a Jan. 17 report, “Preserving Scientific Integrity in Federal Policy Making,” urging the Trump administration to retain scientific integrity. It wrote in a related blog post, “So how will government science fare under Trump? Scientists are not just going to wait and see. More than 5,500 scientists have now signed onto a letter asking the president-elect to uphold scientific integrity in his administration. . . . We know what’s at stake. We’ve come too far with scientific integrity to see it unraveled by an anti-science president. It’s worth fighting for.”


Give the Earth a Present: Help Us Save Climate Data

28 December, 2016

We’ve been busy backing up climate data before Trump becomes President. Now you can help too, with some money to pay for servers and storage space. Please give what you can at our Kickstarter campaign here:

Azimuth Climate Data Backup Project.

If we get $5000 by the end of January, we can save this data until we convince bigger organizations to take over. If we don’t get that much, we get nothing. That’s how Kickstarter works. Also, if you donate now, you won’t be billed until January 31st.

So, please help! It’s urgent.

I will make public how we spend this money. And if we get more than $5000, I’ll make sure it’s put to good use. There’s a lot of work we could do to make sure the data is authenticated, made easily accessible, and so on.

The idea

The safety of US government climate data is at risk. Trump plans to have climate change deniers running every agency concerned with climate change. So, scientists are rushing to back up the many climate databases held by US government agencies before he takes office.

We hope he won’t be rash enough to delete these precious records. But: better safe than sorry!

The Azimuth Climate Data Backup Project is part of this effort. So far our volunteers have backed up nearly 1 terabyte of climate data from NASA and other agencies. We’ll do a lot more! We just need some funds to pay for storage space and a server until larger institutions take over this task.

The team

• Jan Galkowski is a statistician with a strong interest in climate science. He works at Akamai Technologies, a company responsible for serving at least 15% of all web traffic. He began downloading climate data on the 11th of December.

• Shortly thereafter John Baez, a mathematician and science blogger at U. C. Riverside, joined in to publicize the project. He’d already founded an organization called the Azimuth Project, which helps scientists and engineers cooperate on environmental issues.

• When Jan started running out of storage space, Scott Maxwell jumped in. He used to work for NASA—driving a Mars rover among other things—and now he works for Google. He set up a 10-terabyte account on Google Drive and started backing up data himself.

• A couple of days later Sakari Maaranen joined the team. He’s a systems architect at Ubisecure, a Finnish firm, with access to a high-bandwidth connection. He set up a server, he’s downloading lots of data, he showed us how to authenticate it with SHA-256 hashes, and he’s managing many other technical aspects of this project.
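To give a rough idea of what authenticating data with SHA-256 hashes involves, here is a minimal sketch in Python. It is not the team's actual script: the function names and the chunk size are assumptions made for illustration. The idea is to compute a hash of each file when it is downloaded, record it, and later recompute the hash to check that the stored copy has not changed.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the file at `path`.

    The file is read in chunks (1 MiB by default, an arbitrary choice)
    so that large data sets need not fit in memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded_hash(path, recorded_hex):
    """Check a file against a previously recorded SHA-256 hex digest.

    `recorded_hex` is whatever was written down at download time
    (hypothetical bookkeeping, not a description of the project's format).
    """
    return sha256_of_file(path) == recorded_hex.strip().lower()
```

If even one byte of a file changes, its hash changes too, so anyone holding the recorded hashes can confirm that a backup is a faithful copy.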

There are other people involved too. You can watch the nitty-gritty details of our progress here:

Azimuth Backup Project – Issue Tracker.

and you can learn more here:

Azimuth Climate Data Backup Project.