Open source at CCeH in 2016

Welcome to the annual CCeH report on our contributions to the open source world!

We at the Cologne Center for eHumanities (CCeH) always turn to open source software for our DH projects, not only because it is free (as in beer) and free (as in speech), but also because of the marvelous communities that have formed around many of these free and open source projects. Giving back to their projects is our way of saying thank you to these communities.

In 2014 we started a conscious effort to develop in the open and be good open source citizens. In 2015 we reported on our first contributions to the free software components that we use in our projects. We are happy to report that in 2016 CCeH contributed even more, and to many more projects.

Our own DH projects

The number of projects available on our GitHub space (https://github.com/cceh) grew considerably in 2016.

Some of the project repositories we published in the last year:

We are also setting up a GitLab installation to better integrate collaborators from other universities, private foundations and research partners. Stay tuned for more news on this front.

Improvements to other projects

Sometimes a program does 99% of what we need. Instead of just complaining about the missing 1%, we do our best to contribute the missing functionality or to fix the incorrect behavior. Contributing improvements to open source projects is also a sign of gratitude towards the volunteers who work daily to improve and maintain them.

During 2016 we contributed to the collation tool CollateX, fixing some subtle errors and making it work better in UNIX workflows.[1,2,3] We also provided patches that make the eXist XML database correctly understand and process the unconventional range of dates used in archaeology.[4,5]

We also contributed small improvements to Rack, the Ruby web server interface,[6,7] and to the W3C Web Audio specification.[8]

Bug reports

Nobody is perfect. Every piece of software has a problem or two. When we stumble upon one of these problems, we go to great lengths to report it as well as possible, spending time to understand its root cause and to gather all the details that the project maintainers will need to fix it.

In 2016 we reported many conformance and performance issues to the widely used eXist XML database.[9,10,11,12,13,14] For BaseX, another XML database we use, we pointed out some possible improvements to the way it is run on servers.[15,16]

Big, well-known projects are not off our radar either. For example, we discovered and reported that Wikipedia was inadvertently producing pages that were not well-formed XHTML and therefore could not be analyzed with standard XML tools.[17] Thankfully, this was fixed within a couple of days.

It is interesting to see how our bug reports reflect the technologies predominantly used at CCeH: XML and TEI[18], web servers[19], browsers[20] and WordPress[21].

2016 has been a fulfilling year. We will strive to make 2017 even better.

Open source at CCeH in 2015

At the Cologne Center for eHumanities we rely on plenty of open source components: programs, libraries, frameworks and so on. Having so many useful components available is great, but we do not want to be just passive users. We are eager to contribute back to the open source community.

In 2014/2015 we started a conscious effort to develop in the open and to be good open source citizens. Here is a small selection of things we have done so far. This is just the beginning.

Our own DH projects

A selection of our most recent projects, including some that are still in progress, is publicly available on our GitHub space: https://github.com/cceh/.

Reusable libraries

Some of the code we produce could be useful to other researchers, so we extracted parts of it from their main projects and released them as separate projects. Do you need to filter, extract and publish bibliographic records from your Zotero collection? pybibgen[1] may be what you need. Do you want to automate eXist-db tasks using Gulp? gulp-exist[2] does just that.

Improvements to other projects

Giving back is important. Contributing bug fixes and new features is the best way to say thank you to an open source project. In 2014 and 2015 we contributed to many established projects that we use, with the aim of making them even better for all other users. For example, we contributed a new way of generating IDs to Artefactual's AtoM, which reduced the ingestion of millions of records from hours to seconds while also producing better URLs [3]. We also reported other small problems and suggested features [4,5,6,7,8]. In Saxon we pointed out two problems that, once fixed, made some XSLT transformations run an order of magnitude faster [9,10,11,12]. We also submitted detailed problem reports with test cases to eXist-db [13,14,15,16], Chromium [17] and other development tools [18,19,20].

These were our contributions for 2015. Let’s see what awaits us in 2016.

  1. https://github.com/cceh/pybibgen
  2. https://github.com/olvidalo/gulp-exist
  3. https://github.com/artefactual/atom/pull/187
  4. https://github.com/artefactual/atom/pull/32
  5. https://github.com/artefactual/atom/pull/41
  6. https://github.com/artefactual/atom/pull/54
  7. https://projects.artefactual.com/issues/7026
  8. https://projects.artefactual.com/issues/6785
  9. https://saxonica.plan.io/boards/3/topics/6209
  10. https://saxonica.plan.io/issues/2489
  11. https://saxonica.plan.io/boards/3/topics/6256
  12. https://saxonica.plan.io/issues/2565
  13. https://github.com/eXist-db/exist/issues/362
  14. https://github.com/eXist-db/exist/issues/426
  15. https://github.com/eXist-db/exist/issues/712
  16. https://github.com/eXist-db/exist/issues/811
  17. https://code.google.com/p/chromium/issues/detail?id=565225
  18. https://mailman.uni-konstanz.de/pipermail/basex-talk/2015-February/008185.html
  19. https://github.com/toggl/toggldesktop/pull/1196
  20. https://github.com/davidswelt/zot_bib_web/issues/1