New Data Curators Wanted

The Green Deal Data Observatory is looking for new data curators

Green Deal Data Observatory, Daniel Antal

Nov 9, 2022 7 min read

A data curator is a contributor in our open collaboration who will be named as a co-creator of tidy, standardized, reusable, FAIR datasets in his or her field of expertise. Our curators help us vocalize the needs of their domain, be it data-driven beekeeping or detecting algorithmic biases of recommender systems, and evaluate whether the data that we come up with is directly usable and actionable. A data curator is a co-author in the same sense as a “contributor” to open source software or a co-author of a journal article.

Boost your career without a conflict of interest

Being a data curator does not mean a commercial affiliation with any observatory partners; it is an affiliation to jointly create intellectual property. All our data curators are identified by their ORCiD IDs and named as co-creators in the open science repositories where we make our data available.

We create CC0 data that can be used for commercial, academic, and policy purposes. However, we want to honor the intellectual investment into a shared intellectual property by

delaying the release (so that curators who use the data in new articles stay competitive in academic publishing, or so that NGOs can use it first in their campaigns);
creating hybrid assets for commercial users where some elements, particularly the ones that use their proprietary data, may not become open data.

FAIR: Findable, Accessible, Interoperable, and Reusable Digital Assets

Our observatories do not only work with open data.

We gladly add commercially available data to our observatory if we can share a large enough subset that our peer-reviewers can attest to the data’s high quality, usability, and actionability.

How to become a data curator?

Our handbook for curators is a bit of a work in progress, but the [onboarding process](https://curators.dataobservatory.eu/onboarding.html) is clear. Do not worry if you do not use GitHub; it is not necessary, but we store and co-create our assets, including the curator's handbook, in this digital co-working place.

This is an open book that we co-create on GitHub. If you hit any roadblocks, do not understand something, or have a better idea of how to illustrate or explain things, just make a fork of this repo, improve it, add new photos, and send us a pull request. (You need an invite first for editing!)
Here is a starter repository on GitHub. Not mandatory, but if you use GitHub, start here.

In a nutshell:

Please read the entire covenant here.
We need a very brief biography: name, affiliation, education details, and a one-line and a short biography. Please send back this bio_template.txt text file. If you know markdown, use this version. The files are identical, but your word processor may not know how to open an .md file.
Your ORCiD, to resolve ambiguity with similarly named people. You may also provide other library or publication service IDs, such as Google Scholar or Publons, but we do need an ORCiD ID, because most of the EU open science infrastructure and the R ecosystem use this one. If you do not have one, please create it; it only takes a few minutes. Please add it to the bio_template.txt.
Your LinkedIn ID, add it to the bio_template.txt.
You should follow our file naming conventions, and avoid the use of special characters in any file names at all times: $, :, ;, comma, period, double quote ("), tick ('), or backtick (`).
You must send a profile picture that is at least 500px wide (jpg or png format). It can be bigger, and preferably not a very “narrow” cut, as all avatars will be behind a circular mask (see other curators).
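
Before sending your files, a short self-check along the lines of the list above may save a round of emails. The following is only a minimal sketch in Python, not part of our tooling: the function names are hypothetical, the forbidden character set is one reading of the naming rule above, and the period is checked only in the part of the name before the final extension (an assumption, so that names like bio_template.txt pass). The ORCiD check uses the ISO 7064 MOD 11-2 check character that ORCiD documents for its identifiers.

```python
# Assumed reading of the characters listed in the naming rule; adjust as needed.
FORBIDDEN = set("$:;,.\"'`")


def forbidden_chars_in(filename: str) -> list[str]:
    """Return the forbidden characters found before the final extension, sorted."""
    # Drop only the last ".ext" so the extension separator itself is allowed.
    stem = filename.rsplit(".", 1)[0]
    return sorted(set(stem) & FORBIDDEN)


def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the 16-character shape and the final check character of an ORCiD."""
    compact = orcid.replace("-", "")
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_character(compact[:15]) == compact[15].upper()


if __name__ == "__main__":
    print(forbidden_chars_in("bio_template.txt"))
    print(is_valid_orcid("0000-0002-1825-0097"))
```

A file name passes when `forbidden_chars_in` returns an empty list. Note that `is_valid_orcid` only tells you that the identifier is well formed; it does not confirm that the ORCiD belongs to you.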

Find inspiration from other contributors

Why data observatories?

Our data observatories (platform products) cover our R&D and platform costs while giving us access to an expanding range of prime clients. We use 21st-century open-source data engineering solutions, a decentralized data governance method, and web 3.0 technologies to avoid conflicts of interest and prevent the data Sisyphus of error-prone human data wrangling. There is little competition on this service level: there are about 60 UN/EU/OECD recognized data observatories, and almost all of them are managed by a different operator. This layer is already monetized, and we have proven success. Our unique advantage is a combination of legal and technological skills: understanding legally open data, web 3.0, and data modeling, and the ability to participate in the open-source statistical/scientific software creator community.
We create open-source software applications that fuel our data observatories with unprocessed, open, linked data. We create software for the R statistical environment, which is used in both official statistics and in many business and academic organizations. The production of R software components is a competitive field, but we believe that our position is strong: the vast majority of R packages are lightly or not at all serviced because of the lack of financing.

Reprex produces [open-source scientific software](https://reprex.nl/#releases), and various collaborative data engineering infrastructures to get legally open governmental data and open science data in a timely, usable format to ecological researchers and ecotech innovators.

We provide bespoke analytics solutions to our institutional partners in our data observatories. Such bespoke solutions iterate over our existing software components, helping us design better applications within an ever-expanding ecosystem. Providing tailored data-science services would require a large organization without a clear focus. We provide these services on an ad-hoc basis only among institutional partners and users of our data observatories. In these circles, which are often prime clients, we face little or no competition because we are trusted partners and data and solution providers. This is a key to our revenue and market growth.
We develop high-value software-as-a-service applications that leverage our data observatory assets and our software solutions into novel, commercially valuable uses. Our applications are built around our family of open-source software and generalize our bespoke analytics solutions. We are in a late prototype phase where we already have some revenue and are trying to prepare for scaling up at the correct price with three of our applications. All of our applications are entering highly competitive market segments. We are building on our ‘unfair’ advantage: we bundle our solutions with data that is not accessible to competitors, and we can test them in the protected ecosystems of our observatories.

Good to know

FAIR Principles: improve the Findability, Accessibility, Interoperability, and Reuse of digital assets.
DataCite: A persistent, standardized approach to access, identification, sharing, and re-use of datasets. This is our favored way of describing data for future use according to the FAIR principles. Many EU open science repositories will ask you to accompany your publications with this documentation.
Biblatex is a standard text file format used by citation engines, bibliography management tools, and scientific publication templates. (See, for example, the Overleaf Biblatex tutorial.)
Dublin Core is an older international standard than DataCite, but the two standards greatly overlap. Dublin Core was originally developed by libraries. You often may need to fill out Dublin Core properties for publication.
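
To make the relation between these standards a little more concrete, here is a rough sketch in Python of how one dataset description could be rendered as a Biblatex `@dataset` entry. It is an illustration under stated assumptions, not our actual workflow: the function name is hypothetical, the record is a simplified flat dictionary whose keys only loosely follow DataCite's mandatory properties (identifier, creators, title, publisher, publication year), and every value in the example record is a placeholder.

```python
def to_biblatex_dataset(key: str, record: dict) -> str:
    """Render a simplified, DataCite-like record as a Biblatex @dataset entry."""
    fields = {
        # Biblatex separates multiple authors with the word "and".
        "author": " and ".join(record["creators"]),
        "title": record["title"],
        "publisher": record["publisher"],
        "year": str(record["publicationYear"]),
        "doi": record["identifier"],
    }
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@dataset{{{key},\n{body}\n}}"


# Placeholder values only; this is not a real dataset or DOI.
example_record = {
    "identifier": "10.1234/example",
    "creators": ["Doe, Jane", "Roe, Richard"],
    "title": "An Example Dataset",
    "publisher": "Example Repository",
    "publicationYear": 2022,
}

if __name__ == "__main__":
    print(to_biblatex_dataset("doe2022example", example_record))
```

The same five pieces of information also have close counterparts among the Dublin Core properties (creator, title, publisher, date, identifier), which is why filling out one description usually gets you most of the way to the other.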