<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>R | Green Deal Data Observatory</title>
    <link>https://greendeal.dataobservatory.eu/tag/r/</link>
      <atom:link href="https://greendeal.dataobservatory.eu/tag/r/index.xml" rel="self" type="application/rss+xml" />
    <description>R</description>
    <generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><lastBuildDate>Fri, 07 Oct 2022 12:35:00 +0200</lastBuildDate>
    <image>
      <url>https://greendeal.dataobservatory.eu/media/icon_hu15ef3b829c0a4063327dbf09185a10cc_70008_512x512_fill_lanczos_center_3.png</url>
      <title>R</title>
      <link>https://greendeal.dataobservatory.eu/tag/r/</link>
    </image>
    
    <item>
      <title>Learn R with Reprex</title>
      <link>https://greendeal.dataobservatory.eu/slides/learnr-with-reprex/</link>
      <pubDate>Fri, 07 Oct 2022 12:35:00 +0200</pubDate>
      <guid>https://greendeal.dataobservatory.eu/slides/learnr-with-reprex/</guid>
      <description>&lt;h1 id=&#34;big-data-creates-inequalities&#34;&gt;Big Data Creates Inequalities&lt;/h1&gt;
&lt;p&gt;Only the largest corporations, best-endowed universities, and rich governments can afford data collection and processing capacities that are large enough to harness the advantages of AI.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;slide-navigation&#34;&gt;Slide navigation&lt;/h2&gt;
&lt;p&gt;Fullscreen: &lt;code&gt;F&lt;/code&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Next: &lt;code&gt;&amp;gt;&lt;/code&gt; or &lt;code&gt;Space&lt;/code&gt; | Previous: &lt;code&gt;&amp;lt;&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Start: &lt;code&gt;Home&lt;/code&gt; | Finish: &lt;code&gt;End&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Overview: &lt;code&gt;Esc&lt;/code&gt; | Speaker notes: &lt;code&gt;S&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Zoom: &lt;code&gt;Alt + Click 🖱️&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id=&#34;big-data-that-works-for-all&#34;&gt;Big data that works for all&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p style=&#34;font-size:75%&#34;&gt;No matter how big the problem or how small your team, &lt;code&gt;Reprex&lt;/code&gt; fills your reports, dashboards, newsletters, and books with data and its visualization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p style=&#34;font-size:75%&#34;&gt;Learn R with us: you can reduce inequalities by joining the open source movement. Learn to run open source software, ask for help, improve the tutorials and the documentation, and eventually make the computer work for you.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p style=&#34;font-size:75%&#34;&gt;Contributor Covenant: Participating in open source is often a highly collaborative experience. We’re encouraged to create in public view, and we’re incentivized to welcome contributions of all kinds from people around the world. This makes the practice of open source as much social as it is technical.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id=&#34;get-inspired&#34;&gt;Get Inspired&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://curators.dataobservatory.eu/inspiration.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Find more interesting and better data&lt;/a&gt;: you don&amp;rsquo;t have to be a data scientist or write code to contribute to our projects.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://data-feminism.mitpress.mit.edu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Data feminism&lt;/a&gt;: Catherine D&amp;rsquo;Ignazio and Lauren Klein present a new way of thinking about data science and data ethics—one that is informed by intersectional feminist thought. Highly inspirational, free, open-source book.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://rladies.org/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;RLadies&lt;/a&gt; is a world-wide organization to promote gender diversity in the R community.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id=&#34;contributor-covenant&#34;&gt;Contributor Covenant&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p style=&#34;font-size:75%&#34;&gt;We as members, contributors, and leaders pledge to make participation in our community a harassment-free experience for everyone, regardless of age, body size, visible or invisible disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, caste, color, religion, or sexual identity and orientation.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p style=&#34;font-size:75%&#34;&gt;We pledge to act and interact in ways that contribute to an open, welcoming, diverse, inclusive, and healthy community.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;

&lt;section data-noprocess data-shortcode-slide
  
      
      data-background-image=&#34;retroharmonize_example_1.webp&#34;
  
      
      data-background-position=&#34;center&#34;
  &gt;

&lt;hr&gt;

&lt;section data-noprocess data-shortcode-slide
  
      
      data-background-image=&#34;retroharmonize_example_2.webp&#34;
  
      
      data-background-position=&#34;center&#34;
  &gt;

&lt;hr&gt;
&lt;h2 id=&#34;run-code-from-tutorials&#34;&gt;Run code from tutorials&lt;/h2&gt;
&lt;p&gt;&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;retroharmonize.dataobservatory.eu&lt;/a&gt;&lt;/br&gt;
&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/articles/retroharmonize.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;🖱 Get started&lt;/a&gt;&lt;/br&gt;
&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/articles/index.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;🖱️  Articles&lt;/a&gt;&lt;/p&gt;
&lt;hr&gt;

&lt;section data-noprocess data-shortcode-slide
  
      
      data-background-image=&#34;retroharmonize_readme.webp&#34;
  
      
      data-background-position=&#34;bottom&#34;
  &gt;

&lt;hr&gt;

&lt;section data-noprocess data-shortcode-slide
  
      
      data-background-image=&#34;github_issues_spotifyR.webp&#34;
  &gt;

&lt;h2 id=&#34;find-help-ask-for-help-reprex&#34;&gt;Find help, ask for help: reprex&lt;/h2&gt;
&lt;hr&gt;

&lt;section data-noprocess data-shortcode-slide
  
      
      data-background-image=&#34;retroharmonize_tutorials.webp&#34;
  &gt;

&lt;h2 id=&#34;documentation-for-better-tutorials&#34;&gt;Documentation for better tutorials&lt;/h2&gt;
&lt;hr&gt;

&lt;section data-noprocess data-shortcode-slide
  
      
      data-background-image=&#34;retroharmonize_r_testthat.webp&#34;
  &gt;

&lt;h2 id=&#34;debugging-and-testing-code&#34;&gt;Debugging and testing code&lt;/h2&gt;
&lt;hr&gt;

&lt;section data-noprocess data-shortcode-slide
  
      
      data-background-image=&#34;retroharmonize_r_documentation.webp&#34;
  &gt;

&lt;h2 id=&#34;contribute-to-documentation&#34;&gt;Contribute to documentation&lt;/h2&gt;
&lt;hr&gt;
&lt;h2 id=&#34;r-is-a-functional-language&#34;&gt;R is a functional language&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;R is both a statistical environment and a programming language&lt;/li&gt;
&lt;li&gt;R, the open source and further developed version of the S language, is mainly functional&lt;/li&gt;
&lt;li&gt;If you have already done a task twice, write a function the third time, so that you can keep reusing it.&lt;/li&gt;
&lt;li&gt;Most of your effort will be to find a well-written function for your work&lt;/li&gt;
&lt;li&gt;If you cannot find a function, you will modify somebody else&amp;rsquo;s function, or eventually write your own&lt;/li&gt;
&lt;/ul&gt;
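&lt;p&gt;A minimal sketch of the advice above, assuming a task that keeps coming back: turning raw counts into percentage shares. The function name &lt;code&gt;share_pct&lt;/code&gt;, its rounding default and the toy counts are illustrative assumptions, not part of any Reprex package.&lt;/p&gt;

```r
# Hypothetical helper: turn raw counts into percentage shares.
# The name and the rounding default are illustrative assumptions.
share_pct = function(counts, digits = 1) {
  stopifnot(is.numeric(counts), sum(counts) != 0)
  round(100 * counts / sum(counts), digits)
}

# Invented toy counts: the same function now serves every table,
# instead of copy-pasted arithmetic.
survey_a = c(yes = 40, no = 50, dont_know = 10)
survey_b = c(yes = 12, no = 6, dont_know = 2)

share_pct(survey_a)
share_pct(survey_b)
```

&lt;p&gt;Once the logic lives in one place, a fix or an improvement reaches every report that calls it.&lt;/p&gt;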
&lt;hr&gt;

&lt;section data-noprocess data-shortcode-slide
  
      
      data-background-image=&#34;retroharmonize_r_code.webp&#34;
  &gt;

&lt;hr&gt;

&lt;section data-noprocess data-shortcode-slide
  
      
      data-background-image=&#34;rmd_example.webp&#34;
  &gt;

&lt;h2 id=&#34;r--yaml--markdown--web-ready&#34;&gt;R + YAML + markdown = web ready&lt;/h2&gt;
&lt;hr&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://learnxinyminutes.com/docs/yaml/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Learn YAML in Y minutes&lt;/a&gt;: tell the computer what you want to do with a document&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://rmarkdown.rstudio.com/authoring_basics.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;R Markdown basics&lt;/a&gt;: it is just plain markdown that lets you insert little R program &amp;lsquo;chunks&amp;rsquo;.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/mundimark/awesome-markdown-editors&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Awesome markdown editors and pre-writers&lt;/a&gt;: find a convenient tool&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://workspace.google.com/marketplace/app/docs_to_markdown/700168918607&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Google Docs to markdown&lt;/a&gt;: practice by translating your Google Docs text to markdown. It is &lt;em&gt;very&lt;/em&gt; easy.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;

&lt;section data-noprocess data-shortcode-slide
  
      
      data-background-image=&#34;retroharmonize_website.webp&#34;
  &gt;

&lt;h2 id=&#34;package-and-release-a-team-effort&#34;&gt;Package and release: a team effort&lt;/h2&gt;
&lt;hr&gt;
&lt;h2 id=&#34;our-open-source-development-projects&#34;&gt;Our open source development projects&lt;/h2&gt;
&lt;p&gt;🔢 &lt;a href=&#34;https://dataset.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;dataset&lt;/a&gt;: Synchronize datasets with global knowledge hubs #️⃣ &lt;a href=&#34;https://statcodelists.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;statcodelists&lt;/a&gt;: Make your data codes understood globally ♻️ &lt;a href=&#34;https://iotables.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;iotables&lt;/a&gt;: Create economic or environmental impact assessments in any EU country 🌍 &lt;a href=&#34;https://regions.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;regions&lt;/a&gt;: Create more granular statistics from raw survey data in any EU country ✅ &lt;a href=&#34;https://retroharmonize.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;retroharmonize&lt;/a&gt;: Harmonize question banks, recycle answers from past surveys ⏭️  &lt;a href=&#34;https://reprex.nl/#releases&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;all on one page&lt;/a&gt;&lt;/p&gt;
&lt;hr&gt;

&lt;section data-noprocess data-shortcode-slide
  
      
      data-background-image=&#34;create_with_reprex.webp&#34;
  &gt;

&lt;h2 id=&#34;create-with-us&#34;&gt;Create with us&lt;/h2&gt;
&lt;hr&gt;
&lt;h1 id=&#34;questions&#34;&gt;Questions?&lt;/h1&gt;
&lt;p&gt;&lt;a href=&#34;https://reprex.nl/#contact&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Email&lt;/a&gt; | &lt;a href=&#34;https://keybase.io/team/reprexcommunity&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Keybase&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;LinkedIn: &lt;a href=&#34;https://www.linkedin.com/in/antaldaniel/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Daniel Antal&lt;/a&gt; - &lt;a href=&#34;https://www.linkedin.com/company/68855596&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Reprex&lt;/a&gt; | &lt;a href=&#34;https://reprex.nl/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Home&lt;/a&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>statcodelists: use standard, language-independent variable codes to help international data interoperability and machine reuse in R</title>
      <link>https://greendeal.dataobservatory.eu/post/2022-06-29-statcodelists/</link>
      <pubDate>Wed, 29 Jun 2022 08:12:00 +0100</pubDate>
      <guid>https://greendeal.dataobservatory.eu/post/2022-06-29-statcodelists/</guid>
      <description>&lt;td style=&#34;text-align: center;&#34;&gt;















&lt;figure  id=&#34;figure-visit-the-documentation-website-of-statcodelists-on-statcodelistsdataobservatoryeuhttpsstatcodelistsdataobservatoryeu&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img alt=&#34;Visit the documentation website of statcodelists on [statcodelists.dataobservatory.eu/](https://statcodelists.dataobservatory.eu/).&#34; srcset=&#34;
               /media/img/blogposts_2022/statcodelists_website_huef7e1379be389a62e3a47c5a8502e55c_102481_0b514d80337ede30bff4c26cee6a6f11.webp 400w,
               /media/img/blogposts_2022/statcodelists_website_huef7e1379be389a62e3a47c5a8502e55c_102481_1416f7a0950b1cecac8097850d995432.webp 760w,
               /media/img/blogposts_2022/statcodelists_website_huef7e1379be389a62e3a47c5a8502e55c_102481_1200x1200_fit_q75_h2_lanczos_3.webp 1200w&#34;
               src=&#34;https://greendeal.dataobservatory.eu/media/img/blogposts_2022/statcodelists_website_huef7e1379be389a62e3a47c5a8502e55c_102481_0b514d80337ede30bff4c26cee6a6f11.webp&#34;
               width=&#34;760&#34;
               height=&#34;428&#34;
               loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      Visit the documentation website of statcodelists on &lt;a href=&#34;https://statcodelists.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;statcodelists.dataobservatory.eu/&lt;/a&gt;.
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;!-- badges: start --&gt;
&lt;p&gt;&lt;a href=&#34;https://dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;















&lt;figure  &gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;https://img.shields.io/badge/ecosystem-dataobservatory.eu-3EA135.svg&#34; alt=&#34;dataobservatory&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;/figure&gt;
&lt;/a&gt;&lt;/p&gt;
&lt;!-- badges: end --&gt;
&lt;p&gt;The goal of &lt;code&gt;statcodelists&lt;/code&gt; is to promote the reuse and exchange of statistical information and related metadata by making the internationally standardized SDMX code lists available to R users. SDMX, the &lt;a href=&#34;https://sdmx.org/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Statistical Data and Metadata eXchange&lt;/a&gt;, has been published as an ISO International Standard (ISO 17369). The metadata definitions, including the code lists, are updated regularly according to the standard. The authoritative version of the code lists made available in this package is &lt;a href=&#34;https://sdmx.org/?page_id=3215/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;https://sdmx.org/?page_id=3215/&lt;/a&gt;.&lt;/p&gt;
&lt;details class=&#34;spoiler &#34;  id=&#34;spoiler-1&#34;&gt;
  &lt;summary&gt;Click to expand table of contents of the post&lt;/summary&gt;
  &lt;p&gt;&lt;details class=&#34;toc-inpage d-print-none  &#34; open&gt;
  &lt;summary class=&#34;font-weight-bold&#34;&gt;Table of Contents&lt;/summary&gt;
  &lt;nav id=&#34;TableOfContents&#34;&gt;
  &lt;ul&gt;
    &lt;li&gt;&lt;a href=&#34;#purpose&#34;&gt;Purpose&lt;/a&gt;&lt;/li&gt;
    &lt;li&gt;&lt;a href=&#34;#installation&#34;&gt;Installation&lt;/a&gt;&lt;/li&gt;
    &lt;li&gt;&lt;a href=&#34;#code-of-conduct&#34;&gt;Code of Conduct&lt;/a&gt;&lt;/li&gt;
  &lt;/ul&gt;
&lt;/nav&gt;
&lt;/details&gt;
&lt;/p&gt;
&lt;/details&gt;
&lt;h2 id=&#34;purpose&#34;&gt;Purpose&lt;/h2&gt;
&lt;p&gt;Cross-domain concepts in the SDMX framework describe concepts relevant to many, if not all, statistical domains. SDMX recommends using these concepts whenever feasible in SDMX structures and messages to promote the reuse and exchange of statistical information and related metadata between organisations.&lt;/p&gt;
&lt;p&gt;Code lists are predefined sets of terms from which some statistical coded concepts take their values. SDMX cross-domain code lists are used to support cross-domain concepts. What are these cross-domain coded concepts?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Geographical codes, like &lt;code&gt;NL&lt;/code&gt;:  the Netherlands in the &lt;a href=&#34;https://statcodelists.dataobservatory.eu/reference/CL_AREA.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;CL_AREA&lt;/a&gt; code list.&lt;/li&gt;
&lt;li&gt;Standard industry codes, like &lt;code&gt;J631&lt;/code&gt; for data processing, hosting and related activities. (This is the &lt;a href=&#34;https://statcodelists.dataobservatory.eu/reference/CL_ACTIVITY_NACE2.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;NACE Rev 2&lt;/a&gt; code used in Europe; beware, it is &lt;code&gt;J592&lt;/code&gt; in Australia and New Zealand, see &lt;a href=&#34;https://statcodelists.dataobservatory.eu/reference/CL_ACTIVITY_ANZSIC06.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;CL_ACTIVITY_ANZSIC06&lt;/a&gt;.)&lt;/li&gt;
&lt;li&gt;Occupations, like &lt;code&gt;OC2521&lt;/code&gt; for &lt;code&gt;Database designers and administrators&lt;/code&gt; in &lt;a href=&#34;https://statcodelists.dataobservatory.eu/reference/CL_OCCUPATION.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;CL_OCCUPATION&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Time formatting standards, like &lt;code&gt;CCYY&lt;/code&gt; for annual data series in &lt;a href=&#34;https://statcodelists.dataobservatory.eu/reference/CL_TIME_FORMAT.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;CL_TIME_FORMAT&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Check out the available code lists on the &lt;a href=&#34;https://statcodelists.dataobservatory.eu/reference/index.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;package homepage&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The use of common code lists helps users work more efficiently: it eases the maintenance of, and reduces the need for, mapping systems and interfaces that deliver data and metadata to them. A very obvious advantage of using the code systems is that you can retrieve data from national sources regardless of the natural language used in North Macedonia, Japan, the U.S. or the Netherlands. While the data labels may change to be locally human-readable, computers and geeks can read the codes and understand them immediately, provided that the sources use the standard codes.&lt;/p&gt;
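&lt;p&gt;The idea can be sketched in base R. The small code list below only imitates the shape of a code list (an &lt;code&gt;id&lt;/code&gt; and a label); it is typed in by hand and is not the real &lt;code&gt;CL_AREA&lt;/code&gt; data, and the two source tables, their column names and their values are invented for illustration.&lt;/p&gt;

```r
# Toy code list imitating the shape of an SDMX code list (id plus label).
# Not the real CL_AREA data; the rows are typed in by hand for illustration.
toy_area = data.frame(
  id = c("NL", "JP", "MK"),
  name = c("Netherlands", "Japan", "North Macedonia"),
  stringsAsFactors = FALSE
)

# Two hypothetical national sources label the same country differently,
# but both carry the standard code. The values are placeholders.
source_nl = data.frame(geo = "NL", label = "Nederland", value = 1)
source_en = data.frame(geo = "NL", label = "The Netherlands", value = 2)

combined = rbind(source_nl, source_en)

# Join on the code, not on the local, human-readable label.
decoded = merge(combined, toy_area, by.x = "geo", by.y = "id")
unique(decoded$name)
```

&lt;p&gt;Because the join key is the code, the locally readable labels can differ freely without breaking the merge.&lt;/p&gt;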
&lt;p&gt;Our data observatories are rolling out SDMX coding across all datasets to help data ingestion and interoperability, data findability and data reuse. &lt;code&gt;statcodelists&lt;/code&gt; can help you use standard SDMX codes in your R workflow, both for downloading data from statistical agencies and for producing publication-ready datasets that the rest of the world (and even APIs) will understand.&lt;/p&gt;
&lt;h2 id=&#34;installation&#34;&gt;Installation&lt;/h2&gt;
&lt;p&gt;You can install &lt;code&gt;statcodelists&lt;/code&gt; from CRAN:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-fallback&#34; data-lang=&#34;fallback&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;install.packages(&amp;#34;statcodelists&amp;#34;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Further recommended code values for expressing general statistical concepts like &lt;code&gt;not applicable&lt;/code&gt;, etc., can be found in section &lt;code&gt;Generic codes&lt;/code&gt; of the &lt;a href=&#34;https://sdmx.org/?page_id=4345&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Guidelines for the creation and management of SDMX Cross-Domain Code Lists&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;For further code lists that are used by reliable statistical agencies but are not harmonized at the SDMX level, please consult the &lt;a href=&#34;https://registry.sdmx.org/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;SDMX Global Registry&lt;/a&gt; &lt;a href=&#34;https://registry.sdmx.org/items/codelist.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Codelists&lt;/a&gt; page.&lt;/p&gt;
&lt;p&gt;The creator of this package is not affiliated with SDMX, and this package has not been endorsed by SDMX.&lt;/p&gt;
&lt;h2 id=&#34;code-of-conduct&#34;&gt;Code of Conduct&lt;/h2&gt;
&lt;p&gt;Please note that the &lt;code&gt;statcodelists&lt;/code&gt; project is released with a &lt;a href=&#34;https://contributor-covenant.org/version/2/1/CODE_OF_CONDUCT.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Contributor Code of Conduct&lt;/a&gt;. By contributing to this project, you agree to abide by its terms.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Including Indicators from Arab Barometer in Our Observatory</title>
      <link>https://greendeal.dataobservatory.eu/post/2021-06-28-arabbarometer/</link>
      <pubDate>Mon, 28 Jun 2021 09:00:00 +0000</pubDate>
      <guid>https://greendeal.dataobservatory.eu/post/2021-06-28-arabbarometer/</guid>
      <description>&lt;p&gt;&lt;em&gt;A new version of the retroharmonize R package – which is working with retrospective, ex post harmonization of survey data – was released yesterday after peer-review on CRAN. It allows us to compare opinion polling data from the Arab Barometer with the Eurobarometer and Afrorbarometer. This is the first version that is released in the rOpenGov community, a community of R package developers on open government data analytics and related topics.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Surveys are the most important data sources in social and economic
statistics – they ask people about their lives, their attitudes and
self-reported actions, or record data from companies and NGOs. Survey
harmonization makes survey data comparable across time and countries. It
is very important, because often we do not know without comparison if an
indicator value is &lt;em&gt;low&lt;/em&gt; or &lt;em&gt;high&lt;/em&gt;. If 40% of the people think that
&lt;em&gt;climate change is a very serious problem&lt;/em&gt;, it does not really tell us
much without knowing what percentage of the people answered this
question similarly a year ago, or in other parts of the world.&lt;/p&gt;
&lt;p&gt;With the help of Ahmed Shabani and Yousef Ibrahim, we created a third
case study after the
&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/articles/eurobarometer.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Eurobarometer&lt;/a&gt;,
and
&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/articles/afrobarometer.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Afrobarometer&lt;/a&gt;,
about working with the &lt;a href=&#34;https://retroharmonize.dataobservatory.eu/articles/arabbarometer.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Arab
Barometer&lt;/a&gt;
harmonized survey data files.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Ex ante&lt;/em&gt; survey harmonization means that researchers design
questionnaires that are asking the same questions with the same survey
methodology in repeated, distinct times (waves), or across different
countries with carefully harmonized question translations. &lt;em&gt;Ex post&lt;/em&gt;
harmonization means that the resulting data has the same variable
names, same variable coding, and can be joined into a tidy data frame
for joint statistical analysis. While seemingly a simple task, it
involves plenty of metadata adjustments, because established survey
programs like Eurobarometer, Afrobarometer or Arab Barometer have
several decades of history, and several decades of coding practices and
file formatting legacy.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Variable harmonization&lt;/em&gt; means that if the same question is called
&lt;code&gt;Q108&lt;/code&gt; in one microdata source and &lt;code&gt;eval-parl-elections&lt;/code&gt; in another,
then we make sure that both get the same harmonized, machine-readable
name without spaces and special characters.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Variable label harmonization&lt;/em&gt; means that the same questionnaire
items get the same numeric coding and same categorical labels.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Missing case harmonization&lt;/em&gt; means that various forms of missingness
are treated the same way.&lt;/li&gt;
&lt;/ul&gt;
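&lt;p&gt;The first of these steps, variable name harmonization, can be sketched in base R. This is not the retroharmonize API: the helper &lt;code&gt;harmonize_var_name()&lt;/code&gt;, the hand-written crosswalk and the two toy waves are assumptions made for illustration only.&lt;/p&gt;

```r
# Hypothetical helper: make a machine-readable, snake_case variable name.
harmonize_var_name = function(x) {
  x = tolower(trimws(x))
  x = gsub("[^a-z0-9]+", "_", x)   # spaces and special characters become "_"
  gsub("^_+|_+$", "", x)           # drop leading and trailing underscores
}

# Hand-written crosswalk: which source variable carries which question.
crosswalk = c(
  "Q108" = "eval-parl-elections",
  "eval-parl-elections" = "eval-parl-elections"
)

# Two toy survey waves that store the same question under different names.
wave_a = data.frame(Q108 = c(1, 4))
wave_b = setNames(data.frame(c(2, 3)), "eval-parl-elections")

# Rename every column via the crosswalk, then sanitize the target name.
rename_wave = function(df) {
  names(df) = unname(harmonize_var_name(crosswalk[names(df)]))
  df
}

# After renaming, the waves can be stacked into one data frame.
stacked = rbind(rename_wave(wave_a), rename_wave(wave_b))
names(stacked)
```

&lt;p&gt;A name like &lt;code&gt;Q108&lt;/code&gt; carries no meaning on its own, so the crosswalk has to be documented by hand; only the sanitizing step can be automated.&lt;/p&gt;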
















&lt;figure  id=&#34;figure-for-the-climate-awareness-dataset-get-the-country-averages-and-aggregates-from-zenodohttpsdoiorg105281zenodo5035562-and-the-plot-in-jpg-or-png-from-figsharehttpsdoiorg106084m9figshare14854359&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img alt=&#34;For the climate awareness dataset get the country averages and aggregates from [Zenodo](https://doi.org/10.5281/zenodo.5035562), and the plot in `jpg` or `png` from [figshare](https://doi.org/10.6084/m9.figshare.14854359).&#34; srcset=&#34;
               /media/img/blogposts_2021/arab_barometer_5_climate_change_by_country_hu8dd9da8add5270829a1e50ead6a6a120_38791_1bab40489e5820c07250b277ffe362e0.webp 400w,
               /media/img/blogposts_2021/arab_barometer_5_climate_change_by_country_hu8dd9da8add5270829a1e50ead6a6a120_38791_fd825f05348e751021206419bd01c763.webp 760w,
               /media/img/blogposts_2021/arab_barometer_5_climate_change_by_country_hu8dd9da8add5270829a1e50ead6a6a120_38791_1200x1200_fit_q75_h2_lanczos_3.webp 1200w&#34;
               src=&#34;https://greendeal.dataobservatory.eu/media/img/blogposts_2021/arab_barometer_5_climate_change_by_country_hu8dd9da8add5270829a1e50ead6a6a120_38791_1bab40489e5820c07250b277ffe362e0.webp&#34;
               width=&#34;760&#34;
               height=&#34;570&#34;
               loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      For the climate awareness dataset get the country averages and aggregates from &lt;a href=&#34;https://doi.org/10.5281/zenodo.5035562&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Zenodo&lt;/a&gt;, and the plot in &lt;code&gt;jpg&lt;/code&gt; or &lt;code&gt;png&lt;/code&gt; from &lt;a href=&#34;https://doi.org/10.6084/m9.figshare.14854359&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;figshare&lt;/a&gt;.
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;In our new &lt;a href=&#34;https://retroharmonize.dataobservatory.eu/articles/arabbarometer.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Arab Barometer case
study&lt;/a&gt;,
the evaluation of parliamentary elections has the following labels. We
code them consistently &lt;code&gt;1:  free_and_fair&lt;/code&gt;, &lt;code&gt;2:  some_minor_problems&lt;/code&gt;,
&lt;code&gt;3:  some_major_problems&lt;/code&gt; and &lt;code&gt;4:  not_free&lt;/code&gt;.&lt;/p&gt;
&lt;table&gt;
&lt;colgroup&gt;
&lt;col style=&#34;width: 50%&#34; /&gt;
&lt;col style=&#34;width: 50%&#34; /&gt;
&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;“0. missing”&lt;/td&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;“1. they were completely free and fair”&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;“2. they were free and fair, with some minor problems”&lt;/td&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;“3. they were free and fair, with some major problems”&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;“4. they were not free and fair”&lt;/td&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;“8. i don’t know”&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;“9. declined to answer”&lt;/td&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;“Missing”&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;“They were completely free and fair”&lt;/td&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;“They were free and fair, with some minor breaches”&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;“They were free and fair, with some major breaches”&lt;/td&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;“They were not free and fair”&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;“Don’t know”&lt;/td&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;“Refuse”&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;“Completely free and fair”&lt;/td&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;“Free and fair, but with minor problems”&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;“Free and fair, with major problems”&lt;/td&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;“Not free or fair”&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;“Don’t know (Do not read)”&lt;/td&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;“Decline to answer (Do not read)”&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
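&lt;p&gt;A minimal base R sketch of how such labels could be mapped to the four harmonized codes. The matching rules and the function name &lt;code&gt;harmonize_election_label()&lt;/code&gt; are assumptions for illustration, not the rules used in the case study; the input is a subset of the labels in the table above.&lt;/p&gt;

```r
# A subset of the labels from the table above, as found in different waves.
raw_labels = c(
  "1. they were completely free and fair",
  "They were free and fair, with some minor breaches",
  "Free and fair, with major problems",
  "Not free or fair",
  "They were not free and fair",
  "Refuse",
  "9. declined to answer"
)

# Hypothetical rule set: later, more specific patterns override earlier ones.
harmonize_election_label = function(x) {
  x = tolower(x)
  out = rep("missing", length(x))   # refusals, do not know, not asked
  out[grepl("free and fair", x)] = "free_and_fair"
  out[grepl("minor", x)] = "some_minor_problems"
  out[grepl("major", x)] = "some_major_problems"
  out[grepl("not free", x)] = "not_free"
  out
}

# Numeric codes follow the order given in the text: 1 to 4.
levels_in_order = c("free_and_fair", "some_minor_problems",
                    "some_major_problems", "not_free")
harmonized = harmonize_election_label(raw_labels)
codes = match(harmonized, levels_in_order)   # NA for the missing categories
data.frame(raw_labels, harmonized, codes)
```

&lt;p&gt;The order of the rules matters: a label such as "They were not free and fair" also contains "free and fair", so the more specific pattern has to be applied last.&lt;/p&gt;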
&lt;p&gt;Of course, this harmonization is essential to get clean results like this:&lt;/p&gt;
















&lt;figure  id=&#34;figure-for-evaluation-or-reuse-of-parliamentary-elections-dataset-get-the-replication-data-and-the-code-from-the-zenodohhttpsdoiorg105281zenodo5034759-open-repository&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img alt=&#34;For evaluation or reuse of parliamentary elections dataset get the replication data and the code from the [Zenodo](https://doi.org/10.5281/zenodo.5034759) open repository.&#34; srcset=&#34;
               /media/img/blogposts_2021/arabb-comparison-country-chart_hu876e56138097bf35e9ab80c0a7351314_159521_30b9d9bccbe8f347c912dbe10ef5159c.webp 400w,
               /media/img/blogposts_2021/arabb-comparison-country-chart_hu876e56138097bf35e9ab80c0a7351314_159521_f7e62366b8310160e9cdd16714a5ac44.webp 760w,
               /media/img/blogposts_2021/arabb-comparison-country-chart_hu876e56138097bf35e9ab80c0a7351314_159521_1200x1200_fit_q75_h2_lanczos_3.webp 1200w&#34;
               src=&#34;https://greendeal.dataobservatory.eu/media/img/blogposts_2021/arabb-comparison-country-chart_hu876e56138097bf35e9ab80c0a7351314_159521_30b9d9bccbe8f347c912dbe10ef5159c.webp&#34;
               width=&#34;506&#34;
               height=&#34;760&#34;
               loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      For evaluation or reuse of parliamentary elections dataset get the replication data and the code from the &lt;a href=&#34;https://doi.org/10.5281/zenodo.5034759&#34;&gt;Zenodo&lt;/a&gt; open repository.
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;In our case study, we had three forms of missingness: the respondent
&lt;em&gt;did not know&lt;/em&gt; the answer, the respondent &lt;em&gt;did not want&lt;/em&gt; to answer, and
finally, in some cases the &lt;em&gt;respondent was not asked&lt;/em&gt;, because the
country held no parliamentary elections. In numerical processing,
for example when calculating averages, all these answers must be left out,
but in a more detailed, categorical analysis they represent very
different cases. A high level of refusal to answer may in itself be an
indicator that democratic opinion forming is suppressed.&lt;/p&gt;
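&lt;p&gt;The two views of missingness can be sketched in base R. The responses and the category names &lt;code&gt;do_not_know&lt;/code&gt;, &lt;code&gt;declined&lt;/code&gt; and &lt;code&gt;inap&lt;/code&gt; (not asked) are invented for illustration and are not taken from the Arab Barometer files.&lt;/p&gt;

```r
# Toy responses: harmonized codes 1 to 4 plus three kinds of missingness.
# The values are invented for illustration only.
answers = c("1", "2", "do_not_know", "4", "declined", "inap", "3", "declined")
missing_types = c("do_not_know", "declined", "inap")

# Numeric view: every form of missingness becomes NA and drops out of the mean.
numeric_view = as.numeric(ifelse(answers %in% missing_types, NA, answers))
mean_score = mean(numeric_view, na.rm = TRUE)

# Categorical view: keep the missing types apart, for example to see
# how many respondents refused to answer.
missing_counts = table(answers[answers %in% missing_types])
refusal_share = mean(answers == "declined")

mean_score
missing_counts
refusal_share
```

&lt;p&gt;Keeping the original categories next to the numeric codes lets the same file serve both kinds of analysis.&lt;/p&gt;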
&lt;p&gt;Survey harmonization with many countries entails tens of thousands of
small data management tasks, which, unless automatically documented,
logged, and created with reproducible code, make for a hopelessly
error-prone process. We believe that our open-source software will bring
to light much new statistical information which, while legally
open, was never processed due to the large investment needed.&lt;/p&gt;
&lt;p&gt;We also started building experimental APIs whose data comes from running
&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;retroharmonize&lt;/a&gt; regularly.
We will place cultural access and participation data in the &lt;a href=&#34;https://music.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Digital
Music Observatory&lt;/a&gt;, climate
awareness, policy support and self-reported mitigation strategies into
the &lt;a href=&#34;https://greendeal.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Green Deal Data
Observatory&lt;/a&gt;, and economy and
well-being data into our &lt;a href=&#34;https://economy.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Economy Data
Observatory&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;further-plans&#34;&gt;Further plans&lt;/h2&gt;
&lt;p&gt;Retrospective survey harmonization is a far more complex task than this
blogpost suggests, because established survey programs have gathered decades of legacy data in legacy coding schemes and legacy file formats. Putting the data right, and especially putting the invaluable descriptive and administrative (processing) metadata right, is a huge undertaking. We are releasing example code, datasets and charts
for researchers to compare our harmonized results with theirs, and to
improve our software.&lt;/p&gt;
&lt;h3 id=&#34;use-our-software&#34;&gt;Use our software&lt;/h3&gt;
&lt;p&gt;The &lt;code&gt;retroharmonize&lt;/code&gt; R package can be freely used, modified and
distributed under the GPL-3 license. For the main developer and
contributors, see the
&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;package&lt;/a&gt; homepage. If you
use it for your work, please kindly cite it as:&lt;/p&gt;
&lt;p&gt;Daniel Antal (2021). retroharmonize: Ex Post Survey Data Harmonization.
R package version 0.1.17. &lt;a href=&#34;https://doi.org/10.5281/zenodo.5034752&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;https://doi.org/10.5281/zenodo.5034752&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Download the &lt;a href=&#34;https://greendeal.dataobservatory.eu/media/bibliography/cite-retroharmonize.bib&#34; target=&#34;_blank&#34;&gt;BibLaTeX entry&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&#34;tutorial-to-work-with-the-arab-barometer-survey-data&#34;&gt;Tutorial to work with the Arab Barometer survey data&lt;/h3&gt;
&lt;p&gt;Daniel Antal, &amp;amp; Ahmed Shaibani. (2021, June 26). Case Study: Working
With Arab Barometer Surveys for the retroharmonize R package (Version
0.1.6). Zenodo. &lt;a href=&#34;https://doi.org/10.5281/zenodo.5034759&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;https://doi.org/10.5281/zenodo.5034759&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Use the replication data to report potential
&lt;a href=&#34;https://github.com/rOpenGov/retroharmonize/issues&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;issues&lt;/a&gt; and
suggest improvements to the code:&lt;/p&gt;
&lt;p&gt;Daniel Antal, &amp;amp; Ahmed Shaibani. (2021). Replication Data for the
retroharmonize R Package Case Study: Working With Arab Barometer Surveys
(Version 0.1.6) [Data set]. Zenodo.
&lt;a href=&#34;https://doi.org/10.5281/zenodo.5034741&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;https://doi.org/10.5281/zenodo.5034741&lt;/a&gt;&lt;/p&gt;
&lt;h3 id=&#34;experimental-api&#34;&gt;Experimental API&lt;/h3&gt;
&lt;p&gt;We are also experimenting with the automated placement of authoritative
and citeable figures and datasets in open repositories. For the climate
awareness dataset get the country averages and aggregates from
&lt;a href=&#34;https://doi.org/10.5281/zenodo.5035562&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Zenodo&lt;/a&gt;, and the plot in &lt;code&gt;jpg&lt;/code&gt;
or &lt;code&gt;png&lt;/code&gt; from &lt;a href=&#34;https://doi.org/10.6084/m9.figshare.14854359&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;figshare&lt;/a&gt;.
Our plan is to release open data in a modern API with rich descriptive
metadata meeting the &lt;em&gt;Dublin Core&lt;/em&gt; and &lt;em&gt;DataCite&lt;/em&gt; standards, and further
administrative metadata for correct coding, joining and further
manipulating our data, or for easy import into your database.&lt;/p&gt;
&lt;h3 id=&#34;join-our-open-source-effort&#34;&gt;Join our open source effort&lt;/h3&gt;
&lt;p&gt;Want to help us improve our open data service? Want to include
&lt;a href=&#34;https://www.latinobarometro.org/lat.jsp&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Latinobarómetro&lt;/a&gt; and the
&lt;a href=&#34;https://caucasusbarometer.org/en/datasets/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Caucasus Barometer&lt;/a&gt; in our
offering? Join the rOpenGov community of R package developers, and our
open collaboration to create the automated data observatories. We are
not only looking for
&lt;a href=&#34;https://greendeal.dataobservatory.eu/authors/developer/&#34;&gt;developers&lt;/a&gt;,
but &lt;a href=&#34;https://greendeal.dataobservatory.eu/authors/curator/&#34;&gt;data
curators&lt;/a&gt; and
&lt;a href=&#34;https://greendeal.dataobservatory.eu/authors/team/&#34;&gt;service design
associates&lt;/a&gt;, too.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Open Data - The New Gold Without the Rush</title>
      <link>https://greendeal.dataobservatory.eu/post/2021-06-18-gold-without-rush/</link>
      <pubDate>Fri, 18 Jun 2021 17:00:00 +0000</pubDate>
      <guid>https://greendeal.dataobservatory.eu/post/2021-06-18-gold-without-rush/</guid>
      <description>&lt;p&gt;&lt;em&gt;If open data is the new gold, why do even those who release it fail to reuse it? We created an open collaboration of data curators and open-source developers to dig into novel open data sources and/or increase the usability of existing ones. We transform reproducible research software into research-as-service.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Every year, the EU announces that billions and billions of data are now “open” again, but this is not gold. At least not in the form of nicely minted gold coins, but in gold dust and nuggets found in the muddy banks of chilly rivers. There is no rush for it, because panning out its value requires a lot of hours of hard work. Our goal is to automate this work to make open data usable at scale, even in trustworthy AI solutions.&lt;/p&gt;
















&lt;figure  id=&#34;figure-there-is-no-rush-for-it-because-panning-out-its-value-requires-a-lot-of-hours-of-hard-work-our-goal-is-to-automate-this-work-to-make-open-data-usable-at-scale-even-in-trustworthy-ai-solutions&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img alt=&#34;There is no rush for it, because panning out its value requires a lot of hours of hard work. Our goal is to automate this work to make open data usable at scale, even in trustworthy AI solutions.&#34; srcset=&#34;
               /media/img/slides/gold_panning_slide_notitle_hu8f7296f20da8c17f972a0534c44322c2_1382486_b042523dffe8143dea3d8c8c9c3262f4.webp 400w,
               /media/img/slides/gold_panning_slide_notitle_hu8f7296f20da8c17f972a0534c44322c2_1382486_faa00e96d3d0b700cfcf1daa513f3ad2.webp 760w,
               /media/img/slides/gold_panning_slide_notitle_hu8f7296f20da8c17f972a0534c44322c2_1382486_1200x1200_fit_q75_h2_lanczos_3.webp 1200w&#34;
               src=&#34;https://greendeal.dataobservatory.eu/media/img/slides/gold_panning_slide_notitle_hu8f7296f20da8c17f972a0534c44322c2_1382486_b042523dffe8143dea3d8c8c9c3262f4.webp&#34;
               width=&#34;760&#34;
               height=&#34;428&#34;
               loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      There is no rush for it, because panning out its value requires a lot of hours of hard work. Our goal is to automate this work to make open data usable at scale, even in trustworthy AI solutions.
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;Most open data is not public: it is not downloadable from the Internet. In the EU parlance, “open” only means a legal entitlement to get access to it. And even in the rare cases when data is open and public, it is often marred by data quality issues. We are working on the prototypes of a data-as-service and research-as-service built with open-source statistical software that taps into various and often neglected open data sources.&lt;/p&gt;
&lt;p&gt;We are in the prototype phase in June and our intentions are to have a well-functioning service by the time of the conference, because we are working only with open-source software elements; our technological readiness level is already very high. The novelty of our process is that we are trying to further develop and integrate a few open-source technology items into technologically and financially sustainable data-as-service and even research-as-service solutions.&lt;/p&gt;
















&lt;figure  id=&#34;figure-our-review-of-about-80-eu-un-and-oecd-data-observatories-reveals-that-most-of-them-do-not-use-these-organizationss-open-data---instead-they-use-various-and-often-not-well-processed-proprietary-sources&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img alt=&#34;Our review of about 80 EU, UN and OECD data observatories reveals that most of them do not use these organizations&amp;#39; open data - instead they use various, and often not well processed, proprietary sources.&#34; srcset=&#34;
               /media/img/observatory_screenshots/observatory_collage_16x9_800_hu47f74f5cdae63c7248c2367b9d148671_353025_0079ea9844f6c5e52b52fd0e627467a2.webp 400w,
               /media/img/observatory_screenshots/observatory_collage_16x9_800_hu47f74f5cdae63c7248c2367b9d148671_353025_ecd6d08ba5e9bac19c8173546f036651.webp 760w,
               /media/img/observatory_screenshots/observatory_collage_16x9_800_hu47f74f5cdae63c7248c2367b9d148671_353025_1200x1200_fit_q75_h2_lanczos_3.webp 1200w&#34;
               src=&#34;https://greendeal.dataobservatory.eu/media/img/observatory_screenshots/observatory_collage_16x9_800_hu47f74f5cdae63c7248c2367b9d148671_353025_0079ea9844f6c5e52b52fd0e627467a2.webp&#34;
               width=&#34;760&#34;
               height=&#34;428&#34;
               loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      Our review of about 80 EU, UN and OECD data observatories reveals that most of them do not use these organizations&amp;rsquo; open data - instead they use various, and often not well processed, proprietary sources.
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;We are taking a new and modern approach to the &lt;code&gt;data observatory&lt;/code&gt; concept, and modernizing it with the application of 21st century data and metadata standards, the new results of reproducible research and data science. Various UN and OECD bodies, and particularly the European Union, support or maintain more than 60 data observatories, or permanent data collection and dissemination points, but even these do not use the open data of these organizations and their members. We are building open-source data observatories, which run open-source statistical software that automatically processes and documents reusable public sector data (from public transport, meteorology, tax offices, taxpayer funded satellite systems, etc.) and reusable scientific data (from EU taxpayer funded research) into new, high quality statistical indicators.&lt;/p&gt;
















&lt;figure  id=&#34;figure-we-are-taking-a-new-and-modern-approach-to-the-data-observatory-concept-and-modernizing-it-with-the-application-of-21st-century-data-and-metadata-standards-the-new-results-of-reproducible-research-and-data-science&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img alt=&#34;We are taking a new and modern approach to the ‘data observatory’ concept, and modernizing it with the application of 21st century data and metadata standards, the new results of reproducible research and data science&#34; srcset=&#34;
               /media/img/slides/automated_observatory_value_chain_huf9c0a6d9b150a8fdeb42cadf99abee90_616274_c18a97f00bbcac322614b6c2d55783f6.webp 400w,
               /media/img/slides/automated_observatory_value_chain_huf9c0a6d9b150a8fdeb42cadf99abee90_616274_8b655e803b41b817a8093a37ccd19689.webp 760w,
               /media/img/slides/automated_observatory_value_chain_huf9c0a6d9b150a8fdeb42cadf99abee90_616274_1200x1200_fit_q75_h2_lanczos.webp 1200w&#34;
               src=&#34;https://greendeal.dataobservatory.eu/media/img/slides/automated_observatory_value_chain_huf9c0a6d9b150a8fdeb42cadf99abee90_616274_c18a97f00bbcac322614b6c2d55783f6.webp&#34;
               width=&#34;760&#34;
               height=&#34;428&#34;
               loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      We are taking a new and modern approach to the ‘data observatory’ concept, and modernizing it with the application of 21st century data and metadata standards, the new results of reproducible research and data science
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;ul&gt;
&lt;li&gt;We are building various open-source data collection tools in R and Python to bring up data from big data APIs and legally open, but not public, and not well served data sources. For example, we are working on capturing representative data from the Spotify API or creating harmonized datasets from the Eurobarometer and Afrobarometer survey programs.&lt;/li&gt;
&lt;li&gt;Open data is usually not public; whatever is legally accessible is usually not ready to use for commercial or scientific purposes. In Europe, almost all taxpayer funded data is legally open for reuse, but it is usually stored in heterogeneous formats, processed for an original government or scientific need, and documented to varying and often low standards. Our expert data curators are looking for new data sources that should be (re-) processed and re-documented to be usable for a wider community. We would like to introduce our service flow, which touches upon many important aspects of data scientist, data engineer and data curatorial work.&lt;/li&gt;
&lt;li&gt;We believe that even such generally trusted data sources as Eurostat often need to be reprocessed, because various legal and political constraints do not allow the common European statistical services to provide optimal quality data – for example, on the regional and city levels.&lt;/li&gt;
&lt;li&gt;With &lt;a href=&#34;https://greendeal.dataobservatory.eu/authors/ropengov/&#34;&gt;rOpenGov&lt;/a&gt; and other partners, we are creating open-source statistical software in R to re-process these heterogeneous and low-quality data into tidy statistical indicators, and to automatically validate and document them.&lt;/li&gt;
&lt;li&gt;We are carefully documenting and releasing administrative, processing, and descriptive metadata, following international metadata standards, to make our data easy to find and easy to use for data analysts.&lt;/li&gt;
&lt;li&gt;We are automatically creating depositions and authoritative copies marked with an individual digital object identifier (DOI) to maintain data integrity.&lt;/li&gt;
&lt;li&gt;We are building simple databases and supporting APIs that release the data without restrictions, in a tidy format that is easy to join with other data, or easy to join into databases, together with standardized metadata.&lt;/li&gt;
&lt;li&gt;We maintain observatory websites (see: &lt;a href=&#34;https://music.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Digital Music Observatory&lt;/a&gt;, &lt;a href=&#34;https://greendeal.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Green Deal Data Observatory&lt;/a&gt;, &lt;a href=&#34;https://economy.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Economy Data Observatory&lt;/a&gt;) where not only the data is available, but we provide tutorials and use cases to make it easier to use them. Our mission is to show a modern, 21st century reimagination of the data observatory concept developed and supported by the UN, EU and OECD, and we want to show that modern reproducible research and open data could make the existing 60 data observatories and the planned new ones grow faster into data ecosystems.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We are working around the open collaboration concept, which is well-known in open source software development and reproducible science, but we try to make this agile project management methodology more inclusive, and include data curators, and various institutional partners into this approach. Based around our early-stage startup, Reprex, and the open-source developer community rOpenGov, we are working together with other developers, data scientists, and domain specific data experts in climate change and mitigation, antitrust and innovation policies, and various aspects of the music and film industry.&lt;/p&gt;
















&lt;figure  id=&#34;figure-our-open-collaboration-is-truly-open-new-data-curatorsauthorscuratordevelopersauthorsdeveloper-and-service-designersauthorsteam-even-volunteers-and-citizen-scientists-are-welcome-to-join&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img alt=&#34;Our open collaboration is truly open: new data curators, developers and service designers, even volunteers and citizen scientists are welcome to join.&#34; srcset=&#34;
               /media/img/observatory_screenshots/dmo_contributors_hua4f41ef7327b64bb97f169af135070bd_140729_a07a8e618fa7317f6f8256b9a334262e.webp 400w,
               /media/img/observatory_screenshots/dmo_contributors_hua4f41ef7327b64bb97f169af135070bd_140729_3a4ae7f72478fd880961b08e1f7075dd.webp 760w,
               /media/img/observatory_screenshots/dmo_contributors_hua4f41ef7327b64bb97f169af135070bd_140729_1200x1200_fit_q75_h2_lanczos_3.webp 1200w&#34;
               src=&#34;https://greendeal.dataobservatory.eu/media/img/observatory_screenshots/dmo_contributors_hua4f41ef7327b64bb97f169af135070bd_140729_a07a8e618fa7317f6f8256b9a334262e.webp&#34;
               width=&#34;760&#34;
               height=&#34;427&#34;
               loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      Our open collaboration is truly open: new &lt;a href=&#34;https://greendeal.dataobservatory.eu/authors/curator/&#34;&gt;data curators&lt;/a&gt;, &lt;a href=&#34;https://greendeal.dataobservatory.eu/authors/developer/&#34;&gt;developers&lt;/a&gt; and &lt;a href=&#34;https://greendeal.dataobservatory.eu/authors/team/&#34;&gt;service designers&lt;/a&gt;, even volunteers and citizen scientists are welcome to join.
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;Our open collaboration is truly open: new &lt;a href=&#34;https://greendeal.dataobservatory.eu/authors/curator/&#34;&gt;data curators&lt;/a&gt;, data scientists and data engineers are welcome to join. We develop open-source software in an agile way, so you can join in with an intermediate programming skill to build unit tests or add new functionality, and if you are a beginner, you can start with documentation and testing our tutorials. For business, policy, and scientific data analysts, we provide unexploited, exciting new datasets. Advanced developers can &lt;a href=&#34;https://greendeal.dataobservatory.eu/authors/developer/&#34;&gt;join&lt;/a&gt; our development team: the statistical data creation is mainly made in the R language, and the service infrastructure in Python and Go components.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Analyze Locally, Act Globally: New regions R Package Release</title>
      <link>https://greendeal.dataobservatory.eu/post/2021-06-16-regions-release/</link>
      <pubDate>Wed, 16 Jun 2021 12:00:00 +0000</pubDate>
      <guid>https://greendeal.dataobservatory.eu/post/2021-06-16-regions-release/</guid>
      <description>















&lt;figure  &gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img alt=&#34;&#34; srcset=&#34;
               /media/img/package_screenshots/regions_017_169_hu4c6da2626fe9335e12d5da3506258dd2_123607_1aeab2d63a062640baf35ce7ffff4b52.webp 400w,
               /media/img/package_screenshots/regions_017_169_hu4c6da2626fe9335e12d5da3506258dd2_123607_340cd90381be5d85c6b08caba8072821.webp 760w,
               /media/img/package_screenshots/regions_017_169_hu4c6da2626fe9335e12d5da3506258dd2_123607_1200x1200_fit_q75_h2_lanczos_3.webp 1200w&#34;
               src=&#34;https://greendeal.dataobservatory.eu/media/img/package_screenshots/regions_017_169_hu4c6da2626fe9335e12d5da3506258dd2_123607_1aeab2d63a062640baf35ce7ffff4b52.webp&#34;
               width=&#34;760&#34;
               height=&#34;427&#34;
               loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;/figure&gt;
&lt;p&gt;The new version of our &lt;a href=&#34;https://ropengov.org/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;rOpenGov&lt;/a&gt; R package
&lt;a href=&#34;https://regions.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;regions&lt;/a&gt; was released today on
CRAN. This package is one of the engines of our experimental open
data-as-service &lt;a href=&#34;https://greendeal.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Green Deal Data Observatory&lt;/a&gt;, &lt;a href=&#34;https://economy.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Economy Data Observatory&lt;/a&gt;, &lt;a href=&#34;https://music.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Digital Music Observatory&lt;/a&gt; prototypes, which aim to
place open data packages into open-source applications.&lt;/p&gt;
&lt;details class=&#34;spoiler &#34;  id=&#34;spoiler-1&#34;&gt;
  &lt;summary&gt;Click to expand table of contents of the post&lt;/summary&gt;
  &lt;p&gt;&lt;details class=&#34;toc-inpage d-print-none  &#34; open&gt;
  &lt;summary class=&#34;font-weight-bold&#34;&gt;Table of Contents&lt;/summary&gt;
  &lt;nav id=&#34;TableOfContents&#34;&gt;
  &lt;ul&gt;
    &lt;li&gt;&lt;a href=&#34;#get-the-package&#34;&gt;Get the Package&lt;/a&gt;&lt;/li&gt;
    &lt;li&gt;&lt;a href=&#34;#join-us&#34;&gt;Join us&lt;/a&gt;&lt;/li&gt;
  &lt;/ul&gt;
&lt;/nav&gt;
&lt;/details&gt;
&lt;/p&gt;
&lt;/details&gt;
&lt;p&gt;In international comparison the use of nationally aggregated indicators
often has many disadvantages: they exhibit very different levels of
homogeneity, and data is often very limited in number of observations
for a cross-sectional analysis. When comparing European countries, a few
missing cases can limit the cross-section of countries to around 20
cases which disallows the use of many analytical methods. Working with
sub-national statistics has many advantages: the similarity of the
aggregation level and high number of observations can allow more precise
control of model parameters and errors, and the number of observations
grows from 20 to 200-300.&lt;/p&gt;
















&lt;figure  id=&#34;figure-the-change-from-national-to-sub-national-level-comes-with-a-huge-data-processing-price-internal-administrative-boundaries-their-names-codes-codes-change-very-frequently&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img alt=&#34;The change from national to sub-national level comes with a huge data processing price: internal administrative boundaries, their names and their codes change very frequently.&#34; srcset=&#34;
               /media/img/blogposts_2021/indicator_with_map_hue9f606f6489f63a22f67aeb7e2b3402b_98843_df043b13fb62aa7b45aa15fad51f4229.webp 400w,
               /media/img/blogposts_2021/indicator_with_map_hue9f606f6489f63a22f67aeb7e2b3402b_98843_09a0d6124e334c5f1727420a059512a9.webp 760w,
               /media/img/blogposts_2021/indicator_with_map_hue9f606f6489f63a22f67aeb7e2b3402b_98843_1200x1200_fit_q75_h2_lanczos_3.webp 1200w&#34;
               src=&#34;https://greendeal.dataobservatory.eu/media/img/blogposts_2021/indicator_with_map_hue9f606f6489f63a22f67aeb7e2b3402b_98843_df043b13fb62aa7b45aa15fad51f4229.webp&#34;
               width=&#34;760&#34;
               height=&#34;428&#34;
               loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      The change from national to sub-national level comes with a huge data processing price: internal administrative boundaries, their names and their codes change very frequently.
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;Yet the change from national to sub-national level comes with a huge
data processing price. National boundaries are relatively stable,
with only a handful of changes in each recent decade: changing them
requires a more-or-less global consensus. But states
are free to change their internal administrative boundaries, and they do
so frequently. This means that the names, identification codes
and boundary definitions of sub-national regions change very often, so
joining data from different sources and different years can be very
difficult.&lt;/p&gt;
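&lt;p&gt;A minimal sketch in Python shows why such joins fail, and how a correspondence table helps. It does not use the regions package API; the region codes, the indicator values and the helper names are hypothetical. It only covers pure renames: regions whose boundaries were split or merged cannot be matched this way and need imputation.&lt;/p&gt;

```python
# Hypothetical correspondence table: old regional code mapped to the current code
RECODE = {"XX01": "XXA1", "XX02": "XXA2"}

# Hypothetical indicator values from two sources that use different code vintages
source_2013 = {"XX01": 10.5, "XX02": 7.0, "XX03": 3.2}
source_2019 = {"XXA1": 11.0, "XXA2": 6.5, "XX03": 3.0}


def harmonize_codes(data, recode):
    """Rename old codes to current ones; codes without an entry pass through."""
    return {recode.get(code, code): value for code, value in data.items()}


def join_by_code(left, right):
    """Inner join of two code-keyed dictionaries on their shared codes."""
    shared = sorted(set(left).intersection(right))
    return {code: (left[code], right[code]) for code in shared}


# Joining the raw sources silently drops every region that was renamed
naive = join_by_code(source_2013, source_2019)

# Recoding the older source first recovers the renamed regions
harmonized = join_by_code(harmonize_codes(source_2013, RECODE), source_2019)
```

&lt;p&gt;In this toy example the naive join keeps only the one region whose code did not change, while the harmonized join keeps all three.&lt;/p&gt;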
















&lt;figure  id=&#34;figure-our-regions-r-packagehttpsregionsdataobservatoryeu-helps-the-data-processing-validation-and-imputation-of-sub-national-regional-datasets-and-their-coding&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img alt=&#34;Our regions R package helps the data processing, validation and imputation of sub-national, regional datasets and their coding.&#34; srcset=&#34;
               /media/img/blogposts_2021/recoded_indicator_with_map_hubda8124fbfd6305eacfd3d4f0fcd06cc_71873_65df57cf4311bb2623535a1a5be044c0.webp 400w,
               /media/img/blogposts_2021/recoded_indicator_with_map_hubda8124fbfd6305eacfd3d4f0fcd06cc_71873_81a53fd42fac7f0c3fe4e1a89d5b7892.webp 760w,
               /media/img/blogposts_2021/recoded_indicator_with_map_hubda8124fbfd6305eacfd3d4f0fcd06cc_71873_1200x1200_fit_q75_h2_lanczos_3.webp 1200w&#34;
               src=&#34;https://greendeal.dataobservatory.eu/media/img/blogposts_2021/recoded_indicator_with_map_hubda8124fbfd6305eacfd3d4f0fcd06cc_71873_65df57cf4311bb2623535a1a5be044c0.webp&#34;
               width=&#34;760&#34;
               height=&#34;428&#34;
               loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      Our &lt;a href=&#34;https://regions.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;regions R package&lt;/a&gt; helps the data processing, validation and imputation of sub-national, regional datasets and their coding.
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;Switching from a national to a sub-national level of analysis has
numerous advantages, but it comes with a huge price in data
processing, validation and imputation. The
&lt;a href=&#34;https://regions.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;regions&lt;/a&gt; package aims to help with this
process.&lt;/p&gt;
&lt;p&gt;You can review the problem, and the code that created the two map
comparisons, in the &lt;a href=&#34;https://regions.dataobservatory.eu/articles/maping.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Maping Regional Data, Maping Metadata
Problems&lt;/a&gt;
vignette article of the package. A more detailed problem description can
be found in &lt;a href=&#34;https://regions.dataobservatory.eu/articles/Regional_stats.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Working With Regional, Sub-National Statistical
Products&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This package is an offspring of the
&lt;a href=&#34;https://ropengov.github.io/eurostat/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;eurostat&lt;/a&gt; package on
&lt;a href=&#34;https://ropengov.github.io/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;rOpenGov&lt;/a&gt;. It started as a tool to
validate and re-code regional Eurostat statistics, but it aims to be a
general solution for all sub-national statistics. It will be developed
parallel with other rOpenGov packages.&lt;/p&gt;
&lt;h2 id=&#34;get-the-package&#34;&gt;Get the Package&lt;/h2&gt;
&lt;p&gt;You can install the development version from
&lt;a href=&#34;https://github.com/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;GitHub&lt;/a&gt; with:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;devtools::install_github(&amp;quot;rOpenGov/regions&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;or the released version from CRAN:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;install.packages(&amp;quot;regions&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can review the complete package documentation on
&lt;a href=&#34;https://regions.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;regions.dataobservatory.eu&lt;/a&gt;. If
you find any problems with the code, please raise an issue on
&lt;a href=&#34;https://github.com/rOpenGov/regions&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;GitHub&lt;/a&gt;. Pull requests are welcome
if you agree with the &lt;a href=&#34;https://contributor-covenant.org/version/2/0/CODE_OF_CONDUCT.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Contributor Code of
Conduct&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;If you use &lt;code&gt;regions&lt;/code&gt; in your work, please cite the
package as:
Daniel Antal. (2021, June 16). regions (Version 0.1.7). CRAN. &lt;a href=&#34;https://doi.org/10.5281/zenodo.4965909&#34;&gt;https://doi.org/10.5281/zenodo.4965909&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Download the &lt;a href=&#34;https://greendeal.dataobservatory.eu/media/bibliography/cite-regions.bib&#34; target=&#34;_blank&#34;&gt;BibLaTeX entry&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://cran.r-project.org/package=regions&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;















&lt;figure  &gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;https://www.r-pkg.org/badges/version/regions&#34; alt=&#34;CRAN_Status_Badge&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;/figure&gt;
&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;join-us&#34;&gt;Join us&lt;/h2&gt;
&lt;details class=&#34;spoiler &#34;  id=&#34;spoiler-5&#34;&gt;
  &lt;summary&gt;Join our Green Deal Data Observatory collaboration!&lt;/summary&gt;
  &lt;p&gt;&lt;em&gt;Join our open collaboration Green Deal Data Observatory team as a &lt;a href=&#34;https://greendeal.dataobservatory.eu/authors/curator&#34;&gt;data curator&lt;/a&gt;, &lt;a href=&#34;https://greendeal.dataobservatory.eu/authors/developer&#34;&gt;developer&lt;/a&gt; or &lt;a href=&#34;https://greendeal.dataobservatory.eu/authors/team&#34;&gt;business developer&lt;/a&gt;. More interested in economic policies, particularly computational antitrust, innovation and small enterprises? Check out our &lt;a href=&#34;https://economy.dataobservatory.eu/#contributors&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Economy Data Observatory&lt;/a&gt; team! Or does your interest lie more in data governance, trustworthy AI and other digital market problems? Check out our &lt;a href=&#34;https://music.dataobservatory.eu/#contributors&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Digital Music Observatory&lt;/a&gt; team!&lt;/em&gt;&lt;/p&gt;
&lt;/details&gt;
&lt;p&gt;&lt;a href=&#34;https://twitter.com/intent/follow?screen_name=GreenDealObs&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;
&lt;figure  &gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;https://img.shields.io/twitter/follow/GreenDealObs.svg?style=social&#34; alt=&#34;Follow GreenDealObs&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;/figure&gt;
&lt;/a&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Retrospective Survey Harmonization Case Study - Climate Awareness Change in Europe 2013-2019.</title>
      <link>https://greendeal.dataobservatory.eu/post/2021-03-05-retroharmonize-climate/</link>
      <pubDate>Fri, 05 Mar 2021 00:00:00 +0000</pubDate>
      <guid>https://greendeal.dataobservatory.eu/post/2021-03-05-retroharmonize-climate/</guid>
      <description>&lt;p&gt;Retrospective survey harmonization comes with many challenges, as we
have shown in the
&lt;a href=&#34;https://greendeal.dataobservatory.eu/post/2021-03-04_retroharmonize_intro/&#34;&gt;introduction&lt;/a&gt;
to this tutorial case study. In this example, we will work with
Eurobarometer’s data.&lt;/p&gt;
&lt;div class=&#34;alert alert-note&#34;&gt;
  &lt;div&gt;
    This code tutorial is not outdated, but the &lt;a href=&#34;https://retroharmonize.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;retroharmonize&lt;/a&gt; R package has a new (development) release with more features.
  &lt;/div&gt;
&lt;/div&gt;
&lt;details class=&#34;spoiler &#34;  id=&#34;spoiler-1&#34;&gt;
  &lt;summary&gt;Click to expand table of contents of the post&lt;/summary&gt;
  &lt;p&gt;&lt;details class=&#34;toc-inpage d-print-none  &#34; open&gt;
  &lt;summary class=&#34;font-weight-bold&#34;&gt;Table of Contents&lt;/summary&gt;
  &lt;nav id=&#34;TableOfContents&#34;&gt;
  &lt;ul&gt;
    &lt;li&gt;&lt;a href=&#34;#get-the-data&#34;&gt;Get the Data&lt;/a&gt;&lt;/li&gt;
    &lt;li&gt;&lt;a href=&#34;#metadata-analysis&#34;&gt;Metadata analysis&lt;/a&gt;&lt;/li&gt;
    &lt;li&gt;&lt;a href=&#34;#metadata-protocol-variables&#34;&gt;Metadata: Protocol Variables&lt;/a&gt;&lt;/li&gt;
    &lt;li&gt;&lt;a href=&#34;#metadata-geographical-information&#34;&gt;Metadata: Geographical information&lt;/a&gt;&lt;/li&gt;
    &lt;li&gt;&lt;a href=&#34;#socio-demography-and-weights&#34;&gt;Socio-demography and Weights&lt;/a&gt;&lt;/li&gt;
    &lt;li&gt;&lt;a href=&#34;#harmonizing-variable-labels&#34;&gt;Harmonizing Variable Labels&lt;/a&gt;&lt;/li&gt;
    &lt;li&gt;&lt;a href=&#34;#creating-the-longitudional-table&#34;&gt;Creating the Longitudinal Table&lt;/a&gt;&lt;/li&gt;
    &lt;li&gt;&lt;a href=&#34;#putting-it-on-a-map&#34;&gt;Putting It on a Map&lt;/a&gt;&lt;/li&gt;
  &lt;/ul&gt;
&lt;/nav&gt;
&lt;/details&gt;
&lt;/p&gt;
&lt;/details&gt;
&lt;p&gt;Please use the development version of
&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;retroharmonize&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;devtools::install_github(&amp;quot;antaldaniel/retroharmonize&amp;quot;)

library(retroharmonize)
library(dplyr)       # this is necessary for the example 
library(lubridate)   # easier date conversion

## Warning: package &#39;lubridate&#39; was built under R version 4.0.4

library(stringr)     # You can also use base R string processing functions 
&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id=&#34;get-the-data&#34;&gt;Get the Data&lt;/h2&gt;
&lt;p&gt;&lt;code&gt;retroharmonize&lt;/code&gt; is not associated with Eurobarometer, its creators,
Kantar, or its archivists, GESIS. We assume that you have acquired the
necessary files from GESIS after carefully reading their terms, and that you
have placed them in a directory whose path is stored in &lt;code&gt;gesis_dir&lt;/code&gt;. The precise documentation
of the data we use can be found in this supporting
&lt;a href=&#34;http://netzero.dataobservatory.eu/post/2021-03-04-eurobarometer_data/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;blogpost&lt;/a&gt;.
To reproduce this blogpost, you will need &lt;code&gt;ZA5877_v2-0-0.sav&lt;/code&gt;,
&lt;code&gt;ZA6595_v3-0-0.sav&lt;/code&gt;, &lt;code&gt;ZA6861_v1-2-0.sav&lt;/code&gt;, &lt;code&gt;ZA7488_v1-0-0.sav&lt;/code&gt; and
&lt;code&gt;ZA7572_v1-0-0.sav&lt;/code&gt; in that directory.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;#Not run in the blogpost. In the repo we have a saved version.
climate_change_files &amp;lt;- c(&amp;quot;ZA5877_v2-0-0.sav&amp;quot;, &amp;quot;ZA6595_v3-0-0.sav&amp;quot;,  &amp;quot;ZA6861_v1-2-0.sav&amp;quot;, 
                          &amp;quot;ZA7488_v1-0-0.sav&amp;quot;, &amp;quot;ZA7572_v1-0-0.sav&amp;quot;)

eb_waves &amp;lt;- read_surveys(file.path(gesis_dir, climate_change_files), .f=&#39;read_spss&#39;)

if (dir.exists(&amp;quot;data-raw&amp;quot;)) {
  save ( eb_waves,  file = file.path(&amp;quot;data-raw&amp;quot;, &amp;quot;eb_climate_change_waves.rda&amp;quot;) )
}

if ( file.exists( file.path(&amp;quot;data-raw&amp;quot;, &amp;quot;eb_climate_change_waves.rda&amp;quot;) )) {
  load (file.path( &amp;quot;data-raw&amp;quot;, &amp;quot;eb_climate_change_waves.rda&amp;quot; ) )
} else {
  load (file.path(&amp;quot;..&amp;quot;, &amp;quot;..&amp;quot;,  &amp;quot;data-raw&amp;quot;, &amp;quot;eb_climate_change_waves.rda&amp;quot;) )
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;eb_waves&lt;/code&gt; nested list contains five surveys imported from SPSS to
the survey class of
&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/articles/labelled_spss_survey.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;retroharmonize&lt;/a&gt;.
The survey class is a data.frame that retains important metadata for
further harmonization.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;document_waves (eb_waves)

## # A tibble: 5 x 5
##   id            filename           ncol  nrow object_size
##   &amp;lt;chr&amp;gt;         &amp;lt;chr&amp;gt;             &amp;lt;int&amp;gt; &amp;lt;int&amp;gt;       &amp;lt;dbl&amp;gt;
## 1 ZA5877_v2-0-0 ZA5877_v2-0-0.sav   604 27919   139352456
## 2 ZA6595_v3-0-0 ZA6595_v3-0-0.sav   519 27718   119370440
## 3 ZA6861_v1-2-0 ZA6861_v1-2-0.sav   657 27901   151397528
## 4 ZA7488_v1-0-0 ZA7488_v1-0-0.sav   752 27339   169465928
## 5 ZA7572_v1-0-0 ZA7572_v1-0-0.sav   348 27655    80562432
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Beware the object sizes. If you work with many surveys, memory-efficient
programming becomes imperative. We will be subsetting whenever possible.&lt;/p&gt;
&lt;h2 id=&#34;metadata-analysis&#34;&gt;Metadata analysis&lt;/h2&gt;
&lt;p&gt;As noted before, prepare to work with nested lists. Each imported survey
is nested as a data frame in the &lt;code&gt;eb_waves&lt;/code&gt; list.&lt;/p&gt;
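&lt;p&gt;The code in the following sections filters a metadata table called
&lt;code&gt;eb_climate_metadata&lt;/code&gt;, which appears to hold one row per variable in each
survey. As a rough illustration of how such a table relates to the nested
list, here is a minimal base R sketch with made-up data. The names
&lt;code&gt;toy_waves&lt;/code&gt;, &lt;code&gt;describe_vars()&lt;/code&gt; and &lt;code&gt;toy_metadata&lt;/code&gt; are hypothetical and not
part of &lt;code&gt;retroharmonize&lt;/code&gt;; the real metadata table holds more columns, for
example &lt;code&gt;label_orig&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Toy stand-ins for two imported surveys; all names and values are made up.
toy_waves &amp;lt;- list(
  wave_a = data.frame(rowid = c(&amp;quot;a_1&amp;quot;, &amp;quot;a_2&amp;quot;), wextra = c(1.2, 0.8),
                      stringsAsFactors = FALSE),
  wave_b = data.frame(rowid = c(&amp;quot;b_1&amp;quot;, &amp;quot;b_2&amp;quot;, &amp;quot;b_3&amp;quot;), wex = c(1.0, 1.1, 0.9),
                      nuts = c(&amp;quot;DK02&amp;quot;, &amp;quot;PT17&amp;quot;, &amp;quot;CZ06&amp;quot;),
                      stringsAsFactors = FALSE)
)

# Hypothetical helper: one row of metadata per variable of a single survey
describe_vars &amp;lt;- function(id) {
  survey &amp;lt;- toy_waves[[id]]
  data.frame(
    id            = id,
    var_name_orig = names(survey),
    class_orig    = unname(vapply(survey, function(x) class(x)[1], character(1))),
    stringsAsFactors = FALSE
  )
}

# Loop over the nested list and stack the per-survey metadata
toy_metadata &amp;lt;- do.call(rbind, lapply(names(toy_waves), describe_vars))
toy_metadata
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The two toy surveys have different columns, yet their metadata rows share
the same structure, so they can be filtered and compared in a single table.&lt;/p&gt;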
&lt;h2 id=&#34;metadata-protocol-variables&#34;&gt;Metadata: Protocol Variables&lt;/h2&gt;
&lt;p&gt;Eurobarometer calls certain metadata elements, such as the interviewee’s
cooperation level or the date of the survey interview, protocol
variables. Let’s start here. This will be our template to harmonize more
and more aspects of the five surveys (which are, in fact, already
harmonizations of about 30 surveys conducted in a single ‘wave’ in
multiple countries).&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# select variables of interest from the metadata
eb_protocol_metadata &amp;lt;- eb_climate_metadata %&amp;gt;%
  filter ( .data$label_orig %in% c(&amp;quot;date of interview&amp;quot;) |
             .data$var_name_orig == &amp;quot;rowid&amp;quot;)  %&amp;gt;%
  suggest_var_names( survey_program = &amp;quot;eurobarometer&amp;quot; )

# subset and harmonize these variables in all nested list items of &#39;waves&#39; of surveys
interview_dates &amp;lt;- harmonize_var_names(eb_waves, 
                                       eb_protocol_metadata )

# apply similar data processing rules to same variables
interview_dates &amp;lt;- lapply (interview_dates, 
                      function (x) x %&amp;gt;% mutate ( date_of_interview = as_character(.data$date_of_interview) )
                      )

# join the individual survey tables into a single table 
interview_dates &amp;lt;- as_tibble ( Reduce (rbind, interview_dates) )

# Check the variable classes.

vapply(interview_dates, function(x) class(x)[1], character(1))

##             rowid date_of_interview 
##       &amp;quot;character&amp;quot;       &amp;quot;character&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is our sample workflow for each block of variables.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Get a unique identifier.&lt;/li&gt;
&lt;li&gt;Add other variables.&lt;/li&gt;
&lt;li&gt;Harmonize the variable names.&lt;/li&gt;
&lt;li&gt;Subset the data, leaving out anything that you do not harmonize in
this block.&lt;/li&gt;
&lt;li&gt;Apply some normalization in a nested list.&lt;/li&gt;
&lt;li&gt;When the variables are harmonized to the same name and class, merge them
into a data.frame-like &lt;code&gt;tibble&lt;/code&gt; object.&lt;/li&gt;
&lt;/ol&gt;
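&lt;p&gt;The steps above can be sketched on toy data in base R. This is only an
illustration under stated assumptions: the two small tables, the
&lt;code&gt;crosswalk&lt;/code&gt; list and the &lt;code&gt;harmonize_one()&lt;/code&gt; helper are made up for this example
and stand in for what &lt;code&gt;suggest_var_names()&lt;/code&gt; and &lt;code&gt;harmonize_var_names()&lt;/code&gt; do
with real surveys.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Two made-up survey tables with inconsistent variable names and classes
toy_waves &amp;lt;- list(
  s1 = data.frame(uniqid = 1:2, p1 = c(&amp;quot;2019-04-01&amp;quot;, &amp;quot;2019-04-02&amp;quot;),
                  other = c(&amp;quot;x&amp;quot;, &amp;quot;y&amp;quot;), stringsAsFactors = FALSE),
  s2 = data.frame(caseid = 1:3,
                  date = as.Date(c(&amp;quot;2019-04-03&amp;quot;, &amp;quot;2019-04-04&amp;quot;, &amp;quot;2019-04-05&amp;quot;)))
)

# Hypothetical crosswalk: original name = harmonized name, per survey
crosswalk &amp;lt;- list(
  s1 = c(uniqid = &amp;quot;rowid&amp;quot;, p1 = &amp;quot;date_of_interview&amp;quot;),
  s2 = c(caseid = &amp;quot;rowid&amp;quot;, date = &amp;quot;date_of_interview&amp;quot;)
)

harmonize_one &amp;lt;- function(id) {
  survey  &amp;lt;- toy_waves[[id]]
  mapping &amp;lt;- crosswalk[[id]]
  out &amp;lt;- survey[, names(mapping), drop = FALSE]   # steps 2 and 4: keep only this block
  names(out) &amp;lt;- unname(mapping)                   # step 3: harmonized names
  out$rowid &amp;lt;- paste0(id, &amp;quot;_&amp;quot;, out$rowid)         # step 1: identifier unique across surveys
  out$date_of_interview &amp;lt;- as.character(out$date_of_interview)  # step 5: same class
  out
}

harmonized &amp;lt;- lapply(names(toy_waves), harmonize_one)
interviews &amp;lt;- do.call(rbind, harmonized)          # step 6: one table
interviews
&lt;/code&gt;&lt;/pre&gt;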
&lt;p&gt;Now finish the harmonization. &lt;code&gt;Wednesday, 31st October 2018&lt;/code&gt; should
become a Date type &lt;code&gt;2018-10-31&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;require(lubridate)
harmonize_date &amp;lt;- function(x) {
  x &amp;lt;- tolower(as.character(x))
  x &amp;lt;- gsub(&amp;quot;monday|tuesday|wednesday|thursday|friday|saturday|sunday|\\,|th|nd|rd|st&amp;quot;, &amp;quot;&amp;quot;, x)
  x &amp;lt;- gsub(&amp;quot;decemberber&amp;quot;, &amp;quot;december&amp;quot;, x) # all those annoying real-life data problems!
  x &amp;lt;- stringr::str_trim (x, &amp;quot;both&amp;quot;)
  x &amp;lt;- gsub(&amp;quot;^0&amp;quot;, &amp;quot;&amp;quot;, x )
  x &amp;lt;- gsub(&amp;quot;\\s\\s&amp;quot;, &amp;quot; &amp;quot;, x)  # collapse double spaces into a single space
  lubridate::dmy(x) 
}

interview_dates &amp;lt;- interview_dates %&amp;gt;%
  mutate ( date_of_interview = harmonize_date(.data$date_of_interview) )

vapply(interview_dates, function(x) class(x)[1], character(1))

##             rowid date_of_interview 
##       &amp;quot;character&amp;quot;            &amp;quot;Date&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
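&lt;p&gt;A pattern such as &lt;code&gt;th|nd|rd|st&lt;/code&gt; also matches inside month names like
&lt;code&gt;august&lt;/code&gt;. If your files contain such dates, a stricter clean-up that
removes the ordinal suffix only when it follows a digit may be safer. The
helper below is a hypothetical base R alternative, not the function used in
this post; its result could then be passed to &lt;code&gt;lubridate::dmy()&lt;/code&gt; as above.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Hypothetical helper: strip the weekday name and the ordinal suffix only
strip_day_and_suffix &amp;lt;- function(x) {
  x &amp;lt;- tolower(as.character(x))
  x &amp;lt;- gsub(&amp;quot;^[a-z]+day,?[[:space:]]*&amp;quot;, &amp;quot;&amp;quot;, x)    # drop a leading weekday name
  x &amp;lt;- gsub(&amp;quot;([0-9])(st|nd|rd|th)&amp;quot;, &amp;quot;\\1&amp;quot;, x)      # drop the suffix only after a digit
  x &amp;lt;- gsub(&amp;quot;[[:space:]]+&amp;quot;, &amp;quot; &amp;quot;, x)               # collapse repeated whitespace
  trimws(x)
}

strip_day_and_suffix(&amp;quot;Wednesday, 31st October 2018&amp;quot;)
## [1] &amp;quot;31 october 2018&amp;quot;
strip_day_and_suffix(&amp;quot;Wednesday, 1st August 2018&amp;quot;)
## [1] &amp;quot;1 august 2018&amp;quot;
&lt;/code&gt;&lt;/pre&gt;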
&lt;p&gt;Row IDs that are unique within one survey may not be unique across
&lt;em&gt;different&lt;/em&gt; surveys. To avoid duplicates, we created a simple, sequential ID for each survey
that includes the ID of the original file.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;set.seed(2021)
sample_n(interview_dates, 6)

## # A tibble: 6 x 2
##   rowid               date_of_interview
##   &amp;lt;chr&amp;gt;               &amp;lt;date&amp;gt;           
## 1 ZA7488_v1-0-0_7016  2018-10-28       
## 2 ZA7488_v1-0-0_19187 2018-11-02       
## 3 ZA6861_v1-2-0_1218  2017-03-18       
## 4 ZA6861_v1-2-0_4142  2017-03-21       
## 5 ZA7572_v1-0-0_12363 2019-04-17       
## 6 ZA7572_v1-0-0_8071  2019-04-18
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;After this type-conversion problem, let’s look at a case where an original
SPSS variable can have two meaningful R representations.&lt;/p&gt;
&lt;h2 id=&#34;metadata-geographical-information&#34;&gt;Metadata: Geographical information&lt;/h2&gt;
&lt;p&gt;Let’s continue with harmonizing geographical information in the files.
In this example, &lt;code&gt;var_name_suggested&lt;/code&gt; will contain the harmonized
variable name. It is likely that you have to make this call, after
carefully reading the original questionnaires and codebooks.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;eb_regional_metadata &amp;lt;- eb_climate_metadata %&amp;gt;%
  filter ( grepl( &amp;quot;rowid|isocntry|^nuts$&amp;quot;, .data$var_name_orig)) %&amp;gt;%
  suggest_var_names( survey_program = &amp;quot;eurobarometer&amp;quot; ) %&amp;gt;%
  mutate ( var_name_suggested = case_when ( 
    var_name_suggested == &amp;quot;region_nuts_codes&amp;quot;     ~ &amp;quot;geo&amp;quot;,
    TRUE ~ var_name_suggested ))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;harmonize_var_names()&lt;/code&gt; function takes all variables in the subsetted,
geographical metadata table, and brings them to the harmonized
&lt;code&gt;var_name_suggested&lt;/code&gt; name. The function subsets the surveys to avoid the
presence of non-harmonized variables. All regional NUTS codes become
&lt;code&gt;geo&lt;/code&gt; in our case:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;geography &amp;lt;- harmonize_var_names(eb_waves, 
                                 eb_regional_metadata)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you are used to working with single survey files, you are likely to work
in a tabular format, which easily converts into a data.frame-like
object, in our example, to tidyverse’s &lt;code&gt;tibble&lt;/code&gt;. However, when working
with longitudinal data, it is far simpler to work with nested lists,
because the tables usually have different dimensions (neither the rows
corresponding to observations nor the columns are the same across all
survey files).&lt;/p&gt;
&lt;p&gt;In the nested list, each list element is a single, tabular-format
survey. (In fact, the surveys are in retroharmonize’s
&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/reference/survey.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;survey&lt;/a&gt;
class, which is a rich tibble that contains the metadata and the
processing history of the survey.)&lt;/p&gt;
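&lt;p&gt;To see why tables with different columns cannot simply be stacked, here
is a small base R illustration with two made-up tables. The object names
are hypothetical; &lt;code&gt;merge(all = TRUE)&lt;/code&gt; is used as a base R stand-in for a
full join.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Two made-up survey tables; the second one has an extra column
wave_a &amp;lt;- data.frame(rowid = c(&amp;quot;a_1&amp;quot;, &amp;quot;a_2&amp;quot;), wex = c(1.2, 0.8),
                     stringsAsFactors = FALSE)
wave_b &amp;lt;- data.frame(rowid = &amp;quot;b_1&amp;quot;, wex = 1.0, type_of_community = &amp;quot;Rural&amp;quot;,
                     stringsAsFactors = FALSE)

# Stacking fails because the numbers of columns do not match
stacked &amp;lt;- try(rbind(wave_a, wave_b), silent = TRUE)
inherits(stacked, &amp;quot;try-error&amp;quot;)
## [1] TRUE

# A full join on the shared columns keeps every row and fills the gaps with NA
joined &amp;lt;- Reduce(function(x, y) merge(x, y, all = TRUE), list(wave_a, wave_b))
joined
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Keeping the surveys in a list lets you postpone this decision until every
table has been subsetted to the variables you actually harmonize.&lt;/p&gt;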
&lt;p&gt;The regional information in the Eurobarometer files is contained in the
&lt;code&gt;nuts&lt;/code&gt; variable. We want to keep both the original labels and values.
The original values are the region’s codes, and the labels are the
names. The easiest and fastest solution is the base R &lt;code&gt;lapply&lt;/code&gt; loop.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;geography &amp;lt;- lapply ( geography, 
                      function (x) x %&amp;gt;% mutate ( region = as_character(geo), 
                                                  geo    = as.character(geo) )  
)
&lt;/code&gt;&lt;/pre&gt;
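&lt;p&gt;In the code above, &lt;code&gt;as_character()&lt;/code&gt; yields the labels (the region names) and
base R’s &lt;code&gt;as.character()&lt;/code&gt; yields the values (the region codes). The idea of
two text representations of one labelled variable can be imitated in base R
with a named vector. This is a simplified sketch with made-up object names,
not the internal representation used by &lt;code&gt;retroharmonize&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Made-up value labels: the element names are the labels, the elements are the codes
nuts_labels &amp;lt;- c(&amp;quot;Sjaelland&amp;quot; = &amp;quot;DK02&amp;quot;, &amp;quot;Lisboa&amp;quot; = &amp;quot;PT17&amp;quot;, &amp;quot;Jihovychod&amp;quot; = &amp;quot;CZ06&amp;quot;)
geo_codes   &amp;lt;- c(&amp;quot;PT17&amp;quot;, &amp;quot;DK02&amp;quot;, &amp;quot;PT17&amp;quot;)

codes_as_text  &amp;lt;- as.character(geo_codes)                          # the values
labels_as_text &amp;lt;- names(nuts_labels)[match(geo_codes, nuts_labels)]  # the labels

data.frame(geo = codes_as_text, region = labels_as_text)
&lt;/code&gt;&lt;/pre&gt;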
&lt;p&gt;Because each table has exactly the same columns, we can simply use
&lt;code&gt;rbind()&lt;/code&gt; and reduce the list to a modern &lt;code&gt;data.frame&lt;/code&gt;, i.e. a &lt;code&gt;tibble&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;geography &amp;lt;- as_tibble ( Reduce (rbind, geography) )
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Let’s see a dozen cases:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;set.seed(2021)
sample_n(geography, 12)

## # A tibble: 12 x 4
##    rowid               isocntry geo   region              
##    &amp;lt;chr&amp;gt;               &amp;lt;chr&amp;gt;    &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt;               
##  1 ZA7488_v1-0-0_7016  SI       SI012 Podravska           
##  2 ZA7488_v1-0-0_19187 PL       PL63  Pomorskie           
##  3 ZA6861_v1-2-0_1218  DK       DK02  Sjaelland           
##  4 ZA6861_v1-2-0_4142  FI       FI1B  Helsinki-Uusimaa    
##  5 ZA7572_v1-0-0_12363 SE       SE12  Oestra Mellansverige
##  6 ZA7572_v1-0-0_8071  IT       ITH   Nord-Est [IT]       
##  7 ZA6861_v1-2-0_6145  IE       IE021 Dublin              
##  8 ZA6861_v1-2-0_24638 RO       RO31  South [RO]          
##  9 ZA7488_v1-0-0_11315 CY       CY    REPUBLIC OF CYPRUS  
## 10 ZA6595_v3-0-0_27568 HR       HR041 Grad Zagreb         
## 11 ZA7572_v1-0-0_17397 CZ       CZ06  Jihovychod          
## 12 ZA6861_v1-2-0_10993 PT       PT17  Lisboa
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The idea is that we do similar variable harmonization block by block,
and eventually we will join them together. Next step: socio-demography
and weights.&lt;/p&gt;
&lt;h2 id=&#34;socio-demography-and-weights&#34;&gt;Socio-demography and Weights&lt;/h2&gt;
&lt;p&gt;There are a few peculiar issues to look out for. This example shows that
survey harmonization requires plenty of expert judgment, and you cannot
fully automate the process.&lt;/p&gt;
&lt;p&gt;The Eurobarometer archives do not use all weight and demographic
variable names consistently. For example, the &lt;code&gt;wex&lt;/code&gt; variable, which is a
projected weight for the country’s population aged 15 years or older, is
sometimes called &lt;code&gt;wex&lt;/code&gt;, sometimes &lt;code&gt;wextra&lt;/code&gt;. The individual survey’s
post-stratification weight is the &lt;code&gt;w1&lt;/code&gt; variable, but this is not
necessarily what you need to use.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;suggest_var_names()&lt;/code&gt; function has a parameter,
&lt;code&gt;survey_program = &amp;quot;eurobarometer&amp;quot;&lt;/code&gt;, which normalizes the most frequently used
variable names. For example, all variations of wex and wextra will be normalized
to wex. You can ignore this parameter and use your own names, too.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;eb_demography_metadata  &amp;lt;- eb_climate_metadata %&amp;gt;%
  filter ( grepl( &amp;quot;rowid|isocntry|^d8$|^d7$|^wex|^w1$|d25|^d15a|^d11$&amp;quot;, .data$var_name_orig) ) %&amp;gt;%
  suggest_var_names( survey_program = &amp;quot;eurobarometer&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As you can see, using the original labels would not help, because they
also contain various alterations.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;eb_demography_metadata %&amp;gt;%
  select ( filename, var_name_orig, label_orig, var_name_suggested ) %&amp;gt;%
  filter (var_name_orig %in% c(&amp;quot;wex&amp;quot;, &amp;quot;wextra&amp;quot;) )

##            filename var_name_orig                                  label_orig
## 1 ZA5877_v2-0-0.sav        wextra      weight extrapolated population 15 plus
## 2 ZA6595_v3-0-0.sav        wextra      weight extrapolated population 15 plus
## 3 ZA6861_v1-2-0.sav           wex weight extrapolated population aged 15 plus
## 4 ZA7488_v1-0-0.sav           wex weight extrapolated population aged 15 plus
## 5 ZA7572_v1-0-0.sav           wex weight extrapolated population aged 15 plus
##   var_name_suggested
## 1                wex
## 2                wex
## 3                wex
## 4                wex
## 5                wex

demography &amp;lt;- harmonize_var_names ( waves = eb_waves, 
                                    metadata = eb_demography_metadata ) 
&lt;/code&gt;&lt;/pre&gt;
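&lt;p&gt;If you prefer your own naming rules, the effect of the normalization shown
above for &lt;code&gt;wex&lt;/code&gt; and &lt;code&gt;wextra&lt;/code&gt; can be imitated with a regular expression. This
is a minimal base R sketch on a made-up metadata excerpt; &lt;code&gt;toy_meta&lt;/code&gt; is a
hypothetical object and the rule covers only this one variable.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Made-up excerpt of a metadata table
toy_meta &amp;lt;- data.frame(
  filename      = c(&amp;quot;ZA5877_v2-0-0.sav&amp;quot;, &amp;quot;ZA6861_v1-2-0.sav&amp;quot;, &amp;quot;ZA6861_v1-2-0.sav&amp;quot;),
  var_name_orig = c(&amp;quot;wextra&amp;quot;, &amp;quot;wex&amp;quot;, &amp;quot;w1&amp;quot;),
  stringsAsFactors = FALSE
)

# Map both spellings of the projected weight to a single name
toy_meta$var_name_suggested &amp;lt;- sub(&amp;quot;^wex(tra)?$&amp;quot;, &amp;quot;wex&amp;quot;, toy_meta$var_name_orig)
toy_meta
&lt;/code&gt;&lt;/pre&gt;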
&lt;p&gt;Socio-demographic variables like the level of highest education or
occupation are rather country-specific. Eurobarometer uses standardized
occupation and marital status scales, and a proxy for education levels,
the age of leaving full-time education.&lt;/p&gt;
&lt;p&gt;This is a particularly tricky variable, because its coding in fact
contains three different variables: the school leaving age, a category for
those who are still students, and a category for people who did not finish their compulsory
primary school. And while school leaving age has been a good proxy since the
1970s, it becomes less and less useful in an age when the EU is promoting
lifelong learning, as people stop and restart their education
throughout their lives.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;example &amp;lt;- demography[[1]] %&amp;gt;%
  mutate ( across ( -any_of(c(&amp;quot;rowid&amp;quot;, &amp;quot;w1&amp;quot;, &amp;quot;wex&amp;quot;)), as_character) ) %&amp;gt;%
  mutate ( across (any_of(c(&amp;quot;w1&amp;quot;, &amp;quot;wex&amp;quot;)), as_numeric) )
unique ( example$age_education )

##  [1] &amp;quot;22&amp;quot;                     &amp;quot;25&amp;quot;                     &amp;quot;17&amp;quot;                    
##  [4] &amp;quot;19&amp;quot;                     &amp;quot;12&amp;quot;                     &amp;quot;23&amp;quot;                    
##  [7] &amp;quot;18&amp;quot;                     &amp;quot;20&amp;quot;                     &amp;quot;21&amp;quot;                    
## [10] &amp;quot;14&amp;quot;                     &amp;quot;24&amp;quot;                     &amp;quot;16&amp;quot;                    
## [13] &amp;quot;26&amp;quot;                     &amp;quot;15&amp;quot;                     &amp;quot;Still studying&amp;quot;        
## [16] &amp;quot;DK&amp;quot;                     &amp;quot;31&amp;quot;                     &amp;quot;29&amp;quot;                    
## [19] &amp;quot;27&amp;quot;                     &amp;quot;13&amp;quot;                     &amp;quot;32&amp;quot;                    
## [22] &amp;quot;28&amp;quot;                     &amp;quot;30&amp;quot;                     &amp;quot;53&amp;quot;                    
## [25] &amp;quot;42&amp;quot;                     &amp;quot;62&amp;quot;                     &amp;quot;40&amp;quot;                    
## [28] &amp;quot;No full-time education&amp;quot; &amp;quot;Refusal&amp;quot;                &amp;quot;37&amp;quot;                    
## [31] &amp;quot;39&amp;quot;                     &amp;quot;34&amp;quot;                     &amp;quot;35&amp;quot;                    
## [34] &amp;quot;47&amp;quot;                     &amp;quot;36&amp;quot;                     &amp;quot;45&amp;quot;                    
## [37] &amp;quot;51&amp;quot;                     &amp;quot;33&amp;quot;                     &amp;quot;43&amp;quot;                    
## [40] &amp;quot;38&amp;quot;                     &amp;quot;49&amp;quot;                     &amp;quot;46&amp;quot;                    
## [43] &amp;quot;41&amp;quot;                     &amp;quot;57&amp;quot;                     &amp;quot;7&amp;quot;                     
## [46] &amp;quot;48&amp;quot;                     &amp;quot;44&amp;quot;                     &amp;quot;50&amp;quot;                    
## [49] &amp;quot;56&amp;quot;                     &amp;quot;8&amp;quot;                      &amp;quot;11&amp;quot;                    
## [52] &amp;quot;10&amp;quot;                     &amp;quot;9&amp;quot;                      &amp;quot;75 years&amp;quot;              
## [55] &amp;quot;6&amp;quot;                      &amp;quot;3&amp;quot;                      &amp;quot;54&amp;quot;                    
## [58] &amp;quot;55&amp;quot;                     &amp;quot;60&amp;quot;                     &amp;quot;64&amp;quot;                    
## [61] &amp;quot;2 years&amp;quot;                &amp;quot;58&amp;quot;                     &amp;quot;52&amp;quot;                    
## [64] &amp;quot;72&amp;quot;                     &amp;quot;61&amp;quot;                     &amp;quot;4&amp;quot;                     
## [67] &amp;quot;63&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The seemingly trivial &lt;code&gt;age_exact&lt;/code&gt; variable has its own issues, too:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;unique ( example$age_exact)

##  [1] &amp;quot;54&amp;quot;       &amp;quot;66&amp;quot;       &amp;quot;56&amp;quot;       &amp;quot;53&amp;quot;       &amp;quot;33&amp;quot;       &amp;quot;72&amp;quot;      
##  [7] &amp;quot;83&amp;quot;       &amp;quot;62&amp;quot;       &amp;quot;86&amp;quot;       &amp;quot;77&amp;quot;       &amp;quot;64&amp;quot;       &amp;quot;46&amp;quot;      
## [13] &amp;quot;44&amp;quot;       &amp;quot;59&amp;quot;       &amp;quot;60&amp;quot;       &amp;quot;67&amp;quot;       &amp;quot;63&amp;quot;       &amp;quot;20&amp;quot;      
## [19] &amp;quot;43&amp;quot;       &amp;quot;37&amp;quot;       &amp;quot;78&amp;quot;       &amp;quot;49&amp;quot;       &amp;quot;90&amp;quot;       &amp;quot;45&amp;quot;      
## [25] &amp;quot;28&amp;quot;       &amp;quot;29&amp;quot;       &amp;quot;30&amp;quot;       &amp;quot;39&amp;quot;       &amp;quot;51&amp;quot;       &amp;quot;38&amp;quot;      
## [31] &amp;quot;41&amp;quot;       &amp;quot;71&amp;quot;       &amp;quot;25&amp;quot;       &amp;quot;48&amp;quot;       &amp;quot;79&amp;quot;       &amp;quot;88&amp;quot;      
## [37] &amp;quot;61&amp;quot;       &amp;quot;85&amp;quot;       &amp;quot;70&amp;quot;       &amp;quot;35&amp;quot;       &amp;quot;81&amp;quot;       &amp;quot;52&amp;quot;      
## [43] &amp;quot;57&amp;quot;       &amp;quot;27&amp;quot;       &amp;quot;47&amp;quot;       &amp;quot;15 years&amp;quot; &amp;quot;21&amp;quot;       &amp;quot;42&amp;quot;      
## [49] &amp;quot;32&amp;quot;       &amp;quot;68&amp;quot;       &amp;quot;36&amp;quot;       &amp;quot;34&amp;quot;       &amp;quot;19&amp;quot;       &amp;quot;31&amp;quot;      
## [55] &amp;quot;26&amp;quot;       &amp;quot;23&amp;quot;       &amp;quot;24&amp;quot;       &amp;quot;22&amp;quot;       &amp;quot;16&amp;quot;       &amp;quot;84&amp;quot;      
## [61] &amp;quot;65&amp;quot;       &amp;quot;18&amp;quot;       &amp;quot;55&amp;quot;       &amp;quot;40&amp;quot;       &amp;quot;50&amp;quot;       &amp;quot;73&amp;quot;      
## [67] &amp;quot;69&amp;quot;       &amp;quot;87&amp;quot;       &amp;quot;89&amp;quot;       &amp;quot;74&amp;quot;       &amp;quot;75&amp;quot;       &amp;quot;98 years&amp;quot;
## [73] &amp;quot;76&amp;quot;       &amp;quot;80&amp;quot;       &amp;quot;58&amp;quot;       &amp;quot;82&amp;quot;       &amp;quot;17&amp;quot;       &amp;quot;93&amp;quot;      
## [79] &amp;quot;91&amp;quot;       &amp;quot;92&amp;quot;       &amp;quot;95&amp;quot;       &amp;quot;94&amp;quot;       &amp;quot;97&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Let’s see all the strange labels attached to &lt;code&gt;age&lt;/code&gt;-type variables:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;collect_val_labels(metadata = eb_demography_metadata %&amp;gt;%
                     filter ( var_name_suggested %in% c(&amp;quot;age_exact&amp;quot;, &amp;quot;age_education&amp;quot;)) )

##  [1] &amp;quot;2 years&amp;quot;                  &amp;quot;75 years&amp;quot;                
##  [3] &amp;quot;No full-time education&amp;quot;   &amp;quot;Still studying&amp;quot;          
##  [5] &amp;quot;15 years&amp;quot;                 &amp;quot;98 years&amp;quot;                
##  [7] &amp;quot;96 years&amp;quot;                 &amp;quot;[NOT CLEARLY DOCUMENTED]&amp;quot;
##  [9] &amp;quot;74 years&amp;quot;                 &amp;quot;99 and older&amp;quot;            
## [11] &amp;quot;Refusal&amp;quot;                  &amp;quot;87 years&amp;quot;                
## [13] &amp;quot;DK&amp;quot;                       &amp;quot;88 years&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We must handle many exceptions, so we created a function for this
purpose:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;remove_years  &amp;lt;- function(x) { 
  x &amp;lt;- gsub(&amp;quot;years|and\\solder&amp;quot;, &amp;quot;&amp;quot;, tolower(x))
  stringr::str_trim (x, &amp;quot;both&amp;quot;)}

process_demography &amp;lt;- function (x) { 
  
  x %&amp;gt;% mutate ( across ( -any_of(c(&amp;quot;rowid&amp;quot;, &amp;quot;w1&amp;quot;, &amp;quot;wex&amp;quot;)), as_character) ) %&amp;gt;%
    mutate ( across (any_of(c(&amp;quot;w1&amp;quot;, &amp;quot;wex&amp;quot;)), as_numeric) ) %&amp;gt;%
    mutate ( across (contains(&amp;quot;age&amp;quot;), remove_years)) %&amp;gt;%
    mutate ( age_exact = as.numeric (age_exact)) %&amp;gt;%
    mutate ( is_student = ifelse ( tolower(age_education) == &amp;quot;still studying&amp;quot;, 
                                   1, 0), 
             no_education = ifelse ( tolower(age_education) == &amp;quot;no full-time education&amp;quot;, 1, 0)) %&amp;gt;%
    mutate ( education = case_when (
      grepl(&amp;quot;studying&amp;quot;, age_education) ~ age_exact, 
      grepl (&amp;quot;education&amp;quot;, age_education)  ~ 14, 
      grepl (&amp;quot;refus|document|dk&amp;quot;, tolower(age_education)) ~ NA_real_,
      TRUE ~ as.numeric(age_education)
    ))  %&amp;gt;%
    mutate ( education = case_when ( 
      education &amp;lt; 14 ~ NA_real_, 
      education &amp;gt; 30 ~ 30, 
      TRUE ~ education )) 
}

demography &amp;lt;- lapply ( demography, process_demography )

## Warning in eval_tidy(pair$rhs, env = default_env): NAs introduced by coercion

## Warning in mask$eval_all_mutate(quo): NAs introduced by coercion

## Warning in eval_tidy(pair$rhs, env = default_env): NAs introduced by coercion

## Warning in eval_tidy(pair$rhs, env = default_env): NAs introduced by coercion

## Warning in eval_tidy(pair$rhs, env = default_env): NAs introduced by coercion

## Warning in eval_tidy(pair$rhs, env = default_env): NAs introduced by coercion

## We&#39;ll use a full join and not rbind, because we have different variables in different waves.
demography &amp;lt;- Reduce ( full_join, demography )

## Joining, by = c(&amp;quot;rowid&amp;quot;, &amp;quot;isocntry&amp;quot;, &amp;quot;w1&amp;quot;, &amp;quot;wex&amp;quot;, &amp;quot;marital_status&amp;quot;, &amp;quot;age_education&amp;quot;, &amp;quot;age_exact&amp;quot;, &amp;quot;occupation_of_respondent&amp;quot;, &amp;quot;occupation_of_respondent_recoded&amp;quot;, &amp;quot;respondent_occupation_scale_c_14&amp;quot;, &amp;quot;type_of_community&amp;quot;, &amp;quot;is_student&amp;quot;, &amp;quot;no_education&amp;quot;, &amp;quot;education&amp;quot;)
## Joining, by = c(&amp;quot;rowid&amp;quot;, &amp;quot;isocntry&amp;quot;, &amp;quot;w1&amp;quot;, &amp;quot;wex&amp;quot;, &amp;quot;marital_status&amp;quot;, &amp;quot;age_education&amp;quot;, &amp;quot;age_exact&amp;quot;, &amp;quot;occupation_of_respondent&amp;quot;, &amp;quot;occupation_of_respondent_recoded&amp;quot;, &amp;quot;respondent_occupation_scale_c_14&amp;quot;, &amp;quot;type_of_community&amp;quot;, &amp;quot;is_student&amp;quot;, &amp;quot;no_education&amp;quot;, &amp;quot;education&amp;quot;)
## Joining, by = c(&amp;quot;rowid&amp;quot;, &amp;quot;isocntry&amp;quot;, &amp;quot;w1&amp;quot;, &amp;quot;wex&amp;quot;, &amp;quot;marital_status&amp;quot;, &amp;quot;age_education&amp;quot;, &amp;quot;age_exact&amp;quot;, &amp;quot;occupation_of_respondent&amp;quot;, &amp;quot;occupation_of_respondent_recoded&amp;quot;, &amp;quot;respondent_occupation_scale_c_14&amp;quot;, &amp;quot;type_of_community&amp;quot;, &amp;quot;is_student&amp;quot;, &amp;quot;no_education&amp;quot;, &amp;quot;education&amp;quot;)
## Joining, by = c(&amp;quot;rowid&amp;quot;, &amp;quot;isocntry&amp;quot;, &amp;quot;w1&amp;quot;, &amp;quot;wex&amp;quot;, &amp;quot;marital_status&amp;quot;, &amp;quot;age_education&amp;quot;, &amp;quot;age_exact&amp;quot;, &amp;quot;occupation_of_respondent&amp;quot;, &amp;quot;occupation_of_respondent_recoded&amp;quot;, &amp;quot;respondent_occupation_scale_c_14&amp;quot;, &amp;quot;type_of_community&amp;quot;, &amp;quot;is_student&amp;quot;, &amp;quot;no_education&amp;quot;, &amp;quot;education&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now let’s see what we have here:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;set.seed(2021)
sample_n(demography, 12)

## # A tibble: 12 x 14
##    rowid    isocntry    w1    wex marital_status        age_education  age_exact
##    &amp;lt;chr&amp;gt;    &amp;lt;chr&amp;gt;    &amp;lt;dbl&amp;gt;  &amp;lt;dbl&amp;gt; &amp;lt;chr&amp;gt;                 &amp;lt;chr&amp;gt;              &amp;lt;dbl&amp;gt;
##  1 ZA7488_~ SI       0.828  1428. (Re-)Married: withou~ 19                    43
##  2 ZA7488_~ PL       1.01  32830. (Re-)Married: withou~ 19                    64
##  3 ZA6861_~ DK       0.641  3100. (Re-)Married: withou~ 22                    78
##  4 ZA6861_~ FI       1.83   8601. (Re-)Married: childr~ 30                    38
##  5 ZA7572_~ SE       0.342  2645. (Re-)Married: withou~ 17                    68
##  6 ZA7572_~ IT       0.630 32287. (Re-)Married: childr~ 20                    40
##  7 ZA6861_~ IE       0.868  3054. (Re-)Married: childr~ 32                    42
##  8 ZA6861_~ RO       0.724 11805. (Re-)Married: withou~ 14                    59
##  9 ZA7488_~ CY       0.691  1013. (Re-)Married: childr~ 18                    67
## 10 ZA6595_~ HR       0.580  2098. Single living w part~ 27                    30
## 11 ZA7572_~ CZ       1.86  16908. Single: without chil~ still studying        20
## 12 ZA6861_~ PT       0.932  7448. Widow: with children  no full-time ~        84
## # ... with 7 more variables: occupation_of_respondent &amp;lt;chr&amp;gt;,
## #   occupation_of_respondent_recoded &amp;lt;chr&amp;gt;,
## #   respondent_occupation_scale_c_14 &amp;lt;chr&amp;gt;, type_of_community &amp;lt;chr&amp;gt;,
## #   is_student &amp;lt;dbl&amp;gt;, no_education &amp;lt;dbl&amp;gt;, education &amp;lt;dbl&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
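&lt;p&gt;The education rules inside &lt;code&gt;process_demography()&lt;/code&gt; can be checked on a
small, made-up vector. The base R function below is a hypothetical
re-statement of those rules for illustration only: students get their exact
age, &lt;code&gt;no full-time education&lt;/code&gt; becomes 14, refusals become missing, values
below 14 are set to missing and values above 30 are capped at 30.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Hypothetical base R re-statement of the education recoding rules
recode_education &amp;lt;- function(age_education, age_exact) {
  x   &amp;lt;- trimws(gsub(&amp;quot;years|and\\solder&amp;quot;, &amp;quot;&amp;quot;, tolower(age_education)))
  out &amp;lt;- suppressWarnings(as.numeric(x))   # plain ages; all other labels become NA
  is_student &amp;lt;- grepl(&amp;quot;studying&amp;quot;, x)
  out[is_student] &amp;lt;- age_exact[is_student]  # students: use the exact age
  out[grepl(&amp;quot;education&amp;quot;, x)] &amp;lt;- 14           # no full-time education
  out[which(out &amp;lt; 14)] &amp;lt;- NA_real_          # implausibly low values
  out[which(out &amp;gt; 30)] &amp;lt;- 30                # cap at 30
  out
}

recoded &amp;lt;- recode_education(
  age_education = c(&amp;quot;22&amp;quot;, &amp;quot;75 years&amp;quot;, &amp;quot;Still studying&amp;quot;,
                    &amp;quot;No full-time education&amp;quot;, &amp;quot;Refusal&amp;quot;, &amp;quot;7&amp;quot;),
  age_exact     = c(40, 80, 20, 60, 50, 30)
)
recoded
## [1] 22 30 20 14 NA NA
&lt;/code&gt;&lt;/pre&gt;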
&lt;h2 id=&#34;harmonizing-variable-labels&#34;&gt;Harmonizing Variable Labels&lt;/h2&gt;
&lt;p&gt;So far we have been working with metadata, weights and socio-demography.
In other words, we have not even started the desired harmonization of
climate change awareness. The methodology is the same, but here we
really must look out for the answer options in the questionnaire. (Refer
to our data summary again
&lt;a href=&#34;http://netzero.dataobservatory.eu/post/2021-03-04-eurobarometer_data/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;here&lt;/a&gt;.)&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;climate_awareness_metadata &amp;lt;- eb_climate_metadata %&amp;gt;%
  suggest_var_names( survey_program = &amp;quot;eurobarometer&amp;quot; ) %&amp;gt;%
  filter ( .data$var_name_suggested  %in% c(&amp;quot;rowid&amp;quot;,
                                            &amp;quot;serious_world_problems_first&amp;quot;, 
                                             &amp;quot;serious_world_problems_climate_change&amp;quot;)
  ) 

hw &amp;lt;- harmonize_var_names ( waves = eb_waves, 
                            metadata = climate_awareness_metadata )
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;retroharmonize&lt;/code&gt; package comes with a generic
&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/reference/harmonize_waves.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;harmonize_values()&lt;/a&gt;
function that changes the value labels of categorical variables
(including binary ones) to a unified format. It also takes care of
various types of missing values.&lt;/p&gt;
&lt;p&gt;First, let’s go back to our metadata and collect all value labels that
will show up with
&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/reference/collect_val_labels.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;collect_val_labels()&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;collect_val_labels(climate_awareness_metadata)

##  [1] &amp;quot;Climate change&amp;quot;                            
##  [2] &amp;quot;International terrorism&amp;quot;                   
##  [3] &amp;quot;Poverty, hunger and lack of drinking water&amp;quot;
##  [4] &amp;quot;Spread of infectious diseases&amp;quot;             
##  [5] &amp;quot;The economic situation&amp;quot;                    
##  [6] &amp;quot;Proliferation of nuclear weapons&amp;quot;          
##  [7] &amp;quot;Armed conflicts&amp;quot;                           
##  [8] &amp;quot;The increasing global population&amp;quot;          
##  [9] &amp;quot;Other (SPONTANEOUS)&amp;quot;                       
## [10] &amp;quot;None (SPONTANEOUS)&amp;quot;                        
## [11] &amp;quot;Not mentioned&amp;quot;                             
## [12] &amp;quot;Mentioned&amp;quot;                                 
## [13] &amp;quot;DK&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In this case, we want to select &lt;code&gt;Climate change&lt;/code&gt; as the mentioned &lt;em&gt;most
serious problem&lt;/em&gt;, and &lt;code&gt;Climate change&lt;/code&gt; taken from a list of three
serious problems. The first question type is a single-choice one, where
&lt;code&gt;Climate change&lt;/code&gt; is either mentioned, or the alternative answer is
labeled as &lt;code&gt;Not mentioned&lt;/code&gt;. In the multiple choice case, the alternative
may be something else, for example, &lt;code&gt;Spread of infectious diseases&lt;/code&gt;, as
we all know well by 2021.&lt;/p&gt;
&lt;p&gt;We want to see who thought &lt;code&gt;Climate change&lt;/code&gt; was the most serious
problem, or one of the most serious problems, so we label each mention
of &lt;code&gt;Climate change&lt;/code&gt; as &lt;code&gt;mentioned&lt;/code&gt; and pair it with a numeric value
of &lt;code&gt;1&lt;/code&gt;. All other cases are labeled as &lt;code&gt;not_mentioned&lt;/code&gt;, with the
exception of the various missing observations, which in these cases are
&lt;code&gt;Do not know&lt;/code&gt; answers, &lt;code&gt;Declined to answer&lt;/code&gt; cases, and &lt;code&gt;Inappropriate&lt;/code&gt;
cases. (The latter is Eurobarometer’s label for questions that were,
for one reason or another, not asked of a particular interviewee, for
example, because the Turkish Cypriot community received a different
questionnaire.)&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# positive cases
label_1 &amp;lt;- c(&amp;quot;^Climate\\schange&amp;quot;, &amp;quot;^Mentioned&amp;quot;)
# missing cases 
na_labels &amp;lt;- collect_na_labels( climate_awareness_metadata)
na_labels

## [1] &amp;quot;DK&amp;quot;                             &amp;quot;Inap. (10 or 11 in qa1a)&amp;quot;      
## [3] &amp;quot;Inap. (coded 10 or 11 in qc1a)&amp;quot; &amp;quot;Inap. (coded 10 or 11 in qb1a)&amp;quot;

# negative cases
label_0 &amp;lt;- collect_val_labels( climate_awareness_metadata)
label_0 &amp;lt;- label_0[! label_0 %in% label_1 ]
&lt;/code&gt;&lt;/pre&gt;
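&lt;p&gt;Note that the entries of &lt;code&gt;label_1&lt;/code&gt; are regular expressions, not literal labels. A minimal base R sketch of the difference, assuming a short, hand-copied excerpt of the value labels listed above (the &lt;code&gt;labels_excerpt&lt;/code&gt; and &lt;code&gt;patterns&lt;/code&gt; objects exist only for this illustration):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# hand-copied excerpt of the value labels, for illustration only
labels_excerpt &amp;lt;- c(&amp;quot;Climate change&amp;quot;, &amp;quot;International terrorism&amp;quot;,
                    &amp;quot;Mentioned&amp;quot;, &amp;quot;Not mentioned&amp;quot;)
patterns &amp;lt;- c(&amp;quot;^Climate\\schange&amp;quot;, &amp;quot;^Mentioned&amp;quot;)

# regular expression matching: TRUE FALSE TRUE FALSE
grepl(paste(patterns, collapse = &amp;quot;|&amp;quot;), labels_excerpt)

# exact string comparison: all FALSE, because no label equals a pattern
labels_excerpt %in% patterns
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because &lt;code&gt;%in%&lt;/code&gt; compares strings literally, it is worth printing &lt;code&gt;label_0&lt;/code&gt; and checking which labels it still contains before building the harmonization list.&lt;/p&gt;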
&lt;p&gt;The &lt;code&gt;harmonize_serious_problems()&lt;/code&gt; function harmonizes the labels within
the special labeled class of &lt;code&gt;retroharmonize&lt;/code&gt;. This class retains all
information needed to give categorical variables a character or numeric
representation, along with various processing metadata for documentation
purposes. While this class is very rich (it contains whatever was
imported from SPSS’s proprietary data format and the history), it is not
suitable for statistical analysis. We could, of course, directly call
the
&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/reference/harmonize_values.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;harmonize_values()&lt;/a&gt;
function from the retroharmonize package, but the parameterization would be very
complicated even in a single function call, not to mention a looped
call. Because this function is the heart of the
&lt;code&gt;retroharmonize&lt;/code&gt; package, it has &lt;a href=&#34;https://retroharmonize.dataobservatory.eu/articles/harmonize_labels.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;a tutorial
article&lt;/a&gt;
of its own.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;harmonize_serious_problems &amp;lt;- function(x) {
  label_list &amp;lt;- list(
    from = c(label_0, label_1, na_labels), 
    to = c( rep ( &amp;quot;not_mentioned&amp;quot;, length(label_0) ),   # use the same order as in from!
            rep ( &amp;quot;mentioned&amp;quot;, length(label_1) ),
            &amp;quot;do_not_know&amp;quot;, &amp;quot;inap&amp;quot;, &amp;quot;inap&amp;quot;, &amp;quot;inap&amp;quot;), 
    numeric_values = c(rep ( 0, length(label_0) ), # use the same order as in from!
                       rep ( 1, length(label_1) ),
                       99997,99999,99999,99999)
  )
  
  harmonize_values(x, 
                   harmonize_labels = label_list, 
                   na_values = c(&amp;quot;do_not_know&amp;quot;=99997,
                                 &amp;quot;declined&amp;quot;=99998,
                                 &amp;quot;inap&amp;quot;=99999), 
                   remove = &amp;quot;\\(|\\)|\\[|\\]|\\%&amp;quot;
  )
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Our objects are rather big in memory, so first, let’s remove the surveys
that do not contain these world problem variables. In these cases, the
subsetted and harmonized surveys in the nested list have only one
column, i.e. the &lt;code&gt;rowid&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;hw &amp;lt;- hw[unlist ( lapply ( hw, ncol)) &amp;gt; 1 ]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now we have a smaller problem to deal with. With many surveys, it is
easy to fill up your computer’s memory, so let’s start building up our
joined panel data from a smaller set of nested, subsetted surveys.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;hw &amp;lt;- lapply ( hw, function (x) x %&amp;gt;% mutate ( across ( contains(&amp;quot;problem&amp;quot;), harmonize_serious_problems) ) )
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Our &lt;code&gt;lapply&lt;/code&gt; loop calls an anonymous function which in turn calls the
&lt;code&gt;harmonize_serious_problems&lt;/code&gt; parameterized version of the
&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/reference/harmonize_values.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;harmonize_values()&lt;/a&gt;
on all variables that have &lt;code&gt;problem&lt;/code&gt; in their names.&lt;/p&gt;
&lt;p&gt;Once we are done, our variables have harmonized names, harmonized
values, and harmonized labels, but they are stored in the complex
&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/articles/harmonize_labels.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;retroharmonize_labelled_spss_survey&lt;/a&gt;
class, inherited from the &lt;code&gt;haven_labelled_spss&lt;/code&gt; in
&lt;a href=&#34;https://haven.tidyverse.org/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;haven&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;We reduced our single and multiple choice questions to binary choice
variables. We can now give them a numeric representation. Be mindful
that &lt;code&gt;retroharmonize&lt;/code&gt; has special methods for its special labeled class
that retains metadata from SPSS. This means that &lt;code&gt;as_character&lt;/code&gt; and
&lt;code&gt;as_numeric&lt;/code&gt; know how to handle various types of missing values,
whereas the base R &lt;code&gt;as.character&lt;/code&gt; and &lt;code&gt;as.numeric&lt;/code&gt; may coerce special
values to unwanted results. This is particularly dangerous with numeric
variables, and this is the reason why we introduced a new set of S3
objects and methods in the package.&lt;/p&gt;
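&lt;p&gt;To see why this matters, here is a minimal base R sketch with made-up values. It does not use the &lt;code&gt;retroharmonize&lt;/code&gt; methods; it only imitates what happens when the special codes for missing answers (such as &lt;code&gt;99997&lt;/code&gt; and &lt;code&gt;99999&lt;/code&gt; above) are treated as ordinary numbers:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# made-up binary answers with two special missing codes mixed in
x &amp;lt;- c(0, 1, 1, 99997, 0, 99999)
na_codes &amp;lt;- c(99997, 99998, 99999)

# the missing codes dominate the average
mean(x)

# recode the special values to NA before any numeric summary
x_clean &amp;lt;- ifelse(x %in% na_codes, NA_real_, x)
mean(x_clean, na.rm = TRUE)  # 0.5, the share of 1s among the valid answers
&lt;/code&gt;&lt;/pre&gt;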
&lt;p&gt;We will ignore the differences between various forms of missingness,
i.e. the person said that she did not know, or did not want to answer,
or for some reason was not asked in the survey. In a more descriptive,
non-harmonized analysis you would probably want to explore them as
various ‘categories’ and use a character representation.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;hw &amp;lt;- lapply ( hw, function(x) x %&amp;gt;% mutate ( across ( contains(&amp;quot;problem&amp;quot;), as_numeric) ))

hw &amp;lt;- Reduce ( full_join, hw) # we must use joins instead of binds because the number of columns varies.
&lt;/code&gt;&lt;/pre&gt;
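&lt;p&gt;The join step can be tried on toy data first. A minimal sketch, assuming &lt;code&gt;dplyr&lt;/code&gt; is installed; the two small tables and their column names are invented for illustration:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;library(dplyr)

# two invented surveys: the second has one more column than the first
survey_a &amp;lt;- data.frame(rowid = c(&amp;quot;a_1&amp;quot;, &amp;quot;a_2&amp;quot;), problem_first = c(0, 1))
survey_b &amp;lt;- data.frame(rowid = c(&amp;quot;b_1&amp;quot;, &amp;quot;b_2&amp;quot;), problem_first = c(1, 0),
                       problem_climate = c(0, 1))

# full_join() matches on the shared columns and fills the gaps with NA
joined &amp;lt;- Reduce(full_join, list(survey_a, survey_b))
dim(joined)  # 4 rows, 3 columns
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The rows of the first toy survey get &lt;code&gt;NA&lt;/code&gt; in the column that only the second one has, which is the same pattern we expect in the real joined table.&lt;/p&gt;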
&lt;p&gt;Let’s see what we have:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;set.seed(2021)
sample_n (hw, 12)

## # A tibble: 12 x 3
##    rowid             serious_world_problems_fi~ serious_world_problems_climate_~
##    &amp;lt;chr&amp;gt;                                  &amp;lt;dbl&amp;gt;                            &amp;lt;dbl&amp;gt;
##  1 ZA6595_v3-0-0_23~                          0                               NA
##  2 ZA7572_v1-0-0_70~                          0                                0
##  3 ZA6595_v3-0-0_18~                          0                               NA
##  4 ZA6861_v1-2-0_27~                          0                                0
##  5 ZA6595_v3-0-0_26~                          0                               NA
##  6 ZA7572_v1-0-0_19~                          0                                1
##  7 ZA5877_v2-0-0_16~                          0                                0
##  8 ZA6861_v1-2-0_12~                          0                                0
##  9 ZA7572_v1-0-0_17~                          0                                0
## 10 ZA5877_v2-0-0_17~                          0                                1
## 11 ZA6861_v1-2-0_41~                          0                                0
## 12 ZA6861_v1-2-0_61~                          0                                1
&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id=&#34;creating-the-longitudional-table&#34;&gt;Creating the Longitudinal Table&lt;/h2&gt;
&lt;p&gt;Now we just need to join the partial tables together by the &lt;code&gt;rowid&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# start from the smallest (we removed the surveys that had no relevant questionnaire item)
panel &amp;lt;- hw %&amp;gt;%
  left_join ( geography, by = &#39;rowid&#39; ) 

panel &amp;lt;- panel %&amp;gt;%
  left_join ( demography, by = c(&amp;quot;rowid&amp;quot;, &amp;quot;isocntry&amp;quot;) ) 

panel &amp;lt;- panel %&amp;gt;%
  left_join ( interview_dates, by = &#39;rowid&#39; )
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And let’s see a small sample:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sample_n(panel, 12)

## # A tibble: 12 x 19
##    rowid  serious_world_pr~ serious_world_pr~ isocntry geo   region    w1    wex
##    &amp;lt;chr&amp;gt;              &amp;lt;dbl&amp;gt;             &amp;lt;dbl&amp;gt; &amp;lt;chr&amp;gt;    &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt;  &amp;lt;dbl&amp;gt;  &amp;lt;dbl&amp;gt;
##  1 ZA686~                 0                 0 ES       ES41  Casti~ 1.21  46787.
##  2 ZA686~                 0                 0 RO       RO31  South~ 0.724 11805.
##  3 ZA686~                 0                 0 SK       SK02  Zapad~ 0.774  3499.
##  4 ZA757~                 0                 1 PT       PT16  Centr~ 1.11   9336.
##  5 ZA659~                 1                NA HR       HR041 Grad ~ 0.580  2098.
##  6 ZA659~                 1                NA RO       RO21  North~ 1.21  20160.
##  7 ZA686~                 0                 0 PT       PT17  Lisboa 0.932  7448.
##  8 ZA659~                 0                NA GB-GBN   UKI   London 0.994 50133.
##  9 ZA757~                 0                 0 CY       CY    REPUB~ 0.594   874.
## 10 ZA686~                 0                 0 LT       LT003 Klaip~ 0.623  1564.
## 11 ZA757~                 0                 0 IE       IE013 West ~ 0.490  1651.
## 12 ZA659~                 0                NA LT       LT003 Klaip~ 1.16   2917.
## # ... with 11 more variables: marital_status &amp;lt;chr&amp;gt;, age_education &amp;lt;chr&amp;gt;,
## #   age_exact &amp;lt;dbl&amp;gt;, occupation_of_respondent &amp;lt;chr&amp;gt;,
## #   occupation_of_respondent_recoded &amp;lt;chr&amp;gt;,
## #   respondent_occupation_scale_c_14 &amp;lt;chr&amp;gt;, type_of_community &amp;lt;chr&amp;gt;,
## #   is_student &amp;lt;dbl&amp;gt;, no_education &amp;lt;dbl&amp;gt;, education &amp;lt;dbl&amp;gt;,
## #   date_of_interview &amp;lt;date&amp;gt;

saveRDS ( panel, file.path(tempdir(), &amp;quot;climate_panel.rds&amp;quot;), version = 2)

# not evaluated
saveRDS( panel, file = file.path(&amp;quot;data-raw&amp;quot;, &amp;quot;climate-panel.rds&amp;quot;), version = 2)
&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id=&#34;putting-it-on-a-map&#34;&gt;Putting It on a Map&lt;/h2&gt;
&lt;p&gt;This is not the end of the story. If you put all this on a map, the
results are a bit disappointing.&lt;/p&gt;
&lt;img src=&#34;featured.png&#34; width=&#34;660&#34; /&gt;
&lt;p&gt;Why? Because sub-national (provincial, state, county, district, parish)
borders are changing all the time, within the EU and everywhere else. The
next step is to harmonize the geographical information. We have another
package released on CRAN to help you with this. See the next post: &lt;a href=&#34;https://rpubs.com/antaldaniel/regions-OOD21&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Regional
Climate Change Awareness
Dataset&lt;/a&gt;.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>What is Retrospective Survey Harmonization?</title>
      <link>https://greendeal.dataobservatory.eu/post/2021-03-04_retroharmonize_intro/</link>
      <pubDate>Thu, 04 Mar 2021 00:00:00 +0000</pubDate>
      <guid>https://greendeal.dataobservatory.eu/post/2021-03-04_retroharmonize_intro/</guid>
      <description>&lt;h2 id=&#34;reproducible-ex-post-harmonization-of-survey-microdata&#34;&gt;Reproducible ex post harmonization of survey microdata&lt;/h2&gt;
&lt;p&gt;Retrospective survey harmonization allows the comparison of opinion poll
data collected in different countries or at different times. In this example we are
working with data from surveys that were ex ante harmonized to a certain
degree: in our tutorials we choose questions that were asked in
the same way in many natural languages. For example, you can compare
what percentage of people in various European countries, provinces
and regions thought climate change was a serious world problem back in
2013, 2015, 2017 and 2019.&lt;/p&gt;
&lt;p&gt;We developed the
&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;retroharmonize&lt;/a&gt; R package
to help with this process. We have tested the package extensively with about 80
Eurobarometer and 5 Afrobarometer survey files, and to a lesser extent with
Arab Barometer files. This allows the comparison of various survey
answers in about 70 countries. These policy-oriented survey programs were
designed to be harmonized to a certain degree, but their ex post
harmonization is still necessary, challenging and error-prone.
Retrospective harmonization includes harmonizing the different
coding used for questions and answer options, the post-stratification
weights, and the different file formats.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://ec.europa.eu/commfrontoffice/publicopinion/index.cfm&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Eurobarometer&lt;/a&gt;,
&lt;a href=&#34;https://www.afrobarometer.org/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Afrobarometer&lt;/a&gt;, &lt;a href=&#34;https://www.arabbarometer.org/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Arab
Barometer&lt;/a&gt; and
&lt;a href=&#34;https://www.latinobarometro.org/lat.jsp&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Latinobarómetro&lt;/a&gt; make survey
files that are harmonized across countries available for research under
various terms. Our
&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;retroharmonize&lt;/a&gt; package is not
affiliated with them, and to run our examples, you must visit their
websites, carefully read their terms, agree to them, and download their
data yourself. The value we add is that we help to connect their
files across time (from different years) or across these programs.&lt;/p&gt;
&lt;p&gt;The survey programs mentioned above publish their data in the
proprietary SPSS format. This file format can be imported and translated
to R objects with the haven package; however, we needed to re-design
&lt;a href=&#34;https://haven.tidyverse.org/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;haven’s&lt;/a&gt;
&lt;a href=&#34;https://haven.tidyverse.org/reference/labelled_spss.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;labelled_spss&lt;/a&gt;
class, which is itself a modification of the &lt;a href=&#34;&#34;&gt;labelled&lt;/a&gt; class,
to maintain far more metadata. The haven package was designed and tested with
data stored in individual SPSS files.&lt;/p&gt;
&lt;p&gt;The author of labelled, Joseph Larmarange, describes two main approaches
to working with labelled data, such as SPSS’s method of storing categorical
data, in the &lt;a href=&#34;http://larmarange.github.io/labelled/articles/intro_labelled.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Introduction to
labelled&lt;/a&gt;.&lt;/p&gt;
&lt;figure  id=&#34;figure-two-main-approaches-of-labelled-data-conversion&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;img/larmarange_approaches_to_labelled.png&#34; alt=&#34;Two main approaches of labelled data conversion.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      Two main approaches of labelled data conversion.
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;Our approach is a further extension of &lt;strong&gt;Approach B&lt;/strong&gt;. Survey
harmonization in our case always means joining data from several
SPSS files, which requires consistent coding across several data
sources. This means that data cleaning and recoding must take place
before conversion to factors, character or numeric vectors. This is
particularly important with factor data (and their simple character
conversions) and numeric data that occasionally contains labels, for
example, to describe the reason why certain data is missing. Our
tutorial vignette
&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/articles/labelled_spss_survey.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;labelled_spss_survey&lt;/a&gt;
gives you more information about this.&lt;/p&gt;
&lt;p&gt;In the next series of tutorials, we will deal with an array of problems.
These are not for the faint of heart: you need to have a solid
intermediate level of R to follow.&lt;/p&gt;
&lt;h2 id=&#34;tidy-joined-survey-data&#34;&gt;Tidy, joined survey data&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;The original file identifiers may not be unique, so we have to create
new, truly unique identifiers. Weighting may not be straightforward.&lt;/li&gt;
&lt;li&gt;Neither the number of observations nor the number of variables (which
represent the survey questions and their translation to coded data)
is the same across surveys. Certain data may be present in only one survey and not
the other. This means that you will likely run loops on lists and
not on data.frames, but eventually you must carefully join them.&lt;/li&gt;
&lt;/ul&gt;
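&lt;p&gt;The identifier problem in the first item can be illustrated with a small base R sketch. The file names and the &lt;code&gt;uniqid&lt;/code&gt; column are invented for illustration, and the identifiers created by the package may be built differently:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# two invented survey files that both number their respondents from 1
surveys &amp;lt;- list(
  ZA0001 = data.frame(uniqid = 1:3, answer = c(0, 1, 0)),
  ZA0002 = data.frame(uniqid = 1:2, answer = c(1, 1))
)

# prefix each respondent id with the name of the file it came from
surveys &amp;lt;- Map(function(df, id) {
  df$rowid &amp;lt;- paste(id, df$uniqid, sep = &amp;quot;_&amp;quot;)
  df
}, surveys, names(surveys))

all_ids &amp;lt;- unlist(lapply(surveys, function(df) df$rowid))
anyDuplicated(all_ids)  # 0 means that every rowid is unique
&lt;/code&gt;&lt;/pre&gt;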
&lt;h2 id=&#34;class-conversion&#34;&gt;Class conversion&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Similar questions may be imported from a non-native R format, in our
case, from SPSS files, in an inconsistent manner. SPSS’s variable
formats cannot be translated unambiguously to R classes.
&lt;code&gt;retroharmonize&lt;/code&gt; introduced a new S3 class system that handles this
problem, but eventually you will have to choose if you want to see a
numeric or character coding of each categorical variable.&lt;/li&gt;
&lt;li&gt;The harmonized surveys, with harmonized variable names and
harmonized value labels, must be brought to consistent R
representations (most statistical functions will only work on
numeric, factor or character data) and carefully joined into a
single data table for analysis.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;harmonization-of-variables-and-variable-labels&#34;&gt;Harmonization of variables and variable labels&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;The same variables may come with dissimilar variable names and variable
labels. It may be a challenge to match age with age. We need to
harmonize the names of variables.&lt;/li&gt;
&lt;li&gt;The harmonized variables may have different labeling. One survey may label
refused answers as &lt;code&gt;declined&lt;/code&gt; and another as &lt;code&gt;refusal&lt;/code&gt;. In a simple
choice question, climate change may be &lt;code&gt;Climate change&lt;/code&gt; or
&lt;code&gt;Problem: Climate change&lt;/code&gt;. Binary choices may have survey-specific
coding conventions. Value labels must be harmonized. There are good
tools to do this in a single file, but we have to work with several
of them.&lt;/li&gt;
&lt;/ul&gt;
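&lt;p&gt;A minimal base R sketch of the name-matching problem, using an invented crosswalk table and an invented survey excerpt. The &lt;code&gt;retroharmonize&lt;/code&gt; functions work from the survey metadata instead, so treat this only as an illustration of the idea:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# invented crosswalk: original variable names and the harmonized names we want
crosswalk &amp;lt;- data.frame(
  var_name_orig      = c(&amp;quot;d11&amp;quot;, &amp;quot;age&amp;quot;, &amp;quot;qa1a&amp;quot;),
  var_name_suggested = c(&amp;quot;age_exact&amp;quot;, &amp;quot;age_exact&amp;quot;, &amp;quot;serious_world_problems_first&amp;quot;),
  stringsAsFactors   = FALSE
)

# invented survey excerpt with survey-specific variable names
survey &amp;lt;- data.frame(d11 = c(43, 64), qa1a = c(1, 3))

# look up each column name; keep the original name when there is no match
idx &amp;lt;- match(names(survey), crosswalk$var_name_orig)
names(survey) &amp;lt;- ifelse(is.na(idx), names(survey), crosswalk$var_name_suggested[idx])
names(survey)  # &amp;quot;age_exact&amp;quot; &amp;quot;serious_world_problems_first&amp;quot;
&lt;/code&gt;&lt;/pre&gt;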
&lt;h2 id=&#34;missing-value-harmonization&#34;&gt;Missing value harmonization&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;There are likely to be various types of &lt;code&gt;missing values&lt;/code&gt;. Working
with missing values is probably where most human judgment is needed.
Why are some answers missing: was the question not asked in some
questionnaires? Is there a coding error? Did the respondent refuse
the question, or say that she did not have an answer?
&lt;code&gt;retroharmonize&lt;/code&gt; has a special labeled vector type that retains this
information from the raw data, if it is present, but you must make
the judgment yourself – in R, eventually you will either create a
missing category, or use &lt;code&gt;NA_character_&lt;/code&gt; or &lt;code&gt;NA_real_&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
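&lt;p&gt;Both options can be sketched in base R. The labels follow the ones used in our tutorials, but the vector itself is made up:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# made-up answers with two kinds of missingness
answers &amp;lt;- c(&amp;quot;mentioned&amp;quot;, &amp;quot;not_mentioned&amp;quot;, &amp;quot;declined&amp;quot;, &amp;quot;do_not_know&amp;quot;, &amp;quot;mentioned&amp;quot;)
missing_labels &amp;lt;- c(&amp;quot;declined&amp;quot;, &amp;quot;do_not_know&amp;quot;, &amp;quot;inap&amp;quot;)

# option 1: keep missingness as one explicit category
as_category &amp;lt;- ifelse(answers %in% missing_labels, &amp;quot;missing&amp;quot;, answers)
table(as_category)

# option 2: treat all forms of missingness as NA
as_na &amp;lt;- ifelse(answers %in% missing_labels, NA_character_, answers)
sum(is.na(as_na))  # 2
&lt;/code&gt;&lt;/pre&gt;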
&lt;p&gt;That’s a lot to put on your plate.&lt;/p&gt;
&lt;p&gt;It is unlikely that you will be able to work with completely unfamiliar
survey programs if you do not have a strong intermediate level of R. Our
package comes with tutorials for
&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/articles/eurobarometer.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Eurobarometer&lt;/a&gt; and
&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/articles/afrobarometer.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Afrobarometer&lt;/a&gt;,
and our development version already covers Arab Barometer. These tutorials highlight
some peculiar issues with these survey programs, which we hope will give a
head start to less experienced R users.&lt;/p&gt;
</description>
    </item>
    
  </channel>
</rss>
