About the Data

1 Introduction

This page documents the workflow used to produce the cleaned data behind the dashboard, starting from the raw data. Happy reading 📚.

1.1 Setup

First, load the packages needed for this workflow.

library(tidyverse)      # data wrangling and import (dplyr, readr, ...)
library(here)           # build file paths relative to the project root

2 Read raw data

The raw data are saved in data/input/20250224_craywatch_raw.txt as a tab-separated text file. It contains GBIF observations of six invasive alien crayfish species in Flanders from 2024-06-13, the start of the craywatch project, onwards. The download is available via DOI 10.15468/dl.pszsrh. The query behind this GBIF download is the same query that runs in real time to show the “Nieuwe waarnemingen” (new observations) layer in the craywatch map!

cray_raw_df <- readr::read_tsv(
  here::here("data", "input", "20250224_craywatch_raw.txt"),
  na = "",           # treat empty strings as missing values
  guess_max = 10000  # scan more rows before guessing column types
)

Preview (10 rows):

head(cray_raw_df, 10)
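As a quick sanity check (a sketch, not part of the original workflow), we can count the observations per species to confirm that all six target species are present; this assumes the GBIF column species is filled in for every record:

cray_raw_df %>%
  dplyr::count(species, sort = TRUE)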

3 Clean data

We remove records without a specific year in column year and keep only the columns we are interested in:

cray_cleaned_df <- cray_raw_df %>%
  filter(!is.na(year)) %>%
  select(gbifID, occurrenceID, species, speciesKey, decimalLatitude,
         decimalLongitude, coordinateUncertaintyInMeters,
         eventDate, year, month, countryCode, stateProvince,
         county, locality, basisOfRecord, individualCount,
         datasetName)

Preview (10 rows):

head(cray_cleaned_df, 10)
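To see how many records the year filter removed, a small comparison (a sketch, not part of the original workflow) can be run:

# number of records dropped because year is missing
nrow(cray_raw_df) - nrow(cray_cleaned_df)

# equivalently, count the records with a missing year directly
cray_raw_df %>%
  dplyr::filter(is.na(year)) %>%
  nrow()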

4 Save cleaned data

We save the cleaned data as 20250224_craywatch_cleaned.txt in data/output. This file will be the input file for the dashboard.

cray_cleaned_df %>%
  readr::write_tsv(
    here::here("data", "output", "20250224_craywatch_cleaned.txt")
  )
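As an optional check (a sketch, not part of the original workflow), the exported file can be read back and compared with the in-memory data frame. Column types are guessed again on reading, so a strict comparison may need an explicit column specification:

check_df <- readr::read_tsv(
  here::here("data", "output", "20250224_craywatch_cleaned.txt"),
  na = ""
)

# same number of rows and columns as the data frame we just wrote?
identical(dim(check_df), dim(cray_cleaned_df))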