Research Software Engineering Day 2025

December 04, 2025 • Damiano Oldoni

software rse Leuven Belgium foss open science open fair

Image by BE-RSE


General

Website
Program: html. Full program (pdf). See also Abstracts.
Organizer: BE-RSE
Host: KU Leuven, Machinezaal in Thermotechnical Institute (Kasteelpark Arenberg 41, 3001 Heverlee)
Founder: BE-RSE
Date: 4/12/2025

These notes are a very first DRAFT. A review will occur soon.

Keynotes - part 1

From FAIR to Good Practice: How Do We Improve Research Software?

Presenter: Neil Chue Hong (Director, Software Sustainability Institute), Professor of Research Software Policy and Practice

Chue Hong, Neil (2025). From FAIR to Good Practice: How do we improve research software?. figshare. Presentation. https://doi.org/10.6084/m9.figshare.30764660

A lot has changed since 2010.

“Research Software Engineer” is a kind of “family” of roles, as it covers different tasks and different research users.

The Software Sustainability Institute was founded in 2010 to improve practice around the software that enables research. It is dedicated to any kind of sustainability.

Some milestones of the institute:

Carpentries
FAIR principles for software (FAIR4RS)

FAIR principles for software are comprehensive but quite difficult to fully achieve. So, a new paper was written speaking about FAIR enough principles. What are the FAIR enough principles?

Write code to be readable, reusable, and testable
Often dependent on institutional policies and sector

UNESCO and OECD have been working on recognizing the role of software and the software community in research.

Check “Software and skills for research computing in the UK” report: full, short.

Software citation support: GitHub added support for software citation using CFF files in July 2021. By Sept 2022, more than 11k projects had added a citation file.
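As an illustration of what such a citation file contains, here is a minimal sketch that builds the text of a CITATION.cff file with the Python standard library only. The keys used (cff-version, message, title, authors, version, date-released) come from the Citation File Format. The function name, project title, author, version and date are hypothetical placeholders, not taken from the talk.

```python
def build_citation_cff(title, authors, version, date_released):
    """Return the text of a minimal CITATION.cff file.

    `authors` is a list of (given_names, family_names) tuples.
    `date_released` is expected in YYYY-MM-DD format.
    """
    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this software, please cite it as below."',
        f'title: "{title}"',
        "authors:",
    ]
    for given, family in authors:
        lines.append(f'  - given-names: "{given}"')
        lines.append(f'    family-names: "{family}"')
    # Quote the version so that values such as 1.0 are not read as numbers.
    lines.append(f'version: "{version}"')
    lines.append(f"date-released: {date_released}")
    return "\n".join(lines) + "\n"


# Hypothetical project metadata, for illustration only.
cff_text = build_citation_cff(
    title="My Research Tool",
    authors=[("Jane", "Doe")],
    version="0.1.0",
    date_released="2025-12-04",
)
print(cff_text)
```

Saving this text as CITATION.cff in the root of a repository is what lets GitHub show a citation prompt for the project.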

Research software is teamwork, and we must think globally. Check Research Software Development Principles, Chue Hong N. Key aspects:

FAIR: reusable by as many users as possible
Secure: respect data privacy and assume attacks
Maintainable: easy to adapt and to correct faults
Reproducible: enable trust in research
Recognition: reward all roles and develop the next generation
Inclusive: Accessible and supportive of a broad community
Responsible: build to reduce impact on our environment
Open & global: transcend national and discipline boundaries
Humanist: unbiased, ethical and in support of humanity

Communities of Practice, e.g. code check, Research Software Alliance, The Carpentries. It’s all about:

Enthusiasm
Engagement
Knowledge exchange

AI is everything, everywhere, all at once. Impact on RSE?

Majority of survey respondents use AI as part of their work and believe it has increased their productivity.
Y Combinator reported that 25% of startup companies in its Winter 2025 batch had codebases that were 95% AI-generated!
AI in RSE? Curious but cautious
What about the university students? They are using AI but are the most skeptical.

Two tendencies:

less experienced developers see the advantages of speeding up producing working code
more experienced developers see a slowdown in productivity (lower code quality)

Are there some simple rules for AI-assisted coding in science? See Ten simple rules for AI-Assisted coding in Science, arXiv.

Coding assistants open up the possibility for many more people to develop effective research software
But they are not a substitute for teaching users the basics of algorithms, data structures and software engineering
Nor are they a substitute for understanding how to frame a problem: domain knowledge, architectural and design choices for a language, identifying requirements
What tool to use where and when: we need more training on how to use coding assistants properly for research software

Key question/challenge: how do we train people now that they use coding assistants?

Coding assistants are good at creating unit tests for edge cases. But what happens if the tools you rely on increase their price or disappear? Sustainability aspects are very important when thinking about coding assistants.

The Impact of Open Source Robotics Research - or the Lack thereof on Industry

Presenter: Peter Soetens, CEO at Intermodalics

What does Intermodalics do?

Produce software for Vision Guided Robots.
Real-time system, Robot Operating System (ROS)

All modern digital infrastructure is in some way based on open software, e.g. Linux, created by “poor” Linus Torvalds (poor in comparison with all tech CEOs): https://www.lets-code.co.in/blogs/getting-started-with-open-source/

But, the Linux Foundation is not poor at all as big tech and many other companies worldwide understand the importance of it.

Was ROS the best when launched? No, just “good enough”. And that was the reason for its success, together with being open source.

Check the Open Source Robotics Alliance.

What we all need from research:

1. Open up the foundational levels (R package, Python library or a model, …)
2. If you cannot open it up because your research has been paid for by companies and you have confidentiality issues, collaborate globally: the overhead is worth the win!
3. If what you are doing is so secret that even the previous step is not possible, just start your own company: do you have a breakthrough idea? Keep it and build it! And get back to 1, or at least 2 (italic: my personal addition).

Parallel Track 1 - Workflow

Chair: Johan Philips

The journey of migrating a spreadsheet with 500 rows

Presenter: Neil Chue Hong (Director, Software Sustainability Institute), Professor of Research Software Policy and Practice Slides: https://doi.org/10.6084/m9.figshare.30764660 (CC BY 4.0)

The database was a collection of non-harmonised spreadsheets (MS Excel), each about 500 rows long. The researchers sometimes had to search through all of them. Data quality? Quite poor! Example: a numeric column was not always numeric, as comments were sometimes added.
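The kind of data quality problem described above (numbers mixed with free-text comments in one column) can be sketched as follows. This is a minimal illustration, not the approach used in the talk: the function name, the regular expression and the sample cells are all hypothetical, and it assumes that the number, when present, comes first in the cell.

```python
import re

# A leading number (optional sign, "." or "," as decimal separator),
# followed by optional free text such as a comment.
_NUMBER_THEN_TEXT = re.compile(r"^\s*([-+]?\d+(?:[.,]\d+)?)\s*(.*)$")


def split_value_and_comment(cell):
    """Split a spreadsheet cell into (numeric value, comment).

    Returns (None, text) when the cell does not start with a number,
    so nothing is silently dropped.
    """
    text = "" if cell is None else str(cell).strip()
    match = _NUMBER_THEN_TEXT.match(text)
    if match is None:
        return None, text
    number = float(match.group(1).replace(",", "."))
    return number, match.group(2).strip()


# Hypothetical cells from a column that was meant to be numeric.
raw_column = ["12.5", "7 (measured twice)", "n/a", "3,2", None]
cleaned = [split_value_and_comment(cell) for cell in raw_column]
for original, (value, comment) in zip(raw_column, cleaned):
    print(repr(original), "->", value, repr(comment))
```

Keeping the comment in a separate field, rather than discarding it, preserves the information the researchers added while making the numeric column usable for analysis.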

Migration from MS Excel to PostgreSQL? No, the researchers didn’t want (or have time) to learn SQL. Maybe building a web application on top of it? No, the research software engineer didn’t have time to do that.

Part of the solution: eLabFTW, a free and open source electronic lab notebook. It’s open source, it has an API, it is built with a Role Based Access Control (RBAC) and it has structured metadata, which can be created via User Interface.

So, all the worksheet / tables were mapped to resource categories/templates in eLabFTW.

Every day, all data go to a data warehouse, and data analysis pipelines and dashboards are built on top of it.

Now the research team has its own research software engineer and can move further on their own.

Research software made EESSI: the European Environment for Scientific Software Installations

Presenters: Kenneth Hoste, Lara Peeters.

https://www.eessi.io/docs/

Keynotes - part 2

Hack, Fix, Repeat: FOSS and the Future of Systems Security

Presenter: Jo Van Bulck.

Kerckhoffs’s Principle: No Security through Obscurity.

Linus’s Law: Security through Open Source? “Given enough eyeballs, all bugs are shallow” (Eric S. Raymond).

“Given a large enough beta-tester and co-developer base, almost every problem will be characterized quickly and the fix obvious to someone. [..] Researchers and practitioners have repeatedly shown the effectiveness of reviewing processes in finding bugs and security issues.” (Pfleeger, Charles P.; Pfleeger, Shari Lawrence (2003). Security in Computing, 4th Ed. Prentice Hall PTR. pp. 154–157. ISBN 0-13-239077-9.)

Linux: hundreds of vulnerabilities

Let’s speak about Confidential Computing.

Trusted execution: hardware-level isolation and attestation. There is a growing ecosystem of Trusted Execution Environments (TEEs). TEEs are here to stay: “Confidential Computing Today, Just Computing Tomorrow” (Mark Russinovich, CTO Microsoft Azure)

Research Agenda: test security claims of these TEEs

Offensive security analysis of closed-source (large) commercial systems: critical analysis of vendor claims.
Defensive prototypes on open-source (small) research systems: next-generation innovations

DistriNet has been vetting confidential computing for more than a decade.

Let’s speak now about Sancus, a long-lived open source project, do’s and don’ts.

Sancus provides lightweight trusted computing for the Internet of Things (IoT).

No commercialization, but FOSS licenses
Limit dependencies
Upstream eagerly: avoid dead forks
2012-2017: public tarballs + private git
From 2017: move to public GitHub organisation: Sancus

Check the inspiring book The Cathedral & The Bazaar, by Eric S. Raymond (see online version, paper book).

Build usable systems

large engineering effort resulting in minimal publication effort.
Simulators and test frameworks
Continuous integration

Impact through Education

Having master students on Sancus allows understanding, by putting theory into practice. Highly recommended: continuous master thesis involvement.

Science Communication

Documentation, conferences: FOSSDEM is probably the most important.

Let’s speak now about SGX-Step, a versatile open-source attack framework.

SGX-Step highlights an important aspect: Engage with Industry! SGX-Step led to changes in major OSs, Intel chips and enclave SDKs.

Conclusions. What are the magic ingredients?

Open-source ecosystem
Modular base design
Impact through education
Science communication
Accessible library design
Reusable primitives
Engage with industry

The Marvelous Misadventures of a Scientist-in-progress: from PhD Disasters to Corporate Farces

Presenter: Giada Lalli, Bioinformatician

Coding was her next step. It was like taking control of something she had no control over before. She perceived coding as a game.

What makes PhD students happy? Good supervision.

Seek mentorship, not control
Seek guidance, not permission
Seek direction, not directives
Seek feedback, not instructions
Seek support, not dependence

Never forget: you are in charge of your PhD as it’s your project. Do not forget that your promotor was in your position just a few years before you.

Trust your skills.

And don’t be shy, but reach out even just for advice.

Parallel Track 2 - Data & AI/ML

Chair: Ingrid Barcena Roig

AI pair programming: how lazy can you afford to be?

Presenter: Geert Jan Bex

An attempt at making slides with voice-over.

AI generated code: How to debug? How to maintain?

Formulate specs

Write documentation about the User Interface and let AI create code based on it. The more precise my specs, the more correct the output.

AI is good at “boring work”:

command line arguments
input validation
documentation stubs
initial unit tests
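To make the “boring work” above concrete, here is a minimal sketch of the kind of boilerplate meant: command line arguments with input validation, using Python's standard argparse module. The script, its options and the validation rule are hypothetical and only serve as an illustration. Even for code like this, the output of an assistant still needs to be reviewed.

```python
import argparse


def positive_int(text):
    """argparse type function: accept only integers greater than zero."""
    try:
        value = int(text)
    except ValueError:
        raise argparse.ArgumentTypeError(f"{text!r} is not an integer")
    if value <= 0:
        raise argparse.ArgumentTypeError(f"{value} is not a positive integer")
    return value


def build_parser():
    """Build the command line interface of a hypothetical analysis script."""
    parser = argparse.ArgumentParser(description="Hypothetical analysis script.")
    parser.add_argument("input", help="path to the input file")
    parser.add_argument(
        "--iterations",
        type=positive_int,
        default=10,
        help="number of iterations (positive integer, default: 10)",
    )
    parser.add_argument("--verbose", action="store_true", help="print progress")
    return parser


# Parse an explicit argument list instead of sys.argv, so the example
# behaves the same wherever it is run.
args = build_parser().parse_args(["data.csv", "--iterations", "5"])
print(args.input, args.iterations, args.verbose)
```

With an invalid value such as `--iterations 0`, argparse reports the error message from `positive_int` and exits, so the validation lives in one place.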

Code completion and suggestions: yes, but pay attention that they could be “outdated”, even if working. AI systems

Code reviews: Sourcery. Nice.

Use agents: they will improve the quality incredibly.

Online agents:

GitHub Copilot
OpenAI Codex

On your machine:

OpenAI Codex CLI

Use OpenAI Codex CLI for tasks where discussion and iterations are expected, because the goal is more complex than just writing unit tests.

Provide context via Markdown files, e.g. AGENTS.md.

General guidelines
Specialization:

in GitHub repository
in specific directories

Boring science is easy, science is not.

Use specifications
Can save lots of time
Check answers/review code

You won’t be out of a job anytime soon… if you add value and you know your stuff.

AI doesn’t replace competence: it complements it.

Valorise your research by developing software: challenges and opportunities in the domains of digital education and healthcare

Presenters: Frederik Cornillie, Stefaan Haspeslagh

Slides.

Think beyond your current collaboration. Have a mission, a long-term goal.

dtaianomaly: A Python library for time series anomaly detection

Presenter: Louis Carpentier

dtaianomaly is a Python tool for time series anomaly detection.

There is a web application on top of it: InTimeAD (GitHub).
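For readers new to the topic, the idea of time series anomaly detection can be sketched with a simple z-score rule in plain Python. This does not use dtaianomaly or reflect its API or its detectors. The function name, the threshold and the sample series are hypothetical, and a global z-score is only one of the simplest possible baselines.

```python
from statistics import mean, stdev


def zscore_anomalies(values, threshold=2.0):
    """Return the indices of points lying more than `threshold`
    sample standard deviations away from the mean of the series."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        # A constant series has no outliers under this rule.
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical measurements with one obvious spike at index 5.
series = [10.0, 10.2, 9.9, 10.1, 10.0, 25.0, 10.1, 9.8, 10.0, 10.2]
print(zscore_anomalies(series))  # → [5]
```

Note that the spike itself inflates the mean and standard deviation, which is why a fairly low threshold is used for such a short series; dedicated libraries address this and many other limitations.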

During the poster session, the presenter showed me another Python tool, patsemb, which could be interesting for migration/spawning detection in the eel/shad time series I work on. PaTSEmb is a Python package for creating a pattern-based embedding of the time series. This embedding contains information about which typical shapes occur at which locations in the time series.
