Research Software Engineering Day 2025
General
- Website
- Program: html. Full program (pdf). See also Abstracts.
- Organizer: BE-RSE
- Host: KU Leuven, Machinezaal in the Thermotechnical Institute (Kasteelpark Arenberg 41, 3001 Heverlee)
- Founder: BE-RSE
- Date: 4/12/2025
These notes are a very first DRAFT. A review will occur soon.
Keynotes - part 1
From FAIR to Good Practice: How Do We Improve Research Software?
Presenter: Neil Chue Hong (Director, Software Sustainability Institute), Professor of Research Software Policy and Practice
Chue Hong, Neil (2025). From FAIR to Good Practice: How do we improve research software?. figshare. Presentation. https://doi.org/10.6084/m9.figshare.30764660
A lot has changed since 2010.
Research Software Engineering is a kind of “family”, as it covers different tasks and serves different research users.
The Software Sustainability Institute was founded in 2010 to improve practice around the software that enables research. It is dedicated to any kind of sustainability.
Some milestones of the institute:
- Carpentries
- FAIR principles for software (FAIR4RS)
The FAIR principles for software are comprehensive but quite difficult to achieve fully. So a new paper was written about “FAIR enough” principles. What are the FAIR enough principles?
- Write code to be readable, reusable, and testable
- Often dependent on institutional policies and sector
UNESCO and the OECD have been working on recognizing the role of software and the software community in research.
Check “Software and skills for research computing in the UK” report: full, short.
Software citation support: GitHub added support for software citation using CFF files (CITATION.cff) in July 2021. By September 2022, more than 11k projects had added a citation file.
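As an illustration of what such a citation file contains, here is a minimal sketch in Python that writes the text of a CITATION.cff file. The keys used (cff-version, message, title, version, date-released, authors with family-names and given-names) are standard Citation File Format fields; the helper name `build_citation_cff`, the project name and the author are hypothetical and only serve the example.

```python
from datetime import date


def build_citation_cff(title, authors, version, released):
    """Return the text of a minimal CITATION.cff file.

    `authors` is a list of (family_name, given_names) tuples.
    """
    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this software, please cite it as below."',
        f'title: "{title}"',
        f'version: "{version}"',
        f"date-released: {released.isoformat()}",
        "authors:",
    ]
    for family, given in authors:
        lines.append(f'  - family-names: "{family}"')
        lines.append(f'    given-names: "{given}"')
    return "\n".join(lines) + "\n"


cff_text = build_citation_cff(
    title="my-research-tool",      # hypothetical project name
    authors=[("Doe", "Jane")],     # hypothetical author
    version="0.1.0",
    released=date(2025, 12, 4),
)
print(cff_text)
```

Saving this text as CITATION.cff in the root of a repository is what allows GitHub to offer a citation for the project.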
Research software is teamwork, and we must think globally. Check Research Software Development Principles, Chue Hong N. Key aspects:
- FAIR: reusable by as many users as possible
- Secure: respect data privacy and assume attacks
- Maintainable: easy to adapt and to correct faults
- Reproducible: enable trust in research
- Recognition: reward all roles and develop the next generation
- Inclusive: Accessible and supportive of a broad community
- Responsible: build to reduce impact on our environment
- Open & global: transcend national and discipline boundaries
- Humanist: unbiased, ethical and in support of humanity
Communities of Practice, e.g. CODECHECK, Research Software Alliance, The Carpentries. It’s all about:
- Enthusiasm
- Engagement
- Knowledge exchange
AI is everything, everywhere, all at once. Impact on RSE?
- Majority of survey respondents use AI as part of their work and believe it has increased their productivity.
- Y Combinator reported that 25% of startup companies in its Winter 2025 batch had codebases that were 95% AI-generated!
- AI in RSE? Curious but cautious
- What about the university students? They are using AI but are the most skeptical.
Two tendencies:
- less experienced developers see the advantage of producing working code faster
- the most experienced developers see their productivity slow down (lower code quality)
Are there some simple rules for AI-assisted coding in science? Ten simple rules for AI-Assisted coding in Science, arXiv.
- Coding assistants open up the possibility for many more people to develop effective research software
- But they are not a substitute for teaching users the basics of algorithms, data structures and software engineering
- Not a substitute for understanding how to frame a problem: domain knowledge, architectural and design choices for a language, identifying requirements
- What tool to use where and when: we need more training on how to use coding assistants properly for research software
Key question/challenge: how do we train people now that they use code assistance?
Coding assistants are good at creating unit tests for edge cases. But what happens if the tools you rely on increase their price or disappear? Sustainability aspects are very important when thinking about coding assistants.
The Impact of Open Source Robotics Research - or the Lack thereof on Industry
Presenter: Peter Soetens, CEO at Intermodalics
What does Intermodalics do?
- Produce software for Vision Guided Robots.
- Real-time system, Robot Operating System (ROS)
All modern digital infrastructure is in some way based on open software, e.g. Linux, created by “poor” Linus Torvalds (poor in comparison with all tech CEOs): https://www.lets-code.co.in/blogs/getting-started-with-open-source/
But, the Linux Foundation is not poor at all as big tech and many other companies worldwide understand the importance of it.
Was ROS the best when launched? No, just “good enough”. And that was its success, together with being open source.
Check the Open Source Robotics Alliance.
What we all need from research:
- Open up the foundational levels (R package, Python library or a model, …)
- If you cannot open it up because your research has been paid for by companies and you have confidentiality issues, collaborate globally: the overhead is worth the win!
- If what you are doing is so secret that even the previous step is not possible, just start your own company: do you have a breakthrough idea? Keep it and build it! And then get back to the first point, or at least the second (this last sentence is my personal addition).
Parallel Track 1 - Workflow
Chair: Johan Philips
The journey of migrating a spreadsheet with 500 rows
Presenter: Neil Chue Hong (Director, Software Sustainability Institute), Professor of Research Software Policy and Practice Slides: https://doi.org/10.6084/m9.figshare.30764660 (CC BY 4.0)
The database was a collection of non-harmonised spreadsheets (MS Excel), each about 500 rows long. The researchers sometimes had to search through all of them. Data quality? Quite poor! Example: a numeric column was not always numeric, as comments were sometimes added to the cells.
Migration from MS Excel to PostgreSQL? No, the researchers didn’t want (or have time) to learn SQL. Maybe build a web application on top of it? No, the research software engineer didn’t have time to do that.
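To make the data quality problem concrete, here is a minimal sketch (not the presenter's actual migration code) of how a cell from such a "numeric" column could be split into a number and a free-text comment before loading it elsewhere. The function name, the regular expression and the sample values are assumptions made for illustration; the pattern deliberately handles only plain decimals, not scientific notation or thousands separators.

```python
import re

# A leading plain decimal number (optional sign), followed by any free text.
NUMBER_WITH_COMMENT = re.compile(r"^\s*([-+]?\d+(?:\.\d+)?)\s*(.*)$")


def split_value_and_comment(cell):
    """Split a spreadsheet cell into (number or None, comment or None)."""
    if cell is None:
        return None, None
    if isinstance(cell, (int, float)):
        return float(cell), None
    text = str(cell).strip()
    match = NUMBER_WITH_COMMENT.match(text)
    if match:
        # Keep the number and preserve whatever was typed after it.
        return float(match.group(1)), match.group(2) or None
    # No leading number: keep the whole cell as a comment.
    return None, text or None


# Hypothetical column mixing numbers, numeric strings and comments.
raw_column = [12.5, "13", "14.2 (re-measured)", "n/a", None]
cleaned = [split_value_and_comment(cell) for cell in raw_column]
print(cleaned)
```

Keeping the comment in its own field, rather than discarding it, means no information from the original spreadsheet is lost during harmonisation.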
Part of the solution: eLabFTW, a free and open source electronic lab notebook. It’s open source, it has an API, it is built with a Role Based Access Control (RBAC) and it has structured metadata, which can be created via User Interface.
So, all the worksheets/tables were mapped to resource categories/templates in eLabFTW.
Every day, all data go to a data warehouse, where data analysis pipelines and dashboards are built on top of the data.
Now the research team has its own research software engineer and can move further on their own.
Research software made EESSI: the European Environment for Scientific Software Installations
Presenters: Kenneth Hoste, Lara Peeters.
https://www.eessi.io/docs/
Keynotes - part 2
Hack, Fix, Repeat: FOSS and the Future of Systems Security
Presenter: Jo Van Bulck.
Kerckhoffs’s Principle: No Security through Obscurity.
Linus’s Law: Security through Open Source? “Given enough eyeballs, all bugs are shallow” (Eric S. Raymond).
“Given a large enough beta-tester and co-developer base, almost every problem will be characterized quickly and the fix obvious to someone. [..] Researchers and practitioners have repeatedly shown the effectiveness of reviewing processes in finding bugs and security issues.” (Pfleeger, Charles P.; Pfleeger, Shari Lawrence (2003). Security in Computing, 4th Ed. Prentice Hall PTR. pp. 154–157. ISBN 0-13-239077-9.)
Linux: hundreds of vulnerabilities
Let’s speak about Confidential Computing.
Trusted execution: hardware-level isolation and attestation. There is a growing ecosystem of Trusted Execution Environments (TEEs). TEEs are here to stay: “Confidential Computing Today, Just Computing Tomorrow” (Mark Russinovich, CTO Microsoft Azure)
Research Agenda: test security claims of these TEEs
- Offensive security analysis of closed-source (large) commercial systems: critical analysis of vendor claims.
- Defensive prototypes on open-source (small) research systems: next-generation innovations
DistriNet has been vetting confidential computing for more than a decade.
Let’s speak now about Sancus, a long-lived open source project, do’s and don’ts.
Sancus provides lightweight trusted computing for the Internet of Things (IoT).
- No commercialization, but FOSS licenses
- Limit dependencies
- Upstream eagerly: avoid dead forks
- 2012-2017: public tarballs + private git
- From 2017: move to a public GitHub organisation: Sancus
Check the inspiring book The Cathedral & The Bazaar, by Eric S. Raymond (see online version, paper book).
Build usable systems
- large engineering effort resulting in minimal publication effort.
- Simulators and test frameworks
- Continuous integration
Impact through Education
Having master’s students work on Sancus builds understanding by putting theory into practice. Highly recommended: continuous master thesis involvement.
Science Communication
Documentation, conferences: FOSDEM is probably the most important.
Let’s speak now about SGX-Step, a versatile open-source attack framework.
SGX-Step highlights an important aspect: Engage with Industry! SGX-Step led to changes in major OSs, Intel chips and enclave SDKs.
Conclusions. What are the magic ingredients?
- Open-source ecosystem
- Modular base design
- Impact through education
- Science communication
- Accessible library design
- Reusable primitives
- Engage with industry
The Marvelous Misadventures of a Scientist-in-progress: from PhD Disasters to Corporate Farces
Presenter: Giada Lalli, Bioinformatician
Coding was her next step. It was like taking control of something she had not controlled before. She perceived coding as a game.
What makes PhD students happy? Good supervision.
- Seek mentorship, not control
- Seek guidance, not permission
- Seek direction, not directives
- Seek feedback, not instructions
- Seek support, not dependence
Never forget: you are in charge of your PhD, as it’s your project. Do not forget that your promotor was in your position just a few years before you.
Trust your skills.
And don’t be shy, but reach out even just for advice.
Parallel Track 2 - Data & AI/ML
Chair: Ingrid Barcena Roig
AI pair programming: how lazy can you afford to be?
Presenter: Geert Jan Bex
An attempt at making slides with voice-over.
AI generated code: How to debug? How to maintain?
Formulate specs
Write documentation about the user interface and let the AI create code based on it. The more precise my specs, the more correct the output.
AI is good in “boring work”:
- command line arguments
- input validation
- documentation stubs
- initial unit tests
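As an illustration of the kind of “boring work” listed above, here is a short sketch of command line argument parsing with input validation, using Python's standard `argparse` module. The tool, its options and the `positive_int` helper are hypothetical and not taken from the talk; the point is that this is routine code an assistant can draft and a developer can review quickly.

```python
import argparse


def positive_int(text):
    """argparse type: accept only integers greater than zero."""
    value = int(text)  # a ValueError here is reported by argparse as invalid input
    if value <= 0:
        raise argparse.ArgumentTypeError(f"{text!r} is not a positive integer")
    return value


def build_parser():
    """Build the parser for a hypothetical time series resampling tool."""
    parser = argparse.ArgumentParser(description="Resample a time series.")
    parser.add_argument("input", help="path to the input CSV file")
    parser.add_argument("--window", type=positive_int, default=10,
                        help="window size in samples (default: 10)")
    parser.add_argument("--method", choices=["mean", "median"], default="mean",
                        help="aggregation method (default: mean)")
    return parser


# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args(["data.csv", "--window", "24"])
print(args.input, args.window, args.method)
```

Invalid values such as `--window 0` or `--method max` make `argparse` print a usage message and exit, so the rest of the program never sees bad input.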
Code completion and suggestions: yes, but pay attention that they could be “outdated”, even if working. AI systems
Code reviews: Sourcery. Nice.
Use agents: they can greatly improve quality.
Online agents:
- GitHub Copilot
- OpenAI Codex
On your machine:
- OpenAI Codex CLI
Use OpenAI Codex CLI for tasks where discussion and iterations are expected because the goal is more complex than just writing unit tests.
Provide context via Markdowns, e.g. AGENTS.md.
General guidelines, with specialization:
- in the GitHub repository
- in specific directories
Boring science is easy; science is not.
- Use specifications
- Can save lots of time
- Check answers/review code
You won’t be out of a job anytime soon… if you add value and you know your stuff.
AI doesn’t replace competence: it complements it.
Valorise your research by developing software: challenges and opportunities in the domains of digital education and healthcare
Presenters: Frederik Cornillie, Stefaan Haspeslagh
Think beyond your current collaboration. Have a mission, a long-term goal.
dtaianomaly: A Python library for time series anomaly detection
Presenter: Louis Carpentier
dtaianomaly is a Python tool for time series anomaly detection.
There is a web application on top of it: InTimeAD (GitHub).
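For readers new to the topic, the idea of time series anomaly detection can be sketched with a simple rolling z-score. This is a generic baseline written with the Python standard library only; it is not dtaianomaly's API or one of its algorithms, and the function name, window size, threshold and sample signal are assumptions chosen for illustration.

```python
from statistics import mean, stdev


def rolling_zscore_anomalies(series, window=5, threshold=3.0):
    """Return indices whose value deviates strongly from the preceding window."""
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: a z-score is undefined, so this point is skipped.
            continue
        if abs(series[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical signal with one obvious spike at index 6.
signal = [10.0, 10.2, 9.9, 10.1, 10.0, 10.1, 25.0, 10.0, 9.8, 10.2]
print(rolling_zscore_anomalies(signal))  # → [6]
```

A dedicated library becomes useful once the data need more than this: preprocessing, several detectors to compare, and proper evaluation.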
During the poster session, the presenter showed me another Python tool, patsemb, which could be interesting for migration/spawning detection in the eel/shad time series I work on. PaTSEmb is a Python package for creating a pattern-based embedding of a time series. This embedding contains information about which typical shapes occur at which locations in the time series.