Fifteen years ago, I enrolled as an M.Sc. student at a lab that researched diagnostic ultrasound and focused on (pun intended, for my fellow acoustic nerds) higher-frequency, finer-resolution imaging. The group had just acquired the latest high-frequency scanner model, including a brand new "digital RF" add-on that could export raw radio-frequency (RF) data before it was turned into an image. The add-on was of special interest to me, because my research relied on acquiring RF data.
“Beta” get started
This "digital RF" module was more than brand new; it was a pre-release beta. It came with no instructions and barely any software. It could export two files in a proprietary, undocumented format. There was no existing way to read these files!
Although I was inexperienced, I had a degree in computer engineering and two years of software development internships under my belt, so programming was my comfort zone. Attempting to reverse engineer the file format felt like a natural next step. I chose MATLAB for the task, not because it was necessarily the right tool for prodding these files, but because:
- It was convenient, and
- It's what I needed to use to analyze the decoded data afterward
MATLAB: An aside
As the name suggests, MATLAB is a powerful experimentation platform aimed at highly mathematical applications. Used in both industry and academia, it has particularly strong adoption in the latter. That is unsurprising when you consider MATLAB's core strengths:
- Easy to get started (all-in-one development environment, scripting = no compilation)
- Fast prototyping of complex, experimental ideas
- Minimal need to "reinvent the wheel" (lots of built-in, domain-specific functions)
Due to the low barrier to entry and ability to "get things done" quickly, plenty of MATLAB code in academia is written with minimal up-front design and often by people with no formal software development training or experience. Thus while scientists rightly love the abilities that MATLAB unlocks, developers and engineers tend to grimace when exposed to the resulting code. This has led to a misconception in industry that high quality software isn't possible with MATLAB. While it's certainly not a language or platform you'd often choose to use "in production", writing robust and readable MATLAB is simply a matter of know-how.
Quick & dirty
Early-20s me, who did not possess that know-how, cobbled together a rudimentary script. And, like other scientist-authored code found around the university, it shared a common hallmark:
It was written by me, for me. It did what I needed it to. No more.
Given the correct files and setup, it successfully loaded an array of the decoded RF data into the MATLAB workspace for further analysis. If the files to be read included unanticipated header information, or if the scanner console software had been updated, it failed.
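To make that failure mode concrete, here is a minimal sketch of a reader that at least fails loudly when the header is not what it expects. The binary layout (three little-endian `uint32` header fields followed by `int16` samples), the function name `readRfFileSketch`, and the error identifiers are all assumptions made up for illustration; the scanner's real format was proprietary and is not reproduced here.

```matlab
function rfData = readRfFileSketch(filePath)
% READRFFILESKETCH Illustrative reader for a MADE-UP binary layout.
%   Hypothetical layout (not the scanner's real format):
%     uint32 formatVersion, uint32 samplesPerLine, uint32 numLines,
%     followed by samplesPerLine * numLines int16 samples.
    SUPPORTED_VERSION = 1;  % assumed value, for illustration only
    NUM_HEADER_FIELDS = 3;

    fileId = fopen(filePath, 'r', 'ieee-le');
    if fileId == -1
        error('rfSketch:cannotOpen', 'Could not open file: %s', filePath);
    end
    closeFile = onCleanup(@() fclose(fileId));  % close the file even on error

    % Read and check the header before trusting anything that follows it
    header = fread(fileId, NUM_HEADER_FIELDS, 'uint32=>double');
    if numel(header) < NUM_HEADER_FIELDS
        error('rfSketch:truncatedHeader', 'File ended before the header was complete.');
    end
    if header(1) ~= SUPPORTED_VERSION
        error('rfSketch:unsupportedVersion', ...
            'Header version %d is not supported (expected %d).', ...
            header(1), SUPPORTED_VERSION);
    end

    % Read the samples as a samplesPerLine-by-numLines matrix
    samplesPerLine = header(2);
    numLines = header(3);
    [rfData, numRead] = fread(fileId, [samplesPerLine, numLines], 'int16=>double');
    if numRead ~= samplesPerLine * numLines
        error('rfSketch:truncatedData', ...
            'Expected %d samples but read %d.', samplesPerLine * numLines, numRead);
    end
end
```

A check like this does not make an undocumented format any easier to decode, but it turns "the script failed" into a message that says which assumption was violated.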
I used it to gather the data I needed and fixed bugs only as necessary. There was no incentive to improve the script beyond that. Its expected trajectory was that of a lot of lab-created software: serve one user, be discarded, and forgotten.
A breath of life
Then, with an innocuous question, that destiny was altered.
“Hey, Zamir. You know your RF data reading script? Could I use it?”
Uh... yeah, sure. I would never hesitate to help out a fellow researcher, so the answer to whether they may use the script was a resounding yes. What I was unsure about was whether, with the state the script was in, they could use it.
The code was an ever-changing mess that only I understood. You had to edit the script with case-specific information for every use! And, like most students in the lab, this person had a scientific background rather than a software-focused one. I took one look at my code and knew it had to improve before it stepped “outside”.
Users = Forcing function for quality
And so we arrive at a general truth about software. When you write software for someone else, it has to be better than if it's just for you. Starting with this initial request and continuing as more interest arose, I felt obliged to increase the quality of the script with changes such as:
- Variable names describing their own meaning
- Named constants describing their values (no magic numbers)
- Consistent style (spacing, indentation, brackets, letter-casing, etc.)
- Comments for how-to usage and describing non-trivial operations
- Modularization from monolith to logically-separated functions
With these updates, the code made its way around the lab, improving by small amounts each time someone found a bug.
Code quality: an example
The above enhancements, considered part of a bare minimum in most commercial software settings, are often missing in early-stage academic code. A contrived-but-illustrative example is turning this...
my_wheel_rotations.m
TRIP_DISTANCE = 10;
% car
num_wheels = 4;
wheelRad = 8.5;
Rots = 10 * 1000 * num_wheels / (2 * 3.1415 * wheelRad * 0.0254);
disp("The car's wheels do a total of " + num2str(Rots) + " rotations");
% bike
num_wheels = 2;
wheelRad = 11.5;
Rots = 10 * 1000 * num_wheels / (2 * 3.1415 * wheelRad * 0.0254);
disp("The bike's wheels do a total of " + num2str(Rots) + " rotations");
...into this:
wheelRotations.m
% Wheel rotations for a car
carWheelRotations = getWheelRotations(Constants.TripDistanceKms, "car", 4, 8.5);
% Wheel rotations for a bike
bikeWheelRotations = getWheelRotations(Constants.TripDistanceKms, "bike", 2, 11.5);
Constants.m
% Collection of named constants
classdef Constants
    properties (Constant)
        TripDistanceKms = 10
        MetersPerKm = 1000
        MetersPerInch = 0.0254
    end
end
getWheelRotations.m
function totalWheelRotations = getWheelRotations(tripDistance, vehicleName, numWheels, wheelRadius)
% GETWHEELROTATIONS Calculates the total wheel rotations for a
% vehicle traveling a specific distance
%
% INPUTS
%   tripDistance -> (number) Total trip distance, in kilometers
%   vehicleName -> (string) Descriptive name of the vehicle
%   numWheels -> (integer) The number of wheels the vehicle has
%   wheelRadius -> (number) Radius of the wheels, in inches
%
% OUTPUTS
%   totalWheelRotations -> (number) Number of wheel rotations during trip
%
% EXAMPLE SYNTAX
%   wheelRots = getWheelRotations(10, "car", 4, 8.5)
%
% ABOUT
%   Author: Dr. John Doe
%   Created: 7-28-2019
%   Last modified: 8-5-2019
% ------------------------------------------------------------------------
    % TODO: enforce valid input types (https://www.mathworks.com/help/matlab/matlab_prog/function-argument-validation-1.html)
    % Convert the input parameters to common units
    % TODO: use unitconvert for conversions (https://www.mathworks.com/help/symbolic/unitconvert.html)
    tripDistanceMeters = tripDistance * Constants.MetersPerKm;
    wheelDiameter = 2 * wheelRadius;
    wheelDiameterMeters = wheelDiameter * Constants.MetersPerInch;
    wheelCircumferenceMeters = pi * wheelDiameterMeters;
    % Calculate the total wheel rotations
    rotationsPerWheel = tripDistanceMeters / wheelCircumferenceMeters;
    totalWheelRotations = rotationsPerWheel * numWheels;
    % Display a descriptive result
    disp("A " + vehicleName + " with " + num2str(numWheels) + ...
        " wheels of " + num2str(wheelDiameter) + """ diameter will have " + ...
        num2str(totalWheelRotations) + " total wheel rotations during a " + ...
        num2str(tripDistance) + "km trip");
end
You might furrow your brow at this so-called "improvement". We went from 11 lines of code (LOC) to 40 for something that does (more or less) the exact same thing! It's true, but nobody said that higher quality software was convenient. Looking strictly at development time, the "before" example wins easily. That's part of the point: it explains why messy code is common in academia, where moving quickly to test out new ideas takes priority.
But let's compare these two versions more closely. Imagine a codebase performing a difficult-to-comprehend task in a complex, graduate-level, scientific domain.
Characteristic | "Before" | "After" |
---|---|---|
Total LOC | 11 | 40 |
Time to write | ~3 minutes | ~15 minutes |
LOC that are comments | 18% | 70% |
Readability | Poor | Excellent |
Modularity | Poor | Excellent |
The "After" code is far from perfect. In fact, it includes “TODO” comments pointing out potential improvements. But, in terms of quality and serving more than a single user, the choice is obvious.
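As an illustration of the first TODO, MATLAB's function argument validation (the `arguments` block, available from R2019b onward) could enforce the input types. This is a sketch rather than a definitive next revision: the function name `getWheelRotationsValidated` is hypothetical, the conversion constants are inlined so the block stands alone, and the vehicle name and display step are left out.

```matlab
function totalWheelRotations = getWheelRotationsValidated(tripDistanceKms, numWheels, wheelRadiusInches)
% GETWHEELROTATIONSVALIDATED Sketch of getWheelRotations with input checks.
%   Hypothetical variant, for illustration only.
    arguments
        tripDistanceKms   (1,1) double {mustBeNonnegative}
        numWheels         (1,1) double {mustBeInteger, mustBePositive}
        wheelRadiusInches (1,1) double {mustBePositive}
    end

    % Inlined here so the sketch is self-contained; the article's
    % version keeps these in the Constants class
    METERS_PER_KM = 1000;
    METERS_PER_INCH = 0.0254;

    % Convert the input parameters to common units
    tripDistanceMeters = tripDistanceKms * METERS_PER_KM;
    wheelCircumferenceMeters = 2 * pi * wheelRadiusInches * METERS_PER_INCH;

    % Calculate the total wheel rotations
    rotationsPerWheel = tripDistanceMeters / wheelCircumferenceMeters;
    totalWheelRotations = rotationsPerWheel * numWheels;
end
```

With this in place, a call such as `getWheelRotationsValidated(10, 2.5, 8.5)` raises a validation error for the non-integer wheel count instead of quietly returning a number.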
Why bother?
How far does such an incremental quality improvement take you? You invest extra time into writing better code, documenting, and maintaining it; time that you could have spent researching. Is it worth it?
Let's circle back to the RF file reading script. The impact on me, the Master's student, was that I spent more time writing and maintaining that code. Not a whole lot of time, on the order of a handful of hours over the course of those two years, but still time that I wouldn't have needed to spend had the script been just for me.
Impact on me: loss (or investment) of ~5 hrs
However, the script enhancements enabled a few other graduate students to read the RF data for their own purposes, saving them the time to write their own script. With most of them being non-programmers, these time savings were likely significant. A conservative estimate:
Impact on colleagues: 2 researchers * ~8 hrs each (to write the "before" version of the script) = ~16 hrs saved
So even at this early stage, real lab time & effort was saved.
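The back-of-the-envelope numbers above can be written out as a short script. The variable names are illustrative, and the figures are the rough estimates from the text, not measurements.

```matlab
% Rough estimates taken from the text above
hoursInvestedByAuthor = 5;    % extra time spent cleaning up the script
numColleagues = 2;            % researchers who reused the script
hoursSavedPerColleague = 8;   % time to write their own "before" version

% Total time saved across the lab, and the net effect after my investment
hoursSavedByColleagues = numColleagues * hoursSavedPerColleague;  % 16
netHoursSaved = hoursSavedByColleagues - hoursInvestedByAuthor;   % 11

fprintf('Saved %d hrs, invested %d hrs, net %d hrs\n', ...
    hoursSavedByColleagues, hoursInvestedByAuthor, netHoursSaved);
```

Under these assumptions, the cleanup pays for itself as soon as a single colleague reuses the script, since one colleague's ~8 hrs already exceeds the ~5 hrs invested.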
As for me, the RF data script helped acquire the data I needed, which I analyzed, wrote up into a thesis, and successfully defended. Before leaving to start a career in industry, I tidied up my lab workstation and archived the files I was leaving behind on a DVD (yes, it was that long ago), expecting them never to be used again.
Far from finished
A year later, a labmate finishing up her PhD asked me some questions about the script via e-mail (the documentation still wasn't perfect!). I thought it was both neat to hear it was still in use and alarming that a temporary stop-gap had not been replaced by an official manufacturer version. I answered her queries and got back to work.
Years went by. I changed jobs a few times, still keeping in touch with the researchers I had been “in the trenches” with. The same PhD student was now a post-doc researcher at another institution and again, she reached out with some RF script questions. It had been so long that I had almost no recollection of the code, but after jogging my memory it became apparent that:
7 years later, the script was still in use, and at no fewer than 3 universities.
I was floored. By now, I was leading a development team at a medical device manufacturer, creating software used for intraoperative imaging. Our code was under FDA regulation and ensuring software quality was the most time-consuming part of my job. From that viewpoint, the little script was of an embarrassing standard, but there was also a sense of pride that any developer feels when their software has a real impact (because a surprising amount of software never does).
What did we learn?
This isn't a story about an amazingly high-quality piece of software, because by professional standards it was pedestrian. But it is an example:
A small, early investment in lab software quality can generate substantial returns.
The benefits, this time, were on the research side, but they could just as easily boost a commercialization effort. While there was no need to commercialize the RF data script, it would have been much faster and more cost-effective to do so starting with the "clean" version that served many users, rather than the initial "quick & dirty" MATLAB file. This head start would easily translate to thousands of dollars in savings when developing the market prototype.
Make your own luck
To be fair, there was an unfair advantage at play here. Most software in scientific research is not written by someone with industry development experience. Quality software development practices are almost never inherent, but learned through teaching and doing. For many labs, this makes a “better way” feel inaccessible, so they opt to focus on their strength: research.
At Code Clear Labs, we're here to unlock that “better way” through education and collaboration. We believe that labs that write software should make measured, early investments in their development practices and codebase quality. With professional guidance, the impact of the software they write, both in research and in practice, has a chance to far exceed everyone's expectations.