How to run a Medical Device Project using Agile principles.

This discussion is based on a project I ran for a medical device company for the GUI software component. It assumes that you have a basic understanding of the FDA documents.

The device was a Class 2 portable dialysis pump. The GUI, because of various alarms and other safety-related aspects, was also classified as Class 2. The device was intended for long-term use, to be very stable, and to be extremely easy for patients to use in their homes.

Project Goals

The basic GUI was likely not going to change very much in the long term. Therefore, the project goals were:

  • from a project management point of view, "The Principle of Least Surprise"
  • use agile principles to ensure:
    • continual, up-to-date documentation
    • continual, up-to-date verification testing
    • continual, up-to-date releases for user and system testing as needed
  • simple, easy to test GUI
  • handle all possible navigation paths through the GUI
  • automate verification testing as much as possible
  • use the automated testing facility for other component verification testing, when possible
  • ease the generation of all FDA required documents, when possible

FDA Document Overview

There were approximately 150 documents that I wrote for this project, ranging from the Software Development Plan (SDP) and Configuration Management documents to the Intended Use Verification (IUV) documents.

However, the primary documents used throughout the project were these four:

  • SRS - Software Requirements Specification - contains the requirements for this component
  • SDD - Software Design Description - contains the design for this component
  • SVP - Software Verification Protocol - contains the verification protocol for this component
  • SVR - Software Verification Report - contains the verification report for this component

The first Sprint

Given that full automated testing was a primary goal, the first Sprint was dedicated to creating a basic framework (in Ruby) that could collect all the expected information about the GUI.

That information was:

  • the top-level frame; each frame had a unique id across all of them throughout the GUI
  • all child widgets on that frame, plus any sub-frames and their children
  • each widget had the following information
    • the widget's unique id; each widget id was unique within its frame or sub-frame
    • the xy coordinates of the widget's encompassing box
    • the widget type; e.g. "button" or "label"
    • flag enabled or disabled
    • current text; note this was unicode since the device was intended for multiple languages
    • eventually we found that each child needed to declare its parent, since the GUI could have nested widgets beyond sub-frames
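
The collected information above can be sketched as a simple record. This is a hypothetical shape only; the field names are illustrative, not the project's actual schema:

```ruby
# Hypothetical sketch of the per-widget record the framework collected.
def widget_info(id:, parent:, type:, box:, enabled:, text:)
  {
    id: id,           # unique within its frame or sub-frame
    parent: parent,   # declared parent, needed for nested widgets
    type: type,       # e.g. "button" or "label"
    box: box,         # [x1, y1, x2, y2] of the encompassing box
    enabled: enabled, # true if the widget is currently enabled
    text: text        # current text (Unicode)
  }
end

btn = widget_info(id: 'btn_start', parent: 'frame_home', type: 'button',
                  box: [10, 20, 110, 60], enabled: true, text: 'Start')
```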

And to test that framework, a mock first screen was set up with a button and a label on a frame. The label had default text in it, and when the button was pressed the text would change.

The first script was to:

  • connect
  • get the frame information and confirm that it was on the expected frame
  • get the current default text and verify that it was the expected default text
  • press the button
  • get the updated default text and verify that it was the expected updated text
  • disconnect (cleanly)
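
The steps above can be illustrated against an in-process mock. MockGui and its methods here are stand-ins for the real connect/press/get APIs, which are not shown in this article:

```ruby
# An illustrative version of the first script, run against a mock GUI.
class MockGui
  def initialize
    @frame = 'frame_home'
    @label = 'default text'
  end

  def frame_id
    @frame
  end

  def label_text
    @label
  end

  def press_button
    @label = 'updated text'
  end
end

gui = MockGui.new                                         # "connect"
raise 'wrong frame'   unless gui.frame_id == 'frame_home'
raise 'wrong default' unless gui.label_text == 'default text'
gui.press_button                                          # press the button
raise 'wrong update'  unless gui.label_text == 'updated text'
# a clean "disconnect" would happen here
```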

This setup would put in place the basic behaviors we would need for the vast majority of the verification test framework.

The script and GUI also had to be robust:

  • recover from failed connects and disconnects
  • recover from exceptions in the GUI by both the GUI and the automation script
  • recover from exceptions (and ctrl-C) in the automation script by both the GUI and the script job
  • handle multiple sequential connections
  • disallow multiple simultaneous connections
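One of those robustness behaviors, recovering from failed connects, can be sketched as a retry loop. The connect_with_retry helper and its parameters are assumptions, not the project's actual code:

```ruby
# A minimal sketch of a retrying connect.
def connect_with_retry(attempts: 3, delay: 0.1)
  tries = 0
  begin
    tries += 1
    yield                    # attempt one connection
  rescue StandardError
    if tries < attempts
      sleep(delay)
      retry
    end
    raise                    # give up after the final attempt
  end
end
```

A failed connect is retried a fixed number of times before the failure is allowed to propagate to the script.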

To ensure that all verification testing could be run quickly, the GUI could be built either for the actual hardware (cross-compiled) or for Ubuntu running on a test server. The CI/CD servers ran Jenkins, and the first verification job was set up to run this test.

It was clear that other facilities were needed:

  • we were going to have many parallel jobs running verification scripts, so a common area was needed to save their results
  • the ability to run a final report script once all the other jobs had completed
  • the ability to clear out the results area to run a "fresh" run of all scripts

CI/CD servers

After a few months it became clear that adding additional CI/CD servers would speed up the total running time. At one point it took nearly 10 hours to run the entire suite. Adding 10 more servers dropped that time to 6 hours. Rearranging the order of the scripts running on the individual servers dropped that time to 3 hours.

Creating the CI/CD servers was difficult. Ensuring they were identical was complicated at the time (Docker has since made this much easier). To address this, I wrote a bash script and a Ruby script to take a base-level Ubuntu installation and add all the packages needed to run any automation script.

Automation Framework

The automation framework started out very simple. Its purpose was simple: gather data.

  • establish a connection to the GUI
  • perform the functions as needed on the GUI
  • gather the data from the GUI
  • provide a set of verify_xx() functions to gather the pass/fail results.
  • note that the script would continue even if a verify_xx() function failed. It quickly became clear that in some cases the script should end on a failure; those were named confirm_xx() functions.
  • close the connection to the GUI
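
The verify_xx()/confirm_xx() distinction can be sketched as below. The class and method names here are made up for illustration; only the continue-vs-abort semantics come from the project:

```ruby
# verify records a pass/fail and keeps going; confirm aborts the script on failure.
class ScriptAborted < StandardError; end

class Results
  attr_reader :rows

  def initialize
    @rows = []
  end

  # record a pass/fail result and keep going
  def verify_eq(desc, expected, actual)
    @rows << { desc: desc, expected: expected, actual: actual,
               pass: expected == actual }
    @rows.last[:pass]
  end

  # same, but end the script on a failure
  def confirm_eq(desc, expected, actual)
    raise ScriptAborted, desc unless verify_eq(desc, expected, actual)
  end
end
```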

The report would gather all the run-time data and generate a PDF. The intent of this PDF was to embed it into a section of an MSWord document that contained the boilerplate content. The PDF had a typical layout for a Test Protocol document. Each protocol was a table, and each row in the table was:

  • step e.g. 1, 2, etc
  • description - what was the intent of this step?
  • PASS/FAIL indication
  • expected value
  • actual value
  • additional info, e.g. date-time run, script file name and line number of the step

It was possible for a step to have multiple verify_xx() calls in it. A step passed only if all of its verify_xx() and confirm_xx() calls passed; otherwise it failed.

Running against a real machine

When the project was started there were no real machines to run the GUI on. In the meantime, we created a HAL layer in the GUI and wrote a pump simulator to run our tests. The communication layer was not simulated, except for the lowest-level function, which used either the real UART or a socket simulating the UART.

Side benefit: the simulator allowed us to test additional scenarios that we could not test on a real machine.

When a real machine was available, approximately 6 months into the project, we started testing our scripts against that machine. It took only a day or two to use the UART to communicate with it and a few more days for us and/or the pump developers to fix their code as needed.

Happy Path script

It became clear that a more efficient way to test the real machine was to write an end-to-end "happy path" script. It would perform a full treatment on the machine and therefore test the most important pump behaviors and the GUI's interactions with the pump component. There were some conversations about using it as a system verification test but the final decision was to use it as a "dry run", informal facility instead.

The happy path script was long, so to ease the development effort it was written in simulated mode first. All the bugs/problems were fixed, and when it was clean we ran it against the real machine. There were manual interventions required, e.g. add water to the reservoir, mount a cartridge, connect/disconnect tubing, etc. The verification framework was extended to interact with the user/tester running it.

Side benefits

Side benefit: dialysis treatments can be 4 hours long, so to allow the scripts to operate quicker, we came up with a "time warp" facility. All timing in the GUI was passed through one class. That class modified the real time wait or elapsed time calculation by a "time warp factor". When that factor was 1.0 then 1 second in GUI time was 1 second in real time. When it was 60.0 then 1 minute in GUI time was 1 second in real time. And conversely, when it was 1/60.0 then 1 second of GUI time was 1 minute in real time. The time warp factor could be changed via the verification framework during a script's run. Using that we could speed up or slow down the treatment, timeouts, etc. to make effective testing possible and much easier. There were limits to this when running on a real machine since the pump firmware had built-in timing limits.
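
The time warp mechanism above can be sketched as a single clock class that all GUI waits funnel through. The class and method names are illustrative:

```ruby
# Every wait in the GUI goes through one clock that scales GUI time
# by a warp factor.
class WarpClock
  attr_accessor :factor    # 1.0 => real time, 60.0 => 60x faster

  def initialize(factor = 1.0)
    @factor = factor
  end

  # the real-time seconds to wait for a given GUI-time duration
  def real_wait(gui_seconds)
    gui_seconds / @factor
  end

  def wait(gui_seconds)
    sleep(real_wait(gui_seconds))
  end
end

clock = WarpClock.new(60.0)
clock.real_wait(60)    # 60 GUI seconds pass in 1 real second
```

Because the factor is an attribute, the verification framework could change it mid-script to speed up or slow down the treatment as needed.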

Side benefit: when a failure occurred on a real machine, it was difficult to debug the failure. We added the ability to grab a screenshot from the verification framework.

Side benefit: when the screenshot facility was added we did some brainstorming and came up with the idea of remote monitoring of the real machine. We created a PC app that grabbed the current state of the GUI every second or two and displayed that screenshot in the app.

And that led us to extend it to the next step. We started monitoring the mouse clicks on the PC app. When a click occurred, we calculated the coordinates on the real GUI screen for that click. If it was on a widget (and we knew the current screen content) then we would send a click to the real GUI at that xy coordinate. Using that we could run a treatment remotely.
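
The click mapping can be sketched as a coordinate scale plus a hit-test against the collected widget boxes. The function names and the box format here are assumptions:

```ruby
# Scale a PC-app click into GUI-screen coordinates.
def app_to_gui(click, app_size, gui_size)
  [click[0] * gui_size[0] / app_size[0],
   click[1] * gui_size[1] / app_size[1]]
end

# Find the widget (if any) whose encompassing box contains the point.
def widget_at(widgets, xy)
  widgets.find do |w|
    x1, y1, x2, y2 = w[:box]
    xy[0].between?(x1, x2) && xy[1].between?(y1, y2)
  end
end
```

If widget_at finds a widget for the current screen content, a click at that xy coordinate is sent to the real GUI.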

Sprint allocation and progress

After the first sprint, we split the work to be done into four repeating phases.

Phase 1

During phase 1, we gathered information from the clinicians, the pump team, and other stakeholders, asking what they would like us to work on next. Then the GUI developers and I did a design session (or two), laying out any major changes or architectural gaps that we had. If there were any questions on how to proceed, we would do a spike to check.

Phase 2

During phase 2, I used the design results to update the SDD and the SRS.

Phase 2 - SDD

The SDD had IDs for each of its sections. These IDs were used in the block comments in all the GUI code functions. That allowed us to trace from the SDD to the actual code.
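
A block comment carrying trace IDs might look like the sketch below. The ID formats and the function itself are made up for illustration; the article does not show the project's actual comment layout:

```ruby
# === trace ===
# SDD: SDD-014 (alarm mute behavior)
# UT : UT-0231
# =============
def mute_alarm(alarm)
  alarm[:muted] = true
  alarm
end
```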

The SDD document was a MSWord docx, but I was able to extract out the SDD design as a text file. We used that text file to perform a PR (Peer Review) of the SDD changes I had made. The developers reviewed them and if there were any gaps or changes, I'd update the docx and text file and re-review in the PR.

Note the SDD design was written in simple pseudocode. Since the PR required text, no images were used. The intent of the SDD document is to clarify that the chosen design prevents certain component failures from occurring and allows reliable behavior and operation based on the architecture.

We also wanted to separate the design from the implementation. The choices made for a new class or a new addition to the existing classes were based on the understanding and knowledge of the developers. They needed to comply with the expected design, but needed some leeway in choosing specific methods of implementing that.

Based on all that, the pseudocode focused on application-level behavior and only needed to get into the actual implementation in a couple of areas. Those areas clarified the expected behavior for key, system-level behaviors. For example, there were only two exceptions in the GUI. One exception was recoverable, and the other was fatal. That was not an implementation choice; it was a choice made at the design level and therefore had to be specified in the SDD.

Once I had an approved PR, I would create tickets in our defect tracking system based on the SDD. This meant that the GUI developers were working on tickets against the design document, not the SRS. This is key. The design session and the SDD updates were for implementing the design and the developers were in charge of that.

Phase 2 - SRS

The SRS is a document for the verification team that clarifies what to test in that design. And so when the SDD PR was approved, I went through the design changes and captured the expected behaviors from them in a new set of SRS requirements.

The SRS was an MSWord docx, and I extracted out the text file as I did for the SDD. We used that text file to perform a PR on the SRS changes I made. Both the development team and the verification test team reviewed them, and if there were any gaps or changes, I'd immediately update the docx and text file and re-review in the PR. At the end of this, the SDD matched the SRS document and both complied with all our understanding and expectations.

Note the SRS had requirement IDs for use in the verification Test Protocol to associate a verify_xx() with the requirement that it was verifying.
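
A hypothetical sketch of that association: a verify call that takes the requirement ID it covers as its first argument. The ID format and this verify_eq shape are illustrative only:

```ruby
# Tie a verification check to the SRS requirement it covers.
def verify_eq(srs_id, desc, expected, actual)
  { srs_id: srs_id, desc: desc, expected: expected,
    actual: actual, pass: expected == actual }
end

row = verify_eq('SRS-042', 'default label text', 'Ready', 'Ready')
```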

Again, once I had an approved PR, I would create tickets in our defect tracking system based on the SRS. These tickets named the SRS ID and also the related SDD IDs. That way the verification team could trace back to SDD tickets if necessary.

Usually Phase 1 and Phase 2 were done in the current sprint.

Phase 3

During phase 3, the Dev team implemented the development tickets. This Phase was allocated during Sprint planning and was based on the current state of the system.

Any bugs were the highest priority. The general principle was "add clean code on clean code". If a bug was left until later, it was possible for new code to become dependent on the existence of the bug. When that bug was eventually fixed, side-bugs would occur because of that dependence. Cleaning up a bug was therefore the highest priority.

All code had a unit test before it was considered complete. All unit tests had an ID and that ID was placed in the GUI function block comment. That allowed us to trace from the code to the Unit Test and back again.

The unit tests also had the side effect of keeping the class architecture very clean. It is much easier to write a unit test for a simple well laid out class than it is for a complex class. As a double check we ran an automated UML parser and the class structure in the entire GUI was very simple.

New implementation code was allocated in a vertical slice through the GUI. We did NOT do work in a layer by layer fashion. We implemented all new code required in the layers as needed. This ensured someone writing a lower layer did not have to guess on how it was going to be used. The use case was immediate i.e. the lower layer code was written and used at the same time. Any incorrect assumptions were immediately detected and fixed. Also, those assumptions were tested via Unit Test and thereby captured in visible code.

Periodically gaps in the SDD were found during implementation. I would update the SDD and do a PR immediately. I would also double-check the SRS for any changes required there.

Phase 4

During phase 4 of 4, the Verification team implemented test scripts for the code just completed. This Phase again was allocated during Sprint planning. A ticket was only considered for script implementation if:

  • the SRS and SDD were complete and reviewed for it
  • the code was implemented
  • the unit tests were passing

As the scripts were written, there were times when gaps were found in the SRS requirements. I would update the SRS and do a PR immediately. I would also double-check the SDD for any changes required there.

Any failing script caused a bug to be created, and the bug would be fixed as soon as possible. In general, very few bugs occurred. This makes sense, since the code was clean. Why? Because the matching verification scripts and unit tests were run continually, at least once daily. We also ran two automated static analysis tools. And of course, all GUI code was code reviewed. Once we started running with a real machine, another set of people was exercising the GUI code from the system-level perspective as well.

In other words, each of these tested the GUI code from a different perspective. A bug can slip through all of those sieves, but it is rare.

Another key takeaway, once a verification script was written and passing, then:

  • the SRS and SDD documents match each other
  • the test protocol document matches the SRS and the SDD
  • the code matches the SRS, SDD and the test protocol document


Writing the automation scripts became the critical path. As the code and the SRS grew more complex, writing a script to test a specific expected behavior became more time-consuming.

Since it was important for the verification team to be testing the latest and greatest code, the developers would periodically help that team to catch up for a sprint.


All the docs were more or less ready at all times. After a few months, I started to submit those docs to the DHF as formal updates. In general, there were very few changes from the other stakeholders reviewing them; they understood our process and saw the clarity of the SRS requirements and the SDD contents. There were some recommended changes to boilerplate content, but even those were minor.

Once the project had progressed far enough for a potential 510(k) submission, all the docs, again, were more or less ready. At the very end of the project, there was a final run of the test protocol scripts to get the Verification Report, and then the final formal reviews and signatures were done.

We had been running the Verification Report continually, every day. We knew how long it took to run the entire verification suite (3 hours). The final run had no surprises.

Ad hoc trace reports

To ensure the documentation for the code base was correct, I wrote a script to parse it and report missing comment blocks and missing expected values inside them. Once that was done, I had that script generate HTML to show traces from:

  • SRS requirements to one or more SDD sections,
  • SDD IDs to source code
  • source code to UTs
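
The scan behind these reports can be sketched as below. The "SDD-\d+" ID format, the file names, and their contents are assumptions for illustration:

```ruby
# Pull SDD IDs out of a source file's text.
def sdd_ids(source_text)
  source_text.scan(/SDD-\d+/).uniq
end

# Flag files that have no trace information at all.
def files_missing_trace(files)
  files.select { |_name, text| sdd_ids(text).empty? }.keys
end

files = {
  'screen_home.rb' => "# SDD: SDD-014\ndef show_home; end",
  'util.rb'        => 'def helper; end'
}
```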

These traces were informal and ad hoc only, but they proved valuable in showing gaps in the requirements and the SDD design documents. For example, a code function tracing to a specific SDD ID should clearly show the design in that section of the SDD being fully implemented as described. If it didn't, then:

  • there was a typo in the SDD ID
  • the SDD pseudocode was missing some design specification
  • there was unnecessary code

Having the source code to UT trace was also valuable. Seeing the source code on the left and the UT that tested that code on the right, clearly showed any gaps in the unit test.

Using the same technique, I was also able to show the tracing from SRS requirements to source code. This too was valuable in finding gaps in the requirements or the code.

- John Arrizza