progbits 16 hours ago

Really nice project.

I have a few questions if the authors or anyone knowledgeable is around.

From the listed features:

  Cryptographic signing of all test results
  Tamper-evident resin seals on all connections and access points
  Any attempts to open or modify the machine result in visible damage to security seals
This sounds like tampering won't break the signing, it only leaves evidence. How would this be enforced? Is the idea that a third party would regularly inspect the machine, and if evidence of tampering is found, any results signed since the last inspection are not to be trusted?

> The system is designed for use in supervised laboratory environments where sample chain of custody is maintained. While the machine can't prevent sample swapping before testing, it ensures that once a sample is tested, the results cannot be manipulated.

Two questions here:

- Would an approach where you have to commit to a sample label before testing help? Before running the machine you say "ok, this is the sample for experiment X on patient Y", this gets written to a third-party transparency log, and only then will the machine produce a result and sign it together with a reference to the log entry. Later you can't hide such results or quietly re-run the samples. (Rough sketch after this list.)

- Maybe the "supervised laboratory" answers my first question about inspecting the machines for tampering, if we assume the laboratory itself is to be trusted and that only researchers might falsify results. Is that a reasonable assumption? Wouldn't the laboratory or the institution also be incentivized to at least overlook cheating?
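
Concretely, the commit-then-sign flow I'm imagining looks something like this. It's purely a sketch: the `log` client, field names, and key handling are made up, not anything the project actually provides.

  import hashlib, json, time
  from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

  def commit_sample(log, experiment_id, patient_id, nonce):
      # Publish a commitment to the sample label before the run starts.
      # `log` is a hypothetical transparency-log client that returns an entry id.
      label = json.dumps({"experiment": experiment_id, "patient": patient_id}, sort_keys=True)
      commitment = hashlib.sha256(label.encode() + nonce).hexdigest()
      return log.append(commitment)

  def sign_result(machine_key: Ed25519PrivateKey, raw_result: bytes, log_entry_id: str):
      # The machine signs the result hash together with the earlier log reference,
      # binding "this result" to "that pre-registered sample".
      payload = json.dumps({
          "result_sha256": hashlib.sha256(raw_result).hexdigest(),
          "log_entry": log_entry_id,
          "timestamp": int(time.time()),
      }, sort_keys=True).encode()
      return payload, machine_key.sign(payload)

With something like that, re-running a sample or dropping an unwanted result leaves a gap in the log rather than disappearing silently.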

mbreese 15 hours ago

I’m a bit curious about what the purpose of this is… PCR machines in and of themselves don’t have an output. However, this is about a specific real-time PCR machine, which does have a digital output. It’s a simple mistake, but an important distinction. And it makes me wonder how much experience the authors have in a lab setting. It’s important to know how these systems are used in real life to know what is really practical.

This looks like a very interesting project in exploring how much work would really go into locking down a specific instrument. Spoiler: it's a lot of work! But now, knowing how much effort is involved, I don’t see how practical it would be.

I’m not convinced that locking down each specific piece of equipment is really the answer to the issue (which I agree is a problem). There are too many different types of equipment to really make a dent in locking down a lab’s workflow. What might work, though, is a resource for signing digital assets/output files to prove the raw data is unaltered. You couldn’t stop someone from uploading an altered digital file, but you’d have a digital paper trail for when a file was created. That, coupled with a lab notebook, would give you a pretty good audit trail.
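
What I have in mind is roughly a detached signature written next to each raw export. A sketch, with key management and paths as placeholders:

  import hashlib
  from pathlib import Path
  from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey, Ed25519PublicKey

  def sign_export(export_path: str, key: Ed25519PrivateKey) -> None:
      # Write a detached signature over the SHA-256 of the raw export file.
      digest = hashlib.sha256(Path(export_path).read_bytes()).digest()
      Path(export_path + ".sig").write_bytes(key.sign(digest))

  def verify_export(export_path: str, pub: Ed25519PublicKey) -> bool:
      # Raises InvalidSignature if the file changed after it was signed.
      digest = hashlib.sha256(Path(export_path).read_bytes()).digest()
      pub.verify(Path(export_path + ".sig").read_bytes(), digest)
      return True

The instrument doesn't have to cooperate at all; the signing service only needs the exported file and a timestamped key.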

I did a bit of work in this space a long time ago. The hardest part was always getting people to actually use the system. Now that journals are starting to require copies of all raw results used in figures, this type of system would make more sense and could make a real impact.

  • codelion 8 hours ago

    that's a great point about the digital audit trail... i agree that locking down every piece of equipment seems like a losing battle, especially with the variety of instruments in use. a system for signing output files would be a much more scalable approach.

    it's interesting that journals are starting to require raw data... that shift could definitely drive adoption of these kinds of systems. btw, i've seen some projects focusing on verifiable data provenance, might be relevant here.

enopod_ 15 hours ago

As much as I like the idea, this project will not prevent any fraud. It begins with what you load into the PCR thermocycler. Samples, controls, and primers can all easily be manipulated without leaving a trace. Anyone with a little knowledge of PCR can make a run look exactly the way they want it to look. That's why replication is the gold standard in science. Will other scientists come to the same results when they perform a published experiment in their own lab?

  • cge 9 hours ago

    >That's why replication is the gold standard in science. Will other scientists come to the same results when they perform a published experiment in their own lab?

    I feel like it's worth a reminder to those reading this that replication isn't just about preventing fraud. I've personally seen far fewer examples of fraud than cases where difficulty reproducing results led to the discovery of considerations affecting the experiment that happened to be present in the original setup, but that the original researchers might not have thought of, or thought significant.

tptacek 12 hours ago

I do not understand what practical problem this solves. Biology researchers aren't reverse engineering the binary outputs of qPCR machines. Regardless of what the machine itself does, malicious researchers control the inputs. It feels a little bit like "we have a signature system, let's stick it somewhere".

It would be easier to take this seriously if the work was accompanied by any kind of documented threat model.

I also don't believe this is even ostensibly addressing an important problem. There's a cognitive availability bias in discussions of research misconduct. Incidents are newsworthy, but most people in these discussions don't have a sense of the scale of the field, which is enormous. People are talking about a numerator without knowing the denominator. A simpler way to address this problem: ignore it, unless you work in the field.

  • cge 10 hours ago

    >Biology researchers aren't reverse engineering the binary outputs of qPCR machines

    While this is beside your point, outputs from qPCR machines are often reasonably interpretable. I've seen one model that uses XML and one that uses JSON, both as files inside an overall data file that's just a ZIP, and I have seen intermediate files that are plain text. In both cases, these are processed on the machine from raw images, which are standard TIFFs, though those are usually not stored.
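
    For the curious, poking at one of these exports takes only a few lines. The member-name checks below are illustrative, not any particular vendor's layout:

      import json, zipfile
      import xml.etree.ElementTree as ET

      def peek_run_data(export_path):
          # Some vendors' qPCR exports are just ZIP archives of XML/JSON members.
          with zipfile.ZipFile(export_path) as z:
              for name in z.namelist():
                  if name.endswith(".json"):
                      print(name, list(json.loads(z.read(name)))[:5])
                  elif name.endswith(".xml"):
                      print(name, ET.fromstring(z.read(name)).tag)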

    I am similarly left confused about the motivation of this project, however. If I recall correctly, commercial offerings of cryptographic signatures for qPCR machines already exist, running on the machines themselves. Also, while they immediately dismiss the idea of modifying the machine, or even using different software, those are both important considerations. Many modern qPCR machines have computers running full operating systems in them (the QuantStudio machines run Android, for example, and have their core control running internally in Python); these can be extremely insecure. All the work they have done would do nothing to protect against a modified machine, and contrary to what they suggest, I have found that modifying machines, and writing interface software as an alternative to the manufacturer's, can be quite easy. qPCR machines are not actually very complicated instruments, especially from a firmware perspective. In putting tamper-evident seals on the machines, they perhaps don't realize how open lab equipment can be: I've seen almost intentional unauthenticated privileged remote code execution, for example.

    And, as you note: it's very unclear what type of behavior this is even trying to defend against. Research malpractice is rare, and it can take place at many points in the process. This seems to defend against a particularly unlikely one. I think fraud would more likely take place either in setup or in data processing.

    And finally: this locks down machines to be usable only in very specific ways. That is reasonable in diagnostic and clinical settings. But research is often not about doing the exact same thing that has already been done; it often involves new uses for equipment and new experimental methods. Most of my use of qPCR machines, for example, would be impossible in their setup, as I'm using them in ways the manufacturer's software doesn't support, and the data coming out has nothing to do with qPCR, so it can't be processed in the normal way. At some level, research equipment cannot be heavily constrained, as that is contrary to the very point of research.

    (For context: I maintain an open source package to control QuantStudio machines, primarily intended for using them for non-qPCR purposes.)

  • timewizard 9 hours ago

    > don't have a sense of the scale of the field, which is enormous

    And has a massive impact on capital flows.

    > A simpler way to address this problem: ignore it, unless you work in the field.

    Then there's no reason to continue funding it or attempting to reason about its output. Unless the field has a way to fund itself?

perching_aix 13 hours ago

Rather than verifying the science the machine performs, this project seems to be about verifying the machine's integrity. I worry that, without heavy disclaimers, this may mislead people about the character of the improvement this solution provides, which is dangerous in its own right. From what I understand, verifiable imaging as a whole suffers from this too.

biophysboy 8 hours ago

I applaud this for the sheer heroism involved in hacking one of these damn things. For those who've never worked in a wet lab, imagine doing science with 12 different printers.

caycep 8 hours ago

vs...trying to replicate the results?

you could also sequence the output and see if you actually get what it says you got

bobbiechen 13 hours ago

Considering the topic, I thought the PCR might be Platform Configuration Registers (it's actually polymerase chain reaction, for scientific lab machines).

iandanforth 10 hours ago

Summon Elisabeth Bik! (I want to know what she thinks)

whitten 8 hours ago

PCR is Polymerase Chain Reaction

therein 16 hours ago

Good idea. Interests are too aligned in one direction to falsify results.