AI-designed synthetic plant DNA passes its first cellular test


May 28, 2026

PlantGFM created synthetic plant DNA that tobacco cells copied into RNA, with some sequences producing detectable proteins in early validation tests.

(Nanowerk Spotlight) Living cells do not treat every piece of DNA as an instruction. Most DNA placed inside a cell is ignored, damaged, silenced, or copied in ways that lead nowhere. To act like a gene, a stretch of DNA must pass several tests. The cell has to recognize it, make an RNA copy, process that copy, and sometimes use it to build a protein.

Scientists have been changing plant inheritance since the mid-19th century, when Gregor Mendel crossed pea plants and showed how traits pass from one generation to the next. Modern plant breeding still uses that basic idea: cross two plants so their offspring inherit a mix of traits. Researchers can also expose seeds or cells to chemicals or radiation to create random DNA changes. Newer methods can make more precise changes. CRISPR, for example, lets researchers cut DNA at a chosen spot and alter a specific part of an existing gene. These approaches differ in precision, but they share one feature: they usually start with DNA that a plant cell already knows how to use.

Creating DNA from scratch is a different challenge. It means making a stretch of DNA that no plant has used before, placing it inside a plant cell, and seeing whether the cell can read it as a gene-like instruction.

A study published in Advanced Science (“PlantGFM: A Genomic Foundation Model for Discovery and Creation of Plant Genes”) reports an artificial intelligence model that took on that problem in plants. The model, called PlantGFM, generated new DNA sequences that resembled plant genes. When the researchers tested seven of them in tobacco leaf cells, all seven were copied into RNA, and two produced detectable proteins.

That result does not prove the new sequences have useful functions. It shows something earlier and more basic: some AI-designed DNA can pass the first biological tests that separate genetic text from cellular instruction.

The reason this is difficult is that a plant gene is not a single uninterrupted command.
Many plant genes contain useful coding pieces separated by introns, which cells remove after making an RNA copy. Other regions around the coding sequence do not become protein, but they can help determine whether the RNA is processed, stabilized, and used. A sequence can resemble a gene in software and still lack the cues a cell needs.

PlantGFM was built to read longer stretches of plant DNA than many earlier plant genome models. It can process sequences up to 64 kb, enough to include a full gene plus surrounding regions that may affect how the cell recognizes and uses it. The model keeps single-base resolution across that span, so it can preserve both context and detail.

The left image shows GFP fluorescence in tobacco leaf cells after expression of AI-generated plant DNA, with two candidates producing visible protein signals. The right image confirms those protein products by western blot, where bands appear at the expected sizes for the same two candidates. (Image: Adapted from DOI:10.1002/advs.75772, CC BY)

The researchers trained PlantGFM on 10.84 billion nucleotides from 12 plant species, then tested whether it could find genes in genomes where the answers were already known. Strong performance across species and cultivars would suggest that the model had learned recurring plant gene structure rather than only short patterns in DNA letters. PlantGFM performed close to specialized annotation tools.

That reading test set up the generation experiment. In gene prediction, the model marks likely genes in existing DNA. In generation, it must produce a new sequence with enough architecture of its own: boundaries, coding regions, introns, and surrounding signals that may matter once the DNA enters a cell. The researchers retrained PlantGFM on 355,190 natural plant genes shorter than 4 kb and asked it to generate candidates.
The model was not asked to design a disease-resistance gene, a drought-tolerance gene, or any other useful plant trait. The first test was whether PlantGFM could produce artificial DNA with enough recognizable plant-gene structure to justify synthesis and cellular testing. Useful function would have to come later.

Before synthesis, the researchers needed to rule out two obvious failure modes. The generated sequences might be random-looking DNA that only appeared plausible to the model, or they might be close copies of natural genes in the training data. Existing gene-prediction tools classified many PlantGFM sequences as plausible genes, while random sequences rarely passed the same checks. Similarity searches found almost no close matches to natural genes under the authors’ criteria. Those tests did not prove biological function, but they made the candidates worth taking into cells.

The researchers then applied biological filters, favoring sequences with strong predicted gene structure and untranslated regions in ranges seen in natural genes. That filtering reduced thousands of generated sequences to 30 candidates. From those, the researchers selected seven and placed them into cells of Nicotiana benthamiana, a tobacco relative widely used in plant expression experiments.

The study then moved from asking whether the sequences resembled genes to asking whether plant cells would handle them as gene-like material. The first cellular test was transcription, the step in which a cell copies DNA into RNA. All seven candidates passed. RNA sequencing and RT-qPCR showed expression above the empty-vector control, meaning the inserted DNA did not simply sit unused. Without transcription, a designed sequence never reaches the cell’s route for making gene products.

The next evidence came from RNA processing. One candidate produced multiple RNA forms, consistent with the cell recognizing splice sites and processing the transcript in more than one way.
Splicing is more demanding than copying DNA into RNA. The cell has to identify internal signals and remove specific sections.

Protein production was a stricter test. The researchers attached GFP tags so translated products would be easier to detect by fluorescence and immunoblotting. Two of the seven candidates produced detectable proteins at the expected sizes. The other five did not. This mixed result is the paper’s most important boundary: transcription worked across the tested set, but stable detectable protein production appeared only in some cases.

The two protein-producing candidates also changed gene activity in the host cells. Hundreds of tobacco genes shifted expression, with 162 moving in the same direction in both lines. The paper treats that result cautiously. Some affected genes were linked to heat-shock responses, so the changes may reflect stress from overexpression rather than a specific role for the artificial proteins.

The same caution applies to the model’s directed test with NLR genes, a family that encodes plant immune receptors. After retraining on natural NLR sequences, PlantGFM generated candidates with recognizable immune-receptor domains and predicted structures resembling natural NLR proteins. That suggests the model can reproduce some features of a gene family. The study did not show that those candidates recognize pathogens or activate plant immunity.

PlantGFM fits into a broader push to make plant biology more programmable, alongside earlier work on delivering genetic material into plant cells. But this paper addresses a different part of the problem. Instead of asking how to deliver known genetic material into plants, it asks whether AI can generate new DNA that plant cells begin to interpret. That question also connects to wider efforts in deep learning for synthetic biology and AI genetic circuit design. In each case, the important test is not whether a model can produce a plausible design.
It is whether living systems respond in a measurable and interpretable way.

PlantGFM does not close the loop from sequence to useful trait. It does not show that the artificial sequences improve growth, disease resistance, stress tolerance, or any other plant property. Its contribution comes earlier, at the point where computer-designed DNA first meets the cell’s machinery. The study shows that a plant-focused genome model can generate artificial DNA that plant cells copy into RNA, process in at least one case, and translate in some cases into detectable protein.

That makes the work important without making it larger than the evidence allows. The next stage of AI gene design will not be judged by how gene-like a sequence looks on a screen. It will be judged by whether researchers can predict what a plant cell will do with that sequence, and whether that cellular response can be turned into a useful biological function. PlantGFM shows that AI-written plant DNA can pass the first tests of being read. The harder test is designing DNA whose purpose is known before the cell reads it.
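
The staged evidence reported in the study can be tallied as a simple funnel. The sketch below is a minimal Python illustration, not the authors' analysis. The `Candidate` class, the `funnel` function, and the candidate names are hypothetical, and which individual candidate showed which outcome is assigned arbitrarily. Only the counts come from the reported results: seven tested, seven transcribed, one with multiple RNA forms, two with detectable protein.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One tested sequence and its observed outcomes (hypothetical structure)."""
    name: str
    transcribed: bool          # RNA detected above the empty-vector control
    multiple_rna_forms: bool   # more than one RNA form observed
    protein_detected: bool     # GFP-tagged product seen at the expected size


def funnel(candidates):
    """Count candidates at each stage; later stages require transcription."""
    transcribed = [c for c in candidates if c.transcribed]
    return {
        "tested": len(candidates),
        "transcribed": len(transcribed),
        "multiple_rna_forms": sum(c.multiple_rna_forms for c in transcribed),
        "protein_detected": sum(c.protein_detected for c in transcribed),
    }


# ASSUMPTION: names and the per-candidate assignment of outcomes are
# placeholders. Only the totals (7, 7, 1, 2) reflect the reported results.
candidates = [
    Candidate(
        name=f"candidate_{i}",
        transcribed=True,
        multiple_rna_forms=(i == 3),
        protein_detected=(i in (1, 2)),
    )
    for i in range(1, 8)
]

counts = funnel(candidates)
for stage, n in counts.items():
    print(f"{stage}: {n}/{counts['tested']}")
```

Counting the later stages only among transcribed candidates mirrors the point made in the article: without transcription, a designed sequence never reaches RNA processing or protein production.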


By Michael Berger
– Michael is author of four books by the Royal Society of Chemistry:
Nano-Society: Pushing the Boundaries of Technology (2009),
Nanotechnology: The Future is Tiny (2016),
Nanoengineering: The Skills and Tools Making Technology Invisible (2019), and
Waste not! How Nanotechnologies Can Increase Efficiencies Throughout Society (2025)
Copyright © Nanowerk LLC
