How scientists are tackling one of cancer's most elusive proteins using E. coli and computational analysis
Imagine a protein so elusive that scientists call it the "evil twin" of a crucial cellular guardian. This is BORIS (Brother of the Regulator of Imprinted Sites), a protein that has captivated researchers since its discovery due to its potential role in triggering cancer and its astonishing resistance to being produced in the laboratory.
Under normal circumstances, BORIS appears only during a brief window in sperm development, then vanishes. But in cancer cells, it makes an unwanted comeback, potentially reprogramming cells toward malignancy.
As a human protein with complex folding properties, BORIS has stubbornly resisted mass production using conventional methods. This article explores how scientists are tackling this challenge by combining a clever genetic engineering approach—creating a truncated version of BORIS—with the power of E. coli bacteria as a protein factory, supplemented by sophisticated computer analysis to understand its structure and function before it even leaves the test tube.
This article covers three themes: why BORIS is so difficult to study and how it is connected to cancer development; the challenges of producing complex human proteins in bacterial systems like E. coli; and the use of computer simulations to study protein structure and function.
BORIS belongs to a special class of proteins called transcription factors—biological switches that turn genes on and off. What makes BORIS particularly intriguing is its striking similarity to another protein called CTCF, which plays a critical role in organizing our DNA's 3D structure and controlling gene activity.
While CTCF is present in most cells, BORIS typically appears only during sperm development, suggesting it has a specialized role in reprogramming genetic activity.
The cancer connection emerges from BORIS's abnormal reappearance in various tumor types. Scientists hypothesize that when BORIS shows up in the wrong cells at the wrong time, it may activate cancer-promoting genes that should normally remain silent. Understanding exactly how BORIS works could unlock new approaches to cancer diagnosis and treatment, but first researchers need enough of the protein to study it.
Producing human proteins in bacterial systems like E. coli presents several formidable challenges. Unlike bacterial cells, human cells have sophisticated machinery for folding complex proteins and adding necessary chemical modifications. When faced with complicated human proteins, bacteria often become overwhelmed, which can lead to misfolded protein, insoluble aggregates known as inclusion bodies, or a metabolic burden that slows the growth of the host.
For BORIS, which is particularly large and complex, these problems are amplified. The solution? Create a simplified, truncated version that contains only the essential functional parts of the protein, making it more manageable for bacterial production while retaining its biological activity.
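While the truncated BORIS is being produced, computational biologists can study the protein in silico, through computer simulations rather than physical experiments. This approach includes predicting the three-dimensional structure by homology modeling, testing that model with molecular dynamics simulations, identifying functionally critical residues, and docking candidate binding partners.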
These computational methods provide valuable insights that guide further experiments, creating a virtuous cycle of hypothesis and testing.
Computational analysis can predict protein structures with accuracy comparable to some experimental methods, saving months of laboratory work.
While this article centers on producing a truncated BORIS protein, recent research illustrates the approaches now available to overcome a major obstacle in recombinant protein production: ribosome stalling.
In 2025, a Japanese research team tackled one of the most persistent problems in protein production: the tendency of ribosomes to stall during translation of certain sequences. As Associate Professor Teruyo Ojima-Kato explained, "When ribosomes are unable to continue the translation process for some reason, protein synthesis is halted" [1]. This stalling can sharply reduce protein yields, particularly for difficult-to-express proteins like transcription factors.
The research team employed a sophisticated multi-step approach:
1. The researchers first defined a library containing all possible four-amino-acid sequences (tetrapeptides), 20^4 = 160,000 distinct sequences in total [1].
2. They tested tetrapeptides from this library for their ability to prevent ribosome stalling and enhance translation efficiency in E. coli.
3. Using data from approximately 250 experiments, the team trained an artificial intelligence model to predict the translation-enhancing potential of all 160,000 tetrapeptides [1].
4. The AI model underwent three rounds of prediction and refinement, identifying effective translation-enhancing peptides (TEPs) with increasing accuracy.
The research successfully identified several novel TEP sequences that effectively prevent ribosome stalling. Most importantly, the team demonstrated that their AI model could accurately predict translation enhancement for the entire tetrapeptide library, suggesting that AI-based predictive models could revolutionize how we design easily producible protein sequences [1].
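To make the size of the search space concrete, the sketch below enumerates every tetrapeptide from the 20 standard amino acids and shows one simple way a sequence could be turned into numbers for a predictive model. The one-hot encoding and the function names are illustrative assumptions, not the features or code used in the cited study.

```python
from itertools import product

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def all_tetrapeptides():
    """Yield every possible four-residue sequence (20**4 in total)."""
    for combo in product(AMINO_ACIDS, repeat=4):
        yield "".join(combo)


def one_hot(peptide):
    """Encode a peptide as a flat 0/1 vector, 20 entries per residue.

    Illustrative assumption only: the published model's inputs may differ.
    """
    vector = []
    for residue in peptide:
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector


library = list(all_tetrapeptides())
print(len(library))          # 160000
print(len(one_hot("MKTA")))  # 80
```

With roughly 250 measured sequences against 160,000 candidates, a model trained on such vectors would be scoring a library several hundred times larger than its training set, which is why the iterative predict-and-refine rounds matter.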
| Reagent/Tool | Function | Application in BORIS Production |
|---|---|---|
| E. coli BL21(DE3) | Protein production workhorse with T7 RNA polymerase system | Optimal host for expressing truncated BORIS |
| pET Vector System | Plasmid with strong T7 promoter to drive protein expression | Carries the genetic code for truncated BORIS |
| Translation-Enhancing Peptides (TEPs) | Short sequences that prevent ribosome stalling | Could be fused to truncated BORIS to improve yields [1] |
| Molecular Chaperones | Proteins that assist proper folding of other proteins | Co-expressed to help BORIS fold correctly |
| Affinity Tags | Molecular handles for purification | Added to truncated BORIS for easier purification |
| Protease Inhibitors | Chemicals that prevent protein degradation | Added during extraction to protect BORIS from degradation |
| Challenge | Solution | Mechanism | Relevance to BORIS |
|---|---|---|---|
| Host Burden | T7 RNA polymerase regulation | Reducing metabolic competition | Essential for producing large proteins [4] |
| Protein Misfolding | Chaperone co-expression | Assisted folding | Critical for complex domains |
| Inclusion Bodies | Lower temperature cultivation | Slower, more accurate folding | May improve soluble BORIS yields |
| Disulfide Bonds | Engineered strains (Origami) | Oxidizing cytoplasm | Important if BORIS has cysteine bridges |
| Codon Bias | Codon optimization or rare-tRNA supplementation | Matching tRNA availability | Crucial for human genes in bacteria |
| Toxic Effects | Tight promoter control | Preventing leaky expression | Essential if BORIS inhibits growth |
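
The codon bias row can be illustrated with a short scan of a coding sequence for codons that E. coli translates poorly. This is a minimal sketch: the rare-codon set below is an assumption based on codons often cited as rare in E. coli (real usage tables vary by strain and source), and the toy sequence is hypothetical, not the BORIS gene.

```python
# Codons often cited as rare in E. coli. Assumption for illustration only:
# consult a codon usage table for the actual expression strain.
RARE_CODONS = {"AGG", "AGA", "ATA", "CTA", "CCC", "GGA"}


def find_rare_codons(cds):
    """Return (codon_index, codon) pairs for rare codons in an in-frame CDS."""
    cds = cds.upper().replace("U", "T")  # accept RNA or DNA letters
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    hits = []
    for start in range(0, len(cds), 3):
        codon = cds[start:start + 3]
        if codon in RARE_CODONS:
            hits.append((start // 3, codon))
    return hits


# Hypothetical toy sequence: ATG AGG CTG CCC AAA AGA TAA
toy_cds = "ATGAGGCTGCCCAAAAGATAA"
print(find_rare_codons(toy_cds))  # [(1, 'AGG'), (3, 'CCC'), (5, 'AGA')]
```

A dense cluster of hits, especially near the start of the gene, would be one reason to consider a codon-optimized construct or a host strain that supplies the matching tRNAs.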
Combining optimization strategies can increase protein yields by over 300% compared to baseline expression.
While the physical production of truncated BORIS in E. coli provides the essential raw materials for study, computational analysis offers a complementary approach to understand the protein without traditional lab experiments.
Using the amino acid sequence of truncated BORIS, researchers can employ homology modeling techniques to predict its three-dimensional structure. This involves comparing the BORIS sequence to proteins with known structures and building a model based on these templates.
The predicted structure can then be assessed with molecular dynamics simulations, which test the stability of the model in a simulated solvent environment [6].
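One common way to summarize stability in such a simulation is the root-mean-square deviation (RMSD) of each frame from the starting model. The sketch below assumes the frames have already been superimposed on the reference and uses hypothetical toy coordinates; it is not the analysis pipeline of the cited work.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized coordinate sets.

    Assumes both sets are already superimposed; no alignment is performed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical three-atom reference model and two simulation frames.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
trajectory = [
    [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)],
    [(0.0, 0.1, 0.0), (1.5, -0.1, 0.0), (3.0, 0.1, 0.0)],
]

rmsd_series = [rmsd(reference, frame) for frame in trajectory]
print(rmsd_series)
```

An RMSD that levels off at a small value over the run is usually read as a sign that the model is holding its shape, while a steadily climbing value suggests the predicted fold is drifting apart.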
For proteins like BORIS, computational methods can identify critical residues that may be essential for function. One innovative approach applies network analysis to protein structures, treating each amino acid as a node in a network.
Residues that appear most frequently in the shortest paths connecting different protein regions, those with high "dynamic connectivity", are often critical for protein function [9]. This method can identify functionally important residues based solely on protein structure, which is particularly valuable when studying proteins with limited experimental data.
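The idea of counting how often a residue lies on shortest paths can be sketched with ordinary betweenness centrality on a residue contact graph. Note the substitution: this is the textbook graph measure (computed here with Brandes' algorithm), which may differ from the exact "dynamic connectivity" definition in the cited method. The seven-residue contact graph is a hypothetical toy, not derived from any BORIS structure.

```python
from collections import deque


def betweenness(graph):
    """Betweenness centrality for an unweighted, undirected graph (Brandes).

    graph maps each node to an iterable of neighbours. Each score counts the
    shortest paths between other node pairs that pass through the node,
    fractionally when several shortest paths tie.
    """
    score = {node: 0.0 for node in graph}
    for source in graph:
        order = []                              # nodes in order of discovery
        preds = {node: [] for node in graph}    # predecessors on shortest paths
        sigma = {node: 0 for node in graph}     # number of shortest paths
        dist = {node: -1 for node in graph}
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:                            # breadth-first search
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {node: 0.0 for node in graph}
        for w in reversed(order):               # accumulate from the far end
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Every undirected pair was counted once from each end.
    return {node: value / 2.0 for node, value in score.items()}


# Hypothetical contact graph: two compact clusters joined by residue R4.
contacts = {
    "R1": ["R2", "R3"], "R2": ["R1", "R3"], "R3": ["R1", "R2", "R4"],
    "R4": ["R3", "R5"],
    "R5": ["R4", "R6", "R7"], "R6": ["R5", "R7"], "R7": ["R5", "R6"],
}
scores = betweenness(contacts)
print(max(scores, key=scores.get))  # R4
```

In this toy graph every path between the two clusters must pass through R4, so it scores highest; in a real protein the graph would typically be built from a distance cutoff between residues, which is itself a modeling choice.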
Perhaps most excitingly, researchers can use computational docking to predict how BORIS might interact with DNA or other protein partners. The AutoDock suite, for instance, allows scientists to virtually screen thousands of potential binding partners, providing hypotheses about BORIS function that can then be tested experimentally [8].
This approach is especially valuable for generating research leads when studying a protein with as many potential interactions as BORIS.
The production of truncated human BORIS in E. coli represents more than just a technical achievement—it exemplifies the power of interdisciplinary approaches to solve biological puzzles. By combining genetic engineering, microbiology, and computational biology, researchers are steadily unraveling the mysteries of this intriguing protein.
What makes this work particularly compelling is its potential trajectory. The AI-assisted production methods described above [1] could be applied to BORIS and to countless other challenging proteins, accelerating research across biomedical science. The computational tools used to predict protein structure and function [9] are becoming increasingly sophisticated, allowing researchers to extract maximum knowledge from limited experimental material.
Each truncated protein produced in E. coli and each computational simulation brings us one step closer to understanding how BORIS works, demonstrating that even the most elusive biological targets eventually yield to persistent, creative scientific investigation.
The journey to understand BORIS continues—from the bacterial factory floor to the computer processor—proving that in modern science, the test tube and the microprocessor are equally essential tools for discovery.