Science - OmicsEdge

Revolutionizing the genomics industry with better models and predictions.

Our soon-to-be patent-protected bioinformatic technologies allow us to process and analyze genomic and phenotypic data at an unprecedented scale with industry-leading accuracy.

Research from Omics Edge

SumStatsRehab: an efficient algorithm for GWAS summary statistics assessment and restoration

A comparative analysis of current phasing and imputation software

EZTraits: A programmable tool to evaluate multi-site deterministic traits

Polygenic Risk Scores validation whitepaper

Our Core Technologies

Local Ancestry Determination

Local ancestry inference (LAI) is an indispensable part of any genomics research project. A large proportion of individuals exhibit some degree of admixture, and genetic diversity at the sub-continental level is increasingly relevant from a medical point of view, even at the finest scales. Recognizing the importance of this level of granularity, we developed Orchestra, our ancestral deconvolution tool.

As seen in Figure 1, Orchestra markedly outperforms other leading ancestry methods in both non-admixed (generation 0) and admixed samples (generations 1-6). It has ~15% better overall recall and ~14% better overall precision than the next-best model. It also retains high accuracy across all tested populations, with a remarkable ability to distinguish between closely related ancestries. Orchestra achieves an accuracy of over 75% for 100% of populations within the 1KGP dataset and for 75% of populations within the custom data panel. The other models struggle with about a third of the populations, where their accuracy falls below 50%.
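For readers unfamiliar with how recall and precision apply to ancestry calls, the following is a minimal sketch of a per-ancestry calculation over genomic windows. It is not Orchestra's evaluation code; the function name, the window-level labels, and the toy data are all assumptions made for illustration.

```python
from collections import Counter


def per_class_precision_recall(true_labels, predicted_labels):
    """Return {label: (precision, recall)} for per-window ancestry calls.

    Hypothetical helper: each list holds one ancestry label per genomic window.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")

    actual_count = Counter(true_labels)          # windows that truly belong to each ancestry
    predicted_count = Counter(predicted_labels)  # windows assigned to each ancestry
    true_positives = Counter(
        truth for truth, pred in zip(true_labels, predicted_labels) if truth == pred
    )

    metrics = {}
    for label in set(actual_count) | set(predicted_count):
        precision = (
            true_positives[label] / predicted_count[label]
            if predicted_count[label] else 0.0
        )
        recall = (
            true_positives[label] / actual_count[label]
            if actual_count[label] else 0.0
        )
        metrics[label] = (precision, recall)
    return metrics


# Toy data (made up, not taken from any benchmark).
truth = ["EUR", "EUR", "AFR", "AFR", "AMR", "AMR"]
calls = ["EUR", "AFR", "AFR", "AFR", "AMR", "EUR"]
metrics = per_class_precision_recall(truth, calls)
for label, (precision, recall) in sorted(metrics.items()):
    print(f"{label}: precision={precision:.2f} recall={recall:.2f}")
```

Precision answers "of the windows assigned to this ancestry, how many were correct", while recall answers "of the windows that truly have this ancestry, how many were found". Closely related ancestries tend to hurt both, which is why per-population reporting is informative.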

As Figure 2 shows, Orchestra not only outperformed the other state-of-the-art models on highly admixed samples such as Latinos but also reconstructed the genetic history of Latin Americans with greater precision than comparable algorithms.

**Fig.1 | Orchestra vs. state-of-the-art LAI methods. (a)** Percent recall and precision for ancestry deconvolution by FLARE (navy), Gnomix (light blue), RFmix (green), and Orchestra (red) across 6 generations of admixture. Each generation is represented with a star shape. The number of points on the star corresponds to the number of generations (0 - 6). Orchestra outperforms other methods in the 1000 Genomes Project (1KGP) dataset with 16 populations (left) and our larger custom data panel with 35 populations (right). Accuracy (%) per population for the 16 populations in the 1KGP dataset **(b)** and the 35 populations in the larger custom dataset **(c)**. Populations are ordered by mean accuracy across all methods (cross). The overall accuracy for each reference panel is shown on the right.

**Fig.2 | Orchestra’s performance on Latino American individuals. (a)** Percent recall and precision for ancestry deconvolution by FLARE (navy), Gnomix (light blue), RFmix (green), and Orchestra (red) on Latino simulations (equivalent to 12 generations of admixture; we adjusted the simulations to mimic the actual genetic composition of different regions within the continent: CA = Central America, Cb = Caribbean, SA = South America). **(b)** Ancestral composition of 1KGP admixed American populations and UKBB participants who were born in the Americas, as revealed by Orchestra. Proportions of Native American (yellow), Southern European (green), Northern European (blue) and African (red) ancestries fit with historical immigration events. **(c)** Orchestra was also able to trace ancestries that alternative methods failed to detect, such as those stemming from the establishment of large communities of Ashkenazi Jewish and Japanese ancestry in Argentina and Brazil, respectively, or from colonial processes such as the Atlantic slave trade (South African ancestry in Brazil) or the indenture system (Indian ancestry in the Guianas).

Genetic Imputation

Genotype imputation enables researchers and health tech applications to leverage the power of whole genome sequencing at the cost of genotyping. Imputation uses array variants as a scaffold to infer missing variants from a relevant reference panel.

Imputation accuracy, however, is crucial for all downstream applications. We developed Selphi, our genotype imputation tool, to give researchers the highest level of imputation accuracy.

As seen in Figure 3, Selphi outperforms leading imputation methods across all allele frequencies and performs especially well for rare variants, where all other models struggle. This matters because rare variants are more likely to be of medical significance. Selphi also performs better in all tested super populations: Africans, East and South Asians, and Europeans.

**Fig.3 | Selphi vs. state-of-the-art imputation methods.** The difference in imputation errors across chromosomes 1-22 between Beagle 5.4 (blue), Impute 5 (magenta), Minimac 4 (yellow), and Selphi (green) binned by minor allele frequency (MAF) **(a)** and in different super populations **(b)**. This difference is shown as the ratio (fold change) or the deviation from the average number of errors across all four methods. Target and reference samples were obtained from the 1KGP 30x reference panel. Selphi outperforms the other methods for all MAF bins and all four 1KGP super populations, and is particularly strong at imputing rare variants (MAF < 1%).
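The caption describes errors expressed as a fold change relative to the average across methods. A minimal sketch of that normalization follows, assuming error counts have already been tallied per method and per MAF bin. The function name, method labels, bin names, and counts are hypothetical and do not reproduce the data behind Figure 3.

```python
def fold_change_vs_mean(errors_by_method):
    """Express each method's error count as a ratio to the cross-method mean.

    errors_by_method: {method: {bin_name: error_count}}, same bins for every method.
    Returns {method: {bin_name: ratio}}; a ratio below 1 means fewer errors than average.
    """
    methods = list(errors_by_method)
    bins = list(errors_by_method[methods[0]])
    ratios = {method: {} for method in methods}
    for bin_name in bins:
        mean_errors = sum(errors_by_method[m][bin_name] for m in methods) / len(methods)
        for method in methods:
            if mean_errors == 0:
                # No method made errors in this bin, so the ratio is undefined.
                ratios[method][bin_name] = float("nan")
            else:
                ratios[method][bin_name] = errors_by_method[method][bin_name] / mean_errors
    return ratios


# Made-up counts for two placeholder methods and two MAF bins.
toy_errors = {
    "method_A": {"rare": 50, "common": 10},
    "method_B": {"rare": 150, "common": 30},
}
ratios = fold_change_vs_mean(toy_errors)
print(ratios)
```

Normalizing to the cross-method mean makes bins with very different absolute error counts comparable on one axis, which is useful when rare and common variants differ by orders of magnitude.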

Polygenic Risk Scores

Most common health conditions are highly polygenic. The science of Polygenic Risk Scoring (PRS) allows us to know an individual’s genetic predispositions for a myriad of common health conditions, including the top causes of death and health care expenses, such as cardiovascular disease or diabetes.

This predisposition data enables clinicians to focus on the areas of highest risk and to stratify patient populations.

Our Polygenic Risk Score (PRS) model is currently unnamed. Figure 4 shows the results of our latest benchmarking against leading PRS methods. These results were obtained before any correction or adjustment for covariates such as principal components for population structure, gender, age, etc., which will significantly increase statistical power and final prediction accuracy.

**Fig.4 | Our model vs. state-of-the-art PRS methods.** Performance is measured as the Area Under the ROC Curve (AUC) for LDpred2 (navy), PRS-continuous shrinkage (orange), max clumping+thresholding (red), stacked CT (blue), and our method (green) in coronary artery disease.
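As background for the benchmark above, the sketch below shows the textbook form of a polygenic score (a weighted sum of risk-allele dosages) and a rank-based AUC calculation. It illustrates the general technique only and is not our model. The function names, effect sizes, genotypes, and case/control labels are invented for the example.

```python
def polygenic_score(dosages, effect_sizes):
    """Weighted sum of allele dosages (0, 1 or 2) and per-variant effect sizes."""
    if len(dosages) != len(effect_sizes):
        raise ValueError("one effect size is required per variant")
    return sum(d * beta for d, beta in zip(dosages, effect_sizes))


def auc(scores, labels):
    """Probability that a random case outscores a random control (ties count 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical effect sizes for three variants and genotypes for four people.
effect_sizes = [0.2, -0.1, 0.4]
genotypes = [[2, 0, 1], [0, 2, 0], [1, 1, 2], [2, 0, 1]]
labels = [1, 0, 1, 0]  # 1 = case, 0 = control

scores = [polygenic_score(g, effect_sizes) for g in genotypes]
toy_auc = auc(scores, labels)
print(scores, toy_auc)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of cases from controls. The pairwise loop is quadratic, so a sort-based implementation would be preferable at cohort scale.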

Companion Diagnostics Pipeline

As a proof of concept, we built a genomic companion diagnostic pipeline based on matching people's genetic data to lifestyle, diet, and vitamin and mineral recommendations. We followed and measured users’ biomarker data before and after using the platform. We show the results related to diabetes prevention in Figure 5.

**Fig.5 | Health improvement in SelfDecode subscribers based on biomarker data.** Improvements in fasting glucose levels (mg/dL) **(a)** and in HbA1c levels (%) **(b)** between the first (gray) and most recent (blue) record. Users with initial values in the diabetes range (red zone) and users in the prediabetes range (yellow zone) both improved on average. Those with higher levels tended to improve more (-28 mg/dL vs. -3.7 mg/dL and -0.5 vs. -0.1 % for fasting glucose and HbA1c levels, respectively).

Genomic Compression

Our proprietary genomic compression algorithm can achieve a compression ratio better than 100:1 on genomic data and is designed specifically for large, biobank-scale datasets.

Because the algorithm is adaptive, the compressed size does not scale linearly with the size of the uncompressed data, and compression ratios improve as datasets grow. In addition, because the method compresses data in blocks whose size can be optimized, specific parts of the data can be analyzed without decompressing the entire dataset.
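The block-wise idea can be illustrated with a minimal sketch. This is not our proprietary algorithm: it uses Python's general-purpose `zlib` as a stand-in compressor, and the function names and block size are assumptions chosen for the example.

```python
import zlib


def compress_blocks(data, block_size):
    """Split data into fixed-size blocks and compress each block independently."""
    return [
        zlib.compress(data[i:i + block_size])
        for i in range(0, len(data), block_size)
    ]


def read_range(blocks, block_size, start, end):
    """Return data[start:end], decompressing only the blocks that overlap the range."""
    if start >= end:
        return b""
    first_block = start // block_size
    last_block = (end - 1) // block_size
    chunk = b"".join(
        zlib.decompress(block) for block in blocks[first_block:last_block + 1]
    )
    offset = start - first_block * block_size
    return chunk[offset:offset + (end - start)]


# Toy sequence data; real genomic files are far larger and more structured.
data = b"ACGT" * 1000
blocks = compress_blocks(data, block_size=256)
region = read_range(blocks, 256, 1000, 1300)
print(len(blocks), len(region))
```

The block size is a trade-off: larger blocks give the compressor more context and usually a better ratio, while smaller blocks reduce the amount of data that must be decompressed to reach a given region.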

Meet the Team Behind Omics Edge

Puya Yazdi, MD

CEO

Manfred Grabherr, PhD

VP of Bioinformatics and AI

Biljana Novkovic, PhD

Senior Precision Medicine Scientist and Director of Research & Development

Laurent Dufloux, PhD

Big Data Engineer

The Omics Edge team is a highly qualified team of software engineers, data scientists, MDs and geneticists who came together to create a platform that will power the future of precision health.
