Life sciences - case study - solution for exploration of genomics signals - Sigma IT

Exploration Of Genomics Signals

Life Sciences – Case Study

For Precise Drug Target Identification

ABOUT CLIENT:
A multinational, science-led biopharmaceutical company focused on developing life-changing medicines (name withheld under NDA).

PROJECT GOAL:
The goal was to build a software solution that accelerates real-time exploration of genomics signals within large datasets, combined with the population-based characteristics of each data point, so that researchers can identify drug targets quickly and precisely. Researchers should be able to define patient groups quickly, compare the results against control groups, and generate plots that give greater insight into the specific cases under analysis.

Our Services

We assembled and provided a team of Python developers, DevOps engineers, and a project manager who excel at handling complex projects and who quickly grasped the scientists' requirements and the project domain.

The team was responsible for building the back-end solution from scratch, including:

Architecture proposal
Development of a REST API with Python services in a microservices architecture to handle user requests
ETL implementation for the datasets received from an external source
Integration of the computational tools
Deployment and management of a computing cluster for sufficient performance
Deployment on the AWS cloud using an Infrastructure as Code approach

Challenges

– Handling large, sparse datasets: storing and processing them conventionally leads to performance bottlenecks that significantly slow down the application

– A strong need to rely on libraries that are rarely used in real-time applications, such as pandas, NumPy, and Dask

– Setting up everything from scratch on the AWS cloud while ensuring that it all fits seamlessly into the client's large existing ecosystem
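
To illustrate the first challenge, here is a minimal sketch of why a dense layout wastes space and work on sparse data. It is not the project's storage code: the toy matrix and the helper names `to_sparse` and `rows_with_value` are assumptions made for illustration. The sketch keeps only the non-zero cells of each row and filters on one column without scanning empty cells.

```python
from typing import Dict, List

# Hypothetical toy data: rows are samples, columns are genomic features.
# Most cells are zero, which is what "sparse" means here.
DENSE: List[List[int]] = [
    [0, 0, 3, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 2],
    [0, 0, 5, 0, 0, 0],
]


def to_sparse(dense: List[List[int]]) -> List[Dict[int, int]]:
    """Keep only non-zero cells, stored as {column_index: value} per row."""
    return [{j: v for j, v in enumerate(row) if v != 0} for row in dense]


def rows_with_value(sparse: List[Dict[int, int]], column: int) -> List[int]:
    """Return the indices of rows that hold a non-zero value in `column`."""
    return [i for i, row in enumerate(sparse) if column in row]


sparse = to_sparse(DENSE)
dense_cells = sum(len(row) for row in DENSE)        # cells held by the dense layout
stored_cells = sum(len(row) for row in sparse)      # cells held by the sparse layout
matching_rows = rows_with_value(sparse, column=2)   # samples with a signal in column 2
```

In this toy case the sparse layout stores 4 cells instead of 24. At the scale described in this case study, the same idea is what columnar and sparse-aware tools provide out of the box.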

Solution

Our team has built a powerful solution that enables scientists to delve into the vast amount of genomic data and linked clinical records to identify potential targets and rapidly assess genomic relationships with user-defined case-control cohorts.
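
The case study does not name the statistics used to compare cases with controls. As one common illustration, the sketch below computes an odds ratio and a two-sided Fisher's exact p-value for a 2x2 table of variant carriers versus non-carriers. The function names and the example counts are hypothetical.

```python
from math import comb


def hypergeom_pmf(a: int, row1: int, row2: int, col1: int) -> float:
    """Probability that the top-left cell equals `a` when all margins are fixed."""
    return comb(row1, a) * comb(row2, col1 - a) / comb(row1 + row2, col1)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    observed = hypergeom_pmf(a, row1, row2, col1)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    probabilities = [
        hypergeom_pmf(k, row1, row2, col1) for k in range(lowest, highest + 1)
    ]
    # Sum every table that is at most as likely as the observed one.
    # The small tolerance guards against floating-point ties.
    return sum(p for p in probabilities if p <= observed * (1 + 1e-9))


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds of carrying the variant in cases relative to controls."""
    return float("inf") if b * c == 0 else (a * d) / (b * c)


# Hypothetical counts: rows are cases and controls,
# columns are carriers and non-carriers of a variant.
cases_carriers, cases_other = 3, 1
controls_carriers, controls_other = 1, 3

p_value = fisher_exact_two_sided(
    cases_carriers, cases_other, controls_carriers, controls_other
)
ratio = odds_ratio(cases_carriers, cases_other, controls_carriers, controls_other)
```

With these tiny counts the odds ratio is 9 but the p-value is 34/70, which shows why cohort size matters as much as effect size when screening candidate targets.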

It is a back-end solution that exposes a REST API to handle user requests, providing real-time search and filtering capabilities and computational tools for data manipulation. The JSON response is then passed to the front-end service, where the data is visualized in various types of plots.
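
The search-and-filter step can be sketched independently of any web framework. The example below is an assumption-laden illustration, not the project's API: the record fields, the filter syntax (`eq`, `min`, `max`), and the function names `matches` and `search` are all hypothetical. It applies the filters to a list of records and returns the JSON string that a front end could plot.

```python
import json
from typing import Any, Dict, List

# Hypothetical patient records; the field names are illustrative only.
PATIENTS: List[Dict[str, Any]] = [
    {"id": 1, "age": 64, "sex": "F", "variant_count": 3},
    {"id": 2, "age": 51, "sex": "M", "variant_count": 0},
    {"id": 3, "age": 70, "sex": "F", "variant_count": 1},
]


def matches(record: Dict[str, Any], filters: Dict[str, Dict[str, Any]]) -> bool:
    """Check one record against equality and inclusive range rules."""
    for field, rule in filters.items():
        value = record.get(field)
        if value is None:
            return False  # a record without the field cannot satisfy the rule
        if "eq" in rule and value != rule["eq"]:
            return False
        if "min" in rule and value < rule["min"]:
            return False
        if "max" in rule and value > rule["max"]:
            return False
    return True


def search(records: List[Dict[str, Any]], filters: Dict[str, Dict[str, Any]]) -> str:
    """Return the matching records as a JSON document for the front end."""
    hits = [record for record in records if matches(record, filters)]
    return json.dumps({"count": len(hits), "results": hits})


# Example request body: female patients aged 65 or older.
response = search(PATIENTS, {"sex": {"eq": "F"}, "age": {"min": 65}})
payload = json.loads(response)
```

Keeping the filter logic in a plain function like this makes it easy to test without starting a web server. A view in whichever framework is used would only parse the request and call `search`.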

The data is stored in Parquet files and processed by a Dask cluster, which works asynchronously with the back-end services.
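
The asynchronous hand-off between a request handler and a compute back end can be sketched with the standard library alone. This is a stand-in, not the project's code: a `ThreadPoolExecutor` plays the role of the cluster so that the example runs anywhere, and the names `column_mean`, `handle_request`, and `run_demo` are hypothetical. In a real Dask deployment, a `dask.distributed.Client` would take the executor's place.

```python
import asyncio
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Dict, List


def column_mean(values: List[float]) -> float:
    """Blocking computation; stands in for work executed on the cluster."""
    return sum(values) / len(values)


async def handle_request(
    executor: Executor, columns: Dict[str, List[float]]
) -> Dict[str, float]:
    """Submit one job per column and await them without blocking the event loop."""
    loop = asyncio.get_running_loop()
    names = list(columns)
    jobs = [loop.run_in_executor(executor, column_mean, columns[name]) for name in names]
    results = await asyncio.gather(*jobs)  # results come back in submission order
    return dict(zip(names, results))


def run_demo() -> Dict[str, float]:
    """Run the handler once against hypothetical column data."""
    columns = {"gene_a": [1.0, 2.0, 3.0], "gene_b": [10.0, 20.0]}
    with ThreadPoolExecutor(max_workers=2) as executor:
        return asyncio.run(handle_request(executor, columns))


means = run_demo()
```

Because the handler only awaits the jobs, the service stays free to accept other requests while the heavy computation runs elsewhere.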

The entire solution is deployed on AWS using an Infrastructure as Code approach. The Dask part of the application runs on AWS ParallelCluster, which speeds up data handling and computation.


Achievements We’re Proud of:

~0.5 million rows with 40,000 columns are searched and filtered in real time through the back-end REST API

The waiting time for the results of statistical comparisons between cases and controls was reduced from 2 days to 2 minutes

The project became a role model for other client projects and, as a result, we have been assigned to new projects

The project was recognized in the client's internal awards as Highlight of the Year 2021

Impact: Revolutionizing Genomic Data Exploration

This solution is set to revolutionize the way scientists explore genomic data, and we’re excited to see the impact it will have on the field.

The solution enables rapid hypothesis testing in the early stages of drug discovery by allowing researchers to quickly construct cohorts of their choice for further examination. This results in a faster and less costly R&D process.

Core Technologies

– Python, Django, Dask
– Docker, PostgreSQL
– AWS (IaC): CodePipeline, CloudFormation, CDK, KMS, SQS, Secrets Manager, Certificate Manager, CloudWatch, EC2, ECR, ECS, ELB, API Gateway, Fargate, Lambda, S3, Aurora RDS, ElastiCache Redis
 

Innovation is a Process

Just tell us about your project needs and we’ll get back to you as soon as possible.

Never miss a thing: with Sigma IT's newsletter you get all the latest updates on everything we do.
