James Barlow | September, 2022
Data Lakes, SaaS
Triumph Technologies was tasked with creating a shielded, centralized database system to manage and store massive amounts of private internal data.
Berkeley Lights is a provider of research and development services using micro-droplet technology that changes biological processes. Berkeley Lights has developed an advanced, proprietary platform that allows researchers to find and produce cell-based products in a fraction of the time and cost of traditional research approaches. Their goal is ultimately to serve their customers with cutting-edge technologies that make cell-based products and therapeutics easily accessible. Berkeley Lights has been developing cell-based products since 2011.
Their cutting-edge platform can simultaneously gather phenotypic, functional, and genotypic data for large numbers of single cells, and deliver the live biology consumers want in the form of the best cells. BLI built their end-to-end solution to include proprietary consumables such as OptoSelect™ chips and reagent kits, along with advanced automation systems and application software. The Berkeley Lights Platform was created to provide the most advanced environment for rapid functional assessment of single cells at scale, with the purpose of establishing an industry standard for their clients.
BLI had several requirements for the shielded database, including PCX and HIPAA compliance. Data was expected to arrive from more than 12 different sources and then be used in Tableau.
The diagram below illustrates the end result of this effort:
Leveraging the ETL capabilities of AWS Glue, Triumph opted for a data lake approach to meet several of the ETL requirements for the data sources.
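To make the ETL idea concrete, here is a minimal sketch in plain Python (not the AWS Glue API) of how records from two differently shaped sources might be normalized into one common schema before landing in a data lake. The source names, field names, sample values, and the `normalize` helper are all hypothetical and are not taken from the BLI deployment.

```python
import csv
import io
import json

# Hypothetical raw inputs: a CSV export and a JSON feed with different field names.
CSV_SOURCE = "chip_id,cell_count\nA1,120\nB2,87\n"
JSON_SOURCE = '[{"chipId": "C3", "cells": 301}]'


def extract_csv(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_json(text):
    """Parse a JSON array of objects."""
    return json.loads(text)


def normalize(record, source, mapping):
    """Rename source-specific fields to the common schema and tag the origin."""
    return {
        "chip_id": str(record[mapping["chip_id"]]),
        "cell_count": int(record[mapping["cell_count"]]),
        "source": source,
    }


def run_etl():
    """Extract from both sources and transform the rows to one schema."""
    rows = [
        normalize(r, "csv_export", {"chip_id": "chip_id", "cell_count": "cell_count"})
        for r in extract_csv(CSV_SOURCE)
    ]
    rows += [
        normalize(r, "json_feed", {"chip_id": "chipId", "cell_count": "cells"})
        for r in extract_json(JSON_SOURCE)
    ]
    return rows


if __name__ == "__main__":
    for row in run_etl():
        print(json.dumps(row))
```

In a managed service such as AWS Glue, the extract and load steps would be handled by the service's own connectors and job definitions; the sketch only illustrates the transform step that reconciles heterogeneous sources.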
Our solution used infrastructure as code (IaC) to deploy the necessary components, including a data lake API that gives users access to microservices for administrative functions such as uploading data, creating data packages, searching through data packages, and generating data manifests. These services interact with the rest of the architecture to provide the remaining functions: data storage with Amazon S3, data management with AWS Glue, and auditing with Elasticsearch and Grafana. Amazon Kinesis was also used to process streaming data and provide real-time analytics to BLI stakeholders.
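The administrative functions described above (uploading data, creating data packages, searching packages, and generating manifests) can be sketched in a few lines of Python. This is an in-memory illustration only: the `DataPackage` class, the `generate_manifest` and `search_packages` functions, and the manifest layout are hypothetical and do not reflect the actual data lake API, which would store objects in S3 rather than in a dictionary.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class DataPackage:
    """A named group of uploaded objects (hypothetical model, not the real API)."""
    name: str
    files: dict = field(default_factory=dict)  # object key -> raw bytes

    def upload(self, key, content):
        """Register a file in the package, as an upload microservice might."""
        self.files[key] = content


def generate_manifest(package):
    """List every file with its size and SHA-256 checksum, sorted by key."""
    entries = [
        {"key": key, "size": len(data), "sha256": hashlib.sha256(data).hexdigest()}
        for key, data in sorted(package.files.items())
    ]
    return {"package": package.name, "file_count": len(entries), "entries": entries}


def search_packages(packages, term):
    """Return names of packages whose name contains the term (case-insensitive)."""
    return [p.name for p in packages if term.lower() in p.name.lower()]


if __name__ == "__main__":
    pkg = DataPackage("assay-results")
    pkg.upload("run1.csv", b"chip_id,cell_count\nA1,120\n")
    print(json.dumps(generate_manifest(pkg), indent=2))
```

Recording a checksum per file is one common way to let downstream consumers verify that a package has not changed since its manifest was generated; whether the BLI manifests include checksums is an assumption here.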
Access to the console and data lake is provided and managed via an AWS Landing Zone and SSO directory.
As a result of implementing a data lake solution, Berkeley Lights has been able to rely on the security, durability, and scalability of their shielded data lake. As they continue to grow, this internal solution allows them to provide internal stakeholders with the data and analytics needed to make key decisions moving forward.