
Shield Legal Case Study – When Big Data Meets AWS DevOps


Timothy Wong


Big Data, DevOps, Serverless



Company Name: Shield Legal
Case Study Title: When Big Data Meets AWS DevOps
Vertical: Legal

We found that the client was losing approximately $1,095,000 per year and that this solution would cost $10,000 to design and implement, which amounts to roughly a 108x return on investment in the first year.

Problem / Statement Definition

Shield Legal had been spending thousands of dollars on routine daily tasks that could have been automated at a far lower cost. As the practice grew, the time spent on human-generated reports had grown sharply. Triumph Tech recognized that Shield Legal needed a fully automated solution to reclaim valuable productivity time and save on operational expenses.

Proposed Solution and Architecture

As a team of legal lead generation and legal marketing experts, Shield Legal had neither the staff nor the resources to modernize its existing business processes.

Triumph selected AWS to provide the cost-efficient resources to fully automate the generation of lead reports. We used a Python Lambda function to retrieve reports from the client's CRM, transform the data into a report, and post it to Slack.
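A minimal sketch of this pattern, assuming a hypothetical CRM REST endpoint that returns JSON lead counts and a Slack incoming-webhook URL, both supplied as environment variables (the client's actual CRM API and report format are not shown here):

import json
import os
import urllib.request

# Both values are hypothetical and supplied via Lambda environment variables.
CRM_REPORT_URL = os.environ["CRM_REPORT_URL"]
SLACK_WEBHOOK_URL = os.environ["SLACK_WEBHOOK_URL"]

def handler(event, context):
    # Pull the raw lead data from the CRM.
    with urllib.request.urlopen(CRM_REPORT_URL) as resp:
        leads = json.loads(resp.read())

    # Transform the data into a simple text report
    # (the 'source' and 'count' fields are illustrative).
    lines = [f"{lead['source']}: {lead['count']} leads" for lead in leads]
    report = "Daily lead report:\n" + "\n".join(lines)

    # Post the finished report to Slack via an incoming webhook.
    request = urllib.request.Request(
        SLACK_WEBHOOK_URL,
        data=json.dumps({"text": report}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(request)
    return {"statusCode": 200}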

We chose Lambda to:

  • Save on cost
  • Pay only for what is used, alongside CodePipeline and Systems Manager
  • Rapidly deploy modifications and updates to the serverless function
Outcomes of Project & Success Metrics

We discovered that valuable marketing staff spent four hours each day generating reports when they could have been focused on their core business: lead-generation marketing. Not only was the client losing money on operational costs, they were also missing out on revenue-generating activity.

Shield Legal was losing $3,000 daily in operational expenses as a result of manually preparing these reports. That adds up to $1,095,000 per year ($3,000 × 365 days).

With development of this microservice costing $10,000, first-year savings exceeded the cost roughly 108 times over ((1,095,000 − 10,000) / 10,000 ≈ 108.5), while freeing staff time for core business priorities.

TCO Analysis

TCO was calculated by comparing the time required to manually pull data and generate reports against the time required by our solution.

Lessons Learned

Automated solutions are the optimal way to reduce operational costs and to allow Shield Legal to focus on their core business.

Summary of Customer Environment

The environment is cloud native: the entire stack runs on Amazon Web Services in the us-east-1 region.

  • The root user is secured and MFA is required. An IAM password policy is enforced.
  • Operations, Billing, and Security contact email addresses are set, and all account contact information, including the root user email address, points to a corporate email address or phone number.
  • AWS CloudTrail is enabled in all regions, with logs stored in S3.
Operational Excellence
Metric Definitions
  • CodePipeline Health Metrics
    If any step within the pipeline fails, notifications are sent to the DevOps Slack channel. This is achieved via an integration between SNS topics and AWS Chatbot.
  • Container Metrics
    Container health is handled at the orchestration level. We set up health checks to monitor specific ports and endpoints within our containers and check for a 200 or predetermined response code.
  • Lambda Health Metrics
    Lambda health is determined by the success or failure of the Lambda function. The most important metrics are error count and success rate (%); a minimal alarm sketch follows this list.
  • ELB Target Group Metrics
    Unhealthy targets are identified as targets that don’t pass ELB health checks.
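As referenced above, this is a minimal sketch of a Lambda error-count alarm using boto3. The function name, alarm name, and SNS topic ARN are placeholders, not values from this engagement:

import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm whenever the function records any error within a 5-minute window.
cloudwatch.put_metric_alarm(
    AlarmName="lead-report-lambda-errors",       # placeholder
    Namespace="AWS/Lambda",
    MetricName="Errors",
    Dimensions=[{"Name": "FunctionName", "Value": "lead-report"}],  # placeholder
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:devops-alerts"],  # placeholder
)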
Metric Collection and Analytics

We advise clients on best practices for log and metric collection. For application logs, we prefer an ELK stack (Amazon Elasticsearch Service, Logstash running on EC2, and Kibana), which allows for tight security and granular control over log collection and visualization.

To automate alerting on unhealthy targets of an Application, Network, or Classic Load Balancer, we advise clients to use CloudWatch alarms, SNS notifications, and AWS Lambda. The Lambda function calls the load balancer describe APIs (such as DescribeTargetHealth) to identify the failed target, along with the cause of the failure, and then triggers an email notification via SNS with the unhealthy host details.
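A minimal sketch of such a function, assuming a single target group and using the DescribeTargetHealth API; the target group and SNS topic ARNs are placeholders supplied via environment variables:

import os
import boto3

elbv2 = boto3.client("elbv2")
sns = boto3.client("sns")

TARGET_GROUP_ARN = os.environ["TARGET_GROUP_ARN"]  # placeholder
TOPIC_ARN = os.environ["TOPIC_ARN"]                # placeholder

def handler(event, context):
    # Look up the current health of every target in the group.
    health = elbv2.describe_target_health(TargetGroupArn=TARGET_GROUP_ARN)
    unhealthy = [
        d for d in health["TargetHealthDescriptions"]
        if d["TargetHealth"]["State"] == "unhealthy"
    ]
    if not unhealthy:
        return

    # Email the failed target IDs and the reason the ELB reported.
    details = "\n".join(
        f"{d['Target']['Id']}: {d['TargetHealth'].get('Reason', 'unknown')}"
        for d in unhealthy
    )
    sns.publish(
        TopicArn=TOPIC_ARN,
        Subject="Unhealthy ELB targets detected",
        Message=details,
    )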

We recommend the use of Grafana running on EC2 and Prometheus for the monitoring of individual workloads running within a stack. EC2, RDS, Container, EKS, and ECS metrics are collected by Prometheus and data visualized via dashboards within Grafana.
In this particular case, we use a Lambda function to pipe logs from the Lambda application to an Amazon Elasticsearch Service cluster.
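Below is a minimal sketch of that log-shipping function, assuming a CloudWatch Logs subscription delivers the events and that the cluster's access policy accepts unsigned requests from the function (in production, requests to Amazon Elasticsearch Service are typically signed with SigV4). The endpoint and index name are placeholders:

import base64
import gzip
import json
import os
import urllib.request

ES_ENDPOINT = os.environ["ES_ENDPOINT"]  # placeholder cluster endpoint

def handler(event, context):
    # CloudWatch Logs delivers subscription data base64-encoded and gzipped.
    payload = gzip.decompress(base64.b64decode(event["awslogs"]["data"]))
    log_data = json.loads(payload)

    # Index each log event as a document in the cluster.
    for log_event in log_data["logEvents"]:
        document = {
            "timestamp": log_event["timestamp"],
            "message": log_event["message"],
            "log_group": log_data["logGroup"],
        }
        request = urllib.request.Request(
            f"{ES_ENDPOINT}/lambda-logs/_doc",
            data=json.dumps(document).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(request)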

Operational Enablement

Enabling the client to manage and maintain the DevOps pipeline after handover is of the utmost importance. Our goals are to:

  • Minimize required maintenance through automation.
  • Allow the development team to push code, follow a defined development process, and know that their applications are tested and rapidly deployed.
  • Provide training and handover documentation specific to the customer's workload, outlining the development lifecycle.
  • Document how to version the IaC modules / templates that were developed and how to push updates to the infrastructure.
  • Provide architecture diagrams that outline the branching strategy / git workflow.
  • Schedule a video conference and conduct a hands-on session with the client, going over how to push application updates through the development, staging, and production environments.
  • Review the development workflow and branching strategy.
  • Show clients how to troubleshoot a failed pipeline build within CodeBuild.
  • Show clients where to find all relevant logs from the build and test stages within CodePipeline. (Most DevOps troubleshooting after a CI/CD automation pipeline is created takes place in the CodeBuild logs and is resolved at the application layer.)
  • Outline common troubleshooting scenarios the client will run into and demonstrate, via videoconference, how to troubleshoot the workload effectively.
  • Walk through every component of the infrastructure and CI/CD pipeline that was developed with the client, allowing them time to ask any questions.
Deployment Testing and Validation

Deployments are tested and validated through a promotion strategy. The only branch that deploys automatically without approval is the development branch, which deploys to the isolated development environment. There, the team QAs and validates application functionality and approves promotion to the staging environment: a pull request is submitted to source control and merged into staging, and workloads are then deployed to the staging environment. After testing and validation in staging, a pull request is submitted from staging into master and merged. The master branch triggers a build and deployment to production via CodeBuild / CodePipeline.

Version Control

All code assets are version controlled within GitHub.

Application Workload and Telemetry

CloudWatch application logging is integrated by default into all of our container and serverless workloads, and we include it as an in-scope item for all DevOps projects. It provides a centralized place where error logs are captured, which aids operational troubleshooting.
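Because Lambda automatically ships anything written through Python's standard logging module to the function's CloudWatch log group, enabling this requires little more than configuring a logger. A minimal sketch:

import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def handler(event, context):
    # Anything written through the logger lands in the function's
    # CloudWatch log group with no agent or extra configuration.
    logger.info("report run started")
    try:
        ...  # application work goes here
    except Exception:
        # logger.exception captures the stack trace for troubleshooting.
        logger.exception("report run failed")
        raise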

Security: Identity and Access Management
Access Requirements Defined

To discover access requirements, we look at the organizational units within the client's business that require access to DevOps infrastructure: developers, systems engineers, security engineers, and stakeholders. We follow previously defined best practices for each of these groups.

IAM groups are created for each of these Organizational Units and least privilege access is applied to each. Each group is granted access to only what they require.

Developer Policy

Our developer policy looks like this:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "ec2:AuthorizeSecurityGroupIngress",
                "ec2:AuthorizeSecurityGroupEgress",
                "ec2:RevokeSecurityGroupEgress"
            ],
            "Resource": "arn:aws:ec2:*:*:*",
            "Effect": "Allow"
        },
        {
            "Action": [
                "ec2:Describe*",
                "iam:ListInstanceProfiles",
                "mgh:CreateProgressUpdateStream",
                "mgh:ImportMigrationTask",
                "mgh:NotifyMigrationTaskState",
                "mgh:PutResourceAttributes",
                "mgh:AssociateDiscoveredResource",
                "mgh:ListDiscoveredResources",
                "mgh:AssociateCreatedArtifact",
                "discovery:ListConfigurations"
            ],
            "Resource": "*",
            "Effect": "Allow"
        },
        {
            "Action": [
                "ec2:CreateSecurityGroup",
                "ec2:ModifyInstanceAttribute",
                "ec2:CreateTags",
                "ec2:CreateVolume",
                "ec2:AttachVolume",
                "ec2:DetachVolume",
                "ec2:DeleteVolume",
                "ec2:CreateImage"
            ],
            "Resource": "*",
            "Effect": "Allow"
        },
        {
            "Condition": {
                "ForAllValues:StringLike": {
                    "ec2.ResourceTag/appenv": [
                        "rmmigrate-dta"
                    ]
                }
            },
            "Action": [
                "ec2:TerminateInstances",
                "ec2:StartInstances",
                "ec2:StopInstances",
                "ec2:RunInstances"
            ],
            "Resource": "*",
            "Effect": "Allow"
        },
        {
            "Action": "iam:PassRole",
            "Resource": "*",
            "Effect": "Allow"
        }
    ]
}

No processes deployed to AWS infrastructure use static AWS credentials. All instances calling other AWS services use IAM roles; static AWS credentials are used only when third-party integrations cannot make use of assumed roles.

Each APN partner and user of the platform logs into AWS with a unique IAM user or via federated login. No root access is permitted. We have a CloudWatch alarm set up that triggers an SNS email notification any time the root user logs in.
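A minimal sketch of that alarm using boto3, assuming CloudTrail already delivers to a CloudWatch Logs group; the log group name, namespace, and SNS topic ARN are placeholders:

import boto3

logs = boto3.client("logs")
cloudwatch = boto3.client("cloudwatch")

# Turn root-user sign-ins recorded by CloudTrail into a custom metric.
logs.put_metric_filter(
    logGroupName="CloudTrail/DefaultLogGroup",  # placeholder
    filterName="RootAccountUsage",
    filterPattern='{ $.userIdentity.type = "Root" }',
    metricTransformations=[{
        "metricName": "RootAccountUsageCount",
        "metricNamespace": "Security",          # placeholder
        "metricValue": "1",
    }],
)

# Alarm (and email via SNS) on any occurrence.
cloudwatch.put_metric_alarm(
    AlarmName="root-account-usage",
    Namespace="Security",
    MetricName="RootAccountUsageCount",
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=1,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:security-alerts"],  # placeholder
)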

Security: IT / Operations

Components that require encryption:

  • Lambda environment variables: these are encrypted at rest using KMS (a decryption sketch follows this list).
  • AWS API integration: the AWS CLI is used for all programmatic access.
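A minimal sketch of reading a KMS-encrypted variable inside the function, assuming the ciphertext is stored base64-encoded in a hypothetical CRM_API_KEY environment variable:

import base64
import os
import boto3

kms = boto3.client("kms")

# Decrypt once per container, at cold start, rather than on every invocation.
CRM_API_KEY = kms.decrypt(
    CiphertextBlob=base64.b64decode(os.environ["CRM_API_KEY"])  # hypothetical variable
)["Plaintext"].decode("utf-8")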
Big Data Reliability
Deployment Automation

The deployment process is fully automated. When we merge a change from development into the master branch in GitHub, CodePipeline is triggered. CodePipeline first runs CodeBuild, which compiles application dependencies via pip and requirements.txt, then creates an artifact and a CloudFormation template that drives deployment of the serverless function via CloudFormation. We use change sets and execute them automatically via CodePipeline.
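The change-set pattern that CodePipeline drives looks roughly like the following boto3 sketch; the stack name, change-set name, and artifact URL are placeholders:

import boto3

cfn = boto3.client("cloudformation")

# Stage the proposed update as a change set...
cfn.create_change_set(
    StackName="lead-report-service",   # placeholder
    ChangeSetName="pipeline-deploy",   # placeholder
    TemplateURL="https://s3.amazonaws.com/example-artifacts/packaged.yaml",
    Capabilities=["CAPABILITY_IAM"],
)
cfn.get_waiter("change_set_create_complete").wait(
    StackName="lead-report-service",
    ChangeSetName="pipeline-deploy",
)

# ...then execute it; in our pipeline, CodePipeline performs this step automatically.
cfn.execute_change_set(
    StackName="lead-report-service",
    ChangeSetName="pipeline-deploy",
)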

Availability Requirements

  • RTO: 14 hours. Reports run three times daily (8 AM, 2 PM, and 9 PM), and the application can be down for a maximum of 14 hours without causing significant harm to the business.
  • RPO: 24 hours. Data is backed up every 24 hours, so in the unlikely event of data loss, at most one day's worth of data may be lost.
Adapts to Changes in Big Data Demand

This application uses Lambda, which scales automatically in response to demand. Reports run only three times daily, which does not warrant the use of provisioned concurrency.

Cost Optimization
Cost Modelling

We test the workload in a Lambda development environment: we run the workload, record the execution time, and estimate the cost using the AWS Pricing Calculator. We multiply this by 90, since the report runs 90 times per month (three runs per day over a 30-day month). This particular function falls within the AWS Free Tier.
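A back-of-the-envelope version of that estimate, using current us-east-1 x86 Lambda pricing. The per-run duration and memory size below are illustrative assumptions, not measured values:

# $0.0000166667 per GB-second of compute, $0.20 per million requests.
GB_SECOND_PRICE = 0.0000166667
REQUEST_PRICE = 0.20 / 1_000_000

runs_per_month = 90      # three reports per day over a 30-day month
duration_seconds = 60    # assumed per-run duration
memory_gb = 0.512        # assumed memory allocation

compute_cost = runs_per_month * duration_seconds * memory_gb * GB_SECOND_PRICE
request_cost = runs_per_month * REQUEST_PRICE
print(f"${compute_cost + request_cost:.4f} per month")  # comfortably inside the Free Tier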

In sum, the client had been losing approximately $1,095,000 per year, and this solution cost $10,000 to design and implement, which amounts to roughly a 108x return on investment in the first year.

Looking to save resources and money with DevOps and Big Data? Meet one of our Big Data Scientists today.