Health OCR
Professional

Affaq Ahmed · Software Engineer · Tintash · Desktop App / August 31, 2023

  • Python
  • AWS Lambda
  • Amazon S3
  • Batch Scripting
  • Cron
  • CI/CD

Overview

Health OCR is a Python desktop application that performs optical character recognition (OCR) on documents within a healthcare system, turning them into machine-readable text. My work on the project focused on automating the build and log-handling workflows around the app, removing slow, error-prone manual steps so the team could ship and operate it reliably.

My Role & Contributions

1. Automated the build process. Producing a build of the application was a multi-step manual process that took around an hour and required a person to babysit it the whole way through. I wrote a batch script that automated the entire build end to end — no human input required at any step.

2. Automated the logging pipeline. Application logs were being handled manually. I built an automated pipeline that stores logs in Amazon S3 and aggregates and processes them with AWS Lambda functions triggered on a cron schedule, replacing the previous manual workflow entirely.

Impact

  • Build time cut from ~60 minutes to under 15 minutes — roughly a 4× speed-up.
  • Fully hands-off — the build went from constant human supervision to zero manual input.
  • Thousands of hours of manual labour saved over the life of the project by removing repetitive build and log-handling work.
  • More reliable operations — automation eliminated the human errors that crept into the manual build and logging steps.

Tech Stack & Architecture

  • Application: Python desktop app performing OCR.
  • Build automation: A batch script orchestrating the full multi-step build.
  • Log pipeline: Logs shipped to Amazon S3, then aggregated and processed by AWS Lambda functions triggered on a cron schedule.
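
To make the log pipeline concrete, here is a minimal sketch of what a scheduled aggregation function could look like. It is an illustration under stated assumptions, not the project's actual code: the environment variable names (LOG_BUCKET, RAW_PREFIX, OUT_KEY), the key layout, and the assumption that every log line starts with an ISO 8601 timestamp are all hypothetical. The schedule itself would live outside the code, for example as an EventBridge rule with an expression such as cron(0 2 * * ? *).

```python
import os


def merge_log_lines(chunks):
    """Merge raw log texts into one sorted, newline-terminated blob."""
    lines = []
    for chunk in chunks:
        # Drop blank lines so the combined file stays compact.
        lines.extend(line for line in chunk.splitlines() if line.strip())
    # Assumption: each line starts with an ISO 8601 timestamp, so a plain
    # lexicographic sort is also a chronological sort.
    lines.sort()
    return "\n".join(lines) + ("\n" if lines else "")


def aggregate_logs(s3, bucket, prefix, out_key):
    """Read every object under `prefix`, merge them, write the result to `out_key`."""
    chunks = []
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        page = s3.list_objects_v2(**kwargs)
        for obj in page.get("Contents", []):
            body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read()
            chunks.append(body.decode("utf-8", errors="replace"))
        # list_objects_v2 returns results in pages; follow the token until done.
        if not page.get("IsTruncated"):
            break
        kwargs["ContinuationToken"] = page["NextContinuationToken"]
    merged = merge_log_lines(chunks)
    s3.put_object(Bucket=bucket, Key=out_key, Body=merged.encode("utf-8"))
    return len(chunks)


def lambda_handler(event, context):
    """Entry point invoked by the schedule; configuration names are hypothetical."""
    import boto3  # imported here so the helpers above can be tested without AWS

    s3 = boto3.client("s3")
    count = aggregate_logs(
        s3,
        bucket=os.environ["LOG_BUCKET"],
        prefix=os.environ.get("RAW_PREFIX", "raw/"),
        out_key=os.environ.get("OUT_KEY", "combined/latest.log"),
    )
    return {"objects_merged": count}
```

Passing the S3 client into aggregate_logs keeps the merge logic testable with a stand-in client. The output key is deliberately kept outside the raw prefix so a later run does not re-read its own output.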

Challenges & Solutions

The build had several interdependent steps that previously had to be run in the right order by hand — easy to get wrong and slow to complete. I encoded the entire sequence into a single batch script so it runs deterministically and unattended. For the logs, moving from manual handling to an S3 + Lambda + cron pipeline meant the data was collected and processed consistently on a schedule, without anyone needing to remember to do it.
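
The idea of encoding an ordered build sequence so it runs unattended can be sketched as follows. The original was a batch script; this is a Python rendering of the same pattern, and the step names and commands are hypothetical placeholders rather than the project's real build steps. Each step runs only if the previous one succeeded, so a failure stops the build instead of producing a half-finished artifact.

```python
import subprocess
import sys


def run_build(steps):
    """Run (name, argv) steps in order and stop at the first failure."""
    completed = []
    for name, argv in steps:
        print(f"[build] {name}")
        result = subprocess.run(argv)
        if result.returncode != 0:
            raise RuntimeError(
                f"step {name!r} failed with exit code {result.returncode}"
            )
        completed.append(name)
    return completed


# Hypothetical placeholder steps; a real build would call its own tools here.
PY = sys.executable
EXAMPLE_STEPS = [
    ("install dependencies", [PY, "-c", "print('installing dependencies')"]),
    ("run tests", [PY, "-c", "print('running tests')"]),
    ("package application", [PY, "-c", "print('packaging application')"]),
]

if __name__ == "__main__":
    run_build(EXAMPLE_STEPS)
```

Keeping the steps in a single ordered list means the sequence is written down once and executed the same way every time, which is what removes the need for someone to supervise it.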

Timeline

June 2023 – August 2023 · 3 months