A few years back I wrote about an architecture for large-scale serverless data ingestion on AWS. That post covered the design of a Serverless Framework-based import platform: an orchestrator service that picks up files landing in an S3 bucket and fans out processing to a set of consumer services, each handling a specific data … Continue reading Wrangling Serverless Development & Devops
Author: bitsofinfo
Poor man’s distributed locking
One of my prior posts on the async processing service mentioned a distributed locking mechanism. This post touches on that implementation. The short version: the client had no existing distributed locking infrastructure: no Hazelcast, no Redis, no Zookeeper, nothing. What they did have was Postgres, which was already in use. That turned out to be … Continue reading Poor man’s distributed locking
Simple pattern for an asynchronous processing service
The use case was straightforward: a backend service needed to receive document submissions and forward them to one or more document manager endpoints. Document managers accepted documents, processed them, and stored them. Documents could have been submitted by other systems independently, so the service had to account for the possibility that any given document was … Continue reading Simple pattern for an asynchronous processing service
Architecture for a data lake REST API using Delta Lake, Fugue & Spark
"Hey, we need some kind of a REST API over all our data lakes to let analysts and other integrations query records on demand. Can we please get this done?" That was the use case laid out that needed a solution. If you've had any experience with data lakes you know that they can be … Continue reading Architecture for a data lake REST API using Delta Lake, Fugue & Spark
Architecture for generative Terragrunt & Terraform infrastructure as code (IaC)
This article covers a specific scenario where, despite leveraging as many of the DRY (don't repeat yourself) principles made available by the underlying IaC (infrastructure as code) frameworks as possible, we still need to elevate the abstraction to another level to fully reduce code duplication and gain larger economies of scale deploying large platforms … Continue reading Architecture for generative Terragrunt & Terraform infrastructure as code (IaC)
Bridging Auth0 with Legacy IdPs
"We know we need to move to Auth0 and OAuth standards eventually. But we can't just flip a switch. Can we figure out a path where both worlds can coexist?" That was the challenge from a client with a long-running custom platform that had its own bespoke authentication and authorization system. Their existing custom … Continue reading Bridging Auth0 with Legacy IdPs
Fully Automated Lets Encrypt TLS certs with ACME-DNS on Kubernetes
This article covers fully automating DNS and the issuance of TLS certificates on Kubernetes for Ingress-based workloads (both public and private) utilizing cert-manager, external-dns, acme-dns and kubernetes-acme-dns-registrar. Scenario: You are a busy DevOps professional. You want to set up a Kubernetes platform that can accept any typical HTTP-based workload (Ingress-based) with minimal management … Continue reading Fully Automated Lets Encrypt TLS certs with ACME-DNS on Kubernetes
Reacting to K8s Events with k8s-watcher
As part of a recent project that needs to automatically issue new TLS certificates for hosts defined in Kubernetes Ingress objects, I ended up creating a library that lets me detect such events in a simplified manner from within a larger Python program that has to react to them. My … Continue reading Reacting to K8s Events with k8s-watcher
Architecture for non-deterministic mass data collection: part 2: dynamic data lake schemas
Note: this is the final part of a two-part series about this project; article #1 is here. Continuing from where we left off: now that we had a functioning collection engine producing full graphs of crawled data all the way down to interrogable dataset_items, it was time to get down to … Continue reading Architecture for non-deterministic mass data collection: part 2: dynamic data lake schemas
Architecture for non-deterministic mass data collection: part 1: collection engine
Note: this is part one of a two-part series about this project; article #2 is here. One of my more recent projects was spawned from a pretty interesting idea. The team wanted to build a system that would permit them to scour the Internet for information regarding a particular set of targets; a "target" … Continue reading Architecture for non-deterministic mass data collection: part 1: collection engine