-3 votes
0 answers
68 views

I'm working on a distributed consensus system based on the Raft protocol. Currently, when the leader node fails, our service experiences unavailability during the period of: Follower timeout ...
jy l's user avatar
  • 1
Advice
0 votes
2 replies
57 views

What's the best way to allow programs to discover each other on the network? Let's say we are writing a system that tracks the usage of computers over the network. We have an agent program that sends ...
Isembart's user avatar
1 vote
1 answer
51 views

I recently joined a startup that has a pretty messy backend setup, and I’ve been assigned to sort it out. Here’s the situation: There’s one main entry point (a federation/onboarding service) that’s ...
Adithya Srikar's user avatar
0 votes
1 answer
80 views

I have the following requirements: A website (let's call it Website A) where I sell subscription plans for my SaaS. Payments are handled with Stripe. I am using an authentication service (Auth0) so users ...
kmylonas's user avatar
0 votes
0 answers
70 views

Background I’m implementing Asynchronous Distributed Key Generation (ADKG) over secp256k1 so that N nodes collectively hold a threshold private key. After DKG each node has a secret share. To sign an ...
Shubham Gupta's user avatar
0 votes
0 answers
37 views

Should gRPC clients implement an HTTP-level resilience handler? Or only rely on the gRPC-level RetryPolicy? Why/why not? For example, if the server responds with a 5xx status code (unexpected but ...
Lindeberg's user avatar
  • 121
0 votes
0 answers
48 views

Many examples of Kafka topic configuration have RF = 3, min.insync.replicas = 2. In the case of a cluster of 5 brokers, if we use RF = 5, should min.insync.replicas = 3? That seems "natural" ...
alabaster's user avatar
  • 130
0 votes
1 answer
80 views

This question is inspired by a 'general admission' variant of the common 'event ticketing' System Design interview question. Critically in this version, the user does not select a seat - they only say ...
asantas93's user avatar
0 votes
0 answers
31 views

OceanBase Version: V4.2 I’m using OceanBase in MySQL mode, and I noticed that functions like NOW() and CURRENT_TIMESTAMP only provide microsecond (6-digit) precision. So I’m trying to create a custom ...
0 votes
0 answers
38 views

I have been trying to understand the Raft protocol for quite some time now. One thing that has always stumped me is the proof of the Log Matching property. One of my concerns is that the proof in the ...
arl's user avatar
  • 118
0 votes
0 answers
42 views

I am trying to build a server that communicates with the client using the socketio protocol. The server starts multiple applications as subprocesses, and the server communicates with this ...
Souvik De's user avatar
0 votes
1 answer
66 views

In the book Designing Data-Intensive Applications > chapter-5 > Leaderless replication > Detecting Concurrent Writes, the author says the following while discussing last write wins (LWW): The ...
wenn32's user avatar
  • 1,394
0 votes
1 answer
79 views

I am new to JMeter distributed environment setup and don't know how to set up the master-slave configuration. I only know that we have a Master VM and we can spin up ...
Farhan Meer's user avatar
0 votes
0 answers
33 views

I have a worker that processes tasks from RabbitMQ and inserts data into a database. The system operates at high scale, handling thousands of messages per second, which makes proper failure handling ...
Yakir's user avatar
  • 19
0 votes
0 answers
45 views

In Lamport's Distributed Mutual Exclusion algorithm, a process can enter the critical section if two conditions are met: Its request is at the head of its own queue. It has received a reply from all ...
Benjamin's user avatar
1 vote
0 answers
27 views

In the paper "Zab: High-performance broadcast for primary-backup systems", Figure 1 shows that Paxos could violate primary order of requests. I understand the result will be like that if each proposer ...
user1532146's user avatar
2 votes
1 answer
56 views

I’m working on a microservices-based project where each service is a separate PHP application. They all rely on JWT for authentication and authorization. The tricky part is revoking (or blacklisting) ...
Kamyar Safari's user avatar
0 votes
0 answers
42 views

Given that Ceph and its CRUSH algorithm eliminate metadata server contention and reduce object fetch latency by removing the RPC to query object location, why is it less adopted ...
Coulson Liang's user avatar
0 votes
1 answer
39 views

Let's say we have a cluster of 5 nodes and A is the leader. The following sequence of events takes place: A sends the replicate change request in parallel to all the followers. Only B could receive the ...
Tarun's user avatar
  • 3,175
-1 votes
1 answer
186 views

Referring to this table depicted in the Raft paper, I could not find where followers record the leader in any form, such as an identifier, physical address, etc. Instead, I only find the leader ID in ...
PkDrew's user avatar
  • 2,301
1 vote
1 answer
180 views

I am new to Kafka and I understand that message order is guaranteed only within one partition and not across partitions. What I am not sure about is whether this can create scalability issues, e.g. in ...
smith's user avatar
  • 311
-3 votes
1 answer
42 views

I'm using the confluent kafka library to create a distributed system, but I'm failing to understand some principles of Kafka itself. Let's say right now I'm working with a Central, that has to listen to ...
keykey13's user avatar
0 votes
1 answer
29 views

We have an NServiceBus application running in two Azure regions: North Europe and West Europe. We are using SQL transport, and both applications in these regions connect to a shared database. ...
Rick Neeft's user avatar
0 votes
1 answer
41 views

I have a number of workstations that run long processes containing sequences like this: x = wait_while_current_is_set y = read_voltage z = z + y The workstations must maintain synchronization with a ...
david's user avatar
  • 2,706
0 votes
0 answers
83 views

We are using MassTransit with Azure Service Bus in our backend system for messaging. Now, we need to extend our solution to communicate with on-premises agents that will be installed for each of our ...
SOK's user avatar
  • 595
2 votes
1 answer
140 views

I'm working on a Raft implementation as part of my distributed file system and I've run into a problem with the log compaction process. According to the official Raft paper, when a log reaches a ...
Dror Chen's user avatar
0 votes
1 answer
191 views

I have a question I'm curious about. Let's say we are developing a microservice social media application (I chose this topic for practical purposes :)). I'm using the inbox-outbox pattern to ensure ...
OnurcanOgul's user avatar
1 vote
4 answers
183 views

Consider a database like CockroachDB that uses the Raft protocol for replicating data to a replica group owning a partition of the data. How does a client handle a request that fails in such DBs? Because, ...
Dumb_Pegasus's user avatar
1 vote
1 answer
165 views

I'm working on an event-sourced application that crawls sports betting games from different bookmakers. I have two primary aggregates in my system: Game: Represents a sports betting event for a ...
Ari Seyhun's user avatar
  • 12.8k
0 votes
1 answer
459 views

I am trying to add multiple DbContext instances in an app, which is launched with .NET Aspire. I also want those separate contexts to have configuration available (in this case to have migration history ...
Vytenis Kajackas's user avatar
0 votes
1 answer
262 views

I have a clustered real-time system that produces a very large amount of binary logs. I get a bunch of binary logs from each node in the system and I want to view the logs in a convenient way. Mostly, ...
shaharhoch's user avatar
0 votes
1 answer
73 views

I am new to using queue-worker architectures and I'm interested in how to make them resilient to a worker failing. For example: we have a pool of workers Alpha that put entries onto queue A. Then the ...
Lubed Up Slug's user avatar
1 vote
1 answer
237 views

I was following this blog on implementing Rate Limiter using Redis. Link to the blog Here they have used MULTI to pack all the atomic commands. This ensures that we're not concurrently writing wrongly ...
Gitesh Khanna's user avatar
2 votes
0 answers
72 views

I am running a TomEE server inside a Docker container, but my web application is not loading as expected. Here is the setup I'm using: Docker Image and Container: Image: interesting_picture:latest (...
Sushi's user avatar
  • 23
1 vote
1 answer
706 views

I'm having some trouble understanding the need for a distributed lock. I did think of an example where it may be required but I'm not completely sure. I would appreciate some comments if I'm thinking ...
Laksh Chauhan's user avatar
1 vote
0 answers
54 views

I'm working on a system where: A producer sends approximately 100 million messages daily to a message queue. The consumer processes each message from the queue and produces multiple parts as output. ...
Pouya Rezaei's user avatar
1 vote
1 answer
105 views

I wanted to address an important aspect of our microservices architecture, specifically regarding our tracing implementation with OpenTelemetry. We have multiple microservices operating seamlessly ...
Raushan's user avatar
  • 347
0 votes
0 answers
48 views

Recently, I encountered a scheduling problem in a distributed system and I hope to get some help: for a multi-stage microservice that has two stages calling the same instance, such as A-->B-->A, ...
user26585062's user avatar
0 votes
1 answer
213 views

I need to set up replication in ClickHouse on two different machines that are on the same network. I have tried to configure it but I get the following error: SQL Error [999] [07000]: Code: ...
Vilma Zorina Camacho Cagal's user avatar
3 votes
2 answers
371 views

I am reading DDIA. It says "possible to make Dynamo-style quorums linearizable at the cost of reduced performance: a reader must perform read repair (see “Read repair and antientropy” on page 178)...
Zack Light's user avatar
0 votes
1 answer
66 views

I have an application that uses a Postgres database in one region (US West) containing several tables, one of which contains several hundred thousand records (let's call it "events" table with ...
ct101's user avatar
  • 1
0 votes
0 answers
29 views

If a system is partition tolerant, it's impossible for it to be consistent since there's no way for one node to update another. How is it possible to be both consistent and partition tolerant?
JobHunter69's user avatar
  • 2,376
0 votes
0 answers
55 views

I'm working with an events table where different source tables trigger writes into this table with columns: entity_id and payload. These events are then published to a Kafka topic using a message ...
Forece85's user avatar
  • 518
1 vote
1 answer
684 views

Consider there are 3 microservices - s1, s2 and s3. s1 sends message m1. s2 consumes message m1, applies some business logic and then sends message m2. The problem is that s3 receives message m2 ...
Yash's user avatar
  • 27
0 votes
1 answer
158 views

I am using Citus as a managed service in the cloud with Azure Cosmos DB for PostgreSQL. I have 1 coordinator and 2 worker nodes set up. There are distributed tables and reference tables created. ...
Rohith K's user avatar
2 votes
0 answers
79 views

I have a .NET 8 distributed system in AKS where work is divided among workers using a Manager/Worker pattern, with work shared out on a Redis List. I'm aiming to get unified logging via Application ...
Andrew Matthews's user avatar
1 vote
0 answers
472 views

I need to fetch and filter data from three different services: ProductService, PriceService, and StockService. My goal is to get products that belong to Category = 54, have stock available, and are ...
mehmtee10's user avatar
2 votes
2 answers
159 views

Hi, I have order creation functionality in my project, and I give the client an order_id, which is an auto-increment ID: order = serializer.save(user=user, created_by=user, platform=platform). Now how ...
Ramprasad Thakur's user avatar
1 vote
3 answers
876 views

In the book "Designing Data-Intensive Applications. The Big Ideas Behind Reliable, Scalable and Maintainable Systems", we can read regarding sloppy quorums: However, this means that even ...
Yas's user avatar
  • 63
1 vote
1 answer
57 views

In an event-driven architecture using the choreography model, how do we keep the current, global state of the process? Let's say we have a process where many services p1,...,pn transition many states s1,...,...
tlt's user avatar
  • 15.5k
