Message Queue Selection Is a Personality Test for Your Architecture
Why does message queue selection reveal architectural personality?
Each message queue technology embodies a set of tradeoffs: consistency versus throughput, ordering versus parallelism, operational simplicity versus configurability. The queue you choose is the tradeoff you prioritize, and that tradeoff reveals what your architecture values most.
When a team tells me they chose Kafka, I know certain things about their architecture before seeing a single line of code. They value event ordering. They want to replay historical events. They are willing to accept operational complexity for durability and throughput. When a team tells me they chose SQS, I know they value operational simplicity, do not need strict ordering, and prioritize managed infrastructure over configurability. These are not just technology preferences. They are architectural values expressed through infrastructure choices.
What does each major queue technology optimize for?
Kafka optimizes for ordered, durable event streams. RabbitMQ optimizes for flexible routing and protocol support. SQS optimizes for operational simplicity. NATS optimizes for low-latency, lightweight messaging.
Apache Kafka: A distributed commit log that provides durable, replayable event streams, ordered within each partition. I choose Kafka when the architecture requires event sourcing, stream processing, or the ability to replay events from any point still inside the retention window. Kafka can handle on the order of 1 million messages per second on modest hardware but requires a dedicated operations team (or a managed service like Confluent Cloud or Amazon MSK). The operational cost is the price of durability and ordering. In 5 of 13 evaluations, Kafka was the correct choice.
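To make the commit-log idea concrete, here is a minimal sketch in plain Python, not Kafka client code. The names PartitionLog and partition_for are hypothetical, and the byte-sum partitioner is a stand-in for Kafka's real one, chosen only to keep the example deterministic. It shows the two properties that matter here: records with the same key land in one partition and keep their order, and a reader can start again from any offset.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLog:
    """Toy append-only log for one partition (hypothetical, not a Kafka API)."""
    records: list = field(default_factory=list)

    def append(self, value):
        """Append a record and return the offset it was written at."""
        self.records.append(value)
        return len(self.records) - 1

    def read_from(self, offset):
        """Return every record at or after `offset`, in append order."""
        return self.records[offset:]


def partition_for(key, num_partitions):
    """Map a key to a partition so that one key always uses one partition.

    Assumption: a simple byte sum, not Kafka's actual partitioner.
    """
    return sum(key.encode("utf-8")) % num_partitions


NUM_PARTITIONS = 3
partitions = [PartitionLog() for _ in range(NUM_PARTITIONS)]

events = [
    ("acct-1", "deposit 100"),
    ("acct-2", "deposit 50"),
    ("acct-1", "withdraw 30"),
]
for key, value in events:
    partitions[partition_for(key, NUM_PARTITIONS)].append(value)

# Replay everything for acct-1 from the start of its partition.
acct1_log = partitions[partition_for("acct-1", NUM_PARTITIONS)]
replayed = acct1_log.read_from(0)
```

Ordering holds only inside a partition, which is why the choice of key is part of the architecture and not an implementation detail.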
RabbitMQ: A traditional message broker with flexible routing (direct, topic, fanout, and headers exchanges). I choose RabbitMQ when the architecture requires complex routing patterns, multiple consumers with different selection criteria, or protocol flexibility (AMQP, MQTT, STOMP). RabbitMQ handles 50,000 to 100,000 messages per second and is operationally simpler than Kafka but less durable under network partition scenarios. In 3 of 13 evaluations, RabbitMQ was the correct choice.
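Topic routing is easier to reason about with a small example. The sketch below reimplements AMQP-style topic matching in plain Python, where `*` matches exactly one dot-separated word and `#` matches zero or more. It is an illustration of the matching rules, not RabbitMQ's implementation, and the queue names and binding patterns are hypothetical.

```python
def topic_matches(pattern, routing_key):
    """Return True if an AMQP-style topic pattern matches a routing key."""

    def match(p, k):
        if not p:
            # Pattern exhausted: match only if the key is exhausted too.
            return not k
        head, rest = p[0], p[1:]
        if head == "#":
            # `#` either absorbs nothing, or absorbs one word and stays active.
            return match(rest, k) or (bool(k) and match(p, k[1:]))
        if not k:
            return False
        if head == "*" or head == k[0]:
            return match(rest, k[1:])
        return False

    return match(pattern.split("."), routing_key.split("."))


# Hypothetical bindings: queue name -> binding pattern.
bindings = {
    "billing": "orders.*.created",
    "audit": "orders.#",
    "eu-reports": "orders.eu.*",
}


def route(routing_key):
    """Return the sorted names of the queues that would receive this key."""
    return sorted(
        queue for queue, pattern in bindings.items()
        if topic_matches(pattern, routing_key)
    )


eu_created = route("orders.eu.created")
us_cancelled = route("orders.us.cancelled")
```

One published message can fan out to several queues, each with its own selection rule, which is the routing flexibility that a partitioned log does not offer out of the box.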
Amazon SQS: A fully managed queue service with no infrastructure to operate. I choose SQS when the architecture does not require strict ordering (SQS FIFO provides ordering but with throughput limits), when operational simplicity is the highest priority, and when the system runs on AWS. SQS handles virtually unlimited throughput for standard queues. FIFO queues default to 300 API calls per second per action, which means 300 messages per second without batching or up to 3,000 with batches of 10, and high-throughput mode raises those limits. In 4 of 13 evaluations, SQS was the correct choice because the team valued zero operational overhead over fine-grained control.
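The FIFO ceiling is worth working through, because the quota is counted in API calls, not messages. The helper below is a back-of-the-envelope sketch, assuming a default quota of 300 calls per second per action and the SQS batch limit of 10 messages per call. The function name fifo_ceiling is hypothetical, and current quotas should be checked against the AWS documentation before sizing anything.

```python
MAX_BATCH_SIZE = 10  # SQS batch operations accept at most 10 messages per call


def fifo_ceiling(calls_per_second, batch_size=1):
    """Upper bound on messages per second for a given API call quota."""
    if not 1 <= batch_size <= MAX_BATCH_SIZE:
        raise ValueError("batch_size must be between 1 and 10")
    return calls_per_second * batch_size


unbatched = fifo_ceiling(300)                # one message per call
batched = fifo_ceiling(300, batch_size=10)   # full batches
```

If the expected peak sits comfortably under the batched ceiling, FIFO ordering is available without leaving the managed service. If not, that is the signal to revisit the ordering requirement or the technology.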
NATS: A lightweight, high-performance messaging system optimized for low latency. I choose NATS when the architecture requires sub-millisecond message delivery, when messages are ephemeral (fire-and-forget), or when the system operates in resource-constrained environments (edge, IoT). NATS JetStream adds persistence when needed. In 1 of 13 evaluations, NATS was the correct choice for a real-time telemetry system processing sensor data at the edge.
How should teams make this decision systematically?
Evaluate against 5 criteria: ordering requirements, delivery guarantees, throughput needs, operational capacity, and ecosystem integration. Score each criterion and match to the technology that best fits.
- Ordering: If events must be processed in order (financial transactions, event sourcing), Kafka’s partition-based ordering is the strongest guarantee, provided related events share a partition key. If ordering is “nice to have,” SQS FIFO or RabbitMQ works. If ordering does not matter, SQS Standard or NATS is simplest.
- Delivery guarantees: If exactly-once processing is required, use Kafka with transactions or SQS FIFO with deduplication, and keep consumers idempotent, because both guarantees hold only within the broker’s own boundaries. If at-least-once is acceptable, any option works (NATS needs JetStream for this). If at-most-once is acceptable (telemetry, metrics), NATS core is the lightest option.
- Throughput: If sustained throughput exceeds 100,000 messages per second, Kafka or NATS is the natural fit, and SQS Standard can also scale that far. Below that threshold, all options perform adequately.
- Operational capacity: If the team has no dedicated infrastructure engineers, a managed service (SQS, Confluent Cloud, CloudAMQP) is the only responsible choice. Self-managed Kafka without operational expertise produces incidents that outweigh any technical benefit.
- Ecosystem: If the architecture already uses Kafka Connect, adding another Kafka topic is cheap. If the architecture already uses AWS-native services, SQS integrates without additional configuration.
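The five criteria above can be turned into a simple weighted score. The sketch below is one possible way to do that, not a definitive method. The fit scores (0 = poor, 3 = strong) are placeholder assumptions loosely derived from the descriptions in this article, and the weights are hypothetical; a team should replace both with its own numbers.

```python
CRITERIA = ("ordering", "delivery", "throughput", "operations", "ecosystem")

# Placeholder fit scores per technology (assumptions, not benchmarks).
FIT = {
    "Kafka":    {"ordering": 3, "delivery": 3, "throughput": 3, "operations": 1, "ecosystem": 2},
    "RabbitMQ": {"ordering": 2, "delivery": 2, "throughput": 2, "operations": 2, "ecosystem": 2},
    "SQS":      {"ordering": 1, "delivery": 2, "throughput": 2, "operations": 3, "ecosystem": 2},
    "NATS":     {"ordering": 1, "delivery": 1, "throughput": 3, "operations": 2, "ecosystem": 1},
}


def rank(weights, fit=FIT):
    """Return (technology, total) pairs sorted by weighted score, best first."""
    totals = {
        tech: sum(weights.get(criterion, 0) * scores[criterion] for criterion in CRITERIA)
        for tech, scores in fit.items()
    }
    # Sort by descending score, then by name so ties are deterministic.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical team A: event-sourced ledger with a platform team.
ledger_weights = {"ordering": 5, "delivery": 4, "throughput": 3, "operations": 1, "ecosystem": 2}

# Hypothetical team B: small product team with no infrastructure engineers.
small_team_weights = {"ordering": 1, "delivery": 2, "throughput": 1, "operations": 5, "ecosystem": 3}

ledger_ranking = rank(ledger_weights)
small_team_ranking = rank(small_team_weights)
```

The weights are where the values live. Two teams can agree on every fit score and still reach different answers, because they weight ordering and operational simplicity differently.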
The message broker landscape keeps growing, but the fundamental tradeoffs remain the same. As I discussed when covering event-driven architecture, the queue is the nervous system of an asynchronous architecture. Choose it with the same deliberation you would apply to choosing a database, because like the database, the message queue decision will outlast the application code built on top of it.
What are the broader implications for architectural decision-making?
Technology selection decisions are value declarations. The tools you choose encode the tradeoffs you prioritize, and those tradeoffs shape the system’s character for years.
The message queue is a microcosm of every architectural decision. It looks like a technical choice. It is actually a statement about what the system values: durability or simplicity, control or convenience, ordering or throughput. The architect who understands this makes better decisions because they are choosing values, not products. And values, unlike products, do not become obsolete with the next release cycle. As I wrote about architecture decision records, documenting not just the choice but the values behind it ensures that future teams understand why the queue was selected and under what conditions the choice should be reconsidered.