Datadog vs Dripstat

Datadog's backend is a traditional timeseries datastore. It uses Cassandra to store each time series individually, paired with a set of tags for filtering. While timeseries databases work fine for low-cardinality data, they perform poorly for arbitrary slicing and dicing of high-cardinality data. Datadog started as an infrastructure monitoring product, where you typically store and query, say, just a hundred different named metrics for a Postgres or Redis instance at a time. For those cases, timeseries databases work well. However, for all the other use cases that typically involve high-cardinality data, like APM, Kubernetes monitoring, or process monitoring, timeseries databases don't work so well. The limitations of Datadog's timeseries-based backend are reflected throughout their product.
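To make the cardinality problem concrete, here is a rough back-of-envelope sketch in Python. The tag names and counts are made up for illustration and are not taken from either product:

```python
# Illustrative only: in a tag-based timeseries store, every unique
# combination of tag values becomes its own stored series.
tags = {
    "host": 100,       # hypothetical: 100 hosts
    "container": 50,   # hypothetical: 50 containers per host
    "endpoint": 200,   # hypothetical: 200 distinct endpoints
}

series_per_metric = 1
for distinct_values in tags.values():
    series_per_metric *= distinct_values

# A single metric name fans out into a million stored series; a query
# that slices across these tags has to touch a huge number of them.
print(f"{series_per_metric:,} series")  # -> 1,000,000 series
```

This is why per-host metrics stay manageable while per-container or per-span tags explode the series count.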

Container View Limitations

In the 'Container' tab, trying to group containers by pod simply shows a list of containers instead of true grouped timeseries metrics, as Dripstat provides.

[Screenshots: Datadog's flat container list vs Dripstat's grouped timeseries view]

In fact, the only timeseries metrics for containers in Datadog are for a single container at a time, and you can view only 2 metrics total. Even the filterable list view of containers shown above uses only the last 10 minutes of data and shows aggregated values instead of timeseries data, due to the limitation of querying over high-cardinality data.

Process View Limitations

The Process view has similar limitations to the Container view. While Datadog claims to show 'Live metrics' for processes at a 2-second resolution, that applies only when viewing the metrics of a single process at a time, and is limited to 2 metrics. There is a top-level timeseries view where you can see multiple processes at once, but it is limited to a 15-second resolution, only 15 minutes of data, and again only 2 metrics. Trying to group processes closes the timeseries view and shows just a list of processes, like the Container view.

Custom Dashboard Limitations

Datadog's 'Custom Dashboards' also don't allow you to select any high-cardinality data. This means data from the Process, Container, or APM views cannot be used in Custom Dashboards.

APM Limitations

Datadog APM similarly has very limited metrics. They are fixed per endpoint; there is no way to slice and dice them, say by application instance or caller. It's also why their APM is based on collecting 'traces' and querying samples of those traces. While better than nothing, sampled traces only get you a subset of the information. What you are really trying to construct from those traces is timeseries metrics on 100% of span data, which can be arbitrarily sliced and diced. After all, as humans, we can only look through a few traces at a time, while timeseries graphs let you view the state of multiple systems over a large period of time at a glance. Dripstat, on the other hand, collects 100% of the data, with full timeseries metrics for every single span of every trace, all of which can be arbitrarily sliced and diced. Dripstat also collects traces, to help you find the exact line of code behind slow calls to various systems.
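As a minimal sketch of the difference between the two approaches: the span fields, sample rate, and aggregation keys below are hypothetical, not either product's actual format:

```python
import random
from collections import defaultdict

# Hypothetical span records; the fields are illustrative.
spans = [
    {"endpoint": random.choice(["/checkout", "/search"]),
     "caller": random.choice(["web", "mobile"]),
     "ms": random.expovariate(1 / 80)}   # ~80 ms mean latency
    for _ in range(100_000)
]

# Sampled-trace approach: keep ~1% of raw traces and query those later.
sampled = [s for s in spans if random.random() < 0.01]

# Metrics-on-100%-of-spans approach: pre-aggregate count and total latency
# per (endpoint, caller); a real system would also key by time bucket.
# Nothing is dropped, yet the stored state stays tiny.
agg = defaultdict(lambda: {"count": 0, "total_ms": 0.0})
for s in spans:
    key = (s["endpoint"], s["caller"])   # arbitrary slice-and-dice keys
    agg[key]["count"] += 1
    agg[key]["total_ms"] += s["ms"]

print(f"sampled traces kept: {len(sampled)} of {len(spans)}")
for key, m in sorted(agg.items()):
    print(key, m["count"], f"avg {m['total_ms'] / m['count']:.1f} ms")
```

The sampled list can answer questions only about the ~1,000 traces it happens to contain, while the aggregate answers throughput and latency questions about all 100,000 spans for any combination of the chosen keys.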

Datadog's 'Service Map' is also indicative of this limitation. A 'map' only gives you a point-in-time snapshot of your services' communication graph. While good for marketing slides, the graph becomes useless as the number of nodes grows, and it doesn't show you how communication changes over time. Dripstat's Cross Application views show you a full timeline graph of every single interaction between your services and datastores. The UX scales well with the number of nodes and allows arbitrary slicing and dicing of the data, so you get a complete picture of how communication across nodes changes over time.

Infrastructure Monitoring

Datadog's Infrastructure Monitoring relies mostly on its breadth of integrations. The vast majority of those integrations are simply AWS CloudWatch metrics copied into Datadog dashboards.

Dripstat's Infrastructure Monitoring is far more dynamic and completely live. It allows you to perform an action on your infrastructure and see the result in real time, which changes the entire experience of using a monitoring tool. Dripstat has most of the major integrations you need, and new ones are added every week; it will catch up to Datadog's set of integrations very soon.

| Feature | Dripstat | Datadog |
| --- | --- | --- |
| Metric Resolution | 2 second | 15 second |
| Server Monitoring | ✓ | ✓ |
| Dedicated Storage Monitoring | ✓ | Basic storage metrics, no dedicated UI |
| Process Monitoring | ✓ | Enterprise Plan only |
| Live Process Metrics | ✓ | 1 process at a time, 2 metrics only |
| Filter and Group Live Process Timeline Graphs | ✓ | ✗ |
| View > 2 Timeline Metrics for Processes | ✓ | ✗ |
| Inventory Monitoring | ✓ | ✓ |
| Integrations | All the major ones & new ones every week | More than Dripstat |
| AWS Lambda Monitoring | Coming Soon | ✓ |
| Dedicated Network Monitoring | Coming Soon | Coming Soon |

Kubernetes Monitoring

Datadog's Kubernetes Monitoring is simply a barebones dashboard along with basic container metrics. It's more of a checkbox feature: you will still need kubectl and kube-dashboard to actually see the state of your cluster. Dripstat's Kubernetes Monitoring, on the other hand, is a comprehensive product with detailed metrics and live configuration data. It can remove the need for kubectl and kube-dashboard entirely.

| Feature | Dripstat | Datadog |
| --- | --- | --- |
| Container Metrics | ✓ | ✓ |
| Filterable, Sorted Pod List by Resource Usage | ✓ | ✓ |
| Filterable Summary Count of Desired vs Ready Pods | ✓ | ✓ |
| High-Level Node Metrics | ✓ | ✓ |
| Arbitrary Slice and Dice of Container Metrics | ✓ | ✗ |
| Live State Change of Entities | ✓ | ✗ |
| Detailed Configuration Data of Individual Entities | ✓ | ✗ |
| Groupable Timeseries Metrics of Entities | ✓ | ✗ |
| Detailed List of Containers per Node | ✓ | ✗ |
| Pod Timeseries Metrics by Namespace, Node Pool, or Workload Type | ✓ | ✗ |
| Workload Timeseries Metrics Split by Pod | ✓ | ✗ |
| Pod Timeseries Metrics Split by Containers | ✓ | ✗ |
| Endpoint Configuration Data | ✓ | ✗ |
| ConfigMap and Secret Configuration Data | ✓ | ✗ |
| Services Configuration Data | ✓ | ✗ |
| Detailed Per-Node Configuration and Metric Data | ✓ | ✗ |
| View More Than 2 Timeseries Container Metrics at a Time | ✓ | ✗ |

APM

Datadog has a very basic APM offering. Due to the limitations of its backend, it is based mostly on querying sampled trace data. Some static metrics are collected for the latencies of individual endpoints, but they are very basic. Datadog's agent collects detailed trace data for every transaction, stores it in memory, then transfers it to the local infrastructure agent for sampling, which holds it in its own memory as well. As a result, the application agent has a large memory overhead, and it increases the infrastructure agent's memory too, since traces are transferred to it. For applications on Kubernetes, multiple applications transfer their traces to the same infrastructure agent, ballooning its memory usage in proportion to the applications' throughput.
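A back-of-envelope estimate shows how that buffering can grow with throughput. Every number below is an assumption chosen for illustration, not a measurement of Datadog's agent:

```python
# All values are hypothetical, for illustration only.
apps_per_node    = 10       # pods sharing one infrastructure agent
requests_per_sec = 200      # throughput of each application
trace_size_bytes = 20_000   # one detailed trace with all its spans
buffer_seconds   = 10       # how long traces sit before sampling/flush

# Traces buffered in the shared infrastructure agent at any moment:
infra_agent_buffer = (apps_per_node * requests_per_sec
                      * trace_size_bytes * buffer_seconds)
print(f"{infra_agent_buffer / 1e9:.1f} GB")  # -> 0.4 GB of in-flight traces
```

Double the throughput or the pod count and the buffer doubles with it, which is the ballooning described above.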

Dripstat has a much more comprehensive APM offering. Its Cross-Application metric view is unmatched in its ability to show a timeline of communication across instances, which a static 'service map' simply cannot convey. Dripstat's agents are extremely lightweight: data is flushed every 2 seconds, so it never accumulates. Only a few traces are kept in memory before flushing, since Dripstat collects metrics for each span, and those occupy negligible memory. No intermediate data transfer is done to infrastructure agents.

| Feature | Dripstat | Datadog |
| --- | --- | --- |
| Supported Languages | Java (others Coming Soon) | Java, NodeJS, Go, Python |
| Agent Overhead | Negligible | Much higher than Dripstat |
| Pre-Instrumented Frameworks | ✓ | ✓ |
| OpenTracing Support | ✓ | ✓ |
| Percentiles | ✓ | ✓ |
| Throughput | ✓ | ✓ |
| Error Rate | ✓ | ✓ |
| Response Time Broken Down by Service | ✓ | ✓ |
| Endpoint Metrics | ✓ | ✓ |
| Traces with Stacktraces, DB Statements & Error Messages | ✓ | ✓ |
| JVM Metrics | ✓ | ✓ |
| Java Profiler | ✓ | ✗ |
| Deployment History | ✓ | ✗ |
| Deployment Markers in Timeline | ✓ | ✗ |
| App Environment View | ✓ | ✗ |
| App Library Dependency View | ✓ | ✗ |
| Split Metrics per Instance | ✓ | ✗ |
| Timeline Metrics of Individual Callers to External Service/Database Statements | ✓ | ✗ |
| Individual Error Metrics | ✓ | ✗ |
| Cross Application Timeline Metrics | ✓ | ✗ |
| Per Application Caller Timeline of Each Endpoint | ✓ | ✗ |
| Cross Application Filterable Error View | ✓ | ✗ |
| Dynamic Scalability Report per Metric | ✓ | ✗ |
| Dynamic Trends Report per Metric | ✓ | ✗ |
| Dynamic Cross Service Trend Report for Application and Endpoint Metrics | ✓ | ✗ |
| Timeseries View of Custom Metrics | ✓ | ✗ |
| Separate Graphs for Async and Sync Metrics | ✓ | ✗ |
| Timeline Metrics of Individual Calls Inside Endpoints | ✓ | ✗ |