Datadog's backend is a traditional timeseries datastore: it uses Cassandra to store each time series individually, paired with a set of tags for filtering. While timeseries databases work fine for low-cardinality data, they perform poorly for arbitrary slicing and dicing of high-cardinality data. Datadog started as an Infrastructure Monitoring product, where you typically store and query just a hundred or so named metrics for a Postgres or Redis instance at a time. For these cases, timeseries databases work well. However, for use cases that typically involve high-cardinality data, like APM, Kubernetes Monitoring, or Process Monitoring, timeseries databases don't work so well. The limitations of Datadog's timeseries-based backend are reflected throughout their product.
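To see why cardinality matters, consider a backend that stores one series per unique tag combination. The sketch below uses hypothetical tag counts (not Datadog's actual numbers) to show how the series count explodes once you key metrics by container or endpoint rather than just by host:

```python
# Hypothetical tag value counts: each unique tag combination becomes
# its own stored series in a per-series timeseries backend.
tags = {
    "host": 200,          # 200 hosts
    "container_id": 50,   # assume ~50 containers per host
    "endpoint": 100,      # 100 application endpoints
}

# A metric keyed only by host is low cardinality: 200 series.
low_cardinality_series = tags["host"]

# Keying the same metric by every tag multiplies the series count:
high_cardinality_series = 1
for count in tags.values():
    high_cardinality_series *= count

print(low_cardinality_series)   # 200
print(high_cardinality_series)  # 1000000 (200 * 50 * 100)
```

Slicing and dicing across all tags means the backend must scan or pre-aggregate a million series instead of two hundred, which is where per-series stores fall over.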
In the 'Container' tab, trying to 'group' containers by pod simply shows a 'list' of containers, rather than the true 'grouped' timeseries metrics that Dripstat shows.
In fact, the only timeseries metrics for containers in Datadog are for a single container at a time, and you can only view 2 metrics total. Even the container list view above, which allows filtering, only uses the last 10 minutes of data and shows only aggregated values instead of timeseries data, due to the limitation of querying over high-cardinality data.
The Process view has similar limitations to the Container view. While Datadog claims to show 'Live metrics' for processes with a 2-second resolution, that only applies when viewing a single process at a time, and is limited to 2 metrics. There is a top-level timeseries view where you can see multiple processes at once, but it is limited to a 15-second resolution with only 15 minutes of data, and again to only 2 metrics. Trying to 'group' processes closes the timeseries view and shows just a list of processes, like the Container view.
Datadog's 'Custom Dashboards' also don't allow you to select any high-cardinality data. This means data from the Process, Container, or APM views cannot be used in Custom Dashboards.
Datadog APM similarly has very limited metrics. They are fixed per endpoint; there is no way to slice or dice them, say by application instance or caller. It's also why their APM is based on collecting 'traces' and querying samples of those traces. While better than nothing, sampled traces only give you a subset of the information. What you are really trying to reconstruct from those traces is timeseries metrics over 100% of span data, which can be arbitrarily sliced and diced. After all, as humans, we can only look through a few traces at a time, while timeseries graphs let you view the state of multiple systems over a long period of time at a glance. Dripstat, on the other hand, collects 100% of the data, with full timeseries metrics for every single span of every trace, which can be arbitrarily sliced and diced. Dripstat also collects traces to help you find the exact line of code behind slow calls to various systems.
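A quick way to see what sampling loses: compute a tail-latency percentile from all spans versus from a small sample. The numbers below are invented for illustration, and the nearest-rank p99 helper is a deliberate simplification:

```python
import random

random.seed(42)

# Hypothetical span latencies in ms: 99% fast, 1% slow outliers.
latencies = [10.0] * 9900 + [500.0] * 100

def p99(values):
    """Simple nearest-rank 99th-percentile estimate."""
    ordered = sorted(values)
    return ordered[int(len(ordered) * 0.99)]

# Metrics built from 100% of spans capture the slow tail exactly.
print(p99(latencies))  # 500.0

# A 1% trace sample holds ~1 slow span on average, so its measured
# p99 swings between 10.0 and 500.0 from one sample to the next.
sample = random.sample(latencies, 100)
print(p99(sample))
```

Full-span metrics pin the tail down every time; a sample either catches the outliers or silently misses them.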
Datadog's 'Service Map' is also indicative of this limitation. A 'map' only gives you a point-in-time snapshot of your services' communication graph. While good for marketing slides, the graph becomes useless as the number of nodes grows, and it doesn't show you how communication changed over time. Dripstat's Cross-Application views show you a full timeline graph of every single interaction between your services and datastores. The UX scales well with the number of nodes and allows arbitrary slicing and dicing of the data, so you get a complete picture of how communication across nodes changes over time.
Datadog's Infrastructure Monitoring relies mostly on its breadth of integrations. The vast majority of those integrations are simply AWS CloudWatch metrics copied into Datadog dashboards.
Dripstat's Infrastructure Monitoring is far more dynamic and completely live. It allows you to perform an action on your infrastructure and see the result in realtime, which changes your entire experience of a monitoring tool. Dripstat has most of the major integrations you need, and new ones are added every week; it will catch up to Datadog's set of integrations very soon.
Datadog's Kubernetes Monitoring is simply a barebones dashboard along with basic container metrics. It's more of a checkbox feature: you will still need kubectl and kube-dashboard to actually see the state of your cluster. Dripstat's Kubernetes Monitoring, on the other hand, is a comprehensive product with detailed metrics and live configuration data. It can remove the need for 'kubectl' and 'kube-dashboard' completely.
Datadog has a very basic APM offering. Due to the limitations of their backend, it is based mostly on querying sampled trace data. There are some static metrics collected for the latencies of individual endpoints, but they are very basic. Datadog's agent collects detailed trace data for every transaction, stores it in memory, then transfers it to the local infrastructure agent for sampling, where it sits in that agent's memory as well. Because of this, the agent has a large memory overhead, and it even increases the infrastructure agent's memory usage, since traces are transferred to it. For applications on Kubernetes, multiple applications transfer their traces to the same infrastructure agent, ballooning its memory usage in proportion to the throughput of those applications.
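A back-of-the-envelope sketch of that ballooning, with invented numbers (the average trace size and buffering window below are assumptions for illustration, not measured Datadog figures):

```python
# Assumed figures, purely illustrative:
TRACE_SIZE_KB = 20    # average serialized trace size
BUFFER_SECONDS = 10   # how long traces sit in the shared agent

def agent_buffer_mb(requests_per_sec_per_app, num_apps):
    """Memory held in the shared infra agent's trace buffer, in MB."""
    traces_buffered = requests_per_sec_per_app * num_apps * BUFFER_SECONDS
    return traces_buffered * TRACE_SIZE_KB / 1024

# One app at 100 req/s vs. ten co-located apps on the same node:
print(agent_buffer_mb(100, 1))   # 19.53125  (~20 MB)
print(agent_buffer_mb(100, 10))  # 195.3125  (~195 MB, linear in app count)
```

Whatever the real constants are, the shape of the problem is the same: buffered-trace memory in the shared agent grows linearly with the combined throughput of every application on the node.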
Dripstat has a much more comprehensive APM offering. Its Cross-Application metric view is unmatched in its ability to show a timeline of communication across instances, which a static 'service map' simply cannot convey. Dripstat's agents are extremely lightweight: data is flushed every 2 seconds, so it never accumulates. Only a few traces are kept in memory before flushing, since Dripstat collects metrics for each span, and those metrics occupy negligible memory. No intermediate data transfer to infrastructure agents is done.