Collecting Data from the Cloud

Cloud computing usage has expanded significantly over the years, to the point where nearly everyone has at least some cloud presence. Once you have a cloud presence, you’ll want to collect logs from those cloud systems or from the cloud platform itself.

There are plenty of mechanisms to accomplish this, and I’m not going to cover them all in detail here. They are fundamentally similar, though, in that the end result is a data transfer from the cloud provider to the Splunk instance. This brings me to the main question I hope to answer in this blog post:

How Do We Deal with Transfer Fees?

Because of budgeting concerns, many people worry about uncontrolled spending on transfer fees and look for ways to reduce it. The truth is, however, that transfer fees are usually not a big deal compared to the overall cost of your cloud deployment, and you can easily spend more money trying to reduce them than you would ever save.

Let's Look at a Real-World Example

At the time of this writing (July 2019), the cost to transfer up to 10TB/month out of the US East 2 (Ohio) region of AWS is $0.09 per GB (inbound transfer to AWS is free). That means transferring 1TB of data over a month would end up costing $92.07. This equates to less than 35GB/day of log volume. If you’re a smaller Splunk customer, this could end up being a significant amount of your daily license.

To make the math easier for the following example, let’s assume you’re collecting 100GB/day in logs from AWS. This works out to around 3TB of data a month, or $276.39 in data transfer costs per month. We’re not considering potential transmission overhead here, so the actual cost may end up somewhat higher, but this should be close enough for an estimate.
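The two figures above can be reproduced with a short calculation. This is a minimal sketch, not an official pricing formula: the $0.09 per GB rate comes from the text, while the 1,024GB-per-TB conversion and the single free gigabyte per month are assumptions chosen because they match the $92.07 and $276.39 figures. The function name is hypothetical, and the sketch only applies within the first 10TB/month tier.

```python
# Estimate monthly AWS data-transfer-out cost within the first 10TB/month tier.
RATE_PER_GB = 0.09   # USD per GB, from the text (US East 2, July 2019)
GB_PER_TB = 1024     # assumption: 1TB is counted as 1,024GB
FREE_GB = 1          # assumption: the first GB each month is not billed


def monthly_transfer_cost(tb_per_month: float) -> float:
    """Return the estimated outbound transfer cost in USD for one month."""
    billable_gb = max(tb_per_month * GB_PER_TB - FREE_GB, 0)
    return round(billable_gb * RATE_PER_GB, 2)


if __name__ == "__main__":
    for tb in (1, 3):
        print(f"{tb}TB/month -> ${monthly_transfer_cost(tb):.2f}")
```

Swapping in your own monthly volume gives a quick first estimate before you look at the provider's current rate card.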

Now, let’s say you’re wondering whether a better alternative would be to deploy a separate Splunk environment inside AWS to receive the AWS logs, keeping all of that data transfer within AWS. The short answer: it’s not cheaper to deploy an entirely new Splunk environment only to save money on AWS data transfer costs.

Building a Splunk instance that can be hosted entirely within AWS and handle 100GB/day in AWS logs will consist of the following:

AWS c5.4xlarge Linux instance (16 vCPU, 32GB of RAM) - Note: We’re assuming that Splunk Enterprise Security is not in use here, as we’d need at least a c5.9xlarge in that case

Storage for our OS (for this example, 20GB)

Storage for 90 days of logs (~4.4TB of space)

This alternative comes to $939.76 per month with on-demand pricing. That figure excludes data transfer costs, which are not completely avoidable but are minimal for the sake of this example.
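As a rough check on that monthly figure, here is a minimal sketch of the arithmetic. Only the disk sizes come from the text. The hourly instance rate, the hours-per-month value, and the storage rate are all assumptions, chosen because they reproduce $939.76; they are not quoted in this post, so verify them against current AWS pricing before relying on them.

```python
# Rough on-demand monthly cost for the single-instance build described above.
HOURLY_RATE = 0.68       # assumption: USD/hour for a c5.4xlarge Linux instance
HOURS_PER_MONTH = 732    # assumption: hours used for one month of on-demand usage
STORAGE_RATE = 0.10      # assumption: USD per GB-month of block storage
OS_DISK_GB = 20          # from the text
LOG_DISK_GB = 4400       # from the text (~4.4TB), taken here as 4,400GB


def on_demand_monthly_cost() -> float:
    """Return compute plus storage cost in USD for one month."""
    compute = HOURLY_RATE * HOURS_PER_MONTH
    storage = (OS_DISK_GB + LOG_DISK_GB) * STORAGE_RATE
    return round(compute + storage, 2)


if __name__ == "__main__":
    print(f"On-demand total: ${on_demand_monthly_cost():.2f}/month")
```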

Let’s see what we can do to reduce the cost. The lowest instance prices in AWS are for a 3-year term, all upfront. The EC2 instance alone comes to $6,447 for 36 months of usage, or $179/month over 3 years. This seems better than the data transfer costs until we remember that we still don’t have any storage. Adding 20GB of disk for our OS and 4.4TB of disk for 3 months of retention gives us a monthly cost of $442 for the storage, bringing us to a total monthly cost of $621. Even excluding any possible data transmission costs, this is significantly higher than our data transfer costs.
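To put the reserved-instance numbers side by side with the transfer estimate, here is a small sketch that uses only figures quoted in this post. The function and key names are hypothetical, and the comparison ignores any data transfer the dedicated instance would still incur.

```python
# Compare the reserved-instance build against simply paying transfer fees.
RESERVED_UPFRONT = 6447.00   # USD, 3-year all-upfront EC2 price from the text
TERM_MONTHS = 36             # 3-year term
STORAGE_MONTHLY = 442.00     # USD/month for 20GB OS disk plus 4.4TB of logs
TRANSFER_MONTHLY = 276.39    # USD/month to transfer ~3TB out of AWS


def compare_monthly_costs() -> dict:
    """Return the amortized monthly cost of each option and the difference."""
    instance_monthly = RESERVED_UPFRONT / TERM_MONTHS
    dedicated_monthly = instance_monthly + STORAGE_MONTHLY
    return {
        "instance_monthly": round(instance_monthly, 2),
        "dedicated_monthly": round(dedicated_monthly, 2),
        "transfer_monthly": TRANSFER_MONTHLY,
        "extra_per_month": round(dedicated_monthly - TRANSFER_MONTHLY, 2),
    }


if __name__ == "__main__":
    for name, value in compare_monthly_costs().items():
        print(f"{name}: ${value:.2f}")
```

Even with the cheapest instance pricing, the dedicated build costs more than twice the monthly transfer fees it was meant to avoid.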

This also doesn’t address whether this is even the best approach from a Splunk design perspective. If you don’t currently have a Splunk deployment, a single instance in AWS may be an acceptable solution. However, if you already have an existing Splunk deployment, especially a distributed one, you may hurt the performance of your environment by adding another Splunk system that doesn’t effectively balance the data from other systems in the environment.

Ultimately, It Depends on Your Use Case

Despite all of this, the right answer for your specific use case is still a firm “it depends.” There may be situations where you can adjust your AWS environment to optimize your costs while also accomplishing other goals, such as geographical distribution of data. In those cases, data transfer costs are unavoidable, and with a larger Splunk deployment, especially one hosted entirely within AWS, they end up being a relatively small part of your overall AWS bill.