As an example, consider an e-commerce application. Sometimes bad stuff happens, and the development team needs to access logs to analyze what went wrong. In general, for "security" reasons, developers cannot access the system directly and have to ask the operations team to send the log file(s). This all-too-common process results in frustration and wasted time, and contributes to building even taller walls between Dev and Ops. To improve the situation, one option is to set up a datastore to hold the logs, and a web application to make them available to developers.

Here are some hypotheses regarding the source architecture components:

Load-balancer

Apache Web server

Apache Tomcat servlet container, where the e-commerce application is deployed

Solr server, which provides search and faceting features to the application

In turn, the relevant data/logs that need to be stored include:

Web server logs

Application logs proper

Tomcat technical logs

JMX data

Solr logs

Let’s implement those requirements with the Elastic stack. The target infrastructure could look like this:
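As a sketch of the shipping side, a Filebeat configuration collecting the log files listed above might look like the following. All paths and the Elasticsearch host are assumptions and depend on the actual installation; JMX data is not file-based and would need a separate collector (e.g. Metricbeat):

```yaml
filebeat.inputs:
  # Apache web server logs (path is an assumption)
  - type: log
    paths:
      - /var/log/apache2/*.log
  # Logs written by the e-commerce application itself (hypothetical path)
  - type: log
    paths:
      - /var/log/ecommerce/*.log
  # Tomcat technical logs
  - type: log
    paths:
      - /opt/tomcat/logs/catalina.out
  # Solr logs
  - type: log
    paths:
      - /var/solr/logs/solr.log

# Ship everything to Elasticsearch; host is an assumption
output.elasticsearch:
  hosts: ["elasticsearch:9200"]
```

Each input tails its files and forwards new lines to Elasticsearch, from where the web application can query them.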

Defining the architecture is generally not enough. Unless one works for a dream company that empowers employees to improve things on their own initiative (if so, please send them my resume), chances are there's a need for estimates regarding the setup of this architecture. And that goes double when the work is done for a customer company.

Engineers will probably think that managers are asking them to stick their necks out and produce estimates for no real reason, and will push back. But the latter will kindly remind the former to "just" base the estimates on assumptions, as if that were a real solution rather than plausible deniability. In other words, an assumption is a way to escape blame for a wrong estimate, which sounds much closer to contract law than to engineering. It means that if any of the listed assumptions is not fulfilled, then it's acceptable for the estimates to be wrong. It also implies that the estimates are no longer considered a deadline - which they were never supposed to be in the first place, if only from a semantic viewpoint.

Notwithstanding all that, and for the sake of the argument, let’s try to review possible assumptions pertaining to the proposed architecture that might impact implementation time:

Hardware location: on-premise, cloud-based or a mix of both?

Underlying operating system(s): *nix, Windows, something else, or a mix?

Infrastructure virtualization degree: is the infrastructure physical, virtualized, both?

Co-location: are there requirements regarding the placement of components on physical systems?

Hardware availability: will physical machines need to be purchased?

Automation readiness: is there a solution already in place to automate infrastructure management? If not, in how many environments will the implementation need to be set up, and if more than two, will replication be handled manually?

Clustering: is any component clustered? Which one(s)? For the application, is there a session replication solution in place? Which one?

Infrastructure access: does it need to be on-site? Is a hardware security token required? A software one?