Since we care about CI/CD, bulk updates of customer environments, automated deployments, and so on, we had to set up a management server and other system resources. Exposing this server to the internet was not the way we wanted to go, so how do we access it? There are a couple of options; we considered all the variations and narrowed them down to a jump host or a VPN to Azure. Later on we decided to go for a VPN server, so I started to dig into the available VPN options.

VPN, but not for you!

As I wrote above, we didn’t want to expose anything to the internet except the customer front end(s). A point-to-site VPN was good for both the customers and the development team. Doesn’t sound like a problem, right? There is a native Azure service for that: VPN Gateway.

I did some quick research on Azure VPN Gateway; Microsoft says I can connect to my infrastructure from “ANYWHERE”:

Well, in Microsoft’s ideal world everybody uses Windows 10 with the latest updates on their PCs and laptops, Windows Mobile on every smartphone, and washing machines running Windows Server 2016 R2. They simply don’t have a client for non-Windows machines that can establish an SSTP VPN connection (which is what Azure VPN Gateway uses). I managed to google a couple of options for macOS, but not ones I would use for a production environment.

I'm not trying to advertise, but I should say a couple of words about the VPN solution we chose: OpenVPN Access Server. In my opinion it’s one of the best VPN solutions for production, and I’ve convinced a couple of customers to use it. I guess now I deserve a “local distributor reward”, if there is such a thing :)

Application insights

The application consists of a Java back end and an Angular front end. They are served by Azure App Service, configured for hosting Java web applications; data is stored in an MS SQL database (Azure SQL).

As I already mentioned at the beginning of this article, each customer environment is isolated from the others: it is created in its own resource group and has its own SQL server. The SQL server firewall is configured to allow connections only from the corresponding App Service (the App Service outbound IP addresses are whitelisted in the SQL server firewall).
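The whitelisting can be sketched with the Azure CLI. In this sketch the resource names and the sample outbound IPs are hypothetical, and the az commands are echoed as a dry run rather than executed:

```shell
#!/bin/bash
# Dry-run sketch: whitelist App Service outbound IPs in the SQL server firewall.
customername="customer1"
rg="${customername}RG"
sqlserver="${customername}-sql"
appservice="${customername}-app"

# In the real job the list would come from:
#   az webapp show -g "$rg" -n "$appservice" --query outboundIpAddresses --output tsv
outboundIps="52.160.0.1,52.160.0.2"   # sample value for the dry run

# One firewall rule per outbound address:
n=0
for ip in ${outboundIps//,/ }; do
  n=$((n+1))
  echo az sql server firewall-rule create \
    --resource-group "$rg" --server "$sqlserver" \
    --name "AllowAppService$n" \
    --start-ip-address "$ip" --end-ip-address "$ip"
done
```

Because Azure SQL firewall rules are per-address, each outbound IP gets its own single-address rule.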

E-Mail notifications / SMTP service

The application needed to be able to send mail, so we had to figure out which SMTP server we could use. Unlike AWS, Azure doesn’t have a native SMTP service; however, there is a way: SendGrid.

A detailed how-to is well described here.

Azure customers can unlock 25,000 free emails each month. These 25,000 free monthly emails will give you access to advanced reporting and analytics and all APIs (Web, SMTP, Event, Parse and more). For information about additional services provided by SendGrid, visit the SendGrid Solutions page.

Deployment

One of the main challenges was automated deployment, including: creating an environment for a specific customer, deploying the application, and deploying the DB schema. To accomplish this I used the tool I’m most comfortable with, Jenkins, and configured a job for each of the following actions:

Create Environment

This job creates an environment in Azure for a specific customer, based on pre-cooked JSON templates; the customer’s name is passed to the job as a parameter. I’m not going to share the whole deployment script for security reasons, but I’ll explain what each deployment step does (bash):

1. Replace placeholders in parameters.json.
2. Generate a DB password and write it to Azure Key Vault.
3. Create a Resource Group as a logical unit to store all resources.
4. Create a CNAME in the DNS zone mycompany.com, basically <customername>.mycompany.com.
5. Referring to template.json, create the App Service, App Service usage plan, SQL server with an empty database, and a storage account with a container for the backups we are going to place there later.
6. Assign an additional host name to the App Service (the CNAME we created at step 4).
7. Upload the wildcard TLS certificate (*.mycompany.com) to the newly created App Service.
8. Configure the binding, so the App Service accepts requests over HTTPS on our custom domain name.
9. Create “Allow” rules in the SQL server firewall to allow access only from the App Service outbound IP addresses (read by querying the App Service with az cli).
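The steps above can be sketched with a few Azure CLI calls. Everything here is hypothetical (resource names, vault name, DNS zone resource group), and the commands are echoed as a dry run rather than executed:

```shell
#!/bin/bash
# Dry-run sketch of the environment-creation job.
customername="customer1"
rg="${customername}RG"

# Resource group for this customer:
echo az group create --name "$rg" --location westeurope

# CNAME <customername>.mycompany.com (DNS zone lives in its own RG here):
echo az network dns record-set cname set-record \
  --resource-group dnsRG --zone-name mycompany.com \
  --record-set-name "$customername" --cname "${customername}.azurewebsites.net"

# Deploy the pre-cooked ARM template (app service, plan, SQL, storage):
echo az group deployment create --resource-group "$rg" \
  --template-file template.json --parameters @parameters.json

# Store the generated DB password in Key Vault:
echo az keyvault secret set --vault-name myKeyVault \
  --name "${customername}-db-password" --value '<generated>'
```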

If you wonder what we do when we need to deploy a DB schema from Jenkins to the SQL server: we add the management (Jenkins) server IP address dynamically and temporarily, before the job is executed, and remove it at the end of any DB-related job.

One more step: force HTTPS on the App Service by uploading this web.config file to the FTP endpoint:



<configuration>
  <system.webServer>
    <rewrite>
      <rules>
        <!-- BEGIN rule TAG FOR HTTPS REDIRECT -->
        <rule name="Force HTTPS" enabled="true">
          <match url="(.*)" ignoreCase="false" />
          <conditions>
            <add input="{HTTPS}" pattern="off" />
          </conditions>
          <action type="Redirect" url="https://{HTTP_HOST}/{R:1}" appendQueryString="true" redirectType="Permanent" />
        </rule>
        <!-- END rule TAG FOR HTTPS REDIRECT -->
      </rules>
    </rewrite>
  </system.webServer>
</configuration>

Deploy DB Schema

This job deploys the DB schema: creating the required tables and so on. Here we have a bunch of SQL scripts and Flyway Migrate (a gradle plugin) to execute them and keep the whole process under control. It helps us avoid unnecessary DB changes: if the DB schema is already at the latest available version, it just skips everything and reports:

Schema [database_name] is up to date. No migration necessary.

1. Read the DB server address, DB name, DB user and DB password parameters from Azure using the CLI;
2. Check if the current management server is allowed to connect to SQL, and update the SQL server firewall if needed;
3. Replace placeholders in the flywayMigrate profile with the real values pulled from Azure;
4. Execute flywayMigrate with gradle.
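The steps above can be sketched like this. The profile file name, the placeholder tokens, and all resource names are hypothetical; the az and gradle calls are echoed as a dry run, while the placeholder replacement actually runs on a demo file:

```shell
#!/bin/bash
# Dry-run sketch of the Deploy DB Schema job.

# 1. Parameters that would be read from Azure (hard-coded samples here):
dbServer="customer1-sql.database.windows.net"
dbName="customer1db"
dbUser="dbadmin"

# 2. Open the firewall for the management server if needed:
echo az sql server firewall-rule create --resource-group customer1RG \
  --server customer1-sql --name Jenkins \
  --start-ip-address 10.0.0.5 --end-ip-address 10.0.0.5

# 3. Replace placeholders in the flywayMigrate profile
#    (demo file standing in for the real gradle profile):
cat > flyway.conf <<'EOF'
flyway.url=@DB_URL@
flyway.user=@DB_USER@
EOF
sed -i "s|@DB_URL@|jdbc:sqlserver://${dbServer}:1433;database=${dbName}|; s|@DB_USER@|${dbUser}|" flyway.conf

# 4. Run the migration:
echo ./gradlew flywayMigrate
```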

Build & deploy back end / front end

These two jobs are designed to build and deploy back end and front end.

To build the Java back end we pull a bunch of parameters from Azure and include them in the application properties. Basically we query Azure for things like the DB connection strings configured on the App Service and the URL of the Azure blob storage where the application stores some files (e-mail templates, profile pictures and so on), and then just build it with gradle.

I haven’t said anything about deployment yet. There aren’t many ways to deploy to an Azure App Service; the only one that worked for us was ordinary FTP deployment, and to make it a bit more secure I used the FTPS endpoint instead.

So in the case of the back end application we have a single WAR file when the build is done; we take this file and upload it to the Tomcat webapps folder via FTPS, and the rest is done by Tomcat.
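The FTPS upload itself can be done with curl. The endpoint host and the deployment credentials below are hypothetical, and the command is echoed as a dry run:

```shell
#!/bin/bash
# Dry-run sketch: upload the built WAR to the App Service FTPS endpoint.
FTPS_USER='appname\$appname'    # App Service deployment credentials (hypothetical)
FTPS_PASS='secret'
warFile="backend.war"
ftpsUrl="ftps://waws-prod-am2-001.ftp.azurewebsites.windows.net/site/wwwroot/webapps/"

# -T uploads the file; -k skips certificate verification as in the
# cleanup script later in this article:
echo curl -k -T "$warFile" --user "${FTPS_USER}:${FTPS_PASS}" "$ftpsUrl"
```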

However, in the case of the front end we have tons of HTML files, scripts, style sheets, etc. Uploading all of those over FTPS was a little problematic, so we came up with the following idea, which might even sound weird but worked fine for us:

Once the npm build is done, we compress everything with zip, rename the archive to “frontend.war”, then upload that single file via FTPS to Tomcat and let it do its job.

Health check: a thing you need!

One day it occurred to me that it would be nice to know how much time it takes to deploy the front end or back end, and we also couldn't live without knowing whether the deployment of a new version had been successful; we needed some kind of feedback from our little application. So here’s what I’m doing to make sure our version is deployed.

Front end:

When the application is built, I inject a file containing the current git HEAD into the package that is later uploaded to the web server. After deployment I curl the URL of this little file every 5 seconds and check whether the git HEAD in the file matches the one we built right before deployment. Once it matches, the check returns exit code “0” and the Jenkins job exits with “SUCCESS”. If the git HEAD is still different after 10 minutes, the build fails and I get an e-mail from Jenkins that something broke. Here’s what I’m talking about:

#Verification if deployment was successful
gitHeadId=$(git rev-parse --short HEAD)
retry_count=0
returned_head=""

until [ "$retry_count" -gt 600 ] || [ "$returned_head" = "$gitHeadId" ]; do
  sleep 5
  echo "$(( retry_count+=5 ))s since new version FTP upload."
  returned_head=$(curl -s --max-time 1 https://${customername}.mycompany.com/githead.txt)
done

if [ "$returned_head" = "$gitHeadId" ]
then
  echo "SUCCESS: New version is deployed, git head $gitHeadId"
else
  echo "ERROR: Timeout of $retry_count seconds reached"
  exit 1
fi
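The githead.txt that the loop polls is produced at build time. A minimal sketch; the "dist" output directory is hypothetical, and the fallback value is only there so the sketch also runs outside a git checkout:

```shell
#!/bin/bash
# Write the current git HEAD into a file shipped with the front end package.
gitHeadId=$(git rev-parse --short HEAD 2>/dev/null || echo unknown)
mkdir -p dist
printf '%s' "$gitHeadId" > dist/githead.txt
```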

Back end

This one was trickier, since I can’t just inject a file into the Java back end. Well, maybe I can, but why should I learn how to do it when there is a team of skilled Java developers :). So here I asked the development team to add a “HEALTHCHECK” resource to the API, which does the following:

If the application is running, can query the database and gets the expected results, it returns JSON with “success”

Returns JSON with current git head

We do basically the same thing as for the front end, with a slightly different health check:

#Verification if deployment was successful
gitHeadId=$(git rev-parse --short HEAD)
retry_count=0
returned_version=""

until [ "$retry_count" -gt 600 ] || [ "$returned_version" = "${customername}_$gitHeadId" ]; do
  sleep 5
  echo "$(( retry_count+=5 ))s since new version upload."
  returned_version=$(curl -s --max-time 5 https://${customername}.mycompany.com/BE/healthcheck?includeVersions=true | jq -r ".version")
done

if [ "$returned_version" = "${customername}_$gitHeadId" ]
then
  echo "SUCCESS: New version is deployed, version: $returned_version"
else
  echo "ERROR: Timeout of $retry_count seconds reached."
  exit 1
fi

As you might have noticed, I’m using jq here. At the beginning I tried to use as few third-party tools as possible, but later on I gave up, since jq is a lightweight tool and it’s the best at parsing JSON data.

Tomcat parallel deployment and clean-up

We’ve made use of parallel deployment in Tomcat to minimize downtime. Still, I have to remove the old versions after a new one is deployed:

#Remove old wars
echo "Removing old versions"
wars=($(curl -k -l $URL --user "$FTPS_USER":"${FTPS_PASS}" | grep -v ${warFilename} | grep ^backend.*\.war))
n=1
for i in "${wars[@]}"
do
  echo "Removing old version: $i"
  curl -k -v -u "$FTPS_USER":"${FTPS_PASS}" $URL -Q "DELE site/wwwroot/webapps/$i"
  n=$(($n+1))
done
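For Tomcat's parallel deployment to kick in, each uploaded WAR needs a unique version label after a "##" in its file name; Tomcat then runs both versions side by side until the old one is undeployed. A sketch of building such a name (using the Jenkins build number as the version is an assumption, not necessarily what the article's job does):

```shell
#!/bin/bash
# Sketch: version-suffixed WAR name for Tomcat parallel deployment.
BUILD_NUMBER=42   # normally injected by Jenkins
warFilename="backend##${BUILD_NUMBER}.war"
echo "Uploading $warFilename"
```

This naming also matches the cleanup script above, which lists everything starting with "backend" and deletes all but the current WAR.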

Delete environment

This job’s name is pretty self-explanatory. It removes the Resource Group for a specific customer, which deletes all resources in that RG. Additionally, it removes the DNS record (CNAME) from the mycompany.com zone and the database admin password from Azure Key Vault.

Re-Key DB

We all know about the importance of changing passwords from time to time for security reasons. The same applies here: it would be nice to change the SQL admin password regularly, and since the change is not in one single place, it makes sense to write a script that does all the updates for us:

1. Generate a new password and apply the change to the SQL server settings;
2. Write the new value to Azure Key Vault;
3. Update the application with the new settings;
4. Restart the application;
5. Run another variation of the health check loop described above.
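The re-key steps can be sketched like this. Resource, vault, and app-setting names are hypothetical, and the az commands are echoed as a dry run; only the password generation actually runs:

```shell
#!/bin/bash
# Dry-run sketch of the Re-Key DB job.
newPass=$(openssl rand -base64 24)    # 1. generate a new password

# 1. Apply it to the SQL server:
echo az sql server update --resource-group customer1RG \
  --name customer1-sql --admin-password "$newPass"

# 2. Store the new value in Key Vault:
echo az keyvault secret set --vault-name myKeyVault \
  --name customer1-db-password --value "$newPass"

# 3. Update the application settings:
echo az webapp config appsettings set --resource-group customer1RG \
  --name customer1-app --settings "DB_PASSWORD=$newPass"

# 4. Restart the application:
echo az webapp restart --resource-group customer1RG --name customer1-app
```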

Backup/Restore Azure appservice and its database

Microsoft offers automated backups of Azure SQL, which is a pretty good option; however, we want to run backups on demand. The use case: take a backup every time before a new version is deployed, so if the new version has problems or, god forbid, hidden bugs, we can roll back to the version that was running right before we started the update. Here we tried a couple of different options; let me start with the failed one:

0. FlywayMigrate, SqlCmd (Failed)

We are already familiar with this tool, so I wanted to use it to back up SQL. It’s not going to be a long story: Azure SQL does not allow the “BACKUP DATABASE” SQL statement. Obviously, the situation is the same with another command line tool, SqlCmd. Done with this option.

1. Copy to another database

A full copy of the DB content to another database running on the same SQL server. Azure CLI helps here:

az sql db copy ...
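A fuller form of the copy command might look like the following; the resource names are hypothetical and the command is echoed as a dry run (the flags themselves are standard az cli flags):

```shell
#!/bin/bash
# Dry-run sketch: copy the live DB to a dated backup DB on the same server.
echo az sql db copy --resource-group customer1RG \
  --server customer1-sql --name customer1db \
  --dest-name "customer1db-backup-$(date +%Y%m%d)"
```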

This option was the easiest to implement (at that stage). But having another DB running on the same SQL server is not optimal in terms of resource cost, so we came to another solution:

2. Export to Azure Blob storage

This method implies taking a DB backup as a BACPAC and storing it in Azure blob storage, using our favorite Azure CLI:

az sql db export ...
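A fuller form of the export command might look like this; all names and sample values are hypothetical (in the real job the credentials and storage key would come from Key Vault and the storage account), and the command is echoed as a dry run:

```shell
#!/bin/bash
# Dry-run sketch: export the DB as a BACPAC to blob storage.
dbPassword="example-password"    # sample; would be read from Key Vault
storageKey="example-storage-key" # sample; would come from az storage account keys list

echo az sql db export --resource-group customer1RG \
  --server customer1-sql --name customer1db \
  --admin-user dbadmin --admin-password "$dbPassword" \
  --storage-key "$storageKey" --storage-key-type StorageAccessKey \
  --storage-uri "https://customer1storage.blob.core.windows.net/backups/customer1db.bacpac"
```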

We had been using this method for some time, until one day I noticed that a backup I had started early in the morning was still running at around 5 PM. It turned out there was a known issue, last reviewed Jun 16, 2014… In short:

This problem occurs when many customers make an import or export request at the same time in the same region. The Azure SQL Database Import/Export Service provides a limited number of Compute virtual machines (VMs) per region to process the import and export operations. The Compute VM is hosted per region to make sure that the import or export avoids cross-region bandwidth delays and charges. If too many requests are made at the same time in the same region, significant delays occur in processing the operations. The time that is required to complete requests can vary from a few seconds to many hours. ~ Microsoft

We can’t afford such delays, so another option popped up.

3. Backup to server’s HDD

In the article about the known issue (link above), they mention the DACFx API, suggesting we use it in our code as a workaround. That’s not our case: we can only have scripts (bash/PowerShell). There is, however, a way to back up (take a snapshot of) a database to a BACPAC file: sqlpackage.exe. This command line tool gets installed automatically when you install the MSSQL engine on a Linux server, but I don’t really want it on my Linux machine; it looks a little weird to me. I do have a Windows server (a UI-testing Jenkins slave), so I could use that. I had to recall my PowerShell skills and write some backup/restore scripts; but first, install sqlpackage.exe and the Microsoft SQL 2014 Feature Pack. If we skip all the error handling, variable definitions, comments, etc., the line that does the job looks like this:

.\SqlPackage.exe /Action:Export /SourceDatabaseName:database_name /SourceServerName:"tcp:$db_server,1433" /tf:D:\backups\$customername\databasename.bacpac /su:$dbuser /sp:$dbpassword /p:Storage=File

This method worked much faster than the previous one; however, I wasn’t happy with storing backups on a local HDD. Of course we configured backups of this VHD in Azure, but anyway… So let’s go to the next and final solution.

4. AppService + DB backup to Azure Blob

We can configure backups in the Azure Portal on the App Service blade; the same can be done with the Azure CLI. Using this method we get a backup of the App Service files and settings, and we can optionally include the database in the backup, which in our case is extremely useful. A few words about the script: first we need to define variables with the Azure storage account name and access key, then generate a SAS key for access to blob storage.

#Populate variables with Azure storage details
storageAccName="${customername}storage"
storageAccKey=$(az storage account keys list --account-name $storageAccName --resource-group ${customername}RG --query [0].[value] --out tsv)

#Generate SAS valid for 1 hour
SASkey=$(az storage account generate-sas --services bfqt --resource-types sco --permissions rwdlacup --ip "0.0.0.0-254.254.254.254" --expiry $(date -u -d '59 minutes' +%Y-%m-%dT%H:%MZ) --account-name $storageAccName --account-key $storageAccKey)

Generating a SAS key (Shared Access Signature) was new to me, and I found it a bit tricky the first time, so I'd like to shed some light on it. The first thing to understand is that it is a temporary access key to your storage. Parameter explanation:

--services bfqt : Means we are generating SAS for Blob, File, Queue, Table services;

--resource-types sco : Specifies that the resource types for which the SAS is valid are Service, Container, and Object. This means that the specified permissions are granted for all appropriate operations for the specified services.

--permissions rwdlacup : Permissions are read, write, delete, list, add, create, update, process.

--ip "0.0.0.0-254.254.254.254" : Allow access from all IP addresses. Don't miss this parameter, otherwise you will be getting a "Bad Request" error. There might be a better way to specify this parameter instead of the full range, like querying opendns.com or another resource to find out the external IP of the server we are working on and then allowing only that one address.

--expiry $(date -u -d '59 minutes' +%Y-%m-%dT%H:%MZ) : Set the SAS to expire in 1 hour from now; pay attention to the date format.

The storage account name and key are already defined above, so we are pretty much done here. Once we have the SAS, it's time to tell Azure to start the backup. Assuming we have a blob container named “backups”, our command would look like this:

#Create backup

az webapp config backup create --webapp-name ${appServiceName} --resource-group ${customername}RG --container-url "https://${customername}storage.blob.core.windows.net/backups/?$(echo $SASkey | tr -d '"')&sr=b" --backup-name Backup1 --db-name databasename --db-type SqlAzure --db-connection-string "$(az webapp config connection-string list --resource-group ${customername}RG --name ${appServiceName} | jq -r '.[]."value"."value"')"