The current project that I’m working on has a typical Sitecore scenario, where content is automatically imported from an external system on an hourly basis. In this case, there’s a lot of content and it comes with a LOT of images, and they are stored in the Sitecore database, to the tune of 40GB. Of course, this is Sitecore, so with both master AND web, it is 80GB of blobs stored in SQL. Coupled with this is the tendency of the external system to have small updates, which are duly imported again and the images imported again, whether they need to be or not. Combined with having to publish these items and media items again, it creates a lot of chatter on the database. To alleviate some of the strain, we decided to move all of the master and web blobs to Azure blob storage. The aims of moving the media away from SQL blob storage are:

Reduce pressure on the SQL database: In non-prod environments where DTUs are limited because of cost, reducing the load on SQL will give better performance to end users.

Reduce storage costs: The cost of storing the data in Azure blobs is cheaper than storing them in SQL.

Same or better media load performance: The move to Azure blobs needs to provide at least equivalent to database performance.

Fallback: Because there are existing media stored in the database, the solution should still provide access to them until they can be migrated to Azure.

No changes to functionality for end users.

Looking at what was available online, I found a post from Adam Weber in 2013 which suggested that it should be possible to override the main SqlDataProvider in the config with a custom one. While this works for the customer-facing website on XP 9.0, Sitecore media uploads and publishing etc in the CMS completely ignore it. The next best option was to add an extra provider to the DataProvider stack for the web and master databases. Here’s the solution I implemented…

The nuget packages needed for this project are:

<package id= "Sitecore.Kernel.NoReferences" version= "9.0.180604" targetFramework= "net462" developmentDependency= "true" /> <package id= "WindowsAzure.Storage" version= "8.7.0" targetFramework= "net462" />

Sitecore.Kernel.NoReferences is available from the Sitecore nuget feed. The WindowsAzure.Storage version is older to avoid having to upgrade Newtonsoft.Json that is used elsewhere in the solution (or have multiple versions in the solution) so if you don’t have that dependency you should use a newer version.

The custom DataProvider below overrides methods concerned with CRUDing blobs. You may find that you need to explicitly override some extra DataProvider methods even though they return the same as the base DataProvider class. In my case, not explicitly overriding them meant that the main language variant (english) disappeared from publishing.

using System ; using System.Collections.Generic ; using System.Configuration ; using System.IO ; using Sitecore ; using Sitecore.Collections ; using Sitecore.Data ; using Sitecore.Data.DataProviders ; using Sitecore.Data.Items ; using Sitecore.Data.Templates ; using Sitecore.Globalization ; using Sitecore.Publishing.Pipelines.Publish ; using Sitecore.Workflows ; namespace Feature.AzureBlobStorage.DataProviders { [ UsedImplicitly ] public class DataProviderWithAzureStorage : DataProvider { private SqlServerDataProviderWithAzureStorage GetDataProvider ( CallContext context ) { return new SqlServerDataProviderWithAzureStorage ( ConfigurationManager . ConnectionStrings [ context . DataManager . Database . ConnectionStringName ]. ConnectionString ); } public override bool BlobStreamExists ( Guid blobId , CallContext context ) { var dataProvider = GetDataProvider ( context ); return dataProvider . BlobStreamExists ( blobId , context ); } public override Stream GetBlobStream ( Guid blobId , CallContext context ) { var dataProvider = GetDataProvider ( context ); return dataProvider . GetBlobStream ( blobId , context ); } public override bool RemoveBlobStream ( Guid blobId , CallContext context ) { var dataProvider = GetDataProvider ( context ); return dataProvider . RemoveBlobStream ( blobId , context ); } public override bool SetBlobStream ( Stream stream , Guid blobId , CallContext context ) { var dataProvider = GetDataProvider ( context ); return dataProvider . SetBlobStream ( stream , blobId , context ); } public override bool CleanupDatabase ( CallContext context ) { var dataProvider = GetDataProvider ( context ); return dataProvider . CleanupDatabase ( context ); } // other explicit method overrides. } }

Next, we come to the SqlDataProvider that will override the default Sitecore blob behaviour. In our case, we want to allow a fallback, so the provider works in three modes.

AzureOnly - Only read from Azure blob storage. Only write to Azure blob storage.

DatabaseOnly - Only read from DB blob storage. Only write to DB blob storage.

AzureWithDatabaseFallback - Try reading from Azure storage, then DB storage. Only write to Azure blob storage.

Because the SqlDataProvider only knows the connectionstring, and not the name of it, we need to do a bit of reverse logic to find out which database/blob container we should be talking to. Once we know whether we’re talking to master or web, the provider will connect to the correct Azure blob container.

using System ; using System.Collections.Generic ; using System.Configuration ; using System.Data ; using System.Data.SqlClient ; using System.IO ; using System.Linq ; using Microsoft.WindowsAzure.Storage ; using Sitecore.Collections ; using Sitecore.Configuration ; using Sitecore.Data ; using Sitecore.Data.DataProviders ; using Sitecore.Data.Managers ; using Sitecore.Data.SqlServer ; using Sitecore.Diagnostics ; namespace Feature.AzureBlobStorage.DataProviders { public class SqlServerDataProviderWithAzureStorage : SqlServerDataProvider { private readonly LockSet _blobSetLocks ; private readonly ICloudStorageProvider _azureBlobStorageProvider ; private readonly bool _useAzureStorage ; private readonly bool _useDatabaseStorage ; public SqlServerDataProviderWithAzureStorage ( string connectionString ) : base ( connectionString ) { //work backwards to figure out which container to use. var csName = string . Empty ; //default to web if the connectionstring is empty (happens on CD). if ( string . IsNullOrEmpty ( connectionString ) || connectionString . Equals ( ConfigurationManager . ConnectionStrings [ "web" ]. ConnectionString )) { csName = "web" ; } else if ( connectionString . Equals ( ConfigurationManager . ConnectionStrings [ "master" ]. ConnectionString )) { csName = "master" ; } //Don't use azure blob storage for databases other than master and web. if (! string . IsNullOrEmpty ( csName )) { _useAzureStorage = AzureBlobStorageSettings . StorageMode == AzureBlobStorageSettings . BlobStorageMode . AzureOnly || AzureBlobStorageSettings . StorageMode == AzureBlobStorageSettings . BlobStorageMode . AzureWithDatabaseFallback ; _useDatabaseStorage = AzureBlobStorageSettings . StorageMode == AzureBlobStorageSettings . BlobStorageMode . AzureWithDatabaseFallback || AzureBlobStorageSettings . StorageMode == AzureBlobStorageSettings . BlobStorageMode . DatabaseOnly ; } else { _useAzureStorage = false ; _useDatabaseStorage = true ; } _blobSetLocks = new LockSet (); _azureBlobStorageProvider = new AzureBlobStorageProvider ( AzureBlobStorageSettings . ConnectionString , AzureBlobStorageSettings . GetContainerName ( csName )); } public override Stream GetBlobStream ( Guid blobId , CallContext context ) { Assert . ArgumentNotNull ( context , "context" ); if ( _useAzureStorage ) { var memStream = new MemoryStream (); _azureBlobStorageProvider . Get ( memStream , blobId . ToString ()); if ( memStream . Length > 0 ) { return memStream ; } } //try getting the media from SQL if ( _useDatabaseStorage ) { return base . GetBlobStream ( blobId , context ); } return null ; } public override bool BlobStreamExists ( Guid blobId , CallContext context ) { if ( _useAzureStorage ) { var exists = _azureBlobStorageProvider . Exists ( blobId . ToString ()); if ( exists ) { return true ; } } //try getting the media from SQL if ( _useDatabaseStorage ) { return base . BlobStreamExists ( blobId , context ); } return false ; } public override bool RemoveBlobStream ( Guid blobId , CallContext context ) { if ( _useAzureStorage && _useDatabaseStorage && ! base . BlobStreamExists ( blobId , context ) && ! BlobStreamExists ( blobId , context )) { return false ; } bool deleted = false ; //delete azure storage blob. if ( _useAzureStorage && BlobStreamExists ( blobId , context )) { deleted = true ; _azureBlobStorageProvider . Delete ( blobId . ToString ()); } //delete database record. if ( _useDatabaseStorage && base . BlobStreamExists ( blobId , context )) { base . DeleteItem ( new ItemDefinition ( ID . Parse ( blobId ), "" , new ID (), new ID ()), context ); deleted = true ; } return deleted ; } public override bool SetBlobStream ( Stream stream , Guid blobId , CallContext context ) { //if azure storage is available, try storing there first. if ( _useAzureStorage ) { lock ( _blobSetLocks . GetLock ( blobId )) { try { _azureBlobStorageProvider . Put ( stream , blobId . ToString ()); //insert an empty reference to the BlobId into the SQL Blobs table, this is basically to assist with the cleanup process. //during cleanup, it's faster to query the database for the blobs that should be removed as opposed to retrieving and parsing a list from Azure. const string cmdText = "INSERT INTO [Blobs]([Id], [BlobId], [Index], [Created], [Data]) VALUES(NewId(), @blobId, @index, @created, @data)" ; using ( var connection = new SqlConnection ( Api . ConnectionString )) { connection . Open (); var command = new SqlCommand ( cmdText , connection ) { CommandTimeout = ( int ) CommandTimeout . TotalSeconds }; command . Parameters . AddWithValue ( "@blobId" , blobId ); command . Parameters . AddWithValue ( "@index" , 0 ); command . Parameters . AddWithValue ( "@created" , DateTime . UtcNow ); command . Parameters . Add ( "@data" , SqlDbType . Image , 0 ). Value = new byte [ 0 ]; command . ExecuteNonQuery (); } } catch ( StorageException ex ) { Log . Error ( $"Upload of blob with Id { blobId } failed." , ex , this ); throw ; } } return true ; } return base . SetBlobStream ( stream , blobId , context ); } protected override void CleanupBlobs ( CallContext context ) { Factory . GetRetryer (). ExecuteNoResult (() => DoCleanup ( context )); } public override bool CleanupDatabase ( CallContext context ) { var result = base . CleanupDatabase ( context ); CleanupBlobs ( context ); return result ; } //the majority of the code for the CleanupBlobs process is from the default SQL Server Data Provider protected void DoCleanup ( CallContext context ) { IEnumerable < Guid > blobsToDelete ; using ( var transaction = Api . CreateTransaction ()) { const string blobsInUseTempTableName = "#BlobsInUse" ; Api . Execute ( "CREATE TABLE {0}" + blobsInUseTempTableName + "{1} ({0}ID{1} {0}uniqueidentifier{1})" , new object [ 0 ]); var blobsInUse = GetBlobsInUse ( context . DataManager . Database ); foreach ( var blobReference in blobsInUse ) { Api . Execute ( "INSERT INTO {0}" + blobsInUseTempTableName + "{1} VALUES ({2}id{3})" , new object [] { "id" , blobReference }); } blobsToDelete = GetUnusedBlobs ( "#BlobsInUse" ); const string sql = " DELETE\r

FROM {0}Blobs{1}\r

WHERE {0}BlobId{1} NOT IN (SELECT {0}ID{1} FROM {0}" + blobsInUseTempTableName + "{1})" ; Api . Execute ( sql , new object [ 0 ]); Api . Execute ( "DROP TABLE {0}" + blobsInUseTempTableName + "{1}" , new object [ 0 ]); transaction . Complete (); } foreach ( var blobId in blobsToDelete ) { if ( _useAzureStorage ) { _azureBlobStorageProvider . Delete ( blobId . ToString ()); } base . RemoveBlobStream ( blobId , context ); } } //note: items in the recycle bin are technically still referenced/in use, so be sure to empty the recycle bin before attempting clean up blobs protected IEnumerable < Guid > GetBlobsInUse ( Database database ) { var tables = new [] { "SharedFields" , "UnversionedFields" , "VersionedFields" , "ArchivedFields" }; var blobsInUse = new List < Guid >(); foreach ( var template in TemplateManager . GetTemplates ( database ). Values ) { foreach ( var field in template . GetFields ()) { if (! field . IsBlob ) continue ; foreach ( var sql in tables . Select ( table => "SELECT DISTINCT {0}Value{1}\r

FROM {0}" + table + "{1}\r

WHERE {0}FieldId{1} = {2}fieldId{3}\r

AND {0}Value{1} IS NOT NULL \r

AND {0}Value{1} != {6}" )) { using ( var reader = Api . CreateReader ( sql , new object [] { "fieldId" , field . ID })) { while ( reader . Read ()) { var id = Api . GetString ( 0 , reader ); if ( id . Length > 38 ) { id = id . Substring ( 0 , 38 ); } ID parsedId ; if ( ID . TryParse ( id , out parsedId )) { blobsInUse . Add ( parsedId . Guid ); } } } } } } return blobsInUse ; } protected virtual IEnumerable < Guid > GetUnusedBlobs ( string blobsInUseTempTableName ) { var unusedBlobs = new List < Guid >(); //this database call is dependent on the #BlobsInUse temporary table var sql = "SELECT {0}BlobId{1}\r

FROM {0}Blobs{1}\r\b WHERE {0}BlobId{1} NOT IN (SELECT {0}ID{1} FROM {0}" + blobsInUseTempTableName + "{1})" ; using ( var reader = Api . CreateReader ( sql , new object [ 0 ])) { while ( reader . Read ()) { var id = Api . GetGuid ( 0 , reader ); unusedBlobs . Add ( id ); } } return unusedBlobs ; } } }

Below is the AzureBlobStorageProvider that provides communication with Azure Blobs (or the storage emulator locally). Nothing fancy, just passing data to and from Azure.

using System.IO ; using System.Linq ; using Microsoft.WindowsAzure.Storage ; using Microsoft.WindowsAzure.Storage.Blob ; namespace Feature.AzureBlobStorage.DataProviders { public class AzureBlobStorageProvider { protected CloudStorageAccount StorageAccount { get ; set ; } protected CloudBlobClient BlobClient { get ; set ; } protected CloudBlobContainer BlobContainer { get ; set ; } public AzureBlobStorageProvider ( string blobStorageConnectionString , string storageContainerName ) { StorageAccount = CloudStorageAccount . Parse ( blobStorageConnectionString ); BlobClient = StorageAccount . CreateCloudBlobClient (); BlobContainer = CreateBlobContainer ( BlobClient , storageContainerName ); } public void Put ( Stream stream , string blobId ) { // Use BlockBlob to store media assets // If blob with such Id already exists, it will be overwritten var blob = BlobContainer . GetBlockBlobReference ( blobId ); // Set blob streaming chunks if file larger than value specified in SingleBlobUploadThresholdInBytes setting. var streamWriteSizeBytes = AzureBlobStorageSettings . StreamWriteSizeInBytes ; if ( streamWriteSizeBytes . HasValue && streamWriteSizeBytes . Value > 0 ) { blob . StreamWriteSizeInBytes = streamWriteSizeBytes . Value ; } var requestProperties = GetRequestProperties (); // Some streams need to be rewound. if ( stream . CanSeek ) { stream . Position = 0 ; } blob . UploadFromStream ( stream , options : requestProperties . Item1 , operationContext : requestProperties . Item2 ); } public void Get ( Stream target , string blobId ) { var blob = GetBlobReference ( blobId ); if ( blob . Exists ()) { var requestProperties = GetRequestProperties (); blob . DownloadToStream ( target : target , options : requestProperties . Item1 , operationContext : requestProperties . Item2 ); } } public bool Delete ( string blobId ) { var blob = GetBlobReference ( blobId ); var requestProperties = GetRequestProperties (); var success = blob . DeleteIfExists ( options : requestProperties . Item1 , operationContext : requestProperties . Item2 ); return success ; } public bool Exists ( string blobId ) { var blob = BlobContainer . GetBlobReference ( blobId ); return blob . Exists (); } internal virtual CloudBlob GetBlobReference ( string blobId ) { return BlobContainer . GetBlobReference ( blobId ); } protected virtual System . Tuple < BlobRequestOptions , OperationContext > GetRequestProperties () { var requestOptions = new BlobRequestOptions () { MaximumExecutionTime = AzureBlobStorageSettings . MaximumExecutionTime , ParallelOperationThreadCount = AzureBlobStorageSettings . ParallelOperationThreadCount , RetryPolicy = AzureBlobStorageSettings . RetryPolicy , ServerTimeout = AzureBlobStorageSettings . RequestServerTimeout , SingleBlobUploadThresholdInBytes = AzureBlobStorageSettings . SingleBlobUploadThresholdInBytes , StoreBlobContentMD5 = AzureBlobStorageSettings . StoreBlobContentMD5 , UseTransactionalMD5 = AzureBlobStorageSettings . UseTransactionalMD5 }; var operationContext = new OperationContext () { LogLevel = AzureBlobStorageSettings . OperationContextLogLevel }; return System . Tuple . Create ( requestOptions , operationContext ); } private CloudBlobContainer CreateBlobContainer ( CloudBlobClient blobClient , string storageContainerName ) { var container = blobClient . GetContainerReference ( storageContainerName ); container . CreateIfNotExists (); return container ; } } }

The last chunk of code is the settings, similar to the way Adam Weber arranged things.

using System ; using Microsoft.WindowsAzure.Storage ; using Microsoft.WindowsAzure.Storage.RetryPolicies ; namespace Feature.AzureBlobStorage { public static class AzureBlobStorageSettings { public enum BlobStorageMode { AzureOnly , DatabaseOnly , AzureWithDatabaseFallback } public static BlobStorageMode StorageMode => ( BlobStorageMode ) Enum . Parse ( typeof ( BlobStorageMode ), Sitecore . Configuration . Settings . GetSetting ( "Feature.AzureBlobStorage.StorageMode" )); public static string ConnectionString => Sitecore . Configuration . Settings . GetSetting ( "Feature.AzureBlobStorage.ConnectionString" ); public static string GetContainerName ( string suffix ) { return Sitecore . Configuration . Settings . GetSetting ( $"Feature.AzureBlobStorage.ContainerName- { suffix . ToLowerInvariant ()} " ); } // Azure default: null // When NULL, it's set to 5 minutes public static TimeSpan ? RequestServerTimeout => GetNullableTimeSpan ( "Feature.AzureBlobStorage.RequestServerTimeout" , TimeSpan . FromMinutes ( 5.0 )); // Max MaximunExecutionTime is TimeSpan.FromDays(24.0). // Default: null public static TimeSpan ? MaximumExecutionTime => GetNullableTimeSpan ( "Feature.AzureBlobStorage.MaximumExecutionTime" , TimeSpan . FromDays ( 24.0 )); // Default: 1 public static int ParallelOperationThreadCount => Sitecore . Configuration . Settings . GetIntSetting ( "Feature.AzureBlobStorage.ParallelOperationThreadCount" , 1 ); // Default: NoRetry public static IRetryPolicy RetryPolicy => GetRetryPolicy ( Sitecore . Configuration . Settings . GetSetting ( "Feature.AzureBlobStorage.RetryPolicy" , "NoRetry" ), TimeSpan . FromSeconds ( 3 ), 3 ); // Default: 32 MB public static long ? SingleBlobUploadThresholdInBytes => Sitecore . Configuration . Settings . GetIntSetting ( "Feature.AzureBlobStorage.SingleBlobUploadThresholdInBytes" , 32 * 1024 * 1024 ); // Default: false public static bool UseTransactionalMD5 => Sitecore . Configuration . Settings . GetBoolSetting ( "Feature.AzureBlobStorage.UseTransactionalMD5" , false ); public static bool StoreBlobContentMD5 => Sitecore . Configuration . Settings . GetBoolSetting ( "Feature.AzureBlobStorage.StoreBlobContentMD5" , false ); // Default: 4 MB // Allowed between 16 KB and 4 MB public static int ? StreamWriteSizeInBytes => GetNullableInt ( "Feature.AzureBlobStorage.StreamWriteSizeInBytes" , 4 * 1024 * 1024 ); // Default: Verbose public static LogLevel OperationContextLogLevel => GetLogLevel ( Sitecore . Configuration . Settings . GetSetting ( "Feature.AzureBlobStorage.OperationContext.LogLevel" , "Verbose" )); internal static TimeSpan ? GetNullableTimeSpan ( string settingName , TimeSpan defaultValue ) { var settingValue = Sitecore . Configuration . Settings . GetTimeSpanSetting ( settingName , defaultValue ); if ( settingValue . Equals ( TimeSpan . Parse ( "00:00:00" ))) { return null ; } return settingValue ; } internal static int ? GetNullableInt ( string settingName , int defaultValue ) { int ? settingValue = Sitecore . Configuration . Settings . GetIntSetting ( "Feature.AzureBlobStorage.StreamWriteSizeInBytes" , defaultValue ); return settingValue > 0 ? settingValue : null ; } internal static IRetryPolicy GetRetryPolicy ( string policyName , TimeSpan deltaBackoff , int maxAttempts ) { IRetryPolicy policy ; switch ( policyName ) { case "Exponential" : policy = new ExponentialRetry ( deltaBackoff , maxAttempts ); break ; case "Linear" : policy = new LinearRetry ( deltaBackoff , maxAttempts ); break ; case "NoRetry" : policy = new NoRetry (); break ; default : policy = new NoRetry (); break ; } return policy ; } internal static LogLevel GetLogLevel ( string logLevelName ) { LogLevel logLevel ; bool success = Enum . TryParse ( logLevelName , true , out logLevel ); return success ? logLevel : LogLevel . Verbose ; } } }

And lastly the config file with the required settings. Note the role requirement for the master database, since master might not (should not) be available from the delivery server.

<? xml version = "1.0" encoding = "utf-8" ?> < configuration xmlns : patch = "http://www.sitecore.net/xmlconfig/" xmlns : role = "http://www.sitecore.net/xmlconfig/role/" > < sitecore > < dataProviders > < azureStorageDataProvider type = "Feature.AzureBlobStorage.DataProviders.DataProviderWithAzureStorage, Feature.AzureBlobStorage" /> </ dataProviders > < databases > < database id = "master" role : require = "Standalone or ContentManagement" > < dataProviders hint = "list:AddDataProvider" > < dataProvider patch : before = "*[1]" ref = "dataProviders/azureStorageDataProvider" /> </ dataProviders > </ database > < database id = "web" > < dataProviders hint = "list:AddDataProvider" > < dataProvider patch : before = "*[1]" ref = "dataProviders/azureStorageDataProvider" /> </ dataProviders > </ database > </ databases > < settings > <!-- MEDIA - AZURE BLOB STORAGE - STORAGEMODE Controls the behaviour of Sitecore ' s reading and writing blobs . AzureOnly - Only read from Azure blob storage . Only write to Azure blob storage . DatabaseOnly - Only read from DB blob storage . Only write to DB blob storage . AzureWithDatabaseFallback - Try reading from Azure storage , then DB storage . Only write to Azure blob storage . --> < setting name = "Feature.AzureBlobStorage.StorageMode" value = "AzureWithDatabaseFallback" /> < setting name = "Feature.AzureBlobStorage.ConnectionString" value = "UseDevelopmentStorage=true" /> < setting name = "Feature.AzureBlobStorage.ContainerName-master" value = "sitecore-media-library-master" /> < setting name = "Feature.AzureBlobStorage.ContainerName-web" value = "sitecore-media-library-web" /> < setting name = "Feature.AzureBlobStorage.RequestServerTimeout" value = "00:20:00" /> < setting name = "Feature.AzureBlobStorage.MaximumExecutionTime" value = "00:10:00" /> < setting name = "Feature.AzureBlobStorage.ParallelOperationThreadCount" value = "4" /> < setting name = "Feature.AzureBlobStorage.RetryPolicy" value = "NoRetry" /> < setting name = "Feature.AzureBlobStorage.SingleBlobUploadThresholdInBytes" value = "33554432" /> < setting name = "Feature.AzureBlobStorage.UseTransactionalMD5" value = "false" /> < setting name = "Feature.AzureBlobStorage.StoreBlobContentMD5" value = "false" /> < setting name = "Feature.AzureBlobStorage.StreamWriteSizeInBytes" value = "0" /> < setting name = "Feature.AzureBlobStorage.OperationContext.LogLevel" value = "Verbose" /> </ settings > </ sitecore > </ configuration >

After putting this into testing and migrating all of the blobs, we found that the authoring interface is more responsive because there is less pressure on the database, particularly when using the media library or the experience editor.

Of course, Sitecore have implemented this natively in XP 9.3 while I was writing this provider for 9.0, so there’s no need to go to the trouble if you’re using that version or greater.