T-SQL Tuesday #196: Career Risks

T-SQL Tuesday Logo

Welcome back to another edition of T-SQL Tuesday! For edition #196, this month’s host James Serra asked that we participants write about a career risk we’ve taken.

I went into Sales.

Sales? Eww, No Way…

When I was a little child, I apparently said that I wanted to be a salesman when I grew up. That’s most likely because my father ran a small B2B business and was essentially a salesman himself. But as I got older, I became a hardcore techie nerd and my career took me to the realm of SQL Server as a database developer and DBA. And I loved it… but things did stagnate for me.

Then, about 9 years ago, things took an interesting twist for me. I got a Twitter DM out of the blue, from my friend Scott Fallen, asking if I was interested in joining his team at SentryOne as a sales engineer. And I’ll be honest – I had ZERO IDEA what a sales engineer was. I never knew that such a role or career path existed!

Sales is Scummy… or Is It?

I’ll be brutally honest. For most of my career, I thought very poorly of sales. I was biased because I’d had too many negative experiences with salespeople who were simply chasing profit… aka the shady used-car-salesperson persona. But I gave SentryOne a shot because I knew their company culture well – one of building partnerships and supporting the SQL Server community. And I’m glad I did, because that culture was absolutely embraced by their sales org.

It was this different sales culture that changed my perspective on Sales as a whole. I learned how sales can be a jointly beneficial engagement. Yes, on the seller side, I am motivated to sell product or services to make money. BUT I am also there to HELP YOU… help you overcome a challenge or obstacle. Of course the nuance here is that I am not giving away free consulting, but rather showing you how your challenges can be solved with the product or services that my company is selling.

The other important key is the relationship. Orgs with a positive sales culture believe in building long-term partnerships and relationships. The proverbial scummy used car salesman is motivated by transactions – they just want to sell as many used cars as they can. But an org with a positive sales culture wants to build and foster relationships for the long haul.

Sales Engineering is a Legitimate Career Path

In the 9 years since, I’ve come to love the Sales Engineer role, aka “SE.” Titles often involve variations of Sales Engineer, Solutions Engineer, and Solutions Architect, but they all fundamentally involve the same thing: becoming a trusted technical advisor who helps customers solve challenges using the products and services that your company happens to sell.

And I’ve come to learn that those of us with prior technical career experience can do really well in this career pivot, because we bring our “in the trenches” experience to the table to relate to customer challenges and offer practical solutions to their headaches.

I’ll say that when I first started as an SE at SentryOne, my first few months, I thought I made the biggest mistake of my life. But then one day, it sort of clicked, and I haven’t looked back since. It was a big risk for me… stepping out of my comfort zone of being a technical data professional in the trenches. But looking back now… the skills I’ve learned… the perspective I now have… have helped me grow exponentially in ways I never could have imagined 10 years ago.

And if you ever want to learn more about being an SE, feel free to reach out. I am an active mentor and love helping people decide if this career path is right for them and how they can continue to grow in it and learn the lessons that I’ve learned.

Thanks for reading!

Taking Snapshots of Databases on VMFS Datastores

Every now and then, I’ve talked about how one can leverage storage-based snapshots on Pure Storage against SQL Server databases to do cool things like orchestrate object-level restores. And we have a number of example scripts published on our GitHub that you can use as building blocks to orchestrate time-saving workflows, like the old-fashioned “restore Prod backups a bazillion times to non-Prod.”

Almost all of those examples had one thing in common: they were written with vVols in mind. Why? Because storage array snapshots are volume based and vVols are awesome for a number of reasons. But then Broadcom decided to deprecate them in a future VCF release (note the article no longer explicitly states VCF 9.1 like it did originally), but that’s a different topic for a different time.

TL;DR – Where Can I Get Code Examples?

Back to VMFS & VMDK files

Many folks who virtualize their SQL Servers on VMware choose to rely on VMFS datastores with VMDK files underneath. And that poses an interesting challenge when it comes to taking storage array snapshots.

Remember how I said that our snapshots are volume based? That means that a storage volume corresponds to a VMware VMFS datastore. And what’s inside a VMFS datastore? A filesystem containing a bunch of VMDK files, each of those representing a virtual disk that is presented to a virtual machine. Thus, if one takes a snapshot of a datastore, one could be snapping a dozen SQL Server VMs and ALL of their underlying disks! Fortunately FlashArray is space efficient because it globally deduplicates data, but this nuance still presents a headache when one just wants to clone some SQL Server databases from one machine to another.

With vVols, RDMs, and bare metal, we can isolate our snapshot workflows to just snap volume(s) containing our databases. If you’ve segregated out your user databases into individual volumes, like D:\ and L:\ for data files and log files respectively, that means we can skip your C:\ with OS, T:\ with TempDB, S:\ with system databases, M:\ with miscellaneous, etc.

To accomplish the same with a VMFS datastore, we need to work with the underlying VMDK files to achieve virtual machine disk level granularity.

Explaining the Workflow

First, if you want to follow along with my example code, hop over to our GitHub and pull up the PowerShell script. Each discrete step is commented, so I’ll just give a higher-level overview here.

In this example scenario, there’ll be a Production SQL Server (source) and a non-Production SQL Server (target). The source will reside on one VMFS datastore and the target will reside on a different datastore. In this example, it will also reside on a different array, to showcase asynchronous snapshot replication.

Like our normal workflow, we will take a crash-consistent snapshot of the source datastore, then we will clone it into a new volume on the array. That new volume will then be attached/presented to the ESXi host that our target SQL Server VM resides on. Next (after prepping the SQL & Windows layers), we will take the VMDK file(s) that contain our database(s) that we want to refresh, and detach and discard them from the target VMFS datastore. Don’t panic! Remember, we don’t need them anymore because we’re refreshing them with newer cloned ones from our source! After discarding the old VMDKs, we then connect up the VMDK files that we want, that are residing in the cloned datastore. Once that’s done, we can bring our disks and databases back online in Windows and SQL Server, and work can resume on the target SQL Server instance.

But we’re not quite done. At this point, the target SQL Server VM now has two datastores attached to it: the original datastore it resided on plus the newly cloned datastore with the newly cloned VMDKs from our source. We don’t want to leave those VMDK files there. This is where we leverage VMware Storage vMotion! We’ll conduct an online storage vMotion operation that’ll swing the VMDK files over from the cloned datastore to the primary datastore. Even better, VMware should leverage XCOPY under the covers, meaning the FlashArray won’t actually copy the VMDK files, but just update pointers, which’ll save a tremendous amount of time! Once the Storage vMotion is completed, we will go through the steps to disconnect then discard the cloned datastore because we no longer need it.
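The detach/attach/vMotion sequence above can be sketched with VMware PowerCLI. This is a minimal sketch, not the full script from our GitHub: the VM, datastore, and VMDK names are hypothetical placeholders, and the FlashArray clone, HBA rescans, and Windows/SQL prep steps would wrap around it.

```powershell
# Assumes an existing PowerCLI session (Connect-VIServer) and that the cloned
# datastore has already been attached and resignatured on the target ESXi host.
$vm = Get-VM -Name 'SQL-NonProd'    # hypothetical target VM

# 1. Detach the old VMDK holding the databases (keep the file on disk for now)
$oldDisk = Get-HardDisk -VM $vm | Where-Object { $_.Filename -like '*UserDB.vmdk' }
Remove-HardDisk -HardDisk $oldDisk -DeletePermanently:$false -Confirm:$false

# 2. Attach the freshly cloned VMDK that lives in the cloned datastore
New-HardDisk -VM $vm -DiskPath '[Cloned-DS] SQL-Prod/UserDB.vmdk'

# 3. Storage vMotion the VM's disks back onto its primary datastore;
#    with VAAI XCOPY the FlashArray just updates pointers instead of copying data
Move-VM -VM $vm -Datastore (Get-Datastore -Name 'Primary-DS')
```

After the Storage vMotion completes, the cloned datastore holds nothing the VM needs, so it can be unmounted and the cloned volume destroyed on the array.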

Voila, now we’re done!

From an operational standpoint, this will take longer than “just a few seconds,” mostly because of the new VMware steps involved, like re-scanning storage HBAs (twice). But all in all, it should only take about a minute or so, end to end. That’s still a tremendous time savings vs a full copying of a backup file to your non-prod environment, then executing a traditional restore.

What about T-SQL Snapshot Backup?

Great question! T-SQL Snapshot Backup is one of my favorite features in SQL Server, so I also put together an example that leverages that capability with a VMFS datastore! You can find it in the Point in Time Recovery – VMFS folder! It’s fundamentally the same thing as before, but with the extra VMFS/VMDK steps like what I outlined above.

Thanks for reading – hope you enjoy your time saved by using FlashArray snapshots!

Hyper-V: Cluster Shared Volumes, SQL Server, & FlashArray Snapshots

Many (arguably most) are currently evaluating hypervisor alternatives to VMware, and one obvious contender is Microsoft Hyper-V. And if you’re a Pure Storage customer today, you may be wondering whether you can keep utilizing storage-array snapshots to do things like replicate SQL Server data from one VM to another. The good news is yes, absolutely! Let’s take a look at a common scenario as an example.

TL;DR – Show Me the Code!

A PowerShell example to orchestrate the above is now available on Pure’s GitHub in the SQL Server Scripts repository. In addition to having the proper setup detailed below, read the README file’s Important Usage Notes section, which outlines other prerequisites.

If you just want to take Hyper-V CSV snapshots, without SQL Server involved, check out this GitHub repository, where I’ve shared a PowerShell solution. This is particularly useful if you want to clone an entire VM and present a second iteration of it back to a Hyper-V cluster.

And to see this in action, watch this narrated demo recording!

Updating Non-Prod Databases with a New Prod Copy

One extremely common task and time sink for DBAs is taking a backup of a production database and restoring it to a non-production environment. Because it is a size-of-data operation, it can be quite time consuming. To help with that, Pure offers solutions I’ve showcased that leverage FlashArray snapshots to orchestrate such workflows nearly instantaneously. But because (anecdotally) ~90% of all SQL Servers are on VMware, our examples were written for VMware. So now I want to show you how to do it in Hyper-V.

Hyper-V: Key Tidbits Overview

In Hyper-V, virtual disks take the form of VHDX files that reside on a Cluster Shared Volume (CSV) (or, generally pre-Windows Server 2012, VHDs that reside on a Clustered Disk). While CSVs were around in Windows Server 2008 R2, they only really became useful and relevant in their second iteration, released with Windows Server 2012, and this linked blog has a fantastic deep dive into the CSV stack.

If you’re familiar with VMware, then I would say that a CSV is somewhat analogous to a VMFS datastore. A VMFS datastore contains one or more VMDK files, with each file representing a virtual disk for a virtual machine. With Hyper-V, each VHDX file is like a VMDK file.

So when you take a snapshot on FlashArray, you’re taking a snapshot of the entire volume – the CSV in Hyper-V or the VMFS datastore in VMware (unless you’re using vVols – a separate conversation for another time).

The Setup & Workflow

For this, imagine that you have two SQL Servers: SQL-Prod and SQL-NonProd. These are Hyper-V VMs and may reside on the same CSV or different CSVs; it does not really matter. The idea is that we do not want to snapshot the ENTIRE VM, but instead isolate one or more SQL Server user databases to clone from Prod to non-Prod.

So what you’ll do is create two more CSVs, one for each SQL Server VM. We’ll call them CSV-UserDB-Prod and CSV-UserDB-NonProd for the sake of discussion. Inside each CSV will be the VHDX file(s) that correspond to the virtual disks that contain your user database data files and log files. If you have, say, a D:\ drive for data and an L:\ drive for logs, then you should expect to have two VHDX files in each CSV.

The workflow is that we will then take a snapshot of just CSV-UserDB-Prod, clone that snapshot, and overlay CSV-UserDB-NonProd, thus updating the underlying data and files. Production will not be interrupted at all by the snapshot, but because these are volume-level operations, non-Prod will be disrupted.
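The array-side portion of that workflow can be sketched with the Pure Storage PowerShell SDK2. This is a rough sketch only: the endpoint and volume names are hypothetical, cmdlet and parameter names should be verified against the SDK2 documentation for your version, and the full script on GitHub also handles the CSV offline/online and disk re-signature steps on the Windows/cluster side.

```powershell
# Connect to the FlashArray (SDK2); endpoint and credentials are placeholders
$fa = Connect-Pfa2Array -Endpoint 'flasharray1.example.com' `
        -Credential (Get-Credential) -IgnoreCertificateError

# Take a fresh crash-consistent snapshot of the Prod user-DB volume
$snap = New-Pfa2VolumeSnapshot -Array $fa -SourceName 'CSV-UserDB-Prod'

# Overwrite the non-Prod volume with that snapshot
# (the non-Prod CSV must be taken offline in the cluster first!)
New-Pfa2Volume -Array $fa -Name 'CSV-UserDB-NonProd' `
        -SourceName $snap.Name -Overwrite $true
```

Once the overwrite completes, the CSV is brought back online in Failover Cluster Manager and the databases are re-attached on SQL-NonProd.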

Thanks for reading – happy snapshotting!

Hyper-V Cluster Shared Volumes + Write Caching

This blog post will probably never apply to anyone except me, but I’m writing it anyway.

Let’s say you have some VMware VMs… and decide to run Hyper-V hosts/cluster within those VMware VMs. And let’s say you start running into some headaches, taking crash-consistent snapshots and those snapshots not coming back in a consistent state as you would expect. That’s been my headache recently.

Write Caching?

So there are several different layers of caching that can come into play, as documented here: “Hyper-V storage: Caching layers and implications for data consistency.” So I searched high and low to try to figure out if I had unexpected write caching set up on my Cluster Shared Volumes. Looking at Disk Management -> Properties (of a CSV), one would normally see this:

Write-caching policy -> Enable write caching on the device

However… that is what one would see IF their respective Hyper-V host was on a bare-metal server. That’s NOT what one sees though, if the Hyper-V host resides on a VMware VM backed by a vVol (Virtual Volume). THIS is what you get…

Note that there are TWO policies now: the usual Write-caching policy plus a Removal Policy.

Removal Policy?! Where did that come from? Doing some digging, I found that it’s normally only present for removable storage like USB drives. And look at the default setting’s description… “enables write caching in Windows…”

So I disabled that on each of my Cluster Shared Volumes, bounced both Hyper-V hosts (not 100% sure if it was required but did so just to be sure), and HURRAY – my crash consistent snapshot code worked as expected!!!

One reason why this was so frustrating is that I was not doing most of this investigative work via the GUI, but rather via PowerShell commands. So I was digging around things like Cluster Parameters and turning CsvEnableBlockCache and CsvEnforceWriteThrough on and off (https://learn.microsoft.com/en-us/windows-server/failover-clustering/failover-cluster-csvs & https://learn.microsoft.com/en-us/previous-versions/windows/desktop/mscs/physical-disk-csvenforcewritethrough). But nuking this Removal Policy wound up doing the trick for me.

TL;DR

If you’re trying to disable any and all write caching, for whatever reason, be sure to check if Windows thinks the disk is removable media and has a Removal Policy applied (that also enables write caching).

Thanks for reading.

Combat Database Bloat with Data Virtualization

One of SQL Server 2022’s new features is something called Data Virtualization. It enables T-SQL to directly query files that reside in Azure object storage or S3-compatible object storage. Since SQL Server 2022’s release, it’s been one of those underrated capabilities that many have glossed over. But I strongly believe that it is insanely useful and you should take a few minutes to learn more!

Data Bloat Anyone?

Ask yourself, do you have a database that has a large volume of data that will never change again? Sales order history data is a generic but very relatable example. Once an order is placed, it might be amended, but after a few months it’s highly probable that it’ll never change again. We’ll pretend for the sake of our discussion that after 3 months, that data is considered set in stone.

Can We PLEASE Archive/Delete That Old Data?

Who has begged their business colleagues to let them de-bloat a database by archiving off and deleting old, ancient data from a primary production database? I see your raised hands and raise mine too. There are many practical reasons to de-bloat a database, from performance to maintenance. But then there are also the business folks who say “no we can’t because we MIGHT need to report on it.” Might… So like my stubborn beer belly fat, that data bloat remains.

Meet Parquet (… not butter)

We should all know what a Comma Separated Values (CSV) flat file is. Because of its flat, singular nature, it’s not very efficient to query: the entirety of the file must be scanned, consumed, and processed. But what if we had a different file format whose entirety did not need to be fully processed when querying? Wouldn’t it be awesome to have a flat file format that could offer column elimination, row elimination, and compression for our data and queries? Let me introduce you to Parquet.

Pretend your sales order history table has 30 different columns of data. And let’s say you were to export all of your sales order history into individual CSV’s per calendar year. If you wanted to query say, all sales in May, June, & July of 2021 and get a SUM of total sales, you’d have to consume the entirety of the 2021 CSV file. But with a Parquet file, metadata is utilized to enable you to zero in on rows for May, June, and July only. And instead of all 30 columns, you can just retrieve sales date and sales amount ONLY, and not consume the other 28 columns! That amounts to a tremendous savings from a workload perspective!

Data Virtualization

So how does SQL Server 2022 come into play here? First, we create something called an External Data Source which in this case will point to S3 object storage somewhere. Then we will combine it with creating an External File Format for our Parquet file. The third piece of the solution is Create External Table as SELECT (aka CETAS).

The end result is that each Parquet file becomes available to us within T-SQL as an External Table entity. So I can write T-SQL to query a Parquet file, and my code will look identical to querying a regular table.
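Before the CETAS step will work, the External Data Source and External File Format referenced below need to exist. Here’s a hedged sketch of that one-time setup – the endpoint, bucket, and credential values are placeholders for your environment, and PolyBase must be installed and enabled on the instance:

```sql
-- One-time setup; requires the PolyBase feature and 'polybase enabled' = 1.
-- The S3 endpoint, bucket, and keys below are placeholders.
CREATE DATABASE SCOPED CREDENTIAL s3_cred
WITH IDENTITY = 'S3 Access Key',
     SECRET   = '<AccessKeyID>:<SecretKey>';

CREATE EXTERNAL DATA SOURCE cetas_demo
WITH (
    LOCATION   = 's3://flashblade.example.com/sales-archive',
    CREDENTIAL = s3_cred
);

CREATE EXTERNAL FILE FORMAT parquet_file_format_object
WITH (FORMAT_TYPE = PARQUET);
GO
```

With those two objects in place, CETAS can both write the Parquet file to object storage and create the External Table over it in one statement.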

CREATE EXTERNAL TABLE parquet.SalesOrderHistory_2021
WITH (
	LOCATION = '/SalesOrderHistory_2021.parquet', 
	DATA_SOURCE = cetas_demo, 
	FILE_FORMAT = parquet_file_format_object
)
AS 
SELECT 
	[SalesID], 
	[SalesDate],
	[ProductID],
	[Email],
	[PhoneNumber],
	[OtherColumns]
FROM dbo.SalesOrderHistory  -- the original table being archived
WHERE SalesDate >= '2021-01-01' AND SalesDate < '2022-01-01';
GO

SELECT SalesID, SalesDate
FROM parquet.SalesOrderHistory_2021
WHERE SalesDate >= '2021-05-01' AND SalesDate < '2021-08-01'

Now you may have noticed that we have a singular table for 2021 data, or 1 table per Parquet file. You might be thinking, will that be a problem if I have many years worth of data, thus many Parquet files? This is where another old T-SQL trick comes into play – partitioned views. I can have multiple Parquet files virtualized via multiple External Tables, but overlay them all with a single partitioned view, to then enable me to query everything as I once did.

CREATE OR ALTER VIEW dbo.SalesOrderHistory
AS
SELECT 
	[SalesID], [SalesDate], [ProductID], [Email], [PhoneNumber], [OtherColumns]
FROM parquet.SalesOrderHistory_2020
UNION ALL
SELECT 
	[SalesID], [SalesDate], [ProductID], [Email], [PhoneNumber], [OtherColumns]
FROM parquet.SalesOrderHistory_2021
UNION ALL
SELECT 
	[SalesID], [SalesDate], [ProductID], [Email], [PhoneNumber], [OtherColumns]
FROM parquet.SalesOrderHistory_2022
UNION ALL
SELECT 
	[SalesID], [SalesDate], [ProductID], [Email], [PhoneNumber], [OtherColumns]
FROM parquet.SalesOrderHistory_2023
UNION ALL
SELECT 
	[SalesID], [SalesDate], [ProductID], [Email], [PhoneNumber], [OtherColumns]
FROM parquet.SalesOrderHistory_2024
UNION ALL
SELECT 
	[SalesID], [SalesDate], [ProductID], [Email], [PhoneNumber], [OtherColumns]
FROM dbo.SalesOrderHistory_Current
GO

SELECT SalesID, SalesDate
FROM dbo.SalesOrderHistory
WHERE SalesDate >= '2021-05-01' AND SalesDate < '2021-08-01'

Be sure to look closely at the final SELECT in the partitioned view. That’s a reference to a traditional SQL Server table! What this means is that instead of having a dbo.SalesOrderHistory table that contains everything including new orders, I’ve physically partitioned the data out into 5 Parquet files, virtualized via External Tables, AND still included all current sales order data as well. So existing T-SQL queries against dbo.SalesOrderHistory (written before the Parquet split) have no idea and do NOT need to be changed!

You Mentioned Azure & S3… That Means I Need to Cloud, Right?

Whenever people (myself included, for the longest time) hear “S3” and/or “object storage,” they typically mentally default to AWS S3 Object Storage. But let’s break that apart. In brief, “object storage” is essentially a storage architecture geared towards unstructured data like images, videos, and documents. “S3” is Amazon’s “Simple Storage Service,” which I’d loosely describe as Amazon’s implementation of object storage. The good news here is that you’re not stuck with Amazon: S3-compatible object storage is also a thing. This is where other vendors have their own object storage implementations but leverage the Amazon S3 API.

But my data is on-prem and staying that way. This is where another common misconception comes in. Because “Amazon S3 object storage” has become such a common term, many do not realize that S3-compatible on-prem storage is a thing! And yes, this is where Pure Storage comes into the picture with our FlashBlade series. You can have all of this awesomeness on-prem without compromising by moving to the cloud.

I Want To See This in Action!

Hopefully you’re chomping at the bit to give this a go. I’d like to leave you with three resources to help you get jump-started:

  1. Watch… A New Solution for Data Archiving using Data Virtualization
    This is my Pure Storage vendor session from PASS Summit 2023, that covers everything end to end with demos!
  2. Learn… Microsoft’s Introduction to SQL Server 2022 Data Virtualization learning path
    Go learn straight from Microsoft!
  3. Try… Pure Storage’s Modern Storage Platforms for SQL Server workshop
    Regardless of whether you’re a Pure Storage customer or not, you can review Module 4 of this workshop.
    And if you are a Pure Storage customer WITH a FlashBlade, use this workshop to try it out for yourself!
    And if you are a Pure Storage customer but do not have a FlashBlade, you can contact your Account Team and request a Test Drive voucher for this workshop. This will give you access to a virtual lab with hardware, to work through the lab workshop.

Thanks for reading!

Rapidly Recovering from SQL Server Data Mishaps with FlashArray Snapshots

Beginning of a Bad Day

Almost all of us have experienced a time when someone incorrectly modifies or deletes a subset of data in a database. And usually the only recourse is to run a database restore to get the data that we need back.

But… what if we cannot simply run a RESTORE operation over the entire database? Maybe the accident occurred a few hours prior and you cannot roll everything back. You need to do a side-by-side RESTORE.

But… what if the database in question is absolutely massive and will take hours to restore? And what if your server doesn’t have enough available storage to fit a second copy of that database at all?

FlashArray Snapshots to the Rescue!

If you’re fortunate to have your SQL Servers backed by Pure Storage, you can solve this quickly with minimal headache. The key is FlashArray snapshots.

This entire process does NOT require application consistent snapshots either. You can utilize crash consistent snapshots (that do NOT require VSS and do NOT stun your SQL Server when taken), and hopefully you’re already taking these today on your FlashArray. Because of how our snapshots work behind the scenes, you will not consume additional capacity on your storage array either. Best of all, it does not matter if your database was 50GB or 50TB – all of these actions will be nearly instantaneous.

I Need to Execute an Object-Level Restore RIGHT NOW!

Here is a step-by-step video walkthrough. Watch and pause it along the way, to emulate the steps I am doing.

Here are the functional steps (starting at 5m 23s):

  1. Add new volume(s) to your SQL Server, of the exact same size as the existing volume(s) that contain your database’s data and log files.
  2. In the FlashArray Protection Groups UI, select a snapshot and find the volume(s) you wish to clone – aka “Copy Snapshot.” Overwrite the volume(s) created in step 1 with the member volume(s) of the Protection Group snapshot.
  3. In the Windows OS – Disk Management – find the newly added volume(s)/disk(s) and Set Online.
  4. Attach the newly cloned database files in SQL Server (with a different name of course)
  5. Use T-SQL to insert/update/copy from your restored database back to your main database.
  6. Clean-up: drop the restored database, set the volume(s)/disk(s) offline in Windows, delete the disks from your VM
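Steps 4 and 5 in T-SQL look something like this. The database, table, file paths, and key column are hypothetical placeholders for this sketch; adjust them to your own environment:

```sql
-- Step 4: attach the cloned files under a NEW database name
-- (file paths below are placeholders for the newly onlined cloned volumes)
CREATE DATABASE [MyDB_Restored]
ON (FILENAME = 'E:\SQLData\MyDB.mdf'),       -- cloned data file
   (FILENAME = 'F:\SQLLog\MyDB_log.ldf')     -- cloned log file
FOR ATTACH;
GO

-- Step 5: copy the mangled/deleted rows back into the live database
INSERT INTO MyDB.dbo.Orders
SELECT r.*
FROM MyDB_Restored.dbo.Orders AS r
WHERE NOT EXISTS (SELECT 1 FROM MyDB.dbo.Orders AS o
                  WHERE o.OrderID = r.OrderID);
```

Once the data is recovered, drop MyDB_Restored and clean up the volumes per step 6.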

Assumptions & Pre-Requisites:

  • Your SQL Server is a VMware VM – this will work with bare metal as well, but with a slight adjustment to the steps
  • Your VMware VM is using vVols – this will work with RDMs as well, but with a slight adjustment to the steps. VMFS will also work, but you’ll need a slightly different set of steps which I’ll cover in a future blog
  • The volumes containing your SQL Server data and log files are being snapshotted by FlashArray on a pre-scheduled basis, all as part of a Protection Group

Conclusion

If you’re NOT in the midst of an emergency right now, I’d encourage you to practice this on a non-Prod server. Once you grasp the steps and workflow, you’ll see how easy it is and can now add this tool/technique to your arsenal for the next time someone accidentally mangles some data in your database.

Thanks for reading!

Career Thoughts: Never Would I Ever…

Today, Nov 1st, marks my 3 year anniversary at Pure Storage. And this milestone has put me into an introspective mood.

What Do I Want to Be When I Grow Up?

When I studied Computer Science at Marquette, I figured I’d wind up becoming a software developer of some sort. Or maybe a systems administrator. My first job out of college was as a generalist web developer & sys admin, so I got exposed to a ton of different things.

Then I specialized in SQL Server… first as a Jr. Ops DBA, but then I shifted solely into being a T-SQL developer. I thrived there and figured I’d remain in that realm for the rest of my career.

Then after a decade-plus, I wound up finding myself back in an Ops DBA role. Never really thought I’d pivot back that way, since at the time, I barely even understood SQL Server clustering options, much less ever set up or managed an FCI or AG outside of a training class. Even so, after a few roles mixing Dev DBA and Ops DBA, I figured I’d remain solely in the SQL Server tech realm the rest of my career.

Then I got a call (or DM really)… and landed my first role as a Sales Engineer/Solutions Engineer for SentryOne. I never imagined I’d ever find myself in some kind of “sales” type role at all. And boy was it foreign to me. But others saw potential in me which is why I got recruited, and they were right, and I thrived there. And I figured I may remain in that realm the rest of my career.

Then 3 years ago, I joined Pure Storage. What was foreign to me re: Pure is that I was never a hardware guy. I still remember sitting in “intro to storage” sessions at SQL Saturdays, just trying to wrap my brain around what the hell latency and IOPS really meant. iSCSI & Fibre Channel were foreign to me as well. And I had only rudimentary knowledge of VMware, virtualization, HA, and DR concepts and strategies.

And now, 3 years later, these are all things I talk about on a daily basis. Next week, I’m debuting a new session that I’m calling A Practical Deep Dive into I/O for the T-SQL Performance Tuner. It’s actually geared towards “Andy 5 years ago” who was definitely a T-SQL Perf Tuner but still not totally a hardware or I/O stack person. I’m still amazed at how far I’ve grown. And I’m still learning new stuff all the time!

TL;DR – Never Say Never…

The point to all of this, and this blog post, is to share with everyone how I’ve found myself in completely new realms throughout my career. And though today, you might think “I never could or would do that,” you really don’t know what life has in store for you. But if you have an open heart, an open mind, and an eagerness to learn, you absolutely can.

Thanks for reading.

Migrating SQL Server Database Files Between Storage Subsystems

In my role at Pure Storage, I often engage with customers who wish to migrate their SQL Server databases off of their prior storage onto our hardware. And after some digging around for previously published material, I was surprised to find that there really wasn’t much that was comprehensive. After all, one doesn’t change SANs too often. But when it does happen, it’s nice to have some reference material from others who have been through it. So I decided to try to give a good overview of how I’d approach the challenge.

This is meant to be a “food for thought” kind of post. I’m going to keep things somewhat high level, but will provide links to other blogs and material that can help you continue down whatever path you choose. And for simplicity, I’m going to limit this scope to a single SQL Server.

Questions First

  1. Is your SQL Server virtualized or not?
    • VMware or another hypervisor?
    • If VMware, are your existing volumes RDM, VMFS, or vVols?
  2. How much downtime can you take for the actual cutover?
  3. Are you migrating your databases to a new SQL Server instance or keeping everything on the same instance?

VMware VM?

First, the easy one. If your SQL Server instance is on VMware VMFS or vVols, use Storage vMotion and call it a day. Are you one of the teeny-tiny percentage of folks running on Hyper-V? That’s cool – there’s storage migration functionality there too, for both standalone Hyper-V VMs and clustered Hyper-V VMs residing on Cluster Shared Volumes.

If you’re on RDMs, know that from VMware’s perspective, RDMs are dead and vVols are the way of the future, so take this migration opportunity to consider changing that as well. One legacy reason SQL Servers needed RDMs had to do with clustered storage, but the capabilities needed for that are now available in VMFS and vVols. Still have a Failover Cluster Instance with RDMs? Then check out this oldie-but-goodie blog for starters.

Cutover Downtime

The most critical question to answer next is how much downtime can you take for the actual migration? Do you have the luxury of being able to take an entire weekend because your workload is only Mon-Fri? Cool, you have it easy. More typically, I see teams that can afford a few hours of downtime overnight. And occasionally, I’ll hear “we can’t have ANY downtime!”

The long and short of it is this… your migration solution will most likely increase in complexity as the amount of downtime available to you decreases.

My Preferred Option

Generally speaking, I would pre-stage the databases via backup/restore to the new storage. If you are also migrating to a new SQL Server instance, use Log Shipping! It just works!

If you are remaining on the same SQL Server instance, you can use a “log-shipping-like” methodology. Present new volumes from your new SAN to the same instance and restore the databases side-by-side (named differently of course). Then if you want, you can use DIFFs and/or T-Log backups to keep that copy up-to-date. This makes the most sense if you say, leisurely pre-stage the databases 1-2 weeks beforehand, and say take a DIFF the morning of the cutover and restore that during the workday. Then at the time of cutover, you’d take a final transaction log backup (the tail), and restore that, and then swap database names. Of course, you can switch drive letters/mount points if you really want to as well.
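The cutover sequence described above might look like this in T-SQL. The database, backup paths, and logical file names are placeholders for this sketch:

```sql
-- Weeks before cutover: restore the full backup side-by-side,
-- with MOVE pointing the files at volumes on the NEW SAN
RESTORE DATABASE MyDB_Staged
FROM DISK = N'\\backups\MyDB_Full.bak'
WITH NORECOVERY,
     MOVE 'MyDB_Data' TO 'N:\Data\MyDB.mdf',     -- new SAN data volume
     MOVE 'MyDB_Log'  TO 'O:\Log\MyDB_log.ldf';  -- new SAN log volume

-- Morning of cutover: apply the latest differential
RESTORE DATABASE MyDB_Staged
FROM DISK = N'\\backups\MyDB_Diff.bak' WITH NORECOVERY;

-- At cutover time: tail-log backup (WITH NORECOVERY leaves the
-- source database in RESTORING, so no one can write to it)
BACKUP LOG MyDB TO DISK = N'\\backups\MyDB_Tail.trn' WITH NORECOVERY;

RESTORE LOG MyDB_Staged
FROM DISK = N'\\backups\MyDB_Tail.trn' WITH RECOVERY;

-- Finally, swap the database names
ALTER DATABASE MyDB        MODIFY NAME = MyDB_Old;
ALTER DATABASE MyDB_Staged MODIFY NAME = MyDB;
```

Total downtime is roughly the time to take and restore that final tail-log backup, which on most systems is minutes.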

Here’s a SQL Server Central blog that covers this in a bit more detail.

PowerShell Is Your Friend

I’d also strongly suggest making your life easier by using dbatools.io. If you’re staying on the same instance, check out Move-DbaDbFile. I like this one for same-instance scenarios because it’s stupid simple, with the trade-off of taking each database offline during the operation (though there is a restore option too). And it handles metadata, which is great.

And if you’re moving to a new instance, utilize Start-DbaMigration. If you’re new to dbatools.io, no worries! There’s a wealth of resources available – go search “dbatools” on YouTube and you’ll find a ton of amazing presentations showcasing how to use it, like this one from Jess Pomfret!

One other benefit to PowerShell… remember how I said this blog was only for one SQL instance? If you have a TON of SQL Servers to migrate, PowerShell is an amazing way to code up a solution once and apply it across all of them!

Filegroups – aka: I Didn’t Know I Could Do That!

Filegroups are one of those underrated capabilities in SQL Server that I find most folks (myself included) don’t use. When you create a basic database, you get a data file, a log file, and a single PRIMARY filegroup. But you can create another filegroup, add underlying files to it, and then migrate existing data to the new filegroup. The cool thing about this option is that it’s fully online! The drawback is that it may not be as fast as one might like, because it’s a size-of-data operation. But you remain online at least, so it can be done over a long period of time. There is one more quirk: it’ll introduce a ridiculous amount of external fragmentation. And remember that though external fragmentation is not as impactful as it once was on spinning platters, it still hurts read-ahead operations, so it could negatively impact your workloads.
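A minimal sketch of the mechanics, using hypothetical database, file, path, table, and index names — rebuilding a clustered index onto the new filegroup is what physically moves the table’s data:

```sql
-- 1. Create a filegroup backed by a file on the new SAN volume.
ALTER DATABASE [SalesDB] ADD FILEGROUP [FG_NewSAN];

ALTER DATABASE [SalesDB]
ADD FILE (
    NAME = N'SalesDB_NewSAN1',
    FILENAME = N'N:\NewSAN\SalesDB_NewSAN1.ndf',
    SIZE = 100GB, FILEGROWTH = 10GB
) TO FILEGROUP [FG_NewSAN];

-- 2. Move a table by rebuilding its clustered index onto the new filegroup.
--    ONLINE = ON (Enterprise Edition) is what keeps this fully online.
CREATE UNIQUE CLUSTERED INDEX [PK_Orders]
ON dbo.Orders (OrderID)
WITH (DROP_EXISTING = ON, ONLINE = ON)
ON [FG_NewSAN];
```

Repeat for each table/index, and remember that heaps, LOB data, and nonclustered indexes each need their own handling — which is part of why this approach gets complex.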

If you’re curious to learn more about this, check out Bob Pusateri’s blog post here. He presents a hybrid solution, because of specific requirements he had, but it’s extremely insightful on how to orchestrate this. And yes, this is a complex approach, but as I said earlier, if you must absolutely minimize downtime, a more complex orchestration will be your trade-off.

Thanks for reading!

Pure Storage FlashArray – Will a DELETE and/or SHRINK reclaim space?

In my current role at Pure Storage, I have the privilege of working with two amazingly smart, awesome SQL Server nerds: Andrew Pruski (b) and Anthony Nocentino (b). We often find ourselves facing interesting questions about SQL Server and storage, and today was no exception.

Andrew had a customer who wanted to know what happens on our FlashArray, from a space usage perspective, when they first delete a large volume of data in a database’s data file, then subsequently shrink the database’s data file.

DELETE Records

To properly answer what happens on FlashArray, one must look at all of the steps that happen in between.

First, what happens when SQL Server executes a DELETE operation against a ton of data? In a nutshell, SQL Server marks the data records and pages as ghosted, and the ghost cleanup task eventually comes along and marks the data pages for re-use. The data pages are not deallocated in the Windows file system, so no space is given back to Windows.

Aside: Individual record/singleton deletions are a bit of a different story, which I may explore in a future blog. Today, we’re only focusing on mass-deletion of tons of records/tons of data pages.
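If you want to watch this happen, you can observe the ghost records left behind by a mass DELETE with a DMF query like the one below (database and table names are hypothetical):

```sql
-- Ghost record counts require SAMPLED or DETAILED mode
SELECT  index_id,
        alloc_unit_type_desc,
        ghost_record_count,
        version_ghost_record_count
FROM sys.dm_db_index_physical_stats(
        DB_ID(N'SalesDB'),          -- database
        OBJECT_ID(N'dbo.Orders'),   -- table we just deleted from
        NULL, NULL,                 -- all indexes, all partitions
        'DETAILED');
```

Re-run it a few times after the DELETE and you’ll see the ghost counts drain back to zero as the cleanup task does its work.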

SHRINK DATABASE

So now that we’ve deleted a bunch of data, for whatever reason (good or bad) we want to run a SHRINK operation to reclaim storage space. In a SHRINK operation, what SQL Server essentially does is physically move and consolidate data pages within each data file. If there are gaps within the data file, SQL Server will move data pages around, “smooshing” them together, then deallocate the newly freed space at the end of the data file. And in Windows on NTFS, this means a deallocation is sent to the file system, which is essentially pushed down to the storage array.

To use an analogy, you have a large storage container (aka your data file) and little boxes (data pages) randomly sitting about inside. When you SHRINK, you’ll move boxes and stack them all in one corner. Then imagine you can use magic to shrink the larger storage container into a smaller one. So now your storage container takes up less overall space.
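For reference, a shrink is typically run per file. The names and target size below are hypothetical:

```sql
USE [SalesDB];
GO
-- Shrink the data file down to a 50 GB target (size argument is in MB).
-- This is the page-moving, fragmentation-inducing variant.
DBCC SHRINKFILE (N'SalesDB_Data', 51200);
GO
-- TRUNCATEONLY only releases free space already at the END of the file,
-- moving no pages at all - so it introduces no fragmentation, but it
-- reclaims less (or nothing) if free space is scattered through the file.
DBCC SHRINKFILE (N'SalesDB_Data', TRUNCATEONLY);
GO
```

Given the downsides Paul covers, treat this as a one-off migration/reclamation tool, not routine maintenance.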

Aside: Paul Randal goes into greater detail in this blog post, along with why SHRINK is a bad idea

Aside 2: If you’re VMware virtualized on VMFS, there are additional caveats that will not be covered in this blog post today.

FlashArray

Within FlashArray, we do a variety of different things with the underlying data that is written to us. These operations include thin provisioning, data reduction algorithms, global deduplication, and compression. Capacity utilization needs to be thought of a bit differently on our array. And we use an algorithm called redirect-on-write, which I like to think of as an Append-Only or Insert-Only methodology for writing data (over-generalizing: we never update, just insert new).

So would sending a Windows NTFS deallocation to FlashArray cause it to reclaim space?

Testing

To prove this out, I adapted Paul Randal’s demo script from his SHRINK data files blog, and added additional steps to check storage utilization on the FlashArray. You can review my published test script here.

To outline what I did, I created a new database whose data file was isolated on a dedicated volume (don’t care about the transaction log for this test). Then I used Paul’s methodology to create a table and hydrate it with a bunch of data. Then I created another table, hydrated that, then recorded my storage utilization from both SQL Server’s perspective and from FlashArray’s.

To begin the actual test, I dropped the first table, then recorded FlashArray’s space utilization on the volume where the data file resided. Next, I ran DBCC SHRINKDATABASE, and then recorded space utilization again.

Results

Before executing the first delete, my database was 31.11 GB in size on FlashArray. After the DELETE, nothing changed. This is expected, because Windows did not deallocate anything – the SQL Server data file remained the same allocated size. Then after the SHRINK operation, which deallocated data, the size on FlashArray dropped to 15.94 GB. Space was reclaimed on FlashArray.

Conclusion – TL;DR

When deleting records from a table, the space consumed is not impacted or reclaimed on the FlashArray. If you want to reclaim space, you must execute an operation that causes Windows to deallocate, which a SHRINK operation does.

Thanks for reading!

Backup Internals – Part 5: Balancing Performance

Welcome back to Part 5 of my Backup Internals Series. In this blog, I want to explore some thoughts around maximizing the performance of your SQL Server backup operations.

Blog Series Table of Contents

Fast, Good, or Cheap…

Many of us have heard the tidbit that there are three options in any endeavor – Fast, Good, and Cheap – but you can only ever pick two. When it comes to accelerating your (FULL) backup performance, there’s a similar set of choices, most of which carry a trade-off of some sort.

Gather Requirements

If I were to have an opportunity to re-evaluate an environment’s backup strategy and schedule, there’s a number of questions I’d want to try to answer first. Remember, the scope of this thought exercise is thinking about FULL Backup strategy from a purely performance angle.

Questions to Ask

  • Backup maintenance windows: How long and how frequently?
    Every night for 4 hours? Anytime over the weekend? For 3 minutes on Sunday, between 22:55 and 22:59?
  • How many databases to back up and how large are they?
    Does the server just have 1 database, but it’s 15TB in size? Or do you have 800 databases, one per customer? And of those 800, what’s the individual size distribution? Maybe 700 databases are smaller than 100GB, another 75 databases are between 100GB and 500GB, and the last 25 databases are over 500GB?
  • Exclusivity during backup maintenance window?
    Do you have the luxury of no workload running during your maintenance window? Or do you also have to juggle index maintenance, DBCC CHECKDB, or other application processes like ETLs, nightly processing jobs, etc.? Or to put it another way, do you have free rein to max out your SQL Server resources during your window?

Assumptions

  • 1 SQL Server with many databases
  • CPU = 12 cores; RAM = 256GB
  • Backup storage capacity and storage ingest throughput are not a concern (you bought “orange”)
  • Will ignore RPO/RTO business priorities (which I might tackle in a future blog)

Start Doing Math

Now that you know how much time you have and how many/how much you need to back up, you need to start weighing your options. Do you have the luxury to run one single backup job that will back up your databases one after another? Or do you need to split your databases across multiple backup jobs that run in parallel?

If server resources were infinite, one could theoretically kick off an individual backup job for each and every database, all starting at the exact same time. But of course, that makes no sense.
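One way to frame the math: divide what you must move by the window you have, to get the sustained throughput your backups (and backup target) must deliver. A toy calculation with made-up numbers, using the 15TB database and a 4-hour window from the questions above:

```sql
-- Hypothetical inputs: 15 TB to back up, a 4-hour nightly window.
DECLARE @TotalGB   decimal(18,2) = 15 * 1024;   -- 15 TB expressed in GB
DECLARE @WindowSec decimal(18,2) = 4 * 3600;    -- 4 hours in seconds

SELECT CAST(@TotalGB * 1024 / @WindowSec AS decimal(18,1)) AS RequiredMBPerSec;
-- Roughly 1092 MB/sec sustained, read AND write, just to finish on time -
-- before accounting for compression, competing workloads, or stragglers.
```

If that number is beyond what your infrastructure can sustain in one stream, that’s your cue to look at parallel jobs, multiple output files, or both.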

Revisiting What We’ve Learned

We now know that if our database’s data file(s) all reside on a single volume, we’ll only ever get one Reader Thread. So with a 12 core server, if I have full usage of the server’s resources, I may start with 8 backup output files to get 8 Writer Threads. And I might choose to use a large BUFFERCOUNT value and larger MAXTRANSFERSIZE value, which will result in more RAM consumption.

Now let’s pretend that to meet our requirements, we need to run backups in parallel. You might estimate that you need to run 4 backup jobs simultaneously. If you use the above parameters, you may now start overrunning your CPU! Remember it’s not always about 100% CPU utilization either… all cores could be utilized at say 30%, but you could be context switching like crazy.
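The memory side of that trade-off is easy to estimate up front: buffer space is roughly BUFFERCOUNT × MAXTRANSFERSIZE × sets-of-buffers. Here’s that arithmetic for the parameters used in the multi-target example later in this post (500 buffers, 2 MB transfers, 3 sets of buffers because compression adds encode buffers):

```sql
DECLARE @BufferCount   int = 500;
DECLARE @MaxTransferKB int = 2048;  -- MAXTRANSFERSIZE = 2097152 bytes
DECLARE @SetsOfBuffers int = 3;     -- compression/encryption add buffer sets

SELECT (@BufferCount * @MaxTransferKB * @SetsOfBuffers) / 1024 AS BufferSpaceMB;
-- 3000 MB for ONE backup job; four such jobs running in
-- parallel would want roughly 12 GB of buffer space alone.
```

That 3000 MB figure matches the “Total buffer space” the trace flag output reports, which is a handy sanity check that your mental model lines up with what the engine actually allocated.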

Who Doesn’t Love a Trace Flag?

So of course the above means you need to test different permutations. But when you’re doing testing, how can you determine the actual resource utilization of a given backup job? This is where some infrequently highlighted trace flags come into play.

Trace Flag 3213 - Generate diagnostic data about BACKUP & RESTORE operation (least)
Trace Flag 3004 - Generate diagnostic data about BACKUP & RESTORE operation (more)
Trace Flag 3014 - Generate diagnostic data about BACKUP & RESTORE operation (all)
These are officially undocumented

Trace Flag 3604 - Redirects output information to SSMS Results Pane
Trace Flag 3605 - Redirects output information to SQL Server Error Log: use if you need timestamps

You’ll see the above three Trace Flags are suffixed with least, more, and all. That’s because the output each Trace Flag yields seems to overlap. What’s more challenging is that I’ve found inconsistent documentation in older blogs covering these Trace Flags as well. So I just opt to use them all to cover my bases.

-- Turn on Trace Flags
DBCC TRACEON(3604, 3004, 3014, 3213, -1);
GO

When turned on, you will receive a ton of diagnostic information after running a backup operation. Here’s an example (with prefix removed for brevity):

BACKUP DATABASE Sandbox_MultiFile_SingleVol TO
DISK='NUL'
WITH COPY_ONLY, FORMAT, INIT, STATS = 15
GO

--------------
[prefix]: BACKUP DATABASE started
[prefix]: Opening the database with S lock
[prefix]: Acquiring bulk-op lock on the database
[prefix]: Synchronizing with other operations on the database is complete
[prefix]: Opening the backup media set
[prefix]: The backup media set is open

Backup/Restore buffer configuration parameters

Memory limit: 32765 MB
BufferCount: 7
Sets Of Buffers: 1
MaxTransferSize: 1024 KB
Min MaxTransferSize: 64 KB
Total buffer space: 7 MB
Tabular data device count: 1
Fulltext data device count: 0
Filestream device count: 0
TXF device count: 0
Filesystem i/o alignment: 512
Media Buffer count: 7
Media Buffer size: 1024 KB

[prefix]: Preparing the media set for writing
[prefix]: The media set is ready for backup

[prefix]: Effective options: Checksum=0, Compression=0, Encryption=0, BufferCount=7, MaxTransferSize=1024 KB
[prefix]: Checkpoint LSN: 0:0:0
[prefix]: Checkpoint is complete (elapsed = 17 ms)
[prefix]: Start LSN: 3246:76679:230, SERepl LSN: 0:0:0
[prefix]: Last LSN: 3246:76775:1
[prefix]: Scanning allocation bitmaps
[prefix]: Data section: 204684263424 bytes in total
[prefix]: Scanning allocation bitmaps is complete

[prefix]: Calculating expected size of total data
[prefix]: FID=1, ExpectedExtents=780610, IsDifferentialMapAccurate=1
[prefix]: FID=3, ExpectedExtents=781546, IsDifferentialMapAccurate=1
[prefix]: FID=4, ExpectedExtents=780523, IsDifferentialMapAccurate=1
[prefix]: FID=5, ExpectedExtents=780555, IsDifferentialMapAccurate=1
[prefix]: TotalSize=204684263424 bytes
[prefix]: Estimated total size = 204684312576 bytes (data size = 204684263424 bytes, log size = 49152 bytes)

[prefix]: Work estimation is complete

[prefix]: Do the first force checkpoint before copying data section
[prefix]: Checkpoint LSN: 3246:76679:230
[prefix]: Checkpoint is complete (elapsed = 8 ms)
[prefix]: Writing the leading metadata

Shared Backup BufferQ count: 7

[prefix]: Calculating expected size of total data
[prefix]: FID=1, ExpectedExtents=780610, IsDifferentialMapAccurate=1
[prefix]: FID=3, ExpectedExtents=781546, IsDifferentialMapAccurate=1
[prefix]: FID=4, ExpectedExtents=780523, IsDifferentialMapAccurate=1
[prefix]: FID=5, ExpectedExtents=780555, IsDifferentialMapAccurate=1
[prefix]: TotalSize=204684263424 bytes

[prefix]: Copying data files
[prefix]: Data file backup process set to run in serial mode within a volume.
[prefix]: Number of data file readers = 1

15 percent processed.
30 percent processed.
45 percent processed.
60 percent processed.
75 percent processed.
90 percent processed.

[prefix]: InitialExpectedSize=204684263424 bytes, FinalSize=204684263424 bytes, ExcessMode=0

[prefix]: Calculating expected size of total data
[prefix]: FID=1, ExpectedExtents=780610, IsDifferentialMapAccurate=1
[prefix]: FID=3, ExpectedExtents=781546, IsDifferentialMapAccurate=1
[prefix]: FID=4, ExpectedExtents=780523, IsDifferentialMapAccurate=1
[prefix]: FID=5, ExpectedExtents=780555, IsDifferentialMapAccurate=1
[prefix]: TotalSize=204684263424 bytes

[prefix]: Do the second force checkpoint before copying diff section
[prefix]: Checkpoint LSN: 3246:76679:230
[prefix]: Checkpoint is complete (elapsed = 9 ms)
[prefix]: Start pin the log.
[prefix]: Start LSN: 3246:76778:1, SERepl LSN: 0:0:0
[prefix]: Offline the sparse bitmap
[prefix]: Scanning allocation bitmaps
[prefix]: Diff section: 3473408 bytes in total
[prefix]: Scanning allocation bitmaps is complete

[prefix]: Calculating expected size of total data
[prefix]: FID=1, ExpectedExtents=14, IsDifferentialMapAccurate=1
[prefix]: FID=3, ExpectedExtents=13, IsDifferentialMapAccurate=1
[prefix]: FID=4, ExpectedExtents=13, IsDifferentialMapAccurate=1
[prefix]: FID=5, ExpectedExtents=13, IsDifferentialMapAccurate=1
[prefix]: TotalSize=3473408 bytes

[prefix]: Copying data files
[prefix]: Data file backup process set to run in serial mode within a volume.
[prefix]: Number of data file readers = 1
[prefix]: InitialExpectedSize=3473408 bytes, FinalSize=3473408 bytes, ExcessMode=0

[prefix]: Calculating expected size of total data
[prefix]: FID=1, ExpectedExtents=14, IsDifferentialMapAccurate=1
[prefix]: FID=3, ExpectedExtents=13, IsDifferentialMapAccurate=1
[prefix]: FID=4, ExpectedExtents=13, IsDifferentialMapAccurate=1
[prefix]: FID=5, ExpectedExtents=13, IsDifferentialMapAccurate=1
[prefix]: TotalSize=3473408 bytes
[prefix]: Diff section copy finished
[prefix]: Last LSN: 3246:76781:1
[prefix]: Copying data files is complete

Processed 6244992 pages for database 'Sandbox_MultiFile_SingleVol', file 'Sandbox_MultiFile' on file 1.
Processed 6252472 pages for database 'Sandbox_MultiFile_SingleVol', file 'Sandbox_MultiFile2' on file 1.
Processed 6244288 pages for database 'Sandbox_MultiFile_SingleVol', file 'Sandbox_MultiFile3' on file 1.
Processed 6244544 pages for database 'Sandbox_MultiFile_SingleVol', file 'Sandbox_MultiFile4' on file 1.

[prefix]: Copying transaction log
[prefix]: MediaFamily(0): FID=2, VLFID=3246, DataStreamSize=65536 bytes
Processed 1 pages for database 'Sandbox_MultiFile_SingleVol', file 'Sandbox_MultiFile_log' on file 1.
[prefix]: Copying transaction log is complete

[prefix]: Writing the trailing metadata
[prefix]: Writing the end of backup set
[prefix]: Writing history records for NoteBackup
[prefix]: Writing history records for NoteBackup is complete (elapsed = 37 ms)

BACKUP DATABASE successfully processed 24986297 pages in 87.111 seconds (2240.881 MB/sec).
[prefix]: BACKUP DATABASE finished

I’ve grouped together interesting “sub-operations” like when the tail of the log backup is taken. Note the multiple size estimates that are taken throughout as well. Note that there are 4 data files, which is why you see 4 FID entries each time. But also note that all 4 data files are on 1 single volume!

Key TF Output Highlights

Backup/Restore buffer configuration parameters

Memory limit: 32765 MB
BufferCount: 7
Sets Of Buffers: 1
MaxTransferSize: 1024 KB
Min MaxTransferSize: 64 KB
Total buffer space: 7 MB

[prefix]: Effective options: Checksum=0, Compression=0, Encryption=0, BufferCount=7, MaxTransferSize=1024 KB

[prefix]: Number of data file readers = 1

Let’s compare and contrast this with a backup operation that uses multiple backup output targets (this is a different database, whose data files are spread across 4 different data volumes).

BACKUP DATABASE [Sandbox_MultiFile_MultiVol] TO
DISK = '\\10.21.200.27\ayun-sql-backups\Sandbox_MultiFile_MultiVol_1.bak',
DISK = '\\10.21.200.28\ayun-sql-backups\Sandbox_MultiFile_MultiVol_2.bak',
DISK = '\\10.21.200.70\ayun-sql-backups\Sandbox_MultiFile_MultiVol_3.bak',
DISK = '\\10.21.200.71\ayun-sql-backups\Sandbox_MultiFile_MultiVol_4.bak',
DISK = '\\10.21.200.72\ayun-sql-backups\Sandbox_MultiFile_MultiVol_5.bak',
DISK = '\\10.21.200.73\ayun-sql-backups\Sandbox_MultiFile_MultiVol_6.bak',
DISK = '\\10.21.200.74\ayun-sql-backups\Sandbox_MultiFile_MultiVol_7.bak',
DISK = '\\10.21.200.75\ayun-sql-backups\Sandbox_MultiFile_MultiVol_8.bak'
WITH FORMAT, INIT, STATS = 10,
MAXTRANSFERSIZE = 2097152,
BUFFERCOUNT = 500, COMPRESSION;
GO

--------------

Backup/Restore buffer configuration parameters
Memory limit: 98295 MB
BufferCount: 500
Sets Of Buffers: 3
MaxTransferSize: 2048 KB
Min MaxTransferSize: 64 KB
Total buffer space: 3000 MB
Filesystem i/o alignment: 512
Media Buffer count: 500
Media Buffer size: 2048 KB
Encode Buffer count: 500

Backup(Sandbox_MultiFile_MultiVol): Number of data file readers = 4

I think there’s a lot of really cool data in here. Remember that you can use Trace Flag 3605 to log this information to the Error Log. Thus, you could use it to record what your BACKUP jobs are currently doing and review them later at your leisure (just remember to turn the Trace Flags off afterwards, lest you bloat your logs).
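When you’re done collecting, turning everything back off is the mirror image of the TRACEON command from earlier, and TRACESTATUS lets you confirm nothing was left behind:

```sql
-- Turn the diagnostic Trace Flags back off globally
DBCC TRACEOFF(3604, 3605, 3004, 3014, 3213, -1);
GO
-- Verify no global trace flags remain enabled
DBCC TRACESTATUS(-1);
GO
```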

If you want to see more examples of these Trace Flags in action, visit my GitHub and check out my demo scripts from the original presentation. I introduce the Trace Flags in the 1_Baselining.sql script, then leverage them throughout the other demo scripts.

Workers vs CPU Cores

One more thing to keep in mind is the total number of worker threads you have available on your SQL Server. That’s different from the number of CPU cores you happen to have. I’m not going to dive deeper into this today, but will leave this old-but-still-valid blog from Bob Dorr for reference: “How It Works: How many databases can be backed up simultaneously?
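To see both numbers side by side on your own instance, a quick DMV query does the trick:

```sql
-- Core count vs. the worker thread ceiling the instance is configured for
SELECT cpu_count, max_workers_count
FROM sys.dm_os_sys_info;
```

If your parallel backup jobs (plus everything else running in the window) start pressing against that worker ceiling, throughput suffers in ways raw CPU utilization won’t show you.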

Bringing It All Together

Hopefully now, with the above information, you can see that you really need to do an analysis – essentially build a matrix based on your requirements and available resources. That will help you determine how far to crank up your backup tuneables to best accelerate your FULL backups in your environment.

Hope You’ve Enjoyed This Series

This’ll most likely be the final blog of this series, though I might do an “addendum” with random tidbits. Let me know in the comments if there’s anything else you’d like me to explore in an epilogue.

Thanks for reading!