Purging content after backing up and then restoring a database

How do we create new datasets that contain just a portion of the content from an existing dataset? I’ve been asked this dozens of times over the years. Someone asked again this past week, so I figured I’d bundle my usual method into a PowerShell script. The script can be run via a scheduled task, letting you execute it routinely for purposes such as refreshing development environments, exporting to less-secured environments, or curating training datasets.

Start by installing the SQL Server PowerShell module from an administrative PowerShell window by running this command:

Install-Module -Name SqlServer

You will be prompted to allow the install to continue.

[Image: 2019-04-27_20-24-23.png]

It will then initiate the download and attempt to install.

[Image: 2019-04-27_20-24-36.png]

In my case, I’ve already got it installed. If you do as well, you’ll see errors like the following, which are safe to ignore.

[Image: 2019-04-27_20-26-17.png]

Now we can start scripting out the process. The first step is to back up the existing database using the Backup-SqlDatabase cmdlet. To make things easier, we use variables for the key parameters (instance name, database name, path to the backup file).

# Backup the source database to a new file
Write-Information "Backing Up $($sourceDatabaseName)"
Backup-SqlDatabase -ServerInstance $serverInstance -Database $sourceDatabaseName -BackupAction Database -CopyOnly -BackupFile $backupFilePath

Next comes the restore of that backup. Unfortunately I cannot use the Restore-SqlDatabase cmdlet, because it does not support restoring over a database that has other active connections. Instead I’ll run a series of statements that set the database to single-user mode, restore the database (with relocated files), and then set it back to multi-user mode.

# Restore the database by first getting an exclusive lock with single user access, enable multi-user when done
Write-Warning "Restoring database with exclusive use access"
$ExclusiveLock = "USE [master]; ALTER DATABASE [$($targetDatabaseName)] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;"
$RestoreDatabase = "RESTORE FILELISTONLY FROM disk = '$($backupFilePath)'; RESTORE DATABASE $($targetDatabaseName) FROM disk = '$($backupFilePath)' WITH replace, MOVE '$($sourceDatabaseName)' TO '$($sqlDataPath)\$($targetDatabaseName).mdf', MOVE '$($sourceDatabaseName)_log' TO '$($sqlDataPath)\$($targetDatabaseName)_log.ldf', stats = 5; ALTER DATABASE $($targetDatabaseName) SET MULTI_USER;"
ExecuteSqlStatement -Instance $serverInstance -Database $targetDatabaseName -SqlStatement ($ExclusiveLock+$RestoreDatabase)

Next we need to change the recovery mode from full to simple. Doing so will allow us to manage the growth of the log files. This is important because the delete commands we run will spool changes into the database logs, which we’ll need to shrink as often as possible (otherwise the DB server could potentially run out of space).

# Change recovery mode to simple
Write-Information "Setting recovery mode to simple"
Invoke-Sqlcmd -ServerInstance $serverInstance -Database $targetDatabaseName -Query "ALTER DATABASE $($targetDatabaseName) SET RECOVERY SIMPLE"

With the database in simple recovery mode we can now start purging content from the restored database. Before digging into the logic of the deletes, I’ll need to create a function I can call that executes a statement and traps errors. I’ll also want it to be able to incrementally shrink the database when necessary.

# Function below is used so that we trap errors and shrink at certain times
function ExecuteSqlStatement
{
    param([String]$Instance, [String]$Database, [String]$SqlStatement, [bool]$shrink = $false)
    # $error is an automatic variable in PowerShell, so use a different name for the flag
    $hadError = $false
    # Trap all errors
    try {
        # Execute the statement
        Write-Debug "Executing Statement on $($Instance) in DB $($Database): $($SqlStatement)"
        Invoke-Sqlcmd -ServerInstance $Instance -Database $Database -Query $SqlStatement | Out-Null
        Write-Debug "Statement executed with no exceptions"
    } catch [Exception] {
        Write-Error "Exception Executing Statement: $($_)"
        $hadError = $true
    }
    # When no error and parameter passed, shrink the DB (can slow process)
    if ( $hadError -eq $false -and $shrink -eq $true ) {
        ShrinkDatabase -SQLInstanceName $serverInstance -DatabaseName $Database -FileGroupName $databaseFileGroup -FileName $databaseFileName -ShrinkSizeMB 48000 -ShrinkIncrementMB 20
    }
}

To implement the incremental shrinking I use a method I found a few years ago, linked here. It works around the notoriously slow shrink process on very large databases. Your database administrator should review how it works and tune it for your environment.
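For reference, a ShrinkDatabase function along these lines could look like the sketch below. This is a minimal, hypothetical implementation that assumes a simple DBCC SHRINKFILE loop; the parameter names match the call inside ExecuteSqlStatement, but the linked method's actual logic may differ, so treat this as a placeholder rather than the real thing.

```powershell
# Hypothetical sketch of ShrinkDatabase: walk a data file down to a target
# size in small increments, which tends to be much faster on large databases
# than a single large shrink. FileGroupName is accepted for signature
# compatibility with the call above but is unused in this sketch.
function ShrinkDatabase
{
    param(
        [String]$SQLInstanceName,
        [String]$DatabaseName,
        [String]$FileGroupName,
        [String]$FileName,
        [int]$ShrinkSizeMB,
        [int]$ShrinkIncrementMB = 20
    )
    # Get the file's current size in MB (sys.database_files reports 8 KB pages)
    $query = "SELECT size / 128 AS SizeMB FROM sys.database_files WHERE name = '$FileName'"
    $currentSizeMB = (Invoke-Sqlcmd -ServerInstance $SQLInstanceName -Database $DatabaseName -Query $query).SizeMB
    # Shrink one increment at a time until we reach the target size
    while ($currentSizeMB - $ShrinkIncrementMB -gt $ShrinkSizeMB) {
        $currentSizeMB -= $ShrinkIncrementMB
        Write-Debug "Shrinking $FileName to $currentSizeMB MB"
        Invoke-Sqlcmd -ServerInstance $SQLInstanceName -Database $DatabaseName -Query "DBCC SHRINKFILE('$FileName', $currentSizeMB)" -QueryTimeout 0 | Out-Null
    }
}
```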

My goal is to remove all records of certain types, so that they aren’t exposed in the restored copy. Unfortunately the out-of-the-box constraints do not cascade and delete related objects. That means we need to delete the related objects before trying to delete the records themselves.

We need to delete:

  • Workflows (and supporting objects)

  • Records (and supporting objects)

  • Record Types

In very large datasets this process could take hours. You can improve performance by adding a “-shrink $true” parameter to any of the delete statements that impact large volumes of data in your organization (electronic revisions, renditions, and locations, for instance).

# Purging content by record type uri
foreach ( $rtyUri in $recTypeUris ) 
{
    Write-Warning "Purging All Records & Supporting Objects for Record Type Uri $($rtyUri)"
    Write-Information " - Purging Workflow Document References"
    ExecuteSqlStatement -Instance $serverInstance -Database $targetDatabaseName -SqlStatement "delete from tswkdocusa where wduDocumentUri in (select uri from tswkdocume where wdcRecordUri in (select uri from tsrecord where rcRecTypeUri=$rtyUri))"
    Write-Information " - Purging Workflow Documents"
    ExecuteSqlStatement -Instance $serverInstance -Database $targetDatabaseName -SqlStatement "delete from tswkdocume where wdcRecordUri in (select uri from tsrecord where rcRecTypeUri=$rtyUri)"
    Write-Information " - Purging Workflow Activity Start Conditions"
    ExecuteSqlStatement -Instance $serverInstance -Database $targetDatabaseName -SqlStatement "delete from tswkstartc where wscActivityUri in (select uri from tswkactivi where wacWorkflowUri in (select uri from tswkworkfl where wrkInitiator in (select uri from tsrecord where rcRecTypeUri=$rtyUri)))"
    Write-Information " - Purging Workflow Activities"
    ExecuteSqlStatement -Instance $serverInstance -Database $targetDatabaseName -SqlStatement "delete from tswkactivi where wacWorkflowUri in (select uri from tswkworkfl where wrkInitiator in (select uri from tsrecord where rcRecTypeUri=$rtyUri))"
    Write-Information " - Purging Workflows"
    ExecuteSqlStatement -Instance $serverInstance -Database $targetDatabaseName -SqlStatement "delete from tswkworkfl where wrkInitiator in (select uri from tsrecord where rcRecTypeUri=$rtyUri)"
    Write-Information " - Purging Communications Detail Words"
    ExecuteSqlStatement -Instance $serverInstance -Database $targetDatabaseName -SqlStatement "delete from tstranswor where twdTransDetailUri in (select tstransdet.uri from tstransdet inner join tstransmit on tstransdet.tdTransUri = tstransmit.uri inner join tsrecord on tstransmit.trRecordUri = tsrecord.uri where tsrecord.rcRecTypeUri = $rtyUri);"
    Write-Information " - Purging Communications Details"
    ExecuteSqlStatement -Instance $serverInstance -Database $targetDatabaseName -SqlStatement "delete from tstransdet where tdTransUri in (select tstransmit.uri from tstransmit inner join tsrecord on tstransmit.trRecordUri = tsrecord.uri where rcRecTypeUri=$rtyUri);"
    Write-Information " - Purging Communications"
    ExecuteSqlStatement -Instance $serverInstance -Database $targetDatabaseName -SqlStatement "delete from tstransmit where trRecordUri in (select uri from tsrecord where rcRecTypeUri=$rtyUri);"
    Write-Information " - Purging Record Thesaurus Terms"
    ExecuteSqlStatement -Instance $serverInstance -Database $targetDatabaseName -SqlStatement "delete from tsrecterm where rtmRecordUri in (select uri from tsrecord where rcRecTypeUri=$rtyUri);"
    Write-Information " - Purging Record Relationships"
    ExecuteSqlStatement -Instance $serverInstance -Database $targetDatabaseName -SqlStatement "delete from tsreclink where rkRecUri1 in (select uri from tsrecord where rcRecTypeUri=$rtyUri) OR rkRecUri2 in (select uri from tsrecord where rcRecTypeUri=$rtyUri);"
    Write-Information " - Purging Record Actions"
    ExecuteSqlStatement -Instance $serverInstance -Database $targetDatabaseName -SqlStatement "delete from tsrecactst where raRecordUri in (select uri from tsrecord where rcRecTypeUri=$rtyUri);"
    Write-Information " - Purging Record Jurisdictions"
    ExecuteSqlStatement -Instance $serverInstance -Database $targetDatabaseName -SqlStatement "delete from tsrecjuris where rjRecordUri in (select uri from tsrecord where rcRecTypeUri=$rtyUri);"
    Write-Information " - Purging Record Requests"
    ExecuteSqlStatement -Instance $serverInstance -Database $targetDatabaseName -SqlStatement "delete from tsrecreque where rqRecordUri in (select uri from tsrecord where rcRecTypeUri=$rtyUri);"
    Write-Information " - Purging Record Rendition Queue"
    ExecuteSqlStatement -Instance $serverInstance -Database $targetDatabaseName -SqlStatement "delete from tsrendqueu where rnqRecUri in (select uri from tsrecord where rcRecTypeUri=$rtyUri);"
    Write-Information " - Purging Record Renditions"
    ExecuteSqlStatement -Instance $serverInstance -Database $targetDatabaseName -SqlStatement "delete from tsrenditio where rrRecordUri in (select uri from tsrecord where rcRecTypeUri=$rtyUri);"
    Write-Information " - Purging Record Revisions"
    ExecuteSqlStatement -Instance $serverInstance -Database $targetDatabaseName -SqlStatement "delete from tserecvsn where evRecElecUri in (select uri from tsrecord where rcRecTypeUri=$rtyUri);"
    Write-Information " - Purging Record Documents"
    ExecuteSqlStatement -Instance $serverInstance -Database $targetDatabaseName -SqlStatement "delete from tsrecelec where uri in (select uri from tsrecord where rcRecTypeUri=$rtyUri);"
    Write-Information " - Purging Record Holds"
    ExecuteSqlStatement -Instance $serverInstance -Database $targetDatabaseName -SqlStatement "delete from tscasereco where crRecordUri in (select uri from tsrecord where rcRecTypeUri=$rtyUri);"
    Write-Information " - Purging Record Locations"
    ExecuteSqlStatement -Instance $serverInstance -Database $targetDatabaseName -SqlStatement "delete from tsrecloc where rlRecUri in (select uri from tsrecord where rcRecTypeUri=$rtyUri);"
    Write-Information " - Purging Records (shrink after)"
    ExecuteSqlStatement -Instance $serverInstance -Database $targetDatabaseName -SqlStatement "delete from tsrecord where rcRecTypeUri = $rtyUri" -shrink $true
    Write-Information " - Purging Record Types"
    ExecuteSqlStatement -Instance $serverInstance -Database $targetDatabaseName -SqlStatement "delete from tsrectype where uri = $rtyUri"
}

With that out of the way we can now restore the recovery mode back to full.

# Change recovery mode to full
Write-Information "Setting recovery mode to full"
Invoke-Sqlcmd -ServerInstance $serverInstance -Database $targetDatabaseName -Query "ALTER DATABASE $($targetDatabaseName) SET RECOVERY FULL"

The last step is to restart the workgroup service on the server that uses this restored database.

# Restart CM
Write-Warning "Restarting Content Manager Service"
Restart-Service -Force -Name $cmServiceName

At the top of my script I have all of the variables defined. To make this work for you, you’ll need to adjust the variables to align with your environment. For instance, you’ll need to update the $recTypeUris array to contain the URIs of the record types you want deleted.

# Variables to be used throughout the script
$serverInstance = "localhost"                   # SQL Server instance name 
$sourceDatabaseName = "CMRamble_93"             # Name of database in SQL Server
$targetDatabaseName = "Restored_cmramble_93"    # Name to restore to in SQL Server
$backupFileName = "$($sourceDatabaseName).bak"  # File name for backup of database
$backupPath = "C:\temp"                         # Folder to back-up into (relative to server)
$backupFilePath = [System.IO.Path]::Combine($backupPath, $backupFileName)   # Use .Net's path join which honors OS
$databaseFileGroup = "PRIMARY"                  # File group of database content, used when shrinking
$databaseFileName = "CMRamble_93"               # Filename within file group, used when shrinking
$sqlDataPath = "D:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA"   # Path to data files, used in restore
$cmServiceName = "TRIMWorkgroup"                # Name of registered service in OS (not display name)
$recTypeUris = @( 3 )                           # Array of URIs for record types to be purged after restore

To find the URIs of your record types, you’ll need to customize your view pane and add in the unique identifier property.
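Alternatively, since the script already talks to SQL Server, you could pull the record type URIs straight from the tsrectype table the purge loop deletes from. The query below is only a sketch: the name column (rtyName here) is an assumption and may differ in your Content Manager schema version, so verify it before relying on the output.

```powershell
# List candidate record types with their URIs so you can populate $recTypeUris.
# NOTE: rtyName is an assumed column name; check it against your CM schema.
Invoke-Sqlcmd -ServerInstance $serverInstance -Database $sourceDatabaseName `
    -Query "SELECT uri, rtyName FROM tsrectype ORDER BY rtyName"
```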

[Image: 2019-04-27_21-04-13.png]

Running the script gives me this output:

[Image: 2019-04-27_20-17-55.png]

On my local machine it took about a minute to complete for a very small dataset. Running it against a 70 GB database, removing approximately 20 GB of content, took 15 minutes. Granted, my SQL Server has 64 GB of RAM and two SSD drives holding the SQL Server data and log files.

You can download my full script here: https://github.com/aberrantCode/cm_db_restore_and_purge

Migrating the SQL DemoDB to PostgreSQL

The installation media for Content Manager 9.2 comes with a demonstration dataset that can be used for testing and training.  Although I think the data within it is junk, it's been with the product for so long that I can't help but continue using it.  To mount the dataset you have to restore a backup file onto a SQL Server and then register it within the Enterprise Studio.

SQL Server is a bit too expensive for my testing purposes, so I need to get this dataset into PostgreSQL.  In this post I'll show how I accomplished this.  The same approach could be taken for any migration between SQL Server and PostgreSQL.

I'm starting this post having already restored the DemoDB onto a SQL Server:

 
[Image: 2018-04-21_15-15-20.png]
 

If you look at the connection details for the highlighted dataset, you'll see that GIS is enabled for this dataset.  My target environment will not support GIS.  This inhibits my ability to use the migrate feature when creating my new dataset.  If I tried to migrate it directly to my target environment I would receive the error message shown below.

[Image: 2018-04-21_14-12-21.png]

Even if I try to migrate from SQL to SQL, I can't migrate unless GIS is retained...

[Image: 2018-04-21_14-12-10.png]

To use the migration feature I need to first have a dataset that does not support GIS.  I'll use the export feature of the GIS enabled dataset to give me something I can work with.  Then I'll import that into a blank dataset without GIS enabled.

 
[Image: 2018-04-21_9-58-55.png]
 

The first export will be to SQL Server, but without GIS enabled.  When prompted I just need to provide a location for the exported script & data.

 
[Image: 2018-04-21_15-27-47.png]
 

Once the export completed, I created a new dataset. This dataset was not initialized with any data, nor was GIS enabled. The screenshot below details the important dataset properties to configure during creation.

 
[Image: 2018-04-21_15-31-01.png]
 

After it was created I can see both datasets within the Enterprise Studio, as shown below.

 
[Image: 2018-04-21_14-44-53.png]
 

Next I switched over to SQL Server Management Studio and opened the script generated as part of the export of the DemoDB.  I then executed the script within the database used for the newly created DemoDB No GIS.  This populates the empty dataset with all of the data from the original DemoDB.  I will lose all of the GIS data, but that's ok with me. 

Now I can create a new dataset on my workgroup server.  During its creation I must specify a bulk loading path, which is used in the same manner as the export location from the first few steps.  The migration actually performs an export first and then imports those files, just as I did manually in SQL Server.

On the last step of the creation wizard I can select my DemoDB No GIS dataset, as shown below.

 
[Image: 2018-04-21_14-47-13.png]
 

Now the Enterprise Studio shows me all three datasets.

 
[Image: 2018-04-21_15-09-44.png]
 

Fiscal Calendar Fun

A collection of SQL statements to be used as logical tables in support of fiscal calendar transformations, with dates stored as integers.  I'll need these to report metrics out of CM based on an internal fiscal calendar.


Current Fiscal Day, Month, and Year with Start/End & Yesterday

SELECT 
	fiscal_month_year_nr, 
	fiscal_year_nr, 
	fiscal_month_nr, 
	fiscal_month_year_bgn, 
	fiscal_month_year_end, 
	fiscal_year_bgn, 
	fiscal_year_end, 
	cast(convert(varchar(8),dateadd(d,-1,getdate()),112) as int) yesterday 
FROM Dim_Fiscal_Month_Year 
WHERE 
	fiscal_month_year_bgn <= cast(convert(varchar(8),dateadd(d,-1,getdate()),112) as int) 
	AND 
	fiscal_month_year_end >= cast(convert(varchar(8),dateadd(d,-1,getdate()),112) as int)

Fiscal Week to Date

select dd.[date_id], wtd.[Date_id] wtd_fiscal_date_id
from [Dim_Date] dd
join [Dim_Fiscal_week_Year] fwy
	on (dd.[Fiscal_week_year_nr] = fwy.[fiscal_week_year_nr])
join [Dim_Date] wtd
	on (wtd.Date_id between fwy.[fiscal_week_year_bgn] and dd.[date_id])

Fiscal Month to Date

select dd.[date_id], mtd.[Date_id] mtd_date_id
from [Dim_Date] dd
join [Dim_Fiscal_Month_Year] fmy
	on (dd.[Fiscal_month_year_nr] = fmy.[fiscal_month_year_nr])
join [Dim_Date] mtd
	on (mtd.Date_id between fmy.[fiscal_month_year_bgn] and dd.[date_id])

Fiscal Quarter to Date

select dd.[date_id], qtd.[Date_id] qtd_date_id
from [Dim_Date] dd
join [Dim_Fiscal_Quarter_Year] fqy
    on (dd.[Fiscal_quarter_year_nr] = fqy.[fiscal_quarter_year_nr])
join [Dim_Date] qtd
	on (qtd.Date_id between fqy.[fiscal_quarter_year_bgn] and dd.[date_id])

Fiscal Year to Date

select dd.[date_id], ytd.[Date_id] ytd_date_id
from [Dim_Date] dd
join [Dim_Fiscal_Year] fy
	on (dd.[Fiscal_year_nr] = fy.[fiscal_year_nr])
join [Dim_Date] ytd
	on (ytd.Date_id between fy.[fiscal_year_bgn] and dd.[date_id])

Prior Fiscal Year Day

The trick here is to calculate the number of days since the start of the year (for the given date) and then add that number to the start date of the prior fiscal year.

select d.[Date_id], pyd.Date_id pydate_id 
from [Dim_Date] d 
join [Dim_Fiscal_Year] fy 
	on d.Fiscal_year_nr = fy.fiscal_year_nr 
join [Dim_Fiscal_Year] py 
	on (py.fiscal_year_ix = (fy.fiscal_year_ix-1)) 
join [Dim_Date] pyd 
	on (pyd.Calendar_dt = DATEADD(D,Datediff(D,convert(date,CONVERT(varchar(10),fy.fiscal_year_bgn,101)), d.Calendar_dt),convert(date,CONVERT(varchar(10),py.fiscal_year_bgn,101))))
where fy.fiscal_year_bgn not in (-1,0)