Pointing MicroStrategy at the Ontario Energy Board

If you read yesterday's post, then you know it's possible to point MicroStrategy at Content Manager via the ServiceAPI.  It's an easy win for an initial effort.  I was using data from the ridiculously unusable demonstration database.  What's the point in shipping a demonstration database that lacks any real data?

I turned my sights northward, to the frozen land we colloquially call our "attic" (see: Canada).  There I found the Ontario Energy Board (OEB), which makes available lots of data via the ServiceAPI.  So I pointed MicroStrategy there to see what could be accomplished.

Here's a sequences sunburst, which quickly breaks out applicants by volume of submissions...

2018-02-21_12-48-17.png
2018-02-21_12-49-34.png

Then I created a data grid with a selector for the applicant, which provides a quick view of submitted documents.  Changing the applicant changes the contents of the grid...

Then I moved on to creating a visualization based on the record dates.  Three dates are exposed: Date Issued, Date Received, and Date Modified.  I'd like to show a timeline for each applicant and the types of documents they are submitting.  I decided to leverage Date Received and Date Issued in the Google timeline visualization shown below.

2018-02-21_13-11-23.png
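For reference, records coming back from the ServiceAPI carry their dates as structured values, which is what lets MicroStrategy treat them as real date attributes.  The sketch below is illustrative only — the property names and response envelope are from memory and will vary with your version and configuration:

```
{
  "TotalResults": 1,
  "Results": [
    {
      "Uri": 9000000001,
      "RecordTitle":        { "Value": "Application for Rate Change" },
      "RecordDateReceived": { "DateTime": "2018-01-15T00:00:00", "IsClear": false },
      "RecordDateIssued":   { "DateTime": "2018-01-10T00:00:00", "IsClear": false }
    }
  ]
}
```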

Yay!  Some fun visuals!

 

Connecting MicroStrategy to CM

In this post I'll show how to use the free MicroStrategy Desktop application to expose record destruction information.  As shown below, we'll start this process by launching the application and creating a new Dossier...

2018-02-20_9-34-57.png

Next I want to add some data from Content Manager into this dossier...

2018-02-20_9-36-14.png

Here I'm going to select "Data from URL" so that I can ensure security is respected.  If I were to use "Databases" and connect directly to the backend RDBMS, I would be accessing record data without any security validation.  Instead, I'll point to the ServiceAPI (which will be configured to honor security).

2018-02-20_9-37-06.png

Now I'll enter the URL to the ServiceAPI record end-point.  For authentication, I'll use Windows (though you may need to configure this in a different manner).  Then I click Finish to complete this step.

2018-02-20_9-52-58.png

Now all of my properties from CM are listed in the datasets pane...

2018-02-20_9-45-33.png

Next I'll switch my visualization to a pie chart...

2018-02-20_9-55-03.png

Then I'll add owner as a data element...

2018-02-20_9-56-00.png

Next I'll filter out records which cannot yet be destroyed...

2018-02-20_9-57-40.png

Now I'll create a metric, so that the pie slices are proportional...

2018-02-20_10-06-46.png

I'll configure the metric to be the number of unique IDs for a given owner in the visualization...

2018-02-20_10-08-08.png

Lastly, I'll drag the ownership count into the angle property of the visualization...

2018-02-20_10-10-27.png

Voila!  This is a rather simple example, but it should demonstrate the capabilities.  If you take the same approach shown in my aggregating retention data series, then you can see how to expose ServiceAPI endpoints so that your MicroStrategy users do not have to construct query parameters.

Vigilante Archivist (part 1)

After reading a few news articles (1,2,3) about shady drug treatment facilities, I've decided to see how to go about archiving, analyzing, and monitoring how these facilities promote their activities.  I'll accomplish this with a mixture of Content Manager, PowerShell, Node.js, Tesseract-OCR, and archive.org.

Before building anything new, I did some googling to find out what sort of solutions already exist for archiving websites.  I found archive.org and saw that it already performs this task.  No point in re-inventing the wheel, right?

If I look in Content Manager, I can see that a box was created for one of the facilities referenced in the articles...

2018-02-19_8-42-57.png

So I flip over to the web archive and search for the website...

2018-02-19_8-42-02.png

If I click on any given day, then I can see what the website looked like on that date...

2018-02-19_8-45-23.png

Sweet... so I don't necessarily have to write my own crawler to index sites.  I'll just need a script that checks each website and stores the last date it was indexed by web.archive.org.  Eventually this script may do other things, and it will be scheduled to run once daily.
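The script leans on the Wayback Machine availability API, which returns the snapshot closest to the requested date.  A typical response looks roughly like this (the timestamp and URL below are illustrative):

```
{
  "url": "example.com",
  "archived_snapshots": {
    "closest": {
      "available": true,
      "status": "200",
      "timestamp": "20180219084500",
      "url": "http://web.archive.org/web/20180219084500/http://example.com"
    }
  }
}
```

When a site has never been captured, "archived_snapshots" comes back empty, which is exactly the condition the script tests for.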

2018-02-15_20-12-26.png

After running the script I can see the changes within the dataset...

2018-02-19_8-47-18.png

I found that 347 of the 1,100 facilities have no historical snapshots on the web archive.  A few disallow crawlers outright, preventing web.archive.org from indexing them.  For instance, the first box in the image above placed a "robots.txt" file at the root of its site.
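For reference, a robots.txt that blocks every crawler from an entire site is as minimal as:

```
User-agent: *
Disallow: /
```

The Wayback Machine has historically honored these directives, which is why such sites show no snapshots at all.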

2018-02-19_8-50-43.png

Since roughly 25% of the facilities can't be archived in this manner, I need to explore other options.  For now I can start analyzing these sites on a more routine basis.  I can also request that web.archive.org index these sites as I detect changes.
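Web.archive.org supports on-demand captures: requesting a URL of the form below (the facility address is a placeholder) asks the Wayback Machine to take a fresh snapshot of the page.

```
https://web.archive.org/save/http://example-facility.com
```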

 

In the meantime, here's the PowerShell script I used for this task...

Clear-Host

#Import the .Net SDK for Content Manager
Add-Type -Path "d:\Program Files\Hewlett Packard Enterprise\Content Manager\HP.HPTRIM.SDK.dll"

#Instantiate a connection to the default dataset
$Database = New-Object HP.HPTRIM.SDK.Database
$Database.Connect()

#Names of the record type and custom fields used below
$FacilityRecordTypeName = "Facility"
$FacilityWebsiteFieldName = "Facility Website"
$FacilityLastSnapshotFieldName = "Facility Last Snapshot"

#Resolve the custom field definitions by name
$FacilityWebsite = [HP.HPTRIM.SDK.FieldDefinition]$Database.FindTrimObjectByName([HP.HPTRIM.SDK.BaseObjectTypes]::FieldDefinition, $FacilityWebsiteFieldName)
$FacilityLastSnapshot = [HP.HPTRIM.SDK.FieldDefinition]$Database.FindTrimObjectByName([HP.HPTRIM.SDK.BaseObjectTypes]::FieldDefinition, $FacilityLastSnapshotFieldName)

#Find every facility record that has a website
$Facilities = New-Object HP.HPTRIM.SDK.TrimMainObjectSearch -ArgumentList $Database, Record
$Facilities.SearchString = "type:[name:$($FacilityRecordTypeName)] and $($FacilityWebsite.SearchClauseName):*"

$RowNumber = 0
foreach ( $Record in $Facilities ) {
    $RowNumber++
    $Facility = [HP.HPTRIM.SDK.Record]$Record
    Write-Progress -Activity "Checking Records" -Status "$($Facility.Number) - $($Facility.Title)" -PercentComplete (($RowNumber/($Facilities.Count))*100)

    #Ask the Wayback Machine for the snapshot closest to today
    $Website = $Facility.GetFieldValueAsString($FacilityWebsite, [HP.HPTRIM.SDK.StringDisplayType]::Default, $false)
    $WayBackUrl = "http://archive.org/wayback/available?url=$($Website)"
    try {
        $Response = Invoke-RestMethod $WayBackUrl
        if ( $null -ne $Response ) {
            if ( $null -eq $Response.archived_snapshots.closest ) {
                #No snapshot exists, so store an empty date in the custom field
                $Facility.SetFieldValue($FacilityLastSnapshot, (New-Object HP.HPTRIM.SDK.UserFieldValue(New-Object HP.HPTRIM.SDK.TrimDateTime)))
                $Facility.Save()
            } else {
                #Store the timestamp of the most recent snapshot
                $Facility.SetFieldValue($FacilityLastSnapshot, (New-Object HP.HPTRIM.SDK.UserFieldValue($Response.archived_snapshots.closest.timestamp)))
                $Facility.Save()
            }
        }
    } catch {
        Write-Host "        Error: $($_)"
    }
}