Sourcing from a Webservice

This post continues my Mental Health Awareness Week project. Refer to the original post for background information about the goal of this project.

After deciding to build a box for each facility, I realized I never actually had a unique ID column in my source spreadsheet.  That means I can't maintain a routine import/update from that spreadsheet.  So I need to find an alternative source of meta-data.

Pressing F12 in Chrome lets me inspect what's happening when I use the SAMHSA site.  Reloading the page and running the search for Florida again lets me see all of the end-points (web URLs) this User Interface touched.  Hovering over the entries gives me insight into the nature of the network traffic..

2017-09-20_8-52-59.png

I can see with this one entry that Chrome received back a listing of facilities in Json form.  It was made over a secure connection and took 534 ms seconds to retrieve.  The response tab provides me with the full content of the request's response, which I copied to my clipboard.

2017-09-20_8-58-51.png

I then opened the Online JSON Viewer website in a new tab, pasted the response, and clicked the viewer tab.  It gave me a good, quick visualization of this data.

2017-09-20_9-04-43.png

I've found my unique ID in the "frid" property of each "row" of the results.  I also can see that there are 946 results spanning across 32 pages of results.  I can also see that there are properties on the rows which are misleading, such as: _irow, _rclass, rnum.  I should really take note of that or ensure my solution allows me to exclude the visibility of those properties. 

I copied the URL for the original network event and removed the paging feature.  That results in a single web request for all the Florida facilities being returned in 4.3 seconds.  This will become my source of facility boxes (for the foreseeable future).

After flipping over to DataPort I realize I don't have any way to routinely import from Json.  No worries!  I took my last DataPort ImportDataFormatter and whipped up a new Json importer.

2017-09-21_21-08-07.png

I've now got some much needed functionality:

  1. I can select Json as a potential import source
  2. I can specify a webservice end-point
  3. I can specify a path to the records in the Json data
  4. I can build up a transformation map between the Json data and fields in Content Manager
  5. I can prevent the exposure of properties from the Json data
  6. I can format data to match CM requirements

When I run it via DataPort it behaves as expected....

2017-09-21_21-08-06.png

The boxes created during the initial import will now be updated or added to, without me having to do anything.  Furthermore, I can now import data from any Spreadsheet or Webservice I come across.  And all this cool DataPort work has me thinking bigger.

I'm now ready to start finding records to place within these boxes.  Before I start visiting websites of regulators, facility operators, and governmental bodies, I worry how I'm going to capture all the things I come across.  I might need some new tools to help me import in bold new ways......

 

Importing Spreadsheets (regardless of format)

This post continues my Mental Health Awareness Week project. Refer to the original post for background information about the goal of this project.

So now I have a very rudimentary dataset designed and ready to be populated with data.  I've also got a spreadsheet from the SAMHSA website, which contains all of the facility's meta-data.  There's just one small hitch for me personally: I don't have Excel, just Google Sheets.  I can upload it to Google Sheets, export it to a Tab Delimited file, and then import that file via DataPort.  

Later on down the track I'll receiving additional spreadsheets from different sources, so this work process actually kind of sucks.  I need a way to import a spreadsheet into Content Manager, but without needing to convert files before importation.  Also, who knows what format the files might be in.... Excel 2007, Excel 2013, OFD, CSV, etc.  

In my first post on this blog I created a custom import data formatter that automatically added a Byte Order Mark to Tab delimited files.  It was crazy easy to create that thing.  The same approach I took there can be taken here: build something within the DataPort framework that solves my problem.  

So I took 30 minutes and created a new spreadsheet data import formatter by using the OpenXml and Spreadsheet Light libraries.  This gives me a new option when using DataPort, as shown in the image below.

2017-09-14_11-44-55.png

After I've selected the newly available File Format of Spreadsheet and picked a file from my computer, DataPort shows me all of the available columns of data (this does require that the file have a header row with column names).  I can then match my columns just like I would for the out-of-the-box Tab Delimited Data Formatter.  Again, this file can be in any spreadsheet format!

2017-09-14_11-44-55.png

Once executed, DataPort will even show me exactly how many rows and columns were in the spreadsheet.  It then processes it just like it would for a tab delimited file.  The image below shows what the output looks like in DataPort.

2017-09-14_11-44-55.png

Yay!  I've not got some facility boxes in my CM instance.  :)

2017-09-14_11-44-55.png

Designing a CM Dataset

This post continues my Mental Health Awareness Week project. Refer to the original post for background information about the goal of this project.

It's time to configure my empty dataset!  This means using the CM features to structure my data in a way that makes it easy to find records for a mental health facility.  To me that means I must decide: do I create boxes or not?  To decide this I must analyze my data, the records, and CM features/behavior.

My sources of data include:

  1. A list of provider facilities (in excel)
  2. Florida Department of Children and Family regulatory records
  3. Images, forms, and documents from facility websites
  4. News articles culled from Google News

Since the ultimate goal is to relate the records to the facilities, we need to store the facility meta-data somewhere.  The meta-data includes: ID number, name, address, owner, and website.  Where do I store that?  The design of CM gives me three immediate, obvious options: a location, a classification, or a record type.  I've listed them in that order because that's the progressive order of building blocks within CM.  

  • A location defines a historical person, place, or thing.  There are pre-defined properties on a location for a name, address, ID number, and website.  When a record is created the location can be attached.  Users can then find all records by searching for the usage of the location.
  • A classification is akin to the dewey decimal system: a hierarchical taxonomy used to organize information.  There are pre-defined properties for the owning location, retention schedule, and security; all of which are copied to any new record associated with a record.  If each facility is created as a classification, then users can find all records by searching within the classification.
  • A record type is the definition of a type of record (go figure!).  They can be configured to use specific locations, classifications, meta-data fields, numbering patterns, and security.  They also have various behavioral settings/options, which dictate how the system reacts in certain situations.  If each facility had it's own record type, users can find all records by searching for the specific record type.

Thinking about the features as described above makes a few things obvious to me:

  1. Locations should be used to store the name of the facility owners, the governmental bodies & agencies (federal, state, and local), and nominated users of this system.
  2. Classifications should be used to organize the types of information I'll be gathering (regulatory, facility, publications, and reference material records).
  3. Record Types should be used to structure relationships between the given records.

As the title of this post indicates, I need to focus in on that last thought: structure.  I could easily store the facility ID on each individual record; and, as a result, have a flat structure with no nesting or containment.  The record's meta-data registration form could require the entry of the facility ID, as shown below.

2017-09-14_11-44-55.png

This approach limits my ability to relate the document to a second facility though.  So this approach just wouldn't work for this situation.  Instead, I'd swap out the Facility Id field for the Container field and then add the facility ID onto the container, as shown below. I can use the alternative containment relationship (not shown here) to relate a document to multiple facilities.  

2017-09-14_11-44-55.png

I've established that each facility should have at least one facility container.  Is that enough though?  How do I distinguish between regulatory records and facility records? Classifications and record types are my two options here.  If I create a few classifications (known as Categories in the US Commons lexicon), then I can require each document attach a predefined classification.  An example of this approach is shown below.

2017-09-14_11-44-55.png

With this approach I, as the user, must pick this for each document.  That's a lot of extra work at the document level.  And as the above screenshot shows, there are going to be many different types of regulatory records.  If I don't want to have to pick it per document then I need to move that information (the classification) somewhere else.  If I move the field up in the structure that would mean adding the classification to the folder.  

2017-09-14_11-44-55.png

So now, during document import, I would find the facility container matching the type (classification) of the record.  Using the above screenshot I can see that I'd need at least three containers for each facility: regulatory, facility, and news records.  The common thing between them would be the facility Id.  Clicking on a point on the map should result in a search for all the matching containers, each containing 0+ documents.  

If I stop here and don't progress further with the design I would have a viable solution.  However, it requires me to always search in order to find all records for a given facility.  No Bueno!  I really would like to be able to find a facility and browse within it.  That means doing to the container the same thing I did to the document: move the facility Id up one level.

2017-09-14_11-44-55.png

Voila!  Each facility will be created as a box, which will contain folders.  Those folders will then contain documents.  This is the most common design approach within CM.  As the solution progresses it may be necessary to tweak the design... but for now I can move on!