Importing Spreadsheets (regardless of format)

This post continues my Mental Health Awareness Week project. Refer to the original post for background information about the goal of this project.

So now I have a very rudimentary dataset designed and ready to be populated with data.  I've also got a spreadsheet from the SAMHSA website, which contains all of the facility's meta-data.  There's just one small hitch for me personally: I don't have Excel, just Google Sheets.  I can upload it to Google Sheets, export it to a Tab Delimited file, and then import that file via DataPort.  

Later on down the track I'll receiving additional spreadsheets from different sources, so this work process actually kind of sucks.  I need a way to import a spreadsheet into Content Manager, but without needing to convert files before importation.  Also, who knows what format the files might be in.... Excel 2007, Excel 2013, OFD, CSV, etc.  

In my first post on this blog I created a custom import data formatter that automatically added a Byte Order Mark to Tab delimited files.  It was crazy easy to create that thing.  The same approach I took there can be taken here: build something within the DataPort framework that solves my problem.  

So I took 30 minutes and created a new spreadsheet data import formatter by using the OpenXml and Spreadsheet Light libraries.  This gives me a new option when using DataPort, as shown in the image below.

2017-09-14_11-44-55.png

After I've selected the newly available File Format of Spreadsheet and picked a file from my computer, DataPort shows me all of the available columns of data (this does require that the file have a header row with column names).  I can then match my columns just like I would for the out-of-the-box Tab Delimited Data Formatter.  Again, this file can be in any spreadsheet format!

2017-09-14_11-44-55.png

Once executed, DataPort will even show me exactly how many rows and columns were in the spreadsheet.  It then processes it just like it would for a tab delimited file.  The image below shows what the output looks like in DataPort.

2017-09-14_11-44-55.png

Yay!  I've not got some facility boxes in my CM instance.  :)

2017-09-14_11-44-55.png

Designing a CM Dataset

This post continues my Mental Health Awareness Week project. Refer to the original post for background information about the goal of this project.

It's time to configure my empty dataset!  This means using the CM features to structure my data in a way that makes it easy to find records for a mental health facility.  To me that means I must decide: do I create boxes or not?  To decide this I must analyze my data, the records, and CM features/behavior.

My sources of data include:

  1. A list of provider facilities (in excel)
  2. Florida Department of Children and Family regulatory records
  3. Images, forms, and documents from facility websites
  4. News articles culled from Google News

Since the ultimate goal is to relate the records to the facilities, we need to store the facility meta-data somewhere.  The meta-data includes: ID number, name, address, owner, and website.  Where do I store that?  The design of CM gives me three immediate, obvious options: a location, a classification, or a record type.  I've listed them in that order because that's the progressive order of building blocks within CM.  

  • A location defines a historical person, place, or thing.  There are pre-defined properties on a location for a name, address, ID number, and website.  When a record is created the location can be attached.  Users can then find all records by searching for the usage of the location.
  • A classification is akin to the dewey decimal system: a hierarchical taxonomy used to organize information.  There are pre-defined properties for the owning location, retention schedule, and security; all of which are copied to any new record associated with a record.  If each facility is created as a classification, then users can find all records by searching within the classification.
  • A record type is the definition of a type of record (go figure!).  They can be configured to use specific locations, classifications, meta-data fields, numbering patterns, and security.  They also have various behavioral settings/options, which dictate how the system reacts in certain situations.  If each facility had it's own record type, users can find all records by searching for the specific record type.

Thinking about the features as described above makes a few things obvious to me:

  1. Locations should be used to store the name of the facility owners, the governmental bodies & agencies (federal, state, and local), and nominated users of this system.
  2. Classifications should be used to organize the types of information I'll be gathering (regulatory, facility, publications, and reference material records).
  3. Record Types should be used to structure relationships between the given records.

As the title of this post indicates, I need to focus in on that last thought: structure.  I could easily store the facility ID on each individual record; and, as a result, have a flat structure with no nesting or containment.  The record's meta-data registration form could require the entry of the facility ID, as shown below.

2017-09-14_11-44-55.png

This approach limits my ability to relate the document to a second facility though.  So this approach just wouldn't work for this situation.  Instead, I'd swap out the Facility Id field for the Container field and then add the facility ID onto the container, as shown below. I can use the alternative containment relationship (not shown here) to relate a document to multiple facilities.  

2017-09-14_11-44-55.png

I've established that each facility should have at least one facility container.  Is that enough though?  How do I distinguish between regulatory records and facility records? Classifications and record types are my two options here.  If I create a few classifications (known as Categories in the US Commons lexicon), then I can require each document attach a predefined classification.  An example of this approach is shown below.

2017-09-14_11-44-55.png

With this approach I, as the user, must pick this for each document.  That's a lot of extra work at the document level.  And as the above screenshot shows, there are going to be many different types of regulatory records.  If I don't want to have to pick it per document then I need to move that information (the classification) somewhere else.  If I move the field up in the structure that would mean adding the classification to the folder.  

2017-09-14_11-44-55.png

So now, during document import, I would find the facility container matching the type (classification) of the record.  Using the above screenshot I can see that I'd need at least three containers for each facility: regulatory, facility, and news records.  The common thing between them would be the facility Id.  Clicking on a point on the map should result in a search for all the matching containers, each containing 0+ documents.  

If I stop here and don't progress further with the design I would have a viable solution.  However, it requires me to always search in order to find all records for a given facility.  No Bueno!  I really would like to be able to find a facility and browse within it.  That means doing to the container the same thing I did to the document: move the facility Id up one level.

2017-09-14_11-44-55.png

Voila!  Each facility will be created as a box, which will contain folders.  Those folders will then contain documents.  This is the most common design approach within CM.  As the solution progresses it may be necessary to tweak the design... but for now I can move on!

National Mental Health Awareness week is coming!

In honor of the upcoming National Mental Health Awareness week, I'll be focusing my blog post on the build-out of a new SAMHSA records portal powered by Content Manager.  There are no requirements and I'm not doing this for monetary gain, so I have free reign to do whatever I want!  I want to build something that both demonstrates the flexibility of Content Manager whilst also providing functionality not yet available to the public.

My home state of Florida has what we colloquially call the Sunshine Law.  This law requires all public records to be made available to citizens.  It, unfortunately, isn't so easy to actually find these records though.  Let's see if we can make it easier.

My goal is to re-create this:

2017-09-14_11-44-55.png

This is a site run by the federal government, so it naturally doesn't have any records from my home state.  What I'd like to do though is to gather as many public records as I can for each facility.  Hopefully to include: applications, financials, complaints, and news articles.  I envision clicking on a facility and seeing all of the records I can find out in the wild.  If I can code it to have a public records request submitted for the facility, even better.

Upcoming topics include:

  • Designing a dataset -- what should I create to support the end-result? 
  • Importing from Spreadsheets -- how do I take this list of facilities and create a structure?
  • Culling google news -- what's involved with dynamically searching google news for a given facility and automatically saving articles into CM?
  • Exposing a map in Webdrawer -- how can I achieve the above image within Webdrawer?
  • Finding Complaints -- these aren't available without a public records request, so I need to build something to automate the submission of a request per facility... and then how to import those into my instance

There's so much more that can be done, but I feel this is a good topic for learning and development.  Stay tuned!