Healthcare Analytics with Cerner: Part 2 - Cerner Millennium Data Model

This is the second part of my blog series dedicated to healthcare analytics with Cerner.
In the first part we looked at different options for extracting data from Cerner, and this time we will focus on the Cerner Millennium Data Model.

If you do not have an account at ucern.com yet, go ahead and open one now (customers only). You will find a great community and a lot of useful resources which I will be referencing in my posts.

The Cerner Millennium database (powered by Oracle RDBMS, or more often Oracle RAC) is the core element of the Cerner Millennium 3-tier architecture (Database - Middleware - Client). All Cerner apps and modules rely on data stored in the Cerner Millennium database, and great efforts are made to keep it fast, reliable, secure and highly available. It is important to keep this in mind next time you query Cerner, and to minimize the impact of your work on this critical piece of infrastructure.

This is what the Cerner Millennium Data Model looks like:

Looks scary, huh? Well, this picture shows only a fraction of the thousands of tables, and every table has dozens or even hundreds of columns. Who in the world can comprehend and learn something like that? The good news: you can! The bad news: it is not going to be an easy task, and I hope my post will help you navigate this ocean of tables.

Core tables

Since Cerner's ocean of tables is so vast, we have to start with a pond (to continue the analogy) and slowly fill it with core tables you will be using over and over again. Cerner does not define these core tables in the documentation, but I am going to share my own list, based solely on my experience and the projects I've worked on. Knowing the relationships between these core tables and the basic data elements they store certainly helps you understand the data model and makes your job easier.

The Cerner Millennium database is a relational OLTP database in 3rd normal form (with some tables being exceptions to this rule). Cerner calls its database design "person-centric" because

at its core are individuals-patients and associated information that is broader than a single episode or encounter of care.

In my experience, though, it is all about patient encounters, at least for analytics. The majority of the tables you will be using have a direct or indirect reference to the patient encounter table (via the encntr_id key), and clinical staff often refer to the patient FIN number, which is a financial encounter number tied to a unique patient visit and organization.

For this reason, let's start with the encounter table.

It is all about encounters.

The encounter table is the starting point for your analysis. All patient encounters (visits) are stored here and constantly updated with events like registration, admission, transfer to the ED or nursing units, discharge, etc. Since the data model is in 3rd normal form, you won't find all the little details about these events here, but you will use this table to pull all the other details you need. Charges and payments are tied to encounters, as are lab results, orders, drug administrations, alerts, vital signs, physician notes, diagnosis codes and other classifications (ICD, CPT, DRG, SNOMED etc.) and the discharge summary. Take some time to study this table, and the best way, of course, is to look at a sample of data. The Oracle SQL query below will pull all the encounters for patients discharged in the last 3 days:

-- Pull all the encounters for patients discharged in the last 3 days:
SELECT *
FROM encounter
WHERE
   disch_dt_tm > SYSDATE - 3
;

I am using Oracle SQL, not CCL, here and further down in the text. I mentioned in the first part of this series that, in our experience, CCL is not the best choice for BI/Analytics projects, but you can easily modify these examples to use CCL instead. CCL uses a flavor of Oracle SQL syntax, and a lot of the time you can run Oracle SQL code as if it were CCL.

If you have not decided on a good SQL client yet, I highly recommend the free and very capable Oracle SQL Developer. And if you go that route, subscribe to the That Jeff Smith blog and read most of his posts so you can quickly master it. The one feature you need to get into the habit of using is Explain Plan (memorize the F10 shortcut); spend some time learning how to read it and tune your SQL queries. Physicians and nurses will thank you (even if they do not know you), as you will be helping to keep your Cerner Millennium database running fast.

Open the data model and find the encounter table there. Now, with a sample of data and the data model in front of you, you are equipped to start learning. You won't understand much at this point, and that is okay. You will come back to this again and again, learning more every time.

The great thing about the Cerner Data Model is that Cerner's data architects put a lot of thought (and I bet time!) into designing tables that follow certain patterns and use consistent naming conventions, which will make your job a lot easier. Every new table you add to your arsenal will look familiar thanks to these design patterns and naming conventions. There are some exceptions and, as I've read on the uCern forums, they normally happen when Cerner development teams do not involve the data architects. Also, keep in mind that the Cerner database was designed with high transaction concurrency in mind - it was never built as an analytical data store. If you are thinking about a data acquisition and analytics strategy, I highly recommend building your own EDW or using a more affordable and agile architecture with QlikView QVD extracts and Qlik Marts. It might be very tempting to keep getting data directly from Cerner without putting a good strategy in place, but as your analytics needs grow it will come back to bite you - I promise!

Back to the patterns and naming conventions. Using the encounter table as an example, you will see that columns are named consistently:

  • Columns use the underscore notation traditional for Oracle, such as person_id, disch_dt_tm, encntr_type_cd etc. Oracle column names are case-insensitive and you will see them upper-cased in the documentation, but I prefer to use lower case for readability (I like to upper-case SQL statements and functions).

  • Every column name has a standard suffix to help identify the meaning of the value stored in it:

  • _id: Primary key on the table (encntr_id) or a foreign key to another table (e.g. person_id or organization_id).

  • _cd: Code value (a whole number). This value can be looked up in the code_value table, which contains the textual descriptions. For example, the encntr_type_cd column contains values that you can look up in the code_value table and translate to the Inpatient or Outpatient encounter type. The encntr_type_class_cd column classifies patients into more general groups than the encounter type (e.g. emergency, recurring outpatient). We will look at the code_value table in a bit as it deserves more attention.

  • _dt_tm: A column with a date/time value. We will look at how Cerner handles date and time later. For example, reg_dt_tm captures when a patient was registered or admitted.

  • _ind: Yes (1) or No (0) value. For example, active_ind column contains 1 for active rows. More about active rows and effective dates later.

  • _flag or _flg: Numeric flags - you can look up the exact meaning of the flag values in dm_flags table.

  • _key or _key_nls: Textual data (a name or description) which is upper-cased with special characters and spaces removed; used for indexing and search. The NLS-formatted version is stored in the _key_nls column.

Almost all tables have internal audit columns that track who created a row and when (either a person or an application), who updated it and when, and whether the data was created through an authenticated or unauthenticated interface (HL7 ADT feeds). I will just list these columns below and you can check them in the data model documentation:

  • create_dt_tm and create_prsnl_id
  • updt_dt_tm, updt_applctx, updt_cnt, updt_id, updt_task
  • data_status_dt_tm, data_status_cd, data_status_prsnl_id

updt_dt_tm is the one I used to use for incremental loads, but Cerner recently came up with a better technique that we are in the process of testing.
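
For illustration, here is a minimal sketch of how such an incremental pull might look, assuming you track the high-water mark of the previous extract yourself (the :last_extract_dt_tm bind variable below is hypothetical - it would come from your own ETL metadata):

-- Minimal incremental-extract sketch: pull rows touched since the last run
-- (:last_extract_dt_tm is a bind variable you supply from your ETL metadata)
SELECT e.*
FROM encounter e
WHERE
      e.updt_dt_tm > :last_extract_dt_tm
  AND e.active_ind = 1
;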

Now that you've learned about some basic elements and naming conventions, you can start writing your first queries against the encounter table, like the ones below:

-- How many patients were registered in the past 3 days?
SELECT
  COUNT(e.encntr_id) as "Patients"
FROM encounter e
WHERE
      e.reg_dt_tm > sysdate - 3
  AND e.active_ind = 1
;

-- Count patients discharged in the last 30 days by discharge disposition
SELECT
  cdd.DISPLAY as "Discharge Disposition",
  COUNT(e.encntr_id) as "Patients"
FROM encounter e
INNER JOIN code_value cdd
  ON cdd.code_value = e.disch_disposition_cd
WHERE
      e.disch_dt_tm > sysdate - 30
  AND e.active_ind = 1
GROUP BY
  cdd.DISPLAY
;

Got a code value? Look it up!

The code_value table is a very important table and you will use it a lot. It stores the data dictionary, or reference values, for all the columns with the _cd suffix. Cerner promotes 3rd normal form database design and keeps most of the captions for descriptive elements in this table. Some examples are gender, race, status code, discharge disposition, organization type, nursing unit type etc.

code_value values are grouped into related sets, so codes relevant to vital signs or encounter classes will belong to the same set, which is identified by the code_set column and described in the code_value_set table.
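
To get a feel for what lives in a set, you can simply browse one. The sketch below lists the active codes in code_set 19 (discharge dispositions), which we will use again later in this post:

-- Browse all active codes in a given code set
-- (code_set 19 - discharge dispositions - is used as an example later in this post)
SELECT
  cv.code_value,
  cv.display,
  cv.description,
  cv.definition,
  cv.cdf_meaning
FROM code_value cv
WHERE
      cv.code_set = 19
  AND cv.active_ind = 1
ORDER BY cv.display
;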

There are two typical scenarios for using the code_value table:

  • Look up a description for a given code. We've looked at some examples with the encounter table codes earlier. There are three description columns available on the code_value table - you can use the display column (a short caption that users would normally see in the applications), the description column (a longer, more descriptive version of display) or the definition column (a long definition). Normally you would want to use the display value.

  • Filter data using a specific code (e.g. only final coded ICD) or a group of codes (e.g. inpatient and outpatient encounters). This one is a bit trickier. You do not want to hard-code the actual code value by doing something like WHERE encounter.disch_disposition_cd = 123, because hard-coding is bad (every developer knows that, right?). If your organization runs multiple Cerner Millennium instances, code values might be different across domains. You might think of using the actual display value, WHERE cdd.display = 'Home - 01', but this is even more unreliable because someone can change that description tomorrow and your code will stop working properly.

A better practice is to pull that code indirectly, using the code_set and cdf_meaning columns from the code_value table:

-- Number of patients discharged to home for the past 30 days
SELECT
 count(e.encntr_id) as "Patients Discharged Home"
FROM encounter e
INNER JOIN code_value cdd
  ON    cdd.code_value = e.disch_disposition_cd
    AND cdd.code_set = 19
    AND cdd.cdf_meaning = 'HOME'
WHERE
      e.disch_dt_tm > sysdate - 30
  AND e.active_ind = 1
;

In this example, I first figured out that codes describing discharge dispositions belong to code_set 19, and that the code with the display value Home - 01 has cdf_meaning = HOME. Even if someone changes the display value tomorrow to something like Sent home - 02, your code will still work properly, because the code_set and cdf_meaning values cannot really be changed. It is also safer if you run multiple instances of Cerner. By the way, the code_set value corresponding to a column can be found in the data model document.

Another interesting table is code_value_alias. Think of it as a mapping table that stores the original organization-specific codes, which are then conformed into a single code in the code_value table. If you have 10 hospitals and every one has its own code for patient gender (Male vs. M or 1), your build team would map all these codes to a single trusted value/description, which you pull from the code_value table. I do not really use this table that much, but it is interesting to know how data is fed from all the other organizations into your Cerner Millennium via HL7 feeds. Talk to your interface team to learn more about this - it is fascinating what these guys are doing with products like Infor Cloverleaf.
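
If you are curious, a sketch like the one below lets you peek at those mappings. Treat it as an assumption rather than gospel - the exact columns on code_value_alias (alias, contributor_source_cd) and the code set used here (57, which I believe is sex/gender) should be verified against your own data model and domain:

-- Peek at source-system aliases mapped to a conformed code value
-- (columns on code_value_alias and code_set 57 assumed - verify in your data model)
SELECT
  cv.display   AS "Conformed Value",
  cva.alias    AS "Source Alias",
  src.display  AS "Contributor Source"
FROM code_value_alias cva
INNER JOIN code_value cv
  ON cv.code_value = cva.code_value
LEFT JOIN code_value src
  ON src.code_value = cva.contributor_source_cd
WHERE
  cva.code_set = 57
;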

Speaking of aliases

Very often your clinical and financial folks will refer to a specific patient by FIN number (financial encounter number) or MRN/CMRN (Medical Record Number specific to an organization, or community MRN). Cerner also stores PHI data like SSN, driver's license number etc. This data is stored in the encntr_alias and person_alias tables, which can be joined to the encounter table by encntr_id and to the person table by person_id.

Here are a few examples just to give you an idea - the actual code values and cdf_meaning values might be different on your system:

-- MRN of a patient
SELECT
 p.name_full_formatted AS "Patient Name",
 pa.alias              AS "Patient MRN"
FROM person p

INNER JOIN person_alias pa
    ON  pa.person_id = p.person_id
    AND pa.end_effective_dt_tm > SYSDATE
    AND pa.active_ind = 1

-- restrict to the MRN alias type (code set 4)
INNER JOIN code_value cvat
    ON  cvat.code_value = pa.person_alias_type_cd
    AND cvat.code_set = 4
    AND cvat.cdf_meaning = 'MRN'

WHERE
    p.person_id = 21024095
;
-- Patient Encounter FIN number
SELECT
  e.encntr_id,
  ea.alias AS "Patient FIN"
FROM encounter e

INNER JOIN encntr_alias ea
    ON  ea.encntr_id = e.encntr_id
    AND ea.end_effective_dt_tm > SYSDATE
    AND ea.active_ind = 1

-- restrict to the FIN alias type (code set 319)
INNER JOIN code_value cvea
    ON  cvea.code_value = ea.encntr_alias_type_cd
    AND cvea.code_set = 319
    AND cvea.cdf_meaning = 'FIN NBR'

WHERE
    e.encntr_id = 97611312
;

Notice how we limited the data on the person_alias and encntr_alias tables. Some tables need these additional filters to pull only the current rows, and the best way to do that is to use the active_ind and end_effective_dt_tm columns. Unfortunately, this is not something documented by Cerner - you have to check for yourself and learn from experience.

But I want more data!

Knowing the patient encounter data, you can bring in more data for your analysis by joining encounter to other tables using the appropriate foreign keys (see the join sketch after this list):

  • person table contains patient data like name and date of birth.
  • prsnl table holds data for personnel, and there is an alias table prsnl_alias (to look up a provider NPI, for example).
  • person_prsnl_reltn links personnel related to a patient. For example, this table can be used to look up a patient's primary care physician.
  • encntr_prsnl_reltn can be used to look up Admitting or Attending physicians.
  • address and phone tables are used to store various types of addresses and phone numbers for patients and physicians.
  • organization table contains details about hospitals (facilities), outpatient practices, vendors, payers etc.
  • location, nurse_unit, room and bed tables can help you determine the exact location of your patient in the hospital, down to a room and a bed.
  • orders, order_catalog, order_action, order_detail and order_ingredient are used to pull orders, alerts, drug administrations and procedures.
  • diagnosis and nomenclature tables keep various classifications like CPT and ICD and can be tied to an encounter. For example, you can pull all the ICD-10 codes which are final coded, or only the ones present on admission.
  • pathway and pathway_catalog store details about PowerPlans.
  • clinical_event table (my personal favorite, which deserves a dedicated blog post!) accumulates thousands of events per patient encounter for every day of stay, such as vital signs, drug administration times, lab results and so much more. It is huge as well!
  • there are quite a few tables specific to revenue cycle, GL, radiology, the emergency department, labs, etc.
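
To make this concrete, here is a sketch of a typical join that decorates recent encounters with patient demographics and the current nurse unit. Column names such as loc_nurse_unit_cd and birth_dt_tm follow the naming conventions discussed above, but verify them against your data model:

-- Sketch: recent discharges with patient name, birth date and nurse unit
-- (loc_nurse_unit_cd / birth_dt_tm assumed per naming conventions - verify in the data model)
SELECT
  e.encntr_id,
  p.name_full_formatted AS "Patient Name",
  p.birth_dt_tm         AS "Date of Birth",
  nu.display            AS "Nurse Unit",
  e.disch_dt_tm         AS "Discharged"
FROM encounter e
INNER JOIN person p
  ON p.person_id = e.person_id
LEFT JOIN code_value nu
  ON nu.code_value = e.loc_nurse_unit_cd
WHERE
      e.disch_dt_tm > SYSDATE - 3
  AND e.active_ind = 1
;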

Terry Byrne put together a great list here describing commonly used tables and columns, and whether to filter inactive rows on certain tables. I wish I had this document 5 years ago when I was just starting to learn Cerner - or, better yet, I wish Cerner had something similar.

Ok, but how do I find the data I need?

We have just scratched the surface, and while I cannot teach you how to acquire every single data element you need, I wanted to show you the basics to get you started. The reality is that tracking down a specific table and column is not an easy task. I wish there were a magical document that mapped application interface elements and controls to the data model, but such a document does not exist (yet?).

In the meantime here are some options for you:

  • Download the Cerner Millennium Data Model here. Start with the tables we've looked at and read the help page that comes with it. Note the references to PDF files (ERDs), which are extremely helpful for figuring out the relationships between tables visually. Yes, it will be overwhelming at first, but you do not have to understand everything. It is not a perfect document, but I am really thankful we have it. Kudos to Tracy Boyd, Bob Ross and others at Cerner who made it possible and continue sharing and helping the Cerner user community.

  • Search all data model files for a specific keyword - text editors like Notepad++ or my new favorite, Visual Studio Code, allow you to quickly search through all the files in a folder. If you are lucky you might find what you need, and then you can test your guess.

  • Another handy trick: you can find Cerner table and column descriptions, along with some useful metadata, if you query the dm_tables_doc and dm_columns_doc tables (see the sketch after this list). I do not know if these tables are kept up to date, though.

  • Go to the uCern CCL community forum and search for answers or post your question there. Or, better yet, find CCL experts in your organization.

  • Reverse-engineer PowerInsight or DA2 queries as I mentioned in Part 1 of the blog series.

  • You might have some luck asking your Cerner DBAs to run a database profiling tool that outputs SQL trace logs while you press buttons in PowerChart in a controlled test environment.

  • I've read about, but never used myself, an exotic thing called the "Millennium Troubleshooting Assistant (MTA.exe)" that can supposedly gather logs on activity and requests. CCL will be part of these logs, which means you can reverse-engineer it to figure out what tables and columns were hit.

  • Finally, you can always open an SR (Service Request) with Cerner to request assistance, but I am not sure if this type of support is offered to all customers.
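
If you go the dm_tables_doc / dm_columns_doc route mentioned above, a keyword search can be as simple as the sketch below. I am assuming table_name, column_name and description columns here - check the actual structure on your instance before relying on it:

-- Keyword search across table/column documentation
-- (column names on dm_columns_doc assumed - verify on your instance)
SELECT
  dcd.table_name,
  dcd.column_name,
  dcd.description
FROM dm_columns_doc dcd
WHERE
  UPPER(dcd.description) LIKE '%DISCHARGE DISPOSITION%'
ORDER BY dcd.table_name, dcd.column_name
;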

Triple check your data

Now that you have your query, it is important to validate it. Do at least a few random tests with boundary cases and a few with typical cases. Make sure to check the data against Cerner apps like PowerChart. Your users will expect the data to match exactly.
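
For example, a quick aggregate like the one below gives you daily counts that are easy to eyeball against a report or census screen in the application:

-- Daily discharge counts for a quick sanity check against the application
SELECT
  TRUNC(e.disch_dt_tm) AS "Discharge Date",
  COUNT(e.encntr_id)   AS "Patients"
FROM encounter e
WHERE
      e.disch_dt_tm > SYSDATE - 30
  AND e.active_ind = 1
GROUP BY TRUNC(e.disch_dt_tm)
ORDER BY 1
;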

I hope you learned something new today.

Next time I will discuss some common data acquisition and ETL challenges. Stay tuned!