Data Preparation

Model your data as a series of process observations or measurements associated with an outcome of interest.  Compose each observation as a common set of features (a.k.a. independent variables or factors) along with any associated outcome.  Both features and outcomes are numerical (dates, times, ages, etc.), binary (yes/no, true/false, etc.), or categorical (gender, service line, floor unit, shift, DRG, etc.).
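As a minimal sketch of this shape (the column names and values here are made up for illustration, not prescribed), each observation becomes one row of features plus its outcome in a comma-delimited file:

```python
import csv
import io

# One observation per row: a common set of features plus an outcome.
# Column names and values are illustrative only.
header = ["INDEX", "Shift", "ServiceLine", "AgeBand", "Outcome"]
rows = [
    ["R1", "Night", "Cardiology", "A65plus", "Readmitted"],
    ["R2", "Day", "Oncology", "A18to64", "NotReadmitted"],
]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(header)
writer.writerows(rows)
csv_text = buf.getvalue()
print(csv_text)
```

Numerical and binary features follow the same row shape; only the value types differ.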



Example Applications

Implementation simplicity with real-time insights for everyone


Example 1 - Anomaly Detection

The Anomaly Detection Engine (ADE) AI application enables anyone familiar with a comma-delimited file to discover anomalies and build tailored applications rapidly, repeatably, and at scale.

You provide the data.  The ADE provides a .csv file output readable by most analytical and business graphics tools.

Business Intelligence tools are great if you know the right questions to ask and have copious amounts of time to analyze all the statistical reports and dashboards.  However, you likely don’t.

The ADE provides an easy way to discover data anomalies without requiring you to know what to look for.

  • Quickly identify outliers in your data.
  • Find the events and feature values that may be influencing risk, costs, outcomes, quality, etc.
  • Visualize the specific factors within the data that contribute to anomaly classification.
  • Read the results directly with your business analytics platform.
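Because the output is plain CSV, it can be filtered or charted with any analytics tool, or with a few lines of script. A sketch using Python's standard library (the column names, including 'AnomalyFlag', are assumptions for illustration, not the ADE's documented schema):

```python
import csv
import io

# Illustrative ADE-style output: one row per observation, with an
# added column flagging anomaly classification (schema assumed).
ade_output = """INDEX,Group,Shift,CycleTime,AnomalyFlag
R1,LineA,Day,Normal,No
R2,LineA,Night,Slow,Yes
R3,LineB,Day,Normal,No
"""

reader = csv.DictReader(io.StringIO(ade_output))
anomalies = [row for row in reader if row["AnomalyFlag"] == "Yes"]
for row in anomalies:
    print(row["INDEX"], row["Shift"], row["CycleTime"])
```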

In today’s world of distributed systems, managing and monitoring a complex system’s performance or outcomes is a chore. With hundreds or even thousands of items to watch, anomaly detection can help point out where trouble is lurking while speeding root-cause analysis.  This is especially true if you can associate anomaly findings with outcomes of interest.

Sample Anomaly Detection on a simulated assembly line dataset.

  • Workflow processes can often be characterized by measuring milestone event timing, such as interarrival times in a process workflow, or qualitative factors in a production line. 
  • The ADE can identify process metric anomalies and their associated data values.
  • Use this information to improve overall quality measures by creating new specialized workflows or controlling process variables to reduce process event variability.
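As a toy illustration of the idea (not the ADE's actual scoring method), interarrival times can be derived from milestone timestamps and screened for outliers:

```python
import statistics

# Simulated milestone timestamps (seconds) along an assembly line.
timestamps = [0.0, 5.1, 10.0, 15.2, 19.9, 40.0, 45.1, 50.0]

# Interarrival times between consecutive milestone events.
gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]

mean = statistics.mean(gaps)
stdev = statistics.stdev(gaps)

# Flag gaps more than 2 standard deviations from the mean -- a simple
# stand-in for the engine's (unspecified) anomaly scoring.
anomalous = [g for g in gaps if abs(g - mean) > 2 * stdev]
print(anomalous)
```

Here the unusually long gap between the 19.9 s and 40.0 s milestones is the kind of process-timing anomaly worth investigating.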

Dataset (sample from 10,000 rows)


Anomaly Detection Engine User Interface


Instances of anomalous data are displayed in the results area.  

You may also display the factors associated with the 'anomaly' classification.


Example 2 - Classification/Prediction

Sample COVID-19 Critical Care Study for Predicting Need for Mechanical Ventilation

The ability to accurately predict the number of ventilators needed is an important aspect of managing this disease.  Machine learning can leverage the various data points needed to create a predictive model.  This study was based on statistical data reported in the ICNARC report on COVID-19 in Critical Care of 04-APR-2020.  Table 2 of that report characterizes the critical COVID-19 patients who needed 'Basic' respiratory intervention and those who needed 'Advanced' intervention (mechanical ventilation).  A patient dataset was synthesized (n=2000) from the published statistics, and a predictive model for the type of respiratory intervention was generated.


Cross-validation metrics for the learned multi-class classification model's ability to classify the need for advanced respiratory support


 Average MicroAccuracy: 0.823   (standard deviation 0.013, 95% confidence interval)
 Average MacroAccuracy: 0.758   (standard deviation 0.021, 95% confidence interval)
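For context on the two metrics: micro-accuracy counts every prediction equally, while macro-accuracy averages per-class accuracy so minority classes weigh as much as the majority class. A sketch with made-up labels (not the study's data):

```python
from collections import defaultdict

# Hypothetical true vs. predicted respiratory-support labels.
y_true = ["Basic", "Basic", "Basic", "Advanced", "Advanced", "Basic"]
y_pred = ["Basic", "Basic", "Advanced", "Advanced", "Basic", "Basic"]

# Micro-accuracy: fraction correct over every prediction.
micro = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Macro-accuracy: per-class accuracy, averaged so each class weighs equally.
per_class = defaultdict(lambda: [0, 0])  # class -> [correct, total]
for t, p in zip(y_true, y_pred):
    per_class[t][1] += 1
    per_class[t][0] += t == p
macro = sum(c / n for c, n in per_class.values()) / len(per_class)

print(round(micro, 3), round(macro, 3))
```

With an imbalanced class mix, the two can diverge noticeably, as in the reported 0.823 vs. 0.758.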

 Below is a chart that depicts the predictive strength of various factors that were measured within the cited population study.



Example 3 - Prediction

Sample HCAHPS Quality Ratings Study

The sample below shows encounter data coupled with healthcare HCAHPS survey results.  In this example, all encounter data features are categorical.  Use as many categories (columns) as you wish; however, use as few categorical values as possible within each category.  For instance, if the answer to a question is 'sometimes' or 'usually', those answers should be rolled up into a single category.   In our example below, responses of 'sometimes', 'usually', and 'never' were rolled up into a category of 'Other'.  A good practice is to avoid categorical values that do not help distinguish between outcomes of interest.  In this case we were only interested in distinguishing the leading factors between 'Always', the only desired response, and all other responses.
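The roll-up described above is a simple value mapping that can be applied before export; a sketch (the response values come from the text, the column handling is assumed):

```python
# Collapse the non-'Always' survey responses into a single 'Other' category.
ROLLUP = {"sometimes": "Other", "usually": "Other", "never": "Other"}

responses = ["Always", "usually", "sometimes", "Always", "never"]
rolled = [ROLLUP.get(r, r) for r in responses]
print(rolled)
```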

This is the question:

"During this hospital stay, how often did nurses explain things in a way you could understand?"

So what are the results?

Although Table 1 below consists of only 100 samples and 8 encounter features, it is difficult to manually identify which factors most distinguish the undesired response of 'Other' from the desired response of 'Always'.  View the table below.  For us humans there are too many inputs, too many anomalies, and too much randomness; we would never be able to determine, from looking at the data, what data relationships can predict the outcome.   However, machine learning is able to train on 80% of this data, predict the response label (outcome) of the other 20% with 96-100% accuracy, and tell you what findings it used in prediction!
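The 80/20 train-and-predict workflow described above can be sketched end to end on synthetic data (the features, the generating rule, and the simple majority-vote "model" below are stand-ins for illustration, not the actual method used):

```python
import random
from collections import Counter, defaultdict

random.seed(0)

# Tiny synthetic stand-in for encounter features -> survey response.
# (Feature values and the generating rule are made up for illustration.)
shifts = ["Night", "Day"]
lines = ["Cardiology", "Oncology"]
data = []
for _ in range(100):
    shift, line = random.choice(shifts), random.choice(lines)
    label = "Other" if (shift == "Night" and line == "Oncology") else "Always"
    data.append(((shift, line), label))

# 80/20 train/test split.
random.shuffle(data)
train, test = data[:80], data[80:]

# "Model": majority label per feature combination seen in training.
votes = defaultdict(Counter)
for features, label in train:
    votes[features][label] += 1

def predict(features):
    # Fall back to the most common overall answer for unseen combinations.
    return votes[features].most_common(1)[0][0] if votes[features] else "Always"

correct = sum(predict(f) == label for f, label in test)
print(f"holdout accuracy: {correct / len(test):.2f}")
```

Because the synthetic labels follow a deterministic rule, even this trivial model scores highly on the held-out 20%, which is the point: the machine recovers relationships a human reader cannot spot in the raw table.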

We can use a series of rules derived from machine learning to implement process interventions designed to improve patient satisfaction.  For example, we may learn that a particular service line and staff shift are strong factors in a negative survey response.  One can then focus on those process areas for a positive impact on survey results.

 Table 1 - Simulated encounter and HCAHPS results data

  Q3 Survey Data

 See the findings used in prediction of survey results!


Standard Data Preparation

  1. Prepare a comma-delimited file.
  2. The first row shall contain a header in which each column label names the type of data in that column (feature).  The label for the first column is the label for the row-index column; it can be anything you like, such as ‘Row#’ or ‘INDEX’.   All header labels shall begin with a letter.
  3. The left-most column in each data row (‘Column 1’) must contain the unique row identifier.  This identifier must begin with a letter.
  4. For anomaly detection, the second column (‘Column 2’) must contain a group name.  This name must begin with a letter.  The group name causes the system to look for anomalies only within a common group.   For example, this could be a user login name; in this case, anomalies within each group will be identified separately.  Rows can be sequenced in any order, so various group names can be interspersed among rows.  If the entire dataset should be treated as one group, the group name shall be the same on each row of data.
  5. Columns 2 to ‘n’ shall contain categorical feature data only.  All data elements shall begin with a letter.
  6. If supplying data with a categorical ‘Outcome’, append that column as the right-most column.
  7. FoundationDx can automate the data preparation step as part of a non-recurring development charge.
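Several of those rules are easy to sanity-check before submitting a file; a sketch (the sample file contents are illustrative):

```python
import csv
import io

sample = """INDEX,Group,Shift,ServiceLine,Outcome
R1,TeamA,Night,Cardiology,Other
R2,TeamA,Day,Oncology,Always
"""

rows = list(csv.reader(io.StringIO(sample)))
header, data = rows[0], rows[1:]

# Rule: all header labels begin with a letter.
assert all(label[0].isalpha() for label in header)

# Rule: left-most column of each data row is a unique ID starting with a letter.
ids = [row[0] for row in data]
assert len(ids) == len(set(ids)) and all(i[0].isalpha() for i in ids)

# Rule: every data element begins with a letter (categorical features only).
assert all(cell[0].isalpha() for row in data for cell in row)

print("ok")
```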


Categorical features must begin with a letter and are those for which there is no order between the possible values of the variable (e.g., there is no order relationship between 'Sunny' and 'Rain'; one is not bigger or smaller than the other, they are just distinct).




HTO Magazine