Melting Data in Python to Discover $6 Million in Missed Risk Adjustment


It was the last week of the year. Validate Health was retained to validate the risk adjustment submission for a medium-sized Medicare Advantage plan. We had until the end of the week to recalculate risk adjustment for over 50,000 beneficiaries and millions of claims. The pressure was mounting. We needed a way to quickly process many files while performing a complex risk adjustment calculation.

In my last ACTEX webinar Python for Actuaries: Intro to Data Analysis, I introduced the core Python programming language and basic data analysis using pandas. We learned that core Python has a user-friendly syntax and solves a wide array of programming problems. We saw that pandas focused on data manipulation. It is a library written in Python focused on replicating many of the data analysis capabilities of Excel and R.

Risk adjustment requires the annual collection of all member diagnoses and calculation of a risk score, in this case for a Medicare Advantage plan. The risk adjustment data is separated into columns of diagnosis clusters. Ten diagnosis columns form a cluster and members can have multiple rows of diagnosis clusters. The combination of multiple rows and multiple columns of diagnoses poses a problem for reconciling risk adjustment. It is much easier to reconcile diagnoses if there is only one diagnosis column.

One of the advanced features of pandas is the ability to melt and reshape data. Melting data converts from a wide format with many columns to a narrow format with only a few columns. Melting data requires that identification columns and value columns be defined. The identification columns are a combination of columns that uniquely determine a row of data.

Using pandas to melt the diagnosis clusters in the risk adjustment files into individual diagnoses allowed us to simplify the reconciliation. Discrepancies in risk adjustment in excess of $6 million were discovered. In my next ACTEX webinar, Python for Actuaries: Advanced Data Analysis Techniques, I’ll be showing how to use pandas to melt pivoted data to better analyze healthcare claims data.

Andrew M. Webster, M.S., ASA, MAAA

Founder, actuary and licensed broker at Validate Health, a member company of the HealthTech incubator MATTER in Chicago, IL.  Validate Health provides cloud-based actuarial analytics and reinsurance procurement services to Accountable Care Organizations.

Andrew is also webmaster and predictive analytics track lead at the Chicago Actuarial Association, mentor for the Society of Actuaries non-traditional internship program and co-organizer of the Chicago InsurTech Meetup.

Andrew has been writing Python code since 2008 when he was employed as a software programmer at a leading Electronic Medical Records software company.  He has used Python to scrape websites for data, automate data migrations, perform data analytics, fit statistical models and write website applications.  His company, Validate Health, was named one of twelve Python-based companies selected for the 2016 PyCon Startup Row.  Andrew believes that Python will be a valuable skill for Actuaries to master for future employment in emerging InsurTech jobs.