Recently, I’ve been working on a medical provider (doctors, pharamcies, hospitals) matching system at change:healthcare. We have a clean/pristine medical provider table, but we have claims from insurance companies that don’t match up right. Most of the unmatched claims come from mis-spellings, mis-matched unique identifiers, or other small typos. By matching the claim to a provider in our table, we can then use the claim in our average cost and our savings calculations.
Currently this is being done by hand, we export a SQL query to CSV file and distribute that to a team of people to manually try to match through a simple search interface that already exists on the site. The goal is to make this less manual by offering an “admin area” where the team of people can view these unmatched claims, then click one and see suggestions for matches from our provider table. While this is still a manual process, it is much less time intensive and it also provides the first step towards possible automation in the future.
Let’s take a look at some of the difficulties I’ve had to address while work on this new feature:
The original SQL query I was given consisted of three UNION select statements, the most complex one containing GROUP BY and HAVING in the query. My original goal was to break the UNION statements into named scopes joined by OR statements. We already had searchlogic plugin installed so we went with that, it supports using OR for named scopes.
My approach was to break the three select queries into individual queries and run a count on the results. I then created the three named scopes and ran each of those and compared the counts. Everything was perfect! I was pretty excited… Until I joined them all with the OR statements. It turns out when you use a named scope and OR, the GROUP BY and HAVING statements apply to the whole query, not just the one part. The counts were off.
So I went back to the drawing board, but I had new information in hand, the queries were taking WAY to long to run during a normal web request cycle, on my small dataset the UNION took 50+ seconds and the OR statement took around 45 seconds.
The new approach was shaping up using find_by_sql and we would run it nightly via cron to fill up a new table that housed all the unmatched medical providers and the medical claims they came from. By creating a separate table and model to fill, it allowed us to add database indexes and nice named scopes.
Pre-caching into a regular model also allowed us to create a “normal” controller to interact with our unmatched providers. The benefit with that is we quickly built out the easy actions/views so we could focus on search and the match making.
The last key concern was to avoid double working. Medical Provider matching is mostly done in an automated fashion, but some of the data is so mangled that we have to manually match the rest. It is a whole team effort and it is dreaded by most in the office, so we wanted to make sure the list of unmatched providers we show you is a list you can work on. The intended idea was to use pessimistic locking and SELECT FOR UPDATE to ensure we didn’t show an unmatched provider in the index if someone was working on it.
That didn’t work like we wanted. We still kept the SELECT FOR UPDATE on the edit query to ensure integrity, but we also added an in_use timestamp and a named scope (not_in_use, I know, real original) to ensure we don’t show any records that are being matched up by someone else.
I think the great part about working on a more out of the box feature like this is that you get to explore different ideas and try new things, ultimately learning how to use everyday tools better. This is only my second time using locking, and I had no idea that mysql didn’t offer a row level lock that stopped reads from happening.