Klurig Analytics has developed a fully functional prototype of an end-to-end clinical decision support system (CDSS) that combines bioinformatics, social network analysis, recommendation systems and a graph database (Neo4j). The high-level idea was to embed the CDSS in a medical clinic’s workflow, connected to an EHR system and to an accounting system (via ICD-10 codes). When a patient arrives at the clinic with a set of symptoms, the system automatically combines the patient’s EHR records with their vitals and symptoms, queries the database, and uses the results to produce a probabilistic differential diagnosis.
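To make the intake step concrete, here is a minimal sketch of how presenting symptoms, vitals and EHR history might be merged into a single query, and how raw disease scores could be normalized into a probabilistic differential. The function names, the vital-sign thresholds (38.0 °C, 100 bpm) and the assumption that EHR entries have already been mapped from ICD-10 codes to plain terms are all hypothetical illustrations, not the prototype's actual rules.

```python
# Hypothetical sketch: build a patient query and turn raw disease scores
# into a probabilistic differential. Thresholds and term names are
# illustrative assumptions, not the prototype's actual rules.

def vitals_to_findings(vitals: dict[str, float]) -> set[str]:
    """Map raw vitals to symptom terms using illustrative thresholds."""
    findings = set()
    if vitals.get("temperature_c", 0.0) >= 38.0:
        findings.add("fever")
    if vitals.get("heart_rate_bpm", 0.0) > 100:
        findings.add("tachycardia")
    return findings


def build_query(reported: set[str], vitals: dict[str, float],
                ehr_history: set[str]) -> set[str]:
    """Union of presenting symptoms, vitals-derived findings and EHR history terms."""
    return reported | vitals_to_findings(vitals) | ehr_history


def differential(raw_scores: dict[str, float]) -> list[tuple[str, float]]:
    """Normalize non-negative disease scores to probabilities, most likely first."""
    total = sum(raw_scores.values())
    if total <= 0:
        return []
    probs = ((d, s / total) for d, s in raw_scores.items())
    return sorted(probs, key=lambda kv: kv[1], reverse=True)


query = build_query({"cough"}, {"temperature_c": 38.6, "heart_rate_bpm": 88}, {"asthma"})
print(sorted(query))                                        # ['asthma', 'cough', 'fever']
print(differential({"influenza": 3.0, "pneumonia": 1.0}))  # [('influenza', 0.75), ('pneumonia', 0.25)]
```

Simple proportional normalization is used here only because it is easy to follow; a real system might calibrate scores differently.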
On the back end, the CDSS ingests a large body of medical text (medical textbooks, PubMed, etc.). Using NLP, the text is parsed into sentences and terms (n-grams). Medical terms (nodes) are then classified as symptoms, disorders or treatments using machine learning together with SNOMED CT, ICD-10 and other UMLS ontologies. Relationships of varying strength are constructed between the terms. Together, the nodes and edges form a k-partite graph stored in a graph database (Neo4j). The database held about 3 million nodes, comprising the extracted terms and the complete SNOMED CT and ICD-10 ontologies, and about 11 million relationships.
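The ingestion pipeline can be sketched roughly as follows, assuming a tiny hand-written lexicon (`LEXICON`, hypothetical) in place of the SNOMED CT / ICD-10 / UMLS lookups and the machine-learning classifier the prototype used. Sentences are split naively, n-grams up to length 3 are matched against the lexicon, and sentence-level co-occurrences are counted only between terms of different classes, so every edge crosses a partition, which is what makes the graph k-partite.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical mini-lexicon standing in for ontology lookups + ML classification.
LEXICON = {
    "fever": "symptom",
    "cough": "symptom",
    "chest pain": "symptom",
    "pneumonia": "disorder",
    "influenza": "disorder",
    "amoxicillin": "treatment",
}


def sentences(text: str) -> list[str]:
    """Naive sentence split on '.', '!' and '?'."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def ngrams(tokens: list[str], max_n: int = 3) -> list[str]:
    """All contiguous 1..max_n-grams of a token list."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]


def extract_terms(sentence: str) -> dict[str, str]:
    """Return {term: class} for every n-gram found in the lexicon."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    return {g: LEXICON[g] for g in ngrams(tokens) if g in LEXICON}


def build_graph(text: str) -> dict[tuple[str, str], int]:
    """Count co-occurrences between terms of *different* classes (k-partite edges)."""
    edges = defaultdict(int)
    for s in sentences(text):
        terms = extract_terms(s)
        for a, b in combinations(sorted(terms), 2):
            if terms[a] != terms[b]:
                edges[(a, b)] += 1
    return dict(edges)


text = ("Pneumonia often presents with fever and cough. "
        "Amoxicillin is used to treat pneumonia. Fever and cough are common.")
print(build_graph(text))
# {('cough', 'pneumonia'): 1, ('fever', 'pneumonia'): 1, ('amoxicillin', 'pneumonia'): 1}
```

Note that the third sentence adds no edge: "fever" and "cough" are both symptoms, so they sit in the same partition.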
When new knowledge is read in, the CDSS either creates new nodes and relationships or, if the knowledge is already known, strengthens the existing relationships. If no new information about a particular disease arrives for some period of time, the strength of that disease’s relationships starts to decay. In a sense, this resembles how the brain learns and forgets information.
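One way to model this reinforce-and-forget behaviour is sketched below, assuming a fixed increment per observation and exponential decay with a hypothetical half-life; the source does not specify the actual update or decay rule. As in the description, decay is tracked per disease: when a disease receives new information, all of its edges are first decayed to the current time and then the observed edge is strengthened.

```python
import math


class DiseaseEdges:
    """Toy store of disease -> term strengths with reinforcement and decay.
    `increment` and `half_life_days` are hypothetical parameters."""

    def __init__(self, increment: float = 1.0, half_life_days: float = 365.0):
        self.increment = increment
        self.rate = math.log(2) / half_life_days
        self.weights: dict[str, dict[str, float]] = {}   # disease -> {term: strength}
        self.last_update: dict[str, float] = {}          # disease -> day of last new info

    def _decay_to(self, disease: str, now: float) -> None:
        """Exponentially decay every edge of `disease` up to day `now`."""
        dt = now - self.last_update.get(disease, now)
        factor = math.exp(-self.rate * dt)
        edges = self.weights.setdefault(disease, {})
        for term in edges:
            edges[term] *= factor
        self.last_update[disease] = now

    def observe(self, disease: str, term: str, now: float) -> float:
        """New knowledge: create the edge if unseen, otherwise strengthen it."""
        self._decay_to(disease, now)
        edges = self.weights[disease]
        edges[term] = edges.get(term, 0.0) + self.increment
        return edges[term]

    def strength(self, disease: str, term: str, now: float) -> float:
        """Current (decayed) strength, read without changing any state."""
        dt = now - self.last_update.get(disease, now)
        return self.weights.get(disease, {}).get(term, 0.0) * math.exp(-self.rate * dt)


g = DiseaseEdges(increment=1.0, half_life_days=30.0)
g.observe("influenza", "fever", now=0.0)                        # new edge: 1.0
g.observe("influenza", "fever", now=0.0)                        # reinforced: 2.0
print(round(g.strength("influenza", "fever", now=30.0), 6))     # 1.0 after one half-life
```

Decaying lazily (only when a disease is touched or read) avoids sweeping millions of relationships on a timer; whether the prototype did this is not stated.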
An advantage of the graph is that queries work in both directions: you can query the database on symptoms to retrieve likely diseases, or go backwards and query on a disease to see its likely symptoms.
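The two query directions can be illustrated with a small in-memory sketch, where `EDGES` holds hypothetical symptom-disease strengths standing in for the Neo4j relationships. The forward query sums strengths from the given symptoms to each disease; the reverse query traverses the very same edges from a disease back to its symptoms, which is why a graph makes both directions equally cheap.

```python
from collections import defaultdict

# Hypothetical symptom-disease relationship strengths (illustrative values only).
EDGES = {
    ("fever", "influenza"): 0.9,
    ("cough", "influenza"): 0.7,
    ("fever", "pneumonia"): 0.8,
    ("cough", "pneumonia"): 0.9,
    ("chest pain", "pneumonia"): 0.6,
    ("chest pain", "angina"): 0.95,
}


def diseases_for(symptoms: set[str]) -> list[tuple[str, float]]:
    """Forward query: sum edge strengths from the given symptoms to each disease."""
    scores: dict[str, float] = defaultdict(float)
    for (symptom, disease), w in EDGES.items():
        if symptom in symptoms:
            scores[disease] += w
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def symptoms_for(disease: str) -> list[tuple[str, float]]:
    """Reverse query: the same edges traversed from a disease back to symptoms."""
    hits = [(s, w) for (s, d), w in EDGES.items() if d == disease]
    return sorted(hits, key=lambda kv: kv[1], reverse=True)


print([d for d, _ in diseases_for({"fever", "cough"})])  # ['pneumonia', 'influenza']
print(symptoms_for("pneumonia"))  # [('cough', 0.9), ('fever', 0.8), ('chest pain', 0.6)]
```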
On the front end, an R Shiny user interface was built that lets physicians search for symptoms, diseases and treatments and specify the severity of each symptom. Internally, we were exploring Markov chains to support probabilistic search.
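Since the Markov-chain work was exploratory, the sketch below shows just one possible formulation: a random walk with restart (personalized PageRank) over a hypothetical symptom-disease graph. This technique is our illustrative choice and is not confirmed to be what the prototype investigated. Edge strengths are row-normalized into transition probabilities, and at each step the walk either follows an edge or, with probability `restart`, jumps back to the query symptoms; the mass that settles on disease nodes acts as a probabilistic relevance score that also captures indirect associations.

```python
from collections import defaultdict

# Hypothetical symptom-disease strengths (illustrative values only).
EDGES = {
    ("fever", "influenza"): 0.9,
    ("cough", "influenza"): 0.7,
    ("fever", "pneumonia"): 0.8,
    ("cough", "pneumonia"): 0.9,
    ("chest pain", "pneumonia"): 0.6,
    ("chest pain", "angina"): 0.95,
}
DISEASES = {"influenza", "pneumonia", "angina"}


def transition_matrix(edges):
    """Row-normalize undirected edge weights into transition probabilities P[u][v]."""
    out = defaultdict(dict)
    for (u, v), w in edges.items():
        out[u][v] = out[u].get(v, 0.0) + w
        out[v][u] = out[v].get(u, 0.0) + w
    P = {}
    for u, nbrs in out.items():
        total = sum(nbrs.values())
        P[u] = {v: w / total for v, w in nbrs.items()}
    return P


def random_walk_with_restart(P, seeds, restart=0.3, iters=100):
    """Iterate p <- (1 - restart) * pP + restart * s, with s uniform over the seeds."""
    seeds = [s for s in seeds if s in P]          # ignore terms absent from the graph
    if not seeds:
        return {}
    start = {s: 1.0 / len(seeds) for s in seeds}
    p = dict(start)
    for _ in range(iters):
        nxt = defaultdict(float)
        for u, mass in p.items():
            for v, t in P[u].items():
                nxt[v] += (1 - restart) * mass * t
        for s, m in start.items():
            nxt[s] += restart * m
        p = nxt
    return dict(p)


P = transition_matrix(EDGES)
p = random_walk_with_restart(P, {"fever", "cough"})
ranked = sorted(((d, p.get(d, 0.0)) for d in DISEASES), key=lambda kv: kv[1], reverse=True)
print(ranked)
```

Because every row of `P` sums to 1, the walk conserves probability mass, so the scores over all nodes always sum to 1.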
Query results are ranked by an algorithm that takes into account variables such as symptom severity, symptom count and relationship strength. The ranking was then optimized using a supervised machine learning model (a support vector machine, SVM).
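A sketch of how such a ranking could work is shown below, under stated assumptions: each candidate disease gets three hypothetical features (severity-weighted relationship strength, number of matched symptoms, fraction of the query matched), and a linear SVM learns how to weight them. The SVM is trained here with the Pegasos sub-gradient method, cycling through the data without a bias term. The source does not say how its SVM was set up, so the names, features and training method are all illustrative.

```python
# Hypothetical symptom-disease strengths (illustrative values only).
EDGES = {
    ("fever", "influenza"): 0.9,
    ("cough", "influenza"): 0.7,
    ("fever", "pneumonia"): 0.8,
    ("cough", "pneumonia"): 0.9,
    ("chest pain", "pneumonia"): 0.6,
    ("chest pain", "angina"): 0.95,
}


def features(disease: str, severity: dict[str, float]) -> list[float]:
    """[severity-weighted strength, matched-symptom count, fraction of query matched]."""
    links = {s: w for (s, d), w in EDGES.items() if d == disease and s in severity}
    weighted = sum(severity[s] * w for s, w in links.items())
    return [weighted, float(len(links)), len(links) / max(len(severity), 1)]


def train_linear_svm(X, y, lam=0.1, epochs=50):
    """Pegasos sub-gradient training of a linear SVM (no bias), labels in {+1, -1}."""
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for x, label in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1 - eta * lam) * wi for wi in w]          # regularization shrink
            if margin < 1:                                  # hinge-loss violation
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def rank(severity, diseases, w):
    """Order candidate diseases by the linear score w . features."""
    scored = [(d, sum(wi * fi for wi, fi in zip(w, features(d, severity))))
              for d in diseases]
    return sorted(scored, key=lambda kv: kv[1], reverse=True)


severity = {"fever": 1.0, "cough": 1.0}
print([d for d, _ in rank(severity, ["influenza", "pneumonia", "angina"], [1.0, 0.0, 0.0])])
# ['pneumonia', 'influenza', 'angina']
```

In practice, `X` would hold feature vectors for (query, disease) pairs and `y` would mark whether a physician confirmed that disease (+1) or not (-1); the learned `w` then replaces hand-tuned weights in `rank`.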
Physician feedback was that the CDSS was very fast, very flexible, and easy and fun to use.
Technologies used: Ontologies (SNOMED CT, MeSH, ICD-10), Bioinformatics, Neo4j, R, NLP, machine learning.
Medical knowledge graph where data is read from a wide variety of sources, including UMLS ontologies: