
This is our second blog post on cTAKES. Read our intro post here (https://techno-soft.com/ctakes-and-extraction-of-patient-information.html). In this post, we share our cTAKES implementation experience. Was cTAKES able to reliably convert unstructured data into structured data? We also cover other relevant observations.
cTAKES Setup
The pre-built dictionaries shipped with the current release of cTAKES are extremely limited on their own because they do not contain an extensive set of codes from the different coding vocabularies.
To create our own customized dictionary, we followed the process below:
- We signed up for a UMLS account and downloaded the MetamorphoSys tool from the UMLS. It let us create our own subset of codes by choosing the coding vocabularies we were interested in (SNOMED, ICD10, RXNORM, CPT, etc.), and it wrote the resulting files to a directory.
- At Technosoft, using the cTAKES dictionary creator, we selected this directory as the input directory, and the tool created a SQL script. The problem with the generated script was that it was meant to be run on HSQLDB, which cTAKES uses as an in-memory database.
- To convert that script into one compatible with another SQL database, we had to use another tool.
- After running the converted SQL script on our own database, we changed the cTAKES configuration to point to that database.
- When we processed some clinical notes, we observed in the CAS Visual Debugger that the codes were being returned from our database, as expected. cTAKES was therefore able to convert unstructured data into structured data.
Implementing REST APIs on top of cTAKES
cTAKES doesn’t expose itself as a RESTful service; instead, it ships with a set of GUI applications. We therefore created a Spring Boot application that exposed an API for processing clinical notes. The API used the libraries that are part of cTAKES to process the clinical note sent in the request payload and returned a customized response to the client.
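The shape of such a wrapper can be sketched as follows. This is not our Spring Boot code: to keep the example free of external dependencies it uses the HTTP server built into the JDK instead, and the `NoteProcessor` interface, the `/notes` path, and the `codes` response field are all hypothetical names introduced for illustration. In a real service, `NoteProcessor` would be backed by the cTAKES libraries.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical stand-in for the code that runs a note through the cTAKES libraries. */
interface NoteProcessor {
    List<String> extractCodes(String noteText);
}

class ClinicalNoteApi {

    /** Builds the JSON body returned to the client for one note (minimal escaping only). */
    static String respond(String noteText, NoteProcessor processor) {
        return processor.extractCodes(noteText).stream()
                .map(code -> "\"" + code.replace("\\", "\\\\").replace("\"", "\\\"") + "\"")
                .collect(Collectors.joining(",", "{\"codes\":[", "]}"));
    }

    /** Starts an HTTP server that accepts POST /notes with the raw note as the request body. */
    static HttpServer start(int port, NoteProcessor processor) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/notes", exchange -> {
            try {
                if (!"POST".equals(exchange.getRequestMethod())) {
                    exchange.sendResponseHeaders(405, -1); // method not allowed, no body
                    return;
                }
                String note = new String(exchange.getRequestBody().readAllBytes(), StandardCharsets.UTF_8);
                byte[] body = respond(note, processor).getBytes(StandardCharsets.UTF_8);
                exchange.getResponseHeaders().set("Content-Type", "application/json");
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            } finally {
                exchange.close();
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder processor: a real one would call into cTAKES here.
        start(8080, note -> List.of("EXAMPLE-CODE"));
    }
}
```

Keeping the pipeline behind a small interface like this means the HTTP layer can be tested with a stub, without loading any dictionaries.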
Performance
Response times depend directly on the length of the raw unstructured note. In our case, the API took 30 to 45 seconds to respond with structured data.
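One way to see how response time grows with note length is to time the processing step for notes of different sizes. The sketch below is a generic timing helper, not something from cTAKES; `LatencyProbe`, `Timed`, and the placeholder processing function are hypothetical names, and the placeholder would need to be replaced with the real call into the pipeline.

```java
import java.util.function.Function;

class LatencyProbe {

    /** Result of one timed call: the returned value and the elapsed wall-clock time. */
    static final class Timed<T> {
        final T value;
        final long elapsedMillis;

        Timed(T value, long elapsedMillis) {
            this.value = value;
            this.elapsedMillis = elapsedMillis;
        }
    }

    /** Runs one processing step on the input and measures how long it took. */
    static <I, O> Timed<O> time(Function<I, O> step, I input) {
        long start = System.nanoTime();
        O value = step.apply(input);
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        return new Timed<>(value, elapsedMillis);
    }

    public static void main(String[] args) {
        // Placeholder for the real processing call; it only counts characters.
        Function<String, Integer> placeholderProcessor = String::length;

        for (int words : new int[] {100, 1_000, 10_000}) {
            String note = "word ".repeat(words);
            Timed<Integer> timed = time(placeholderProcessor, note);
            System.out.println(note.length() + " chars took " + timed.elapsedMillis + " ms");
        }
    }
}
```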
Memory Intensive
The main issue we faced with cTAKES was its memory consumption, which was about 2.5 GB.
We found out that cTAKES uses an in-memory database for custom dictionaries. To address this, we used another tool to tailor the SQL script to PostgreSQL and changed the configuration to point to that database. This saved only about 500 MB, however, and cTAKES still needed about 2 GB of memory to function.
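Numbers like these can be checked from inside the JVM with the standard `Runtime` API. The sketch below is a minimal example, assuming it is called before and after the pipeline is loaded; the `HeapUsage` class is a hypothetical helper, and it reports only the Java heap, so the total memory of the process as seen by the operating system can be higher.

```java
class HeapUsage {

    /** Bytes of Java heap currently in use (allocated heap minus free heap). */
    static long usedBytes() {
        Runtime runtime = Runtime.getRuntime();
        return runtime.totalMemory() - runtime.freeMemory();
    }

    /** Human-readable summary of used heap and the maximum heap the JVM may grow to. */
    static String describe() {
        long megabyte = 1024L * 1024L;
        long maxMb = Runtime.getRuntime().maxMemory() / megabyte;
        return String.format("used=%d MB, max=%d MB", usedBytes() / megabyte, maxMb);
    }

    public static void main(String[] args) {
        System.out.println("Before loading: " + describe());
        // Assumption: the dictionary and pipeline would be initialized here.
        System.out.println("After loading:  " + describe());
    }
}
```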
Aside from these small issues, cTAKES produced very impressive results overall and was able to intelligently spot and separate discrete values from unstructured data. The Technosoft NLP and NLU team is currently using cTAKES for customer work, and we are very excited to use this tool to enhance patient care.