This is our second blog on cTAKES. Read our intro blog here (https://techno-soft.com/ctakes-and-extraction-of-patient-information.html). In this blog, we share our cTAKES implementation experience: whether cTAKES was able to reliably convert unstructured data into structured data, along with other relevant observations.

cTAKES Setup

The pre-built dictionaries shipped with the current release of cTAKES are extremely limited on their own because they do not cover an extensive set of codes from the different coding vocabularies.

To create our own customized dictionary, we followed the process below:

  1. We signed up for a UMLS account and downloaded the MetamorphoSys tool from the UMLS. It let us create our own subset of codes by choosing the coding vocabularies we were interested in (SNOMED, ICD10, RXNORM, CPT, etc.), and it generated the subset files in a directory.
  2. At Technosoft, using the cTAKES dictionary creator, we selected this directory as the input directory, and it created a SQL script. The problem with the generated script was that it was meant to be run on HSQLDB, which cTAKES uses as an in-memory database.
  3. To convert that script into one compliant with another SQL database, we had to use another tool.
  4. After creating the converted SQL script and running it on our own database, we made a configuration change to point cTAKES to that database.
  5. When we processed some clinical notes, we observed in the CAS Visual Debugger that the codes were being returned from our database, as expected. cTAKES was therefore able to convert unstructured data into structured data.
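
The script conversion in step 3 can be illustrated with a minimal Python sketch. This is not the tool we used; it only shows the kind of rewriting involved. It assumes the generated script resembles a typical HSQLDB script file: one statement per line, no trailing semicolons, tables declared with CREATE MEMORY TABLE, and some HSQLDB housekeeping statements (SET, CREATE USER, GRANT and similar) that the target database does not need. The function names are hypothetical.

```python
import re
from typing import Optional

# Statement prefixes assumed to be HSQLDB housekeeping that the target
# database does not need (an assumption about the generated script).
SKIPPED_PREFIXES = ("SET ", "CREATE USER", "ALTER USER", "CREATE SCHEMA", "GRANT ")


def convert_statement(line: str) -> Optional[str]:
    """Return a statement usable on the target database, or None to drop it."""
    stmt = line.strip().rstrip(";")
    if not stmt or stmt.upper().startswith(SKIPPED_PREFIXES):
        return None
    # HSQLDB declares in-memory tables with "CREATE MEMORY TABLE";
    # PostgreSQL has no MEMORY keyword, so reduce it to a plain CREATE TABLE.
    stmt = re.sub(r"^CREATE\s+MEMORY\s+TABLE", "CREATE TABLE", stmt, flags=re.IGNORECASE)
    return stmt + ";"


def convert_script(hsqldb_sql: str) -> str:
    """Convert a whole script, assuming one statement per line."""
    converted = (convert_statement(line) for line in hsqldb_sql.splitlines())
    return "\n".join(stmt for stmt in converted if stmt is not None)
```

A real dictionary script may also need data type or quoting adjustments, so the output should be reviewed before it is loaded.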

CAS Visual Debugger for cTAKES

Implementing REST APIs on top of cTAKES

cTAKES doesn’t expose itself as a RESTful service; instead, it ships with a set of GUI applications. So we created a Spring Boot application that exposes an API for processing clinical notes. The API uses the libraries that are part of cTAKES to process the clinical note sent in the request payload and returns a customized response to the client.


Response times depend directly on the length of the raw unstructured note. In our case, the API took 30 to 45 seconds to respond with structured data.
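
Because a single call can take this long, a client should use a timeout well above the usual defaults. Here is a minimal client sketch in Python using only the standard library. The endpoint URL and the "note" field of the payload are placeholders, since the actual request format of the API is not covered in this blog.

```python
import json
import urllib.request

# Hypothetical endpoint; replace with the real URL of the Spring Boot service.
API_URL = "http://localhost:8080/api/notes/process"
# Responses took 30 to 45 seconds in our case, so leave generous headroom.
TIMEOUT_SECONDS = 120


def build_request(note_text: str, url: str = API_URL) -> urllib.request.Request:
    """Build a POST request carrying the raw clinical note as JSON."""
    # The "note" field name is an assumption, not the documented payload.
    body = json.dumps({"note": note_text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def process_note(note_text: str, url: str = API_URL) -> dict:
    """Send the note and return the parsed JSON response."""
    request = build_request(note_text, url)
    with urllib.request.urlopen(request, timeout=TIMEOUT_SECONDS) as response:
        return json.load(response)
```

Keeping the request construction separate from the network call makes it possible to check the payload without a running server.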

Memory Intensive

The main issue we faced with cTAKES was its memory consumption: about 2.5 GB.

We found out that cTAKES uses an in-memory database for custom dictionaries. To solve this problem, we used another tool to tailor the SQL script to PostgreSQL and made the relevant configuration change to point cTAKES to that database. But this saved only about 500 MB of memory; cTAKES still needed about 2 GB to function.

Aside from these small issues, cTAKES produced very impressive results overall and was able to intelligently spot and separate discrete values from unstructured data. The Technosoft NLP and NLU team is currently using cTAKES for customer work, and we are very excited to put this tool to use to enhance patient care.

Screencast for demonstration of the API

We created a screencast to demonstrate the API and its functionality.

The response returns four root nodes representing conditions, medications, observations, and procedures. Each node is an array, and nearly every element carries multiple codes (from different vocabularies). The “begin” and “end” nodes give the indices of the text span that the element refers to. There are also some important nodes: “confidence”, “polarity”, and “uncertainty”. It is up to the consumer of this API to decide how to use these nodes and whether their values meet the thresholds appropriate for the specific case.
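
As an illustration of that consumer-side decision, here is a minimal Python sketch that keeps only the mentions a consumer might trust. The four root node names and the “begin”, “end”, “confidence”, “polarity”, and “uncertainty” nodes come from the response described above. Everything else is an assumption: the “codes” key, the value conventions (a negative polarity meaning negated, a non-zero uncertainty meaning uncertain, an exclusive “end” index), and the 0.5 threshold.

```python
from typing import Any, Dict, List

CATEGORIES = ("conditions", "medications", "observations", "procedures")


def accepted_mentions(
    response: Dict[str, Any], note_text: str, min_confidence: float = 0.5
) -> List[Dict[str, Any]]:
    """Keep mentions that are asserted, certain, and confident enough."""
    kept = []
    for category in CATEGORIES:
        for item in response.get(category, []):
            if item.get("polarity", 1) < 0:  # assumed: negative means negated
                continue
            if item.get("uncertainty", 0) != 0:  # assumed: non-zero means uncertain
                continue
            if item.get("confidence", 0.0) < min_confidence:
                continue
            kept.append(
                {
                    "category": category,
                    # begin/end index into the original note (end assumed exclusive)
                    "text": note_text[item["begin"]:item["end"]],
                    "codes": item.get("codes", []),  # hypothetical key
                }
            )
    return kept
```

For a note such as "Patient denies chest pain. Takes aspirin daily.", a filter like this would drop a negated "chest pain" condition and keep an asserted "aspirin" medication, provided the response marks them that way.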

So with the above information, you can see how easy it is to use cTAKES with Technosoft’s Spring Boot application to transform clinical notes into actionable discrete values that any health information system can consume.

This technology can be further integrated with Alexa and/or other NLP (Natural Language Processing) engines. Instead of typing a clinical note, a provider can dictate the clinical information by voice; the speech can be converted into text and then into clinical values that are fed into an appropriate application for processing.

Starting any Healthcare Integration Project? Get your questions answered in a free 30-minute consultation!

Contact Us