Interview with Jose Gonzalez, CEO of MeaningCloud, a text analytics company based in New York City.
Hi Jose, what is your background and what are your responsibilities in your current role at MeaningCloud?
My background is in the field of Artificial Intelligence. I hold a Ph.D. from the Technical University of Madrid, where I joined the AI lab of the School of Telecommunication Engineering as an assistant professor and researcher in 1985.
Years later, in 1998, I founded my first startup, Daedalus, along with other colleagues. Twenty years ago, we were developing, and struggling to sell, AI solutions in two main areas: natural language processing and data mining. A good part of our activity consisted of developing our own technology with the financial support of national and European research programs.
At that time, we were dedicating 25% of our revenue to R&D. However, marketing and selling these solutions was tough. The game changed for us when we started deploying our text analytics solutions as a SaaS business on top of AWS in 2011.
Finally, in 2015 we decided to create a new company (MeaningCloud), bringing in new investors, merging in Daedalus, and starting a subsidiary in the US. My role as CEO of MeaningCloud involves managing every area of the company, from the technical product roadmap and HR to business development and finance.
What differentiates MeaningCloud from other text analytics companies?
There are a few differentiating elements in our offering. The first one is our deep semantic approach to truly “understand” and interpret any piece of text, extracting not only facts and sentiments but also relationships, beliefs, desires, and intent. It means that we rely on a linguistic approach, complemented with machine learning (including deep learning), to build base models and to generate candidate rules for human curation. This linguistic approach is essential for high-value information discovery scenarios, where precision is a must.
The second differentiator is what we call “vertical packs”: off-the-shelf, industry-oriented solutions that address typical business or industry use cases.
The third one is customization; in Text Analytics, one size does not fit all. Therefore, we empower our customers to add their own dictionaries, classification schemes, and sentiment analysis rules.
MeaningCloud is originally a Spanish company, but opened an office in the US a few years back. How has that affected your business?
Three years after incorporating MeaningCloud in the US, 80% of our revenue comes from outside Spain. Our most valuable customers are in the US. The move has deeply affected every aspect of our work, starting with the motivation and renewed ambition of our team, who feel like they are playing in a different league. We have made a special effort to recruit people from abroad, with non-Spanish nationals now making up almost 25% of the company.
What are the greatest challenges ahead for MeaningCloud when it comes to serving your customers’ analysis needs and developing your offering?
Our most valuable customers look for the extraction of very specific insights from any information source. The ability to develop tools to carry out this process for a particular purpose, with the required coverage and precision, and within acceptable time and costs, is our most important challenge today.
You work a lot with the pharmaceutical industry; can you please share what you do for them and how their needs differ from other industries when it comes to text analysis?
In pharma and healthcare, we address some general problems from the vantage point of having integrated and developed, over the years, a good amount of multilingual resources (medical terminology, thesauri, clinical codes) and tools to understand the language of health. For instance, we have in place market intelligence solutions to unveil opportunities and threats in real time from digital sources.
A second area is pharmacovigilance (also called drug safety), the practice of monitoring the effects of medical drugs after they have been licensed for use, especially in order to identify and evaluate previously unreported adverse reactions. We apply text analytics to identify episodes of interaction between drugs, adverse effects, etc., from reported cases, specialized forums or scientific literature.
The third area is what we call “Voice of the Patient” analytics, a specialization of the more classic “Voice of the Customer” analytics that we have been carrying out in the retail, banking, and telecom industries.
A promising new area that is currently under development is around “Real World Evidence.” RWE is information on healthcare that is derived from sources outside clinical research settings (the clinical trials carried out to obtain drug approval), including electronic health records (EHRs), claims and billing data, product and disease registries, and data gathered through personal devices and health applications.
Automatic analysis of such sources allows us to know how specific drugs perform within different population groups: in patients with different degrees of disease severity, in patients with conditions that require other medications, in long-term treatments, etc.
How has your clients’ perception about what text analysis can do for them changed over the years?
In the past, it was difficult to convince our customers in business areas of the effectiveness and integrability of the technology. At the same time, our customers’ IT departments were afraid of integration risks and costs. This situation changed completely years ago with the availability of SaaS solutions. These days, our business customers play with our text analytics functions inside their own Excel spreadsheets, and technical users just call our APIs seamlessly from their software environment, whatever it is.
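To give a sense of how lightweight that integration can be, here is a minimal Python sketch of calling a MeaningCloud-style sentiment endpoint. The endpoint path, the key/lang/txt parameters, and the score_tag response field follow MeaningCloud’s public REST documentation at the time of writing, but they should be treated as illustrative rather than authoritative; the API key below is a placeholder.

```python
# Minimal sketch: calling a MeaningCloud-style sentiment analysis endpoint.
# Endpoint path and parameter names follow the public REST docs at the time
# of writing; treat them as illustrative, not authoritative.
import requests

API_KEY = "YOUR_API_KEY"  # placeholder: obtain a real key from the service
ENDPOINT = "https://api.meaningcloud.com/sentiment-2.1"

def analyze_sentiment(text: str, lang: str = "en") -> str:
    """Send a text to the sentiment API and return its polarity tag."""
    response = requests.post(
        ENDPOINT,
        data={"key": API_KEY, "lang": lang, "txt": text},
        timeout=10,
    )
    response.raise_for_status()
    result = response.json()
    # score_tag ranges from P+ (very positive) to N+ (very negative)
    return result.get("score_tag", "NONE")

if __name__ == "__main__":
    print(analyze_sentiment("The support team resolved my issue quickly."))
```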
Have you recently, or are you about to, release any new technology-based solutions that will add or improve services for your clients? If so, what solutions, and how will your customers benefit from them?
We follow a roadmap of continuous improvement of the functionality and usability of our technology. Last month, a new API was added to our offering: the “Deep Categorization API.” It is a solution for assigning one or more categories to a text by finding snippets that match advanced semantic patterns and contexts, expressed in a powerful (but simple) language built from macros and rules.
This technology has allowed us to market new services, our vertical packs. Vertical packs are solutions intended for specific industries. The first four packs are for the analysis of the Voice of the Customer (including different flavors for the retail, banking and insurance scenarios) and for Voice of the Employee analytics.
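As a concrete illustration of how such a pack might be consumed, here is a hypothetical Python sketch of calling the Deep Categorization API with a Voice of the Customer model. The endpoint path, the model identifier, and the category_list response field are assumptions based on MeaningCloud’s public documentation and should be verified against the current docs before use.

```python
# Sketch: categorizing a customer comment with a vertical-pack model via
# the Deep Categorization API. The endpoint, model name, and response
# fields are assumptions to be checked against the current documentation.
import requests

API_KEY = "YOUR_API_KEY"  # placeholder
ENDPOINT = "https://api.meaningcloud.com/deepcategorization-1.0"

def categorize(text: str, model: str = "VoC-Generic_en") -> list:
    """Return the category labels the model assigns to the text."""
    response = requests.post(
        ENDPOINT,
        data={"key": API_KEY, "model": model, "txt": text},
        timeout=10,
    )
    response.raise_for_status()
    payload = response.json()
    return [c["label"] for c in payload.get("category_list", [])]

if __name__ == "__main__":
    print(categorize("The branch was closed and nobody answered the phone."))
```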
Regarding languages, in a few weeks, we will be publishing the Nordic Package, to add Danish, Finnish, Norwegian and Swedish to our current language offering. Chinese, Hindi, Arabic and Russian will follow shortly.
Finally, Summarization and Document Structure Analysis APIs will incorporate substantial improvements before the summer.
Is “fake news” a big issue for the text analysis that you do? If so, what are the challenges for the analysis you do for your customers, and how do you cope with them?
We all know how difficult filtering noise is in social media. This noise may appear in many different forms: as vacuous, idiotic, fanatical, insulting, or manipulative messages, or simply as false ones.
However, this landscape does not differ much from what already happens in offline media. Depending on the nature of our work and the purpose of a particular client, we may be required to filter out certain kinds of noise, but we obviously cannot tell whether an individual piece of news is truthful or not. The only way to do that involves analyzing the origin and the spreading mechanisms of information across networks, something that we are not currently doing.
Regarding this topic, I would first rely on education. As educated digital citizens, we should develop abilities to distinguish honest, reliable, sensible and relevant sources of information and opinion.
When it comes to the actual data behind the text analysis that you do, what kind of data or media can be interesting in the future that you don’t analyze today?
I would bet on Electronic Health Records. What we do now is at a very small scale. On May 6th, the US National Institutes of Health launched the research program “All of Us,” whose aim is to get one million volunteers to contribute their physical, genomic, and electronic health record data. It is the starting signal for the most ambitious “Precision Medicine Initiative” so far. The analysis of the unstructured part of EHRs will represent an essential contribution to advances in drug safety and effectiveness.
How do you think the text analysis industry will change in the next 5 years, and what are the greatest challenges ahead?
The long-term challenges (beyond five years) have to do with our ability to interpret any communication act: discovering, reasoning about, and reacting to the facts, beliefs, emotions, desires, intentions, and values of people and artificial agents. Despite the current hype around Artificial Intelligence, we are still far, far away from that goal.
How do you foresee the changes and developments for MeaningCloud over the next 5 years?
We will keep on following our dream, which is going deeper and deeper into extracting the meaning of all kinds of unstructured content. The next step will be a more powerful approach to the extraction of relationships from text. Stay tuned!
By Renata Ilitsky