Enterprise Data Mining

March 26, 2012

Enterprise Data Mining: A far easier, lower cost approach.

Unlike other data mining approaches, ai-browser learns the meaning of documents by generating a lightweight ontology: a dynamic file that describes the relationships between the data elements in a document. It detects keywords and the association words that give them context. The combination of a keyword and all of its association words can be thought of as a coordinate (x, y0->T), where x is the keyword and y0->T is the series of association words (y0 through yT) for that specific keyword. The collection of these coordinates creates a topology for the document: G(V,E), where G is the graph, V is the set of vertices (or nodes), one for each keyword, and E is the set of edges, one for each association to a keyword.
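To make the G(V,E) idea concrete, here is a minimal sketch in Python of how a list of (keyword, association words) coordinates could be turned into a graph. It is an illustration only, not ai-browser's actual code: the function name `build_graph`, the sample words, and the choice to treat association words as vertices too (so that every edge has two endpoints) are all assumptions.

```python
from typing import Iterable, List, Set, Tuple

# One coordinate (x, y0->T): a keyword plus its association words.
Coordinate = Tuple[str, List[str]]
Edge = Tuple[str, str]


def build_graph(coordinates: Iterable[Coordinate]) -> Tuple[Set[str], Set[Edge]]:
    """Build G(V, E) from keyword coordinates (hypothetical helper)."""
    vertices: Set[str] = set()
    edges: Set[Edge] = set()
    for keyword, associations in coordinates:
        vertices.add(keyword)
        for word in associations:
            # Assumption: association words become vertices as well,
            # so each edge (keyword, word) has both endpoints in V.
            vertices.add(word)
            edges.add((keyword, word))
    return vertices, edges


# Made-up sample coordinates for a short clinical note.
coords = [
    ("diabetes", ["insulin", "glucose"]),
    ("insulin", ["dose"]),
]
V, E = build_graph(coords)
```

Storing the edges as a set of (keyword, association) pairs keeps the structure easy to export to other tools, since most graph software accepts a plain edge list.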

ai-browser: A prototype for human-machine collaboration

For the past several months, we have been working on a new approach for text analytics and data mining. The idea is to create a tool that enables human-machine collaboration to quickly mine unstructured data to find the single best answer.

We now have a working prototype, called ai-browser, that solves knowledge management and data mining problems involving unstructured text. It combines natural language processing (NLP) and pattern recognition technologies to generate a precise knowledge representation graph. Our team selected OpenNLP because it is open source and easy to use and customize. We used the Topic-Mapper API to detect patterns within the text after it was pre-processed to isolate parts of speech. The system also allows users to supply ontologies and/or reference documents to sharpen the results. The output is a graph that can be used in a number of ways with third-party products, such as:

  • Submission to search appliances like Google, Bing, Lucene, etc.
  • Analysis with modelling tools like Cytoscape, MATLAB, SAS, etc.
  • Enterprise systems for reporting, knowledge management and/or decision support

This graph makes it easy to ask questions like, “Find me something like _______!” and get a very tightly clustered group of results – rather than millions of hits.
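One way to picture a "find me something like this" query is to compare the edge set of a query graph against the edge sets of the documents already indexed and keep only the close matches. The sketch below uses Jaccard similarity as a stand-in measure; the post does not say how ai-browser scores matches, so the measure, the function names, the threshold, and the sample data are all assumptions.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def jaccard(a: Set[Edge], b: Set[Edge]) -> float:
    """Share of edges two graphs have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_similar(
    query_edges: Set[Edge],
    corpus: Dict[str, Set[Edge]],
    threshold: float = 0.5,  # assumed cut-off, not from the original post
) -> List[Tuple[str, float]]:
    """Return (document name, score) pairs at or above the threshold, best first."""
    scored = [(name, jaccard(query_edges, edges)) for name, edges in corpus.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Made-up example: one close match and one unrelated document.
query = {("diabetes", "insulin"), ("diabetes", "glucose")}
corpus = {
    "note_a": {("diabetes", "insulin"), ("diabetes", "glucose"), ("diabetes", "diet")},
    "note_b": {("brand", "price")},
}
results = find_similar(query, corpus)
```

Raising the threshold tightens the cluster of results; lowering it trades precision for recall.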

Beyond search, ai-browser’s graph can be applied in a wide range of domains, such as:

  • Healthcare – clinical decision support systems to enable physicians to make better decisions by understanding all the relevant information held in electronic medical records (EMRs) – including emerging trends and relationships within the patient population.
  • Social media – detecting and tracking sentiments in conversations (such as those on Twitter) over time to understand how brands are perceived by customers.
  • Innovation management – discovering the relationships of information across disciplines to foster more productive collaboration and interdisciplinary discoveries.
  • Information comparison and confirmation – determining the similarities and differences between two different sources of content.
  • Human resources – sourcing and placement of the best candidate for a job based on previous work experience.

The intent of the ai-browser design is to provide a starting point for developers to build solutions to meet the specific needs of enterprise customers. For example, modifying the system enables solutions to the following use cases:

  • Help a physician determine if additional tests are necessary to confirm a diagnosis.
  • Determine how perceptions about a brand change through conversations on Twitter.
  • Find new uses for a drug by reviewing clinical studies published on PubMed and determining if there are relevant patent filings.
  • Identify stock market trading opportunities by comparing news feeds and SEC filings on a particular company or industry.
  • Find the best person for a job by searching the internet for someone who is “just like the person who had this job last year.”
