IBM Corp.’s India Research Laboratory has developed technology for retrieval and integration of information from both structured and unstructured data.
The new technology integrates data that is currently stored in separate silos, Mukesh Mohania, lead researcher on the project, said on Wednesday. “We are leveraging the structured data by extracting the information from the unstructured repositories, and then providing rich business intelligence on this consolidated information,” Mohania said.
A research prototype of the technology has been deployed in the customer support operations of HDFC Bank Ltd, a large private bank in Mumbai. The prototype application integrates customer data from a structured database and business intelligence sources with incoming information on customers from multiple sources, including e-mails and phone calls.
The lab decided to use the new technology at HDFC Bank to validate it in a real, operational environment, Mohania said. IBM announced Wednesday it will offer the technology early next year.
The middleware, built on Java technology, currently runs on Linux but can be ported to any Java-compliant operating system, Mohania said. Besides customer relationship management, the technology can be used in other areas of an enterprise where both structured and unstructured data need to be integrated and analyzed, he said.
By automatically combining structured and unstructured information, HDFC Bank can provide call center agents with a more complete history of all customer activity so they are aware of information that has already been shared through previous interactions, IBM said. The technology also enables HDFC Bank to generate new contextual and actionable insights that can be used to automate tasks, enhance up-selling and cross-selling opportunities, and improve agent performance, it added.
EROCS (Entity Recognition in the Context of Structured data) is a key component of the technology developed by the lab. EROCS addresses the problem of linking a document with related structured data in an external relational database. The partial information provided in, for example, e-mails from customers often does not allow identification of the entity, such as the customer, in the structured data, Mohania said. EROCS views the structured data in the relational database as a set of predefined entities, and identifies the entities from this set that best match the given document, he added.
A highlight of EROCS is that it identifies an entity even if it is not explicitly mentioned in the document, according to Mohania. It exploits the context information present in the document to match and identify the entities, Mohania added.
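The idea of matching a document to a predefined entity through context can be illustrated with a small sketch. This is not IBM's algorithm, whose details the article does not give. It is a toy term-overlap scorer, and the customer records, field names and the `best_match` function are all hypothetical. Each record is treated as a bag of terms, and the document is scored against each one, with terms shared by fewer records counting for more.

```python
import re
from collections import Counter

# Hypothetical structured records; a real system would read these from a relational database.
CUSTOMERS = {
    "C001": {"name": "Asha Rao", "city": "Mumbai", "product": "home loan", "branch": "Andheri"},
    "C002": {"name": "Vikram Shah", "city": "Pune", "product": "credit card", "branch": "Kothrud"},
    "C003": {"name": "Meera Iyer", "city": "Mumbai", "product": "credit card", "branch": "Bandra"},
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def entity_terms(record):
    """Collect every term that appears in any field of a record."""
    terms = set()
    for value in record.values():
        terms.update(tokenize(value))
    return terms


def best_match(document, entities):
    """Return (best entity id, all scores) for a document.

    Assumed scoring: each shared term contributes its count in the document
    divided by the number of entities that contain it, so rarer terms weigh more.
    """
    doc_counts = Counter(tokenize(document))
    term_sets = {eid: entity_terms(rec) for eid, rec in entities.items()}
    entity_freq = Counter(t for terms in term_sets.values() for t in terms)
    scores = {
        eid: sum(doc_counts[t] / entity_freq[t] for t in terms if t in doc_counts)
        for eid, terms in term_sets.items()
    }
    return max(scores, key=scores.get), scores


# The e-mail never names the customer; only the product and branch give context.
email = "My credit card was declined at a shop in Bandra yesterday. Please look into it."
best, scores = best_match(email, CUSTOMERS)
print(best, scores)
```

In this toy data the e-mail is linked to record C003 even though the customer's name never appears, because "credit card" and "Bandra" together point to only one record.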
Another technology, called SCORE (Symbiotic Content Oriented information Retrieval), addresses the problem of consolidated querying of structured and unstructured data, using a type of contextual search. The application specifies its information needs using a SQL query on the structured data, and this query is automatically “translated” into a set of keywords that can be used to retrieve relevant unstructured data. At the core of this technology lies a technique for obtaining these keywords not only from the query result but also from additional “neighborhood” related information in the underlying database, Mohania said.
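A rough sketch of the query-to-keywords idea follows. It does not reproduce SCORE, whose keyword selection technique the article does not describe. The schema, sample rows, and the `keywords_for` and `search` functions are assumptions made for illustration, and "neighborhood" is taken here to mean rows reachable through a single foreign key. Keywords come from the SQL result and from the related account rows, and are then used for a naive keyword search over free text.

```python
import re
import sqlite3

# Hypothetical schema and data, held in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE account (id INTEGER PRIMARY KEY, customer_id INTEGER, product TEXT);
INSERT INTO customer VALUES (1, 'Asha Rao', 'Mumbai'), (2, 'Vikram Shah', 'Pune');
INSERT INTO account VALUES (10, 1, 'home loan'), (11, 2, 'credit card');
""")


def keywords_for(query, params=()):
    """Run a query returning (id, name, city) rows and derive search keywords.

    Keywords are taken from the result rows and from the customer's accounts,
    which stand in for the "neighborhood" of the result in the database.
    """
    keywords = set()
    for cust_id, name, city in conn.execute(query, params).fetchall():
        keywords.update(f"{name} {city}".lower().split())
        related = conn.execute(
            "SELECT product FROM account WHERE customer_id = ?", (cust_id,)
        )
        for (product,) in related:
            keywords.update(product.lower().split())
    return keywords


def search(documents, keywords):
    """Return documents sharing at least one keyword, most overlaps first."""
    scored = []
    for doc in documents:
        hits = len(keywords & set(re.findall(r"[a-z0-9]+", doc.lower())))
        if hits:
            scored.append((hits, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]


documents = [
    "Query about home loan interest rate from Mumbai branch",
    "Credit card statement missing for March",
    "Branch timings for Pune",
]
keywords = keywords_for("SELECT id, name, city FROM customer WHERE id = ?", (1,))
print(sorted(keywords))
print(search(documents, keywords))
```

The query itself mentions only a customer id, yet the search retrieves the home loan message, because "home" and "loan" were picked up from the related account row and not from the query result.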