Towards natural language based data/text mining and summarization via soft approaches

Objectives

The aim of the mini-symposium is to present and discuss an emerging and promising paradigm: the use of natural language to formulate user intentions and interests, to describe interesting patterns present in large numerical and textual data sets, and to characterize large numeric data sets as a whole, all in a concise, user-understandable and human-consistent way. This challenge is triggered by the rapidly growing amount of available data in various forms and by the limited time and capacity of a human user to comprehend and process these data and to make decisions based on them. The goal is to present a general view of this paradigm and to show some specific approaches that address related issues.

Linguistic modeling has long been considered promising and has been closely related to fuzzy set theory and fuzzy logic since their very beginning. Linguistic rules are at the heart of fuzzy modeling and control, and a vast literature is devoted to the representation and use of knowledge expressed as such rules. Many methods have been proposed to form sets of linguistic rules for the purposes of control. Much less has been done in the area of linguistic modeling seen from the perspective of data mining. The most important contribution here is the concept of linguistic summaries of data, proposed by Yager and then developed by Kacprzyk, Yager and Zadrozny. Other approaches to the summarization of data, based, e.g., on the concept of attribute-oriented induction, have also been developed. Some versions of the related concept of fuzzy linguistic association rules have been proposed as well.
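To make the idea of a linguistic summary more concrete, the following is a minimal sketch of the basic form "Q objects are S" (e.g., "most employees are young"), in which the degree of truth is obtained by applying a fuzzy quantifier Q to the average membership of the data in a fuzzy summarizer S. The membership functions `young` and `most`, their breakpoints, and the sample ages are illustrative assumptions, not taken from any of the presented works.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 1 on [b, c], 0 outside (a, d), linear in between."""
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


def young(age):
    # Assumed summarizer: fully "young" up to 25, not "young" from 35 on.
    return trapezoid(age, 0, 0, 25, 35)


def most(proportion):
    # Assumed relative quantifier "most": 0 up to 0.3, 1 from 0.8 on, linear between.
    return min(1.0, max(0.0, 2.0 * proportion - 0.6))


def summary_truth(values, summarizer, quantifier):
    """Degree of truth of the summary 'Q values are S'."""
    if not values:
        raise ValueError("cannot summarize an empty data set")
    proportion = sum(summarizer(v) for v in values) / len(values)
    return quantifier(proportion)


ages = [22, 24, 30, 28, 45, 23]  # hypothetical sample data
print(f"Truth of 'most employees are young': {summary_truth(ages, young, most):.2f}")
```

For these assumed ages, the average membership in "young" is 0.7, so the summary "most employees are young" holds to a degree of roughly 0.8. Other summarizers and quantifiers can be plugged in by passing different functions.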

The processing of textual documents, including document summarization, categorization and querying, has a long tradition of research in the framework of information retrieval. This type of processing may be seen as a basic form of text mining, which forms a background for more advanced approaches such as information extraction, which can turn a collection of documents into a form appropriate for the application of traditional numerical data mining tools. The area of textual information processing was also quickly recognized as amenable to the application of soft approaches due to the existence of many inherently vague and uncertain notions, such as the relevance of a document to a query, the importance of a keyword for the representation of a document, or the membership of a document in a given class (category) of documents. Thus, the use of soft modeling for the purposes of text mining, both in its more rudimentary and in its more advanced forms, seems to be a prerequisite for a successful implementation of these techniques in real applications.

During the mini-symposium we aim to cover the whole area of linguistic data mining as well as the use of tools and techniques from computational linguistics, notably natural language understanding, processing and generation. Problems related to both numerical and textual data mining will be discussed.

Emphasis will be on the use of fuzzy and possibilistic tools, notably those related to Zadeh's broadly understood paradigm of computing with words (and perceptions), to extend the existing traditional tools and to develop new natural language based tools in the area. The speakers will also use new aggregation methods and fusion operators suited to the proposed linguistic settings.

Topics of interest

The use of linguistic tools and techniques, both natural language and information retrieval based, combined with soft computing, for:

linguistic summarization of numerical and textual data
textual data classification
knowledge-based numerical and textual data mining

Target audience

Scholars, researchers and practitioners in data mining, information retrieval, natural language processing, data analysis, decision support, etc.

List of invited speakers

Janusz Kacprzyk & Slawomir Zadrozny (Systems Research Institute, Polish Academy of Sciences, Poland), "A perspective on the role of fuzzy logic and natural language processing in data mining and summarization"
Ron Yager (Iona College, USA), "Multi-source fusion of information"
Antoon Bronselaer, Guy De Tre, Dirk Van Hyfte & Saskia Debergh (Ghent University, Belgium), "Document clustering based on Concept-Relationship-Concept patterns: A step towards multi-document summarization"
Marek Reformat & Zhan Li (University of Alberta, Canada), "Importance of keywords based queries and their ontology enhancement"
Richard A. McAllister & Rafal A. Angryk (Montana State University, USA), "Taxonomic abstraction-based document representation and categorization"