General Application Specification
Basic Requirements
Readware requires a Windows- or Linux-based computer with enough memory and storage to hold the texts, messages or documents to be analyzed, classified and indexed (see below), along with a suitable use case. While most customers apply Readware in straightforward search applications, the technology's uses go beyond search, retrieval and content management and extend to topic, issue and trend analysis, learning, filing, routing and classification applications.
Because these use cases extend to more intelligent uses of computers, we include a short description of the operational theory of Readware, along with a description of the programmatic resources (ConceptBases and files called cultures) that support more comprehensive categorization functions. Together with the software architecture and hardware requirements, this constitutes the general specification of Readware software.
Programmatically, Readware parses strings of text of any size. Personal and proper names are distinguished from numbers and from other words and concepts. Certain everyday words (which we call concepts) are taken to signify abstract intersubjective objects in a systematic, multi-dimensional model of intersubjective meaning. This model interprets how these objects operate and fit with other representations (names and number patterns) and how they interconnect with the existential phenomena in the world to which they refer. The effect of this capacity can be illustrated in the following comparison of the ways people, Readware and search engines typically parse (or read) a text.
Aptitude for Text Understanding and Comprehension
| People | Readware | Search engines |
|---|---|---|
| A text is parsed into names, symbols and similar referents that represent things and conditions relative to like minds. | A text is parsed into names, symbols and similar referents that represent things, cognitive operations, boundaries and conditions generally known to human minds. | Some artificial learning of similarity from patterns may be performed in some cases. In any case, a text is parsed into tokens, and the tokens are stored in an index so they can be searched for across many similarly indexed tokens. |
| Passes scholastic aptitude tests | Passes scholastic aptitude tests | |
| Picks out entities, number patterns and topical referents and identifies subject matter. | Recognizes (picks out) entities, number patterns and topical referents and identifies subject matter. | Can create a bag of words representing the text. A few systems identify named entities from a list, and even fewer recognize noun phrases. |
| Can compare and relate topics and subjects to other knowledge and to other texts. | Recognizes comparable relationships between the concepts, topics and subjects of a text and other knowledge and other texts. | Not many comparisons possible. No relationships recognized. Some artificial approaches can classify a text as similar to or different from other texts (with limited accuracy) according to the bags of words from both texts. |
Readware methods use abstraction and the capacity to pick out the referents of symbols using one or more ConceptBases and one or more cultures (see below). These references are used by Readware algorithms to recognize intersubjective, characteristic and contextual relationships between named entities or place names, concepts from the ConceptBase, regular word forms, phrases and idioms, and dates, prices and other number patterns.
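As a rough illustration of the first step described above, distinguishing names, numbers and concept words in a string of text, the following C sketch classifies tokens with simple surface heuristics. Everything in it is hypothetical: the `TokenKind` enum, the four-word `CONCEPTS` list standing in for a ConceptBase, and the capitalization rule for names are assumptions made for illustration, not Readware's actual parsing method.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical token categories loosely mirroring the text above. */
typedef enum { TOK_WORD, TOK_NAME, TOK_NUMBER, TOK_CONCEPT } TokenKind;

/* Hypothetical stand-in for a ConceptBase; a real one holds far more words. */
static const char *const CONCEPTS[] = { "agree", "trade", "travel", "price" };

/* Case-insensitive equality using only standard C. */
static int equals_ignore_case(const char *a, const char *b) {
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) return 0;
        a++;
        b++;
    }
    return *a == *b;
}

static int is_concept(const char *tok) {
    for (size_t i = 0; i < sizeof CONCEPTS / sizeof CONCEPTS[0]; i++)
        if (equals_ignore_case(tok, CONCEPTS[i])) return 1;
    return 0;
}

/* Classify one token: digits plus numeric punctuation => number, listed word =>
   concept, leading capital => name (a crude heuristic), anything else => word. */
TokenKind classify_token(const char *tok) {
    size_t n = strlen(tok), digits = 0;
    for (size_t i = 0; i < n; i++)
        if (isdigit((unsigned char)tok[i])) digits++;
    if (digits > 0 && strspn(tok, "0123456789.,$%") == n) return TOK_NUMBER;
    if (is_concept(tok)) return TOK_CONCEPT;
    if (n > 0 && isupper((unsigned char)tok[0])) return TOK_NAME;
    return TOK_WORD;
}

/* Split text (up to 1023 bytes) on whitespace and some punctuation, strip
   trailing '.' and ',', and tally each token kind into counts[0..3]. */
void count_kinds(const char *text, int counts[4]) {
    static const char delims[] = " \t\n;:!?()\"";
    char buf[1024];
    strncpy(buf, text, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    for (int k = 0; k < 4; k++) counts[k] = 0;
    for (char *t = strtok(buf, delims); t != NULL; t = strtok(NULL, delims)) {
        size_t n = strlen(t);
        while (n > 0 && (t[n - 1] == '.' || t[n - 1] == ',')) t[--n] = '\0';
        if (n > 0) counts[classify_token(t)]++;
    }
}
```

For "Acme will trade 500 units in Paris." this sketch counts three plain words, two names, one number and one concept. A real parser needs much more than capitalization to tell a proper name from a sentence-initial word.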
Predefined Reference Base: The Readware ConceptBase
To augment Readware's capacity to correctly pick out the referents of the symbols it processes, and to counter the effects of language change, we chose a set of several thousand root words from an ancient language to serve as a semantic cover for a human society or culture. A semantic cover can be defined as the range of words and definitions in common, everyday use in a society. To quantify the semantics of the chosen words, we measured the fidelity of their interconnections (according to semantic theory) and stored the results in a table we called a ConceptBase.
We then manually assigned tens of thousands of words from English, French and German to the set of original root words we chose from ancient Arabic. The root forms we chose have a written history of more than 2,500 years of regular use and a "meaning" apparent to all cultures, as evidenced by the existence of translatable words.
This is how the symbols encountered in an English, French or German text have their meanings interpreted and their referents recognized by Readware programs running on computer hardware. For Readware algorithms, the meanings are generated by the more formal correspondence between phonetic signs and abstract objects of interprocess control. The symbols composed from these signs are grounded in their (qualitative) connections to external referents (given by word structure via the ConceptBase) and in their corresponding interconnection with more formally defined (quantifiable) abstract objects.
The extensible English, French and German language ConceptBase informs Readware search processes. Programmatically, a Readware ConceptBase is similar to WordNet, except that its words are organized not by part of speech but by the theoretical axioms of Readware that operate on the underlying abstract objects of intersubjective and interprocess control.
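The manual assignment of English, French and German words to shared root words can be pictured as a lookup table keyed by surface word. The C sketch below is only illustrative: the `ConceptEntry` layout, the word list and the numeric root ids are hypothetical placeholders rather than entries from the actual ConceptBase, and the measured interconnections between roots are left out entirely. As the text describes, entries are grouped by root rather than by part of speech, so a noun and a verb may share one root.

```c
#include <string.h>

/* Hypothetical ConceptBase entry: a surface word, its language and the id of
   the root word it was manually assigned to. */
typedef struct {
    const char *word;
    const char *lang; /* "en", "fr" or "de" */
    int root_id;      /* placeholder id, not a real Readware root number */
} ConceptEntry;

/* Illustrative entries only; note the noun "book" sharing a root with verbs. */
static const ConceptEntry CONCEPT_BASE[] = {
    { "write",  "en", 101 }, { "écrire",  "fr", 101 }, { "schreiben", "de", 101 },
    { "book",   "en", 101 }, { "travel",  "en", 202 }, { "voyager",   "fr", 202 },
    { "reisen", "de", 202 },
};

/* Return the root id assigned to a word, or -1 if the word is not listed. */
int root_of(const char *word) {
    for (size_t i = 0; i < sizeof CONCEPT_BASE / sizeof CONCEPT_BASE[0]; i++)
        if (strcmp(CONCEPT_BASE[i].word, word) == 0) return CONCEPT_BASE[i].root_id;
    return -1;
}

/* Two words are treated as conceptually related when they share a known root. */
int same_root(const char *a, const char *b) {
    int ra = root_of(a);
    return ra >= 0 && ra == root_of(b);
}
```

In this sketch "write" and "schreiben" resolve to the same root, while "write" and "travel" do not; a real ConceptBase would also carry the quantified relationships between roots.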
Software Architecture
We wrapped the formal model with an ANSI C based parsing, indexing and search engine that loads one or more ConceptBases and extensions and compiles multiple text objects into collections of conceptual and topical maps (called signatures) for search and analysis. A ConceptBase and its extensions define a memory-based search space.
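One way to picture a compact, comparable signature is a bit set of concept roots. The sketch below is an assumption for illustration only: `Signature`, `compile_signature` and the folding of root ids into 64 bits are hypothetical, and the Jaccard overlap used to compare two signatures is a generic stand-in measure, not Readware's own scoring.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical signature: one bit per root id, folded into 64 bits. */
typedef uint64_t Signature;

/* Compile the root ids found in a text into a signature; negative ids
   (words not in the ConceptBase) are ignored. */
Signature compile_signature(const int *root_ids, size_t n) {
    Signature s = 0;
    for (size_t i = 0; i < n; i++)
        if (root_ids[i] >= 0) s |= (Signature)1 << (root_ids[i] % 64);
    return s;
}

/* Count set bits by repeatedly clearing the lowest one. */
static int popcount64(Signature s) {
    int c = 0;
    while (s) {
        s &= s - 1;
        c++;
    }
    return c;
}

/* Jaccard overlap of two signatures, from 0.0 (disjoint) to 1.0 (identical). */
double signature_overlap(Signature a, Signature b) {
    int uni = popcount64(a | b);
    return uni == 0 ? 0.0 : (double)popcount64(a & b) / uni;
}
```

Folding ids modulo 64 lets unrelated roots collide; the sketch trades accuracy for a fixed, tiny size, which only hints at why precompiled signatures make collection-wide comparison cheap.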
The Readware compiler and search engine are available for the Windows XP, 2000 and Vista, Linux and Solaris operating systems. Readware software can be ported to any operating system that provides an ANSI C environment. See below for specific hardware requirements.
The inputs are texts and queries.
Texts can be input in plain text, Microsoft Office, Adobe PDF and standard HTML or derivative DHTML and XML formats; other formats can be supported through an open document interface. A data interface for accessing fields and records is also available.
Example Use Cases
Immediate Identification and Classification
Readware can be implemented to read a single text, such as a web page, in which case the inputs are a URL and a culture, a kind of Readware questionnaire written in a text file. A Readware Culture is the specification of a list of topics and classifiers that may be used for classifying a broad range of topical interests. A Readware Topic is composed of inquiries designed to find instances of facts that support or refer to the specified topic or classifier. A Readware Classifier is a weighted topic or a named group of topics. With training, users of the software can author Readware Cultures.
Readware algorithms apply all the queries of the applied cultures to determine which topics and classifiers score (are in agreement, have hits) and which do not. One or more cultures may be applied to any text.
The output is a ranked list of the Readware Classifiers and Readware Topics (from the applied culture or cultures) that are relevant to the text of the web page at the given URL.
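The culture, topic and classifier relationships described above can be modeled with a few structs. In this hypothetical C sketch, each inquiry is reduced to a plain substring, a topic scores one hit per matching inquiry, and a classifier's score is its weight times the hits of its topics, after which the classifiers are ranked. The struct names, the four-slot limits and the substring matching are all assumptions and are far simpler than real Readware inquiries; only classifiers are ranked here, not topics.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory culture: a topic holds up to four inquiries (plain
   substrings here); a classifier is a weighted group of topic indexes. */
typedef struct {
    const char *name;
    const char *inquiries[4];
} Topic;

typedef struct {
    const char *name;
    double weight;
    int topics[4];
    int ntopics;
    double score;
} Classifier;

/* Count how many of a topic's inquiries occur in the text. */
int score_topic(const Topic *t, const char *text) {
    int hits = 0;
    for (int i = 0; i < 4 && t->inquiries[i] != NULL; i++)
        if (strstr(text, t->inquiries[i]) != NULL) hits++;
    return hits;
}

/* qsort comparator: higher score first. */
static int by_score_desc(const void *a, const void *b) {
    double sa = ((const Classifier *)a)->score;
    double sb = ((const Classifier *)b)->score;
    return (sa < sb) - (sa > sb);
}

/* Apply every inquiry of the culture to the text, then rank the classifiers. */
void apply_culture(const Topic *topics, Classifier *cls, int ncls, const char *text) {
    for (int c = 0; c < ncls; c++) {
        int hits = 0;
        for (int k = 0; k < cls[c].ntopics; k++)
            hits += score_topic(&topics[cls[c].topics[k]], text);
        cls[c].score = cls[c].weight * hits;
    }
    qsort(cls, (size_t)ncls, sizeof *cls, by_score_desc);
}
```

With a "Travel" topic (flight, hotel) under a "Leisure" classifier of weight 1.0 and a "Finance" topic (price, stock, market) under a "Business" classifier of weight 1.5, the text "The hotel price rose as the flight sold out" ranks Leisure (2.0) above Business (1.5).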
Readware Cultures are applied at indexing time, that is, when Readware algorithms first encounter a text. The results may be stored in a data (signature) file and indexed for exchange between computers, or they may be used in real time. The outputs (results) can be organized into taxons, directories or categories. Queries can also be presented and run ad hoc (for individual searches) using the Readware Query processor (part of the ReST-style IpServers).
Retrospective Search
The Readware Analyst processor (part of the ReST-style IpServers) indexes text or fields of text into collections where the title, body and designated fields of documents are searchable using the Readware Query processor.
The output of the Readware Compiler (Analyst) is a compact database of the semantic signatures of texts and an index over the entire collection of texts. The Analyst can also output a ranked list of classifiers and topics for a text input or a single document.
The outputs of a search over a collection of text items are ranked lists of relevant files, URLs or database records, each with a list of one or more contextual hit spots (each accompanied by a byte offset and a length).
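A hit spot as described here reduces to a byte offset and a length. The following hypothetical C sketch records every non-overlapping occurrence of a literal query term; literal matching is an assumption standing in for Readware's conceptual matching, and `HitSpot` and `find_hit_spots` are illustrative names only.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical hit-spot record: where a match starts and how long it is. */
typedef struct {
    size_t offset; /* byte offset from the start of the text */
    size_t length; /* match length in bytes */
} HitSpot;

/* Record up to max non-overlapping occurrences of term in text; returns the
   number of hit spots written to out. */
size_t find_hit_spots(const char *text, const char *term, HitSpot *out, size_t max) {
    size_t n = 0, len = strlen(term);
    if (len == 0) return 0;
    for (const char *p = strstr(text, term); p != NULL && n < max; p = strstr(p + len, term)) {
        out[n].offset = (size_t)(p - text);
        out[n].length = len;
        n++;
    }
    return n;
}
```

Because offsets are in bytes, they stay valid for multi-byte encodings such as UTF-8 as long as the consumer slices the same byte buffer.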
The output of a query on a single document or web page URL is either a ranked list of classifiers and topics or an XML rendition of the data types Readware identified in the document (including names, numbers, concepts and topics), with the byte offset and length of any relevant hits in the text itself.
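An XML rendition of typed hits could be produced along these lines. The element and attribute names (`document`, `hit`, `type`, `offset`, `length`) are hypothetical, since the actual Readware XML schema is not given here. The function follows `snprintf` conventions: it never writes past the buffer and returns the full length the output needs, even when `buf` is too small.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical typed hit, e.g. type "name", "number", "concept" or "topic". */
typedef struct {
    const char *type;
    size_t offset; /* byte offset of the hit in the source text */
    size_t length; /* length of the hit in bytes */
} TypedHit;

/* Append formatted text after *used bytes. Output is truncated safely once buf
   is full, but *used keeps counting the full length (snprintf style). */
static int append(char *buf, size_t size, size_t *used, const char *fmt, ...) {
    size_t off = *used < size ? *used : size;
    va_list ap;
    va_start(ap, fmt);
    int w = vsnprintf(buf + off, size - off, fmt, ap);
    va_end(ap);
    if (w < 0) return -1;
    *used += (size_t)w;
    return 0;
}

/* Render hits as an illustrative XML fragment; returns the length needed,
   or -1 on an encoding error. */
int hits_to_xml(const TypedHit *hits, size_t n, char *buf, size_t size) {
    size_t used = 0;
    if (append(buf, size, &used, "<document>") < 0) return -1;
    for (size_t i = 0; i < n; i++)
        if (append(buf, size, &used, "<hit type=\"%s\" offset=\"%zu\" length=\"%zu\"/>",
                   hits[i].type, hits[i].offset, hits[i].length) < 0)
            return -1;
    if (append(buf, size, &used, "</document>") < 0) return -1;
    return (int)used;
}
```

A caller can pass a small buffer first, read the returned length, and retry with a buffer of that size plus one for the terminating null.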
In any case, this automatically generated Readware metadata can be loaded into computer memory for further processing, e.g., search, retrieval and higher orders of analysis.