These would not have been possible without the collaboration and encouragement of the
large number of exceptionally talented and enthusiastic students, colleagues, mentors and others
who co-authored many of the resulting publications.
In the early 1980s, the Layered Data Placement Problem and several related problems were shown to be NP-complete. Methods developed for solving the Layered Data/Object Placement Problem include a novel 'greedy', linear-time, graph-collapsing method; the probabilistic hill-climbing method of Simulated Annealing; and a method based on Genetic Adaptation.
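To give a flavour of the heuristic approach, the following is a minimal sketch of probabilistic hill-climbing (simulated annealing) applied to a placement problem; it is not the published algorithm, and the cost function, move generator, and cooling schedule are illustrative assumptions.

```python
import math
import random

def anneal_placement(objects, layers, cost, steps=10000, t0=1.0, alpha=0.999):
    """Simulated-annealing sketch for a data/object placement problem.

    'cost' is an assumed user-supplied function mapping a placement
    dict {object: layer} to a real-valued cost (e.g. inter-layer traffic).
    """
    placement = {o: random.choice(layers) for o in objects}
    current = cost(placement)
    best, best_cost = dict(placement), current
    t = t0
    for _ in range(steps):
        o = random.choice(objects)            # propose moving one object
        old_layer = placement[o]
        placement[o] = random.choice(layers)
        new = cost(placement)
        # Accept improvements always; accept worse moves with a probability
        # that shrinks as the temperature cools (the hill-climbing element).
        if new <= current or random.random() < math.exp((current - new) / max(t, 1e-9)):
            current = new
            if new < best_cost:
                best, best_cost = dict(placement), new
        else:
            placement[o] = old_layer          # reject: undo the move
        t *= alpha                            # cool the temperature
    return best, best_cost
```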
The hash trees method of external hashing is known to have advantages for certain types of primary key distribution. The value of the method as a general indexing technique, for secondary keys as well as primary keys, was assessed, and comparisons were made with the better-known B-tree method and with other members of the class of external hashing schemes. Attention was focused on key-space compression algorithms, storage utilization, and sequential and direct access performance.
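As a rough illustration of the general idea behind tree-structured external hashing (not the specific published scheme), the sketch below indexes records by successive digits of a hash value and splits a bucket into a new tree level when it overflows; the bucket capacity and radix are assumed parameters.

```python
BUCKET_SIZE = 4   # assumed page capacity
RADIX = 16        # children per internal node: one hexadecimal hash digit

class Node:
    def __init__(self):
        self.records = {}      # leaf bucket: key -> value
        self.children = None   # internal node: list of RADIX child Nodes

def _digit(key, depth):
    # The depth-th hex digit of the key's hash chooses the branch to follow.
    return (hash(key) >> (4 * depth)) % RADIX

def insert(node, key, value, depth=0):
    if node.children is not None:                     # internal node: descend
        insert(node.children[_digit(key, depth)], key, value, depth + 1)
        return
    node.records[key] = value
    if len(node.records) > BUCKET_SIZE:               # overflow: split the bucket
        node.children = [Node() for _ in range(RADIX)]
        for k, v in node.records.items():
            insert(node.children[_digit(k, depth)], k, v, depth + 1)
        node.records = {}

def lookup(node, key, depth=0):
    while node.children is not None:                  # direct access via hash digits
        node = node.children[_digit(key, depth)]
        depth += 1
    return node.records.get(key)
```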
Programming a computer proved to be a useful vehicle for studying this aspect of AI and problem-solving because the procedures could be assessed automatically.
Many studies were conducted in this area, addressing applications from medicine to transportation and from education to engineering, and this led to many developments and experiments. For example, a prototype multidatabase management system, EDDS, was developed; its schema and component architectures share many features with other distributed database management systems, but it has some novel features, such as a gateway that allows personal computers to share distributed database functionality. These features were found to be particularly attractive in many of the specific real-world scenarios for which the system was designed. The system was written in C on Unix and in PASCAL on DEC VMS, and several PC ports were implemented.
The experience gained in such work led to co-authoring a widely referenced book. It adopts a practical approach, reviewing the fundamentals of database technology and developments in data communications as of the early 1990s, before setting out the principles of distributed database systems, and it includes case studies of the leading products of the time.
In the late 1980s, a method for pragmatic estimation of join sizes and attribute correlations was presented. Attribute value distributions in database relations were modelled for the purpose of obtaining accurate estimates of intermediate relation sizes during query evaluation. The basic idea is that the uniformity assumption, used as an estimation technique, is applied to partitions of the key space rather than to the space as a whole, hence the name 'piecewise uniform'. The method was implemented in two multidatabase management systems and was extended to the modelling of important intra-relational attribute correlations.
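A minimal sketch of a piecewise-uniform estimate for an equi-join is shown below; the histogram format, aligned bucket boundaries, and integer-valued join attribute are assumptions made for illustration.

```python
def estimate_join_size(hist_r, hist_s):
    """Piecewise-uniform estimate of |R join S| on a common attribute.

    hist_r, hist_s: lists of (low, high, row_count) buckets over the join
    attribute, with aligned bucket boundaries assumed for simplicity and an
    integer-valued domain, so bucket (lo, hi, n) spans hi - lo + 1 values.
    """
    estimate = 0.0
    for (lo, hi, n_r), (_, _, n_s) in zip(hist_r, hist_s):
        distinct = hi - lo + 1
        # Uniformity is assumed only within each partition: every value in the
        # bucket occurs n_r/distinct times in R and n_s/distinct times in S,
        # so the bucket contributes distinct * (n_r/distinct) * (n_s/distinct).
        estimate += n_r * n_s / distinct
    return estimate

# Example: aligned three-bucket histograms on the join key
r_hist = [(1, 100, 1000), (101, 200, 400), (201, 300, 100)]
s_hist = [(1, 100, 200), (101, 200, 800), (201, 300, 50)]
print(estimate_join_size(r_hist, s_hist))   # -> 5250.0
```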
A new classification criterion for buffer management algorithms for relational DBMSs was proposed, based on how knowledge of query reference behaviour is reflected in buffering policies. An algorithm named RESBAL exploits parallelism and prior knowledge of query reference behaviour to offer high performance for query execution. It was evaluated in a parallel database system environment based on an Inmos transputer architecture. A parallel database machine using a transputer network was designed and implemented; the system used a communicating-process model both in the implementation of database operations and in the provision of system services.
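As an illustration of how prior knowledge of reference behaviour can be reflected in a buffering policy (a sketch only, not the RESBAL algorithm itself), the pool below evicts pages of sequentially scanned relations eagerly and falls back to LRU otherwise; the hint format and the read_page callback are assumptions.

```python
from collections import OrderedDict

class HintedBufferPool:
    """Buffer-pool sketch whose replacement policy varies per relation,
    guided by prior knowledge of the query's reference behaviour."""

    def __init__(self, capacity, hints):
        self.capacity = capacity
        self.hints = hints              # relation -> 'sequential' | 'random'
        self.pool = OrderedDict()       # (relation, page_no) -> page data

    def get_page(self, relation, page_no, read_page):
        key = (relation, page_no)
        if key in self.pool:            # buffer hit: refresh recency
            self.pool.move_to_end(key)
            return self.pool[key]
        if len(self.pool) >= self.capacity:
            self._evict()
        data = read_page(relation, page_no)   # assumed I/O callback
        self.pool[key] = data
        return data

    def _evict(self):
        # Pages of sequentially scanned relations will not be re-referenced
        # soon, so discard the most recent such page first (MRU); otherwise
        # fall back to evicting the least recently used page (LRU).
        for key in reversed(self.pool):
            if self.hints.get(key[0]) == 'sequential':
                del self.pool[key]
                return
        self.pool.popitem(last=False)
```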
To handle temporal data, the prototype Temporal Data Management System, TDMS, used an approach to the modelling of temporal semantics that was pragmatic in that it took into account practical issues such as time/space trade-offs. We introduced a conceptual framework which permits issues such as schema anomalies due to updates, query validation, and correct modelling of semantics to be handled in a consistent and usable manner. The time values of each of three 'data constructs' were organised in a hierarchy, which was combined at the implementation level with a well-established multidatabase architecture to give a comprehensive and efficient temporal model that overcame several weaknesses of the temporal models existing at the time.
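A minimal sketch of time-stamped attribute values with 'as of' retrieval is given below; the constructs, the integer time representation, and the update rule are illustrative assumptions rather than the TDMS design itself.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional

@dataclass
class TemporalValue:
    value: Any
    valid_from: int                  # time the value became current
    valid_to: Optional[int] = None   # None means still current

@dataclass
class TemporalAttribute:
    name: str
    history: List[TemporalValue] = field(default_factory=list)

    def update(self, value, at_time):
        if self.history and self.history[-1].valid_to is None:
            self.history[-1].valid_to = at_time   # close the previous version
        self.history.append(TemporalValue(value, at_time))

    def as_of(self, t):
        for v in self.history:
            if v.valid_from <= t and (v.valid_to is None or t < v.valid_to):
                return v.value
        return None

# Example: a salary history queried 'as of' a past time point
salary = TemporalAttribute('salary')
salary.update(20000, at_time=1990)
salary.update(25000, at_time=1994)
print(salary.as_of(1992))   # -> 20000
```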
To provide rapid access to dispersed text and image information, which is typically heterogeneous, one of the proposed architectures provided system support using a distributed-index approach, based on previous work on multidatabases, in which the image indexes are held at the local sites. In another approach, a global image index provided the list of available images within the network. Accessing huge collections of images from the World Wide Web (WWW) on the basis of image content led to the proposal, in the late 1990s, of a distributed content-based image query system (DCBIQ) integrating knowledge from image processing, semantic descriptors, multi-agent systems, and WWW navigation. By extracting the texture, colour, and shape features and the spatial relationships of the image semantic descriptor models, along with constraints, desired images were sought on the WWW.
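A toy sketch of content-based matching is given below: an assumed colour-histogram feature extractor and cosine-similarity ranking over a site-local index. The published system used richer texture, shape, and spatial-relationship features; the function names here are illustrative.

```python
import math

def colour_histogram(pixels, bins=8):
    """Small colour-feature extractor: a normalised histogram over
    quantised (r, g, b) triples in the range 0..255 (illustrative only)."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        idx = (r * bins // 256) * bins * bins + (g * bins // 256) * bins + (b * bins // 256)
        hist[idx] += 1
    total = sum(hist) or 1.0
    return [h / total for h in hist]

def similarity(f1, f2):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(f1, f2))
    norm = math.sqrt(sum(a * a for a in f1)) * math.sqrt(sum(b * b for b in f2))
    return dot / norm if norm else 0.0

def query(index, query_features, k=5):
    """Rank images in a (possibly site-local) index by feature similarity."""
    ranked = sorted(index.items(),
                    key=lambda kv: similarity(kv[1], query_features),
                    reverse=True)
    return [image_id for image_id, _ in ranked[:k]]
```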
The Dempster-Shafer theory of evidence provides a solid basis for reasoning about situations characterized by uncertainty. Several contributions to the theory, its computational methods, and its applications have been made over many years. To take one example, the theory was generalised to cover general Boolean algebras, including collections of objects such as propositions. A particularly interesting case is where the objects of interest are geometric forms; the Boolean algebra of geometric shapes can then be used directly in evidential reasoning, an approach from which medicine and other fields can benefit.
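To make the basic mechanism concrete, the following is a minimal sketch of Dempster's rule of combination over a small frame of discernment; the example frame and mass values are illustrative only.

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule of combination for mass functions over frozensets.

    m1, m2: dicts mapping focal elements (frozensets) to masses summing to 1.
    Returns the combined, conflict-normalised mass function.
    """
    combined, conflict = {}, 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb            # mass assigned to the empty set
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence")
    return {s: v / (1.0 - conflict) for s, v in combined.items()}

# Example over the frame {flu, cold, covid}
frame = frozenset({'flu', 'cold', 'covid'})
m1 = {frozenset({'flu'}): 0.6, frame: 0.4}            # one piece of evidence
m2 = {frozenset({'flu', 'cold'}): 0.7, frame: 0.3}    # another, less specific
print(combine(m1, m2))
```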
Experience shows that different text classification methods can give different results, and the results of two or more classification methods can be combined using an evidential approach. The specific methods experimented with include the support vector machine, kNN (nearest neighbours), a kNN model-based approach (kNNM), and the Rocchio method, but the analysis and combination apply to any classifiers. The combination can be performed using evidential operations with only two focal elements in each mass function, and this gives good experimental results, although there are conditions under which more focal elements should be used.
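The sketch below is an illustrative reconstruction, not the published procedure, of how two classifier outputs might be fused using two-focal-element mass functions: each classifier places its confidence on the predicted class and the remainder on the whole frame, and the two are combined by Dempster's rule.

```python
def classifier_mass(predicted_class, confidence, frame):
    """Two-focal-element mass function from one classifier's output:
    mass 'confidence' on the predicted class, the rest on the whole frame."""
    return {frozenset({predicted_class}): confidence,
            frozenset(frame): 1.0 - confidence}

def dempster(m1, m2):
    combined, conflict = {}, 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + ma * mb
            else:
                conflict += ma * mb
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence")
    return {s: v / (1.0 - conflict) for s, v in combined.items()}

# Example: an SVM and a kNN classifier disagree about a document's category
frame = {'sport', 'politics', 'finance'}
m_svm = classifier_mass('sport', 0.8, frame)
m_knn = classifier_mass('politics', 0.6, frame)
combined = dempster(m_svm, m_knn)
best = max(combined, key=combined.get)      # decide by maximum combined mass
print(best, combined[best])
```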
This novel method for evidential reasoning, RԐS, is perspicuous in that decisions are made by trading off items of evidence for and against each candidate conclusion. The novelty lies in the fact that it does not use numbers to indicate evidence strengths, as the Dempster-Shafer theory does; instead, simple comparison of the evidential support for alternatives is the principle adopted. Evidence strength is associated with an evidential support relationship (an argument) between a pair of statements, and that strength is conveyed by comparisons between arguments rather than by a numerical value attached to a single statement. This makes the reasoning easy for domain experts to follow.
For example, evidence E6 may support hypothesis H2 with strength A6, perhaps greater than the support, A3, of E3 for H1.
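A minimal sketch of this comparative style of reasoning follows; the argument names, the comparison relation, and the decision rule are illustrative assumptions rather than the published RԐS calculus.

```python
# An 'argument' links an item of evidence to the hypothesis it supports.
# Strengths are never numeric: only pairwise comparisons between arguments
# are recorded (an illustrative sketch only).

arguments = {
    'A3': ('E3', 'H1'),   # evidence E3 supports hypothesis H1
    'A6': ('E6', 'H2'),   # evidence E6 supports hypothesis H2
}

# Expert-supplied comparisons: (x, y) means argument x is stronger than y.
stronger = {('A6', 'A3')}

def args_for(hypothesis):
    return [a for a, (_, h) in arguments.items() if h == hypothesis]

def outweighs(pro, con):
    """True if every argument for the rival is dominated by some argument in 'pro'."""
    return all(any((a, b) in stronger for a in pro) for b in con)

def prefer(h1, h2):
    if outweighs(args_for(h1), args_for(h2)):
        return h1
    if outweighs(args_for(h2), args_for(h1)):
        return h2
    return None   # incomparable with the comparisons available

print(prefer('H1', 'H2'))   # -> 'H2', since A6 is judged stronger than A3
```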
Parallel processing can play an important role in the efficient processing of recursive queries, and the implementation and run-time performance of such strategies were investigated using a pipelined evaluation strategy on transputers. A wide range of queries, database structures, and architectural configurations were considered as benchmarks in this study, and performance was studied in terms of both speed-up factors and communication costs. In one proposed strategy the query is compiled into a canonical form, and the derived form is then evaluated on a generalised pipe. The strategy is potentially applicable to any recursive query defined over the domain of function-free Horn clauses. By pipelining the evaluation process on a parallel architecture, and by restricting access to only the data which directly contribute to answering the query, performance improvements were obtained.
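The following is a small sequential sketch of the restriction idea for a transitive-closure style query (an assumed 'ancestor' example); the published strategy pipelines the corresponding join steps across communicating transputer processes rather than running them in a single loop.

```python
from collections import deque

def ancestors(parents_of, person):
    """Recursive-query sketch: all ancestors of 'person', computed by
    repeatedly joining the current frontier with the 'parent' relation.
    Evaluation touches only tuples reachable from the query constant,
    mirroring the restriction to data that contributes to the answer."""
    answers, frontier = set(), deque([person])
    while frontier:                              # one 'pipe stage' per iteration
        x = frontier.popleft()
        for y in parents_of.get(x, ()):          # join frontier tuples with 'parent'
            if y not in answers:                 # only new tuples feed the next stage
                answers.add(y)
                frontier.append(y)
    return answers

parents_of = {'eli': ['dee'], 'dee': ['bob'], 'cai': ['bob'], 'bob': ['ann']}
print(ancestors(parents_of, 'eli'))   # -> {'dee', 'bob', 'ann'}
```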
Several machine learning methods have been studied extensively, among them kNN, SVM, reinforcement learning, and neural networks. Two studies are sketched here. The problem of learning Bayesian network structures (powerful tools for knowledge representation and reasoning under uncertainty) from data was approached using information-theoretic dependency analysis. Based on a three-phase construction mechanism, two efficient algorithms were developed. One deals with the special case where the node ordering is given; it requires only a relatively small number of tests and is correct provided the underlying model is DAG-faithful. The other deals with the general case and requires a remarkably small number of conditional independence tests; it is again correct provided the underlying model is monotone DAG-faithful. An award-winning system based on these algorithms was developed and distributed over the Internet, and the empirical results show that our approach is efficient and reliable.
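As a sketch of the dependency-analysis idea, the fragment below implements only a first, 'drafting' pass that proposes edges between strongly dependent variable pairs using empirical mutual information; the threshold is an assumed parameter, and the later phases that use conditional independence tests to add and remove edges are omitted.

```python
import math
from collections import Counter

def mutual_information(data, i, j):
    """Empirical mutual information (in nats) between columns i and j of
    'data', a list of discrete-valued tuples."""
    n = len(data)
    pi = Counter(row[i] for row in data)
    pj = Counter(row[j] for row in data)
    pij = Counter((row[i], row[j]) for row in data)
    mi = 0.0
    for (a, b), c in pij.items():
        p_ab = c / n
        mi += p_ab * math.log(p_ab / ((pi[a] / n) * (pj[b] / n)))
    return mi

def draft_edges(data, n_vars, threshold=0.01):
    """Drafting-phase sketch: propose an edge wherever the pairwise
    dependency (mutual information) exceeds the threshold."""
    edges = []
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            mi = mutual_information(data, i, j)
            if mi > threshold:
                edges.append((i, j, mi))
    return sorted(edges, key=lambda e: -e[2])   # strongest dependencies first

# Tiny example: X0 and X1 move together, so the (0, 1) edge comes out strongest.
data = [(0, 0, 1), (1, 1, 0), (0, 0, 0), (1, 1, 1), (0, 0, 1), (1, 1, 0)]
print(draft_edges(data, 3))
```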
Measuring the degree of inconsistency of a belief base is an important issue in many real-world applications, and deriving syntax-sensitive inconsistency measures from the base's minimal inconsistent subsets is a natural way forward. We proposed a normalised framework for measuring the degree of inconsistency of a belief base which satisfies all the properties commonly deemed necessary to characterise an intuitively satisfactory inconsistency measure. The problem of inconsistency handling in description logics has also attracted approaches based on existing techniques for inconsistency management. Two revision operators in description logics were defined: a weakening-based revision operator and its refinement. The logical properties of the operators were analysed, and an algorithm was proposed to handle inconsistency in a stratified description logic knowledge base.
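The sketch below illustrates the flavour of a measure derived from minimal inconsistent subsets; the particular blame assignment and normalisation shown are illustrative choices rather than the published framework, and the consistency check is an assumed toy.

```python
from itertools import combinations

def minimal_inconsistent_subsets(base, consistent):
    """Brute-force enumeration of the minimal inconsistent subsets of 'base'.
    'consistent' is an assumed user-supplied satisfiability check."""
    mis = []
    for size in range(1, len(base) + 1):
        for subset in combinations(base, size):
            s = set(subset)
            if not consistent(s) and not any(m < s for m in mis):
                mis.append(s)
    return mis

def inconsistency_degree(base, consistent):
    """One illustrative normalised, syntax-sensitive measure: each formula is
    blamed 1/|M| for every minimal inconsistent subset M containing it, and
    the total blame is normalised by the size of the base."""
    mis = minimal_inconsistent_subsets(base, consistent)
    blame = sum(1.0 / len(m) for m in mis for _ in m)
    return blame / len(base) if base else 0.0

# Example with formulas encoded as strings and a toy consistency check that
# treats a literal and its negation (e.g. 'p' and '~p') as contradictory.
def toy_consistent(s):
    return not any(('~' + f) in s for f in s)

base = ['p', '~p', 'q', '~q', 'r']
print(inconsistency_degree(base, toy_consistent))   # two size-2 subsets -> 0.4
```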