1. Introduction

As our understanding of the world advances, the simple structures used to describe it, such as linear models or decision trees, sometimes become staggeringly insufficient. The innumerable systems that our world encompasses are composed of interconnected parts whose properties, often imperceptible, interact with each other non-linearly. By having the capacity to change over time and learn from experience, such systems become complex and adaptive.

More than three decades ago, Holland proposed a conceptual rule-based system [1] comprising a set of “IF condition THEN action” rules covering different situations, and called it a cognitive system [2]. From a holistic perspective, it can be viewed as a group of collaborating “agents” represented by a collection of simple rules. The rules are formed while interacting with the external “environment” and might take, for example, the following forms:

IF no cars on the street THEN walk forward,
IF stock share price is dropping for 3 days THEN buy,
IF colour of the mushroom is red THEN do not pick

The general idea is that seeking a single, all-encompassing and complex rule is less desirable than evolving a population of rules that collectively model the environment’s behaviour. This idea gave rise to the concept of Learning Classifier Systems (LCSs) [3], which introduced a new abstract term, the classifier: a rule together with additional statistics (such as its quality). Despite their somewhat misleading name, LCSs are not only suitable for classification problems but may instead be viewed as a very general, distributed optimization technique.
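As a minimal sketch of the classifier idea, the example below pairs a rule with a single statistic and matches a small population against an observation. The `Classifier` record, its field names (`condition`, `action`, `quality`), the dictionary-based observation and the "highest quality wins" selection are all illustrative assumptions, not taken from any particular LCS implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical observation format: named attributes of the environment.
Observation = Dict[str, object]


@dataclass
class Classifier:
    """A rule plus bookkeeping statistics (illustrative names only)."""
    condition: Callable[[Observation], bool]  # the IF part
    action: str                               # the THEN part
    quality: float = 0.5                      # assumed statistic in [0, 1]

    def matches(self, obs: Observation) -> bool:
        return self.condition(obs)


def match_set(population: List[Classifier], obs: Observation) -> List[Classifier]:
    """Return every classifier whose condition holds for the observation."""
    return [cl for cl in population if cl.matches(obs)]


def best_action(population: List[Classifier], obs: Observation) -> Optional[str]:
    """Return the action of the highest-quality matching classifier, if any."""
    matching = match_set(population, obs)
    if not matching:
        return None
    return max(matching, key=lambda cl: cl.quality).action


# Two of the example rules from the text, with made-up quality values.
population = [
    Classifier(lambda o: o.get("cars_on_street") == 0, "walk forward", quality=0.9),
    Classifier(lambda o: o.get("mushroom_colour") == "red", "do not pick", quality=0.8),
]

chosen = best_action(population, {"cars_on_street": 0})
```

The point of the sketch is that behaviour emerges from the population as a whole: no single rule covers every situation, and the statistics attached to each rule decide which one is trusted when several match.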

They fit into the trend of XAI (Explainable Artificial Intelligence) [4] and can be used in a variety of fields [5], such as data mining [6] [7] [8] [9] [10] (discovering patterns in data) and supervised or reinforcement learning (RL) tasks. Examples of such problems include fighter aircraft manoeuvres [11], medical domains [12], robotic control [13] [14] [15] [16], game strategy [17], environmental navigation, modelling complex time-dependent systems (e.g., stock market) [18] [19] [20] or design optimization (e.g., engineering applications) [21].

The desired outcome of running an LCS is a set of interpretable classifiers that collectively model an intelligent decision-maker. Two biological metaphors, evolution and learning [22], are employed to accomplish this. They are embodied by a pair of internal mechanisms, the genetic algorithm and the learning mechanism respectively, which actively interact with the outside environment; in this work, the environment is considered an independent source of data for an LCS algorithm.

At present, many different LCS variations exist [23], but according to Holmes [24], the following four major components are considered universal:

a finite population of classifiers representing current knowledge of the system,
a performance component regulating the interaction between the environment and classifier population,
a reinforcement (or credit assignment) component distributing the reward from the environment to particular classifiers,
a discovery component using various operators to discover better rules and improve existing ones

However, this work focuses on a specific niche, the Anticipatory Learning Classifier Systems (ALCS), which are capable of learning a generalized predictive model [25] of the environment online. In contrast to the traditional IF-THEN rule structure, their classifiers also have a state prediction, or anticipatory, part that predicts the environmental changes caused by executing the specified action in the specified context. Forming such an internal model might facilitate the agent’s thought processes, such as planning or reflection, without any immediate behaviour. Thus, beliefs about the future control the decision-making process and behaviour in the present. This architecture also helps to resolve perceptual aliasing problems, where the same observation is obtained in distinct states requiring different actions.
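The condition-action-effect structure described above can be sketched as follows. This is only an illustration under stated assumptions: states are fixed-length strings of symbols, `#` is assumed to mean "don't care" in the condition and "attribute stays unchanged" in the effect, and the class and method names (`AnticipatoryClassifier`, `matches`, `anticipate`) are hypothetical.

```python
from dataclasses import dataclass

# Assumed symbol: "don't care" in a condition, "unchanged" in an effect.
WILDCARD = "#"


@dataclass(frozen=True)
class AnticipatoryClassifier:
    condition: str  # context in which the rule applies
    action: int     # action to execute
    effect: str     # anticipated change of the state after the action

    def matches(self, state: str) -> bool:
        """True if every non-wildcard condition symbol equals the state symbol."""
        return len(state) == len(self.condition) and all(
            c == WILDCARD or c == s for c, s in zip(self.condition, state)
        )

    def anticipate(self, state: str) -> str:
        """Predict the next state: wildcards in the effect keep the old value."""
        return "".join(s if e == WILDCARD else e for e, s in zip(self.effect, state))


cl = AnticipatoryClassifier(condition="1#0", action=2, effect="##1")
state = "110"
predicted = cl.anticipate(state) if cl.matches(state) else None
```

Because the prediction is available before the action is executed, an agent built on such rules could, in principle, chain anticipations to evaluate a sequence of actions internally, which is the kind of planning the paragraph above refers to.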

The capabilities of ALCS have been examined extensively in environments with discrete and manageable observation spaces, such as navigating an agent in a maze or correctly determining an answer in a binary multiplexer problem [26]. However, no comparative research has yet been performed on problems where the observation space contains attributes expressed as real-valued numbers, for example a car’s speed ranging from 0 to 200 km/h or a particular temperature range.

This thesis demonstrates that certain families of ALCS can be adjusted to new problem domains. Despite the hurdle posed by their complicated and interconnected components, certain modifications allow them to take advantage of the benefits associated with internal knowledge representation and prediction mechanisms. In his book on anticipatory systems [27], Rosen goes one step further by putting the idea of anticipations in a mathematical framework and later identifying them as the essence of life [28].

In anticipatory systems, as I have defined them, the present change of state depends on a future state, which is predicted from the present circumstances on the basis of some model. Anticipatory, model-based behavior provided one basis for what I later called complexity, and which I defined in “Life Itself” on the basis of non-computability or nonformalizability of models of a system of this kind.

—Robert Rosen in [28]

While this work does not pursue such a big claim, it shows that a wider range of problems can be expressed by ALCS frameworks, thereby benefiting from a more comprehensive, intrinsic representation.
