Example of using CBR*Tools

4. Car Application

4.1.Car Application: Example use of the CBR*Tools framework

To demonstrate how the CBR*Tools facilitate the implementation of a Case-Based Reasoning system, we explain an example application, which calculates the risk factor of a car for an insurance company.

This example application uses only the Core level of CBR*Tools (i.e cbrcase, memory and reasoner).

In order to build this application, we need to create ten classes, whose code remains very declarative.

First, the framework of the application is presented, then the construction of the system of reasoning with CBR*Tools is explained. Lastly, samples of execution, as well as the source code are provided.

The objective of this application is to determine the factor of risk (integer ranging between -3 and 3) for a given car (case target) according to the tasks already carried out on other cars (case source). The CBR approach is based on the use of a case base in which a case associates the description of the characteristics of a car (formed problem of indices) with the factor of risk (solution).

A reasoning in three steps is then carried out, in which the phase of revision is not used (Figure 1). The phase of retrieve allows to identify a set of cases, the description of which is close to that of the current car and for which we want to determine the factor of risk. Two strategies for searching have been implemented in order to evaluate them: seek by closer neighbours, and seek as a combination of a filtering by hierarchy of prototypes and selection by closer neighbours. The phase of re-use takes into account the set of the cases found to calculate the factor of risk, by carrying out an average of the factors of risk assigned with the case sources balanced by their similarity. Finally, the phase of retain adds the new case into the memory, if it does not already exist a case too close to the target case, and if the case base does not exceed a given size.

Figure 1: Reasoning Cycle for the Car application.

Figure 1: Reasoning Cycle for the Car application

Although the reasoning suggested is realistic, it is not used to validate the results obtained. The purpose of the application is only to show how CBR*Tools facilitates the development of such a cycle of reasoning. The case base used in this prototype was already used to carry out comparisons and experiments in CBR. It is freely accessible on the following ftp server: ftp://ftp.ics.uci.edu/pub/ml-repos/machine-learning-databases/autos

4.2.How we constructed the reasoning system

We demonstrate the construction of our reasoning system and the use of the CBR*Tools in it, following the three axes of variability, namely representation of the cases (with 2 classes), organisation of the memory (with 3 classes) and management of the reasoning (with 4 classes).

Axes of Variability	Case (Representation of the cases)	Memory (Organisation of the memory)	Reasoning (Management of the reasoning)
Hot Spots
	CarCase	CarSimpleFileCaseBase	CarRetrieve
	CarCaseSituation	CarSimilarity	CarReuse
	-	CarIndexBase	CarRetain
	-	-	Car(CarReasonerFactory)

Table 1: Axes of Variability and Hotspots

The classes created for the application are the following:

CarCase.java
CarCaseSituation.java
CarSimpleFileCaseBase.java
CarIndexBase.java
CarNeuronIndex.java
CarReasonerFactory.java
CarRetain.java
CarRetrieve.java
CarReuse.java
CarSimilarity.java
CarsApp.java

4.3.Representation of the cases (first axis)

The representation of the cases is based on the data describing the 26 attributes as shown in Table 2:

Attributes	Values
1. Risk	integer: -3, -2, -1, 0, 1, 2, 3
2. Normalized-losses	integer: continuous from 65 to 256
3. Make	character: alfa-romero, audi, bmw, chevrolet, dodge, honda, isuzu, jaguar,mazda, mercedes-benz, mercury, mitsubishi, nissan, peugeot, plymouth,porsche, renault, saab, subaru, toyota, volkswagen, volvo
4. Fuel-type	integer: 1=diesel, 2= gas
5. Aspiration	integer: 1=std, 2=turbo
6. Num-of-doors	integer: 4=four, 2=two
7. Body-style	character: hardtop, wagon, sedan, hatchback, convertible
8. Drive-wheels	integer: 1=4wd, 2=fwd,3=rwd
9. Engine-location	nteger: 1=front, 2=rear
10. Wheel-base	real: continuous from 86.6 to 120.9
11. Length	real: continuous from 141.1 to 208.1
12. Width	real: continuous from 60.3 to 72.3
13. Height	real: continuous from 47.8 to 59.8
14. Curb-weight	real: continuous from 1488 to 4066
15. Engine-type	character: dohc, dohcv, l, ohc, ohcf, ohcv, rotor
16. Num-of-cylinders	integer: eight, five, four, six, three, twelve, two
17. Engine-size	real: continuous from 61 to 326
18. Fuel-system	character: 1bbl, 2bbl, 4bbl, idi, mfi, mpfi, spdi, spfi
19. Bore	real: continuous from 2.54 to 3.94
20. Stroke	real: continuous from 2.07 to 4.17
21. Compression-ratio	real: continuous from 7 to 23
22. Horsepower	real: continuous from 48 to 288
23. Peak-rpm	real: continuous from 4150 to 6600
24. City-mpg	real: continuous from 13 to 49
25. Highway-mpg	real: continuous from 16 to 54
26. Price	real: continuous from 5118 to 45400

Table 2: Car Attributes

These basic attributes are then divided up into two groups: the solution (attribute risk) and the potential indices (all other attributes). According to the hot spots defined in CBR*Tools (Figure 2), we define two classes for the representation of the cases: the class CarCase (specialisation of the hot spot CbrCase) and the class CarCaseSituation (specialisation of the hot spot CompoundIndice).

Figure 2: use of the hot spots for the representation of the cases

The class CarCase inherits indirectly the class CbrCase through the abstract class AbstractCbrCase. By using the AbstractCbrCase class we reduce the coding of a case by managing the basic operations (assignment of an identifier to the case, link with the case base).

Similarly, the class CarCaseSituation inherits indirectly the class CompoundIndice through the class JavaClassIndice and the abstract class AbstractCompoundIndice. The class AbstractCompoundIndice provides the interface of a composite index, by using a a java class as a container of indices. (Figure 3)

Figure 3: representation of the cases

The code of the class CarCase contains the declaration of two attributes: one holding the factor of risk and one giving the indices:

   public classCarCase extends AbstractCbrCase { 
      public int risk = 0; 
      public CarCaseSituation indices = new CarCaseSituation(); 
      ...
   }

By using the class JavaClassIndice of CBR*Tools, the representation of indices is facilitated and it is also integrated into the Java programming language. The sub-indices of a composite index are then identified by attributes of a Java class. These attributes must be public so that they can be handled directly and effectively. However, the use of an accessor (public method of access with an attribute private or virtual) is completely possible and even necessary to represent calculated indices (though not used in this example).

  public class CarCaseSituation extends JavaClassIndice { 
     public int normalizedLosses; 
     public String make; 
     public float height; 
     public int numOfCylinders; 
     ...
  }

Moreover, the initialisation of the class allows the effective declaration of the indices and the specification of the type information: type (integer, real, character), interval for the integer and real attributes, domain of values for the types list of integers.

   static private int _cylinders [] = {2,3,4,5,6,8,12}; 
   static { 
      ArrayIndiceDescriptor desc = new ArrayIndiceDescriptor 
         ("cars.CarCaseSituation"); 
      desc.addIndiceDescriptor (new IndiceDescriptor 
         ("normalizedLosses", new RangeIntType(65,256))); 
      desc.addIndiceDescriptor (new IndiceDescriptor
         ("make", StringType.STRING)); 
      desc.addIndiceDescriptor(new IndiceDescriptor
         ("height", new RangeFloatType(60,75))); 
      desc.addIndiceDescriptor (new IndiceDescriptor 
         ("numOfCylinders",new ListIntType(_cylinders))); 
      ...
   }

4.4.Organisation of the memory (second axis)

The organisation of the memory consists in storing the cases in a case base and indexing these cases. Up to this point, three hot spots have been specialised (Figure 4): case base (with the class CarSimpleFileCaseBase), definition of a measurement of similarity (CarSimilarityclass) and constitution of the base of index (CarIndexBaseclass).

Figure 4: use of the hot spots for the organisation of the memory

4.4.1 Case Base (first hotspot)

In this simple example, we do not want to model the case base in sub-bases. We use directly the abstract class FileSimpleCaseBase of CBR*Tools to store the 205 cases available in a file. The file of the case base is then of the following form:

   CSV, Jeffrey C. Schlimmer, 19.05.1987, 1985 Auto Imports Database
   2,1,168,toyota,2,1,2,hatchback,3,1,94.5,168.7,64.0,52.6,2204.0,ohc,
      4,98.0,2bbl,3.19,3.03,9.0,70.0,4800.0,29.0,34.0,8238.0
   3,1,168,toyota,2,1,2,sedan,3,1,94.5,168.7,64.0,52.6,2169.0,ohc,
      4,98.0,2bbl,3.19,3.03,9.0,70.0,4800.0,29.0,34.0,8058.0
   37,1,?,porsche,2,1,2,hatchback,3,1,98.4,175.7,72.3,50.5,3366.0,dohcv,
      8,203.0,mpfi,3.94,3.11,10.0,288.0,5750.0,17.0,28.0,…

The first line of the file describes the metadata of the case base (format, name of the creator of the base, creation date, description). Then, each line represents the next case of the data separated by commas. The first data represents the unique identifier of the case, and the following ones give the values of the attributes in the order of Table 2. The attributes whose values are unknown in a case are marked by the sign "?". The management of this case base is done very easily by defining two methods declared as abstract in the FileSimpleCaseBase class: transformation of a line into a case (method stringToCase) for the reading, and transformation of a case into a line (method caseToString) for the writing. The other operations (opening, writing, access to the cases) are dealt with automatically by the super-classes.

4.4.2 Measure of Similarity (second hotspot)

The measure of similarity which will be used in the indexing is defined by the CarSimilarity class, which inherits indirectly the CompoundIndiceSimilarity class via the abstract class CompoundIndiceArraySimilarity. The latter establishes the basic operations of a composite similarity by using a table of sub-similarities. The code of the CarSimilarity class is completely declarative and consists of the instantiation of various objects, which provide the association between the indices and the elementary functions of similarity. First, the creator of this class allows the initialisation of the similarity with a function of aggregation per weighted average.

   public class CarSimilarity extends CompoundIndiceArraySimilarity { 
      public CarSimilarity () { 
         super(new MeanAggregationFct(),25);

Then, the structure of the measure of similarity, as well as the elementary functions of similarity are instantiated and associated with the indices.

         Class c = JavaClassIndice.getClass
            ("aid.cbr.samples.cars.CarCaseSituation"); 
         SimpleIndiceSimilarity floatAtt = 
            new SimpleIndiceSimilarity(new FloatDomainSimilarityFct()); 
         setSubSimilarity(CarCaseSituation.indexOf(c,"height"),floatAtt); 
         ...
         SimpleIndiceSimilarity intAtt = 
            new SimpleIndiceSimilarity(new IntDomainSimilarityFct()); 
         setSubSimilarity(CarCaseSituation.indexOf
            (c,"normalizedLosses"),intAtt); 
         setSubSimilarity(CarCaseSituation.indexOf
            (c,"numOfCylinders"),intAtt); 
         ...
         SimpleIndiceSimilarity eqsim = new SimpleIndiceSimilarity
            (new IdentitySimilarityFct
               (new CmpFactorValue(1), new CmpFactorValue(0))); 
         setSubSimilarity(CarCaseSituation.indexOf(c,"make"),eqsim);

For instance, an elementary function of similarity per symmetrical matrix is associated with the index bodyStyle. This function is defined to return the value 1 when the values of the indices to be compared are identical or 0 if one of them is not specified. Then, each couple of values is explicitly defined with the value of the corresponding similarity.

          SymetricExtensionSimilarityFct bodyStyleSim = 
             new SymetricExtensionSimilarityFct
                (new CmpFactorValue(0), new CmpFactorValue(1)); 
          bodyStyleSim.define("hardtop","convertible", 
             new CmpFactorValue((float)0.9)); 
          bodyStyleSim.define("hardtop","hatchback", 
             new CmpFactorValue((float)0.5)); 
          bodyStyleSim.define("hardtop","sedan", 
             new CmpFactorValue((float)0.5)); 
          setSubSimilarity(CarCaseSituation.indexOf(c,"bodyStyle"),
             new SimpleIndiceSimilarity(bodyStyleSim)); 
       } 
   }

4.4.3 Index Base (third hotspot)

For the indexing, we wanted to set up two strategies: the first uses an index by closer neighbours and the second combines an index by hierarchy of prototypes (filtering) with an index by closer neighbours (selection). The realisation of these two strategies is carried out in the CarIndexBase class, which inherits indirectly IndexBase via the abstract class BasicIndexBase , which establishes the basic management of the various strategies. The code of the CarIndexBase class is then reduced to the declaration and the configuration of the necessary indices.

   public classCarIndexBase extends BasicIndexBase { 
      CarIndexBase() { 
         // First strategy: knn 
         buildAndDeclareKnn(); 

         // Second strategy: prototype + knn 
         declarePrototype(); 
         declareKnnForPrototype(); 
         buildProtoKnn(); 
      } 
   ...
   }

More precisely, the implementation of the first strategy is carried out by the instantiation of the LinearKnnIndex index, having as parameters the ordering function, the measurement of similarity, and the number of cases returned at the time of a search. Other parameters are also specified: name of the index, comments, and value to be used in the event of unknown indices. Lastly, the index is added to the base.

   protected void buildAndDeclareKnn() { 
      LinearKnnIndex knnIndex = new LinearKnnIndex
         (new CmpFactorValueOrder(),new CarSimilarity(),5); 
      knnIndex.setName("CarCaseKnn"); 
      knnIndex.setComments
         ("This index uses Knn method to retrieve car cases"); 
      knnIndex.setUnknownCmpValue(new CmpFactorValue((float)0.5)); 
      addTopLevelIndex(knnIndex); 
   }

The second, more complex, strategy is still defined by simple declarations. First of all, the index by hierarchy of prototypes is built by instantiation of the PrototypeIndex index by indicating that each prototype of the hierarchy (leaves and nodes) must back up the cases which are compatible with it during the indexing (classification).

   protected void declarePrototype() { 
      PrototypeIndex protoIndex = new PrototypeIndex(); 
      protoIndex.setName("ProtoIndex"); 
      protoIndex.setAddCaseToAcceptedPrototypes(true);

Figure 5: prototype hierarchy for the second indexing strategy

Then the prototypes (Figure 5) are declared by using the anonymous classes (declaration of the class in the level of instantiation) of Java, which prove in this case to be very practical.

   Prototype p1 = new BooleanPrototype () { // root prototype 
      public CmpValue computeMatching(CompoundIndice indices) { 
         return CmpBoolValue.TRUE; 
      } 
   }; 
   p1.setName("Root"); 
   protoIndex.addPrototype(p1); 

   Prototype p2 = new BooleanPrototype () { // expensive cars prototype 
      public CmpValue computeMatching(CompoundIndice indices) { 
         int priceIndex = ((JavaClassIndice)indices).indexOf("price"); 
         if (!indices.isUnknown(priceIndex)) { 
            float price = indices.getFloatIndice(priceIndex); 
            return (price > 10000)
               ?CmpBoolValue.TRUE:CmpBoolValue.FALSE; 
         } 
         return CmpBoolValue.TRUE; 
      } 
   }; 
   p2.setName("ExpensiveCars"); 
   protoIndex.addPrototype(p2); 

   Prototype p3 = new BooleanPrototype () { // big cars prototype 
      public CmpValue computeMatching(CompoundIndice indices) { 
          int lengthIndex = ((JavaClassIndice)indices).indexOf("length"); 
          int widthIndex = ((JavaClassIndice)indices).indexOf("width"); 
          int heightIndex = ((JavaClassIndice)indices).indexOf("height"); 
          if ( !indices.isUnknown(lengthIndex) && 
          !indices.isUnknown(widthIndex) && 
          !indices.isUnknown(heightIndex)) { 
             float length = indices.getFloatIndice(lengthIndex); 
             float width = indices.getFloatIndice(widthIndex); 
             float height = indices.getFloatIndice(heightIndex); 
             return (length > 180 && height > 50 && width > 50)
                ?CmpBoolValue.TRUE:CmpBoolValue.FALSE; 
          } 
          return CmpBoolValue.TRUE; 
     } 
   }; 
   p3.setName("BigCars"); 
   protoIndex.addPrototype(p3); 
   // build the tree 
   p1.addSon(p2); 
   p1.addSon(p3); 
   protoIndex.setRoot(p1); 
   addIndex(protoIndex); 
   }

In a way identical to the first strategy, another index by closer neighbours is declared.

   protectedvoiddeclareKnnForPrototype() { 
      LinearKnnIndex knnIndex = new LinearKnnIndex
         (new CmpFactorValueOrder(),new CarSimilarity(),5); 
      knnIndex.setName("CarCaseKnnAfterProto"); 
      knnIndex.setComments("This index uses Knn method to retrieve car cases"); 
      knnIndex.setUnknownCmpValue(new CmpFactorValue((float)0.5)); 
      addIndex(knnIndex); 
   }

Lastly, the index by hierarchy of prototypes and the index by closer neighbours are combined thanks to a composite index instance of ToDynamicKnnIndex (Figure 6)

   protected void buildProtoKnn() { 
      ToDynamicKnnIndex index = new ToDynamicKnnIndex(); 
      index.setName("Proto+KNN"); 
      index.addIndex(getIndex("ProtoIndex")); 
      index.addIndex(getIndex("CarCaseKnnAfterProto")); 
      addTopLevelIndex(index); 
   }

Figure 6: structure of the second strategy for indexation

4.5.Management of the Reasoning (third axis)

The management of the reasoning requires the specialisation of four hot spots (Figure 7): phase of the reasoning (retrieve, re-use and retain) and the factory responsible for instantiating objects specific to the application.

Figure 7: use of hot spots for the control of the reasoning

4.5.1.Retrieve

The retrieve protocol is specialised with the CarRetrieve class which redefines the method updatedRetrieveResult to transform the results of the indices path into results of the retrieve phase: recovery of the case identifiers and transformation of the similarity of the cases found into a degree of utility for the phase of re-use.

   public classCarRetrieve extends BasicRetrievePattern { 
      protected void updateRetrieveResult
         (IndexResult result, RetrieveResult stepResult){ 
         List res= Assoc.getFactory().newListAssoc(); 
         res.addAll(((CaseSetIndexResult)result).retrievedCases()); 
         stepResult.setCases(res); 
         for (Iterator it = 
            ((CaseSetIndexResult)result).retrievedCases().iterator(); 
         it.hasNext(); ) { 
            Object id = it.next(); 
            float f = 
               ((CmpFactorValue)
                  ((CaseSimIndexResult)result).getCaseMatchingResult s(id)).getFactor(); 
            ((UsefulnessRetrieveResult)stepResult).setCaseUsefulness(id,f); 
         } 
      } 
   }

4.5.2.Re-use

The re-use phase is based on the adaptation schema per formula taking into account the whole set of the found cases. The protocol of this schema (abstract class CaseSetFormulaReusePattern) is specialised in the CarReuse class. First of all, the methods of reading and writing the solution of a case are specified, then the constructor initialises the object with a formula of adaptation per weighted average, while taking care to round the result which must be an integer ranging between -3 and 3 (to note, once again, the use of an anonymous Java class).

   public class CarReuse extends CaseSetFormulaReusePattern { 
      protected float getSourceCaseSolution(CbrCase c){ 
         return ((CarCase)c).risk; 
      } 
      protected void setTargetCaseSolution(CbrCase c, float sol){ 
          ((CarCase)c).risk=(int)sol; 
      } 
      CarReuse () { 
         setFormula( new MeanAdaptationFormula() { 
            public float compute(float[] solutions, float[] usefullness) { 
               float f = super.compute(solutions,usefullness); 
               return Math.round(f); 
            } 
         }; 
      } 
   }

4.5.3.Retain

The retain phase is implemented following the schema by chain of responsibility. The class CarRetain is then reduced to a simple declaration of two analysers: the first one examines whether the case base comprises more than 250 cases, and the second one checks if there is already a case in which the similarity is very close (above 0,95) to the current case targets. Thus, in a very flexible and simple way, the current case targets will be added and indexed in the memory if and only if these two conditions are not satisfied.

   public classCarRetainextends ResponsibilityChainRetainPattern { 
      CarRetain() { 
         RetainAnalyser a1 = 
            new CaseBaseSizeRetainAnalyser(250); 
         RetainAnalyser a2 = 
            new SimilarityRetainAnalyser(new CmpFactorValue((float)0.95)); 
         a1.setNextAnalyser(a2); 
         setAnalyser(a1); 
      } 
   }

4.5.4.Object Factory

The object factory is specialised to reference the classes of objects specifically created for the application.

   public class CarReasonerFactory extends AbstractReasonerFactory { 
      public CbrCase newTagetCase() { 
         return new CarCase(); 
      } 
      public CaseBase newCaseBase() { 
         return new CarFileSimpleCaseBase
           ("CarCaseBase","cars.txt"); 
      } 
      public Retrieve newRetrieveStep() { 
         return new CarRetrieve(); 
      } 
      public Reuse newReuseStep() { 
         return new CarReuse(); 
      } 
      public Retain newRetainStep() { 
         return new CarRetain(); 
      } 
      public IndexBase newIndexBase() { 
         return new CarIndexBase(); 
      } 
   }

4.6.Examples of execution

Two examples of execution of reasoning carried out in the CarsApp class are provided. This reasoning shows the use of two different configurations.

4.6.1.First Reasoning

In the first reasoning, we start the execution with the default settings. First of all, we instantiate a new controller of reasoning (variable reasoner) for the factory.

   ReasonerFactory factory = new CarReasonerFactory(); 
   Reasoner reasoner = factory.getReasoner();

We can then open the case base, and the automatic indexing of the set of the cases resulting from the base is carried out according to the two strategies of indexing presented.

   reasoner.getMemory().getCaseBase().open(); 
   reasoner.getMemory().buildFromCaseBase(new GlobalIndexParams());

Then, a new reasoning is launched after having instantiated and initialised a target case.

   CbrCase target = factory.newTagetCase(); 
   initialize(target); 
   Reasoning reasoning = reasoner.startReasoning(target);

The use of the object maintaining the state of the reasoning is thus very practical to carry out a report of execution (Table 3).

   printReport(reasoning);

Table 2: report of execution of the first reasoning

Table 3: report of execution of the first reasoning

4.6.2.Second Reasoning

In the second reasoning, we propose the use of the second strategy of indexing (hierarchy of prototypes and closer neighbours). We can also pass the maximum number of the cases found of initially 5, to 10. The report (Table 4) shows an execution time divided by two compared to the first reasoning for obtaining the same result.

   ((BasicRetrievePattern)reasoner.getRetrieveStep()).getDefaultParams().
setSelectedIndex
      (reasoner.getMemory().getIndexBase().getIndex("Proto+ KNN")); 
   ((KnnIndex)reasoner.getMemory().getIndexBase().getIndex
      ("CarCaseKnn AfterProto")).setK(10); 
   target = factory.newTagetCase(); 
   initialize(target); 
   reasoning = reasoner.startReasoning(target);

Table 4: report of execution of the second reasoning