Data Set
The following describes the data sets that will be used throughout this investigation.
Glass identification
Vina conducted a comparative study of a rule-based system, BEAGLE, the nearest-neighbor algorithm, and discriminant analysis. BEAGLE is a product available through VRS Consulting, Inc. The task was to determine whether the glass was of the "float" type or not.
The study of glass-type classification was motivated by criminological investigation.
Data Set Characteristics: Multivariate
Number of Instances: 214
Area: Computer
Attribute Characteristics: Real
Number of Attributes: 10
Associated Task: Classification
Missing Values: N/A
Attributes info:
Identifier: 1 to 214
RI: Refractive index
Na: Sodium
Mg: Magnesium
Al: Aluminum
Si: Silicon
K: Potassium
Ca: Calcium
Ba: Barium
Fe: Iron
Type of glass:
building_windows_float_processed
building_windows_non_float_processed
vehicle_windows_float_processed
vehicle_windows_non_float_processed (none in this database)
containers
tableware
headlamps
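As a minimal sketch of how records with these attributes could be read, the snippet below parses comma-separated rows into dictionaries. The file layout (comma-separated, no header row, identifier first and class last), the short column names (with Si for silicon), and the numbering of the glass types 1 to 7 in the order listed above are all assumptions, and the sample record is made up for illustration rather than copied from the data set.

```python
import csv
import io

# Assumed column order: identifier, the nine measurements, then the class code.
GLASS_COLUMNS = ["Id", "RI", "Na", "Mg", "Al", "Si", "K", "Ca", "Ba", "Fe", "Type"]

# Assumed coding: class codes 1..7 follow the order of the list above.
GLASS_TYPES = {
    1: "building_windows_float_processed",
    2: "building_windows_non_float_processed",
    3: "vehicle_windows_float_processed",
    4: "vehicle_windows_non_float_processed",
    5: "containers",
    6: "tableware",
    7: "headlamps",
}


def parse_glass(text):
    """Parse comma-separated glass records into a list of dicts."""
    records = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        rec = dict(zip(GLASS_COLUMNS, row))
        rec["Id"] = int(rec["Id"])
        rec["Type"] = int(rec["Type"])
        # Everything between the identifier and the class code is a real value.
        for name in GLASS_COLUMNS[1:-1]:
            rec[name] = float(rec[name])
        rec["TypeName"] = GLASS_TYPES[rec["Type"]]
        records.append(rec)
    return records


# Hypothetical record, for illustration only.
sample = "1,1.52,13.6,4.5,1.1,71.8,0.06,8.75,0.0,0.0,1\n"
records = parse_glass(sample)
```

Keeping the numeric class code alongside a readable `TypeName` makes it easy to group the float and non-float window classes later without re-reading the file.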
Ecoli
This data set, created by Kenta Nakai of the Institute of Cellular and Molecular Biology, describes the cellular localization sites of proteins.
Data Set Characteristics: Multivariate
Number of Instances: 336
Area: Life
Attribute Characteristics: Real
Number of Attributes: 8
Associated Task: Classification
Missing Values: N/A
Attributes info:
Sequence name: Accession number for the SWISS-PROT database.
mcg: McGeoch's method for signal sequence recognition.
gvh: von Heijne's method for signal sequence recognition.
lip: von Heijne's Signal Peptidase II consensus sequence score. Binary attribute.
chg: Presence of charge on the N-terminus of predicted lipoproteins. Binary attribute.
aac: Score of the discriminant analysis of the amino acid content of outer membrane and periplasmic proteins.
alm1: Score of the ALOM membrane-spanning region prediction program.
alm2: Score of the ALOM program after excluding putative cleavable signal regions from the sequence.
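The attribute list above can be turned into a small record parser. This is only a sketch under stated assumptions: it supposes one whitespace-separated record per line, with the sequence name first, the seven scores next, and a trailing localization-site label (named `site` here, a hypothetical column name). The sample line is invented for illustration.

```python
# Assumed column order; "site" is a hypothetical name for the class label.
ECOLI_COLUMNS = ["sequence_name", "mcg", "gvh", "lip", "chg",
                 "aac", "alm1", "alm2", "site"]


def parse_ecoli_line(line):
    """Parse one whitespace-separated record into a dict."""
    fields = line.split()
    if len(fields) != len(ECOLI_COLUMNS):
        raise ValueError(f"expected {len(ECOLI_COLUMNS)} fields, got {len(fields)}")
    record = {"sequence_name": fields[0], "site": fields[-1]}
    # The seven columns between the name and the label are numeric scores.
    for name, value in zip(ECOLI_COLUMNS[1:-1], fields[1:-1]):
        record[name] = float(value)
    return record


# Hypothetical record, for illustration only.
sample_line = "EXAMPLE_ECOLI  0.49  0.29  0.48  0.50  0.56  0.24  0.35  cp"
record = parse_ecoli_line(sample_line)
```

Raising on a wrong field count is a deliberate choice: a silently misaligned row would shift every score into the wrong attribute.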
Iris Data
This is perhaps the best-known database in the pattern recognition literature. Fisher's paper is a classic in the field and is still frequently cited. The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other two; the latter are not linearly separable from each other.
Data Set Characteristics: Multivariate
Number of Instances: 150
Area: Life
Attribute Characteristics: Real
Number of Attributes: 4
Associated Task: Classification
Missing Values: N/A
Attributes info:
Sepal length in cm
Sepal width in cm
Petal length in cm
Petal width in cm
Class
Iris setosa
Iris versicolor
Iris virginica
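The linear-separability remark can be illustrated with a rough sketch: a single threshold on petal length is one possible way to separate Iris setosa from the other two classes. The comma-separated layout, the hyphenated class labels, the column names, and the 2.5 cm threshold are assumptions made for this example, and the three sample rows should be treated as illustrative values rather than a substitute for the full 150-instance file.

```python
import csv
import io
from collections import Counter

# Assumed column names, in the order of the attribute list above.
IRIS_COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width", "class"]


def parse_iris(text):
    """Parse comma-separated iris records into a list of dicts."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if not fields:
            continue  # skip blank lines
        *measurements, label = fields
        row = dict(zip(IRIS_COLUMNS[:-1], map(float, measurements)))
        row["class"] = label
        rows.append(row)
    return rows


def is_setosa_by_threshold(row, threshold=2.5):
    """Illustrative one-feature linear rule: short petals suggest setosa."""
    return row["petal_length"] < threshold


# Illustrative rows, one per class (labels assumed to be hyphenated).
sample = """5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""
rows = parse_iris(sample)
class_counts = Counter(row["class"] for row in rows)
predictions = [is_setosa_by_threshold(row) for row in rows]
```

On the full data set, `class_counts` would be the natural place to confirm the 50-instances-per-class balance described above. No comparable single threshold is expected to separate versicolor from virginica cleanly.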