4 File syntax
4.7 STAT
An ASCII file format is defined for each of the following object types of the STAT module: COMPOUND, CONVOLUTION, DISTRIBUTION, RENEWAL, HIDDEN_MARKOV, HIDDEN_SEMI-MARKOV, HISTOGRAM, MARKOV, MIXTURE, SEMI-MARKOV, SEQUENCES, TIME_EVENTS, TOPS, TOP_PARAMETERS, VECTOR_DISTANCE and VECTORS.
4.7.1 type COMPOUND
A compound (or stopped-sum) distribution is defined as the distribution of the sum of n independent and identically distributed random variables X_i, where n is the value taken by the random variable N. The distribution of N is referred to as the sum distribution, while the distribution of the X_i is referred to as the elementary distribution. Consider the following example:
COMPOUND_DISTRIBUTION
The first line gives the distribution type. The parametric sum distribution and the parametric elementary distribution are then defined in subsequent lines according to the syntactic form defined for the type DISTRIBUTION.
4.7.2 type CONVOLUTION
The distribution of the sum of independent random variables is the convolution of the distributions of these elementary random variables. Consider the following example:
CONVOLUTION 2 DISTRIBUTIONS
The first line gives the distribution type and the number of elementary distributions (2 or 3). The elementary parametric distributions are then defined in subsequent lines according to the syntactic form defined for the type DISTRIBUTION.
4.7.3 type DISTRIBUTION, type RENEWAL
The available parametric discrete distributions are the binomial distribution, the Poisson distribution, the negative binomial distribution and the uniform (rectangular) distribution, with an additional shift parameter which defines the lower bound to the range of possible values. The name of the distribution is first given, then the name of each parameter followed by its actual value as shown in the following examples:
BINOMIAL INF_BOUND : 2 SUP_BOUND : 5 PROBABILITY : 0.8
INF_BOUND and SUP_BOUND are integer-valued parameters while PARAMETER and PROBABILITY are real-valued parameters.
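The "distribution name, then each parameter name followed by its value" layout can be read with a few lines of Python. This is only a minimal sketch: the helper `parse_distribution` is hypothetical (it is not part of AML or the STAT module), and it assumes that tokens are separated by whitespace and that each `:` stands alone as a token, as in the BINOMIAL example above.

```python
# Hypothetical helper, not part of AML: parse a one-line definition such as
# "BINOMIAL INF_BOUND : 2 SUP_BOUND : 5 PROBABILITY : 0.8".
INTEGER_PARAMS = {"INF_BOUND", "SUP_BOUND"}    # integer-valued parameters
REAL_PARAMS = {"PARAMETER", "PROBABILITY"}     # real-valued parameters


def parse_distribution(text):
    """Return (distribution_name, {parameter_name: value})."""
    tokens = text.split()
    if not tokens:
        raise ValueError("empty distribution definition")
    name, rest = tokens[0], tokens[1:]
    # Assumption: parameters come as 'KEYWORD : value' triples.
    if len(rest) % 3 != 0:
        raise ValueError("expected 'KEYWORD : value' triples")
    params = {}
    for i in range(0, len(rest), 3):
        key, colon, value = rest[i:i + 3]
        if colon != ":":
            raise ValueError(f"expected ':' after {key}")
        if key in INTEGER_PARAMS:
            params[key] = int(value)
        elif key in REAL_PARAMS:
            params[key] = float(value)
        else:
            raise ValueError(f"unknown parameter {key}")
    return name, params


name, params = parse_distribution(
    "BINOMIAL INF_BOUND : 2 SUP_BOUND : 5 PROBABILITY : 0.8"
)
```

The sketch only enforces the integer/real typing stated above; it does not check the per-distribution constraints on the parameter values.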
For every parametric distribution, the following constraint applies to the shift parameter:
For a BINOMIAL or a UNIFORM distribution, the following constraint applies to the parameters INF_BOUND and SUP_BOUND which define the range of possible values:
For a BINOMIAL distribution, the following constraint applies to the probability of 'success':
For a POISSON distribution, the following constraint applies to the parameter (which is equal to the mean):
For a NEGATIVE_BINOMIAL distribution, the following constraints apply to the parameters:
For a UNIFORM distribution, the following constraints apply to the parameters:
A renewal process is built from a discrete parametric distribution (BINOMIAL, POISSON or NEGATIVE_BINOMIAL), termed the inter-event distribution, which represents the time interval between consecutive events. Hence, the types DISTRIBUTION and RENEWAL share the same ASCII file format.
4.7.4 type HIDDEN_MARKOV
A hidden Markov chain is constructed from an underlying Markov chain and nonparametric observation (or state-dependent) distributions. Consider the following example:
The first line gives the object type. The underlying Markov chain is then defined on subsequent lines according to the syntactic form defined for the type MARKOV. The observation (or state-dependent) probabilities relating the output processes to the non-observable state process are then defined. Since the process is 'hidden', at least one possible output should be observable in more than one state.
4.7.5 type HIDDEN_SEMI-MARKOV
A hidden semi-Markov chain is constructed from an underlying semi-Markov chain (a first-order Markov chain representing transitions between distinct states, and state occupancy distributions associated with the nonabsorbing states) and nonparametric observation (or state-dependent) distributions. The state occupancy distributions are defined as objects of type DISTRIBUTION with the additional constraint that the minimum time spent in a given state is at least 1 (INF_BOUND ≥ 1). Consider the following example:
The first line gives the object type. The underlying semi-Markov chain (embedded first-order Markov chain and state occupancy distributions associated with the nonabsorbing states) is then defined on subsequent lines according to the syntactic form defined for the type SEMI-MARKOV. The observation (or state-dependent) probabilities relating the output processes to the non-observable state process are then defined. Since the process is 'hidden', at least one possible output should be observable in more than one state.
4.7.6 type HISTOGRAM
The syntactic form of the type HISTOGRAM consists of giving, in a first column, the values in increasing order and, in a second column, the corresponding frequencies. If a value is not given, the corresponding frequency is assumed to be zero. Consider the following example:
4.7.7 type MARKOV
Consider the following example of a homogeneous Markov chain:
The first line gives the object type. Then, the number of states (between 2 and 15) and the order (between 1 and 4) are defined on the two subsequent lines. On the next lines, the initial probabilities and the transition probabilities are given. Since the initial probabilities and the transition probabilities for a given memory constitute distributions, the elements of a line should sum to one.
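The requirement that each line of probabilities sums to one can be checked before a file is written. The sketch below is illustrative only: `rows_sum_to_one` is a hypothetical helper, the three-state probabilities are invented for the example, and the tolerance is an arbitrary choice to absorb floating-point rounding.

```python
import math


def rows_sum_to_one(rows, tol=1e-9):
    """Check that every row is a probability distribution.

    Each value must lie in [0, 1] and each row must sum to one
    (up to the tolerance `tol`, an assumed value).
    """
    for row in rows:
        if any(p < 0.0 or p > 1.0 for p in row):
            return False
        if not math.isclose(sum(row), 1.0, abs_tol=tol):
            return False
    return True


# Invented first-order, three-state example (not taken from the manual).
initial = [0.5, 0.3, 0.2]
transition = [
    [0.7, 0.2, 0.1],
    [0.0, 0.9, 0.1],
    [0.0, 0.0, 1.0],
]

initial_ok = rows_sum_to_one([initial])
transition_ok = rows_sum_to_one(transition)
```

For an order greater than 1 the same check would apply to one row per memory, since each of those rows is also a distribution.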
It is also possible to define observation (or state-dependent) probabilities if each possible output can be observed in a single state. With this restriction, the state space corresponds to a partition of the output space and the overall process is a lumped process:
Consider the following example of a non-homogeneous Markov chain:
The first line gives the object type. Then, the initial probabilities and the transition probabilities are given in the same way as for a homogeneous Markov chain. The non-homogeneous / homogeneous character is then defined state by state. In the case of a non-homogeneous transition distribution, the function p_ii(t) represents the self-transition in state i as a function of the index parameter t. The corresponding transition distribution defined in the transition probability matrix gives the relative weights of the probabilities of leaving state i.
For a MONOMOLECULAR function (
For a LOGISTIC function (
4.7.8 type MIXTURE
A mixture is a parametric model of classification where each elementary distribution or component represents a class with its associated weight. Consider the following example:
The first line gives the distribution type and the number of components of the mixture (between 2 and 4). The components are then defined on two lines, the first one giving the associated weight and the second one giving the definition of the elementary parametric distribution according to the syntactic form defined for the type DISTRIBUTION. The weights should sum to one.
4.7.9 type SEMI-MARKOV
A semi-Markov chain is constructed from a first-order Markov chain representing transitions between distinct states and state occupancy distributions associated with the nonabsorbing states. The state occupancy distributions are defined as objects of type DISTRIBUTION with the additional constraint that the minimum time spent in a given state is at least 1 (INF_BOUND ≥ 1). Consider the following example:
The first line gives the object type while the second line gives the number of states (between 2 and 15). The embedded first-order Markov chain is then defined on subsequent lines by its initial probabilities and its transition probabilities (note that, unlike for the type MARKOV, the order should not be specified). Since this embedded Markov chain represents only transitions between distinct states, the self-transitions (i.e. elements of the main diagonal) should be equal to zero, except in the case of absorbing states where the self-transitions are equal to one (e.g. state 3 in the above example). The state occupancy distributions are then defined for each nonabsorbing state according to the syntactic form defined for the type DISTRIBUTION, with the additional constraint that the time spent in a given state is at least 1 (INF_BOUND ≥ 1). As for the type MARKOV, observation (or state-dependent) probabilities can be defined in order to specify a lumped process (with the restriction that each possible output can be observed in a single state).
4.7.10 type SEQUENCES
The syntactic form of the type SEQUENCES consists of a header giving the number and the type of variables, followed by the sequences. Consider the following example of univariate sequences:
The type STATE is the generic type. The character '\' allows a sequence to be continued on the following line.
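The line-continuation rule can be handled by merging physical lines before the values are read. The following is a rough sketch under stated assumptions: `join_continued_lines` is a hypothetical helper, the header is assumed to have been removed already, values are assumed to be whitespace-separated integers, and the sample data are invented.

```python
def join_continued_lines(lines):
    """Merge physical lines ending in a backslash into logical sequences.

    Returns one list of string tokens per sequence.
    """
    sequences, pending = [], ""
    for line in lines:
        stripped = line.rstrip()
        if stripped.endswith("\\"):
            # Drop the continuation character and keep accumulating.
            pending += stripped[:-1] + " "
        else:
            pending += stripped
            if pending.strip():
                sequences.append(pending.split())
            pending = ""
    if pending.strip():
        # The last line ended with a backslash but nothing followed it.
        sequences.append(pending.split())
    return sequences


# Invented data: the first sequence spans two physical lines.
raw = ["1 0 0 2 \\", "2 1 0", "0 0 1"]
sequences = [[int(v) for v in seq] for seq in join_continued_lines(raw)]
```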
Consider the following example of multivariate sequences:
The character '|' separates successive vectors.
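Splitting a multivariate sequence into its successive vectors amounts to cutting the line at each '|'. This is a minimal sketch: `split_vectors` is a hypothetical helper, integer values are an assumption, and the two-variable sample line is invented.

```python
def split_vectors(sequence_text):
    """Split one multivariate sequence into a list of integer vectors."""
    vectors = [
        [int(v) for v in chunk.split()]
        for chunk in sequence_text.split("|")
    ]
    # Every vector of a sequence should hold the same number of variables.
    if len({len(vector) for vector in vectors}) != 1:
        raise ValueError("vectors of unequal length")
    return vectors


# Invented two-variable sequence of length 3.
vectors = split_vectors("1 0 | 2 1 | 0 3")
```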
Consider the following example of sequences with an explicit index parameter of type POSITION:
This explicit index parameter is given as a first variable and the other variables (at least one) should be of type STATE. The index values should be increasing along sequences and the sequence ends with a final index value.
The explicit index parameter of type POSITION can be replaced by inter-position intervals:
Consider the following example of sequences with an explicit index parameter of type TIME:
The only difference from the explicit index parameter of type POSITION is that the index values should be strictly increasing along sequences and that no final index value is required.
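The ordering rules for the two kinds of index parameter can be expressed as a single check. The sketch below reads "increasing" for POSITION as non-decreasing (equal successive values allowed) in contrast to "strictly increasing" for TIME; that reading is an interpretation of the text, and `is_valid_index` is a hypothetical helper.

```python
def is_valid_index(values, strict):
    """Check the ordering of index values along one sequence.

    strict=False: non-decreasing (assumed reading of the POSITION rule).
    strict=True:  strictly increasing (TIME rule).
    """
    pairs = list(zip(values, values[1:]))
    if strict:
        return all(a < b for a, b in pairs)
    return all(a <= b for a, b in pairs)


position_ok = is_valid_index([1, 3, 3, 7], strict=False)  # repeated 3 allowed
time_ok = is_valid_index([1, 3, 3, 7], strict=True)       # repeated 3 rejected
```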
The explicit index parameter of type TIME can be replaced by time intervals:
4.7.11 type TIME_EVENTS
The syntactic form of data of type {time interval between two observation dates, number of events occurring between these two observation dates} consists of giving, in a first column, the time interval between two observation dates (length of the observation period), in a second column, the number of events occurring between these two observation dates and, in a third column, the corresponding frequency. The time intervals between two observation dates should be given in increasing order and then, for each possible time interval, the numbers of events should be given in increasing order. This is equivalent to giving successively the frequency distribution of the number of events for each possible time interval between two observation dates, ranked in increasing order.
# frequency distribution of the number of events for an observation period of length 20
# frequency distribution of the number of events for an observation period of length 30
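The three-column layout and its ordering rule can be illustrated with a short reader. This is a sketch under several assumptions: the helpers `parse_time_events` and `group_by_interval` are hypothetical, '#' is taken to start a comment (as the comment lines above suggest), all three columns are assumed to hold integers, and the data rows are invented.

```python
from collections import defaultdict


def parse_time_events(lines):
    """Read (time interval, number of events, frequency) records."""
    records = []
    for line in lines:
        content = line.split("#", 1)[0].strip()  # drop comments
        if not content:
            continue
        interval, nb_events, frequency = (int(v) for v in content.split())
        records.append((interval, nb_events, frequency))
    # Intervals increase, then event counts increase within an interval.
    keys = [(r[0], r[1]) for r in records]
    if not all(a < b for a, b in zip(keys, keys[1:])):
        raise ValueError("records are not in increasing order")
    return records


def group_by_interval(records):
    """Return one frequency distribution per observation period."""
    histograms = defaultdict(dict)
    for interval, nb_events, frequency in records:
        histograms[interval][nb_events] = frequency
    return dict(histograms)


# Invented rows for observation periods of length 20 and 30.
sample = [
    "# observation period of length 20",
    "20 0 3",
    "20 1 5",
    "# observation period of length 30",
    "30 0 1",
    "30 2 4",
]
records = parse_time_events(sample)
grouped = group_by_interval(records)
```

Grouping by the first column makes the equivalence explicit: each observation period maps to its own frequency distribution of the number of events.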
4.7.12 type TOPS
Consider the following example:
The syntactic form of the type TOPS is a variant of the syntactic form of the type SEQUENCES. 'Tops' can be seen as sequences with an explicit index parameter of type POSITION. This index parameter represents the position of successive offspring shoots along the parent shoot and a final index value gives the number of internodes of the parent shoot. The second variable of type NB_INTERNODE gives the number of internodes of the offspring shoots.
The explicit index parameter of type POSITION can be replaced by inter-position intervals:
4.7.13 type TOP_PARAMETERS
A model of 'tops' is defined by three parameters, namely the growth probability of the parent shoot, the growth probability of the offspring shoots (both in the sense of Bernoulli processes) and the growth rhythm ratio offspring shoots / parent shoot. Consider the following example:
The following constraints apply to the parameters:
4.7.14 type VECTOR_DISTANCE
The parameters defining a distance between vectors are the number of variables, the distance type (ABSOLUTE_VALUE or QUADRATIC) if there is more than one variable, the variable types (NUMERIC, SYMBOLIC, ORDINAL or CIRCULAR), optionally the weights of the variables (default behaviour: the variables have the same weight) and, in the symbolic case, explicit distances between symbols (default behaviour: 0 / 1 for match / mismatch). Consider the following example:
4 VARIABLES
DISTANCE : ABSOLUTE_VALUE
VARIABLE 1 : NUMERIC WEIGHT : 0.4
VARIABLE 2 : ORDINAL WEIGHT : 0.2
VARIABLE 3 : SYMBOLIC WEIGHT : 0.2
4 SYMBOLS
4.7.15 type VECTORS
In the syntactic form of the type VECTORS, each row corresponds to an individual and each column corresponds to a variable. Consider the following example: