RANDOM_DATA, a MATLAB library which uses a random number generator (RNG) to sample points for various probability distributions, spatial dimensions, and geometries, including the M-dimensional cube, ellipsoid, simplex and sphere.
Most of these routines assume that there is an available source of pseudorandom numbers, distributed uniformly in the unit interval [0,1]. In this package, that role is played by the routine R8_UNIFORM_01(), which allows us some portability. We can get the same results in C, FORTRAN or MATLAB, for instance. In general, however, it would be more efficient to use the language-specific random number generator for this purpose.
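As a rough illustration of what such a portable generator can look like, the Python sketch below implements the Park-Miller "minimal standard" Lehmer generator using only integer arithmetic, so the same seed yields the same sequence in any language. Whether R8_UNIFORM_01() uses exactly these constants is an assumption here, and the name `lehmer_uniform_01` is hypothetical.

```python
M = 2147483647  # modulus 2^31 - 1
A = 16807       # multiplier 7^5 (Park-Miller "minimal standard"; assumed, not taken from the library)


def lehmer_uniform_01(seed):
    """Advance the seed one step; return (value strictly inside (0,1), new seed)."""
    if not 0 < seed < M:
        raise ValueError("seed must satisfy 0 < seed < 2^31 - 1")
    seed = (A * seed) % M
    return seed / M, seed


# The caller carries the seed along explicitly, so a run is reproducible.
seed = 123456789
samples = []
for _ in range(5):
    u, seed = lehmer_uniform_01(seed)
    samples.append(u)
print(samples)
```

Because the state is a single integer that is passed in and returned, a sequence can be restarted or reproduced simply by recording the seed.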
If we have a source of pseudorandom values in [0,1], it's trivial to generate pseudorandom points in any line segment; it's easy to take pairs of pseudorandom values to sample a square, or triples to sample a cube. It's easy to see how to deal with a square region that is translated from the origin, or scaled by different amounts along each axis, or given a rigid rotation. The same simple transformations can be applied to higher dimensional cubes without difficulty.
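The transformations just described might be sketched as follows for a square of a given side length that is rotated about its center and then translated. The function name and its parameters are hypothetical, and Python's standard `random` module stands in for the uniform source.

```python
import math
import random


def sample_transformed_square(n, side, angle, center, rng):
    """Sample n points in a square of the given side, rotated by angle about its center."""
    c, s = math.cos(angle), math.sin(angle)
    points = []
    for _ in range(n):
        # Uniform point in the unit square, shifted to be centered at the origin and scaled.
        x = side * (rng.random() - 0.5)
        y = side * (rng.random() - 0.5)
        # Rigid rotation, then translation to the requested center.
        points.append((center[0] + c * x - s * y, center[1] + s * x + c * y))
    return points


rng = random.Random(1)
for p in sample_transformed_square(3, 2.0, math.pi / 6.0, (5.0, -1.0), rng):
    print(p)
```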
For all these simple shapes, which are just generalizations of a square, we can easily see how to generate sample points that we can guarantee will lie inside the region; in most cases, we can also guarantee that these points will tend to be uniformly distributed, that is, every subregion can expect to contain a number of points proportional to its share of the total area.
However, we will not achieve uniform distribution in the simple case of a rectangle of nonequal sides [0,A] x [0,B], if we naively scale the random values (u1,u2) to (A*u1,B*u2). In that case, the expected point density of a wide, short region will differ from that of a narrow tall region. The absence of uniformity is most obvious if the points are plotted.
If you realize that uniformity is desirable, and easily lost, it is possible to adjust the approach so that rectangles are properly handled.
But rectangles are much too simple. We are interested in circles, triangles, and other shapes. Once the geometry of the region becomes more 'interesting', there are two common ways to continue.
In the acceptance-rejection method, uniform points are generated in a superregion that encloses the region. Then, points that do not lie within the region are rejected. More points are generated until enough have been accepted. If a circle were the region of interest, for instance, we could surround it with a box, generate points in the box, and throw away those points that don't actually lie in the circle. The resulting set of samples will be a uniform sampling of the circle.
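A minimal sketch of acceptance-rejection for the unit circle, using the box [-1,1] x [-1,1] as the enclosing superregion, could look like this. The function name is hypothetical and this is not the library's own code.

```python
import random


def uniform_in_circle_accept(n, rng):
    """Return n accepted points in the unit circle and the number of candidates tried."""
    points = []
    tried = 0
    while len(points) < n:
        # Candidate point, uniform in the enclosing box [-1,1] x [-1,1].
        x = 2.0 * rng.random() - 1.0
        y = 2.0 * rng.random() - 1.0
        tried += 1
        # Keep the candidate only if it lies inside the circle.
        if x * x + y * y <= 1.0:
            points.append((x, y))
    return points, tried


rng = random.Random(42)
pts, tried = uniform_in_circle_accept(1000, rng)
print(len(pts), "accepted out of", tried, "candidates")
```

The expected acceptance rate is the ratio of the two areas, PI/4 or about 0.785, so roughly one candidate in five is discarded. A poorly fitting superregion makes this ratio, and hence the efficiency, much worse.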
In the direct mapping method, a formula or mapping is determined so that each time a set of values is taken from the pseudorandom number generator, it is guaranteed to correspond to a point in the region. For the circle problem, we can use one uniform random number to choose an angle between 0 and 2 PI, the other to choose a radius. (The radius must be chosen in an appropriate way to guarantee uniformity, however.) Thus, every time we input two uniform random values, we get a pair (R,T) that corresponds to a point in the circle.
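The "appropriate way" to choose the radius follows from areas: the fraction of the unit circle within radius r is r^2, so setting R = sqrt(u) for uniform u gives the right distribution, whereas R = u would crowd points toward the center. A sketch under these assumptions (hypothetical function name, not the library's code):

```python
import math
import random


def uniform_in_circle_map(n, rng):
    """Map pairs of uniform values directly to points in the unit circle."""
    points = []
    for _ in range(n):
        r = math.sqrt(rng.random())        # square root compensates for area growing like r^2
        t = 2.0 * math.pi * rng.random()   # angle uniform in [0, 2 PI)
        points.append((r * math.cos(t), r * math.sin(t)))
    return points


rng = random.Random(7)
pts = uniform_in_circle_map(10000, rng)
inner = sum(1 for (x, y) in pts if x * x + y * y < 0.25)
print("fraction within radius 0.5:", inner / len(pts))
```

The printed fraction should be close to 0.25, since the disk of radius 0.5 has one quarter of the total area.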
The acceptance-rejection method can be simple to program, and can handle arbitrary regions. The direct mapping method is less sensitive to variations in the aspect ratio of a region and other irregularities. However, direct mappings are only known for certain common mathematical shapes.
Points may also be generated according to a nonuniform density. This creates an additional complication in programming. However, there are some cases in which it is possible to use direct mapping to turn a stream of scalar uniform random values into a set of multivariate data that is governed by a normal distribution.
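One well-known direct mapping of this kind is the Box-Muller transform, which turns two uniform values into two independent standard normal values, that is, a sample from a circularly symmetric normal distribution in 2D. The sketch below is illustrative only; it is not claimed to be the method the library's routines use, and the function name is hypothetical.

```python
import math
import random


def normal_pair(rng):
    """Map two uniform values to a pair of independent standard normal values."""
    u1 = 1.0 - rng.random()  # lies in (0,1], which avoids log(0)
    u2 = rng.random()
    r = math.sqrt(-2.0 * math.log(u1))
    t = 2.0 * math.pi * u2
    return r * math.cos(t), r * math.sin(t)


rng = random.Random(3)
xs = [normal_pair(rng)[0] for _ in range(10000)]
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)
print("sample mean:", mean, "sample variance:", var)
```

A general multivariate normal sample can then be obtained by applying a linear map, such as a Cholesky factor of the covariance matrix, to a vector of such values and adding the mean.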
Another way to generate points replaces the uniform pseudorandom number generator by a quasirandom number generator. The main difference is that successive elements of a quasirandom sequence may be highly correlated (bad for certain Monte Carlo applications) but will tend to cover the region in a much more regular way than pseudorandom numbers. Any process that uses uniform random numbers to carry out sampling can easily be modified to do the same sampling with a quasirandom sequence like the Halton sequence, for instance.
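For reference, an element of the Halton sequence is built from radical inverses: the digits of the index in a prime base are reflected about the radix point, with a different prime used for each coordinate. The sketch below shows the basic, unleaped form; the function names are hypothetical, and the library's own routines additionally support seed, leap, and step parameters that are omitted here.

```python
def radical_inverse(index, base):
    """Reflect the base-`base` digits of a nonnegative index about the radix point."""
    value = 0.0
    factor = 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        value += digit * factor
        factor /= base
    return value


def halton_point(index, bases=(2, 3)):
    """Return element `index` of the Halton sequence, one coordinate per prime base."""
    return tuple(radical_inverse(index, b) for b in bases)


for i in range(1, 6):
    print(i, halton_point(i))
```

To sample a region quasirandomly, the pair returned by `halton_point(i)` can be used wherever a mapping or acceptance test would otherwise draw two uniform pseudorandom values.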
The library includes a routine that can write the resulting data points to a file.
Licensing:
The computer code and data files made available on this web page are distributed under the GNU LGPL license.
Languages:
RANDOM_DATA is available in C, C++, FORTRAN90, and MATLAB versions.
Related Data and Programs:
ASA183, a MATLAB library which implements the Wichmann-Hill pseudorandom number generator.
BALL_GRID, a MATLAB library which computes grid points that lie inside a ball.
CIRCLE_GRID, a MATLAB library which computes grid points that lie inside a circle.
HISTOGRAM_DATA_2D_SAMPLE, a MATLAB program which demonstrates how to construct a Probability Density Function (PDF) from a frequency table over a 2D domain, and then to use that PDF to create new samples.
HISTOGRAM_PDF_SAMPLE, a MATLAB library which demonstrates how sampling can be done by starting with the formula for a PDF, creating a histogram, constructing a histogram for the CDF, and then sampling.
HISTOGRAM_PDF_2D_SAMPLE, a MATLAB library which demonstrates how uniform sampling of a 2D region with respect to some known Probability Density Function (PDF) can be approximated by decomposing the region into rectangles, approximating the PDF by a piecewise constant function, constructing a histogram for the CDF, and then sampling.
RING_DATA, a MATLAB library which can create, plot, or save data generated by sampling a number of concentric, possibly overlapping rings.
SAMMON_DATA, a MATLAB program which generates six sets of M-dimensional data for cluster analysis.
SIMPLEX_COORDINATES, a MATLAB library which computes the Cartesian coordinates of the vertices of a regular simplex in M dimensions.
TETRAHEDRON_GRID, a MATLAB library which computes a tetrahedral grid of points.
TETRAHEDRON_MONTE_CARLO, a MATLAB program which uses the Monte Carlo method to estimate integrals over a tetrahedron.
TETRAHEDRON_SAMPLES, a dataset directory which contains examples of sets of sample points from the unit tetrahedron.
TRIANGLE_GRID, a MATLAB library which computes a triangular grid of points.
TRIANGLE_HISTOGRAM, a MATLAB program which computes histograms of data on the unit triangle.
TRIANGLE_MONTE_CARLO, a MATLAB program which uses the Monte Carlo method to estimate integrals over a triangle.
TRIANGLE_SAMPLES, a dataset directory which contains examples of sets of sample points from the unit triangle.
UNIFORM, a MATLAB library which samples the uniform random distribution.
Source Code:
- bad_in_simplex01.m is a 'bad' (nonuniform) sampling of the unit simplex.
- brownian.m creates Brownian motion points.
- dpo_fa.m factors a real symmetric positive definite matrix.
- dpo_sl.m solves a linear system factored by DPO_CO or DPO_FA.
- direction_uniform_nd.m generates a random direction vector.
- grid_in_cube01.m generates grid points in the unit hypercube.
- grid_side.m finds the smallest grid containing at least N points.
- halham_dim_num_check.m checks DIM_NUM for a Halton or Hammersley sequence.
- halham_leap_check.m checks LEAP for a Halton or Hammersley sequence.
- halham_n_check.m checks N for a Halton or Hammersley sequence.
- halham_seed_check.m checks SEED for a Halton or Hammersley sequence.
- halham_step_check.m checks STEP for a Halton or Hammersley sequence.
- halton_base_check.m checks BASE for a Halton sequence.
- halton_in_circle01_accept.m accepts Halton points in the unit circle.
- halton_in_circle01_map.m maps Halton points into the unit circle.
- halton_in_cube01.m generates Halton points in the unit hypercube.
- hammersley_base_check.m is TRUE if BASE is legal.
- hammersley_in_cube01.m generates Hammersley points in the unit hypercube.
- i4_factorial.m computes the factorial N!
- i4_modp.m returns the nonnegative remainder of integer division.
- i4_to_halton.m computes one element of a leaped Halton subsequence.
- i4_to_halton_sequence.m computes N elements of a leaped Halton subsequence.
- i4_to_hammersley.m computes one element of a leaped Hammersley subsequence.
- i4_to_hammersley_sequence.m computes N elements of a leaped Hammersley subsequence.
- i4_uniform_ab.m returns a pseudorandom I4 between A and B.
- i4vec_transpose_print.m prints an integer vector 'transposed'.
- ksub_random2.m selects a random subset of size K from a set of size N.
- normal.m creates normally distributed points.
- normal_circular.m creates circularly normal points.
- normal_multivariate.m samples a multivariate normal distribution.
- normal_simple.m creates normally distributed points.
- polygon_centroid_2d.m computes the centroid of a polygon in 2D.
- prime.m returns any of the first PRIME_MAX prime numbers.
- r8_acos.m evaluates the arc cosine function, with argument truncation.
- r8_normal_01.m returns a unit pseudonormal R8.
- r8_uniform_01.m returns a unit pseudorandom R8.
- r8mat_normal_01.m returns a unit pseudonormal R8MAT.
- r8mat_print.m prints an R8MAT, with an optional title.
- r8mat_print_some.m prints some of an R8MAT, with an optional title.
- r8mat_write.m writes an R8MAT file.
- r8mat_uniform_01.m returns a unit pseudorandom R8MAT.
- r8vec_normal_01.m returns a unit pseudonormal R8VEC.
- r8vec_print.m prints an R8VEC.
- r8vec_uniform_01.m returns a unit pseudorandom R8VEC.
- s_len_trim.m returns the length of a string to the last nonblank.
- scale_from_simplex01.m rescales data from a unit to non-unit simplex.
- scale_to_ball01.m translates and rescales data to fit within the unit ball.
- scale_to_block01.m translates and rescales data to fit in the unit block.
- scale_to_cube01.m translates and rescales data to the unit hypercube.
- stri_angles_to_area.m computes the area of a spherical triangle.
- stri_sides_to_angles.m computes the angles of a spherical triangle from its sides.
- stri_vertices_to_sides.m computes the sides of a spherical triangle from its vertices.
- timestamp.m prints the current YMDHMS date as a time stamp.
- triangle_area_2d.m computes the area of a triangle in 2D.
- tuple_next_fast.m computes the next element of a tuple space, 'fast'.
- uniform_in_annulus.m samples a circular annulus.
- uniform_in_annulus_accept.m accepts points in an annulus.
- uniform_in_annulus_sector.m samples an annular sector in 2D.
- uniform_in_circle01_map.m maps uniform points into the unit circle.
- uniform_in_cube01.m creates uniform points in the unit hypercube.
- uniform_in_ellipsoid_map.m maps uniform points into an ellipsoid.
- uniform_in_hexagon01.m uniformly samples from the regular unit hexagon.
- uniform_in_parallelogram_map.m maps uniform points into a parallelogram.
- uniform_in_polygon_map.m maps uniform points into a polygon.
- uniform_in_sector_map.m maps uniform points into a circular sector.
- uniform_in_simplex01_map.m maps uniform points into the unit simplex.
- uniform_in_sphere01_map.m maps uniform points into the unit sphere.
- uniform_in_tetrahedron.m maps uniform points into a tetrahedron.
- uniform_in_triangle_map1.m maps uniform points into a triangle.
- uniform_in_triangle_map2.m maps uniform points into a triangle.
- uniform_in_triangle01_map.m maps uniform points into the unit triangle.
- uniform_on_cube.m maps uniform points onto the surface of a cube.
- uniform_on_cube01.m maps uniform points onto the surface of the unit cube.
- uniform_on_ellipsoid_map.m maps uniform points onto an ellipsoid.
- uniform_on_hemisphere01_phong.m maps uniform points onto the unit hemisphere, with the Phong distribution.
- uniform_on_simplex01_map.m maps uniform points onto the unit simplex.
- uniform_on_sphere01_map.m maps uniform points onto the unit sphere.
- uniform_on_sphere01_patch_tp.m maps uniform points onto a unit sphere TP (THETA,PHI) patch in 3D.
- uniform_on_sphere01_patch_xyz.m maps uniform points onto a unit sphere XYZ patch in 3D.
- uniform_on_sphere01_triangle_xyz.m maps uniform points onto a spherical triangle on the unit sphere, using XYZ coordinates.
- uniform_on_triangle.m maps uniform points onto the boundary of a triangle.
- uniform_on_triangle01.m maps uniform points onto the boundary of the unit triangle (0,0), (1,0), (0,1).
- uniform_walk.m generates points on a uniform random walk.
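As an illustration of the kind of mapping that routines such as those for random directions or points on the unit sphere perform, one common technique is to draw a vector of independent standard normal values and normalize it; the rotational symmetry of the normal distribution makes the resulting direction uniform. The Python sketch below uses that technique with a hypothetical function name, and it is an assumption that it resembles what the listed files do.

```python
import math
import random


def uniform_on_sphere(dim, rng):
    """Return a point uniformly distributed on the unit sphere in `dim` dimensions."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 0.0:  # guard against the (extremely unlikely) zero vector
            return [x / norm for x in v]


rng = random.Random(11)
for _ in range(3):
    print(uniform_on_sphere(3, rng))
```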