Python: how to use Python to generate a random sparse symmetric matrix?
How can I use Python to generate a random sparse symmetric matrix?
In MATLAB, there is the function "sprandsym(size, density)".
But how do I do this in Python?
If you have SciPy, you can use sparse.random. The sprandsym function below generates a sparse random matrix x, takes its upper triangular half, and adds that half to its transpose to form a symmetric matrix. Since this doubles the diagonal values, the diagonal is subtracted once.
The non-zero values are normally distributed with mean 0 and standard deviation 1. The Kolmogorov-Smirnov test is used to check that the non-zero values are consistent with a draw from a normal distribution, and a histogram and a QQ-plot are generated to visualize the distribution.
    import numpy as np
    import scipy.stats as stats
    import scipy.sparse as sparse
    import matplotlib.pyplot as plt

    np.random.seed((3, 14159))

    def sprandsym(n, density):
        # Draw the non-zero values from a standard normal distribution.
        rvs = stats.norm().rvs
        x = sparse.random(n, n, density=density, data_rvs=rvs)
        # Mirror the upper triangle; subtract the diagonal once because
        # upper_x + upper_x.T counts it twice.
        upper_x = sparse.triu(x)
        result = upper_x + upper_x.T - sparse.diags(x.diagonal())
        return result

    m = sprandsym(5000, 0.01)
    print(repr(m))
    # <5000x5000 sparse matrix of type '<class 'numpy.float64'>'
    #     with 249909 stored elements in Compressed Sparse Row format>

    # Check that the matrix is symmetric. The difference should have no
    # non-zero elements.
    assert (m - m.T).nnz == 0

    statistic, pval = stats.kstest(m.data, 'norm')
    # The null hypothesis is that m.data was drawn from a normal distribution.
    # A small p-value (say, below 0.05) would indicate reason to reject the
    # null hypothesis. Since `pval` below is > 0.05, kstest gives no reason
    # to reject the hypothesis that m.data is normally distributed.
    print(statistic, pval)
    # 0.0015998040114 0.544538788914

    fig, ax = plt.subplots(nrows=2)
    # `density=True` replaces the `normed=True` argument, which was removed
    # from newer Matplotlib releases.
    ax[0].hist(m.data, density=True, bins=50)
    stats.probplot(m.data, dist='norm', plot=ax[1])
    plt.show()
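To see why the diagonal has to be subtracted once, it may help to run the same three steps on a tiny matrix whose entries can be checked by hand. This is only an illustrative sketch: the 3x3 values are made up for the example and are not part of the answer above.

```python
import numpy as np
import scipy.sparse as sparse

# A small hand-picked matrix (illustrative values only).
x = sparse.coo_matrix(np.array([
    [1.0, 2.0, 0.0],
    [5.0, 0.0, 3.0],
    [0.0, 0.0, 4.0],
]))

upper_x = sparse.triu(x)       # keeps the diagonal and everything above it
doubled = upper_x + upper_x.T  # symmetric, but the diagonal is counted twice
result = doubled - sparse.diags(x.diagonal())  # remove one copy of the diagonal

print(doubled.toarray())
# [[2. 2. 0.]
#  [2. 0. 3.]
#  [0. 3. 8.]]
print(result.toarray())
# [[1. 2. 0.]
#  [2. 0. 3.]
#  [0. 3. 4.]]
```

Note that the 5.0 below the diagonal is discarded: only the upper triangle of x survives, and it is mirrored into the lower triangle.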
PS. I used

    upper_x = sparse.triu(x)
    result = upper_x + upper_x.T - sparse.diags(x.diagonal())

instead of

    result = (x + x.T)/2.0

because I could not convince myself that the non-zero elements in (x + x.T)/2.0 have the right distribution. First, if x were dense and normally distributed with mean 0 and variance 1, i.e. N(0, 1), then (x + x.T)/2.0 would be N(0, 1/2) off the diagonal, since the sum of two independent N(0, 1) values is N(0, 2). We could fix this by using

    result = (x + x.T)/sqrt(2.0)

instead. Then the result would be N(0, 1). But there is yet another problem: if x is sparse, then at most nonzero locations x + x.T is a normally distributed random variable plus zero. Dividing by sqrt(2.0) squashes that normal distribution closer to 0, giving a more tightly spiked distribution. As x becomes sparser, the result may look less and less like a standard normal distribution.
Since I didn't know what distribution (x + x.T)/sqrt(2.0) generates, I opted for copying the upper triangular half of x (thus repeating what I know to be normally distributed non-zero values).
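The concern about (x + x.T)/sqrt(2.0) can be checked numerically. The sketch below is an assumption-laden illustration rather than part of the original answer: the size, density, seed, and the helper name nonzero_std are arbitrary choices made here. It compares the standard deviation of the stored non-zero values under both constructions.

```python
import numpy as np
import scipy.sparse as sparse

rs = np.random.RandomState(0)  # arbitrary seed, chosen for this sketch
n, density = 1000, 0.01        # illustrative size and density


def nonzero_std(mat):
    """Standard deviation of the stored non-zero values (hypothetical helper)."""
    return sparse.csr_matrix(mat).data.std()


# Sparse matrix whose non-zero values are drawn from N(0, 1).
x = sparse.random(n, n, density=density, random_state=rs,
                  data_rvs=rs.standard_normal)

# Construction 1: symmetrize by adding the transpose and rescaling.
averaged = (x + x.T) / np.sqrt(2.0)

# Construction 2: mirror the upper triangle, as in sprandsym.
upper_x = sparse.triu(x)
copied = upper_x + upper_x.T - sparse.diags(x.diagonal())

std_averaged = nonzero_std(averaged)
std_copied = nonzero_std(copied)
print(std_averaged, std_copied)
```

At this density almost every non-zero entry of x + x.T comes from a single N(0, 1) value plus zero, so one would expect the first number to sit near 1/sqrt(2), roughly 0.71, while the second stays close to 1.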