Thursday, July 15, 2010

cold with position jitter

I have re-run the scenario where I define equally spaced positions, define the velocities according to the unperturbed positions, and then add a 20% perturbation to the particle positions. This is in contrast to the run I showed you before, where I perturbed the positions first and used the perturbed positions to define the velocities. The result seems consistent with what we were finding in the warm simulations, namely that the slope of the inner profile is flatter than for the cold simulations. If I understood Walter correctly, these conditions are effectively "cold" (though this sort of confuses me), so this is a somewhat surprising result. Here is the plot:


There is a minor caveat: something went wrong with my run, and only 5 of the 6 instances of the initial conditions ran. I don't THINK I have a mix-up in my book-keeping, but it's odd that the run did not continue, so I am re-running from scratch just to double-check.

Thursday, July 8, 2010

k-s clarifications

Here are a few clarifications and responses to Walter's email; I wrote my post latish last night and perhaps neglected a few of the details:

we best take a simulation with equal-mass particles, such that the hypothesis is equivalent to: the number density is proportional to $r^{-\alpha}$ for $r$ less than $R$.

Yes, this is what I did.

Then the K-S distance is just

$D = \max_{r < R} \left| N(<r)/N(<R) - (r/R)^{1-\alpha} \right|$

Indeed. This is exactly what I calculated. I used a canned k-s routine, but I also computed the D statistic by hand and checked that it agreed with what was coming out of the kstest function.
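For concreteness, the by-hand check was along these lines (a sketch, with radii, R, and alpha as illustrative names; the canned routine I mean is scipy's kstest):

```python
import numpy as np
from scipy.stats import kstest

def ks_check(radii, R, alpha):
    """Compare a by-hand K-S D statistic against scipy's kstest for the
    hypothesis that the number density goes as r^-alpha at r < R."""
    r = np.asarray(radii)
    r = np.sort(r[r < R])
    N = len(r)
    model_cdf = (r / R) ** (1.0 - alpha)
    # The empirical CDF steps from (i-1)/N to i/N at the i-th point, so
    # check both sides of each step when taking the maximum.
    D_hand = max(np.max(np.abs(np.arange(1, N + 1) / N - model_cdf)),
                 np.max(np.abs(np.arange(0, N) / N - model_cdf)))
    D_scipy, p = kstest(r, lambda x: (x / R) ** (1.0 - alpha))
    return D_hand, D_scipy, p
```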

From your plots it looks like alpha is less than 1/2

What you call alpha is what I called expt in the previous post. The answer is that the best-fit alpha rolls from less than 1/2 over part of the range of the data to greater than 1/2 elsewhere. A better way to put it might be that the data are simply not particularly well described by a power law (a result seemingly in conflict with the chi-squared result). More on this below.


Also it seems to me that you performed the K-S test for data at r>R, but as you did not explain what exactly you did (no mentioning of R or a similar parameter), I can only guess.

I am not sure if this is what you were concerned about, but I truncated the data I used in the k-s test to only include the range where the power-law scaling behavior is seen. I did the analysis at two different choices of truncation, 0.1 and 0.3, and included only data between 0 and the truncation radius in the analysis. The results were qualitatively similar for the two choices; the plots in the previous post were for 0.3. What you call R was included in my previous post as $x_{max}$, in the first version of the cdf I used. The second version was done as a sanity check: because the results came out so poorly, I wanted to double-check against a number density function that I was sure was a good fit to the density data (with error bars). It surprised me how much the two disagreed.

I had a chat with Scott about this; he thought that the run-to-run scatter may well be substantially greater than just the shot noise. So I will repeat the chi-squared fit using only the shot noise, rather than the variance between runs, and see if the value of the reduced chi-square increases dramatically (suggesting that a power law is not an especially good fit). That would make the results more consistent.
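A sketch of the comparison I have in mind, assuming I can pull out the raw particle count in each bin (all names here are illustrative):

```python
import numpy as np

def reduced_chisq_shot(rho_data, rho_model, counts, n_params):
    """Reduced chi-square of the binned density against a model, using the
    fractional shot noise 1/sqrt(n) per bin instead of run-to-run scatter."""
    sigma_shot = rho_data / np.sqrt(counts)   # Poisson error on each bin
    chisq = np.sum(((rho_data - rho_model) / sigma_shot) ** 2)
    return chisq / (len(rho_data) - n_params)
```

If this jumps well above the ~0.7 I was getting with the between-run error bars, that would support Scott's suggestion.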

I'll also have a look at the k-s test between two separate runs to see if the profile is at least something like "universal" even if it is not particularly well described by $r^{-1/2}$.
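That would be a two-sample test, something like this sketch (scipy's ks_2samp, with synthetic placeholder data standing in for the radii from two runs):

```python
import numpy as np
from scipy.stats import ks_2samp

# Placeholder data standing in for particle radii from two separate runs;
# Generator.power(a) samples a pdf proportional to x^(a-1), so a = 0.5
# corresponds to rho(x) proportional to x^(-1/2).
rng = np.random.default_rng(0)
run1 = rng.power(0.5, size=1000)
run2 = rng.power(0.5, size=1000)

# D is the largest gap between the two empirical CDFs; a large p would
# suggest the two runs share a common ("universal") profile.
D, p = ks_2samp(run1, run2)
print(D, p)
```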

Wednesday, July 7, 2010

K-S testing.

I tried the k-s test on the data to see if it agrees with an $r^{-1/2}$ density profile. I had a fair bit of trouble with this k-s test stuff. At first I blamed the python implementation, but eventually I tested it by generating normally distributed random numbers and passing the test the cumulative distribution function cdf(x)=$\frac{1}{2} \left[1+erf(x/\sqrt{2})\right]$; I got reasonable agreement ((D, p) = (0.0583, 0.885) for a sample of 100 numbers). So I am reasonably sure now that it works.
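The sanity check was essentially the following (a sketch; the D and p quoted above came from one particular random sample):

```python
import numpy as np
from scipy.stats import kstest
from scipy.special import erf

def gauss_cdf(x):
    # The Gaussian cdf written out by hand, as above.
    return 0.5 * (1.0 + erf(x / np.sqrt(2.0)))

sample = np.random.default_rng(1).normal(size=100)
D, p = kstest(sample, gauss_cdf)   # should match kstest(sample, 'norm')
print(D, p)
```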

The bottom line, I guess, is that the k-s test is concluding that the data are probably not drawn from an $x^{-1/2}$ distribution. I searched around the vicinity of -0.5 and found the local maximum of the p-value coming out of the k-s test. Here is a plot of the cumulative distribution functions and the D/p values that come out (for the best scaling). I plotted a scaling of -0.5 for comparison. N.B. the labeling for these models is $\rho(x)\propto x^{-expt}$.



This apparently means that even for the local maximum value of p, data genuinely drawn from an $x^{-expt}$ distribution would produce a D this large only 2ish percent of the time.
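The search itself was just a scan over the exponent, something like this (a sketch; radii stands in for the truncated data, and the cdf is the first of the two versions discussed in the next paragraph):

```python
import numpy as np
from scipy.stats import kstest

def best_exponent(radii, x_max, expts=np.linspace(0.3, 0.7, 81)):
    """Scan exponents near 0.5 and return (p, D, expt) at the local
    maximum of the k-s p-value, using cdf(x) = (x/x_max)^(1-expt)."""
    best = (-1.0, None, None)
    for expt in expts:
        D, p = kstest(radii, lambda x, e=expt: (x / x_max) ** (1.0 - e))
        if p > best[0]:
            best = (p, D, expt)
    return best
```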

I should mention at this point that I had trouble deciding what to use for the cdf; there were two logical options, and I tried them both. The plot above was made using cdf(x)=$(x/x_{max})^{-expt+1}$, which has the advantage that the curves meet again at 1. The other strategy I tried was to turn the density $\rho(x)$ into a number density function and then divide by the number of points I was using in the k-s test; in other words, cdf(x)=$\frac{A}{m_p N_{ks}} \frac{1}{(-expt+1)} x^{(-expt+1)}$. This has the advantage that it makes more sense. Here is the plot I got:

I was surprised this was so much worse, given that at the maximum point the value of the theory cdf was 0.994, but I will explain the difference. In order to get the number density, I needed the amplitude of the mass density, A. I did this by performing a least-squares fit to the density data points (with error bars) that I showed you in previous posts. Every time I adjust the value of the exponent, the least-squares fit allows the amplitude A to compensate, to best match the density data (which has been binned, and using the data twice in this way might be a cheat, statistically speaking...). That is why you see the curves jumping around a bit more in these plots. It confuses me, because I felt that these theory CDFs differed only in a normalization from one another, but I guess there is more to it.
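For reference, the fit and the second-version cdf were of this general form (a sketch; x_bins, rho_data, and rho_err stand in for the binned density points and their error bars):

```python
import numpy as np
from scipy.optimize import leastsq

def fit_amplitude(x_bins, rho_data, rho_err, expt):
    """Fit rho(x) = A * x^-expt to the binned density data, holding the
    exponent fixed; returns A and the reduced chi-square."""
    resid = lambda p: (rho_data - p[0] * x_bins ** (-expt)) / rho_err
    (A,), _ = leastsq(resid, x0=[1.0])
    return A, np.sum(resid([A]) ** 2) / (len(x_bins) - 1)

def cdf_from_density(x, A, m_p, N_ks, expt):
    # Integrate n(x) = (A/m_p) x^-expt and divide by the sample size.
    # Unlike (x/x_max)^(1-expt), nothing forces this to equal 1 at x_max,
    # which is why the fitted amplitude matters here.
    return A / (m_p * N_ks) * x ** (1.0 - expt) / (1.0 - expt)
```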

In any event, in the process of doing the least-squares fit, I also got a reduced chi-square value. Here is a plot of the measured density and the best-fit model, fixing the exponent to be $-1/2$.

I took this one step further and fit both the amplitude and the exponent together. The plot is visually quite similar; the parameters were A=0.2422, expt=0.4988, and the reduced chi-square was 0.6912. Taking the circularity of my analysis to the point of absurdity, I then tried a k-s test using both the amplitude and the exponent I got out of the data (although I vaguely recall that NRC++ warned there was a rule against such behavior)... still, it very much distresses me; take a look:
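(As an aside, the two-parameter fit was just the obvious extension of the sketch above, again with illustrative names:)

```python
import numpy as np
from scipy.optimize import leastsq

def fit_both(x_bins, rho_data, rho_err):
    """Fit rho(x) = A * x^-expt with both the amplitude and exponent free."""
    resid = lambda p: (rho_data - p[0] * x_bins ** (-p[1])) / rho_err
    (A, expt), _ = leastsq(resid, x0=[1.0, 0.5])
    return A, expt, np.sum(resid([A, expt]) ** 2) / (len(x_bins) - 2)
```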

That p is ridiculous! I guess the only conclusion I can draw here is that my error bars are just way too big, so that the chi-square statistic is basically just telling me about the error bars (?). Otherwise, how can these two statistical tests differ so extremely? I am beginning to lose my faith in statistics... not that I had all that much to begin with.