Dear all,

I am aiming to calculate variograms using variogram() from gstat. The problem is that some of my datasets are very large (> 400,000 points). Running the command takes several hours and does not give any error message. Nevertheless, the result does not seem to be correct: the first few bins are fine (up to a distance of about 300), but then lags appear that are much larger than the spatial extent of the data, and the bins are no longer contiguous. Running the code on smaller areas gives correct results. That's why I think the problem is memory.

I am running the code with R 2.10.1 on a Linux grid (Intel(R) Core(TM) i7-2600 CPU @ 3.40 GHz; 32 bit).

So my questions:
- Is there a better way to calculate variograms with such large data sets, or do I have to reduce the data?
- Could parallel computation (on multiple cores) be a solution? And if yes, how could that be done?

Here is the code I am using. "scans" is a three-column data frame containing the x, y, and z values of a high-resolution (1 m) digital elevation model. The extent of the data is about 600*600 m.

# define 50 bins, log-scaled, with a maximum of 600
x = seq(1, 50, 1)
a = exp(log(600)/50)
logwidth = a^x

# variogram
coordinates(scans) = ~V1+V2
v = variogram(V3~1, scans, boundaries = logwidth)

Thank you very much,
Tom

--
Thomas Grünewald
WSL Institute for Snow and Avalanche Research SLF
Research Unit Snow and Permafrost
Team Snow Cover and Micrometeorology
Flüelastr. 11, CH-7260 Davos Dorf
Tel. +41/81/417 0365, Fax +41/81/417 0110
[hidden email]
http://www.slf.ch

On 11/17/2011 01:59 PM, gruenewald wrote:
> Is there a better way to calculate variograms with such large data
> sets, or do I have to reduce the data?

Well, you could just as well take smaller samples of the data. Most likely, variograms of 40,000 observations will give you enough information; maybe even 4,000. It all depends a bit on the spatial distribution of the points.

> Could parallel computation (on multiple cores) be a solution? And if
> yes, how could that be done?

Difficult: if you split the data into 10 slices, compute variograms for each of them, and average those, it is not the same as the variogram of the full data set, as point pairs across slices are not considered.

--
Edzer Pebesma
Institute for Geoinformatics (ifgi), University of Münster
Weseler Straße 253, 48151 Münster, Germany
Phone: +49 251 8333081, Fax: +49 251 8339763
http://ifgi.uni-muenster.de
http://www.52north.org/geostatistics
[hidden email]

_______________________________________________
Geostatistics mailing list
[hidden email]
http://list.52north.org/mailman/listinfo/geostatistics
http://geostatistics.forum.52north.org
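[A minimal sketch of the subsampling approach suggested above. It assumes "scans" is the data frame from the original post, with coordinate columns V1/V2 and elevation in V3; the 40,000-point sample size and the seed are illustrative.]

```r
library(sp)     # for coordinates()
library(gstat)  # for variogram()

# draw a random subsample instead of using all 400,000+ points
set.seed(42)                         # make the sample reproducible
idx <- sample(nrow(scans), 40000)    # 40,000 random rows
sub <- scans[idx, ]

coordinates(sub) <- ~V1 + V2         # promote to SpatialPointsDataFrame
v_sub <- variogram(V3 ~ 1, sub)      # empirical variogram of the sample
plot(v_sub)
```

The same log-scaled boundaries from the original post can still be passed via the boundaries argument; only the input data shrinks.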
On 11/21/2011 08:22 PM, Edzer Pebesma wrote:
>> Could parallel computation (on multiple cores) be a solution? And if
>> yes, how could that be done?
> Difficult: if you split the data into 10 slices, compute variograms for
> each of them, and average those, it is not the same as the variogram of
> the full data set, as point pairs across slices are not considered.

First generating the point pairs and then calculating the semivariances in parallel might work. But I agree with Edzer that you probably do not need the full dataset to get a good variogram model. You can generate variograms for different amounts of data and check whether it makes a difference. The variogram will probably converge after a given number of data points.

regards,
Paul

--
Paul Hiemstra, Ph.D.
Global Climate Division
Royal Netherlands Meteorological Institute (KNMI)
Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39
P.O. Box 201 | 3730 AE | De Bilt
tel: +31 30 2206 494
http://intamap.geo.uu.nl/~paul
http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770
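[The convergence check described above could be sketched like this. It again assumes the "scans" data frame from the original post; the sample sizes are illustrative, not prescriptive.]

```r
library(sp)
library(gstat)

# compute empirical variograms for increasing sample sizes;
# if the curves stop changing, a subsample is sufficient
sizes <- c(2000, 4000, 10000, 40000)
set.seed(1)
vgms <- lapply(sizes, function(n) {
  sub <- scans[sample(nrow(scans), n), ]
  coordinates(sub) <- ~V1 + V2
  cbind(variogram(V3 ~ 1, sub), n = n)  # tag each variogram with its size
})
vgms <- do.call(rbind, vgms)

# overlay the curves: convergence shows when the lines coincide
library(lattice)
xyplot(gamma ~ dist, groups = n, data = vgms, type = "l",
       auto.key = list(title = "sample size", columns = 2))
```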