(25C) Sampling without repetition

04272017, 04:07 AM
Post: #1




(25C) Sampling without repetition
This program generates random sample of k elements out of total set of N elements. Element can be used only once. All such subsets are drawn with equal probabilitites (assuming RNG is fair).
Algorithm considers each integer i from 0 to N1 and decides on each whether to include it in a sample or not with probability (kdone)/(Ni) where done is the number of already selected items. The generated elements are not stored, so k and N can be any positive integers, however this algorithm is only practical with relatively small N. Instruction: <N> sto 0 <k> sto 1 <0.random_seed> sto 5 f prgm r/s Output: first element of a sample 0..N1 r/s Output: next element of a sample. When no more elements outputs 1 r/s after that starts a new sample. Code:
Example: 100 sto 0 10 sto 1 0.1234 sto 5 f prgm r/s [2] r/s [4] r/s [23] r/s [30] r/s [50] r/s [53] r/s [57] r/s [59] r/s [66] r/s [96] r/s [1]  indicates end of sample I wonder if it is possible to improve on the algorithm and generate a sample of modest k for arbitrary N, for example, a sample of 30 out of 10000000. 

10272018, 03:52 PM
Post: #2




(42S) Sampling without repetition
Here's a translation of your program for the HP42S:
Code: 00 { 41Byte Prgm } However, I reversed the order so I could use DSE. Thus the generated samples appear in descending order. Example: 100 STO 00 10 STO 01 XEQ "SAMPLE" 85 R/S 64 R/S 62 R/S 59 R/S 55 R/S 53 R/S 39 R/S 32 R/S 30 R/S 11 R/S 1 (04272017 04:07 AM)nsg Wrote: I wonder if it is possible to improve on the algorithm and generate a sample of modest k for arbitrary N, for example, a sample of 30 out of 10000000. In this case we might simply generate them at random and hope there's no collision. Kind regards Thomas 

10282018, 09:18 AM
(This post was last modified: 10282018 09:20 AM by Dieter.)
Post: #3




RE: (25C) Sampling without repetition
(10272018 03:52 PM)Thomas Klemm Wrote: Here's a translation of your program for the HP42S: One more case where something like INV DSZ would be useful (greetings to the TI community ;)). This can be accomplished quite easily by having the DSE followed by a test that always tests false. Since X holds a positive number at this point, a simple x<0? will do. Code: ... Slightly shorter, slightly faster, saves one label. Edit: OK, in this special case it's neither shorter nor labelsaving, simply because LBL 01 is required anyway. #) (10272018 03:52 PM)Thomas Klemm Wrote:(04272017 04:07 AM)nsg Wrote: I wonder if it is possible to improve on the algorithm and generate a sample of modest k for arbitrary N, for example, a sample of 30 out of 10000000.In this case we might simply generate them at random and hope there's no collision. In such a case there is no significant difference between sampling with or without repetition. So it doesn't matter much if a number is drawn twice. Otherwise the idea is: record the sample in a (sorted) list, generate the next number, see if it's part of the list. If yes: get a new number, if not: insert it into the list. Hmmm... somehow this sounds like a job for the HP50. ;) Dieter 

10282018, 02:18 PM
(This post was last modified: 10282018 04:52 PM by Albert Chan.)
Post: #4




RE: (25C) Sampling without repetition
(04272017 04:07 AM)nsg Wrote: I wonder if it is possible to improve on the algorithm and generate a sample of modest k for arbitrary N, Code: import random Above code only need k random numbers, and return sorted samples. >>> randindex(10,10) # no repeat [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] >>> random.seed(1) >>> randindex(30,10000000) [21060, 254459, 283474, 290410, 305901, 938595 ... 

10282018, 03:42 PM
Post: #5




RE: (25C) Sampling without repetition  
10282018, 08:03 PM
(This post was last modified: 10282018 08:04 PM by John Keith.)
Post: #6




RE: (25C) Sampling without repetition
(10282018 09:18 AM)Dieter Wrote: In such a case there is no significant difference between sampling with or without repetition. So it doesn't matter much if a number is drawn twice. Otherwise the idea is: record the sample in a (sorted) list, generate the next number, see if it's part of the list. If yes: get a new number, if not: insert it into the list. Hmmm... somehow this sounds like a job for the HP50. ;) Going further oftopic... This would be a variation of the FisherYates shuffle, implemented in the ListExt Library as LSHUF. 

10282018, 08:51 PM
(This post was last modified: 10282018 08:52 PM by Albert Chan.)
Post: #7




RE: (25C) Sampling without repetition
(10282018 08:03 PM)John Keith Wrote:(10282018 09:18 AM)Dieter Wrote: In such a case there is no significant difference between sampling with or without repetition. So it doesn't matter much if a number is drawn twice. Otherwise the idea is: record the sample in a (sorted) list, generate the next number, see if it's part of the list. If yes: get a new number, if not: insert it into the list. Hmmm... somehow this sounds like a job for the HP50. ;) I do not see any shuffling from Dieter description. That seems more like "rinse, and repeat", throwing away (rare) repeated samples. My Python randindex(...) look more like shuffling, except working thru the samples list. 

« Next Oldest  Next Newest »

User(s) browsing this thread: 1 Guest(s)