User:Yamagawa/Drop Rates

Drop Rates
I use a method called... um... I know I saw it on a wiki... we'll call it the average of averages, or powered average (e.g. average squared). Apologies if the terms don't match what's in the books; I don't have them, I just use what I remember, which is more *how* it works than *what* it's called.

Basically, we get a bunch of different samples together, each with different rates. Each sample can be considered an average. (My first sample of ten drops averaged 20% ectos... my next sample averaged 0% ectos...). We take all those averages and average them together. There is, however, a fair amount of precision to be lost depending on how you average them together.

This breaks down into two parts: bucketing the data, and averaging the buckets.

Bucketing the data
The process calls for n buckets of n drops.

This tries to maximize the raw 'information' in the drop table. So the first step is to take all the rows of data and find the total number of measurements, then get the square root of that. Say I have 100 drops. The square root is 10. That's my N. I want to lump the data together into N samples of N size, or 10 samples of 10 drops. If I make the sample size larger, I get fewer buckets, at a cost of information. If I make the sample size smaller, I get more total buckets, but information is again lost in the smaller sample size. A square-root effect applies to sample size: a sample of 4 may be considered to carry 2 units of data, a sample of 9 carries 3 units, a sample of 16 carries 4 units, and a sample of 25 carries 5 units. In the process used here, each bucket is considered a sample, and the collection of buckets is another sample. By balancing those two numbers against each other, we can maximize the transfer of data into the final result.

To aid in this process, I'd suggest encouraging users to target sample sizes in the 10-25 drop range, where possible.

The buckets should consist of the raw counts (not percentages), plus a sample size column.
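The bucketing step above can be sketched like so; the per-kill drop records here are made-up numbers for illustration, not real ecto data:

```python
import math

# Hypothetical per-kill records: 1 = ecto dropped, 0 = no drop (100 observations).
drops = [0, 1, 0, 0, 0, 0, 0, 0, 1, 0] * 10

n = math.isqrt(len(drops))  # sqrt(100) = 10, so 10 buckets of 10 drops each
buckets = [drops[i * n:(i + 1) * n] for i in range(n)]

# Each bucket keeps the raw count plus a sample-size column, as described above.
table = [{"ectos": sum(b), "sample_size": len(b)} for b in buckets]
print(table[0])  # → {'ectos': 2, 'sample_size': 10}
```

If the total isn't a perfect square, `math.isqrt` rounds down and the leftover drops would need a slightly larger last bucket; the sketch ignores that wrinkle.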

Averaging the Buckets
Unless you are very, very lucky, the buckets are not all the same size. This is annoying, but not fatal to the process.

Extending the buckets
Going back to the previous section on bucketing the data, it's mentioned that a sample of 4 carries 2 units of data, while a sample of 25 carries only 5 (21 more drops, for 3 more units...). We need to treat each row according to this weight. So:

 * Add a sum-of-drops column (if missing), to sum the drops in each bucket.
 * Add an n^0.5 column, which carries the square root of the sum of drops in the bucket. If the bucket has 25 drops, this will show 5.
 * Add an averages row, and get an average for your n^0.5 column.
 * Add a 'Bucket Weight' column. Each row of this column will contain the row's n^0.5 value divided by the average n^0.5 value.

The Bucket Weight is a measure of how significant the row is relative to the other rows, and it's needed because the standard statistical functions built into spreadsheet programs work on the basis that samples have equal weight. In this instance, they do not.
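A minimal sketch of that weighting step, using hypothetical bucket sizes chosen so the square roots come out even:

```python
import math

# Hypothetical bucket sample sizes (drops observed per bucket).
sample_sizes = [25, 16, 9, 4]

roots = [math.sqrt(s) for s in sample_sizes]   # the n^0.5 column: 5, 4, 3, 2
avg_root = sum(roots) / len(roots)             # average n^0.5 = 3.5
weights = [r / avg_root for r in roots]        # the Bucket Weight column

# A weight above 1 means the bucket counts for more than an average bucket.
for size, w in zip(sample_sizes, weights):
    print(size, round(w, 4))
```

Note the weights average to exactly 1 by construction, which is what lets the later averaging steps stay on the same scale as the raw table.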

The Adjusted Table
Now we need to make a drop rates table that is adjusted for the weight of the rows. Create a new sheet and copy the column headings from the first table to it. You can omit the columns we added in the last section; we just start with the drops themselves. The basic formula is simple multiplication: for each bucket, you multiply the observed value by the row's Bucket Weight. Copy & paste is your friend here; most spreadsheet programs will update the formula as you copy it from cell to cell, though you may need an absolute reference for the Bucket Weight cell so it doesn't shift as you move from column to column.
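The same adjustment, sketched outside a spreadsheet; the bucket counts are hypothetical, and I'm reading "observed value" as the raw counts from the bucket table:

```python
import math

# Hypothetical bucket rows: raw ecto counts and sample sizes.
buckets = [
    {"ectos": 5, "sample_size": 25},
    {"ectos": 1, "sample_size": 16},
    {"ectos": 0, "sample_size": 9},
]

roots = [math.sqrt(b["sample_size"]) for b in buckets]
avg_root = sum(roots) / len(roots)  # (5 + 4 + 3) / 3 = 4

# Adjusted table: every value in a row is scaled by that row's Bucket Weight
# (in a spreadsheet this multiplication would use an absolute reference).
adjusted = []
for b, r in zip(buckets, roots):
    w = r / avg_root
    adjusted.append({"ectos": b["ectos"] * w,
                     "sample_size": b["sample_size"] * w})
print(adjusted)
```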

The Results - part 1
Sort out average drop rates, same as you might with the base table: Add a 'total drops' column that contains the total of the adjusted drop values for each row. Create a totals row, and put into it the total for each column. Create an averages row, and put into it the total for that column divided by the total of the 'total drops' column.
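Carrying the same hypothetical buckets through to the results step, the adjusted drop rate is the adjusted ecto total divided by the adjusted grand total:

```python
import math

# Same hypothetical buckets as before: (ecto count, sample size).
buckets = [(5, 25), (1, 16), (0, 9)]

roots = [math.sqrt(n) for _, n in buckets]
avg_root = sum(roots) / len(roots)
weights = [r / avg_root for r in roots]

adj_ectos = sum(e * w for (e, _), w in zip(buckets, weights))  # totals row
adj_total = sum(n * w for (_, n), w in zip(buckets, weights))  # 'total drops'
rate = adj_ectos / adj_total                                   # averages row

# For comparison, the plain pooled rate on these numbers is 6 / 50 = 0.12.
print(round(rate, 4))
```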

This gives you an adjusted average drop rate -- we aren't done yet.

We still need to sort out 'what do these mean', i.e. what is the margin of error. If we just wanted averages, I'd have skipped the whole complicated process and used the base average, as that's not far from what we get here, and they have similar meaning to me. On with the math... perhaps later.


 * I am double-checking work thus-far, making sure I've not included a completely botched step or such, using a small table for practice.... gifts of the traveler looks workable. Yamagawa 15:50, October 17, 2010 (UTC)