Author Topic: Python help - How do I do an "average If" function? (Read 8214 times)

Entrepreneur · « **on:** January 23, 2015, 03:23:21 PM »

Attached is a sample data set along with the results I would like Python to produce.

I would like to what Excel calls and Average If formula.
I have a several columns of data where one of the columns has unique codes that repeat. The number of unique code numbers will vary based on the data set.
I want to average the other columns in the data set for each of the unique codes. Kind of like using a filter where you use a filter on the Code column to limit the data set to one of the numbers in that column. You then average the remaining columns for that number. So if there are 14 unique Code numbers, then I would have 20 rows of averaged data for the remaining columns in the data set.

an613 · « **Reply #1 on:** January 23, 2015, 04:37:44 PM »

Where is the data coming from? you can iterate through the rows and just add up each column in a list or dict but if your datasource is a DB you can get that into a pretty trivial sql query. Alternatively I'm sure there is a function in numpy that would do average if but dont know it offhand

an613 · « **Reply #2 on:** January 23, 2015, 04:38:21 PM »

side note, mods may want to move this to Tech Talk

Entrepreneur · « **Reply #3 on:** January 24, 2015, 08:48:32 PM »

Quote from: an613 on January 23, 2015, 04:37:44 PM

Where is the data coming from? you can iterate through the rows and just add up each column in a list or dict but if your datasource is a DB you can get that into a pretty trivial sql query. Alternatively I'm sure there is a function in numpy that would do average if but dont know it offhand

I would be uploading the data from an excel or csv file.

shtank · « **Reply #4 on:** January 24, 2015, 09:38:01 PM »

Quote from: Entrepreneur on January 24, 2015, 08:48:32 PM

I would be uploading the data from an excel or csv file.

I have no experience with Python, but in pseudocode this is how I would do it:

float averageIf(list[][], uniqueCode)
{
sum=0, count=0; //sum is where you will store all the values you are averaging and count keeps count so you know what to divide by
for i = 1 to list.size() //iterate through the list
if list[0]==uniqueCode then //if this row contains the unique code then you can ->
sum+=list[1]; //add the value in the second column to the sum and ->
count++; //increment the counter
return sum/count; //once you have exhausted the list, return the average
}

chff · « **Reply #5 on:** January 24, 2015, 09:51:17 PM »

I need help with Python as well

an613 · « **Reply #6 on:** January 25, 2015, 09:40:22 AM »

Quote from: shtank on January 24, 2015, 09:38:01 PM

I have no experience with Python, but in pseudocode this is how I would do it:

float averageIf(list[][], uniqueCode)
{
sum=0, count=0; //sum is where you will store all the values you are averaging and count keeps count so you know what to divide by
for i = 1 to list.size() //iterate through the list
if list[0]==uniqueCode then //if this row contains the unique code then you can ->
sum+=list[1]; //add the value in the second column to the sum and ->
count++; //increment the counter
return sum/count; //once you have exhausted the list, return the average
}

Wont matter unless OP is doing tens of thousands of rows but in general this wouldn't be very efficient bc you'd have iterate through the list for each unique code. More efficient way would be to iterate through it once and then keep track total and count for each unique code.

haven't actually tried this but something similar to below should work. You could definitely do it in less code but would be harder to read/understand.

Alternative would be dump your rows into a simple sql table (in memory sqllite would be 1 or 2 lines of code) and then just query it easily with something like " select uniqueCode, avg(c1), avg(c2),avg(c3),avg(c4) from rows group by uniqueCode" and that would do all the math for you.

Code: [Select]

#get rows from somwhere
rows = getInput()

#calculate totals
numTimesUniqueCodeFound = defaultdict(int) #this will create dictionary with default of 0 for new keys
totalsForEachUniqueCode = defaultdict(list)#this will create dictionary with default of empty list for new keys
for row in rows:
    uniqueCode = row[0] #or whatever column unique code is in
    numTimesUniqueCodeFound[uniqueCode] += 1

    for cColumnNumber in range(1, 10):
        totalsForEachUniqueCode[uniqueCode][cColumnNumber] += row[cColumnNumber]

#now get averages
averageForEachCode = {}
for uniqueCode in totalsForEachUniqueCode:
    averageForEachCode[uniqueCode] = []
    for cColumnNumber in range(1, 10):
        averageForEachCode[uniqueCode][cColumnNumber] = totalsForEachUniqueCode[uniqueCode][cColumnNumber] / numTimesUniqueCodeFound[uniqueCode]

#now do something with averages
for code in averageForEachCode.keys():
    print "{},".format(code)
    for cColumnNumber in range(1, 10):
        print "{},".format(averageForEachCode[code][cColumnNumber])

an613 · « **Reply #7 on:** January 25, 2015, 09:41:18 AM »

Quote from: chff on January 24, 2015, 09:51:17 PM

I need help with Python as well

might be a little easier to help if you explain what you need help with

chff · « **Reply #8 on:** January 25, 2015, 10:29:50 AM »

Quote from: an613 on January 25, 2015, 09:41:18 AM

might be a little easier to help if you explain what you need help with

If there is someone who is in the know about Python, pls PM me

shtank · « **Reply #9 on:** January 25, 2015, 10:38:27 AM »

Quote from: an613 on January 25, 2015, 09:40:22 AM

Wont matter unless OP is doing tens of thousands of rows but in general this wouldn't be very efficient bc you'd have iterate through the list for each unique code. More efficient way would be to iterate through it once and then keep track total and count for each unique code.

haven't actually tried this but something similar to below should work. You could definitely do it in less code but would be harder to read/understand.

Alternative would be dump your rows into a simple sql table (in memory sqllite would be 1 or 2 lines of code) and then just query it easily with something like " select uniqueCode, avg(c1), avg(c2),avg(c3),avg(c4) from rows group by uniqueCode" and that would do all the math for you.

Code: [Select]
#get rows from somwhere rows = getInput() #calculate totals numTimesUniqueCodeFound = defaultdict(int) #this will create dictionary with default of 0 for new keys totalsForEachUniqueCode = defaultdict(list)#this will create dictionary with default of empty list for new keys for row in rows: uniqueCode = row[0] #or whatever column unique code is in numTimesUniqueCodeFound[uniqueCode] += 1 for cColumnNumber in range(1, 10): totalsForEachUniqueCode[uniqueCode][cColumnNumber] += row[cColumnNumber] #now get averages averageForEachCode = {} for uniqueCode in totalsForEachUniqueCode: averageForEachCode[uniqueCode] = [] for cColumnNumber in range(1, 10): averageForEachCode[uniqueCode][cColumnNumber] = totalsForEachUniqueCode[uniqueCode][cColumnNumber] / numTimesUniqueCodeFound[uniqueCode] #now do something with averages for code in averageForEachCode.keys(): print "{},".format(code) for cColumnNumber in range(1, 10): print "{},".format(averageForEachCode[code][cColumnNumber])

I think you're assuming that the data is not changing. I'm assuming that the data may change from time to time and therefore the average would be different each time the function is called.

an613 · « **Reply #10 on:** January 25, 2015, 02:59:12 PM »

Quote from: shtank on January 25, 2015, 10:38:27 AM

I think you're assuming that the data is not changing. I'm assuming that the data may change from time to time and therefore the average would be different each time the function is called.

What do you mean by changing? What case would my code return a different result than yours? In both solutions, the average for a specific code is calculated anew. Please CMIIW but the only difference I see is with yours only one code is calculate and with mine all the codes are calculated.

Both solutions would work - mine will use a little extra storage and yours will use a little extra cpu cycles neither of which will matter for a small dataset so its up to the OP to determine which one hed prefer

shtank · « **Reply #11 on:** January 25, 2015, 03:53:41 PM »

Quote from: an613 on January 25, 2015, 02:59:12 PM

What do you mean by changing? What case would my code return a different result than yours? In both solutions, the average for a specific code is calculated anew. Please CMIIW but the only difference I see is with yours only one code is calculate and with mine all the codes are calculated.

Both solutions would work - mine will use a little extra storage and yours will use a little extra cpu cycles neither of which will matter for a small dataset so its up to the OP to determine which one hed prefer

You know what? I completely skipped over the fact that the OP attached sample data. I misunderstood the problem. I will look through your code too when I get a chance, but suffice to say that my code won't work and is not the correct solution for the OPs problem.

DansDeals.com Forums

Author Topic: Python help - How do I do an "average If" function? (Read 8214 times)

Entrepreneur

Python help - How do I do an "average If" function?

an613

Re: Python help - How do I do an "average If" function?

an613

Re: Python help - How do I do an "average If" function?

Entrepreneur

Re: Python help - How do I do an "average If" function?

shtank

Re: Python help - How do I do an "average If" function?

chff

Re: Python help - How do I do an "average If" function?

an613

Re: Python help - How do I do an "average If" function?

an613

Re: Python help - How do I do an "average If" function?

chff

Re: Python help - How do I do an "average If" function?

shtank

Re: Python help - How do I do an "average If" function?

an613

Re: Python help - How do I do an "average If" function?

shtank

Re: Python help - How do I do an "average If" function?