Author Topic: Python help - How do I do an "average If" function?  (Read 8214 times)

Offline Entrepreneur

  • Dansdeals Platinum Elite + Lifetime Silver Elite
  • *****
  • Join Date: Dec 2011
  • Posts: 583
  • Total likes: 3
  • DansDeals.com Hat Tips 4
    • View Profile
  • Location: New York
Python help - How do I do an "average If" function?
« on: January 23, 2015, 03:23:21 PM »
Attached is a sample data set along with the results I would like Python to produce.

I would like to what Excel calls and Average If formula.
I have a several columns of data where one of the columns has unique codes that repeat. The number of unique code numbers will vary based on the data set.
I want to average the other columns in the data set for each of the unique codes. Kind of like using a filter where you use a filter on the Code column to limit the data set to one of the numbers in that column. You then average the remaining columns for that number. So if there are 14 unique Code numbers, then I would have 20 rows of averaged data for the remaining columns in the data set.

Offline an613

  • Dansdeals Platinum Elite
  • ****
  • Join Date: Apr 2010
  • Posts: 319
  • Total likes: 2
  • DansDeals.com Hat Tips 1
    • View Profile
Re: Python help - How do I do an "average If" function?
« Reply #1 on: January 23, 2015, 04:37:44 PM »
Where is the data coming from? you can iterate through the rows and just add up each column in a list or dict but if your datasource is a DB you can get that into a pretty trivial sql query. Alternatively I'm sure there is a function in numpy that would do average if but dont know it offhand

Offline an613

  • Dansdeals Platinum Elite
  • ****
  • Join Date: Apr 2010
  • Posts: 319
  • Total likes: 2
  • DansDeals.com Hat Tips 1
    • View Profile
Re: Python help - How do I do an "average If" function?
« Reply #2 on: January 23, 2015, 04:38:21 PM »
side note, mods may want to move this to Tech Talk

Offline Entrepreneur

  • Dansdeals Platinum Elite + Lifetime Silver Elite
  • *****
  • Join Date: Dec 2011
  • Posts: 583
  • Total likes: 3
  • DansDeals.com Hat Tips 4
    • View Profile
  • Location: New York
Re: Python help - How do I do an "average If" function?
« Reply #3 on: January 24, 2015, 08:48:32 PM »
Where is the data coming from? you can iterate through the rows and just add up each column in a list or dict but if your datasource is a DB you can get that into a pretty trivial sql query. Alternatively I'm sure there is a function in numpy that would do average if but dont know it offhand
I would be uploading the data from an excel or csv file.

Offline shtank

  • Dansdeals Platinum Elite
  • ****
  • Join Date: May 2014
  • Posts: 308
  • Total likes: 14
  • DansDeals.com Hat Tips 0
    • View Profile
  • Location: Brooklyn, NY
Re: Python help - How do I do an "average If" function?
« Reply #4 on: January 24, 2015, 09:38:01 PM »
I would be uploading the data from an excel or csv file.
I have no experience with Python, but in pseudocode this is how I would do it:

float averageIf(list[][], uniqueCode)
{
sum=0, count=0; //sum is where you will store all the values you are averaging and count keeps count so you know what to divide by
for i = 1 to list.size() //iterate through the list
  if list[0]==uniqueCode then //if this row contains the unique code then you can ->
    sum+=list[1]; //add the value in the second column to the sum and ->
    count++; //increment the counter
return sum/count; //once you have exhausted the list, return the average
}

Offline chff

  • Dansdeals Lifetime Presidential Platinum Elite
  • *********
  • Join Date: Jul 2012
  • Posts: 5940
  • Total likes: 695
  • DansDeals.com Hat Tips 25
    • View Profile
  • Location: NYC (not anymore)
  • Programs: All of them...
Re: Python help - How do I do an "average If" function?
« Reply #5 on: January 24, 2015, 09:51:17 PM »
I need help with Python as well

Offline an613

  • Dansdeals Platinum Elite
  • ****
  • Join Date: Apr 2010
  • Posts: 319
  • Total likes: 2
  • DansDeals.com Hat Tips 1
    • View Profile
Re: Python help - How do I do an "average If" function?
« Reply #6 on: January 25, 2015, 09:40:22 AM »
I have no experience with Python, but in pseudocode this is how I would do it:

float averageIf(list[][], uniqueCode)
{
sum=0, count=0; //sum is where you will store all the values you are averaging and count keeps count so you know what to divide by
for i = 1 to list.size() //iterate through the list
  if list[0]==uniqueCode then //if this row contains the unique code then you can ->
    sum+=list[1]; //add the value in the second column to the sum and ->
    count++; //increment the counter
return sum/count; //once you have exhausted the list, return the average
}

Wont matter unless OP is doing tens of thousands of rows but in general this wouldn't be very efficient bc you'd have iterate through the list for each unique code. More efficient way would be to iterate through it once and then keep track total and count for each unique code.

haven't actually tried this but something similar to below should work. You could definitely do it in less code but would be harder to read/understand.

Alternative would be dump your rows into a simple sql table (in memory sqllite would be 1 or 2 lines of code) and then just query it easily with something like " select uniqueCode, avg(c1), avg(c2),avg(c3),avg(c4) from rows group by uniqueCode" and that would do all the math for you.

Code: [Select]
#get rows from somwhere
rows = getInput()

#calculate totals
numTimesUniqueCodeFound = defaultdict(int) #this will create dictionary with default of 0 for new keys
totalsForEachUniqueCode = defaultdict(list)#this will create dictionary with default of empty list for new keys
for row in rows:
    uniqueCode = row[0] #or whatever column unique code is in
    numTimesUniqueCodeFound[uniqueCode] += 1

    for cColumnNumber in range(1, 10):
        totalsForEachUniqueCode[uniqueCode][cColumnNumber] += row[cColumnNumber]

#now get averages
averageForEachCode = {}
for uniqueCode in totalsForEachUniqueCode:
    averageForEachCode[uniqueCode] = []
    for cColumnNumber in range(1, 10):
        averageForEachCode[uniqueCode][cColumnNumber] = totalsForEachUniqueCode[uniqueCode][cColumnNumber] / numTimesUniqueCodeFound[uniqueCode]

#now do something with averages
for code in averageForEachCode.keys():
    print "{},".format(code)
    for cColumnNumber in range(1, 10):
        print "{},".format(averageForEachCode[code][cColumnNumber])

Offline an613

  • Dansdeals Platinum Elite
  • ****
  • Join Date: Apr 2010
  • Posts: 319
  • Total likes: 2
  • DansDeals.com Hat Tips 1
    • View Profile
Re: Python help - How do I do an "average If" function?
« Reply #7 on: January 25, 2015, 09:41:18 AM »
I need help with Python as well

might be a little easier to help if you explain what you need help with :)

Offline chff

  • Dansdeals Lifetime Presidential Platinum Elite
  • *********
  • Join Date: Jul 2012
  • Posts: 5940
  • Total likes: 695
  • DansDeals.com Hat Tips 25
    • View Profile
  • Location: NYC (not anymore)
  • Programs: All of them...
Re: Python help - How do I do an "average If" function?
« Reply #8 on: January 25, 2015, 10:29:50 AM »
might be a little easier to help if you explain what you need help with :)
If there is someone who is in the know about Python, pls PM me

Offline shtank

  • Dansdeals Platinum Elite
  • ****
  • Join Date: May 2014
  • Posts: 308
  • Total likes: 14
  • DansDeals.com Hat Tips 0
    • View Profile
  • Location: Brooklyn, NY
Re: Python help - How do I do an "average If" function?
« Reply #9 on: January 25, 2015, 10:38:27 AM »
Wont matter unless OP is doing tens of thousands of rows but in general this wouldn't be very efficient bc you'd have iterate through the list for each unique code. More efficient way would be to iterate through it once and then keep track total and count for each unique code.

haven't actually tried this but something similar to below should work. You could definitely do it in less code but would be harder to read/understand.

Alternative would be dump your rows into a simple sql table (in memory sqllite would be 1 or 2 lines of code) and then just query it easily with something like " select uniqueCode, avg(c1), avg(c2),avg(c3),avg(c4) from rows group by uniqueCode" and that would do all the math for you.

Code: [Select]
#get rows from somwhere
rows = getInput()

#calculate totals
numTimesUniqueCodeFound = defaultdict(int) #this will create dictionary with default of 0 for new keys
totalsForEachUniqueCode = defaultdict(list)#this will create dictionary with default of empty list for new keys
for row in rows:
    uniqueCode = row[0] #or whatever column unique code is in
    numTimesUniqueCodeFound[uniqueCode] += 1

    for cColumnNumber in range(1, 10):
        totalsForEachUniqueCode[uniqueCode][cColumnNumber] += row[cColumnNumber]

#now get averages
averageForEachCode = {}
for uniqueCode in totalsForEachUniqueCode:
    averageForEachCode[uniqueCode] = []
    for cColumnNumber in range(1, 10):
        averageForEachCode[uniqueCode][cColumnNumber] = totalsForEachUniqueCode[uniqueCode][cColumnNumber] / numTimesUniqueCodeFound[uniqueCode]

#now do something with averages
for code in averageForEachCode.keys():
    print "{},".format(code)
    for cColumnNumber in range(1, 10):
        print "{},".format(averageForEachCode[code][cColumnNumber])
I think you're assuming that the data is not changing. I'm assuming that the data may change from time to time and therefore the average would be different each time the function is called.

Offline an613

  • Dansdeals Platinum Elite
  • ****
  • Join Date: Apr 2010
  • Posts: 319
  • Total likes: 2
  • DansDeals.com Hat Tips 1
    • View Profile
Re: Python help - How do I do an "average If" function?
« Reply #10 on: January 25, 2015, 02:59:12 PM »
I think you're assuming that the data is not changing. I'm assuming that the data may change from time to time and therefore the average would be different each time the function is called.

What do you mean by changing? What case would my code return a different result than yours? In both solutions, the average for a specific code is calculated anew. Please CMIIW but the only difference I see is with yours only one code is calculate and with mine all the codes are calculated.

Both solutions would work - mine will use a little extra storage and yours will use a little extra cpu cycles neither of which will matter for a small dataset so its up to the OP to determine which one hed prefer

Offline shtank

  • Dansdeals Platinum Elite
  • ****
  • Join Date: May 2014
  • Posts: 308
  • Total likes: 14
  • DansDeals.com Hat Tips 0
    • View Profile
  • Location: Brooklyn, NY
Re: Python help - How do I do an "average If" function?
« Reply #11 on: January 25, 2015, 03:53:41 PM »
What do you mean by changing? What case would my code return a different result than yours? In both solutions, the average for a specific code is calculated anew. Please CMIIW but the only difference I see is with yours only one code is calculate and with mine all the codes are calculated.

Both solutions would work - mine will use a little extra storage and yours will use a little extra cpu cycles neither of which will matter for a small dataset so its up to the OP to determine which one hed prefer
You know what? I completely skipped over the fact that the OP attached sample data. I misunderstood the problem. I will look through your code too when I get a chance, but suffice to say that my code won't work and is not the correct solution for the OPs problem.