Windows 7 - Need help with excluding duplicate values...

Asked By Alan Casey on 07-Feb-12 02:56 PM
Earn up to 20 extra points for answering this tough question.
Can anyone help with a excel formula to exclude duplicate values?  For example:

part #  unit cost  usage
aabb    $10    10
aabb    $10    10
ccdd    $20    20    
ccdd    $20    20
eeff       $30    15

The total value I need I after is $950 = (aabb unit cost * aabb usage) + (ccdd unit cost * ccdd usage) +(eeff unit cost * eeff usage).       

 

 
  
Venkat K replied to Alan Casey on 07-Feb-12 07:21 PM
You can use the following VBA script to remove the duplicates,
Specify the Range in the first line after the variable declaration, and use CountIF to identify the duplicates.

Sub DeleteDups() 
  Dim x         As Long
    Dim LastRow     As Long
  
    LastRow = Range("A10").End(xlUp).Row
    For x = LastRow To 1 Step -1
      If Application.WorksheetFunction.CountIf(Range("A1:A" & x), Range("A" & x).Text) > 1 Then
        Range("A" & x).EntireRow.Delete
      End If 
    Next x
  
End Sub

If you are looking for removing duplicates without VBA script, there are many methods to identify duplicates
1. using Conditional Formatting
2. Using Remove Duplicates icon
3. Using Unique option in advance filters.
The methods are clearly described in the below link:
http://www.mrexcel.com/tip138.shtml

Hope this helps!
[)ia6l0 iii replied to Alan Casey on 07-Feb-12 09:35 PM
I hope you use excel 2007 or greater. It is so easy with the "Remove duplicates" feature. 

a) Select the data range - In the screeshot below, it would be A1:D6.
b) Go to the data tab, and click on "Remove duplicates"
c) Select all the columns 
d) Click Ok. 
e) You will get a message on how many values are removed. and how many remain. 
f) You can then use the SUMPRODUCT formula to compute what you need. 
=SUMPRODUCT(B2:B6,C2:C6) in your case would suffice.


 
Hope this helps.

P.S: You posted in the wrong category. 
Alan Casey replied to Alan Casey on 08-Feb-12 10:37 AM

Wow thanks, I should have said I need a formula that can take out duplicates.  So I need one formula that can find the duplicates in a range of part numbers and then used the first part number that is in each duplicate to do a =SUMPRODUCT(B2:B6,C2:C6) along with part numbers that aren't duplicated. 

 

=IF(A2-A6 find any duplicate part number,  select the first part listed all duplicates part number AND non-duplicate parts, =SUMPRODUCT(B2:B6,C2:C6)), =SUMPRODUCT(B2:B6,C2:C6))

Alan Casey replied to [)ia6l0 iii on 08-Feb-12 10:39 AM

Wow thanks, I should have said I need a formula that can take out duplicates.  So I need one formula that can find the duplicates in a range of part numbers and then used the first part number that is in each duplicate to do a =SUMPRODUCT(B2:B6,C2:C6) along with part numbers that aren't duplicated. 

 

=IF(A2-A6 find any duplicate part number,  select the first part listed all duplicates part number AND non-duplicate parts, =SUMPRODUCT(B2:B6,C2:C6)), =SUMPRODUCT(B2:B6,C2:C6))



What catagory should I post this type of request in?  Do I need to re-post this?
[)ia6l0 iii replied to Alan Casey on 08-Feb-12 08:57 PM
Do not re-post. I have asked one of our Excel experts for help. They will respond.
Pichart Y. replied to Alan Casey on 09-Feb-12 09:11 AM
Hi Alan Casey,

Use this array function....{=SUM(((B2:B6)*(C2:C6))/COUNTIF(C2:C6,C2:C6))}

find this attachment...--->Fml_SumNonDuplicate.zip

Hope this help.

Pichart Y.
Alan Casey replied to Pichart Y. on 09-Feb-12 09:30 AM
Pichart,

Not all that familier with array formula, but I would like to learn.  The one issue here is that formula needs to key off the part number and not the duplicate unit cost and/or usage.  In the example below I believe gghh would not be captured in the formula {=SUM(((B2:B7)*(C2:C7))/COUNTIF(C2:C7,C2:C7))}.

part #   unit Cost usage
aabb    10 10
aabb    10 10
ccdd    20 20
ccdd    20 20
eeff    30 15
gghh 20 20

Got any more ideas.  Again thanks soooo much here.
Pichart Y. replied to Alan Casey on 09-Feb-12 10:58 AM
Hi Alan,

Sorry, I have made a mistake, actually the formula must be like this {=SUM(((B2:B7)*(C2:C7))/COUNTIF(A2:A7,A2:A7))}

to explain...
  • we sum the multiplied between the value in column B&C
  • Then divided by the column A which are nested in function countif( )
How does this formula work?
  • when we multiply the column B&C in Array function then we will get....
  • multiplied
    100
    100
    400
    400
    450
    400
  • Then with the countif in array we get...
  •  
    2
    2
    2
    2
    1
    1
    Then the multiplied column / countif column
  • 50
    50
    200
    200
    450
    400
  • Finally we sum them all =1350
I am not sure if I understand your question well, but it I am wrong please feel free to ask back again,...it will be great if you explain more with attachment...it will be more easy to understand.

Here is the attachment with some formula extraction in gray area ...hope this help.
------------>Fml_SumNonDuplicate2.zip

pichart Y.



pichart Y.

wally eye replied to Pichart Y. on 09-Feb-12 01:47 PM
I like your approach.  I was working on excluding the second value last night and could not get it to go.  This version doesn't exclude it, it just accommodates it.  Good job.
[)ia6l0 iii replied to Pichart Y. on 09-Feb-12 09:49 PM
Thanks Pichart.
Pichart Y. replied to [)ia6l0 iii on 10-Feb-12 08:22 AM
Hi Alan,

Now I got it...
This is really only sum the non duplicated row, if there are more than 1 row, the only the 1st row will be taken in account.
  • =SUM((ROW(A1:A6)=MATCH(A2:A7,A2:A7,0))*(B2:B7)*(C2:C7))
  • How does this work...
    • first, row(A1:A6) will generate the row of our data
    • Second, MATCH(A2:A7,A2:A7,0) will always generate the 1 matched row of each part#
    • Third, put equation to check if both arrays of date equal to each other.
    • **only the 1st row of each part will be true, because match function always gives the first matched row it found.
    • Forth, sum( this range * unit cost * usage)
    • ** only true will return the value, but false will be 0
    • Finally you get this {=SUM((ROW(A1:A6)=MATCH(A2:A7,A2:A7,0))*(B2:B7)*(C2:C7))}
  • Here is the attachment ---> Fml_SumNonDuplicate3.zip
Hope you like this.

Pichart Y.
Pichart Y. replied to wally eye on 10-Feb-12 08:24 AM
Hi Wally Eye,


Thank you for your comment...

Now I got it, This is really only sum the non duplicated row, if there are more than 1 row, only the 1st row will be taken in account.

I post my answer in Alan's below...

http://www.eggheadcafe.com/community/windows-7/61/10419264/need-help-with-excluding-duplicate-values.aspx 

Regards,

Pichart Y.