CBMS Deduplicator


on 21 October 2017


Transcript of CBMS Deduplicator

CBMS Deduplicator
The CBMS Deduplicator is a simple tool developed to help you and LGUs pinpoint duplicate household (HH) cases
Steps in using the CBMS Deduplicator
1. Install the file "vcredist_x86.exe"
The installer was developed by the CBMS INCT and is shared in your training kits/USBs
2. Install the deduplicator tool "CBMSDeduplicator.exe"
3. Once installed, launch the program by clicking the shortcut icon on the desktop or by going to Start > All Programs > CBMS Deduplicator
4. Select the file that will be checked for duplicates by doing the following:
4.1. Click [menu item shown on slide] from the main menu, then click [option shown on slide]
4.2. Find the path where you saved your extracted .can file. In this example, it is saved in C:\CBMSDatabase\Tanauan City
4.3. Select the file [file name shown on slide] and click [button shown on slide]

Community-based Monitoring System (CBMS) International Network
DLSU Angelo King Institute for Economic and Business Studies
10th Floor Angelo King International Center
Estrada corner Arellano Avenue, Malate, Manila, Philippines 1004
Tel. No.: (632) 5262067 or 2305100 loc. 2461
Fax Number: (632) 5262067
Website: http://pep-net.org
Email: cbms.network@gmail.com
Facebook Page: http://facebook.com/CBMSNetwork

Thank you!!!
5. The CBMS Deduplicator will display information on the duplicates in two lists:
5a. List of HH cases with identical ID variables
This list shows the duplicates detected in terms of brgy, purok, hcn. These cases usually have a different respondent's name and other contents.
5b. List of HH cases with identical ID and data contents for most variables
This list contains matched households with identical ID and data contents for most variables. It shows the mainids and highlights the differences between the data.
List of HH cases with identical ID variables
The first list below shows the mainids (in the dup.ids column) with duplicate brgy, purok, hcn. The duplicated brgy, purok, hcn can be viewed by opening the 'main.csv' file.

Description of the columns:

- shows where the duplicates are in the 'main.csv' file
- these are the mainids of the duplicated brgy, purok, hcn
- the first mainid in the 'main.csv' file detected to have a duplicate brgy, purok, hcn
- the main row wherein the first duplicate can be found
To check if they are really duplicates:
i. Open the 'main.csv' file
ii. Find (CTRL + F) the first mainid to have a duplicate brgy, purok, hcn
iii. Or check the row based on the list displayed in the Deduplicator

Note: Add 2 to the row number identified in the Deduplicator. In the example, the row is 285, so 285 + 2 = 287, the 287th row in the main.csv file
iv. Highlight all the observations with duplicates and check the brgy, purok, hcn, respondent and other contents
v. By doing this, you will notice that they are indeed duplicates and need a new hcn to remove the duplication
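The manual check above can also be cross-checked with a short script. The following is a minimal sketch, not part of the Deduplicator itself. It assumes that main.csv has header columns named mainid, brgy, purok and hcn (the exact header spellings are an assumption), and it reads the "+ 2" note as one header line plus row counting that starts from 0. The function names are hypothetical.

```python
import csv
from collections import defaultdict

# Assumed header names in main.csv; adjust them to match the actual file.
ID_COLUMNS = ("brgy", "purok", "hcn")


def spreadsheet_row(dedup_row):
    """Apply the note above: add 2 to the row shown in the Deduplicator."""
    return dedup_row + 2


def find_duplicate_ids(rows, id_columns=ID_COLUMNS, mainid_column="mainid"):
    """Group records that share the same brgy, purok, hcn.

    `rows` is a list of dicts, one per household record. Returns one dict
    per duplicated key with the mainids and spreadsheet rows involved.
    """
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        key = tuple(str(row[col]).strip() for col in id_columns)
        # index counts data rows from 0 (assumed to match the Deduplicator).
        groups[key].append((row[mainid_column], spreadsheet_row(index)))
    return [
        {
            "key": key,
            "mainids": [mainid for mainid, _ in members],
            "rows": [row_number for _, row_number in members],
        }
        for key, members in groups.items()
        if len(members) > 1
    ]


def load_rows(path):
    """Read a CSV file such as main.csv into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

For example, `find_duplicate_ids(load_rows("main.csv"))` would list every brgy, purok, hcn combination that appears more than once, together with the rows to inspect in the spreadsheet.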
6. Assign new hcns by going to the Deduplicator and double-clicking the [column shown on slide] column. The HCN window will be displayed:
7. Enter [] if the hcn will be retained, or enter a new hcn if it will be edited
Guidelines in assigning new hcn:
i. If data collection is ongoing, ensure that the new hcn to be assigned will not be used by the enumerators
ii. Ensure that the new hcn is not yet used in the main.csv file
iii. Check the last hcn used in the main.csv file by sorting the hcn column from smallest to largest
In the main.csv file, highlight the [item shown on slide] in the menu and click the icon with AZ as displayed below:
Expand the selection
Browse to the end of the file, where you can find the last hcn used
Use this as the basis for assigning new HCNs. If data collection is ongoing, assign a higher hcn so that it will not overlap
8. Enter the new hcns in the HCN window then click [button shown on slide]
9. The newly entered/assigned hcn, which will replace the old one, will be shown in the [column shown on slide]
10. Repeat the steps for the remaining duplicates in the first list
11. To save, click File > Save HCN Script list
12. Use this as filename:
13. Send this file to the CBMS Network and wait for an email confirming that the changes have been implemented in your LGU's .can file
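The guidelines for assigning a new hcn (check the last hcn used, then go higher if data collection is ongoing) can be sketched in code as an alternative to sorting the column by hand. This is only an illustration under assumptions: the hcn values are taken to be whole numbers stored in a column named hcn, and the `buffer` parameter and both function names are hypothetical.

```python
def last_hcn_used(rows, hcn_column="hcn"):
    """Return the largest numeric hcn in the records, or 0 if none is found."""
    values = []
    for row in rows:
        text = str(row.get(hcn_column, "")).strip()
        if text.isdigit():  # skip blanks and non-numeric entries
            values.append(int(text))
    return max(values) if values else 0


def suggest_new_hcns(rows, count, buffer=0, hcn_column="hcn"):
    """Suggest `count` unused hcns that come after the last hcn used.

    `buffer` leaves a gap above the last hcn so that enumerators who are
    still collecting data do not reach the newly assigned numbers.
    """
    start = last_hcn_used(rows, hcn_column) + buffer + 1
    return list(range(start, start + count))
```

With records read from main.csv as a list of dicts, `suggest_new_hcns(rows, count=2, buffer=100)` would return two hcns that sit at least 100 above the last one used. The size of the gap is a judgment call that depends on how many households remain to be enumerated.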
List of HH cases with identical ID and data contents for most variables
1. Double-click the first row from the second list
2. The duplicates window will pop up showing the two records. Discrepancies between the records will be highlighted in red
The figure below shows the similarities between the two records
3. Figure out which of the duplicates will be deleted by comparing the date and time the data was sent, by checking for the contents of the variables, by asking the field editor/enumerators, etc.
4. Highlight the column to be deleted and click Save
5. Click File > Save duplicates list
6. Use the filename:
7. Send this file to cbms.network@gmail.com for implementation in the CBMS Portal and wait for an email confirming that the changes have been implemented in the LGU's data
8. The clean .can file with no duplicates can now be exported/downloaded and can now be processed in StatSim
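The comparison in step 2, where discrepancies between two matched records are highlighted, can be illustrated with a small helper. This is a sketch under assumptions, not the Deduplicator's actual logic: each record is represented as a dict of variable names to values, and the function name is hypothetical.

```python
def differing_fields(record_a, record_b):
    """Return {field: (value_a, value_b)} for every field that differs.

    Fields missing from one record are reported with None on that side.
    """
    fields = list(record_a) + [f for f in record_b if f not in record_a]
    return {
        field: (record_a.get(field), record_b.get(field))
        for field in fields
        if record_a.get(field) != record_b.get(field)
    }
```

An empty result means the two records are identical in every variable. A short list of differences is the kind of information that, together with the send date and feedback from the field editor or enumerators, helps decide which copy to delete.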
Prepared by CBMS Network Team