Merging data from multiple sources into a new ecommerce or customer-management platform can be complicated. It’s doable, however, with careful planning and execution.
Migrating to a new customer-management platform may require merging multiple data sources or adding a new source. In this post, I’ll review how to merge multiple sources into a single database without errors or duplicates.
Merging Data from Multiple Sources
1. Download all data from each source. When combining multiple data sources, create a new field with the name “Source.” This will help you identify where each record came from, which can help in removing duplicates.
2. Combine all data sources into one list. If you have fewer than 1 million combined rows, you can do it in Excel (a worksheet holds at most 1,048,576 rows). A couple of tips:
- Make sure the date fields use the same format. If Excel is not formatting your date fields correctly, you may need to use the “Import” function at File > Import and select the correct date type before merging the data.
- Map to fields in the new platform. If your new platform doesn’t contain a field that exists in one or more of your sources, add it as a custom field, combine it with other fields, or store it as a note so as not to lose any data.
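If the combined list is too large for Excel, or you prefer a script, the combining and mapping in step 2 can be sketched in Python. This is only an illustration: the source names, column names, and the `FIELD_MAP` table are hypothetical, and unmapped columns are kept in a `notes` field so nothing is lost.

```python
import csv
import io

# Hypothetical mapping from each source's column names to the new platform's fields.
FIELD_MAP = {
    "crm": {"E-mail": "email", "Surname": "last_name", "Company": "company"},
    "store": {"email_address": "email", "lname": "last_name"},
}

def load_source(name, csv_text):
    """Read one export, rename mapped columns, and tag every row with its source."""
    mapping = FIELD_MAP[name]
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        row = {"source": name}
        extras = []
        for column, value in raw.items():
            value = (value or "").strip()
            if column in mapping:
                row[mapping[column]] = value
            elif value:
                # No matching field in the new platform: keep the value as a note.
                extras.append(f"{column}: {value}")
        row["notes"] = "; ".join(extras)
        rows.append(row)
    return rows

# Tiny in-memory exports standing in for real downloaded files.
crm_csv = "E-mail,Surname,Company\nann@example.com,Lee,ABC Corp\n"
store_csv = "email_address,lname,loyalty_tier\nbob@example.com,Ray,gold\n"

combined = load_source("crm", crm_csv) + load_source("store", store_csv)
```

With real files, you would read each export from disk instead of from the sample strings.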
3. Identify duplicates. Duplicates within the same source or between sources are likely. Use Excel’s Conditional Formatting at Format > Conditional Formatting to identify duplicates in the same column by highlighting that column. In my experience, fields that often have duplicates are:
- Email address,
- Phone number,
- Physical address,
- Last name,
- Company name,
- Website URL.
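Outside Excel, the same exact-match check could look like the sketch below. It assumes each record is a dictionary with an `email` key (a hypothetical field name) and that trimming spaces and lowercasing is enough to make two addresses comparable.

```python
from collections import defaultdict

def normalize_email(value):
    """Trim whitespace and lowercase so trivial differences do not hide a duplicate."""
    return value.strip().lower()

def find_duplicates(rows, field, normalize):
    """Return {normalized value: [row indexes]} for values that occur more than once."""
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        key = normalize(row.get(field, ""))
        if key:  # skip blank values; they are missing data, not duplicates
            groups[key].append(index)
    return {key: indexes for key, indexes in groups.items() if len(indexes) > 1}

rows = [
    {"source": "crm", "email": "Ann@Example.com"},
    {"source": "store", "email": "ann@example.com "},
    {"source": "store", "email": "bob@example.com"},
]
dupes = find_duplicates(rows, "email", normalize_email)
```

The same function works for phone numbers or URLs if you pass a normalizer suited to that field.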
Use fuzzy logic to find records that are the same but have slightly different spelling, which conditional formatting will not catch. For example, ABC Corporation might also appear as ABC Corp. For fuzzy logic, use machine-learning programs or hire a professional de-duping company. Alternatively, sort the data and manually review line by line.
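For a small list, a simple string-similarity score can stand in for a machine-learning program. The sketch below uses `difflib.SequenceMatcher` from Python’s standard library. The 0.6 threshold is an arbitrary assumption that you would tune on your own data, and the matches are only candidates for manual review, not confirmed duplicates.

```python
from difflib import SequenceMatcher
from itertools import combinations

def similarity(a, b):
    """Score two strings from 0 (nothing in common) to 1 (identical), ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def fuzzy_pairs(names, threshold=0.6):
    """Return (name, other name, score) for every pair scoring at or above the threshold."""
    pairs = []
    for a, b in combinations(names, 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return pairs

companies = ["ABC Corporation", "ABC Corp.", "Zenith Widgets"]
candidates = fuzzy_pairs(companies)
```

Comparing every pair gets slow as the list grows, so for large files it helps to compare only records that already share something, such as a ZIP code or email domain.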
4. Merge duplicates by identifying the surviving record. Use data completeness, source of data, last modified date, and other criteria to decide which record to move forward. Then:
- Paste missing information into the surviving record. Identify records with missing fields and then copy and paste (or append) that field from a record that will be deleted.
- Resolve conflicting data for the same field. For example, you may have two different phone numbers, email addresses, or physical addresses. Mark these fields for validation before deleting one of the records.
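One way to express step 4 in code is sketched below. The rule for choosing the survivor (most filled-in fields, with ties going to the most recent `last_modified` date) is my assumption; your own criteria may weigh the source more heavily. The field names are hypothetical.

```python
from datetime import date

def completeness(record, fields):
    """Count how many of the given fields are filled in."""
    return sum(1 for field in fields if record.get(field))

def merge_duplicates(records, fields):
    """Pick a survivor, fill its blanks from the other records, and collect conflicts."""
    survivor = max(records, key=lambda r: (completeness(r, fields), r["last_modified"]))
    merged = dict(survivor)
    conflicts = {}
    for other in records:
        if other is survivor:
            continue
        for field in fields:
            value = other.get(field)
            if not value:
                continue
            if not merged.get(field):
                merged[field] = value  # fill a blank on the survivor
            elif merged[field] != value:
                # Two different values: keep both for validation before deleting.
                conflicts.setdefault(field, {merged[field]}).add(value)
    return merged, {field: sorted(values) for field, values in conflicts.items()}

fields = ["email", "phone", "company", "website"]
record_a = {"email": "ann@example.com", "phone": "555-0100", "company": "ABC Corp",
            "website": "", "last_modified": date(2023, 5, 1)}
record_b = {"email": "ann@example.com", "phone": "555-0199", "company": "",
            "website": "abc.example", "last_modified": date(2021, 1, 15)}

merged, conflicts = merge_duplicates([record_a, record_b], fields)
```

Here both records are equally complete, so the newer one survives, picks up the website from the older one, and the two phone numbers are flagged for validation.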
5. Verify and validate all fields. Many third-party providers can verify and validate data for a fee. To reduce the expense, consider verifying and validating only a portion of your database. For example, validate only email addresses that have recently bounced or physical addresses that were last updated several years ago.
6. Standardize the data. Your data should match the fields in your new platform and should also be consistent regardless of the source. For example, one source might use the 2-letter code “NY” and another might use the full spelling “New York.” Make each field the same format. Important data fields to standardize are:
- Phone numbers. For businesses operating internationally, phone numbers are tricky because the number of digits varies by country.
- ZIP codes. Some data sources use ZIP+4. Excel removes leading zeros from a ZIP code. For example, Excel stores “00154” as “154.” Thus, make sure ZIP codes are text fields in Excel. Add back any zeros that Excel may have deleted in the process.
- Dates. Again, formatting dates is a common problem. For example, some sources use “mm-dd-yyyy.” Others use “dd-mm-yy” or something entirely different. Make sure the date format is consistent among data sources and with your new platform.
- Text. Typical text fields to standardize include state, country, and personal and professional titles. Manually reviewing row by row can be time-consuming. A quicker approach is to create a list of all variations for that one field (use a pivot table or copy and de-dupe in a different sheet) and add a column with the correct standardized value. Use Excel’s VLOOKUP function to match the original value to the new standardized version in a new column.
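The ZIP, date, and text clean-ups above can also be scripted. The sketch below makes a few assumptions: ZIP codes are U.S. 5-digit or ZIP+4 values, the new platform wants dates as yyyy-mm-dd, and `STATE_LOOKUP` is a hypothetical variations table playing the role of the VLOOKUP sheet. You need to know each source’s date format in advance, because a value like 03-04-2021 is ambiguous on its own.

```python
from datetime import datetime

# Hypothetical variations table: every spelling seen in the data -> standard value.
STATE_LOOKUP = {"ny": "NY", "n.y.": "NY", "new york": "NY"}

def standardize_zip(value):
    """Restore leading zeros on a 5-digit ZIP, keeping any +4 suffix."""
    base, _, plus4 = str(value).strip().partition("-")
    base = base.zfill(5)
    return f"{base}-{plus4}" if plus4 else base

def standardize_date(value, source_format):
    """Parse a date in the source's known format and return it as yyyy-mm-dd."""
    return datetime.strptime(value.strip(), source_format).date().isoformat()

def standardize_state(value):
    """Look up the standard value; leave unknown spellings unchanged for review."""
    cleaned = value.strip()
    return STATE_LOOKUP.get(cleaned.lower(), cleaned)

zip_fixed = standardize_zip(154)
us_date = standardize_date("03-04-2021", "%m-%d-%Y")  # source that uses mm-dd-yyyy
eu_date = standardize_date("04-03-21", "%d-%m-%y")    # source that uses dd-mm-yy
state = standardize_state(" New York ")
```

Phone numbers are left out on purpose: international formats vary too much for a few lines of code, and a dedicated library or service is a safer choice.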
Final Review
Once you complete the above steps, conduct a final review to ensure the data is ready to upload. Look closely at field formats. Review the number of records. Does the number make sense? Did you de-dupe? Is the overall file format, such as .csv, compatible with your new platform? Upload a few records to the new platform to confirm the process and the accuracy. Then upload the entire file.
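Some of these final checks can be automated before the test upload. The sketch below assumes hypothetical field names and treats email as the unique key. It reports missing required fields and leftover duplicates, then writes a .csv for a small trial batch.

```python
import csv
import io

def final_checks(rows, required_fields, key_field):
    """Return a list of problems: blank required fields and repeated key values."""
    problems = []
    seen = set()
    for number, row in enumerate(rows, start=1):
        for field in required_fields:
            if not row.get(field):
                problems.append(f"row {number}: missing {field}")
        key = row.get(key_field, "").strip().lower()
        if key:
            if key in seen:
                problems.append(f"row {number}: duplicate {key_field} {key}")
            seen.add(key)
    return problems

def to_csv(rows, fieldnames):
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

rows = [
    {"email": "ann@example.com", "last_name": "Lee"},
    {"email": "ANN@example.com", "last_name": ""},
]
problems = final_checks(rows, ["email", "last_name"], "email")
trial_batch = to_csv(rows[:1], ["email", "last_name"])
```

An empty `problems` list does not prove the data is clean, so the trial upload of a few records is still worth doing.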