Excel VBA for Beginners – Use Excel VBA to Cleanse a Dataset
RATIONALE
This Excel VBA for Beginners series is aimed at Excel users just starting out with code, or those who have never coded, and who are curious about it. Too often, Excel users find themselves engaged in tricky manual tasks that take up time and create stress. Data cleansing is one such task. If you have ever gone through a list of data and looked for mistakes (in the spelling of people’s names, for example) then you have done a data cleansing task. It probably was not much fun! With a little coding knowledge, it is possible to dramatically speed up this kind of task, reduce the stress, and kick start your coding journey. Welcome to Excel VBA for Beginners!
The video series breaks this kind of data cleansing task down into three steps. In the first video, we tell Excel where our data is, and which cells to look at. It is possible to take advantage of Excel VBA’s collections to maximise efficiency at this stage. A collection refers to a group of objects that Excel knows belong together. For example, Excel understands all of the open workbooks as a ‘collection’, and all of the worksheets within a workbook. Using collections, we can say to Excel programatically ‘do something to everything in this group of things’.
This is exactly what we do. Using a For Each construct, we define a range of interest and establish a basic loop. It would be possible to define this loop in static terms by using notation such as range(“B3:B200”); if we took this approach, however, what would happen if data were added to the list? It would be discluded from the process. It is much better, therefore, to use a dynamic reference that could handle additions to the data. In the video, we create an Excel formula to measure the size of the dataset, and then reference its cell in the code. In doing so, we create a thing of beauty: worksheet formulae and VBA code working together in harmony. The result: awesome dynamic functionality.
The purpose of the exercise is to identify mistakes (for example, spelling mistakes in people’s names) in a dataset. In the next video in this Excel VBA for Beginners series, this is what we do. What worksheet formula might help us to do this? We have a list of data and a list of names, spelt correctly. The countif formula counts how many times the value in a cell appears in a specified range. So, if the name is spelt correctly, the countif formula would return a value of 1 – indicating that the value in the list of data appears in the name list, and that the name is spelt correctly. A mistake, on the other hand, would return a value of 0. Once again, we can reference the cell that contains the countif formula in the VBA routine. In this way, we establish a mechanism for flagging up errors.
By harnessing the power of the loop, we instruct Excel to repeat the process for each piece of data. Cool!
In the final two videos, we tell Excel what to do if a mistake is found. We explore two possible approaches: the first exits the code when a mistake is found, tells us where the mistake is, and allows us to fix it; the second loops through the whole data set, tells us how many mistakes were found, and highlights the mistakes in a distinctive colour. This kind of helpful user interaction will help you to maximise the impact of your coding skills.