Processing a whole directory of files in Golang using channels and goroutines

In many situations, you won’t want to pick up just one file from a directory for processing; you’ll want to pick up lots of them (log files from a system, for example). In this article, I will show you a script that does just that.

The first thing we do is the usual boilerplate stuff – importing libraries.

Below, I have included the full code. Let’s walk through it step by step, starting with the main function:

  • The first thing we do is create an empty slice of strings called files.
  • Next, we define the root directory that we want to check. If there are files there, we want to process them.
  • We then do a filepath walk, which visits every file and folder under that root directory path and lets us collect the file names in files.
  • Next, we define a channel.
  • We then define a loop which iterates over the list of files in our directory and calls the process_file function for each. Note that it says ‘go’ before the function call, and that we also pass in the channel as a function argument. This enables us to use the built-in concurrency of Golang, which I’ve discussed in a previous article.
  • We then create a listener. As we discussed in the previous article on goroutines, when a goroutine is spawned, the main function often exits straight away because it has executed all of its code. To make sure it doesn’t exit before the goroutines complete, we put a listener out, which waits for the right number of responses from our channel before the main function is able to complete and the application is able to exit. The len(files) value is how many files we are waiting on, and hence how many responses we need to listen for.

Now, let’s look at the process file function:

  • The process file function:
    • reads in the CSV and converts the byte slice to a string
    • prints the string literally, including hidden characters. In this case, I had a problem with unexpected characters, which simply appear as a space when printed normally.
    • I then run a string replace to get rid of those characters. Editors often add them without us knowing when we create the CSV.
    • I then remove the file from the directory, so it’s never processed again.
    • Finally, the routine reports back to the channel that the file has been processed.

The full code: