Recently I had to build a static site generator using Node. My build script traversed, sorted and filtered files, and ran them through my templating system. For this task, I chose an npm package called walk, which is a port of Python’s os.walk. In this post, I give a bit more detail on how to get started with node-walk.
For this example, we will be playing with a file hierarchy that has both files and several layers of folders. Also, I want to be able to control what files and folders to include in the traversal.
Here is a file hierarchy I created for this example (the files we’ll be operating on are in the files folder):
Initial Setup
- In your terminal, navigate to your working directory (I am using node-walk-test in this example). If you are starting from scratch, run npm init to create package.json:
    cd path/to/dir
    npm init
- Choose your main file. In this example, I am using main.js.
- Install the walk package from npm:
    npm install --save walk
- We will use two Node modules: fs (File System) and Path. They come with your Node installation, and to use them you just need to require them in your main.js:
    var fs = require("fs");
    var path = require("path");
- Finally, we will require the walk module and create a walker object:
    var walk = require("walk");
    var pathToFiles = "files";
    var options = { followLinks: false };
    var walker = walk.walk(pathToFiles, options);
At this point, this is what your main.js file should contain:
    "use strict";

    var fs = require("fs");
    var path = require("path");
    var walk = require("walk");

    var pathToFiles = "files";
    var options = { followLinks: false };
    var walker = walk.walk(pathToFiles, options);
List all files and directories, recursively
While the walker “walks” your file system, it will fire a number of events. You can select what events you want to listen to.
The first event we will test is called names. It will give us an array of strings containing names of all files and directories:
    walker.on("names", function (root, nodeNamesArray) {
      console.log("Files & folders in the " + root + " folder: " + nodeNamesArray);
    });
Here is what the output of the above function will look like:
    Files & folders in the files folder: .DS_Store,dir1,dir2,donotread.md,exclude,file1.md,file2.md,file3.md
    Files & folders in the files/exclude folder: exclude.md
    Files & folders in the files/dir2 folder: file1.md
    Files & folders in the files/dir1 folder: .DS_Store,file1.md,subdir1
    Files & folders in the files/dir1/subdir1 folder: file1.md
As you can see, the walker traverses your target folder, prints the names of files and directories in that folder, then moves on to the subfolders, and repeats the process for each subfolder, recursively.
You can use the names event to sort or filter entries before performing more costly file operations.
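As a minimal sketch of that kind of pre-processing, the helper below drops hidden entries and sorts the rest. The function name visibleSorted and the sample array are made up for illustration; the sample uses names from the listing above, deliberately out of order. The helper returns a new array rather than changing its input, so it makes no assumption about how the walker reacts to changes to nodeNamesArray.

```javascript
"use strict";

// Hypothetical helper: drop hidden entries (names starting with ".")
// and return the rest in alphabetical order, without touching the input.
function visibleSorted(nodeNamesArray) {
  return nodeNamesArray
    .filter(function (name) { return name.charAt(0) !== "."; })
    .sort();
}

// Sample input: names from the files folder listing, deliberately shuffled.
var sample = ["file2.md", ".DS_Store", "dir1", "file1.md"];
var cleaned = visibleSorted(sample);

console.log(cleaned.join(",")); // dir1,file1.md,file2.md
```

Inside a names callback, you could call visibleSorted(nodeNamesArray) and work with the returned list.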
Get an array of directory stat objects
The next event we’ll look at is directories. This event will be fired after the walker has processed all the files in the current folder. It will give you an array of stat objects for all the directories in your target folder:
    walker.on("directories", function (root, dirStatsArray, next) {
      console.log('Current directory root: ' + root);
      console.log(dirStatsArray);
      next();
    });
In your callback, you will have access to three arguments: root, dirStatsArray and next.
- root: path to the current directory
- dirStatsArray: array of stat objects. Each object has a name and a type attribute. In this case, the type will be ‘directory’. Here is an example of a stat object:
    { dev: 16777221,
      mode: 16877,
      nlink: 3,
      uid: 501,
      gid: 20,
      rdev: 0,
      blksize: 4096,
      ino: 91318178,
      size: 102,
      blocks: 0,
      atime: Wed Feb 08 2017 09:29:38 GMT-0500 (EST),
      mtime: Wed Feb 08 2017 08:56:54 GMT-0500 (EST),
      ctime: Wed Feb 08 2017 08:56:54 GMT-0500 (EST),
      birthtime: Wed Feb 08 2017 08:56:45 GMT-0500 (EST),
      name: 'subdir1',
      type: 'directory' }
- next: callback function for the next iteration
Keep in mind that the directories event only fires for folders that contain subfolders, so your callback (and its next() call) runs only for those. In our example, the files folder has 3 subfolders: dir1, dir2 and exclude. However, only dir1 contains a child folder (subdir1). In this case, there will be only two iterations: first, the starting files folder, and second, the dir1 folder.
If you modify this array in any way – sort it or remove a node, for example – you will affect the rest of the traversal. Use the names event to get a list of all files and folders, modify it as necessary, and then proceed to perform file operations.
Get an array of file stat objects
The files event works the same way as directories: it is fired for each folder in your target folder, recursively, and gives you an array of stat objects of type “file”:
    walker.on("files", function (root, fileStatsArray, next) {
      console.log('Current directory root: ' + root);
      console.log(fileStatsArray);
      next();
    });
This event will be fired after all the files in the current folder have been processed by the walker.
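To illustrate what you might do with such an array, here is a small sketch that totals the size of the entries and collects their names. The helper name summarizeFiles and the mock objects are hypothetical: the mocks carry only the name, type and size fields (all of which appear in the stat example earlier), with made-up sizes, so the snippet runs without the walk package.

```javascript
"use strict";

// Hypothetical helper: names and total size in bytes of the entries whose
// type is "file". Only the name, type and size fields are read.
function summarizeFiles(fileStatsArray) {
  var files = fileStatsArray.filter(function (stat) {
    return stat.type === "file";
  });
  return {
    names: files.map(function (stat) { return stat.name; }),
    totalBytes: files.reduce(function (sum, stat) { return sum + stat.size; }, 0)
  };
}

// Mock stat objects with made-up sizes, standing in for what the walker passes.
var mockStats = [
  { name: "file1.md", type: "file", size: 120 },
  { name: "file2.md", type: "file", size: 80 }
];
var summary = summarizeFiles(mockStats);

console.log(summary.names.join(","), summary.totalBytes); // file1.md,file2.md 200
```

In a real files callback you would pass fileStatsArray to the helper and then call next().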
Read each file during traversal
Now, this is probably one of the most used events: file. As the walker traverses the file system, it will fire a file event every time it encounters a – you guessed it! – file:
    walker.on("file", function (root, fileStats, next) {
      fs.readFile(path.join(root, fileStats.name), function () {
        console.log(fileStats.name);
        next();
      });
    });
Here you can use Node’s file operations to work on your file.
The output of the above example will look like this:
    .DS_Store
    donotread.md
    file1.md
    file2.md
    file3.md
    exclude.md
    .DS_Store
    file1.md
    .DS_Store
    file1.md
    file1.md
As you can see, the walker reads all files, including hidden files. This may not be exactly what you want. You may want to tell walker what kind of files you want it to skip.
Filter out files you don’t want to process
Walker provides an option to skip directories you don’t want to process. You can add a filters array containing a list of folders to exclude, like so:
    var options = {
      followLinks: false,
      filters: ['exclude']
    };
In the above example, I am telling walker to skip the exclude folder. Here is the outcome after this modification:
    .DS_Store
    donotread.md
    file1.md
    file2.md
    file3.md
    .DS_Store
    file1.md
    .DS_Store
    file1.md
    file1.md
As you can see, the exclude.md file from the exclude folder is not on the list.
To filter out certain files or file types, you have a couple of options:
- use the names event to generate a list of all files and folders, and apply a filter to that array
- add a check in the file callback
Here is an example of the latter:
    walker.on("file", function (root, fileStats, next) {
      // skip file names starting with '.'
      if (fileStats.name.charAt(0) === '.') {
        next();
        return;
      }
      // skip file donotread.md
      if (fileStats.name === 'donotread.md') {
        next();
        return;
      }
      fs.readFile(path.join(root, fileStats.name), function () {
        console.log(path.join(root, fileStats.name));
        next();
      });
    });
Your implementation will depend on exactly what you want to filter out. For more complex rules, you can use regular expressions.
The outcome of the above example:
    files/file1.md
    files/file2.md
    files/file3.md
    files/dir2/file1.md
    files/dir1/file1.md
    files/dir1/subdir1/file1.md
As you can see, the hidden system files are gone, as well as donotread.md.
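The same two checks could also be written as regular expressions, which may be easier to extend as the rules grow. This is only a sketch: skipPatterns and shouldSkip are hypothetical names, and the snippet filters a plain array of names so it can run without the walk package.

```javascript
"use strict";

// Hypothetical rule list: a file is skipped if any pattern matches its name.
var skipPatterns = [
  /^\./,             // hidden files such as .DS_Store
  /^donotread\.md$/  // one specific file
];

function shouldSkip(fileName) {
  return skipPatterns.some(function (pattern) {
    return pattern.test(fileName);
  });
}

var names = [".DS_Store", "donotread.md", "file1.md", "file2.md"];
var kept = names.filter(function (name) { return !shouldSkip(name); });

console.log(kept.join(",")); // file1.md,file2.md
```

In the file callback, the two if blocks would then collapse into a single check: if shouldSkip(fileStats.name) is true, call next() and return.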
Do something when the walker has finished traversing
It can be useful to know when the walker has completed the ‘walk’. You can listen for the end event:
    walker.on("end", function () {
      console.log("all done");
    });
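A common use of the end event is to collect results during the walk and process them once it finishes. The sketch below shows that pattern under one loud assumption: instead of the real walker, it uses a plain Node EventEmitter driven by hand, so it runs without the walk package. The helper name collectPaths and the fakeWalker object are hypothetical.

```javascript
"use strict";
var EventEmitter = require("events");
var path = require("path");

// Hypothetical helper: gather file paths as "file" events arrive and hand
// the complete list to onDone when "end" fires.
function collectPaths(walker, onDone) {
  var collected = [];
  walker.on("file", function (root, fileStats, next) {
    collected.push(path.join(root, fileStats.name));
    next();
  });
  walker.on("end", function () {
    onDone(collected);
  });
}

// Stand-in for the real walker: a plain EventEmitter, not the walk package.
var fakeWalker = new EventEmitter();
var result = null;
collectPaths(fakeWalker, function (paths) { result = paths; });

// Emit two "file" events and then "end" by hand, with a no-op next().
function noop() {}
fakeWalker.emit("file", "files", { name: "file1.md" }, noop);
fakeWalker.emit("file", path.join("files", "dir1"), { name: "file1.md" }, noop);
fakeWalker.emit("end");

console.log(result.length + " files collected"); // 2 files collected
```

With the real walker you would pass the object returned by walk.walk() to collectPaths instead of fakeWalker.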
***
I hope these examples helped you get a feel for what walker has to offer. For more features, including how to run walker synchronously, head over to the official documentation page.