Why I Obsessed Over JavaScript Generators

So here’s how my obsession with JavaScript generators started…

A few days back, I was working on a task where I had to write a data-fix/migration script. At first it looked easy and ran fine for a few users, but then we had to run it for all users. That’s when it started taking much longer and causing memory and CPU issues, since it fetched every user’s data and held it all in memory while processing.

I realized that to handle this efficiently, we needed to process the data in batches. We considered a database-level approach using cursors, but after studying them further, I found that they can introduce performance issues of their own. I kept searching for better approaches, and that’s when I discovered a solution built on generators.

So, enough of the past. Let’s focus on the present.

Understanding the Foundations

Before diving into generators, we need to understand a couple of concepts: iterators and iterables. Let’s get started.

What are Iterators?

Simple definition: an iterator is any object that follows the iterator protocol, meaning it has a method called next() that returns an object containing two keys: value and done.

  • value refers to the current value for the corresponding iteration
  • done indicates whether the iteration is finished or not

Here’s an example:

const getUsers = {
  index: 0,
  next() {
    const users = ["Tejas", "Omkar", "Yash", "Manthan"];
    return {
      value: users.at(this.index++),
      done: this.index > users.length,
    };
  },
};

getUsers.next() // {value: 'Tejas', done: false}
getUsers.next() // {value: 'Omkar', done: false}
getUsers.next() // {value: 'Yash', done: false}
getUsers.next() // {value: 'Manthan', done: false}
getUsers.next() // {value: undefined, done: true}
// ... this still continues if called
getUsers.next() // {value: undefined, done: true}

In the above snippet, you can see we indicate that iteration has ended via the done key once the index goes out of the array’s bounds, yet we can still keep calling next(). There’s no automatic termination here.
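To make that concrete, here’s how a consumer is expected to drain an iterator by hand, stopping as soon as done flips to true (a minimal sketch, assuming a fresh getUsers object like the one above):

// Assuming a fresh getUsers object like the one above
let result = getUsers.next();
while (!result.done) {
  console.log(result.value); // "Tejas", "Omkar", "Yash", "Manthan"
  result = getUsers.next();
}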

By the way, we can also have an infinite iterator like this:

const iAmInfinity = {
  index: 0,
  next() {
    return {
      value: this.index++,
      done: false,
    };
  },
};

iAmInfinity.next() // {value: 0, done: false}
iAmInfinity.next() // {value: 1, done: false}
iAmInfinity.next() // {value: 2, done: false}
iAmInfinity.next() // {value: 3, done: false}
iAmInfinity.next() // {value: 4, done: false}
// ... continues indefinitely

So, this is the iterator protocol. Not exciting, right? Something you could already build with plain functions and closures? Bear with me for the next 10 minutes, and you’ll understand what I’m getting at.

What are Iterables?

Again, a simple definition: an iterable is any object that follows the iterable protocol, meaning it has a method called [Symbol.iterator]() that returns an iterator.

Here’s an example:

const getUsers = {
  [Symbol.iterator]() {
    return {
      index: 0,
      next() {
        const users = ["Tejas", "Omkar", "Yash", "Manthan"];
        return {
          value: users.at(this.index++),
          done: this.index > users.length,
        };
      },
    };
  },
};

Now you might ask: what’s the difference? Just some weird syntax change? But wait a minute…

You know what for...of takes as input to loop over? That’s right: an iterable!

So you can do the following with a custom-defined iterable:

for (const user of getUsers) {
  console.log(user); // "Tejas", "Omkar", "Yash", "Manthan"
}

And since this is the same protocol that arrays and strings implement, I can also destructure it:

const [u1, u2, ...others] = getUsers;
console.log(u1); // "Tejas"
console.log(u2); // "Omkar"
console.log(others); // ['Yash', 'Manthan']
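Spread syntax consumes iterables through the same protocol, so this works too (assuming a fresh getUsers object):

const usersArray = [...getUsers];
console.log(usersArray); // ['Tejas', 'Omkar', 'Yash', 'Manthan']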

Enter Generators

Now generators enter the scene. A generator function gives you an object that is both an iterator and an iterable, with far cleaner syntax than writing either by hand.

function* getUsers() {
  yield "Tejas";
  yield "Omkar";
  yield "Yash";
  yield "Manthan";
}

const getUsersGenerator = getUsers();

getUsersGenerator.next() // {value: 'Tejas', done: false}
getUsersGenerator.next() // {value: 'Omkar', done: false}
getUsersGenerator.next() // {value: 'Yash', done: false}
getUsersGenerator.next() // {value: 'Manthan', done: false}
getUsersGenerator.next() // {value: undefined, done: true}
// ... this continues if called
getUsersGenerator.next() // {value: undefined, done: true}

// We can also use them in for...of
for (const user of getUsers()) {
  console.log(user); // "Tejas", "Omkar", "Yash", "Manthan"
}

As you can see, it’s the same as a normal function but with a * after the function keyword, which marks it as a generator function. We also see a new keyword, yield, which produces the value for the current iteration. Using yield, we can pause and resume our processing/execution.

Normal functions run from start to end, but with yield we can pause a function and later resume execution from exactly where it stopped. It keeps going until it passes the last yield statement or hits a return.
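Here’s a minimal sketch of that pause-and-resume behavior; the console.log calls show exactly when each chunk of the function body runs:

function* steps() {
  console.log("step 1");
  yield "a";
  console.log("step 2"); // runs only when next() is called again
  yield "b";
  console.log("step 3");
}

const it = steps();
it.next(); // logs "step 1", returns {value: 'a', done: false}
it.next(); // logs "step 2", returns {value: 'b', done: false}
it.next(); // logs "step 3", returns {value: undefined, done: true}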

Async Generators: The Real Magic

There are also asynchronous generators, which combine generators with async/await for asynchronous processing. Here’s how I used one to solve my use case:

async function* getUsersInBatch(batchSize = 10) {
  const usersCount = await getAllUsersCount();
  const totalBatches = Math.ceil(usersCount / batchSize);

  for (let batchNumber = 0; batchNumber < totalBatches; batchNumber++) {
    const offset = batchNumber * batchSize;
    const limit = Math.min(batchSize, usersCount - offset);

    const usersBatch = await getUsers(limit, offset);

    yield {
      usersBatch,
      batchNumber: batchNumber + 1,
      totalBatches
    };
  }
}

// Usage
for await (const { usersBatch, batchNumber, totalBatches } of getUsersInBatch()) {
  await performDataFixForUsers(usersBatch); // some function for processing
  console.log(`Completed ${batchNumber}/${totalBatches} batches`);
}

This might look like a lot to digest, but let me explain how it works step by step:

  1. Initialize the generator: We call getUsersInBatch(), which returns a generator object
  2. Calculate batches: We first fetch the total user count, then work out how many batches are needed to cover all users
  3. Process in batches: We use a loop to fetch users with limit and offset, and after fetching every batch, we yield that batch
  4. Pause and resume: The yield statement is inside a for loop, which means this function continues working until all batches are completed (all users are fetched)

Here’s the beautiful part: we pause execution after fetching each batch and hand it to a function that processes only that particular batch. Once processing finishes, execution resumes: batchNumber is incremented and we fetch the next batch with a new limit and offset.
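If you want to see this run end to end, here’s a self-contained sketch. The stubs for getAllUsersCount, getUsers, and performDataFixForUsers use in-memory data purely for illustration; in the real script they would hit the database:

// Stubs with in-memory data, purely for illustration
const allUsers = Array.from({ length: 25 }, (_, i) => `user-${i + 1}`);

async function getAllUsersCount() {
  return allUsers.length;
}

async function getUsers(limit, offset) {
  return allUsers.slice(offset, offset + limit);
}

async function performDataFixForUsers(usersBatch) {
  console.log(`Fixed ${usersBatch.length} users`);
}

(async () => {
  for await (const { usersBatch, batchNumber, totalBatches } of getUsersInBatch(10)) {
    await performDataFixForUsers(usersBatch);
    console.log(`Completed ${batchNumber}/${totalBatches} batches`);
  }
})();
// Runs three batches: 10, 10, and 5 users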

Also, this generator function is generic, meaning we can reuse it in any scenario where we need to fetch users in batches and run some processing on them.
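For example, the same generator could drive a completely different job without touching the batching logic; sendNewsletterToUsers here is a hypothetical mailer:

async function sendNewsletterInBatches() {
  for await (const { usersBatch, batchNumber, totalBatches } of getUsersInBatch(50)) {
    await sendNewsletterToUsers(usersBatch); // hypothetical mailer function
    console.log(`Newsletter sent to batch ${batchNumber}/${totalBatches}`);
  }
}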

Why This Approach Won

This solution solved my original problem perfectly:

  • Memory efficient: Only one batch is loaded in memory at a time
  • Pausable execution: Processing can be paused and resumed
  • Clean code: Much more readable than manual pagination logic
  • Reusable: The generator can be used across different scenarios

So yes, that’s how I fell in love with generators.

See you in the next blog post!