John Henry… For Spreadsheets?
We are in interesting times aren’t we? How many movies have been made cautioning us about AI, and yet here I am, contemplating how ChatGPT might take my job.
*Sigh….*
The Legend of John Henry
John Henry is an American legend about a black railroad hammerer that beat a steam powered machine in a competition of tunneling prowess during the time of the Industrial Revolution. This competition resulted in him losing his life.
Whether or not you believe John Henry actually existed, his story represents the sacrifices that workers make to ensure that capitalists don’t immediately move on to the next, less expensive, option to make their profit, leaving the workers that depend on their skills for their livelihood to starve. (The globalization of industry’s effect on Appalachia is a fantastic example; while I don’t have time to expand, this article speaks to this)
We can draw a parallel between John Henry’s story and the technological revolution that we are currently experiencing with Generative AI by examining Google Sheets of all things. As ChatGPT gets better and better at coding- Google Sheets formulas in this particular case- it makes me think… Where do I stand? How do I measure up? Is my job on the chopping block?
A Flawed Analogy
Comparing myself to a legend like John Henry is… pretentious. If I had to drive railroad spikes with nothing but a hammer all day, my skinny ass would literally die.
There are other reasons why this analogy doesn’t quite work. There is no way that I could keep up with AI in formula creation. Even if I could, I- definitely… probably… hopefully…?- would not die in the process. The answer of who would win in a 1 off competition is clear, and it’s ChatGPT. What makes this a compelling comparison is not the metric of speed. This is a bit of a more systemic conversation. If John Henry represents workers in general in his story, allow me to, for a moment, stand in that place as well. However, the metric is not speed but rather, who does the job better, me or ChatGPT? Better… Let’s think about that word.
Am I on the chopping block?
So, for now, let’s ignore the environmental impact that AI’s immense energy consumption has, as well as the ethical/legal dilemma of who actually owns the input and output of Generative AI, and instead focus on the question posed: who does the job better, me or ChatGPT?
See, I have always said, and will continue to say, that the amount of technical jobs that artificial intelligence will actually take is a lot slimmer than we are being led to believe. (This article does some work to break down why) Make no mistake, though, capitalists WILL try. However, when it comes to my case specifically; I mean… yeah, my job is on the chopping block. Google Sheets formulas, especially those for transforming data, are typically more one-off solutions that are easier to ask ChatGPT for compared to longer and oftentimes more complicated code.
So if my job is on the chopping block, that means I’m on the chopping block, as I kinda need a job to feed myself. So let’s find what I’m testing myself against. For this, I have a friend that has adapted to using ChatGPT in their technical work: making spreadsheet dashboards.
Explanations for the lay person!
Now this blog is going to get very technical very quickly. Therefore, it's worth spending some time defining things, and setting up frameworks for understanding. I’ll do this by breaking down what I do, and in doing so, explain why this is so relevant to me.
I have experience, and work as a Salesforce Administrator and Consultant. Salesforce is a type of database. A database is a structured way to store data. Think of it like a digital filing cabinet with lots of drawers, and file folders. Salesforce’s business model is allowing companies and organizations to purchase access to the database infrastructure (the way that the data is stored), and storage space that Salesforce has available.
However, databases, much like filing cabinets, are inherently messy and complicated. It takes years of training, education, and experience to really wrap your head around managing them, let alone one as complicated as Salesforce. So oftentimes, organizations will hire people like me to do that. (You can too if you like :D Progitech Solutions Scheduling Page)
My job ends up intersecting with spreadsheets often because spreadsheets act as an excellent model for how data is stored inside the database. In fact, arguably, you could build a database in spreadsheets alone, though it would probably not be worth the time. Salesforce fields equate to spreadsheet columns, Salesforce records equate to spreadsheet rows, and points of data equate to individual cells in a spreadsheet.
Because of all of this, spreadsheets make for a very easy, and simple way to transfer data. If you are trying to move data from one database to Salesforce, or vice versa, and you don’t have a direct integration; then you’d export data into a spreadsheet format (Excel, or Google Sheets), clean/reformat the data, and then import that data into your database of choice.
In come spreadsheet formulas. Most spreadsheet programs will have what are called functions that you can use to manipulate the data in those spreadsheets. We’ve all probably used the =SUM() function. Instead of calculating your total revenue from a list of sales transactions by doing all that math yourself, Google Sheets, Excel, or whatever else you’re using will do that math for you.
But what happens when your coworkers start working in your spreadsheet? They add new columns like “Item Sold” or “Customer Satisfaction on Transaction.” Dealing with these changes is when spreadsheet formulas can start getting complicated.
All of that to say, spreadsheet formulas can range from extremely simple to extremely complicated (as we will soon get into), and the way that they can manipulate data can make the data cleaning/reformatting process, or account for changes in the source data much easier.
Spreadsheet formulas can make a LOT of things easier, and the time that can be saved by using them is invaluable. Hence why ChatGPT being able to make such complicated formulas is a bit threatening. If ChatGPT can spit out a custom-made formula based on your needs- which you can explain to Chat in plain language- in about 2 seconds, then that’s even more time saved along with the money saved by not having to hire someone like me. Or is it?
OK! Back to it!
Now allow me to be clear, this friend of mine- let’s call them Tyree- that I mentioned earlier absolutely has a technical background; we are alike in that way. We have both engaged in a technical career from a very young age- with myself being a little over one year older. The difference being that Tyree has invested far more time than me in understanding how to generate specific ChatGPT prompts. That paired with the technical background they need to understand, and interpret the responses that they get, puts them a bit ahead of me in the AI department. I would argue, however, that I am better at generating formulas without AI’s help, as I have adamantly refused to depend on that silicone-tongued devil for things that I know I can do myself. So with Tyree and I both having a technical background, could I effectively replace AI?
Using AI to solve complex Google Sheets Problems
Ok so here’s the situation, Tyree makes Google Sheets dashboards. This means they take data from spreadsheet tabs, and turn that data into a more visually appealing medium. Want to see how much money you made in Q-1, and all of your sales data is in spreadsheets? Tyree is the expert you need! (Or is it ChatGPT that you need? lol)
One of the issues that they solved was that if they use direct column/cell references, the source data may change as people interact with it, breaking the formula that they created for the visualization. For example, if someone adds a new column, the column that they’re referencing in the formula now becomes the column that was added, and breaks the logic. Therefore a solution would ensure that even if someone is working on the spreadsheet, and potentially moving columns around, copy-pasting data and the like, the dashboard formulas will not break. So instead of =SUM(C2:C), you want something like =SUM(Column Name: “Price”).
So here is the formula that Tyree had ChatGPT come up with:
=SUM(INDIRECT( "'DataSet'!" & ADDRESS(2, MATCH(E1, DataSet!$1:$1, 0)) & ":" & ADDRESS(ROWS(DataSet!R:R), MATCH(E1, DataSet!$1:$1, 0))))
And here is the formula that I came up with to counter ChatGPT’s:
=SUM(QUERY(DataSet!$A2:$CV, "SELECT" & VLOOKUP(E1, {{ARRAY_CONSTRAIN(TRANSPOSE(DataSet!$A$1:$CV$1), COLUMNS(DataSet!$A$1:$CV$1),1)}, {ARRAY_CONSTRAIN(ARRAYFORMULA( "Col" & COUNTIFS(ROW(A$2:A), "<=" & ROW(A$2:A))),COLUMNS(DataSet!$A$1:$CV$1),1)}} ,2,FALSE)))
Using INDIRECT() vs Using QUERY()
In effect, we are doing the same thing, just with different functions that have different technical implications. They both create a custom line of code based on the respective syntax- syntax being the rules that dictate how to structure and write valid code in a programming language- of each function. In the end, they both solve the problem of maintaining a Sum value from a datasource, even if the location, or information within the specific header of the data source changes.
So Which one is better?
So… obviously, mine is definitely way more complicated. I didn’t even explain half of the full formula, just the fundamental idea. I include far more functions which pulls on more processing power, and multiple references would make the sheet slow down and lag when handling large data sets… which Tyree does.
When I wrote this, I hadn’t taken the time to really understand Tyree’s, so I didn’t have much context for levels of complexity. But for me- outside of time brainstorming beforehand over the course of a few days, not because I was brainstorming about this specific problem, but because I just hadn’t decided to try to solve it yet- this took me probably about 20-30 minutes to write bones that worked, and about another 30 minutes to optimize; compared to the about 30 seconds it took ChatGPT to do.
So… my formula is more complex, and probably less efficient, and also took more time to make… on to the chopping block for me.
But since we’re talking about ChatGPT anyways, why don’t we let it explain what these two formulas do deeper:
Let’s Allow ChatGPT to explain Tyree’s formula
I won’t go into a full explanation here so as to not bog this blog down with too much technical talk. See Tyree’s Code: John Henry… for Spreadsheets
This is ChatGPT’s summary:
“Summary of What It Does
Finds the column corresponding to the header in E1.
Creates a range from the 2nd row in that column down to the last row.
Sums up all the values in that range.
In short, this formula dynamically identifies and sums a vertical column of values in the DataSet sheet, where the column is chosen based on the header specified in E1."
-ChatGPT
____________________________________________________________________________________________________________
So About INDIRECT()
ChatGPT says this about the INDIRECT() formula:
“...If your dataset is large, this can get slow because INDIRECT recalculates every time a cell in the spreadsheet is changed, affecting performance.”
-ChatGPT
I didn’t know that the INDIRECT() function recalculated upon any update in Google Sheets. This means that each time ANY place within the sheet gets updated, the formula will recalculate. Therefore making the formula worse for large data sets. Interestingly, Tyree mentioned their dashboard spreadsheet slowing down and lagging; this would explain why.
I wasn’t expecting to learn something! But either way, if that’s the case, maybe I’m off the chopping block right! Since, actually, my formula is theoretically more efficient? Well let’s find out.
Now let’s get an explanation of mine
But first! Let’s take a break. That was a lot of writing, and certainly, I’m sure, a lot of reading. Next week, we’ll explore my formula, and see if we can come to a conclusion of which one is better. Who becomes the victor? Me, or the machine??!