Exploring Causal Threads: Counterfactuals and Confounders

Awadelrahman M. A. Ahmed
7 min read · Mar 10, 2024

In this post, I will write about a question of immense importance in business settings: the “What-if” question. This question has many use cases across various business environments. For instance, in the banking industry, concerns about churn rates prompt constant consideration of scenarios such as, “What if we launched this particular consumer loan program a week before Black Friday instead of waiting for Christmas?” Similarly, in the retail sector, questions like, “What if we advertised our products to this specific age group instead of that one? Could we have attracted more sales?” arise frequently.

The “What-if” question is a common and significant inquiry. Based on its answer, businesses can make better decisions. However, let’s simplify it and make it personal, as doing so aids in remembering and understanding the concepts.

Should I work from home today?

In this post-pandemic era, with the flexibility to choose between working from home or going to the office, many of us face this decision on a daily basis. And over the course of a year, this decision accumulates to a certain percentage of time spent working at the office and a certain percentage spent working from home.

Now, suppose that you work in a company that has implemented what it terms “bi-monthly performance-based raises”: performance is evaluated every two months, which allows more frequent assessments of employees. If an employee consistently meets or exceeds expectations during these evaluations, they may become eligible for a raise. So, every two months, you have a chance of getting a raise.

This evaluation round, suppose you are expecting a raise of 10% based on your own evaluation and expectations, but you only received 2%. Wow, that is significantly lower than you expected!

Because your company might not have a wonderfully transparent feedback loop, you started thinking on your own, trying to figure out the cause! And a very natural question jumps to your mind:

“What could I have done differently last round that would have qualified me for the 10% salary increase?”

Seems like an important question, right? Notice something very important here: the question asks about past actions that led to the present result! In other words, once you ask yourself this question, you imagine a parallel universe that is identical to the current one except for the outcome (you got your desired 10% raise) and the specific actions that led to it; everything else stays the same! That is a counterfactual! It refers to imagining alternatives to past events or decisions, particularly to evaluate their potential outcomes or consequences.

Specifically, you started wondering about one variable that nagged you the whole year: what if I had shown up more often during the last round? In fact, you are questioning the system. Perhaps the system is biased toward those who show up regularly, since they have more chances to make a good impression than those who work from home, even though it is stated very clearly that you are free to work from home or at the office! So your question is:

Would I have gotten a 10% raise had I shown up at the office 100% of the time last round?

This is what you can see in the following chart. What happened and what you observed are on the left green side, but what you really want to know is about the right red side.

Well… Let me see my data!

You are a data scientist, and you want to test that! You are also a nerdy one: you have recorded your raises and your office attendance for the last 5 years!

You code each two-month period as 1 if you showed up at the office more than 50% of the time, and as 0 if you showed up less than that and worked mostly from home. So you ended up with 30 data points (6 per year for 5 years), and you made the association plot below!

Surprisingly, you discovered that out of 16 instances when you did not show up at the office, 15 times (more than 93%) you did not receive a raise, while only once did you receive it.

Conversely, out of 14 occasions when you showed up at the office, 11 times (about 79%) you received the raise, with only 3 times not resulting in a raise.

Additionally, you calculated the correlation coefficient and found it to be 0.74! This positive correlation indicates that the more you show up at the office, the more likely you are to receive a raise!
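The numbers above can be checked by hand. The sketch below assumes the 0.74 is a Pearson correlation between the two 0/1 variables, which for a 2x2 table is the phi coefficient; the function name `phi_coefficient` is only illustrative.

```python
import math

# Counts taken from the text: 16 periods mostly at home, 14 mostly at the office
home_no_raise, home_raise = 15, 1
office_no_raise, office_raise = 3, 11


def phi_coefficient(a, b, c, d):
    """Phi coefficient of the 2x2 table [[a, b], [c, d]].

    For two binary variables this equals the Pearson correlation.
    """
    numerator = a * d - b * c
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator


phi = phi_coefficient(home_no_raise, home_raise, office_no_raise, office_raise)

# Share of periods with a raise in each group
raise_rate_home = home_raise / (home_no_raise + home_raise)
raise_rate_office = office_raise / (office_no_raise + office_raise)

print(f"phi = {phi:.2f}")
print(f"raise rate at home = {raise_rate_home:.1%}, at the office = {raise_rate_office:.1%}")
```

Note that both quantities only describe how the two variables move together in the recorded periods; neither says what would happen if you changed your attendance.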

Now, this analysis makes it seem very obvious what to do in order to get the raise next time, and you feel confident about it!

That decision is to show up more at the office and work less from home. It even makes sense to you, as you suspect your boss evaluates employees subjectively, based on impressions.

Two months later… Even four months later…

So, you decided immediately to go every day, say hi to your boss, eat lunch with him, all in hopes of getting the raise and paying your bills! And you think you are the smart guy because you did your data analysis homework!!

But guess what! You did not get the raise! You did the same in the next two months, and you ended up coming to the office every day for four months in a row, thinking you will get your desired outcome soon! But NO, nothing happened!! And you started wondering, what is going on?! WHY does that not work!!

Why it didn’t work

It didn’t work because of confounding!! Yes, there is a confounder in the room: a variable that influences both the action you changed and the outcome you care about!

Let me show you what happens: In fact, you work as a consultant, and a big part of your job is to meet clients during the pre-project phases. So, you attend meetings at the office to meet these clients when there are potential projects.

YES, your presence at the office is mostly in situations where there are numerous potential tasks. Additionally, when there are many potential tasks, it is more likely that you will be assigned some of them. Then, depending on how you perform on those tasks assigned to you, you receive the raise! We can observe these relationships between these variables in the following figure.

Showing up at the office does not affect your salary increase at all; what matters is the number of tasks assigned to you and how you perform on them. That number is in turn driven by the number of potential tasks, and hence by the market. And guess what… your presence at the office is also driven by the number of potential tasks, since you show up when you are needed!! So attendance and raises move together only because they share this common cause.

In fact, this is the causal relationship based on which I generated the data we plotted earlier, and I used this piece of code:

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from scipy import stats


class RaiseSCM:
    def __init__(self, random_seed=None):
        self.random_seed = random_seed
        self.u = stats.uniform()  # u: number of potential tasks, uniform on [0, 1]

    def sample(self, sample_size=50):
        if self.random_seed is not None:
            np.random.seed(self.random_seed)
        u = self.u.rvs(sample_size)
        x = (0.1 * u) > 0.05  # show up at the office when there are many potential tasks
        v = (0.5 * u) > 0.25  # get tasks assigned when there are many potential tasks
        # Flip two random entries of x and two of v to add some noise
        x_random_indices = np.random.choice(len(x), size=2, replace=False)
        v_random_indices = np.random.choice(len(v), size=2, replace=False)
        v[v_random_indices] = ~v[v_random_indices]
        x[x_random_indices] = ~x[x_random_indices]
        y = v.copy()  # the raise depends only on the assigned tasks, never on x
        return u, x, v, y


scm = RaiseSCM(random_seed=1)
total_tasks_u, show_at_office_x, assigned_tasks_v, get_raise_y = scm.sample(30)
df = pd.DataFrame({'showed_up': show_at_office_x.astype(int), 'got_raise': get_raise_y.astype(int)})

# Rows of the cross-tab are got_raise, columns are showed_up
cross_tab = pd.crosstab(df['got_raise'], df['showed_up'])

sns.heatmap(cross_tab, annot=True, cmap="YlGnBu", fmt="d")
plt.xlabel('showed_up')
plt.ylabel('got_raise')
plt.title('Co-occurrence of got_raise and showed_up')
plt.show()
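
Because the sample also returns the confounder, one way to see the problem in data is to compare raise rates inside each level of the confounder instead of across all periods. The sketch below is a simplified stand-in for the model above, not the original code: it uses only the standard library, collapses "assigned tasks" and "raise" into one variable, and replaces the two fixed flips with an assumed flip probability of 0.07 (roughly 2 out of 30). All function names are illustrative.

```python
import random


def sample_periods(n, flip_prob=0.07, seed=0):
    """Draw n periods as (many_tasks, showed_up, got_raise) tuples."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        many_tasks = rng.random() > 0.5                       # the confounder
        showed_up = many_tasks != (rng.random() < flip_prob)  # noisy copy of the confounder
        got_raise = many_tasks != (rng.random() < flip_prob)  # noisy copy, never reads showed_up
        rows.append((many_tasks, showed_up, got_raise))
    return rows


def raise_rate(rows, showed_up, many_tasks=None):
    """Share of periods with a raise among rows matching the conditions."""
    selected = [y for (m, x, y) in rows
                if x == showed_up and (many_tasks is None or m == many_tasks)]
    return sum(selected) / len(selected)


rows = sample_periods(20000)

# Naive comparison: ignores the confounder
naive_gap = raise_rate(rows, True) - raise_rate(rows, False)

# Stratified comparison: compare like with like inside each level of the confounder
gaps = {m: raise_rate(rows, True, m) - raise_rate(rows, False, m) for m in (False, True)}

# Weight each stratum by its share of periods
share_many = sum(m for m, _, _ in rows) / len(rows)
adjusted_gap = share_many * gaps[True] + (1 - share_many) * gaps[False]

print(f"naive gap:    {naive_gap:.2f}")
print(f"adjusted gap: {adjusted_gap:.2f}")
```

Under these assumptions the naive gap is large while the gap within each stratum is close to zero, which is what a pure common-cause structure predicts.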

What is the Correct Decision then?

The better decision is to focus on influencing the number of potential tasks available. This could involve actively participating in sales activities or other initiatives that directly contribute to generating more projects for the company. However, simply showing up at the office is not the correct action to take, as it does not impact the availability of tasks or your likelihood of receiving a raise. Instead, prioritize actions that directly contribute to increasing the number of potential tasks.
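
To make the difference between the two actions concrete, here is a minimal simulation sketch of the causal structure described in this post. It is an illustration under assumptions, not the original data generator: the flip probability of 0.07, the function name `simulated_raise_rate`, and the parameters `do_show_up` and `do_potential_tasks` are all hypothetical. Forcing a variable to a value, regardless of what normally drives it, plays the role of an intervention.

```python
import random


def simulated_raise_rate(n, do_show_up=None, do_potential_tasks=None,
                         flip_prob=0.07, seed=0):
    """Share of periods with a raise, optionally under an intervention.

    do_show_up: force attendance to True/False in every period.
    do_potential_tasks: force the number of potential tasks (0 to 1).
    """
    rng = random.Random(seed)
    raises = 0
    for _ in range(n):
        u = rng.random()                    # potential tasks, set by the market
        flip_x = rng.random() < flip_prob   # noise on attendance
        flip_v = rng.random() < flip_prob   # noise on task assignment
        if do_potential_tasks is not None:
            u = do_potential_tasks          # intervention on potential tasks
        showed_up = (u > 0.5) != flip_x
        if do_show_up is not None:
            showed_up = do_show_up          # intervention on attendance
        assigned_tasks = (u > 0.5) != flip_v
        got_raise = assigned_tasks          # attendance never enters here
        raises += got_raise
    return raises / n


baseline = simulated_raise_rate(5000)
always_office = simulated_raise_rate(5000, do_show_up=True)
always_home = simulated_raise_rate(5000, do_show_up=False)
more_projects = simulated_raise_rate(5000, do_potential_tasks=0.9)

print(f"baseline:      {baseline:.2f}")
print(f"always office: {always_office:.2f}")
print(f"always home:   {always_home:.2f}")
print(f"more projects: {more_projects:.2f}")
```

In this sketch, forcing attendance leaves the raise rate exactly where it was, because attendance never feeds into the raise, while raising the number of potential tasks shifts it upward.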

Differentiating Decision-Making from Prediction

In conclusion, this post underscores the critical distinction between prediction and decision-making, emphasizing the nuanced role of counterfactuals and confounders in shaping outcomes.

While prediction relies on extrapolating patterns from data to anticipate future events, decision-making demands a deeper understanding of causal relationships and the ability to navigate counterfactual scenarios.

By exploring the impact of confounders, we recognize the complexity of real-world situations and the need for careful consideration of multiple factors in strategic decision-making.

Ultimately, embracing a causal perspective empowers businesses to make informed decisions, anticipate potential pitfalls, and chart a course towards desired outcomes in an uncertain and dynamic business landscape.
