Causal Graph Neural Networks for Temporal Consistency in AI Predictions
September 12, 2025
Abstract
Artificial intelligence (AI) models that make sequential or time-dependent predictions often struggle with temporal consistency – the requirement that outputs vary smoothly and logically over time without erratic jumps. Abrupt or incoherent changes in predictions can degrade performance in applications from video analysis to time-series forecasting. This paper surveys recent advances in Causal Graph Neural Networks (GNNs) as a promising approach to enforce temporal consistency in AI predictions. We review how Graph Neural Networks model structured data and how their temporal extensions (Temporal GNNs) capture evolving dependencies over time. We then define temporal consistency in the context of AI predictions and discuss its importance, drawing examples from video generation and sequential decision tasks. Traditional sequence models (e.g. recurrent networks) and smoothing techniques can mitigate jitter, but often lack principled guarantees of consistency. We argue that integrating causal relationships into graph-based learning – by leveraging causal graphs, structural causal models, and interventions – yields more stable and interpretable predictions. We survey peer-reviewed studies where causal GNN frameworks have improved prediction reliability across domains: for example, causal graph-based deep learning outperforms conventional models in epidemiological forecasting by learning time-varying cause–effect relations[1][2], and GNN-based video models propagate information between frames to eliminate flicker[3]. We discuss methodological approaches, including encoding known causal graphs in GNN architectures, discovering causal links from data (e.g. via transfer entropy or invariant prediction), and using counterfactual reasoning in GNN training to remove spurious correlations. Through in-depth examples – spanning traffic flow prediction, climate time-series forecasting, and vision – we illustrate how causal GNNs enforce that changes in outputs correspond to meaningful causes, thereby ensuring smoother temporal evolution. Finally, we address challenges such as learning causal graphs from limited data, handling non-stationary relationships, and scaling to large networks, and we outline open research directions. Causal Graph Neural Networks emerge as a powerful paradigm to achieve more temporally consistent, robust, and interpretable AI predictions by marrying the strengths of graph-based deep learning with the rigor of causal inference.
Introduction
AI systems are increasingly deployed in dynamic environments where predictions at one time influence or relate to future outputs. In such settings, it is crucial that the model’s behavior remain temporally consistent: predictions should evolve smoothly and logically over time, without unexplained oscillations or contradictions. Temporal consistency is important for user trust and downstream utility – for instance, an autonomous vehicle’s perception module should not rapidly flicker in its object detections from frame to frame, and a medical prognosis model should not output wildly different risk predictions on consecutive days absent any meaningful change in patient data. Unfortunately, many machine learning models, especially those trained to maximize instantaneous accuracy, can exhibit temporal incoherence when making sequential predictions. This problem has been observed across domains: video frame classifiers can produce jittery results that distract the human eye[3], and time-series forecasters may output erratic point estimates that violate the smooth trends observed in reality. Addressing this issue requires new methods that explicitly account for temporal relationships and enforce consistency constraints on model outputs.
In this paper, we explore Causal Graph Neural Networks (GNNs) as a framework to promote temporal consistency in AI predictions. Graph Neural Networks have risen to prominence as a powerful tool for modeling relational data, where entities (nodes) and their interactions (edges) can be represented as a graph. By applying neural message-passing on graphs, GNNs capture complex dependency structures that traditional vector-based models might miss. In recent years, researchers have extended GNNs to handle temporal or dynamic graphs – so-called Temporal Graph Neural Networks (TGNNs) – which can model how node states and edges evolve over time[4]. These TGNNs have achieved impressive empirical success in a variety of applications, including traffic forecasting, financial time-series analysis, social network dynamics, recommender systems, and climate modeling[4][5]. The ability of TGNNs to learn temporal dependencies along with spatial/relational patterns makes them natural candidates for tackling temporal inconsistency in predictions. However, as we will discuss, standard TGNNs trained purely on data may still suffer from inconsistencies if they latch onto spurious correlations or noise. This is where the integration of causal inference comes into play.
Causal inference provides a principled way to distinguish genuine cause–effect relationships from coincidental correlations. A causal graph (often a directed acyclic graph) encodes hypotheses about which variables directly influence others. By leveraging causal structure, an AI model can discern which changes in input features or latent factors should legitimately lead to changes in outputs, and which should not. Our central thesis is that embedding causal reasoning into graph-based neural networks can act as a regularizer that enforces temporal consistency. Intuitively, if a model knows the causal mechanisms of the system it is learning – i.e. which factors drive others over time – it will be less prone to making arbitrary fluctuations in its predictions that violate those mechanisms. Changes in the model’s output will tend to occur only in response to changes in a causal driver, not due to noise or incidental temporal patterns. Conversely, the model can maintain stable predictions when the underlying causes remain stable. Recent research indeed suggests a strong interplay between temporal information and causality: temporal data provides clues for discovering causal relationships, and in turn causal information facilitates more reliable future predictions[6][7]. In other words, time and causality are deeply intertwined in sequential systems, and combining them can yield more robust and consistent predictive models.
The remainder of this paper is organized as follows. In Section 2, we provide background on Graph Neural Networks and their temporal extensions, and we define the notion of temporal consistency more formally, highlighting examples of inconsistency issues in AI predictions. Section 3 introduces the concept of causal graph neural networks, surveying how causality has been incorporated into GNN models. We review methodologies such as using known causal graphs in GNN architecture, learning causal graphs from data (e.g. via Granger causality or transfer entropy), and designing loss functions or regularizers that enforce causal constraints. Section 4 presents case studies and empirical evidence from peer-reviewed literature where causal GNNs improved temporal consistency: for example, in video analysis, a consistency-regularized GNN significantly reduced frame-wise prediction jitter[3]; in epidemiological forecasting, a causal GNN achieved lower error and more stable performance across changing conditions[1][2]; in traffic flow prediction, attention-based GNNs capturing causal dynamics outperformed purely correlation-based models[8]. In Section 5, we discuss challenges and open problems, including the difficulty of causal discovery in high-dimensional time-series, handling non-stationary causal effects (where relationships change over time), computational complexity, and the need for proper evaluation metrics for temporal consistency. We also touch on how to evaluate temporally consistent predictions, noting that traditional i.i.d. metrics may fail to capture temporal irregularities[9]. Finally, Section 6 concludes with a summary of key insights and future research directions. By drawing together insights from graph neural networks, time-series analysis, and causal inference, this work aims to chart a path toward AI prediction systems that are both accurate and temporally reliable, a combination that is essential for high-stakes and real-world deployments.
Background: Graph Neural Networks and Temporal Consistency in Predictions
Graph Neural Networks and Temporal Graphs
Graph Neural Networks (GNNs) are deep learning models designed to operate on graph-structured data, where relationships between entities are as important as the attributes of the entities themselves. In a GNN, each node in the graph aggregates information from its neighbors through an iterative message-passing scheme, producing rich node representations that capture both local and global graph structure[10][11]. This makes GNNs well-suited for problems where relational inductive bias is critical – for example, social network analysis, molecular property prediction, and knowledge graphs. Formally, a typical GNN layer updates a node $v$’s feature $\mathbf{h}_v$ by combining it with messages from each neighbor $u$ (possibly using trainable weights). Repeated layers allow information to propagate and mix over multi-hop neighborhoods, enabling the network to learn complex patterns over the graph topology[12].
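For concreteness, one representative instantiation of such a layer (one of many variants; we assume sum aggregation and a pointwise nonlinearity) is

$$\mathbf{h}_v^{(k+1)} = \sigma\!\Big(\mathbf{W}_0\,\mathbf{h}_v^{(k)} + \sum_{u \in \mathcal{N}(v)} \mathbf{W}_1\,\mathbf{h}_u^{(k)}\Big),$$

where $\mathcal{N}(v)$ denotes the neighbors of $v$, $\mathbf{W}_0$ and $\mathbf{W}_1$ are trainable weight matrices shared across nodes, and $\sigma$ is a nonlinearity such as ReLU.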
While early GNN research focused on static graphs, many real-world graphs are dynamic, evolving over time with changing node states, edges forming or dissolving, and new nodes appearing. This has led to the development of Temporal Graph Neural Networks (TGNNs), which incorporate temporal dynamics into the graph learning process[4]. There are several paradigms for TGNNs. Some models discretize time into intervals and use a sequence of graph snapshots, employing recurrent neural networks (RNNs) or temporal convolution to propagate information across time in addition to space[13]. For example, an RNN-based TGNN might update node embeddings at each time step based on the previous embedding and messages from neighbors at the current time[14]. Other TGNNs maintain a continuous-time representation, where events (like edge appearances) trigger updates (these include methods like temporal point process models and attention mechanisms over event history). Regardless of implementation, the goal is for the GNN to learn temporal dependencies – how a graph’s state at time t influences states at future time t+Δ – on top of the graph’s structural dependencies.
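To make the snapshot-based paradigm concrete, the following is a minimal sketch of an RNN-style TGNN cell (our illustration, with assumed names and tensor shapes, not a specific published architecture): each time step aggregates neighbor features spatially, then a shared GRU cell fuses the aggregated message with the node’s previous state.

```python
import torch
import torch.nn as nn

class RecurrentTGNNCell(nn.Module):
    """Sketch of a snapshot-based temporal GNN cell: spatial aggregation
    over the current graph snapshot, followed by a per-node GRU update."""

    def __init__(self, feat_dim: int, hidden_dim: int):
        super().__init__()
        self.msg = nn.Linear(feat_dim, hidden_dim)      # neighbor message transform
        self.cell = nn.GRUCell(hidden_dim, hidden_dim)  # temporal recurrence per node

    def forward(self, x_t, adj_t, h_prev):
        # x_t: (N, feat_dim) node features at time t
        # adj_t: (N, N) row-normalized adjacency of the snapshot at time t
        # h_prev: (N, hidden_dim) node states carried over from time t-1
        messages = adj_t @ self.msg(x_t)    # aggregate over spatial neighbors
        return self.cell(messages, h_prev)  # mix with each node's own history

# Usage over a sequence of snapshots:
# h = torch.zeros(num_nodes, hidden_dim)
# for x_t, adj_t in snapshots:
#     h = cell(x_t, adj_t, h)
```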
TGNNs have demonstrated state-of-the-art performance in many sequential prediction tasks. For instance, in traffic flow forecasting, where the road network is naturally a graph and traffic conditions evolve over time, TGNN models such as temporal graph convolution networks and attention-based graph networks significantly outperform non-graph models[4][5]. These models leverage the fact that traffic at one location affects downstream locations with some time lag, a relational dependency well captured by graph propagation. Similarly, in financial transaction networks, TGNNs can model the spread of risks or shocks through an economic network over time[5]. In social networks, temporal graph models capture how information or behaviors diffuse from person to person across the network, improving predictions of trends or outbreaks[5]. By capturing who influences whom and when, TGNNs inherently incorporate some notion of consistency – e.g. a shock in one part of the graph will reflect in connected parts with a learned delay, rather than each node fluctuating independently. However, it is important to note that standard TGNNs are still primarily predictive models trained to fit data; they do not by themselves guarantee adherence to causal or logical consistency constraints beyond what the data suggests.
One challenge that has been identified in recent work is that existing TGNNs, despite their success, can exhibit peculiar error patterns such as volatility clustering[15]. Volatility clustering means that periods of large prediction errors tend to be bunched in time – effectively, the model’s accuracy is temporally inconsistent, being good during some intervals and poor during others. A recent empirical study found that almost all existing TGNN models are prone to making temporally clustered errors even if their overall accuracy (measured by average precision or ROC) is high[16]. This insight suggests that typical training and evaluation of TGNNs might overlook when errors occur, focusing only on aggregate performance. New evaluation metrics have been proposed to address this, emphasizing the temporal distribution of errors and penalizing volatility clusters[9]. Those findings underline that achieving truly reliable temporal performance requires more than just capturing correlations in data; it needs robust mechanisms to avoid sudden lapses in predictive accuracy. This is a motivation for introducing causal knowledge, which can potentially prevent or mitigate such lapses by grounding the model’s predictions in stable relationships.
Defining Temporal Consistency
Before delving into causality, we clarify what we mean by temporal consistency in AI predictions. At a high level, a model’s predictions are temporally consistent if they change over time in a smooth, plausible manner, reflecting the actual evolution of the underlying system rather than artifacts of the model. The opposite of temporal consistency is a kind of temporal instability or jitter, where the outputs oscillate, jump, or contradict themselves from one time step to the next without a corresponding change in inputs or environment.
A concrete example comes from computer vision: imagine a video segmentation model that labels each pixel as foreground or background in each frame of a video. If the model is temporally inconsistent, an object might intermittently disappear and reappear in the predicted masks, or its label might flicker between classes across consecutive frames. Human observers notice such flicker immediately, as our visual system is highly sensitive to temporal inconsistencies[3]. In fact, in the context of video processing, temporal coherence is often considered as important as spatial accuracy of each frame[3]. Temporal consistency implies that if an object was present and unchanged between frame $t$ and frame $t+1$, the model should output a consistent segmentation for it in both frames – any change in the output should correspond to a real change (e.g. the object moved or disappeared) and not to random model fluctuations. Formally, one could define a temporal consistency metric that penalizes the model’s output differences between successive time steps (this could be as simple as the mean squared difference between predictions at $t$ and $t+1$, or more sophisticated measures that account for known dynamics). In video matting (estimating foreground opacity in video frames), for instance, the MESSDdt metric has been used to quantify temporal coherence, and models are evaluated on both per-frame accuracy and across-frame consistency[17][18].
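The simple version of this metric can be written down directly; the following sketch (our illustration, not the MESSDdt metric) computes the mean squared difference between consecutive predictions:

```python
import numpy as np

def temporal_jitter(preds: np.ndarray) -> float:
    """Mean squared difference between predictions at consecutive time
    steps. preds has shape (T, ...) with one prediction per step; lower
    values indicate smoother, more temporally consistent outputs."""
    diffs = np.diff(preds, axis=0)  # prediction change from t to t+1
    return float(np.mean(diffs ** 2))
```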
Another domain: time-series forecasting. Consider a model predicting daily temperature. If on Monday it predicts 20°C, on Tuesday 5°C, and on Wednesday 21°C, this would likely be deemed temporally inconsistent unless such a sharp cold spell truly happened. Real temperature changes tend to be gradual. A consistent forecasting model might incorporate a prior that temperatures don’t usually drop 15°C overnight barring a known weather front – it would need a strong causal reason (like a cold front variable) to make such a prediction. Temporal consistency here aligns with the concept of physical plausibility or continuity. In practical terms, many forecasting models enforce smoothness through regularization, such as penalizing second-order differences in the predicted sequence or using latent state space models that evolve smoothly.
Crucially, temporal consistency does not necessarily mean no change or flatness. A model can and should predict sharp changes when warranted – e.g. a sudden spike in network traffic due to a breaking news event, or an abrupt change in a patient’s vital signs if an intervention occurred. Rather, consistency means that changes in predictions should correspond to appropriate causes and should not occur in isolation. This perspective leads naturally to causal reasoning: if we can identify the causes of change, we can demand that predictions only change when their causes change. This is an underpinning idea of using causal models to enforce consistency.
In summary, temporal consistency in AI predictions refers to the smooth and coherent evolution of outputs over time, free from unnatural jumps. As Yin et al. (2023) put it in the context of video generation, “the changes in elements such as objects, scenes, lighting and motion should be smooth and coherent, without abrupt jumps or unnatural variations.”[19]. A lack of temporal consistency often manifests as flicker, jitter, or volatility in the predictions, which can degrade user experience and reliability. In the next section, we examine why conventional models struggle with this and how Graph Neural Networks, especially when augmented with causal logic, can help address the problem.
Limitations of Conventional Approaches and Motivation for Causality
Traditional sequence modeling techniques have some built-in mechanisms to handle temporal consistency, but they are not foolproof. For example, Recurrent Neural Networks (RNNs), including LSTMs and GRUs, maintain a hidden state that carries information over time. In theory, this should help smooth out predictions, since the state acts as memory and inertia. An RNN won’t forget the recent past instantly, so its outputs won’t change too abruptly unless forced by new evidence. However, in practice, RNNs and even Transformers can still produce inconsistent outputs if not explicitly trained or regularized to avoid it. They might oscillate between two plausible interpretations of the data in successive steps, especially in ambiguous scenarios (a phenomenon known as prediction chattering). Without an external constraint, the network might latch onto minor short-term patterns or noise, causing instabilities.
Another common approach to enforce temporal consistency is post-processing or regularization using techniques like optical flow or smoothing filters in video analysis. For instance, early video segmentation methods would take independent frame-wise predictions and then use optical flow to warp predictions from frame t to t+1, averaging them with the new prediction to smooth out differences[20]. While this can reduce flicker, it introduces a dependency on the quality of the optical flow estimation and typically cannot correct more subtle inconsistencies (if the model consistently mislabels something every other frame, optical flow averaging might just blur the error rather than eliminate it). Other methods simply add a penalty to the loss function for changes between consecutive outputs. This encourages smoother predictions but can also over-smooth and ignore legitimate rapid changes.
These approaches treat temporal consistency largely as a statistical property to enforce (smoothness), rather than deriving it from an understanding of the system’s dynamics. This is where causal modeling distinguishes itself. A causal model of a system aims to capture the true generating processes – how the state at time t causes the state at time t+1. If we had a perfect causal model, its predictions would be temporally consistent by construction, because it would only change output when a causal factor changes. While we rarely have a perfect model, we can inject pieces of causal reasoning into our AI systems to guide them. One way to do this is through structural causal models (SCMs) or causal graphs that define which variables influence others. By overlaying a causal graph onto a neural network’s architecture or training objective, we constrain the model to follow certain directional dependencies.
Recent studies underscore the value of causality for improving prediction stability. Cai et al. (2025) argue that traditional time-series models, including deep learning ones like RNNs and Transformers, “often fail to fully exploit causal relationships… constraining their generalization under distributional changes”[7]. In other words, a model that ignores causality may perform well on average but can break down when the temporal distribution shifts, because it might rely on spurious correlations that don’t hold under new conditions. By contrast, a model utilizing causal features is more invariant and stable as conditions evolve. This has direct implications for consistency: as an environment changes over time (say, due to seasonality or interventions), a causally-informed model will adapt its predictions in a stable way, whereas a non-causal model might exhibit sudden jumps or drops in accuracy.
Moreover, there is evidence that integrating temporal and causal information leads to better understanding of underlying dynamics. Temporal information (the ordering and co-evolution of variables) helps in discovering causal interactions, while causal information “can in turn facilitate the reliable prediction of future temporal states”[6]. This two-way street motivates methods that learn causal structure and predictive model jointly. For example, Lowe et al. (2022) learn to infer causal graphs from time-series data (amortized causal discovery) and use those graphs to improve predictions[21]. Such approaches explicitly aim to capture invariant causal mechanisms that govern the time evolution, thereby naturally enforcing consistency: the learned causal graph remains the backbone for predictions at all times, preventing the model from making inconsistent jumps that would violate the learned causal links.
In summary, while conventional deep sequential models and heuristic smoothing can partially address temporal consistency, they lack an explicit representation of why a prediction should or should not change over time. Causal Graph Neural Networks fill this gap by embedding a causal inductive bias into the model’s reasoning process. In the following sections, we delve deeper into how this is achieved, and we review concrete implementations and their outcomes in various domains.
Causal Graph Neural Networks: Integrating Causality into Temporal Modeling
Causality in Machine Learning: A Brief Overview
Causality, in the context of machine learning, refers to modeling the data-generating process in terms of cause and effect relationships rather than purely statistical associations. A classic representation is the Structural Causal Model (SCM), which consists of structural equations defining each variable as a function of its causal parents and an independent noise term. The structure of an SCM can be depicted as a directed graph (often a Directed Acyclic Graph, DAG) where arrows point from causes to effects. Unlike a mere Bayesian network that encodes conditional dependencies, a causal DAG implies interventional semantics: one can predict how intervening on a cause (forcing it to a certain value) will affect its descendants, using tools like do-calculus (Pearl’s do-operator).
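As a toy illustration (our example), consider a two-variable temporal SCM in which $X$ drives $Y$ with a one-step lag:

$$X_t := f_X(X_{t-1},\, N_{X,t}), \qquad Y_t := f_Y(X_{t-1},\, Y_{t-1},\, N_{Y,t}),$$

where $N_{X,t}$ and $N_{Y,t}$ are independent noise terms. Intervening with $do(X_t = x)$ changes the distribution of $Y_{t+1}$, but intervening on $Y_t$ leaves all future $X$ values untouched – an asymmetry that a purely associational model cannot express.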
Incorporating causality into machine learning models has several benefits. It enables reasoning about interventions and counterfactuals (e.g., “if X had not happened, would Y still have occurred?”), improves robustness to distribution shifts (because causal relations tend to remain invariant across different contexts, whereas spurious correlations do not), and often enhances interpretability (one can trace an outcome back to specific causal factors). In recent years, there has been a surge of interest in combining deep learning with causal inference, to get the best of both worlds – the flexibility and predictive power of deep nets with the reliability and insight of causal models[22][23].
In sequential decision or prediction problems, a key causal concept is that of temporal causality: earlier events can be causes of later events (but not vice versa, respecting time order). This is naturally aligned with time-series analysis where, for example, Granger causality is used to test if one time-series is useful in forecasting another. Graph neural networks offer a convenient platform to represent causal relationships among multiple entities evolving in time. Each node can represent a variable or an entity, and directed edges can represent causal influence. If these edges and their weights can be learned or incorporated, the GNN can propagate effects through the network in a way that mirrors the true causal propagation in the system.
One straightforward way to create a Causal GNN is: if one has a known causal graph (say from domain knowledge or discovery algorithms), use that as the adjacency in a GNN model. Then the GNN’s message passing will only allow information flow along the causal directions. This was the approach taken by Han et al. (2024) in their AI-driven Bayesian causal framework for COVID-19 analysis: they built a graph where nodes were features like PM2.5 pollution, public interventions, and infection rates, and edges reflected hypothesized causal links (e.g., pollution → infection, interventions → infection). They then used a GNN encoder to estimate time-varying strengths of these causal links, followed by an RNN decoder to predict infection rates[1][24]. The result was a model that “identifies time-varying causal relationships... using a graph neural network” and makes predictions based on those identified structures[1]. Notably, this model outperformed purely statistical baselines, highlighting that using causal structure improved predictive consistency and accuracy[2].
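The core of this recipe – restricting message passing to the edges of a known causal DAG – can be sketched in a few lines (our generic illustration, not Han et al.’s exact model; names and shapes are assumptions):

```python
import torch
import torch.nn as nn

class CausalMaskedGNNLayer(nn.Module):
    """Message passing restricted to a known causal graph: a node's update
    depends only on its hypothesized causal parents and its own state."""

    def __init__(self, dim: int, causal_adj: torch.Tensor):
        super().__init__()
        # causal_adj[i, j] = 1 iff j -> i is a hypothesized causal edge
        self.register_buffer("mask", causal_adj.float())
        self.weight = nn.Linear(dim, dim)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (N, dim) node states; non-causal influence is zeroed out
        parent_msgs = self.mask @ self.weight(h)  # sum over causal parents only
        return torch.relu(parent_msgs + h)        # residual keeps own state
```

Because the mask is fixed during training, no amount of gradient pressure can route information against the hypothesized causal directions.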
If a causal graph is not known a priori, another approach is to learn the causal graph concurrently with training the predictive model. Some works use differentiable relaxation of DAG constraints or regularizers to infer graph structure. Others use two-step procedures: first apply a causal discovery method (e.g., constraint-based or score-based algorithms) on the time-series data to get candidate causal links, then feed that graph into a GNN for prediction. An example is the work by Wang et al. (2023b) on energy load forecasting (RLF-MGNN model), where they used measures like transfer entropy to construct a graph of relationships between household electricity loads, then applied a multi-relational GNN to forecast future load[25]. By incorporating transfer entropy-based causal analysis, the model captured both linear and nonlinear influence between time series[25]. The result was improved forecasting accuracy, especially in capturing collective behavior across households, compared to models that treated each time series independently.
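A simplified two-step pipeline in this spirit might look as follows. We use a lagged-regression (Granger-style) score as a lightweight stand-in for transfer entropy, which would additionally capture nonlinear influence; all names and thresholds here are our assumptions:

```python
import numpy as np

def granger_score(x: np.ndarray, y: np.ndarray, lag: int = 2) -> float:
    """How much does the past of x reduce the error of predicting y,
    beyond y's own past? Returns a value in [0, 1); larger suggests
    stronger directed influence x -> y."""
    T = len(y)
    Y = y[lag:]
    own = np.array([y[t - lag:t] for t in range(lag, T)])
    both = np.array([np.r_[y[t - lag:t], x[t - lag:t]] for t in range(lag, T)])

    def rss(X):  # residual sum of squares of a least-squares fit
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        resid = Y - X @ beta
        return float(resid @ resid)

    return max(0.0, 1.0 - rss(both) / max(rss(own), 1e-12))

def build_causal_graph(series: np.ndarray, thresh: float = 0.05) -> np.ndarray:
    """series: (num_vars, T). Returns adjacency with adj[i, j] = 1 if the
    score suggests j influences i; this graph can then seed a GNN."""
    n = series.shape[0]
    adj = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and granger_score(series[j], series[i]) > thresh:
                adj[i, j] = 1.0
    return adj
```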
Architectural Elements of Causal GNNs for Temporal Data
A Causal Graph Neural Network for temporal consistency typically involves the following components:
- Causal Graph Structure: This is either given or learned. Nodes represent variables (which could be sensor readings, features, or higher-level concepts), and directed edges represent hypothesized causal influence. For dynamic systems, the causal graph might be time-invariant or could change over time (time-varying graphs). In a time-varying causal graph, the presence or strength of an edge can evolve, which allows modeling non-stationary relationships (for example, a public health intervention might only have a causal effect during a certain period of an epidemic, and later its effect diminishes).
- Graph Neural Network Encoder: The GNN uses the causal graph as its backbone for message passing. Essentially, at each time step, messages flow along the directed edges of the causal graph, updating node states. This means a node’s new state is some aggregation of its parents’ past states (causal parents) and possibly its own past state. Such an architecture inherently respects causality: information only flows from cause to effect, not vice versa, preventing the model from inadvertently using “leakage” of future information or descendant information when making predictions. By comparison, a standard TGNN might use an arbitrary learned adjacency or fully-connected attention, which could violate causal directions; a causal GNN restricts that connectivity to align with cause–effect directions.
- Temporal Updating Mechanism: Since we are dealing with time, the GNN might integrate with a temporal model. Some designs treat the causal graph as a static template and use a recurrent GNN (where each GNN layer corresponds to a time step). Others explicitly model time within the GNN edges; for instance, each edge might have a function that operates on time delays (some works use attention over recent history along each edge). An example is the Dynamic Causal GCN (DCGCN) by Chen et al. (2023), which was proposed to handle spatio-temporal data by updating a causal graph at each time step and using it for forecasting[26]. In their case, they integrated the frontdoor adjustment criterion from causal theory to handle confounders in bike-sharing flow predictions[26]. The framework would adjust the influence of, say, weather on bike usage by accounting for confounding variables via an intermediate (frontdoor) node, then propagate these adjusted causal effects through the GCN. The result was more reliable long-term predictions that were not skewed by confounders, hence more consistent when policies or conditions changed.
- Counterfactual or Intervention Module (optional): Some causal GNNs include mechanisms to perform do-interventions or simulate counterfactual scenarios internally. For example, a model might train by intervening on certain input nodes (setting them to certain values) and encouraging the output to change in a way consistent with known causal effects. This is essentially a form of data augmentation or regularization. In recommendation systems, a recently proposed causal GNN called CauTailReS (Causal Tailored Recommendation with reasoning) uses counterfactual reasoning to debias recommendations[27]. In that model, the GNN generates user–item interaction representations, and via do-calculus it removes the influence of popularity bias – effectively simulating a world where popularity of an item does not affect the recommendation. By training on these counterfactual scenarios, the recommender achieved more consistent and fair recommendations[27]. The “consistency embedding” learned in CauTailReS captures stable user interests unaffected by popularity, leading to recommendations that do not swing wildly with popularity trends[27]. This idea can be generalized: by injecting counterfactual logic into training (e.g., “if factor A were different, outcome should change in X way”), the model is guided to maintain consistency with respect to those causal rules.
- Loss Functions and Regularization: Beyond architecture, causal GNNs often employ loss terms that explicitly target temporal consistency. One example is the temporal difference loss used by Maystre et al. (2025) for incremental sequence classification[28]. They introduced a temporal-consistency condition derived from reinforcement learning’s temporal difference idea: successive predictions should not contradict each other, and they formulated a loss to penalize violations of this condition. By training with this loss, their sequence classifier yielded more consistent predictions as a sequence unfolded[28]. In a causal GNN context, one might penalize the model if it produces an effect change without a corresponding cause change. This could be done by monitoring the latent states: for each causal edge $U \to V$, if $U$’s state hasn’t changed but $V$’s state has significantly, that could be penalized unless explained by another parent of $V$. Implementing such losses is non-trivial but conceptually aligns with enforcing causal consistency; a minimal sketch of a temporal-difference consistency loss is given after this list.
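The promised sketch (our simplified rendering of the temporal-difference idea, not Maystre et al.’s exact objective) treats the better-informed prediction at step $t+1$ as a soft target for the prediction at step $t$:

```python
import torch
import torch.nn.functional as F

def td_consistency_loss(logits: torch.Tensor) -> torch.Tensor:
    """logits: (T, num_classes), one prediction per prefix of a sequence.
    Penalizes each prediction for contradicting the next, more informed
    one; the later prediction is detached so it serves only as a target."""
    later = F.softmax(logits[1:], dim=-1).detach()  # targets from step t+1
    earlier = F.log_softmax(logits[:-1], dim=-1)    # predictions at step t
    return F.kl_div(earlier, later, reduction="batchmean")
```

Added to the main task loss with a small coefficient, this term discourages chattering between interpretations as a sequence unfolds, without forbidding genuine revisions driven by new evidence.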
In practice, not every model will incorporate all these elements; the design depends on the problem. Some might use a known causal graph but not do counterfactual training; others might focus on discovery of a causal structure. What unites them under the banner of Causal Graph Neural Networks is that they use graph-based neural computation constrained or guided by causal principles, to improve the temporal logic of predictions.
How Causal GNNs Improve Temporal Consistency
The central question we address is: why and how do causal GNNs lead to more temporally consistent predictions? There are several intuitive and evidenced reasons:
- Preventing Spurious Fluctuations: In a non-causal model, especially one that uses attention or fully connected layers, the model might pick up any correlation to inform its prediction, including transient ones. This can lead to prediction volatility when those correlations change. A causal GNN, by contrast, restricts influences to those that make structural sense. For example, if variable $X$ is not a cause of $Y$ in the graph, changes in $X$ won’t directly disturb $Y$’s prediction. This avoids situations where an unrelated fluctuation in $X$ inadvertently causes $Y$’s output to wiggle. Empirical evidence for this comes from a study by Han et al. (2024): they noted that graph-based models which “utilized the causal structure” had generally lower error and more stable predictions compared to regression models that treat everything as independent inputs[29]. The causal GNN could handle nonlinear relations and only propagate meaningful changes, achieving a significantly lower SMAPE error (0.204) than baselines[30]. The improvement was attributed to the model’s use of domain-specific causal regularization and time-varying graph adaptation[30] – essentially, it wasn’t fooled by coincidental alignments in the data that a black-box deep net might overfit.
- Enforcing Change-Only-When-Cause-Changes: By its very nature, a causal model implies that effect nodes remain stable unless a parent node (cause) changes or an exogenous noise term triggers change. If the causal GNN has learned an accurate representation, it will tend to keep outputs steady until an input or a driving factor deviates. For instance, consider a causal GNN for weather where temperature is causally influenced by time of day and weather conditions. As long as those factors don’t shift dramatically, the model will keep predicting a gradually varying temperature (consistent with diurnal cycle). It won’t suddenly spike downward unless some causal input (like a cold front indicator) appears. In other words, the invariance of causal mechanisms across time leads to invariance in predictions unless a mechanism is activated by a new input. This idea is closely related to invariant risk minimization and causal stability in ML, which aim for models that rely on invariant (causal) features and thus remain stable across different environments or time periods[7]. A causal GNN naturally embodies this – it propagates invariant causal effects rather than ephemeral correlations, yielding predictions that remain consistent under temporal shifts or interventions.
- Localized Error Correction via Graph Propagation: Graph neural networks have a unique ability to correct local errors by exchanging information with neighbors. In a video consistency GNN (like Wang et al.’s CRGNN for video matting), frames are nodes and edges connect adjacent frames[31]. If one frame’s prediction is an outlier (say, the model accidentally mislabeled a region), the GNN can “pull” information from neighboring frames (which likely have that region labeled correctly due to temporal continuity) and correct the anomaly[32]. In effect, the GNN enforces a consensus or smoothness among adjacent time steps, much like a Conditional Random Field (CRF) would, but learned end-to-end. Wang et al. (2021) demonstrated that their Consistency-Regularized GNN dramatically reduced flicker in video matting results by allowing frames to inform each other, compared to an image-by-image approach[20][33]. The GNN treated each frame’s alpha matte as a node feature and built a fully-connected graph between a few neighboring frames, enabling the model to correct inconsistent predictions by leveraging the redundancy that objects don’t change too quickly from frame to frame[31]. The result was much smoother alpha mattes over time, confirming that graph-based propagation of information is a powerful way to enforce temporal coherence.
- Time-Varying Causal Adaptation: Some causal GNNs explicitly handle the fact that causal relationships themselves might evolve (e.g., seasonal changes, regime shifts). By updating the graph structure or weights over time, the model can remain consistent within each regime. For example, a time-varying causal GNN was used to model COVID-19 waves, where the causal impact of interventions like lockdowns changed from the first wave to later waves[34][24]. The model identified that school closures were most effective early on, whereas later public transport closures had more effect[2]. By allowing causal link strengths to vary, the GNN produced predictions that were consistent with the current causal regime and didn’t overshoot or lag when those regime changes happened. This adaptability means the model avoids inconsistency that would arise if it stuck with an outdated causal assumption. Han et al.’s framework encoded this via an encoder–decoder: the GNN encoder captured the current causal graph at each time step, and the RNN decoder used it for prediction[1]. Because the encoder was Bayesian, it could regularize sudden changes in the graph (temporal smoothing on the graph parameters), ensuring the causal structure itself changed gradually and only as needed. The outcome was a lower prediction error than static models, indicating better alignment with the true evolving system and thus more stable predictions[30][35].
To put it succinctly, causal GNNs improve temporal consistency by aligning the model’s internal representation with the true causal structure of the temporal process. This alignment acts as a constraint that prevents arbitrary changes in outputs, focusing the model on meaningful changes. The benefit is twofold: the predictions become more reliable (as measured by error metrics) and more interpretable (one can trace which cause led to a given change in prediction). In domains like finance or healthcare, this is extremely valuable – not only is a stable prediction trajectory important, but understanding the trigger for a change (e.g., “X caused the risk level to rise”) adds trust and transparency.
Examples and Case Studies
Let us now look at specific peer-reviewed studies and projects that implemented causal GNN ideas to achieve temporal consistency, illustrating the concepts discussed.
- Video Frame Consistency (Wang et al., 2021): As mentioned, Wang and colleagues proposed a model for video matting that uses a graph neural network to enforce temporal coherence across video frames[32]. Each video frame’s predicted alpha mask is a node in a graph, and edges connect each frame to its neighboring frames in time. They call their method CR-GNN (Consistency-Regularized GNN). During training, they also employ a consistency regularization trick: they composite frames with different backgrounds to ensure the model learns to maintain the same alpha matte even when appearance changes, thereby focusing on true object opacity which is invariant[33]. This can be seen as a causal intervention: the background is changed (intervened) but the foreground alpha should remain the same – enforcing the idea that background is not a cause of the alpha matte. By training the GNN with this principle, CR-GNN achieved superior temporal stability. Quantitatively, their method was evaluated with a temporal coherence metric and outperformed prior methods, showing dramatically reduced jitter[17][36]. Qualitatively, as shown in their results, previous approaches would produce flickering in hair details frame-to-frame, whereas CR-GNN kept those details consistently tracked over time[37]. This example encapsulates how a graph (frames connected over time) plus a causal consistency idea (background change should not affect foreground prediction) yields improved temporal consistency in a vision task.
- Brain Connectivity and fMRI (Wein et al., 2021): Wein et al. introduced a GNN framework for inferring causal interactions in brain networks using fMRI time-series data[10]. Here, brain regions are nodes, structural connections (from DTI scans) form the base graph, and the GNN learns functional interactions (influences of neural activity) on top of that structure. This is a form of causal GNN in that the structural brain network is taken as the causal skeleton, and the model tries to determine directed influences (effective connectivity) between regions[12][38]. They compare the GNN’s ability to replicate observed neural activity sequences with that of a traditional Vector Autoregression (VAR) model used in Granger causality analysis[11]. The GNN was found to capture long-term dependencies better and scale to large brain networks more feasibly than VAR[39]. By leveraging the known anatomical graph (causal prior) and the flexibility of deep learning, the GNN provided a more consistent explanation of brain activity over time, even generalizing across different scans and subjects[40]. Importantly, the features learned by the GNN were shown to correspond to meaningful causal connectivity patterns, aligning with known neuroscience, which adds confidence that the predictions are not only consistent but causally grounded[41]. This case demonstrates how injecting a physical causal graph (brain structure) into a GNN leads to consistent tracking of a complex dynamical system (brain function) over time.
- Epidemiological Forecasting (Han et al., 2024): In the context of COVID-19 infection rate prediction, Han and colleagues developed a Bayesian causal deep learning framework that explicitly models causes (like pollution and interventions) and effects (infection rates) with a graph neural network encoder[34]. The GNN part of their model learns the time-varying causal relationships – for example, how the effect of PM2.5 on infections might strengthen or weaken in different pandemic phases[34][24]. The decoder, an RNN, then uses these learned causal links to make temporally coherent predictions of infection rates. Because the model knows which factors truly drive infection changes, it can adjust predictions when those factors change and otherwise keep predictions steady when they don’t. The empirical results were striking: their causal GNN-based model outperformed all compared baselines (including standard deep LSTM and transformer models) in terms of predictive error[2]. Moreover, it yielded sensible interpretations – e.g., identifying that early on, school closures strongly curbed infections, whereas later, changes in public transport usage were more influential[2]. This aligns with historical policy effectiveness and demonstrates temporal consistency: the model’s attributions and predictions changed over time in a gradual, explainable way (from one dominant cause to another), rather than abruptly or opaquely. It’s a prime example of how causality helps a model “adapt based on pandemic phases”, maintaining consistency with the evolving reality[2]. Without the causal component, a purely statistical model might have either missed the phase shift (thus making errors) or overreacted to short-term trends (causing inconsistency).
- Traffic Flow Prediction: Traffic networks have been a fertile ground for graph-based deep learning, and researchers have begun infusing causal reasoning here as well. Gu and Deng (2022) proposed STAGCN (Spatial-Temporal Attention Graph Convolution Network) for traffic forecasting, which implicitly captures causal interactions by separating global and local traffic dynamics[8]. STAGCN’s architecture assumes that some traffic patterns are due to a global effect (e.g., overall rush hour trends) while others are local (accidents or events affecting a specific area). By attending to these separately, the model in effect disentangles different causal sources of traffic variation. It was noted that STAGCN assumes no direct interaction between static and dynamic parts of the graph, simplifying the causal structure[8]. Zhao et al. (2022) took a more explicit causal approach with STCGAT, a spatio-temporal causal graph attention network[42]. They introduced what they term “causal convolutions” to capture how local changes cause broader traffic effects, and they modeled both local and global dependencies with attention[8]. The addition of “causal” in the network’s design indicates that the model was crafted to pay attention only to relevant causes for predicting a target road’s traffic. Empirical results from these works show improved forecasting accuracy and temporal smoothness of predictions – for example, fewer sudden prediction errors when traffic flow transitions between states, thanks to the model understanding the causal lag (like how a congestion upstream causes a downstream slowdown after some delay). The key idea is that by learning the directional influence of traffic flows (which road segments influence others), these GNNs produce predictions that respect those dynamics and thus change in a consistent manner when, say, a congestion propagates. Without causal structure, a model might mis-predict timing or oscillate in its guess of where congestion will hit next; with it, the propagation is more accurately and consistently tracked[8]. These examples from traffic highlight how causal GNNs manage both spatial consistency (across the network) and temporal consistency (over time) by embedding transportation domain knowledge into their graph structure.
- Climate and Environmental Systems: Climate systems often involve complex causal interactions (e.g., ocean temperature → weather patterns, or greenhouse gases → temperature trends). A recent work by Zhang et al. (2023) introduced ResGraphNet for global temperature forecasting, which uses GraphSAGE (a GNN variant) along with a residual network to capture causal relations between climate indicators[43]. Each climate variable (like temperature at different locations, ocean indices, etc.) is a node, and edges represent physical influence (like atmospheric or oceanic currents). They found that while their causal GNN (ResGraphNet) improved accuracy, it had higher training time cost than a simpler graph model[44]. This underscores a general trade-off: adding causal structure can increase model complexity. Nonetheless, the benefit was a more causally faithful representation of climate dynamics, leading to forecasts that were consistent with known phenomena (e.g., correctly predicting that if an El Niño event occurs, certain regional temperatures will rise with a lag). The consistency here is that the model’s predictions over a season changed in accordance with causal events (El Niño onset), rather than jittering based on short-term noise. Another study (Cai et al., 2025, introducing CReP) demonstrated on simulation and real datasets that explicitly separating cause-related and effect-related components of time-series leads to “robust forecasting accuracy and reliable causal insights”[45][46]. Their model produced not just accurate predictions but ones that remain stable under interventions, meaning if you forcibly change a cause input in the model, the output changes accordingly and not otherwise. This property is exactly what one wants for temporal consistency: if no cause changes, the prediction should remain stable (no unexpected drift).
These case studies collectively show that causal GNNs provide tangible improvements in making AI predictions more temporally consistent and credible across diverse fields. From videos to traffic to epidemics, embedding the right causal graph or learning it on the fly helps the model avoid nonsensical fluctuations and adhere to the true temporal logic of the underlying system.
Challenges and Future Directions
While the fusion of causal inference and graph neural networks is promising for achieving temporal consistency, it also introduces new challenges and open questions. In this section, we discuss some of the key challenges, along with potential directions for future research.
Causal Graph Discovery and Validation
One fundamental challenge is discovering the causal graph itself (when not given by domain knowledge). Learning a DAG from data is notoriously difficult, especially in high dimensions and with temporal feedback loops. In time-series, algorithms like Granger causality, transfer entropy, or PCMCI (Peter and Clark Momentary Conditional Independence) can suggest candidate links, but they may suffer false positives/negatives due to limited data or nonstationarity[47][48]. Causal GNN approaches that rely on a learned graph might be sensitive to errors in that graph. A spurious edge could allow undesired influence and compromise consistency; a missing edge could make the model ignorant of an important cause, leading to delayed or incorrect response in predictions. Ensuring the reliability of learned causal structures is thus crucial. One future direction is to integrate uncertainty estimation for causal links – for example, using Bayesian methods to obtain a distribution over possible graphs. Han et al. (2024) took a Bayesian approach in their encoder, which enabled them to express uncertainty in the causal strength of each link over time[34]. Expanding on this, an ensemble of causal GNNs or a posterior distribution over graphs could make the model more robust to any single graph’s error.
Validating causal graphs inferred by GNNs is another issue. It often requires external knowledge or experiments. One idea is to design experiments in simulation: use known simulation environments (like physics engines or epidemic simulators) where the true causal graph is known, and see if the GNN learns it. This can guide improvements in algorithms. Additionally, hybrid approaches that combine domain heuristics with data-driven discovery might strike a good balance. For instance, in traffic networks, one might know that only geographically connected roads can have causal influence (so you restrict edges to those, reducing search space) – then learn the strengths of those influences. Such informed initialization can improve learning and consistency.
Non-Stationarity and Temporal Variability
As noted, causal relationships can change over time. Handling non-stationary or evolving causal graphs is challenging because it adds a layer of complexity: the model must detect when a change has occurred (concept drift) and adjust accordingly. Time-varying causal GNNs (like the COVID example) show it’s feasible, but generally this may require a lot of data to distinguish a genuine change in causal mechanism from noise. Change-point detection methods or meta-learning could be applied: the model might have a meta-network that learns when to rewire the graph. This area is ripe for exploration – e.g., continual learning techniques for GNNs that update the causal graph incrementally as new data comes, without forgetting past important relations. Ensuring temporal consistency during and after a change is tricky; one doesn’t want the model to oscillate if the causal effect oscillates seasonally, for example. A promising direction is to impose smoothness in the graph parameter space (like a penalty on how fast edge weights can change over time) to avoid overly abrupt shifts[49]. Han et al. did something along these lines by using priors that encouraged gradual change in causal influence[35]. Future models might incorporate explicit constraints like “edge weight can’t change more than X per week” if justified by domain knowledge.
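Such a constraint is straightforward to express as a regularizer; the following sketch (hypothetical, not a specific published loss) penalizes only the portion of edge-weight drift that exceeds an allowed per-step budget:

```python
import torch

def edge_drift_penalty(edge_weights: torch.Tensor, max_step: float = 0.1) -> torch.Tensor:
    """edge_weights: (T, E), one weight per causal edge per time step.
    Drift within the budget is free; only the excess is penalized, so the
    learned graph can adapt to regime changes but not oscillate."""
    step = (edge_weights[1:] - edge_weights[:-1]).abs()
    return torch.clamp(step - max_step, min=0.0).sum()
```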
Computational Complexity
Adding causal structure and especially learning it can be computationally expensive. Some of the methods we discussed have higher training times (e.g., ResGraphNet’s training time was significantly higher than simpler models[44]). The use of attention mechanisms, recurrent layers, and the need to potentially search over graph structures can all contribute to complexity. For large graphs (many nodes), naive causal discovery is combinatorial. Scaling causal GNNs to very large systems (say, thousands of variables) is an open challenge. Approximate methods, such as using sparsity regularization to keep the learned graph sparse, or clustering variables into groups to reduce dimensionality, might be necessary. Another technique could be to use Graph Coarsening: learn causal relations at a coarse level (like clusters of nodes) and then refine within clusters. Also, parallelization and efficient graph algorithms can help – since GNNs often allow parallel message computations, one could distribute computations across time steps or subgraphs.
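As one example of the sparsity route, an L1-style penalty on edge probabilities keeps a learned graph small enough to search and propagate over (a sketch under our own naming assumptions):

```python
import torch

def sparse_graph_penalty(adj_logits: torch.Tensor, coef: float = 1e-3) -> torch.Tensor:
    """adj_logits: (N, N) unnormalized edge scores for a learned graph.
    Pushing edge probabilities toward zero prunes spurious influence
    paths and reduces message-passing cost on large graphs."""
    return coef * torch.sigmoid(adj_logits).sum()
```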
On the inference side, evaluating a causal GNN might also be slower if it requires simulating many interventions to produce an output distribution (in Bayesian settings). However, for many applications, the improved consistency might justify some overhead if it prevents critical errors. Still, for real-time systems like autonomous driving or streaming analytics, optimizing these models is important. Research into pruning causal models – identifying which causal links are truly essential for prediction – could lead to lighter models that run faster but retain consistency guarantees.
Evaluation Metrics for Temporal Consistency
Traditionally, model evaluation focuses on aggregate accuracy (MSE, MAE, etc. over time). As we’ve emphasized, these can miss temporal inconsistency issues[9]. There is a need for widely accepted metrics that specifically quantify temporal consistency or the stability of predictions. Some metrics have been domain-specific (e.g., the temporal coherence metric in video matting[17], or volatility cluster metrics in TGNN evaluation[9]). A challenge is to create a general metric that can be applied to any sequential prediction. One idea is to measure autocorrelation of the error: in a consistent model, errors should not cluster, so the error sequence ideally is uncorrelated over time (apart from any inherent pattern in the ground truth). A high autocorrelation in errors at lag 1 or 2 might indicate volatility clustering. Su and Wu (2024) proposed looking at the distribution of errors to identify volatility bursts[50][9]. They even introduced a training objective to minimize such bursts (Volatility-Cluster-Aware learning)[16][51]. Future evaluations could incorporate this by default: e.g., reporting the longest run of consecutive high errors, or the variance of errors within a sliding window, as a measure of consistency. For causal GNNs, another interesting evaluation is consistency under interventions: apply an intervention in time (like remove a cause suddenly) and see if the model’s output responds in a monotonic, expected way rather than erratically. This tests both causal correctness and temporal behavior.
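The autocorrelation probe mentioned above is simple to compute; this sketch (our illustration, not Su and Wu’s exact metric) flags volatility clustering when per-step errors correlate with their own recent past:

```python
import numpy as np

def error_autocorrelation(errors: np.ndarray, lag: int = 1) -> float:
    """Lag-k autocorrelation of the error sequence. Values near 0 suggest
    temporally well-spread errors; large positive values suggest errors
    bunch together in time (volatility clustering)."""
    e = errors - errors.mean()
    denom = float(e @ e)
    return float(e[:-lag] @ e[lag:]) / denom if denom > 0 else 0.0
```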
Combining Model-Based and Learning-Based Approaches
An exciting direction is to blend model-based approaches (differential-equation models, physics-based simulations) with learning-based causal GNNs. For example, one could derive a graph of interactions from first principles (e.g., power-grid connections or epidemiological compartment models) and then use a GNN to learn the magnitudes and nonlinear details of those interactions. Such a hybrid ensures basic consistency – the model cannot violate known physical causal relations – while still fitting complex data patterns. It also allows injecting domain-specific invariances, such as conservation laws or symmetries, which automatically enforce certain consistencies over time. Neural ordinary differential equations (neural ODEs) and their graph variants are a related area, in which continuous-time dynamics are learned; ensuring those respect causality could yield continuous-time causal GNNs that are stable by design, since they would integrate a possibly stiff but stable ODE.
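One simple way to realize this hybrid is to fix the connectivity from domain knowledge and learn only the interaction magnitudes. The PyTorch sketch below is hypothetical (the class `MaskedCausalLayer` and its details are our own, not from any cited work): messages flow only along edges permitted by a first-principles mask, while the edge strengths are fitted to data.

```python
import torch
import torch.nn as nn

class MaskedCausalLayer(nn.Module):
    """Message-passing layer whose connectivity is fixed by a domain-derived
    adjacency mask (e.g., power-grid topology), while interaction magnitudes
    are learned from data."""

    def __init__(self, mask: torch.Tensor, dim: int):
        super().__init__()
        self.register_buffer("mask", mask)                   # (N, N) 0/1, from first principles
        self.weight = nn.Parameter(0.01 * torch.randn(mask.shape))  # learned magnitudes
        self.update = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, dim) node states; zero out physically impossible links
        adj = self.mask * self.weight
        messages = adj @ x                                   # aggregate along known edges only
        return x + torch.tanh(self.update(messages))         # residual, bounded update

# Usage with a toy 3-node chain A -> B -> C (row i receives from column j):
# mask = torch.tensor([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.]])
# layer = MaskedCausalLayer(mask, dim=16)
# out = layer(torch.randn(3, 16))
```

Because the mask is a fixed buffer rather than a parameter, gradient descent can never introduce an edge that the domain model forbids, which is precisely the basic-consistency guarantee described above.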
Interpretability and Trust
One of the selling points of causal models is interpretability. Causal GNNs often yield interpretable parameters (e.g., an edge weight reflecting a causal influence strength). Exploiting this could build user trust in model predictions: users are more likely to trust a sequence of predictions if they understand why the model is changing its output. Providing a causal explanation (like “The model predicts a spike tomorrow because variable X, known to cause Y, increased significantly today”) can justify the prediction sequence. Future work could integrate explanation methods with causal GNNs, perhaps translating the graph and node states into natural language or human-understandable rules. This goes hand in hand with consistency: a model that can explain its changes is typically one that won’t make inexplicable changes. In safety-critical applications, having this clarity is paramount.
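As a toy illustration of turning learned causal edges into an explanation of the kind just described, consider the following sketch. It assumes a linear contribution model (edge weight times the recent change in the cause), which is a deliberate simplification; the function name and data structures are hypothetical.

```python
def explain_change(edge_weights: dict, deltas: dict, target: str, top_k: int = 3) -> str:
    """Render a short textual rationale for a change in `target`'s prediction
    by ranking its incoming causal edges by |weight * recent change in cause|.

    edge_weights: dict mapping (cause, effect) pairs to learned edge weights
    deltas: dict mapping variable names to their change since the last step
    """
    contribs = sorted(
        ((cause, w * deltas.get(cause, 0.0))
         for (cause, effect), w in edge_weights.items() if effect == target),
        key=lambda c: abs(c[1]), reverse=True,
    )[:top_k]
    lines = [f"{cause} changed, contributing {impact:+.2f} to {target}"
             for cause, impact in contribs]
    return "; ".join(lines) or f"no known cause of {target} changed"

# Example: 'PM2.5' is modeled as a cause of 'infection_rate' with weight 0.8.
# Prints: PM2.5 changed, contributing +1.20 to infection_rate
print(explain_change({("PM2.5", "infection_rate"): 0.8},
                     {"PM2.5": 1.5}, "infection_rate"))
```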
Application to Decision Making and Control
Thus far we focused on prediction, but many AI systems loop predictions into decisions (e.g., in control systems or reinforcement learning). Temporal consistency is equally important there (often termed stability or smooth control). Causal GNNs could be used in model-based RL to ensure the agent’s value function or policy doesn’t oscillate unpredictably. By modeling the environment with a causal GNN, an agent could simulate interventions and see stable outcomes, leading to more stable policies. This is speculative but an interesting frontier: causal reinforcement learning with graph-based world models that maintain consistent dynamics across training and deployment.
In conclusion, while causal graph neural networks offer clear advantages for temporal consistency, realizing their full potential requires overcoming issues in graph learning, computational tractability, and evaluation. The field is evolving rapidly. The intersection of causality and deep learning – sometimes dubbed "causal representation learning" – is addressing some of these challenges by seeking representations that capture causal factors of variation[45][46]. Graphs are a natural way to represent those factors and their interactions. As research progresses, we expect more robust frameworks that can learn causal graphs on the fly, adapt to changes, and provide strong guarantees of consistency and reliability.
Conclusion
AI systems that operate in the time domain must produce not only accurate predictions, but also temporally consistent ones. Inconsistent predictions – those that jump or oscillate without reason – can undermine the usefulness of AI in critical applications, causing anything from user annoyance (e.g., flickering video filters) to serious consequences (e.g., an autonomous car misjudging a situation due to sensor prediction jitter). This paper has examined how the integration of causal reasoning with graph neural networks provides a powerful remedy to this issue. By leveraging causal graphs, whether given or learned, a model gains an understanding of the underlying why behind temporal changes, enabling it to distinguish meaningful signals from noise.
We began by reviewing Graph Neural Networks and their extension to temporal problems, noting the successes and also the limitations (such as clustered errors and volatility) of purely correlation-based temporal models. We then defined temporal consistency and highlighted why it is essential. The notion that predictions should only change when there is a cause for change emerged as a guiding principle – one that is naturally enforced in causal models.
Delving into Causal Graph Neural Networks, we discussed various mechanisms: encoding known causal structures in GNNs, learning causal relations through data-driven methods, and incorporating these relations into the network's message-passing and update rules. In practice, causal GNNs often use specialized architectures or loss functions to ensure that effect variables remain stable unless their causes compel them to change. This paradigm was exemplified in case studies across multiple domains. From the smooth video mattes produced by consistency-regularized GNNs[32], to the steady and explainable forecasts in epidemiology[1][2], to robust traffic and climate predictions, the evidence strongly suggests that *introducing causal structure greatly enhances the temporal coherence of model outputs*.
One recurring theme is that causal GNNs do not merely smooth out predictions arbitrarily – they maintain the ability to respond to real changes. This is a critical point: consistency does not mean inertia in the face of genuine shifts. Rather, it means the model’s sensitivity is aligned with actual causal drivers. For example, when an important input or environmental factor changes, a causal GNN might even respond faster or more sharply than a black-box model (because it knows the change’s significance), but it won’t respond to random fluctuations in unconnected variables. This intelligent filtering is what yields consistency that is compatible with reactivity.
We also confronted the challenges in this approach. Learning causal graphs remains a difficult problem, and causal GNNs can be complex and computationally heavy. We discussed the need for improved methods to learn and validate causal structures, handle evolving dynamics, and measure consistency. These challenges represent opportunities for future work. If they are addressed, the reward will be AI models that are not only accurate and interpretable but also reliable over time – a cornerstone for trust in AI.
In closing, we emphasize that temporal consistency should be regarded as a first-class performance criterion for AI systems dealing with sequential data. Causal Graph Neural Networks offer a promising route to achieving this consistency by building models that respect the natural laws or logical rules governing a system's evolution. This resonates with a broader trend in AI: moving away from purely statistical pattern-matching towards models that capture the structure of the real world, including its causal structure. By doing so, we obtain models that are more robust when the world throws curveballs, more interpretable when we seek explanations, and more stable in their predictions from moment to moment – qualities that are indispensable as AI systems become pervasive in our dynamic, ever-changing lives.
References (APA 7th Edition)
- Cai, S., Peng, H., Liu, R., & Chen, P. (2025). Causal-oriented representation learning for time-series forecasting based on the spatiotemporal information transformation. Communications Physics, 8(1), Article 242. https://doi.org/10.1038/s42005-025-02170-6 [45][46]
- Gu, Y., & Deng, L. (2022). STAGCN: Spatial–temporal attention graph convolution network for traffic forecasting. Mathematics, 10(9), 1599. https://doi.org/10.3390/math10091599 [52][8]
- Han, Y., Lam, J. C. K., Li, V. O. K., & Crowcroft, J. (2024). Interpretable AI-driven causal inference to uncover the time-varying effects of PM₂.₅ and public health interventions on COVID-19 infection rates. Humanities and Social Sciences Communications, 11(1), Article 1713. https://doi.org/10.1057/s41599-024-04202-y [34][24]
- Maystre, L., Barello, G., Berariu, T., Cambray, A., Dolga, R., Gonzalez, A. O., Nica, A., & Barber, D. (2025). Incremental sequence classification with temporal consistency. arXiv preprint arXiv:2505.16548. [28]
- Su, J., & Wu, S. (2024). Temporal-aware evaluation and learning for temporal graph neural networks. arXiv preprint arXiv:2412.07273. [15][9]
- Wang, T., Liu, S., Tian, Y., Li, K., & Yang, M.-H. (2021). Video matting via consistency-regularized graph neural networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV 2021) (pp. 4902–4912). IEEE. [32][3]
- Wang, Y., Zhang, S., Zhou, B., & Wang, B. (2022). STCGAT: A spatio-temporal causal graph attention network for traffic flow prediction in intelligent transportation systems. arXiv preprint arXiv:2203.10749. [42][8]
- Wein, S., Malloni, W. M., Tomé, A. M., Frank, S. M., Henze, G.-I., Wüst, S., Greenlee, M. W., & Lang, E. W. (2021). A graph neural network framework for causal inference in brain networks. Scientific Reports, 11(1), 8061. https://doi.org/10.1038/s41598-021-87411-8 [10][11]
- Yin, Z., Chen, K., Bai, X., Jiang, R., Li, J., Li, H., & Zhang, M. (2025). A survey: Spatiotemporal consistency in video generation. arXiv preprint arXiv:2502.17863. [19]
Bracketed citation numbers used in the text map to the following sources:
- [1] [2] [24] [29] [30] [34] [35] [49] Interpretable AI-driven causal inference to uncover the time-varying effects of PM2.5 and public health interventions on COVID-19 infection rates (Humanities and Social Sciences Communications)
- [3] [17] [18] [20] [31] [32] [33] [36] [37] Video Matting via Consistency-Regularized Graph Neural Networks (ICCV 2021)
- [4] [5] [9] [13] [14] [15] [16] [50] [51] Temporal-Aware Evaluation and Learning for Temporal Graph Neural Networks. https://arxiv.org/html/2412.07273v2
- [6] [7] [21] [22] [23] [45] [46] [47] [48] Causal-oriented representation learning for time-series forecasting based on the spatiotemporal information transformation (Communications Physics)
- [8] [25] [26] [27] [42] [43] [44] [52] Exploring Causal Learning through Graph Neural Networks: An In-depth Review. https://arxiv.org/html/2311.14994
- [10] [11] [12] [38] [39] [40] [41] A graph neural network framework for causal inference in brain networks (Scientific Reports)
- [19] A Survey: Spatiotemporal Consistency in Video Generation. https://arxiv.org/html/2502.17863v1
- [28] Incremental Sequence Classification with Temporal Consistency (arXiv:2505.16548)