Raw-fi Data

Google Analytics for Streamlit application

2024-10-14

Streamlit and Google Analytics

In the previous article, we deployed Streamlit on our server. Next, we will probably want to add Google Analytics (GA) to it so that we can analyze access to the application.

At first, I tried to output the Google Analytics code snippet from within the Streamlit application by calling st.markdown(ga_code, unsafe_allow_html=True), but the snippet did not appear in the rendered page.

After that, I found the solution below. https://stackoverflow.com/questions/76034389/google-analytics-is-not-working-on-streamlit-application

So, I tried applying this solution at the moment the Docker container starts.

How to apply it

As explained on the Stack Overflow page above, this solution involves directly modifying the Streamlit source code and embedding the GA code snippet in it. As explained last time, we deploy Streamlit as a Docker container. Therefore, we edit the Streamlit source code for GA when the container starts.
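
Before editing anything, it can help to confirm where the file to be modified lives. The following is a minimal sketch, assuming that Streamlit keeps its page template at static/index.html inside the installed package (as the Stack Overflow answer does). The helper name find_static_index is hypothetical and not part of Streamlit.

```python
import importlib.util
import pathlib
from typing import Optional


def find_static_index(package: str = "streamlit") -> Optional[pathlib.Path]:
    """Return the expected path of <package>/static/index.html.

    Returns None when the package is not installed. The file itself is
    not checked for existence here.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return None
    # spec.origin points at the package's __init__.py; index.html is assumed
    # to sit in the "static" directory next to it.
    return pathlib.Path(spec.origin).parent / "static" / "index.html"
```

Calling find_static_index() inside the container and checking the result with .exists() shows whether the assumed layout matches the installed Streamlit version.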

First, we need to install beautifulsoup4 and lxml, which are used to edit Streamlit's index.html.

requirements.txt:

(...snip...)
beautifulsoup4==4.12.3
lxml==5.3.0

Next, add a Python script named setup_ga.py. (Please replace "G-xxxxxxxxxx" with your own GA ID.)

import pathlib
from bs4 import BeautifulSoup
import logging
import shutil
import streamlit as st


def inject_ga():
    GA_ID = "google_analytics"
    GA_JS = """
<!-- Google tag (gtag.js) -->
<script async id="google_analytics" src="https://www.googletagmanager.com/gtag/js?id=G-xxxxxxxxxx"></script>
<script>
  window.dataLayer = window.dataLayer || [];
  function gtag(){dataLayer.push(arguments);}
  gtag('js', new Date());

  gtag('config', 'G-xxxxxxxxxx');
</script>
"""
    # Insert the script into the head tag of Streamlit's static index.html
    index_path = pathlib.Path(st.__file__).parent / "static" / "index.html"
    logging.info(f'editing {index_path}')
    soup = BeautifulSoup(index_path.read_text(), features="lxml")
    if not soup.find(id=GA_ID):  # inject only if the GA tag is not already present
        bck_index = index_path.with_suffix('.bck')
        if bck_index.exists():
            shutil.copy(bck_index, index_path)  # recover the original file from the backup
        else:
            shutil.copy(index_path, bck_index)  # keep a backup of the original file
        html = index_path.read_text()  # re-read, since the file may just have been restored
        new_html = html.replace('<head>', '<head>\n' + GA_JS, 1)
        index_path.write_text(new_html)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)  # make the logging.info() message visible
    inject_ga()
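
The point of the soup.find(id=GA_ID) check is that running the script more than once should not inject the snippet twice. Here is a minimal sketch of that idea on plain strings, so it can be tried without touching Streamlit's files. The function inject_once and the marker argument are hypothetical names for illustration and are not part of setup_ga.py.

```python
def inject_once(html: str, snippet: str, marker: str) -> str:
    """Insert snippet right after the first <head> tag unless marker is already present."""
    if marker in html:
        return html  # already injected, leave the page unchanged
    if "<head>" not in html:
        raise ValueError("no <head> tag found")
    return html.replace("<head>", "<head>\n" + snippet, 1)


page = "<html><head><title>app</title></head><body></body></html>"
snippet = '<script id="google_analytics">/* GA code here */</script>'

once = inject_once(page, snippet, 'id="google_analytics"')
twice = inject_once(once, snippet, 'id="google_analytics"')  # second call is a no-op
```

The check only works when the injected markup actually contains the marker, so the id being searched for has to appear somewhere in the snippet itself.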

Then, add "entry-point.sh" to run "setup_ga.py" when the container starts.

#!/bin/bash
set -e

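# Embed the GA snippet into Streamlit's index.html before the server starts
python3 ./setup_ga.py
# exec replaces the shell, so Streamlit receives container stop signals directly
exec streamlit run app.py --server.baseUrlPath=/pdf-to-csv --server.port=8501 --server.address=0.0.0.0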
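
Because the script runs at container start, the GA ID does not have to be hard-coded. As a sketch only, setup_ga.py could build the snippet from an environment variable. The variable name GA_MEASUREMENT_ID, the {{GA_ID}} placeholder, and both function names are assumptions made for this example, and the character check is a safety measure rather than a statement about the official ID format.

```python
import os
import re
from typing import Optional

# Same snippet as in setup_ga.py, with a placeholder instead of the real ID.
GA_TEMPLATE = """
<!-- Google tag (gtag.js) -->
<script async id="google_analytics" src="https://www.googletagmanager.com/gtag/js?id={{GA_ID}}"></script>
<script>
  window.dataLayer = window.dataLayer || [];
  function gtag(){dataLayer.push(arguments);}
  gtag('js', new Date());

  gtag('config', '{{GA_ID}}');
</script>
"""


def build_ga_snippet(measurement_id: str) -> str:
    """Fill the ID into the template, rejecting anything that could break the markup."""
    if not re.fullmatch(r"[A-Za-z0-9-]+", measurement_id):
        raise ValueError(f"unexpected characters in GA ID: {measurement_id!r}")
    # str.replace is used instead of str.format because the JavaScript contains braces.
    return GA_TEMPLATE.replace("{{GA_ID}}", measurement_id)


def ga_snippet_from_env(var: str = "GA_MEASUREMENT_ID") -> Optional[str]:
    """Return the snippet for the ID in the environment, or None if it is unset or empty."""
    value = os.environ.get(var)
    return build_ga_snippet(value) if value else None
```

With this approach the ID would be passed when starting the container, for example with docker run -e GA_MEASUREMENT_ID=G-xxxxxxxxxx, and injection could be skipped when the function returns None.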

Finally, update the Dockerfile to copy all the files and call "entry-point.sh" when the container starts.

FROM python:3.9-slim

WORKDIR /app

RUN apt-get update && apt-get install -y \
    build-essential \
    curl \
    software-properties-common \
    git \
    openjdk-17-jre \
    && rm -rf /var/lib/apt/lists/*

COPY ./app.py .
COPY ./setup_ga.py .
COPY ./requirements.txt .
COPY ./entry-point.sh .
RUN chmod 755 ./entry-point.sh

RUN pip3 install -r requirements.txt

EXPOSE 8501

ENTRYPOINT ["/app/entry-point.sh"]

Then, build the container image and deploy it. The GA code snippet is now output inside the head tag of the served page. (Screenshot: streamlit_ga)
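
The result can also be checked without opening the browser's developer tools. The sketch below fetches the page and looks for the gtag.js URL with the expected ID. The function names are hypothetical, and the URL in the usage note assumes the port and base path from entry-point.sh on a locally running container.

```python
import urllib.request


def has_ga_tag(html: str, measurement_id: str) -> bool:
    """Return True if the page loads gtag.js for the given GA ID."""
    return f"googletagmanager.com/gtag/js?id={measurement_id}" in html


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

For example, has_ga_tag(fetch_html("http://localhost:8501/pdf-to-csv/"), "G-xxxxxxxxxx") should return True once the snippet has been embedded.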

Conclusion

I showed how to embed a GA code snippet in Streamlit. This allows you to analyze web access to the Streamlit application with GA. It's a bit of a hacky approach, isn't it? I hope that in the future Streamlit will make it possible to embed arbitrary code inside the head tag.