[RFC PATCH] usertools: add telemetry exporter

Robin Jarry rjarry at redhat.com
Mon Apr 1 23:28:22 CEST 2024


Anthony Harivel, Mar 27, 2024 at 16:18:
> Hi Robin,
>
> Thanks for this patch. I did test it and it works as expected. 
> Nonetheless, maybe we can improve on some parts.

Hey Anthony, thanks a lot for testing!

> In 'class  TelemetrySocket', there is:
> ...
> self.sock.connect(path)
> data = json.loads(self.sock.recv(1024).decode())
> ...
>
> Maybe we can improve with something like:
>
>         try:
>             rcv_data = self.sock.recv(1024)
>             if rcv_data:
>                 data = json.loads(rcv_data.decode())
>             else:
>                 print("No data received from socket.")
>         except json.JSONDecodeError as e:
>                 print("Error decoding JSON:", e)
>         except Exception as e:
>                 print("An error occurred:", e)
>
> So that it handles a bit better the error cases.

This is undesired as it would actually mask the errors from the calling 
code. Unless you can do something about the error, printing it should be 
left to the calling code. I will handle these errors better in v2.

> In the same way to implement more robust error handling mechanisms in:
> def load_endpoints
> ...
> except Exception as e:
>     LOG.error("Failed to load endpoint module '%s' from '%s': %s", name, f, e)
> ...
>
> For example, you might catch FileNotFoundError, ImportError, or SyntaxError.
> That could help to debug!

We could print the whole stack trace but I don't see what special 
handling could be done depending on the exception. I will try to make 
this better for v2.

> About TelemetryEndpoint I would see something like:
>
> class TelemetryEndpoint:
>     """
>     Placeholder class only used for typing annotations.
>     """
>
>     @staticmethod
>     def info() -> typing.Dict[MetricName, MetricInfo]:
>         """
>         Mapping of metric names to their description and type.
>         """
>         raise NotImplementedError()
>
>     @staticmethod
>     def metrics(sock: TelemetrySocket) -> typing.List[MetricValue]:
>         """
>         Request data from sock and return it as metric values. Each metric
>         name must be present in info().
>         """
>         try:
>             metrics = []
>             metrics_data = sock.fetch_metrics_data()
>             for metric_name, metric_value in metrics_data.items():
>                 metrics.append((metric_name, metric_value, {}))
>             return metrics
>         except Exception as e:
>             LOG.error("Failed to fetch metrics data: %s", e)
>             # If unable to fetch metrics data, return an empty list
>             return []
>
> With these changes, the metrics method of the TelemetryEndpoint class 
> could handle errors better and the exporter can continue functioning 
> even if there are issues with fetching metrics data.
>
> I don't know if all of that makes sens or if it's just nitpicking! 
> I can also propose an enhanced version of your patch if you prefer.

As indicated in the docstring, this class is merely a placeholder. Its 
code is never executed. It only stands to represent what functions must 
be implemented in endpoints.

However, your point is valid. I will update my code to handle errors in 
endpoints more gracefully and avoid failing the whole request if there 
is only one error.

Thanks for the thorough review!



More information about the dev mailing list