Challenges with observability
Using observability tools in our systems provides several benefits, including better visibility into what is going on in our applications and faster detection and resolution of the problems that cause downtime.
While observability solves one set of problems, i.e., not having visibility into resources, performance, and other metrics, it also introduces a new set of challenges. Observability systems combine many different tools that provide functionality such as logs, traces, metrics, and alerting. Check out this article to learn more about what goes into observability.
While all these features are important for tracing downtime, performance, or security-related issues, using them involves a lot of context switching. For metrics, we could use a tool such as Prometheus along with Grafana for visualization; for logs, we could use Elasticsearch with Kibana, or something else; and traces could be collected with OpenTelemetry.
As you may have already noticed from the tools listed above, we rely on many separate tools to cover the different parts of observability. This means that to find out what is wrong with your system, you need to analyze information from multiple dashboards and correlate it yourself, which involves constant context switching.
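To make that correlation step concrete, here is a minimal sketch of what it amounts to: merging signals from separate sources into one timeline and looking at what happened around a suspicious moment. It is not tied to any particular tool. The `Event` type, the `merge_timeline` and `window` helpers, and the sample data are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from heapq import merge


@dataclass(frozen=True)
class Event:
    timestamp: float  # seconds since some reference point
    source: str       # illustrative label: "metrics", "logs", or "traces"
    message: str


def merge_timeline(*streams):
    """Merge event streams (each already sorted by time) into one timeline."""
    return list(merge(*streams, key=lambda e: e.timestamp))


def window(timeline, center, radius):
    """Return the events within `radius` seconds of `center`."""
    return [e for e in timeline if abs(e.timestamp - center) <= radius]


# Made-up sample data standing in for three separate dashboards.
metrics = [
    Event(100.0, "metrics", "p99 latency 120ms"),
    Event(160.0, "metrics", "p99 latency 2300ms"),
]
logs = [Event(155.0, "logs", "db connection pool exhausted")]
traces = [Event(158.0, "traces", "checkout span timed out")]

timeline = merge_timeline(metrics, logs, traces)

# Look at everything that happened within 10 seconds of the latency spike.
for event in window(timeline, center=160.0, radius=10.0):
    print(f"{event.timestamp:>6.1f} [{event.source}] {event.message}")
```

Doing this by hand across several dashboards is exactly the context switching described above; the sketch only shows how small the underlying operation is once the data sits in one place.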
Moreover, these tools are designed with a single user in mind, which makes it difficult to collaborate efficiently. To discuss what you think the problem might be, you need to switch to Slack or some other messaging platform, which means even more context switching.
Another problem that developers and operations teams often face is writing post-mortems. A post-mortem analyzes how an incident went and what opportunities there are to improve next time, so that teams have an easier time fixing the same issue if it arises again. While this is useful documentation to have, sitting down to write it can be time-consuming.
Even if you can find the time to write it, you are not writing it while resolving the incident. This means that you might leave out a step or two, which will cause difficulty for anyone referring to these docs to resolve the same issue in the future.
Fiberplane is a collaborative notebook that connects to your observability stack and helps you monitor and debug your infrastructure. These notebooks let you collaborate with your team in real time and fetch metrics from your data sources with queries, directly in the notebook.
Since all the steps you perform are recorded in the notebook itself, you can download the notebook and use it to create an accurate post-mortem with the exact steps and thought process you followed.
Fiberplane also has Providers, a WebAssembly-based specification and protocol for building observability integrations, designed for portability, security, and extensibility. Providers let developers fetch metrics from various platforms such as AWS, Elasticsearch, or Prometheus, and you can use the Provider Development Kit to create custom providers.
How does it work?
Fiberplane uses a daemon that lets you connect your instance of Fiberplane Studio to the data sources in your cluster, such as Elasticsearch, without exposing any of the information to the internet.
Fiberplane Studio accesses your infrastructure through Proxy, a lightweight package available as a Docker image. Once installed, it allows you to query your observability data from your notebook.
When you execute a query, it is first forwarded to the Fiberplane proxy, which queries the data source, such as Elasticsearch, encrypts the result, and returns it to the Studio.
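The flow above can be sketched as a small in-process simulation. This is not Fiberplane's API or implementation: the `Proxy` and `Studio` classes, the `fake_data_source` function, and the `seal`/`unseal` helpers are hypothetical names made up for this example, there is no networking involved, and base64 is used purely as a placeholder to mark where real encryption would happen.

```python
import base64
import json


def fake_data_source(query):
    """Hypothetical in-cluster data source that returns canned results."""
    return {"query": query, "hits": [{"level": "error", "msg": "timeout"}]}


def seal(payload: bytes) -> bytes:
    """Placeholder for encryption. base64 is NOT encryption; it only marks
    the point where a real cipher would be applied."""
    return base64.b64encode(payload)


def unseal(blob: bytes) -> bytes:
    """Placeholder for decryption, the inverse of `seal`."""
    return base64.b64decode(blob)


class Proxy:
    """Stands in for the component running inside the cluster."""

    def __init__(self, data_sources):
        self.data_sources = data_sources  # name -> callable taking a query

    def handle(self, source_name, query):
        result = self.data_sources[source_name](query)   # query the data source
        return seal(json.dumps(result).encode("utf-8"))  # "encrypt" the result


class Studio:
    """Stands in for the notebook side that issues queries."""

    def __init__(self, proxy):
        self.proxy = proxy

    def run_query(self, source_name, query):
        blob = self.proxy.handle(source_name, query)  # forward query to the proxy
        return json.loads(unseal(blob))               # read the returned result


proxy = Proxy({"elasticsearch": fake_data_source})
studio = Studio(proxy)
result = studio.run_query("elasticsearch", "level:error")
print(result["hits"])
```

The point of the sketch is the shape of the exchange: the notebook side never talks to the data source directly, and only the sealed result leaves the proxy.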
Fiberplane is a great tool that helps you resolve incidents while documenting the exact steps you take. Its providers also extend its capabilities, which helps integrate a wide variety of observability tools.