Fully Connected – Weights & Biases, the AI developer platform, today announced W&B Weave at its annual conference, Fully Connected. W&B Weave is a lightweight toolkit for software developers who want to deploy generative AI applications with confidence, providing a system of record for the experimental large language model (LLM) application development process.
W&B Weave extends the Weights & Biases AI developer platform beyond the machine learning practitioners who build and train large-scale models to the software developers who build applications with LLMs. Over the last six years, Weights & Biases has enabled the most innovative foundation model builders to create the generative AI industry. Today, Weights & Biases is used by over 30 foundation model builders and 1,000 companies to productionize machine learning at scale.
Since the emergence of LLMs such as OpenAI’s GPT-3.5, Anthropic’s Claude 2, Meta’s Llama 2, and Mistral’s Mixtral 8x7B, every organization has been challenged to define and execute its generative AI strategy. While it’s easy to create a generative AI demo, it’s hard to confidently deploy a generative AI application into production given the risks of abuse, misalignment, and hallucinations.
Building software with LLMs is fundamentally different from traditional software development because LLMs are non-deterministic by nature. The solution to this challenge is to treat the model as a closed system where only the inputs and outputs are visible, and to follow a scientific workflow. This experimental workflow is similar to the one machine learning practitioners use to build these LLMs in the first place.
W&B Weave was built to enable this experimental workflow:
- Log everything: Capture every interaction with LLMs, from development to production. This data is expensive to produce, so log it and use it to improve applications and build up evaluations.
- Experiment: Try many different configurations and parameters to explore what improves the application.
- Evaluate: Build up suites of evaluations to measure progress and determine what improves the application. Evaluations are to LLM development what unit tests are to traditional software development (a minimal sketch follows this list).
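For illustration, an evaluation in this style pairs a small dataset of examples with a scoring function, much like a unit-test suite. The sketch below assumes the `weave` Python package; the class and parameter names (`Evaluation`, `dataset`, `scorers`) follow Weave’s public documentation and should be verified against the current release, and the model function is a stand-in for a real LLM call.

```python
import asyncio
import weave

weave.init("my-llm-app")  # start a project; evaluation runs are logged here

# A tiny evaluation dataset: each row holds an input and the expected answer.
examples = [
    {"question": "What is 2 + 2?", "expected": "4"},
    {"question": "What is the capital of France?", "expected": "Paris"},
]

@weave.op()
def exact_match(expected: str, model_output: str) -> dict:
    # Pass/fail check on the model's output, analogous to a unit-test assertion.
    return {"correct": expected.strip() == str(model_output).strip()}

@weave.op()
def answer(question: str) -> str:
    # Stand-in for a real LLM call (e.g. a chat-completion request).
    return "4" if "2 + 2" in question else "Paris"

evaluation = weave.Evaluation(dataset=examples, scorers=[exact_match])
asyncio.run(evaluation.evaluate(answer))
```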
“We built Weave for developers to support the scientific workflow of building AI applications,” said Shawn Lewis, CTO at Weights & Biases. “We made it lightweight and practical with minimal abstractions. We built the tools that helped bring LLMs into the world. Now we're building tools to make them work in production applications.”
W&B Weave has two components:
- Traces. Add one line of code to log the behavior of the LLM application and pinpoint exactly what went wrong. Build retrieval-augmented generation (RAG) applications with full observability into retrieved documents, functions used, and chat messages served to the LLM (see the sketch after this list).
- Evaluations. Score the LLM application in a lightweight, customizable way. Build rigor with a systematic and organized evaluation framework. Spot trends, identify regressions, and make informed decisions about future iterations.
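As a rough illustration of the Traces workflow, decorating a function is enough to capture its inputs and outputs. The sketch below assumes the `weave` Python package and uses a placeholder in place of a real LLM call; exact decorator and function names should be checked against the Weave documentation.

```python
import weave

weave.init("my-llm-app")  # one line to turn on logging for a named project

@weave.op()  # inputs, outputs, and timing of each call are traced automatically
def answer_question(question: str) -> str:
    # Placeholder for a real LLM call; in a RAG application, retrieval and
    # prompt-construction steps would be decorated the same way.
    return f"(model response to: {question})"

answer_question("What does W&B Weave record?")
```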
“I like Weave both conceptually and in practice,” said Jonathan Whitaker, AI Researcher at Answer.AI. “I want a lightweight tool that fits into my workflow, rather than a tool that imposes a new workflow on me. Weave achieves this by making it easy to decorate a few functions and then persistently capture the LLM inputs and outputs.”
To learn more about W&B Weave, visit http://wandb.me/weave and get started today.
About Weights & Biases
Weights & Biases is the leading AI developer platform supporting end-to-end MLOps and LLMOps workflows. It is used by over 30 foundation model builders and 1,000 companies, including teams at OpenAI, Toyota, and Microsoft, to productionize machine learning at scale. Weights & Biases is part of the new standard of best practices for machine learning.