Radar Trends to Watch: December 2024

It’s the end of the year for Radar! We hope all our readers enjoy the holidays. Here is one prediction for 2025:

Is this the end of the road for improving LLM performance by scaling either the number of parameters or the training data? Nobody knows yet. Regardless of the answer, we expect interest to shift toward smaller models. We’ll begrudgingly allow a 70B model to qualify as “small”, but we’re really referring to 20B or less. These models will be easier for companies developing AI applications to work with: they won’t cost as much to run, and they’ll be easier to fine-tune for specialized applications. Very few applications will need a fully generic language model.



Artificial intelligence

  • The OpenGPT-X project has released its Teuken-7B open large language model. This model is significant because it supports 24 European languages and is designed to comply with European law. It is available on Hugging Face.
  • OLMo 2 is a newly released, fully open small language model that comes in 7B and 13B sizes. Both versions claim the best performance in their class.
  • NVIDIA announced Fugatto, a new generative text-to-audio model that can create entirely new kinds of sounds. They see it as a tool for creators.
  • Anthropic has announced a developer version of its Model Context Protocol. MCP allows Claude Desktop to communicate securely with other resources. The MCP server limits the services that are exposed to Claude, filters Claude’s requests, and prevents data from being exposed over the internet.
  • OpenScholar is an open source language model designed to support scientific research. It is significantly more accurate than GPT-4o and more economical to operate. It uses RAG to access a large database of open access scientific papers, which helps ensure that citations are accurate. (A toy sketch of the pattern appears after this list.)
  • Meta has partnered with VSParticle to create new materials from AI-generated instructions. They focus on nanoporous materials that could be catalysts for the decomposition of CO2 into useful products.
  • Perplexity introduced in-app shopping: Users can search for something and then have Perplexity buy it. It is the first widely available example of an AI agent changing the state of the physical world.
  • Research has shown that generative AI models have their own distinctive styles, not unlike human writers. Stylistic analysis can attribute a text to the model that generated it.
  • Mistral has launched Pixtral Large, a 124B multimodal model with performance on par with the latest versions of other frontier models.
  • Mozilla’s Common Voice project collects speech samples in languages other than Anglo-American English to help developers create voice-enabled applications using other languages and dialects. The project is open source.
  • Mechanistic interpretability is a research field that uses AI to investigate what is happening at each layer of a large language model. It provides a path to AI interpretability: the ability to understand why AI produces whatever output it generates, and possibly to control that output. (A minimal activation-capture sketch appears after this list.)
  • Google’s Pixel phones will be able to monitor phone conversations and detect fraud in real time. Processing takes place exclusively on the phone. This feature is disabled by default and can be activated on a per-call basis. Another new feature detects stalkerware, apps that collect data without the user’s consent or knowledge.
  • The Common Corpus dataset for training large language models is now open and available on Hugging Face. The dataset contains over 2T tokens taken from “permissively licensed” sources and documents the provenance of each source.
  • OpenAI’s latest model, Orion, is an improvement over GPT-4. But is it a significant improvement? Apparently not. This may be the end of the road to improving LLMs by making them bigger. (And is Orion GPT-5?)
  • FrontierMath is a new AI benchmark that is based on very challenging math problems. At the moment, no language model scores above 2% (Gemini 1.5 Pro).
  • Separating the instruments in a musical performance is difficult, but it is possible. Here’s an AI-free signal processing masterpiece that attempts just that. Can we turn the performance back into sheet music?
  • Standard Intelligence has released hertz-dev, a new model for real-time voice synthesis. It was trained purely on audio and can participate in unscripted conversations without using text.
  • Microsoft’s Magentic-One is a general-purpose agent system capable of performing complex tasks. Magentic-One is open source for researchers and developers. Microsoft also released AutoGenBench, an open source tool for evaluating the performance of agent systems.
  • ChainForge is a new visual tool for prompt engineering. It can be used to test prompts against multiple models and evaluate the quality of their responses.
  • Artificial intelligence has been used to age Tom Hanks and Robin Wright in a new film, allowing the actors to play their characters across a span of 60 years.
  • Anthropic has released Claude 3.5 Haiku, a new version of its smallest and fastest model. The company claims that on many benchmarks its performance is better than Claude 3 Opus, its previous flagship model. Anthropic has also significantly increased the price of using Haiku.
  • OpenAI introduced predicted outputs. If the output to the prompt is largely known in advance (for example, if you’re asking GPT to modify a file), you can upload the expected result with the prompt and GPT will make the necessary changes. Predicted outputs reduce latency, though apparently not cost. (See the API sketch after this list.)
  • Fortunately, AI Psychiatry has nothing to do with psychoanalyzing human patients. It is a forensic tool for post-mortem analysis of AI failures that allows investigators to obtain the exact model that was in use when the failure occurred.
  • SmolLM2 is a new small language model designed to run on devices. It comes in versions with 135M, 360M and 1.7B parameters. Early reports say its performance is impressive.
  • vLLM is a framework for serving LLMs. It works with most language models on Hugging Face. Not only does it claim to be simpler, it also claims significant performance and cost advantages from using a key-value cache for input tokens. (A short serving sketch appears after this list.)
  • AI Flame Graphs show developers in detail what their models are doing. If you care about performance or power consumption, they are revolutionary.
  • Google’s Jarvis project is said to be the company’s answer to Anthropic’s computer use API. Jarvis takes over the browser (probably Chrome) to perform tasks on the user’s behalf.
  • NotebookLM’s ability to generate a podcast from documents is impressive. Can other models do the same? NotebookLlama is an open source project that generates podcasts using Llama models.
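
OpenScholar’s claim about accurate citations rests on retrieval-augmented generation. The sketch below isn’t OpenScholar’s actual pipeline; it’s just a toy Python illustration of the pattern: retrieve passages from a corpus, then build a prompt that restricts the model to those passages so its citations stay grounded. The paper corpus, IDs, and query are invented for illustration.

```python
# Toy RAG sketch (not OpenScholar's pipeline): TF-IDF retrieval over a tiny,
# invented "paper corpus", followed by prompt assembly with citation IDs.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

papers = {  # hypothetical corpus entries
    "doe2023": "Nanoporous catalysts accelerate the reduction of CO2 into useful products.",
    "lee2024": "Retrieval-augmented generation reduces hallucinated citations in scientific QA.",
}

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(papers.values())
paper_ids = list(papers.keys())

def retrieve(query, k=1):
    """Return the k paper IDs most similar to the query."""
    scores = cosine_similarity(vectorizer.transform([query]), doc_matrix)[0]
    ranked = sorted(zip(paper_ids, scores), key=lambda pair: pair[1], reverse=True)
    return [pid for pid, _ in ranked[:k]]

query = "Does retrieval help citation accuracy?"
context = "\n".join(f"[{pid}] {papers[pid]}" for pid in retrieve(query))

# The retrieved passages (with their IDs) are prepended to the prompt, so the
# model can cite [lee2024] instead of inventing a reference.
prompt = f"Answer using only these sources, citing them by ID:\n{context}\n\nQuestion: {query}"
print(prompt)
```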
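
Mechanistic interpretability starts with looking at what individual layers are doing. The sketch below isn’t any particular group’s method; it only shows the first step, capturing per-layer activations from a small Hugging Face model with PyTorch forward hooks (GPT-2 is used here only because it’s tiny and familiar).

```python
# Minimal activation capture with forward hooks; interpretability research
# builds on data like this (probing, sparse autoencoders, and so on).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # any small causal LM works for illustration
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

activations = {}

def capture(name):
    def hook(module, inputs, output):
        # GPT-2 blocks return a tuple; the hidden states come first.
        hidden = output[0] if isinstance(output, tuple) else output
        activations[name] = hidden.detach()
    return hook

# Register a hook on every transformer block.
for i, block in enumerate(model.transformer.h):
    block.register_forward_hook(capture(f"layer_{i}"))

with torch.no_grad():
    inputs = tokenizer("The capital of France is", return_tensors="pt")
    model(**inputs)

for name, act in activations.items():
    print(name, tuple(act.shape))  # e.g. layer_0 (1, 5, 768)
```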
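
OpenAI’s predicted outputs are exposed through a prediction parameter on the chat completions endpoint. Here’s a minimal sketch using the openai Python SDK; the file and the requested edit are made up, and you should check the current documentation for which models support the feature.

```python
# Sketch of predicted outputs: when most of the response is known in advance
# (here, the original file), passing it as a prediction lets matching tokens
# be accepted quickly, which reduces latency.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

original_code = open("config.py").read()  # hypothetical file to edit

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": "Rename the variable `timeout` to `request_timeout` "
                       "in this file and return the full file:\n\n" + original_code,
        },
    ],
    prediction={"type": "content", "content": original_code},
)

print(response.choices[0].message.content)
```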
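
And a minimal vLLM serving sketch, assuming `pip install vllm` and a local GPU; the model ID is only an example of a small Hugging Face model, not a recommendation.

```python
# Batched generation with vLLM; shared prompt prefixes benefit from its
# key-value caching, which is where the performance claims come from.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.2-1B-Instruct")  # example model ID
params = SamplingParams(temperature=0.7, max_tokens=128)

prompts = [
    "Summarize the history of the transistor in two sentences.",
    "Summarize the history of the laser in two sentences.",
]

for output in llm.generate(prompts, params):
    print(output.outputs[0].text.strip())
```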

Programming

  • bpftune is a tool that continuously tunes Linux performance using observability data from BPF. It is “zero configuration” (there is nothing to configure), has low overhead, and is smart enough to stay away from settings made by the system administrator. Apparently it doesn’t use AI.
  • Kyanos is a new open source network analysis tool based on eBPF. Because it has access to eBPF data, it can filter packets by process or service and can provide accurate information about packet latency.
  • VMware Fusion and VMware Workstation are now free for all users, including commercial users. Broadcom will continue to develop the products but will no longer provide troubleshooting support to users.
  • OpenCoder is a family of language models for code generation. It is completely open source: in addition to the code, the training data, data pipelines, training results, and training logs are available. The intention is to encourage further experimentation and research in code generation. (See the generation sketch after this list.)
  • Mergiraf is a tool for resolving Git merge conflicts that understands common programming languages (including Java, Rust, and Go) and file formats (including JSON, HTML, XML, and YAML). The authors claim that new languages can be added easily.
  • A proposal for Safe C++, a new version of C++ that will include memory safety features, has been published.
  • DataChain is a Python library for working with unstructured data in the context of artificial intelligence. It is designed for building data pipelines and manipulating data at scale.
  • NoCode GitHub? GitHub Spark allows users to create small “microapps” or sparks without having to write any code. What may be more important than no code is no deployment; sparks are deployed in the GitHub infrastructure and are accessible via the web.
  • Using Git to back up your Linux /etc directory is obvious once you think about it.
  • Ractor is an actor framework for Rust, which means you can program in Rust as if it were Erlang. We’re amazed: it may be the longest and most complicated “Hello, World” we’ve ever seen.
  • Kubernetes is a platform for building platforms. And platforms must serve both development and operations teams.
  • GitHub Copilot can now use models other than GPT. Users can choose Claude Sonnet or Gemini in addition to the various OpenAI models. Other new features include automatic code review, an upgrade assistant for Java, multiple file editing, and something called Spark, which sounds a bit like Claude’s Artifacts.
  • Is your AI-generated code safe? No. It’s likely that we won’t stop using tools like Copilot and Cursor, but we have to understand the challenge: AI models were trained on publicly available code. Most publicly available code has security flaws. Those flaws will be reflected in the AI’s output.
  • Does Java need another build tool? Mill hopes to take over. It claims to be 5-10x faster than Maven and 2-4x faster than Gradle.
  • Amphion is an open source toolkit for generating all forms of sound, including music and speech.
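
One sketch for this section: generating code with an open model like OpenCoder through the transformers library. The repository ID below is our assumption (check the model card on Hugging Face for the exact name), and a smaller checkpoint may be more practical on a laptop.

```python
# Minimal code-generation sketch with a Hugging Face causal LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "infly/OpenCoder-8B-Instruct"  # assumed ID; verify on Hugging Face
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [
    {"role": "user", "content": "Write a Python function that checks whether a string is a palindrome."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Greedy decoding is enough for a short, deterministic example.
output = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```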

Robots

  • Grasso is an AI trashbot: a mobile robot made of trash. It uses Llava-v1.6-mistral-7B to understand visual input from the camera and Mistral-7B for prompts and responses. (It doesn’t understand or generate speech.)
  • Meta has released several new projects for touch perception, a critical element in building AI-controlled robots that can interact with the real world. Digit 360 is a tactile digital finger, Sparsh is an encoder for tactile data, and Digit Plexus is a platform for creating artificial hands.
  • Connect two non-intelligent microrobots (bristlebots) together with a short flexible rope and they gain the ability to solve simple problems.

Web

  • Want to run Linux in your browser? You can. WebVM is a virtual machine that runs in a browser. Linux in a browser might not be that interesting; it’s more important as another example of Wasm’s abilities.

Virtual reality

  • Want to talk to Rosa Parks or Abraham Lincoln? Try ENGAGE XR, a tool that combines VR and generative AI. Whether this is really history is an interesting question; the bus in the Rosa Parks example looks like a modern European bus, not a 1950s American bus.

Quantum computing

  • Google’s DeepMind has developed AlphaQubit, an artificial intelligence system that detects errors in quantum systems. Error correction has made huge strides in the past year, but it still remains a major problem in quantum computing.
