Insights 9: Stable Diffusion, Code Generation, Oren Joining Incubator
It's been an eventful summer for AI, startups, and the AI2 Incubator. The AI2 Incubator added a world-renowned AI expert and entrepreneur as a technical director, while our portfolio companies raised a total of $35M in funding this summer. We witnessed a flurry of image generation announcements from Big Tech as well as from much smaller players, including our startup pick for this issue. Image generation is a subcategory of AI generation, a category that also includes augmented writing tools such as copy.ai and jasper.ai and code generation technologies such as GitHub CoPilot. We survey new advances in AI code generation as a special topic. Read on as we give a detailed account of these exciting developments.
AI2 Incubator Update
First, the biggest news for us here at the AI2 Incubator. Oren Etzioni stepped down as CEO of AI2 and joined the incubator as a technical director. With the late Paul Allen, Oren launched AI2 in 2014 and grew it into an AI research powerhouse with more than 200 employees and 700+ research papers that include seminal works on Aristo, Semantic Scholar, YOLO, XNOR, AllenNLP, ELMo, AI2-THOR, and others. Just this summer, AI2 researchers and engineers won paper awards at ICML, NAACL, and CVPR. As an entrepreneur who co-founded companies such as Netbot, Farecast, and Decide, Oren worked tirelessly to plant the seed and grow the AI2 Incubator. Our companies have collectively raised $130M in funding and are worth $650M, thanks in large part to Oren's experience, wisdom, and just plain hustle. Oren remains on AI2's board and will have more time to help the incubator succeed on the road ahead. Welcome, Oren!
Three AI2 Incubator companies announced funding rounds this summer. Congratulations!
- Ryan Millner and Ashwin Singhania raised $3M for Unwrap.ai, a company that aggregates and analyzes user feedback with AI.
- Varun Puri and Esha Joshi raised $6M for Yoodli. Yoodli helps professionals improve public speaking with speech and NLP technologies.
- Ozette raised $26M. Ozette builds technologies at the intersection of big data, immune cell analysis, and machine learning to unlock the secret of our immune system, leveraging many years of research at the Fred Hutch Cancer Center.
In late July, we hosted a summer BBQ party to celebrate good food, drinks, and weather with the Seattle startup community: 340 people ate 510 pounds of BBQ and drank 656 beers to music superbly DJ-ed by our friend Kirby Winfield.
AI Image Generation and Stability AI
2022 may be remembered in the future as the year when AI became capable of generating high-resolution, photorealistic images from text prompts. This past April, OpenAI announced DALL-E 2, the successor to DALL-E, with higher resolution, improved comprehension, and new capabilities such as inpainting. Early this summer, Google announced two image generation models, Imagen and Parti, while Meta entered the fray with Make-A-Scene. On July 20th, OpenAI made the DALL-E 2 beta available to the public. OpenAI, however, was not even the first here; that honor went to MidJourney, an independent research team of 11. MidJourney proved that AI image generation is not a game played only by the exclusive club of Big Tech (OpenAI counts as a member here, as it is backed by Microsoft). MidJourney positions itself as a DALL-E competitor that has a bias toward "pretty" images.
This blog post gives an extensive comparison between DALL-E and MidJourney.
The most interesting development in this space is Stability AI's public release of Stable Diffusion (SD) on August 22nd. We are blown away by the following:
- How effective the latent diffusion technique is at reducing model size while minimizing the impact on model quality. According to its model card, SD was trained on the LAION-5B dataset (5.85 billion image-text pairs, about 280TB), resulting in a final 4.2GB binary blob of floating point numbers that fits on a typical recent NVIDIA gaming card. Having a model that can run on consumer hardware is HUGE! We are intrigued by the prospect of applying this technique to text and other modalities.
- Everything behind SD is open, including the model weights, which are hosted on Hugging Face. OpenAI should consider renaming/rebranding themselves.
- How poorly the research community understands diffusion models. As an example, a recent paper from the University of Maryland reports that the Gaussian noise can be replaced by all sorts of alterations, including deterministic ones. Readers of this newsletter are well aware of our excitement around large models (large language models, or LLMs, are now increasingly used), but SD proves that size may not always matter! As our understanding improves (it always does), we expect more exciting developments in this space.
- How quickly the community latched on to SD to create all sorts of cool and innovative ideas and demos. Using Lexica, you can search more than 5M Stable Diffusion images and prompts. There will be interesting applications of this technology in the startup space.
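The size figures quoted in the first bullet above can be sanity-checked with some quick arithmetic. The sketch below is only a back-of-the-envelope estimate: it assumes decimal units and that the 4.2GB file stores 4-byte float32 weights, neither of which is stated in the text, and the function names are ours.

```python
# Back-of-the-envelope arithmetic for the Stable Diffusion figures quoted above.
# Assumptions (ours, not from the model card): decimal units, float32 weights.

BYTES_PER_GB = 10**9
BYTES_PER_TB = 10**12

dataset_bytes = 280 * BYTES_PER_TB      # "about 280TB" of training data
checkpoint_bytes = 4.2 * BYTES_PER_GB   # "4.2GB" binary blob of weights
FLOAT32_BYTES = 4                       # assumed storage per weight
FLOAT16_BYTES = 2                       # half precision, for comparison


def approx_parameter_count(size_bytes: float, bytes_per_weight: int) -> float:
    """Estimate how many weights fit in a checkpoint of the given size."""
    return size_bytes / bytes_per_weight


def size_ratio(data_bytes: float, model_bytes: float) -> float:
    """How many times larger the training data is than the stored weights."""
    return data_bytes / model_bytes


params = approx_parameter_count(checkpoint_bytes, FLOAT32_BYTES)
ratio = size_ratio(dataset_bytes, checkpoint_bytes)
half_precision_gb = params * FLOAT16_BYTES / BYTES_PER_GB

print(f"approx. weights: {params / 1e9:.2f} billion")
print(f"training data is roughly {ratio:,.0f}x larger than the weights")
print(f"same weights in float16: {half_precision_gb:.1f}GB")
```

Under these assumptions the checkpoint holds on the order of a billion weights, and storing them in half precision would halve the footprint again. The ratio only compares scales; it is not a claim that the weights compress the dataset.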
Stability AI is thus our startup pick for this issue, beating out John Carmack's Keen Technologies, which is going after the big prize, artificial general intelligence, with $20M in funding from Sequoia Capital, Nat Friedman, Daniel Gross, and others. Stability AI is, however, an unusual startup, if we can even call it a startup at all. It's mainly funded by Emad Mostaque. Watch an interview he gave to Yannic Kilcher to get a sense of his background and his thinking around Stability AI.
Speaking of startups, the duo of Nat Friedman and Daniel Gross put together $10M to fund AI Grant, a startup grant program that provides early access to Stability AI's models among other benefits (Emad Mostaque is an advisor). We love the way they articulate the challenge: "Researchers have raced ahead. It's time for entrepreneurs to catch up!" Check out the section on "What will AI-first products look like?" at AIGrant.org for inspiration, or use Cohere's playground to generate startup ideas using LLMs.
AI Code Generation: CoPilot and Friends
Recent advances in LLMs (GPT-3, GitHub CoPilot) and diffusion models (DALL-E, Stable Diffusion) are shifting public attention from perception (e.g. text classification, entity extraction, object detection, speech recognition) to generation. The first wave of GPT-3-based writing assistant companies includes Copy.ai and Jasper.ai. Built on top of OpenAI's Codex, GitHub CoPilot assists software developers with writing code. CoPilot reached general availability on June 21st. At the AI2 Incubator we are happy users of CoPilot and more than willing to pay $100/year for it. We believe it's still early days for AI-assisted software development. Below are a few recent noteworthy developments.
When it comes to this hot area, we can count on the usual suspects to participate. Following GitHub CoPilot's lead, Meta released InCoder in April, as reported in our previous issue. Amazon launched CodeWhisperer in preview on June 23rd. Then Google shared details about its internal-only, unnamed tool, which takes a hybrid approach that combines ML (a relatively small, 500M-parameter model trained on Google's monorepo) with rule-based semantic engines (SEs). They reported:
We compare the hybrid semantic ML code completion of 10k+ Googlers (over three months across eight programming languages) to a control group and see a 6% reduction in coding iteration time (time between builds and tests) and a 7% reduction in context switches (i.e., leaving the IDE) when exposed to single-line ML completion. These results demonstrate that the combination of ML and SEs can improve developer productivity. Currently, 3% of new code (measured in characters) is now generated from accepting ML completion suggestions.
This is interesting to us since combining the power of LLMs with rules to increase accuracy while reducing model size gives us a compelling sweet spot in real-world applications.
What if AI could generate a whole program instead of providing a better IntelliSense? The Salesforce AI Research team looked at the special case where unit tests are available. By combining pre-trained language models and reinforcement learning (test failing = bad, test passing = good), Salesforce's CodeRL system achieved new SOTA results of 2.69% pass@1, 6.81% pass@5, and 20.98% pass@1000 on the difficult APPS benchmark. Our verdict: full program synthesis remains challenging for current AI techniques.
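For readers unfamiliar with the pass@k numbers above: pass@k is the chance that at least one of k generated programs passes all unit tests for a problem. The sketch below uses the unbiased estimator introduced alongside OpenAI's Codex, 1 - C(n-c, k)/C(n, k); we are assuming it applies here, since the text does not say exactly how CodeRL's figures were computed. The example results are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: 1 - C(n - c, k) / C(n, k).

    n: number of programs sampled for the problem
    c: number of those samples that pass every unit test
    k: number of attempts the metric allows
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb() returns 0 when k > n - c, so the estimate is 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, each given as an (n, c) pair."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical results for three problems: (samples drawn, samples passing).
example = [(10, 0), (10, 1), (10, 5)]
print(f"pass@1 = {mean_pass_at_k(example, 1):.3f}")
print(f"pass@5 = {mean_pass_at_k(example, 5):.3f}")
```

When exactly k samples are drawn (n equals k), the estimate reduces to 1 if any sample passes and 0 otherwise, which is why pass@1000 can be far higher than pass@1 for the same model.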
Lastly, we briefly discuss the problem of compiler optimization. If you are a C/C++ programmer, you have probably used GCC's -O flag, which instructs the compiler to optimize (fun fact: there are 8 levels of optimizations for GCC). Traditionally, compilers use increasingly complex heuristics for code optimization. These days, anything that relies on complicated heuristics could become a target for machine learning (eating software). Along this line, the folks at Google built and released:
“MLGO: a Machine Learning Guided Compiler Optimizations Framework”, the first industrial-grade general framework for integrating ML techniques systematically in LLVM (an open-source industrial compiler infrastructure that is ubiquitous for building mission-critical, high-performance software). MLGO uses reinforcement learning (RL) to train neural networks to make decisions that can replace heuristics in LLVM. We describe two MLGO optimizations for LLVM: 1) reducing code size with inlining; and 2) improving code performance with register allocation (regalloc). Both optimizations are available in the LLVM repository, and have been deployed in production.
The AI Grant folks coined the phrase "CoPilot model", using the metaphor of an assistant that looks over your shoulder, removes drudgery, and expands your creative imagination. It's wonderful that AI can assist by looking over the shoulder of a newbie programmer learning how to code as well as the most experienced compiler engineers building complex optimization heuristics.
Applied AI Update
In the age of large models, there is a strong focus among practitioners on finding the sweet spot between performance and model size/cost. Stable Diffusion's use of the latent diffusion technique is one of the best examples here, but there have been other interesting developments since our last issue.
- Amazon shared some ideas on how to make BERT and BART models smaller/faster while minimizing accuracy degradation. The distillation and quantization method that compressed BART to 1/16th its size is particularly noteworthy.
- Hugging Face's Philipp Schmid has a series of blog posts on performance and latency optimization for transformers using Optimum and DeepSpeed-Inference. For example, he walks through the process of using Optimum to reduce the inference latency of BERT from 7ms to 3ms while keeping 99.7% of the original accuracy.
- Tim Dettmers, a UW PhD candidate, wrote a wonderful blog post on LLM.int8() and emergent features. LLM.int8() is a quantization technique that halves memory use while maintaining LLM performance. As an example, you can now run T5-11B on Google Colab, or 175B models (OPT and BLOOM) on a single 8x A100 machine! What are emergent features? Unfortunately we could not fit the explanation into a nutshell; you just have to read the post for a deep dive. The best part: LLM.int8() has already been integrated into Hugging Face via the bitsandbytes library.
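To give a feel for where the 2x memory saving in the last bullet comes from, here is a minimal sketch of plain absmax int8 quantization, with hypothetical weights. It is not LLM.int8() itself: as Tim's post explains, the real method quantizes vector by vector and keeps outlier feature dimensions in 16-bit, and none of that is reproduced here.

```python
from array import array

FP16_BYTES = 2  # bytes per weight when stored in half precision


def quantize_absmax(values: list[float]) -> tuple[array, float]:
    """Scale floats so the largest magnitude maps to 127, then round to int8."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = array("b", (round(v / scale) for v in values))  # "b" = signed 8-bit
    return quantized, scale


def dequantize(quantized: array, scale: float) -> list[float]:
    """Recover approximate float values from int8 codes and their scale."""
    return [q * scale for q in quantized]


weights = [0.12, -0.53, 0.98, -1.27, 0.04]  # hypothetical weights
codes, scale = quantize_absmax(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w, ) if False else abs(w - r) for w, r in zip(weights, restored))

print(f"int8 codes: {list(codes)}")
print(f"bytes per weight: {codes.itemsize} (int8) vs {FP16_BYTES} (fp16)")
print(f"largest round-trip error: {max_error:.5f}")
```

Each weight shrinks from two bytes to one, at the cost of a rounding error of at most half the scale per value. A single large outlier inflates the scale and therefore the error for every other value, which is the kind of problem the outlier handling in LLM.int8() is meant to address.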
Additional Readings That We Found Interesting