Databricks releases Dolly 2.0, the first open, instruction-following LLM for commercial use

Databricks recently released Dolly 2.0, the successor to the large language model (LLM) with ChatGPT-like human interactivity (also known as instruction-following) that the company had released just two weeks earlier.

The company states that Dolly 2.0 is the first open-source, instruction-following LLM fine-tuned on a transparent, freely available dataset that is itself open-sourced for commercial use. That means Dolly 2.0 can be used in commercial applications without paying for API access or sharing data with third parties.

According to Databricks CEO Ali Ghodsi, while there are other LLMs out there that can be used for commercial purposes, “They won’t talk to you like Dolly 2.0.” And, he explained, users can modify and improve the training data because it is made freely available under an open-source license. “So you can make your own version of Dolly,” he said.

Databricks released the dataset used to fine-tune Dolly 2.0

Databricks has said that, as part of its ongoing commitment to open source, it is also releasing the dataset on which Dolly 2.0 was fine-tuned, called databricks-dolly-15k. This is a corpus of more than 15,000 records generated by thousands of Databricks employees, and Databricks says it is the “first open source, human-generated instruction corpus specifically designed to enable large language models to exhibit the magical interactivity of ChatGPT.”

