    gitignore · 14230f34
    Yann Dubois authored
    setting up
    
    clean utils
    
    pairwise lb
    
    types
    
    initial setup
    
    initial requirements
    
    README
    
    pairwise annotator done
    
    openai done
    
    main
    
    metrics
    
    setting up empty
    
    license
    
    all prompts
    
    examples
    
    add anthropic
    
    add claude prompts
    
    minor OAI
    
    anthropic installation
    
    get_decoder
    
    get_decoder
    
    max_instances
    
    adding guanaco
    
    oasst
    
    stablelm
    
    hugging face
    
    remove langchain
    
    minor
    
    finish all decoders
    
    huggingface_local_completions
    
    huggingface_api_completions
    
    PACKAGES_ALL
    
    add opt test
    
    update packages
    
    debugging huggingface_local_completions
    
    api_completions
    
    [ENH] add timer
    
    [ENH] fast hugging face local
    
    [CONF] better default models
    
    [CONF] adding all basic conf
    
    tested all basic configs
    
add constants

add constants

add constants
    
    docstrings
    
gitignore
    
    [ENH] cohere
    
    [CLEAN] use hf datasets
    
    cleaning
    
    cleaning
    
    WIP analyze
    
    fn_completions
    
minor
    
    [ENH] return price and time per example
    
    [ENH] return price and time per example
    
    add price and time for turkers
    
    WIP agreement_of_annotations
    
    [ENH] agreement_of_annotations
    
    [ENH] add vicuna parsing
    
    finish vicuna adding
    
    [SCRIPT] add precompute script
    
    [SCRIPT] add precompute script
    
    add falcon
    
    add vicuna with inputs
    
    black
    
    [ENH] list bias
    
    [ENH] vicuna -> lmsys
    
    [ENH] vicuna -> lmsys
    
    black
    
    alpaca_farm_ppo_human_7b
    
    setup
    
    max_instances
    
    bug vicuna
    
    [ENH] analyze_evaluators
    
    clean prompts
    
    minor
    
    leaderboards
    
    make_evaluator_leaderboard
    
    rm make_evaluator_leaderboard
    
    change gpt3 to text-davinci-003
    
    [ENH] max_instances to precompute
    
    solve merging
    
    evaluator leaderboard
    
    minor
    
    add plotting
    
    add plotting
    
    rename all and finish leaderboard
    
    rm json
    
    add local models to lb
    
    add local models to lb
    
    add local models to lb
    
    add local models to lb
    
    README
    
    update the readme
    
    update the readme
    
    initial adding of constants
    
    ignore
    
    claude lb
    
    formatting
    
    add make_model_leaderboard
    
    update lb
    
    add constants
    
    minor
    
    is_return_instead_of_print
    
    save main outputs
    
    MODELS_TO_BENCHMARK
    
    update claude leaderboard
    
leaderboards
    
    rename
    
    minor
    
    minor
    
    minor
    
    [NOTEBOOK] compare annotators
    
rm .idea
    
    update readme
    
    caching
    
    prices
    
    prices
    
    gpt
    
leaderboards
    
    instruction-following prompt
    
    minor
    
    minor
    
    rm caches
    
    leaderboard claude drop
    
    aviary
    
    aviary
    
    README
    
    aviary
    
    readme
    
    API constants
    
    API constants
    
    making new evaluator
    
    formatting readme
    
    minor
    
    Making a new evaluator
    
    minor
    
    installation
    
    developing notebooks
    
rm unnecessary
    
    ranking
    
    better error
    
    readme
    
    minor
    
    is_single_annotator
    
    leaderboard
    
    ANTHROPIC_MAX_CONCURRENCY
    
    [enh] is_save_to_leaderboard
    
    [enh] is_save_to_leaderboard
    
    imports
    
    ranking_parser
    
    ranking_parser
    
    minor rename
    
    check imports
    
    caching leaderboard
    
    caching leaderboard
    
    rename completion kwargs
    
    rohan benchmarking
    
    rm example
    
    moving to evaluators_configs
    
    single prompt
    
remove all unnecessary prompts
    
    model_configs
    
    rm all input field
    
    update readme
    
    update readme
    
    adding strip
    
    documentation
    
    [CONF] add improved configs
    
    prompts
    
    leaderboards
    
    gitignore
    
    anthropic n_retries
    
    names of models to keep
    
    hugging face inference_helper
    
    save to results
    
    constants
    
    update readme
    
allow globbing
    
    leaderboards
    
    cleaning leaderboards
    
    cleaning leaderboards
    
    package_data
    
    delete example
    
    add manifest
    
    add outputs example
    
    AlpacaEval
    
    finish developing evalset
    
    leaderboards
    
    leaderboards
    
    aviary
    
    bug alpaca farm prompt
    
    leaderboards
    
    leaderboards
    
    bias 1
    
    compare annotators
    
notebook annotators
    
    constants
    
    precompute
    
    allow additional columns
    
    leaderboard
    
    update lb
    
    add table of content
    
    add TOC
    
    adding more dropdowns
    
    update leaderboard
    
    update leaderboards
    
    boilerplate for website
    
    move boilerplate
    
    Create CNAME
    
    Delete CNAME
    
    AlpacaFarm -> AlpacaEval
    
    adding doc
    
    update html
    
    adding helper
    
adding all helpers to README
    
    update all leaderboards
    
    update all leaderboards
    
    smaller example of outputs
    
    add leaderboard modes
    
update readmes
    
    evaluators leaderboard
    
    print_leaderboard
    
update precompute
    
    constants
    
    leaderboard_mode_to_print to analyze eval
    
    update html
    
    add radio buttons
    
update differences with AlpacaFarm
    
    update all notebooks
    
    error out
    
    003 leaderboard
    
    notebooks analyzing all
    
    analyzing_annotators
    
finish plotting of analyses
    
    add figures
    
    add figures
    
adding first plot
    
    finish readme
    
    finish readme
    
    fix typos in readme.
    
    fix citation issues.
    
    fix readme.
    
    fix setup.
    
    minor.
    
    add outputs.json example
    
    fix small issues with first headline cmd.
    
    title aesthetics.
    
    title.
    
    add filters button
    
    add all model configs
    
    add results export file
    
    minor diffs
    
    prettify website
    
update leaderboards
    
    finish website
    
    scoping intro
    
    scoping intro
    
    scoping intro
    
    bug fix
    
    add gpt4 full leaderboard
    
update gpt4 leaderboard website
    
    add interpretation of leaderboards
    
    finish explanation of main eval metrics
    
    finish explanation of all eval metrics
    
    finish explanation of all eval metrics
    
    finish explanation of all eval metrics
    
    finish up to evaluator
    
    test
    
    test
    
    run on claude instead of gpt4
    
    add related work
    
    shorter section
    
    add limitation section
    
    add to related work
    
    add to related work
    
    finish readme
    
update website
    
    format dividers
    
    update readme
    
    make image bigger
    
    make image bigger
    
    add contribution guidelines
    
    typo
    
    update readmes
    
    running notebook
    
    add wizard lm
    
change subtitle website
    
    add link
    
    add github
    
    update leaderboards
    
    last
    
    update
    
    finished through tatsu PR
    
    finished through tatsu PR
    
    pass through tatsu PR
    
    pass through tatsu PR
    
    add github
This project is licensed under the Apache License 2.0.