Yann Dubois authored
* setting up
* clean utils
* pairwise lb
* types
* initial setup
* initial requirements
* README
* pairwise annotator done
* openai done
* main metrics
* setting up empty license
* all prompts examples
* add anthropic
* add claude prompts
* minor
* OAI anthropic installation
* get_decoder
* get_decoder
* max_instances
* adding guanaco oasst stablelm
* hugging face
* remove langchain
* minor
* finish all decoders
* huggingface_local_completions
* huggingface_api_completions
* PACKAGES_ALL
* add opt
* test
* update packages
* debugging huggingface_local_completions
* api_completions
* [ENH] add timer
* [ENH] fast hugging face local
* [CONF] better default models
* [CONF] adding all basic conf
* tested all basic configs
* add constants
* add constants
* add constants
* docstrings
* gitignore
* [ENH] cohere
* [CLEAN] use hf datasets
* cleaning
* cleaning
* WIP analyze
* fn_completions
* minor
* [ENH] return price and time per example
* [ENH] return price and time per example
* add price and time for turkers
* WIP agreement_of_annotations
* [ENH] agreement_of_annotations
* [ENH] add vicuna parsing
* finish vicuna adding
* [SCRIPT] add precompute script
* [SCRIPT] add precompute script
* add falcon
* add vicuna with inputs
* black
* [ENH] list bias
* [ENH] vicuna -> lmsys
* [ENH] vicuna -> lmsys
* black
* alpaca_farm_ppo_human_7b
* setup
* max_instances bug
* vicuna
* [ENH] analyze_evaluators
* clean prompts
* minor
* leaderboards
* make_evaluator_leaderboard
* rm make_evaluator_leaderboard
* change gpt3 to text-davinci-003
* [ENH] max_instances to precompute
* solve merging
* evaluator leaderboard
* minor
* add plotting
* add plotting
* rename all and finish leaderboard
* rm json
* add local models to lb
* add local models to lb
* add local models to lb
* add local models to lb
* README
* update the readme
* update the readme
* initial adding of constants
* ignore claude lb
* formatting
* add make_model_leaderboard
* update lb
* add constants
* minor
* is_return_instead_of_print
* save main outputs
* MODELS_TO_BENCHMARK
* update claude leaderboard
* leaderboards rename
* minor
* minor
* minor
* [NOTEBOOK] compare annotators
* rm .idea
* update readme
* caching prices
* prices gpt leaderboards
* instruction-following prompt
* minor
* minor
* rm caches
* leaderboard claude
* drop aviary
* aviary README
* aviary readme
* API constants
* API constants
* making new evaluator
* formatting readme
* minor
* Making a new evaluator
* minor
* installation
* developing
* notebooks
* rm unnecessary ranking
* better error
* readme
* minor
* is_single_annotator
* leaderboard
* ANTHROPIC_MAX_CONCURRENCY
* [enh] is_save_to_leaderboard
* [enh] is_save_to_leaderboard
* imports
* ranking_parser
* ranking_parser
* minor rename
* check imports
* caching leaderboard
* caching leaderboard
* rename completion kwargs
* rohan benchmarking
* rm example
* moving to evaluators_configs
* single prompt
* remove all unnecessary prompts
* model_configs
* rm all input field
* update readme
* update readme
* adding strip
* documentation
* [CONF] add improved configs
* prompts
* leaderboards
* gitignore
* anthropic n_retries
* names of models to keep
* hugging face inference_helper
* save to results
* constants
* update readme
* allow globing
* leaderboards
* cleaning leaderboards
* cleaning leaderboards
* package_data
* delete example
* add manifest
* add outputs example
* AlpacaEval
* finish developing
* evalset
* leaderboards
* leaderboards
* aviary bug
* alpaca farm prompt
* leaderboards
* leaderboards
* bias 1
* compare annotators notebook
* annotators
* constants
* precompute
* allow additional columns
* leaderboard
* update lb
* add table of content
* add TOC
* adding more dropdowns
* update leaderboard
* update leaderboards
* boilerplate for website
* move boilerplate
* Create CNAME
* Delete CNAME
* AlpacaFarm -> AlpacaEval
* adding doc
* update html
* adding helper
* adding all helper to README
* update all leaderboards
* update all leaderboards
* smaller example of outputs
* add leaderboard modes
* update readmes
* evaluators leaderboard
* print_leaderboard
* update precompute
* constants
* leaderboard_mode_to_print to analyze eval
* update html
* add radio buttons
* update differences with alpacafarm
* update all notebooks
* error out 003
* leaderboard
* notebooks
* analyzing all
* analyzing_annotators
* finish plotting of analyses
* add figures
* add figures
* adding first plot
* finish readme
* finish readme
* fix typos in readme.
* fix citation issues.
* fix readme.
* fix setup.
* minor.
* add outputs.json example
* fix small issues with first headline cmd.
* title aesthetics.
* title.
* add filters button
* add all model configs
* add results export file
* minor diffs
* prettify website
* update leaderboards
* finish website
* scoping intro
* scoping intro
* scoping intro
* bug fix
* add gpt4 full leaderboard
* update gpt4 leaderboard website
* add interpretation of leaderboards
* finish explanation of main eval metrics
* finish explanation of all eval metrics
* finish explanation of all eval metrics
* finish explanation of all eval metrics
* finish up to evaluator test
* test run on claude instead of gpt4
* add related work
* shorter section
* add limitation section
* add to related work
* add to related work
* finish readme
* update website: format dividers
* update readme
* make image bigger
* make image bigger
* add contribution guidelines
* typo
* update readmes
* running notebook
* add wizard lm
* change subtitle website
* add link
* add github
* update leaderboards
* last update
* finished through tatsu PR
* finished through tatsu PR
* pass through tatsu PR
* pass through tatsu PR
* add github
14230f34
This project is licensed under the Apache License 2.0.