Problem with inference results from deepseek-math-7b-rl-stepdpo #13

Open
wjn1996 opened this issue Aug 4, 2024 · 1 comment

wjn1996 commented Aug 4, 2024

Hello, I ran inference with the released deepseek-math-7b-rl-stepdpo model on the dpo10k data, but the results are very poor. What could be the cause?

I followed the provided inference parameters and the template corresponding to deepseek-math:
temperature=0.9, top_p=0.95

Prompt input:

{"role": "user", "content": "Jenny has 8 stamp books, each containing 42 pages with 6 stamps on each page. She decides to reorganize her books so that each page now contains 10 stamps. She fills up 4 complete books, still with 42 pages per book. In her fifth book, she has 33 pages filled with 10 stamps per page and 1 last page with the remaining stamps. How many stamps are on the last page of her fifth book?\nPlease reason step by step, and put your final answer within \\boxed{{}}"}
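
One detail visible in the prompt above is the doubled braces in `\boxed{{}}`. In Python, `{{}}` is how a literal `{}` is escaped inside a `str.format` template, so doubled braces reaching the model would be consistent with a template that was never passed through `.format()`. The sketch below only illustrates that behavior; `INSTRUCTION_TEMPLATE` and `build_user_message` are hypothetical names, not taken from the repository, and whether this has anything to do with the degraded output is not established in this thread.

```python
# Hypothetical template (an assumption for illustration). The instruction
# wording is copied from the prompt in this report.
INSTRUCTION_TEMPLATE = (
    "{question}\n"
    "Please reason step by step, and put your final answer within \\boxed{{}}"
)


def build_user_message(question: str) -> dict:
    """Fill the template with str.format, which turns '{{}}' into '{}'."""
    content = INSTRUCTION_TEMPLATE.format(question=question)
    return {"role": "user", "content": content}


message = build_user_message("What is 1 + 1?")
print(message["content"])
```

After formatting, the message ends in `\boxed{}` with single braces, so comparing the final prompt string against the one used during training is a cheap thing to rule out.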

Model output:

{
"role": "assistant", 
"content": " Calculate total number of stamps in all four completed books.\\njenny starts with eight stamp books, where seven of them (the first four) each contain 42 pages with 6 stamps on each page. So, each book has 42 * 6 = 252 stamps.\\ntherefore, for these four books, there are 4 * 252 = 1008 stamps in total.\\ndefinition\\nsolving problems involving fractions can be challenging but it doesn’t have to be!\\nyou just need a few simple steps:\\nnfirst identify what fraction you want to solve or simplify,\\nin this case we start off knowing $\\frac{a}{b}$. Then use cross-multiplication if needed which means multiplying both top numbers together ($ac$)and bottom numbers ($\\cdot bd$), giving us $ad=bc$.
\\nsometimes people find working out common denominators easier than usingcross multiplication directly - especially when dealingwith more complex equations like$\\tfrac{\\sqrt x + y}{\\ sqrt z+w}.\\nit really depends how comfortableyou feel solvingproblems!either way will give sameanswer unless specified otherwiseby teacher/textbook author etcetera..so keep calm & carryon no matterwhich method seemsmore straightforwardatmoment :)step one : multiply numeratorexpression numeratorpart×denominator part ->$ axbxcdyczd=(axc)( bxd)=(ayzcbwd)$\\nweshould endup having equivalentfractions regardlessofmethodused!(just making sureeveryone understandsthisbitok?).now let‘slighterinto secondstageoftask.!.[underline]\"second stage\"[/\"][bold](simplifying)[/-]\
\ngiven two(or moreshighlycomplexmathsythingsstillworksameway)\\nequationsthatlooklike:$e=\\fracefgfhij$.wewanttosimplifytheseuntiltheyareinlowesttermspossiblewithoutlosinganyinformationfromoriginalequation.(i\"llusethisexamplecauseitsnottoooo complicatedbut basically anythinggoes here)!to dothisis pretty easy actually;allwer e gotta doistakethe num&dem individuallyandrearrangemthem sothateachoneonlyhasprimefactorsleftafterdivisionortakingoutcommon factorsifnecessary-[highlight]'likwiseordraw alinesacrossnumeratorisndivideeachnumberbysmallest possibleintegeruntilit cant bedownfurther'[unhighlight]-thenrepeatprocessforthedenominaterightnexttothenumeratorline-(trytoputspacesbetweenlinesforclarity).when finishedbothsidesshouldhavelikestandardformexceptmaybeahugebiggerthan usualcoefficientontop--(dona/forgetaboutitthoughbecause weregoingtocancelith!)--finallymultiplyingthroughboththesidesgets ridoffactorsandsleavesustwithexactsamethingasyour original equation only muchsimplerlooking![italic].\"as long as everyone stays focus[/italics]\",
no mistakeswillhappenhere:[reset].letstartagain:.givenequatio nwas$[tex]\\\\E\\\\\\\\ =\\ \\\\\\\\fractextrmegft ext rmhgjfifigjj$:splitnum eratopart into primes ,get $[itext]=\\ fractr\\\\\\ p^ \\\\\\ q\\/tr\\$ [strike]{\\ color {red}p}^ {\\ colo r {#ba na }}\\ / tr\\$.denomi natorestilt he sam [#vanish]{color{#ffda}\\ smalldarkgreen }{rs}.cancelling identicalprimesona bot hs sid esgive$$k=[col o rprouncesomewhathardtomakesuretypoicallyemptybox]*l $$foreachpairsofidenticalfactorsexistsbetweeenthet op&\\bottomthatcanbecancelled .repeatedfo rationgiva essameresult astherstattemptabovewhichtellsu stheyared ideni cal !thatswhy mathematiciansalwayskeepcheckingbackovertheir work:(correctmyselfifa missthingshapp en).\\ finallymultiply through t obot hl sides getridofo actors leavesuse wit hexactlysmalle rieq uivalentthingyourorigina l equati \\$onge was ;$$\\ E\\[smallblacktriangle]= \\[fontfamily=\"Times New Roman\"]\\{ }[\\ fra cex]/_{mathrmbfq }\\^{boldsymbol}/_{\\ mathsfv}/\\dfracfxgxhxixjykxlxmxnxoypx\\_/{}_{mat hbb}]{rrrr}[\\mathb ff}]F\\[\\ smallbl ack triangle\\]G\\(\\normalsize V\\./\\)/PXQYRZSUTWVXYZWVXYZWZYXVUVTSSRQONMLK JIHGFEDCBA\\`_ ]\\)dividingeverything across line makes thingssimplersomuch!! .\\ okayletsgo backtonumbers.$$ lookclosely...everythin gcheckoutsright ?yesindeeditisfinehoweverthereseemsto betwoextra\\,…wellactually wedid makeanerrorbefore cancel ing ...forgottogiveattention tot hatlittle \\, right?\\somessomehow gotstuckinfin dlegolasprogramtryingtodoitse"}

The generated text is all strange like this. What is causing it?
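
For comparison with the output above, the expected answer to this prompt can be checked with a few lines of arithmetic. This is a reference calculation added for clarity, not part of the original report. The variable names are illustrative and the numbers come straight from the problem statement.

```python
# Stamps before reorganizing: 8 books, 42 pages each, 6 stamps per page.
total_stamps = 8 * 42 * 6

# After reorganizing, each full page holds 10 stamps.
stamps_in_full_books = 4 * 42 * 10      # four complete books of 42 pages
stamps_in_fifth_full_pages = 33 * 10    # 33 full pages in the fifth book

# Whatever is left goes on the last page of the fifth book.
last_page = total_stamps - stamps_in_full_books - stamps_in_fifth_full_pages
print(last_page)
```

A correct response should therefore end with `\boxed{6}`. The output shown above never reaches a boxed answer.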

@x6p2n9q8a4

I have the same problem with DPO: after DPO training, the model's outputs are hallucinated and none of them are valid.
