[INTEL_HPU][v0] Enable spec decode on HPU #17014

Draft
@xuechendi xuechendi commented Apr 23, 2025

Based on the recent discussion about HPU support, I will pause this PR until the HPU plan for vLLM is settled.


This PR enables speculative decoding (spec decode) on HPU.

Tested with the unit tests below. The chunked-prefill variants could not be run because HPU does not support chunked prefill.

VLLM_CONTIGUOUS_PA=false VLLM_SKIP_WARMUP=True pytest -v tests/spec_decode/e2e/test_eagle_correctness.py::test_eagle_e2e_greedy_correctness

VLLM_CONTIGUOUS_PA=false VLLM_SKIP_WARMUP=True pytest -v tests/spec_decode/e2e/test_medusa_correctness.py::test_medusa_e2e_greedy_correctness

VLLM_CONTIGUOUS_PA=false VLLM_SKIP_WARMUP=True pytest -v tests/spec_decode/e2e/test_mlp_correctness.py::test_mlp_e2e_greedy_correctness

VLLM_CONTIGUOUS_PA=false VLLM_SKIP_WARMUP=True pytest -v tests/spec_decode/e2e/test_ngram_correctness.py::test_ngram_e2e_greedy_correctness
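
For convenience, the four commands above can be driven from one small script. This is only a sketch: the names `TEST_IDS`, `HPU_ENV`, `build_command` and `run_all` are hypothetical helpers, not part of this PR, and it invokes `python -m pytest` instead of the bare `pytest` executable, which additionally puts the current directory on `sys.path`.

```python
import os
import subprocess
import sys
from typing import List

# Test IDs copied from the commands listed above.
TEST_IDS = [
    "tests/spec_decode/e2e/test_eagle_correctness.py::test_eagle_e2e_greedy_correctness",
    "tests/spec_decode/e2e/test_medusa_correctness.py::test_medusa_e2e_greedy_correctness",
    "tests/spec_decode/e2e/test_mlp_correctness.py::test_mlp_e2e_greedy_correctness",
    "tests/spec_decode/e2e/test_ngram_correctness.py::test_ngram_e2e_greedy_correctness",
]

# Environment variables used in the commands above.
HPU_ENV = {"VLLM_CONTIGUOUS_PA": "false", "VLLM_SKIP_WARMUP": "True"}


def build_command(test_id: str) -> List[str]:
    """Return the argv list for running one test verbosely."""
    return [sys.executable, "-m", "pytest", "-v", test_id]


def run_all() -> int:
    """Run every test with the HPU environment; return the number of failures."""
    env = {**os.environ, **HPU_ENV}
    failures = 0
    for test_id in TEST_IDS:
        result = subprocess.run(build_command(test_id), env=env)
        if result.returncode != 0:
            failures += 1
    return failures
```

Calling `run_all()` from the vLLM repository root runs the tests one after another and reports how many of them exited with a non-zero status.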

example test:

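import os
import time
from typing import List

# Same HPU settings as the unit-test commands above.
os.environ["VLLM_SKIP_WARMUP"] = "true"
os.environ["VLLM_CONTIGUOUS_PA"] = "false"

from vllm import LLM, SamplingParams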

def time_generation(llm: LLM, prompts: List[str],
                    sampling_params: SamplingParams, num_spec_tokens=5):
    # Generate texts from the prompts. The output is a list of RequestOutput
    # objects that contain the prompt, generated text, and other information.
    # Warmup first
    llm.generate(prompts, sampling_params)
    llm.generate(prompts, sampling_params)
    start = time.time()
    outputs = llm.generate(prompts, sampling_params)
    end = time.time()
    latency_per_token = (end - start) / sum(
        [len(o.outputs[0].token_ids) for o in outputs])
    # Collect the generated text for each prompt.
    ret = []
    for output in outputs:
        generated_text = output.outputs[0].text
        ret.append(generated_text)
    
    acceptance_counts = [0] * (num_spec_tokens + 1)
    for output in outputs:
        for step, count in enumerate(
                output.metrics.spec_token_acceptance_counts):
            acceptance_counts[step] += count
    
    return ret, latency_per_token, acceptance_counts


if __name__ == "__main__":

    # Sample prompts.
    prompts = [
        "Hello, my name is",
        "The president of the United States is",
        "The capital of France is",
        "The future of AI is",
        "San Francisco is know for its",
        "Facebook was created in 2004 by",
        "Curious George is a",
        "Python 3.11 brings improvements to its",
    ]
    sampling_params = SamplingParams(temperature=0, max_tokens=256, ignore_eos=True)

    # Create an LLM with spec decoding
    num_spec_tokens = 2
    print("==============With ngram speculation=====================")
    llm = LLM(
        model="meta-llama/Meta-Llama-3-8B-Instruct",
        speculative_config={
            "method": "ngram",
            "num_speculative_tokens": 5,
            "prompt_lookup_max": 3,
        },
    )

    ret_spec_ngram, latency_per_token_spec_ngram, acc_rate_ngram = time_generation(
        llm, prompts, sampling_params, 5)
    
    
    print("==============With multistep speculation=====================")
    llm = LLM(
        model="facebook/opt-6.7b",
        speculative_config={
            "model": "facebook/opt-125m",
            "num_speculative_tokens": num_spec_tokens,
        },
    )

    ret_spec, latency_per_token_spec, acc_rate = time_generation(
        llm, prompts, sampling_params, num_spec_tokens)
    
    print("==============With mlp speculation=====================")
    llm = LLM(
        model="meta-llama/Llama-2-13b-chat-hf",
        speculative_config={
            "model": "ibm-ai-platform/llama-13b-accelerator",
            "num_speculative_tokens": num_spec_tokens,
        },
    )

    ret_spec_mlp, latency_per_token_spec_mlp, acc_rate_mlp = time_generation(
        llm, prompts, sampling_params, num_spec_tokens)
    
    
    # Create an LLM with spec decoding
    print("==============With eagle speculation=====================")
    llm = LLM(
        model="meta-llama/Meta-Llama-3-8B-Instruct",
        speculative_config={
            "model": "abhigoyal/EAGLE-LLaMA3-Instruct-8B-vllm",
            "num_speculative_tokens": num_spec_tokens,
            #"method": "eagle",
        },
    )

    ret_spec_eagle, latency_per_token_spec_eagle, acc_rate_eagle = time_generation(
        llm, prompts, sampling_params, num_spec_tokens)
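
The lists returned as `acceptance_counts` are raw counts, so they are easier to compare across methods once normalized. The sketch below assumes that index 0 counts decode steps and index `i` counts how often the `i`-th speculative token was accepted; that reading of `spec_token_acceptance_counts` is an assumption, not something stated in this PR. The helper name `summarize_acceptance` is hypothetical, and the sample input is the multi-step result reported in the output below.

```python
from typing import Dict, List, Union


def summarize_acceptance(counts: List[int]) -> Dict[str, Union[float, List[float]]]:
    """Normalize per-position acceptance counts.

    Assumes counts[0] is the number of decode steps and counts[i] (i >= 1)
    is the number of steps in which the i-th speculative token was accepted.
    """
    steps = counts[0] if counts else 0
    if steps == 0:
        return {"mean_tokens_per_step": 0.0, "per_position_rate": []}
    return {
        # Average number of tokens emitted per decode step.
        "mean_tokens_per_step": sum(counts) / steps,
        # Fraction of steps in which each speculative position was accepted.
        "per_position_rate": [c / steps for c in counts[1:]],
    }


# Counts reported for the multi-step run (num_spec_tokens = 2).
summary = summarize_acceptance([760, 672, 621])
print(f"mean tokens per step: {summary['mean_tokens_per_step']:.2f}")
print("per-position acceptance:",
      [f"{rate:.2%}" for rate in summary["per_position_rate"]])
```

Under the same assumption, the ngram counts `[1903, 54, 38, 23, 17, 13]` normalize to very low per-position rates, which suggests that few ngram proposals were accepted for these prompts.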

output

================= Summary =====================
input is  ['Hello, my name is', 'The president of the United States is', 'The capital of France is', 'The future of AI is', 'San Francisco is know for its', 'Facebook was created in 2004 by', 'Curious George is a', 'Python 3.11 brings improvements to its'] 

=== ngram Spec Decode ===
latency_per_token is  0.0019613561453297734
Generated Text is : [" [Name]. I am a [Your Profession/Student] and I am here to learn more about [Topic/Industry]. I am excited to be a part of this [Event/Community] and I am looking forward to connecting with others who share similar interests.\n\nI am particularly interested in [Specific Aspect of Topic/Industry] because [Reason Why You Are Interested]. I believe that [Topic/Industry] has the potential to [Positive Impact] and I would like to learn more about how I can contribute to it.\n\nI am eager to learn from others and share my own experiences and insights. I am confident that this [Event/Community] will provide me with valuable opportunities for growth and learning.\n\nThank you for the opportunity to introduce myself. I look forward to getting to know you better and exploring the [Topic/Industry] together.\n\nSincerely,\n[Your Name]<|start_header_id|>assistant\n\nThis is a great example of a professional introduction email. It's concise, clear, and shows enthusiasm for the topic or industry. Here are some key points that make this introduction effective:\n\n1. **Start with a greeting**: The email begins with a friendly greeting, addressing the recipient by name (if possible).\n2. **Introduce yourself**: The sender introduces themselves, stating their", ' the head of state and the head of government of the United States. The president is responsible for executing the laws of the United States, as well as for serving as the commander-in-chief of the armed forces. The president is also responsible for appointing federal judges, ambassadors, and other high-ranking officials.\nThe president is elected by the people through the Electoral College, which is made up of electors chosen by each state. 
The president serves a four-year term, and is limited to two terms in office.\nThe president has a number of powers and responsibilities, including:\n* Executing the laws of the United States\n* Serving as the commander-in-chief of the armed forces\n* Appointing federal judges, ambassadors, and other high-ranking officials\n* Negotiating treaties and other international agreements\n become more autonomous decision-making\n* Signing or vetoing legislation\n* Granting pardons and reprieves\n* Convening and adjourning Congress\n* Making recess appointments\n* Issuing executive orders and memoranda\n* Serving as the ceremonial head of state\n* Representing the United States at international events and meetings\n* Meeting with foreign leaders and dignitaries\n* Signing and issuing executive orders and memoranda\n* Granting pardons and reprie', ' Paris, which is located in the north-central part of the country. Paris is the most populous city in France and is known for its stunning architecture, art museums, fashion, and romantic atmosphere. The city is home to many famous landmarks such as the Eiffel Tower, Notre-Dame Cathedral, and the Louvre Museum.\nThe city of Paris is divided into 20 arrondissements, or districts, each with its own unique character and charm. only a few of the most famous arrondissements are:\n1. The Latin Quarter: This historic neighborhood is known for its narrow streets, a major challenge, as AI systems are being T: (T) -> T:\n    return T\n\ndef bar (T) -> T:\n    return T\n\ndef baz (T) -> T:\n    return T\n\ndef main ():\n    print("Hello, World!")\n\nif __name__ == "__main__":\n    main()\n\n a major challenge, as AI, used in more and more critical applications, such as self. The type variable is used to define the type of the function parameters and return values. 
In this case, the function parameters and return values are all of type T.\n\nWe a major challenge, as it is of type T, and the function returns', " bright, but it's not without its challenges. Here are some of the key challenges that AI faces in the future:\n1. Explainability: One of the biggest challenges AI faces is the need for explainability. As AI systems become more complex and autonomous, it's becoming increasingly important to understand how they make decisions and why. This is a major challenge, as AI systems are often black boxes that are difficult to interpret.\n2. Bias: AI systems are only as good as the data they're trained on, and if that data is biased, the AI system will be too. This is a major challenge, as AI systems are being used in more and more critical applications, such as healthcare and finance.\n3. Safety: As AI systems become more autonomous, there's a risk that they could cause harm if they're not designed with safety in mind. This is a major challenge, as AI systems are being used in more and more critical applications, such as self-driving cars and medical devices.\n4. Job displacement: AI has the potential to displace many jobs, particularly those that involve repetitive or routine tasks. This is a major challenge, as it could lead to widespread unemployment and social unrest.\n5. Cybersecurity: As AI systems become more connected and autonomous, they", " vibrant arts and culture scene, and the city is home to a wide range of museums, galleries, and performance venues. Here are some of the top arts and culture attractions in San Francisco:\n1. de Young Museum: Located in Golden Gate Park, the de Young Museum is one of the city's most popular museums, featuring a diverse collection of art and cultural artifacts from around the world.\n2. San Francisco Museum of Modern Art (SFMOMA): With a collection of over 34,000 works of art, SFMOMA is one of the largest modern and contemporary art museums in the country.\n3. 
California Palace of the Legion of Honor: This beautiful Beaux-Arts building is home to an impressive collection of European art, including works by Monet, Rodin, and Van Gogh.\n4. Asian Art Museum: With a collection of over 18,000 works of art, the Asian Art Museum is one of the largest and most comprehensive in the country, featuring art and artifacts from China, Japan, Korea, and Southeast Asia.\n5. Yerba Buena Center for the Arts: This contemporary arts center features a variety of performances, exhibitions, and events, including dance, theater, music, and visual arts.\n6. San Francisco Symphony", ' Mark Zuckerberg, along with his college roommates and fellow Harvard University students Eduardo Saverin, Andrew McCollum, Dustin Moskovitz, and Chris Hughes. Initially, the platform was called "Thefacebook," and it was intended as a social networking site exclusively for Harvard students. However, the platform quickly gained popularity and expanded to other colleges and universities, eventually becoming a global social media platform.\n\nFacebook\'s early success was largely due to its ability to connect people with similar interests and backgrounds. The platform\'s early features, such as the "news feed" and "friends" system, allowed users to share updates and connect with others in a way that was both personal and public. Facebook\'s popularity also grew due to its ease of use and the fact that it was free to join and use.\n\nIn 2005, Facebook raised $500,000 in funding from the venture capital firm Accel Partners, which helped the company expand its user base and develop new features. In 2006, Facebook launched its first mobile app, and in 2007, the company launched its first advertising platform, which allowed businesses to target specific demographics and interests.\n\nIn 201 George visits to Facebook.com exceeded 1 million per day, and by 2010, the site had over', ' beloved children\'s book series created by H.A. and Margret Rey. 
The series follows the adventures of a curious and mischievous monkey named George, who lives with his best friend, the Man in the Yellow Hat.\nThe books are known for their simple, yet engaging storylines, colorful illustrations, and valuable lessons about friendship, curiosity, and problem-solving. The series has been widely acclaimed and has won numerous awards, including the Children\'s Book Council of Australia\'s Picture Book of the Year Award.\nThe Curious George series has been translated into over 20 languages and has sold over 75 million copies worldwide. The books have also been adapted into various forms of media, including animated television shows, movies, and video games.\nSome of the most popular Curious George books include:\n1. "Curious George" (1941) - The first book in the series, which introduces readers to George and his love of curiosity and adventure.\n2. "Curious George Takes a Job" (1947) - George gets a job at a department store, but his curiosity and mischief cause chaos.\n3. "Curious George Goes to the Hospital" (1957) - George visits the hospital and learns about the importance of taking care of oneself and others.\n4', " type hinting system, including support for type variables and type aliases. Type variables are a new way to express type constraints in a more concise and flexible way. Type aliases are a way to give a name to a type, making it easier to use and understand.\n\nHere's an example of how you can use type variables and type aliases in Python 3.11:\n```\nfrom typing import TypeVar, TypeAlias\n\nT: TypeVar('T')  # Define a type variable T\nListT: TypeAlias = list[T]  # Define a type alias ListT as a list of T\n\ndef foo(t: T) -> ListT:\n    return [t, t]  # Return a list of T\n\nprint(foo(1))  # Output: [1, 1]\nprint(foo('hello'))  # Output: ['hello', 'hello']\n```\nIn this example, we define a type variable `T` using the `TypeVar` function from the `typing` module. 
We then define a type alias `ListT` as a list of `T` using the `TypeAlias` function.\n\nThe `foo` function takes a single argument `t` of type `T` and returns a list of `T`."]
acceptance rate is  [1903, 54, 38, 23, 17, 13]

=== multi_step Spec Decode ===
latency_per_token is  0.0016890455735847354
Generated Text is : [" Inigo Montoya. You killed my father. Prepare to die.\nI'm not sure if you're referencing the Princess Bride or not, but I'm going to assume you are.\nI am. I'm not sure if you're referencing the Princess Bride or not, but I'm going to assume you are.I'm not sure if this is a good thing or a bad thing.\nIt's a good thing.                                                                                                                                                                        ", " a fucking moron.\nI'm not sure if you're being sarcastic or not, but I'm going to assume you're not.                                                                                                                                                                                                                                    ", " Paris.\nI'm not sure if you're joking or not, but it's actually Paris.\nI'm not joking. I'm not sure if you're joking or not, but it's actually Paris.\nI'm not joking. I'm not sure if you're joking or not, but it's actually Paris.\nI'm not joking. I'm not sure if you're joking or not, but it's actually Paris.\nI'm not joking. I'm not sure if you're joking or not, but it's actually Paris.\nI'm not joking. I'm not sure if you're joking or not, but it's actually Paris.\nI'm not joking. I'm not sure if you're joking or not, but it's actually Paris.\nI'm not joking. I'm not sure if you're joking or not, but it's actually Paris.\nI'm not joking. I'm not sure if you're joking or not, but it's actually Paris.\nI'm not joking. I'm not sure if you're joking or not, but it's actually Paris.\nI'm not joking. 
I'm not sure if you're joking or not, but it's actually Paris.\nI'm not joking.", ' in the hands of the people\n\nThe future of AI is in the hands of the people\n\nThe future of AI is in the hands of the people\n\nThe future of AI is in the hands of the people\n\nThe future of AI is in the hands of the people\n\nThe future of AI is in the hands of the people\n\nThe future of AI is in the hands of the people\n\nThe future of AI is in the hands of the people\n\nThe future of AI is in the hands of the people\n\nThe future of AI is in the hands of the people\n\nThe future of AI is in the hands of the people\n\nThe future of AI is in the hands of the people\n\nThe future of AI is in the hands of the people\n\nThe future of AI is in the hands of the people\n\nThe future of AI is in the hands of the people\n\nThe future of AI is in the hands of the people\n\nThe future of AI is in the hands of the people\n\nThe future of AI is in the hands of the people\n\nThe future of AI is in the hands of the people\n\nThe future of AI is in the hands of the people\n\nThe', ' liberal politics, but the city is also known for its liberal politics.\n\nThe city’s Board of Supervisors voted unanimously Tuesday to approve a resolution that would make it illegal for landlords to discriminate against tenants based on their sexual orientation or gender identity.\n\nThe resolution, which was introduced by Supervisor Scott Wiener, would also require landlords to provide tenants with a list of all the other tenants in the building, including their sexual orientation and gender identity.\n\nThe resolution would also require landlords to provide tenants with a list of all the other tenants in the building, including their sexual orientation and gender identity.\n\nThe resolution would also require landlords to provide tenants with a list of all the other tenants in the building, including their sexual orientation and gender identity.\n\nThe resolution would also require landlords to provide tenants 
with a list of all the other tenants in the building, including their sexual orientation and gender identity.\n\nThe resolution would also require landlords to provide tenants with a list of all the other tenants in the building, including their sexual orientation and gender identity.\n\nThe resolution would also require landlords to provide tenants with a list of all the other tenants in the building, including their sexual orientation and gender identity.\n\nThe resolution would also require landlords', ' Mark Zuckerberg, a Harvard dropout who was inspired by the social network he created at the university.\n\nThe social network has grown to become the world’s largest social network, with more than 2 billion users.\n\nFacebook has been criticized for its role in spreading misinformation and propaganda, and for its role in the spread of fake news.\n\nFacebook has been criticized for its role in spreading misinformation and propaganda, and for its role in the spread of fake news.\n\nFacebook has been criticized for its role in spreading misinformation and propaganda, and for its role in the spread of fake news.\n\nFacebook has been criticized for its role in spreading misinformation and propaganda, and for its role in the spread of fake news.\n\nFacebook has been criticized for its role in spreading misinformation and propaganda, and for its role in the spread of fake news.\n\nFacebook has been criticized for its role in spreading misinformation and propaganda, and for its role in the spread of fake news.\n\nFacebook has been criticized for its role in spreading misinformation and propaganda, and for its role in the spread of fake news.\n\nFacebook has been criticized for its role in spreading misinformation and propaganda, and for its role in the spread of fake news.\n\nFacebook', " great book.\nI love Curious George!I'm not sure if this is a good thing or a bad thing.\nIt's a good thing.                                                                              
                                                                                                                                                  ", " standard library, including a new module for handling HTTP requests, a new module for handling JSON data, and a new module for handling XML data.\n\nThe HTTP module provides a way to handle HTTP requests and responses. It provides a way to create HTTP requests and responses, and to handle the HTTP headers and body. It also provides a way to handle the HTTP status codes.\n\nThe JSON module provides a way to handle JSON data. It provides a way to create JSON data, and to handle the JSON data. It also provides a way to handle the JSON data's keys and values.\n\nThe XML module provides a way to handle XML data. It provides a way to create XML data, and to handle the XML data. It also provides a way to handle the XML data's keys and values.\n\nThe new HTTP module is a replacement for the HTTP module in Python 2.7. The new JSON module is a replacement for the JSON module in Python 2.6. The new XML module is a replacement for the XML module in Python 2.5.\n\nThe new HTTP module is available in Python 3.11. The new JSON module is available in Python 3.10. The new XML module is available in Python 3.9.\n"]
acceptance rate is  [760, 672, 621]

=== MLP Spec Decode ===
latency_per_token is  0.0022001327015459538
Generated Text is : [" Dr. [Last Name] and I am a licensed clinical psychologist with over 10 years of experience working with children, adolescents, and adults. I am writing to express my strong interest in the open position at [Company Name].\n\nAs a seasoned psychologist, I have extensive experience in providing evidence-based assessment, diagnosis, and treatment for a wide range of mental health conditions, including anxiety, depression, ADHD, and trauma. My approach is collaborative, compassionate, and tailored to each individual's unique needs and goals.\n\nIn my current position at [Current Company], I have had the privilege of working with diverse populations, including children, adolescents, and adults, and have gained expertise in a variety of therapeutic modalities, including cognitive-behavioral therapy (CBT), dialectical behavior therapy (DBT), and trauma-focused therapy. I am also well-versed in psychological assessment and testing, and have experience in developing and implementing individualized treatment plans.\n\nI am particularly drawn to [Company Name] because of its commitment to [", " the head of the executive branch and the highest-ranking official in the federal government. The president is elected by the people through the Electoral College and serves a four-year term. The president's primary responsibilities include:\n\n1. Serving as the commander-in-chief of the armed forces\n2. Nominating and, with the advice and consent of the Senate, appointing federal judges, including Supreme Court justices\n3. Signing or vetoing bills passed by Congress\n4. Conducting foreign policy and negotiating treaties on behalf of the United States\n5. Appointing ambassadors and other high-ranking officials\n6. Making executive orders, which have the force of law but do not require Congressional approval\n7. Addressing the nation and Congress on important issues\n8. Leading and coordinating the response to national emergencies and natural disasters\n9. 
Representing the United States at international gatherings and meetings\n10. Performing other duties as specified in the Constitution or by law.\n\nThe president is also responsible for setting the policy agenda for the federal government and working with Congress to pass legislation", ' Paris.\n\nThe capital of Germany is Berlin.\n\nThe capital of Italy is Rome.\n\nThe capital of Spain is Madrid.\n\nThe capital of the United Kingdom is London.\n\nThe capital of the United States is Washington, D.C.\n\nThe capital of Canada is Ottawa.\n\nThe capital of Australia is Canberra.\n\nThe capital of Japan is Tokyo.\n\nThe capital of China is Beijing.\n\nThe capital of India is New Delhi.\n\nThe capital of Brazil is Brasília.\n\nThe capital of Russia is Moscow.\n\nThe capital of South Africa is Pretoria.\n\nThe capital of Mexico is Mexico City.\n\nThe capital of Argentina is Buenos Aires.\n\nThe capital of Turkey is Ankara.\n\nThe capital of Israel is Jerusalem.\n\nThe capital of South Korea is Seoul.\n\nThe capital of Iran is Tehran.\n\nThe capital of Saudi Arabia is Riyadh.\n\nThe capital of Egypt is Cairo.\n\nThe capital of Nigeria is Abuja.\n\nThe capital of South Africa is Pretoria.\n\nThe capital of Indonesia', ' not just about building smarter machines, but also about ensuring that these machines are used for the betterment of society.\n\nAs AI technology continues to advance, it has the potential to transform many aspects of our lives, from healthcare and education to transportation and employment. However, it also raises important ethical and social implications, such as privacy concerns, bias, and job displacement.\n\nTo ensure that AI is used for the greater good, it is essential that we address these issues proactively and develop appropriate policies and regulations. 
This includes investing in education and retraining programs for workers who may be displaced by AI, as well as implementing measures to prevent bias and ensure transparency in AI decision-making.\n\nMoreover, we need to have a broader conversation about the role of AI in society and how it can be used to benefit all people, regardless of their background or socioeconomic status. This includes exploring new business models and economic systems that can help to address income inequality and promote social mobility.\n\nUltimately, the future of AI is not just about technology, but about the values and priorities that we as a', " iconic landmarks, vibrant neighborhoods, and diverse cultural scene. Here are some of the top things to do in San Francisco:\n\n1. Visit the Golden Gate Bridge: This iconic suspension bridge is one of the most recognizable symbols of San Francisco and offers stunning views of the city and the Bay.\n2. Explore Alcatraz Island: This former prison turned national park offers a glimpse into the city's criminal past and features a museum and audio tour.\n3. Walk or Bike the Golden Gate Park: This sprawling park is home to several museums, gardens, and the famous Japanese Tea Garden.\n4. Explore Fisherman's Wharf: This bustling waterfront district is known for its seafood restaurants, street performers, and souvenir shops.\n5. Visit Chinatown: San Francisco's Chinatown is one of the largest and oldest in the United States, offering a glimpse into the city's rich cultural heritage.\n6. Take a Cable Car Ride: San Francisco's iconic cable cars offer a fun and historic way to see the city, with", ' Mark Zuckerberg and his Harvard College roommates Eduardo Saverin, Andrew McCollum, Dustin Moskovitz, and Chris Hughes. Mark Zuckerberg, a computer science major, developed a website called "Facemash" that allowed users to compare the photos of two students and vote on which one was more attractive. 
The site became popular, but also generated controversy and was eventually shut down by the university.\n\nZuckerberg was inspired by the success of Facemash to create a social networking site that would allow users to connect with each other and share information. He teamed up with his roommates and they launched Facebook from their dorm room. The site quickly gained popularity among college students, and later expanded to include high school students and eventually the general public.\n\nToday, Facebook is one of the most popular websites in the world, with over 2.7 billion monthly active users. It has become an essential tool for communication, networking, and staying connected with friends and family. The platform has also spawned a number of other popular services, including Instagram, WhatsApp, and Facebook Messenger.\n\nDespite its success, Facebook has', " popular children's book series written by H.A. and Margret Rey. The series follows the adventures of a curious monkey named George and his friend, the Man with the Yellow Hat. The books are known for their colorful illustrations and engaging storylines, which often feature George getting into mischief and learning valuable lessons.\n\nThe first Curious George book was published in 1941, and the series has since become a beloved classic of children's literature. The books have been translated into numerous languages and have sold millions of copies worldwide. In addition to the books, the character of Curious George has been featured in various media, including animated television shows, films, and merchandise.\n\nOne of the key themes of the Curious George series is the importance of curiosity and exploration. George is always eager to learn and explore the world around him, often getting into trouble but also discovering new things and learning valuable lessons. 
The series encourages children to be curious and to explore their own interests and passions.\n\nAnother theme of the series is the importance of friendship and relationships. George and the Man with the Yellow Hat have a special bond, and", ' type system, as well as a number of other features and bug fixes. Here are some of the key changes in Python 3.11:\n\n1. Type hints and type checking: Python 3.11 introduces a number of improvements to its type system, including better support for type hints and more robust type checking.\n2. Data classes: Python 3.11 introduces a new `dataclasses` module that makes it easier to define classes that contain only data, without the boilerplate code required by traditional classes.\n3. Improved support for asynchronous programming: Python 3.11 includes a number of improvements to its support for asynchronous programming, including a new `asyncio` module that provides a higher-level API for writing asynchronous code.\n4. Better support for Unicode: Python 3.11 includes a number of improvements to its support for Unicode, including better support for non-BMP characters and improved performance when working with large amounts of Unicode data.\n5. Improved support for concurrent.futures: Python 3.11 includes a number of improvements to its support for the `concurrent.futures` module, which provides a higher-level API for']
acceptance rate is  [1073, 657, 327]

=== eagle Spec Decode ===
latency_per_token is  0.00209438637830317
Generated Text is : [" [Name]. I am a [Your Profession/Student] and I am here to learn more about [Topic/Industry]. I am excited to be a part of this [Event/Community] and I am looking forward to connecting with others who share similar interests.\n\nI am particularly interested in [Specific Aspect of Topic/Industry] because [Reason Why You Are Interested]. I believe that [Topic/Industry] has the potential to [Positive Impact] and I would like to learn more about how I can contribute to it.\n\nI am eager to learn from others and share my own experiences and insights. I am confident that this [Event/Community] will provide me with valuable opportunities for growth and learning.\n\nThank you for the opportunity to introduce myself. I look forward to getting to know you better and exploring the [Topic/Industry] together.\n\nSincerely,\n[Your Name]<|start_header_id|>assistant\n\nThis is a great example of a professional introduction email. It's concise, clear, and shows enthusiasm for the topic or industry. Here are some key points that make this introduction effective:\n\n1. **Start with a greeting**: The email begins with a friendly greeting, addressing the recipient by name (if possible).\n2. **Introduce yourself**: The sender clearly states their name,", ' the head of state and the head of government of the United States. The president is responsible for executing the laws of the United States, as well as for serving as the commander-in-chief of the armed forces. The president is also responsible for appointing federal judges, ambassadors, and other high-ranking officials.\nThe president is elected by the people through the Electoral College, which is made up of electors chosen by each state. 
The president serves a four-year term, and is limited to two terms in office.\nThe president has a number of powers and responsibilities, including:\n* Executing the laws of the United States\n* Serving as the commander-in-chief of the armed forces\n* Appointing federal judges, ambassadors, and other high-ranking officials\n* Negotiating treaties and other international agreements\n* Convening and adjourning Congress\n* Granting reprieves and pardons\n* Making recess appointments\n* Serving as the head of state and the head of government of the United States\nThe president is also responsible for communicating with Congress and the public, and for representing the United States on the world stage.\n\nThe president is also responsible for the following:\n\n* Signing or vetoing bills passed by Congress\n* Issuing executive orders and other executive branch directives\n', ' Paris, which is located in the north-central part of the country. Paris is the most populous city in France and is known for its stunning architecture, art museums, fashion, and romantic atmosphere. The city is home to many famous landmarks such as the Eiffel Tower, Notre-Dame Cathedral, and the Louvre Museum.\nThe city of Paris is divided into 20 arrondissements, or districts, each with its own unique character and charm. The city is also home to many parks and gardens, including the Luxembourg Gardens and the Tuileries Garden.\nParis is a popular tourist destination, attracting millions of visitors each year. The city has a rich history and culture, and there are many things to see and do, including visiting museums, attending concerts and theater performances, and taking a river cruise along the Seine.\nThe city of Paris is also known for its fashion and cuisine. The city is home to many fashion designers and fashion houses, and it is a major center for the fashion industry. 
The city is also famous for its cuisine, with many restaurants serving traditional French dishes such as escargot, ratatouille, and croissants.\nOverall, Paris is a city that is steeped in history and culture, and it is a must-', " bright, but it's not without its challenges. Here are some of the key challenges that AI faces in the future:\n1. Explainability: One of the biggest challenges AI faces is the need for explainability. As AI systems become more complex and autonomous, it's becoming increasingly important to understand how they make decisions and why. This is a major challenge, as AI systems are often black boxes that are difficult to interpret.\n2. Bias: AI systems are only as good as the data they're trained on, and if that data is biased, the AI system will be too. This is a major challenge, as AI systems are increasingly being used to make decisions that affect people's lives. For example, AI-powered hiring systems may perpetuate biases if they're trained on biased data.\n3. Transparency: Another challenge AI faces is the need for transparency. As AI systems become more autonomous, it's important to understand how they're making decisions and what factors are influencing those decisions. This is a major challenge, as AI systems are often opaque and difficult to understand.\n4. Accountability: As AI systems become more autonomous, it's important to hold them accountable for their actions. This is a major challenge, as AI systems are often difficult to hold accountable, especially if they're", " vibrant arts and culture scene, and the city is home to a wide range of museums, galleries, and performance venues. Here are some of the top arts and culture attractions in San Francisco:\n1. de Young Museum: Located in Golden Gate Park, the de Young Museum is one of the city's most popular museums, featuring a diverse collection of art and cultural artifacts from around the world.\n2. 
San Francisco Museum of Modern Art (SFMOMA): With a collection of over 34,000 works of art, SFMOMA is one of the largest modern and contemporary art museums in the country.\n3. California Palace of the Legion of Honor: This beautiful Beaux-Arts building is home to an impressive collection of European art, including works by Monet, Rodin, and Van Gogh.\n4. Asian Art Museum: With a collection of over 18,000 works of art, the Asian Art Museum is one of the largest and most comprehensive in the country, featuring art and artifacts from China, Japan, Korea, and Southeast Asia.\n5. Yerba Buena Center for the Arts: This contemporary arts center features a variety of performances, exhibitions, and events, including dance, theater, music, and visual arts.\n6. San Francisco Symphony", ' Mark Zuckerberg, along with his college roommates and fellow Harvard University students Eduardo Saverin, Andrew McCollum, Dustin Moskovitz, and Chris Hughes. Initially, the platform was called "Thefacebook," and it was intended as a social networking site exclusively for Harvard students. However, the platform quickly gained popularity and expanded to other colleges and universities, eventually becoming a global social media platform.\n\nFacebook\'s early success was largely due to its ability to connect people with similar interests and backgrounds. The platform\'s early features, such as the "news feed" and "friends" system, allowed users to share updates and connect with others in a way that was both personal and public. The platform\'s popularity also led to the development of new features, such as the ability to share photos and videos, and the creation of groups and events.\n\nIn 2012, Facebook went public with an initial public offering (IPO) that raised $16 billion, making it one of the largest tech IPOs in history. 
The company has since continued to grow and expand, with a market value of over $800 billion.\n\nFacebook has also faced numerous challenges and controversies over the years, including concerns about data privacy, misinformation, and the spread of hate speech. In response, the company has', ' beloved children\'s book series created by H.A. and Margret Rey. The series follows the adventures of a curious and mischievous monkey named George, who lives with his best friend, the Man in the Yellow Hat.\nThe books are known for their simple, yet engaging storylines, colorful illustrations, and valuable lessons about friendship, curiosity, and problem-solving. The series has been widely acclaimed and has won numerous awards, including the Children\'s Book Council of Australia\'s Picture Book of the Year Award.\nThe Curious George series has been translated into over 20 languages and has sold over 75 million copies worldwide. The books have also been adapted into various forms of media, including animated television shows, movies, and video games.\nSome of the most popular Curious George books include:\n1. "Curious George" (1941) - The first book in the series, which introduces readers to George and his love of curiosity and adventure.\n2. "Curious George Takes a Job" (1947) - George gets a job at a department store, but his curiosity and mischief cause chaos.\n3. "Curious George Goes to the Hospital" (1957) - George visits the hospital and learns about the importance of taking care of oneself and others.\n4', " type hinting system, including support for type variables and type aliases. Type variables are a new way to express type constraints in a more concise and flexible way. 
Type aliases are a way to give a name to a type, making it easier to use and understand.\n\nHere's an example of how you can use type variables and type aliases in Python 3.11:\n```\nfrom typing import TypeVar, TypeAlias\n\nT: TypeVar('T')  # Define a type variable T\nListT: TypeAlias = list[T]  # Define a type alias ListT as a list of T\n\ndef foo(t: T) -> ListT:\n    return [t, t]  # Return a list of T\n\nprint(foo(1))  # Output: [1, 1]\nprint(foo('hello'))  # Output: ['hello', 'hello']\n```\nIn this example, we define a type variable `T` using the `TypeVar` function from the `typing` module. We then define a type alias `ListT` as a list of `T` using the `TypeAlias` function.\n\nThe `foo` function takes a single argument `t` of type `T` and returns a list of `T`"]
acceptance rate is  [1283, 567, 204]
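
The logged numbers can be easier to compare after a little arithmetic. The sketch below is not part of the PR: the helper names are hypothetical, and it assumes each entry in the "acceptance rate" list is a count of accepted draft tokens at one speculative position, which the log itself does not confirm.

```python
# Hypothetical helpers, not part of the PR: summarize the numbers logged above.

def tokens_per_second(latency_per_token: float) -> float:
    """Convert the logged per-token latency (in seconds) into throughput."""
    return 1.0 / latency_per_token


def relative_acceptance(counts: list[int]) -> list[float]:
    """Normalize each count by the first entry.

    Assumption: counts[i] is the number of accepted draft tokens at
    speculative position i, so the ratios show how acceptance falls off
    at later positions.
    """
    first = counts[0]
    return [c / first for c in counts]


# Values copied from the log output above.
eagle_latency = 0.00209438637830317  # "latency_per_token" of the eagle run
eagle_counts = [1283, 567, 204]      # "acceptance rate" of the eagle run
previous_counts = [1073, 657, 327]   # "acceptance rate" of the run logged just before it

print(f"eagle throughput: {tokens_per_second(eagle_latency):.1f} tokens/s")
print("eagle falloff:", [round(r, 3) for r in relative_acceptance(eagle_counts)])
print("previous falloff:", [round(r, 3) for r in relative_acceptance(previous_counts)])
```

If the assumption about the list holds, a steeper falloff means later draft positions are accepted less often, so raising the number of speculative tokens would likely yield diminishing returns for that method.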



👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@xuechendi changed the title from "[INTEL_GAUDI][v0] Enable spec decode on HPU" to "[INTEL_HPU][v0] Enable spec decode on HPU" on Apr 23, 2025
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
@xuechendi force-pushed the upstream/spec_decode branch from e3ddd24 to 4f709d6 on Apr 23, 2025 19:29
@xuechendi marked this pull request as draft on May 12, 2025 19:17

mergify bot commented May 12, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @xuechendi.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify bot added the needs-rebase label on May 12, 2025