Mastering spaCy, Second Edition is your comprehensive guide to building sophisticated NLP applications using the spaCy ecosystem. This revised edition embraces the latest advancements in NLP, featuring new chapters on Large Language Models with spacy-llm, transformers integration, and end-to-end workflow management with Weasel.
With this new edition, you'll learn to enhance NLP tasks with LLMs using spacy-llm, manage end-to-end workflows with Weasel, and integrate spaCy with third-party tools such as Streamlit, FastAPI, and DVC. From training custom named entity recognition (NER) pipelines to categorizing emotions in Reddit posts, you'll explore advanced topics such as text classification and coreference resolution. The book takes you on a journey through spaCy's capabilities, starting with NLP fundamentals such as tokenization, named entity recognition, and dependency parsing. As you progress, you'll delve into advanced topics such as creating custom components, training domain-specific models, and building scalable NLP workflows.
By the end of the book, with the help of practical examples, clear explanations, and tips and tricks, you will be able to build robust NLP pipelines and integrate them with web applications to create end-to-end solutions.
Page count: 265
Publication year: 2025
Mastering spaCy
Build structured NLP solutions with custom components and models powered by spacy-llm
Déborah Mesquita
Duygu Altinok
Copyright © 2025 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Group Product Manager: Niranjan Naikwadi
Publishing Product Manager: Tejashwini R
Book Project Manager: Aparna Ravikumar Nair
Content Engineer: Vandita Grover
Senior Content Development Editor: Priyanka Soam
Technical Editor: Seemanjay Ameriya
Copy Editor: Safis Editing
Proofreader: Priyanka Soam
Indexer: Manju Arasan
Production Designer: Nilesh Mohite
Growth Lead: Kunal Sawant
First published: July 2021
Second edition: February 2025
Production reference: 1240125
Published by Packt Publishing Ltd.
Grosvenor House
11 St Paul’s Square
Birmingham
B3 1RB, UK
ISBN 978-1-83588-046-3
www.packtpub.com
I'd like to thank everyone who directly or indirectly contributed to making this book happen. First and foremost, a huge shoutout to the team at Packt, who made the process of writing a book far less painful than it could've been. Special thanks to Aparna Nair and Priyanka Soam for being so understanding when I kept saying I needed more time to finish chapters, and to David for all the super valuable feedback on my very first chapter draft. I also want to thank Quincy Larson from freeCodeCamp for accepting the piece I submitted for Medium back in 2017 and editing it so well that it became a hit, even helping John Maeda learn TensorFlow. A huge thanks to my managers, Carlos Porto Filho and Talita Menezes Brognara, for always championing my work and supporting my growth; you're the best managers anyone could ask for. To all my friends, thank you for sticking with me. Special thanks to Nicole Charron and Sabrina Guimarães for putting up with my daily complaints about having to finish a chapter, and to Suelen Mazza for always running late when we'd go out, giving me just a bit more time to write. Last but not least, to my family: thank you for loving me no matter my achievements. Augusta and Carlos, I'm so proud to have you as my parents. Thanks for teaching me to always be a good person; for me, that's the most important lesson in life.
– Déborah Mesquita
Déborah Mesquita is a data science consultant and writer. With a BSc in computer science from UFPE, one of Brazil's top computer science programs, she brings a diversified skill set refined through hands-on experience with various technologies. Déborah has consistently delivered exceptional results across data science projects, navigating both the business and technical sides of each one. Her ability to translate complex concepts into simple language, coupled with her quick learning and broad vision, makes her an effective educator. Actively engaged in community initiatives, she works to ensure equitable access to knowledge, reflecting her belief that technology is not a panacea but a powerful tool for societal improvement when used for that purpose. She writes a personal blog at deborahmesquita.com.
Duygu Altinok is a senior Natural Language Processing (NLP) engineer with 12 years of experience in almost all areas of NLP, including search engine technology, speech recognition, text analytics, and conversational AI. She has published several papers in the NLP domain at conferences such as LREC and CLNLP. She also enjoys working on open source projects and is a contributor to the spaCy library. Duygu earned her undergraduate degree in computer engineering from METU, Ankara, in 2010 and went on to earn her master's degree in mathematics from Bilkent University, Ankara, in 2012. She is currently a senior engineer at German Autolabs with a focus on conversational AI for voice assistants. Originally from Istanbul, Duygu currently resides in Berlin, Germany, with her cute dog Adele.
Souvik Roy is a senior data scientist at Sun Life Financial, specializing in NLP and machine learning to address challenges in the financial services domain. He has over four years of experience and a master’s degree in machine learning from the University of Waterloo. Souvik focuses on developing innovative solutions to optimize client experience interactions and enhance financial strategies. At Bell Canada, he improved cross-selling efficiency by 30% through advanced NLP solutions. He has also contributed to transformer model compression research at Huawei Noah’s Ark Lab to optimize inference on resource-constrained devices. He thanks the authors and Packt Publishing for the opportunity to contribute to this book.
This part introduces the basics of natural language processing (NLP) with spaCy and guides you through the initial steps of setting up your environment. You'll start by learning spaCy's core functionality, including its processing pipelines and data structures, which provides a solid foundation for the more advanced topics that follow.
This part has the following chapters:
Chapter 1, Getting Started with spaCy
Chapter 2, Core Operations with spaCy