The USPTO reaction dataset has been used in many machine learning approaches for predicting reactions [32,33,34,35]. There are currently no large sets of publicly available reaction data. To remedy this situation, we have extracted over a million reactions from United States patent applications (2001-2013) and the same again from patent grants (1976-2013). Quantities could be associated with reagents in 98.8% of cases and with products in 64.9% of cases, whilst the correct role was assigned to chemical entities in 91.8% of cases.

The tokenized datasets can be found here. In the datasets ending with _augm, the number of training datapoints was doubled. The unclassified USPTO-380K large dataset was first applied to models for pretraining so that they gain a basic theoretical knowledge of chemistry, such as the chirality of compounds, reaction types and the SMILES form of the chemical structure of compounds. A total of 50k reactions from the United States Patent and Trademark Office (USPTO) dataset were categorized into 10 reaction classes. Our model achieves an order of magnitude lower inference latency, with state-of-the-art top-1 accuracy and comparable performance on Top-K sampling. The authors [21] further preprocessed the database by splitting multi-product reactions into multiple single-product reactions.

The U.S. patent classification comprises 450 main divisions of technology, called classifications/classes, broken into approx. 150,000 subdivisions, called subclassifications/subclasses. The patent maintenance fee events data contain recorded maintenance fee events for patents granted from September 1, 1981 to present. The release of these data is consistent with the agency's responsibility under 35 USC 2 to disseminate information about patents and trademarks to the public.
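The preprocessing step mentioned above, splitting a multi-product reaction into several single-product reactions, can be sketched as follows. This is a minimal illustration and not the cited authors' code: the function name `split_multi_product` is hypothetical, and the sketch assumes reactions are written as reaction SMILES of the form `reactants>reagents>products` with products separated by `.`. Because `.` also separates the ions of a salt, a naive split like this one would break a salt product apart, so real pipelines need additional handling.

```python
def split_multi_product(reaction_smiles: str) -> list[str]:
    """Return one reaction SMILES per product (hypothetical helper).

    Assumes the input has the form 'reactants>reagents>products', where the
    reagents field may be empty (i.e. 'reactants>>products').
    """
    parts = reaction_smiles.strip().split(">")
    if len(parts) != 3:
        raise ValueError(
            f"expected 'reactants>reagents>products', got: {reaction_smiles!r}"
        )
    reactants, reagents, products = parts
    # Naive split on '.': each dot-separated product becomes its own reaction.
    return [
        f"{reactants}>{reagents}>{product}"
        for product in products.split(".")
        if product
    ]


if __name__ == "__main__":
    # Esterification written with two products (ester and water).
    for rxn in split_multi_product("CCO.CC(=O)O>>CC(=O)OCC.O"):
        print(rxn)
```

Each returned string keeps the full set of reactants and reagents, so the single-product reactions can be used directly as independent training examples.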
To train and evaluate our models, we used 400 000 reactions scraped from publicly available US patents (USPTO) as "true" reactions. Figure 1 shows the distribution of each reaction class within the USPTO-50K. We first trained our model using a common benchmark dataset with ca. … The source of the dataset is USPTO patents prepared by Lowe. A smaller subset of the patent data, containing 3.3 million reactions between 1976 and 2016 extracted by Lowe, is the only publicly available dataset of reactions in current use. The USPTO reaction dataset is used together with a list of commercially available building blocks from eMolecules.

Providing research datasets to allow for study of the economics of patents and trademarks is also an element in the USPTO economics research agenda. For more information on the data, contact ipd@uspto.gov. For more information: https://www.uspto.gov/learning-and-resources/ip-policy/economic-research/research-datasets.
eMolecules consists of 231M commercially available molecules that can serve as end points for our search algorithm. The reaction classes in the dataset were labeled … The USPTO dataset accounts for reactions published up to September 2016, whereas Pistachio includes reactions until 17th Nov 2017. Reaction yield prediction (YIELDS) datasets include Buchwald-Hartwig and Suzuki-Miyaura.

The data files include information on each application's characteristics, prosecution history, continuation history, claims of foreign priority, patent term adjustment history, publication history, and correspondence address information. The “Office action” is a written notification to the applicant of the examiner’s decision on patentability and generally discloses the grounds for a rejection, the claims affected, and the pertinent prior art. Also available are time series and micro-level data by high-level NBER technology categories on applications, grants, and in-force patents, spanning two centuries of innovation. The classification data contain Cooperative Patent Classification (CPC) information for all utility patent applications published by the U.S. Patent and Trademark Office (USPTO) from March 15, 2001 to present.
The data were collected from the Public Access to Court Electronic Records (PACER) and RECAP as sources for all of the content. Many private companies have thus monopolized public data for their own commercial benefit. The United States Patent and Trademark Office (USPTO) Office Action Research Dataset for Patents contains detailed information derived from Office actions issued by patent examiners to applicants during the patent examination process.

We demonstrate that models trained on datasets such as internal Electronic Laboratory Notebooks (ELN), and the publicly available United States Patent Office (USPTO) extracts, are sufficient for the prediction of full synthetic routes to compounds of interest in … Most of the recent work in chemical reaction prediction, the task of predicting the most likely products given precursors (reactants and reagents), uses a … Further differences between Pistachio and the public USPTO set arise from the inclusion in Pistachio of ChemDraw sketch data and text-mined European Patent Office (EPO) patents.
The portion of granted patents contains 1,808,938 reactions described using SMILES. The dataset, namely USPTO-50K, was derived from USPTO granted patents and includes 50,000 reactions that were later classified into 10 reaction classes by Schneider et al. [26]. The datasets are best placed into the data/ folder.

USPTO_LEF25 * * 29,360 349,898 - Non-public subset of USPTO_MIT, without e.g. …

Given the list of building blocks, we take each molecule that has appeared in USPTO reaction data and analyze if … With the rapid improvement of machine translation approaches, neural machine translation has started to play an important role in retrosynthesis planning, which finds reasonable synthetic pathways for a target molecule. So far, this research has mainly focused on small datasets (USPTO-50K) and single-step predictions, but such methods are starting to appear in retrosynthetic route-finding algorithms. Furthermore, we show that our model recovers a basic knowledge of chemistry without being explicitly trained to do so.

The trademark assignment dataset contains detailed information on 786,931 assignments and other transactions recorded at the USPTO between 1952 and 2019 and involving 1,491,485 unique trademark properties. These documents replace the original data disseminated by the Electronic Information Products Division (EIPD). The data are sourced from the Public Patent Application Information Retrieval (Public PAIR) system.
To advance research on matters relevant to intellectual property, entrepreneurship, and innovation, the Office of the Chief Economist (OCE) releases datasets to facilitate economic research on patents and trademarks, an element in the USPTO economics research agenda. The patent assignment dataset contains detailed information on roughly 6 million patent assignments and other transactions recorded at the USPTO since 1970 and involving over 10 million patents and patent applications. The classification files are provided in published patent application number sequence, with the current U.S. original classification/subclassification and any cross-reference classification/subclassifications, in ASCII text format.

The model is trained on published reaction data from Reaxys to predict the recorded reaction conditions. … (USPTO), Reaxys, and SciFinder databases), consisting of millions of tabulated reaction examples, it is now possible to construct and validate purely data-driven approaches to synthesis planning.
Herein we investigate a template-based retrosynthetic planning tool, trained on a variety of datasets consisting of up to 17.5 million reactions. This dataset was also employed by Liu et al. For this purpose, we have used the generated ReactionCodes of each reaction in the USPTO dataset. The small dataset we used in this paper is USPTO-50K, and it is applied to seq2seq-transfer-learning and Transformer-transfer-learning models. Our model achieves excellent performance on an important subset of the USPTO reaction dataset, comparing favorably to the strongest baselines.

There are several data files, each of which coincides with a tab on USPTO's Public PAIR web portal. Also included are patent examiner citations from British and French basic patents (2003 to the present), Canadian patents (2005 to the present) and Japanese patents (2011 to the present). Furthermore, OCE data releases support White House policy that champions transparency and access to government data under the "data.gov" umbrella of initiatives. For more information: http://www.uspto.gov/learning-and-resources/electronic-data-products/data.

Only pure structural information is stored in a lexical representation of the reaction. Additional data is not stored as part of the reaction but rather stored separately in the database. Each line in the file has two fields, separated by a space: the reaction SMILES (both reactants and products are atom mapped) and the reaction center.
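The two-field line format described above (an atom-mapped reaction SMILES, then the reaction center, separated by a space) could be read with a small parser like the one below. This is only a sketch under stated assumptions: the names `ReactionRecord` and `parse_line` are hypothetical, the encoding of the reaction center is not specified in the text and is therefore kept as an opaque string, and the example line is invented for illustration rather than taken from the dataset.

```python
from typing import NamedTuple


class ReactionRecord(NamedTuple):
    """One parsed line (hypothetical container, not part of any library)."""
    reactants: str
    reagents: str
    products: str
    center: str  # reaction center, kept as-is because its encoding is not specified


def parse_line(line: str) -> ReactionRecord:
    """Parse '<reaction SMILES> <reaction center>' into its components."""
    fields = line.strip().split(None, 1)  # split on the first run of whitespace
    if len(fields) != 2:
        raise ValueError(f"expected two space-separated fields, got: {line!r}")
    smiles, center = fields
    parts = smiles.split(">")
    if len(parts) != 3:
        raise ValueError(f"not a reaction SMILES: {smiles!r}")
    reactants, reagents, products = parts
    return ReactionRecord(reactants, reagents, products, center)


if __name__ == "__main__":
    # Invented example line; the '1-2;1-3' center notation is an assumption.
    record = parse_line("[CH3:1][Br:2].[OH-:3]>>[CH3:1][OH:3] 1-2;1-3\n")
    print(record.reactants, "->", record.products, "| center:", record.center)
```

Keeping the reaction center as a raw string avoids guessing at its structure; once the actual encoding is known, it can be parsed in a separate step.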
GRAPHRETRO is evaluated on the benchmark USPTO-50K dataset, which is annotated with 10 reaction types. In the extracted reactions, chemical entities were identified with 96.4% recall and 88.9% precision. The most successful approach for reaction prediction to date is the Molecular Transformer. Using a common dataset allows different methods to be compared with each other.

Reactions are often depicted using `arrow-pushing' diagrams, which show the movement of electrons as a sequence of arrows. An electron path prediction model (ELECTRO) learns these sequences directly from raw reaction data, using a method that extracts approximate reaction paths from any dataset of atom-mapped reaction SMILES.

The patent litigation data cover 74,623 unique court cases filed during the period 1963-2016.

