Title: | Access USPTO Bulk Data in Tidy Rectangular Format |
---|---|
Description: | Converts TXT and XML data curated by the United States Patent and Trademark Office (USPTO). Allows conversion of bulk data after downloading directly from the USPTO bulk data website, eliminating the need for users to wrangle multiple data formats to get large patent databases in tidy, rectangular format. Data details can be found on the USPTO website <https://bulkdata.uspto.gov/>. Currently, all three formats can be converted to rectangular CSV format: (1) TXT data (1976-2001); (2) XML format 1 data (2002-2004); and (3) XML format 2 data (2005-current). Relevant literature that uses data from USPTO includes Wada (2020) <doi:10.1007/s11192-020-03674-4> and Plaza & Albert (2008) <doi:10.1007/s11192-007-1763-3>. |
Authors: | Raoul Wadhwa [aut, cre] |
Maintainer: | Raoul Wadhwa <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.4 |
Built: | 2025-03-06 04:33:52 UTC |
Source: | https://github.com/jyprojs/patentr |
Download bulk patent data from the USPTO website <https://bulkdata.uspto.gov> and convert it to tidy format. Data can be returned as a data frame or written to a file (see 'output_file' parameter). Because the USPTO issues patents weekly, the smallest unit that can be acquired is all patents from a single week.
get_bulk_patent_data(year, week, output_file)
year |
integer vector containing years from which patents should be collected |
week |
integer vector of weeks within the corresponding 'year' element from which patents should be collected |
output_file |
character string naming a file; if supplied, the data are written to that file in CSV format
either 'TRUE' (a placeholder, returned when the data are written to 'output_file') or an object of class 'data.frame' (see param 'output_file' for details)
## NOTE: none of the examples are run due to the download requirement
## Not run:
# download patents from the first week of 1976 and get data frame
patent_data <- get_bulk_patent_data(year = 1976, week = 1)

# download patents from the last 5 weeks of 1980 (and write to a file)
get_bulk_patent_data(year = rep(1980, 5), week = 48:52,
                     output_file = "patent-data.csv")
## End(Not run)
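Because each element of 'week' is read against the corresponding element of 'year', the two vectors need to be the same length and aligned pair by pair. The sketch below uses only base R to build such a pair of vectors for a range that crosses a year boundary; the 'requests' data frame is a hypothetical helper used here for illustration and is not part of the package.

```r
# Build matched year/week vectors for weeks that span a year boundary.
# Each (year[i], week[i]) pair identifies one weekly USPTO release.
year <- c(rep(1980L, 3), rep(1981L, 2))
week <- c(50:52, 1:2)

# The vectors must line up element by element.
stopifnot(length(year) == length(week))

# Hypothetical helper: collect the pairs in one table to check them by eye.
requests <- data.frame(year = year, week = week)
print(requests)

# The aligned vectors could then be passed on (not run: requires a download):
# get_bulk_patent_data(year = requests$year, week = requests$week,
#                      output_file = "patent-data.csv")
```

Keeping the pairs in a single table before calling the function makes a mismatch between years and weeks easy to spot.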
Convert the WKU identifier provided in bulk patent files to the patent number used in most sources. References listed in the bulk patent files are likewise given in patent number format, not WKU format.
wku_to_pno(wku)
wku |
character vector containing patent WKUs |
character vector containing patent numbers
# convert sample WKUs to patent number and print
sample_wku <- c("RE028671", "03930271")
print(wku_to_pno(sample_wku))
A dataset containing information about patents issued by the United States Patent and Trademark Office (USPTO) <https://www.uspto.gov/> in the first week of the year 1976. This can be recreated by running the 'get_bulk_patent_data' function in the 'patentr' package and setting the 'year' and 'week' parameters to '1976' and '1', respectively.
y1976w1
A data frame with 1379 rows and 9 variables:
unique patent identifier
patent title
date on which patent application was submitted
date on which patent was issued by USPTO
patent inventor(s)
person(s)/corporation(s) to whom the patent was assigned
patent classification based on IPC system
patents referenced by this patent
free-text claims made about value of this patent
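A quick way to check the bundled dataset against the description above is to load it and look at its dimensions and column types. This is a minimal sketch that assumes the 'patentr' package is installed; the code is guarded so that it does nothing otherwise, and it makes no assumptions about the column names.

```r
# Load the bundled dataset only if the patentr package is available.
if (requireNamespace("patentr", quietly = TRUE)) {
  data("y1976w1", package = "patentr")

  # Dimensions should match the documented 1379 rows and 9 variables.
  print(dim(y1976w1))

  # Compact overview of each column's type and first value.
  str(y1976w1, vec.len = 1)
}
```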