EconPapers FAQ for Archive Maintainers
Setting up and maintaining a RePEc archive
- Common syntax mistakes
- Spelling...
- Plain spelling mistakes, like Auhtor-Name:, Lenght:, Title : (note the extra space between "Title" and ":") happen.
Using preset templates prevents that.
- Use of fields that do not exist in ReDIF
- Our scripts will stumble on fields that are unknown to them. For example, File-Address: does not exist,
use File-URL: instead. See the ReDIF documentation for a complete
list of available fields.
- Line starts with inappropriate field
- This can happen accidentally, for example with a title or an abstract, when by chance a line starts with a word
immediately followed by a colon (":"). This is then interpreted as a field instead of part of the abstract.
The solution is to let a space precede this word.
- Bad use of clusters
- Information about an author has to be in one single cluster, before starting with the co-author. The same
applies to various other clusters, like File-*, Author-*, Provider-*, etc. All clusters have to start with
the *-Name:, for example Author-Name:, except for File-* that starts with File-URL:.
- Problems with handles
- Each entry must have a handle that is unique. Each handle must contain the proper archive code (3 letters) and
the proper series code (6 characters). For example, for a paper it would be RePEc:cre:crefwp:75 or
RePEc:wop:turnip:planting3456.
- File-URL: wraps in the address
- File-URL: should not wrap in the middle of the address, no browser can understand that. The address can
be put on a separate line, though, as long as it starts with a space.
- Series handles do not match
- The handle for each series is specified in aaaseri.rdf (or aaaseri.redif) and should have 6 characters (RePEc:aaa:ssssss).
The items of this series should be located in a directory with the same 6 character name. Each item handle
should contain the same 6 character code.
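The handle rules above can be checked mechanically before uploading. The following is a minimal sketch, not a RePEc tool: the function names `check_handle` and `find_duplicates` are hypothetical, and the pattern assumes that the series code consists of letters and digits only, which the text above does not state.

```python
import re

# Assumed shape: "RePEc:", a 3-letter archive code, a 6-character series
# code (letters and digits assumed), then an item identifier.
HANDLE_RE = re.compile(r"^RePEc:([A-Za-z]{3}):([A-Za-z0-9]{6}):(\S+)$")


def check_handle(handle, archive, series):
    """Return a list of problems found in an item handle (empty if none)."""
    match = HANDLE_RE.match(handle.strip())
    if match is None:
        return ["handle does not have the form RePEc:aaa:ssssss:item"]
    problems = []
    if match.group(1) != archive:
        problems.append("archive code %r does not match %r" % (match.group(1), archive))
    if match.group(2) != series:
        problems.append("series code %r does not match %r" % (match.group(2), series))
    return problems


def find_duplicates(handles):
    """Return the handles that occur more than once, in order of first repeat."""
    seen, duplicates = set(), []
    for handle in handles:
        if handle in seen and handle not in duplicates:
            duplicates.append(handle)
        seen.add(handle)
    return duplicates
```

For example, `check_handle("RePEc:cre:crefwp:75", "cre", "crefwp")` returns an empty list, while a handle with a five character series code is reported as malformed.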
- New additions to my archive do not appear in RePEc services
This is most likely caused by a mirroring problem. Please check the mirroring status of your
archive at the RePEc Data Check. You can use the mirrored files link
on the Data Check page for your archive to verify that all files in your archive are up to date
and mirrored. If they are, the problem is probably caused by syntax errors.
- Some papers/articles etc from my archive are missing in RePEc services
This is most likely caused by a syntax error in the templates for the missing items. The
RePEc Data Check for your archive should show any syntax errors.
You can use the mirrored files link
on the Data Check page for your archive to verify that the files for the missing items are up to date
and mirrored.
- Items are missing from the RePEc Author Service
This is typically because the template was rejected due to a syntax error. See the RePEc Data Check for your archive.
If the syntax appears correct, the problem typically has to do with inappropriate information in the
Author-Name field. There should be one field for each author, and it should contain only the name.
Affiliations belong in Author-Workplace-Name, email addresses in Author-Email. Finally,
the names should be specified as First Last or Last, First. Anything else will confuse the RePEc
Author Service.
- How can I have an item automatically added to the author's profile?
In the Author-* cluster, add a line like this:
Author-Person: ppp00
where ppp00 is the short ID that can be found on an author's page on EconPapers or with this
lookup tool. Make sure, as always, that Author-Name: is the first element of the cluster.
- How can I add a new series to my archive?
This is quite simple. Add another series template to your aaaseri.rdf (aaaseri.redif) file, where
aaa is your archive code. Make sure that the series handle is the archive handle
plus six letters, for example RePEc:aaa:wpaper. Then create a wpaper directory within your
archive directory. For more details, see the series template description from the
step by step instructions.
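After adding a series it can be useful to check a local copy of the archive before it is mirrored. The sketch below assumes the archive is available as a local directory; the function name `check_layout` and its messages are hypothetical and not part of any RePEc software.

```python
from pathlib import Path


def check_layout(archive_dir, archive_code, series_codes):
    """Return a list of layout problems in a local copy of an archive."""
    root = Path(archive_dir)
    problems = []
    # The archive and series templates may use either extension.
    for stem in (archive_code + "arch", archive_code + "seri"):
        if not any((root / (stem + ext)).is_file() for ext in (".rdf", ".redif")):
            problems.append("missing %s.rdf or %s.redif" % (stem, stem))
    # Each series needs a directory named after its six character code.
    for code in series_codes:
        if len(code) != 6:
            problems.append("series code %r is not six characters long" % code)
        elif not (root / code).is_dir():
            problems.append("missing series directory %s/" % code)
    return problems
```

Calling `check_layout("/path/to/archive", "aaa", ["wpaper"])` returns an empty list when both template files and the wpaper directory are present.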
- My working paper series has been selected for inclusion in Econlit. What do I need to do?
Econlit requires more information than RePEc does. But most of this information can be provided with
the regular RePEc fields. Make sure there is: Abstract, Keywords, Classification-JEL, Length,
Creation-Date. In addition, Econlit requires that first and last names of all authors are
explicitly specified. Thus, after each Author-Name, put the tags Author-Name-First
and Author-Name-Last.
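A rough completeness check for these fields could look like the sketch below. It is only an illustration: `field_names` and `econlit_problems` are hypothetical names, field names are matched exactly as written above, and a field is recognised simply as a word followed by a colon at the start of a line.

```python
import re

# A field is taken to be a word at the very start of a line followed by a colon.
FIELD_RE = re.compile(r"^([A-Za-z][A-Za-z0-9-]*):")

REQUIRED = ("Abstract", "Keywords", "Classification-JEL", "Length", "Creation-Date")


def field_names(template_text):
    """Return the field names of a single template, in order of appearance."""
    names = []
    for line in template_text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            names.append(match.group(1))
    return names


def econlit_problems(template_text):
    """List the fields mentioned above that a single template lacks."""
    names = field_names(template_text)
    problems = ["missing " + field for field in REQUIRED if field not in names]
    authors = names.count("Author-Name")
    for field in ("Author-Name-First", "Author-Name-Last"):
        if names.count(field) < authors:
            problems.append(field + " missing for at least one author")
    return problems
```

The check only counts fields; it does not verify that each Author-Name-First and Author-Name-Last sits in the right cluster.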
- What should I use to edit the .rdf/.redif files?
The .rdf/.redif files are plain text files and should be edited with a text editor such
as Notepad. Using a word processor (Word, WordPerfect, etc) will only cause problems.
- Unicode: Which character set is used for templates?
The default character set is Windows-1252 (Western European). This is a superset of ASCII and ISO-8859-1
(Latin-1), so these character sets are also accepted by default. Unicode is also acceptable
(and necessary with characters unavailable in Western European alphabets). Unicode files must
start with a BOM (Byte Order Mark) otherwise Windows-1252 is assumed. The files can be saved in
the UTF-8 or UTF-16 encodings.
As an alternative to starting the file with a Unicode BOM, the use of Unicode can be signalled by
a .redif extension instead of .rdf. This assumes the UTF-8 encoding and if another encoding is used
a BOM is still needed.
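The encoding rules above amount to a short decision procedure. The sketch below is one way to express it in Python and is not the code used by RePEc services; `detect_encoding` is a hypothetical name and the return values are Python codec names.

```python
import codecs


def detect_encoding(filename, data):
    """Pick a codec for a template file from its first bytes and its name."""
    # A byte order mark takes precedence over everything else.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # decodes UTF-8 and drops the BOM
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"  # uses the BOM to determine the byte order
    # Without a BOM the .redif extension signals UTF-8.
    if filename.lower().endswith(".redif"):
        return "utf-8"
    # Otherwise the default applies.
    return "cp1252"
```

A file can then be read with `data.decode(detect_encoding(name, data))`.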
Server Setup and Management
For RePEc to be able to mirror your archive, some minimal requirements on your server
must be fulfilled. These are listed here. In addition, note that ftp based archives are strongly preferred
if the archive contains many files. With a web based archive we suggest that you put papers from the
same year into a single .rdf/.redif file instead of creating a new file for each paper.
See the RePEc Data Check for the current mirroring status of your archive.
- Directory structure
There are very strict rules for the directory structure of your archive. The rules ensure that RePEc
services can display your data correctly.
- The link in the URL: line in your archive template (xxxarch.rdf or xxxarch.redif) must point to a directory on your
web or ftp server.
- This directory must contain two .rdf/.redif files, xxxarch.rdf/.redif (with your archive template) and
xxxseri.rdf/.redif (with your series templates). xxx represents your archive code assigned by RePEc and
stated in the Handle: line of your archive template (Handle: RePEc:xxx).
- Each series template in xxxseri.rdf/.redif must have a unique handle constructed by adding a colon and a six character
code to your archive handle (Handle: RePEc:xxx:wpser1, Handle: RePEc:xxx:wpser2 etc).
- For each series template there must be one subdirectory with the same name as the six character code from the Handle
(e.g. wpser1).
This directory holds the .rdf/.redif files with templates for all the items in the series (Paper, Article, Book, Chapter
or Software templates depending on the type of the series).
To summarize, the directory structure should thus be like this (where <URL> represents the link in your URL: line):
<URL>/xxxarch.rdf
<URL>/xxxseri.rdf
<URL>/wpser1/redif1.rdf
<URL>/wpser1/redif2.rdf
...
<URL>/wpser2/redif.rdf
- One file, many files and templates generated from a data base
For mirroring efficiency (and to reduce the work load of updating RePEc services) three things are important.
These issues are particularly important to consider for archives that generate the ReDIF templates from
bibliographic data in a data base. Keeping the data in a data base and generating the ReDIF templates and
other content automatically is something we encourage, as it often leads to better quality data and reduces the
time needed for maintaining the archive.
The three things to consider:
- To have a balance between the size and number of files (or "response sets" for dynamically generated ReDIF
templates). Each file requires a separate request to be sent in order to determine if the file should be
mirrored. This takes time. Having a single large file means only one request is needed, which is good. The
downside is that the whole file must be downloaded each time a record is added or updated. RePEc services
must also process all the records in order to determine what has changed. Having one file per year of a
journal or a working paper series often gives a good balance.
- Web based services should support "Conditional GET" requests (with If-Modified-Since). This is how we find out if the
file has been updated and should be mirrored. If conditional GET is not supported we must download all
files every day and check them for updates. All recent web servers support conditional GET if you have
a standard file based RePEc archive so this is mainly an issue for those who generate the ReDIF templates
dynamically. It is then up to the application generating the templates to implement conditional requests
properly.
- As an alternative to generating the templates dynamically they can be written out to files automatically.
If so the files should only be updated if the data has been updated. Writing out all the files whether
the data has changed or not defeats the purpose of the conditional GET and forces us to download the files
and search through them to find what, if anything, has changed.
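For applications that generate templates, the two points about conditional requests and unnecessary rewrites can be sketched as follows. This is an assumption-laden illustration rather than a reference implementation: `conditional_response` and `write_if_changed` are hypothetical helpers, and the application is assumed to know when its data last changed as a timezone-aware datetime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from pathlib import Path


def conditional_response(last_modified, if_modified_since=None):
    """Return (status, headers) for a GET of data last changed at last_modified.

    last_modified must be a timezone-aware datetime.
    """
    # HTTP dates have one-second resolution and are expressed in GMT.
    last_modified = last_modified.astimezone(timezone.utc).replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparsable header: fall back to a full response
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            if last_modified <= since:
                return 304, headers  # nothing new, no body needed
    return 200, headers


def write_if_changed(path, text, encoding="utf-8"):
    """Write text to path only when it differs from what is already there."""
    target = Path(path)
    data = text.encode(encoding)
    if target.is_file() and target.read_bytes() == data:
        return False  # unchanged: leave the file and its timestamp alone
    target.write_bytes(data)
    return True
```

With `write_if_changed` the modification time of a file only moves when its content does, so a standard web server can answer conditional requests correctly without further work.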
- FTP based archives
- The server must allow anonymous FTP
- We are aware that this could be considered a security issue. However, RePEc only requires
read access to the server and with a properly configured ftp server this should be just as secure
as a web based archive. (In any case it is far better than handing out user ids and passwords
to the relative strangers that run RePEc.)
- The server must generate Unix-style directory listings
- This is typically only an issue with Windows based FTP servers. Using a command line
FTP client you should see something like this
C:\>ftp your.server.here
Connected to server
220 Swopec Microsoft FTP Service (Version 5.0).
User: anonymous
331 Anonymous access allowed, send identity (e-mail name) as password.
Password:
230 Anonymous user logged in.
ftp> dir
200 PORT command successful.
150 Opening ASCII mode data connection for /bin/ls.
dr-xr-xr-x 1 owner group 0 Nov 4 4:41 LogEc
dr-xr-xr-x 1 owner group 0 May 29 2003 RePEc
226 Transfer complete.
ftp: 132 bytes received in 0,00Seconds 132000,00Kbytes/sec.
Microsoft IIS: IIS 6: Open the properties dialog for the FTP server in the IIS management console.
Select Unix directory listing style in the Home Directory tab. Click on OK.
IIS 7 with FTP Service 7.5 and IIS 8: Go to the FTP server in IIS manager and select FTP Directory Browsing.
Select Unix directory listing style and click on Apply.
(Ask your IT-support staff to make these changes if you are not familiar with server configuration.)
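To check a listing without reading it by eye, the lines returned by the dir command can be tested against the Unix layout shown in the session above. The pattern below is a rough approximation written for this example, not the parser used by the mirroring software, and `looks_unix_style` is a hypothetical name.

```python
import re

# Permissions, link count, owner, group, size, month, day, time or year, name.
UNIX_LINE = re.compile(
    r"^[-dl][-rwxsStT]{9}\s+\d+\s+\S+\s+\S+\s+\d+\s+"
    r"[A-Za-z]{3}\s+\d{1,2}\s+(\d{1,2}:\d{2}|\d{4})\s+\S.*$"
)


def looks_unix_style(line):
    """Return True if a directory listing line has the Unix 'ls -l' layout."""
    return UNIX_LINE.match(line.strip()) is not None
```

Both listing lines from the session above match, whereas a line that starts with a date, as some Windows servers produce by default, does not.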
- Web based archives
- No javascript
- Our mirroring software does not understand javascript and any javascript will thus be ignored. Mirroring will only
work if the functionality described below does not rely on javascript.
- All .rdf/.redif files must be linked (Directory Browsing)
- The mirroring process works by our robot accessing the URL given in your archive template (xxxarch.rdf) and following
the links to all .rdf/.redif files that are reachable from this URL. If there is no link to a file our robot
will not know that the file exists and the file can not be mirrored. If a link to a file is removed the
file will be deleted from our copy of your archive.
Providing correct links to your .rdf/.redif files is thus crucial.
- The easiest way to ensure this is to let the
server automatically generate a listing of the directory content. In other words to enable directory
browsing.
Apache: Directory browsing is enabled by default. Otherwise Options +Indexes in the httpd.conf
or the per directory .htaccess files enables directory listings.
(Ask your IT-support staff to make these changes if you are not familiar with server configuration.)
Microsoft IIS: IIS 6: Directory browsing is enabled for a directory by opening the properties dialog for
the directory in the IIS management console. Select the Directory tab and check Directory browsing.
Click on OK.
IIS 7 and later: Go to the directory holding your archive in the IIS manager and select Directory Browsing. Click on enable
in the right pane to enable directory browsing.
(Ask your IT-support staff to make these changes if you are not familiar with server configuration.)
- If it, for some reason, is not possible to let the server generate a directory listing
you must maintain the directory listing yourself by placing
a html file in the directory with links to all the .rdf/.redif files. This file must then be served automatically
by the server when the directory is requested (i.e. when you enter the URL from your archive template in
a browser without adding any file name to the URL). With most servers the behaviour of automatically
serving a file is controlled by giving the file a special name (which is different for different servers).
Common names are default.htm, index.html and welcome.html. Consult your web master to find out the appropriate
way to do this on your web server.
- As a last resort, if the server configuration does not allow that a file is served automatically when the
directory is requested, you can use a file with a special name. Allowed names are default, index, or welcome
with one of the suffixes .asp, .aspx, .htm, .html or .php. The name of the file must then be included in the URL:
line in your archive template. This is the one and only exception to the rule that the URL: line must point
to a directory on your server.
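If the listing has to be maintained by hand, generating it from the directory content avoids forgotten links. The sketch below builds a plain html page without javascript; `build_index` is a hypothetical helper and the name to save the page under (index.html, default.htm, etc.) depends on your server as described above.

```python
import html
from pathlib import Path
from urllib.parse import quote


def build_index(directory):
    """Return an html page linking to the .rdf/.redif files and subdirectories."""
    links = []
    for entry in sorted(Path(directory).iterdir(), key=lambda p: p.name):
        if entry.is_dir():
            links.append(entry.name + "/")  # series directories must be reachable too
        elif entry.suffix.lower() in (".rdf", ".redif"):
            links.append(entry.name)
    items = "\n".join(
        '<li><a href="%s">%s</a></li>' % (quote(name), html.escape(name))
        for name in links
    )
    return "<html><body>\n<ul>\n%s\n</ul>\n</body></html>\n" % items
```

Each series directory needs its own page generated the same way, since the page only lists the content of a single directory.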
- The URL can not be redirected
- Our robot will only follow links that are under the URL in your archive template (xxxarch.rdf). This prevents the robot
from trying to mirror the whole web. One consequence of this is that the robot will not follow redirects.
Redirects are typically used when content is moved to a different server or to a new location on
the same server. See the Moving an archive topic below if you need to move your archive or your archive
has moved.
- http and https are different
- Logically http://example.com/abc/ and https://example.com/abc are different and may refer to completely different content
(even if in practice it is usually the same). As https is a more secure protocol we will follow redirects to the equivalent
https URL and links to https in a page served with http, but not the other way around.
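The rules on which links are followed can be summarised in a few lines of code. This is a sketch of the rules as described in the two items above and not the robot's actual implementation; `should_follow` is a hypothetical name and both arguments are assumed to be absolute URLs.

```python
from urllib.parse import urlsplit


def should_follow(base_url, link):
    """Decide whether a link found while mirroring base_url would be followed."""
    base, target = urlsplit(base_url), urlsplit(link)
    # Links to other servers are outside the archive.
    if target.hostname != base.hostname:
        return False
    # Moving from http to https is accepted, the reverse is not.
    if target.scheme != base.scheme:
        if not (base.scheme == "http" and target.scheme == "https"):
            return False
    # The link must be located under the archive URL.
    base_path = base.path if base.path.endswith("/") else base.path + "/"
    return target.path.startswith(base_path) or target.path + "/" == base_path
```

For instance, with an archive URL of http://example.com/abc/ a link to https://example.com/abc/wpser1/a.rdf is accepted, while a link to http://example.com/other/a.rdf is not.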
- Conditional GET
- The server should support "Conditional GET" requests (with If-Modified-Since). This is how we find out if the
file has been updated and should be mirrored. If conditional GET is not supported we must download all
files every day and check them for updates. All recent web servers support conditional GET if you have
a standard file based RePEc archive so this is mainly an issue for those who generate the ReDIF templates
dynamically. It is then up to the application generating the templates to implement conditional requests properly.
- The URL can not be password protected
- The data in a RePEc archive is hardly sensitive and will be publicly available in several RePEc services. There is
thus no need to password protect the data as this offers no extra security. Not password protecting the data is,
in any case, far better than handing out user ids and passwords to the relative strangers that run RePEc.
- Microsoft IIS 6 and later
- Version 6 and later of IIS will only serve files with known extensions and the extensions used by RePEc, .rdf and .redif, are not included in the default list.
This results in the server giving a "404 Not Found" error when you follow a link to a .rdf/.redif file even
if the link is correct and the file exists. This is corrected by setting a MIME type for .rdf and .redif files.
IIS 6: Open the properties dialog for the webserver in the IIS management applet and go to the
HTTP headers tab. There you click on MIME Types and add the .rdf and .redif extensions with a
MIME type of text/plain.
IIS 7 and later: Go to the directory holding your archive in the IIS manager and select MIME Types. Add the .rdf and .redif extensions with a
MIME type of text/plain.
(Ask your IT-support staff to make these changes if you are not familiar with server configuration.)
Normally these changes take effect immediately but in some cases it might be necessary to stop and restart the
virtual web server or the WWW service.
If restarting doesn't help Microsoft
suggests adding a MIME type for the '*' extension with a MIME type of application/octet-stream. This will allow
the server to serve all files and should only be set for the archive directory. Finally, IIS will use the extensions listed
under the HKEY_CLASSES_ROOT key in the registry as a last resort so adding a file type for .rdf/.redif files in Windows
Explorer might resolve the problem.
- User-Agent string
- The RePEc robot software identifies itself with a User-Agent string of the form "remi+version/webmirror+(URL)"
indicating the current version of the robot, which software the robot is based on and a URL to the RePEc service doing the mirroring.
For EconPapers this is as of this writing "remi+2.03/webmirror+(https://econpapers.repec.org)". If you check your web server logs you will find
that several RePEc services are mirroring your data. They all need to do this. The robot should, however, be quite well
behaved. Please let us know if you are experiencing problems.
Please note that the only significant parts in the user agent string are "remi" and "webmirror", all other parts are subject to change.
- Changing archive locations
- Moving an archive
- The following procedure should be used when moving an archive:
- Edit your archive template (xxxarch.rdf) and change the URL line to indicate the new location of the archive
- Copy all the files, keeping the directory structure intact, to the new location.
- Wait a day or two for us to pick up the change. Check that we have picked up the change and that everything
is OK by going to the
RePEc Data Check page for your archive and viewing our copy of the
archive template. You can delete the files from the old location if the URL line reflects
the new location and no mirroring problems are indicated.
- Contact us if there are any problems.
- The archive has moved and does not mirror anymore
- Edit the archive template (xxxarch.rdf) and update the URL line to reflect the new location. If possible
copy the edited archive template to the old location. The change should then be picked up within a day or two.
Contact us if the change isn't picked up or it is not possible to copy the file to the old location.
- It is impossible for us to set up a server that complies with RePEc mirroring requirements. Is all hope lost?
- See "Using GitHub to host a RePEc archive" below.
- If you only maintain working paper series, you can join the RePEc Input Service.
- Using GitHub to host a RePEc archive
- Create a repository on github.com and populate it with the files for the archive, using the directory structure of a RePEc archive.
- Unfortunately we cannot mirror directly from the repository as github.com uses javascript and embeds the files we want in html pages.
- To make the archive accessible to RePEc you need to expose it through Github pages (https://docs.github.com/en/pages). This will create a site for you in the github.io domain.
- Create index documents with links to the files so we can find them.
- Edit the URL line of your archive template (xxxarch.rdf or xxxarch.redif file) to point to your github.io page for the archive (the directory holding the archive template file).
- Send us an e-mail to let us know that your archive is ready to be mirrored.
- My server is getting a lot of hits from RePEc, what is this?
- RePEc needs to mirror the data in your RePEc archive. Due to the decentralised nature of RePEc, this is done by several different services like EconPapers and IDEAS. To ensure that
our content is up to date this is done on a daily basis. In the web logs this will show up as the User-Agent "remi+2.03/webmirror+(https://econpapers.repec.org)" or similar.
We also run a link checker to ensure that the links to full text files are correct. This is done about once a week for a given series. The User-Agent of the link checker is
"RePEc link checker (https://EconPapers.repec.org/check/)".
- The link checker
- We run a link checker to verify that links to the full text files work (most journals are excluded from the link check). In archives with many papers this might trigger security software on the web
server to identify the accesses as excessive or malicious and the link checker is blocked. If a link doesn't work for the link checker (genuine bad link or the checker is being blocked) the link will
be flagged as bad on RePEc services and the paper will not be included in the New Economics Papers e-mails about new work. To avoid this you might need to white list the User-Agent of the link checker.
The User-Agent string is currently "RePEc link checker (https://EconPapers.repec.org/check/)" and you should match on "RePEc link checker" as this will not change.
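When white listing or analysing logs, match only on the parts of the User-Agent strings that are described as stable. A minimal sketch, where `classify_repec_agent` and its return values are hypothetical:

```python
def classify_repec_agent(user_agent):
    """Return 'link-checker', 'mirror' or None for a User-Agent string."""
    # Only these substrings are described as stable; versions and URLs may change.
    if "RePEc link checker" in user_agent:
        return "link-checker"
    if "remi" in user_agent and "webmirror" in user_agent:
        return "mirror"
    return None
```

The same substrings can be used in the rule syntax of whatever security software or firewall sits in front of the web server.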
Error Messages in the Mirroring Logs
This is a list of the most common error messages in the mirroring logs. Contact us
if an error message is unclear or missing from this list.
- FTP error messages
- Cannot login, skipping package
- Your FTP server does not allow anonymous FTP. Anonymous FTP is required for the
mirroring to work.
- Cannot connect, skipping package
- Your FTP server is not responding when we try to connect to it or the DNS-lookup for the server fails.
This could be a temporary problem but should be investigated if it persists for more than a few days.
- Cannot get remote directory details (directory_name)
- This might be a temporary problem, the connection with your server was lost for some reason.
The problem should be investigated if it persists. The most likely cause for persistent problems is that
the archive directory is missing from the server. Please reinstate the directory and the .rdf/.redif files.
(We might have a backup of your data if you have lost the files, contact us
if this is the case.)
- Cannot get remote directory listing because: 150 some more details
- This might be a temporary problem, the connection with your server was lost for some reason.
The problem should be investigated if it persists. The most likely cause for persistent problems is that
your ftp server is using a non-standard data port and the transmission of data from your server is being
blocked by a firewall. This behaviour violates the standard defining FTP transactions (see section
3.2 and 3.3 of RFC 959) and should be avoided. Please configure
your ftp server to behave in a standards compliant way. (We are aware that this non-standard behaviour is
intended to enhance security. If you care about security you should simply refuse to establish a data
connection without a preceding PASV (and say so in the response) or at a minimum indicate that a
non-standard port is used in the response.)
- No files to transfer
- This is, strictly speaking, not an error message but indicates a mirroring problem if you have .rdf/.redif files
in your archive and there are no files mirrored to EconPapers (check the "mirrored files" link on the check
page for your archive).
This condition is caused by our mirroring software not being able to read or interpret the directory listings
generated by your server. Likely causes are that the anonymous user does not have read access
to the directory or that there is a problem with the way directory listings
are generated (particularly common with Microsoft's FTP server).
Providing data to get the most out of EconPapers and other RePEc services
- First of all, provide as much data on each paper as possible. Abstract, key words, JEL-codes, the date the paper was
written. This will increase the exposure for your papers and makes it more likely that they are found when
people search in EconPapers and other RePEc services.
- For working papers, use the Number field to provide the working paper number. This makes it easier for people
to reference the paper. The working paper number is also the basis for the sorted list of papers in a series provided
by EconPapers.
- EconPapers must fall back on other information and may end up sorting in a strange order if the working paper
number is missing for some papers or not provided in a consistent format (e.g. nn or
yyyy-nn).
- The working paper number is often encoded in the Handle field and EconPapers tries to parse the handle
if the number is missing for some papers.
- If EconPapers fails to parse the handle the sort will be based on the Creation-Date or Revision-Date
fields if there is a date field for each paper.
- If all else fails EconPapers will do a character based sort on the handle.
- For journal articles, use the Year, Volume, Issue and Pages fields if applicable. This makes
it easier for people to reference the paper. This information is also the basis for the grouped and
sorted list of articles in a journal provided by EconPapers.
- EconPapers will use as much as possible of this information. Pages are sorted within issues and issues within
volumes or years.
- Pages are sorted within years or volumes if the issue information is missing for some articles.
- The sort within issue, year or volume is based on the Handle field if pages are missing for some articles.
- The sort is based on the handle if year, volume and issue are missing.
- IDEAS tries to list items in chronological order based on the date in
the template. Use the Creation-Date field for papers and software components and the Year
field for articles, books and chapters to ensure a proper sort order.
- Never change the Handle of an item. The handle is a persistent and unique identifier for items in the RePEc data
base. Changing handles causes the RePEc Author Service and
LogEc to lose track of the item.
- Consider using the Author-Name-Last and Author-Name-First fields. This makes proper parsing of
author names much easier and is one of the requirements for inclusion in the
EconLit data base.
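To illustrate why a consistent format for the working paper Number field matters, compare a plain character based sort with a sort on the numeric parts. The sketch below is not the algorithm used by EconPapers; `number_key` is a hypothetical helper that assumes numbers in the nn or yyyy-nn formats mentioned above.

```python
import re


def number_key(number):
    """Turn a working paper number such as '2020-9' into a tuple of integers."""
    return tuple(int(part) for part in re.findall(r"\d+", number))


numbers = ["2020-10", "2020-9", "2019-12"]
by_character = sorted(numbers)                # ['2019-12', '2020-10', '2020-9']
by_number = sorted(numbers, key=number_key)   # ['2019-12', '2020-9', '2020-10']
```

The character based sort places paper 10 before paper 9, which is the kind of strange order that results when only the handle is available for sorting.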