EconPapers FAQ for Archive Maintainers
Server Setup and Management
For RePEc to be able to mirror your archive, your server must meet a few minimal requirements,
which are listed here. Note also that FTP based archives are strongly preferred
if the archive contains many files. With a web based archive we suggest that you put papers from the
same year into a single .rdf file instead of creating a new file for each paper.
See the RePEc Data Check for the current mirroring status of your archive.
- FTP based archives
- The server must allow anonymous FTP
- The server must generate Unix-style directory listings
- This is typically only an issue with Windows based FTP servers. Using a command line
FTP client you should see something like this:
C:\>ftp your.server.here
Connected to server
220 Swopec Microsoft FTP Service (Version 5.0).
User: anonymous
331 Anonymous access allowed, send identity (e-mail name) as password.
Password:
230 Anonymous user logged in.
ftp> dir
200 PORT command successful.
150 Opening ASCII mode data connection for /bin/ls.
dr-xr-xr-x 1 owner group 0 Nov 4 4:41 LogEc
dr-xr-xr-x 1 owner group 0 May 29 2003 RePEc
226 Transfer complete.
ftp: 132 bytes received in 0,00Seconds 132000,00Kbytes/sec.
Microsoft IIS: Open the properties dialog for the FTP server in the IIS management console.
Select Unix directory listing style in the Home Directory tab. Click on OK.
(Ask your IT-support staff to make these changes if you are not familiar with server configuration.)
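The manual check above can also be scripted. The following is a minimal Python sketch, not the code our mirroring software uses: the regular expression is only an approximation of what a Unix-style listing line looks like, and the function names looks_unix_style and check_server are made up for this example.

```python
import re
from ftplib import FTP

# Approximate shape of an `ls -l` line: permission string, link count,
# owner, group, size, three date/time fields and finally the name.
UNIX_LINE = re.compile(
    r"^[bcdlps-][rwxsStT-]{9}\s+\d+\s+\S+\s+\S+\s+\d+\s+\S+\s+\S+\s+\S+\s+.+$"
)


def looks_unix_style(line: str) -> bool:
    """Return True if a directory listing line resembles `ls -l` output."""
    return UNIX_LINE.match(line.strip()) is not None


def check_server(host: str) -> bool:
    """Log in anonymously and test every entry of the top-level listing."""
    lines = []
    with FTP(host, timeout=30) as ftp:
        ftp.login()  # no arguments means an anonymous login
        ftp.retrlines("LIST", lines.append)
    # Some servers start the listing with a "total nn" summary line.
    entries = [line for line in lines if not line.startswith("total ")]
    return bool(entries) and all(looks_unix_style(line) for line in entries)
```

Calling check_server("your.server.here") with your own host name returns True when every entry looks Unix-style. An exception from login() suggests that anonymous FTP is not allowed.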
- Web based archives
- All .rdf files must be linked (Directory Browsing)
- The mirroring process works by our robot accessing the URL given in your archive template (xxxarch.rdf) and following
the links to all .rdf files that are reachable from this URL. If there is no link to a file our robot
will not know that the file exists and the file can not be mirrored. If a link to a file is removed the
file will be deleted from our copy of your archive.
Providing correct links to your .rdf files is thus crucial. The easiest way to ensure this is to let the
server automatically generate a listing of the directory content. In other words to enable directory
browsing.
If this, for some reason, is not possible you must maintain the directory listing yourself by placing
an HTML file in the directory with links to all the .rdf files. This file must then be served automatically
by the server when the directory is requested (i.e. when you enter the URL from your archive template in
a browser without adding any file name to the URL). With most servers the behaviour of automatically
serving a file is controlled by giving the file a special name (which is different for different servers).
Common names are default.htm, index.html and welcome.html. Consult your web master to find out the appropriate
way to do this on your web server.
Apache: Directory browsing is enabled by default. If it has been turned off, Options Indexes in httpd.conf
or in a per-directory .htaccess file enables directory listings.
(Ask your IT-support staff to make these changes if you are not familiar with server configuration.)
Microsoft IIS: Directory browsing is enabled for a directory by opening the properties dialog for
the directory in the IIS management console. Select the Directory tab and check Directory browsing.
Click on OK. (Ask your IT-support staff to make these changes if you are not familiar with server configuration.)
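If directory browsing cannot be enabled, the hand-maintained listing described above can be generated by a small script instead of being edited manually. This is a Python sketch under a few assumptions: your server serves a file named index.html for directory requests (the name differs between servers), and the helper names build_index and write_index are invented for this example.

```python
from html import escape
from pathlib import Path
from urllib.parse import quote


def build_index(names):
    """Return a minimal HTML page linking to each given file or directory name."""
    items = "\n".join(
        f'<li><a href="{quote(name)}">{escape(name)}</a></li>'
        for name in sorted(names)
    )
    return f"<html><body>\n<ul>\n{items}\n</ul>\n</body></html>\n"


def write_index(directory, index_name="index.html"):
    """Write a page that links every .rdf file and subdirectory in `directory`."""
    root = Path(directory)
    names = [p.name for p in root.iterdir() if p.is_file() and p.suffix == ".rdf"]
    # Directory links get a trailing slash so they are not mistaken for files.
    names += [p.name + "/" for p in root.iterdir() if p.is_dir()]
    (root / index_name).write_text(build_index(names), encoding="utf-8")
```

Run write_index once for the archive directory and once for each series directory, and run it again whenever .rdf files are added or removed, since a file without a link will not be mirrored.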
- The URL can not be redirected
- Our robot will only follow links that are under the URL in your archive template (xxxarch.rdf). This prevents the robot
from trying to mirror the whole web. One consequence of this is that the robot will not follow redirects.
Redirects are typically used when content is moved to a different server or to a new location on
the same server. See the Moving an archive topic below if you need to move your archive or your archive
has moved.
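The rule that only links under the archive URL are followed can be illustrated with a short Python sketch. It shows the idea only and is not the logic of our robot; the function name in_scope and the example URLs are placeholders.

```python
from urllib.parse import urljoin, urlsplit


def in_scope(archive_url: str, link: str) -> bool:
    """True if `link`, resolved against the archive URL, stays under that URL."""
    # Treat the archive URL as a directory so relative links resolve inside it.
    base = archive_url if archive_url.endswith("/") else archive_url + "/"
    root = urlsplit(base)
    target = urlsplit(urljoin(base, link))
    same_server = (target.scheme, target.netloc) == (root.scheme, root.netloc)
    return same_server and target.path.startswith(root.path)
```

With an archive URL of http://some.server/RePEc/xxx/, a link to wpaper/ is in scope, while a link to ../other/ or to a different server is not. A redirect to a new location fails the same test.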
- Microsoft IIS 6
- IIS 6 will only serve files with known extensions and .rdf is not included in the default list.
This results in the server giving a "404 Not Found" error when you follow a link to a .rdf even
if the link is correct and the file exists. This is corrected by setting a MIME type for .rdf files.
Open the properties dialog for the webserver in the IIS management applet and go to the
HTTP headers tab. There you click on MIME Types and add the .rdf extension with a
MIME type of text/plain. (Ask your IT-support staff to make these changes if you are not familiar with server configuration.)
Normally these changes take effect immediately but in some cases it might be necessary to stop and restart the
virtual web server or the WWW service.
If restarting doesn't help Microsoft
suggests adding a MIME type for the '*' extension with a MIME type of application/octet-stream. This will allow
the server to serve all files and should only be set for the archive directory. Finally, IIS will use the extensions listed
under the HKEY_CLASSES_ROOT key in the registry as a last resort so adding a file type for .rdf files in Windows
Explorer might resolve the problem.
- General Issues
- Moving an archive
- The following procedure should be used when moving an archive:
- Edit your archive template (xxxarch.rdf) and change the URL line to indicate the new location of the archive
- Copy all the files, keeping the directory structure intact, to the new location.
- Wait a day or two for us to pick up the change. Check that we have picked up the change and that everything
is OK by going to the
RePEc Data Check page for your archive and viewing our copy of the
archive template. You can delete the files from the old location if the URL line reflects
the new location and no mirroring problems are indicated.
- Contact us if there are any problems.
- The archive has moved and does not mirror anymore
- Edit the archive template (xxxarch.rdf) and update the URL line to reflect the new location. If possible
copy the edited archive template to the old location. The change should then be picked up within a day or two.
Contact us
if the change isn't picked up or it is not possible to copy the file to the old location.
Error Messages in the Mirroring Logs
This is a list of the most common error messages in the mirroring logs. Contact us
if an error message is unclear or missing from this list.
- HTTP error messages (web based archives)
The exact error message can vary here; the significant part is the numeric code.
- w3mir: index.html: 99 Host lookup failure
- The DNS lookup for your server failed. This could be a temporary problem but it is quite
likely that the server indicated in the archive template (xxxarch.rdf) no longer exists
or has changed name. See The archive has moved above.
- w3mir: index.html =>> http://some.server/some/location/, don't want it
w3mir: Warning: http://some.server/some/location/ is marked as -302.
- Your archive directory is being redirected to a different location, presumably because
your server has been reorganized. While this works with normal browsers it does not work
with our mirroring software. See The URL can not be redirected
and The archive has moved above.
Note that this is sometimes a redirection to an error page. This is an incredibly vile
practice which makes the web much less useful. Web servers should send the appropriate
error code so that the correct action can be taken programmatically.
- w3mir: filename 401 Unauthorized
- Your archive is password protected. Ask your IT support staff to remove the password
protection.
- w3mir: filename 403 Access Denied
- Your server is refusing to send us the requested file. In almost all cases
the filename is index.html meaning that the server is not set up to send us the
directory listing we need in order to mirror the files. You need to enable
directory browsing or create an HTML file with links
to all the .rdf files.
- w3mir: filename 404 Not Found
- The file can not be found on the server. If the filename is index.html it is likely
that the whole archive is missing from the server. Please reinstate the files. (We might
have a backup of your data if you have lost the files, contact us
if this is the case.) Changes to the structure of your web server are another likely source of this error.
See The archive has moved above.
If the error occurs for a single .rdf file this means the link is bad. Please correct
the link.
If the error occurs for all .rdf files you are likely running IIS 6 and need
to configure a MIME type for .rdf files.
- w3mir: filename 500 Server Error
- Your server was unable to fulfill the request for this resource. Please consult your IT support staff.
- w3mir: Some error occured, conservative file removal
- The connection to your server was closed unexpectedly and the mirroring process could not
be completed. This is typically a temporary problem. It should be investigated if it persists.
- w3mir: directory namerm_rf(directory name); at c:\perl\bin\w3mir4remi.pl line nnnn.
- The most likely cause of this error message is that a link to a series directory did not include a trailing slash (/).
With some web servers this causes our mirroring software to believe that the link is to a file and this
causes problems when saving the data.
It is always good practice to include the trailing slash when linking to the series directories even if
mirroring will work fine without it in most cases.
- remi: w3mir timed out (killed) after 7201 seconds
- The connection to your server is extremely slow or timed out in a way not detected by
our robot. This is typically a temporary problem but should be investigated if it
persists.
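If you review the mirroring logs regularly, the numeric code can be picked out of each line automatically. The Python sketch below assumes that HTTP error lines follow the layout of the samples in the list above ("w3mir: filename code text"). The advice strings are paraphrased from this FAQ and the function name explain is made up for this example.

```python
import re

# Advice paraphrased from the list above; the keys are HTTP status codes.
ADVICE = {
    401: "Remove the password protection from the archive.",
    403: "Enable directory browsing or provide an HTML file linking to all .rdf files.",
    404: "Check that the files exist and the links are correct; on IIS 6, set a MIME type for .rdf.",
    500: "Consult your IT support staff.",
}

# Assumed layout: "w3mir: <filename> <three digit code> <text>"
LOG_LINE = re.compile(r"^w3mir:\s+(?P<name>\S+)\s+(?P<code>\d{3})\s+(?P<text>.+)$")


def explain(line):
    """Return (filename, code, advice) for an HTTP error line, else None."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None  # not an HTTP status line, e.g. a time-out message
    code = int(match.group("code"))
    advice = ADVICE.get(code, "See the mirroring log for details.")
    return match.group("name"), code, advice
```

Lines that carry no three digit status code, such as the host lookup failure or the time-out message, return None and still need to be read by hand.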
- FTP error messages
- Cannot login, skipping package
- Your FTP server does not allow anonymous FTP. Anonymous FTP is required for the
mirroring to work.
- Cannot connect, skipping package
- Your FTP server is not responding when we try to connect to it or the DNS-lookup for the server fails.
This could be a temporary problem but should be investigated if it persists for more than a few days.
- Cannot get remote directory details (directory_name)
- This might be a temporary problem where the connection with your server was lost for some reason.
The problem should be investigated if it persists. The most likely cause for persistent problems is that
the archive directory is missing from the server. Please reinstate the directory and the .rdf files.
(We might have a backup of your data if you have lost the files, contact us
if this is the case.)
- Cannot get remote directory listing because: 150 some more details
- This might be a temporary problem where the connection with your server was lost for some reason.
The problem should be investigated if it persists. The most likely cause for persistent problems is that
your ftp server is using a non-standard data port and the transmission of data from your server is being
blocked by a firewall. This behaviour violates the standard defining FTP transactions (see sections
3.2 and 3.3 of RFC 959) and should be avoided. Please configure
your ftp server to behave in a standards compliant way. (We are aware that this non-standard behaviour is
intended to enhance security. If you care about security you should simply refuse to establish a data
connection without a preceding PASV (and say so in the response) or at a minimum indicate that a
non-standard port is used in the response.)
- No files to transter
- This is, strictly speaking, not an error message but indicates a mirroring problem if you have .rdf files
in your archive and there are no files mirrored to EconPapers (check the "mirrored files" link on the check
page for your archive).
This condition is caused by our mirroring software not being able to read or interpret the directory listings
generated by your server. Likely causes are that the anonymous user does not have read access
to the directory or that there is a problem with the way directory listings
are generated (particularly common with Microsoft's FTP server).
Providing data to get the most out of EconPapers and RePEc
- First of all, provide as much data on each paper as possible: abstract, key words, JEL-codes and the date the paper was
written. This increases the exposure of your papers and makes it more likely that they are found when
people search in EconPapers and other RePEc services.
- For working papers, use the Number field to provide the working paper number. This makes it easier for people
to reference the paper. The working paper number is also the basis for the sorted list of papers in a series provided
by EconPapers.
- EconPapers must fall back on other information and may end up sorting in a strange order if the working paper
number is missing for some papers or not provided in a consistent format (e.g. nn or
yyyy-nn).
- The working paper number is often encoded in the Handle field and EconPapers tries to parse the handle
if the number is missing for some papers.
- If EconPapers fails to parse the handle the sort will be based on the Creation-Date or Revision-Date
fields if there is a date field for each paper.
- If all else fails EconPapers will do a character based sort on the handle.
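The fallback order described above can be sketched in Python as follows. This is an illustration of the idea and not the code EconPapers runs: the dictionary keys number, date and handle, the way digits are extracted, and the function names are all assumptions made for this example.

```python
import re


def numeric_key(text):
    """Extract the digit groups of e.g. '7' or '2003-12' as a tuple of ints."""
    groups = re.findall(r"\d+", text or "")
    return tuple(int(g) for g in groups) or None


def sort_series(papers):
    """Order papers given as dicts with 'handle' and optional 'number', 'date'."""
    # 1. Working paper number, falling back on the last part of the handle.
    keys = [
        numeric_key(p.get("number")) or numeric_key(p["handle"].split(":")[-1])
        for p in papers
    ]
    if all(keys):
        return [p for _, p in sorted(zip(keys, papers), key=lambda kp: kp[0])]
    # 2. A date field, if every paper has one (yyyy-mm-dd sorts as text).
    if all(p.get("date") for p in papers):
        return sorted(papers, key=lambda p: p["date"])
    # 3. Plain character sort on the handle.
    return sorted(papers, key=lambda p: p["handle"])
```

The sketch also shows why a consistent format matters: a series that mixes nn and yyyy-nn numbers produces keys such as (7,) and (2003, 12), which sort in an order few readers would expect.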
- For journal articles, use the Year, Volume, Issue and Pages fields if applicable. This makes
it easier for people to reference the paper. This information is also the basis for the grouped and
sorted list of articles in a journal provided by EconPapers.
- EconPapers will use as much as possible of this information. Pages are sorted within issues and issues within
volumes or years.
- Pages are sorted within years or volumes if the issue information is missing for some articles.
- The sort within issue, year or volume is based on the Handle field if pages are missing for some articles.
- The sort is based on the handle if year, volume and issue are missing.
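A rough Python sketch of the article ordering described above, again as an illustration rather than the actual EconPapers code. It assumes articles are dicts with a handle and optional year, volume, issue and pages entries, and it simplifies the rules by using a field only when every article provides it.

```python
import re


def as_int(value):
    """First run of digits in a field as an int, or None if there is none."""
    match = re.search(r"\d+", value or "")
    return int(match.group()) if match else None


def sort_articles(articles):
    """Order dicts with 'handle' and optional 'year', 'volume', 'issue', 'pages'."""

    def usable(field):
        # Simplification: a field is only used when every article provides it.
        return all(as_int(a.get(field)) is not None for a in articles)

    group = [f for f in ("year", "volume", "issue") if usable(f)]
    if not group:
        # No year, volume or issue at all: sort on the handle.
        return sorted(articles, key=lambda a: a["handle"])
    if usable("pages"):
        def last(a):
            return (as_int(a["pages"]),)  # first page number
    else:
        def last(a):
            return (a["handle"],)
    return sorted(articles, key=lambda a: tuple(as_int(a[f]) for f in group) + last(a))
```

Pages such as 35-60 are reduced to the first page number, so articles within an issue appear in the order they were printed.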
- Never change the Handle of an item. The handle is a persistent and unique identifier for items in the RePEc data
base. Changing handles causes the RePEc Author Service and
LogEc to lose track of the item.
- Consider using the Author-Name-Last and Author-Name-First fields. This makes proper parsing of
author names much easier and is one of the requirements for inclusion in the
EconLit data base.