*** mibofra has quit IRC | 00:48 | |
*** mibofra has joined #schooltool | 00:48 | |
*** mibofra- has joined #schooltool | 00:49 | |
*** mibofra- is now known as mibofra | 00:52 | |
*** mibofra has joined #schooltool | 00:52 | |
*** replaceafill has quit IRC | 00:54 | |
*** menesis has quit IRC | 02:23 | |
*** th1a has quit IRC | 02:26 | |
*** yvl has joined #schooltool | 09:39 | |
*** menesis has joined #schooltool | 12:52 | |
*** yvl has quit IRC | 13:47 | |
*** yvl has joined #schooltool | 13:51 | |
*** yvl has quit IRC | 15:00 | |
*** menesis has quit IRC | 15:25 | |
*** th1a has joined #schooltool | 15:34 | |
*** menesis has joined #schooltool | 16:26 | |
*** replaceafill has joined #schooltool | 17:26 | |
*** menesis has quit IRC | 20:35 | |
*** menesis has joined #schooltool | 21:23 | |
*** replaceafill has quit IRC | 21:39 | |
*** replaceafill has joined #schooltool | 22:37 | |
th1a | hi replaceafill. | 22:40 |
---|---|---|
replaceafill | hey th1a | 22:40 |
th1a | I'm talking to our friend Telly. | 22:40 |
replaceafill | !!! | 22:40 |
replaceafill | how's she doing? | 22:40 |
th1a | She says "Had a meeting this morning | 22:41 |
th1a | And my project is at risk of dying if I don't establish this database" | 22:41 |
replaceafill | :| | 22:41 |
th1a | I think we need some kind of a form optimized for copy/pasting from her PDFs. | 22:42 |
th1a | It just needs to do some auto-numbering and things to cut steps out. | 22:43 |
replaceafill | why don't we extract data from the pdfs? | 22:44 |
th1a | Their internal structure is just a mess. | 22:46 |
th1a | As I recall, the document structure and the text are just jumbled together. | 22:46 |
replaceafill | i'd like to see one, do you have any? | 22:48 |
replaceafill | or are they on the web? | 22:48 |
th1a | http://www.google.com/url?q=http%3A%2F%2Fwww.ntatvetcentre.org%2FPlanDocuments%2FQualPlan%2FCCBSB10103%2FGeneral%2520Office%2520Administration%2520Level%25201.pdf&sa=D&sntz=1&usg=AFQjCNF3EbtM8dJeEZnH48NN4Aqu0gHZFA | 22:48 |
th1a | Did you look at it before? | 22:48 |
replaceafill | i don't think so | 22:49 |
replaceafill | i've been "on pause" for the whole Telly story :) | 22:49 |
th1a | OK. | 22:50 |
replaceafill | well, all i can identify are skillsets and skills maybe | 22:52 |
replaceafill | under element of competency/performance criteria tables? | 22:52 |
th1a | There's some higher structure but that's not the scary part. | 22:52 |
th1a | For the skills and skillsets if you could just copy/paste without having to click again, enter the number, click... | 22:55 |
replaceafill | ah, i see what you mean | 22:56 |
th1a | So she can pay for some coding if we can make it easier for someone there to just grind through. | 22:56 |
replaceafill | i could play with the pdf tonight | 22:58 |
replaceafill | if you think we should explore that route | 22:58 |
replaceafill | or maybe you're already set on the copy/paste approach | 22:59 |
th1a | Well, you can look at it. | 22:59 |
th1a | I think you'll realize pretty quickly it's a mess, but maybe not. | 22:59 |
replaceafill | in my head those tables become skillsets/skills, right? | 22:59 |
replaceafill | the rest of the hierarchy, not sure about it | 23:00 |
th1a | Yeah, but they aren't really tables in the pdf. | 23:00 |
replaceafill | we should ignore the whole Range Statement part, right? | 23:01 |
replaceafill | and Evidence Guide | 23:01 |
th1a | I'm ignoring it for now. | 23:02 |
replaceafill | this looks very similar to the Salvadoran standards list | 23:02 |
th1a | They got it from Trinidad. It may be used all over Latin America. | 23:03 |
th1a | We just need a non-pdf version. | 23:03 |
replaceafill | extracting pdf text, splitting from PERFORMANCE CRITERIA to RANGE STATEMENT | 23:05 |
replaceafill | seems to get the list | 23:05 |
th1a | Well, used throughout the Caribbean. | 23:05 |
th1a | Agh. | 23:05 |
replaceafill | :D | 23:05 |
th1a | Splitting using what? | 23:05 |
replaceafill | they're just text now | 23:05 |
replaceafill | http://pastebin.com/L6BLtsMu | 23:06 |
th1a | I tried several pdf parsing things... | 23:06 |
replaceafill | :| | 23:06 |
replaceafill | pdftotext + python? | 23:06 |
th1a | I don't even remember. Probably. | 23:06 |
th1a | Is that what you're using? | 23:07 |
replaceafill | yeah | 23:07 |
th1a | Pfft. I guess that's why I'm the project manager. | 23:07 |
replaceafill | :)) | 23:07 |
replaceafill | i mean, it's not perfect, but it works | 23:08 |
replaceafill | leaves some trash around | 23:08 |
replaceafill | but cleaning it is definitely faster than copy/paste ;) | 23:08 |
th1a | Yeah... | 23:09 |
replaceafill | ok, i'll finish the levels work and will write a decent script later | 23:09 |
replaceafill | but i think it's doable | 23:10 |
replaceafill | at least for *this* pdf :D | 23:10 |
th1a | Yeah, don't start now. | 23:10 |
replaceafill | kk | 23:10 |
replaceafill | brb, rebooting... | 23:13 |
*** replaceafill has quit IRC | 23:13 | |
*** replaceafill has joined #schooltool | 23:15 | |
th1a | OK, so fyi, that document is a single course, and the table at the beginning lays out the included units. | 23:27 |
replaceafill | have you thought of a doc structure for this? | 23:27 |
replaceafill | Unit -> Skill Set -> Skill | 23:28 |
replaceafill | Units being "prepare for work, etc" | 23:28 |
th1a | Quoting me: | 23:28 |
th1a | So from your point of view it is Course > Unit > Element of Competency > Performance Criteria | 23:28 |
th1a | And then we need the official ID for the course and unit. | 23:28 |
th1a | Telly: Correct. | 23:28 |
replaceafill | :D | 23:29 |
replaceafill | oh, we can change names for SkillSets and Skills? | 23:29 |
replaceafill | right? | 23:29 |
th1a | Yes. | 23:30 |
replaceafill | kk | 23:30 |
replaceafill | th1a, do you have the index page for the rest of the courses? | 23:33 |
th1a | I asked Telly for more links. | 23:42 |
replaceafill | ah ok | 23:42 |
th1a | http://www.google.com/url?q=http%3A%2F%2Fwww.ntatvetcentre.org%2FPlanDocuments%2FQualPlan%2FCCITC10207%2Fcustomer%2520service%25201.pdf&sa=D&sntz=1&usg=AFQjCNFEKDvJ7VcS9vfRB983f84TOrjLwg | 23:47 |
th1a | There are duplicates. | 23:48 |
th1a | Duplicate units. | 23:48 |
replaceafill | :( | 23:48 |
th1a | Good thing our model doesn't require a hierarchy. | 23:48 |
th1a | http://www.google.com/url?q=http%3A%2F%2Fwww.ntatvetcentre.org%2FPlanDocuments%2FQualPlan%2FCCCSB10103%2Fgeneral%2520cosmetology%2520level%25201%2520new.pdf&sa=D&sntz=1&usg=AFQjCNEtXnzUJ3bAjR9TrfKujUNbsY06Gw | 23:48 |
th1a | Well, I'll ask if we need to duplicate them a la VA. | 23:49 |
replaceafill | that would be easier i think | 23:49 |
th1a | That is the sane way to do it. ;-) | 23:50 |
th1a | Yeah, two courses point to the same unit. | 23:51 |
th1a | http://www.google.com/url?q=http%3A%2F%2Fwww.ntatvetcentre.org%2FPlanDocuments%2FQualPlan%2FCCFSF30206%2Fpayroll%2520administration%2520level%25203.pdf&sa=D&sntz=1&usg=AFQjCNGNPHNjacoBWNPJr_lUu8uiYygKmQ | 23:51 |
replaceafill | sections skills will prevent that from being effective i think | 23:52 |
th1a | It will prevent the scores from showing up. | 23:52 |
replaceafill | right | 23:52 |
replaceafill | we need the history view ;) | 23:52 |
replaceafill | fancy | 23:52 |
replaceafill | :D | 23:52 |
th1a | Some different units may have the same title but different IDs, so we work off IDs. | 23:53 |
replaceafill | got it | 23:54 |
replaceafill | oh! code units have meanings | 23:55 |
replaceafill | nice | 23:55 |
replaceafill | industry, sector, sub-sector, even version control! | 23:55 |
th1a | I don't think we need that hierarchy in the system though. | 23:56 |
replaceafill | sure | 23:56 |
replaceafill | i just thought it's cool :P | 23:56 |
th1a | Well, it helps! | 23:57 |
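
The "pdftotext + python" approach replaceafill describes around 23:05 (extract the PDF text, then keep what sits between PERFORMANCE CRITERIA and RANGE STATEMENT) could look roughly like the sketch below. This is not the script from the pastebin link, which is not reproduced in the log. The function names, the `-layout` choice, and the sample text are assumptions made for illustration, and real pdftotext output will likely still leave some trash to clean by hand.

```python
import re
import subprocess

# Section markers mentioned in the log.
START = "PERFORMANCE CRITERIA"
END = "RANGE STATEMENT"


def pdf_to_text(path):
    """Run pdftotext (from poppler-utils) and return the extracted text.

    Assumes the pdftotext binary is installed; "-" sends output to stdout.
    """
    result = subprocess.run(
        ["pdftotext", "-layout", path, "-"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def criteria_blocks(text):
    """Return one list of non-empty, stripped lines per START..END span."""
    pattern = re.compile(
        re.escape(START) + r"(.*?)" + re.escape(END), re.DOTALL
    )
    blocks = []
    for match in pattern.finditer(text):
        lines = [line.strip() for line in match.group(1).splitlines()]
        blocks.append([line for line in lines if line])
    return blocks


# Made-up sample text standing in for pdftotext output (not from the real PDF).
SAMPLE = (
    "Unit title\n"
    "PERFORMANCE CRITERIA\n"
    "1.1 First criterion\n"
    "\n"
    "1.2 Second criterion\n"
    "RANGE STATEMENT\n"
    "ignored\n"
)
```

Calling `criteria_blocks(pdf_to_text("some-course.pdf"))` would then give one list of candidate skills per unit section, with the Range Statement and Evidence Guide parts left out, as agreed at 23:01.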
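
The structure agreed with Telly at 23:28 (Course > Unit > Element of Competency > Performance Criteria), together with the later points that two courses can point to the same unit and that units are matched by ID rather than title, can be sketched as below. This is only an illustration, not SchoolTool's actual skills model. The class names, field names, and every ID in the usage note are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Element:
    """Element of Competency with its Performance Criteria."""
    title: str
    criteria: List[str] = field(default_factory=list)


@dataclass
class Unit:
    unit_id: str  # official unit ID; titles are not unique
    title: str
    elements: List[Element] = field(default_factory=list)


@dataclass
class Course:
    course_id: str  # official course ID
    title: str
    unit_ids: List[str] = field(default_factory=list)


class Catalog:
    """Stores each unit once, keyed by ID; courses only reference unit IDs."""

    def __init__(self):
        self.units: Dict[str, Unit] = {}
        self.courses: Dict[str, Course] = {}

    def add_unit(self, unit):
        # Keep the first copy seen for an ID, so duplicates collapse.
        return self.units.setdefault(unit.unit_id, unit)

    def add_course(self, course_id, title, units):
        course = Course(course_id, title)
        for unit in units:
            course.unit_ids.append(self.add_unit(unit).unit_id)
        self.courses[course_id] = course
        return course

    def units_for(self, course_id):
        return [self.units[uid] for uid in self.courses[course_id].unit_ids]
```

For example, adding `Unit("U-1", "Prepare for work")` to two courses leaves a single entry in `catalog.units`, while a unit with the same title but ID `"U-2"` is stored separately.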